Wednesday, April 22, 2026

Setting Up an OpenShift Cluster User-Provisioned Infrastructure in Air-Gapped Environments

OpenShift · UPI · Air-Gapped


A complete, step-by-step guide to manually provisioning VMs, load balancers, and DNS for an OpenShift cluster in a disconnected network.

1 What is UPI?

With User-Provisioned Infrastructure (UPI), you have maximum control over the cluster setup. You are responsible for manually preparing all virtual machines, load balancers, and DNS records before the OpenShift installer runs. This is the preferred approach for secure, air-gapped, or heavily regulated environments.

2 Air-Gapped Prerequisites

The primary challenge in a disconnected environment is the absence of direct access to the Red Hat Container Registry. You must bridge this gap before initiating the installation.

Mirror Registry

Establish a local container registry (e.g., Red Hat Quay or JFrog Artifactory) within your secure perimeter. Use the oc mirror plugin to sync OpenShift release images, operator catalogs, and Helm charts from the internet to a portable medium, then load them into your local registry.
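As a sketch, the input to the oc mirror plugin is an ImageSetConfiguration file. The registry hostname, OCP version, and operator list below are illustrative placeholders, not values from this guide:

```yaml
# imageset-config.yaml -- example input for the oc-mirror plugin (values are illustrative)
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
storageConfig:
  registry:
    imageURL: registry.ocp.lan:8443/mirror/oc-mirror-metadata   # your local mirror registry
mirror:
  platform:
    channels:
      - name: stable-4.14        # match your target OCP version
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
      packages:
        - name: odf-operator     # mirror only the operators you need
```

On a connected host you would run `oc mirror --config=imageset-config.yaml file://mirror-data`, carry the resulting archive across the air gap, then load it with `oc mirror --from=mirror-data docker://registry.ocp.lan:8443` inside the perimeter.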

Internal DNS & NTP

Precise time synchronization and split-horizon DNS are non-negotiable. Every node must be able to resolve the local registry hostname and all internal API endpoints.
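For the NTP half, a minimal chrony configuration pointing every machine at an internal time source might look like this (the server address is a placeholder; on RHCOS nodes this file is delivered via a MachineConfig rather than edited directly):

```
# /etc/chrony.conf -- internal NTP only; no public pool in an air-gapped network
server ntp.corp.lan iburst     # your internal time source
driftfile /var/lib/chrony/drift
makestep 1.0 3                 # step the clock on large initial offsets
rtcsync
```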

3 OS Strategy

Red Hat enforces a specific OS strategy to ensure the self-healing nature of OpenShift.

⚠️
Control Plane (Masters): RHCOS is mandatory. Standard RHEL, Ubuntu, or any other OS cannot be used. Masters are managed by the Machine Config Operator (MCO), which requires an immutable, container-optimized OS to push updates, roll back kernel changes, and manage configurations automatically.
ℹ️
Compute Nodes (Workers): RHCOS is strongly recommended. When you update OpenShift, the OS on the workers updates automatically via rpm-ostree for safe, transactional updates. You have some flexibility here, but RHCOS is the supported default.

4 Architecture & Helper Node

This setup uses a Helper Node as the backbone — it acts as a bridge between the external network and the internal cluster network using two network interfaces.

OpenShift UPI Architecture Diagram

Network Interfaces

Interface   Zone       Subnet         Role
ens192      External   192.168.0.X    Front-end / internet-facing traffic and corporate LB
ens224      Internal   192.168.22.X   Back-end cluster communication and storage traffic

Infrastructure Services Provided by the Helper Node

DNS (BIND)
Resolves cluster hostnames and API endpoints
DHCP
Assigns static IPs to all cluster nodes
NAT Gateway
Routes internal node traffic through the helper
HAProxy
Load balances API and Ingress traffic
Apache Web Server
Hosts Ignition files for automated installation
NFS Server
Provides persistent storage for the registry

5 Cluster Node Roles

Temporary

Bootstrap Node

Used only during the initial installation to orchestrate Control Plane creation. Decommissioned once the control plane is healthy.

×3 Nodes — HA

Control Plane (Masters)

The "brains" of the cluster. Runs the API server, etcd database, and controllers. Three nodes ensure high availability.

Compute

Worker Nodes

Where your actual applications, containers, and pods run. CSRs must be manually approved in UPI mode.

6 Core Deployment Workflow

Phase I — Configuration & Manifest Generation

On the Bastion host, define the cluster in install-config.yaml, pointing to your local mirror registry. Generate Kubernetes manifests and convert them to Ignition configs (.ign files) that RHCOS nodes execute on first boot.

Phase II — Infrastructure Provisioning

Configure a high-availability load balancer (HAProxy/F5) for the API (port 6443), Machine Config Server (port 22623), and Ingress (ports 80/443). Host the generated .ign files on an internal HTTP server.

Phase III — Bootstrap Sequence

Boot the Bootstrap node → it pulls its config and initiates control plane creation. Boot Masters → they form the etcd quorum. Boot Workers → manually approve their CSRs to join the cluster.

7 Detailed Setup Steps

Prepare the Bastion / Helper Node.

    1. The Helper Node OS should be a Red Hat Enterprise Linux or CentOS 8 x86_64 image

    2. Login to RedHat OpenShift Cluster Manager

    3. Select 'Create Cluster' from the 'Clusters' navigation menu

    4. Select 'RedHat OpenShift Container Platform'

    5. Select 'Run on Bare Metal'

    6. Download the following files:

      • OpenShift Installer for Linux (openshift-install-linux.tar.gz)
      • Pull secret
      • Command-Line Interface for Linux and your workstation's OS (openshift-client-linux.tar.gz)
      • Red Hat Enterprise Linux CoreOS (RHCOS)
        • rhcos-X.X.X-x86_64-metal.x86_64.raw.gz
        • rhcos-X.X.X-x86_64-installer.x86_64.iso (or rhcos-X.X.X-x86_64-live.x86_64.iso for newer versions)

Notes: Before powering on any node, the following must be ready:
1) Load Balancer:

  • Port 6443 (API): Points to Bootstrap + 3 Masters.
  • Port 22623 (Machine Config): Points to Bootstrap + 3 Masters.
  • Ports 80/443 (Apps): Points to all Worker nodes.
2) DNS:
  • api.<cluster>.<domain> -> LB VIP for 6443.
  • api-int.<cluster>.<domain> -> LB VIP for 6443/22623.
  • *.apps.<cluster>.<domain> -> LB VIP for 80/443/8443.
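Those DNS records translate into a forward zone roughly like the following. This is a hedged sketch for a cluster named lab under the domain ocp.lan, with the helper (192.168.22.1) acting as the LB VIP; the SOA/NS names are illustrative:

```
; /etc/named/zones/db.ocp.lan -- illustrative forward zone
$TTL 600
@       IN SOA  ocp-svc.ocp.lan. admin.ocp.lan. ( 2024010101 3600 600 604800 600 )
@       IN NS   ocp-svc.ocp.lan.
ocp-svc           IN A  192.168.22.1
api.lab           IN A  192.168.22.1    ; LB VIP for 6443
api-int.lab       IN A  192.168.22.1    ; LB VIP for 6443/22623
*.apps.lab        IN A  192.168.22.1    ; LB VIP for 80/443
ocp-bootstrap.lab IN A  192.168.22.200
```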

Step 1 — Install Client Tools

# Extract and install the OpenShift client tools
tar xvf openshift-client-linux.tar.gz
mv oc kubectl /usr/local/bin

# Verify installation
kubectl version
oc version

# Extract the OpenShift Installer
tar xvf openshift-install-linux.tar.gz

Step 2 — Configure Static IP for Internal NIC

Run nmtui-edit ens224 or edit /etc/sysconfig/network-scripts/ifcfg-ens224 with these values:

Address:       192.168.22.1
DNS Server:    127.0.0.1
Search Domain: ocp.lan
Default Route: Disabled
Auto-connect:  Enabled

If changes don't apply, bounce the NIC: nmcli connection down ens224 && nmcli connection up ens224
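The values above correspond to an ifcfg file roughly like this (a sketch; UUID and HWADDR lines omitted):

```
# /etc/sysconfig/network-scripts/ifcfg-ens224 -- internal NIC (illustrative)
DEVICE=ens224
BOOTPROTO=none          # static addressing
IPADDR=192.168.22.1
PREFIX=24
DNS1=127.0.0.1
DOMAIN=ocp.lan
DEFROUTE=no             # "Default Route: Disabled"
ONBOOT=yes              # "Auto-connect: Enabled"
```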

Step 3 — Configure Firewall Zones

# Assign interfaces to zones
nmcli connection modify ens224 connection.zone internal
nmcli connection modify ens192 connection.zone external

# Enable masquerading (NAT) on both zones
firewall-cmd --zone=external --add-masquerade --permanent
firewall-cmd --zone=internal --add-masquerade --permanent
firewall-cmd --reload

# Verify zones and IP forwarding
firewall-cmd --get-active-zones
firewall-cmd --list-all --zone=internal
firewall-cmd --list-all --zone=external
cat /proc/sys/net/ipv4/ip_forward   # Should return 1

Step 4 — Clone Config Repository

dnf update -y
dnf install git -y
git clone https://github.com/ryanhay/ocp4-metal-install

Step 5 — Install & Configure DNS (BIND)

dnf install bind bind-utils -y
cp ~/ocp4-metal-install/dns/named.conf /etc/named.conf
cp -R ~/ocp4-metal-install/dns/zones /etc/named/

# Open firewall for DNS
firewall-cmd --add-port=53/udp --zone=internal --permanent
firewall-cmd --add-port=53/tcp --zone=internal --permanent  # Required for OCP 4.9+
firewall-cmd --reload

# Enable and start BIND
systemctl enable named && systemctl start named && systemctl status named

Update the external NIC (ens192) to use 127.0.0.1 as its DNS server and enable "Ignore automatically obtained DNS parameters" via nmtui-edit ens192, then restart NetworkManager:

systemctl restart NetworkManager

# Verify DNS resolution
dig ocp.lan
dig -x 192.168.22.200   # Should resolve to ocp-bootstrap.lab.ocp.lan

Step 6 — Install & Configure DHCP

⚠️
Before copying the config, update ~/ocp4-metal-install/dhcpd.conf with the actual MAC addresses of each cluster machine.
dnf install dhcp-server -y
cp ~/ocp4-metal-install/dhcpd.conf /etc/dhcp/dhcpd.conf

firewall-cmd --add-service=dhcp --zone=internal --permanent
firewall-cmd --reload

systemctl enable dhcpd && systemctl start dhcpd && systemctl status dhcpd
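The fixed-address reservations that note refers to follow this pattern. A trimmed, illustrative excerpt -- the MAC addresses and hostnames are placeholders you must substitute:

```
# /etc/dhcp/dhcpd.conf -- illustrative excerpt
subnet 192.168.22.0 netmask 255.255.255.0 {
  option routers 192.168.22.1;             # helper is the NAT gateway
  option domain-name-servers 192.168.22.1; # helper runs BIND
  option domain-name "ocp.lan";

  host ocp-bootstrap {
    hardware ethernet 00:50:56:00:00:01;   # replace with the node's real MAC
    fixed-address 192.168.22.200;
  }
  host ocp-cp-1 {
    hardware ethernet 00:50:56:00:00:02;
    fixed-address 192.168.22.201;
  }
  # ...repeat for the remaining masters and workers
}
```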

Step 7 — Install & Configure Apache Web Server

dnf install httpd -y
# Change Apache to listen on port 8080 (avoids conflicts)
sed -i 's/Listen 80/Listen 0.0.0.0:8080/' /etc/httpd/conf/httpd.conf

firewall-cmd --add-port=8080/tcp --zone=internal --permanent
firewall-cmd --reload

systemctl enable httpd && systemctl start httpd && systemctl status httpd

# Verify it's running
curl localhost:8080

Step 8 — Install & Configure HAProxy

dnf install haproxy -y
cp ~/ocp4-metal-install/haproxy.cfg /etc/haproxy/haproxy.cfg

Open the required firewall ports:

# Control plane API
firewall-cmd --add-port=6443/tcp --zone=internal --permanent
firewall-cmd --add-port=6443/tcp --zone=external --permanent

# Machine Config Server
firewall-cmd --add-port=22623/tcp --zone=internal --permanent

# Application ingress (HTTP/HTTPS)
firewall-cmd --add-service=http --zone=internal --permanent
firewall-cmd --add-service=http --zone=external --permanent
firewall-cmd --add-service=https --zone=internal --permanent
firewall-cmd --add-service=https --zone=external --permanent

# HAProxy stats UI (accessible at http://<helper-ip>:9000/stats)
firewall-cmd --add-port=9000/tcp --zone=external --permanent
firewall-cmd --reload
# Allow HAProxy SELinux binding and start the service
setsebool -P haproxy_connect_any 1
systemctl enable haproxy && systemctl start haproxy && systemctl status haproxy
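The haproxy.cfg you copied follows this general shape. A trimmed, illustrative sketch -- server names and IPs are placeholders:

```
# /etc/haproxy/haproxy.cfg -- illustrative excerpt
frontend api
    bind *:6443
    mode tcp
    default_backend api-be
backend api-be
    mode tcp
    balance roundrobin
    server ocp-bootstrap 192.168.22.200:6443 check   # remove after bootstrap
    server ocp-cp-1      192.168.22.201:6443 check
    server ocp-cp-2      192.168.22.202:6443 check
    server ocp-cp-3      192.168.22.203:6443 check

frontend machine-config
    bind *:22623
    mode tcp
    default_backend machine-config-be
backend machine-config-be
    mode tcp
    server ocp-bootstrap 192.168.22.200:22623 check  # remove after bootstrap
    server ocp-cp-1      192.168.22.201:22623 check
    server ocp-cp-2      192.168.22.202:22623 check
    server ocp-cp-3      192.168.22.203:22623 check
# ...plus tcp frontends/backends for ports 80 and 443 pointing at the workers
```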

Step 9 — Install & Configure NFS Server

Network File System (NFS) is a distributed file system protocol that allows a user on a client computer to access files over a network much like local storage is accessed. Originally developed by Sun Microsystems, it has become the standard for file sharing between Unix and Linux systems.

How NFS Works

  • NFS Server: Hosts the physical storage and "exports" (shares) specific directories to the network. It manages permissions and handles requests from clients.
  • NFS Client: Mounts the exported directory from the server onto its own local file system. To the user or application on the client side, the files appear to be stored locally.
dnf install nfs-utils -y

mkdir -p /shares/registry
chown -R nobody:nobody /shares/registry
chmod -R 777 /shares/registry

echo "/shares/registry  192.168.22.0/24(rw,sync,root_squash,no_subtree_check,no_wdelay)" > /etc/exports
exportfs -rv

firewall-cmd --zone=internal --add-service=mountd --permanent
firewall-cmd --zone=internal --add-service=rpc-bind --permanent
firewall-cmd --zone=internal --add-service=nfs --permanent
firewall-cmd --reload

systemctl enable nfs-server rpcbind
systemctl start nfs-server rpcbind nfs-mountd

Step 10 — Generate Installation Files

mkdir /var/www/html/ocp4
cp ~/ocp4-metal-install/install-config.yaml /var/www/html/ocp4
ℹ️
Edit install-config.yaml before proceeding: insert your Pull Secret and SSH public key. See the Configuration Details section below for guidance.
# Generate Kubernetes manifests
~/openshift-install create manifests --dir /var/www/html/ocp4

To control whether workloads can run on Control Plane nodes, edit the scheduler manifest:

ls /var/www/html/ocp4/manifests/cluster-scheduler-02-config.yml

# Set mastersSchedulable: true  → allow workloads on masters
# Set mastersSchedulable: false → prevent workloads (default)
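The relevant part of that manifest looks like this (illustrative excerpt):

```yaml
# manifests/cluster-scheduler-02-config.yml (excerpt)
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: false   # set to true only if masters should run workloads
```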
# Generate Ignition configs and auth files
~/openshift-install create ignition-configs --dir /var/www/html/ocp4

Step 11 — Host RHCOS Image and Set Permissions

# Move the RHCOS metal image to the web server
mv ~/rhcos-X.X.X-x86_64-metal.x86_64.raw.gz /var/www/html/ocp4/rhcos

# Set correct SELinux context, ownership, and permissions
chcon -R -t httpd_sys_content_t /var/www/html/ocp4/
chown -R apache: /var/www/html/ocp4/
chmod 755 /var/www/html/ocp4/

# Confirm all files are accessible
curl localhost:8080/ocp4/

Step 12 — Boot Cluster Nodes

Boot each node using the RHCOS ISO or PXE. During the boot process, you must pass a kernel argument telling the node where to fetch its Ignition file (its "brain"), e.g. coreos.inst.ignition_url=http://<helper-ip>:8080/ocp4/bootstrap.ign
Order of Operations:

  • 1. Start Bootstrap node.
  • 2. Start 3 Master nodes.
  • 3. Wait for the API to come up.
  • 4. Start Worker nodes.

# Bootstrap Node
sudo coreos-installer install /dev/sda \
  -u http://192.168.22.1:8080/ocp4/rhcos \
  -I http://192.168.22.1:8080/ocp4/bootstrap.ign \
  --insecure --insecure-ignition

# Control Plane (Master) Nodes
sudo coreos-installer install /dev/sda \
  -u http://192.168.22.1:8080/ocp4/rhcos \
  -I http://192.168.22.1:8080/ocp4/master.ign \
  --insecure --insecure-ignition

# Worker Nodes
sudo coreos-installer install /dev/sda \
  -u http://192.168.22.1:8080/ocp4/rhcos \
  -I http://192.168.22.1:8080/ocp4/worker.ign \
  --insecure --insecure-ignition

Step 13 — Monitor Bootstrap & Finalize

# Monitor bootstrap progress from the Helper Node
~/openshift-install --dir /var/www/html/ocp4/ wait-for bootstrap-complete --log-level=debug

Once bootstrapping completes, remove the Bootstrap node from HAProxy and shut it down:

# Remove ocp-bootstrap from /etc/haproxy/haproxy.cfg, then reload
systemctl reload haproxy

# Approve Worker CSRs so workers can join the cluster
oc get csr
oc adm certificate approve <csr-name>

# Verify all nodes are Ready
oc get nodes

Step 14 — Post-Installation

# Retrieve cluster credentials (the console URL is printed at the end of
# `openshift-install wait-for install-complete`)
export KUBECONFIG=/var/www/html/ocp4/auth/kubeconfig
cat /var/www/html/ocp4/auth/kubeadmin-password
  • Configure Storage: Define StorageClasses (NFS, OCS, or local storage) so applications can persist data. See the Configure Storage section below.
  • Set Up Identity Providers: Replace the temporary kubeadmin user with a permanent solution such as LDAP or OAuth. See the Identity Providers section below.

Configure Storage Post-Install

Once the cluster is healthy and all nodes are Ready, you must configure persistent storage. Without a working StorageClass, the internal image registry, monitoring stack, and most operators cannot persist data.

ℹ️
Why this matters immediately: The OpenShift internal image registry is set to Removed or EmptyDir by default after a UPI install. You must back it with persistent storage before pushing any images.

Option A — NFS StorageClass (Lab / Air-Gapped)

If you provisioned an NFS share on the Helper Node (Step 9), expose it as a dynamic StorageClass using the NFS Subdir External Provisioner. This is the fastest path for lab and air-gapped environments.
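A hedged sketch of the StorageClass that provisioner typically exposes, named nfs-client here to match the later `oc patch` examples (the provisioner string must match the PROVISIONER_NAME configured in its deployment):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner   # must match the deployment
parameters:
  archiveOnDelete: "false"   # delete backing data when the PVC is removed
```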

1. Configure storage for the Image Registry

If you check the cluster operators oc get co, you will likely see the image-registry operator reporting AVAILABLE=False or PROGRESSING=True (but stuck) because it lacks the resources to deploy the registry pods.

Run the following command to set managementState to 'Managed' and add the pvc/claim keys under storage; the operator will then create the 'image-registry-storage' PVC.

oc edit configs.imageregistry.operator.openshift.io

The default config before the edit:

apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  managementState: Removed        # <--- KEY OBSERVATION 1
  storage: {}                     # <--- KEY OBSERVATION 2
    

The file after the edit:

apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  managementState: Managed        # <--- KEY OBSERVATION 1
  storage:                        # <--- KEY OBSERVATION 2
    pvc:
      claim:                      # leave the claim blank

2. Verify the Storage

Confirm the 'image-registry-storage' PVC has been created and is currently in a 'Pending' state:

oc get pvc -n openshift-image-registry

3. Create Persistent Volume

Create the NFS-backed PersistentVolume for the 'image-registry-storage' PVC to bind to:

oc create -f ~/ocp4-metal-install/manifest/registry-pv.yaml

Note that the registry-pv.yaml file contains the following:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 100Gi
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /shares/registry
    server: 192.168.22.1

4. Verify the Storage again!

After a short wait, the 'image-registry-storage' PVC should now be in a 'Bound' state:

oc get pvc -n openshift-image-registry

Option B — OpenShift Data Foundation / ODF (Production)

ODF provides software-defined block, file, and object storage via Ceph running directly on your worker nodes. Minimum requirement: 3 worker nodes, each with at least one raw, unformatted additional disk.

1. Label the storage nodes

oc label node worker-0.lab.ocp.lan cluster.ocs.openshift.io/openshift-storage=""
oc label node worker-1.lab.ocp.lan cluster.ocs.openshift.io/openshift-storage=""
oc label node worker-2.lab.ocp.lan cluster.ocs.openshift.io/openshift-storage=""

2. Install the ODF Operator

oc create namespace openshift-storage

# Create OperatorGroup
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage-operatorgroup
  namespace: openshift-storage
spec:
  targetNamespaces:
    - openshift-storage
EOF

# Subscribe to ODF (adjust channel to match your OCP version)
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.14
  installPlanApproval: Automatic
  name: odf-operator
  source: redhat-operators       # Replace with mirrored CatalogSource in air-gapped
  sourceNamespace: openshift-marketplace
EOF

# Wait for all operator pods to be Running
oc get pods -n openshift-storage -w

3. Create the StorageCluster

cat <<EOF | oc apply -f -
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
    - name: ocs-deviceset
      count: 1           # 1 OSD per node x 3 nodes = 3 OSDs total
      replica: 3
      portable: true
      dataPVCTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi   # Size of each raw disk to claim
          volumeMode: Block
          storageClassName: localblock   # SC that presents raw block devices
EOF

# Monitor cluster initialisation (typically 5-15 minutes)
oc get storagecluster -n openshift-storage -w

4. StorageClasses created by ODF

StorageClass                  Type               Access Mode   Best For
ocs-storagecluster-ceph-rbd   Block (Ceph RBD)   RWO           Databases (PostgreSQL, MongoDB), stateful apps
ocs-storagecluster-cephfs     File (CephFS)      RWX           Shared media folders, CMS uploads, ML pipelines
openshift-storage.noobaa.io   Object (S3 API)    S3            Backups, AI/ML datasets, image registry

Option C — Local Storage Operator (LSO)

LSO presents raw node-local disks as PersistentVolumes without requiring a SAN or NFS server. It is commonly used as the backing layer for ODF.

# Install via Subscription
# channel: stable-4.14 | name: local-storage-operator

# After operator is Running, declare which disks to expose:
cat <<EOF | oc apply -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0.lab.ocp.lan
              - worker-1.lab.ocp.lan
              - worker-2.lab.ocp.lan
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/sdb      # The second raw disk on each node
EOF

Configure the Internal Image Registry

After storage is ready, switch the registry from Removed to Managed and back it with a PVC:

oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge \
  --patch '{
    "spec": {
      "managementState": "Managed",
      "storage": {"pvc": {"claim": ""}},
      "replicas": 1
    }
  }'

# The operator auto-creates a PVC; watch it bind
oc get pvc -n openshift-image-registry

# Confirm the registry pod is Running
oc get pods -n openshift-image-registry
ℹ️
Running more than 1 registry replica requires a ReadWriteMany (RWX) PVC, such as NFS or CephFS. For a single replica, ReadWriteOnce (RWO) is sufficient.

Set the Default StorageClass

# Mark one SC as the cluster default
oc patch storageclass nfs-client \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

# Remove the default annotation from any previously default SC
oc patch storageclass old-sc \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'

# Verify
oc get storageclass

Enable Persistent Storage for the Monitoring Stack

Prometheus and Alertmanager use ephemeral storage by default. Configure persistence so metrics survive pod restarts:

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client    # Or ocs-storagecluster-ceph-rbd
          resources:
            requests:
              storage: 50Gi
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          resources:
            requests:
              storage: 10Gi
EOF

Set Up Identity Providers Post-Install

After installation the only user is the temporary kubeadmin. You must configure a permanent Identity Provider and then delete kubeadmin to enforce proper authentication and RBAC across the cluster.

⚠️
Do not delete kubeadmin until at least one other user has been granted cluster-admin privileges and you have confirmed that you can log in successfully as that user.

Option A — HTPasswd (Simplest / Lab)

HTPasswd stores usernames and bcrypt-hashed passwords in a flat file. Ideal for small teams and fully air-gapped labs where an external directory is not available.

1. Create the htpasswd file and Kubernetes Secret

dnf install httpd-tools -y

# -c creates a new file; omit -c when appending users
htpasswd -c -B -b /tmp/htpasswd admin        RedHatAdmin1!
htpasswd    -B -b /tmp/htpasswd developer    DevPass123!

# Store the file as a Secret in openshift-config
oc create secret generic htpasswd-secret   --from-file=htpasswd=/tmp/htpasswd   -n openshift-config

2. Register the provider in the OAuth cluster object

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: htpasswd_provider
      mappingMethod: claim
      type: HTPasswd
      htpasswd:
        fileData:
          name: htpasswd-secret    # Must match the Secret name above
EOF

3. Grant cluster-admin and test login

# Allow ~30 s for oauth-server pods to restart, then:
oc adm policy add-cluster-role-to-user cluster-admin admin

oc login -u admin -p RedHatAdmin1! https://api.lab.ocp.lan:6443
oc whoami    # Should return: admin

4. Adding or changing users later

# Pull the current file out of the Secret
oc extract secret/htpasswd-secret -n openshift-config --to=/tmp --confirm

# Modify it — add a user, change a password, etc.
htpasswd -B -b /tmp/htpasswd newuser NewPass456!

# Push the updated file back — oauth pods restart automatically
oc set data secret/htpasswd-secret   --from-file=htpasswd=/tmp/htpasswd   -n openshift-config

Option B — LDAP / Active Directory

Integrate OpenShift with an existing LDAP directory (Microsoft AD, Red Hat Directory Server, OpenLDAP). Authentication is delegated to the directory; no separate password management is needed inside OpenShift.

1. Store the LDAP bind password as a Secret

oc create secret generic ldap-bind-password   --from-literal=bindPassword='BindUserPassword123!'   -n openshift-config

2. Store the LDAP CA certificate (required for ldaps://)

oc create configmap ldap-ca-cert   --from-file=ca.crt=/path/to/your-ldap-ca.crt   -n openshift-config

3. Configure the OAuth object for LDAP

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: ldap_provider
      mappingMethod: claim
      type: LDAP
      ldap:
        attributes:
          id:                [dn]
          email:             [mail]
          name:              [cn]
          preferredUsername: [sAMAccountName]   # For AD; use uid for OpenLDAP
        bindDN: "CN=ocp-svc,OU=ServiceAccounts,DC=corp,DC=lan"
        bindPassword:
          name: ldap-bind-password
        ca:
          name: ldap-ca-cert           # Remove this block for public-CA LDAP
        insecure: false
        url: "ldaps://dc01.corp.lan:636/OU=Users,DC=corp,DC=lan?sAMAccountName?sub?(objectClass=person)"
EOF
ℹ️
LDAP URL format: ldaps://<host>:<port>/<base-dn>?<attribute>?<scope>?<filter>. Always prefer ldaps:// (port 636) over ldap:// (port 389) to encrypt credentials in transit.
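To make the four query components concrete, here is a small shell sketch (pure string handling, no LDAP connection) that splits the example URL from the configuration above into its parts:

```shell
# Parse an OpenShift LDAP IdP URL:
# ldaps://<host>:<port>/<base-dn>?<attribute>?<scope>?<filter>
url="ldaps://dc01.corp.lan:636/OU=Users,DC=corp,DC=lan?sAMAccountName?sub?(objectClass=person)"

scheme="${url%%://*}"     # ldaps
rest="${url#*://}"        # host:port/base-dn?attr?scope?filter
hostport="${rest%%/*}"    # dc01.corp.lan:636
query="${rest#*/}"        # base-dn?attr?scope?filter

# Split the query on '?' into its four fields
IFS='?' read -r basedn attr scope filter <<EOF
$query
EOF

echo "scheme=$scheme host=$hostport"
echo "baseDN=$basedn attribute=$attr scope=$scope filter=$filter"
```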

4. Sync LDAP groups into OpenShift groups

cat > /tmp/ldap-sync.yaml <<EOF
kind: LDAPSyncConfig
apiVersion: v1
url: ldaps://dc01.corp.lan:636
bindDN: "CN=ocp-svc,OU=ServiceAccounts,DC=corp,DC=lan"
bindPassword: "BindUserPassword123!"
ca: /path/to/your-ldap-ca.crt
rfc2307:
  groupsQuery:
    baseDN: "OU=OCP-Groups,DC=corp,DC=lan"
    scope: sub
    derefAliases: never
    filter: (objectClass=group)
  groupUIDAttribute: dn
  groupNameAttributes: [cn]
  groupMembershipAttributes: [member]
  usersQuery:
    baseDN: "OU=Users,DC=corp,DC=lan"
    scope: sub
    derefAliases: never
  userUIDAttribute: dn
  userNameAttributes: [sAMAccountName]
EOF

# Dry-run — preview what will change without applying
oc adm groups sync --sync-config=/tmp/ldap-sync.yaml

# Apply the sync
oc adm groups sync --sync-config=/tmp/ldap-sync.yaml --confirm

# View synced groups
oc get groups

Option C — GitHub / GitLab OAuth

For teams already using GitHub Enterprise or self-hosted GitLab. Only applicable when the cluster can reach the OAuth server endpoint.

1. Register an OAuth Application on GitHub or GitLab

  • GitHub path: Settings → Developer settings → OAuth Apps → New OAuth App
  • Set Authorization callback URL to: https://oauth-openshift.apps.lab.ocp.lan/oauth2callback/github
  • Copy the generated Client ID and Client Secret

2. Store the Client Secret

oc create secret generic github-client-secret   --from-literal=clientSecret=<your-github-client-secret>   -n openshift-config

3. Configure the OAuth object

oc apply -f - <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: github
      mappingMethod: claim
      type: GitHub
      github:
        clientID: "<your-github-client-id>"
        clientSecret:
          name: github-client-secret
        organizations:
          - my-github-org      # Restrict access to members of this org
EOF

Assign Roles with RBAC

After users or groups are created, assign them the appropriate role. OpenShift ships with five built-in cluster roles:

Role             Scope          What It Allows
cluster-admin    Cluster-wide   Full, unrestricted access to every resource
cluster-reader   Cluster-wide   Read-only access to all resources
admin            Namespace      Full control within a specific project/namespace
edit             Namespace      Create, update, and delete most resources in a project
view             Namespace      Read-only access within a project
# Cluster-wide role assignments
oc adm policy add-cluster-role-to-user  cluster-admin  admin
oc adm policy add-cluster-role-to-group cluster-reader ops-team

# Namespace-scoped role assignments
oc adm policy add-role-to-user  admin    alice    -n my-project
oc adm policy add-role-to-group edit     dev-team -n my-project
oc adm policy add-role-to-user  view     bob      -n my-project

# Verify what a user is allowed to do
oc auth can-i get pods --as=alice -n my-project

Delete the kubeadmin User

Once your permanent IdP is working and at least one user has cluster-admin, delete the kubeadmin Secret. This is a hard security requirement — the kubeadmin password is stored in etcd and must not remain permanently.

⚠️
This is irreversible. Confirm you can run oc get nodes as your new admin user before executing the delete command. You cannot recover kubeadmin without reinstalling the cluster.
# Log out of kubeadmin and verify your new admin works
oc logout
oc login -u admin -p RedHatAdmin1! https://api.lab.ocp.lan:6443
oc get nodes     # Must return all nodes in Ready state

# Now it is safe to delete kubeadmin
oc delete secret kubeadmin -n kube-system

Verify the Complete Authentication Configuration

# List all configured identity providers
oc get oauth cluster -o jsonpath='{.spec.identityProviders[*].name}'

# List every user OpenShift knows about
oc get users

# List all identities (shows which provider created each entry)
oc get identity

# Check all cluster-admin bindings
oc get clusterrolebindings -o wide | grep cluster-admin
ℹ️
Multiple IdPs at once: You can list more than one provider in spec.identityProviders. Each entry needs a unique name field. Users authenticating via different providers are treated as separate identities unless you use the lookup or add mapping methods to merge them.
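For example, HTPasswd and LDAP can coexist in a single OAuth object. This sketch reuses the provider definitions from the options above:

```yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: htpasswd_provider      # each provider needs a unique name
      mappingMethod: claim
      type: HTPasswd
      htpasswd:
        fileData:
          name: htpasswd-secret
    - name: ldap_provider
      mappingMethod: claim         # use lookup/add to merge identities instead
      type: LDAP
      ldap:
        url: "ldaps://dc01.corp.lan:636/OU=Users,DC=corp,DC=lan?sAMAccountName?sub?(objectClass=person)"
        bindDN: "CN=ocp-svc,OU=ServiceAccounts,DC=corp,DC=lan"
        bindPassword:
          name: ldap-bind-password
        ca:
          name: ldap-ca-cert
        insecure: false
        attributes:
          id: [dn]
          preferredUsername: [sAMAccountName]
```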

StorageClasses: Understanding OpenShift Data Foundation (ODF)

After the cluster installation completes, the environment is "empty." You must now configure Persistent Storage for cluster operations.

OpenShift requires a StorageClass to fulfill Persistent Volume Claims (PVCs). Best storage depends on the environment (NFS for simplicity, ODF/OCS for production-grade software-defined storage).

Dive into the storage architecture of OpenShift, moving from the basic concepts of the Container Storage Interface (CSI) to advanced software-defined storage like OpenShift Data Foundation (ODF).

1. The OpenShift Storage Hierarchy

To understand OpenShift storage, you must distinguish between the physical storage and the virtualized requests made by applications.

  • Persistent Volume (PV): The actual "disk" (network-attached or local) provisioned by the administrator.
  • Persistent Volume Claim (PVC): The request made by a developer for a certain amount of storage.
  • StorageClass (SC): The "template" that defines how a PV is created (e.g., fast SSD vs. slow HDD).
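As a concrete sketch, the developer's side of that hierarchy is just a PVC that names a StorageClass (the names here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce              # single-node, block-style access
  storageClassName: nfs-client   # the "template" that fulfils the claim
  resources:
    requests:
      storage: 10Gi              # the amount of storage requested
```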

2. Core Storage Types

OpenShift categorizes storage based on how many nodes can access it simultaneously.

A. Block Storage (RWO - ReadWriteOnce)

  • Technology: iSCSI, Fibre Channel, AWS EBS, OpenStack Cinder, ODF RBD.
  • Best For: Databases (PostgreSQL, MongoDB).
  • Behavior: Only one node can mount the volume at a time. It is highly performant and supports low-latency transactions.

B. File Storage (RWX - ReadWriteMany)

  • Technology: NFS, Azure Files, ODF CephFS.
  • Best For: Shared media folders, CMS uploads (WordPress), or data pipelines where multiple pods need to read/write the same files.
  • Behavior: Multiple nodes can mount the same volume simultaneously.

C. Object Storage

  • Technology: S3, MinIO, ODF NooBaa.
  • Best For: Backups, AI/ML datasets, and cloud-native applications.
  • Behavior: Accessed via API (HTTP/HTTPS) rather than a filesystem mount. It is virtually infinitely scalable.

3. Deep Dive: OpenShift Data Foundation (ODF)

Formerly known as OCS (OpenShift Container Storage), ODF is the "Gold Standard" for OpenShift storage. It is built on Ceph, Rook, and NooBaa.

Key Advantages:

  1. Platform Agnostic: Whether you are on-premise (VMware/Bare Metal) or in the cloud (AWS/Azure), ODF provides the same StorageClasses.
  2. Hyper-Converged: You don't need an external SAN. ODF uses the spare disks already inside your worker nodes.
  3. Dynamic Provisioning: It automatically creates volumes as soon as a developer creates a PVC.
  4. Resilience: By default, data is replicated across 3 different nodes. If one node fails, the data remains available.

ODF Component Breakdown:

Component   Function                         Storage Type
Ceph RBD    High-performance block storage   Block (RWO)
CephFS      Shared filesystem storage        File (RWX)
NooBaa      Multi-cloud object gateway       Object (S3)

4. Hostpath and Local Storage

For edge cases or small-scale labs, you may encounter these:

  • HostPath: Uses a directory on the node’s local disk. Warning: If the pod moves to another node, the data stays behind and the pod loses access.
  • Local Storage Operator (LSO): A more robust way to use local NVMe/SSD disks. Unlike HostPath, LSO allows the scheduler to track which node "owns" the data.
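With LSO, you declare which device paths on which nodes become PVs. A sketch, assuming the operator runs in its default openshift-local-storage namespace and that the device path exists on your nodes:

```yaml
# Hypothetical LocalVolume: LSO turns /dev/nvme0n1 on matching nodes
# into PVs of StorageClass "local-sc", and the scheduler pins pods to
# the node that owns the data.
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: local-sc
      volumeMode: Filesystem
      fsType: ext4
      devicePaths:
        - /dev/nvme0n1          # assumed device; verify with lsblk
```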

5. Architectural Decision Matrix

As a Solution Architect, use this table to choose your storage backend:

Use Case                | Recommended Storage | Access Mode
Database (Prod)         | ODF RBD (Block)     | RWO
Content Management      | ODF CephFS or NFS   | RWX
Machine Learning Models | ODF NooBaa (S3)     | Object
Temporary Scratch Space | emptyDir            | RWO
Registry Storage        | ODF CephFS          | RWX

6. Pro-Tips for Production

  • Snapshotting: Ensure your storage provider supports CSI Snapshots for quick backups before application updates.
  • Expansion: Use a StorageClass with allowVolumeExpansion: true. This allows you to grow a disk without deleting the pod.
  • Capacity Limiting: In multi-tenant clusters, use ResourceQuota objects (e.g. requests.storage, or per-class quotas like <storage-class>.storageclass.storage.k8s.io/requests.storage) to prevent one team from consuming all the storage capacity. IOPS throttling itself depends on what your storage backend supports.
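The expansion tip looks like this in practice. A minimal sketch; the provisioner shown is the ODF RBD CSI driver and is an assumption, so swap in your backend's driver:

```yaml
# Hypothetical StorageClass that allows PVCs to be grown in place.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-expandable
provisioner: openshift-storage.rbd.csi.ceph.com   # assumed ODF RBD driver
allowVolumeExpansion: true    # grow a PVC by editing spec.resources.requests
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

To expand later, patch the claim: `oc patch pvc <name> -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'` — no pod deletion required.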

info YAML Configuration Details

The install-config.yaml Blueprint

This is the only file you create manually. It acts as the blueprint for the entire installation. Key fields to populate:

  • pullSecret — Authorizes nodes to pull OpenShift images from Red Hat registries.
  • sshKey — Allows SSH access into RHCOS nodes as the core user for troubleshooting.
  • networking — Defines cluster and service network CIDRs.
  • imageContentSources — Points to your local mirror registry (required for air-gapped installs).

How to Get the Pull Secret

  1. Log in to the Red Hat OpenShift Cluster Manager at cloud.redhat.com/openshift.
  2. Download the pull secret using the "Download pull secret" button.
  3. Paste the entire single-line JSON string into your install-config.yaml inside single quotes.
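For air-gapped installs you also append your mirror registry's credentials to the same JSON. A minimal sketch, assuming the hypothetical mirror host mirror-registry.ocp.lan:8443 and placeholder credentials init/changeme:

```shell
# Build the base64 "user:pass" token the pull secret's auth field expects.
MIRROR_USER=init                      # assumed mirror credentials
MIRROR_PASS=changeme
AUTH=$(printf '%s:%s' "$MIRROR_USER" "$MIRROR_PASS" | base64 -w0)

# Print the entry to merge into the "auths" object of your pull secret:
printf '"mirror-registry.ocp.lan:8443": {"auth": "%s"}\n' "$AUTH"
```

Splice that entry into the single-line JSON alongside the Red Hat registry entries before pasting it into install-config.yaml.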

How to Get the SSH Key

# Check for existing keys
ls ~/.ssh/id_rsa.pub || ls ~/.ssh/id_ed25519.pub

# Generate a new key pair (if needed)
ssh-keygen -t ed25519 -f ~/.ssh/id_ocp -C "admin@ocp-cluster"

# Output the public key to copy into install-config.yaml
cat ~/.ssh/id_ocp.pub

OpenShift Client vs. Installer — Quick Reference

Feature      | OpenShift Client (oc)                 | OpenShift Installer
Filename     | openshift-client-linux.tar.gz         | openshift-install-linux.tar.gz
Primary Goal | Managing an existing cluster          | Creating or destroying a cluster
Main Binary  | oc (and kubectl)                      | openshift-install
Usage Period | Daily, for the life of the cluster    | Primarily during Day 1 setup
Capabilities | Deploy apps, check logs, manage users | Provision VMs, generate Ignition files

Helper Node Interface Roles

Interface | Typical Role       | Description
ens192    | External / Public  | Front-end traffic — connects to the internet or corporate load balancer to serve applications.
ens224    | Internal / Private | Back-end traffic — master/worker node communication, storage traffic (CSI/NFS).

info Building an OpenShift cluster on-premises - Installation Methods

Building an OpenShift cluster on-premises requires shifting from the "push-button" automation of public clouds to a more hands-on infrastructure management approach. In 2026, the process is largely standardized through Red Hat's Assisted Installer or Agent-based methods.

1. Assisted Installer

A user-friendly web interface (hosted at console.redhat.com) that generates a discovery ISO. You boot your on-prem servers with this ISO, and they "call home" to the web console, allowing you to configure the cluster graphically.

2. IPI (Installer-Provisioned Infrastructure)

Full automation. The installer has API access to your infrastructure (like VMware vSphere or OpenStack) and creates the VMs, storage, and networking for you.

3. UPI (User-Provisioned Infrastructure)

Maximum control. You manually prepare the VMs, load balancers, and DNS. This is typical for Bare Metal or highly restricted "Air-Gapped" environments.

info Minimum Cluster Hardware (Production Grade)

Node Type            | CPU    | RAM   | Disk
Control Plane (3x)   | 4 vCPU | 16 GB | 120 GB (SSD preferred)
Compute/Worker (2x+) | 4 vCPU | 16 GB | 120 GB
Bootstrap (1x)       | 4 vCPU | 16 GB | 120 GB (deleted after install)


info Do we need the Bootstrap node to add a new Control Plane or Worker node?

Once the cluster is up and running (Day 2 operations), the Control Plane (Masters) takes over all management tasks.

1. Adding a New Worker Node

When you boot a new Worker node with its Ignition file, it communicates directly with the API server on the Master nodes. CSR approval: you must approve the node's Certificate Signing Requests (CSRs) using oc get csr and oc adm certificate approve <name>.
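The CSR approval step can be scripted. A sketch for a hypothetical helper script; run it once for the client CSRs, then again about a minute later for the serving CSRs:

```shell
# Write a small helper that approves every CSR still in Pending state
# (a Pending CSR has no .status yet). Requires oc with cluster-admin.
cat > approve-csrs.sh <<'EOF'
#!/bin/sh
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
EOF
chmod +x approve-csrs.sh
```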

2. Adding a New Control Plane (Master) Node

OpenShift clusters are typically designed with an odd number of Control Plane nodes (usually 3) to maintain etcd quorum. If you want to move from 3 to 5 Masters, you add them to the existing, healthy cluster; the new Master joins the etcd cluster managed by the current Masters.



Key Considerations for UPI

Ignition Expiry

Ignition files contain certificates valid for 24 hours. If you don't finish the install by then, you must regenerate them.

Disk Cleanup

If an install fails, you must wipe the disks of the nodes before retrying. RHCOS will not overwrite an existing partition table automatically.



How to Clean the Disks After a Failed Installation

Wiping the disks is a critical step because if RHCOS detects an existing ignition configuration or a partition table, it may fail to apply the new configuration, leading to a "zombie" node state.

RHCOS uses Ignition, which runs in the initramfs stage. If Ignition sees a partition labeled boot or root already on the disk, it might assume the installation was already completed and skip critical configuration steps.

Pro-Tip: If you are debugging a failed bootstrap, always wipe the Bootstrap node first. It is the source of truth for the rest of the cluster. If the Bootstrap node has old data, it will feed incorrect information to the Master nodes.

The "Live ISO" Method (Easiest for Manual Labs)

  • Boot the node using the RHCOS Live ISO.
  • Once you reach the prompt (or press Ctrl+Alt+F2 to get a console), identify your disk (usually /dev/sda or /dev/nvme0n1):
    lsblk
  • Then wipe all filesystem and partition-table signatures from the disk:
    sudo wipefs -a /dev/sda
  • Reboot the node and start the installation again.
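You can see what wipefs actually does without touching a real disk by practicing on a scratch image file first. A safe sketch, assuming mkfs.ext4 (e2fsprogs) is available:

```shell
# Create a 64 MiB file, give it an ext4 signature, then wipe it,
# exactly as you would wipe /dev/sda on the failed node.
truncate -s 64M scratch.img
mkfs.ext4 -q -F scratch.img    # -F: allow operating on a regular file
wipefs scratch.img             # lists the ext4 signature it found
wipefs -a scratch.img          # erases all signatures
wipefs scratch.img             # now prints nothing: the "disk" is clean
```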


info How Nodes Know Which Images to Pull from the Mirror Registry

When you run coreos-installer with the -u flag, the node downloads the raw RHCOS operating system image from your local web server — this is just the base OS with no OpenShift components. After first boot, the node needs to pull dozens of OpenShift container images (API server, etcd, operators, etc.) from a registry. In an air-gapped environment, two fields in install-config.yaml work together to make this seamless.

1. imageContentSources (Mirror Redirect Rules)

This field tells every node: "whenever you need an image from quay.io or registry.redhat.io, silently redirect that request to my local mirror instead." The node never needs to know it's in a disconnected environment — it requests images by their original Red Hat names and OpenShift handles the redirect automatically.

imageContentSources:
  - mirrors:
    - mirror-registry.ocp.lan:8443/openshift/release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror-registry.ocp.lan:8443/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

2. additionalTrustBundle (Internal CA Certificate)

Your local mirror registry uses a self-signed or internally-issued TLS certificate. Without this field, nodes would reject connections to it as untrusted. The additionalTrustBundle injects your internal CA certificate into every node's trust store so HTTPS connections to the mirror registry are accepted without error.

additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <your internal CA certificate here>
  -----END CERTIFICATE-----

info Complete install-config.yaml

Below is a fully annotated install-config.yaml covering every field you need for a UPI air-gapped deployment. Every line is commented so you know exactly what it controls and why it exists.

⚠️
The installer consumes and deletes this file. Always keep a backup copy before running openshift-install create manifests. Once deleted, you cannot recover it from the generated output.
# ─────────────────────────────────────────────────────────────────
# API VERSION
# Must always be v1. This is the only supported version.
# ─────────────────────────────────────────────────────────────────
apiVersion: v1

# ─────────────────────────────────────────────────────────────────
# BASE DOMAIN
# The parent DNS domain for your cluster.
# The cluster name below is prepended to form the full domain:
#   <clusterName>.<baseDomain>  →  lab.ocp.lan
# Your DNS must have records for:
#   api.lab.ocp.lan        → Load Balancer IP (port 6443)
#   *.apps.lab.ocp.lan     → Ingress Load Balancer IP
# ─────────────────────────────────────────────────────────────────
baseDomain: ocp.lan

# ─────────────────────────────────────────────────────────────────
# CLUSTER NAME
# Short name for this cluster. Combined with baseDomain above.
# Used in all internal DNS names and TLS certificates.
# ─────────────────────────────────────────────────────────────────
metadata:
  name: lab             #Cluster name

# ─────────────────────────────────────────────────────────────────
# COMPUTE (WORKER) NODES
# Defines the default worker MachineSet.
# In UPI mode, the installer does NOT create machines automatically.
# Set replicas: 0 — you will boot workers manually.
# hyperthreading: Enabled is the default and recommended setting.
# ─────────────────────────────────────────────────────────────────
compute:
  - name: worker
    replicas: 0                  # Must be 0 for UPI — you provision workers manually
    hyperthreading: Enabled
    architecture: amd64          # Use arm64 for ARM-based nodes

# ─────────────────────────────────────────────────────────────────
# CONTROL PLANE (MASTER) NODES
# Always set replicas: 3 for a production HA cluster.
# A single master (replicas: 1) is supported only for dev/test.
# ─────────────────────────────────────────────────────────────────
controlPlane:
  name: master
  replicas: 3                    # 3 = HA. Never use 2 (no quorum).
  hyperthreading: Enabled
  architecture: amd64

# ─────────────────────────────────────────────────────────────────
# NETWORKING
# Defines the internal IP address ranges used inside the cluster.
# These are virtual ranges — they do NOT need to exist on your
# physical network. They must not overlap with your node IPs.
#
# networkType: OVNKubernetes is the current default and recommended.
#              OpenShiftSDN is deprecated as of OCP 4.15.
#
# clusterNetwork: The CIDR for pod IP addresses.
#   hostPrefix: /23 means each node gets a /23 subnet (~510 pod IPs).
#
# serviceNetwork: The CIDR for Kubernetes Service (ClusterIP) objects.
#   Must be a single entry. /16 gives 65,534 service IPs.
#
# machineNetwork: The CIDR of your physical node network.
#   Must match the real subnet your nodes are on (192.168.22.0/24).
# ─────────────────────────────────────────────────────────────────
networking:
  networkType: OVNKubernetes
  clusterNetwork:
    - cidr: 10.128.0.0/14        # Pod IP range across the cluster
      hostPrefix: 23             # Subnet size allocated per node
  serviceNetwork:
    - 172.30.0.0/16              # Kubernetes service (ClusterIP) range
  machineNetwork:
    - cidr: 192.168.22.0/24      # Must match your physical node subnet

# ─────────────────────────────────────────────────────────────────
# PLATFORM
# Set to "none" for UPI — tells the installer not to create any
# cloud or virtualization resources automatically.
# ─────────────────────────────────────────────────────────────────
platform:
  none: {}

# ─────────────────────────────────────────────────────────────────
# FIPS MODE (Optional)
# Enables FIPS 140-2/3 validated cryptographic modules.
# Required for US federal / DoD environments.
# Cannot be changed after installation.
# ─────────────────────────────────────────────────────────────────
fips: false

# ─────────────────────────────────────────────────────────────────
# PUBLISH STRATEGY
# Controls how the API server endpoint is exposed.
#   External: API accessible from outside the cluster network (default)
#   Internal: API accessible only within the cluster network
# For air-gapped environments, "Internal" is typically used.
# ─────────────────────────────────────────────────────────────────
publish: Internal

# ─────────────────────────────────────────────────────────────────
# PULL SECRET
# Authenticates nodes to pull container images from:
#   - registry.redhat.io   (Red Hat operator images)
#   - quay.io              (OpenShift release images)
#   - your local mirror    (air-gapped environments)
#
# For air-gapped installs, add your mirror registry credentials
# into this JSON alongside the Red Hat entries.
#
# Get from: https://console.redhat.com/openshift/install/pull-secret
# Must be a single-line JSON string inside single quotes.
# ─────────────────────────────────────────────────────────────────
pullSecret: '{"auths":{"registry.redhat.io":{"auth":"<base64-encoded-credentials>"},"quay.io":{"auth":"<base64-encoded-credentials>"},"mirror-registry.ocp.lan:8443":{"auth":"<base64-encoded-mirror-credentials>"}}}'

# ─────────────────────────────────────────────────────────────────
# SSH KEY
# Your SSH public key, injected into every RHCOS node.
# Allows SSH access as the built-in "core" user for troubleshooting.
# Only the PUBLIC key goes here — never the private key.
# Generate with: ssh-keygen -t ed25519 -f ~/.ssh/id_ocp
# ─────────────────────────────────────────────────────────────────
sshKey: 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... admin@ocp-cluster'

# ─────────────────────────────────────────────────────────────────
# ADDITIONAL TRUST BUNDLE
# Your internal CA certificate in PEM format.
# Required when your mirror registry uses a self-signed or
# internally-issued TLS certificate.
# Injected into every node's system trust store on first boot.
# Must be indented under the key with 2 spaces.
# To trust the mirror registry on port 8443, capture its CA chain and
# paste it into additionalTrustBundle:
#   openssl s_client -showcerts -connect mirror-registry.ocp.lan:8443 </dev/null \
#     | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > registry-ca.crt
# Note: The | symbol is mandatory in YAML; it starts a multi-line string.
# ─────────────────────────────────────────────────────────────────
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  MIIFazCCA1OgAwIBAgIUYourInternalCAcertificateHere...
  <full PEM certificate content>
  -----END CERTIFICATE-----

# ─────────────────────────────────────────────────────────────────
# If you cannot obtain the certificate, skip the step above and mark
# the registry as insecure instead:
# insecureRegistries:
#   - mirror-registry.ocp.lan:8443

# ─────────────────────────────────────────────────────────────────
# IMAGE CONTENT SOURCES (imageDigestMirrors in OCP 4.13+)
# Tells every node to redirect image pulls from Red Hat registries
# to your local mirror registry instead.
# The "source" is the original Red Hat registry path.
# The "mirrors" list is where to redirect requests.
# The installer bakes these rules into the .ign files and creates
# an ImageContentSourcePolicy object in the cluster on first boot.
# ─────────────────────────────────────────────────────────────────
imageContentSources:
  - mirrors:
    - mirror-registry.ocp.lan:8443/openshift/release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror-registry.ocp.lan:8443/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - mirrors:
    - mirror-registry.ocp.lan:8443/redhat
    source: registry.redhat.io/redhat
  - mirrors:
    - mirror-registry.ocp.lan:8443/ubi8
    source: registry.redhat.io/ubi8

# ─────────────────────────────────────────────────────────────────
# PROXY (Optional)
# Only needed if your nodes reach the mirror registry through
# an HTTP/HTTPS proxy. Leave out entirely if no proxy is used.
# noProxy: comma-separated list of hosts/CIDRs to bypass the proxy.
# ─────────────────────────────────────────────────────────────────
# proxy:
#   httpProxy: http://proxy.example.com:3128
#   httpsProxy: http://proxy.example.com:3128
#   noProxy: 192.168.22.0/24,mirror-registry.ocp.lan,.ocp.lan

# ─────────────────────────────────────────────────────────────────
# CLUSTER CAPABILITIES (Optional — OCP 4.12+)
# Controls which optional cluster components get installed.
# Use to reduce footprint in resource-constrained environments.
# "vCurrent" installs all capabilities for your OCP version.
# ─────────────────────────────────────────────────────────────────
# capabilities:
#   baselineCapabilitySet: vCurrent
#   additionalEnabledCapabilities:
#     - marketplace
#     - openShiftSamples
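Putting the warning at the top of this section into practice, here is a sketch of the Day-1 file handling. The stand-in install-config.yaml below is for illustration only; use your real one:

```shell
# Stand-in file for illustration; in reality this is the full
# install-config.yaml shown above.
printf 'apiVersion: v1\n' > install-config.yaml

# 1. Keep a backup: the installer consumes and deletes the original.
cp install-config.yaml install-config.yaml.bak

# 2. The installer reads from a working directory.
mkdir -p ocp-install
cp install-config.yaml ocp-install/

# 3. Generate manifests, then Ignition files (requires openshift-install):
# openshift-install create manifests --dir=ocp-install
# openshift-install create ignition-configs --dir=ocp-install
```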
