Setting Up OpenShift 4.18.21 with User Provisioned Infrastructure: Challenges and Lessons Learned

Deploying OpenShift 4.18.21 on a bare metal lab is a rewarding but deeply challenging project, especially when using User Provisioned Infrastructure (UPI). In this article, I want to share the real-life hurdles I encountered—from planning and documentation all the way to the creative solutions I used for core services. My hope is to help fellow practitioners preparing for similar deployments, or simply offer insight into the unique requirements of OpenShift on UPI.

Why UPI?
UPI setups grant tremendous control. You get freedom over infrastructure and networking, but trade it for increased responsibility. Unlike installer-provisioned (IPI) or Assisted Installer scenarios, everything—from load balancing to DNS—is on you.

Inventory and Planning: My Tracking System
One of the first lessons I learned was the importance of good documentation. To stay organized, I logged every IP, hostname, user password, and required service in a spreadsheet. Without it, keeping track of all the moving parts would have been chaos, especially across components like:

  • OpenShift Masters and Workers
  • Bastion host for tool access
  • Load balancers
  • DNS/NTP/DHCP infrastructure

Designing the Infrastructure

Networking

I kept a tight, controlled network:

  • Masters/Workers: Static IPs (e.g., 172.20.10.71–76) for all OpenShift nodes.
  • Bastion Host: Debian-based at 172.20.10.60, serving as the nerve center for load balancing, DNS, NTP, and DHCP.

Make sure every machine in the environment points its DNS at 172.20.10.60.
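
On the bastion and any Linux helper VMs this is a one-line change. A minimal sketch, assuming a cluster domain of ocp4.lab.local (a stand-in I use throughout this post; substitute your own cluster name and base domain):

    # /etc/resolv.conf on the bastion and helper VMs
    # (cluster nodes receive the same DNS server via DHCP)
    nameserver 172.20.10.60
    search ocp4.lab.local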

Manual DNS & DHCP with PiHole

Automating DNS for OpenShift’s strict needs was tricky. Native cloud DNS was off the table, so I ran PiHole in Docker on Debian—the bastion host. I used its flexibility to:

  • Serve DNS for all OpenShift cluster FQDNs and wildcard apps endpoints.
  • Act as the NTP and DHCP server for the lab as well.

Custom dnsmasq configs let me create precise FQDN-to-IP mappings for the API endpoints, the masters and workers, and the wildcard routes, all pointing to the right local addresses.
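
A rough sketch of what those dnsmasq snippets can look like, dropped into PiHole's /etc/dnsmasq.d/ directory. This is illustrative only: it uses the stand-in domain from above and assumes masters at .71-.73 and workers at .74-.76, so adjust names and addresses to your own layout.

    # /etc/dnsmasq.d/02-openshift.conf (illustrative sketch)
    # API and internal API resolve to the HAProxy on the bastion
    address=/api.ocp4.lab.local/172.20.10.60
    address=/api-int.ocp4.lab.local/172.20.10.60
    # Matches *.apps.ocp4.lab.local, so every app route lands on the load balancer
    address=/apps.ocp4.lab.local/172.20.10.60
    # Per-node records (masters .71-.73, workers .74-.76 assumed)
    host-record=master01.ocp4.lab.local,172.20.10.71
    host-record=master02.ocp4.lab.local,172.20.10.72
    host-record=master03.ocp4.lab.local,172.20.10.73
    host-record=worker01.ocp4.lab.local,172.20.10.74
    host-record=worker02.ocp4.lab.local,172.20.10.75
    host-record=worker03.ocp4.lab.local,172.20.10.76
    # DHCP, with options pointing clients at this box for DNS and NTP
    dhcp-range=172.20.10.100,172.20.10.150,12h
    dhcp-option=option:dns-server,172.20.10.60
    dhcp-option=option:ntp-server,172.20.10.60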

External Load Balancer with HAProxy

OpenShift needs a robust load balancer, even in a tiny lab. I chose to run HAProxy in Docker on the same Debian bastion, exposing the critical ports (6443 for the Kubernetes API, 443/80 for application ingress) for the cluster. Docker Compose was invaluable here for managing both services from a single YAML file.
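
A minimal sketch of what that Compose file can look like. This is not my exact file: the image tags, paths, and use of host networking are assumptions, and explicit port mappings work just as well if you prefer tighter isolation.

    # docker-compose.yml on the bastion (illustrative sketch)
    services:
      pihole:
        image: pihole/pihole:latest
        network_mode: host            # serves DNS and DHCP directly on the LAN
        cap_add:
          - NET_ADMIN                 # needed for the DHCP server
        volumes:
          - ./etc-pihole:/etc/pihole
          - ./etc-dnsmasq.d:/etc/dnsmasq.d
        restart: unless-stopped
      haproxy:
        image: haproxy:2.8
        network_mode: host            # exposes 6443, 22623, 443 and 80 directly
        volumes:
          - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        restart: unless-stopped

A single docker compose up -d then brings up (or restarts) both services together.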

Notable Configuration Details

  • Passwords & Access Management: I set unique root/admin passwords for ESXi nodes, containers, and OpenShift users (“kubeadmin” and custom cluster-admins like “ivan” and “john”).
  • Single Host/Multiple Roles: Running everything (load balancer, DNS, DHCP, NTP) on one Debian VM required careful port mapping and resource isolation in Docker Compose.

Adding User Authentication with htpasswd

Setting up htpasswd-based authentication was crucial for locally controlled access within the cluster. Using a static user and password file on the bastion host, I was able to:

  • Define dedicated users (e.g., “ivan” and “john”) with cluster-admin roles for operational flexibility.
  • Centralize password management while ensuring that adding/removing users was straightforward through the OAuth configuration.
  • Quickly revoke privileges if needed—an advantage in environments where staff or test users rotate frequently.

With htpasswd, changes take effect as soon as the OAuth pods are recycled.
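
For reference, the flow looks roughly like this. The file name, secret name, and passwords below are placeholders, not the values I actually used:

    # On the bastion: create/extend the password file (-B = bcrypt)
    htpasswd -c -B -b users.htpasswd ivan 'ChangeMe1'
    htpasswd -B -b users.htpasswd john 'ChangeMe2'

    # Load it into the cluster and point the OAuth config at it
    oc create secret generic htpass-secret \
      --from-file=htpasswd=users.htpasswd -n openshift-config
    oc apply -f - <<'EOF'
    apiVersion: config.openshift.io/v1
    kind: OAuth
    metadata:
      name: cluster
    spec:
      identityProviders:
      - name: htpasswd_provider
        mappingMethod: claim
        type: HTPasswd
        htpasswd:
          fileData:
            name: htpass-secret
    EOF

    # Grant cluster-admin to the dedicated users
    oc adm policy add-cluster-role-to-user cluster-admin ivan
    oc adm policy add-cluster-role-to-user cluster-admin john

Later user changes are then just a matter of regenerating the file and replacing the secret, for example with oc set data secret/htpass-secret --from-file=htpasswd=users.htpasswd -n openshift-config.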

Surviving 12 Cluster Reinstalls – Lessons in Patience and Automation

The most time-consuming challenge was network instability. In my lab, erratic connectivity and periodic DHCP or DNS races caused OpenShift install failures that required me to recreate the environment nearly a dozen times. Here’s what I learned:

  • Document Every Step: Each reinstall, although frustrating, helped me refine my spreadsheet, scripting, and infra tracking.
  • Automate Where You Can: Scripts for DNS, DHCP, and Docker service re-creation became essential.
  • Validate Before Deploying: Always confirm DNS, NTP, load balancer health, and certificate time sync before starting openshift-install (a minimal pre-flight sketch follows below).
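
A minimal pre-flight sketch of those checks, run from the bastion. It again uses the stand-in domain from earlier and assumes chrony for timekeeping:

    #!/usr/bin/env bash
    set -e
    # DNS: API, internal API, and a wildcard apps record must all resolve
    for fqdn in api.ocp4.lab.local api-int.ocp4.lab.local test.apps.ocp4.lab.local; do
      dig +short "$fqdn" @172.20.10.60
    done
    # Load balancer: the API and machine-config ports must be listening
    nc -zv 172.20.10.60 6443
    nc -zv 172.20.10.60 22623
    # Time: clock skew quickly leads to certificate errors during the install
    chronyc tracking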

Core Challenges

1. DNS and DHCP Customization

OpenShift is unforgiving about DNS. Every node and every wildcard app route must resolve internally. PiHole's dnsmasq extensions meant I could fully customize this, but debugging typos or missing records was stressful: pods won't schedule and the API won't respond until every record is right.

2. Reliable Load Balancing

Configuring HAProxy for TCP passthrough (to API) and HTTP(S) termination (for apps) was complex. Minor config mistakes meant master nodes couldn’t join or the web console was flaky.
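For the control-plane side, the TCP passthrough sections look roughly like the sketch below. The backend IPs assume masters at .71-.73; the 443/80 app frontends follow the same pattern toward the worker nodes, where the HTTP(S) handling described above comes in.

    # haproxy.cfg (sketch): TCP passthrough for the control plane
    defaults
        mode    tcp
        timeout connect 5s
        timeout client  1m
        timeout server  1m

    frontend api
        bind *:6443
        default_backend api_servers

    backend api_servers
        balance roundrobin
        # (the bootstrap node also belongs in both backends until the install finishes)
        server master01 172.20.10.71:6443 check
        server master02 172.20.10.72:6443 check
        server master03 172.20.10.73:6443 check

    frontend machine_config
        bind *:22623
        default_backend machine_config_servers

    backend machine_config_servers
        server master01 172.20.10.71:22623 check
        server master02 172.20.10.72:22623 check
        server master03 172.20.10.73:22623 check
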

3. The Bastion as a Single Point of Failure

Consolidating all critical infra on one Debian box made backups and monitoring essential, as that VM going down would bring the whole cluster offline.

4. Documentation, Documentation, Documentation

Solid documentation, good use of Docker, and attention to networking details were the keys that turned my project from a fragile stack of VMs into a robust, production-like cluster in my lab. Without my continually updated spreadsheet, troubleshooting would’ve taken ten times longer. Each IP, password, DNS record, and config file needed to be tracked.

Keys to Success & Advice

  • Containerization: Running infra as containers (PiHole, HAProxy) on Debian made upgrades and restarts painless.
  • Consistent Timekeeping: Pointing every node at the same NTP server on the bastion avoided the certificate errors that clock skew causes.
  • Pre-Testing DNS/Load Balancer: Before starting the OpenShift installer, I confirmed all FQDNs resolved and load balancing was working properly via manual curl and dig checks.
  • Leaning on Official Docs: For each customization (user auth, logos, alerts), I kept the relevant official Red Hat documentation at hand for guidance.

If you’re about to start a similar journey, don’t forget: track everything, test services independently, and be ready to debug at every layer!

Next Steps: Dell VNX5200 Storage and CSI Drivers

One major infrastructure gap was shared storage. My Dell VNX5200 SAN is enterprise-grade, but it has no native Kubernetes/Red Hat CSI driver. That left my cluster without automated persistent storage—a substantial hurdle for app workloads.

Plan: I’m testing ember-csi (a CSI abstraction layer for legacy, Cinder-supported storage backends) as a workaround. If successful, I hope to:

  • Enable dynamic PersistentVolume provisioning for the cluster.
  • Integrate existing VNX5200 LUNs into OpenShift storage classes.
  • Avoid the need for manual NFS or iSCSI target configuration on each node.

The ember-csi project could bridge this legacy-to-modern gap, letting older infrastructure play nicely with container orchestration.
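
If the driver works out, consuming the SAN should feel like any other dynamic provisioner. A sketch of the goal state, where the StorageClass name vnx-block is purely hypothetical (whatever class the ember-csi deployment ends up registering):

    # pvc-test.yaml: dynamic provisioning against a hypothetical ember-csi class
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: vnx-test-claim
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: vnx-block
      resources:
        requests:
          storage: 10Gi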

In summary:
Implementing authentication via htpasswd gave me precise control over user access. Surviving 12+ reinstalls on an unreliable network turned me into a documentation and automation fanatic. Next, storage innovation with ember-csi will (hopefully!) unlock persistent storage for apps in this hybrid lab.

If you’re embarking on a similar OpenShift journey, keep iterating, automate relentlessly, and don’t let old hardware limit your ambitions!
