How To Deploy OpenShift Container Platform 4.13 on KVM

In this guide we will install Red Hat OpenShift Container Platform 4.13 on KVM virtual machines. OpenShift is a powerful, platform-agnostic, enterprise-grade Kubernetes distribution focused on developer experience and application security. The project is developed and owned by Red Hat. OpenShift Container Platform is built around containers orchestrated and managed by Kubernetes on a foundation of Red Hat Enterprise Linux.

The OpenShift platform offers automated installation, upgrades, and lifecycle management throughout the container stack, from the operating system, Kubernetes and cluster services to deployed applications. The operating system used on both the control plane and worker machines is Red Hat Enterprise Linux CoreOS (RHCOS). RHCOS includes the kubelet, which is the Kubernetes node agent, and the CRI-O container runtime optimized for Kubernetes workloads.

In my installation the deployment is performed on a single-node KVM compute server. This is not a highly available production setup and should only be used for proof-of-concept and demo purposes.

Red Hat’s minimum hardware requirements for each cluster virtual machine are shown in the table below:

Virtual Machine	Operating System	vCPU	Virtual RAM	Storage
Bootstrap	RHCOS	4	16 GB	120 GB
Control plane	RHCOS	4	16 GB	120 GB
Compute	RHCOS	2	8 GB	120 GB

The preferred requirements for each cluster virtual machine are:

Virtual Machine	Operating System	vCPU	Virtual RAM	Storage
Bootstrap	RHCOS	4	16 GB	120 GB
Control plane	RHCOS	8	16 GB	120 GB
Compute	RHCOS	6	8 GB	120 GB

These hardware figures are only a guideline: actual requirements depend on the workloads and the desired cluster size when running in production, so size the machines as you see fit.

My Lab environment variables

OpenShift 4 Cluster base domain: example.com (substitute your own)
OpenShift 4 Cluster name: ocp4 (substitute your own)
OpenShift KVM network bridge: openshift4
OpenShift Network Block: 192.168.100.0/24
OpenShift Network gateway address: 192.168.100.1
Bastion / Helper node IP Address (Runs DHCP, Apache httpd, HAProxy, PXE, DNS) – 192.168.100.254
NTP server used: time.google.com

Used Mac Addresses and IP Addresses:

Machine Name	Mac Address (Generate yours and use)	DHCP Reserved IP Address
bootstrap.ocp4.example.com	52:54:00:a4:db:5f	192.168.100.10
master01.ocp4.example.com	52:54:00:8b:a1:17	192.168.100.11
master02.ocp4.example.com	52:54:00:ea:8b:9d	192.168.100.12
master03.ocp4.example.com	52:54:00:f8:87:c7	192.168.100.13
worker01.ocp4.example.com	52:54:00:31:4a:39	192.168.100.21
worker02.ocp4.example.com	52:54:00:6a:37:32	192.168.100.22
worker03.ocp4.example.com	52:54:00:95:d4:ed	192.168.100.23

Step 1: Setup KVM Infrastructure (On Hypervisor Node)

Install KVM on your hypervisor node by following the installation guide for your Linux distribution.

After installation verify your server CPU has support for Intel VT or AMD-V Virtualization extensions:

grep -E "vmx|svm" /proc/cpuinfo
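
If you prefer a single-word answer, the same check can be wrapped in a small function. This is only a sketch: the name virt_extension is made up for illustration, and the optional file argument exists just so the function can be tried against a sample file instead of /proc/cpuinfo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "vmx" (Intel VT-x), "svm" (AMD-V) or "none",
# depending on the CPU flags found in a cpuinfo-style file.
virt_extension() {
  cpuinfo="${1:-/proc/cpuinfo}"
  if grep -qw vmx "$cpuinfo"; then
    echo "vmx"
  elif grep -qw svm "$cpuinfo"; then
    echo "svm"
  else
    echo "none"
  fi
}

# Check the machine this script runs on.
virt_extension
```

A result of none usually means the extensions are disabled in the firmware settings or the CPU does not support hardware virtualization.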

Creating Virtual Network (optional, you can use existing network)

Create a new virtual network configuration file

vim virt-net.xml

File contents:

<network>
  <name>openshift4</name>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='openshift4' stp='on' delay='0'/>
  <domain name='openshift4'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
  </ip>
</network>

Create the virtual network from this file; modify it if need be:

$ sudo virsh net-define --file virt-net.xml
Network openshift4 defined from virt-net.xml

Set the network to autostart on boot

$ sudo virsh net-autostart openshift4
Network openshift4 marked as autostarted

$ sudo virsh net-start openshift4
Network openshift4 started

Confirm that the bridge is available and active:

$ brctl show
bridge name	bridge id		STP enabled	interfaces
openshift4		8000.5254002b479a	yes
virbr0		8000.525400ad641d	yes

Step 2: Create Bastion / Helper Virtual Machine

Create a Virtual Machine that will host some key services from officially provided virt-builder images. The virtual machine will be used to run the following services:

DNS Server (Bind)
Apache httpd web server
HAProxy Load balancer
DHCP & PXE/TFTP services
It will also be our bastion server for deploying and managing OpenShift platform (oc, openshift-install, kubectl, ansible)

Let’s first display available OS templates with command below:

$ virt-builder -l

I’ll create a VM image from the fedora-38 template; you can also choose a CentOS template (8 or 7):

sudo virt-builder fedora-38  --format qcow2 \
  --size 20G -o /var/lib/libvirt/images/ocp-bastion-server.qcow2 \
  --root-password password:StrongRootPassw0rd

Where:

fedora-38 is the template used to create a new virtual machine
/var/lib/libvirt/images/ocp-bastion-server.qcow2 is the path to VM qcow2 image
StrongRootPassw0rd is the root user password

VM image creation progress will be visible on your screen:

[   1.0] Downloading: http://builder.libguestfs.org/fedora-38.xz
########################################################################################################################################################### 100.0%
[  15.3] Planning how to build this image
[  15.3] Uncompressing
[  18.2] Resizing (using virt-resize) to expand the disk to 20.0G
[  39.7] Opening the new disk
[  44.1] Setting a random seed
[  44.1] Setting passwords
[  45.1] Finishing off
                   Output file: /var/lib/libvirt/images/ocp-bastion-server.qcow2
                   Output size: 20.0G
                 Output format: qcow2
            Total usable space: 20.0G
                    Free space: 19.0G (94%)

Now create a Virtual Machine to be used as DNS and DHCP server with virt-install

Using Linux bridge:

sudo virt-install \
  --name ocp-bastion-server \
  --ram 4096 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/ocp-bastion-server.qcow2 \
  --os-type linux \
  --os-variant rhel8.0 \
  --network bridge=openshift4 \
  --graphics none \
  --serial pty \
  --console pty \
  --boot hd \
  --import

Using an Open vSwitch bridge (see the guide How To Use Open vSwitch Bridge on KVM Virtual Machines):

sudo virt-install \
  --name ocp-bastion-server \
  --ram 4096 \
  --disk path=/var/lib/libvirt/images/ocp-bastion-server.qcow2 \
  --vcpus 2 \
  --os-type linux \
  --os-variant rhel8.0 \
  --network=bridge:openshift4,model=virtio,virtualport_type=openvswitch \
  --graphics none \
  --serial pty \
  --console pty \
  --boot hd \
  --import

When the VM is created and running, log in as the root user with the password set earlier:

Fedora 38 (Thirty Eight)
fedora login: root
Password: StrongRootPassw0rd

You can change the root password after logging in if you wish:

[root@fedora ~]# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

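If the server did not get an IP address from a DHCP server (the openshift4 network defined above has no DHCP range), set a static IP address manually on the primary interface. First identify the interface name: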

# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:21:fb:33 brd ff:ff:ff:ff:ff:ff

Setting up the IP address using NMCLI:

nmcli con delete "Wired connection 1"
nmcli con add type ethernet con-name enp1s0 ifname enp1s0 \
  connection.autoconnect yes ipv4.method manual \
  ipv4.address 192.168.100.254/24 ipv4.gateway 192.168.100.1 \
  ipv4.dns 8.8.8.8

Test external connectivity from the VM:

# ping -c 2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=4.98 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.14 ms

--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 4.981/5.061/5.142/0.080 ms

# ping -c 2 google.com
PING google.com (172.217.18.110) 56(84) bytes of data.
64 bytes from zrh04s05-in-f110.1e100.net (172.217.18.110): icmp_seq=1 ttl=118 time=4.97 ms
64 bytes from fra16s42-in-f14.1e100.net (172.217.18.110): icmp_seq=2 ttl=118 time=5.05 ms

--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 4.971/5.008/5.045/0.037 ms

Perform OS upgrade before deploying other services.

sudo dnf -y upgrade
sudo dnf -y install git vim wget curl bash-completion tree tar libselinux-python3 firewalld

Reboot the server after the upgrade is done.

sudo reboot

Confirm you can access the VM through virsh console or ssh

$ sudo virsh list
 Id   Name                  State
-------------------------------------
 1    ocp-bastion-server    running

$ sudo virsh console ocp-bastion-server
Connected to domain 'ocp-bastion-server'
Escape character is ^] (Ctrl + ])
<ENTER>
fedora login:

Enable domain autostart:

sudo virsh autostart ocp-bastion-server

Step 3: Install Ansible and Configure variables on Bastion / Helper node

Install Ansible configuration management tool on the Bastion machine

# Fedora
sudo dnf -y install git ansible vim wget curl bash-completion tree tar libselinux-python3

# CentOS 8 / Rocky Linux 8
sudo yum -y install epel-release
sudo yum -y install git ansible vim wget curl bash-completion tree tar libselinux-python3

# CentOS 7
sudo yum -y install epel-release
sudo yum -y install git ansible vim wget curl bash-completion tree tar libselinux-python

We have a GitHub repository with all the tasks and templates used in this guide. Clone the project to the ~/ocp4_ansible directory.

cd ~/
git clone https://github.com/jmutai/ocp4_ansible.git
cd ~/ocp4_ansible

You can view the directory structure using tree command:

$ tree
.
├── ansible.cfg
├── files
│   └── set-dns-serial.sh
├── handlers
│   └── main.yml
├── inventory
├── LICENSE
├── README.md
├── tasks
│   ├── configure_bind_dns.yml
│   ├── configure_dhcpd.yml
│   ├── configure_haproxy_lb.yml
│   └── configure_tftp_pxe.yml
├── templates
│   ├── default.j2
│   ├── dhcpd.conf.j2
│   ├── dhcpd-uefi.conf.j2
│   ├── haproxy.cfg.j2
│   ├── named.conf.j2
│   ├── pxe-bootstrap.j2
│   ├── pxe-master.j2
│   ├── pxe-worker.j2
│   ├── reverse.j2
│   └── zonefile.j2
└── vars
    └── main.yml

5 directories, 21 files

Edit ansible configuration file and modify to suit your use.

$ vim ansible.cfg
[defaults]
inventory = inventory
command_warnings = False
filter_plugins = filter_plugins
host_key_checking = False
deprecation_warnings=False
retry_files = false

When not running Ansible as the root user, you can add a privilege_escalation section:

[privilege_escalation]
become = true
become_method = sudo
become_user = root
become_ask_pass = false

If running on the localhost the inventory can be set as below:

$ vim inventory
[vmhost]
localhost ansible_connection=local

The service handlers below are referenced by the bastion setup tasks:

$ vim handlers/main.yml
---
- name: restart tftp
  service:
    name: tftp
    state: restarted

- name: restart bind
  service:
    name: named
    state: restarted

- name: restart haproxy
  service:
    name: haproxy
    state: restarted

- name: restart dhcpd
  service:
    name: dhcpd
    state: restarted

- name: restart httpd
  service:
    name: httpd
    state: restarted

Modify the default variables file inside vars folder:

vim vars/main.yml

Define all the required variables correctly. Wrong values here will cause problems during the OpenShift installation.

---
ppc64le: false
uefi: false
disk: vda                                  #disk where you are installing RHCOS on the masters/workers
helper:
  name: "bastion"                          #hostname for your helper node
  ipaddr: "192.168.100.254"                #current IP address of the helper
  networkifacename: "enp1s0"               #interface of the helper node, ACTUAL name of the interface, NOT the NetworkManager name
dns:
  domain: "example.com"                    #DNS server domain. Should match baseDomain inside the install-config.yaml file.
  clusterid: "ocp4"                        #needs to match what you will set for metadata.name inside the install-config.yaml file
  forwarder1: "8.8.8.8"                    #DNS forwarder
  forwarder2: "1.1.1.1"                    #second DNS forwarder
  lb_ipaddr: "{{ helper.ipaddr }}"         #Load balancer IP, it is optional, the default value is helper.ipaddr
dhcp:
  router: "192.168.100.1"                  #default gateway of the network assigned to the masters/workers
  bcast: "192.168.100.255"                 #broadcast address for your network
  netmask: "255.255.255.0"                 #netmask that gets assigned to your masters/workers
  poolstart: "192.168.100.10"              #First address in your dhcp address pool
  poolend: "192.168.100.50"                #Last address in your dhcp address pool
  ipid: "192.168.100.0"                    #ip network id for the range
  netmaskid: "255.255.255.0"               #networkmask id for the range.
  ntp: "time.google.com"                   #ntp server address
  dns: ""                                  #domain name server, it is optional, the default value is set to helper.ipaddr
bootstrap:
  name: "bootstrap"                        #hostname (WITHOUT the fqdn) of the bootstrap node 
  ipaddr: "192.168.100.10"                 #IP address that you want set for bootstrap node
  macaddr: "52:54:00:a4:db:5f"             #The mac address for dhcp reservation
masters:
  - name: "master01"                       #hostname (WITHOUT the fqdn) of the master node (x of 3)
    ipaddr: "192.168.100.11"               #The IP address (x of 3) that you want set
    macaddr: "52:54:00:8b:a1:17"           #The mac address for dhcp reservation
  - name: "master02"
    ipaddr: "192.168.100.12"
    macaddr: "52:54:00:ea:8b:9d"
  - name: "master03"
    ipaddr: "192.168.100.13"
    macaddr: "52:54:00:f8:87:c7"
workers:
  - name: "worker01"                       #hostname (WITHOUT the fqdn) of the worker node you want to set
    ipaddr: "192.168.100.21"               #The IP address that you want set (1st node)
    macaddr: "52:54:00:31:4a:39"           #The mac address for dhcp reservation (1st node)
  - name: "worker02"
    ipaddr: "192.168.100.22"
    macaddr: "52:54:00:6a:37:32"
  - name: "worker03"
    ipaddr: "192.168.100.23"
    macaddr: "52:54:00:95:d4:ed"

Generating unique mac addresses for bootstrap, worker and master nodes

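The command below generates one MAC address in the 52:54:00 prefix used by KVM/QEMU guests. It is derived from the current time in seconds, so run it once per node and wait at least a second between runs to get distinct values: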

date +%s | md5sum | head -c 6 | sed -e 's/\([0-9A-Fa-f]\{2\}\)/\1:/g' -e 's/\(.*\):$/\1/' | sed -e 's/^/52:54:00:/'
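
Because the one-liner above is derived from the clock, a variant that draws the last three octets from /dev/urandom can print all seven addresses in one go. The sketch below is an alternative to the command above, not a copy of it; the function name random_kvm_mac is hypothetical, and it assumes od, tr and sed are available, as they are on a typical Linux host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one random MAC address in the 52:54:00 prefix.
# The last three octets are read from /dev/urandom, so calls made within the
# same second are very unlikely to collide.
random_kvm_mac() {
  # od prints three random bytes as hex; tr strips the spacing and sed puts
  # a colon in front of every byte, giving something like ":a4:db:5f".
  tail_octets=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../:&/g')
  printf '52:54:00%s\n' "$tail_octets"
}

# One address per cluster VM in this lab: 1 bootstrap, 3 masters, 3 workers.
for node in bootstrap master01 master02 master03 worker01 worker02 worker03; do
  printf '%s %s\n' "$node" "$(random_kvm_mac)"
done
```

Copy the printed addresses into vars/main.yml and reuse the same values when you create the virtual machines.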

Step 4: Install and Configure DHCP server on Bastion / Helper node

Install dhcp-server rpm package using dnf or yum package manager.

sudo yum -y install dhcp-server

Enable dhcpd service to start on system boot

$ sudo systemctl enable dhcpd
Created symlink /etc/systemd/system/multi-user.target.wants/dhcpd.service → /usr/lib/systemd/system/dhcpd.service.

Back up the current dhcpd configuration file. If the server is not new, you can modify the existing configuration instead.

sudo mv /etc/dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf.bak

Task to configure dhcp server on the bastion server:

$ vim tasks/configure_dhcpd.yml
---
# Setup OCP4 DHCP Server on Helper Node

- hosts: all
  vars_files:
    - ../vars/main.yml
  handlers:
  - import_tasks: ../handlers/main.yml

  tasks:
  - name: Write out dhcp file
    template:
      src: ../templates/dhcpd.conf.j2
      dest: /etc/dhcp/dhcpd.conf
    notify:
      - restart dhcpd
    when: not uefi
  - name: Write out dhcp file (UEFI)
    template:
      src: ../templates/dhcpd-uefi.conf.j2
      dest: /etc/dhcp/dhcpd.conf
    notify:
      - restart dhcpd
    when: uefi

Configure DHCP server using ansible, defined variables and templates shared.

$ ansible-playbook tasks/configure_dhcpd.yml

PLAY [all] *******************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************
ok: [localhost]

TASK [Write out dhcp file] ***************************************************************************************************************************************
changed: [localhost]

TASK [Write out dhcp file (UEFI)] ********************************************************************************************************************************
skipping: [localhost]

RUNNING HANDLER [restart dhcpd] **********************************************************************************************************************************
changed: [localhost]

PLAY RECAP *******************************************************************************************************************************************************
localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

Confirm that dhcpd service is in running state:

$ systemctl status dhcpd
● dhcpd.service - DHCPv4 Server Daemon
     Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; enabled; vendor preset: disabled)
     Active: active (running) since Tue 2021-08-17 19:35:06 EDT; 2min 42s ago
       Docs: man:dhcpd(8)
             man:dhcpd.conf(5)
   Main PID: 24958 (dhcpd)
     Status: "Dispatching packets..."
      Tasks: 1 (limit: 4668)
     Memory: 9.7M
        CPU: 17ms
     CGroup: /system.slice/dhcpd.service
             └─24958 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid
...

You can also inspect the generated configuration file:

$ cat /etc/dhcp/dhcpd.conf
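
The dhcpd.conf.j2 template itself is not reproduced in this guide. For orientation only, the sketch below prints a static reservation in standard ISC dhcpd syntax for one node. The function name dhcp_host_entry is hypothetical, and the exact layout produced by the template may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ISC dhcpd "host" declaration that pins
# a MAC address to a fixed IP address.
dhcp_host_entry() {
  name="$1"; mac="$2"; ip="$3"
  cat <<EOF
host $name {
  hardware ethernet $mac;
  fixed-address $ip;
  option host-name "$name";
}
EOF
}

# Reservation for the bootstrap node, using the values from vars/main.yml.
dhcp_host_entry bootstrap 52:54:00:a4:db:5f 192.168.100.10
```

When reading the generated file, you should find one reservation per node, each matching the MAC and IP address pairs defined in vars/main.yml.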

Step 4: Configure OCP Zone on Bind DNS Server on Bastion / Helper node

We can now install the BIND DNS server packages required to run OpenShift Container Platform on KVM.

sudo yum -y install bind bind-utils

Enable the service to start at system boot up

sudo systemctl enable named

Install the DNS serial number generator script:

$ sudo vim /usr/local/bin/set-dns-serial.sh
#!/bin/bash
dnsserialfile=/usr/local/src/dnsserial-DO_NOT_DELETE_BEFORE_ASKING_CHRISTIAN.txt
zonefile=/var/named/zonefile.db
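if [ -f "${zonefile}" ] ; then
	# Zone file exists: bump the serial it currently carries.
	current=$(grep serial "${zonefile}" | tr -d "\t"" ""\n" | cut -d';' -f 1)
	echo $(( current + 1 )) | tee "${dnsserialfile}"
else
	if [ ! -f "${dnsserialfile}" ] || [ ! -s "${dnsserialfile}" ]; then
		# No zone file and no saved serial: start from today's date (YYYYMMDD00).
		date +%Y%m%d00 | tee "${dnsserialfile}"
	else
		# Read the saved serial first, before tee truncates the file.
		current=$(< "${dnsserialfile}")
		echo $(( current + 1 )) | tee "${dnsserialfile}"
	fi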
	fi
fi
##
##-30-

Make the script executable:

sudo chmod a+x /usr/local/bin/set-dns-serial.sh

This is the DNS Configuration task to be used:

$ vim tasks/configure_bind_dns.yml
---
# Configure OCP4 DNS Server on Helper Node

- hosts: all
  vars_files:
    - ../vars/main.yml
  handlers:
  - import_tasks: ../handlers/main.yml

  tasks:
  - name: Setup named configuration files
    block:
    - name: Write out named file
      template:
        src: ../templates/named.conf.j2
        dest: /etc/named.conf
      notify:
        - restart bind
    - name: Set zone serial number
      shell: "/usr/local/bin/set-dns-serial.sh"
      register: dymanicserialnumber

    - name: Setting serial number as a fact
      set_fact:
        serialnumber: "{{ dymanicserialnumber.stdout }}"

    - name: Write out "{{ dns.domain | lower }}" zone file
      template:
        src: ../templates/zonefile.j2
        dest: /var/named/zonefile.db
        mode: '0644'
      notify:
        - restart bind

    - name: Write out reverse zone file
      template:
        src: ../templates/reverse.j2
        dest: /var/named/reverse.db
        mode: '0644'
      notify:
        - restart bind

Run ansible playbook to configure bind dns server for OpenShift deployment.

$ ansible-playbook tasks/configure_bind_dns.yml

PLAY [all] *******************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************
ok: [localhost]

TASK [Write out named file] **************************************************************************************************************************************
changed: [localhost]

TASK [Set zone serial number] ************************************************************************************************************************************
changed: [localhost]

TASK [Setting serial number as a fact] ***************************************************************************************************************************
changed: [localhost]

TASK [Write out "example.com" zone file] **********************************************************************************************************************
changed: [localhost]

TASK [Write out reverse zone file] *******************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [restart bind] ***********************************************************************************************************************************
changed: [localhost]

PLAY RECAP *******************************************************************************************************************************************************
localhost                  : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Forward DNS zone file is created under /var/named/zonefile.db and reverse DNS lookup file is /var/named/reverse.db
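
The zonefile.j2 and reverse.j2 templates are not shown here. As a rough illustration of what a matching forward and reverse record pair looks like in BIND zone file syntax, the sketch below prints both for one host. The function name zone_records is hypothetical; it assumes a /24 network (so the PTR owner is the last octet of the address) and a forward zone rooted at the base domain. The real templates may lay the records out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the forward (A) and reverse (PTR) records
# for one host of the cluster.
zone_records() {
  host="$1"; ip="$2"; cluster="$3"; domain="$4"
  # Forward record, written relative to the zone origin ($domain).
  printf '%s.%s\tIN\tA\t%s\n' "$host" "$cluster" "$ip"
  # Reverse record: owner is the last octet, target is fully qualified
  # (note the trailing dot).
  printf '%s\tIN\tPTR\t%s.%s.%s.\n' "${ip##*.}" "$host" "$cluster" "$domain"
}

zone_records bootstrap 192.168.100.10 ocp4 example.com
```

Every node needs both records to resolve consistently, so a name that resolves forward but not in reverse usually points to a mismatch between the two files.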

Check if the service is in running status:

$ systemctl status named
● named.service - Berkeley Internet Name Domain (DNS)
     Loaded: loaded (/usr/lib/systemd/system/named.service; disabled; vendor preset: disabled)
     Active: active (running) since Wed 2021-08-11 16:19:38 EDT; 4s ago
    Process: 1340 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zo>
    Process: 1342 ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} $OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 1344 (named)
      Tasks: 6 (limit: 4668)
     Memory: 26.3M
        CPU: 53ms
     CGroup: /system.slice/named.service
             └─1344 /usr/sbin/named -u named -c /etc/named.conf

Aug 11 16:19:38 fedora named[1344]: network unreachable resolving './NS/IN': 2001:500:1::53#53
Aug 11 16:19:38 fedora named[1344]: network unreachable resolving './NS/IN': 2001:500:200::b#53
Aug 11 16:19:38 fedora named[1344]: network unreachable resolving './NS/IN': 2001:500:9f::42#53
Aug 11 16:19:38 fedora named[1344]: network unreachable resolving './NS/IN': 2001:7fe::53#53
Aug 11 16:19:38 fedora named[1344]: network unreachable resolving './NS/IN': 2001:503:c27::2:30#53
Aug 11 16:19:38 fedora named[1344]: zone 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN: loaded serial 0
Aug 11 16:19:38 fedora named[1344]: all zones loaded
Aug 11 16:19:38 fedora named[1344]: managed-keys-zone: Initializing automatic trust anchor management for zone '.'; DNSKEY ID 20326 is now trusted, waiving the n>
Aug 11 16:19:38 fedora named[1344]: running
Aug 11 16:19:38 fedora systemd[1]: Started Berkeley Internet Name Domain (DNS).

To test our DNS server we just execute:

$ dig @127.0.0.1 -t srv _etcd-server-ssl._tcp.ocp4.example.com

; <<>> DiG 9.16.19-RH <<>> @127.0.0.1 -t srv _etcd-server-ssl._tcp.ocp4.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57264
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e694eee032b927690100000061143bcf3df96ad3e49125d0 (good)
;; QUESTION SECTION:
;_etcd-server-ssl._tcp.ocp4.example.com. IN SRV

;; ANSWER SECTION:
_etcd-server-ssl._tcp.ocp4.example.com. 86400	IN SRV 0 10 2380 etcd-1.ocp4.example.com.
_etcd-server-ssl._tcp.ocp4.example.com. 86400	IN SRV 0 10 2380 etcd-2.ocp4.example.com.
_etcd-server-ssl._tcp.ocp4.example.com. 86400	IN SRV 0 10 2380 etcd-0.ocp4.example.com.

;; ADDITIONAL SECTION:
etcd-0.ocp4.example.com. 86400 IN	A	192.168.100.11
etcd-1.ocp4.example.com. 86400 IN	A	192.168.100.12
etcd-2.ocp4.example.com. 86400 IN	A	192.168.100.13

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Aug 11 17:06:23 EDT 2021
;; MSG SIZE  rcvd: 280

Now that the DNS server is confirmed to be working, point the bastion’s own resolver at it:

$ nmcli connection show
NAME    UUID                                  TYPE      DEVICE
enp1s0  c0ab6b8c-0eac-a1b4-1c47-efe4b2d1191f  ethernet  enp1s0

$ nmcli connection modify enp1s0  ipv4.dns "192.168.100.254"
$ nmcli connection reload
$ nmcli connection up enp1s0

We can re-test that name resolution is correct with:

$ host bootstrap.ocp4.example.com
bootstrap.ocp4.example.com has address 192.168.100.10

Open firewall ports on the machine

sudo firewall-cmd --add-service={dhcp,tftp,http,https,dns} --permanent
sudo firewall-cmd --reload

Step 5: Setup TFTP Service on Bastion / Helper node

Install TFTP-related packages

sudo yum -y install tftp-server syslinux

Allow service in the firewall

sudo firewall-cmd --add-service=tftp --permanent
sudo firewall-cmd --reload

Create TFTP Systemd unit file

$ sudo vim /etc/systemd/system/helper-tftp.service
[Unit]
Description=Starts TFTP on boot because of reasons
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/start-tftp.sh
TimeoutStartSec=0
Restart=always
RestartSec=30

[Install]
WantedBy=default.target

Create TFTP Systemd helper script

sudo tee /usr/local/bin/start-tftp.sh<<EOF
#!/bin/bash
/usr/bin/systemctl start tftp > /dev/null 2>&1
##
##
EOF

Give the script execution bits:

sudo chmod a+x /usr/local/bin/start-tftp.sh

Reload Systemd daemon

sudo systemctl daemon-reload

Start tftp service

sudo systemctl enable --now tftp helper-tftp

Populate the default files for tftpboot

sudo mkdir -p  /var/lib/tftpboot/pxelinux.cfg

Copy syslinux files needed for PXE boot

sudo cp -rvf /usr/share/syslinux/* /var/lib/tftpboot

Create a directory for hosting the kernel and initramfs for PXE boot

sudo mkdir -p /var/lib/tftpboot/rhcos

Files to be downloaded

Obtain the RHCOS kernel, initramfs, and rootfs files from the RHCOS image mirror page. The three main files to be downloaded:

kernel: rhcos-<version>-live-kernel-<architecture>
initramfs: rhcos-<version>-live-initramfs.<architecture>.img
rootfs: rhcos-<version>-live-rootfs.<architecture>.img

Download the CoreOS kernel file to this directory:

wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/rhcos-live-kernel-x86_64
sudo mv rhcos-live-kernel-x86_64 /var/lib/tftpboot/rhcos/kernel

Then the CoreOS initramfs image:

wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/rhcos-live-initramfs.x86_64.img
sudo mv rhcos-live-initramfs.x86_64.img /var/lib/tftpboot/rhcos/initramfs.img

Now we need to relabel the files for SELinux:

sudo restorecon -RFv  /var/lib/tftpboot/rhcos

List files in the directory:

$ ls /var/lib/tftpboot/rhcos
initramfs.img kernel

Apache httpd configurations

Install httpd server package

sudo yum -y install httpd

We need to change the httpd configuration so that it listens on port 8080 instead of port 80:

sudo vim /etc/httpd/conf/httpd.conf

Search for the Line:

Listen 80

Change the line to:

Listen 8080
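
If you would rather not edit the file by hand, the same change can be made with sed. The sketch below uses a hypothetical function set_httpd_port and demonstrates it on a temporary file; it assumes the configuration contains a plain Listen 80 line, as in the default httpd.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch an uncommented "Listen 80" directive to
# another port, keeping a backup of the original file with a .bak suffix.
set_httpd_port() {
  conf="$1"; port="$2"
  # Only a line that is exactly "Listen 80" is rewritten; commented
  # examples and other directives are left untouched.
  sed -i.bak "s/^Listen 80\$/Listen ${port}/" "$conf"
}

# Demonstration on a throwaway file rather than the real configuration.
demo=$(mktemp)
printf 'Listen 80\n' > "$demo"
set_httpd_port "$demo" 8080
cat "$demo"              # prints: Listen 8080
rm -f "$demo" "$demo.bak"
```

To apply it to the real file, call the function from a root shell with /etc/httpd/conf/httpd.conf and 8080 as arguments.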

After that start httpd:

sudo systemctl enable httpd
sudo systemctl restart httpd

Open port 8080 in the firewall

sudo firewall-cmd --add-port=8080/tcp --permanent
sudo firewall-cmd --reload

Create a directory in your web server root directory for CoreOS rootfs image

sudo mkdir -p /var/www/html/rhcos

Download Red Hat CoreOS rootfs image:

wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/rhcos-live-rootfs.x86_64.img

Move the file to directory we created

sudo mv rhcos-live-rootfs.x86_64.img /var/www/html/rhcos/rootfs.img
sudo restorecon -RFv /var/www/html/rhcos

You can modify TFTP / PXE configuration task if need be:

$ vim tasks/configure_tftp_pxe.yml
---
# Configure OCP4 TFTP/PXE on Helper Node

- hosts: all
  vars_files:
    - ../vars/main.yml
  handlers:
  - import_tasks: ../handlers/main.yml

  tasks:
  - name: Set the bootstrap specific tftp file
    template:
      src: ../templates/pxe-bootstrap.j2
      dest: "/var/lib/tftpboot/pxelinux.cfg/01-{{ bootstrap.macaddr | lower | regex_replace (':', '-')}}"
      mode: 0555
    notify:
      - restart tftp
    when: bootstrap is defined

  - name: Set the master specific tftp files
    template:
      src: ../templates/pxe-master.j2
      dest: "/var/lib/tftpboot/pxelinux.cfg/01-{{ item.macaddr | regex_replace (':', '-')}}"
      mode: 0555
    with_items: "{{ masters | lower }}"
    notify:
      - restart tftp

  - name: Set the worker specific tftp files
    template:
      src: ../templates/pxe-worker.j2
      dest: "/var/lib/tftpboot/pxelinux.cfg/01-{{ item.macaddr | regex_replace (':', '-')}}"
      mode: 0555
    with_items: "{{ workers | lower }}"
    notify:
      - restart tftp
    when:
      - workers is defined
      - workers | length > 0

Configure PXE environment for RHCOS using ansible

$ ansible-playbook tasks/configure_tftp_pxe.yml

PLAY [all] *******************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************
ok: [localhost]

TASK [Set the bootstrap specific tftp file] **********************************************************************************************************************
changed: [localhost]

TASK [Set the master specific tftp files] ************************************************************************************************************************
changed: [localhost] => (item={'name': 'master01', 'ipaddr': '192.168.100.11', 'macaddr': '52:54:00:8b:a1:17'})
changed: [localhost] => (item={'name': 'master02', 'ipaddr': '192.168.100.12', 'macaddr': '52:54:00:ea:8b:9d'})
changed: [localhost] => (item={'name': 'master03', 'ipaddr': '192.168.100.13', 'macaddr': '52:54:00:f8:87:c7'})

TASK [Set the worker specific tftp files] ************************************************************************************************************************
changed: [localhost] => (item={'name': 'worker01', 'ipaddr': '192.168.100.21', 'macaddr': '52:54:00:31:4a:39'})
changed: [localhost] => (item={'name': 'worker02', 'ipaddr': '192.168.100.22', 'macaddr': '52:54:00:6a:37:32'})
changed: [localhost] => (item={'name': 'worker03', 'ipaddr': '192.168.100.23', 'macaddr': '52:54:00:95:d4:ed'})

RUNNING HANDLER [restart tftp] ***********************************************************************************************************************************
changed: [localhost]

PLAY RECAP *******************************************************************************************************************************************************
localhost                  : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Headless environment considerations

We’re working in a headless environment, a minimal KVM setup without a graphical interface, so we need to ensure that each PXE-booted CoreOS VM automatically chooses the correct image and ignition file for the OS installation.

PXE boot files are created inside the directory /var/lib/tftpboot/pxelinux.cfg

NOTE: Each file name must start with 01- (the hardware type for Ethernet), followed by the MAC address in lowercase with dashes instead of colons. See the bootstrap node example below.

Bootstrap node

MAC address:

52:54:00:a4:db:5f

The file created will be:

cat /var/lib/tftpboot/pxelinux.cfg/01-52-54-00-a4-db-5f

With contents:

default menu.c32
 prompt 1
 timeout 9
 ONTIMEOUT 1
 menu title ######## PXE Boot Menu ########
 label 1
 menu label ^1) Install Bootstrap Node
 menu default
 kernel rhcos/kernel
 append initrd=rhcos/initramfs.img nomodeset rd.neednet=1 console=tty0 console=ttyS0 ip=dhcp coreos.inst=yes coreos.inst.install_dev=vda coreos.live.rootfs_url=http://192.168.100.254:8080/rhcos/rootfs.img coreos.inst.ignition_url=http://192.168.100.254:8080/ignition/bootstrap.ign

Master nodes

The file for each master has contents similar to this:

default menu.c32
 prompt 1
 timeout 9
 ONTIMEOUT 1
 menu title ######## PXE Boot Menu ########
 label 1
 menu label ^1) Install Master Node
 menu default
 kernel rhcos/kernel
 append initrd=rhcos/initramfs.img nomodeset rd.neednet=1 console=tty0 console=ttyS0 ip=dhcp coreos.inst=yes coreos.inst.install_dev=vda coreos.live.rootfs_url=http://192.168.100.254:8080/rhcos/rootfs.img coreos.inst.ignition_url=http://192.168.100.254:8080/ignition/master.ign

Worker nodes

The file for each worker node will look similar to this:

default menu.c32
 prompt 1
 timeout 9
 ONTIMEOUT 1
 menu title ######## PXE Boot Menu ########
 label 1
 menu label ^1) Install Worker Node
 menu default
 kernel rhcos/kernel
 append initrd=rhcos/initramfs.img nomodeset rd.neednet=1 console=tty0 console=ttyS0 ip=dhcp coreos.inst=yes coreos.inst.install_dev=vda coreos.live.rootfs_url=http://192.168.100.254:8080/rhcos/rootfs.img coreos.inst.ignition_url=http://192.168.100.254:8080/ignition/worker.ign

You can list all the files created using the following command:

$ ls -1 /var/lib/tftpboot/pxelinux.cfg
01-52-54-00-31-4a-39
01-52-54-00-6a-37-32
01-52-54-00-8b-a1-17
01-52-54-00-95-d4-ed
01-52-54-00-a4-db-5f
01-52-54-00-ea-8b-9d
01-52-54-00-f8-87-c7
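
The file naming rule can be checked with a small shell helper. This is a minimal sketch: the function name mac_to_pxe_name is made up for illustration and is not part of the playbooks used in this guide. It lowercases the MAC address, replaces colons with dashes, and prepends 01-.

```shell
# Hypothetical helper: derive the pxelinux.cfg file name for a MAC address.
mac_to_pxe_name() {
  # Lowercase the hex digits and turn ':' into '-', then add the 01- prefix.
  printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}
```

For the bootstrap node, mac_to_pxe_name 52:54:00:a4:db:5f prints 01-52-54-00-a4-db-5f, which is the file name used in the bootstrap example.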

Step 6: Configure HAProxy as Load balancer on Bastion / Helper node

In this setup we’re using a software load balancer solution – HAProxy. In a Production setup of OpenShift Container Platform a hardware or highly available load balancer solution is required.

Install the package

sudo yum install -y haproxy

Set the SELinux boolean that allows HAProxy to connect to any port:

sudo setsebool -P haproxy_connect_any 1

Back up the default HAProxy configuration:

sudo mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.default

Here is the Ansible task file for the HAProxy configuration:

$ vim tasks/configure_haproxy_lb.yml
---
# Configure OCP4 HAProxy Load balancer on Helper Node
- hosts: all
  vars_files:
    - ../vars/main.yml

  tasks:
  - name: Write out haproxy config file
    template:
      src: ../templates/haproxy.cfg.j2
      dest: /etc/haproxy/haproxy.cfg
    notify:
      - restart haproxy
  handlers:
  - name: restart haproxy
    ansible.builtin.service:
      name: haproxy
      state: restarted

Run ansible-playbook with the task file you created to configure the HAProxy load balancer for OpenShift:

$ ansible-playbook tasks/configure_haproxy_lb.yml

PLAY [all] *******************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************
ok: [localhost]

TASK [Write out haproxy config file] *****************************************************************************************************************************
changed: [localhost]

RUNNING HANDLER [restart haproxy] ********************************************************************************************************************************
changed: [localhost]

PLAY RECAP *******************************************************************************************************************************************************
localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

The rendered configuration is written to /etc/haproxy/haproxy.cfg. Open the file if you need to review or adjust it:

sudo vim /etc/haproxy/haproxy.cfg

Configure SELinux to allow HAProxy to use the custom ports:

sudo semanage port -a -t http_port_t -p tcp 6443
sudo semanage port -a -t http_port_t -p tcp 22623
sudo semanage port -a -t http_port_t -p tcp 32700

Open ports on the firewall

sudo firewall-cmd --add-service={http,https} --permanent
sudo firewall-cmd --add-port={6443,22623}/tcp --permanent
sudo firewall-cmd --reload
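
Once HAProxy has been restarted, it can help to confirm that the load balancer ports are actually bound before booting any cluster VM. The sketch below assumes the ss utility from iproute2 is available on the helper node; check_lb_ports is a hypothetical helper name, not something shipped with HAProxy or OpenShift.

```shell
# Hypothetical helper: report whether each given TCP port has a listener.
check_lb_ports() {
  local port missing=0
  for port in "$@"; do
    # -H drops the header, -t TCP, -l listening sockets, -n numeric ports.
    if ss -Htln "sport = :${port}" | grep -q .; then
      echo "port ${port}: listening"
    else
      echo "port ${port}: NOT listening"
      missing=1
    fi
  done
  # Non-zero exit status when at least one port has no listener.
  return "$missing"
}
```

A typical call would be check_lb_ports 6443 22623 80 443. The configuration itself can be syntax-checked with sudo haproxy -c -f /etc/haproxy/haproxy.cfg.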

Step 7: Install OpenShift installer and CLI binary on Bastion / Helper node

Download and install the OpenShift installer and client

OpenShift Client binary:

# Linux
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz
tar xvf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin
rm -f README.md LICENSE openshift-client-linux.tar.gz

# macOS
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-mac.tar.gz
tar xvf openshift-client-mac.tar.gz
sudo mv oc kubectl /usr/local/bin
rm -f README.md LICENSE openshift-client-mac.tar.gz

OpenShift installer binary:

# Linux
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux.tar.gz
tar xvf openshift-install-linux.tar.gz
sudo mv openshift-install /usr/local/bin
rm -f README.md LICENSE openshift-install-linux.tar.gz

# macOS
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-mac.tar.gz
tar xvf openshift-install-mac.tar.gz
sudo mv openshift-install /usr/local/bin
rm -f README.md LICENSE openshift-install-mac.tar.gz
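
The latest path follows whatever release is newest, which may no longer be 4.13 by the time you run the commands. A minimal sketch for pinning a release is shown below. It assumes the mirror keeps one directory per release (for example 4.13.4) containing the same tarball names, so confirm the directory exists before relying on it. The variable OCP_MIRROR and the function ocp_tarball_url are illustrative names.

```shell
# Assumption: the mirror has one directory per release, e.g. .../ocp/4.13.4/
OCP_MIRROR="https://mirror.openshift.com/pub/openshift-v4/clients/ocp"

# Hypothetical helper: build the download URL for a given release and tool.
#   $1 = release (e.g. 4.13.4), $2 = client|install, $3 = linux|mac (default linux)
ocp_tarball_url() {
  local version="$1" tool="$2" os="${3:-linux}"
  printf '%s/%s/openshift-%s-%s.tar.gz\n' "$OCP_MIRROR" "$version" "$tool" "$os"
}
```

The result can be passed straight to wget, for example wget "$(ocp_tarball_url 4.13.4 install)".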

Check if you can run binaries:

$ openshift-install version
openshift-install 4.13.4
built from commit 90acb3fa2990c35c9beeff4a188fb133fedba432
release image quay.io/openshift-release-dev/ocp-release@sha256:e3fb8ace9881ae5428ae7f0ac93a51e3daa71fa215b5299cd3209e134cadfc9c
release architecture amd64

$ oc version
Client Version: 4.13.4
Kustomize Version: v4.5.7

$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"05d83eff7e17160e679898a2a5cd6019ec252c49", GitTreeState:"clean", BuildDate:"2023-06-07T15:39:28Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7

Create SSH Key Pairs

Now we need to create an SSH key pair that will be used later to access the CoreOS nodes:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

Step 8: Generate ignition files on Bastion / Helper node

We need to create the ignition files used for the installation of CoreOS machines

Download Pull Secret

We can store our pull secret in ~/.openshift directory:

mkdir ~/.openshift

Visit cloud.redhat.com, download your pull secret, and save it as ~/.openshift/pull-secret:

vim  ~/.openshift/pull-secret

Create ocp4 directory

mkdir -p ~/ocp4
cd ~/

We can now create OpenShift installation yaml file install-config-base.yaml:

cat <<EOF > install-config-base.yaml
apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '$(< ~/.openshift/pull-secret)'
sshKey: '$(< ~/.ssh/id_rsa.pub)'
EOF
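
If the pull secret or the public key file is missing when the heredoc runs, the command substitution expands to an empty string and the field silently ends up as ''. The sketch below is one way to catch that before running the installer. The function name check_install_config is hypothetical, and the check only looks for empty values; it does not validate the pull secret itself.

```shell
# Hypothetical helper: fail if pullSecret or sshKey was rendered empty.
check_install_config() {
  local file="$1"
  # Matches lines that are exactly "pullSecret: ''" or "sshKey: ''".
  if grep -Eq "^(pullSecret|sshKey): ''\$" "$file"; then
    echo "empty pullSecret or sshKey in ${file}" >&2
    return 1
  fi
  echo "${file}: pullSecret and sshKey are populated"
}
```

Running check_install_config install-config-base.yaml right after creating the file gives an early warning instead of a later installer error.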

You can further modify the contents accordingly:

$ vim  install-config-base.yaml
apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'paste-as-obtained-from-https://cloud.redhat.com'
sshKey: 'PASTE-SSH-PUBLIC-KEY'

Copy the install-config-base.yaml file into the ocp4 directory with the name install-config.yaml

cd ~/
cp install-config-base.yaml ocp4/install-config.yaml

Change into ocp4 directory

cd ocp4

Apart from install-config.yaml, the directory must be empty every time you generate ignition files.
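
When regenerating ignition files, the installer's leftovers (including hidden state files such as .openshift_install_state.json) have to go as well. A minimal sketch of a reset helper follows. The name reset_install_dir is made up for illustration. Be aware that it deletes the whole directory, including any auth/ folder holding the kubeconfig and kubeadmin password from a previous run.

```shell
# Hypothetical helper: recreate the install directory with a fresh config.
#   $1 = install directory, $2 = base install-config file to copy in
reset_install_dir() {
  local dir="$1" base="$2"
  # Refuse to run without a target directory or a readable base file.
  if [ -z "$dir" ] || [ ! -f "$base" ]; then
    echo "usage: reset_install_dir <dir> <install-config-base.yaml>" >&2
    return 1
  fi
  rm -rf "$dir"
  mkdir -p "$dir"
  cp "$base" "$dir/install-config.yaml"
}
```

With the paths used in this guide the call would be reset_install_dir ~/ocp4 ~/install-config-base.yaml.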

To create the Kubernetes manifest files run:

$ openshift-install create manifests
$ ls
manifests  openshift

# All files
$ tree
.
├── manifests
│   ├── 04-openshift-machine-config-operator.yaml
│   ├── cluster-config.yaml
│   ├── cluster-dns-02-config.yml
│   ├── cluster-infrastructure-02-config.yml
│   ├── cluster-ingress-02-config.yml
│   ├── cluster-network-01-crd.yml
│   ├── cluster-network-02-config.yml
│   ├── cluster-proxy-01-config.yaml
│   ├── cluster-scheduler-02-config.yml
│   ├── cvo-overrides.yaml
│   ├── kube-cloud-config.yaml
│   ├── kube-system-configmap-root-ca.yaml
│   ├── machine-config-server-tls-secret.yaml
│   ├── openshift-config-secret-pull-secret.yaml
│   └── openshift-kubevirt-infra-namespace.yaml
└── openshift
    ├── 99_kubeadmin-password-secret.yaml
    ├── 99_openshift-cluster-api_master-user-data-secret.yaml
    ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
    ├── 99_openshift-machineconfig_99-master-ssh.yaml
    ├── 99_openshift-machineconfig_99-worker-ssh.yaml
    └── openshift-install-manifests.yaml

2 directories, 21 files

Disable pod scheduling on the master nodes by changing the mastersSchedulable parameter value from true to false:

sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' manifests/cluster-scheduler-02-config.yml
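
The same change can be wrapped in a function that touches only the mastersSchedulable key and then confirms the result. This is a sketch that assumes GNU sed (for -i) and a manifest in which the key appears as mastersSchedulable: true. The function name is hypothetical.

```shell
# Hypothetical helper: set mastersSchedulable to false and verify it.
disable_master_scheduling() {
  local file="$1"
  # Only the mastersSchedulable key is rewritten; other "true" values stay.
  sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' "$file"
  # Exit status is non-zero if the expected value is not present afterwards.
  grep -q 'mastersSchedulable: false' "$file"
}
```

From inside the ocp4 directory the call would be disable_master_scheduling manifests/cluster-scheduler-02-config.yml.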

Now create the ignition files:

$ openshift-install create ignition-configs
INFO Consuming Common Manifests from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Ignition-Configs created in: . and auth

The directory contents after generating the ignition files are shown below:

$ ls
auth  bootstrap.ign  master.ign  metadata.json  worker.ign
$ tree
.
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign

1 directory, 6 files

From inside the ocp4 directory, copy the ignition files to the ignition directory under the httpd document root /var/www/html:

sudo mkdir -p /var/www/html/ignition
sudo cp -v *.ign /var/www/html/ignition
sudo chmod 644 /var/www/html/ignition/*.ign
sudo restorecon -RFv /var/www/html/

Confirm files were copied

$ ls /var/www/html/ignition/
bootstrap.ign  master.ign  worker.ign
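
Listing the directory does not prove that the files are reachable at the URLs written into the PXE configs. The sketch below requests each ignition file over HTTP and reports the status code. It assumes curl is installed; http_status and check_ignition_urls are illustrative names.

```shell
# Print only the HTTP status code for a URL (000 if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: check the three ignition files under a base URL.
check_ignition_urls() {
  local base="$1" name code rc=0
  for name in bootstrap master worker; do
    code="$(http_status "${base}/${name}.ign")"
    echo "${name}.ign -> HTTP ${code}"
    [ "$code" = "200" ] || rc=1
  done
  return "$rc"
}
```

With the addresses used in this guide the call would be check_ignition_urls http://192.168.100.254:8080/ignition.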

Ensure all services are enabled and running

sudo systemctl enable --now haproxy.service dhcpd httpd tftp named
sudo systemctl restart haproxy.service dhcpd httpd tftp named
sudo systemctl status haproxy.service dhcpd httpd tftp named

HAProxy service status

$ systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
     Active: active (running) since Wed 2021-08-11 20:05:40 EDT; 44s ago
    Process: 3129 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 3137 (haproxy)
      Tasks: 3 (limit: 4668)
     Memory: 34.6M
        CPU: 78ms
     CGroup: /system.slice/haproxy.service
             ├─3137 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
             └─3140 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Step 9: Create Bootstrap, Masters and Worker VMs (On Hypervisor Node)

Start with the creation of Bootstrap Virtual Machine.

sudo virt-install -n bootstrap.ocp4.example.com \
  --description "Bootstrap Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:a4:db:5f

To check if there are any errors in PXE boot process use the following command on the Bastion machine:

$ journalctl -f
# Or for specific service
$ journalctl -f -u tftp
$ journalctl -f -u dhcpd

When instance creation succeeds, the console output ends with a domain creation completion message and the command to start the domain:

[   50.120576] [1226]: Remounting '/etc' read-only in with options 'seclabel,attr2,discard,inode64,logbufs=8,logbsize=32k,noquota'.
[   50.125181] [1227]: Unmounting '/etc'.
[   50.135577] [1228]: Remounting '/var' read-only in with options 'seclabel,attr2,discard,inode64,logbufs=8,logbsize=32k,noquota'.
[   50.138117] [1229]: Unmounting '/var'.
[   50.151944] XFS (loop0): Unmounting Filesystem
[   50.165537] systemd-shutdown[1]: All filesystems unmounted.
[   50.166730] systemd-shutdown[1]: Deactivating swaps.
[   50.167793] systemd-shutdown[1]: All swaps deactivated.
[   50.168888] systemd-shutdown[1]: Detaching loop devices.
[   50.170315] systemd-shutdown[1]: Not all loop devices detached, 1 left.
[   50.177714] kvm: exiting hardware virtualization
[   50.192437] reboot: Restarting system
[   50.193011] reboot: machine restart

Domain creation completed.
You can restart your domain by running:
  virsh --connect qemu:///system start bootstrap.ocp4.example.com

Start bootstrap node domain:

sudo virsh --connect qemu:///system start bootstrap.ocp4.example.com

Create the three master nodes, setting the correct VM name, network, and MAC address for each:

# Create Master01 VM
sudo virt-install -n master01.ocp4.example.com \
  --description "Master01 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:8b:a1:17

# Create Master02 VM
sudo virt-install -n master02.ocp4.example.com \
  --description "Master02 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:ea:8b:9d

# Create Master03 VM
sudo virt-install -n master03.ocp4.example.com \
  --description "Master03 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:f8:87:c7

Start Master nodes domains

$ sudo virsh --connect qemu:///system start master01.ocp4.example.com
Domain 'master01.ocp4.example.com' started

$ sudo virsh --connect qemu:///system start master02.ocp4.example.com
Domain 'master02.ocp4.example.com' started

$ sudo virsh --connect qemu:///system start master03.ocp4.example.com
Domain 'master03.ocp4.example.com' started

Install the worker nodes using virt-install and PXE boot:

# Create Worker01 VM
sudo virt-install -n worker01.ocp4.example.com \
  --description "Worker01 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:31:4a:39
 
# Create Worker02 VM
sudo virt-install -n worker02.ocp4.example.com \
  --description "Worker02 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:6a:37:32

# Create Worker03 VM
sudo virt-install -n worker03.ocp4.example.com \
  --description "Worker03 Machine for Openshift 4 Cluster" \
  --ram=8192 \
  --vcpus=4 \
  --os-type=Linux \
  --os-variant=rhel8.0 \
  --noreboot \
  --disk pool=default,bus=virtio,size=50 \
  --graphics none \
  --serial pty \
  --console pty \
  --pxe \
  --network bridge=openshift4,mac=52:54:00:95:d4:ed
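
Since the virt-install commands differ only in the VM name and MAC address, they can be folded into one function. The sketch below reuses the flags from the commands above unchanged; the function name create_node_vm is hypothetical.

```shell
# Hypothetical helper: create one cluster VM via PXE.
#   $1 = short node name (e.g. worker01), $2 = MAC address
create_node_vm() {
  local name="$1" mac="$2"
  sudo virt-install -n "${name}.ocp4.example.com" \
    --description "${name} Machine for Openshift 4 Cluster" \
    --ram=8192 \
    --vcpus=4 \
    --os-type=Linux \
    --os-variant=rhel8.0 \
    --noreboot \
    --disk pool=default,bus=virtio,size=50 \
    --graphics none \
    --serial pty \
    --console pty \
    --pxe \
    --network "bridge=openshift4,mac=${mac}"
}
```

For example, create_node_vm worker01 52:54:00:31:4a:39 creates the first worker. The MAC address must match the entry in the DHCP and PXE configuration for that node.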

Start Worker machine domains

$ sudo virsh --connect qemu:///system start worker01.ocp4.example.com
Domain 'worker01.ocp4.example.com' started

$ sudo virsh --connect qemu:///system start worker02.ocp4.example.com
Domain 'worker02.ocp4.example.com' started

$ sudo virsh --connect qemu:///system start worker03.ocp4.example.com
Domain 'worker03.ocp4.example.com' started

Once the master nodes are up and running, the bootkube service logs on the bootstrap node will show Succeeded:

Aug 12 02:45:09 bootstrap.ocp.example.com bootkube.sh[61387]: Tearing down temporary bootstrap control plane...
Aug 12 02:45:10 bootstrap.ocp.example.com bootkube.sh[61387]: Sending bootstrap-finished event.Waiting for CEO to finish...
Aug 12 02:45:11 bootstrap.ocp.example.com bootkube.sh[61387]: W0812 02:45:11.179220       1 etcd_env.go:287] cipher is not supported for use with etcd: "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"
Aug 12 02:45:11 bootstrap.ocp.example.com bootkube.sh[61387]: W0812 02:45:11.179352       1 etcd_env.go:287] cipher is not supported for use with etcd: "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
Aug 12 02:45:11 bootstrap.ocp.example.com bootkube.sh[61387]: I0812 02:45:11.285760       1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Aug 12 02:45:11 bootstrap.ocp.example.com bootkube.sh[61387]: I0812 02:45:11.287327       1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Aug 12 02:45:11 bootstrap.ocp.example.com bootkube.sh[61387]: bootkube.service complete
Aug 12 02:45:11 bootstrap.ocp.example.com systemd[1]: bootkube.service: Succeeded.
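
There are two common ways to follow this phase from the helper node: let the installer wait for the bootstrap to finish, or tail the bootkube journal over SSH. The sketch below wraps both. It assumes the install directory is ~/ocp4 and that the SSH key generated earlier is accepted by the bootstrap node. INSTALL_DIR and the function names are illustrative.

```shell
# Assumption: ignition files were generated in ~/ocp4.
INSTALL_DIR="${INSTALL_DIR:-$HOME/ocp4}"

# Block until the installer reports that bootstrapping is complete.
wait_for_bootstrap() {
  openshift-install --dir "$INSTALL_DIR" wait-for bootstrap-complete --log-level=info
}

# Follow the bootkube service journal on the bootstrap node.
follow_bootkube_logs() {
  ssh core@bootstrap.ocp4.example.com journalctl -b -f -u bootkube.service
}
```

After the bootstrap completes, the bootstrap entries can be removed from the HAProxy backends and the bootstrap VM can be shut down.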

Ensure all domains are set to autostart:

for i in {1..3}; do
	sudo virsh autostart master0${i}.ocp4.example.com
	sudo virsh autostart worker0${i}.ocp4.example.com
done

Validate the settings

$ virsh list --autostart
 Id   Name                           State
----------------------------------------------
 53   master01.ocp4.example.com   running
 55   master02.ocp4.example.com   running
 57   master03.ocp4.example.com   running
 59   worker01.ocp4.example.com   running
 61   worker03.ocp4.example.com   running
 63   worker02.ocp4.example.com   running

First, log in to your cluster using the generated kubeconfig file.

export KUBECONFIG=/root/ocp4/auth/kubeconfig

Or copy kubeconfig file to ~/.kube directory to make it default:

mkdir ~/.kube
sudo cp /root/ocp4/auth/kubeconfig ~/.kube/config
sudo chown $USER ~/.kube/config

Enable bash completion for oc and kubectl commands

$ vim ~/.bashrc
source <(oc completion bash)
source <(kubectl completion bash)

# Source bashrc file
$ source  ~/.bashrc

Run the following command to confirm which user you are logged in to the cluster as:

$ oc whoami
system:admin

Printing API Endpoint URL:

$ oc whoami --show-server
https://api.ocp4.example.com:6443

Confirm the version of OpenShift deployed in your KVM powered infrastructure:

$ oc get clusterversions.config.openshift.io
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          25m     Working towards 4.13.4: 654 of 676 done (96% complete)

# Some minutes later
$ oc get clusterversions.config.openshift.io
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.4     True        False         43m     Cluster version is 4.13.4

Check available nodes and their current status:

$ oc get nodes
NAME                          STATUS   ROLES    AGE   VERSION
master01.ocp4.example.com     Ready    master   13m   vx.y.z
master02.ocp4.example.com     Ready    master   13m   vx.y.z
master03.ocp4.example.com     Ready    master   12m   vx.y.z

Your install may be waiting for worker nodes to get approved. Normally the machineconfig node approval operator takes care of this for you. However, sometimes this needs to be done manually. Check pending CSRs with the following command.

$ oc get csr
NAME                                       AGE     SIGNERNAME                                    REQUESTOR                                                                         CONDITION
csr-9cwd2                                  18m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-cqqvx                                  18m     kubernetes.io/kubelet-serving                 system:node:master01.ocp.example.com                                           Approved,Issued
csr-hdr7t                                  6m15s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Pending
csr-lcjsl                                  18m     kubernetes.io/kubelet-serving                 system:node:master02.ocp.example.com                                           Approved,Issued
csr-p9nj8                                  17m     kubernetes.io/kubelet-serving                 system:node:master03.ocp.example.com                                           Approved,Issued
csr-qkbrd                                  19m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-sxwlz                                  5m57s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Pending
csr-v244r                                  18m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
system:openshift:openshift-authenticator   17m     kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-authentication-operator:authentication-operator   Approved,Issued

You can approve all pending CSRs with the following command:

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
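
When nothing is pending, the pipeline above hands oc adm certificate approve an empty argument list, which makes it exit with an error. A slightly more defensive sketch, assuming GNU xargs, skips the approve call in that case. The function name approve_pending_csrs is hypothetical.

```shell
# Hypothetical helper: approve every CSR that has no status yet.
approve_pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
    | xargs --no-run-if-empty oc adm certificate approve
}
```

Each new node first raises a client CSR and, once that is approved, a serving CSR, so the function usually has to be run more than once until oc get csr shows nothing Pending.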

List the cluster nodes after CSR approval:

$ oc get nodes
NAME                          STATUS   ROLES    AGE    VERSION
master01.ocp4.example.com      Ready    master   22m    vx.y.z
master02.ocp4.example.com      Ready    master   22m    vx.y.z
master03.ocp4.example.com      Ready    master   21m    vx.y.z
worker01.ocp4.example.com      Ready    worker   2m2s   vx.y.z
worker02.ocp4.example.com      Ready    worker   2m4s   vx.y.z

To access the shell of a worker or master node, use either of the methods below:

# SSH
$ ssh core@master01.ocp4.example.com

# using oc debug
$ oc debug node/master01.ocp4.example.com
Starting pod/master01ocp4examplecom-debug ...
To use host binaries, run `chroot /host`

chroot /host
Pod IP: 192.168.100.11
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# chroot /host
sh-4.4# bash
[root@master01 /]#

Check the status of all cluster operators. Every operator should report True under AVAILABLE and False under DEGRADED.

$ oc get co
NAME                                       VERSION    AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.13.4     True        False         False      44m
baremetal                                  4.13.4     True        False         False      76m
cloud-credential                           4.13.4     True        False         False      81m
cluster-autoscaler                         4.13.4     True        False         False      76m
config-operator                            4.13.4     True        False         False      77m
console                                    4.13.4     True        False         False      50m
csi-snapshot-controller                    4.13.4     True        False         False      76m
dns                                        4.13.4     True        False         False      76m
etcd                                       4.13.4     True        False         False      70m
image-registry                             4.13.4     True        False         False      66m
ingress                                    4.13.4     True        False         False      53m
insights                                   4.13.4     True        False         False      71m
kube-apiserver                             4.13.4     True        False         False      68m
kube-controller-manager                    4.13.4     True        False         False      75m
kube-scheduler                             4.13.4     True        False         False      75m
kube-storage-version-migrator              4.13.4     True        False         False      77m
machine-api                                4.13.4     True        False         False      76m
machine-approver                           4.13.4     True        False         False      76m
machine-config                             4.13.4     True        False         False      75m
marketplace                                4.13.4     True        False         False      75m
monitoring                                 4.13.4     True        False         False      52m
network                                    4.13.4     True        False         False      78m
node-tuning                                4.13.4     True        False         False      76m
openshift-apiserver                        4.13.4     True        False         False      65m
openshift-controller-manager               4.13.4     True        False         False      76m
openshift-samples                          4.13.4     True        False         False      65m
operator-lifecycle-manager                 4.13.4     True        False         False      76m
operator-lifecycle-manager-catalog         4.13.4     True        False         False      76m
operator-lifecycle-manager-packageserver   4.13.4     True        False         False      65m
service-ca                                 4.13.4     True        False         False      77m
storage                                    4.13.4     True        False         False      77m
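
With around thirty operators, scanning the table by eye is error-prone. The sketch below filters the oc get co output down to the operators that are not yet healthy. It assumes the column layout shown above (AVAILABLE in the third column, DEGRADED in the fifth), so it can misreport rows early in the install when the VERSION column is still empty. The function name is hypothetical.

```shell
# Hypothetical filter: print names of operators that are not Available
# or that are Degraded. Reads "oc get co" output on stdin.
unhealthy_operators() {
  # NR > 1 skips the header row; $3 = AVAILABLE, $5 = DEGRADED.
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}
```

Usage is oc get co | unhealthy_operators. An empty result means every operator is Available and none is Degraded.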

Deployed applications are exposed through ingress routes on the base domain apps.ocp4.example.com:

$ oc get ingresscontroller default -n openshift-ingress-operator -o jsonpath='{.status.domain}'
apps.ocp4.example.com

To access the OpenShift web management console, get the login URL:

$ oc whoami --show-console
https://console-openshift-console.apps.ocp4.example.com

Default login credentials are stored in the file ocp4/auth/kubeadmin-password

$ cat ocp4/auth/kubeadmin-password

Username is kubeadmin and password is as stored in the file.

To change your OpenShift update channel to fast, go to Administration > Cluster Settings and click on Channel.
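
The channel can also be set from the command line by patching the ClusterVersion resource. This is a sketch, assuming channel names follow the type-minor form (for example fast-4.13); check the channels offered in the web console before applying it. The function name set_update_channel is illustrative.

```shell
# Hypothetical helper: set the update channel on the ClusterVersion object.
#   $1 = channel name, e.g. fast-4.13 (assumed naming scheme)
set_update_channel() {
  local channel="$1"
  oc patch clusterversion version --type merge \
    -p "{\"spec\":{\"channel\":\"${channel}\"}}"
}
```

For example, set_update_channel fast-4.13 switches the cluster to the fast channel for the 4.13 minor release.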

Upgrading to latest release of OpenShift

If you didn’t install the latest release, then just run the following to upgrade.

oc adm upgrade --to-latest

Configuring CSR automatic approval with systemd timer (Optional)

Copy the kubeconfig file into /etc so that the service does not depend on the permissions of the root user's home directory:

sudo cp /root/ocp4/auth/kubeconfig /etc/ocp_kubeconfig
sudo chmod 0644 /etc/ocp_kubeconfig

Create a bash script that can be used to approve pending CSRs:

$ sudo vim /usr/local/bin/approve_csr.sh
#!/bin/bash
export KUBECONFIG=/etc/ocp_kubeconfig
# Approve every CSR that has no status yet; do nothing when none are pending
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Make the script executable

sudo chmod a+x /usr/local/bin/approve_csr.sh

Create CSR approval service:

$ sudo vim /etc/systemd/system/ocp_csr_approval.service
# This service unit is for approving pending csr in OpenShift 4 cluster
# By Josphat Mutai
# Licensed under GPL V2
#

[Unit]
Description=Approve pending csr in OpenShift 4 cluster
Wants=ocp_csr_approval.timer

[Service]
Type=oneshot
ExecStart=/usr/local/bin/approve_csr.sh

[Install]
WantedBy=multi-user.target

Next we create the timer unit file inside /etc/systemd/system directory:

$ sudo vim /etc/systemd/system/ocp_csr_approval.timer
# This timer unit is for approving pending csr in OpenShift 4 cluster
# By Josphat Mutai
# Licensed under GPL V2
#

[Unit]
Description=Approve pending csr in OpenShift 4 cluster
Requires=ocp_csr_approval.service

[Timer]
Unit=ocp_csr_approval.service
OnCalendar=*-*-* *:*:00

[Install]
WantedBy=timers.target

The OnCalendar time specification *-*-* *:*:00 triggers the timer at second 00 of every minute, so the ocp_csr_approval.service unit runs once per minute.

Reload systemd units

sudo systemctl daemon-reload

Before we enable the timer, we can first test the service:

sudo systemctl start ocp_csr_approval.service

Enable timer unit if test was successful.

sudo systemctl enable --now ocp_csr_approval.timer

Check execution status

journalctl -S today -f -u ocp_csr_approval.service

Our Systemd timer should be listed with other systemd timers:

$ systemctl list-timers
NEXT                        LEFT          LAST                        PASSED     UNIT                         ACTIVATES
Wed 2021-08-18 12:52:00 EAT 19s left      Wed 2021-08-18 12:51:04 EAT 35s ago    ocp_csr_approval.timer       ocp_csr_approval.service
Wed 2021-08-18 14:07:29 EAT 1h 15min left Wed 2021-08-18 12:43:57 EAT 7min ago   dnf-makecache.timer          dnf-makecache.service
Thu 2021-08-19 00:00:00 EAT 11h left      Wed 2021-08-18 04:17:21 EAT 8h ago     unbound-anchor.timer         unbound-anchor.service
Thu 2021-08-19 01:16:29 EAT 12h left      Wed 2021-08-18 01:16:29 EAT 11h ago    systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Mon 2021-08-23 01:09:11 EAT 4 days left   Mon 2021-08-16 07:08:52 EAT 2 days ago fstrim.timer                 fstrim.service

5 timers listed.
Pass --all to see loaded but inactive timers, too.

Step 11: Create other OpenShift Users

We have a guide on adding new users to OpenShift Cluster using HTPasswd Provider:

Manage OpenShift / OKD Users with HTPasswd Identity Provider

If you completed all the setup steps, you should now have a running OpenShift cluster on a KVM virtualization environment. We will be sharing more guides on administration and deployment of various applications in the cluster. Our aim is also to cover CI/CD and GitOps within the OpenShift ecosystem.

More guides on OpenShift: