Kubernetes is designed to be self-healing at the container orchestration level: it detects failed pods and redeploys them to keep application workloads up and running. The same approach is needed at the cluster management level when running Kubernetes clusters on OpenStack with the Magnum service.
magnum-auto-healer is a self-healing cluster management service that will automatically recover a failed master or worker node within your Magnum cluster. It ensures that the running Kubernetes nodes are healthy at any point in time. It achieves this by:
- Monitoring the nodes’ status periodically
- Searching for unhealthy instances
- Triggering replacements of unhealthy nodes when needed
- Maximizing your cluster’s high availability and reliability
- Protecting your application from downtime when the node it’s running on fails.
Just like the Cluster Autoscaler, magnum-auto-healer is designed to be used together with cloud providers, and OpenStack Magnum is supported by default. In this article we look at how you can deploy magnum-auto-healer and demonstrate how it achieves automatic healing in your Magnum-powered Kubernetes clusters.
As a cluster administrator you can disable the auto-healing feature on the fly, which is very important for cluster operations like upgrades or scheduled maintenance. The magnum-auto-healer is highly customizable, to the extent that you can write your own health check plugin with customized health check parameters.
How To Deploy magnum-auto-healer
We are running a Magnum Kubernetes cluster deployed in an earlier guide.
Log in to your workstation with kubectl installed and configured to work with the Magnum Kubernetes cluster, then confirm the following:
- You have a multi-node cluster (3 masters and 3 workers) created in Magnum:
$ openstack coe cluster list
+--------------------------------------+----------------+---------+------------+--------------+-----------------+---------------+
| uuid                                 | name           | keypair | node_count | master_count | status          | health_status |
+--------------------------------------+----------------+---------+------------+--------------+-----------------+---------------+
| 1647ab6a-423b-433e-a6be-1633dc2c60e6 | k8s-cluster-01 | admin   | 4          | 3            | UPDATE_COMPLETE | HEALTHY       |
+--------------------------------------+----------------+---------+------------+--------------+-----------------+---------------+
You can list all the servers created by Magnum in OpenStack:
$ openstack server list --name k8s-cluster-01
- Your kubectl is configured and working:
$ kubectl get ns
NAME STATUS AGE
default Active 32d
kube-node-lease Active 32d
kube-public Active 32d
kube-system Active 32d
openebs Active 10d
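It is also a good idea to confirm the cluster health from both the Magnum and the Kubernetes side before enabling auto-healing. The commands below assume the cluster name k8s-cluster-01 used in this article:
# Magnum view of the cluster health
$ openstack coe cluster show k8s-cluster-01 -c health_status -c health_status_reason

# Kubernetes view of the nodes
$ kubectl get nodes -o wide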
Download the magnum-auto-healer deployment YAML manifest:
curl -O https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/magnum-auto-healer/magnum-auto-healer.yaml
Make a backup copy of the file, then edit it:
cp magnum-auto-healer.yaml magnum-auto-healer.yaml.bak
vim magnum-auto-healer.yaml
Set the Magnum cluster UUID as cluster-name under the magnum-auto-healer-config ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: magnum-auto-healer-config
  namespace: kube-system
data:
  config.yaml: |
    cluster-name: paste-magnum-cluster-uuid
You can get the cluster UUID with the command:
$ openstack coe cluster list
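If you only want the UUID value, you can also pull it directly from the cluster details (assuming the cluster name k8s-cluster-01 used in this article):
$ openstack coe cluster show k8s-cluster-01 -f value -c uuid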
Also set the Keystone authentication settings under the openstack section:
openstack:
  auth-url: input_keystone_auth_url
  user-id: input_auth_user_id
  project-id: input_user_project_id
  password: input_auth_password
  region: input_region
How to get some of the above information:
# Get user ID
$ openstack user list
# Get project ID
$ openstack project list
# Get region
$ openstack region list
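The auth-url is the Keystone endpoint you normally source from your OpenStack RC file. If unsure, you can read it from your environment or list it from the service catalog:
# Get Keystone auth URL
$ echo $OS_AUTH_URL
$ openstack endpoint list --service identity --interface public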
Example of filled Keystone information:
openstack:
  auth-url: http://192.168.20.5:5000/v3
  user-id: 1a5e612358584aa9a9c5b658f5a068a2
  project-id: d0515ffa23c24e54a3b987b491f17acb
  password: MyKeystoneUserPassword
  region: RegionOne
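The same config.yaml section also carries the health check settings mentioned earlier. Below is an illustrative sketch of the kind of parameters the upstream manifest ships with; the exact plugin names and defaults may differ in the version you download, so adjust against the file you edited:
monitor-interval: 30s
check-delay-after-add: 10m
healthcheck:
  master:
    - type: Endpoint
      params:
        unhealthy-duration: 3m
        protocol: HTTPS
        port: 6443
        endpoints: ["/healthz"]
  worker:
    - type: NodeCondition
      params:
        unhealthy-duration: 3m
        types: ["Ready"]
        ok-values: ["True"]
This is also where you would plug in a custom health check plugin if you write one.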
Once done with the file updates, deploy:
$ kubectl create --save-config=true -f magnum-auto-healer.yaml
serviceaccount/magnum-auto-healer created
clusterrolebinding.rbac.authorization.k8s.io/magnum-auto-healer created
configmap/magnum-auto-healer-config created
daemonset.apps/magnum-auto-healer created
Check DaemonSet Pod deployment status:
$ kubectl get pods -n kube-system -l k8s-app=magnum-auto-healer
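If the pods are running, you can also check the logs to confirm the health monitoring loop has started. This uses the same k8s-app=magnum-auto-healer label as above:
$ kubectl logs -n kube-system -l k8s-app=magnum-auto-healer --tail=20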
How To Test magnum-auto-healer
We can SSH into one of the running worker nodes and stop the kubelet service to simulate a worker node failure.
$ ssh core@k8s-cluster-01-fxubl3a7ah2r-node-3
$ sudo systemctl stop kubelet
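From a separate terminal on your workstation, you can watch the node drop to NotReady and follow the instances while the auto-healer reacts:
$ kubectl get nodes -w
$ openstack server list --name k8s-cluster-01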
The magnum-auto-healer service will detect the node failure and trigger the repair process. First, you will see the unhealthy node being shut down:
+--------------------------------------+------------------------------------+---------+-----------------------+------------------+-----------+
| ID                                   | Name                               | Status  | Networks              | Image            | Flavor    |
+--------------------------------------+------------------------------------+---------+-----------------------+------------------+-----------+
| 163624d0-f0ba-45c1-8679-749b0145fd7d | k8s-cluster-01-fxubl3a7ah2r-node-1 | SHUTOFF | private=172.10.10.76  | Fedora-CoreOS-34 | m2.magnum |
| 9c653750-18a5-45ce-9dc8-ea7ab50ec704 | k8s-cluster-01-fxubl3a7ah2r-node-3 | ACTIVE  | private=172.10.10.52  | Fedora-CoreOS-34 | m2.magnum |
| f6415e62-d522-44e7-ac3f-e198127c61b5 | k8s-cluster-01-fxubl3a7ah2r-node-0 | ACTIVE  | private=172.10.10.175 | Fedora-CoreOS-34 | m2.magnum |
+--------------------------------------+------------------------------------+---------+-----------------------+------------------+-----------+
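While the repair is in progress, the Magnum cluster status will typically move through UPDATE_IN_PROGRESS before returning to UPDATE_COMPLETE. You can keep an eye on it with:
$ openstack coe cluster show k8s-cluster-01 -c status -c health_status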
Then a new node will be created:
| 163624d0-f0ba-45c1-8679-749b0145fd7d | k8s-cluster-01-fxubl3a7ah2r-node-1 | BUILD   |                       | Fedora-CoreOS-34 | m2.magnum |
Give it a few minutes and all the nodes will be healthy again.
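To confirm the repair completed, check both the Kubernetes nodes and the Magnum cluster health status:
$ kubectl get nodes
$ openstack coe cluster list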
Note that the newly created node will have the same IP address and hostname as the old one. This marks the end of the article on how you can easily automate node failure replacements in your Magnum cluster.