The Horizontal Pod Autoscaler is a Kubernetes resource controller that automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization or, with custom metrics support, on other metrics. Horizontal Pod Autoscaling only applies to objects that can be scaled. It cannot be used with objects that cannot be scaled, such as DaemonSets.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment so that the observed average CPU utilization matches the target specified by the user.
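To make the controller's adjustment concrete: the Kubernetes documentation gives the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The shell function below is a minimal sketch of that formula, not the controller's actual code. The name desired_replicas and its argument order are made up for illustration, and the sketch ignores the tolerance band, pod readiness handling and stabilization windows that the real controller applies.

```shell
# Simplified sketch of the HPA replica calculation (illustrative only).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
desired_replicas() {
  current=$1; util=$2; target=$3; min=$4; max=$5

  # ceil(current * util / target) using integer arithmetic
  desired=$(( (current * util + target - 1) / target ))

  # Clamp the result to the configured replica range
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi

  echo "$desired"
}

desired_replicas 1 250 70 1 5   # prints 4
desired_replicas 5 83 70 1 5    # prints 5 (capped at MAX)
```

With one replica running at 250% of its CPU request and a 70% target, the sketch asks for 4 replicas. Once 5 replicas are running, any utilization above the target is capped at the configured maximum.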
Using Horizontal Pod Autoscaler on Kubernetes EKS Cluster
Before you can use the Horizontal Pod Autoscaler on an EKS cluster, you need to have Metrics Server installed. Follow the guide below for complete installation steps.
Install Kubernetes Metrics Server on Amazon EKS Cluster
Verify the metrics server is functional by using the command below.
$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: "2020-08-12T11:27:13Z"
  name: v1beta1.metrics.k8s.io
  resourceVersion: "130943"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  uid: 83c44e41-6346-4dff-8ce2-aff665199209
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2020-08-12T11:27:18Z"
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available
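The part of that output that matters is the Available condition under status. As a rough sketch, the same check can be scripted by extracting only that field with kubectl's jsonpath output. The helper name metrics_api_available is hypothetical and exists only for this example.

```shell
# Hypothetical helper: prints the status of the "Available" condition
# ("True" when the metrics API is being served).
metrics_api_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

if [ "$(metrics_api_available)" = "True" ]; then
  echo "metrics API is available"
else
  echo "metrics API is not available yet"
fi
```

If the check reports that the API is not available, the message field of the same condition in the full YAML output usually explains why.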
Deploy sample app for testing HPA
Let’s deploy a test application that we’ll use to demonstrate the working of Horizontal Pod Autoscaler.
Create the demo namespace:
$ kubectl create ns demo
namespace/demo created
$ kubectl get ns
NAME              STATUS   AGE
default           Active   2d20h
demo              Active   22s
kube-node-lease   Active   2d20h
kube-public       Active   2d20h
kube-system       Active   2d20h
Deploy a sample Apache web server application by running the following command in your terminal.
$ kubectl apply -f https://k8s.io/examples/application/php-apache.yaml -n demo
deployment.apps/php-apache created
service/php-apache created
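On older kubectl releases that still accept the --generator, --requests and --limits flags, you can also use the kubectl run command to create the application and a service. Note that with --generator=run-pod/v1 this creates a bare Pod rather than a Deployment, so the kubectl autoscale deployment command used later in this guide needs the manifest-based deployment above.
$ kubectl run php-apache \
    --generator=run-pod/v1 \
    --image=k8s.gcr.io/hpa-example \
    --requests=cpu=200m \
    --limits=cpu=500m \
    --expose \
    --port=80 \
    -n demo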
Check the status of your application.
$ kubectl get pods -n demo
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-wccnj   1/1     Running   0          40s
Create Kubernetes HPA resource
Once the application is running, we can create the HPA resource.
$ kubectl autoscale deployment php-apache --cpu-percent=70 --min=1 --max=5 -n demo
horizontalpodautoscaler.autoscaling/php-apache autoscaled
The command above creates an autoscaler that targets an average CPU utilization of 70% across the Pods and adds Pods when utilization rises above that target. The minimum number of Pods is set to 1 and the maximum to 5.
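If you prefer to keep the autoscaler in version control, the same settings can be written as a manifest. The sketch below assumes the autoscaling/v1 API, which supports CPU targets only, and wraps the manifest in a hypothetical helper function named apply_php_apache_hpa so it can be reused in scripts.

```shell
# Declarative equivalent of the kubectl autoscale command above
# (assumption: the autoscaling/v1 API is used).
apply_php_apache_hpa() {
  kubectl apply -n demo -f - <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
EOF
}
```

Calling apply_php_apache_hpa sends the manifest to the cluster. Use either this or kubectl autoscale, since both create an HPA named php-apache.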
Get details of autoscaler with the following command:
$ kubectl get hpa -n demo
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/70%    1         5         1          80s
$ kubectl describe hpa -n demo
Name:                                                 php-apache
Namespace:                                            demo
Labels:                                               <none>
Annotations:                                          <none>
CreationTimestamp:                                    Fri, 14 Aug 2020 21:38:12 +0300
Reference:                                            Deployment/php-apache
Metrics:                                              ( current / target )
  resource cpu on pods (as a percentage of request):  0% (1m) / 70%
Min replicas:                                         1
Max replicas:                                         5
Deployment pods:                                      1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>
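The Metrics line reports current usage as a percentage of the CPU that the pods request. As a rough sketch, assuming the 200m CPU request shown in the kubectl run variant earlier and assuming the percentage is computed with integer division, 1m of usage comes out as 0%. The function name cpu_utilization_percent and the 166m figure are hypothetical values chosen for illustration.

```shell
# Usage: cpu_utilization_percent USAGE_MILLICORES REQUEST_MILLICORES
# Prints usage as a whole-number percentage of the request.
cpu_utilization_percent() {
  echo $(( $1 * 100 / $2 ))
}

cpu_utilization_percent 1 200     # prints 0
cpu_utilization_percent 166 200   # prints 83
```

With several pods, the totals across all pods would be passed in instead of a single pod's values.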
Increasing Load
Let us now increase the load by repeatedly hitting the Service we deployed on Kubernetes. For this purpose we’ll use a busybox container to generate load.
$ kubectl run -it --rm load-generator --image=busybox --generator=run-pod/v1 -n demo -- /bin/sh
You’ll be logged into the container terminal. Run the following command to start a while loop that repeatedly hits the service endpoint at http://php-apache
/ # while true; do wget -q -O - http://php-apache; done
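If you would rather not keep an interactive terminal open, the same loop can run in a detached pod. This is only a sketch: the helper names start_load_generator and stop_load_generator are hypothetical, and it assumes a kubectl version where kubectl run with --restart=Never creates a single Pod.

```shell
# Start a background pod that requests the php-apache service in a loop.
start_load_generator() {
  kubectl run load-generator --image=busybox --restart=Never -n demo \
    -- /bin/sh -c 'while true; do wget -q -O - http://php-apache; done'
}

# Remove the load generator pod to stop the load.
stop_load_generator() {
  kubectl delete pod load-generator -n demo
}
```

Run start_load_generator to begin generating load and stop_load_generator when you want the autoscaler to scale back down.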
Open a separate terminal and see how the autoscaler creates more Pods in the deployment as the load increases.
$ kubectl get hpa -n demo
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 83%/70% 1 5 5 9m
As long as the actual CPU percentage is higher than the target percentage, the replica count increases, up to the maximum of 5. In this case it is 83%, so the number of REPLICAS keeps increasing until it reaches that maximum.
Stop the load by pressing CTRL+C in the load generator terminal.
Watch as the autoscaler scales down the deployment:
$ kubectl get hpa -n demo -w
It may take a few minutes before the number of running Pods drops back to 1. Clean up the setup once done.
$ kubectl delete -f https://k8s.io/examples/application/php-apache.yaml -n demo
deployment.apps "php-apache" deleted
service "php-apache" deleted
Delete the autoscaler.
$ kubectl delete hpa php-apache -n demo
horizontalpodautoscaler.autoscaling "php-apache" deleted
Lastly, delete the demo namespace.
$ kubectl delete ns demo
namespace "demo" deleted
You can use the same approach to autoscale your own applications with HPA and Metrics Server.