First of all, my apologies for choosing a catchy headline – as much as I have derided such posts, I genuinely had only three things to talk about, which I felt were connected and would add value for many users and their workloads. Autoscaling is often not as clearly understood as I think it should be, at least by consumers who may not want to understand all the nuts and bolts of how it works. Autoscaling applies to the application work units (Pods, in the case of Kubernetes) and to the nodes that form the underlying infrastructure. The application work units also need slightly different treatment based on the kind of workload. Workloads fall into two broad categories: synchronous workloads, such as an HTTP server responding to requests in real time, and asynchronous workloads, such as a process invoked when a message arrives in a message queue. Each kind needs a different approach to scale efficiently.
In this post, we will talk about each of these three autoscaling use cases and the projects that solve them. For a quick summary, we will cover three aspects:
- Autoscaling Event-driven workloads
- Autoscaling real-time workloads
- Autoscaling Nodes/Infrastructure
Autoscaling Event-Driven Workloads
For these workloads, events are generated from a source and typically land in some form of message queue. The message queue makes it possible to handle scale, manage back pressure gracefully, and divide the work among many workers. The implementation of the consumer and its scaling is, to an extent, coupled with the message queue being used. Also, many organizations may have more than one message queue in use.
The ideal scenario here is that when there are no messages in the queue, the consumer pods are scaled down to zero. When messages start arriving, the consumer pods are scaled up from zero to one, and beyond one as the message rate grows.
One project that solves this problem nicely is the Keda project. We recently integrated Keda in the Fission project – which is a FaaS platform that works on Kubernetes. We also open-sourced the generic Keda connectors which enable you to consume messages from message queues and do generic things such as sending that message to an HTTP endpoint.
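To make this concrete, a Keda ScaledObject along these lines can drive the zero-to-N behavior described above. This is a minimal sketch: the deployment name, queue name, and thresholds are hypothetical, and a RabbitMQ queue is assumed purely for illustration – Keda supports many other event sources.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: queue-consumer             # Deployment running the consumer pods (hypothetical)
  minReplicaCount: 0                 # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq                 # RabbitMQ assumed just for this example
      metadata:
        queueName: orders            # hypothetical queue
        queueLength: "20"            # target messages per replica
        hostFromEnv: RABBITMQ_URL    # connection string from the consumer's env
```

With a spec like this, Keda watches the queue and adjusts the Deployment's replica count for you, including the scale-from-zero and scale-to-zero transitions that the plain HPA cannot do.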
I also recently gave a talk where Keda was used from within a Kubernetes cluster to autoscale ACI (Azure Container Instances) containers via Virtual Kubelet. You can find the talk here and the slides here.
Autoscaling Realtime Workloads
Many times you want to scale your HTTP servers down to zero, or down based on demand, so that you can conserve infrastructure or free it up for other workloads that run in off-peak hours. While this is relatively easy for event-driven workloads, it is not so straightforward for realtime workloads: you don't want the end user to wait for pods to spin up, so scaling to zero is definitely not recommended in production. It can be used effectively in stage/dev environments, though, especially during hours when your users are not consuming the resources.
For scaling down on demand, the standard HPA along with a monitoring system will give you enough to get it working. You can see this in action in a talk given by Hrishikesh here, and the presentation can be found here.
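For reference, a stock HPA for an HTTP Deployment looks roughly like the sketch below. The deployment name and targets are made up for illustration; note that the standard HPA cannot set replicas below 1, which is exactly why scale-to-zero needs extra tooling.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: http-server-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-server                # hypothetical HTTP server Deployment
  minReplicas: 1                     # HPA floor is 1 – it never scales to zero
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # illustrative target; tune from monitoring data
```

Custom or external metrics (e.g. requests per second from your monitoring system) can be plugged into the `metrics` list instead of CPU, which usually tracks real demand more closely for HTTP workloads.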
If you must scale to zero, there are projects like Osiris – highly experimental, but still useful for non-prod workloads and for optimizing infrastructure usage.
Autoscaling Nodes/Infrastructure
Once you have optimized the workloads, the next natural step is to optimize the underlying infrastructure. All the benefits of optimizing the workloads can be derived only if you can actually shut down a few instances and save some $$$! Especially for non-prod workloads, this can be a great cost-saving exercise.
There are definitely more details here, and the second part of a talk given by one of the InfraCloud members here discusses them in depth. The Kubernetes Autoscaler project gives you the tools and levers to manage the infrastructure and scale it up and down based on need.
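As a rough sketch of the levers involved, the cluster-autoscaler is typically deployed with flags like the following excerpt from its Deployment spec. The cloud provider, node-group name, image version, and thresholds here are illustrative assumptions – adjust them to your own cluster.

```yaml
# excerpt from a cluster-autoscaler Deployment spec (illustrative values)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0   # pick the version matching your cluster
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws                     # assumed provider for this sketch
      - --nodes=1:10:my-node-group               # min:max:node-group name (hypothetical)
      - --scale-down-unneeded-time=10m           # how long a node must sit idle before removal
      - --scale-down-utilization-threshold=0.5   # below this utilization a node is a scale-down candidate
```

The scale-down flags are where the cost savings come from: tightening them reclaims idle nodes faster, but too aggressive a setting can cause pod churn, so test in non-prod first.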
While we are on the topic of cost-saving in your Kubernetes clusters, you might also want to check out when to switch from CloudWatch to Prometheus – details in this post!
There are some interesting projects in the Kubecost GitHub repo that enable things such as turning clusters down on a specific schedule. They can be a great complement to the other three projects and also give you visibility into cluster usage & costs.
Summary
- Autoscaling applies to both workloads & the underlying nodes/infrastructure. To save costs in pre-prod environments, you will need to look at both aspects.
- For autoscaling event-driven workloads
- The Keda project enables this and can be used effectively in production too.
- For autoscaling real-time workloads
- Projects such as Osiris enable it, but they are not recommended for production workloads!
- The real value will be realized only after scaling the underlying nodes/infrastructure, and the Kubernetes Autoscaler enables that. While you can use this in production too, the right design and testing are crucial to avoid costly mistakes and get it right.