Autoscaling pods to respond to the real demand of a service (TPS) is a real challenge
Streaming movies, web series and sports is the current market trend. Google 'world cup streaming' and you will find any number of streaming platforms. Given this trend and customers' expectation of flawless streaming, setting up the technology platform has become a challenge. There are multiple tool options to handle the workload, but filtering them and implementing one is another big challenge, and at the same time a fun-filled journey that covers planning, design, implementation and monitoring.

For one of our streaming platform customers, the expectation set for the technical team was to ensure that the platform autoscales / has elasticity and handles dynamic workloads. Based on the teams that play and their statistics, the expected number of viewers / transactions per second (TPS) / requests per second (RPS) is estimated. For instance, if there is a match between India and Pakistan, the number of viewers will be very high, and to add a cherry on top, if the match stays interesting beyond any winner prediction, we can see a further huge increase in TPS. So how do we handle such an increase in workload?
The Java applications run in a Kubernetes environment with 3 master nodes and 100+ worker nodes. Based on previous metrics, the min/max pod counts are decided, and the team ensures there is enough CPU and memory on the nodes to handle the max pod count.
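To make that headroom check concrete, here is a rough sketch using the official Kubernetes Python client that sums the allocatable CPU and memory across the nodes and compares it against the planned max pod count. The per-pod requests and the max pod figure are placeholders, not our actual numbers, and the quantity parsing is simplified.

```python
# Sketch: sum allocatable CPU/memory across nodes and compare against
# the planned max pod count. All numbers below are illustrative.
from kubernetes import client, config

PER_POD_CPU_M = 1000        # assumed request: 1 vCPU per pod (placeholder)
PER_POD_MEM_MI = 2048       # assumed request: 2 GiB per pod (placeholder)
PLANNED_MAX_PODS = 300      # placeholder max pod count

def cpu_to_millicores(q: str) -> int:
    # Kubernetes CPU quantities are whole cores ("32") or millicores ("31850m")
    return int(q[:-1]) if q.endswith("m") else int(q) * 1000

def mem_to_mi(q: str) -> int:
    # Simplified: handles the common "Ki"/"Mi"/"Gi" suffixes, else assumes bytes
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(int(q[:-2]) * factor)
    return int(q) // (1024 * 1024)

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

total_cpu_m = sum(cpu_to_millicores(n.status.allocatable["cpu"]) for n in nodes)
total_mem_mi = sum(mem_to_mi(n.status.allocatable["memory"]) for n in nodes)

print(f"Allocatable: {total_cpu_m} millicores, {total_mem_mi} Mi")
print("CPU headroom OK:", total_cpu_m >= PLANNED_MAX_PODS * PER_POD_CPU_M)
print("Memory headroom OK:", total_mem_mi >= PLANNED_MAX_PODS * PER_POD_MEM_MI)
```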
But the bigger question is: can containers scale at the pace of such a dynamic, huge increase in workload/TPS? Within seconds, the incoming transactions reach 5k, then 10k, then 15k. So will the pods spin up quickly enough, in seconds, to handle such big loads? Practically speaking, the answer is no. Pods take at least 2 to 3 minutes to spin up, reach Running status and start taking traffic. To avoid this delay and ensure smooth online streaming without interruption, we prescaled the Kubernetes pods.
Step 1
Take the last 6 months of metrics and analyse the peak load, how the min/max pod counts have been set, and the CPU and memory utilisation.
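If the cluster metrics land in Prometheus (an assumption here), a small script like the one below can pull the last six months of pod CPU usage and find the peak. The Prometheus URL, namespace and metric names are illustrative only.

```python
# Sketch: pull 6 months of pod CPU usage from Prometheus and find the peak.
import requests
from datetime import datetime, timedelta, timezone

PROM_URL = "http://prometheus.example.internal:9090"   # placeholder
QUERY = 'sum(rate(container_cpu_usage_seconds_total{namespace="streaming"}[5m]))'

end = datetime.now(timezone.utc)
start = end - timedelta(days=180)

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={
        "query": QUERY,
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": "1h",          # hourly resolution keeps the response small
    },
    timeout=30,
)
resp.raise_for_status()
series = resp.json()["data"]["result"]

peak = max((float(v) for s in series for _, v in s["values"]), default=0.0)
print(f"Peak cluster CPU usage over the last 6 months: {peak:.2f} cores")
```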
Step 2
Understand the approximate transactions per second / load expected for the event from the product owner.
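As a quick back-of-the-envelope sketch, that figure translates into a prescale target like this; the per-pod TPS and headroom factor are assumed values, and the real numbers would come from earlier load tests.

```python
# Sketch: rough pod count from the predicted TPS. Values are placeholders.
import math

predicted_tps = 15000        # figure agreed with the product owner (example)
tps_per_pod = 150            # sustainable TPS per pod from earlier tests (assumed)
headroom = 1.3               # 30% buffer for spikes and rolling restarts

required_pods = math.ceil(predicted_tps * headroom / tps_per_pod)
print(f"Prescale target: {required_pods} pods")   # 15000 * 1.3 / 150 = 130
```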
Step 3
Request a load test with the predicted TPS.
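Any load testing tool works here; as one hedged example, a minimal Locust script to simulate viewers could look like this, with purely hypothetical endpoints.

```python
# Sketch of a Locust load test (locustfile.py). Run with e.g.:
#   locust -f locustfile.py --host https://streaming.example.com --users 5000 --spawn-rate 200
from locust import HttpUser, task, between

class ViewerUser(HttpUser):
    # Each simulated viewer waits 1-3 seconds between requests
    wait_time = between(1, 3)

    @task(3)
    def fetch_manifest(self):
        # Most traffic during playback is manifest/segment polling (placeholder path)
        self.client.get("/live/match/manifest.m3u8", name="manifest")

    @task(1)
    def fetch_scoreboard(self):
        # Occasional API call for match statistics (placeholder path)
        self.client.get("/api/v1/score", name="score")
```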
Step 4
DevOps team performs the prescaling with the min/max pod counts, sets up anti-affinity rules as required to meet high availability, and checks the node resource quotas.
Reason for prescaling: Kubernetes autoscaling is a good option, but not for dynamic load that shoots up within a few seconds.
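For the anti-affinity part of this step, a minimal sketch with the Kubernetes Python client is shown below; the deployment name, namespace and labels are placeholders, and the same rule can of course be set directly in the deployment spec instead.

```python
# Sketch: add a preferred pod anti-affinity rule so replicas spread across nodes.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

anti_affinity_patch = {
    "spec": {
        "template": {
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "preferredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {
                                        "matchLabels": {"app": "playback-api"}  # placeholder label
                                    },
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }
                        ]
                    }
                }
            }
        }
    }
}

apps.patch_namespaced_deployment(
    name="playback-api",          # placeholder deployment
    namespace="streaming",        # placeholder namespace
    body=anti_affinity_patch,
)
print("Anti-affinity rule applied")
```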
Step 5
During the load test, monitor the metrics below (a sampling sketch follows the list):
CPU utilisation of pods, containers and nodes
Memory utilisation of pods, containers and nodes
Node resource utilisation metrics
Pod scaling
Node scaling
K8S control plane: ensure the control plane can handle the load of node autoscaling, persisting state to etcd and serving the pod templates needed to spin up pods on demand
Transactions per second
Requests per second
Network traffic
Disk I/O pressure
Heap memory
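Most of these metrics will come from whatever monitoring stack is already in place; as a small ad-hoc sketch, pod CPU and memory usage can also be sampled from the metrics.k8s.io API (assuming metrics-server is installed). The namespace below is a placeholder.

```python
# Sketch: sample pod CPU/memory during the load test via metrics.k8s.io.
from kubernetes import client, config

config.load_kube_config()
metrics_api = client.CustomObjectsApi()

pod_metrics = metrics_api.list_namespaced_custom_object(
    group="metrics.k8s.io",
    version="v1beta1",
    namespace="streaming",      # placeholder namespace
    plural="pods",
)

for item in pod_metrics["items"]:
    pod = item["metadata"]["name"]
    for c in item["containers"]:
        # Usage values are Kubernetes quantities, e.g. "250m" CPU, "512Mi" memory
        print(pod, c["name"], c["usage"]["cpu"], c["usage"]["memory"])
```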
Step 6
Based on the observations, decide the min/max pod settings and node autoscaling readiness, which includes changing the node instance type if needed (AWS example: r5.xlarge to r5.2xlarge).
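Once the numbers are decided, the HPA bounds can be updated; a hedged sketch using the autoscaling/v1 API is below, with the HPA name, namespace and replica counts as placeholders.

```python
# Sketch: adjust the HPA min/max replicas decided after the load test.
from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV1Api()

autoscaling.patch_namespaced_horizontal_pod_autoscaler(
    name="playback-api-hpa",        # placeholder HPA
    namespace="streaming",          # placeholder namespace
    body={"spec": {"minReplicas": 130, "maxReplicas": 200}},
)
print("HPA bounds updated")
```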
Step 7
Perform the prescaling before the match starts and scale down after the match.
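Here is a small sketch of the prescale / scale-down routine itself, using the Deployment scale subresource; the deployment name, namespace and replica counts are placeholders. Note that if an HPA manages the deployment, prescaling is better done by raising the HPA's minReplicas (as in the previous sketch), since the HPA would otherwise undo a direct replica change.

```python
# Sketch: prescale before the match and scale back down afterwards
# by patching the Deployment's scale subresource.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def set_replicas(deployment: str, namespace: str, replicas: int) -> None:
    apps.patch_namespaced_deployment_scale(
        name=deployment,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )
    print(f"{deployment}: replicas set to {replicas}")

# Before the match: prescale to the target worked out from the load test
set_replicas("playback-api", "streaming", 130)

# After the match: drop back to the normal baseline
set_replicas("playback-api", "streaming", 20)
```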
This time, we could not find a better option than prescaling the Kubernetes platform rather than letting the default autoscaling do its job. Prescaling worked perfectly, and we scaled down after every match. Let's see how technology evolves and how we adapt the right tools to perform autoscaling for this kind of peak load increase.
Stay tuned: how do AWS and Kubernetes costs impact us during autoscaling / prescaling?
Follow us for more details — cubensquare.com
