Streaming movies, web series and sports is the current market trend. Google 'world cup streaming' and you will find any number of streaming platforms. Given this trend and customers' expectation of flawless streaming, setting up the technology platform has become a challenge. There are multiple tool options to handle the workload, but filtering them and implementing one is another big challenge, and at the same time a fun-filled journey of planning, design, implementation and monitoring.

For one of our streaming platform customers, the expectation set for the technical team was to ensure that the platform autoscales, has elasticity and handles dynamic workloads. Based on which teams are playing and their statistics, the expected number of viewers, transactions per second (TPS) and requests per second (RPS) is estimated. For instance, if India play Pakistan, viewership will be very high, and as a cherry on top, if the match stays too close to call, we see a huge spike in TPS. So how do we handle such an increase in workload?

The Java applications run in a Kubernetes environment with 3 master nodes and 100+ worker nodes. Based on previous metrics, the min/max pod counts are decided, and the team ensures there is enough CPU and memory on the nodes to handle the max pods. But the bigger question is: can containers scale at the pace of such a dynamic, huge increase in workload/TPS? Within seconds, incoming transactions reach 5k, then 10k, then 15k. Will the pods spin up in seconds and handle such big loads? Practically speaking, the answer is no. Pods take at least 2 to 3 minutes to spin up, reach Running status and start accepting traffic. To avoid this delay and ensure smooth online streaming without interruption, we pre-scaled the Kubernetes pods.

Step 1: Take the last 6 months of metrics and analyse the peak load, how the min/max pods were set, and the CPU and memory utilisation.
Step 2: Get the approximate transactions per second / load expected for the event from the product owner.
Step 3: Request a load test with the predicted TPS.
Step 4: DevOps team performs the pre-scaling: set the min/max pods, set up anti-affinity rules as required for high availability, and check the node resource quotas (sample configuration sketches follow at the end of this section).
Reason for pre-scaling: Kubernetes autoscaling is a good option, but not for a dynamic load that shoots up within a few seconds.
Step 5: During the load test, monitor the metrics below:
- CPU utilisation of pods, containers and nodes
- Memory utilisation of pods, containers and nodes
- Node resource utilisation
- Pod scaling
- Node scaling
- Kubernetes control plane: ensure it can handle the load of node autoscaling, saving details to etcd and fetching pod templates from etcd to spin up pods as required
- Transactions per second
- Requests per second
- Network traffic
- Disk I/O pressure
- Heap memory
Step 6: Based on the observations, decide the min/max pod settings and node autoscaling readiness, which may include changing the node instance type (AWS example: r5.xlarge to r5.2xlarge).
Step 7: Perform the pre-scaling before the match starts and scale back down after the match.

This time we could not find a better option than pre-scaling the Kubernetes platform rather than letting the default autoscaling do its job. Pre-scaling worked perfectly, and we scale down after every match.
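To make Step 4 and Step 7 a little more concrete, here is a minimal HorizontalPodAutoscaler sketch of what a pre-scaled setup can look like. The deployment name stream-api, the namespace streaming and all the replica numbers are hypothetical placeholders, not the customer's actual values; the idea is simply that minReplicas is raised before the match so capacity is already running instead of waiting for reactive autoscaling.

```yaml
# Hypothetical HPA for a streaming API deployment (names and numbers are placeholders).
# On normal days minReplicas stays low; before a big match it is patched up so the
# required capacity is already in place before the traffic spike arrives.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: stream-api
  namespace: streaming
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stream-api
  minReplicas: 20          # raised (pre-scaled) a few hours before the match
  maxReplicas: 200         # ceiling sized from the load-test results
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```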
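Step 4 also calls for anti-affinity to keep the platform highly available. One common way to express that is a podAntiAffinity rule on the Deployment so replicas prefer to spread across different nodes, together with resource requests that can be checked against the node quotas. Again, the names, labels, image and numbers below are assumptions for illustration only.

```yaml
# Hypothetical Deployment snippet: spread replicas across nodes and reserve enough
# CPU/memory per pod so the scheduler can be checked against node resource quotas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stream-api
  namespace: streaming
spec:
  replicas: 20
  selector:
    matchLabels:
      app: stream-api
  template:
    metadata:
      labels:
        app: stream-api
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: stream-api
                topologyKey: kubernetes.io/hostname
      containers:
        - name: stream-api
          image: example.registry/stream-api:latest   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
```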
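And for the match-day routine in Steps 5 and 7, the commands below are one plausible way to raise the floor before kick-off, keep an eye on pod and node utilisation during the event (kubectl top assumes metrics-server is installed), and scale back down afterwards. Same hypothetical names and numbers as above.

```bash
# Before the match: raise the HPA floor so pods are already Running when traffic spikes.
kubectl -n streaming patch hpa stream-api \
  --patch '{"spec": {"minReplicas": 120, "maxReplicas": 200}}'

# During the load test / match: quick checks on pod and node resource usage
# (requires metrics-server) and on how the autoscaler is behaving.
kubectl -n streaming top pods
kubectl top nodes
kubectl -n streaming get hpa stream-api --watch

# After the match: drop back to the everyday floor and let the idle pods terminate.
kubectl -n streaming patch hpa stream-api \
  --patch '{"spec": {"minReplicas": 20}}'
```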
Let's see how technology evolves and how we adapt the right tools to perform autoscaling under such steep load increases. Stay tuned: how do AWS and Kubernetes costs impact us during autoscaling / pre-scaling? Follow us for more details at cubensquare.com