The customer, a telecom CSP (communications service provider), manages a fleet of servers running products for telecom companies.
The operations team wanted to streamline incident handling by applying reusable workflows that fix common problems automatically.
- A telecom CSP was looking for an efficient, automated way to handle a large number of incidents.
- They wanted to build an auto-remediation platform that would execute various health-check and remediation scripts as parallel workflows.
- The output of the health-check scripts, along with a probability score, would be fed into a remediation workflow, which would then start executing.
- InfraCloud recommended running the individual checks as functions on Fission, a serverless functions platform for Kubernetes.
- To compose the individual checks into a workflow, we used a combination of Kafka queues and Fission Workflows.
- This allowed the health checks to run in parallel, and the resulting concurrency gave faster response times. We modeled the remediation workflows the same way.
- Some remediation workflows required strict guarantees of sequential execution, which Fission Workflows provides natively.
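The case study does not show the platform's code. The Python below is a rough, in-process sketch of the pattern described above. It fans out health checks concurrently, standing in for parallel Fission functions fed from Kafka. It then picks a remediation workflow from the probability scores reported by failing checks and runs that workflow's steps strictly in order. The names `CheckResult`, `run_checks_in_parallel`, `pick_remediation`, and `run_sequentially` are hypothetical, as are the 0.5 threshold and the max-score aggregation rule. They are illustrative assumptions, not details from the project.

```python
"""Sketch: parallel health checks -> probability-scored remediation -> sequential steps.

Everything here is hypothetical and in-process; in the real platform each check
and step would be a Fission function wired together via Kafka and Fission Workflows.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CheckResult:
    """Hypothetical output of one health-check script."""
    check: str
    healthy: bool
    # Hypothetical: remediation workflow name -> estimated probability it fixes the issue.
    remediation_scores: Dict[str, float] = field(default_factory=dict)


HealthCheck = Callable[[], CheckResult]
Step = Callable[[], bool]  # a remediation step returns True on success


def run_checks_in_parallel(checks: List[HealthCheck], max_workers: int = 8) -> List[CheckResult]:
    """Run independent health checks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(check) for check in checks]
        return [f.result() for f in futures]


def pick_remediation(results: List[CheckResult], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring workflow suggested by failing checks, if above threshold."""
    best_scores: Dict[str, float] = {}
    for result in results:
        if result.healthy:
            continue  # healthy checks do not vote for a remediation
        for name, p in result.remediation_scores.items():
            best_scores[name] = max(best_scores.get(name, 0.0), p)
    if not best_scores:
        return None
    best = max(best_scores, key=best_scores.get)
    return best if best_scores[best] >= threshold else None


def run_sequentially(steps: List[Step]) -> int:
    """Run remediation steps strictly in order, stopping at the first failure.

    Returns how many steps completed successfully.
    """
    completed = 0
    for step in steps:
        if not step():
            break  # later steps never run after a failure
        completed += 1
    return completed


if __name__ == "__main__":
    checks: List[HealthCheck] = [
        lambda: CheckResult("disk", False, {"clean_tmp": 0.8, "restart_service": 0.3}),
        lambda: CheckResult("cpu", True),
        lambda: CheckResult("process", False, {"restart_service": 0.6}),
    ]
    # print() returns None, so each demo step reports success (True).
    workflows: Dict[str, List[Step]] = {
        "clean_tmp": [lambda: print("rotate logs") is None, lambda: print("clean /tmp") is None],
        "restart_service": [lambda: print("restart service") is None],
    }
    chosen = pick_remediation(run_checks_in_parallel(checks))
    print("chosen workflow:", chosen)
    if chosen is not None:
        print("steps completed:", run_sequentially(workflows[chosen]))
```

Taking the maximum score per workflow is only one possible aggregation. Summing the scores, or weighting them by how reliable each check has been, would be equally plausible, and the description above does not say which rule the platform used.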
The health-check and remediation platform ran in a private data center.