If you have stumbled upon The Startup Zeitgeist post on HackerNews, then as an operations person there is one thing you cannot miss: the emergence of Slack and the ChatOps ecosystem surrounding it. Slack has definitely disrupted how we communicate with teams, and with machines too. But beyond team communication, such tools have enabled a lot more:
- All operations are now known to the whole team instead of a single member. Handoffs between team members are transparent and seamless.
- Many more teams can query or modify infrastructure, subject to access control, while everything stays visible.
- A continuous, searchable log and audit trail of activities is maintained.
Again, what is ChatOps?
In simple words, ChatOps enables people to get work done through chat tools. It makes complex tasks self-service in a team environment, so that the feedback loop is faster and people are empowered.
Capabilities of a Modern ChatOps Platform
In this post we will explore the capabilities to keep in mind when building a ChatOps platform. A lot of what a “ChatOps platform” needs may be specific to the application or organization, but we want to draw a blueprint from which you can pick and choose. We have intentionally focused on capabilities rather than on a tool or platform perspective. At the same time, each section mentions some tools that largely accomplish the capability being discussed.
The chat platform is no doubt one of the main pieces of a ChatOps platform and the way to interface with it. The interface allows forming groups and discussions, file sharing, etc. But probably the key differentiator of modern chat platforms compared to traditional ones is integrations. These platforms integrate with a variety of services and chatbots to accomplish a lot more than traditional chat platforms. A chat platform that does not integrate with anything external is an absolute deal breaker for building a ChatOps platform. The most common alternatives fulfilling this capability are Slack, Mattermost, Campfire and of course the good old IRC.
While the chat platform and chatbots provide plenty of integrations out of the box, some integrations are an absolute must for a successful ChatOps platform.
Monitoring & log management systems
Most of a system’s health information comes from monitoring systems (such as Zabbix, Nagios, Sensu) and log management platforms (the likes of the ELK stack, Splunk). It is essential to integrate with these systems and pull out as much data as possible without leaving the chat console. It should be possible to monitor the health not only of systems but also of services and APIs. For example, an API may not be down, but the service might be degraded because 1/2 instances are down at times. If the API or service is public facing, updating its status through a service such as StatusPage is also a critical factor.
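To make the "degraded but not down" case concrete, here is a minimal Python sketch of how a bot might summarize a service from per-instance health. The `Instance` type, the status labels and the function names are assumptions made for illustration; they are not taken from Zabbix, Nagios, Sensu or StatusPage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One running copy of a service, as reported by a monitoring system."""
    name: str
    healthy: bool


def service_status(instances: List[Instance]) -> str:
    """Return 'operational', 'degraded', 'down', or 'unknown' (no instances)."""
    if not instances:
        return "unknown"
    healthy = sum(1 for inst in instances if inst.healthy)
    if healthy == len(instances):
        return "operational"
    if healthy == 0:
        return "down"
    # Some, but not all, instances are healthy.
    return "degraded"


def status_message(service: str, instances: List[Instance]) -> str:
    """Format a one-line summary suitable for posting in a chat channel."""
    healthy = sum(1 for inst in instances if inst.healthy)
    state = service_status(instances)
    return f"{service}: {state} ({healthy}/{len(instances)} instances healthy)"
```

A bot could post the string from `status_message` into the channel and, for a public service, use the same three-way state to decide whether the public status page needs an update.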
A lot of developers’ and support engineers’ time and focus is spent interacting with the systems that enable delivery of software. A ChatOps platform should enable interacting with such systems, for example getting the status of a certain deployment or of a given build. Some basic operations on the source code management system are also useful for faster communication. Being able to interact with ticketing systems is another important feature of a ChatOps platform.
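As a rough illustration of how chat messages can be routed to delivery systems, the Python sketch below parses a message and dispatches it to a handler. The `!ops` prefix, the command names and the handler bodies are hypothetical; a real bot would call the API of its CI, deployment or ticketing system where the placeholder strings are returned.

```python
import shlex
from typing import Callable, Dict, List


def build_status(args: List[str]) -> str:
    # Placeholder: a real handler would query the CI system here.
    job = args[0] if args else "(no job given)"
    return f"build status for {job}: lookup not implemented in this sketch"


def deploy_status(args: List[str]) -> str:
    # Placeholder: a real handler would query the deployment tool here.
    env = args[0] if args else "(no environment given)"
    return f"deploy status for {env}: lookup not implemented in this sketch"


# Map the first word after the prefix to the function that handles it.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "build": build_status,
    "deploy": deploy_status,
}


def handle_message(text: str) -> str:
    """Return the bot's reply, or an empty string if the bot was not addressed."""
    try:
        words = shlex.split(text)
    except ValueError:
        # shlex raises ValueError on unbalanced quotes.
        return "could not parse command"
    if not words or words[0] != "!ops":
        return ""
    if len(words) < 2 or words[1] not in HANDLERS:
        return "usage: !ops <" + "|".join(sorted(HANDLERS)) + "> [args...]"
    return HANDLERS[words[1]](words[2:])
```

Keeping the handlers in a plain dictionary means a new integration (for instance a ticketing command) is one function plus one entry, and the usage text updates itself.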
Configuration management platform
A ChatOps platform should enable ops teams to take action on infrastructure right from the chat interface. This has the advantage of enabling teams that lack access to machines, while also tracking changes closely as a team. What level of integration exists with the likes of Capistrano, Chef, Puppet, Ansible and SaltStack, and what additional work is required to fully enable the team, are key criteria in building the platform.
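One way to picture "action from chat, with access control and tracking" is the Python sketch below. The users, roles, action names and the in-memory `AUDIT_LOG` are all hypothetical, and the step that would actually invoke a configuration management tool is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical role assignments and per-action permissions.
ROLES: Dict[str, Set[str]] = {"alice": {"ops"}, "bob": {"dev"}}
PERMISSIONS: Dict[str, Set[str]] = {
    "status": {"dev", "ops"},
    "restart": {"ops"},
}

# In-memory stand-in for a persistent, searchable audit trail.
AUDIT_LOG: List[dict] = []


def is_allowed(user: str, action: str) -> bool:
    """True if the user holds at least one role permitted to run the action."""
    return bool(ROLES.get(user, set()) & PERMISSIONS.get(action, set()))


def run_action(user: str, action: str, target: str) -> str:
    """Check access, record the attempt, and return the reply for the channel."""
    allowed = is_allowed(user, action)
    # Every attempt is logged, including the ones that are refused.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "target": target,
        "allowed": allowed,
    })
    if not allowed:
        return f"@{user} is not allowed to run '{action}'"
    # A real bot would trigger the configuration management tool here.
    return f"@{user} requested '{action}' on {target}"
```

Because the reply goes to the shared channel and every attempt lands in the log, the team sees both who changed what and who tried to.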
Schedule and escalation management
Most systems today are built for a 24×7 world, and managing the on-call rotation can get fairly complex. Integrating with a system that handles escalation policies, on-call rotation and notifying the right person at the right time is critical for the uptime and success of such online systems. Some systems that come to mind are PagerDuty and OpsGenie.
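The two ideas above, a rotation and an escalation policy, can be sketched in a few lines of Python. This is a simplified model under assumed rules (fixed-length shifts, a fixed escalation step of 15 minutes); it is not how PagerDuty or OpsGenie implement scheduling, and all names are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def on_call(rotation: List[str], start: datetime,
            shift: timedelta, now: datetime) -> str:
    """Return who is on call at `now` for a fixed-length, repeating rotation."""
    # timedelta // timedelta yields the number of whole shifts elapsed.
    shifts_elapsed = (now - start) // shift
    return rotation[shifts_elapsed % len(rotation)]


def who_to_notify(policy: List[str], minutes_unacknowledged: int,
                  step_minutes: int = 15) -> str:
    """Walk up the escalation policy one level per unacknowledged step."""
    level = minutes_unacknowledged // step_minutes
    # Stay on the last level once the policy is exhausted.
    return policy[min(level, len(policy) - 1)]
```

For example, with a weekly rotation of three people, `on_call` cycles back to the first person in the fourth week, and `who_to_notify` moves from the first to the second entry of the policy once an alert has gone unacknowledged for 15 minutes.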
ChatOps enables a great deal of collaboration and openness between teams while getting things done at super fast speed. We are at a very nascent stage of ChatOps and the possibilities are endless; for example, check out the talk here. There is a dedicated ChatOps topic on reddit, and the discussions there are defining the future. We would love to hear your ChatOps story.