If you have stumbled upon The Startup Zeitgeist post on Hacker News,
then as an operations person there is one thing you cannot miss: the
emergence of Slack and the ChatOps ecosystem surrounding it. Slack has
changed how we communicate with teams, and with machines too. But beyond
team communication, such tools have enabled a lot more:
- All operations are now known to the whole team instead of to a single
member. The handoff between team members is transparent and seamless.
- Many more teams can query or modify infrastructure, subject to access
control, while the whole thing stays visible.
- A continuous, searchable log and audit trail of activities is
maintained.
Again, what is ChatOps?
In simple words, ChatOps enables people to get work done through chat
tools. It makes complex tasks self-service in a team environment, so
that the feedback loop is faster and people are empowered.
In this post we will explore the capabilities that one should keep in
mind when building a ChatOps platform. A lot of what a ChatOps platform
needs may be specific to the application or organization, but we want to
draw a blueprint from which you can pick and choose when building your
own. We have intentionally focused on capabilities rather than on
particular tools or platforms. At the same time, each section mentions
some tools that largely accomplish the capability being discussed.
The chat platform is no doubt one of the main pieces of a ChatOps
platform, and it is the way users interface with it. The interface
allows forming groups and discussions, sharing files and so on. But
probably the key differentiator of modern chat platforms compared to
traditional ones is integrations. These platforms integrate with a
variety of services and chatbots to accomplish a lot more than
traditional chat platforms. A chat platform that does not integrate with
anything external is a deal breaker for building a ChatOps platform. The
most common options fulfilling this capability are Slack, Mattermost,
Campfire and of course good old IRC.
ChatBot
Chatbot platforms form the core of a ChatOps platform and do all the
orchestration between multiple systems. These platforms provide a wide
variety of plugins to interact with other systems, plus the
extensibility to write your own plugins easily. This is one area where a
lot of customization will happen over time, and an out-of-the-box (OOTB)
installation probably won't be of much use. It is also important to
choose a bot written in a programming language your team is comfortable
with, so that customizations are easier to build. The popular options
are Lita, written in Ruby; Hubot, written at GitHub in JavaScript; and
the Python-based Err.
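To make the plugin idea concrete, here is a minimal sketch of how a bot might route chat messages to handlers. It is not the API of Lita, Hubot or Err; the `respond_to` decorator, the `dispatch` function and the two example commands are hypothetical names chosen for illustration.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical registry: each entry pairs a compiled pattern with a handler.
Handler = Callable[[re.Match], str]
_handlers: List[Tuple[Pattern[str], Handler]] = []


def respond_to(pattern: str) -> Callable[[Handler], Handler]:
    """Register a handler for messages matching `pattern` (case-insensitive)."""
    def decorator(func: Handler) -> Handler:
        _handlers.append((re.compile(pattern, re.IGNORECASE), func))
        return func
    return decorator


@respond_to(r"^ping$")
def ping(match: re.Match) -> str:
    return "pong"


@respond_to(r"^deploy status (?P<app>[\w-]+)$")
def deploy_status(match: re.Match) -> str:
    # A real plugin would query a deployment system here; this is a stub.
    return f"No deployment data wired up for {match.group('app')} yet"


def dispatch(message: str) -> Optional[str]:
    """Return the reply of the first matching handler, or None if none match."""
    text = message.strip()
    for pattern, handler in _handlers:
        match = pattern.match(text)
        if match:
            return handler(match)
    return None
```

Calling `dispatch("ping")` returns `"pong"`, and a message that matches no pattern returns `None`, which a bot could simply ignore. Each new capability then becomes one more decorated function, which is roughly why the implementation language of the bot matters so much to the team extending it.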
Integrations
While chat platforms and chatbots provide plenty of integrations OOTB,
some integrations are an absolute must for a successful ChatOps
platform.
Monitoring & log management systems
Most of a system's health information comes from monitoring systems
(such as Zabbix, Nagios and Sensu) and log management platforms (the
likes of the ELK stack and Splunk). It is essential to integrate with
these systems and pull out as much data as possible without leaving the
chat console. It should be possible to monitor the health not only of
systems but also of services and APIs. For example, an API may not be
down outright, yet the service may be degraded because one or two of its
instances are down at times. If the API or service is public facing,
updating its status through a service such as StatusPage is also a
critical factor.
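The distinction between "down" and "degraded" can be sketched in a few lines. This is only an illustration under simple assumptions: the `Instance` type, the three status labels and the `chat_summary` format are hypothetical, and a real bot would fill the instance list from its monitoring system rather than by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    healthy: bool


def service_status(instances: List[Instance]) -> str:
    """Classify a service as 'up', 'degraded' or 'down' from its instances."""
    if not instances:
        return "down"
    healthy = sum(1 for i in instances if i.healthy)
    if healthy == len(instances):
        return "up"
    if healthy == 0:
        return "down"
    return "degraded"


def chat_summary(service: str, instances: List[Instance]) -> str:
    """Format a one-line status message suitable for posting to a channel."""
    healthy = sum(1 for i in instances if i.healthy)
    status = service_status(instances)
    return f"{service}: {status} ({healthy}/{len(instances)} instances healthy)"
```

For a hypothetical `orders-api` with three instances of which one is unhealthy, `chat_summary` reports the service as degraded with 2/3 instances healthy, which is more useful in a channel than a plain up/down flag.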
CI/CD ecosystem
Developers and support engineers spend a lot of time and focus
interacting with the systems that enable delivery of software. A ChatOps
platform should enable interacting with such systems, for example
getting the status of a certain deployment or of a given build. Some
basic operations on the source code management system are also useful
for faster communication. Being able to interact with ticketing systems
is another important feature of a ChatOps platform.
A ChatOps platform should enable ops teams to take action on
infrastructure right from the chat interface. This has the advantage of
enabling teams that have no direct access to machines, while also
letting the team track changes closely. What level of integration exists
with the likes of Capistrano, Chef, Puppet, Ansible and SaltStack, and
what additional work will be required to fully enable the team, is a key
criterion in building the platform.
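A rough sketch of how access control and tracking can sit in front of an infrastructure action is shown below. Everything here is an assumption made for illustration: the `PERMISSIONS` and `USER_ROLES` tables, the in-memory `AUDIT_LOG` and the `run_command` function are hypothetical, and the action itself is passed in as a plain callable rather than a real Chef, Puppet or Ansible call.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set

# Hypothetical role mapping; a real setup would pull this from the chat
# platform or a directory service.
PERMISSIONS: Dict[str, Set[str]] = {
    "restart": {"ops"},
    "status": {"ops", "dev", "support"},
}
USER_ROLES: Dict[str, str] = {"alice": "ops", "bob": "dev"}

# In-memory audit trail; a real bot would persist this somewhere searchable.
AUDIT_LOG: List[dict] = []


def run_command(user: str, command: str, target: str,
                action: Callable[[str], str]) -> str:
    """Check the user's role, record the attempt, then run the action."""
    role = USER_ROLES.get(user)
    allowed = role in PERMISSIONS.get(command, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "command": command,
        "target": target,
        "allowed": allowed,
    })
    if not allowed:
        return f"@{user} is not allowed to run '{command}'"
    return action(target)
```

Note that denied attempts are logged as well as successful ones, so the channel and the audit trail both show who tried to do what, even when nothing was changed.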
Schedule and escalation management
Most systems today are built for a 24×7 world, and managing the on-call
rotation can get fairly complex. Integrating with a system that handles
escalation policies and on-call rotation, and that notifies the right
person at the right time, is critical for the uptime and success of such
online systems. Some systems that come to mind are PagerDuty and
OpsGenie.
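To show the kind of logic such a system takes off your hands, here is a minimal sketch of a fixed-length round-robin rotation with a simple escalation chain. It does not use the PagerDuty or OpsGenie APIs; the `on_call` and `escalation_chain` functions, the seven-day default shift and the example names are all hypothetical.

```python
from datetime import date
from typing import List


def on_call(rotation: List[str], start: date, today: date,
            shift_days: int = 7) -> str:
    """Return who is on call on `today` for a round-robin rotation."""
    if today < start:
        raise ValueError("today is before the rotation start date")
    shift_index = (today - start).days // shift_days
    return rotation[shift_index % len(rotation)]


def escalation_chain(rotation: List[str], primary: str,
                     levels: int = 2) -> List[str]:
    """Primary first, then the next people in the rotation as fallbacks."""
    i = rotation.index(primary)
    return [rotation[(i + k) % len(rotation)] for k in range(levels + 1)]
```

A bot command such as "who is on call" could call `on_call` with the team's rotation and today's date. Real schedules also have overrides, holidays and time zones, which is a large part of why integrating with a dedicated service tends to be the better choice.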

Summary
ChatOps enables a great deal of collaboration and openness between
teams while getting things done quickly. We are at a very nascent stage
of ChatOps and the possibilities are wide open; for example, check out
the talk here. There is a dedicated ChatOps topic on Reddit, and the
discussions there are defining the future. We would love to hear your
ChatOps story.