Monitoring Kubernetes with Prometheus Operator

By Imran Pochi | December 7, 2018 | Kubernetes, Monitoring

Unless you’ve been living under a rock, you’ve probably heard about Kubernetes, an open-source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications. Kubernetes solves many of the pain points of deploying and maintaining applications in a distributed system, but what makes it really powerful is its extensibility. Operators are one concept that makes use of this capability. In this post, we will take a dive into the Prometheus Operator: what it is, how to install it, and how to use it to monitor a demo application.

Introduction

Introduced in 2016 by CoreOS, an Operator is a method of packaging, deploying and managing a Kubernetes application. A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs and Kubernetes tooling such as kubectl.

In other words, an Operator is an application-specific custom controller that works directly with the Kubernetes API. As such, it can create, configure, and manage instances of complex stateful applications according to the rules written into the controller.

Before we go any further into the details, let’s dial it back a little to understand two important concepts that bring Operators to life.

A Custom Resource is an object that extends the Kubernetes API or allows you to introduce your own API into a Kubernetes cluster. A Custom Controller handles built-in Kubernetes objects, such as Deployments and Services, in new ways, or manages custom resources as if they were native Kubernetes components. Naturally, custom controllers are most effective when combined with custom resources, and the Operator pattern is one such combination.

A bit more about Operators

As explained earlier, an Operator builds upon the two central Kubernetes concepts, Resource and Controller, and adds application-specific knowledge that allows it to execute common application tasks. Operators ultimately help you focus on a desired configuration rather than the details of manual deployment and lifecycle management. For example, when scaling an etcd cluster manually, a user has to perform a number of steps:

  • create a DNS name for the new etcd member
  • launch the new etcd instance
  • use the etcd administrative tools (etcdctl member add) to tell the existing cluster about this new member

With the etcd Operator, a user can instead simply increase the etcd cluster size field by 1.
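To make the idea concrete, here is a minimal sketch of what such a custom resource might look like, assuming the etcd Operator's EtcdCluster resource is installed in the cluster. The resource name and the etcd version are illustrative assumptions, not values taken from this post.

```yaml
# Sketch only: assumes the etcd Operator and its EtcdCluster CRD are installed.
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster   # hypothetical name
spec:
  # Desired number of etcd members. Changing this from 3 to 4 asks the
  # operator to perform the member-add steps listed above on our behalf.
  size: 3
  version: "3.2.13"            # illustrative etcd version
```

Re-applying the manifest with a larger size is the whole scaling workflow from the user's point of view; the operator reconciles the cluster towards the declared size.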

Operator use cases

A Kubernetes Operator can:

  • Install and provide initial configuration and sizing for your deployment
  • Perform live reloading for any user-requested parameter modification (hot config reloading)
  • Automatically scale up or down according to performance metrics
  • Perform backups, integrity checks or any other maintenance task

Prometheus Operator

Getting a Kubernetes cluster up and running is fairly easy, but once you start deploying applications you are bound to run into issues, and because Kubernetes is a distributed system, troubleshooting them is not trivial. Prometheus, which recently graduated within the CNCF, has become the standard tool for monitoring and alerting in the Kubernetes and container world, providing detailed and actionable metrics. The Prometheus Operator, created by CoreOS, provides easy monitoring definitions for Kubernetes services, as well as deployment and management of Prometheus instances.

Once deployed, Prometheus Operator provides the following features:

  • Create/Destroy: Easily launch a Prometheus instance for your Kubernetes namespace, a specific application, or a team using the Operator
  • Simple Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource
  • Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus specific configuration language

How does it work?

The main idea is to decouple the deployment of Prometheus instances from the configuration of the entities they monitor. To implement this, the Prometheus Operator introduces additional resources and abstractions as Custom Resource Definitions (CRDs):

  • Prometheus: Defines a desired Prometheus deployment
  • ServiceMonitor: Specifies how a group of services is to be monitored, with the help of labels, much as a Service uses labels to select its endpoints
  • Alertmanager: Defines a desired Alertmanager deployment
  • PrometheusRule: Defines a desired Prometheus rule file containing alerting and recording rules, which can be loaded by a Prometheus instance
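As an illustration of one of these resources, the following is a minimal sketch of a PrometheusRule. The resource name, the `role` label, and the alert itself are assumptions made up for this example; a Prometheus instance would only load it if its rule selector matched these labels.

```yaml
# Sketch only: names, labels and the alert are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules          # hypothetical name
  labels:
    role: alert-rules          # hypothetical label a Prometheus ruleSelector could match
spec:
  groups:
  - name: example.rules
    rules:
    # Fire when any scrape target has been unreachable for five minutes.
    - alert: TargetDown
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "A scrape target has been down for more than 5 minutes"
```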

Image Credits: CoreOS

As the picture above shows, you can create a ServiceMonitor resource that causes Prometheus to scrape metrics from a defined set of pods. Basically, the Operator watches the Kubernetes API and, upon detecting changes to these resources, generates a new Prometheus configuration that covers the new service.

Enough theory, let’s deploy

To install the Prometheus Operator (prometheus-operator:v0.25.0), let’s apply the manifests one by one and explain the reasoning behind each of them. To get started, you’ll need access to a Kubernetes cluster. The following set of deployments also assumes that RBAC is enabled on your cluster.

The first step deploys the Prometheus Operator along with its ClusterRole, ClusterRoleBinding, and ServiceAccount. It grants the Prometheus Operator the following cluster-wide permissions:

  • read access to pods, nodes, and namespaces.
  • read/write access to services and their endpoints.
  • full access to Secrets, ConfigMaps, StatefulSets, and Prometheus-related resources (Alertmanagers, ServiceMonitors, etc.)
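
The permissions listed above could be expressed as a ClusterRole roughly like the sketch below. It is derived from the bullet points only and is not the upstream manifest, which may grant a somewhat different set of verbs; the role name is an assumption.

```yaml
# Sketch only: mirrors the bullet list above, not the upstream operator bundle.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator    # assumed name
rules:
# Read access to pods, nodes and namespaces.
- apiGroups: [""]
  resources: ["pods", "nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
# Read/write access to services and their endpoints.
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "create", "update"]
# Full access to Secrets and ConfigMaps.
- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["*"]
# Full access to StatefulSets, which back the Prometheus pods.
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["*"]
# Full access to the operator's own custom resources.
- apiGroups: ["monitoring.coreos.com"]
  resources: ["prometheuses", "alertmanagers", "servicemonitors", "prometheusrules"]
  verbs: ["*"]
```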

Check whether the operator was deployed successfully; its pod should be up and running.

Next up are the ClusterRole and ClusterRoleBinding for the Prometheus pods. With RBAC authorization enabled, we need RBAC rules for both Prometheus and the Prometheus Operator. A ClusterRole and a ClusterRoleBinding for the Prometheus Operator were created in the first step; the same must now be done for the Prometheus pods by creating their own ClusterRole and ClusterRoleBinding.

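A minimal sketch of such RBAC rules for the Prometheus pods is shown here. It assumes a ServiceAccount named `prometheus` in the `default` namespace; both names are assumptions for this example, and the exact rules should be checked against the operator version in use.

```yaml
# Sketch only: the ServiceAccount name and namespace are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
# Prometheus discovers scrape targets through these resources.
- apiGroups: [""]
  resources: ["nodes", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
# Allow scraping the API server's own /metrics endpoint.
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```
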
We have now deployed the Prometheus Operator, its CRDs, and the ClusterRoles and ClusterRoleBindings for both the operator and the Prometheus pods. A few things are left to deploy: the Prometheus pods, the ServiceMonitors that provide the scraping configuration, and a Service to expose Prometheus on a specific node port. First, though, we will deploy our demo application and get to the remaining parts as we go along.

Monitor Demo Application

We will be installing the microservices-demo application from Weaveworks. More about the application can be found here. Apply the manifests to deploy the application.

Check that the application is deployed and that we can access the front-end service on port 30001.

All services in our demo application expose metrics via the /metrics endpoint. This information needs to be provided to the Prometheus pods so that they can scrape our metrics, and ServiceMonitors are just the thing we need for this task. Let’s create a ServiceMonitor for our front-end service, which exposes its metrics at containerPort 8079 via the /metrics path.
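
A minimal sketch of a ServiceMonitor for the front-end service follows. The port, path, and sock-shop namespace come from the text; the label keys (`k8s-app` on the ServiceMonitor, `name` on the Service), the scrape interval, and placing the ServiceMonitor in the `default` namespace are assumptions for this example.

```yaml
# Sketch only: label keys, interval and the ServiceMonitor's namespace are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: front-end
  namespace: default
  labels:
    k8s-app: front-end         # label a Prometheus resource can select on
spec:
  # Look for matching Services in the sock-shop namespace.
  namespaceSelector:
    matchNames:
    - sock-shop
  # Select Services carrying this label (assumed to be set on the front-end Service).
  selector:
    matchLabels:
      name: front-end
  endpoints:
  # Scrape the container port directly, since it is referenced by number here.
  - targetPort: 8079
    path: /metrics
    interval: 15s
```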

A ServiceMonitor uses the same label selection mechanism as Pods and Services. In the above YAML, we specify a selector that matches the label front-end against all the services in the sock-shop namespace, with container port 8079 and path /metrics as the target. Similarly, we have created ServiceMonitors for the remaining services in our application. Let’s apply them all.

Next, verify that the ServiceMonitors were created correctly.

Now, on to the Prometheus pod manifest, in which we specify which ServiceMonitors Prometheus should pick up for scraping.
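
A minimal sketch of such a Prometheus resource is shown here. The list of label values comes from this post; the label key `k8s-app`, the ServiceAccount name `prometheus`, the replica count, and the memory request are assumptions for this example.

```yaml
# Sketch only: label key, ServiceAccount name and sizing are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: default
spec:
  replicas: 1
  # ServiceAccount bound to the ClusterRole that lets Prometheus discover targets.
  serviceAccountName: prometheus
  # Pick up every ServiceMonitor whose k8s-app label is one of these values.
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: In
      values:
      - front-end
      - carts
      - catalogue
      - orders
      - payment
      - shipping
      - user
  resources:
    requests:
      memory: 400Mi
```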

This tells the Prometheus pods to scrape from the ServiceMonitors with the labels [front-end, carts, catalogue, orders, payment, shipping, user]. Let’s verify that our Prometheus pods are up and running in the default namespace.

The only thing left to do is to expose our Prometheus pods via a NodePort Service; the YAML below does just that.
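
A minimal sketch of such a Service is shown here, using node port 30900 as in this post. It assumes the Prometheus resource is named `prometheus`, so that the operator labels its pods `prometheus: prometheus`, and that the pods expose a port named `web`; both should be checked against the pods the operator actually created.

```yaml
# Sketch only: the selector label and port name are assumptions about the generated pods.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: default
spec:
  type: NodePort
  selector:
    prometheus: prometheus     # pods created for the Prometheus resource named "prometheus"
  ports:
  - name: web
    protocol: TCP
    port: 9090                 # Prometheus web UI and API port
    targetPort: web
    nodePort: 30900            # port opened on every cluster node
```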

To confirm that our Prometheus pods are scraping the services, let’s point our browser to port 30900 on any cluster node. Navigate to the Status dropdown and select Targets. Our services should be listed there.

Operator vs Helm

There is a small overlap with Helm, as both perform application setup. Helm is a package manager and a good way to organize applications (Deployment, Service, and other templates packaged into one tar). Helm is to Kubernetes roughly what apt is to Ubuntu. Operators, on the other hand, enable you to manage the operation of applications within Kubernetes using custom resources and controllers. A Helm chart, by comparison, is a way to template Kubernetes objects to make them configurable for different environments. As such, the two are complementary: the lifecycle of an Operator can be managed using kubectl or Helm, and you can use Helm charts to deploy an Operator as well.

Conclusions

Application monitoring is an important part of our application stack. With the help of the Prometheus Operator, we were able to implement monitoring for our application with less effort and in a more declarative and reproducible manner, which makes it easier to scale, modify, or migrate to a different set of hosts.

Author Imran Pochi

Join the discussion One Comment

  • Harshal Shah says:

    We actually created two internal Helm charts: one purely to deploy the operator, and another to deploy Prometheus and Alertmanager as resources along with Grafana. This made Prometheus management a lot simpler for us.
