Cortex for HA Monitoring with Prometheus

By Shaunak Deshmukh | September 18, 2019 | Monitoring
Prometheus has become the default tool for monitoring applications and systems in a cloud-native world. For any real-world use case, Prometheus should be highly available, which brings its own set of challenges. Once you run Prometheus in HA mode, issues appear such as data duplication and achieving a single pane of glass over the duplicated data. Cortex was born to solve these problems. Cortex is a CNCF sandbox project that seeks to provide long-term storage and a global view of metrics scraped using Prometheus. Let’s first look at the main goals of Cortex and then at some of the problems it solves for Prometheus.
  • Horizontal Scalability – Cortex can be split into multiple microservices, each of which can be horizontally scaled independently. For example, if a lot of Prometheus instances are sending data to Cortex, you can scale up the Ingester microservice. If there are a lot of queries to Cortex, you can scale up the Querier or Query Frontend microservices.
  • High Availability – Cortex can replicate data between instances. This prevents data loss and avoids gaps in metric data even with machine failures and/or pod evictions.
  • Multi-Tenancy – Multiple untrusted parties can share the same cluster. Cortex provides isolation of data throughout the entire lifecycle, from ingestion to querying. This is really useful for a large organization storing data for multiple units or applications or for someone running a SaaS service.
  • Long term storage – Cortex stores data in chunks and generates an index for them. Cortex can be configured to store this in either self-hosted or cloud provider backed databases or object storage.

Need for Cortex

Prometheus High availability & Data De-duplication

Prometheus doesn’t have high availability by default. One way to make Prometheus highly available is to run multiple instances scraping the same jobs. These instances will have slight variances in data due to minor differences in scrape timing. Furthermore, if one of the instances goes down for a couple of hours, there will be gaps in the data whenever queries are forwarded to that instance. If we use a tool like Grafana to visualize the metrics as graphs, we may get different sample values or gaps in the graphs.
Cortex can be configured to read data from multiple HA Prometheus instances. It accepts metrics from one main instance and discards the metrics from the other instances. If that replica goes down, Cortex switches seamlessly to another replica and marks it as the main one. To do this, Cortex looks at two labels: a common one that identifies the cluster (or group of Prometheus instances) and another that identifies the replica.

Global metrics view

Prometheus instances can be configured to perform a remote write to Cortex. Using this, metrics are aggregated from multiple clusters into one cluster running Cortex. This provides us with a central location where we can observe the metrics of our entire infrastructure. Cortex provides a Prometheus/PromQL-compatible endpoint. Services like Grafana can add this endpoint as a Prometheus data source and perform queries in the same way as against a normal Prometheus instance.
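Because the endpoint is Prometheus-compatible, a client can query it with the standard Prometheus HTTP API. The sketch below only builds a range-query URL and does not send it. The base URL (host and path prefix) is an assumption for illustration and depends on how Cortex is exposed in your setup.

```python
from urllib.parse import urlencode

# Hypothetical base URL of the Cortex query endpoint. The host and the
# path prefix are assumptions and depend on your deployment.
CORTEX_QUERY_BASE = "http://cortex.example.internal/api/prom"


def build_range_query_url(base, promql, start, end, step):
    """Build a URL for the standard Prometheus range-query API."""
    params = urlencode({
        "query": promql,
        "start": start,  # Unix timestamp in seconds
        "end": end,      # Unix timestamp in seconds
        "step": step,    # query resolution, e.g. "15s"
    })
    return f"{base}/api/v1/query_range?{params}"


# One query over metrics from every cluster that remote-writes to Cortex.
url = build_range_query_url(
    CORTEX_QUERY_BASE, "sum by (cluster) (up)", 1568800000, 1568803600, "15s"
)
print(url)
```

Grafana issues the same kind of request when Cortex is added as a Prometheus data source.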

Long Term Storage

Prometheus’s local storage is not meant to be durable long-term storage. Metrics sent to Cortex are stored in the configured storage service. With cloud-provided storage, this frees you from the hassle of running your own database and lets you take advantage of the provider’s SLAs. As of 25th June, the following options are available:
Index (and Chunk) Storage:
  • Amazon DynamoDB
  • Google Bigtable
  • Apache Cassandra

Optionally, Cortex also supports Object Stores for storing chunks:

  • GCS
  • S3

Multi Tenancy

Multi-tenancy is provided by setting an HTTP header (X-Scope-OrgID) when writing metrics to Cortex. The same header value has to be provided when querying. In the example below, this header is set using an nginx reverse proxy.
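The effect of the header can be illustrated with a toy model. This is a sketch only, not Cortex code: add_tenant_header plays the role of the reverse proxy, and TenantStore is a hypothetical in-memory stand-in for the storage path that keeps each tenant's data separate.

```python
TENANT_HEADER = "X-Scope-OrgID"


def add_tenant_header(headers, tenant_id):
    """Return a copy of the headers with the tenant ID set, as a proxy would."""
    if not tenant_id:
        raise ValueError("tenant_id must be non-empty")
    out = dict(headers)
    out[TENANT_HEADER] = tenant_id
    return out


class TenantStore:
    """Toy store: data written under one tenant ID is invisible to others."""

    def __init__(self):
        self._series = {}  # tenant ID -> {metric name -> latest value}

    def write(self, headers, metric, value):
        tenant = headers[TENANT_HEADER]
        self._series.setdefault(tenant, {})[metric] = value

    def query(self, headers, metric):
        tenant = headers[TENANT_HEADER]
        return self._series.get(tenant, {}).get(metric)


store = TenantStore()
team_a = add_tenant_header({}, "team-a")
team_b = add_tenant_header({}, "team-b")

store.write(team_a, "up", 1)
print(store.query(team_a, "up"))  # same tenant ID: the value is visible
print(store.query(team_b, "up"))  # different tenant ID: nothing is returned
```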

Architecture diagram

[Figure: Cortex Architecture (Source)]

  • Nginx/gateway – A reverse proxy that sits in front of Cortex and forwards any received requests to the respective service.
  • Distributor – Handles incoming metrics, splits them into batches and passes them to ingesters. If the replication factor is set to >1, the data is sent to multiple ingesters.
  • Ingester – This service is responsible for writing data to the configured storage backends. Ingesters are semi-stateful as they retain the last 12 hours of samples. These samples are batched and compressed before being written to the chunk store.
  • Query Frontend – An optional component that queues query requests and retries them in case of failures. Results are also cached to improve performance.
  • Querier – The Querier handles the evaluation of PromQL queries. Samples are fetched from the chunk storage and/or, for recent data, from the ingesters.
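The Querier's job of combining older data from the chunk store with recent data still held by the ingesters can be pictured as a merge of two sample lists. This is a simplified sketch, not the actual Cortex implementation. Samples are assumed to be (timestamp, value) pairs, and letting the ingester value win on overlapping timestamps is an arbitrary choice made here.

```python
def merge_samples(chunk_store_samples, ingester_samples):
    """Merge (timestamp, value) pairs from both sources, one per timestamp."""
    merged = {}
    for ts, value in chunk_store_samples:
        merged[ts] = value
    for ts, value in ingester_samples:
        # On overlapping timestamps the ingester value replaces the stored
        # one (an assumption of this sketch).
        merged[ts] = value
    return sorted(merged.items())


older = [(100, 1.0), (115, 2.0)]    # already flushed to the chunk store
recent = [(115, 2.0), (130, 3.0)]   # still held in ingester memory
print(merge_samples(older, recent))
```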

Other components:

  • Ruler – Evaluates alert rules and forwards the resulting alerts to Alertmanager
  • Alertmanager – Handles alerts produced by the Ruler
  • ConfigsAPI – Stores configs for ruler and alertmanager in Postgres
  • Table Manager – Responsible for creating tables in the selected chunk/index storage backends
  • Consul – Stores the consistent hash ring. Ingesters register themselves in the ring, and the distributor uses the hashed values to select ingesters when sending metrics.
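The way a distributor might pick ingesters from a consistent hash ring can be sketched as follows. This is a toy model under stated assumptions, not Cortex's ring code. The hash function, the number of tokens per ingester and the ingester names are all illustrative.

```python
import bisect
import hashlib


def _token(key):
    """Map a string to a 32-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent hash ring with several tokens per ingester."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._tokens = sorted(
            (_token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._positions = [pos for pos, _ in self._tokens]
        self._count = len(set(ingesters))

    def select(self, tenant, series, replication_factor=1):
        """Walk clockwise from the series' token, collecting distinct ingesters."""
        wanted = min(replication_factor, self._count)
        start = bisect.bisect_right(self._positions, _token(f"{tenant}/{series}"))
        chosen = []
        for offset in range(len(self._tokens)):
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in chosen:
                chosen.append(name)
                if len(chosen) == wanted:
                    break
        return chosen


ring = Ring(["ingester-1", "ingester-2", "ingester-3"])
print(ring.select("team-a", 'up{job="node"}', replication_factor=2))
```

Because the ring is derived only from the ingester names, every distributor that reads the same ring selects the same ingesters for a given series.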

Differences and similarities with other options

Thanos

Thanos and Cortex have very similar goals: to aggregate metrics, keep them in long-term storage and provide a single pane of glass for all your metrics. Thus, it’s no surprise that both projects reuse a lot of Prometheus code. However, there are a few key differences that may help you decide to use one of them over the other:
Thanos | Cortex
Recent data is stored in Prometheus | Recent data is stored in Ingesters (a Cortex component)
Uses a sidecar that writes data to object storage | Uses Prometheus remote write to send data to Cortex
Single-tenant | Multi-tenant
Manual sharding | Automatic sharding of data based on labels
Prom TSDB blocks | Indexed chunks
Downsampling: historical data can be summarised (e.g. 5 sec samples averaged out to a 1 min sample) | Query sharding (a 30-day query is converted into 30 one-day queries)
Requires ingress to the cluster in which Prometheus is running, for querying | Only egress is required from the cluster running Prometheus
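The downsampling and query sharding rows in the comparison can be made concrete with two small functions. This is an illustrative sketch, not code from either project. Samples are assumed to be (unix_timestamp, value) pairs, and the fixed one-day split is taken from the example in the comparison.

```python
from datetime import datetime, timedelta


def split_by_day(start, end):
    """Split [start, end) into consecutive sub-ranges of at most one day."""
    ranges = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=1), end)
        ranges.append((cursor, nxt))
        cursor = nxt
    return ranges


def downsample_mean(samples, window_seconds):
    """Average samples into fixed windows, keyed by each window's start time."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % window_seconds, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Query sharding: a 30-day range becomes 30 one-day sub-queries.
shards = split_by_day(datetime(2019, 9, 1), datetime(2019, 10, 1))
print(len(shards))

# Downsampling: twelve 5-second samples collapse into one 1-minute sample.
five_second_samples = [(i * 5, float(i)) for i in range(12)]
print(downsample_mean(five_second_samples, 60))
```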

Walkthrough

Let’s try out Cortex with a real example by installing it, configuring it with multiple Prometheus instances and using Grafana to visualize the data. Clone the GitHub repo containing the necessary files.

Prometheus and Cortex with Docker Compose

For a simple setup we’ll start the following services using docker-compose:

  • Three Prometheus containers
  • Consul
  • Three Cortex containers
  • Grafana

To keep things simple, we will use an all-in-one Cortex config. This runs Cortex as a monolithic application. We’ll run three instances of it to check replication. There are three Prometheus config files. Each has an external label that is added to all metrics when performing a remote write. The Prometheus1 and Prometheus3 containers write to Cortex1, while the Prometheus2 container writes to Cortex2. We will run our queries on Cortex3. The following code snippet shows the differences in the configs of the three Prometheus instances.
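As a rough sketch of those differences, the relevant parts of the three configs can be generated like this. The global.external_labels and remote_write keys are standard Prometheus configuration. The label name "source", the hostnames, the port 9009 and the /api/prom/push path are assumptions made for illustration and may not match the files in the repo.

```python
import json

# Which Cortex instance each Prometheus writes to, as described in the text.
# The lowercase hostnames are assumed service names.
TARGETS = {
    "prometheus1": "cortex1",
    "prometheus2": "cortex2",
    "prometheus3": "cortex1",
}


def prometheus_config(name, cortex_host):
    """Return the parts of a Prometheus config that differ per instance."""
    return {
        "global": {
            # Added to every series sent by this instance (hypothetical name).
            "external_labels": {"source": name},
        },
        "remote_write": [
            # Assumed port and push path of the all-in-one Cortex container.
            {"url": f"http://{cortex_host}:9009/api/prom/push"},
        ],
    }


configs = {name: prometheus_config(name, host) for name, host in TARGETS.items()}
print(json.dumps(configs["prometheus1"], indent=2))
```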

Use docker-compose to start all the services.

Go to http://localhost:3000/explore, log in using the credentials admin/admin and select cortex3 as the datasource. Run a sample query (e.g. up). Cortex will return metrics from all 3 Prometheus containers.

[Figure: Cortex query example]

Once you’re done, run the following command to clean up.

Cortex & dependencies with Prometheus in Kubernetes

We will deploy Cortex as a group of microservices as shown in the architecture. We’ll also need Helm to deploy dependencies (Cassandra) and other services (Grafana, Prometheus). If you do not have Helm installed already, you can follow the quickstart guide in the Helm docs. Let’s get started. First, we’ll deploy the Cortex components.

We’ll also install Prometheus using Helm. The command below will do that for us along with 2 additional things:

  • Creates an external label called “cluster” and sets the value for that label to “one”. This will help to separate different Prometheus instances.
  • Sets up remote write to Cortex.

Now we will deploy Grafana. The command below uses Grafana’s provisioning feature to add Cortex as a datasource on pod startup.

Run the following command to get the Grafana admin password. After that, open localhost:3000/explore and log in using username “admin” and the printed password.

Here are some sample queries you can run to test if Prometheus is sending metrics to Cortex:

Cleanup

The ingester pods will get stuck in the terminating phase. This is by design, as ingesters are semi-stateful and will attempt to flush their data to other ingesters before terminating. This makes upgrades and rollbacks possible while avoiding loss of data. In this case, we are just trying it out and don’t care about the data, so we can force delete the pods with the following command:

HA Prometheus setup with deduplication

This setup is quite similar to the previous one. The main difference is that we are deploying two Prometheus instances. Both have a cluster label set to the same value, “one”, and a replica label that is unique to each instance. The distributor component has been configured to de-duplicate metrics based on those two labels. If Cortex does not receive metrics from the current replica for more than a set amount of time (30s), it will fail over to the next replica that sends a sample.
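The election and failover behaviour described here can be modelled in a few lines. This is a simplified sketch of the idea, not the distributor's actual code. The class and method names are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class HATracker:
    """Toy model: accept samples from one elected replica per cluster."""

    def __init__(self, failover_timeout=30.0):
        self.failover_timeout = failover_timeout  # seconds, as in the text
        self._elected = {}  # cluster -> (replica, time of last accepted sample)

    def accept(self, cluster, replica, now):
        """Return True if a sample from this replica should be stored."""
        current = self._elected.get(cluster)
        if current is None:
            # First replica seen for this cluster becomes the elected one.
            self._elected[cluster] = (replica, now)
            return True
        elected, last_seen = current
        if replica == elected:
            self._elected[cluster] = (replica, now)
            return True
        if now - last_seen > self.failover_timeout:
            # The elected replica has been silent too long: fail over.
            self._elected[cluster] = (replica, now)
            return True
        return False  # duplicate sample from a non-elected replica


tracker = HATracker()
print(tracker.accept("one", "replica-a", now=0))   # elected
print(tracker.accept("one", "replica-b", now=5))   # dropped as a duplicate
print(tracker.accept("one", "replica-b", now=40))  # replica-a silent for 40s
```

After the failover, samples from the old replica are the ones dropped until the newly elected replica itself goes silent.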

Run the following command to get the Grafana admin password. After that, open localhost:3000/explore and log in using username “admin” and the printed password.

Here are some sample queries you can run to test if Prometheus is sending metrics to Cortex:

To test HA, we can try deleting one of the Prometheus pods.

There should be no gaps in the Grafana graphs, as Cortex will fail over to the other instance.

Cleanup

Use Cassandra as index and chunk storage

In the previous two examples, we were using dynamodb-local as the index storage and fakes3 as the chunk storage. In this example, we’ll be using Apache Cassandra for both index and chunk storage.

The following commands will enable the Helm incubator repo, install Cassandra using Helm, and wait for the 3 replicas to be ready.

Once Cassandra is ready, proceed with installing all the other services.

Run the following command to get the Grafana admin password. After that, open localhost:3000/explore and log in using username “admin” and the printed password.

Here are some sample queries you can run to test if Prometheus is sending metrics to Cortex:

Cleanup

Conclusion

Cortex is a powerful tool for running multiple Prometheus servers seamlessly while simplifying operations and usage for end-users. Thanos provides very similar features, but the two projects implement them in drastically different ways. The use cases that an organization is trying to implement will drive the choice of Cortex vs. Thanos. Either way, Cortex makes it easy to run a highly scalable and resilient Prometheus-based monitoring system.

Author Shaunak Deshmukh