Kubernetes Disaster Recovery using Kasten K10 platform


This blog post explores the K10 data management platform by Kasten. The platform performs backup and restore of Kubernetes applications and their volumes. This is especially useful in disaster recovery and application migration scenarios, as we will explore in the post below.

The objective of this post is to share the learnings from exploring backup and DR with the K10 platform. This exercise was performed for a customer for whom I had to propose a DR strategy. One of the requirements was a cross-region active-passive setup of Kubernetes clusters, with data volumes replicating from the source to the destination at a regular cadence. This is the first part of a series evaluating platforms that provide backup and DR.

Some of the key evaluation criteria were:

  • Provision of near-continuous replication of Kubernetes application resources along with their data.
  • Provision for volume snapshots to be application and database consistent.
  • Cross-region migration (source and target clusters being in different regions).
  • The platform should be extensible.
  • Ability to monitor the backup and restore jobs, preferably via Prometheus.
  • Finally, the solution or tooling of choice should be API centric and cloud native itself.

About K10

The K10 platform is completely Kubernetes native and gets installed on the Kubernetes cluster which needs to be backed up. The platform installs as a set of CRDs and controllers via a Helm chart. It lives in its own namespace on the cluster and autodiscovers the applications installed within the cluster.

  • Applications
    An Application represents the collection of resources such as configmaps, secrets, services, and application workloads within a namespace, which are autodiscovered by K10. A policy can be associated with the application, or with a subset of resources within it, for backup/restore.

  • Policies
    Policies define the actions that need to be taken when the policy is executed. The action could be to perform a snapshot or to import a previously exported backup. The frequency at which the action should be performed, the retention period, and the selection of resources to back up are all defined in the policy. The policy also refers to a Location profile.

  • Profiles
    A Location profile points to an object store which is used to store the backup metadata as well as the volume data.

  • K10 Disaster recovery
    K10 Disaster recovery performs a backup of the K10 namespace and its metadata, along with the restore points for all the applications, to enable recovery of the K10 platform itself in case of a disaster.

  • K10 Dashboard
    The K10 dashboard is a minimalistic UI application which allows administration and creation of all the above-mentioned objects. It allows administrators to perform ad hoc backups and change retention periods of these backups, among many other things, without needing access to the kubectl CLI.

Note: K10 allows creating snapshots, backups, or both while defining a Policy.

A snapshot in a Kubernetes cluster typically refers to the VolumeSnapshot and VolumeSnapshotContent resources, which represent a snapshot of a storage volume. Since the snapshot objects live in the Kubernetes cluster itself, they fall in the same failure plane: a failure of the cluster also causes loss of the snapshots.
Hence, while creating a policy, K10 allows these snapshots to be copied to an object store (the Location profile); these copies are referred to as backups.
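For reference, a VolumeSnapshot object of the kind discussed above looks roughly like the following sketch. The PVC name is a hypothetical placeholder (check the actual claim name with `kubectl get pvc -n mongo`), and the snapshot class matches the one created later in this post:

```shell
cat << eof | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: mongo-data-snapshot        # illustrative name
  namespace: mongo
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass
  source:
    # Hypothetical PVC name; verify against your MongoDB installation
    persistentVolumeClaimName: mongo-mongodb
eof
```

Such an object only references data held by the storage backend inside the cluster's failure plane, which is exactly why K10 additionally exports it to an object store.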

Let us look at some of the use-cases that were used to evaluate K10.

  • Application migration
  • Application consistent backups
  • Alerting support
  • Cross region restore

Application Migration and Cross region restore

Set up source and target clusters in separate regions

  • Setup 2 clusters (source and target).
    For this setup, we would require the 2 clusters to have at least 3 nodes.

Install the snapshot CRDs if they are not installed by default on the source and target clusters.

# Install the CSI snapshotter
SNAPSHOTTER_VERSION=v2.1.1

# Apply VolumeSnapshot CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

Install the CSI snapshot controller (this controller manages the lifecycle of VolumeSnapshot objects)

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml

Follow the next two steps only if the setup is on a kind cluster

  • Deploy a CSI hostpath driver (kind only)
git clone https://github.com/kubernetes-csi/csi-driver-host-path.git
cd csi-driver-host-path
./deploy/kubernetes-1.18/deploy.sh

Configure the CSI hostpath StorageClass to be the default StorageClass if you use a kind cluster.
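As a sketch, this can be done by patching the StorageClass annotations. The class name `csi-hostpath-sc` is an assumption based on what the deploy script typically creates; verify the actual name with `kubectl get storageclass`:

```shell
# Mark the CSI hostpath StorageClass as the default (kind only).
# csi-hostpath-sc is an assumed name; confirm with `kubectl get storageclass`.
kubectl patch storageclass csi-hostpath-sc \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

# Remove the default annotation from kind's built-in StorageClass
kubectl patch storageclass standard \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
```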

  • Set up a VolumeSnapshotClass for the hostpath CSI driver (kind only)
cat << eof | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
driver: hostpath.csi.k8s.io
kind: VolumeSnapshotClass
metadata:
  annotations:
    k10.kasten.io/is-snapshot-class: "true"
    snapshot.storage.kubernetes.io/is-default-class: "true"
  name: csi-hostpath-snapclass
deletionPolicy: Delete
eof
  • Install MongoDB in a namespace on the source cluster
helm repo add bitnami https://charts.bitnami.com/bitnami
kubectl create ns mongo
helm install mongo --namespace mongo bitnami/mongodb

Create some data in the Mongo database

export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace mongo mongo-mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)

kubectl run --namespace mongo mongo-mongodb-client --rm --tty -i --restart='Never' --image docker.io/bitnami/mongodb:4.2.8-debian-10-r7 --command -- mongo admin --host mongo-mongodb --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD

db.demodb.insert({ name: "Jane Doe", twitter_id: "jane_doe", })
db.demodb.insert({ name: "John Doe", twitter_id: "john_doe", })

Install K10

Follow the instructions to set up K10 from the documentation located at https://docs.kasten.io/install/install.html
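At the time of writing, the Helm-based install looked roughly like the following; treat this as a sketch and check the documentation above for the current chart name and flags:

```shell
# Add the Kasten Helm repository and install K10 into its own namespace
helm repo add kasten https://charts.kasten.io/
helm repo update
kubectl create namespace kasten-io
helm install k10 kasten/k10 --namespace kasten-io
```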

Set up an object store

The object store is required by the Location profile. This object store holds the backups from the cluster.

A GCS bucket is used in this demo.

Create a bucket named k10-backups in GCS.
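Assuming the gcloud SDK is configured for your project, the bucket can be created with gsutil; the region matches the one used later in the Location profile:

```shell
# Create the bucket that will hold the K10 backups
gsutil mb -l asia-south1 gs://k10-backups/
```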

Configure K10

  • Create a Secret and a Location profile on the source and target clusters.

A Location profile specifies the object store where the backup metadata is stored. In this example we use the GCS bucket created above in the Location profile.

The service account should have permissions to access the storage bucket.
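As a sketch, creating such a service account and downloading its key with gcloud could look like this. The service account name is illustrative and `<gcp-projectid>` is a placeholder; in production, narrow the role to the bucket rather than granting project-wide storage admin:

```shell
# Illustrative service account name; <gcp-projectid> is a placeholder
gcloud iam service-accounts create k10-backup-sa --project <gcp-projectid>

# Grant storage permissions (broad for the demo; scope down in production)
gcloud projects add-iam-policy-binding <gcp-projectid> \
  --member "serviceAccount:k10-backup-sa@<gcp-projectid>.iam.gserviceaccount.com" \
  --role "roles/storage.admin"

# Download the key file referenced when creating the secret
gcloud iam service-accounts keys create ./sa-key.json \
  --iam-account k10-backup-sa@<gcp-projectid>.iam.gserviceaccount.com
```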

Fill in the `project-id` and `service-account.json` as relevant.

kubectl create secret generic k10-gcs-secret \
      --namespace kasten-io \
      --from-literal=project-id=<gcp-projectid> \
      --from-file=service-account.json=<./sa-key.json>

cat << eof | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Profile
metadata:
  name: backup-default
  namespace: kasten-io
spec:
  locationSpec:
    credential:
      secret:
        apiVersion: v1
        kind: Secret
        name: k10-gcs-secret
        namespace: kasten-io
      secretType: GcpServiceAccountKey
    objectStore:
      name: k10-backups
      objectStoreType: GCS
      pathType: Directory
      region: asia-south1
    type: ObjectStore
  type: Location
eof
  • Create a backup policy on the source cluster

This backup policy backs up the application and its volume snapshots and exports them to the GCS bucket at 15 minutes past every hour.
The policy assumes that the MongoDB installation is in a namespace called mongo.

cat <<EOF | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: backup-policy
  namespace: kasten-io
spec:
  comment: "Backup policy"
  frequency: '@hourly'
  subFrequency:
    minutes: 
    - 15
  paused: false
  retention:
    hourly: 24
    daily: 7
    weekly: 4
    monthly: 12
    yearly: 7
  actions:
  - action: backup
  - action: export
    exportParameters:
      frequency: '@hourly'
      profile:
        name: backup-default
        namespace: kasten-io
        exportData:
          enabled: true
  selector:
    matchExpressions:
      - key: k10.kasten.io/appNamespace
        operator: In
        values:
          - mongo
EOF
  • Create a restore policy on the target cluster

Get the receiveString from the source cluster by inspecting the backup policy. The receiveString is a token present on the source cluster's policy and is used when migrating applications across clusters; it is supplied while creating the restore policy on the target cluster.

EXPORT_STRING=$(kubectl get policy backup-policy -n kasten-io -o jsonpath='{.spec.actions[1].exportParameters.receiveString}')

Use the EXPORT_STRING from the backup policy on the source while applying the restore policy on the target cluster.

cat << eof | kubectl apply -f -
kind: Policy
apiVersion: config.kio.kasten.io/v1alpha1
metadata:
  name: restore-default
  namespace: kasten-io
spec:
  comment: Restore policy for MongoDB
  frequency: "@hourly"
  subFrequency:
    minutes:
      - 15
    hours:
      - 0
    weekdays:
      - 0
    days:
      - 1
    months:
      - 1
  selector: {}
  actions:
    - action: import
      importParameters:
        receiveString: bIzAPpoanmEU0S57nj9FqtUkRn8TD0ig+TKu4Gg0KaE7acJYzjyDRti0e+nbkKsGfFjezKuNGWik9SNd1g6xyGY0+AYfLO+bYbay8eWagcya56Fh53Acb1moutKRBLJlQJEXpAoOkeJJsuvRtK3Sw0mnMsHTxQIVp1/rBhjUisGH1YpeUQKJyTvL7jWIOEtupek9PYKhqyEf3goMMHjXqtjxHy24Sj/i7jNKpoSNJI5YspGNdGaVY4YStbqUj8WyNYGfKqqXc8E/WHTxu1ty7TLd8+OEeuvNyQ2NDyU7CXVyQnjzonU3ti75lNbQ8Mp5y1w5apYKk3MNn8Uk2GTcGfNH9/lSZAgX4sZmld/rqr7nhFycy/fVuH141DDp3mw874DseI9W3+2kHjI/l9y0tWcW+rdfoWIOEFMSNvofYQ
        profile:
          name: backup-default
          namespace: kasten-io
    - action: restore
      restoreParameters: {}
eof

This is all that is needed to set up migration of an application from a source cluster to a target cluster on the K10 platform.
The migration from the source to the target cluster works across regions as well.

Application Consistent Snapshots

Application consistent snapshots require quiescing the application. In the context of cloud native applications such as Cassandra, quiescing means flushing the in-memory data to disk before taking a snapshot. K10 delegates this to Kanister in order to take application consistent snapshots.
Kanister is an open source framework (built by Kasten) which allows defining blueprints for taking application consistent snapshots. The Kanister deployment consists of a Kanister controller and three CRDs, namely:

  • Blueprints
  • ActionSets
  • Profiles

Covering the features of Kanister would require a separate blog post entirely, but we will quickly look at how Kanister blueprints work.

About Kanister

A Kanister Blueprint is a collection of actions which are performed on the target application. Each action defines phases which are executed in order. A phase defines a Kanister function to execute.

ActionSets are required when we use Kanister alone without the K10 platform. Creating an ActionSet instructs the Kanister controller to execute the Blueprint on the target application.
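A minimal ActionSet invoking the blueprint installed below might look like the following sketch. It assumes a standalone Kanister controller running in a `kanister` namespace, and the action name must match one defined in the Blueprint (here, backupPrehook):

```shell
cat << eof | kubectl apply -f -
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: mongo-lock-actionset     # illustrative name
  namespace: kanister            # assumes standalone Kanister lives here
spec:
  actions:
  - name: backupPrehook          # must match an action in the Blueprint
    blueprint: mongodb-blueprint
    object:
      kind: Deployment
      name: mongo-mongodb
      namespace: mongo
    profile:
      name: kanister-profile
      namespace: kasten-io
eof
```

When K10 is in use, as in this post, K10 invokes the blueprint hooks itself and no ActionSet needs to be created manually.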

Install Blueprint on the Source cluster

In order to take application consistent snapshots, apply the following Kanister profile and blueprint for the mongo installation on the source cluster.

cat <<eof | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Profile
metadata:
  name: kanister-profile
  namespace: kasten-io
spec:
  type: Kanister
  kanister:
    credential:
      secretType: GcpServiceAccountKey
      secret:
        apiVersion: v1
        kind: Secret
        name: k10-gcs-secret
        namespace: kasten-io
    location:
      type: ObjectStore
      objectStore:
        name: k10-backups
        objectStoreType: GCS
        region: asia-south1
eof
cat << eof | kubectl apply -f -
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint
actions:
  backupPrehook:
    type: Deployment
    phases:
    - func: KubeExec
      name: lockMongo
      objects:
        mongodb:
          kind: Secret
          name: mongo-mongodb
          namespace: '{{ .Deployment.Namespace }}'
      args:
        namespace: "{{ .Deployment.Namespace }}"
        pod: "{{ index .Deployment.Pods 0 }}"
        container: mongo-mongodb
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          export MONGODB_ROOT_PASSWORD='{{ index .Phases.lockMongo.Secrets.mongodb.Data "mongodb-root-password" | toString }}'
          mongo --authenticationDatabase admin -u root -p "${MONGODB_ROOT_PASSWORD}" --eval="db.fsyncLock()"
  backupPosthook:
    type: Deployment
    phases:
    - func: KubeExec
      name: unlockMongo
      objects:
        mongodb:
          kind: Secret
          name: mongo-mongodb
          namespace: '{{ .Deployment.Namespace }}'
      args:
        namespace: "{{ .Deployment.Namespace }}"
        pod: "{{ index .Deployment.Pods 0 }}"
        container: mongo-mongodb
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          export MONGODB_ROOT_PASSWORD='{{ index .Phases.unlockMongo.Secrets.mongodb.Data "mongodb-root-password" | toString }}'
          mongo --authenticationDatabase admin -u root -p "${MONGODB_ROOT_PASSWORD}" --eval="db.fsyncUnlock()"
eof

The interesting part about Kanister is that it allows creating custom blueprints for the application. It provides Kanister functions to execute as part of each phase; most of them should be sufficient, and it is quite easy to create a new function if they do not satisfy specific requirements.
If the K10 platform is not in use, ActionSets become much more relevant: an ActionSet resource defines the actions to run as part of the Blueprint.

Alerting support

The K10 platform does deploy a Prometheus server during installation, but in most cases an elaborate monitoring setup already exists on the Kubernetes cluster by the time K10 is deployed.
The platform exposes metrics for each service it deploys.
For example, the catalog service exports a gauge metric named catalog_actions_count.

This metric can be used to set up an alert, either in Alertmanager or in Grafana, to detect whether a backup or export has been failing for a certain period of time.

For example:

catalog_actions_count{liveness="live",status="complete",type="backup"} 2
catalog_actions_count{liveness="live",status="complete",type="export"} 2
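As a sketch, a Prometheus alerting rule on this metric could be written as follows. Note that the status="failed" label value is an assumption (the samples above only show status="complete"), so verify the actual label values your K10 install emits before relying on this rule:

```shell
# Write a Prometheus rule file; label values are assumptions to verify
cat << eof > k10-alert-rules.yaml
groups:
- name: k10-backup-alerts
  rules:
  - alert: K10BackupActionFailed
    # status="failed" is an assumed label value; check your K10 metrics
    expr: catalog_actions_count{liveness="live",status="failed",type="backup"} > 0
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: "K10 backup actions have been failing for 30 minutes"
eof
```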

Other features

K10 DR

The K10 platform comes with its own disaster recovery. This requires creating a Location profile and enabling K10 DR, which creates a default policy that backs up the K10 platform itself.
In case of a disaster, the restore requires a passphrase which is generated when K10 DR is enabled.

K10 Dashboard

All the functionality of creating Policies and Location profiles, specifying backup schedules, and configuring the retention periods of backups and imports is fully configurable via the UI dashboard.

Conclusion

The K10 platform provides a very flexible approach to backup and recovery of Kubernetes applications as well as their data. The platform is extensible via Kanister Blueprints, and the dashboard is a nice addition for managing all of these features through the UI without requiring kubectl access.

References

Images are used from the K10 platform documentation
https://docs.kasten.io/
https://kanister.io/

Author: Hrishikesh Deodhar

Helping customers move to all things cloud native. Kubernetes | DevOps | Containers | Java
