Protecting Kubernetes application data using Kanister

Whenever we talk about data-intensive applications, we also have to consider how the data of those applications will be managed. With stateful applications in particular, people are still somewhat wary of running them on top of Kubernetes because of data management concerns.

Data management involves much more than plain backup and restore: for example, application-consistent backups of a database, database migration, disaster recovery, and point-in-time recovery. In this post, we are going to look at an open-source project named Kanister that helps us back up and restore applications deployed on Kubernetes.

What is Kanister?

Kanister is an open-source project by Kasten that enables us to manage (back up and restore) application data on Kubernetes. You can deploy Kanister into your Kubernetes cluster as a Helm application and start exploring. Kanister relies heavily on Kubernetes custom resources to define and perform its actions.

The main custom resources (referred to as CRs) that get installed with Kanister are Profile, Blueprint and ActionSet. Consider a typical scenario: we want to back up our application using a utility provided by the application or database itself (for example, esdump for Elasticsearch or mysqldump for MySQL), and then upload that backup to external persistent storage so that it can be restored later. In the next section, we will see how Kanister CRs help us achieve this.

Kanister’s Custom Resources

In the scenario described above, the object storage details are kept in the Profile CR: it holds everything Kanister needs to upload the backup to the object storage. Similarly, the steps to back up and restore the database are kept in the Blueprint CR: it describes how to take a backup of the running database and how to restore that backup afterwards. With the backup and restore steps defined in the Blueprint, we run those steps (actions, in Kanister terms) by creating ActionSet CRs.

With that in mind, the architecture diagram below shows that when we create an ActionSet, the referenced Blueprint is discovered, the logic in the requested action of that Blueprint is executed, and finally the ActionSet status is updated depending on whether the action completed or failed.



Now that we have a clear understanding of all the CRs involved, let’s go ahead and try to create this entire workflow, taking MySQL as an example.

Install Kanister

Kanister is packaged and distributed as a Helm chart, and we can install it with the commands below.

~ » helm repo add kanister
"kanister" has been added to your repositories

~ » kubectl create ns kanister
namespace/kanister created

~ » helm install myrelease --namespace kanister kanister/kanister-operator --set image.tag=0.32.0
NAME: myrelease
Thank you for trying Kanister.

# get all the pods from Kanister namespace to make sure it is installed successfully
~ » kubectl get pods -n kanister
NAME                                           READY   STATUS    RESTARTS   AGE
myrelease-kanister-operator-7d5cdb987c-9grdh   1/1     Running   0          4m43s

Once Kanister is installed, listing the CustomResourceDefinitions in the cluster shows the CRs that we discussed above.

~ » kubectl get
NAME                                             CREATED AT
                                                 2020-07-28T09:16:39Z
                                                 2020-07-28T09:16:39Z
                                                 2020-07-28T09:16:39Z

With Kanister installed successfully, let’s install the MySQL database that we are going to back up and restore using Kanister.

Backup and Restore for MySQL database

Install MySQL

To install MySQL from the stable Helm chart, run the commands below.

~ » kubectl create namespace mysql-test
namespace/mysql-test created

~ » helm install mysql-release stable/mysql --namespace mysql-test \
    --set mysqlRootPassword='asd#45@mysqlEXAMPLE' \
    --set persistence.size=10Gi
NAME: mysql-release
    mysql -h ${MYSQL_HOST} -P${MYSQL_PORT} -u root -p${MYSQL_ROOT_PASSWORD}

# list all the pods in the mysql-test namespace to make sure MySQL has been deployed successfully
~ » kubectl get pod -n mysql-test
NAME                             READY   STATUS    RESTARTS   AGE
mysql-release-866dc87447-bdq96   1/1     Running   0          66s

Once MySQL is installed, it is time to define the steps (actions) that should be taken to back up and restore this database. But before that, let’s create the Profile resource that holds the details of the object storage. I am using AWS S3 here, but you can use any of the supported object stores (S3, Azure or GCS).

Creating Profile resource

To create Kanister custom resources (for example a profile or an actionset), Kanister provides a command-line utility, kanctl. Another utility, kando, is used to interact with your object storage provider from within a blueprint. Both of these utilities can be installed from here.

Run the following command to create a profile resource:

~ » kanctl create profile s3compliant --access-key $ACCESS_KEY \
        --secret-key $SECRET_KEY \
        --bucket $BUCKET --region ap-south-1 \
        --namespace mysql-test
secret 's3-secret-4ratwf' created
profile 's3-profile-gzbmn' created

If we look a bit closer at this command, apart from the object storage location we only specify the namespace in which the application we want to back up is deployed. In our case that is mysql-test.
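Because the kanctl command above reads the credentials and the bucket name from shell variables, an unset variable would quietly produce an incomplete command line. As a minimal sketch (the function name `require_profile_env` is hypothetical and not part of Kanister), a small pre-flight check can verify the three variables before the profile is created:

```shell
# Hypothetical helper, not part of Kanister: fail early if any variable used by
# the `kanctl create profile` command is unset or empty.
require_profile_env() {
  missing=""
  for var in ACCESS_KEY SECRET_KEY BUCKET; do
    # Indirectly read the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
  return 0
}
```

It could be chained in front of the profile creation, for example `require_profile_env && kanctl create profile s3compliant ...`, so that kanctl only runs when all three values are present.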

Creating Blueprint Resource

Once the Profile resource is created, it is time to create the Blueprint, which holds the actions (backup and restore) and the commands that should be executed for each of them.

Taking MySQL as the example, let’s say we want to back up the database using the mysqldump utility. We can instruct the blueprint to run the mysqldump command against the MySQL instance running in the mysql-test namespace. Running a command in a pod to take a backup is a generic pattern that is useful for many other databases and scenarios as well. To facilitate it, Kanister provides a set of functions (called Kanister functions), and the one we can use in this case is KubeTask. This function spins up a new pod and runs the commands specified in the blueprint inside that pod. For more details about KubeTask, refer to the Kanister docs, which also list the many other functions that Kanister provides.

So the backup action needs to run mysqldump, which in turn needs the host where MySQL is deployed. Since all these steps live in the Blueprint, we need a way to tell it where a specific database is running. Luckily, Kanister supports Go templates in Blueprints, and we can use them to read the object that is passed to the ActionSet custom resource.

To simplify: we need the connection details (the service name, in the case of Kubernetes) to run the mysqldump command using the KubeTask function. When creating an ActionSet we also specify an object (as we will see in a bit); that object is passed to the Blueprint and can be read using Go templates. Take a look at the snippet below:

mysqldump --column-statistics=0 -u root --password=${root_password} -h {{ .Deployment.Name }} --single-transaction --all-databases

If we specify this command in the KubeTask function for the backup action in the blueprint, and pass the deployment name (dep-name) while creating the backup actionset, the mysqldump command runs against the MySQL deployment (service name dep-name) running in the mysql-test namespace.
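To make the substitution concrete, here is a rough shell sketch that emulates what the template rendering does for this one placeholder. It uses sed purely for illustration; Kanister itself renders blueprints with Go templates, and the deployment name `mysql-release` is an assumed value taken from the install step earlier in this post.

```shell
# The command as it appears in the blueprint, with the Go template placeholder.
# Single quotes keep ${root_password} literal; it is only expanded later by the
# shell that runs inside the KubeTask pod.
template='mysqldump --column-statistics=0 -u root --password=${root_password} -h {{ .Deployment.Name }} --single-transaction --all-databases'

# Name of the Deployment passed to the actionset (assumed value for this sketch).
deployment_name='mysql-release'

# Emulate the rendering step: replace the placeholder with the deployment name.
rendered=$(printf '%s\n' "$template" | sed "s/{{ \.Deployment\.Name }}/$deployment_name/g")

echo "$rendered"
```

The rendered command targets the host `mysql-release`, which works because the MySQL service created by the chart has the same name as the deployment.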

Now that we have some understanding of blueprints, let’s look at the MySQL blueprint that the Kanister team provides for us, which can be found here. We will walk through the backup action; from that you should be able to work out the details of the restore action.

In the spec of the Blueprint below, we can see that the backup action produces an artifact named `mysqlCloudDump` and has a phase named `dumpToObjectStore` that uses the KubeTask Kanister function. Since we need the MySQL root password to run mysqldump, we obtain it through an object reference to a Secret. The helm install command that we used to install MySQL created this secret (its name is the same as the name of the MySQL deployment), and it stores the root password under the key mysql-root-password.

After that, we specify the image and namespace of the pod that KubeTask creates, followed by the commands to execute in that pod for this phase. One question remains about this spec: where do the deployment and the profile come from? To answer it, we will create the backup actionset shortly.

apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mysql-blueprint
actions:
  backup:
    type: Deployment
    outputArtifacts:
      mysqlCloudDump:
        keyValue:
          s3path: "{{ .Phases.dumpToObjectStore.Output.s3path }}"
    phases:
    - func: KubeTask
      name: dumpToObjectStore
      objects:
        mysqlSecret:
          kind: Secret
          name: '{{ .Deployment.Name }}'
          namespace: '{{ .Deployment.Namespace }}'
      args:
        image: kanisterio/mysql-sidecar:0.31.0
        namespace: "{{ .Deployment.Namespace }}"
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          s3_path="/mysql-backups/{{ .Deployment.Namespace }}/{{ .Deployment.Name }}/{{ toDate "2006-01-02T15:04:05.999999999Z07:00" .Time  | date "2006-01-02T15-04-05" }}/dump.sql.gz"
          root_password="{{ index .Phases.dumpToObjectStore.Secrets.mysqlSecret.Data "mysql-root-password" | toString }}"
          mysqldump --column-statistics=0 -u root --password=${root_password} -h {{ .Deployment.Name }} --single-transaction --all-databases | gzip - | kando location push --profile '{{ toJson .Profile }}' --path ${s3_path} -
          kando output s3path ${s3_path}
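The `-o errexit -o pipefail` options in the phase command matter because the backup is a pipeline: without pipefail, the exit status of a pipeline is that of its last command, so a failing mysqldump could go unnoticed. The sketch below demonstrates the difference, with `false` standing in for a failing mysqldump and `cat` standing in for the later pipeline stages (both are stand-ins, not what the blueprint actually runs).

```shell
# Without pipefail the pipeline reports the status of its last command (cat),
# so the failure of the first command is hidden.
plain_status=$(bash -c 'false | cat >/dev/null; echo $?')

# With pipefail the pipeline reports the failure of any command in it.
strict_status=$(bash -c 'set -o pipefail; false | cat >/dev/null; echo $?')

echo "without pipefail: $plain_status"
echo "with pipefail:    $strict_status"
```

With errexit set as well, that non-zero status stops the script before `kando output` records an s3path for a backup that never completed.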

To create this blueprint, run the command below. Note that the blueprint is created in the namespace where the Kanister controller is deployed.

~ » kubectl create -f -n kanister created

Creating Actionset resource

Now that the blueprint, which describes how to back up and restore the database, has been created, we can create the actionset resource to actually run the backup and restore actions from the blueprint. But before that, let’s quickly insert some dummy data into the MySQL database so that we can later verify that the records have been restored.

# exec into the mysql pod and insert some records into a table
~ » kubectl exec -it -n mysql-test mysql-release-866dc87447-bdq96 -- bash
root@mysql-release-866dc87447-bdq96:/# mysql -u root --password=asd#45@mysqlEXAMPLE
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 590
Server version: 5.7.30 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> use test;
Database changed
mysql> create table employees (name varchar(100), age int);
Query OK, 0 rows affected (0.29 sec)

mysql> insert into employees values ("Robert", "31");
Query OK, 1 row affected (0.01 sec)
mysql> insert into employees values ("John", "28");
Query OK, 1 row affected (0.01 sec)

mysql> select * from employees;
+--------+------+
| name   | age  |
+--------+------+
| Robert |   31 |
| John   |   28 |
+--------+------+
2 rows in set (0.00 sec)

To create the actionset, run the command below.

~ » kanctl create actionset --action backup --namespace kanister --blueprint mysql-blueprint --deployment mysql-test/mysql-release --profile mysql-test/s3-profile-gzbmn
actionset backup-skngt created

Things should make more sense now: as you can see, we are providing the deployment and the profile here. Similarly, you can pass any other object you want, for example a statefulset, configmaps, secrets or any other Kubernetes object, using the --objects flag in the format `--objects group/version/resource/namespace1/name1,group/version/resource/namespace2/name2`.
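kanctl is a convenience here: the actionset it creates is an ordinary custom resource. As a rough sketch of what an equivalent manifest might look like (the field layout follows my understanding of the ActionSet schema and is an assumption that should be checked against the Kanister documentation for your version), the script below writes such a manifest to a temporary file:

```shell
# Write an ActionSet manifest roughly equivalent to the kanctl command above.
# The field layout is an assumption; verify it against your Kanister version.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-
  namespace: kanister
spec:
  actions:
  - name: backup
    blueprint: mysql-blueprint
    object:
      kind: Deployment
      name: mysql-release
      namespace: mysql-test
    profile:
      name: s3-profile-gzbmn
      namespace: mysql-test
EOF

echo "manifest written to $manifest"
# To submit it to the cluster: kubectl create -f "$manifest"
```

`kubectl create` is used rather than `kubectl apply` because the manifest relies on `generateName`, which lets the API server pick a unique name for each run.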

Once you have created the actionset, you can check its status by describing it to make sure it was successful. To debug further, you can also check the logs of the controller pod deployed in the kanister namespace.

~ » kubectl describe actionset -n kanister backup-skngt
Name:         backup-skngt
  Type    Reason           Age    From                 Message
  ----    ------           ----   ----                 -------
  Normal  Started Action   6m40s  Kanister Controller  Executing action backup
  Normal  Started Phase    6m40s  Kanister Controller  Executing phase dumpToObjectStore
  Normal  Ended Phase      6m26s  Kanister Controller  Completed phase dumpToObjectStore
  Normal  Update Complete  6m26s  Kanister Controller  Updated ActionSet 'backup-skngt' Status->complete
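Instead of describing the actionset by hand, its state can also be polled from a script. The sketch below assumes the state is exposed at `.status.state` with the values `complete` and `failed` (consistent with the `Status->complete` event above, but worth verifying for your Kanister version); the function name `wait_for_actionset` is hypothetical.

```shell
# Hypothetical helper: poll an actionset until it completes, fails, or times out.
# Usage: wait_for_actionset NAME NAMESPACE [ATTEMPTS] [DELAY_SECONDS]
wait_for_actionset() {
  as_name=$1
  as_namespace=$2
  as_attempts=${3:-30}
  as_delay=${4:-5}
  as_try=0
  while [ "$as_try" -lt "$as_attempts" ]; do
    # Assumption: the actionset state is stored in .status.state.
    as_state=$(kubectl get actionset "$as_name" -n "$as_namespace" \
      -o jsonpath='{.status.state}')
    case "$as_state" in
      complete) echo "complete"; return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    as_try=$((as_try + 1))
    sleep "$as_delay"
  done
  echo "timeout"
  return 2
}
```

For the backup in this post it would be called as `wait_for_actionset backup-skngt kanister`, and the return code tells a calling script whether it is safe to continue.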

Once the backup actionset is complete, let’s delete the data from the database to imitate a disaster, and then run the restore action to restore the data from the backup that we have just taken.

# exec into the mysql pod and run below in mysql shell
mysql> drop table employees;
Query OK, 0 rows affected (0.26 sec)

mysql> select * from employees;
ERROR 1146 (42S02): Table 'test.employees' doesn't exist

Now that we have deleted the data from the MySQL database, let’s restore the backup by creating a restore actionset.

~ » kanctl --namespace kanister create actionset --action restore --from backup-skngt
actionset restore-backup-skngt-5q2k2 created

To confirm that the actionset has completed successfully, we can describe it just as we described the backup actionset. Once we have made sure the actionset is complete, we can exec into the MySQL pod once again to check that the data has been restored.

Log in to the MySQL pod and run the query below to make sure the data has been restored.

mysql> select * from employees;
+--------+------+
| name   | age  |
+--------+------+
| Robert |   31 |
| John   |   28 |
+--------+------+
2 rows in set (0.00 sec)

# As you can see the data has been restored successfully

As you can see, we were able to back up and restore the MySQL database using the blueprint provided by the Kanister team. They already provide blueprints for several databases that can be found here, but you are free to create your own blueprint as well.

To summarise, below are the steps that we have taken to back up and restore our MySQL database:

  • Create profile resource that is going to have details of the external persistent storage where backup would be uploaded
  • Create blueprint resource that is going to have the detailed steps/actions on how to back up and restore the database
  • Make some dummy entries in the MySQL database
  • Create actionset (with backup action) that would call the backup action from the blueprint
  • Delete the dummy data that we stored in the MySQL database
  • Restore the data back by creating actionset (with restore action)
  • Make sure the data has been restored successfully by querying the database

If you encounter any issues while going through this article, you can either comment here or get in touch with the Kanister team in their Slack workspace.

Vivek Singh
