Automated bare metal provisioning with Tinkerbell

By Aman Parauliya | September 16, 2020 | bare metal

Do you want to use bare metal servers but hold back because setting them up and maintaining them seems daunting? Tinkerbell provides a simple way to set up and provision a bare metal machine, so you no longer need to worry about that complexity.
You can also use Tinkerbell to set up VMs, and it can do more than provisioning: for example, installing required software after provisioning or setting up a Kubernetes (k8s) cluster.
 
In this post, we will look at the use cases of bare metal servers in today’s cloud-native world and the challenges people generally face with them. We will then see how Tinkerbell can serve as a complete solution to those challenges.
 

Bare Metal Servers and their Use Cases:

Bare metal servers are everywhere, from the racks of servers in a company’s data center to a Raspberry Pi (RPi) board in a small IoT device.

Use cases:

  • Some organizations want to use their on-premise infrastructure to build a private cloud.
  • Bare metal is also a good fit where data security matters. A private infrastructure lets you configure security the way the business requires, for example to protect the data of a bank’s account holders.
  • With bare metal servers, organizations can tune performance for cost, consistency, and predictability. These properties are hard to negotiate on public clouds.
  • Greater control is another big advantage of bare metal servers over cloud servers.

Bare Metal Servers – Challenges:

  • Administration cost: A large data center consisting of hundreds of servers is difficult to manage, for example when provisioning each server and maintaining its life cycle.
  • Hardware cost: Some users need infrastructure with different CPU architectures (for example Intel x86-64 and ARM) and different distros (such as CentOS and Ubuntu). It is difficult to change the configuration of an already configured server again and again as requirements change.
  • Maintenance: “Increase of control comes with increase of complexity.” The user has more control over a bare metal server, but configuring or reconfiguring it is more complex than doing so for a cloud server.

Tinkerbell – A Complete Solution:

[Figure: Tinkerbell services]

Source: https://tinkerbell.org/


Tinkerbell consists of a few microservices, each of which handles a particular task during the network boot of a bare metal server. Here is a brief description of each:
 
  1. Boots: When you reboot a bare metal machine in iPXE/network mode, it first tries to obtain an IP address by broadcasting a DHCPDISCOVER packet on the private network. Boots handles this request and offers an IP address to that machine.
  2. Hegel: Hegel is the metadata service used by Tink and OSIE. It collects data from both and transforms it into JSON, which we call metadata.
  3. OSIE: OSIE provides an in-memory installation environment. It installs the operating system and handles de-provisioning.
  4. PBnJ: PBnJ is a microservice that can communicate with baseboard management controllers (BMCs). It controls power and boot settings.
  5. Tink: This is the main workflow engine, responsible for processing workflows. It consists of the following three binaries:
    • tink-server: The server that handles CRUD requests over gRPC. It handles the creation of workflows and their building blocks: templates and targeted hardware.
    • tink-cli: The CLI is used to create workflows and their building blocks, templates, and hardware data.
    • tink-worker: A service that runs on the targeted hardware, that is, the machine the user wants to provision, and executes the workflow.

Architecture diagram for Tinkerbell:

[Figure: Tinkerbell workflow architecture]

Source: https://github.com/tinkerbell/tinkerbell.org/blob/master/static/images/docs/workflow-architecture.png

As the diagram above shows, Tinkerbell has a control server called the Provisioner. All of the Tinkerbell services mentioned above run on this machine, each inside its own Docker container. A few other processes also run on the Provisioner:

  1. PostgreSQL: Tinkerbell uses PostgreSQL as its data store. It stores workflows and everything else needed to create and maintain them, such as templates and the data of the targeted hardware.
  2. Image repository: A private Docker registry is hosted on the Provisioner. It stores the Docker images of tink-worker and of all the actions that are part of a workflow (as defined in the template), and the worker machine fetches those images from it. This is particularly useful for secure environments that don’t have access to the internet.
  3. NGINX: NGINX is a web server. Tinkerbell uses NGINX to serve the required boot files during workflow execution.

On the Provisioner, the complete stack of container services looks like this:

[Figure: Tinkerbell Provisioner stack]

Provisioning Steps:

The following steps provision a bare metal server with Ubuntu 18.04.
The commands below are run through the tink-cli container.

1. Setup the Provisioner:

You can set up the Provisioner on a physical machine or on a VM. The setup instructions are given here.

2. Create a Hardware Inventory:

In this step, the data of all the hardware (bare metal servers) to be provisioned is pushed to the database. The data is in JSON format. The following is the minimal hardware data you need to push for a particular machine:

{
  "metadata": {
    "instance": {},
    "facility": {
      "facility_code": "onprem"
    }
  },
  "network": {
    "interfaces": [
      {
        "dhcp": {
          "mac": "98:03:9b:89:d7:aa",
          "hostname": "localhost",
          "arch": "x86_64",
          "ip": {
            "address": "192.168.1.6",
            "netmask": "255.255.255.248",
            "gateway": "192.168.1.1"
          }
        },
        "netboot": {
          "allow_pxe": true,
          "allow_workflow": true
        }
      }
    ]
  },
  "id": "f9f56dff-098a-4c5f-a51c-19ad35de85d4"
}

For a detailed description of the hardware data, see the Hardware_Data page.
Save the above data in a file.
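
Before pushing a record, it can help to sanity-check it. The snippet below is a minimal Python sketch and not part of Tinkerbell: the function name check_hardware_data and the checks it performs (an id is present, the MAC address is well formed, the gateway lies inside the subnet implied by address and netmask) are our own assumptions about what is worth validating. It uses the sample record shown above.

```python
import ipaddress
import re

# Six colon-separated hex pairs, e.g. 98:03:9b:89:d7:aa
MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$", re.IGNORECASE)


def check_hardware_data(hardware: dict) -> list:
    """Return a list of problems found in a hardware record (empty if none).

    Hypothetical helper: these checks are illustrative assumptions,
    not rules enforced by the tink CLI.
    """
    problems = []
    if not hardware.get("id"):
        problems.append("missing id")
    for index, interface in enumerate(hardware["network"]["interfaces"]):
        dhcp = interface["dhcp"]
        if not MAC_PATTERN.match(dhcp["mac"]):
            problems.append(f"interface {index}: malformed MAC {dhcp['mac']!r}")
        ip = dhcp["ip"]
        # Derive the subnet from address + netmask; strict=False allows a host address.
        subnet = ipaddress.ip_network(f"{ip['address']}/{ip['netmask']}", strict=False)
        if ipaddress.ip_address(ip["gateway"]) not in subnet:
            problems.append(f"interface {index}: gateway {ip['gateway']} is outside {subnet}")
    return problems


# The minimal hardware data from this post, as a Python dict.
SAMPLE = {
    "metadata": {"instance": {}, "facility": {"facility_code": "onprem"}},
    "network": {
        "interfaces": [
            {
                "dhcp": {
                    "mac": "98:03:9b:89:d7:aa",
                    "hostname": "localhost",
                    "arch": "x86_64",
                    "ip": {
                        "address": "192.168.1.6",
                        "netmask": "255.255.255.248",
                        "gateway": "192.168.1.1",
                    },
                },
                "netboot": {"allow_pxe": True, "allow_workflow": True},
            }
        ]
    },
    "id": "f9f56dff-098a-4c5f-a51c-19ad35de85d4",
}

print(check_hardware_data(SAMPLE))
```

With the netmask 255.255.255.248 the sample address belongs to the subnet 192.168.1.0/29, which also contains the gateway 192.168.1.1, so the sample record passes these checks.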
The following “tink” CLI command pushes the hardware data:

docker exec -i deploy_tink-cli_1 tink hardware push < <path to hardware data file>
> 2020/08/24 16:23:09 Hardware data pushed successfully

In the same way, you can push the data of every machine that needs to be provisioned.
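
If you have many machines, the push can be scripted. The following is a rough Python sketch that wraps the same docker exec command shown above. The directory layout (one JSON file per machine), the function names, and the dry_run switch are assumptions made for illustration; only the command itself comes from this post.

```python
import subprocess
from pathlib import Path

# Container name as used in the command above; adjust it if yours differs.
TINK_CLI_CONTAINER = "deploy_tink-cli_1"


def hardware_push_command(container: str = TINK_CLI_CONTAINER) -> list:
    """Build the argv for 'tink hardware push'; the JSON is supplied on stdin."""
    return ["docker", "exec", "-i", container, "tink", "hardware", "push"]


def push_all(directory: str, dry_run: bool = True) -> list:
    """Push every *.json file in 'directory' (assumed: one file per machine).

    Returns the file names that were handled. With dry_run=True nothing is
    executed, which is useful for checking which files would be pushed.
    """
    handled = []
    for path in sorted(Path(directory).glob("*.json")):
        if not dry_run:
            with path.open("rb") as handle:
                # Equivalent to: docker exec -i ... tink hardware push < file
                subprocess.run(hardware_push_command(), stdin=handle, check=True)
        handled.append(path.name)
    return handled
```

Calling push_all("hardware/", dry_run=False) on the Provisioner would push each file in turn and stop at the first failing push, because check=True raises an error on a non-zero exit code.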

3. Create Templates :

In this step, you create templates. A template is a YAML-based definition of a workflow that contains all the tasks/actions to execute in order to provision a machine. You can create multiple templates, but each must have a unique name. For example, a template with tasks/actions for Ubuntu provisioning could be named “Ubuntu-template”, and one for CentOS “Centos-template”.
The following is a sample template:

version: "0.1"
name: ubuntu_provisioning
global_timeout: 6000
tasks:
  - name: "os-installation"
    worker: "{{.device_1}}"
    volumes:
      - /dev:/dev
      - /dev/console:/dev/console
      - /lib/firmware:/lib/firmware:ro
    environment:
      MIRROR_HOST: <MIRROR_HOST_IP>
    actions:
      - name: "disk-wipe"
        image: disk-wipe
        timeout: 90
      - name: "disk-partition"
        image: disk-partition
        timeout: 600
        environment:
          MIRROR_HOST: <MIRROR_HOST_IP>
        volumes:
          - /statedir:/statedir
      - name: "install-root-fs"
        image: install-root-fs
        timeout: 600
      - name: "install-grub"
        image: install-grub
        timeout: 600
        volumes:
          - /statedir:/statedir

As the template definition above shows, a template consists of tasks, each with actions under it. Each action runs as a container on the worker machine. Templates support volume mounts at both the task and the action level.

If a volume is defined at the task level, every action under that task can mount it.
If a volume is defined at the action level, only that action can use it.
Environment variables are supported in the same way as volume mounts.
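
To make the scoping rules concrete, here is a small Python sketch that computes which volumes and environment variables each action ends up with. It only models the rules described above and is not Tinkerbell’s actual code: the function name effective_settings is hypothetical, and letting an action-level variable override a task-level one with the same name is our assumption.

```python
def effective_settings(task: dict, action: dict) -> dict:
    """Combine task-level and action-level settings for one action.

    Illustrative model only. Assumption: on a name clash, the
    action-level environment variable wins over the task-level one.
    """
    volumes = list(task.get("volumes", [])) + list(action.get("volumes", []))
    environment = {**task.get("environment", {}), **action.get("environment", {})}
    return {"volumes": volumes, "environment": environment}


# Abridged version of the sample template above, as a Python dict.
TASK = {
    "name": "os-installation",
    "volumes": ["/dev:/dev", "/dev/console:/dev/console", "/lib/firmware:/lib/firmware:ro"],
    "environment": {"MIRROR_HOST": "<MIRROR_HOST_IP>"},
    "actions": [
        {"name": "disk-wipe", "image": "disk-wipe", "timeout": 90},
        {
            "name": "disk-partition",
            "image": "disk-partition",
            "timeout": 600,
            "environment": {"MIRROR_HOST": "<MIRROR_HOST_IP>"},
            "volumes": ["/statedir:/statedir"],
        },
    ],
}

for action in TASK["actions"]:
    settings = effective_settings(TASK, action)
    print(action["name"], settings["volumes"])
```

In this model, disk-wipe sees only the three task-level volumes, while disk-partition additionally sees /statedir:/statedir, which no other action receives unless it declares that volume itself.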
If the template format changes in the future, you can always refer to sample-ubuntu.tmpl.
Each task contains a “worker” field whose value is “{{.device_1}}”. The “device_1” placeholder is replaced with the MAC address of the targeted hardware when the workflow is created in the next step.
The following command creates/inserts a template in the database:

docker exec -i deploy_tink-cli_1 tink template create -n <template-name> < <path to template file>
Created Template:  489da4a1-6672-461b-9b09-9aa67b8938f9

Creating a template generates a unique ID (UUID), which we will use during workflow creation.
For example, in the above case the UUID of the created template is “489da4a1-6672-461b-9b09-9aa67b8938f9”.
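
When scripting these steps, the UUID has to be picked out of the CLI output before it can be passed on. A minimal Python sketch follows, assuming the output keeps the “Created Template:  <uuid>” shape shown above; the helper name extract_uuid is hypothetical.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_uuid(cli_output: str) -> str:
    """Return the first UUID found in the CLI output, or raise ValueError."""
    match = UUID_PATTERN.search(cli_output)
    if match is None:
        raise ValueError(f"no UUID found in output: {cli_output!r}")
    return match.group(0)


template_id = extract_uuid("Created Template:  489da4a1-6672-461b-9b09-9aa67b8938f9")
print(template_id)
```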

4. Create Workflows :

After completing the previous two steps, we need to create a workflow. Creating a workflow requires the UUID of one template and the MAC address or IP address of the targeted hardware.
The following command creates the workflow:

docker exec -i deploy_tink-cli_1 tink workflow create -t <template uuid> -r '{"device_1":"<MAC/IP address of targeted hardware>"}'
Created Workflow:  2a4d1101-8ad2-4c5c-9822-c541d1775151
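
The relationship between the -r argument and the template’s “worker” field can be pictured with a short Python sketch. This is a simplified stand-in for the substitution, not Tinkerbell’s own rendering code: both function names are hypothetical, and the regular expression only handles plain {{.name}} placeholders. Building the -r value with json.dumps also avoids hand-written quoting mistakes.

```python
import json
import re

# Matches placeholders such as {{.device_1}}
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_workers(template_text: str, targets_json: str) -> str:
    """Replace each {{.name}} with the value given for 'name' in the -r JSON."""
    targets = json.loads(targets_json)

    def substitute(match):
        key = match.group(1)
        if key not in targets:
            raise KeyError(f"no target given for placeholder {key!r}")
        return targets[key]

    return PLACEHOLDER.sub(substitute, template_text)


def workflow_create_command(template_uuid: str, targets: dict) -> list:
    """Build the argv for the 'tink workflow create' command shown above."""
    return [
        "docker", "exec", "-i", "deploy_tink-cli_1",
        "tink", "workflow", "create",
        "-t", template_uuid,
        "-r", json.dumps(targets),
    ]


rendered = render_workers('worker: "{{.device_1}}"', '{"device_1": "98:03:9b:89:d7:aa"}')
print(rendered)
```

Here the MAC address from the hardware data pushed earlier takes the place of the device_1 placeholder, which is how the workflow becomes tied to one specific machine.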

5. Reboot the targeted hardware in iPXE mode :

After creating a workflow for a particular MAC address, reboot the machine that has that MAC address in iPXE mode. When the reboot starts, the machine first requests an IP address through the DHCP service provided by Boots. It then fetches the files required for OS installation from OSIE through NGINX. After that, it boots into the base OS and asks tink-server for the workflow we created for this hardware. The worker machine then executes that workflow. Once the workflow has completed successfully, reboot the machine so that it boots into the newly installed OS.

What’s Next for Tinkerbell:

Packet open-sourced Tinkerbell a few months ago, and we have received a lot of responses from the community since then. People are already doing great things like automating Raspberry Pis, installing Kubernetes on their home lab setups, and, of course, finding a lot of areas for improvement.
We’re loving it, and we welcome your comments, questions, and PRs or issues.

We hope you find it interesting and would like to engage with us, contribute back to the community, and help us with our mission of making infrastructure a competitive advantage.
Join us on Slack and look for the #tinkerbell channel.

Also, have a look at the website for more details about Tinkerbell.

Thanks

Aman Parauliya

Senior Software Engineer at InfraCloud Technologies.
