Persistent Storage

How to set up persistent storage for your ClickHouse Kubernetes cluster.

We’ve shown how to create ClickHouse clusters in Kubernetes and how to add Zookeeper so we can create replicated clusters. Now we’re going to show how to set up persistent storage so you can change your cluster configuration without losing your hard work.

The examples here are built from the clickhouse-operator examples, simplified down for our demonstrations.

Create a new file called sample05.yaml with the following:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo-01"
spec:
  defaults:
    deployment:
      podTemplate: clickhouse-stable
      volumeClaimTemplate: storage-vc-template
  templates:
    podTemplates:
    - name: clickhouse-stable
      containers:
      - name: clickhouse
        image: yandex/clickhouse-server:latest
    volumeClaimTemplates:
    - name: storage-vc-template
      persistentVolumeClaim:
        metadata:
          name: storage-demo
        spec:
          storageClassName: default
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
  configuration:
    zookeeper:
        nodes:
        - host:  zookeeper-0.zookeepers.test
          port: 2181
    clusters:
      - name: "demo-01"
        layout:
          shardsCount: 2
          replicasCount: 2

Those who have followed the previous examples will recognize the clusters being created, but there are some new additions:

  • volumeClaimTemplate: This sets up the storage, and we’re specifying the storage class as default. For full details on the different storage classes, see the Kubernetes Storage Class documentation.
  • storage: We’re giving our cluster 1 gigabyte of storage, enough for our sample systems. If you need more space, you can increase it by changing this setting, as shown in the sketch after this list.
  • podTemplate: Here we specify what our pods are going to be. We’ll use the latest version of the ClickHouse container image, but other versions can be specified to best fit your needs. For more information, see the ClickHouse on Kubernetes Operator Guide.
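
For example, if your Kubernetes environment provides a faster storage class or you expect more data, the volumeClaimTemplates section of sample05.yaml could be adjusted along these lines. The fast-ssd class name and the 10Gi request below are placeholders for illustration, not part of the sample files in this guide; substitute a storage class that actually exists in your cluster:

    volumeClaimTemplates:
    - name: storage-vc-template
      persistentVolumeClaim:
        metadata:
          name: storage-demo
        spec:
          # "fast-ssd" is a placeholder; use a storage class available in your cluster
          storageClassName: fast-ssd
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              # a larger request than the 1Gi used in our samples
              storage: 10Gi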

Save your new configuration file and install it. If you’ve been following this guide and already have the namespace test operating, this will update it:

>> kubectl apply -f sample05.yaml -n test

Verify that the installation completes by checking the ClickHouseInstallation and services in this namespace; you should see results similar to the following:

>> kubectl -n test get chi -o wide
NAME      VERSION   CLUSTERS   SHARDS   HOSTS   STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT
demo-01   0.13.0    1          2        4       Completed   1         3       0         0        clickhouse-demo-01.test.svc.cluster.local

>> kubectl get service -n test
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
chi-demo-01-demo-01-0-0   ClusterIP      None            <none>        8123/TCP,9000/TCP,9009/TCP      42m
chi-demo-01-demo-01-0-1   ClusterIP      None            <none>        8123/TCP,9000/TCP,9009/TCP      2m10s
chi-demo-01-demo-01-1-0   ClusterIP      None            <none>        8123/TCP,9000/TCP,9009/TCP      113s
chi-demo-01-demo-01-1-1   ClusterIP      None            <none>        8123/TCP,9000/TCP,9009/TCP      96s
clickhouse-demo-01        LoadBalancer   10.100.122.58   localhost     8123:31461/TCP,9000:31965/TCP   42m
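
You can also confirm that PersistentVolumeClaims were created from the volume claim template; the claim names and capacities you see will depend on your environment and storage provisioner:

>> kubectl get pvc -n test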

Testing Persistent Storage

Now that everything is running, let’s verify that our storage is working. We’re going to exec into one of the pods that was created and open a bash prompt:

>> kubectl -n test exec -it chi-demo-01-demo-01-0-0-0 -- bash
root@chi-demo-01-demo-01-0-0-0:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          59G  3.9G   52G   8% /
tmpfs            64M     0   64M   0% /dev
tmpfs           995M     0  995M   0% /sys/fs/cgroup
/dev/vda1        59G  3.9G   52G   8% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           995M   12K  995M   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           995M     0  995M   0% /proc/acpi
tmpfs           995M     0  995M   0% /sys/firmware

And we can see we have about 1 gigabyte of storage allocated to our cluster.
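
If you want to look at the ClickHouse data directory specifically, you can point df at it; /var/lib/clickhouse is the server’s default data path, and the output will vary with your storage provisioner:

root@chi-demo-01-demo-01-0-0-0:/# df -h /var/lib/clickhouse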

Let’s add some data to it. Nothing major, just enough to show that we can store information, change the configuration, and have the data remain.

Exit the pod and launch clickhouse-client against the LoadBalancer service. We’re going to create a database, create a table in that database, then show both:

clickhouse-client --host localhost --user=clickhouse_operator --password=clickhouse_operator_password
chi-demo-01-demo-01-1-1-0.chi-demo-01-demo-01-1-1.test.svc.cluster.local :) show databases

SHOW DATABASES

Query id: a4d45608-5d6b-4f0b-8d8b-554f49d2972f

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.008 sec.

chi-demo-01-demo-01-1-1-0.chi-demo-01-demo-01-1-1.test.svc.cluster.local :) create database teststorage

CREATE DATABASE teststorage

Query id: 6dc4dc83-8207-496e-bd2b-2d9c79928515

Ok.

0 rows in set. Elapsed: 0.009 sec.

chi-demo-01-demo-01-1-1-0.chi-demo-01-demo-01-1-1.test.svc.cluster.local :) CREATE TABLE teststorage.test AS system.one ENGINE = Distributed('demo-01', 'system', 'one');

CREATE TABLE teststorage.test AS system.one
ENGINE = Distributed('demo-01', 'system', 'one')

Query id: 1e210d18-ae7c-4550-8550-b7277f4470d6

Ok.

0 rows in set. Elapsed: 0.005 sec.

chi-demo-01-demo-01-1-1-0.chi-demo-01-demo-01-1-1.test.svc.cluster.local :) show databases;

SHOW DATABASES

Query id: e65efbfa-c076-47b8-a00d-a5a3e988eee3

┌─name────────┐
│ default     │
│ system      │
│ teststorage │
└─────────────┘

3 rows in set. Elapsed: 0.004 sec.

chi-demo-01-demo-01-1-1-0.chi-demo-01-demo-01-1-1.test.svc.cluster.local :) select * from teststorage.test;

SELECT *
FROM teststorage.test

Query id: b9e8b0d1-9798-498d-9a2d-0c25ea4236bd

┌─dummy─┐
│     0 │
└───────┘
┌─dummy─┐
│     0 │
└───────┘

2 rows in set. Elapsed: 0.022 sec.
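
The Distributed table above stores only metadata, so if you’d like to see actual rows survive the reconfiguration as well, a minimal sketch looks like this (the teststorage.events table, its columns, and the sample row are our own additions for illustration, not part of the original walkthrough):

-- a small MergeTree table that writes rows to the persistent volume
CREATE TABLE teststorage.events
(
    id UInt32,
    message String
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO teststorage.events VALUES (1, 'persisted row');

SELECT * FROM teststorage.events;

Without ON CLUSTER, this table exists only on the node that handled the statement, but that is enough to show that the underlying volume keeps its rows.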

If you followed the instructions from Zookeeper and Replicas, you’ll recall that when we updated the configuration of our sample cluster at the end, all of the tables and data we created went away. Let’s recreate that experiment now with a new configuration.

Create a new file called sample06.yaml. We’re going to reduce the shards and replicas to 1:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo-01"
spec:
  defaults:
    deployment:
      podTemplate: clickhouse-stable
      volumeClaimTemplate: storage-vc-template
  templates:
    podTemplates:
    - name: clickhouse-stable
      containers:
      - name: clickhouse
        image: yandex/clickhouse-server:latest
    volumeClaimTemplates:
    - name: storage-vc-template
      persistentVolumeClaim:
        metadata:
          name: storage-demo
        spec:
          storageClassName: default
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
  configuration:
    zookeeper:
        nodes:
        - host:  zookeeper-0.zookeepers.test
          port: 2181
    clusters:
      - name: "demo-01"
        layout:
          shardsCount: 1
          replicasCount: 1

Update the cluster with the following:

>> kubectl apply -f sample06.yaml -n test

Wait until the reconfiguration is done and the extra pods are spun down, then launch a bash prompt on the remaining pod and check the available storage:
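
One way to follow the pods as they are removed is to watch them until only a single pod remains (press Ctrl+C to stop watching):

>> kubectl -n test get pods --watch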

>> kubectl -n test get chi -o wide
NAME      VERSION   CLUSTERS   SHARDS   HOSTS   STATUS      UPDATED   ADDED   DELETED   DELETE   ENDPOINT
demo-01   0.13.0    1          1        1       Completed   1         0       0         4        clickhouse-demo-01.test.svc.cluster.local

>> kubectl -n test exec -it chi-demo-01-demo-01-0-0-0 -- bash
root@chi-demo-01-demo-01-0-0-0:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          59G  3.9G   52G   7% /
tmpfs            64M     0   64M   0% /dev
tmpfs           995M     0  995M   0% /sys/fs/cgroup
/dev/vda1        59G  3.9G   52G   7% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           995M   12K  995M   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           995M     0  995M   0% /proc/acpi
tmpfs           995M     0  995M   0% /sys/firmware

The storage is still there. We can test whether our databases are still available by logging in with clickhouse-client:
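
For example, reusing the same LoadBalancer connection and credentials as before:

clickhouse-client --host localhost --user=clickhouse_operator --password=clickhouse_operator_password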

chi-demo-01-demo-01-0-0-0.chi-demo-01-demo-01-0-0.test.svc.cluster.local :) show databases;

SHOW DATABASES

Query id: 56ecb5f8-46fa-433b-b9b4-1bf6d0f6dedd

┌─name────────┐
│ default     │
│ system      │
│ teststorage │
└─────────────┘

3 rows in set. Elapsed: 0.004 sec.

chi-demo-01-demo-01-0-0-0.chi-demo-01-demo-01-0-0.test.svc.cluster.local :) select * from teststorage.test;

SELECT *
FROM teststorage.test

Query id: f55d1ab3-dc2c-43f2-932e-c7a2b1c838dd

┌─dummy─┐
│     0 │
└───────┘
┌─dummy─┐
│     0 │
└───────┘

2 rows in set. Elapsed: 0.007 sec.

All of our databases and tables are there.

There are other ways of allocating storage: volumes dedicated to data, volumes for logging, or multiple data volumes for your cluster nodes. But this will get you started running your own ClickHouse cluster on Kubernetes in your favorite environment.