Storage Hardening

Secure data stored for ClickHouse

ClickHouse data is ultimately stored on file systems. Keeping that data protected when it is being used or “at rest” is necessary to prevent unauthorized entities from accessing your organization’s private information.

Hardening stored ClickHouse data is split into the following catagories:

  • Host-Level Security

  • Volume Level Encryption

  • Column Level Encryption

  • Log File Protection

  • IMPORTANT NOTE: Configuration settings can be stored in the default /etc/clickhouse-server/config.xml file. However, this file can be overwritten during vendor upgrades. To preserve configuration settings it is recommended to store them in /etc/clickhouse-server/config.d as separate XML files with the same root element, typically <yandex>. For this guide, we will only refer to the configuration files in /etc/clickhouse-server/config.d for configuration settings.

Host-Level Security

The file level security for the files that ClickHouse uses to run should be restricted as much as possible.

  • ClickHouse does not require root access to the file system, and runs by default as the user clickhouse.
  • The following directories should be restricted to the minimum number of users:
    • /etc/clickhouse-server: Used for ClickHouse settings and account credentials created by default.
    • /var/lib/clickhouse: Used for ClickHouse data and new credentials.
    • /var/log/clickhouse-server: Log files that may display privileged information through queries. See Log File Protection for more information.

Volume Level Encryption

Encrypting data on the file system prevents unauthorized users who may have gained access to the file system that your ClickHouse database is stored on from being able to access the data itself. Depending on your environment, different encryption options may be required.

Cloud Storage

If your ClickHouse database is stored in a cloud service such as AWS or Azure, verify that the cloud supports encrypting the volume. For example, Amazon AWS provides a method to encrypt new Amazon EBS volumes by default.

The Altinity.Cloud service provides the ability to set the Volume Type to gp2-encrypted. For more details, see the Altinity.Cloud Cluster Settings.

Local Storage

Organizations that host ClickHouse clusters on their own managed systems, LUKS is a recommended solution. Instructions for Linux distributions including Red Hat and Ubuntu are available. Check with the distribution your organization for instructions on how to encrypt those volumes.

Kubernetes Encryption

If your ClickHouse cluster is managed by Kubernetes, the StorageClass used may be encrypted. For more information, see the Kubernetes Storage Class documentation.

Column Level Encryption

Organizations running ClickHouse 20.11 or later can encrypt individual columns with AES functions. For full information, see the ClickHouse.tech Encryption functions documentation.

Applications are responsible for their own keys. Before enabling column level encryption, test to verify that encryption does not negatively impact performance.

The following functions are available:

Function MySQL AES Compatible
encrypt(mode, plaintext, key, [iv, aad])
decrypt(mode, ciphertext, key, [iv, aad])
aes_encrypt_mysql(mode, plaintext, key, [iv]) *
aes_decrypt_mysql(mode, ciphertext, key, [iv]) *

Encryption function arguments:

Argument Description Type
mode Encryption mode. String
plaintext Text thats need to be encrypted. String
key Encryption key. String
iv Initialization vector. Required for -gcm modes, optional for others. String
aad Additional authenticated data. It isn’t encrypted, but it affects decryption. Works only in -gcm modes, for others would throw an exception String

Column Encryption Examples

This example displays how to encrypt information using a hashed key.

  1. Takes a hex value, unhexes it and stores it as key.
  2. Select the value and encrypt it with the key, then displays the encrypted value.
 WITH unhex('658bb26de6f8a069a3520293a572078f') AS key
SELECT hex(encrypt('aes-128-cbc', 'Hello world', key)) AS encrypted
┌─encrypted────────────────────────┐
 46924AC12F4915F2EEF3170B81A1167E 
└──────────────────────────────────┘

This shows how to decrypt encrypted data:

  1. Takes a hex value, unhexes it and stores it as key.
  2. Decrypts the selected value with the key as text.
WITH unhex('658bb26de6f8a069a3520293a572078f') AS key SELECT decrypt('aes-128-cbc',
  unhex('46924AC12F4915F2EEF3170B81A1167E'), key) AS plaintext
┌─plaintext───┐
 Hello world 
└─────────────┘

Log File Protection

The great thing about log files is they show what happened. The problem is when they show what happened, like the encryption key used to encrypt or decrypt data:

2021.01.26 19:11:23.526691 [ 1652 ] {4e196dfa-dd65-4cba-983b-d6bb2c3df7c8}
<Debug> executeQuery: (from [::ffff:127.0.0.1]:54536, using production
parser) WITH unhex('658bb26de6f8a069a3520293a572078f') AS key SELECT
decrypt(???), key) AS plaintext

These queries can be hidden through query masking rules, applying regular expressions to replace commands as required. For more information, see the ClickHouse.tech Server Settings documentation.

To prevent certain queries from appearing in log files or to hide sensitive information:

  1. Update the configuration files, located by default in /etc/clickhouse-server/config.d.
  2. Add the element query_masking_rules.
  3. Set each rule with the following:
    1. name: The name of the rule.
    2. regexp: The regular expression to search for.
    3. replace: The replacement value that matches the rule’s regular expression.

For example, the following will hide encryption and decryption functions in the log file:

 <query_masking_rules>
    <rule>
        <name>hide encrypt/decrypt arguments</name>
        <regexp>
           ((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)
        </regexp>
        <!-- or more secure, but also more invasive:
            (aes_\w+)\s*\(.*\)
        -->
        <replace>\1(???)</replace>
    </rule>
</query_masking_rules>
Last modified 0001.01.01