Operations Guide

Recommended practices and procedures for running ClickHouse in your production environments.

The methods to make your ClickHouse environment successful.

1 - Security

Security settings and best practices for ClickHouse

Keep your ClickHouse cluster and data safe and secure from intruders.

1.1 - Hardening Guide

Hardening procedures for ClickHouse environments.

ClickHouse is known for its ability to scale with clusters, handle terabytes to petabytes of data, and return query results fast. It also has a plethora of built-in security options and features that help keep that data safe from unauthorized users.

How you harden your individual ClickHouse system will depend on the situation, but the following processes are generally applicable in any environment. Each of these can be handled separately, and they do not need to be performed in any particular order.

1.1.1 - User Hardening

User hardening security procedures.

Increasing ClickHouse security at the user level involves the following major steps:

  • User Configuration: Set up secure default users, roles, and permissions through configuration or SQL.

  • User Network Settings: Limit communications by hostname or IP address.

  • Secure Password: Store user information as hashed values.

  • Set Quotas: Limit how many resources users can use in given intervals.

  • Use Profiles: Use profiles to set common security settings across multiple accounts.

  • Database Restrictions: Narrow the databases, tables and rows that a user can access.

  • Enable Remote Authentication: Enable LDAP authentication or Kerberos authentication to prevent storing hashed password information, and enforce password standards.

  • IMPORTANT NOTE: Configuration settings can be stored in the default /etc/clickhouse-server/config.xml file. However, this file can be overwritten during vendor upgrades. To preserve configuration settings it is recommended to store them in /etc/clickhouse-server/config.d as separate XML files.

User Configuration

The hardening steps to apply to users are:

  • Restrict user access to specific host names or IP addresses when possible.
  • Store all passwords in SHA256 format.
  • Set quotas on resources for users when possible.
  • Use profiles to set similar properties across multiple users, and restrict users to the lowest resources required.
  • Offload user authentication through LDAP or Kerberos.

Users can be configured through the XML based settings files, or through SQL based commands.

Detailed information on ClickHouse user configurations can be found on the ClickHouse.Tech documentation site for User Settings.

User XML Settings

Users are listed in the users.xml file under the users element. Each element under users is created as a separate user.

Rather than adding all users to the users.xml file, it is recommended to create them as separate XML files under the users.d directory, typically located at /etc/clickhouse-server/users.d/.

Note that if your ClickHouse environment is run as a cluster, then the user configuration files must be replicated on each node with the relevant user information. We will discuss how to offload some settings into other systems such as LDAP later in this document.

Also note that ClickHouse user names are case-sensitive: John is different from john. See the ClickHouse.tech documentation site for full details.

  • IMPORTANT NOTE: If no user name is specified when a user attempts to login, then the account named default will be used.

For example, the following section will create two users:

  • clickhouse_operator: This user has the password clickhouse_operator_password stored in a sha256 hash, is assigned the profile clickhouse_operator, and can access the ClickHouse database from any network host.
  • John: This user can only access the database from localhost, has the password John stored as a SHA256 hash, and is assigned to the default profile.
<users>
    <clickhouse_operator>
        <networks>
            <ip>127.0.0.1</ip>
            <ip>0.0.0.0/0</ip>
            <ip>::/0</ip>
        </networks>          
        <password_sha256_hex>716b36073a90c6fe1d445ac1af85f4777c5b7a155cea359961826a030513e448</password_sha256_hex>
        <profile>clickhouse_operator</profile>
        <quota>default</quota>
    </clickhouse_operator>
    <John>
        <networks>
            <ip>127.0.0.1</ip>
        </networks>
        <password_sha256_hex>73d1b1b1bc1dabfb97f216d897b7968e44b06457920f00f2dc6c1ed3be25ad4c</password_sha256_hex>
        <profile>default</profile>
    </John>
</users>

User SQL Settings

ClickHouse users can be managed by SQL commands from within ClickHouse. For complete details, see the Clickhouse.tech User Account page.

Access management must be enabled at the user level with the access_management setting. In this example, Access Management is enabled for the user John:

<users>
    <John>
       <access_management>1</access_management>
    </John>
</users>

The typical process for DCL (Data Control Language) queries is to have one user enabled with access_management, then have the other accounts generated through queries. See the ClickHouse.tech Access Control and Account Management page for more details.

Once enabled, Access Management settings can be managed through SQL queries. For example, to create a new user called newJohn with their password set as a sha256 hash and restricted to a specific IP address subnet, the following SQL command can be used:

CREATE USER IF NOT EXISTS newJohn
  IDENTIFIED WITH SHA256_PASSWORD BY 'secret'
  HOST IP '192.168.128.1/24' SETTINGS readonly=1;

Access Management through SQL commands includes the ability to do the following (a short example follows this list):

  • Set roles
  • Apply policies to users
  • Set user quotas
  • Restrict user access to databases, tables, or specific rows within tables.
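
For example, several of these capabilities can be combined by creating a role, granting it limited access, and assigning it to a user. This is a brief sketch only; the role name and the sales database are hypothetical.

CREATE ROLE IF NOT EXISTS sales_reader;
GRANT SELECT ON sales.clients TO sales_reader;  -- limit the role to a single table
GRANT sales_reader TO newJohn;                  -- assign the role to the user created above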

User Network Settings

Users can have their access to the ClickHouse environment restricted by the network they are connecting from. Users can be restricted to only connect from:

  • IP: IP address or netmask.
    • For all IP addresses, use 0.0.0.0/0 for IPv4, ::/0 for IPv6
  • Host: The DNS resolved hostname the user is connecting from.
  • Host Regexp (Regular Expression): A regular expression of the hostname.

Accounts should be restricted to the networks that they connect from when possible.

User Network SQL Settings

User access from specific networks can be set through SQL commands. For complete details, see the Clickhouse.tech Create User page.

Network access is controlled through the HOST option when creating or altering users. Host options include:

  • ANY (default): Users can connect from any location
  • LOCAL: Users can only connect locally.
  • IP: A specific IP address or subnet.
  • NAME: A specific FQDN (Fully Qualified Domain Name)
  • REGEX: Filters hosts that match a regular expression.
  • LIKE: Filters hosts by the LIKE operator.

For example, to restrict the user john to only connect from the local subnet of ‘192.168.0.0/16’:

ALTER USER john
  HOST IP '192.168.0.0/16';

Or to restrict this user to only connecting from the specific host names awesomeplace1.com, awesomeplace2.com, etc:

ALTER USER john
  HOST REGEXP 'awesomeplace[12345].com';

User Network XML Settings

User network settings are stored in the user configuration files under /etc/clickhouse-server/config.d, with the <networks> element controlling the sources that the user can connect from through the following settings:

  • <ip> : IP Address or subnet mask.
  • <host>: Hostname.
  • <host_regexp>: Regular expression of the host name.

For example, the following allows connections only from localhost:

<networks>
    <ip>127.0.0.1</ip>
</networks> 

The following restricts the user to connecting only from the host example.com or from supercool1.com, supercool2.com, etc.:

<networks>
    <host>example.com</host>
    <host_regexp>supercool[1234].com</host_regexp>
</networks> 

If there are hosts or other settings that are applied across multiple accounts, one option is to use the Substitution feature as detailed in the ClickHouse.tech Configuration Files page. For example, in the /etc/metrika.xml file used for substitutions, a local_networks element can be created:

<local_networks>
    <ip>192.168.1.0/24</ip>
</local_networks>

This can then be applied to one or more users with the incl attribute when specifying their network access:

<networks incl="local_networks" replace="replace">
</networks>

Secure Password

Passwords can be stored in plaintext or SHA256 (hex format).

SHA256 format passwords are labeled with the <password_sha256_hex> element. A SHA256 password hash can be generated with either of the following commands:

echo -n "secret" | sha256sum | tr -d '-'

OR:

echo -n "secret" | shasum -a 256 | tr -d '-'
  • IMPORTANT NOTE: The -n option prevents echo from appending a newline, which would change the resulting hash.

For example:

echo -n "clickhouse_operator_password" | shasum -a 256 | tr -d '-'
716b36073a90c6fe1d445ac1af85f4777c5b7a155cea359961826a030513e448

Secure Password SQL Settings

Passwords can be set when using CREATE USER or ALTER USER with the IDENTIFIED WITH option. For complete details, see the ClickHouse.tech Create User page. The following secure password options are available:

  • sha256_password BY ‘STRING’: Converts the submitted STRING value to a sha256 hash.
  • sha256_hash BY ‘HASH’ (best option): Stores the submitted HASH directly as the sha256 hash password value.
  • double_sha1_password BY ‘STRING’ (only used when allowing logins through mysql_port): Converts the submitted STRING value to a double SHA1 hash.
  • double_sha1_hash BY ‘HASH’ (only used when allowing logins through mysql_port): Stores the submitted HASH directly as the double SHA1 hash password value.

For example, to store the sha256 hashed value of “password” for the user John:

ALTER USER John IDENTIFIED WITH sha256_hash BY '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8';

Secure Password XML Settings

Passwords can be set as part of the user’s settings in the user configuration files in /etc/clickhouse-server/config.d. For complete details, see the Clickhouse.tech User Settings.

To set a user’s password with a sha256 hash, use the password_sha256_hex branch for the user. For example, to set the sha256 hashed value of “password” for the user John:

<users>
    <John>
        <password_sha256_hex>5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8</password_sha256_hex>
    </John>
</users>

Set Quotas

Quotas set how many resources can be accessed in a given time, limiting a user’s ability to tie up resources in the system. More details can be found on the ClickHouse.tech Quotas page.

Quota SQL Settings

Quotas can be created or altered through SQL queries, then applied to users.

For more information on ClickHouse quotas, see the ClickHouse.tech Access Control page on Quotas.
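
For example, a quota equivalent to the XML shown in the next section could be created and assigned with a statement like the following. This is a sketch only; the quota name, limits, and target user are illustrative.

CREATE QUOTA IF NOT EXISTS limited
    FOR INTERVAL 1 hour MAX queries = 1000,
    FOR INTERVAL 1 day MAX queries = 10000
    TO John;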

Quota XML Settings

These are defined in the users.xml file under the element quotas. Each branch of the quota element is the name of the quota being defined.

Quotas are set by intervals, each of which can apply different restrictions. For example, this quota named limited has one interval that limits users to 1,000 queries per hour, and another interval that allows a total of 10,000 queries over a 24-hour period.

<quotas>
    <limited>
        <interval>
            <duration>3600</duration>
            <queries>1000</queries>
        </interval>
        <interval>
            <duration>86400</duration>
            <queries>10000</queries>
        </interval>
    </limited>
</quotas>

Use Profiles

Profiles allow a common group of settings to be defined once and applied to multiple users by name. More details on Settings Profiles are available on the ClickHouse.tech site.

Profile XML Settings

Profiles are applied to a user with the profile element. For example, this assigns the restricted profile to the user John:

<users>
    <John>
        <networks>
            <ip>127.0.0.1</ip>
            <ip>0.0.0.0/0</ip>
            <ip>::/0</ip>
        </networks>
        <password_sha256_hex>716b36073a90c6fe1d445ac1af85f4777c5b7a155cea359961826a030513e448</password_sha256_hex>
        <profile>restricted</profile>
    </John>
</users>

Profiles are set in the users.xml file under the profiles element. Each branch of this element is the name of a profile. The profile restricted shown here only allows for eight threads to be used at a time for users with this profile:

<profiles>
    <restricted>
        <!-- The maximum number of threads when running a single query. -->
        <max_threads>8</max_threads>
    </restricted>
</profiles>

Recommended profile settings include the following (a sketch that combines them follows this list):

  • readonly: Applies the profile’s settings to users while preventing the users from changing them.
  • max_execution_time: Limits the amount of time a query can run before being forced to time out.
  • max_bytes_before_external_group_by: The amount of RAM a single GROUP BY can consume before spilling to disk.
  • max_bytes_before_external_sort: The amount of RAM a sort can consume before spilling to disk.
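
Assuming access management is enabled, a comparable profile can also be sketched in SQL with CREATE SETTINGS PROFILE. The profile name, limit values, and target user below are hypothetical; tune them to your own workload.

CREATE SETTINGS PROFILE IF NOT EXISTS restricted_profile
SETTINGS
    readonly = 1,                                     -- read-only queries, no settings changes
    max_execution_time = 600,                         -- seconds before a query is timed out
    max_bytes_before_external_group_by = 10000000000, -- spill GROUP BY to disk past ~10 GB
    max_bytes_before_external_sort = 10000000000      -- spill sorts to disk past ~10 GB
TO John;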

Database Restrictions

Restrict users to the databases they need, and when possible only the tables or rows within tables that they require access to.

Full details are found on the ClickHouse.tech User Settings documentation.
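
If SQL-driven access management is enabled, a similar restriction can be sketched with a grant plus a row policy. The statements below mirror the XML example that follows, using the same hypothetical database, table, and filter.

GRANT SELECT ON sales.clients TO John;

CREATE ROW POLICY IF NOT EXISTS john_filter ON sales.clients
    FOR SELECT USING salesman = 'John' TO John;

Note that once any row policy exists on a table, users that are not covered by some policy on that table will see no rows from it.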

Database Restrictions XML Settings

To restrict a user’s access by data in the XML file:

  1. Update user configuration files in /etc/clickhouse-server/config.d or update their permissions through SQL queries.
  2. For each user to update:
    1. Add the <databases> element with the following branches:
      1. The name of the database to allow access to.
      2. Within the database, the table names allowed to the user.
      3. Within the table, add a <filter> to match rows that fit the filter.

Database Restrictions XML Settings Example

The following restricts the user John to only access the database sales, and from there only the table marked clients where salesman = 'John':

<John>
    <databases>
        <sales>
            <clients>
                <filter>salesman = 'John'</filter>
            </clients>
        </sales>
    </databases>
</John>

Enable Remote Authentication

One issue with user settings is that in a cluster environment, each node requires a separate copy of the user configuration files, which includes a copy of the sha256 hashed password.

One method of reducing the exposure of user passwords, even in a hashed format in a restricted section of the file system, is to use external authentication sources. This prevents password data from being stored in local file systems and allows changes to user authentication to be managed from one source.

Enable LDAP

LDAP servers are defined in the ClickHouse configuration settings such as /etc/clickhouse-server/config.d/ldap.xml. For more details, see the ClickHouse.tech site on Server Configuration settings.

Enabling LDAP server support in ClickHouse allows you to have one authority on login credentials, set password policies, and other essential security considerations through your LDAP server. It also prevents password information being stored on your ClickHouse servers or cluster nodes, even in a SHA256 hashed form.

To add one or more LDAP servers to your ClickHouse environment, each node will require the ldap settings:

<ldap>
    <server>ldapserver_hostname</server>
        <roles>
            <my_local_role1 />
            <my_local_role2 />
        </roles>
</ldap>

When creating users, specify the ldap server for the user:

create user if not exists newUser
    identified with ldap by 'ldapserver_hostname'
    host any;

When the user attempts to authenticate to ClickHouse, their credentials will be verified against the LDAP server specified from the configuration files.

1.1.2 - Network Hardening

Secure network communications with ClickHouse

Hardening the network communications for your ClickHouse environment is about reducing the risk of someone listening in on traffic and using it against you. Network hardening falls under the following major steps:

  • Reduce Exposure

  • Enable TLS

  • Encrypt Cluster Communications

  • IMPORTANT NOTE: Configuration settings can be stored in the default /etc/clickhouse-server/config.xml file. However, this file can be overwritten during vendor upgrades. To preserve configuration settings it is recommended to store them in /etc/clickhouse-server/config.d as separate XML files with the same root element, typically <yandex>. For this guide, we will only refer to the configuration files in /etc/clickhouse-server/config.d for configuration settings.

Reduce Exposure

It’s easier to prevent entry into your system when there are fewer points of access, so unused ports should be disabled.

ClickHouse has native support for MySQL clients, PostgreSQL clients, and others. The enabled ports are set in the /etc/clickhouse-server/config.d files.

To reduce exposure to your ClickHouse environment:

  1. Review which ports are required for communication. A complete list of the ports and configurations can be found on the ClickHouse documentation site for Server Settings.

  2. Comment out any ports not required in the configuration files. For example, if there’s no need for the MySQL client port, then it can be commented out:

    <!-- <mysql_port>9004</mysql_port> -->
    

Enable TLS

ClickHouse allows for both encrypted and unencrypted network communications. To harden network communications, unencrypted ports should be disabled and TLS enabled.

TLS encryption requires a certificate, and whether to use a public or private Certificate Authority (CA) is based on your needs.

  • Public CA: Recommended for external services or connections where you can not control where they will be connecting from.
  • Private CA: Best used when the ClickHouse services are internal only and you can control where hosts are connecting from.
  • Self-signed certificate: Only recommended for testing environments.

Whichever method is used, the following files will be required to enable TLS with ClickHouse:

  • Server X509 Certificate: Default name server.crt
  • Private Key: Default name server.key
  • Diffie-Hellman parameters: Default name dhparam.pem

Generate Files

No matter which approach is used, the Private Key and the Diffie-Hellman parameters file will be required. These instructions may need to be modified to match the requirements of the Certificate Authority used. The instructions below require the use of openssl, and were tested against OpenSSL 1.1.1j.

  1. Generate the private key, and enter the pass phrase when required:

    openssl genrsa -aes256 -out server.key 2048
    
  2. Generate dhparam.pem with 4096-bit Diffie-Hellman parameters. This will take some time but only has to be done once:

    openssl dhparam -out dhparam.pem 4096
    
  3. Create the Certificate Signing Request (CSR) from the generated private key. Complete the requested information such as Country, etc.

    openssl req -new -key server.key -out server.csr
    
  4. Store the files server.key, server.csr, and dhparam.pem in a secure location, typically /etc/clickhouse-server/.

Public CA

Retrieving a certificate from a Public CA or internal CA is performed by registering with a Public CA such as Let's Encrypt or Verisign, or with an internal organizational CA service. This process involves:

  1. Submit the CSR to the CA. The CA will sign the certificate and return it, typically as the file server.crt.
  2. Store the file server.crt in a secure location, typically /etc/clickhouse-server/.

Create a Private CA

If you do not have an internal CA or do not need a Public CA, a private CA can be generated through the following process:

  1. Create the Certificate Private Key:

    openssl genrsa -aes256 -out internalCA.key 2048
    
  2. Create the self-signed root certificate from the certificate key:

    openssl req -new -x509 -days 3650 -key internalCA.key \
        -sha256 -extensions v3_ca -out internalCA.crt
    
  3. Store the Certificate Private Key and the self-signed root certificate in a secure location.

  4. Sign the server.csr file with the self-signed root certificate:

    openssl x509 -sha256 -req -in server.csr -CA internalCA.crt \
        -CAkey internalCA.key -CAcreateserial -out server.crt -days 365
    
  5. Store the file server.crt in a secure location, typically /etc/clickhouse-server/.

Self Signed Certificate

To skip right to making a self-signed certificate, follow these instructions.

  • IMPORTANT NOTE: This is not recommended for production systems, only for testing environments.
  1. With the server.key file from previous steps, create the self-signed certificate. Replace my.host.name with the actual host name used:

    openssl req -subj "/CN=my.host.name" -new -key server.key -out server.crt
    
  2. Store the file server.crt in a secure location, typically /etc/clickhouse-server/.

  3. Each clickhouse-client user that connects to the server with the self-signed certificate will have to allow invalidCertificateHandler by updating their clickhouse-client configuration file, typically /etc/clickhouse-client/config.xml:

    <config>
    <openSSL>
        <client>
            ...
            <invalidCertificateHandler>
                <name>AcceptCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>
    </config>
    

Enable TLS in ClickHouse

Once the files server.crt, server.key, and dhparam.pem have been generated and stored appropriately, update the ClickHouse Server configuration files located at /etc/clickhouse-server/config.d.

To enable TLS and disable unencrypted ports:

  1. Review the /etc/clickhouse-server/config.d files. Comment out unencrypted ports, including http_port and tcp_port:

    <!-- <http_port>8123</http_port> -->
    <!-- <tcp_port>9000</tcp_port> -->
    
  2. Enable encrypted ports. A complete list of ports and settings is available on the ClickHouse documentation site for Server Settings. For example:

    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    
  3. Specify the certificate files to use:

    <openSSL>
        <server>
            <!-- Used for https server AND secure tcp port -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            ...
        </server>
    ...
    </openSSL>
    

Encrypt Cluster Communications

If your organization runs ClickHouse as a cluster, then cluster-to-cluster communications should be encrypted. This includes distributed queries and interservice replication. To harden cluster communications:

  1. Create a user for distributed queries. This user should only be able to connect within the cluster, so restrict its IP access to only the subnet or host names used for the cluster network. For example, assume the cluster hosts are named logos1, logos2, etc. This internal user can be set with or without a password:

    CREATE USER IF NOT EXISTS internal ON CLUSTER 'my_cluster'
        IDENTIFIED WITH NO_PASSWORD
        HOST REGEXP '^logos[1234]$';
    
  2. Enable TLS for interservice replication and comment out the unencrypted interserver port by updating the /etc/clickhouse-server/config.d files:

    <!-- <interserver_http_port>9009</interserver_http_port> -->
    <interserver_https_port>9010</interserver_https_port>
    
  3. Set the interserver_http_credentials in the /etc/clickhouse-server/config.d files, and include the internal username and password:

    <interserver_http_credentials>
        <user>internal</user>
        <password></password>
    </interserver_http_credentials>
    
  4. Enable TLS for distributed queries by editing the file /etc/clickhouse-server/config.d/remote_servers.xml:

    1. For ClickHouse 20.10 and later versions, set a shared secret text and set the port to secure for each shard:
    <remote_servers>
        <my_cluster>
        <shard>
            <secret>shared secret text</secret> <!-- Update here -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>logos1</host> <!-- Update here -->
                <port>9440</port> <!-- Secure Port -->
                <secure>1</secure> <!-- Update here, sets port to secure -->
            </replica>
        </shard>
    ...
    
    2. For previous versions of ClickHouse, set the internal user and enable secure communication:
    <remote_servers>
        <my_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>logos1</host> <!-- Update here -->
                    <port>9440</port> <!-- Secure Port -->
                    <secure>1</secure> <!-- Update here -->
                    <user>internal</user> <!-- Update here -->
                </replica>
            ... 
            </shard>
    ...
    

1.1.3 - Storage Hardening

Secure data stored for ClickHouse

ClickHouse data is ultimately stored on file systems. Keeping that data protected when it is being used or “at rest” is necessary to prevent unauthorized entities from accessing your organization’s private information.

Hardening stored ClickHouse data is split into the following categories:

  • Host-Level Security

  • Volume Level Encryption

  • Column Level Encryption

  • Log File Protection

  • IMPORTANT NOTE: Configuration settings can be stored in the default /etc/clickhouse-server/config.xml file. However, this file can be overwritten during vendor upgrades. To preserve configuration settings it is recommended to store them in /etc/clickhouse-server/config.d as separate XML files with the same root element, typically <yandex>. For this guide, we will only refer to the configuration files in /etc/clickhouse-server/config.d for configuration settings.

Host-Level Security

File-level access to the files that ClickHouse uses to run should be restricted as much as possible.

  • ClickHouse does not require root access to the file system, and runs by default as the user clickhouse.
  • The following directories should be restricted to the minimum number of users:
    • /etc/clickhouse-server: Used for ClickHouse settings and account credentials created by default.
    • /var/lib/clickhouse: Used for ClickHouse data and new credentials.
    • /var/log/clickhouse-server: Log files that may display privileged information through queries. See Log File Protection for more information.

Volume Level Encryption

Encrypting data on the file system prevents unauthorized users who have gained access to the underlying file system from reading the data itself. Depending on your environment, different encryption options may be required.

Cloud Storage

If your ClickHouse database is stored in a cloud service such as AWS or Azure, verify that the cloud supports encrypting the volume. For example, Amazon AWS provides a method to encrypt new Amazon EBS volumes by default.

The Altinity.Cloud service provides the ability to set the Volume Type to gp2-encrypted. For more details, see the Altinity.Cloud Cluster Settings.

Local Storage

For organizations that host ClickHouse clusters on their own managed systems, LUKS is a recommended solution. Instructions for Linux distributions including Red Hat and Ubuntu are available. Check with the distribution your organization uses for instructions on how to encrypt those volumes.

Kubernetes Encryption

If your ClickHouse cluster is managed by Kubernetes, the StorageClass used may be encrypted. For more information, see the Kubernetes Storage Class documentation.

Column Level Encryption

Organizations running ClickHouse 20.11 or later can encrypt individual columns with AES functions. For full information, see the ClickHouse.tech Encryption functions documentation.

Applications are responsible for their own keys. Before enabling column level encryption, test to verify that encryption does not negatively impact performance.

The following functions are available; those marked with * are MySQL AES compatible:

  • encrypt(mode, plaintext, key, [iv, aad])
  • decrypt(mode, ciphertext, key, [iv, aad])
  • aes_encrypt_mysql(mode, plaintext, key, [iv]) *
  • aes_decrypt_mysql(mode, ciphertext, key, [iv]) *

Encryption function arguments:

  • mode (String): Encryption mode.
  • plaintext (String): Text that needs to be encrypted.
  • key (String): Encryption key.
  • iv (String): Initialization vector. Required for -gcm modes, optional for others.
  • aad (String): Additional authenticated data. It isn’t encrypted, but it affects decryption. Works only in -gcm modes; other modes throw an exception.

Column Encryption Examples

This example shows how to encrypt information using a key supplied as a hex string.

  1. Takes a hex value, unhexes it, and stores it as key.
  2. Selects a value, encrypts it with the key, and displays the encrypted value.

WITH unhex('658bb26de6f8a069a3520293a572078f') AS key
SELECT hex(encrypt('aes-128-cbc', 'Hello world', key)) AS encrypted

┌─encrypted────────────────────────┐
│ 46924AC12F4915F2EEF3170B81A1167E │
└──────────────────────────────────┘

This shows how to decrypt encrypted data:

  1. Takes a hex value, unhexes it, and stores it as key.
  2. Decrypts the selected value with the key and returns it as text.

WITH unhex('658bb26de6f8a069a3520293a572078f') AS key
SELECT decrypt('aes-128-cbc', unhex('46924AC12F4915F2EEF3170B81A1167E'), key) AS plaintext

┌─plaintext───┐
│ Hello world │
└─────────────┘

Log File Protection

The great thing about log files is they show what happened. The problem is when they show too much, such as the encryption key used to encrypt or decrypt data:

2021.01.26 19:11:23.526691 [ 1652 ] {4e196dfa-dd65-4cba-983b-d6bb2c3df7c8}
<Debug> executeQuery: (from [::ffff:127.0.0.1]:54536, using production
parser) WITH unhex('658bb26de6f8a069a3520293a572078f') AS key SELECT
decrypt(???), key) AS plaintext

These queries can be hidden through query masking rules, applying regular expressions to replace commands as required. For more information, see the ClickHouse.tech Server Settings documentation.

To prevent certain queries from appearing in log files or to hide sensitive information:

  1. Update the configuration files, located by default in /etc/clickhouse-server/config.d.
  2. Add the element query_masking_rules.
  3. Set each rule with the following:
    1. name: The name of the rule.
    2. regexp: The regular expression to search for.
    3. replace: The replacement value that matches the rule’s regular expression.

For example, the following will hide encryption and decryption functions in the log file:

 <query_masking_rules>
    <rule>
        <name>hide encrypt/decrypt arguments</name>
        <regexp>
           ((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)
        </regexp>
        <!-- or more secure, but also more invasive:
            (aes_\w+)\s*\(.*\)
        -->
        <replace>\1(???)</replace>
    </rule>
</query_masking_rules>

2 - Care and Feeding of Zookeeper with ClickHouse

Installing, configuring, and recovering Zookeeper

ZooKeeper is required for ClickHouse cluster replication. Keeping ZooKeeper properly maintained and fed provides the best performance and reduces the likelihood that your ZooKeeper nodes will become “sick”.

Elements of this guide can also be found on the ClickHouse on Kubernetes Quick Start guide, which details how to use Kubernetes and ZooKeeper with the clickhouse-operator.

2.1 - ZooKeeper Installation and Configuration

How to configure Zookeeper to work best with ClickHouse

Prepare and Start ZooKeeper

Preparation

Before beginning, determine whether ZooKeeper will run in standalone or replicated mode.

  • Standalone mode: One zookeeper server to service the entire ClickHouse cluster. Best for evaluation, development, and testing.
    • Should never be used for production environments.
  • Replicated mode: Multiple zookeeper servers in a group called an ensemble. Replicated mode is recommended for production systems.
    • A minimum of 3 zookeeper servers are required.
    • 3 servers are the optimal setup; with proper tuning they function well even on heavily loaded systems.
    • 5 servers are less likely to lose quorum entirely, but also result in longer quorum acquisition times.
    • Additional servers can be added, but should always be an odd number of servers.

Precautions

The following practices should be avoided:

  • Never deploy even numbers of ZooKeeper servers in an ensemble.
  • Do not install ZooKeeper on ClickHouse nodes.
  • Do not share ZooKeeper with other applications like Kafka.
  • Place the ZooKeeper dataDir and logDir on fast storage that will not be used for anything else.

Applications to Install

Install the following applications in your servers:

  1. zookeeper (3.4.9 or later)
  2. netcat

Configure ZooKeeper

  1. /etc/zookeeper/conf/myid

    The myid file consists of a single line containing only the text of that machine’s id. So myid of server 1 would contain the text “1” and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255.

  2. /etc/zookeeper/conf/zoo.cfg

    Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with a series of lines of the form server.id=host:port:port

    # specify all zookeeper servers
    # The first port is used by followers to connect to the leader
    # The second one is used for leader election
    server.1=zookeeper1:2888:3888
    server.2=zookeeper2:2888:3888
    server.3=zookeeper3:2888:3888
    

    These lines must be the same on every ZooKeeper node

  3. /etc/zookeeper/conf/zoo.cfg

    This setting MUST be added on every ZooKeeper node:

    # The time interval in hours for which the purge task has to be triggered.
    # Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
    autopurge.purgeInterval=1
    autopurge.snapRetainCount=5
    

Install Zookeeper

Depending on your environment, follow the Apache Zookeeper Getting Started guide, or the Zookeeper Administrator's Guide.

Start ZooKeeper

Depending on your installation, start ZooKeeper with the following command:

sudo -u zookeeper /usr/share/zookeeper/bin/zkServer.sh start

Verify ZooKeeper is Running

Use the following commands to verify ZooKeeper is available:

echo ruok | nc localhost 2181
echo mntr | nc localhost 2181
echo stat | nc localhost 2181

Check the following files and directories to verify ZooKeeper is running and making updates:

  • Logs: /var/log/zookeeper/zookeeper.log
  • Snapshots: /var/lib/zookeeper/version-2/

Connect to ZooKeeper

From the localhost, connect to ZooKeeper with the following command to verify access (replace the IP address with your Zookeeper server):

bin/zkCli.sh -server 127.0.0.1:2181

Tune ZooKeeper

The following optional settings can be used depending on your requirements.

Improve Node Communication Reliability

The following settings can be used to improve node communication reliability:

/etc/zookeeper/conf/zoo.cfg
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5

Reduce Snapshots

The following settings will create fewer snapshots which may reduce system requirements.

/etc/zookeeper/conf/zoo.cfg
# To avoid seeks ZooKeeper allocates space in the transaction log file in blocks of preAllocSize kilobytes.
# The default block size is 64M. One reason for changing the size of the blocks is to reduce the block size
# if snapshots are taken more often. (Also, see snapCount).
preAllocSize=65536
# ZooKeeper logs transactions to a transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default snapCount is 10,000.
snapCount=10000

Configuring ClickHouse to use ZooKeeper

Once ZooKeeper has been installed and configured, ClickHouse can be modified to use ZooKeeper. After the following steps are completed, a restart of ClickHouse will be required.

To configure ClickHouse to use ZooKeeper, follow the steps shown below. The recommended settings are located on ClickHouse.tech zookeeper server settings.

  1. Create a configuration file with the list of ZooKeeper nodes. Best practice is to put the file in /etc/clickhouse-server/config.d/zookeeper.xml.

    <yandex>
        <zookeeper>
            <node>
                <host>example1</host>
                <port>2181</port>
            </node>
            <node>
                <host>example2</host>
                <port>2181</port>
            </node>
            <session_timeout_ms>30000</session_timeout_ms>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <!-- Optional. Chroot suffix. Should exist. -->
            <root>/path/to/zookeeper/node</root>
            <!-- Optional. ZooKeeper digest ACL string. -->
            <identity>user:password</identity>
        </zookeeper>
    </yandex>
    
  2. Check the distributed_ddl parameter in config.xml. This parameter can be defined in another configuration file, and the path can be changed to any value that you like. If you have several ClickHouse clusters using the same ZooKeeper ensemble, the distributed_ddl path should be unique for every ClickHouse cluster setup.

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster. -->
    <!-- Works only if ZooKeeper is enabled. Comment it out if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>
    
        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>
    
  3. Check /etc/clickhouse-server/preprocessed/config.xml. You should see your changes there.

  4. Restart ClickHouse. Check ClickHouse connection to ZooKeeper detailed in ZooKeeper Monitoring.

Converting Tables to Replicated Tables

Creating a replicated table

Replicated tables use a replicated table engine, for example ReplicatedMergeTree. The following example shows how to create a simple replicated table.

This example assumes that you have defined appropriate macro values for cluster, shard, and replica in macros.xml to enable cluster replication using zookeeper. For details consult the ClickHouse.tech Data Replication guide.

CREATE TABLE test ON CLUSTER '{cluster}'
(
    timestamp DateTime,
    contractid UInt32,
    userid UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(timestamp)
ORDER BY (contractid, toDate(timestamp), userid)
SAMPLE BY userid;

The ON CLUSTER clause ensures the table will be created on the nodes of {cluster} (a macro value). This example automatically creates a ZooKeeper path for each replicated table that looks like the following:

/clickhouse/tables/{cluster}/{shard}/default/test

becomes:

/clickhouse/tables/c1/0/default/test

You can see ZooKeeper replication data for this node with the following query (updating the path based on your environment):

SELECT *
  FROM system.zookeeper
  WHERE path = '/clickhouse/tables/c1/0/default/test'

Removing a replicated table

To remove a replicated table, use DROP TABLE as shown in the following example. The ON CLUSTER clause ensures the table will be deleted on all nodes. Omit it to delete the table on only a single node.

DROP TABLE test ON CLUSTER '{cluster}';

As each table is deleted, the replica is removed from replication and the information for the replica is cleaned up. When no more replicas exist, all ZooKeeper data for the table will be cleared.

Cleaning up ZooKeeper data for replicated tables

  • IMPORTANT NOTE: Cleaning up ZooKeeper data manually can corrupt replication if you make a mistake. Raise a support ticket and ask for help if you have any doubt concerning the procedure.

Newer ClickHouse versions support SYSTEM DROP REPLICA, which is an easier command.

For example:

SYSTEM DROP REPLICA 'replica_name' FROM ZKPATH '/path/to/table/in/zk';

ZooKeeper data for the table might not be cleared fully if there is an error when deleting the table, or the table becomes corrupted, or the replica is lost. You can clean up ZooKeeper data in this case manually using the ZooKeeper rmr command. Here is the procedure:

  1. Login to ZooKeeper server.
  2. Run zkCli.sh command to connect to the server.
  3. Locate the path to be deleted, e.g.:
    ls /clickhouse/tables/c1/0/default/test
  4. Remove the path recursively, e.g.,
    rmr /clickhouse/tables/c1/0/default/test

2.2 - ZooKeeper Monitoring

Verifying Zookeeper and ClickHouse are working together.

ZooKeeper Monitoring

For organizations that already have Apache ZooKeeper configured either manually or with a Kubernetes operator such as the clickhouse-operator for Kubernetes, monitoring your ZooKeeper nodes will help you catch issues before they affect your cluster.

Checking ClickHouse connection to ZooKeeper

To check connectivity between ClickHouse and ZooKeeper:

  1. Confirm that ClickHouse can connect to ZooKeeper. You should be able to query the system.zookeeper table, and see the path for distributed DDL created in ZooKeeper through that table. If something went wrong, check the ClickHouse logs.

    $ clickhouse-client -q "select * from system.zookeeper where path='/clickhouse/task_queue/'"
    ddl 17183334544    17183334544    2019-02-21 21:18:16    2019-02-21 21:18:16    0    8    0    0    0    8    17183370142    /clickhouse/task_queue/
    
  2. Confirm ZooKeeper accepts connections from ClickHouse. You can also see on ZooKeeper nodes if a connection was established and the IP address of the ClickHouse server in the list of clients:

    $ echo stat | nc localhost 2181
    ZooKeeper version: 3.4.9-3--1, built on Wed, 23 May 2018 22:34:43 +0200
    Clients:
     /10.25.171.52:37384[1](queued=0,recved=1589379,sent=1597897)
     /127.0.0.1:35110[0](queued=0,recved=1,sent=0)
    

ZooKeeper Monitoring Quick List

The following commands are available to verify ZooKeeper availability and highlight potential issues:

Severity is shown in parentheses after each check name, followed by the shell or SQL command:

  • ZooKeeper is available (Critical for writes): select count() from system.zookeeper where path='/'
  • ZooKeeper exceptions (Medium): select value from system.events where event='ZooKeeperHardwareExceptions'
  • Read only tables are unavailable for writes (High): select value from system.metrics where metric='ReadonlyReplica'
  • A data part was lost (High): select value from system.events where event='ReplicatedDataLoss'
  • Data parts are not the same on different replicas (Medium): select value from system.events where event='DataAfterMergeDiffersFromReplica'; select value from system.events where event='DataAfterMutationDiffersFromReplica'

2.3 - ZooKeeper Recovery

How to recover when Zookeeper has issues.

If there are issues with your ZooKeeper environment managing your ClickHouse clusters, the following steps can resolve them. If there are still issues, support is available to current Altinity customers.

Raising a Support Ticket

Altinity accepts support cases from its support partners via the Altinity Zendesk Portal, email, or Slack.

To log a ticket in the support portal:

  1. Login to https://altinity.zendesk.com using your email address.
  2. Press the “Add +” button to open a case.
  3. Enter case topic and details. Please include relevant information such as:
    1. ClickHouse version
    2. Error messages
    3. How to recreate the problem if you know

You can log a ticket by sending the above information to support@altinity.com using your registered email address or to the shared Slack channel if available.

Fault Diagnosis and Remediation

The following procedures can resolve issues.

  • IMPORTANT NOTE: Some procedures shown below may have a degree of risk depending on the underlying problem. These procedures are marked with Call Support and include raising a support ticket as the first step.

Restarting a crashed ClickHouse server

ClickHouse servers are managed by systemd and normally restart following a crash. If a server does not restart automatically, follow these steps:

  1. Access the ClickHouse error log for the failed server at /var/log/clickhouse-server/clickhouse-server.err.log.
  2. Examine the last log entry and look for a stack trace showing the cause of the failure.
  3. If there is a stack trace:
    1. If the problem is obvious, fix the problem and run systemctl restart clickhouse-server to restart. Confirm that the server restarts.
    2. If the problem is not obvious, open an Altinity Support Ticket and provide the error log message.
  4. If there is no stack trace, ClickHouse may have been terminated by the OOM-killer due to excessive memory usage:
    1. Open the most recent syslog file at /var/log/syslog.
    2. Look for OOM-killer messages.
    3. If found, see Handling out-of-memory errors below.
    4. If the problem is not obvious, raise a support ticket and provide a description of the problem.

Replacing a failed cluster node

  1. Ensure the old node is truly offline and will not return.

  2. Create a new node with the same macros.xml definitions as the previous node.

  3. If possible use the same hostname as the failed node.

  4. Copy the metadata folder from a healthy replica.

  5. Set the force_restore_data flag so that ClickHouse wipes out existing ZooKeeper information for the node and replicates all data:

    sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data

  6. Start ClickHouse.

  7. Wait until all tables are replicated. You can check progress using:

    SELECT count(*) FROM system.replication_queue

Replacing a failed zookeeper node

  1. Configure ZooKeeper on a new server.
  2. Use the same hostname and myid as the failed node if possible.
  3. Start ZooKeeper on the new node.
  4. Verify the new node can connect to the ensemble.
  5. If the ZooKeeper environment does not support dynamic configuration changes:
    1. If the new node has a different hostname or myid, modify zoo.cfg on the other nodes of the ensemble and restart them.
    2. ClickHouse’s sessions will be interrupted during this process.
  6. Make changes in ClickHouse configuration files if needed. A restart might be required for the changes to take effect.

Recovering from complete ZooKeeper loss (Call Support)

Complete loss of ZooKeeper is a serious event and should be avoided at all costs by proper ZooKeeper management. Follow this procedure only if you have lost all data in ZooKeeper as it is time-intensive and will cause affected tables to be unavailable.

  1. Raise a support ticket before taking any steps.
  2. Ensure that ZooKeeper is empty and working properly.
  3. Follow the instructions from Recovering from complete metadata loss in ZooKeeper or from the blog post A New Way to Restore ClickHouse After ZooKeeper Metadata Is Lost.
  4. ClickHouse will sync from the healthy table to all other tables.

Read-only tables

Read-only tables occur when ClickHouse cannot access ZooKeeper to record inserts on a replicated table.

  1. Login with clickhouse-client.

  2. Execute the following query to confirm that ClickHouse can connect to ZooKeeper:

    $ clickhouse-client -q "select * from system.zookeeper where path='/'"

  3. This query should return one or more ZNode directories.

  4. Execute the following query to check the state of the table.

    SELECT * from system.replicas where table='table_name'

  5. If there are connectivity problems, check the following.

    1. Ensure the <zookeeper> tag in ClickHouse configuration has the correct ZooKeeper host names and ports.
    2. Ensure that ZooKeeper is running.
    3. Ensure that ZooKeeper is accepting connections. Login to the ZooKeeper host and try to connect using zkCli.sh.