Mastering Data Resilience: ClickHouse Backup and Restore Best Practices

Sairam Krish
3 min read · Sep 10, 2023


ClickHouse is a powerful columnar database designed for high-performance OLAP (Online Analytical Processing) workloads. In this article, we’ll explore essential backup and restore strategies in a ClickHouse environment, particularly in Kubernetes-based, self-managed ClickHouse clusters.

Introduction

ClickHouse’s exceptional analytical capabilities make it an ideal choice for organizations dealing with large volumes of data. However, ensuring data reliability and recoverability is crucial. This article delves into backup and restore strategies to safeguard your ClickHouse data.

The Importance of Backup and Restore

Efficient backup and restore strategies are crucial for preventing data loss, handling system failures, and ensuring data recovery in a ClickHouse environment. Let’s explore the different approaches to achieve this.

ClickHouse Backup Options

Backup Schema Only

BACKUP DATABASE default
ON CLUSTER 'cluster-name'
TO S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24'
)
SETTINGS structure_only=true;
  • The SQL command above backs up the schema alone to the given S3 location (a way to monitor such a backup is sketched after this list).
  • The same S3 function works for Google Cloud Storage (GCS) as well; its API is S3-compatible.
  • In the case of S3, ClickHouse picks up AWS credentials from the pod it runs in.
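For large databases, the BACKUP statement can run for a while. A minimal sketch for monitoring it, assuming a reasonably recent ClickHouse version in which the ASYNC clause and the system.backups table are available:

-- run the backup in the background instead of blocking the client
BACKUP DATABASE default
ON CLUSTER 'cluster-name'
TO S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24'
)
SETTINGS structure_only=true
ASYNC;

-- poll until status reaches BACKUP_CREATED (or BACKUP_FAILED)
SELECT id, name, status, error
FROM system.backups;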

Compression and Limitations

BACKUP DATABASE default
ON CLUSTER 'cluster-name'
TO S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24.zip'
)
SETTINGS compression_method='lzma', compression_level=3, structure_only=true;

This may error out with:

Received exception from server (version 23.6.2):
Code: 344. DB::Exception: Received from localhost:9000. DB::Exception: Got error from chi%2Dlyric%2Dlyric%2D2%2D1:9000. DB::Exception: Using archives with backups on clusters is disabled. (SUPPORT_IS_DISABLED)

Findings:

  • compression_method is not supported with an S3-based backup-and-restore flow on a cluster, as shown above; a single-node workaround is sketched after the restore example below.
-- without compression
-- works fine
BACKUP DATABASE default
ON CLUSTER 'cluster-name'
TO S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24'
)
SETTINGS structure_only=true;

-- Drop before restore
DROP DATABASE IF EXISTS default_restored ON CLUSTER 'cluster-name' SYNC;

-- restore
RESTORE DATABASE default AS default_restored
ON CLUSTER 'cluster-name'
FROM S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24'
);
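The SUPPORT_IS_DISABLED error above is specific to ON CLUSTER backups: archives (and hence the compression settings) are rejected in that mode. As a hedged sketch, on a single node the same settings are accepted when backing up to an archive on a local backup disk, assuming a disk named 'backups' is configured and allowed for backups in the server configuration:

-- single-node backup (no ON CLUSTER) into a compressed zip archive;
-- assumes a disk named 'backups' is configured and allowed for backups
BACKUP DATABASE default
TO Disk('backups', 'backup-2023_08_24.zip')
SETTINGS compression_method='lzma', compression_level=3;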

Troubleshooting Shard Issues

Backup procedures can be blocked by inconsistencies between replicas. If a data part on one replica differs from the corresponding part on another (a checksum mismatch), the ClickHouse BACKUP statement refuses to run. A schema-only backup still succeeds, but a backup that includes data may fail until the replicas agree; a consistency check is sketched after the error message below.

# simplified error message
DB::Exception:
Table xxx on replica xxxx has part different from the part on replica yyyy
(checksum mismatch). (CANNOT_BACKUP_TABLE)

# actual error message may look like this
Table default.`test_local` on replica chi-xxx-yyy-2-1 has part all_0_5_1
different from the part on replica chi-xxx-yyy-1-0
(checksum '458fcabc38e861c2db258fe4dc6a412b' on replica chi-xxx-yyy-2-1
!= checksum '65908f2583cad89f528bbfc4893dc779' on replica chi-xxx-yyy-1-0).
(CANNOT_BACKUP_TABLE)
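A minimal sketch for investigating such a mismatch, assuming the cluster is named 'cluster-name' and the affected table is default.test_local (both placeholders): compare active part checksums across replicas, then ask a lagging replica to catch up before retrying the backup.

-- compare part checksums across all replicas of the cluster
SELECT
    hostName() AS host,
    table,
    name AS part_name,
    hash_of_all_files
FROM clusterAllReplicas('cluster-name', system.parts)
WHERE database = 'default' AND active
ORDER BY table, part_name, host;

-- bring a lagging replica up to date before retrying the backup
SYSTEM SYNC REPLICA default.test_local;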

Best Practices for ClickHouse Backup

To ensure a robust backup and restore strategy, consider the following best practices:

  • Schedule regular backups to minimize data loss in case of failures.
  • Separate schema and data backups to facilitate more granular restoration.
  • Version your backups to track changes over time (an incremental-backup sketch follows this list).
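On the last point, ClickHouse backups can reference an earlier backup as a base, so each new backup stores only the parts that changed. A hedged sketch, reusing the bucket layout from the earlier examples (the dates and paths are assumptions):

-- incremental backup: only parts absent from the base backup are uploaded
BACKUP DATABASE default
ON CLUSTER 'cluster-name'
TO S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_09_01'
)
SETTINGS base_backup = S3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_08_24'
);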

Data Portability and Schema Design

To enhance data portability and ease of restoration, consider the following:

Simplify ZooKeeper Paths

Instead of declaring the ZooKeeper path in every table, maintain it in the server configuration. This simplifies schema design and makes the schema more portable across clusters:

-- instead of this
CREATE TABLE table_name (
x UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}')
ORDER BY x;

-- let's do this
CREATE TABLE table_name (
x UInt32
) ENGINE = ReplicatedMergeTree
ORDER BY x;

and define the ZooKeeper path once in the server configuration:

<default_replica_path>/clickhouse/tables/{shard}/{database}/{table}</default_replica_path>
<default_replica_name>{replica}</default_replica_name>
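The {shard} and {replica} placeholders above are expanded from macros that each server defines; Kubernetes operators such as the Altinity clickhouse-operator typically generate these per replica. A minimal hand-written example with placeholder values:

<macros>
    <shard>01</shard>
    <replica>replica-01</replica>
</macros>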

Export and Import Tables to AWS S3 in Parquet Format

Export your data as Parquet for improved portability and compatibility with various analytics tools:

-- export as parquet
INSERT INTO FUNCTION
s3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_09_02_students.parquet',
'Parquet'
)
SELECT * FROM students;

-- import from parquet
-- assumes the students table already exists in the destination
INSERT INTO students
SELECT * FROM
s3(
'https://bucket-name.s3.amazonaws.com/clickhouse_backup/backup-2023_09_02_students.parquet',
'Parquet'
);
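The import above assumes the destination table already exists. A hypothetical definition for the students table used in these examples (the columns are placeholders, not taken from the original data):

-- hypothetical schema; adjust the columns to match the exported data
CREATE TABLE students (
    id UInt32,
    name String
) ENGINE = MergeTree
ORDER BY id;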

Conclusion

A well-planned backup and restore strategy is vital for maintaining data integrity and reliability in a ClickHouse environment. By implementing the practices and techniques outlined in this article, you can ensure the safety of your data and streamline the recovery process.
