# Setup ClickHouse cluster with data replication

## Prerequisites

1. ClickHouse operator installed
2. Zookeeper installed
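
Before applying the manifest, it may be worth confirming both prerequisites are up. A quick check could look like this (the `zoo1ns` namespace matches the Zookeeper host used in the manifest below; the operator's namespace depends on how it was installed):

```bash
# The operator pod should be running (its namespace depends on your install)
kubectl get pods --all-namespaces | grep clickhouse-operator

# Zookeeper should be reachable in the zoo1ns namespace referenced below
kubectl get pods -n zoo1ns
```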

## Manifest

Let's take a look at an example that creates a cluster with 2 shards, 2 replicas, and persistent storage.

```yaml
apiVersion: "clickhouse.radondb.com/v1"
kind: "ClickHouseInstallation"

metadata:
  name: "repl-05"

spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse:19.6

  configuration:
    zookeeper:
      nodes:
        - host: zookeeper.zoo1ns
    clusters:
      - name: replicated
        layout:
          shardsCount: 2
          replicasCount: 2

  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Mi
    podTemplates:
      - name: clickhouse:19.6
        spec:
          containers:
            - name: clickhouse-pod
              image: yandex/clickhouse-server:19.6.2.11
```
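
To deploy the cluster, save the manifest and apply it (the file name `repl-05.yaml` here is just an example); the operator should then create 4 pods, one per replica of each shard:

```bash
kubectl apply -f repl-05.yaml

# 2 shards x 2 replicas = 4 ClickHouse pods
kubectl get pods | grep repl-05
```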

## Replicated table setup

### Macros

The operator provides a set of macros:

1. `{installation}` -- ClickHouse Installation name
2. `{cluster}` -- primary cluster name
3. `{replica}` -- replica name in the cluster, maps to the pod service name
4. `{shard}` -- shard id

ClickHouse also supports the internal macros `{database}` and `{table}`, which map to the current database and table respectively.
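
To see the actual values the operator has configured on a particular replica, you can query the built-in `system.macros` table from that replica:

```sql
-- Lists every configured macro and its substitution on this replica
SELECT macro, substitution FROM system.macros;
```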

### Create replicated table

Now we can create a replicated table, using the macros above:

```sql
CREATE TABLE events_local ON CLUSTER '{cluster}' (
    event_date  Date,
    event_type  Int32,
    article_id  Int32,
    title       String
) ENGINE = ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, article_id);

CREATE TABLE events ON CLUSTER '{cluster}' AS events_local
ENGINE = Distributed('{cluster}', default, events_local, rand());
```
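
Once the tables are created, one way to confirm that each replica has registered itself in Zookeeper is to inspect `system.replicas` (a quick sanity check, run on any replica):

```sql
-- Each replica of events_local should report its Zookeeper path and name
SELECT database, table, replica_name, zookeeper_path
FROM system.replicas
WHERE table = 'events_local';
```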

We can generate some data:

```sql
INSERT INTO events SELECT today(), rand()%3, number, 'my title' FROM numbers(100);
```

And check how the data is distributed over the cluster. A `SELECT` from `events` goes through the Distributed engine and counts rows across all shards, while `events_local` counts only the rows stored on the replica you are connected to:

```sql
SELECT count() FROM events;
SELECT count() FROM events_local;
```
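
To break the count down per shard, `hostName()` can be grouped over the Distributed table, since it is evaluated remotely on each shard that serves the query (a sketch):

```sql
-- Row counts per shard; hostName() runs on each remote server
SELECT hostName() AS host, count() AS cnt
FROM events
GROUP BY host;
```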