  • [ClickHouse] Understanding Distributed Tables
    Database/Clickhouse 2024. 3. 18. 07:04


     

    Introduction.

    In this post, we take a look at ClickHouse's Distributed Table.

    First, we spin up a ClickHouse cluster with 3 shards, each holding 2 replicas, using Docker Compose.

     

    The config.xml configuration file is shown below.

    It defines 3 shards, and each shard has 2 replicas.

    In addition, 3 ZooKeeper nodes act as the coordination and state store.

     

    cat <<'EOF'> /tmp/config.xml
    <yandex>
        <zookeeper>
            <node index="1">
                <host>zookeeper1</host>
                <port>2181</port>
            </node>
            <node index="2">
                <host>zookeeper2</host>
                <port>2181</port>
            </node>
            <node index="3">
                <host>zookeeper3</host>
                <port>2181</port>
            </node>
        </zookeeper>
    
        <remote_servers>
            <clickhouse_cluster>
                <shard>
                    <replica>
                        <host>clickhouse_shard1_replica1</host>
                        <port>9000</port>
                    </replica>
                    <replica>
                        <host>clickhouse_shard1_replica2</host>
                        <port>9000</port>
                    </replica>
                </shard>
                <shard>
                    <replica>
                        <host>clickhouse_shard2_replica1</host>
                        <port>9000</port>
                    </replica>
                    <replica>
                        <host>clickhouse_shard2_replica2</host>
                        <port>9000</port>
                    </replica>
                </shard>
                <shard>
                    <replica>
                        <host>clickhouse_shard3_replica1</host>
                        <port>9000</port>
                    </replica>
                    <replica>
                        <host>clickhouse_shard3_replica2</host>
                        <port>9000</port>
                    </replica>
                </shard>
            </clickhouse_cluster>
        </remote_servers>
    
        <profiles>
            <default>
                <replication_alter_partitions_sync>2</replication_alter_partitions_sync>
                <load_balancing>random</load_balancing>
            </default>
        </profiles>
    </yandex>
    EOF

     

    Next, we create a total of 6 macro config files.

    Each one is added to its ClickHouse replica so that ReplicatedMergeTree tables can be created there.

     

    cat <<'EOF'> /tmp/macro1.xml
    <yandex>
      <macros>
        <shard>1</shard>
        <replica>1</replica>
      </macros>
    </yandex>
    EOF
    
    cat <<'EOF'> /tmp/macro2.xml
    <yandex>
      <macros>
        <shard>1</shard>
        <replica>2</replica>
      </macros>
    </yandex>
    EOF
    
    cat <<'EOF'> /tmp/macro3.xml
    <yandex>
      <macros>
        <shard>2</shard>
        <replica>1</replica>
      </macros>
    </yandex>
    EOF
    
    cat <<'EOF'> /tmp/macro4.xml
    <yandex>
      <macros>
        <shard>2</shard>
        <replica>2</replica>
      </macros>
    </yandex>
    EOF
    
    cat <<'EOF'> /tmp/macro5.xml
    <yandex>
      <macros>
        <shard>3</shard>
        <replica>1</replica>
      </macros>
    </yandex>
    EOF
    
    cat <<'EOF'> /tmp/macro6.xml
    <yandex>
      <macros>
        <shard>3</shard>
        <replica>2</replica>
      </macros>
    </yandex>
    EOF

     

    Below is the docker-compose.yaml file that runs the 6 ClickHouse servers and 3 ZooKeeper nodes.

     

    cat <<'EOF'> /tmp/ch.yaml
    version: '3.9'
    
    services:
      clickhouse_shard1_replica1:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard1_replica1
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard1_replica1:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro1.xml:/etc/clickhouse-server/config.d/macro.xml
    
      clickhouse_shard1_replica2:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard1_replica2
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard1_replica2:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro2.xml:/etc/clickhouse-server/config.d/macro.xml
    
      clickhouse_shard2_replica1:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard2_replica1
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard2_replica1:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro3.xml:/etc/clickhouse-server/config.d/macro.xml
    
      clickhouse_shard2_replica2:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard2_replica2
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard2_replica2:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro4.xml:/etc/clickhouse-server/config.d/macro.xml
    
      clickhouse_shard3_replica1:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard3_replica1
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard3_replica1:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro5.xml:/etc/clickhouse-server/config.d/macro.xml
    
      clickhouse_shard3_replica2:
        image: altinity/clickhouse-server:22.8.15.25.altinitystable
        container_name: clickhouse_shard3_replica2
        platform: linux/amd64
        networks:
          - clickhouse_network
        volumes:
          - ch_vol_shard3_replica2:/var/lib/clickhouse
          - /tmp/config.xml:/etc/clickhouse-server/config.d/config.xml
          - /tmp/macro6.xml:/etc/clickhouse-server/config.d/macro.xml
    
      zookeeper1:
        image: zookeeper:3.6.1
        container_name: zookeeper1
        hostname: zookeeper1
        platform: linux/amd64
        volumes:
          - zoo_vol1:/data
        networks:
          - clickhouse_network
        environment:
          ZOO_MY_ID: 1
          ZOO_SERVERS: "server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181"
    
      zookeeper2:
        image: zookeeper:3.6.1
        container_name: zookeeper2
        hostname: zookeeper2
        platform: linux/amd64
        volumes:
          - zoo_vol2:/data
        networks:
          - clickhouse_network
        environment:
          ZOO_MY_ID: 2
          ZOO_SERVERS: "server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181"
    
      zookeeper3:
        image: zookeeper:3.6.1
        container_name: zookeeper3
        hostname: zookeeper3
        platform: linux/amd64
        volumes:
          - zoo_vol3:/data
        networks:
          - clickhouse_network
        environment:
          ZOO_MY_ID: 3
          ZOO_SERVERS: "server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181"
    
    networks:
      clickhouse_network:
        driver: bridge
    
    volumes:
      ch_vol_shard1_replica1:
      ch_vol_shard1_replica2:
      ch_vol_shard2_replica1:
      ch_vol_shard2_replica2:
      ch_vol_shard3_replica1:
      ch_vol_shard3_replica2:
      zoo_vol1:
      zoo_vol2:
      zoo_vol3:
    
    EOF

     

    docker-compose -f /tmp/ch.yaml -p clickhouse up -d

     

    After running the command above, 6 ClickHouse containers and 3 ZooKeeper containers are created.
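
    One quick way to check them yourself is to filter docker ps by the container names defined in the Compose file (a minimal sanity check; it assumes the containers came up successfully):

    docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'clickhouse|zookeeper'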

     

     

    The query results below were run on clickhouse_shard1_replica1.

    This node carries the macros for shard 1 / replica 1, and all 6 replicas together make up the ClickHouse cluster.

     

    select * from system.macros;
    
    ┌─macro───┬─substitution─┐
    │ replica │ 1            │
    │ shard   │ 1            │
    └─────────┴──────────────┘
    
    
    select cluster, shard_num, replica_num, host_name 
    from system.clusters where cluster = 'clickhouse_cluster';
    
    
    ┌─cluster────────────┬─shard_num─┬─replica_num─┬─host_name──────────────────┐
    │ clickhouse_cluster │         1 │           1 │ clickhouse_shard1_replica1 │
    │ clickhouse_cluster │         1 │           2 │ clickhouse_shard1_replica2 │
    │ clickhouse_cluster │         2 │           1 │ clickhouse_shard2_replica1 │
    │ clickhouse_cluster │         2 │           2 │ clickhouse_shard2_replica2 │
    │ clickhouse_cluster │         3 │           1 │ clickhouse_shard3_replica1 │
    │ clickhouse_cluster │         3 │           2 │ clickhouse_shard3_replica2 │
    └────────────────────┴───────────┴─────────────┴────────────────────────────┘

     

     

    Creating the Distributed Table and the ReplicatedMergeTree Table.

     

    We create a ReplicatedMergeTree table and a Distributed table on all 6 replica servers.

    The queries to create them are shown below.

     

    CREATE TABLE shard_local_table ON CLUSTER 'clickhouse_cluster'
    (
        id UInt64,
        name String,
        value Float64,
        created_at DateTime
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/shard_local_table', '{replica}')
    PARTITION BY toYYYYMM(created_at)
    ORDER BY id
    SETTINGS index_granularity = 8192;
    
    CREATE TABLE distributed_table ON CLUSTER 'clickhouse_cluster'
    (
        id UInt64,
        name String,
        value Float64,
        created_at DateTime
    )
    ENGINE = Distributed('clickhouse_cluster', 'default', 'shard_local_table', rand());

     

    After running the queries above, the tables are created on every replica.
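
    One way to confirm this is to query system.replicas on any node. The zookeeper_path and replica_name columns also show how the {shard} and {replica} macros were expanded; on clickhouse_shard1_replica1 the path should come out as /clickhouse/tables/1/shard_local_table with replica_name 1. A minimal sketch:

    docker exec clickhouse_shard1_replica1 \
    clickhouse-client --query="SELECT database, table, zookeeper_path, replica_name
    FROM system.replicas WHERE table = 'shard_local_table';"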

     

     

     

    Inserting Rows.

    Run the INSERT query below to generate 10,000 rows.

    As a rough prediction, each shard should receive about 1/3 of the rows, and the replicas inside each shard will hold identical, replicated data.

     

    docker exec clickhouse_shard3_replica2 \
    clickhouse-client --query="INSERT INTO distributed_table
    SELECT
        number AS id,
        concat('name_', toString(number)) AS name,
        rand() % 100 AS value,
        now() - INTERVAL rand() % 100 DAY AS created_at
    FROM numbers(10000);"

     

    As the results show, each shard's shard_local_table ended up with roughly 1/3 of the data.

    And replicas within the same shard hold the same number of rows.
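
    A simple way to reproduce this check is to count the rows of shard_local_table on every container, and then the total through distributed_table. This is a sketch; since the sharding key is rand(), the per-shard counts will only be approximately 10000 / 3.

    for host in clickhouse_shard1_replica1 clickhouse_shard1_replica2 \
                clickhouse_shard2_replica1 clickhouse_shard2_replica2 \
                clickhouse_shard3_replica1 clickhouse_shard3_replica2; do
      echo -n "$host: "
      docker exec "$host" clickhouse-client --query="SELECT count() FROM shard_local_table;"
    done

    # the Distributed table reads one replica per shard, so the total should be 10000
    docker exec clickhouse_shard1_replica1 \
    clickhouse-client --query="SELECT count() FROM distributed_table;"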

     

     

     
