ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • [Hive Metastore] DBS 테이블 알아보기
    Hive 2024. 10. 1. 15:33
    반응형

    - 목차

     

    Hive Metastore Database 조회/생성 .

    아래의 명령어들을 통해서 Hive Metastore 데이터베이스을 조회 및 생성할 수 있습니다.

    ANSI SQL 의 Database 관련 쿼리와 동일하게 사용하실 수 있습니다.

     

    spark.sql("show databases").show()
    spark.sql("CREATE DATABASE my_db LOCATION 'hdfs://namenode:8020/data/my_db';")

     

     

     

    저는 Hive Metastore 의 외부 데이터베이스로 MySQL 을 사용하였고, 아래와 같은 순서로 Spark - Hive - MySQL 이 연결되게 됩니다.

    Spark -> Hive -> MySQL

     

    실제 MySQL 내부에서 databases 목록을 조회하게 되면, 아래와 같은 구조를 확인할 수 있습니다.

    저는 metastore 라는 MySQL 데이터베이스를 사전에 생성해두었고,

    show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | metastore          |
    | mysql              |
    | performance_schema |
    | sys                |
    +--------------------+
    5 rows in set (0.00 sec)

     

    Hive 의 schematool 초기화를 통해서 아래와 같은 테이블 목록들을 생성하였습니다.

     

    show tables;
    +-------------------------------+
    | Tables_in_metastore           |
    +-------------------------------+
    | AUX_TABLE                     |
    | BUCKETING_COLS                |
    | CDS                           |
    | COLUMNS_V2                    |
    | COMPACTION_METRICS_CACHE      |
    | COMPACTION_QUEUE              |
    | COMPLETED_COMPACTIONS         |
    | COMPLETED_TXN_COMPONENTS      |
    | CTLGS                         |
    | DATABASE_PARAMS               |
    | DATACONNECTORS                |
    | DATACONNECTOR_PARAMS          |
    | DBS                           |
    | DB_PRIVS                      |
    | DC_PRIVS                      |
    // .. 생략

     

     

    또한 DBS 데이터베이스는 생성된 Hive Database 의 메타데이터들이 보관되는 아래와 같이 저장됩니다.

    select * from DBS\G;
    *************************** 1. row ***************************
                      DB_ID: 1
                       DESC: Default Hive database
            DB_LOCATION_URI: file:/user/hive/warehouse
                       NAME: default
                 OWNER_NAME: public
                 OWNER_TYPE: ROLE
                  CTLG_NAME: hive
                CREATE_TIME: 1617268874
    DB_MANAGED_LOCATION_URI: NULL
                       TYPE: NATIVE
         DATACONNECTOR_NAME: NULL
              REMOTE_DBNAME: NULL
    *************************** 2. row ***************************
                      DB_ID: 2
                       DESC:
            DB_LOCATION_URI: hdfs://namenode:8020/data/my_db
                       NAME: my_db
                 OWNER_NAME: spark
                 OWNER_TYPE: USER
                  CTLG_NAME: hive
                CREATE_TIME: 1617268955
    DB_MANAGED_LOCATION_URI: NULL
                       TYPE: NATIVE
         DATACONNECTOR_NAME: NULL
              REMOTE_DBNAME: NULL
    2 rows in set (0.00 sec)

     

     

     

    유용한 명령어.

    MySQL Driver 을 가지는 Hive Docker Image 만들기.

    mkdir -p ./dockers
    
    cat <<EOF> ./dockers/hive-docker
    FROM apache/hive:4.0.1
    USER root
    
    RUN apt-get update && apt-get install -y wget
    RUN wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar \
        -O /opt/hive/lib/mysql-connector-java-8.0.30.jar
    
    USER hive
    
    EOF
    
    docker build -t hive-mysql:4.0.1 -f ./dockers/hive-docker .

     

    MySQL & Hive Metastore 실행.

    docker network create hive
    
    cat <<EOF> user.sql
    CREATE DATABASE metastore;
    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive123';
    GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
    FLUSH PRIVILEGES;
    EOF
    
    docker run -d --network hive --name mysql --hostname mysql -e MYSQL_ROOT_PASSWORD=1234 -v ./user.sql:/docker-entrypoint-initdb.d/user.sql mysql:8.0.30
    
    mkdir -p $(pwd)/conf
    cat <<EOF> $(pwd)/conf/hive-site.xml
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://mysql:3306/metastore?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive123</value>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive:9083</value>
      </property>
      <property>
        <name>hive.metastore.authorization.storage.checks</name>
        <value>true</value>
      </property>
    </configuration>
    EOF
    
    docker run -it --rm --network hive \
    -v $(pwd)/conf/hive-site.xml:/opt/hive/conf/hive-site.xml \
    --entrypoint /bin/bash \
    hive-mysql:4.0.1 -c "schematool -dbType mysql -initSchema --verbose"
    
    
    docker run -d --env SERVICE_NAME=metastore --name hive \
    --network hive --hostname hive \
    -v $(pwd)/conf/hive-site.xml:/opt/hive/conf/hive-site.xml \
    hive-mysql:4.0.1 hive --service metastore

     

    Hadoop 실행.

    mkdir -p ./conf
    cat <<EOF> ./conf/core-site.xml
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:8020</value>
        <description>HDFS NameNode 주소 설정</description>
      </property>
      <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
      </property>
    </configuration>
    EOF
    
    cat <<EOF> ./conf/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/namenode</value>
        <description>Block Metadata files</description>
      </property>  
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/datanode</value>
        <description>Block files</description>
      </property>  
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
    </configuration>
    EOF
    
    docker run -d --network hive --name namenode --hostname namenode \
    -v ./conf/core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml \
    -v ./conf/hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml \
    -p 9870:9870 \
    apache/hadoop:3.4.1 \
    sh -c '[ ! -d /data/hadoop/hdfs/namenode ] && echo start namenode formating && hdfs namenode -format; hdfs namenode'
    
    docker run -d --network hive --name datanode1 --hostname datanode1 \
    -v ./conf/core-site.xml:/opt/hadoop/etc/hadoop/core-site.xml \
    -v ./conf/hdfs-site.xml:/opt/hadoop/etc/hadoop/hdfs-site.xml \
    apache/hadoop:3.4.1 hdfs datanode

     

    spark user 권한 설정 및 디렉토리 생성.

    docker run --platform linux/amd64 -it --rm --network hive --user hadoop apache/hadoop:3.4.1 sh -c '
      whoami && \
      hdfs dfs -mkdir -p hdfs://namenode:8020/data/my_db && \
      hdfs dfs -chown spark:spark hdfs://namenode:8020/data/my_db
    '

     

    Spark 실행.

    docker run -d --name master --hostname master --network hive -p 8080:8080 -p 7077:7077 bitnami/spark:3.3.2 start-master.sh
    docker run -d --name worker1 --hostname worker1 --cpus 1 --memory 512M --network hive \
    -v ./conf/hive-site.xml:/opt/bitnami/spark/conf/hive-site.xml \
    bitnami/spark:3.3.2 start-worker.sh spark://master:7077
    docker run -d --name worker2 --hostname worker2 --cpus 1 --memory 512M --network hive \
    -v ./conf/hive-site.xml:/opt/bitnami/spark/conf/hive-site.xml \
    bitnami/spark:3.3.2 start-worker.sh spark://master:7077
    
    docker run -it --rm --name client --network hive -p 4040:4040 \
    -v ./conf/hive-site.xml:/opt/bitnami/spark/conf/hive-site.xml \
    bitnami/spark:3.3.2 spark-shell --master spark://master:7077 --conf spark.ui.port=4040

     

     

    반응형
Designed by Tistory.