Connecting Hive & Hadoop
Hive | 2023. 12. 17. 08:15
Introduction.
In this post, we connect Hive with Hadoop and look at how the two applications communicate with each other.
Everything runs in a Docker container environment.
Connecting Hive & Hadoop.
1. Create the Hadoop config file.
Hadoop is configured primarily through XML files such as core-site.xml and hdfs-site.xml.
The command below writes this configuration as the hadoop_config file.
cat <<EOF> /tmp/hadoop_config
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
EOF
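For context, the apache/hadoop image expands entries of the form FILE-NAME.XML_property=value into the matching *-site.xml configuration file. The snippet below is my own minimal sketch of that expansion, not the image's actual entrypoint; emit_site_xml and the sample file are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: turn lines such as CORE-SITE.XML_fs.defaultFS=hdfs://namenode
# into <property> entries of the corresponding *-site.xml file.
sample=$(mktemp)
cat <<'EOF' > "$sample"
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.replication=1
EOF

emit_site_xml() {   # $1 = prefix (e.g. CORE-SITE.XML), $2 = input file, $3 = output file
  prefix="$1"; in="$2"; out="$3"
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    grep "^${prefix}_" "$in" | while read -r line; do
      kv="${line#${prefix}_}"   # strip the file prefix
      name="${kv%%=*}"          # property name: text before the first '='
      value="${kv#*=}"          # property value: everything after the first '='
      printf '  <property><name>%s</name><value>%s</value></property>\n' "$name" "$value"
    done
    echo '</configuration>'
  } > "$out"
}

emit_site_xml CORE-SITE.XML "$sample" /tmp/core-site.xml
cat /tmp/core-site.xml
```

Only the CORE-SITE.XML lines land in core-site.xml; the HDFS-SITE.XML line would go to hdfs-site.xml in a separate call.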
2. Create the hive-site.xml file.
cat <<EOF> /tmp/hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.tez.exec.inplace.progress</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/opt/hive/scratch_dir</value>
  </property>
  <property>
    <name>hive.user.install.directory</name>
    <value>/opt/hive/install_dir</value>
  </property>
  <property>
    <name>tez.runtime.optimize.local.fetch</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.submit.local.task.via.child</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>tez.local.mode</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/hive/data/warehouse</value>
  </property>
  <property>
    <name>metastore.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>
EOF
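Because the file is written through a heredoc, a truncated paste silently produces broken XML and a confusing HiveServer2 startup failure. A crude sanity check I find useful (check_site_xml is a hypothetical helper name, and it only balances property tags, nothing more):

```shell
#!/bin/sh
# Crude well-formedness check: the count of lines opening a <property>
# must equal the count of lines closing one; a mismatch usually means
# the heredoc got truncated.
check_site_xml() {
  opens=$(grep -c '<property>' "$1")
  closes=$(grep -c '</property>' "$1")
  if [ "$opens" -eq "$closes" ]; then
    echo "balanced: $opens properties"
  else
    echo "MISMATCH: $opens open vs $closes close"
  fi
}

# demonstrate on a tiny sample file
sample=$(mktemp)
cat <<'EOF' > "$sample"
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
EOF
check_site_xml "$sample"   # → balanced: 1 properties
```

After step 2, running check_site_xml /tmp/hive-site.xml should likewise report the tags balanced.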
3. Create the Hadoop docker-compose.yaml.
The command below creates docker-compose.yaml.
The yaml file is placed in the /tmp directory.
It starts one each of NameNode, DataNode, HiveServer, ResourceManager, and NodeManager.
cat <<EOF> /tmp/hive-docker-compose.yaml
version: "2"
services:
  namenode:
    platform: linux/amd64
    container_name: namenode
    image: apache/hadoop:3
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - 9870:9870
      - 8020:8020
    env_file:
      - /tmp/hadoop_config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
    networks:
      - hadoop_network
  datanode:
    platform: linux/amd64
    container_name: datanode
    depends_on:
      - namenode
    links:
      - namenode:namenode
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
    env_file:
      - /tmp/hadoop_config
    networks:
      - hadoop_network
  hive:
    platform: linux/amd64
    container_name: hive
    depends_on:
      - namenode
      - datanode
      - nodemanager
      - resourcemanager
    image: apache/hive:3.1.3
    ports:
      - 10000:10000
      - 10002:10002
    volumes:
      - type: bind
        source: '/tmp/hive-site.xml'
        target: '/opt/hive/conf/hive-site.xml'
    environment:
      SERVICE_NAME: hiveserver2
    networks:
      - hadoop_network
  resourcemanager:
    platform: linux/amd64
    image: apache/hadoop:3
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - /tmp/hadoop_config
    volumes:
      - ./test.sh:/opt/test.sh
    networks:
      - hadoop_network
  nodemanager:
    platform: linux/amd64
    image: apache/hadoop:3
    command: ["yarn", "nodemanager"]
    env_file:
      - /tmp/hadoop_config
    networks:
      - hadoop_network
networks:
  hadoop_network:
    name: hadoop_network
EOF
4. Run docker-compose.
docker-compose -f /tmp/hive-docker-compose.yaml --project-name=hive up -d
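The containers take a little while to become healthy, so commands run immediately after up -d can fail. A small retry helper can bridge that gap; this is my own sketch (wait_until is a hypothetical name), which could wrap, say, a curl against the NameNode web UI on port 9870:

```shell
#!/bin/sh
# Retry a command until it succeeds, up to a maximum number of attempts.
# Usage: wait_until <max_attempts> <delay_seconds> <command...>
wait_until() {
  max="$1"; delay="$2"; shift 2
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge "$max" ] && return 1   # give up after max attempts
    sleep "$delay"
  done
}

# Assumed usage: block until the NameNode web UI answers.
# wait_until 30 2 curl -sf http://localhost:9870/ >/dev/null
```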
Creating a Hive Table.
1. Create an HDFS directory.
Create a directory named commerce on the Hadoop NameNode.
docker exec -it namenode hdfs dfs -mkdir /commerce/
Listing the entries under the HDFS root
confirms that the commerce directory has been created.
docker exec -it namenode hdfs dfs -ls -h /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2023-12-17 22:49 /commerce
Next, add a user named hive to the ACL of the commerce directory.
This allows external clients such as Hive to write data into it.
(The setfacl/getfacl commands below are run inside the namenode container.)
hdfs dfs -setfacl -m user:hive:rwx /commerce
hdfs dfs -getfacl /commerce
# file: /commerce
# owner: hadoop
# group: supergroup
user::rwx
user:hive:rwx
group::r-x
mask::rwx
other::r-x
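The ACL check above can also be scripted: parse the getfacl output and assert that the hive user really ended up with rwx. A minimal sketch (acl_grants_rwx is a hypothetical helper name):

```shell
#!/bin/sh
# Check whether a getfacl-style listing grants a given user rwx.
acl_grants_rwx() {
  echo "$1" | grep -q "^user:$2:rwx$"
}

# The listing shown above, as captured output:
acl='# file: /commerce
# owner: hadoop
# group: supergroup
user::rwx
user:hive:rwx
group::r-x
mask::rwx
other::r-x'

if acl_grants_rwx "$acl" hive; then
  echo "hive has rwx on /commerce"
fi
```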
2. Create the Hive table.
1. First, enter Hive's beeline CLI.
docker exec -it hive beeline -u 'jdbc:hive2://hive:10000/default'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hive:10000/default
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
2. Create the commerce database.
create database commerce;
INFO : Compiling command(queryId=hive_20231218000456_27454250-92ca-4bf4-9e37-dd80d319c289): create database commerce
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20231218000456_27454250-92ca-4bf4-9e37-dd80d319c289); Time taken: 3.21 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20231218000456_27454250-92ca-4bf4-9e37-dd80d319c289): create database commerce
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20231218000456_27454250-92ca-4bf4-9e37-dd80d319c289); Time taken: 0.265 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (4.254 seconds)
3. Confirm that the database has been created.
show databases;
INFO : Compiling command(queryId=hive_20231218000534_40e254f9-33b2-40d5-8d09-85fb650bd3b1): show databases
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20231218000534_40e254f9-33b2-40d5-8d09-85fb650bd3b1); Time taken: 0.377 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20231218000534_40e254f9-33b2-40d5-8d09-85fb650bd3b1): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20231218000534_40e254f9-33b2-40d5-8d09-85fb650bd3b1); Time taken: 0.127 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| commerce       |
| default        |
+----------------+
4. Create the Hive table.
USE commerce;

CREATE EXTERNAL TABLE product (
    id INT,
    name STRING,
    price DOUBLE
)
STORED AS PARQUET
LOCATION 'hdfs://namenode:8020/commerce';
INFO : Compiling command(queryId=hive_20231218004148_0001ae29-8621-44ee-83c1-c7143091b910): CREATE EXTERNAL TABLE product ( id INT, name STRING, price DOUBLE ) STORED AS PARQUET LOCATION 'hdfs://namenode:8020/commerce'
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20231218004148_0001ae29-8621-44ee-83c1-c7143091b910); Time taken: 2.892 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20231218004148_0001ae29-8621-44ee-83c1-c7143091b910): CREATE EXTERNAL TABLE product ( id INT, name STRING, price DOUBLE ) STORED AS PARQUET LOCATION 'hdfs://namenode:8020/commerce'
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20231218004148_0001ae29-8621-44ee-83c1-c7143091b910); Time taken: 1.936 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (5.666 seconds)
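The DDL above hard-codes the HDFS location. When reusing it across environments, a small generator keeps the location in one place; this is my own sketch (make_product_ddl is a hypothetical name, not part of Hive):

```shell
#!/bin/sh
# Emit the CREATE EXTERNAL TABLE statement with a parameterized HDFS location.
make_product_ddl() {
  cat <<SQL
CREATE EXTERNAL TABLE product (
    id INT,
    name STRING,
    price DOUBLE
)
STORED AS PARQUET
LOCATION '$1';
SQL
}

make_product_ddl "hdfs://namenode:8020/commerce"
```

The output can then be fed to beeline with its -e option, e.g. beeline -u 'jdbc:hive2://hive:10000/commerce' -e "$(make_product_ddl hdfs://namenode:8020/commerce)".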
5. Insert a row.
insert into product values (1, 'iPhone15', 15.50);
INFO : Compiling command(queryId=hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156): insert into product values (1, 'iPhone15', 15.50)
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:int, comment:null), FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2, type:double, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156); Time taken: 4.186 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156): insert into product values (1, 'iPhone15', 15.50)
INFO : Query ID = hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into product values (1, 'iPh...15.50) (Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1702862285673_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......      container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 15.16 s
----------------------------------------------------------------------------------------------
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table commerce.product from hdfs://namenode:8020/commerce/.hive-staging_hive_2023-12-18_01-18-01_132_1289189002752009820-2/-ext-10000
INFO : Starting task [Stage-3:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20231218011801_0d81c744-2e60-4a55-a973-a4341bd9b156); Time taken: 18.844 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (23.554 seconds)
6. Check the data that was written.
docker exec -it namenode hdfs dfs -ls /commerce/
Found 1 items
-rw-r--r--   3 hive supergroup        640 2023-12-18 01:18 /commerce/000000_0