This example relies on a forked version of Hive 3.1.2 that supports CockroachDB. The changes were minimal and could be incorporated into future versions of Apache Hive. See here for details on the CockroachDB-compatible fork.
The docker-compose.yml contains the following services:
- namenode - Apache Hadoop NameNode
- datanode - Apache Hadoop DataNode
- resourcemanager - Apache Hadoop YARN Resource Manager
- nodemanager - Apache Hadoop YARN Node Manager
- historyserver - Apache Hadoop YARN Timeline Manager
- hs2 - Apache Hive HiveServer2
- metastore - Apache Hive Standalone Metastore
- metastore-db - CockroachDB instance that supports the Apache Hive Metastore
Hadoop configuration parameters are provided by .env files. Ultimately these values are written to the appropriate Hadoop XML configuration file. For example, properties beginning with the following key prefixes map to the following files:
- CORE_CONF_* > core-site.xml
- HDFS_CONF_* > hdfs-site.xml
- HIVE_SITE_CONF_* > hive-site.xml
- YARN_CONF_* > yarn-site.xml
- METASTORE_SITE_CONF_* > metastore-site.xml
Key names use the following character conversions:
- a single underscore (_) equals a dot (.)
- a double underscore (__) equals a single underscore (_)
- a triple underscore (___) equals a dash (-)
For example, the key HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check would result in the property dfs.namenode.datanode.registration.ip-hostname-check being written to hdfs-site.xml. As another example, the key YARN_CONF_yarn_resourcemanager_resource__tracker_address would result in the property yarn.resourcemanager.resource_tracker.address being written to yarn-site.xml.
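The key conversion can be illustrated with a few sed substitutions. This is only a sketch of the rules above for reference; the entrypoint scripts baked into the images may implement the conversion differently:

# strip the prefix, then: triple underscore -> dash, double underscore -> single
# underscore (via a temporary placeholder), single underscore -> dot
$ echo "HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check" \
    | sed -e 's/^HDFS_CONF_//' -e 's/___/-/g' -e 's/__/~/g' -e 's/_/./g' -e 's/~/_/g'
dfs.namenode.datanode.registration.ip-hostname-check
$ echo "YARN_CONF_yarn_resourcemanager_resource__tracker_address" \
    | sed -e 's/^YARN_CONF_//' -e 's/___/-/g' -e 's/__/~/g' -e 's/_/./g' -e 's/~/_/g'
yarn.resourcemanager.resource_tracker.address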
Existing configuration files and their default values are listed below. Please note the value for YARN_CONF_yarn_nodemanager_resource_memory___mb assumes that your Docker host has at least 8 GB of memory. Feel free to modify as necessary.
HADOOP_LOG_DIR=/var/log/hadoop
YARN_LOG_DIR=/var/log/hadoop
CORE_CONF_fs_defaultFS=hdfs://namenode:9820
CORE_CONF_hadoop_http_staticuser_user=root
HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
HDFS_CONF_dfs_permissions_enabled=false
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_replication=1
MAPRED_CONF_mapreduce_framework_name=yarn
YARN_CONF_yarn_nodemanager_resource_memory___mb=6144
YARN_CONF_yarn_nodemanager_aux___services=mapreduce_shuffle
YARN_CONF_yarn_nodemanager_aux___services_mapreduce__shuffle_class=org.apache.hadoop.mapred.ShuffleHandler
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_timeline___service_enabled=true
HIVE_SITE_CONF_hive_server2_transport_mode=binary
HIVE_SITE_CONF_hive_execution_engine=tez
HIVE_SITE_CONF_hive_metastore_uri_resolver=org.apache.hadoop.hive.metastore.hooks.SimpleURIResolver
HIVE_SITE_CONF_hive_metastore_uris=thrift://metastore:9083
METASTORE_STANDALONE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://metastore-db:5432/metastore?ApplicationName=metastore
METASTORE_STANDALONE_CONF_javax_jdo_option_ConnectionDriverName=org.postgresql.Driver
METASTORE_STANDALONE_CONF_javax_jdo_option_ConnectionUserName=hive
METASTORE_STANDALONE_CONF_javax_jdo_option_ConnectionPassword=hive
METASTORE_STANDALONE_CONF_datanucleus_schema_autoCreateAll=false
METASTORE_STANDALONE_CONF_metastore_metastore_event_db_notification_api_auth=false
YARN_CONF_yarn_resourcemanager_resource___tracker_address=resourcemanager:8031
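Each of the entries above is ultimately rendered as a property element in its target XML file; for example, HDFS_CONF_dfs_replication=1 should end up in hdfs-site.xml roughly as follows. Note also that the metastore reaches CockroachDB through the standard PostgreSQL JDBC driver, which works because CockroachDB speaks the PostgreSQL wire protocol.

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>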
The environment is managed with the following helper scripts:
- ./up.sh - brings the Compose environment up
- ./down.sh - takes the Compose environment down
- ./prune.sh - cleans up leftover Docker resources
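After ./up.sh completes, a plain Docker listing is a quick way to confirm that all of the containers above are running (generic Docker usage, not a script shipped with this repo):

$ docker ps --format 'table {{.Names}}\t{{.Status}}'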
Once all services are up, you can create a simple Hive table to test functionality. For example:
$ docker exec -ti hs2 /bin/bash
# /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
> CREATE TABLE pokes (foo INT, bar STRING);
> LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
> SELECT * FROM pokes;
> !quit
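The same smoke test can also be run non-interactively with beeline's -e flag; a minimal sketch using the JDBC URL above:

$ docker exec -ti hs2 /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES;"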
- Name Node Overview - http://localhost:9870
- Data Node Overview - http://localhost:9864
- YARN Resource Manager - http://localhost:8088
- YARN Node Manager - http://localhost:8042
- YARN Application History - http://localhost:8188
- HiveServer 2 - http://localhost:10002
- CockroachDB Dashboard - http://localhost:8080. The Admin UI username is hive and the password is hive.
- HAProxy Dashboard - http://localhost:8081
- Hadoop NameNode - timveil/docker-hadoop-namenode:3.1.x
- Hadoop DataNode - timveil/docker-hadoop-datanode:3.1.x
- YARN Resource Manager - timveil/docker-hadoop-resourcemanager:3.1.x
- YARN Node Manager - timveil/docker-hadoop-nodemanager:3.1.x
- YARN Timeline Server - timveil/docker-hadoop-historyserver:3.1.x
- Hive HiveServer2 - timveil/docker-hadoop-hive-hs2:3.1.x-fork
- Hive Metastore Standalone - timveil/docker-hadoop-hive-metastore-standalone:3.1.x
- CockroachDB - cockroachdb/cockroach:latest
docker exec -ti namenode /bin/bash
docker exec -ti datanode /bin/bash
docker exec -ti resourcemanager /bin/bash
docker exec -ti nodemanager /bin/bash
docker exec -ti historyserver /bin/bash
docker exec -ti hs2 /bin/bash
docker exec -ti metastore /bin/bash
docker exec -ti metastore-db /bin/bash