Deploy Atlas Hive Hook — Part 2

Liangjun Jiang
6 min read · Dec 2, 2020

In the first part of Deploy Atlas Hive Hook, I went over some issues you might face when working with the latest Apache Atlas and its Hive Hook. In this part, I will introduce an end-to-end working setup that includes Apache Hive, the Apache Atlas Hive Hook, Kafka and Zookeeper. Apache Atlas itself is not included, though.

Docker Image

You can find the docker image for this walkthrough on Docker Hub: ljiang510/hive:2.3.2-postgresql-metastore-kafka-zookeeper-atlas-hive-hook

This Docker image is based on the project https://github.com/big-data-europe/docker-hive. We manually added Kafka and Atlas Hive Hook 2.1, with the modification introduced in part 1, to this image.

Now we will start this image together with the other related images. Head to the repo https://github.com/liangjun-jiang/docker-hive-atlas-hook and find the docker-compose-local.yml file. On line 25, I changed the image:

hive-server:
  image: ljiang/hive:2.3.2-postgresql-metastore-kafka-zookeeper-atlas-hive-hook
  env_file:
    - ./hadoop-hive.env

Now start the services defined in the compose file:

docker-compose -f docker-compose-local.yml up -d

Observe Kafka Message

Once all services are up and running, open a shell in the hive-server container:

docker-compose exec hive-server bash

In this hive-server instance, Hive is already set up to work with the Atlas Hive Hook, but you still need to start Kafka.

You can use this link to understand a little bit more. There are two steps:

  1. Start the Zookeeper service
  2. Start the Kafka service

cd ~/kafka_2.11-2.3.0/
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
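
Before moving on, it can help to confirm that both services are actually accepting connections. The following is a minimal Python sketch, not part of the original setup. It assumes Python 3 is available wherever you run it (this article does not confirm that it is installed in the container), and it uses the ports that the commands in this article rely on: 2181 for Zookeeper and 9092 for Kafka. The helper names port_open and report are made up for illustration.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Ports taken from the commands in this article; adjust if your config differs.
SERVICES = {"zookeeper": 2181, "kafka": 9092}


def report(host="localhost"):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

Calling report() returns a dictionary keyed by service name. A False value suggests that the corresponding start command has not finished yet or has failed.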

Now check whether the Kafka topics ATLAS_ENTITIES and ATLAS_HOOK exist:

bin/kafka-topics.sh --list --zookeeper localhost:2181

If they are not listed, create them:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ATLAS_ENTITIES
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ATLAS_HOOK
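
The check-then-create step above can also be scripted. Below is a small Python sketch under a few assumptions: it expects the text printed by bin/kafka-topics.sh --list (one topic name per line), and the names missing_topics and create_command are hypothetical helpers, not part of Kafka or Atlas. The command it builds simply mirrors the one shown in this article.

```python
REQUIRED_TOPICS = ("ATLAS_HOOK", "ATLAS_ENTITIES")


def missing_topics(list_output, required=REQUIRED_TOPICS):
    """Return the required topics that do not appear in the --list output."""
    existing = {line.strip() for line in list_output.splitlines() if line.strip()}
    return [topic for topic in required if topic not in existing]


def create_command(topic, zookeeper="localhost:2181"):
    """Build the argv for creating one topic, with the same flags as the article."""
    return [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", "1",
        "--partitions", "1",
        "--topic", topic,
    ]


# Example with made-up --list output: only ATLAS_HOOK exists so far.
sample_output = "ATLAS_HOOK\n__consumer_offsets\n"
for topic in missing_topics(sample_output):
    print(" ".join(create_command(topic)))
```

In practice you would feed in the real output of the list command and run each generated command from the Kafka directory, for example with subprocess.run.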

Now inspect the ATLAS_HOOK topic:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATLAS_HOOK --from-beginning

There is nothing to see right now. We need to create a database and some tables, and run some statements in Hive, to see some action.
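
Once messages do start to arrive, the raw console output gets long quickly, as the sample later in this article shows. The sketch below is one way to condense it. It assumes the consumer prints each message as a single line of JSON and that the message has the same envelope as that sample: a message object with a type and an entities payload holding entities and referredEntities. The function name summarize_hook_message is hypothetical, and other message types may be shaped differently, so any other shape simply yields an empty count.

```python
import json
from collections import Counter


def summarize_hook_message(line):
    """Return (message type, {typeName: count}) for one consumer output line."""
    envelope = json.loads(line)
    message = envelope.get("message") or {}
    payload = message.get("entities")
    counts = Counter()
    if isinstance(payload, dict):
        # Top-level entities are a list; referred entities are keyed by guid.
        candidates = list(payload.get("entities") or [])
        candidates += list((payload.get("referredEntities") or {}).values())
        for entity in candidates:
            counts[entity.get("typeName", "unknown")] += 1
    return message.get("type"), dict(counts)


# A heavily trimmed, made-up line in the same shape as the sample message.
sample_line = json.dumps({
    "message": {
        "type": "ENTITY_CREATE_V2",
        "entities": {
            "referredEntities": {"-1": {"typeName": "hive_column"}},
            "entities": [{"typeName": "hive_table"}],
        },
    }
})
print(summarize_hook_message(sample_line))
```

To use it on live data, you could pipe the console consumer into a script that calls this function on every line read from sys.stdin.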

Experiment Hive

Still in the hive-server instance, let’s start the Hive terminal:

/opt/hive/bin/hive

Now we create a database, create three tables in it, and run an insert-select so you can see some of the more advanced features that the Atlas Hive Hook tracks, such as lineage. Keep an eye on your Kafka terminal to see the messages as they arrive.

hive> create database hive_hook_database;
hive> use hive_hook_database;
hive> CREATE TABLE post (code INT, text STRING);
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> CREATE TABLE post_pokes (code INT, text STRING, foo INT, bar STRING);
hive> insert into table post_pokes(foo,bar,code,text) select p.foo, p.bar, po.code, po.text from pokes p inner join post po ON p.foo = po.code;

You will see Kafka messages pop up for each command you execute. For example, the message for the last insert-select looks like the one below. You might need some knowledge of Atlas entities to understand it. The typeName field identifies which Atlas data model (type) each entity belongs to.

{
"version": {
"version": "1.0.0",
"versionParts": [
1
]
},
"msgCompressionKind": "NONE",
"msgSplitIdx": 1,
"msgSplitCount": 1,
"msgSourceIP": "172.19.0.10",
"msgCreatedBy": "root",
"msgCreationTime": 1606854172717,
"message": {
"type": "ENTITY_CREATE_V2",
"user": "root",
"entities": {
"referredEntities": {
"-565645672010544": {
"typeName": "hive_table",
"attributes": {
"owner": "root",
"tableType": "MANAGED_TABLE",
"temporary": false,
"lastAccessTime": 1605646631000,
"createTime": 1605646631000,
"qualifiedName": "default.pokes@primary",
"name": "pokes",
"comment": null,
"parameters": {
"last_modified_time": "1605646874",
"totalSize": "5812",
"numRows": "0",
"rawDataSize": "0",
"numFiles": "1",
"transient_lastDdlTime": "1605646874",
"last_modified_by": "root"
},
"retention": 0
},
"guid": "-565645672010544",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"sd": {
"guid": "-565645672010545",
"typeName": "hive_storagedesc",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary_storage"
},
"relationshipType": "hive_table_storagedesc"
},
"columns": [
{
"guid": "-565645672010546",
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.pokes.foo@primary"
},
"relationshipType": "hive_table_columns"
},
{
"guid": "-565645672010547",
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.pokes.bar@primary"
},
"relationshipType": "hive_table_columns"
},
{
"guid": "-565645672010548",
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.pokes.release_principal@primary"
},
"relationshipType": "hive_table_columns"
}
],
"partitionKeys": [],
"db": {
"typeName": "hive_db",
"uniqueAttributes": {
"qualifiedName": "default@primary"
},
"relationshipType": "hive_table_db"
}
},
"proxy": false
},
"-565645672010545": {
"typeName": "hive_storagedesc",
"attributes": {
"qualifiedName": "default.pokes@primary_storage",
"storedAsSubDirectories": false,
"location": "hdfs://namenode:8020/user/hive/warehouse/pokes",
"compressed": false,
"inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
"parameters": {},
"outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"serdeInfo": {
"typeName": "hive_serde",
"attributes": {
"serializationLib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"name": null,
"parameters": {
"serialization.format": "1"
}
}
},
"numBuckets": -1
},
"guid": "-565645672010545",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"table": {
"guid": "-565645672010544",
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary"
},
"relationshipType": "hive_table_storagedesc"
}
},
"proxy": false
},
"-565645672010546": {
"typeName": "hive_column",
"attributes": {
"owner": "root",
"qualifiedName": "default.pokes.foo@primary",
"name": "foo",
"comment": null,
"position": 0,
"type": "int"
},
"guid": "-565645672010546",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"table": {
"guid": "-565645672010544",
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary"
},
"relationshipType": "hive_table_columns"
}
},
"proxy": false
},
"-565645672010547": {
"typeName": "hive_column",
"attributes": {
"owner": "root",
"qualifiedName": "default.pokes.bar@primary",
"name": "bar",
"comment": null,
"position": 1,
"type": "string"
},
"guid": "-565645672010547",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"table": {
"guid": "-565645672010544",
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary"
},
"relationshipType": "hive_table_columns"
}
},
"proxy": false
},
"-565645672010548": {
"typeName": "hive_column",
"attributes": {
"owner": "root",
"qualifiedName": "default.pokes.release_principal@primary",
"name": "release_principal",
"comment": "Release date for this post",
"position": 2,
"type": "string"
},
"guid": "-565645672010548",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"table": {
"guid": "-565645672010544",
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary"
},
"relationshipType": "hive_table_columns"
}
},
"proxy": false
}
},
"entities": [
{
"typeName": "hive_process",
"attributes": {
"recentQueries": [
"insert into table post_pokes(foo,bar,code,text) select p.foo, p.bar, po.code, po.text from pokes p inner join post po on p.foo = po.code"
],
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000",
"clusterName": "primary",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000",
"queryText": "",
"operationType": "QUERY",
"startTime": 1606854172663,
"queryPlan": "Not Supported",
"endTime": 1606854172663,
"userName": "",
"queryId": ""
},
"guid": "-565645672010555",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"outputs": [
{
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.post_pokes@primary"
},
"relationshipType": "process_dataset_outputs"
}
],
"inputs": [
{
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.post@primary"
},
"relationshipType": "dataset_process_inputs"
},
{
"guid": "-565645672010544",
"typeName": "hive_table",
"uniqueAttributes": {
"qualifiedName": "default.pokes@primary"
},
"relationshipType": "dataset_process_inputs"
}
]
},
"proxy": false
},
{
"typeName": "hive_process_execution",
"attributes": {
"hostName": "f92480d5e91c",
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:1606854146078:1606854172663",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:1606854146078:1606854172663",
"queryText": "insert into table post_pokes(foo,bar,code,text) select p.foo, p.bar, po.code, po.text from pokes p inner join post po on p.foo = po.code",
"startTime": 1606854146078,
"queryPlan": "Not Supported",
"endTime": 1606854172663,
"userName": "root",
"queryId": "root_20201201202226_03f78300-4f08-4676-aa16-3044b0c7dd8b"
},
"guid": "-565645672010556",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"process": {
"guid": "-565645672010555",
"typeName": "hive_process",
"relationshipType": "hive_process_process_executions"
}
},
"proxy": false
},
{
"typeName": "hive_column_lineage",
"attributes": {
"expression": null,
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:code",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:code",
"depenendencyType": "SIMPLE"
},
"guid": "-565645672010557",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"outputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post_pokes.code@primary"
},
"relationshipType": "process_dataset_outputs"
}
],
"inputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post.code@primary"
},
"relationshipType": "dataset_process_inputs"
}
],
"query": {
"guid": "-565645672010555",
"typeName": "hive_process",
"uniqueAttributes": {
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000"
},
"relationshipType": "hive_process_column_lineage"
}
},
"proxy": false
},
{
"typeName": "hive_column_lineage",
"attributes": {
"expression": null,
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:text",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:text",
"depenendencyType": "SIMPLE"
},
"guid": "-565645672010558",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"outputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post_pokes.text@primary"
},
"relationshipType": "process_dataset_outputs"
}
],
"inputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post.text@primary"
},
"relationshipType": "dataset_process_inputs"
}
],
"query": {
"guid": "-565645672010555",
"typeName": "hive_process",
"uniqueAttributes": {
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000"
},
"relationshipType": "hive_process_column_lineage"
}
},
"proxy": false
},
{
"typeName": "hive_column_lineage",
"attributes": {
"expression": null,
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:foo",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:foo",
"depenendencyType": "SIMPLE"
},
"guid": "-565645672010559",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"outputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post_pokes.foo@primary"
},
"relationshipType": "process_dataset_outputs"
}
],
"inputs": [
{
"guid": "-565645672010546",
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.pokes.foo@primary"
},
"relationshipType": "dataset_process_inputs"
}
],
"query": {
"guid": "-565645672010555",
"typeName": "hive_process",
"uniqueAttributes": {
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000"
},
"relationshipType": "hive_process_column_lineage"
}
},
"proxy": false
},
{
"typeName": "hive_column_lineage",
"attributes": {
"expression": null,
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:bar",
"name": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000:bar",
"depenendencyType": "SIMPLE"
},
"guid": "-565645672010560",
"isIncomplete": false,
"provenanceType": 0,
"version": 0,
"relationshipAttributes": {
"outputs": [
{
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.post_pokes.bar@primary"
},
"relationshipType": "process_dataset_outputs"
}
],
"inputs": [
{
"guid": "-565645672010547",
"typeName": "hive_column",
"uniqueAttributes": {
"qualifiedName": "default.pokes.bar@primary"
},
"relationshipType": "dataset_process_inputs"
}
],
"query": {
"guid": "-565645672010555",
"typeName": "hive_process",
"uniqueAttributes": {
"qualifiedName": "QUERY:default.pokes@primary:1605646631000:default.post@primary:1606853324000->:INSERT:default.post_pokes@primary:1606854087000"
},
"relationshipType": "hive_process_column_lineage"
}
},
"proxy": false
}
]
}
}
}
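
To see the lineage in a message like this without reading the whole document, you can pull out only the input and output references. The following Python sketch assumes the structure shown above: hive_process entities carry table-level inputs and outputs, and hive_column_lineage entities carry column-level ones, with each reference identified by uniqueAttributes.qualifiedName. The function name lineage_edges and the trimmed sample are illustrative only.

```python
def lineage_edges(envelope):
    """Return (typeName, source, target) tuples for process and column lineage."""
    edges = []
    for entity in envelope["message"]["entities"]["entities"]:
        if entity.get("typeName") not in ("hive_process", "hive_column_lineage"):
            continue  # e.g. hive_process_execution has no inputs/outputs
        rel = entity.get("relationshipAttributes", {})
        sources = [r["uniqueAttributes"]["qualifiedName"] for r in rel.get("inputs", [])]
        targets = [r["uniqueAttributes"]["qualifiedName"] for r in rel.get("outputs", [])]
        for source in sources:
            for target in targets:
                edges.append((entity["typeName"], source, target))
    return edges


# Trimmed copy of the sample: one process and one column lineage entity.
sample = {"message": {"entities": {"entities": [
    {"typeName": "hive_process", "relationshipAttributes": {
        "inputs": [
            {"uniqueAttributes": {"qualifiedName": "default.post@primary"}},
            {"uniqueAttributes": {"qualifiedName": "default.pokes@primary"}},
        ],
        "outputs": [
            {"uniqueAttributes": {"qualifiedName": "default.post_pokes@primary"}},
        ]}},
    {"typeName": "hive_process_execution", "relationshipAttributes": {}},
    {"typeName": "hive_column_lineage", "relationshipAttributes": {
        "inputs": [
            {"uniqueAttributes": {"qualifiedName": "default.post.code@primary"}},
        ],
        "outputs": [
            {"uniqueAttributes": {"qualifiedName": "default.post_pokes.code@primary"}},
        ]}},
]}}}

for kind, source, target in lineage_edges(sample):
    print(f"{kind}: {source} -> {target}")
```

Each printed line is one edge: table-to-table for hive_process and column-to-column for hive_column_lineage. On the full message above, this would reduce several hundred lines of JSON to a handful of edges.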
