Deploy Atlas Hive Hook — Part I

Liangjun Jiang
Nov 20, 2020

Working with the latest Atlas and Atlas Hive Hook

Thanks for being here. I just purchased a beach house on South Padre Island (SPI), TX, and use it as a short-term rental property (Airbnb or Vrbo). SPI has one of the top 10 beaches in the US, and is 7 miles away from SpaceX’s Mars Launch Base. You can check the house out on my property management’s website: https://spirentals.com/property-info/468183.html . You can also visit my website https://firststr.com for details about the house and its amenities.

Over the past few days, I have been experimenting with some Apache Atlas features. In case you don’t know, Apache Atlas is a solution for storing, exploring and searching metadata. Metadata is a popular topic nowadays.

One thing I like about Apache Atlas is that it ships quite a few platform-specific hooks that automatically extract metadata into Atlas. You can find a complete list of hooks here. The Atlas Hive Hook is the one I am interested in. People still love Apache Hadoop, and Apache Hive is the face of Apache Hadoop.

At the time of writing, Apache Atlas has released version 2.1, and Apache Hive 2.* is at 2.3.7. But

Apache Atlas 2.1 integrates with Hive 3.1.0, while some of us are still using Hive 2.*.

You see, there might be a problem. Actually, there is a compatibility issue.

Fix the Hive version issue in Atlas 2.1.0

Credit for this part goes to Jiezhi.G; it originally appeared here in Chinese.

Let’s say you have followed these instructions and deployed the Atlas hook jars in Hive. You then run a basic Hive command:

create table foo (id int);

You will see an error from Apache Hive

[HiveServer2-Background-Pool: Thread-52]: HiveHook.run(): failed to process operation CREATETABLE
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.api.Database.getCatalogName()Ljava/lang/String;
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.getDatabaseName(HiveMetaStoreBridge.java:577) ~[hive-bridge-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.hook.AtlasHiveHookContext.getQualifiedName(AtlasHiveHookContext.java:201) ~[hive-bridge-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.hook.AtlasHiveHookContext.init(AtlasHiveHookContext.java:293) ~[hive-bridge-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.hook.AtlasHiveHookContext.<init>(AtlasHiveHookContext.java:83) ~[hive-bridge-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.hook.AtlasHiveHookContext.<init>(AtlasHiveHookContext.java:64) ~[hive-bridge-2.1.0.jar:2.1.0]
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:175) [hive-bridge-2.1.0.jar:2.1.0]

Hive 2.* has no Database.getCatalogName() (catalogs arrived in Hive 3), so you need to modify this file and method:

addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java#getDatabaseName()

public static String getDatabaseName(Database hiveDB) {
    String dbName = hiveDB.getName().toLowerCase();

    // Disabled for Hive 2.*: Database.getCatalogName() does not exist there.
    /*
    String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;

    if (StringUtils.isNotEmpty(catalogName) && !StringUtils.equals(catalogName, DEFAULT_METASTORE_CATALOG)) {
        dbName = catalogName + SEP + dbName;
    }
    */

    return dbName;
}
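
Commenting the block out ties the rebuilt jar to Hive 2.*. One possible alternative, sketched below and not something Atlas ships, is to look up getCatalogName() by reflection so that the same code runs whether or not the method exists. To keep the example runnable without Hive on the classpath, Hive2Database and Hive3Database are hypothetical stand-ins for the real Database class, and the values of DEFAULT_CATALOG and SEP are assumptions.

```java
// Hypothetical sketch: resolve the catalog name reflectively so one build
// tolerates both Hive 2.* (no getCatalogName) and Hive 3.* (has it).
class CatalogAwareName {
    static final String DEFAULT_CATALOG = "hive"; // assumed default catalog name
    static final String SEP = ":";                // assumed separator

    static String databaseName(Object db) {
        try {
            String dbName = String.valueOf(db.getClass().getMethod("getName").invoke(db)).toLowerCase();
            String catalog = null;
            try {
                Object value = db.getClass().getMethod("getCatalogName").invoke(db);
                catalog = value == null ? null : value.toString().toLowerCase();
            } catch (NoSuchMethodException e) {
                // Hive 2.* style object: no catalogs, keep the plain database name.
            }
            if (catalog != null && !catalog.isEmpty() && !catalog.equals(DEFAULT_CATALOG)) {
                dbName = catalog + SEP + dbName;
            }
            return dbName;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read database name", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(databaseName(new Hive2Database()));
        System.out.println(databaseName(new Hive3Database()));
    }
}

// Stand-in for the Hive 2.* metastore Database: no catalog accessor.
class Hive2Database {
    public String getName() { return "Sales"; }
}

// Stand-in for the Hive 3.* metastore Database: exposes a catalog name.
class Hive3Database {
    public String getName() { return "Sales"; }
    public String getCatalogName() { return "Spark"; }
}
```

With the stand-ins above, the Hive 2.* style object yields the plain lower-cased name, and the Hive 3.* style object gets the catalog prefix because its catalog is not the default. Reflection costs a little on every call, so a real patch would probably cache the Method lookup.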

Now run the build from the addons/hive-bridge directory:

mvn clean install -DskipTests

You now have a working hive-bridge-2.1.0.jar for Hive 2.*. Use it to replace the jar file under the atlas-hive-plugin-impl folder. If atlas-hive-plugin-impl doesn’t ring a bell, you haven’t gone through the Atlas Hive hook deployment instructions, so I copied them here for you.

Atlas Hive hook registers with Hive to listen for create/update/delete operations and updates the metadata in Atlas, via Kafka notifications, for the changes in Hive. Follow the instructions below to set up the Atlas hook in Hive:

Set-up Atlas hook in hive-site.xml by adding the following:

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

untar apache-atlas-${project.version}-hive-hook.tar.gz

cd apache-atlas-hive-hook-${project.version}

Copy entire contents of folder apache-atlas-hive-hook-${project.version}/hook/hive to <atlas package>/hook/hive

Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh of your hive configuration

Copy <atlas-conf>/atlas-application.properties to the hive conf directory.
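
After these steps, it can be worth confirming that hive-site.xml really registers the hook before restarting Hive. The sketch below is a hypothetical helper, not part of Atlas or Hive. It parses the file with the JDK's XML parser and looks for the HiveHook class in hive.exec.post.hooks, treating the value as a comma-separated list. The class name HookConfigCheck and the way the file path is passed in are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks that hive-site.xml registers the Atlas Hive hook.
class HookConfigCheck {
    static final String HOOK_PROPERTY = "hive.exec.post.hooks";
    static final String HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook";

    static boolean hasAtlasHook(String hiveSiteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(hiveSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (!HOOK_PROPERTY.equals(childText(property, "name"))) {
                    continue;
                }
                // Several hooks may be registered in one value, separated by commas.
                for (String hook : childText(property, "value").split(",")) {
                    if (HOOK_CLASS.equals(hook.trim())) {
                        return true;
                    }
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("hive-site.xml could not be parsed", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or "".
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to your hive-site.xml.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(hasAtlasHook(xml) ? "Atlas hook is registered" : "Atlas hook is missing");
    }
}
```

This only checks the configuration file. It says nothing about whether the jars under HIVE_AUX_JARS_PATH are the rebuilt ones.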

How Hive Hook Works — Workflow

This article from eBay explains well how the Atlas Hive Hook works. Here is the workflow diagram:

Apache Atlas Hive Hook Workflow diagram

You might be asking:

  1. Does Apache Atlas have to be up and running to use the Atlas Hive Hook, or any other hook? No. Kafka is the middleman, so Atlas doesn’t need to be running for the hook to work.
  2. Does Kafka have to be up and running to use the Atlas Hive Hook? Strictly speaking you can still *use* the hook, but the extracted metadata won’t go anywhere.

The Problem of the latest Apache Atlas

Apache Atlas relies on Kafka to receive the messages that the hooks send. When you follow the instructions and run the latest Apache Atlas, you will see an error like this one, which was also asked about here:

Caused by: org.apache.solr.common.SolrException: Cannot connect to cluster at localhost:2181: cluster not found/not ready
at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:385)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:141)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:383)
at org.janusgraph.diskstorage.solr.Solr6Index.<init>(Solr6Index.java:218)
... 101 more

The stack trace shows the Solr index client trying to reach Zookeeper at localhost:2181. Zookeeper is not part of Apache Atlas, though, even if you take the default Atlas application settings. So although Apache Atlas itself does not have to be running for the Atlas Hive Hook to work, you still need Kafka and Zookeeper up and running to use the hook.
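
Since the failure above comes down to a service not listening where it is expected, a quick pre-flight check can save a restart cycle. The sketch below only tests whether a TCP port accepts connections; it does not verify that the service behind it is healthy. The class name PortCheck and the default endpoints are assumptions: localhost:2181 is taken from the stack trace, and localhost:9092 is Kafka's usual default, which your setup may override.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Atlas, Hive, Kafka or Zookeeper.
class PortCheck {

    // Splits "host:port" into an unresolved socket address.
    static InetSocketAddress parseEndpoint(String endpoint) {
        int idx = endpoint.lastIndexOf(':');
        if (idx <= 0 || idx == endpoint.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + endpoint);
        }
        String host = endpoint.substring(0, idx);
        int port = Integer.parseInt(endpoint.substring(idx + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean isReachable(InetSocketAddress endpoint, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(endpoint.getHostString(), endpoint.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions; pass your own host:port pairs as arguments.
        String[] endpoints = args.length > 0 ? args : new String[] {"localhost:2181", "localhost:9092"};
        for (String endpoint : endpoints) {
            boolean up = isReachable(parseEndpoint(endpoint), 2000);
            System.out.println(endpoint + (up ? " is accepting connections" : " is NOT reachable"));
        }
    }
}
```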

The Solution

To make your exploration easier, I built a Docker image that includes Apache Hive, Kafka and Zookeeper, with the correct Apache Atlas Hive Hook jars and configuration. You should be able to get it up and running in less than 5 minutes. Stay tuned.
