Importing Hive Metadata using the Command-Line (CLI) utility

You can use the Atlas-Hive import command-line utility to load Atlas with the databases and tables present in the Hive metastore.

This utility supports importing metadata of a specific table, tables from a specific database or all databases and tables.

Consider a scenario where Hive already contained databases and tables before the Hive hook for Atlas was enabled. In such a situation, you can use the Atlas-Hive import utility to bring Hive and Atlas into sync.

You can also use the utility when Atlas was unable to process specific messages because of errors that can occur with Kafka.

Supported Hive metadata import options:

The Atlas-Hive utility supports the following options for importing Hive metadata:

  • -d <database regex> OR --database <database regex>

Specifies the name pattern of the databases to be synced with Atlas.

  • -t <table regex> OR --table <table regex>

Specifies the name pattern of the tables to be synced with Atlas. This option must be used together with -d.

  • -f <filename>

Imports all databases and tables listed in the specified file. The file must have one entry on each line, where each entry has the form <database_name>:<table_name>. As the example shows, an entry that contains only <database_name> imports all tables from that database.

For example, to import table t11 from database db1, table t21 from database db2, and all tables from database db3, the file content must be:

- db1:t11

- db2:t21

- db3

  • No options

    Run the utility without any options to import all databases and tables from Hive into Atlas.
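As a rough illustration of the -f option described above, the following shell sketch writes a list file for the example scenario (t11 from db1, t21 from db2, and all tables from db3) and sanity-checks its format. The file location and the validation step are assumptions made for illustration; they are not part of the utility.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical location for the list file; any readable path would do.
IMPORT_LIST="${IMPORT_LIST:-$(mktemp)}"

# One entry per line: <database_name>:<table_name>, or a bare database
# name to cover all of its tables (the db3 entry in the example).
cat > "$IMPORT_LIST" <<'EOF'
db1:t11
db2:t21
db3
EOF

# Illustrative sanity check (not performed by the utility itself):
# count the lines that are neither "<db>" nor "<db>:<table>".
bad_lines=$(grep -Evc '^[^:[:space:]]+(:[^:[:space:]]+)?$' "$IMPORT_LIST" || true)
entries=$(wc -l < "$IMPORT_LIST" | tr -d ' ')

echo "entries: $entries, malformed: $bad_lines"
```

The resulting file could then be passed to the import script with -f <filename>.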

A sample usage of the script

Suppose the Atlas hook in Hive is not configured, so no “live” data is reflected in Atlas.

Later, you configure the Atlas hook, but the Hive database already contains entities that need to be reflected in Atlas. In such cases, the Atlas-Hive import script reads the databases and tables from the Hive metastore and creates the corresponding entities in Atlas.

Example usage of the Atlas-Hive import script:

Usage 1: <atlas bundle>/hook-bin/import-hive.sh

Usage 2: <atlas bundle>/hook-bin/import-hive.sh [-d <database regex> OR --database <database regex>] [-t <table regex> OR --table <table regex>]

Usage 3: <atlas bundle>/hook-bin/import-hive.sh [-f <filename>]

File Format:

database1:tbl1

database1:tbl2

database2:tbl1
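The three usages above can be sketched as a small dry-run wrapper. This is only an illustration under stated assumptions: ATLAS_BUNDLE, its /opt/atlas default, the run_import helper, the DRY_RUN switch, and the /tmp/import-list.txt path are all hypothetical names, and the literal names db1 and t11 stand in for whatever patterns your environment needs. By default the wrapper only prints the command it would run.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: ATLAS_BUNDLE points at the Atlas bundle; /opt/atlas is a placeholder.
ATLAS_BUNDLE="${ATLAS_BUNDLE:-/opt/atlas}"
IMPORT_SCRIPT="$ATLAS_BUNDLE/hook-bin/import-hive.sh"

# Hypothetical helper: prints the command line, and executes it only when
# DRY_RUN=0 is set and the script is present and executable.
run_import() {
  echo "+ $IMPORT_SCRIPT $*"
  if [ "${DRY_RUN:-1}" = "0" ] && [ -x "$IMPORT_SCRIPT" ]; then
    "$IMPORT_SCRIPT" "$@"
  fi
}

run_import                           # Usage 1: all databases and tables
run_import -d 'db1'                  # Usage 2: databases matching the pattern
run_import -d 'db1' -t 't11'         # Usage 2: -t is used together with -d
run_import -f /tmp/import-list.txt   # Usage 3: entries listed in a file
```

Setting DRY_RUN=0 would execute the script instead of only printing the command, which is worth doing only after the printed commands look right.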

Limitations of using the Atlas-Hive import script

The Atlas-Hive import utility has the following limitations:

  • Cannot delete entities that have been dropped from Hive but still exist in Atlas.
  • Cannot create lineages.