Bulk and migration import of Hive metadata

The existing sequential import of Hive metadata into Apache Atlas is time consuming. The new bulk and migration import process which creates a ZIP file first and then runs the Atlas bulk import using the ZIP file, improves performance and is faster. An improvised method to import Hive metadata into Atlas is now available.

The previous version of Hive Metadata import was a sequential import into Atlas. The time taken for manual import of databases to the Atlas system consumes more time. With the need to import more databases, additional time is consumed in completing the import process and the process takes longer to complete.

Instead of Atlas sequential import, this update enables you to import without the Atlas interaction, where the existing Hive import is updated to import the Hive databases and/or tables into ZIP files. And later, you can import the ZIP file into Atlas using a bulk or migration import process.

By default, the enhanced Hive metadata import creates the ZIP file and then automatically runs Atlas bulk import using the same ZIP file.

As an example, use the following commands to perform the enhanced new Hive metadata import.

./import-hive.sh -t hive_db_vxofz.hive_table_tmwos -o /tmp/import.zip

Optionally, you can perform Hive metadata import using the following two phases separately to use either the bulk or migration import process:

Create the Hive metadata export ZIP file.
Use the exported ZIP file to run the bulk or migration import process for Atlas.

As an example, use the following commands to perform the new Hive metadata import in phases.

To create the Hive metadata export ZIP file, run the following command:

./import-hive.sh -i -t hive_db_vxofz.hive_table_tmwos -o /tmp/import.zip


Option	Description
`-o or --output`	Used to create the ZIP file. If you specify a ZIP file followed by the `-o` option, it creates the ZIP file. If you provide only a path, then by default this option tries to create a `import-hive-output.zip` file inside the path. If the ZIP file already exists, the import process fails.
`-i or --ignorebulkImport`	Used to ignore the second phase, that is the bulk import execution (the default behavior is to invoke the bulk import process)

To Import the create a ZIP file using bulk import, use the CURL command:
cat > /tmp/importOptions.json
{"options":{"updateTypeDefinition":"false", "format":"zipDirect", "size":"1"}} kinit -kt /cdep/keytabs/atlas.keytab atlas
curl -k --negotiate -u : -X POST
'https://quasar-cgfwjs-1.quasar-cgfwjs.root.hwx.site:31443/api/atlas/admin/import' --form 'request=@/tmp/importOptions.json' --form 'data=@/tmp/import.zip’
Instead of the bulk import, run the migration import process and follow the Migration import documentation.

note
While performing the import operation using the API with this enhancement, the deleteNonExisting option is not supported. For example, when you create tables in a database, perform an import, and drop the table, and later perform an import with deleteNonExisting, the dropped table status shows as “Active” in Atlas. You must perform this operation using the previous version of Hive Metadata import.