Hive entity metadata migration

Hive metadata entities are fully migrated from Navigator to Atlas.

The following sections describe how metadata is mapped from Navigator to Atlas. Where Atlas requires metadata that was not available in Navigator, the migration notes describe how the Atlas metadata values are generated.

Hive Database

Navigator hv_database entities are migrated to Atlas hive_db entities.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| fileSystemPath | attributes.location | |
| firstClassParentId | | Not needed in Atlas. |
| params | attributes.parameters | |
| parentPath | | Not needed in Atlas. |
| technicalProperties | customAttributes | |
| type | | Inferred rather than migrated. |
| | attributes.ownerType | |
| | attributes.parameters | |
| | attributes.qualifiedName | Generated as a string in the format dbname@clustername. |
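
As a minimal sketch of the hive_db mapping above, the following Python builds an Atlas-style entity dictionary from a Navigator hv_database record. The function name, the dictionary layout, and the originalName field used for the database name are assumptions made for illustration; they are not taken from the migration tooling.

```python
def migrate_hv_database(nav: dict, cluster_name: str) -> dict:
    """Build an illustrative Atlas hive_db entity from a Navigator hv_database record."""
    # Assumption: the Navigator record carries the database name in "originalName".
    db_name = nav["originalName"]
    return {
        "typeName": "hive_db",
        "attributes": {
            "location": nav.get("fileSystemPath"),
            "parameters": dict(nav.get("params") or {}),
            # Generated rather than migrated: dbname@clustername
            "qualifiedName": f"{db_name}@{cluster_name}",
        },
        "customAttributes": dict(nav.get("technicalProperties") or {}),
        # firstClassParentId, parentPath, and type are deliberately not copied.
    }
```

Calling the function with a record named dbblue and the cluster name clustercolor would yield the qualified name dbblue@clustercolor.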

Hive Table

Navigator hv_table entities are migrated to Atlas hive_table entities.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| clusterByColNames | bucketCols | Not needed in Atlas. |
| group | attributes.parameters | Added to Atlas entity as a key value pair with the Navigator name as the key. |
| params | attributes.parameters | Added to Atlas entity as a key value pair with the Navigator name as the key. |
| partColNames | relationshipAttributes.partitionKeys | |
| sortByColName | attributes.sortCols | Converted from string to array type. |
| technicalProperties | attributes.parameters | Added to the Atlas entity attributes as a key value pair with the Navigator name as the key. |
| | attributes.aliases | Defaults to null. |
| | attributes.comment | Defaults to null. |
| | attributes.lastAccessTime | Defaults to null. |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>@<clustername>. |
| | attributes.retention | Defaults to null. |
| | attributes.tableType | Defaults to null. |
| | attributes.temporary | Defaults to null. |
| | attributes.viewOriginalText | Defaults to null. |
| | attributes.viewExpandedText | Defaults to null. |
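
The generated and defaulted hive_table attributes can be sketched as follows. This is an illustration only: the function name, the originalName field, and the comma-separated layout of the Navigator sort string are assumptions, and the element type that Atlas expects inside sortCols may be richer than the plain strings used here.

```python
# Attributes that the table above lists as defaulting to null.
NULL_DEFAULTS = (
    "aliases", "comment", "lastAccessTime", "retention", "tableType",
    "temporary", "viewOriginalText", "viewExpandedText",
)


def migrate_hv_table(nav: dict, parent_db: str, cluster_name: str) -> dict:
    """Build the generated and defaulted attributes of an illustrative hive_table entity."""
    # Assumption: the Navigator record carries the table name in "originalName".
    table_name = nav["originalName"]
    # Assumption: the Navigator sort value is a comma-separated string of column names.
    sort_by = nav.get("sortByColName") or ""
    sort_cols = [col.strip() for col in sort_by.split(",") if col.strip()]

    attributes = {name: None for name in NULL_DEFAULTS}
    attributes["sortCols"] = sort_cols  # string converted to array
    attributes["qualifiedName"] = f"{parent_db}.{table_name}@{cluster_name}"
    return {"typeName": "hive_table", "attributes": attributes}
```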

Hive View

Navigator hv_view entities are migrated to Atlas hive_table entities. Atlas does not distinguish between Hive tables and Hive views.

Hive Storage Description

Atlas includes a separate entity that represents how Hive table data is stored. Navigator included this metadata as part of its hv_table entity and the logical-physical lineage relationship. The migration creates the Atlas hive_storagedesc entity using metadata from the HMS table information.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| compressed | attributes.compressed | |
| fileSystemPath | attributes.location | |
| inputFormat | attributes.inputFormat | |
| outputFormat | attributes.outputFormat | |
| partColNames | attributes.bucketColNames | |
| serdeLibName | attributes.serdeInfo.serializationLib | |
| serdeProps | attributes.serdeInfo | |
| sortByColNames | attributes.sortCols | Converted from string to array type. |
| | attributes.numBuckets | |
| | attributes.parameters | |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>@<clustername>_storage. |
| | attributes.sortedAsSubDirectories | |
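
Because the storage description's qualified name is the table's qualified name with a _storage suffix, the two can be converted in either direction. The helper names below are hypothetical and the snippet is only a sketch of the naming format described above.

```python
STORAGE_SUFFIX = "_storage"


def storage_qualified_name(parent_db: str, table_name: str, cluster_name: str) -> str:
    """Format <parent_db>.<tablename>@<clustername>_storage."""
    return f"{parent_db}.{table_name}@{cluster_name}{STORAGE_SUFFIX}"


def table_qualified_name_for(storage_name: str) -> str:
    """Recover the owning table's qualified name from a storage description name."""
    if not storage_name.endswith(STORAGE_SUFFIX):
        raise ValueError(f"not a storage description name: {storage_name!r}")
    return storage_name[: -len(STORAGE_SUFFIX)]
```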

Hive Column

Navigator hv_column entities are migrated to Atlas hive_column entities. The Atlas owner value is not available from Navigator, so it defaults to null.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| dataType | attributes.type | |
| firstClassParentId | | Not used in Atlas. |
| fieldIndex | attributes.position | |
| parentPath | | Not used in Atlas. |
| | attributes.comment | Defaults to null. |
| | attributes.owner | Defaults to null. |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>.<columnname>@<clustername>. |
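
A hedged sketch of the column mapping follows. The function name and the originalName field are assumptions; the attribute names and the qualified-name format come from the table above.

```python
def migrate_hv_column(nav: dict, parent_db: str, table_name: str, cluster_name: str) -> dict:
    """Build an illustrative Atlas hive_column entity from a Navigator hv_column record."""
    # Assumption: the Navigator record carries the column name in "originalName".
    column_name = nav["originalName"]
    return {
        "typeName": "hive_column",
        "attributes": {
            "type": nav.get("dataType"),
            "position": nav.get("fieldIndex"),
            "comment": None,  # not available from Navigator
            "owner": None,    # not available from Navigator
            "qualifiedName": f"{parent_db}.{table_name}.{column_name}@{cluster_name}",
        },
        # firstClassParentId and parentPath are not used in Atlas and are dropped.
    }
```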

Hive Process

Navigator hv_query entities are migrated to Atlas hive_process entities.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | inputs | Points to the input entities as relationship attributes. |
| outputs | outputs | Points to the output entities as relationship attributes. |
| queryHash | | Not used in Atlas. |
| queryText | attributes.queryText | Not used currently in Atlas. |
| sourceId | | Not used in Atlas. |
| unparsed | | Not used in Atlas. |
| wfIds | | Not used in Atlas. |
| | attributes.startTime | Not used currently in Atlas. |
| | attributes.endTime | Not used currently in Atlas. |
| | attributes.userName | Not used currently in Atlas. |
| | attributes.operationType | Defaults to null. |
| | attributes.qualifiedName | Generated as a string with the operation, input entities, and output entities, where each entity is noted by <asset_qualifiedName>:<createTime> and entries are separated by colons, and an arrow shows the break between input and output entities. For example: dbblue.table_aqua@clustercolor:1589373411000->dbblue.table_teal@clustercolor:1589394039000 |
| | attributes.queryId | Defaults to null. |
| | attributes.queryGraph | Defaults to null. |
| | attributes.recentQueries | Defaults to null. |
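
The hive_process qualified-name format can be sketched as below. The function and type names are hypothetical. The sketch reproduces the example string from the table, which shows no operation component, so none is emitted here; how the operation is encoded is left open.

```python
from typing import Iterable, Tuple

# (asset qualifiedName, createTime) as described in the migration notes.
Entity = Tuple[str, int]


def _join_entities(entities: Iterable[Entity]) -> str:
    """Render each entity as <asset_qualifiedName>:<createTime>, separated by colons."""
    return ":".join(f"{name}:{create_time}" for name, create_time in entities)


def process_qualified_name(inputs: Iterable[Entity], outputs: Iterable[Entity]) -> str:
    """Join the input and output entities with an arrow between the two groups."""
    return f"{_join_entities(inputs)}->{_join_entities(outputs)}"
```

Passing one input (dbblue.table_aqua@clustercolor, 1589373411000) and one output (dbblue.table_teal@clustercolor, 1589394039000) yields the example string shown in the table.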

Hive Column Lineage

Navigator hv_query_part entities are migrated to Atlas hive_column_lineage entities.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | attributes.inputs | Points to the input column entities as relationship attributes. |
| outputs | attributes.outputs | Points to the output column entities as relationship attributes. |
| firstClassParentId | attributes.query | Points to the parent hive_process entity as a relationship attribute. |
| originalName | attributes.qualifiedName | Generated as a string with the operation, input entities, output entities, and target column name, where each entity is noted by <column_qualifiedName>:<createTime> and entries are separated by colons, and an arrow shows the break between input and output entities. For example: dbblue.table_aqua@clustercolor:1589373411000->dbblue.table_teal@cm:1589390050000:column_beach |
| | attributes.dependencyType | Set to "SIMPLE". |
| | attributes.expression | Defaults to null. |
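
For column lineage, the example string ends with the target column name. A minimal sketch that reproduces that example is shown below; the function name and the tuple layout are assumptions, and the operation component mentioned in the notes is omitted because the example does not show one.

```python
from typing import Iterable, Tuple

# (qualifiedName, createTime) for each input or output entity.
Entity = Tuple[str, int]


def column_lineage_qualified_name(
    inputs: Iterable[Entity], outputs: Iterable[Entity], target_column: str
) -> str:
    """Render inputs->outputs and append the target column name after a colon."""
    left = ":".join(f"{name}:{created}" for name, created in inputs)
    right = ":".join(f"{name}:{created}" for name, created in outputs)
    return f"{left}->{right}:{target_column}"


def build_column_lineage(inputs, outputs, target_column: str) -> dict:
    """Assemble the generated and fixed hive_column_lineage attributes."""
    return {
        "typeName": "hive_column_lineage",
        "attributes": {
            "qualifiedName": column_lineage_qualified_name(inputs, outputs, target_column),
            "dependencyType": "SIMPLE",
            "expression": None,
        },
    }
```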

Hive Process Execution

Navigator hv_query_execution entities are migrated to Atlas hive_process_execution entities.

| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | inputs | Points to the input entities as relationship attributes. |
| outputs | outputs | Points to the output entities as relationship attributes. |
| ended | attributes.endTime | |
| operation | attributes.process | Points to the parent hive_process entity as a relationship attribute. |
| originalName | attributes.queryText | |
| principal | attributes.userName | |
| started | attributes.startTime | |
| | attributes.hostName | Defaults to null. |
| | attributes.qualifiedName | Generated as a string with the operation, input entities, output entities, and execution start and end timestamps, where each entity is noted by <asset_qualifiedName>:<createTime> and entries are separated by colons, and an arrow shows the break between input and output entities. For example: dbblue.table_aqua@clustercolor:1589373411000->dbblue.table_teal@clustercolor:1589390050000:1589394047386:1589394064097 |
| | attributes.queryGraph | Defaults to null. |
| | attributes.queryId | Defaults to null. |
| | attributes.queryPlan | Set to "Not Supported". |
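
The execution mapping can be sketched in the same way. The function name is hypothetical, and the sketch assumes that the Navigator started and ended values are already numeric timestamps and that the input and output part of the name has been built beforehand.

```python
def migrate_hv_query_execution(nav: dict, lineage_name: str) -> dict:
    """Build an illustrative hive_process_execution entity.

    lineage_name is the "<inputs>-><outputs>" portion of the qualified name,
    assumed to have been generated already.
    """
    started = nav["started"]
    ended = nav["ended"]
    return {
        "typeName": "hive_process_execution",
        "attributes": {
            "startTime": started,
            "endTime": ended,
            "userName": nav.get("principal"),
            "queryText": nav.get("originalName"),
            "hostName": None,
            "queryGraph": None,
            "queryId": None,
            "queryPlan": "Not Supported",
            # Start and end timestamps are appended after the output entities.
            "qualifiedName": f"{lineage_name}:{started}:{ended}",
        },
    }
```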