Hive entity metadata migration
Hive metadata entities are fully migrated from Navigator to Atlas.
The following sections describe how metadata is mapped from Navigator to Atlas; if Atlas requires metadata that wasn't available in Navigator, the migration notes describe how the Atlas metadata values are generated.
Hive Database
Navigator hv_database entities are migrated to Atlas hive_db entities.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| fileSystemPath | attributes.location | |
| firstClassParentId | Not needed in Atlas | |
| params | attributes.parameters | |
| parentPath | Not needed in Atlas. | |
| technicalProperties | customAttributes | |
| type | Inferred rather than migrated. | |
| attributes.ownerType | ||
| attributes.parameters | ||
| attributes.qualifiedName | Generated as a string in the format
dbname@clustername. |
Hive Table
Navigator hv_table entities are migrated to Atlas hive_table entities.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| clusterByColNames | bucketCols | Not needed in Atlas. |
| group | attributes.parameters | Added to Atlas entity as a key value pair with the Navigator name as the key. |
| params | attributes.parameters | Added to Atlas entity as a key value pair with the Navigator name as the key. |
| partColNames | relationshipAttributes.partitionKeys | |
| sortByColName | attributes.sortCols | Converted from string to array type. |
| technicalProperties | attributes.parameters | Added to the Atlas entity attributes as a key value pair with the Navigator name as the key. |
| attributes.aliases | Defaults to null. | |
| attributes.comment | Defaults to null. | |
| attributes.lastAccessTime | Defaults to null. | |
| attributes.qualifiedName | Generated as a string in the format
<parent_db>.<tablename>@<clustername>. |
|
| attributes.retention | Defaults to null. | |
| attributes.tableType | Defaults to null. | |
| attributes.temporary | Defaults to null. | |
| attributes.viewOriginalText | Defaults to null. | |
| attributes.viewExpandedText | Defaults to null. |
Hive View
Navigator hv_view entities are migrated to Atlas hive_table entities. Atlas does not distinguish between Hive tables and Hive views.
Hive Storage Description
Atlas includes a separate entity that represents how Hive table data is
stored. Navigator included this metadata as part of its hv_table
entity and the logical-physical lineage relationship. The migration creates the Atlas hive_storagedesc entity using
metadata from the HMS table information.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| compressed | attributes.compressed | |
| fileSystemPath | attributes.location | |
| inputFormat | attributes.inputFormat | |
| outputFormat | attributes.outputFormat | |
| partColNames | attributes.bucketColNames | |
| serdeLibName | attributes.serdeInfo.serializationLib | |
| serdeProps | attributes.serdeInfo | |
| sortByColNames | attributes.sortCols | Converted from string to array type. |
| attributes.numBuckets | ||
| attributes.parameters | ||
| attributes.qualifiedName | Generated as a string in the format
<parent_db>.<tablename>@<clustername>_storage. |
|
| attributes.sortedAsSubDirectories |
Hive Column
Navigator hv_column entities are migrated to Atlas
hive_column entities. Note that the Atlas
owner value is not available from Navigator and
remains blank.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| dataType | attributes.type | |
| firstClassParentId | Not used in Atlas. | |
| fieldIndex | attributes.position | |
| parentPath | Not used in Atlas. | |
| attributes.comment | Defaults to null. | |
| attributes.owner | Defaults to null. | |
| attributes.qualifiedName | Generated as a string in the format
<parent_db>.<tablename>.<columnname>@<clustername>. |
Hive Process
Navigator hv_query entities are migrated to Atlas hive_process entities.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | inputs | Points to the input entities as relationship attributes. |
| outputs | outputs | Points to the output entities as relationship attributes. |
| queryHash | Not used in Atlas. | |
| queryText | attributes.queryText | Not used currently in Atlas. |
| sourceId | Not used in Atlas. | |
| unparsed | Not used in Atlas. | |
| wfIds | Not used in Atlas. | |
| attributes.startTime | Not used currently in Atlas. | |
| attributes.endTime | Not used currently in Atlas. | |
| attributes.userName | Not used currently in Atlas. | |
| attributes.operationType | Defaults to null. | |
| attributes.qualifiedName | Generated as a string with the operation, input entities,
and output entities, where each entity is noted by
<asset_qualifiedName>:<createTime> and
entries are separated by colons, and an arrow shows the break
between input and output entities. For example:
|
|
| attributes.queryId | Defaults to null. | |
| attributes.queryGraph | Defaults to null. | |
| attributes.recentQueries | Defaults to null. |
Hive Column Lineage
Navigator hv_query_part entities are migrated to Atlas hive_column_lineage entities.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | attributes.inputs | Points to the input column entities as relationship attributes. |
| outputs | attributes.outputs | Points to the output column entities as relationship attributes. |
| firstClassParentId | attributes.query | Points to the parent hive_process entity
as a relationship attribute. |
| originalName | attributes.qualifiedName | Generated as a string with the operation, input entities,
output entities, and target column name, where each entity is
noted by
<column_qualifiedName>:<createTime> and
entries are separated by colons, and an arrow shows the break
between input and output entities. For example:
|
| attributes.dependencyType | Set to "SIMPLE". | |
| attributes.expression | Defaults to null. |
Hive Process Execution
Navigator hv_query_execution entities are migrated to Atlas hive_process_execution entities.
| Navigator Metadata | Atlas Metadata | Migration Notes |
|---|---|---|
| inputs | inputs | Points to the input entities as relationship attributes. |
| outputs | outputs | Points to the output entities as relationship attributes. |
| ended | attributes.endTime | |
| operation | attributes.process | Points to the parent hive_process entity
as a relationship attribute. |
| originalName | attributes.queryText | |
| principal | attributes.userName | |
| started | attributes.startTime | |
| attributes.hostName | Defaults to null. | |
| attributes.qualifiedName | Generated as a string with the operation, input entities,
output entities, and execution start and end timestamps, where
each entity is noted by
<asset_qualifiedName>:<createTime> and
entries are separated by colons, and an arrow shows the break
between input and output entities. For example:
|
|
| attributes.queryGraph | Defaults to null. | |
| attributes.queryId | Defaults to null. | |
| attributes.queryPlan | Set to "Not Supported". |
