Hive entity metadata migration
Hive metadata entities are fully migrated from Navigator to Atlas.
The following sections describe how metadata is mapped from Navigator to Atlas; if Atlas requires metadata that wasn't available in Navigator, the migration notes describe how the Atlas metadata values are generated.
Hive Database
Navigator hv_database entities are migrated to Atlas hive_db entities.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
fileSystemPath | attributes.location | |
firstClassParentId | Not needed in Atlas | |
params | attributes.parameters | |
parentPath | Not needed in Atlas. | |
technicalProperties | customAttributes | |
type | Inferred rather than migrated. | |
| | attributes.ownerType | |
| | attributes.parameters | |
| | attributes.qualifiedName | Generated as a string in the format dbname@clustername. |
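
The qualifiedName format above can be sketched in a few lines. This is a minimal illustration rather than the migration tool's own code: the function name `hive_db_qualified_name` and the sample database and cluster names are hypothetical.

```python
def hive_db_qualified_name(db_name: str, cluster_name: str) -> str:
    """Build a hive_db qualifiedName in the dbname@clustername format."""
    if not db_name or not cluster_name:
        raise ValueError("both db_name and cluster_name are required")
    return f"{db_name}@{cluster_name}"


# Hypothetical sample values, for illustration only.
print(hive_db_qualified_name("sales", "cluster1"))  # sales@cluster1
```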
Hive Table
Navigator hv_table entities are migrated to Atlas hive_table entities.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
clusterByColNames | bucketCols | Not needed in Atlas. |
group | attributes.parameters | Added to the Atlas entity as a key-value pair with the Navigator name as the key. |
params | attributes.parameters | Added to the Atlas entity as a key-value pair with the Navigator name as the key. |
partColNames | relationshipAttributes.partitionKeys | |
sortByColName | attributes.sortCols | Converted from string to array type. |
technicalProperties | attributes.parameters | Added to the Atlas entity as a key-value pair with the Navigator name as the key. |
| | attributes.aliases | Defaults to null. |
| | attributes.comment | Defaults to null. |
| | attributes.lastAccessTime | Defaults to null. |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>@<clustername>. |
| | attributes.retention | Defaults to null. |
| | attributes.tableType | Defaults to null. |
| | attributes.temporary | Defaults to null. |
| | attributes.viewOriginalText | Defaults to null. |
| | attributes.viewExpandedText | Defaults to null. |
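
As a rough sketch of how the Hive Table rows fit together, the following shows one reading of the mapping: `group`, `params`, and `technicalProperties` are stored in `attributes.parameters` under their Navigator names, the listed attributes default to null, and the qualifiedName is assembled from its parts. The function name, the shape of the `nav` dictionary, and the sample values are assumptions made for illustration, not the migration tool's actual interface.

```python
from typing import Any, Dict

# Atlas attributes that the table above lists as defaulting to null.
NULL_DEFAULTS = (
    "aliases", "comment", "lastAccessTime", "retention",
    "tableType", "temporary", "viewOriginalText", "viewExpandedText",
)

# Navigator fields that the table above maps into attributes.parameters.
PARAMETER_FIELDS = ("group", "params", "technicalProperties")


def build_hive_table_attributes(
    nav: Dict[str, Any], db: str, table: str, cluster: str
) -> Dict[str, Any]:
    """Sketch of the hive_table attribute mapping (hypothetical helper)."""
    # Each Navigator value is kept as a key-value pair keyed by its Navigator name.
    parameters = {name: nav[name] for name in PARAMETER_FIELDS if name in nav}

    attributes: Dict[str, Any] = {name: None for name in NULL_DEFAULTS}
    attributes["parameters"] = parameters
    attributes["qualifiedName"] = f"{db}.{table}@{cluster}"
    return attributes


# Hypothetical Navigator record, for illustration only.
attrs = build_hive_table_attributes({"group": "analysts"}, "sales", "orders", "cluster1")
print(attrs["qualifiedName"])  # sales.orders@cluster1
```

The sketch leaves out the bucketing, partition-key, and sort-column rows, which involve relationships and type conversions not fully described here.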
Hive View
Navigator hv_view entities are migrated to Atlas hive_table entities. Atlas does not distinguish between Hive tables and Hive views.
Hive Storage Description
Atlas includes a separate entity that represents how Hive table data is stored. Navigator included this metadata as part of its hv_table entity and the logical-physical lineage relationship. The migration creates the Atlas hive_storagedesc entity using metadata from the HMS table information.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
compressed | attributes.compressed | |
fileSystemPath | attributes.location | |
inputFormat | attributes.inputFormat | |
outputFormat | attributes.outputFormat | |
partColNames | attributes.bucketColNames | |
serdeLibName | attributes.serdeInfo.serializationLib | |
serdeProps | attributes.serdeInfo | |
sortByColNames | attributes.sortCols | Converted from string to array type. |
| | attributes.numBuckets | |
| | attributes.parameters | |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>@<clustername>_storage. |
| | attributes.storedAsSubDirectories | |
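
The two derived values in this table, the `_storage` qualifiedName and the string-to-array conversion for sort columns, might look like the sketch below. The source does not say how Navigator delimits multiple column names inside the string, so the comma-separated format is purely an assumption, as are the function names and sample values.

```python
from typing import List, Optional


def storage_qualified_name(db: str, table: str, cluster: str) -> str:
    """Build the hive_storagedesc qualifiedName: the table name plus '_storage'."""
    return f"{db}.{table}@{cluster}_storage"


def sort_cols_to_array(sort_by_col_names: Optional[str]) -> List[str]:
    """Convert a Navigator sort-column string into an array.

    ASSUMPTION: column names are comma-separated. The real Navigator
    format may differ; adjust the delimiter to match your data.
    """
    if not sort_by_col_names:
        return []
    return [name.strip() for name in sort_by_col_names.split(",") if name.strip()]


# Hypothetical sample values, for illustration only.
print(storage_qualified_name("sales", "orders", "cluster1"))  # sales.orders@cluster1_storage
print(sort_cols_to_array("order_date, customer_id"))  # ['order_date', 'customer_id']
```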
Hive Column
Navigator hv_column entities are migrated to Atlas hive_column entities. Note that the Atlas owner value is not available from Navigator and remains blank.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
dataType | attributes.type | |
firstClassParentId | Not used in Atlas. | |
fieldIndex | attributes.position | |
parentPath | Not used in Atlas. | |
| | attributes.comment | Defaults to null. |
| | attributes.owner | Defaults to null. |
| | attributes.qualifiedName | Generated as a string in the format <parent_db>.<tablename>.<columnname>@<clustername>. |
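
The column mapping is small enough to show end to end. The sketch below renames the two migrated fields, drops the two that Atlas does not use, fills in the null defaults, and generates the qualifiedName. The helper name, its parameters, and the sample record are hypothetical.

```python
from typing import Any, Dict


def migrate_hive_column(
    nav: Dict[str, Any], db: str, table: str, column: str, cluster: str
) -> Dict[str, Any]:
    """Sketch of the hv_column to hive_column attribute mapping (hypothetical helper)."""
    return {
        "type": nav.get("dataType"),        # dataType -> attributes.type
        "position": nav.get("fieldIndex"),  # fieldIndex -> attributes.position
        "comment": None,                    # not available from Navigator
        "owner": None,                      # not available from Navigator
        "qualifiedName": f"{db}.{table}.{column}@{cluster}",
        # firstClassParentId and parentPath are deliberately not copied.
    }


# Hypothetical Navigator record, for illustration only.
nav_column = {"dataType": "string", "fieldIndex": 2, "parentPath": "/sales/orders"}
print(migrate_hive_column(nav_column, "sales", "orders", "status", "cluster1"))
```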
Hive Process
Navigator hv_query entities are migrated to Atlas hive_process entities.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
inputs | inputs | Points to the input entities as relationship attributes. |
outputs | outputs | Points to the output entities as relationship attributes. |
queryHash | Not used in Atlas. | |
queryText | attributes.queryText | Not used currently in Atlas. |
sourceId | Not used in Atlas. | |
unparsed | Not used in Atlas. | |
wfIds | Not used in Atlas. | |
| | attributes.startTime | Not used currently in Atlas. |
| | attributes.endTime | Not used currently in Atlas. |
| | attributes.userName | Not used currently in Atlas. |
| | attributes.operationType | Defaults to null. |
| | attributes.qualifiedName | Generated as a string with the operation, input entities, and output entities. Each entity is noted by <asset_qualifiedName>:<createTime>, entries are separated by colons, and an arrow shows the break between input and output entities. |
| | attributes.queryId | Defaults to null. |
| | attributes.queryGraph | Defaults to null. |
| | attributes.recentQueries | Defaults to null. |
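
One way to picture the hive_process qualifiedName is the sketch below. It follows the description in the table (operation first, each entity written as `<asset_qualifiedName>:<createTime>`, colon-separated entries, an arrow between inputs and outputs), but the exact arrow string and its placement are assumptions; here the arrow is taken to be `->` with no surrounding separators. The function name and sample values are hypothetical.

```python
from typing import List, Tuple

# (asset qualifiedName, createTime) pairs; the tuple shape is an assumption.
Entity = Tuple[str, int]


def process_qualified_name(
    operation: str, inputs: List[Entity], outputs: List[Entity], arrow: str = "->"
) -> str:
    """Sketch of a hive_process qualifiedName (arrow format is assumed)."""

    def entry(entity: Entity) -> str:
        name, create_time = entity
        return f"{name}:{create_time}"

    # Operation and input entries, separated by colons.
    left = ":".join([operation] + [entry(e) for e in inputs])
    # Output entries, separated by colons.
    right = ":".join(entry(e) for e in outputs)
    return f"{left}{arrow}{right}"


# Hypothetical sample values, for illustration only.
print(process_qualified_name(
    "QUERY",
    [("sales.orders@cluster1", 1554750070000)],
    [("sales.summary@cluster1", 1554750072000)],
))
# QUERY:sales.orders@cluster1:1554750070000->sales.summary@cluster1:1554750072000
```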
Hive Column Lineage
Navigator hv_query_part entities are migrated to Atlas hive_column_lineage entities.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
inputs | attributes.inputs | Points to the input column entities as relationship attributes. |
outputs | attributes.outputs | Points to the output column entities as relationship attributes. |
firstClassParentId | attributes.query | Points to the parent hive_process entity as a relationship attribute. |
originalName | attributes.qualifiedName | Generated as a string with the operation, input entities, output entities, and target column name. Each entity is noted by <column_qualifiedName>:<createTime>, entries are separated by colons, and an arrow shows the break between input and output entities. |
| | attributes.dependencyType | Set to "SIMPLE". |
| | attributes.expression | Defaults to null. |
Hive Process Execution
Navigator hv_query_execution entities are migrated to Atlas hive_process_execution entities.
Navigator Metadata | Atlas Metadata | Migration Notes |
---|---|---|
inputs | inputs | Points to the input entities as relationship attributes. |
outputs | outputs | Points to the output entities as relationship attributes. |
ended | attributes.endTime | |
operation | attributes.process | Points to the parent hive_process entity as a relationship attribute. |
originalName | attributes.queryText | |
principal | attributes.userName | |
started | attributes.startTime | |
| | attributes.hostName | Defaults to null. |
| | attributes.qualifiedName | Generated as a string with the operation, input entities, output entities, and execution start and end timestamps. Each entity is noted by <asset_qualifiedName>:<createTime>, entries are separated by colons, and an arrow shows the break between input and output entities. |
| | attributes.queryGraph | Defaults to null. |
| | attributes.queryId | Defaults to null. |
| | attributes.queryPlan | Set to "Not Supported". |
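
The scalar fields of the process execution mapping can be expressed as a simple lookup table plus fixed defaults, as in the sketch below. It covers only the direct renames and the constant values from the table; the relationship attributes (inputs, outputs, process) and the generated qualifiedName are left out. The helper name, the dictionary shapes, and the sample record are hypothetical.

```python
from typing import Any, Dict

# Navigator field -> Atlas attribute, taken from the table above.
FIELD_MAP = {
    "ended": "endTime",
    "originalName": "queryText",
    "principal": "userName",
    "started": "startTime",
}

# Atlas attributes with fixed values after migration, per the table above.
FIXED_VALUES = {
    "hostName": None,
    "queryGraph": None,
    "queryId": None,
    "queryPlan": "Not Supported",
}


def migrate_process_execution(nav: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the hv_query_execution scalar mapping (hypothetical helper)."""
    attributes = {atlas: nav.get(navigator) for navigator, atlas in FIELD_MAP.items()}
    attributes.update(FIXED_VALUES)
    return attributes


# Hypothetical Navigator record, for illustration only.
nav_execution = {
    "started": 1554750070000,
    "ended": 1554750072000,
    "principal": "etl_user",
    "originalName": "INSERT INTO summary SELECT * FROM orders",
}
print(migrate_process_execution(nav_execution))
```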