Apache Atlas Reference

HiveServer entities created in Atlas

Each HiveServer entity in Atlas includes detailed metadata collected from Hive.

The following diagrams show a summary of the entities created in Atlas for Hive operations and assets. The supertypes that contribute attributes to the entity types are shaded.

Figure 1. Atlas Entity Types for HiveServer Data Sets

Figure 2. Atlas Entity Types for HiveServer Processes

The metadata collected for each entity type is as follows:

Hive Process


Identifier	Example Content
typeName	`hive_process`
guid	System-generated ID. This value identifies the entity in the Atlas Dashboard URL.
qualifiedName	`<database>.<target table>@<clustername>:<generated ID>` The generated ID is distinct from the GUID.
name	Text of the query.
inputs	List of the input tables or views, including each entity’s type name and the qualified name.
outputs	List of the output objects, including each entity’s type name and the qualified name.
recentQueries	The most recent query executed (also recorded in the `hive_process_execution` entity).
operationType	One of the operations that triggers metadata collection.
queryPlan	Reserved for future use.
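
The `hive_process` qualified name packs several components into a single string. As a minimal sketch, assuming database and table names contain no `@` or `:` characters, the following Python helper splits a qualified name in the format above into its parts. The class and function names are hypothetical and are not part of Atlas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessQualifiedName:
    """Components of a hive_process qualifiedName (hypothetical helper type)."""
    database: str
    target_table: str
    cluster: str
    generated_id: str


def parse_process_qualified_name(qn: str) -> ProcessQualifiedName:
    """Split '<database>.<target table>@<clustername>:<generated ID>'.

    Assumes database and table names contain no '@' or ':' characters.
    """
    # The generated ID follows the last ':'.
    table_part, sep, generated_id = qn.rpartition(":")
    if not sep:
        raise ValueError(f"missing ':<generated ID>' in {qn!r}")
    # The cluster name follows the last '@'.
    db_table, sep, cluster = table_part.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@<clustername>' in {qn!r}")
    # The database name precedes the first '.'.
    database, sep, target_table = db_table.partition(".")
    if not sep:
        raise ValueError(f"missing '<database>.' prefix in {qn!r}")
    return ProcessQualifiedName(database, target_table, cluster, generated_id)


if __name__ == "__main__":
    # Illustrative values only.
    print(parse_process_qualified_name("sales.orders_daily@cm:1612345678000"))
```

Splitting from the right for `:` and `@` keeps the parse stable even if a future ID format adds characters to the left-hand part, as long as the assumption about separators holds.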

Hive Process Execution


Identifier	Example Content
typeName	`hive_process_execution`
guid	System-generated ID. This value identifies the entity in the Atlas Dashboard URL.
qualifiedName	`<database>.<target table>@<clustername>:<ID from process qualified name>:<ID from the process execution name>:<generated ID for this process execution>`
name	Text of the query with a system-generated ID added to the end.
queryText	Text of the query.
queryPlan	Reserved for future use.
queryId	`hive_<date as yyyymmddhhmmss>_<generated id>`
startTime	Query start time.
endTime	Query end time.
userName	The user who ran the query.
Relationship: Process	One process to one or more process executions. `hive_process_process_execution`
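
As the two formats suggest, a process execution's qualified name begins with the qualified name of its parent process. The sketch below uses that prefix to illustrate the one-to-many `hive_process_process_execution` relationship. It is hypothetical: the function names are not Atlas APIs, it assumes the IDs contain no `:` characters, and in practice Atlas records this relationship explicitly, so reading the relationship attribute is the reliable approach.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def execution_suffix(execution_qn: str, process_qn: str) -> Optional[Tuple[str, str]]:
    """Return (execution-name ID, execution ID) if execution_qn extends process_qn.

    Assumes the format
    '<database>.<target table>@<clustername>:<process ID>:<name ID>:<execution ID>'
    and that none of the IDs contain ':'.
    """
    prefix = process_qn + ":"
    if not execution_qn.startswith(prefix):
        return None
    parts = execution_qn[len(prefix):].split(":")
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def group_executions(process_qns: List[str], execution_qns: List[str]) -> Dict[str, List[str]]:
    """Group execution qualified names under the process they extend."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for exec_qn in execution_qns:
        for proc_qn in process_qns:
            if execution_suffix(exec_qn, proc_qn) is not None:
                groups[proc_qn].append(exec_qn)
                break
    return dict(groups)
```

Appending `:` to the process name before matching prevents a process ending in `:1` from claiming executions of a process ending in `:12`.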

Hive Database


Identifier	Example Content
typeName	`hive_db`
guid	System-generated ID. This value identifies the entity in the Atlas Dashboard URL.
qualifiedName	`<database>@<clustername>`
name	Database name as reported from Hive.
clusterName	Cluster name.
location	The file system path where the backing files for the database are stored. This can be an HDFS path, an Amazon S3 object, or an Azure data storage location.
owner	The user who initially created the database.
ownerType	The principal type of the database owner: one of USER, ROLE, or GROUP.
parameters	Additional key-value pair metadata that comes from Hive such as table size, number of rows, and number of storage files.
Relationship: Table	One database to many tables. `hive_table_db`
Relationship: Database DDL	One database to many database DDL entities. `hive_db_ddl_queries`
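
Because `qualifiedName` is unique within an entity type, it can be used to look up an entity without knowing its GUID. The sketch below builds a request URL for the Atlas V2 REST endpoint `GET /api/atlas/v2/entity/uniqueAttribute/type/{typeName}`, which takes the unique attribute as an `attr:qualifiedName` query parameter. The base URL, database name, and cluster name are hypothetical, and authentication and the HTTP call itself are omitted.

```python
from urllib.parse import quote, urlencode


def db_qualified_name(database: str, cluster: str) -> str:
    """Build a hive_db qualifiedName: '<database>@<clustername>'."""
    return f"{database}@{cluster}"


def unique_attribute_url(base_url: str, type_name: str, qualified_name: str) -> str:
    """URL for fetching an entity by type name and qualifiedName (Atlas V2 REST API)."""
    path = f"/api/atlas/v2/entity/uniqueAttribute/type/{quote(type_name)}"
    # urlencode percent-encodes ':' in the key and '@' in the value.
    query = urlencode({"attr:qualifiedName": qualified_name})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    qn = db_qualified_name("sales", "cm")
    # Hypothetical Atlas server address.
    print(unique_attribute_url("https://atlas.example.com:31443", "hive_db", qn))
```

The JSON response should include the entity's `guid`, the same value that appears in the Atlas Dashboard URL.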

Hive Table


Identifier	Example Content
typeName	`hive_table`
guid	System-generated ID. This value identifies the entity in the Atlas Dashboard URL.
qualifiedName	`<database>.<tablename>@<clustername>`
name	Table name.
columns	List of the columns defined in the table. The Atlas Dashboard shows these as links to the column entity details.
owner	The user who created the table.
parameters	Table details from HiveServer, such as `totalSize`, `EXTERNAL`, `numFiles`, `transient_lastDdlTime`, and `bucketing_version`.
retention	Table retention value provided by HiveServer2 (HS2), as an integer.
sd	The storage description for the table, which includes the location of the table data. `<database>.<table>@<clustername>_storage`
tableType	How the table was created: one of EXTERNAL_TABLE, VIRTUAL_VIEW, or MANAGED_TABLE.
Relationship: Database	One database to many tables. `hive_table_db`
Relationship: Columns	One table to one or more columns. `hive_table_columns`
Relationship: Partition Key Column	One table to one or more columns that are partition keys. `hive_table_partitionkeys`
Relationship: Storage Description	One table to one storage description. `hive_table_storagedesc`
Relationship: DDL	One table to many DDL entities. `hive_table_ddl_queries`
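
Comparing the two formats shows that the `sd` qualified name is the table's qualified name with `_storage` appended. The following minimal sketch, with hypothetical helper names, builds both names, recovers the table name from a storage description name, and validates `tableType` against the three documented values. It is an illustration, not code from the Atlas Hive hook.

```python
from enum import Enum


class TableType(Enum):
    """Documented values of the hive_table tableType attribute."""
    EXTERNAL_TABLE = "EXTERNAL_TABLE"
    VIRTUAL_VIEW = "VIRTUAL_VIEW"
    MANAGED_TABLE = "MANAGED_TABLE"


STORAGE_SUFFIX = "_storage"


def table_qualified_name(database: str, table: str, cluster: str) -> str:
    """Build a hive_table qualifiedName: '<database>.<tablename>@<clustername>'."""
    return f"{database}.{table}@{cluster}"


def storage_qualified_name(table_qn: str) -> str:
    """The hive_storagedesc qualifiedName is the table's with '_storage' appended."""
    return table_qn + STORAGE_SUFFIX


def table_from_storage(storage_qn: str) -> str:
    """Recover the table qualifiedName from a storage description qualifiedName."""
    if not storage_qn.endswith(STORAGE_SUFFIX):
        raise ValueError(f"not a storage description name: {storage_qn!r}")
    return storage_qn[: -len(STORAGE_SUFFIX)]
```

`TableType("VIRTUAL_VIEW")` raises `ValueError` for any value outside the documented three, which makes unexpected input visible early.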

Hive Column


Identifier	Example Content
typeName	`hive_column`
comment	The column description (comment) as reported by Hive.
name	Column name as reported by HMS.
owner	Table owner name as reported by HMS.
position	The column’s zero-based position in the table’s list of columns.
qualifiedName	`<database>.<table>.<column>@<clustername>`
table	Table name. Also modeled as a relationship.
type	Column data type as reported by HMS.
Relationship: table	One table to one or more columns. `hive_table_columns`
Relationship: inputToProcesses	The `hive_column_lineage` entities that include this column in the input to a transformation. The relationship type is `dataset_process_inputs`.
Relationship: outputFromProcesses	The `hive_column_lineage` entities that include this column in the output of a transformation. The relationship type is `process_dataset_outputs`.
Relationship: Partition Key Column	One table to one or more columns that are partition keys. `hive_table_partitionkeys`
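
The `position` and `qualifiedName` attributes follow directly from a table's ordered column list. The sketch below builds `hive_column`-like attribute dictionaries from hypothetical `(name, type, comment)` tuples. It includes only a subset of the documented attributes, and the dictionary layout is illustrative rather than the exact Atlas entity JSON.

```python
from typing import Dict, List, Tuple


def column_entities(
    database: str, table: str, cluster: str, columns: List[Tuple[str, str, str]]
) -> List[Dict[str, object]]:
    """Build hive_column-like attribute dicts from (name, type, comment) tuples.

    Only a subset of the documented attributes is included; the layout is
    illustrative, not the exact Atlas entity JSON.
    """
    return [
        {
            "typeName": "hive_column",
            "name": name,
            "type": col_type,
            "comment": comment,
            # position is zero-based: the first column is 0.
            "position": position,
            "qualifiedName": f"{database}.{table}.{name}@{cluster}",
        }
        for position, (name, col_type, comment) in enumerate(columns)
    ]
```

`enumerate` starts at 0 by default, which matches the zero-based `position` described in the table.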

Hive Column Lineage


Identifier	Example Content
typeName	`hive_column_lineage`
dependencyType	The type of relationship between the input and output columns; one of SIMPLE, EXPRESSION, or SCRIPT.
name	`<database>.<table>@<clustername>:<generated ID>:<output_column>`
inputs	List of 0 or more `hive_column` entities that contributed to the output columns. This is a legacy model component: the newer model uses a relationship attribute.
outputs	List of the `hive_column` entities produced in the process. This is a legacy model component: the newer model uses a relationship attribute.
qualifiedName	Same as name.
query	Name of the `hive_process` entity that produced this lineage. This is a legacy model component: the newer model uses a relationship attribute.
Relationship: Process	Name of the `hive_process` entity that produced this lineage. `hive_process_column_lineage`
Relationship: inputToProcesses	List of 0 or more `hive_column` entities that contributed to the output columns.
Relationship: outputFromProcesses	List of 0 or more `hive_column` entities that were produced in the process.
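
The `hive_column_lineage` name ends with the output column, and the part before it has the same shape as a `hive_process` qualified name. As a hedged sketch, assuming that prefix identifies the producing process and that column names contain no `:`, the name can be split as follows. The helper names are hypothetical, and the `hive_process_column_lineage` relationship remains the authoritative link to the process.

```python
from enum import Enum
from typing import Tuple


class DependencyType(Enum):
    """Documented values of the hive_column_lineage dependencyType attribute."""
    SIMPLE = "SIMPLE"
    EXPRESSION = "EXPRESSION"
    SCRIPT = "SCRIPT"


def split_lineage_name(name: str) -> Tuple[str, str]:
    """Split '<database>.<table>@<clustername>:<generated ID>:<output_column>'.

    Returns (process-style prefix, output column name).
    Assumes the output column name contains no ':'.
    """
    prefix, sep, output_column = name.rpartition(":")
    # The remaining prefix must still contain the '@' and ':' separators.
    if not sep or "@" not in prefix or ":" not in prefix:
        raise ValueError(f"unexpected column lineage name: {name!r}")
    return prefix, output_column
```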

Hive Storage Description


Identifier	Example Content
typeName	`hive_storagedesc`
compressed	Metadata from Hive indicating whether the table is stored compressed.
inputFormat	Metadata from Hive indicating the storage input format.
outputFormat	Metadata from Hive indicating the storage output format.
parameters	Additional metadata from Hive in the form of key-value pairs.
qualifiedName	`<database>.<table>@<clustername>_storage`
serdeInfo	Metadata from Hive indicating the serialization/deserialization implementation used to write/read table data.
sortCols	Metadata from Hive listing the column or columns used to sort the table data.
storedAsSubDirectories	Metadata from Hive indicating whether a skewed table uses the list bucketing feature, which creates subdirectories for skewed values.
numBuckets	Metadata from Hive indicating the number of buckets for bucketed tables. Non-bucketed tables are indicated by -1.
table	The table that this storage description holds data for. Also represented as a relationship.
Relationship: table	The table that this storage description holds data for.
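
Consumers of this metadata often need to interpret sentinel values such as `numBuckets` = -1 for non-bucketed tables. The minimal sketch below summarizes a storage description supplied as a plain Python dictionary of the attributes listed above. The dictionary shape, the example values, and the function name are assumptions for illustration.

```python
from typing import Any, Dict


def describe_storage(sd: Dict[str, Any]) -> str:
    """Summarize a hive_storagedesc attribute dict in one line.

    numBuckets of -1 marks a non-bucketed table; missing keys are treated
    as uncompressed, non-bucketed, and unsorted.
    """
    num_buckets = sd.get("numBuckets", -1)
    bucketing = f"{num_buckets} buckets" if num_buckets > 0 else "not bucketed"
    compression = "compressed" if sd.get("compressed") else "uncompressed"
    sort_cols = sd.get("sortCols") or []
    sorting = f"sorted by {len(sort_cols)} column(s)" if sort_cols else "unsorted"
    return f"{sd['qualifiedName']}: {compression}, {bucketing}, {sorting}"


if __name__ == "__main__":
    # Hypothetical storage description for a non-bucketed table.
    print(describe_storage({"qualifiedName": "sales.orders@cm_storage",
                            "compressed": False, "numBuckets": -1}))
```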
