Apache Atlas ReferencePDF version

Apache Atlas Advanced Search language reference

Atlas lets you search for metadata using a domain-specific language with a SQL-like format.

If you find that the Basic Search or Free-text Search doesn't allow you to search as precisely as you would like, you can create a query in the Advanced Search interface to return exactly the results you are looking for. Advanced Search queries use a domain-specific language that is intentionally SQL-like.

Each Advanced Search query is in the form of three clauses:

FROM WHERE SELECT

Additional keywords such as GROUPBY, ORDERBY, and LIMIT can be used to affect the output.

The value specified in the FROM clause acts as the scope of the query. You can specify any entity type in the FROM clause. The possible entity types are the same list as in the Type search; the names are case-sensitive.

The FROM clause is required and also assumed: the first item included in the query (if not literally the word "from") is assumed to be the object of the FROM clause.

Examples

With or without FROM: To retrieve all entities of type "hive_db" use one of the following queries:

hive_db
from hive_db

If you only specify a FROM clause, Atlas returns all entities of that type.

The WHERE clause allows for filtering over the result set identified in the FROM clause by specifying a condition of the form:

identifier operator 'literal'

The identifier is the name of a property of the entity type specified in the FROM clause. The properties for a given entity type are those shown in the Properties tab of an entity detail page. The names are case-sensitive.

Operators vary by the data type of the literal and include the following:

String: = LIKE

Numeric, Date: = < >

Boolean: =

The LIKE operator allows you to use wildcards in the literal. Asterisk (*) replaces zero to multiple values; question mark (?) replaces a single value.

The literal must be enclosed in single or double quotes. Matches are case-sensitive. Literals can be lists of values. If you specify comma-separated values in square brackets, they act as an OR operation.

Dates used in literals need to be specified using the ISO 8601 format and in single or double quotes.

Boolean values used in literals are lower case "true" and "false" without quotation marks.

You can specify multiple conditions using AND or OR operators. Note that making a list of values is more efficient than using the same identifier in multiple conditions.

Examples:

Exact string: To retrieve all entities of type hive_table with a specific name "time_dim", use:

from hive_table where name = 'time_dim'

Multiple conditions: To retrieve entity of type hive_table with name that can be either "time_dim" or "customer_dim":

from hive_table where name 'time_dim' or name = 'customer_dim'

List of values: The query in the example above can be written using a value array:

from hive_table where name = ["customer_dim", "time_dim"]

Wildcard filtering: To retrieve entity of type hive_table whose name ends with '_dim':

from hive_table where name LIKE '*_dim'

To retrieve a hive_db whose name starts with R followed by any 3 characters, followed by rt followed by at least 1 character, followed by none or any number of characters:

DB where name like "R???rt?*"

Date Literal: To retrieve entity of type hive_table created within 2019 and 2020, use the date portion of the time value and specify a range using two phrases connected by AND:

from hive_table where createTime > '2019-01-01' and
        createTime < '2019-01-03'

Boolean Literal: To retrieve entity of type hdfs_path whose attribute isFile is set to true and whose name is Invoice:

from hdfs_path where isFile = true and name = "Invoice"

The select clause allows you to specify the properties you want returned in the search results. Properties with simple values can be returned; properties that contain other entities are not available. The property names are case sensitive.

To display column headers that are more meaningful that the system property names, you can use aliases using 'as.'

Examples

Select clause only: To retrieve entities of type "hive_table" with some of its properties:

from hive_table select owner, name, qualifiedName

WHERE and SELECT clauses: To retrieve entity of type hive_table for a specific table with some properties:

from hive_table where name = 'customer_dim' select owner, name, qualifiedName

Change output names using AS: To display column headers as 'Owner', 'Name' and 'FullName'.

from hive_table select owner as Owner, name as Name, qualifiedName as FullName
In the attribute filter lists, system attributes appear with normal text names. When you use them in advanced searches, use the corresponding field name, which is prefixed with two underscores.
System Attribute Description Identifier in Advanced Search
Type The Atlas entity type. __typeName
Status The entity status in Atlas: this field indicates if a data asset has been deleted; Atlas maintains the entity information after the asset no longer exists on the cluster. __state
Created By User The Atlas user who created this entity. Typically this is the Atlas system user. If an entity was created by an API call or created manually by users, the active user account would be included in this attribute. __createdBy
Last Modified User The Atlas user who last updated the entity, whether through Atlas metadata collection from a cluster service, an Atlas API, or a change through the Atlas UI. __modifiedBy
Created timestamp The date Atlas created the entity. Note that this field is different from the technical attribute for the creation date of the original data asset or operation. __timestamp
Last Modified timestamp The date when an entity was last updated in Atlas. Note that this field is different from the technical attribute for the last modification date of the actual data asset or operation on the cluster. __modificationTimestamp
GUID A unique identifier generated by Atlas. This is the 32-digit code found in the browser URL for an entity. __guid
Labels Label metadata added to an entity. __labels
User-Defined Properties Key-value pair metadata added to an entity. __customAttributes
Classifications Classifications added to an entity. __classificationNames
Propagated Classifications Classifications added to entities downstream from an entity where the classification was added by a user. __propagatedClassificationNames
A concatenated string of classification names and attributes for an entity. This attribute is not available through the Atlas UI. __classificationsText
IsIncomplete A system indicator that entities were created because they were referenced in the metadata collected by a service other than the source type associated with the entity type. An entity is typically marked "isIncomplete" when Atlas receives metadata out of order from when the events occurred. If IsIncomplete entities remain “incomplete” for a long time, it may indicate that the original messages for entity metadata have not arrived. __isIncomplete

You can search for entities that are tagged with a specific classification using "is" or "isa" keywords in either the From or Where clauses. Is and Isa are interchangeable.

Examples

FROM or WHERE clause: To retrieve all entities of type "hive_table" that are tagged with the "Dimension" classification, you could use the following query:

hive_table is Dimension
from hive_table where hive_table isa Dimension