Apache Atlas Advanced Search language reference
Atlas lets you search for metadata using a domain-specific language with a SQL-like format.
If Basic Search or Free-text Search does not let you search as precisely as you need, you can create a query in the Advanced Search interface to return exactly the results you are looking for.
Each Advanced Search query is built from up to three clauses, in this order:
FROM, WHERE, SELECT
Additional keywords such as GROUPBY, ORDERBY, and LIMIT can be used to affect the output.
FROM clause
The value specified in the FROM clause acts as the scope of the query. You can specify any entity type in the FROM clause. The possible entity types are the same list as in the Type search; the names are case-sensitive.
The FROM clause is required and also assumed: the first item included in the query (if not literally the word "from") is assumed to be the object of the FROM clause.
Examples
With or without FROM: To retrieve all entities of type "hive_db" use one of the following queries:
hive_db
from hive_db
If you only specify a FROM clause, Atlas returns all entities of that type.
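The implied-FROM rule can be illustrated with a small Python sketch that normalizes both spellings to the same query and builds a request URL for it. The host name and helper names are hypothetical. The `/api/atlas/v2/search/dsl` path and its `query`, `limit`, and `offset` parameters are assumed from the Apache Atlas v2 REST API and should be checked against your deployment.

```python
from urllib.parse import quote, urlencode

# Hypothetical Atlas endpoint; replace with your own host and port.
ATLAS_BASE = "http://atlas.example.com:21000"


def with_explicit_from(query: str) -> str:
    """Prefix 'from' when the query starts directly with an entity type."""
    stripped = query.strip()
    if stripped.lower().startswith("from "):
        return stripped
    return f"from {stripped}"


def dsl_search_url(query: str, limit: int = 25, offset: int = 0) -> str:
    """Build a URL for the (assumed) Atlas v2 DSL search endpoint."""
    params = urlencode(
        {"query": with_explicit_from(query), "limit": limit, "offset": offset},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{ATLAS_BASE}/api/atlas/v2/search/dsl?{params}"


# Both spellings produce the same request.
print(dsl_search_url("hive_db"))
print(dsl_search_url("from hive_db"))
```

The sketch only constructs the URL; authentication and the actual HTTP call are left out because they depend on how your cluster is secured.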
WHERE clause
The WHERE clause allows for filtering over the result set identified in the FROM clause by specifying a condition of the form:
identifier operator 'literal'
The identifier is the name of a property of the entity type specified in the FROM clause. The properties for a given entity type are those shown in the Properties tab of an entity detail page. The names are case-sensitive.
Operators vary by the data type of the literal and include the following:
String: = LIKE
Numeric, Date: = < >
Boolean: =
The LIKE operator allows you to use wildcards in the literal. An asterisk (*) matches zero or more characters; a question mark (?) matches exactly one character.
The literal must be enclosed in single or double quotes. Matches are case-sensitive. Literals can be lists of values. If you specify comma-separated values in square brackets, they act as an OR operation.
Dates used in literals need to be specified using the ISO 8601 format and in single or double quotes.
Boolean values used in literals are lower case "true" and "false" without quotation marks.
You can specify multiple conditions using AND or OR operators. Note that making a list of values is more efficient than using the same identifier in multiple conditions.
Examples
Exact string: To retrieve all entities of type hive_table with a specific name "time_dim", use:
from hive_table where name = 'time_dim'
Multiple conditions: To retrieve entities of type hive_table whose name is either "time_dim" or "customer_dim":
from hive_table where name = 'time_dim' or name = 'customer_dim'
List of values: The query in the example above can be written using a value array:
from hive_table where name = ["customer_dim", "time_dim"]
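As a rough illustration of how the two forms relate, the following Python sketch renders the same set of values once as a value list and once as a chain of OR conditions. The helper names are hypothetical, and the sketch assumes the values contain no double quotes, because the source does not describe how quotes inside literals are escaped.

```python
def list_condition(identifier, values):
    """Render a value-list condition: identifier = ["a", "b"]."""
    quoted = ", ".join(f'"{value}"' for value in values)
    return f"{identifier} = [{quoted}]"


def or_condition(identifier, values):
    """Render the equivalent chain: identifier = "a" or identifier = "b"."""
    return " or ".join(f'{identifier} = "{value}"' for value in values)


names = ["customer_dim", "time_dim"]
print("from hive_table where " + list_condition("name", names))
print("from hive_table where " + or_condition("name", names))
```

Both queries describe the same filter; the list form is shorter and, as noted above, the more efficient of the two.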
Wildcard filtering: To retrieve entities of type hive_table whose name ends with '_dim':
from hive_table where name LIKE '*_dim'
To retrieve entities of type hive_db whose name starts with R, followed by any three characters, then rt, then at least one more character:
hive_db where name like "R???rt?*"
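To check what a LIKE pattern would match before running it, the wildcard rules described above can be emulated locally. This Python sketch translates a pattern into a regular expression; it illustrates the documented semantics only and is not how Atlas itself evaluates queries. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate LIKE wildcards: '*' -> zero or more chars, '?' -> one char."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern, value):
    """Return True when the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_matches("*_dim", "time_dim"))
print(like_matches("R???rt?*", "Reports"))
print(like_matches("R???rt?*", "Report"))
```

In the last call the name stops right after "rt", so the trailing question mark has nothing to match and the pattern does not apply.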
Date literal: To retrieve entities of type hive_table created within a date range (here, after 2019-01-01 and before 2019-01-03), use the date portion of the time value and specify the range with two conditions connected by AND:
from hive_table where createTime > '2019-01-01' and createTime < '2019-01-03'
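If the range boundaries come from a program rather than being typed by hand, the date literals can be generated. The following Python sketch relies on `datetime.date.isoformat()`, which produces the ISO 8601 `YYYY-MM-DD` form; the function name is hypothetical.

```python
from datetime import date


def created_between(type_name, start, end):
    """Build a query for entities created after `start` and before `end`."""
    return (
        f"from {type_name} "
        f"where createTime > '{start.isoformat()}' "
        f"and createTime < '{end.isoformat()}'"
    )


print(created_between("hive_table", date(2019, 1, 1), date(2019, 1, 3)))
```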
Boolean literal: To retrieve entities of type hdfs_path whose attribute isFile is set to true and whose name is Invoice:
from hdfs_path where isFile = true and name = "Invoice"
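The different quoting rules for Boolean and string literals are easy to get wrong when queries are assembled in code. The Python sketch below, with hypothetical helper names, renders Boolean values as unquoted lower-case words and strings in double quotes, then joins the conditions with AND. It deliberately rejects other types and does not handle quotes inside strings, since the source does not describe those cases.

```python
def format_literal(value):
    """Render a Python value as a literal for a WHERE condition."""
    # bool is tested first: it must come out as unquoted true/false.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return f'"{value}"'
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def all_of(conditions):
    """Join equality conditions with AND, in insertion order."""
    return " and ".join(
        f"{identifier} = {format_literal(value)}"
        for identifier, value in conditions.items()
    )


query = "from hdfs_path where " + all_of({"isFile": True, "name": "Invoice"})
print(query)
```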
SELECT clause
The SELECT clause allows you to specify the properties you want returned in the search results. Properties with simple values can be returned; properties that contain other entities are not available. The property names are case-sensitive.
To display column headers that are more meaningful than the system property names, you can assign aliases with the "as" keyword.
Examples
Select clause only: To retrieve entities of type "hive_table" with some of its properties:
from hive_table select owner, name, qualifiedName
WHERE and SELECT clauses: To retrieve some properties of a specific entity of type hive_table:
from hive_table where name = 'customer_dim' select owner, name, qualifiedName
Change output names using AS: To display column headers as 'Owner', 'Name', and 'FullName':
from hive_table select owner as Owner, name as Name, qualifiedName as FullName
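When the list of output columns is defined elsewhere, for example in a report configuration, the SELECT clause can be generated from a mapping of property names to aliases. This is a minimal Python sketch with a hypothetical helper name; a property mapped to `None` is returned under its system name.

```python
def select_clause(columns):
    """Build a SELECT clause from {property: alias-or-None}."""
    parts = [
        f"{prop} as {alias}" if alias else prop
        for prop, alias in columns.items()
    ]
    return "select " + ", ".join(parts)


aliased = select_clause(
    {"owner": "Owner", "name": "Name", "qualifiedName": "FullName"}
)
print("from hive_table " + aliased)
```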
Searches with system attributes
System Attribute | Description | Identifier in Advanced Search |
---|---|---|
Type | The Atlas entity type. | __typeName |
Status | The entity status in Atlas: this field indicates if a data asset has been deleted; Atlas maintains the entity information after the asset no longer exists on the cluster. | __state |
Created By User | The Atlas user who created this entity. Typically this is the Atlas system user. If an entity was created by an API call or created manually by users, the active user account would be included in this attribute. | __createdBy |
Last Modified User | The Atlas user who last updated the entity, whether through Atlas metadata collection from a cluster service, an Atlas API, or a change through the Atlas UI. | __modifiedBy |
Created timestamp | The date Atlas created the entity. Note that this field is different from the technical attribute for the creation date of the original data asset or operation. | __timestamp |
Last Modified timestamp | The date when an entity was last updated in Atlas. Note that this field is different from the technical attribute for the last modification date of the actual data asset or operation on the cluster. | __modificationTimestamp |
GUID | A unique identifier generated by Atlas. This is the 32-digit code found in the browser URL for an entity. | __guid |
Labels | Label metadata added to an entity. | __labels |
User-Defined Properties | Key-value pair metadata added to an entity. | __customAttributes |
Classifications | Classifications added to an entity. | __classificationNames |
Propagated Classifications | Classifications added to entities downstream from an entity where the classification was added by a user. | __propagatedClassificationNames |
— | A concatenated string of classification names and attributes for an entity. This attribute is not available through the Atlas UI. | __classificationsText |
IsIncomplete | A system indicator that entities were created because they were referenced in the metadata collected by a service other than the source type associated with the entity type. An entity is typically marked "isIncomplete" when Atlas receives metadata out of order from when the events occurred. If IsIncomplete entities remain “incomplete” for a long time, it may indicate that the original messages for entity metadata have not arrived. | __isIncomplete |
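The identifiers in the table are used in a WHERE clause like any other property name. The Python sketch below maps the UI labels from the table to their identifiers and builds a query that filters on two of them. The helper name and the user name "admin" are hypothetical, and the sketch assumes system attributes accept the same `identifier operator 'literal'` form as regular properties.

```python
# UI label -> identifier, copied from the table above.
SYSTEM_ATTRIBUTES = {
    "Type": "__typeName",
    "Status": "__state",
    "Created By User": "__createdBy",
    "Last Modified User": "__modifiedBy",
    "Created timestamp": "__timestamp",
    "Last Modified timestamp": "__modificationTimestamp",
    "GUID": "__guid",
    "Labels": "__labels",
    "User-Defined Properties": "__customAttributes",
    "Classifications": "__classificationNames",
    "Propagated Classifications": "__propagatedClassificationNames",
    "IsIncomplete": "__isIncomplete",
}


def system_condition(label, operator, literal):
    """Build a condition on a system attribute, looked up by its UI label."""
    identifier = SYSTEM_ATTRIBUTES[label]
    return f"{identifier} {operator} '{literal}'"


query = (
    "from hive_table where "
    + system_condition("Created By User", "=", "admin")  # hypothetical user
    + " and "
    + system_condition("Created timestamp", ">", "2019-01-01")
)
print(query)
```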
Advanced Searches using Classifications
You can search for entities that are tagged with a specific classification using the "is" or "isa" keyword in either the FROM or WHERE clause. The two keywords are interchangeable.
Examples
FROM or WHERE clause: To retrieve all entities of type "hive_table" that are tagged with the "Dimension" classification, use either of the following queries:
hive_table is Dimension
from hive_table where hive_table isa Dimension
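The two placements can be generated side by side, which makes the equivalence easier to see. This Python sketch uses hypothetical helper names and accepts either keyword, since the text above treats them as interchangeable.

```python
KEYWORDS = ("is", "isa")


def classified_in_from(type_name, classification, keyword="is"):
    """Classification filter placed directly in the (implied) FROM clause."""
    if keyword not in KEYWORDS:
        raise ValueError(f"keyword must be one of {KEYWORDS}")
    return f"{type_name} {keyword} {classification}"


def classified_in_where(type_name, classification, keyword="isa"):
    """Classification filter placed in the WHERE clause."""
    if keyword not in KEYWORDS:
        raise ValueError(f"keyword must be one of {KEYWORDS}")
    return f"from {type_name} where {type_name} {keyword} {classification}"


print(classified_in_from("hive_table", "Dimension"))
print(classified_in_where("hive_table", "Dimension"))
```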