Metadata Search Syntax and Properties

In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.

Search Syntax

You construct search strings by specifying the value of a default property and the following three types of key-value pairs using the given syntax:

Technical metadata key-value pairs - key:value, where
- key is one of the properties listed in Search Properties.
- value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape the special characters :, -, /, and * with the backslash character \ or enclose the property value in quotes. For example, fileSystemPath:\/tmp\/hbase\-staging.
These key-value pairs are read-only and cannot be modified.
Custom metadata key-value pairs - up_key:value, where
- key is a user-defined property defined on an entity after extraction.
- value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape the special characters :, -, /, and * with the backslash character \ or enclose the property value in quotes. For example, fileSystemPath:\/tmp\/hbase\-staging.
Custom metadata key-value pairs can be modified.
Hive extended attribute key-value pairs - tp_key:value, where
- key is an extended attribute defined on a Hive entity before extraction. The syntax of the attribute is specific to Hive.
- value is a single value supported by the entity type.
These key-value pairs are read-only and cannot be modified.

To construct complex search strings, join multiple key-value pairs using the operators "or" and "and".
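
The escaping and joining rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of Cloudera Navigator: the helper names `escape_value`, `pair`, `join_all`, and `join_any` are hypothetical, and the sketch escapes only the four characters this section names.

```python
import re

# The four special characters this section says must be escaped in values.
_SPECIAL = re.compile(r"([:\-/*])")


def escape_value(value: str) -> str:
    """Backslash-escape :, -, /, and * in a property value.

    Note: escaping * turns it into a literal character, so do not pass
    values in which * is meant to act as a wildcard.
    """
    return _SPECIAL.sub(r"\\\1", value)


def pair(key: str, value: str) -> str:
    """Build one key:value pair with an escaped value."""
    return f"{key}:{escape_value(value)}"


def join_all(*pairs: str) -> str:
    """Join pairs so that every one of them must match."""
    return " and ".join(pairs)


def join_any(*pairs: str) -> str:
    """Join pairs so that at least one of them must match."""
    return " or ".join(pairs)


if __name__ == "__main__":
    print(pair("fileSystemPath", "/tmp/hbase-staging"))
    print(join_any(pair("sourceType", "mapreduce"), pair("sourceType", "hive")))
```

Enclosing the value in quotes, as the section also allows, is an alternative to escaping each character.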

Example Search Strings

Filesystem path /user/admin - fileSystemPath:\/user\/admin
Descriptions that start with the string "Banking" - description:Banking*
Sources of type MapReduce or Hive - sourceType:mapreduce or sourceType:hive
Directories owned by hdfs in the path /user/hdfs/input - owner:hdfs and type:directory and fileSystemPath:"/user/hdfs/input"
Job started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
User-defined key-value project-customer1 - up_project:customer1
Technical key-value - In Hive you can specify table properties like this:
```
ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
```
To query for this property, specify tp_key1:value1.
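
As a rough sketch of how the up_ and tp_ prefixes map onto search strings, the following Python snippet builds pairs for each kind of metadata. The names `PREFIXES`, `prefixed_pair`, and `hive_properties_query` are hypothetical, and the sketch assumes the values contain no special characters that need escaping.

```python
# Prefixes as described in Search Syntax; technical metadata has no prefix.
PREFIXES = {"technical": "", "custom": "up_", "hive": "tp_"}


def prefixed_pair(kind: str, key: str, value: str) -> str:
    """Return key:value with the prefix for the given kind of metadata."""
    return f"{PREFIXES[kind]}{key}:{value}"


def hive_properties_query(table_properties: dict) -> str:
    """Build a search string that matches all given Hive table properties."""
    return " and ".join(
        prefixed_pair("hive", key, value)
        for key, value in table_properties.items()
    )


if __name__ == "__main__":
    # Mirrors the TBLPROPERTIES example above.
    print(hive_properties_query({"key1": "value1"}))
    # Mirrors the user-defined project example.
    print(prefixed_pair("custom", "project", "customer1"))
```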

Search Properties

A reference for the search schema properties.

The properties are grouped into the following categories:

Default Properties
Common Properties
HDFS Properties
Dataset Properties
MapReduce and YARN Properties
Operation Properties
Hive Properties
Oozie Properties
Pig Properties
Sqoop Properties

Default Properties

You can search the following properties by specifying only a value, without a key: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.

Common Properties

Name	Type	Description
`description`	text	Description of the entity.
`group`	caseInsensitiveText	The group to which the owner of the entity belongs.
`name`	ngramedText	The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
`operationType`	ngramedText	The type of an operation: Pig - SCRIPT; Sqoop - Table Export, Query Import.
`originalName`	ngramedText	The name of the entity when it was extracted.
`originalDescription`	text	The description of the entity when it was extracted.
`owner`	caseInsensitiveText	The owner of the entity.
`principal`	caseInsensitiveText	For entities with type `OPERATION_EXECUTION`, the initiator of the entity.
`properties`	string	A set of key-value pairs that describe the entity.
`tags`	ngramedText	A set of tags that describe the entity.
`type`	tokenizedCaseInsensitiveText	The type of the entity. The available types depend on the entity's source type: `hdfs` - `DIRECTORY`, `FILE`, `DATASET`, `FIELD`; `hive` - `DATABASE`, `TABLE`, `FIELD`, `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`, `PARTITION`, `RESOURCE`, `VIEW`; `impala` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`; `mapreduce` - `OPERATION`, `OPERATION_EXECUTION`; `oozie` - `OPERATION`, `OPERATION_EXECUTION`; `pig` - `OPERATION`, `OPERATION_EXECUTION`; `spark` - `OPERATION`, `OPERATION_EXECUTION`; `sqoop` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`; `yarn` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`.
`userEntity`	Boolean	Indicates whether an entity was added using the Cloudera Navigator SDK.
Query
`queryText`	string	The text of a Hive, Impala, or Sqoop query.
Source
`clusterName`	string	The name of the cluster in which the source is managed.
`sourceId`	string	The ID of the source type.
`sourceType`	caseInsensitiveText	The source type of the entity: `hdfs`, `hive`, `impala`, `mapreduce`, `oozie`, `pig`, `spark`, `sqoop`, or `yarn`.
`sourceUrl`	string	The URL of the web application for a resource.
Timestamps
The available timestamp fields vary by the source type: `hdfs` - `created`, `lastAccessed`, `lastModified`; `hive` - `created`, `lastModified`; `impala`, `mapreduce`, `pig`, `spark`, `sqoop`, and `yarn` - `started`, `ended`.	date	Timestamps in the Solr Date Format. For example: `lastAccessed:[* TO NOW]`; `created:[1976-03-06T23:59:59.999Z TO *]`; `started:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]`; `ended:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]`; `created:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]`; `lastAccessed:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]`.
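
A timestamp range like the ones in the table can be generated from Python `datetime` objects. The following is a minimal sketch under stated assumptions: the helper names `solr_timestamp` and `time_range` are hypothetical, the inputs are timezone-aware datetimes, and an omitted bound is written as `*`.

```python
from datetime import datetime, timezone
from typing import Optional


def solr_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as UTC with milliseconds and a trailing Z."""
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def time_range(field: str,
               start: Optional[datetime] = None,
               end: Optional[datetime] = None) -> str:
    """Build field:[start TO end]; a missing bound becomes the * wildcard."""
    lower = solr_timestamp(start) if start is not None else "*"
    upper = solr_timestamp(end) if end is not None else "*"
    return f"{field}:[{lower} TO {upper}]"


if __name__ == "__main__":
    start = datetime(2013, 10, 21, 20, 0, tzinfo=timezone.utc)
    end = datetime(2013, 10, 21, 21, 0, tzinfo=timezone.utc)
    print(time_range("started", start, end))
```

Date math expressions such as `NOW-1YEAR/DAY` are evaluated by the search engine and can be written directly in the search string instead of being computed on the client.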

HDFS Properties

Name	Type	Description
`blockSize`	long	The block size of an HDFS file.
`deleted`	Boolean	Indicates whether the entity has been moved to the Trash folder.
`deleteTime`	date	The time the entity was moved to the Trash folder.
`fileSystemPath`	path	The path to the entity.
`mimeType`	ngramedText	The MIME type of an HDFS file.
`parentPath`	string	The path to the parent entity of a child entity. For example: `parentPath:/default/sample_07` for the table `sample_07` from the Hive database `default`.
`permissions`	string	The UNIX access permissions of the entity.
`replication`	int	The number of copies of HDFS file blocks.
`size`	long	The exact size of the entity in bytes or a range of sizes. Range examples: `size:[1000 TO *]`, `size:[* TO 2000]`, and `size:[* TO *]` to find all entities with a size value.
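
Open-ended size ranges can be built the same way as any other range. The sketch below is illustrative only: the helper name `size_range` is hypothetical, and it assumes that an open bound is written as `*`, as in `size:[* TO *]`.

```python
from typing import Optional


def size_range(minimum: Optional[int] = None,
               maximum: Optional[int] = None) -> str:
    """Build a size range in bytes; a missing bound becomes the * wildcard."""
    lower = "*" if minimum is None else str(minimum)
    upper = "*" if maximum is None else str(maximum)
    return f"size:[{lower} TO {upper}]"


if __name__ == "__main__":
    print(size_range(minimum=1000))   # at least 1000 bytes
    print(size_range(maximum=2000))   # at most 2000 bytes
    print(size_range())               # any entity that has a size value
```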

Dataset Properties

Name	Type	Description
`compressionType`	tokenizedCaseInsensitiveText	The type of compression of a dataset file.
`dataType`	string	The data type: record.
`datasetType`	tokenizedCaseInsensitiveText	The type of the dataset: Kite.
`fileFormat`	tokenizedCaseInsensitiveText	The format of a dataset file: Avro or Parquet.
`fullDataType`	string	The full data type: record.
`partitionType`	string	The type of the partition.
`schemaName`	string	The name of the dataset schema.
`schemaNameSpace`	string	The namespace of the dataset schema.

MapReduce and YARN Properties

Name	Type	Description
`inputRecursive`	Boolean	Indicates whether files are searched recursively under the input directories or only files directly under the input directories are considered.
`jobId`	ngramedText	The ID of the job. For a job spawned by Oozie, the workflow ID.
`mapper`	string	The fully-qualified name of the mapper class.
`outputKey`	string	The fully-qualified name of the class of the output key.
`outputValue`	string	The fully-qualified name of the class of the output value.
`reducer`	string	The fully-qualified name of the reducer class.

Operation Properties

Name	Type	Description
Operation
`inputFormat`	string	The fully-qualified name of the class of the input format.
`outputFormat`	string	The fully-qualified name of the class of the output format.
Operation Execution
`inputs`	string	The name of the entity input to an operation execution. For entities of resource type `mapreduce`, `yarn`, and `spark`, it is usually a directory. For entities of resource type `hive`, it is usually a table.
`outputs`	string	The name of the entity output from an operation execution. For entities of resource type `mapreduce`, `yarn`, and `spark`, it is usually a directory. For entities of resource type `hive`, it is usually a table.
`engineType`	string	The type of the engine used for an operation: MR or Spark.

Hive Properties

Name	Type	Description
Field
`dataType`	ngramedText	The type of data stored in a field (column).
Table
`compressed`	Boolean	Indicates whether a table is compressed.
`serDeLibName`	string	The name of the library containing the SerDe class.
`serDeName`	string	The fully-qualified name of the SerDe class.
Partition
`partitionColNames`	string	The table columns that define the partition.
`partitionColValues`	string	The table column values that define the partition.
`technical_properties`	string	Hive extended attributes.
`clusteredByColNames`	string	The column names that identify how table content is divided into buckets.
`sortByColNames`	string	The column names that identify how table content is sorted within a bucket.

Oozie Properties

Name	Type	Description
`status`	string	The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.

Pig Properties

Name	Type	Description
`scriptId`	string	The ID of the Pig script.

Sqoop Properties

Name	Type	Description
`dbURL`	string	The URL of the database from or to which the data was imported or exported.
`dbTable`	string	The table from or to which the data was imported or exported.
`dbUser`	string	The database user.
`dbWhere`	string	A where clause that identifies which rows were imported.
`dbColumnExpression`	string	An expression that identifies which columns were imported.
