Metadata Search Syntax and Properties

In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.

Search Syntax

You construct search strings by specifying the value of a default property and five types of key-value pairs, using the indicated syntax:

Technical metadata key-value pairs - key:value
- key is one of the properties listed in Search Properties.
- value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
Technical metadata key-value pairs are read-only and cannot be modified.
Custom metadata key-value pairs - up_key:value
- key is a user-defined property.
- value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
Custom metadata key-value pairs can be modified.
Hive extended attribute key-value pairs - tp_key:value
- key is an extended attribute set on a Hive entity. The syntax of the attribute is specific to Hive.
- value is a single value supported by the entity type.
Hive extended attribute key-value pairs are read-only and cannot be modified.
Managed metadata key-value pairs - namespace.key:value
- namespace is the namespace containing the property. See Defining Managed Metadata.
- key is the name of a managed metadata property.
- value is a single value, a range of values specified as [value1 TO value2], or a set of values separated by spaces. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
Only the values of managed metadata key-value pairs can be modified.

S3 key-value pairs - tp_key:value
- key is the name of user-defined metadata.
- value is a single value.
- Only file metadata is extracted; bucket and folder metadata is not extracted.
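
The escaping rule above can be sketched in a few lines of Python. This is an illustrative helper, not part of Cloudera Navigator: the function names `escape_value`, `quote_value`, and `pair` are hypothetical, and `quote_value` assumes the value itself contains no double quotes.

```python
import re

# The four characters the syntax description says must be escaped
# in property values: colon, hyphen, slash, and asterisk.
_SPECIAL = re.compile(r"([:\-/*])")


def escape_value(value: str) -> str:
    """Backslash-escape :, -, / and * (hypothetical helper)."""
    return _SPECIAL.sub(r"\\\1", value)


def quote_value(value: str) -> str:
    """Alternative to escaping: enclose the value in double quotes.

    Assumes the value contains no double quotes of its own.
    """
    return '"' + value + '"'


def pair(key: str, value: str, prefix: str = "") -> str:
    """Build a key:value clause; prefix is e.g. "up_" or "tp_"."""
    return f"{prefix}{key}:{escape_value(value)}"


print(pair("fileSystemPath", "/user/hive"))
print(pair("project", "customer-1", prefix="up_"))
print("fileSystemPath:" + quote_value("/user/hive"))
```

Note that escaping `*` turns it into a literal character, so a value that should keep `*` as a wildcard must not be passed through `escape_value` unchanged.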

Constructing Compound Search Strings

To construct compound search strings, you can join multiple property-value pairs using the Lucene Query Parser Boolean operators:

(space), +, -
OR, AND, NOT

In both syntaxes, you use () to group multiple clauses for a single field and to form subqueries. When you filter results in the Navigator Metadata UI, the constructed search strings use the (space), +, - syntax.
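
As a rough sketch of how compound strings compose, the following Python helpers join clauses in each of the two syntaxes. The names `all_of`, `any_of`, `all_of_keywords`, and `group` are hypothetical and only concatenate strings; they do not validate the clauses or contact Navigator.

```python
def all_of(*clauses: str) -> str:
    """Require every clause, using the + prefix syntax (hypothetical helper)."""
    return " ".join(f"+{c}" for c in clauses)


def any_of(*clauses: str) -> str:
    """Match any clause by separating clauses with a space."""
    return " ".join(clauses)


def all_of_keywords(*clauses: str) -> str:
    """Require every clause, using the AND keyword syntax."""
    return " AND ".join(clauses)


def group(query: str) -> str:
    """Wrap a query in parentheses so it can be used as a subquery."""
    return f"({query})"


# HDFS entities of at least 1024 MiB, or any Impala entity.
print(any_of(group(all_of("sourceType:hdfs", "size:[1073741824 TO *]")),
             "sourceType:impala"))

# Directories owned by hdfs under /user/hdfs/input, keyword syntax.
print(all_of_keywords("owner:hdfs", "type:directory",
                      'fileSystemPath:"/user/hdfs/input"'))
```

Both printed strings correspond to entries in the Example Search Strings section.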

Example Search Strings

Entities in the path /user/hive that have not been deleted - +("/user/hive") +(-deleted:true)
Descriptions that start with the string "Banking" - description:Banking*
Entities of type MapReduce or entities of type Hive - sourceType:mapreduce sourceType:hive or sourceType:mapreduce OR sourceType:hive
Entities of type HDFS with size equal to or greater than 1024 MiB or entities of type Impala - (+sourceType:hdfs +size:[1073741824 TO *]) sourceType:impala
Directories owned by hdfs in the path /user/hdfs/input - +owner:hdfs +type:directory +fileSystemPath:"/user/hdfs/input" or owner:hdfs AND type:directory AND fileSystemPath:"/user/hdfs/input"
Jobs started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
Custom key-value pair with key project and value customer1 - up_project:customer1
Technical key-value - In Hive, specify table properties like this:
```
ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
```
To search for this property, specify tp_key1:value1.
Managed key-value with multivalued property - MailAnnotation.emailTo:"dana@example.com" MailAnnotation.emailTo:"lee@example.com"
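
The multivalued managed property example can be generated from a list of values. The sketch below assumes each value is quoted and contains no double quotes; `managed_multi` is a hypothetical helper name, not a Navigator API.

```python
from typing import Iterable


def managed_multi(namespace: str, key: str, values: Iterable[str]) -> str:
    """Emit one quoted namespace.key:"value" clause per value,
    separated by spaces (hypothetical helper)."""
    return " ".join(f'{namespace}.{key}:"{v}"' for v in values)


query = managed_multi("MailAnnotation", "emailTo",
                      ["dana@example.com", "lee@example.com"])
print(query)
```

Quoting each value avoids having to escape special characters individually.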

Search Properties

The following reference describes search schema properties.

Continue reading:

Default Properties
Common Properties
Dataset Properties
HDFS Properties
Hive Properties
MapReduce and YARN Properties
Operation Properties
Oozie Properties
Pig Properties
S3 Properties
Sqoop Properties

Default Properties

The following properties can be searched by specifying a property value: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.

Common Properties

Name	Type	Description
`description`	text	Description of the entity.
`group`	caseInsensitiveText	The group to which the owner of the entity belongs.
`name`	ngramedText	The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
`operationType`	ngramedText	The type of an operation: Pig - SCRIPT; Sqoop - Table Export, Query Import.
`originalName`	ngramedText	The name of the entity when it was extracted.
`originalDescription`	text	The description of the entity when it was extracted.
`owner`	caseInsensitiveText	The owner of the entity.
`principal`	caseInsensitiveText	For entities with type `OPERATION_EXECUTION`, the initiator of the entity.
`properties`	string	A set of key-value pairs that describe the entity.
`tags`	ngramedText	A set of tags that describe the entity.
`type`	tokenizedCaseInsensitiveText	The type of the entity. The available types depend on the entity's source type: `hdfs` - `DIRECTORY`, `FILE`, `DATASET`, `FIELD`; `hive` - `DATABASE`, `TABLE`, `FIELD`, `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`, `PARTITION`, `RESOURCE`, `VIEW`; `impala` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`; `mapreduce` - `OPERATION`, `OPERATION_EXECUTION`; `oozie` - `OPERATION`, `OPERATION_EXECUTION`; `pig` - `OPERATION`, `OPERATION_EXECUTION`; `spark` - `OPERATION`, `OPERATION_EXECUTION`; `sqoop` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`; `yarn` - `OPERATION`, `OPERATION_EXECUTION`, `SUB_OPERATION`.
`userEntity`	Boolean	Indicates whether an entity was added using the Cloudera Navigator SDK.
Query
`queryText`	string	The text of a Hive, Impala, or Sqoop query.
Source
`clusterName`	string	The name of the cluster in which the source is managed.
`sourceId`	string	The ID of the source type.
`sourceType`	caseInsensitiveText	The source type of the entity: `hdfs`, `hive`, `impala`, `mapreduce`, `oozie`, `pig`, `spark`, `sqoop`, or `yarn`.
`sourceUrl`	string	The URL of the web application for a resource.
Timestamps
The available timestamp fields vary by the source type: `hdfs` - `created`, `lastAccessed`, `lastModified`; `hive` - `created`, `lastModified`; `impala`, `mapreduce`, `pig`, `spark`, `sqoop`, and `yarn` - `started`, `ended`	date	Timestamps in the Solr Date Format. For example: `lastAccessed:[* TO NOW]`; `created:[1976-03-06T23:59:59.999Z TO *]`; `started:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]`; `ended:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]`; `created:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]`; `lastAccessed:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]`.
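
Timestamp ranges in this format can be produced from Python `datetime` values. The sketch below is an assumption-laden illustration: `solr_date` and `date_range` are hypothetical names, inputs must be timezone-aware, and only absolute timestamps are handled (no `NOW` date math).

```python
from datetime import datetime, timezone
from typing import Optional


def solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as UTC with millisecond
    precision, e.g. 2013-10-21T20:00:00.000Z (hypothetical helper)."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def date_range(field: str,
               start: Optional[datetime] = None,
               end: Optional[datetime] = None) -> str:
    """Build field:[start TO end]; a missing bound becomes *."""
    lo = solr_date(start) if start is not None else "*"
    hi = solr_date(end) if end is not None else "*"
    return f"{field}:[{lo} TO {hi}]"


print(date_range("started",
                 datetime(2013, 10, 21, 20, 0, tzinfo=timezone.utc),
                 datetime(2013, 10, 21, 21, 0, tzinfo=timezone.utc)))
print(date_range("created",
                 start=datetime(1976, 3, 6, 23, 59, 59, 999000,
                                tzinfo=timezone.utc)))
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC.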

Dataset Properties

Name	Type	Description
`compressionType`	tokenizedCaseInsensitiveText	The type of compression of a dataset file.
`dataType`	string	The data type: record.
`datasetType`	tokenizedCaseInsensitiveText	The type of the dataset: Kite.
`fileFormat`	tokenizedCaseInsensitiveText	The format of a dataset file: Avro or Parquet.
`fullDataType`	string	The full data type: record.
`partitionType`	string	The type of the partition.
`schemaName`	string	The name of the dataset schema.
`schemaNameSpace`	string	The namespace of the dataset schema.

HDFS Properties

Name	Type	Description
`blockSize`	long	The block size of an HDFS file.
`deleted`	Boolean	Indicates whether the entity has been moved to the Trash folder.
`deleteTime`	long	The time the entity was moved to the Trash folder.
`fileSystemPath`	path	The path to the entity.
`mimeType`	ngramedText	The MIME type of an HDFS file.
`parentPath`	string	The path to the parent entity of a child entity. For example: `parentPath:/default/sample_07` for the table `sample_07` from the Hive database `default`.
`permissions`	string	The UNIX access permissions of the entity.
`replication`	int	The number of copies of HDFS file blocks.
`size`	long	The exact size of the entity in bytes or a range of sizes. Range examples: `size:[1000 TO *]`, `size:[* TO 2000]`, and `size:[* TO *]` to find all fields with a size value.
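
Because `size` is expressed in bytes, a small conversion helper can reduce arithmetic mistakes when building ranges. This is a minimal sketch; `MIB` and `size_range` are hypothetical names, and an omitted bound is written as `*`, as in the `size:[1073741824 TO *]` example earlier.

```python
from typing import Optional

MIB = 1024 ** 2  # bytes per mebibyte


def size_range(min_bytes: Optional[int] = None,
               max_bytes: Optional[int] = None) -> str:
    """Build a size:[lo TO hi] clause; None means an open bound (*)."""
    lo = "*" if min_bytes is None else str(min_bytes)
    hi = "*" if max_bytes is None else str(max_bytes)
    return f"size:[{lo} TO {hi}]"


# Entities of 1024 MiB or larger.
print(size_range(min_bytes=1024 * MIB))
# Entities of at most 2000 bytes.
print(size_range(max_bytes=2000))
```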

Hive Properties

Name	Type	Description
Field
`dataType`	ngramedText	The type of data stored in a field (column).
Table
`compressed`	Boolean	Indicates whether a table is compressed.
`serDeLibName`	string	The name of the library containing the SerDe class.
`serDeName`	string	The fully qualified name of the SerDe class.
Partition
`partitionColNames`	string	The table columns that define the partition.
`partitionColValues`	string	The table column values that define the partition.
`technical_properties`	string	Hive extended attributes.
`clusteredByColNames`	string	The column names that identify how table content is divided into buckets.
`sortByColNames`	string	The column names that identify how table content is sorted within a bucket.

MapReduce and YARN Properties

Name	Type	Description
`inputRecursive`	Boolean	Indicates whether files are searched recursively under the input directories, or only files directly under the input directories are considered.
`jobId`	ngramedText	The ID of the job. For a job spawned by Oozie, the workflow ID.
`mapper`	string	The fully qualified name of the mapper class.
`outputKey`	string	The fully qualified name of the class of the output key.
`outputValue`	string	The fully qualified name of the class of the output value.
`reducer`	string	The fully qualified name of the reducer class.

Operation Properties

Name	Type	Description
Operation
`inputFormat`	string	The fully qualified name of the class of the input format.
`outputFormat`	string	The fully qualified name of the class of the output format.
Operation Execution
`inputs`	string	The name of the entity input to an operation execution. For entities of resource type `mapreduce`, `yarn`, and `spark`, it is usually a directory. For entities of resource type `hive`, it is usually a table.
`outputs`	string	The name of the entity output from an operation execution. For entities of resource type `mapreduce`, `yarn`, and `spark`, it is usually a directory. For entities of resource type `hive`, it is usually a table.
`engineType`	string	The type of the engine used for an operation: MR or Spark.

Oozie Properties

Name	Type	Description
`status`	string	The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.

Pig Properties

Name	Type	Description
`scriptId`	string	The ID of the Pig script.

S3 Properties

Name	Type	Description
Object Properties
`region`	string	The geographic region in which the bucket is stored.
`bucketName`	string	The name of the bucket in which the object is stored.
`fileSystemPath`	path	The key of the S3 object.
`size`	long	Object size in bytes.
`lastModified`	date	Object creation date or the last modified date, whichever is later.
`etag`	string	A hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data.
`storageClass`	string	Storage class used for storing the object.
`owner`	string	Owner of the object.
`sequencer`	string	Latest S3 event notification sequencer. Used to order events.
`parentPath`	string	Parent of the S3 object.
`technicalProperties`	key-value pairs	Custom metadata for each S3 object.
Bucket Properties
`region`	string	Region for the bucket.
`created`	date	Date the bucket was created.
`owner`	string	Owner of the bucket.

Sqoop Properties

Name	Type	Description
`dbURL`	string	The URL of the database from or to which the data was imported or exported.
`dbTable`	string	The table from or to which the data was imported or exported.
`dbUser`	string	The database user.
`dbWhere`	string	A where clause that identifies which rows were imported.
`dbColumnExpression`	string	An expression that identifies which columns were imported.
