Metadata Search Syntax and Properties
In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.
Search Syntax
You construct search strings by specifying the value of a default property and any of five types of key-value pairs, using the indicated syntax:
- Technical metadata key-value pairs - key:value
  - key is one of the properties listed in Search Properties.
  - value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
- Custom metadata key-value pairs - up_key:value
  - key is a user-defined property.
  - value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
- Hive extended attribute key-value pairs - tp_key:value
  - key is an extended attribute set on a Hive entity. The syntax of the attribute is specific to Hive.
  - value is a single value supported by the entity type.
- Managed metadata key-value pairs - namespace.key:value
  - namespace is the namespace containing the property. See Defining Managed Metadata.
  - key is the name of a managed metadata property.
  - value is a single value, a range of values specified as [value1 TO value2], or a set of values separated by spaces. In a value, * is a wildcard. In property values, you must escape special characters :, -, /, and * with the backslash character (\), or enclose the property value in quotes.
- S3 key-value pairs - tp_key:value
  - key is the name of user-defined metadata.
  - value is a single value.
  - Only file metadata is extracted; bucket and folder metadata is not extracted.
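The prefixes and escaping rules above can be sketched in Python. This is a minimal illustration that only builds query strings; the helper names (escape_value, quote_value, technical, custom, hive_extended, managed) are hypothetical and are not part of Cloudera Navigator.

```python
import re

# The four characters the syntax rules say must be escaped in property values.
_SPECIAL = re.compile(r"([:\-/*])")


def escape_value(value: str) -> str:
    """Backslash-escape :, -, / and * so they are matched literally.

    Note: this also escapes *, so use it only when no wildcard is intended.
    """
    return _SPECIAL.sub(r"\\\1", value)


def quote_value(value: str) -> str:
    """Alternative to escaping: enclose the whole value in double quotes."""
    if '"' in value:
        # Assumption: values with embedded quotes are out of scope for this sketch.
        raise ValueError("value must not contain a double quote")
    return f'"{value}"'


def technical(key: str, value: str) -> str:
    """Technical metadata pair: key:value."""
    return f"{key}:{value}"


def custom(key: str, value: str) -> str:
    """Custom metadata pair: up_key:value."""
    return f"up_{key}:{value}"


def hive_extended(key: str, value: str) -> str:
    """Hive extended attribute (or S3 user-defined metadata) pair: tp_key:value."""
    return f"tp_{key}:{value}"


def managed(namespace: str, key: str, value: str) -> str:
    """Managed metadata pair: namespace.key:value."""
    return f"{namespace}.{key}:{value}"


print(technical("fileSystemPath", escape_value("/user/hive")))
print(custom("project", "customer1"))
print(managed("MailAnnotation", "emailTo", quote_value("dana@example.com")))
```

Quoting is usually the simpler choice for paths and e-mail addresses, while escaping lets you keep an unescaped * elsewhere in the same value as a wildcard.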
Constructing Compound Search Strings
To construct compound search strings, you can join multiple property-value pairs using the Lucene Query Parser Boolean operators:
- A space (implicit OR), + (the clause is required), - (the clause is prohibited)
- OR, AND, NOT
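As a rough sketch of how these operators combine, the following Python helpers join already-formed property-value pairs into a compound search string. The function names (required, excluded, any_of, all_of) are hypothetical, chosen for illustration only, and the helpers do no validation of the clauses they receive.

```python
def _group(clause: str) -> str:
    """Parenthesize a clause that contains spaces or already starts with + or -,
    so that a prefix operator applies to the clause as a whole."""
    needs_parens = " " in clause or clause.startswith(("+", "-"))
    return f"({clause})" if needs_parens else clause


def required(*clauses: str) -> str:
    """Prefix every clause with + so that all of them must match."""
    return " ".join("+" + _group(c) for c in clauses)


def excluded(clause: str) -> str:
    """Prefix a clause with - so that matching entities are left out."""
    return "-" + _group(clause)


def any_of(*clauses: str) -> str:
    """Join clauses with the OR keyword."""
    return " OR ".join(_group(c) for c in clauses)


def all_of(*clauses: str) -> str:
    """Join clauses with the AND keyword."""
    return " AND ".join(_group(c) for c in clauses)


print(required('"/user/hive"', excluded("deleted:true")))
print(any_of("sourceType:mapreduce", "sourceType:hive"))
print(all_of("owner:hdfs", "type:directory"))
```

The grouping helper adds parentheses more often than strictly necessary; that keeps the sketch short at the cost of slightly noisier query strings.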
Example Search Strings
- Entities in the path /user/hive that have not been deleted - +("/user/hive") +(-deleted:true)
- Descriptions that start with the string "Banking" - description:Banking*
- Entities of type MapReduce or entities of type Hive - sourceType:mapreduce sourceType:hive, or equivalently sourceType:mapreduce OR sourceType:hive
- Entities of type HDFS with size equal to or greater than 1024 MiB or entities of type Impala - (+sourceType:hdfs +size:[1073741824 TO *]) sourceType:impala
- Directories owned by hdfs in the path /user/hdfs/input - +owner:hdfs +type:directory +fileSystemPath:"/user/hdfs/input", or equivalently owner:hdfs AND type:directory AND fileSystemPath:"/user/hdfs/input"
- Jobs started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
- Custom key-value - project-customer1 - up_project:customer1
- Technical key-value - In Hive, specify table properties like this:
  ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
  To search for this property, specify tp_key1:value1.
- Managed key-value with multivalued property - MailAnnotation.emailTo:"dana@example.com" MailAnnotation.emailTo:"lee@example.com"
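The timestamp range in the examples above can be generated rather than typed by hand. The following Python sketch formats timezone-aware datetimes in the same shape as that example (UTC, millisecond precision, trailing Z); solr_timestamp and time_range are hypothetical helper names used only for illustration.

```python
from datetime import datetime, timezone


def solr_timestamp(dt: datetime) -> str:
    """Format an aware datetime as UTC, e.g. 2013-10-21T20:00:00.000Z."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so this sketch rejects them.
        raise ValueError("datetime must be timezone-aware")
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def time_range(prop: str, start: datetime, end: datetime) -> str:
    """Build a range clause of the form prop:[start TO end]."""
    return f"{prop}:[{solr_timestamp(start)} TO {solr_timestamp(end)}]"


start = datetime(2013, 10, 21, 20, 0, tzinfo=timezone.utc)
end = datetime(2013, 10, 21, 21, 0, tzinfo=timezone.utc)
print(time_range("started", start, end))
```

Converting to UTC before formatting avoids off-by-some-hours results when the caller works in local time.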
Search Properties
Default Properties
The following properties are default properties, which you can search by specifying only a value, without a key: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.
Common Properties
Name | Type | Description |
---|---|---|
description | text | Description of the entity. |
group | caseInsensitiveText | The group to which the owner of the entity belongs. |
name | ngramedText | The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces. |
operationType | ngramedText | The type of an operation. |
originalName | ngramedText | The name of the entity when it was extracted. |
originalDescription | text | The description of the entity when it was extracted. |
owner | caseInsensitiveText | The owner of the entity. |
principal | caseInsensitiveText | For entities with type OPERATION_EXECUTION, the initiator of the entity. |
properties | string | A set of key-value pairs that describe the entity. |
tags | ngramedText | A set of tags that describe the entity. |
type | tokenizedCaseInsensitiveText | The type of the entity. The available types depend on the entity's source type. |
userEntity | Boolean | Indicates whether an entity was added using the Cloudera Navigator SDK. |
Query | ||
queryText | string | The text of a Hive, Impala, or Sqoop query. |
Source | ||
clusterName | string | The name of the cluster in which the source is managed. |
sourceId | string | The ID of the source type. |
sourceType | caseInsensitiveText | The source type of the entity: hdfs, hive, impala, mapreduce, oozie, pig, spark, sqoop, or yarn. |
sourceUrl | string | The URL of the web application for a resource. |
Timestamps | ||
Timestamp fields (the available fields vary by the source type) | date | Timestamps in the Solr Date Format, for example 2013-10-21T20:00:00.000Z. |
Dataset Properties
Name | Type | Description |
---|---|---|
compressionType | tokenizedCaseInsensitiveText | The type of compression of a dataset file. |
dataType | string | The data type: record. |
datasetType | tokenizedCaseInsensitiveText | The type of the dataset: Kite. |
fileFormat | tokenizedCaseInsensitiveText | The format of a dataset file: Avro or Parquet. |
fullDataType | string | The full data type: record. |
partitionType | string | The type of the partition. |
schemaName | string | The name of the dataset schema. |
schemaNameSpace | string | The namespace of the dataset schema. |
HDFS Properties
Name | Type | Description |
---|---|---|
blockSize | long | The block size of an HDFS file. |
deleted | Boolean | Indicates whether the entity has been moved to the Trash folder. |
deleteTime | long | The time the entity was moved to the Trash folder. |
fileSystemPath | path | The path to the entity. |
mimeType | ngramedText | The MIME type of an HDFS file. |
parentPath | string | The path to the parent entity of a child entity. For example: parentPath:"/default/sample_07" for the table sample_07 from the Hive database default. |
permissions | string | The UNIX access permissions of the entity. |
replication | int | The number of copies of HDFS file blocks. |
size | long | The exact size of the entity in bytes or a range of sizes. Range examples: size:[1000 TO *], size:[* TO 2000], and size:[* TO *] to find all entities with a size value. |
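Because size is expressed in bytes, range bounds given in larger units have to be converted first. The short Python sketch below builds the size ranges shown in the table, with an open bound written as *; size_range and the MIB constant are hypothetical names introduced for illustration.

```python
from typing import Optional

MIB = 1024 * 1024  # bytes in one mebibyte


def size_range(min_bytes: Optional[int] = None, max_bytes: Optional[int] = None) -> str:
    """Build size:[min TO max]; a bound of None becomes the * wildcard."""
    lower = "*" if min_bytes is None else str(min_bytes)
    upper = "*" if max_bytes is None else str(max_bytes)
    return f"size:[{lower} TO {upper}]"


print(size_range(min_bytes=1024 * MIB))  # entities of 1024 MiB or more
print(size_range(max_bytes=2000))
print(size_range())                      # any entity that has a size value
```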
Hive Properties
Name | Type | Description |
---|---|---|
Field | ||
dataType | ngramedText | The type of data stored in a field (column). |
Table | ||
compressed | Boolean | Indicates whether a table is compressed. |
serDeLibName | string | The name of the library containing the SerDe class. |
serDeName | string | The fully qualified name of the SerDe class. |
Partition | ||
partitionColNames | string | The table columns that define the partition. |
partitionColValues | string | The table column values that define the partition. |
technical_properties | string | Hive extended attributes. |
clusteredByColNames | string | The column names that identify how table content is divided into buckets. |
sortByColNames | string | The column names that identify how table content is sorted within a bucket. |
MapReduce and YARN Properties
Name | Type | Description |
---|---|---|
inputRecursive | Boolean | Indicates whether files are searched recursively under the input directories, or only files directly under the input directories are considered. |
jobId | ngramedText | The ID of the job. For a job spawned by Oozie, the workflow ID. |
mapper | string | The fully qualified name of the mapper class. |
outputKey | string | The fully qualified name of the class of the output key. |
outputValue | string | The fully qualified name of the class of the output value. |
reducer | string | The fully qualified name of the reducer class. |
Operation Properties
Name | Type | Description |
---|---|---|
Operation | ||
inputFormat | string | The fully qualified name of the class of the input format. |
outputFormat | string | The fully qualified name of the class of the output format. |
Operation Execution | ||
inputs | string | The name of the entity input to an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table. |
outputs | string | The name of the entity output from an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table. |
engineType | string | The type of the engine used for an operation: MR or Spark. |
Oozie Properties
Name | Type | Description |
---|---|---|
status | string | The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED. |
S3 Properties
Name | Type | Description |
---|---|---|
Object Properties | ||
region | string | The geographic region in which the bucket is stored. |
bucketName | string | The name of the bucket in which the object is stored. |
fileSystemPath | path | The key of the S3 object. |
size | long | Object size in bytes. |
lastModified | date | Object creation date or last modified date, whichever is later. |
etag | string | A hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. |
storageClass | string | Storage class used for storing the object. |
owner | string | Owner of the object. |
sequencer | string | Latest S3 event notification sequencer. Used to order events. |
parentPath | string | Parent of the S3 object. |
technicalProperties | key-value pairs | Custom metadata for each S3 object. |
Bucket Properties | ||
region | string | Region for the bucket. |
created | date | Date the bucket was created. |
owner | string | Owner of the bucket. |
Sqoop Properties
Name | Type | Description |
---|---|---|
dbURL | string | The URL of the database from or to which the data was imported or exported. |
dbTable | string | The table from or to which the data was imported or exported. |
dbUser | string | The database user. |
dbWhere | string | A where clause that identifies which rows were imported. |
dbColumnExpression | string | An expression that identifies which columns were imported. |