Metadata Search Syntax and Properties
In Cloudera Navigator, metadata search is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin.
Search Syntax
You construct search strings by specifying the value of a default property and the following three types of key-value pairs using the given syntax:
- Technical metadata key-value pairs - key:value, where
  - key is one of the properties listed in Search Properties.
  - value is a single value or a range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values you must escape the special characters :, -, /, and * with the backslash character \ or enclose the property value in quotes. For example, fileSystemPath:\/tmp\/hbase\-staging.
- Custom metadata key-value pairs - up_key:value, where
  - key is a user-defined property defined on an entity after extraction.
  - value is a single value or a range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property values you must escape the special characters :, -, /, and * with the backslash character \ or enclose the property value in quotes. For example, fileSystemPath:\/tmp\/hbase\-staging.
- Hive extended attribute key-value pairs - tp_key:value, where
  - key is an extended attribute defined on a Hive entity before extraction. The syntax of the attribute is specific to Hive.
  - value is a single value supported by the entity type.
To construct complex search strings, join multiple key-value pairs with the operators "or" and "and".
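The escaping and joining rules above can be sketched in a few lines of Python. This is only an illustration of string construction, not part of Cloudera Navigator: the helper names escape_value, term, and join_terms are hypothetical, and the sketch assumes you want every special character escaped (so a * passed through escape_value is treated literally, not as a wildcard).

```python
import re

# The four characters the syntax rules say must be escaped in property values.
_SPECIAL = re.compile(r"([:\-/*])")


def escape_value(value: str) -> str:
    """Backslash-escape :, -, / and * in a property value (hypothetical helper)."""
    return _SPECIAL.sub(r"\\\1", value)


def term(key: str, value: str) -> str:
    """Build one key:value pair with the value escaped."""
    return f"{key}:{escape_value(value)}"


def join_terms(terms, operator: str = "and") -> str:
    """Join key:value pairs with the 'and' or 'or' operator."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    return f" {operator} ".join(terms)


query = join_terms([
    term("owner", "hdfs"),
    term("type", "directory"),
    term("fileSystemPath", "/user/hdfs/input"),
])
print(query)
# owner:hdfs and type:directory and fileSystemPath:\/user\/hdfs\/input
```

Enclosing the value in quotes, as the syntax also allows, is the simpler choice when a value contains many special characters.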
Example Search Strings
- Filesystem path /user/admin - fileSystemPath:\/user\/admin
- Descriptions that start with the string "Banking" - description:Banking*
- Sources of type MapReduce or Hive - sourceType:mapreduce or sourceType:hive
- Directories owned by hdfs in the path /user/hdfs/input - owner:hdfs and type:directory and fileSystemPath:"/user/hdfs/input"
- Jobs that started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
- User-defined key-value project-customer1 - up_project:customer1
- Technical key-value - In Hive you can specify table properties like this:
ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
To query for this property, specify tp_key1:value1.
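As a rough sketch of the two prefixed forms in the examples above, the following Python helpers add the up_ prefix for user-defined properties and the tp_ prefix for Hive extended attributes. The function names custom_term and hive_term are hypothetical, and the sketch assumes the values contain no special characters that would need escaping or quoting.

```python
def custom_term(key: str, value: str) -> str:
    """Build a search term for a user-defined property (up_ prefix)."""
    return f"up_{key}:{value}"


def hive_term(key: str, value: str) -> str:
    """Build a search term for a Hive extended attribute (tp_ prefix)."""
    return f"tp_{key}:{value}"


# User-defined key-value pair project-customer1.
print(custom_term("project", "customer1"))  # up_project:customer1

# Table property set with TBLPROPERTIES ('key1'='value1').
print(hive_term("key1", "value1"))          # tp_key1:value1
```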
Search Properties
Default Properties
The following properties are searched when you specify only a property value, without a key: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.
Common Properties
Name | Type | Description |
---|---|---|
description | text | Description of the entity. |
group | caseInsensitiveText | The group to which the owner of the entity belongs. |
name | ngramedText | The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces. |
operationType | ngramedText | The type of an operation. |
originalName | ngramedText | The name of the entity when it was extracted. |
originalDescription | text | The description of the entity when it was extracted. |
owner | caseInsensitiveText | The owner of the entity. |
principal | caseInsensitiveText | For entities with type OPERATION_EXECUTION, the initiator of the entity. |
properties | string | A set of key-value pairs that describe the entity. |
tags | ngramedText | A set of tags that describe the entity. |
type | tokenizedCaseInsensitiveText | The type of the entity. The available types depend on the entity's source type. |
userEntity | Boolean | Indicates whether an entity was added using the Cloudera Navigator SDK. |
Query | ||
queryText | string | The text of a Hive, Impala, or Sqoop query. |
Source | ||
clusterName | string | The name of the cluster in which the source is managed. |
sourceId | string | The ID of the source type. |
sourceType | caseInsensitiveText | The source type of the entity: hdfs, hive, impala, mapreduce, oozie, pig, spark, sqoop, or yarn. |
sourceUrl | string | The URL of the web application for a resource. |
Timestamps | ||
The available timestamp fields vary by the source type. | date | Timestamps in the Solr Date Format. |
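To illustrate the timestamp format, the sketch below formats Python datetime values as UTC timestamps with millisecond precision and a trailing Z, then builds a range on the started field used in the example search strings. The helper names solr_date and date_range are hypothetical, and the sketch assumes timezone-aware inputs.

```python
from datetime import datetime, timezone


def solr_date(dt: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDThh:mm:ss.SSSZ in UTC."""
    if dt.tzinfo is None:
        # Naive datetimes would be silently interpreted as local time.
        raise ValueError("datetime must be timezone-aware")
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def date_range(key: str, start: datetime, end: datetime) -> str:
    """Build key:[start TO end] for a timestamp property."""
    return f"{key}:[{solr_date(start)} TO {solr_date(end)}]"


start = datetime(2013, 10, 21, 20, 0, tzinfo=timezone.utc)
end = datetime(2013, 10, 21, 21, 0, tzinfo=timezone.utc)
print(date_range("started", start, end))
# started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
```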
HDFS Properties
Name | Type | Description |
---|---|---|
blockSize | long | The block size of an HDFS file. |
deleted | Boolean | Indicates whether the entity has been moved to the Trash folder. |
deleteTime | date | The time the entity was moved to the Trash folder. |
fileSystemPath | path | The path to the entity. |
mimeType | ngramedText | The MIME type of an HDFS file. |
parentPath | string | The path to the parent entity of a child entity. For example, the parent path is /default/sample_07 for the table sample_07 from the Hive database default. |
permissions | string | The UNIX access permissions of the entity. |
replication | int | The number of copies of HDFS file blocks. |
size | long | The exact size of the entity in bytes or a range of sizes. Range examples: size:[1000 TO *], size:[* TO 2000], and size:[* TO *] to find all entities with a size value. |
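The open-ended size ranges in the table can be generated with a small helper. This is a minimal sketch, assuming a missing bound should be written as *; the function name size_range is hypothetical.

```python
from typing import Optional


def size_range(lower: Optional[int] = None, upper: Optional[int] = None) -> str:
    """Build a size range term; None means the bound is open (*)."""
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    return f"size:[{lo} TO {hi}]"


print(size_range(lower=1000))  # size:[1000 TO *]
print(size_range(upper=2000))  # size:[* TO 2000]
print(size_range())            # size:[* TO *]
```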
Dataset Properties
Name | Type | Description |
---|---|---|
compressionType | tokenizedCaseInsensitiveText | The type of compression of a dataset file. |
dataType | string | The data type: record. |
datasetType | tokenizedCaseInsensitiveText | The type of the dataset: Kite. |
fileFormat | tokenizedCaseInsensitiveText | The format of a dataset file: Avro or Parquet. |
fullDataType | string | The full data type: record. |
partitionType | string | The type of the partition. |
schemaName | string | The name of the dataset schema. |
schemaNameSpace | string | The namespace of the dataset schema. |
MapReduce and YARN Properties
Name | Type | Description |
---|---|---|
inputRecursive | Boolean | Indicates whether files are searched recursively under the input directories or only files directly under the input directories are considered. |
jobId | ngramedText | The ID of the job. For a job spawned by Oozie, the workflow ID. |
mapper | string | The fully-qualified name of the mapper class. |
outputKey | string | The fully-qualified name of the class of the output key. |
outputValue | string | The fully-qualified name of the class of the output value. |
reducer | string | The fully-qualified name of the reducer class. |
Operation Properties
Name | Type | Description |
---|---|---|
Operation | ||
inputFormat | string | The fully-qualified name of the class of the input format. |
outputFormat | string | The fully-qualified name of the class of the output format. |
Operation Execution | ||
inputs | string | The name of the entity input to an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table. |
outputs | string | The name of the entity output from an operation execution. For entities of resource type mapreduce, yarn, and spark, it is usually a directory. For entities of resource type hive, it is usually a table. |
engineType | string | The type of the engine used for an operation: MR or Spark. |
Hive Properties
Name | Type | Description |
---|---|---|
Field | ||
dataType | ngramedText | The type of data stored in a field (column). |
Table | ||
compressed | Boolean | Indicates whether a table is compressed. |
serDeLibName | string | The name of the library containing the SerDe class. |
serDeName | string | The fully-qualified name of the SerDe class. |
Partition | ||
partitionColNames | string | The table columns that define the partition. |
partitionColValues | string | The table column values that define the partition. |
technical_properties | string | Hive extended attributes. |
clusteredByColNames | string | The column names that identify how table content is divided into buckets. |
sortByColNames | string | The column names that identify how table content is sorted within a bucket. |
Oozie Properties
Name | Type | Description |
---|---|---|
status | string | The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED. |
Sqoop Properties
Name | Type | Description |
---|---|---|
dbURL | string | The URL of the database from or to which the data was imported or exported. |
dbTable | string | The table from or to which the data was imported or exported. |
dbUser | string | The database user. |
dbWhere | string | A where clause that identifies which rows were imported. |
dbColumnExpression | string | An expression that identifies which columns were imported. |