Managing Metadata
This topic describes tasks for enabling and disabling metadata extraction, purging obsolete metadata, and configuring the display of inputs and outputs.
Enabling and Disabling Metadata Extraction
Minimum Required Role: Navigator Administrator (also provided by Full Administrator)
Enabling Hive Metadata Extraction in a Secure Cluster
The Navigator Metadata Server uses the hue user to connect to the Hive Metastore. By default, the hue user can connect to the Hive Metastore. However, if either the Hive service Hive Metastore Access Control and Proxy User Groups Override property or the HDFS service Hive Proxy User Groups property has been changed from its default value to a setting that prevents the hue user from connecting to the Hive Metastore, Navigator Metadata Server cannot extract metadata from Apache Hive. In that case, modify the property as follows:
- Go to the Hive or HDFS service.
- Click the Configuration tab.
- In the Search box, type proxy.
- In the Hive service Hive Metastore Access Control and Proxy User Groups Override property or the HDFS service Hive Proxy User Groups property, add a new row.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Type hue.
- Click Save Changes to commit the changes.
- Restart the service.
Enabling Spark Metadata Extraction
Lineage diagrams for Spark are supported in CDH 5.11 and higher, with some restrictions as listed in Restrictions on Lineage for Spark. By default, Spark metadata extraction is enabled. To enable or disable Spark metadata extraction:
- Search for the configuration setting config.navigator.lineage_enabled.
- Select the checkbox to enable Spark metadata extraction, or clear it to disable extraction.
- In Navigator Metadata Server Advanced Configuration Snippet (Safety Valve) for cloudera-navigator.properties, remove any setting for the property:
nav.spark.extraction.enable
This prior method of enabling metadata extraction for Spark is now deprecated.
- Click Save Changes to commit the changes.
- Restart the role.
Managing Metadata Capacity
Minimum Required Role: Full Administrator
Purging Metadata for HDFS Entities, Hive and Impala Select Queries, and YARN, Sqoop, and Pig Operations
You can delete metadata for HDFS entities, Hive and Impala select queries, and YARN, Sqoop, and Pig operations by using the purge method. (Metadata for Hive tables is not deleted.) Purge is a long-running task that requires exclusive access to the Solr instance and does not allow any concurrent activities, including extraction.
- Back up the Navigator Metadata Server storage directory.
- Invoke the http://Navigator_Metadata_Server_host:port/api/v10/maintenance/purge endpoint with the following parameters:
Purge Parameters

Metadata | Parameter | Description |
---|---|---|
HDFS | deleteTimeThresholdMinutes | After an HDFS entity is deleted, the number of minutes that must pass before that entity can be purged. Default: 86400 minutes (60 days). |
HDFS | runtimeCapMinutes | Number of minutes that the HDFS purge can run. When this limit is reached, the purge state is saved and the purge task terminates; you must run the purge again to purge any remaining entities. If you set the value to 0, no HDFS files or directories are purged. Default: 720 minutes (12 hours). |
Hive and Impala select queries; YARN, Sqoop, and Pig operations | deleteSelectOperations | Boolean. If set to true, the purge deletes all Hive and Impala select queries, and YARN, Sqoop, and Pig operations, that are older than the number of days defined by the staleQueryThresholdDays value. Default: false. |
Hive and Impala select queries; YARN, Sqoop, and Pig operations | staleQueryThresholdDays | The minimum age, in days, that a Hive or Impala select query, or a YARN, Sqoop, or Pig operation, must reach before it can be purged. To disable purge for these items, set the threshold to a very large value, for example, 36500. Default: 60 days. |

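For example, the following command sets deleteTimeThresholdMinutes to 0, so deleted HDFS entities are eligible for purge immediately:
$ curl -X POST -u admin:admin "http://Navigator_Metadata_Server_host:port/api/v10/maintenance/purge?deleteTimeThresholdMinutes=0"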
Purge tasks do not start until all currently running extraction tasks finish.
- When all tasks have completed, click Continue to return to the Cloudera Navigator UI.
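The purge request can also be scripted. The following is a minimal Bash sketch, assuming curl is installed and that the hypothetical variables NAV_HOST, NAV_PORT, and NAV_AUTH hold your Navigator Metadata Server host, port, and credentials. The helper names purge_url and start_purge are illustrative and are not part of Cloudera Navigator. The sketch passes all four purge parameters explicitly so that no value is left to an implicit default.

```shell
#!/usr/bin/env bash
# Sketch only: NAV_HOST, NAV_PORT, and NAV_AUTH are hypothetical variables.
# Replace the placeholder defaults with your own host, port, and credentials.
NAV_HOST="${NAV_HOST:-Navigator_Metadata_Server_host}"
NAV_PORT="${NAV_PORT:-port}"
NAV_AUTH="${NAV_AUTH:-admin:admin}"

# purge_url DELETE_MINUTES RUNTIME_CAP_MINUTES DELETE_SELECT STALE_DAYS
# Prints the purge endpoint URL with all four documented parameters.
purge_url() {
  if [ "$#" -ne 4 ]; then
    echo "usage: purge_url DELETE_MINUTES RUNTIME_CAP_MINUTES DELETE_SELECT STALE_DAYS" >&2
    return 2
  fi
  printf 'http://%s:%s/api/v10/maintenance/purge?deleteTimeThresholdMinutes=%s&runtimeCapMinutes=%s&deleteSelectOperations=%s&staleQueryThresholdDays=%s\n' \
    "$NAV_HOST" "$NAV_PORT" "$1" "$2" "$3" "$4"
}

# start_purge ARGS...: POST to the URL built by purge_url.
# Back up the Navigator Metadata Server storage directory before calling this.
start_purge() {
  local url
  url="$(purge_url "$@")" || return
  # Quote the URL so the shell does not interpret the & separators.
  curl -X POST -u "$NAV_AUTH" "$url"
}
```

For example, start_purge 0 720 false 60 sends the same deleteTimeThresholdMinutes=0 request as the curl command in the procedure, with the other parameters set to their documented defaults.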
Retrieving Purge Status
To view the status of the purge process, invoke the http://Navigator_Metadata_Server_host:port/api/v10/maintenance/running endpoint. For example:
curl -X GET -u admin:admin "http://Navigator_Metadata_Server_host:port/api/v10/maintenance/running"
A result would look similar to:
[{ "id" : 5, "type" : "PURGE", "startTime" : "2016-03-10T23:17:41.884Z", "endTime" : "1970-01-01T00:00:00.000Z", "status" : "IN_PROGRESS", "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.", "username" : "admin", "stage" : "HDFS_DIRECTORIES", "stagePercent" : 54 }]
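A purge can run for up to runtimeCapMinutes, so a script may need to wait for it before doing other work. The following Bash sketch polls the running endpoint and returns once no task reports "status" : "IN_PROGRESS". It assumes curl is available and that hypothetical NAV_HOST, NAV_PORT, and NAV_AUTH variables are set; the function names are illustrative. It treats any task reported as IN_PROGRESS as still running and does not interpret other statuses. Matching the status with grep is a heuristic tied to the response format shown above; a JSON parser such as jq is more robust.

```shell
#!/usr/bin/env bash
# Sketch only: NAV_HOST, NAV_PORT, and NAV_AUTH are hypothetical variables.
NAV_HOST="${NAV_HOST:-Navigator_Metadata_Server_host}"
NAV_PORT="${NAV_PORT:-port}"
NAV_AUTH="${NAV_AUTH:-admin:admin}"

# purge_in_progress JSON: return 0 if the response text contains a task whose
# status is IN_PROGRESS. This is a grep heuristic, not a JSON parser.
purge_in_progress() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"IN_PROGRESS"'
}

# wait_for_purge [INTERVAL]: poll the running endpoint every INTERVAL seconds
# (default 60) until no task reports IN_PROGRESS. Returns 1 if curl fails.
wait_for_purge() {
  local interval="${1:-60}" running
  while true; do
    running="$(curl -sf -X GET -u "$NAV_AUTH" \
      "http://$NAV_HOST:$NAV_PORT/api/v10/maintenance/running")" || return 1
    purge_in_progress "$running" || return 0
    sleep "$interval"
  done
}
```

For example, after starting a purge, wait_for_purge 300 checks the status every five minutes.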
Retrieving Purge History
To view the purge history, invoke the http://Navigator_Metadata_Server_host:port/api/v10/maintenance/history endpoint with the following parameters:

Parameter | Description |
---|---|
offset | First purge history entry to retrieve. Default: 0. |
limit | Number of history entries to retrieve, starting from the offset. Default: 100. |

For example:
curl -X GET -u admin:admin "http://Navigator_Metadata_Server_host:port/api/v10/maintenance/history?offset=0&limit=100"
A result would look similar to:
[
  { "id": 1, "type": "PURGE", "startTime": "2016-03-09T18:57:43.196Z", "endTime": "2016-03-09T18:58:33.337Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 },
  { "id": 2, "type": "PURGE", "startTime": "2016-03-09T19:47:39.401Z", "endTime": "2016-03-09T19:47:40.841Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 },
  { "id": 3, "type": "PURGE", "startTime": "2016-03-10T01:27:39.632Z", "endTime": "2016-03-10T01:27:46.809Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 },
  { "id": 4, "type": "PURGE", "startTime": "2016-03-10T01:57:40.461Z", "endTime": "2016-03-10T01:57:41.174Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 },
  { "id": 5, "type": "PURGE", "startTime": "2016-03-10T23:17:41.884Z", "endTime": "2016-03-10T23:18:06.802Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 }
]
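To page through a long purge history, advance offset by limit on each request. The following Bash sketch computes the offset for a zero-based page number and builds the history URL; it assumes curl is installed, and NAV_HOST, NAV_PORT, NAV_AUTH, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: NAV_HOST, NAV_PORT, and NAV_AUTH are hypothetical variables.
NAV_HOST="${NAV_HOST:-Navigator_Metadata_Server_host}"
NAV_PORT="${NAV_PORT:-port}"
NAV_AUTH="${NAV_AUTH:-admin:admin}"

# page_offset PAGE SIZE: offset of the first entry on a zero-based page.
page_offset() {
  echo $(( $1 * $2 ))
}

# history_url OFFSET LIMIT: URL of the history endpoint for one page.
history_url() {
  printf 'http://%s:%s/api/v10/maintenance/history?offset=%s&limit=%s\n' \
    "$NAV_HOST" "$NAV_PORT" "$1" "$2"
}

# fetch_history_page PAGE [SIZE]: GET one page of purge history.
# SIZE defaults to 100, the documented default for limit.
fetch_history_page() {
  local page="$1" size="${2:-100}"
  curl -s -X GET -u "$NAV_AUTH" \
    "$(history_url "$(page_offset "$page" "$size")" "$size")"
}
```

For example, fetch_history_page 1 requests offset=100&limit=100, that is, the second page of 100 entries.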
Configuring Display of Inputs and Outputs
- Do one of the following:
  - Select .
  - In the Cloudera Management Service table, click the Cloudera Management Service link.
- Click the Configuration tab.
- Select .
- In Navigator Metadata Server Advanced Configuration Snippet (Safety Valve) for cloudera-navigator.properties, set the property
nav.ui.details_io_enabled=true
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the role.