Using the Purge APIs for Metadata Maintenance Tasks
Required Role: Cloudera Navigator Full Administrator
The volume of metadata maintained by Navigator Metadata Server can grow quickly and exceed the capacity of the Solr instance that processes the index and supports the search capability. For faster search and cleaner lineage tracing, use the purge feature to routinely delete unwanted metadata from the system.
Purging stale metadata is also recommended prior to upgrading an existing Cloudera Navigator instance. See Avoiding Out-of-Memory Errors During an Upgrade for details.
Purging Metadata for HDFS Entities, Hive and Impala Select Queries, and YARN, Sqoop, and Pig Operations
You can delete metadata for HDFS entities, Hive and Impala select queries, and YARN, Sqoop, and Pig operations by using the purge method. Purge is a long-running task that requires exclusive access to the Solr instance and does not allow any concurrent activities, including extraction.
- Back up the Navigator Metadata Server storage directory.
- Invoke the maintenance/purge endpoint with the POST method with parameters from the following table:
http://fqdn-n.example.com:port/api/APIversion/maintenance/purge?parameters
where fqdn-n.example.com is the host running the Navigator Metadata Server role instance listening for HTTP connections at the specified port number (7187 is the default port number). APIversion is the running version of the API as indicated in the footer of the API documentation (available from the Help menu in the Navigator console) or by calling http://fqdn-n.example.com:port/api/version.
Purge Parameters
Metadata | Parameter | Description |
---|---|---|
HDFS | deleteTimeThresholdMinutes | After an HDFS entity is deleted, the amount of time that elapses before the entity can be purged. Default: 86400 minutes (60 days). |
HDFS | runtimeCapMinutes | Amount of time allowed for the HDFS purge process to run. When the specified time is reached, the state is saved and the purge process stops. Run purge again to remove any remaining items held in state. Default: 720 minutes (12 hours). Set to 0 to effectively disable purge for HDFS files and directories (none will be purged). |
Hive and Impala Select Queries; YARN, Sqoop, Pig Operations | deleteSelectOperations | Set to true to enable purge for Hive and Impala SELECT queries, and to enable YARN, Sqoop, and Pig operations older than the staleQueryThresholdDays value to be purged. Default: false. By default, Hive and Impala SELECT queries are not purged, nor are YARN, Sqoop, and Pig operations. |
Hive and Impala Select Queries; YARN, Sqoop, Pig Operations | staleQueryThresholdDays | Number of days after which Hive and Impala SELECT queries and YARN, Sqoop, and Pig operations are identified as stale, effectively marking them 'ready for purge.' They are purged automatically within hours of the value being reached. Default: 60 days. To disable marking entities as stale for the foreseeable future, set the value to a very large number, such as 36500. |
$ curl -X POST -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/purge?deleteTimeThresholdMinutes=0"
Purge tasks do not start until all currently running extraction tasks finish.
- When all tasks have completed, click Continue to return to the Cloudera Navigator console.
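The purge call described in the steps above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names `build_purge_url` and `build_purge_request` are hypothetical, the `staleQueryThresholdDays=90` value is only illustrative, and HTTP basic authentication is assumed because the curl example uses `-u`. The request is built but not sent.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_purge_url(host, port, api_version, **params):
    """Return the maintenance/purge URL with the given purge parameters."""
    base = f"http://{host}:{port}/api/{api_version}/maintenance/purge"
    # Booleans are lowercased so that True is sent as "true".
    query = urlencode({
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in params.items()
    })
    return f"{base}?{query}" if query else base


def build_purge_request(url, username, password):
    """Build (but do not send) a POST request using HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical values: host, port, and API version match the curl example.
url = build_purge_url(
    "node1.example.com", 7187, "v13",
    deleteSelectOperations=True,
    staleQueryThresholdDays=90,
)
request = build_purge_request(url, "admin", "admin")
print(request.get_method(), request.full_url)
# To actually start the purge, pass the request to urllib.request.urlopen().
```

Keeping URL construction separate from sending makes it easy to review the exact parameters before starting a task that blocks extraction.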
Retrieving Purge Status
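To check the status of a purge that is currently running, invoke the maintenance/running endpoint with the GET method:
http://fqdn-n.example.com:port/api/APIversion/maintenance/running
For example:
curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/running"
A result would look similar to: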
[{ "id" : 5, "type" : "PURGE", "startTime" : "2016-03-10T23:17:41.884Z", "endTime" : "1970-01-01T00:00:00.000Z", "status" : "IN_PROGRESS", "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.", "username" : "admin", "stage" : "HDFS_DIRECTORIES", "stagePercent" : 54 }]
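A status response like the one above can be condensed into a one-line progress report. This is a minimal Python sketch that assumes the response is a JSON array shaped like the sample; the function names `summarize_running` and `purge_in_progress` are hypothetical, and treating an empty array as "no purge running" is an assumption about the endpoint rather than documented behavior.

```python
import json

# Sample taken from the status response shown above.
SAMPLE_STATUS = """
[{ "id" : 5, "type" : "PURGE", "startTime" : "2016-03-10T23:17:41.884Z",
   "endTime" : "1970-01-01T00:00:00.000Z", "status" : "IN_PROGRESS",
   "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.",
   "username" : "admin", "stage" : "HDFS_DIRECTORIES", "stagePercent" : 54 }]
"""


def summarize_running(response_text):
    """Return one summary line per maintenance task in the response."""
    return [
        f'{task["type"]} {task["id"]}: {task["status"]}, '
        f'stage {task.get("stage", "n/a")} at {task.get("stagePercent", 0)}%'
        for task in json.loads(response_text)
    ]


def purge_in_progress(response_text):
    """Return True if any task in the response is a purge that is still running."""
    return any(
        task.get("type") == "PURGE" and task.get("status") == "IN_PROGRESS"
        for task in json.loads(response_text)
    )


for line in summarize_running(SAMPLE_STATUS):
    print(line)
print("Purge running:", purge_in_progress(SAMPLE_STATUS))
```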
Retrieving Purge History
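To retrieve the history of purge operations, invoke the maintenance/history endpoint with the GET method and parameters from the following table:
http://fqdn-n.example.com:port/api/APIversion/maintenance/history?parameters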
where fqdn-n.example.com is the host running the Navigator Metadata Server role instance listening for HTTP connections at the specified port number (7187 is the default port number). APIversion is the running version of the API as indicated in the footer of the API documentation (available from the Help menu in the Navigator console) or by calling http://fqdn-n.example.com:port/api/version.
Parameter | Description |
---|---|
offset | First purge history entry to retrieve. Default: 0. |
limit | Number of history entries to retrieve from the offset. Default: 100. |
curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/history?offset=0&limit=100"
A result would look similar to:
[ { "id": 1, "type": "PURGE", "startTime": "2016-03-09T18:57:43.196Z", "endTime": "2016-03-09T18:58:33.337Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 }, { "id": 2, "type": "PURGE", "startTime": "2016-03-09T19:47:39.401Z", "endTime": "2016-03-09T19:47:40.841Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 }, { "id": 3, "type": "PURGE", "startTime": "2016-03-10T01:27:39.632Z", "endTime": "2016-03-10T01:27:46.809Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 }, { "id": 4, "type": "PURGE", "startTime": "2016-03-10T01:57:40.461Z", "endTime": "2016-03-10T01:57:41.174Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 }, { "id": 5, "type": "PURGE", "startTime": "2016-03-10T23:17:41.884Z", "endTime": "2016-03-10T23:18:06.802Z", "status": "SUCCESS", "username": "admin", "stagePercent": 0 } ]
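The history entries carry start and end timestamps, so the run time of each purge can be derived from them. The sketch below is a minimal Python example under stated assumptions: the timestamp format is inferred from the sample output, and the helper names `build_history_url` and `purge_durations` are hypothetical. Only two of the sample entries are reproduced to keep the example short.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

# Format inferred from the sample output, for example 2016-03-09T18:57:43.196Z.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Two entries copied from the history response shown above.
SAMPLE_HISTORY = """
[ { "id": 1, "type": "PURGE", "startTime": "2016-03-09T18:57:43.196Z",
    "endTime": "2016-03-09T18:58:33.337Z", "status": "SUCCESS",
    "username": "admin", "stagePercent": 0 },
  { "id": 5, "type": "PURGE", "startTime": "2016-03-10T23:17:41.884Z",
    "endTime": "2016-03-10T23:18:06.802Z", "status": "SUCCESS",
    "username": "admin", "stagePercent": 0 } ]
"""


def build_history_url(host, port, api_version, offset=0, limit=100):
    """Return the maintenance/history URL for one page of entries."""
    query = urlencode({"offset": offset, "limit": limit})
    return f"http://{host}:{port}/api/{api_version}/maintenance/history?{query}"


def purge_durations(history_text):
    """Map each history entry id to its run time in seconds."""
    durations = {}
    for entry in json.loads(history_text):
        start = datetime.strptime(entry["startTime"], TIMESTAMP_FORMAT)
        end = datetime.strptime(entry["endTime"], TIMESTAMP_FORMAT)
        durations[entry["id"]] = (end - start).total_seconds()
    return durations


print(build_history_url("node1.example.com", 7187, "v13"))
for entry_id, seconds in purge_durations(SAMPLE_HISTORY).items():
    print(f"purge {entry_id} ran for {seconds:.3f} seconds")
```

To page through a longer history, increase `offset` by `limit` on each call until fewer than `limit` entries come back.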
Stopping a Purge Operation
To stop a purge operation that is running too long and blocking user access to Navigator, restart the Navigator Metadata Server. To complete the operation, rerun the purge command at a later time.