Using the Purge APIs for Metadata Maintenance Tasks

Required Role: Cloudera Navigator Full Administrator

The volume of metadata maintained by Navigator Metadata Server can grow quickly and reduce the efficiency of the Solr instance that processes the index and supports the search capability. For faster search and cleaner lineage tracing, use the purge feature to routinely delete stale metadata from the system.

Purging stale metadata is also recommended prior to upgrading an existing Cloudera Navigator instance. See Purging the Navigator Metadata Server of Deleted and Stale Metadata for details.

See also What Metadata is Purged?

Purging Stale Entity Metadata

You can delete obsolete metadata using the purge method. Purge is a long-running task that requires exclusive access to the Solr instance and does not allow any concurrent activities, including extraction. When a purge job runs, any running Navigator tasks—extractions, policy application, or other background tasks—are stopped so that the purge can run immediately. When the purge task completes, the tasks that were stopped are restarted from the beginning. The interruption for the purge task may delay collecting new audits and metadata but does not affect what content is collected.

Required Role: Metadata Administrator (or Full Administrator)

To purge metadata, do the following:
  1. Invoke the maintenance/purge endpoint with the POST method as a user with the Full Administrator role:
    http://fqdn-n.example.com:port/api/APIversion/maintenance/purge?parameters

    where fqdn-n.example.com is the host running the Navigator Metadata Server role instance listening for HTTP connections at the specified port number (7187 is the default port number). APIversion is the running version of the API as indicated in the footer of the API documentation (available from the Help menu in the Navigator console) or by calling http://fqdn-n.example.com:port/api/version.

    Purge Parameters

    Metadata: HDFS

      deleteTimeThresholdMinutes
        After an HDFS entity is deleted, the amount of time that must elapse before the entity can be purged.
        Default: 86400 minutes (60 days).

      runtimeCapMinutes
        Amount of time allowed for the HDFS purge process to run. When the specified time is reached, the state is saved and the purge process stops. Run purge again to remove any remaining items held in state. Set to 0 to effectively disable purge for HDFS files and directories (none will be purged).
        Default: 720 minutes (12 hours).

    Metadata: Operations from all services

      deleteSelectOperations
        Set to true to enable purge for Hive and Impala SELECT queries, and to allow YARN, Sqoop, and Pig operations older than the staleQueryThresholdDays value to be purged. By default, Hive and Impala SELECT queries are not purged, nor are YARN, Sqoop, and Pig operations.
        Default: false.

      staleQueryThresholdDays
        Number of days after which operations are identified as stale, effectively marking them as ready for purge. They are purged automatically within hours of the value being reached. To disable marking entities as stale for the foreseeable future, set the value to a very large number, such as 36500.
        Default: 60 days.

    For example, the following cURL call purges the metadata of all deleted HDFS entities (elapsed minutes value set to 0):
    curl -X POST -u admin:admin "http://node1.example.com:7187/api/v14/maintenance/purge?deleteTimeThresholdMinutes=0" 

    Purge tasks do not start until all currently running extraction tasks finish.

  2. When all tasks have completed, click Continue to return to the Cloudera Navigator console.
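
The purge request from step 1 can also be composed in a script. The following Python sketch only builds the request and does not send it. The helper name build_purge_request and the parameter values are illustrative, and the sketch assumes that several purge parameters can be combined in one query string and that the server accepts the same basic authentication that curl -u uses.

```python
import base64
import urllib.parse
import urllib.request


def build_purge_request(host, port, api_version, user, password, **params):
    """Return an unsent POST request for the maintenance/purge endpoint.

    Only the parameters that are passed in are added to the query string,
    so the documented defaults apply to everything else.
    (Hypothetical helper, not part of any Navigator client library.)
    """
    # Send booleans as lowercase true/false, matching the parameter table.
    query = urllib.parse.urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    url = f"http://{host}:{port}/api/{api_version}/maintenance/purge"
    if query:
        url += "?" + query
    # Same credentials that `curl -u user:password` would send.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


request = build_purge_request(
    "node1.example.com", 7187, "v14", "admin", "admin",
    deleteTimeThresholdMinutes=7 * 24 * 60,  # one week, expressed in minutes
    runtimeCapMinutes=60,
    deleteSelectOperations=True,
    staleQueryThresholdDays=90,
)
print(request.get_method(), request.full_url)
# To actually start the purge: urllib.request.urlopen(request)
```

Keeping the request unsent makes it possible to review the final URL before a long-running purge is started.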

Retrieving Purge Status

Required Role: Metadata Administrator (or Full Administrator)

To view the status of the purge process, invoke the maintenance/running endpoint with the GET method:
http://fqdn-n.example.com:port/api/APIversion/maintenance/running
For example:
curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/running"
A result would look similar to:
[{
  "id" : 5,
  "type" : "PURGE",
  "startTime" : "2016-03-10T23:17:41.884Z",
  "endTime" : "1970-01-01T00:00:00.000Z",
  "status" : "IN_PROGRESS",
  "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.",
  "username" : "admin",
  "stage" : "HDFS_DIRECTORIES",
  "stagePercent" : 54
}]
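
A status response like the one above can be inspected in a script. The following Python sketch parses the sample response, selects the purge tasks that are still in progress, and estimates the time left in the current stage from the counts in the message field. The helpers running_purges and minutes_remaining are hypothetical, and the estimate assumes that the message keeps the wording shown in the sample.

```python
import json
import re

# The sample response shown above, as returned by maintenance/running.
RESPONSE = """[{
  "id" : 5,
  "type" : "PURGE",
  "startTime" : "2016-03-10T23:17:41.884Z",
  "endTime" : "1970-01-01T00:00:00.000Z",
  "status" : "IN_PROGRESS",
  "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.",
  "username" : "admin",
  "stage" : "HDFS_DIRECTORIES",
  "stagePercent" : 54
}]"""

# Assumption: the progress message keeps the wording seen in the sample.
PROGRESS = re.compile(r"Purged (\d+) out of (\d+) .*?Averaging (\d+) ")


def running_purges(tasks):
    """Return the purge tasks that are still in progress."""
    return [t for t in tasks
            if t.get("type") == "PURGE" and t.get("status") == "IN_PROGRESS"]


def minutes_remaining(task):
    """Estimate the minutes left in the current stage, or None if unknown."""
    match = PROGRESS.search(task.get("message", ""))
    if not match:
        return None
    done, total, per_minute = map(int, match.groups())
    if per_minute == 0:
        return None
    return (total - done) / per_minute


for task in running_purges(json.loads(RESPONSE)):
    print(task["stage"], task["stagePercent"], minutes_remaining(task))
```

For the sample, the estimate is roughly 1289 minutes (about 21.5 hours) for the HDFS_DIRECTORIES stage, which is longer than the default runtimeCapMinutes of 720, so a second purge run may be needed to finish the work.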

Retrieving Purge History

Required Role: Metadata Administrator (or Full Administrator)

To view the purge history, invoke the maintenance/history endpoint with the GET method, optionally passing the parameters listed under History Parameters:
http://fqdn-n.example.com:port/api/APIversion/maintenance/history?parameters

where fqdn-n.example.com is the host running the Navigator Metadata Server role instance listening for HTTP connections at the specified port number (7187 is the default port number). APIversion is the running version of the API as indicated in the footer of the API documentation (available from the Help menu in the Navigator console) or by calling http://fqdn-n.example.com:port/api/version.

History Parameters

  offset
    First purge history entry to retrieve.
    Default: 0.

  limit
    Number of history entries to retrieve, starting from the offset.
    Default: 100.

For example:
curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/history?offset=0&limit=100"
A result would look similar to:
[
  {
    "id": 1,
    "type": "PURGE",
    "startTime": "2016-03-09T18:57:43.196Z",
    "endTime": "2016-03-09T18:58:33.337Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 2,
    "type": "PURGE",
    "startTime": "2016-03-09T19:47:39.401Z",
    "endTime": "2016-03-09T19:47:40.841Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 3,
    "type": "PURGE",
    "startTime": "2016-03-10T01:27:39.632Z",
    "endTime": "2016-03-10T01:27:46.809Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 4,
    "type": "PURGE",
    "startTime": "2016-03-10T01:57:40.461Z",
    "endTime": "2016-03-10T01:57:41.174Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 5,
    "type": "PURGE",
    "startTime": "2016-03-10T23:17:41.884Z",
    "endTime": "2016-03-10T23:18:06.802Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  }
]
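
The offset and limit parameters allow the history to be read one page at a time. The following Python sketch walks through the pages and computes how long each purge ran from its startTime and endTime. The names iter_history, duration_seconds, and fake_fetch are hypothetical. fake_fetch stands in for the real GET request, and the loop assumes that a page shorter than limit means there are no further entries.

```python
from datetime import datetime

# Stand-in data: the first and last entries of the sample response above.
SAMPLE = [
    {"id": 1, "type": "PURGE", "status": "SUCCESS",
     "startTime": "2016-03-09T18:57:43.196Z",
     "endTime": "2016-03-09T18:58:33.337Z"},
    {"id": 5, "type": "PURGE", "status": "SUCCESS",
     "startTime": "2016-03-10T23:17:41.884Z",
     "endTime": "2016-03-10T23:18:06.802Z"},
]


def fake_fetch(offset, limit):
    """Pretend to call maintenance/history?offset=...&limit=... (no network)."""
    return SAMPLE[offset:offset + limit]


def iter_history(fetch_page, limit=100):
    """Yield every history entry, requesting `limit` entries per call."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:  # assumed: a short page is the last page
            break
        offset += limit


def duration_seconds(entry):
    """Return the run time of one history entry in seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(entry["startTime"], fmt)
    end = datetime.strptime(entry["endTime"], fmt)
    return (end - start).total_seconds()


for entry in iter_history(fake_fetch, limit=1):
    print(entry["id"], entry["status"], duration_seconds(entry))
```

In a real script, fake_fetch would be replaced by a function that issues the GET request with the given offset and limit and returns the decoded JSON list.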

Stopping a Purge Operation

To stop a purge operation that is running too long and blocking user access to Navigator, restart the Navigator Metadata Server. To complete the operation, rerun the purge command at a later time.