Using the Purge APIs for Metadata Maintenance Tasks

Required Role: Cloudera Navigator Full Administrator

The volume of metadata maintained by Navigator Metadata Server can grow quickly and exceed the capacity of the Solr instance that processes the index and supports the search capability. For faster search and cleaner lineage tracing, use the purge feature to routinely delete unwanted metadata from the system.

Purging stale metadata is also recommended prior to upgrading an existing Cloudera Navigator instance. See Avoiding Out-of-Memory Errors During an Upgrade for details.

Continue reading:

Purging Metadata for HDFS Entities, Hive and Impala Select Queries, and YARN, Sqoop, and Pig Operations
Retrieving Purge Status
Retrieving Purge History
Stopping a Purge Operation

Purging Metadata for HDFS Entities, Hive and Impala Select Queries, and YARN, Sqoop, and Pig Operations

You can delete metadata for HDFS entities, Hive and Impala select queries, and YARN, Sqoop, and Pig operations by using the purge method. Purge is a long-running task that requires exclusive access to the Solr instance; no other activity, including extraction, can run concurrently.

To purge metadata, do the following:

Back up the Navigator Metadata Server storage directory.

Invoke the maintenance/purge endpoint with the POST method, supplying parameters from the Purge Parameters table below:

http://fqdn-n.example.com:port/api/APIversion/maintenance/purge?parameters

where fqdn-n.example.com is the host running the Navigator Metadata Server role instance, which listens for HTTP connections on the specified port (7187 by default). APIversion is the running version of the API, as indicated in the footer of the API documentation (available from the Help menu in the Navigator console) or as returned by calling http://fqdn-n.example.com:port/api/version.

Purge Parameters
Metadata	Parameter	Description
HDFS	`deleteTimeThresholdMinutes`	After an HDFS entity is deleted, the amount of time that elapses before the entity can be purged. Default: 86400 minutes (60 days).
HDFS	`runtimeCapMinutes`	Amount of time allowed for the HDFS purge process to run. When the specified time is reached, the state is saved and the purge process stops. Run purge again to remove any remaining items held in state. Set to 0 to effectively disable purge for HDFS files and directories (none will be purged). Default: 720 minutes (12 hours).
Hive and Impala Select Queries; YARN, Sqoop, Pig Operations	`deleteSelectOperations`	Set to true to purge Hive and Impala SELECT queries and YARN, Sqoop, and Pig operations that are older than the `staleQueryThresholdDays` value. Default: false (Hive and Impala SELECT queries and YARN, Sqoop, and Pig operations are not purged).
Hive and Impala Select Queries; YARN, Sqoop, Pig Operations	`staleQueryThresholdDays`	Number of days after which Hive and Impala SELECT queries and YARN, Sqoop, and Pig operations are identified as stale, effectively marking them 'ready for purge.' They are purged automatically within hours of the value being reached. Default: 60 days. To prevent entities from being marked as stale for the foreseeable future, set the value to a very large number, such as 36500.

For example, the following call purges the metadata of all deleted HDFS entities (`deleteTimeThresholdMinutes` set to 0):

curl -X POST -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/purge?deleteTimeThresholdMinutes=0"

Purge tasks do not start until all currently running extraction tasks finish.

When all tasks have completed, click Continue to return to the Cloudera Navigator console.
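Several parameters from the Purge Parameters table can be combined in a single request. The following is a minimal sketch of one way to do that from a shell, not a supported tool. The function names `days_to_minutes`, `purge_url`, and `run_purge` are hypothetical helpers introduced here for illustration, and the host, port, API version, and credentials are the placeholder values used in the examples in this section. The sketch converts a threshold expressed in days into the minutes that `deleteTimeThresholdMinutes` expects and assembles the request URL.

```shell
# Placeholder connection details copied from the examples in this section.
NAV_BASE="http://node1.example.com:7187/api/v13"
NAV_CRED="admin:admin"

# Hypothetical helper: convert whole days to minutes (1 day = 1440 minutes).
days_to_minutes() {
  echo $(( $1 * 1440 ))
}

# Hypothetical helper: join "name=value" arguments into a purge URL.
purge_url() {
  query=""
  for pair in "$@"; do
    # Prefix every pair except the first with "&".
    query="${query:+${query}&}${pair}"
  done
  echo "${NAV_BASE}/maintenance/purge${query:+?${query}}"
}

# Hypothetical helper: send the POST request. Defined but not invoked here;
# call it only after backing up the storage directory.
run_purge() {
  curl -X POST -u "$NAV_CRED" "$(purge_url "$@")"
}
```

Under these assumptions, `run_purge "deleteTimeThresholdMinutes=$(days_to_minutes 30)" "deleteSelectOperations=true" "staleQueryThresholdDays=90"` would request a purge of HDFS entities deleted at least 30 days ago, together with select queries and operations older than 90 days.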

Retrieving Purge Status

To view the status of the purge process, invoke the maintenance/running endpoint with the GET method:

http://fqdn-n.example.com:port/api/APIversion/maintenance/running

For example:

curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/running"

A result would look similar to:

[{
  "id" : 5,
  "type" : "PURGE",
  "startTime" : "2016-03-10T23:17:41.884Z",
  "endTime" : "1970-01-01T00:00:00.000Z",
  "status" : "IN_PROGRESS",
  "message" : "Purged 2661984 out of 4864714 directories. Averaging 1709 directories per minute.",
  "username" : "admin",
  "stage" : "HDFS_DIRECTORIES",
  "stagePercent" : 54
}]
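Because a purge can run for hours, it may be convenient to poll this endpoint rather than call it by hand. The sketch below is one possible approach and rests on an assumption that this document does not confirm: that once the purge finishes, the response no longer contains an entry whose status is IN_PROGRESS. The function names `purge_in_progress`, `purge_message`, and `wait_for_purge` are hypothetical, and the text matching relies on the field layout shown in the sample response above rather than on a real JSON parser.

```shell
# Hypothetical helper: read a maintenance/running response on stdin and
# succeed if any entry reports the IN_PROGRESS status.
purge_in_progress() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"IN_PROGRESS"'
}

# Hypothetical helper: print the "message" value found on each input line.
purge_message() {
  sed -n 's/.*"message"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: poll once a minute until no entry is IN_PROGRESS.
# Assumption: an idle server returns a response without such an entry.
# The loop also ends if curl itself fails (for example, server unreachable).
wait_for_purge() {
  url="http://node1.example.com:7187/api/v13/maintenance/running"
  while body=$(curl -s -u admin:admin "$url") &&
        printf '%s\n' "$body" | purge_in_progress; do
    printf '%s\n' "$body" | purge_message
    sleep 60
  done
}
```

Calling `wait_for_purge` after submitting a purge would print the progress message about once a minute and return when the purge is no longer reported as running. If the `jq` utility is available, parsing the response with it would be more robust than the pattern matching used here.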

Retrieving Purge History

To view the purge history, invoke the maintenance/history endpoint with the GET method with parameters from the following table:

http://fqdn-n.example.com:port/api/APIversion/maintenance/history?parameters

History Parameters
Parameter	Description
`offset`	First purge history entry to retrieve. Default: 0.
`limit`	Number of history entries to retrieve from the offset. Default: 100.

For example:

curl -X GET -u admin:admin "http://node1.example.com:7187/api/v13/maintenance/history?offset=0&limit=100"

A result would look similar to:

[
  {
    "id": 1,
    "type": "PURGE",
    "startTime": "2016-03-09T18:57:43.196Z",
    "endTime": "2016-03-09T18:58:33.337Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 2,
    "type": "PURGE",
    "startTime": "2016-03-09T19:47:39.401Z",
    "endTime": "2016-03-09T19:47:40.841Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 3,
    "type": "PURGE",
    "startTime": "2016-03-10T01:27:39.632Z",
    "endTime": "2016-03-10T01:27:46.809Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 4,
    "type": "PURGE",
    "startTime": "2016-03-10T01:57:40.461Z",
    "endTime": "2016-03-10T01:57:41.174Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  },
  {
    "id": 5,
    "type": "PURGE",
    "startTime": "2016-03-10T23:17:41.884Z",
    "endTime": "2016-03-10T23:18:06.802Z",
    "status": "SUCCESS",
    "username": "admin",
    "stagePercent": 0
  }
]
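When the history is long, the `offset` and `limit` parameters can be used to page through it, and the output can be scanned for runs that did not succeed. The following sketch illustrates the idea. The function names `history_url` and `count_not_successful` are hypothetical, the host and API version are the placeholders used above, and the only status values this document shows are SUCCESS and IN_PROGRESS, so the counter treats any other value as "not successful" without assuming what those values are called.

```shell
# Hypothetical helper: build a maintenance/history URL for one page.
# The defaults mirror the documented defaults (offset 0, limit 100).
history_url() {
  offset=${1:-0}
  limit=${2:-100}
  echo "http://node1.example.com:7187/api/v13/maintenance/history?offset=${offset}&limit=${limit}"
}

# Hypothetical helper: read a history response on stdin and print how many
# entries have a status other than SUCCESS.
count_not_successful() {
  grep -Eo '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | grep -vc '"SUCCESS"' || true
}
```

For example, `curl -s -u admin:admin "$(history_url 100 50)" | count_not_successful` would fetch up to 50 entries starting at offset 100 and print the number that did not end with SUCCESS.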

Stopping a Purge Operation

To stop a purge operation that is running too long and blocking user access to Navigator, restart the Navigator Metadata Server. To complete the operation, rerun the purge command at a later time.
