Ways to clean up old queries from the Query Processor tables

Learn how to schedule a query clean-up and how to use the API to manually clean up queries from the following Query Processor tables: vertex_info, dag_details, dag_info, query_details, hive_query, tez_app_info.

Scheduling query clean-up

By default, the Hue Query Processor automatically clears queries from the Query Processor tables that are 30 days old at 2 AM every day as per the timezone set on your server. You can configure the following parameters in Cloudera Manager > Query Processor service > Configurations > Query Processor Extra Configurations to set up a schedule based on your requirements:
  • hue.query-processor.event-pipeline.cleanup.cron.expression
  • hue.query-processor.event-pipeline.cleanup-interval-secs
For example:
“hue.query-processor.event-pipeline.cleanup.cron.expression” : “0 0 2 * * ?”,
“hue.query-processor.event-pipeline.cleanup-interval-secs” : “2592000”

Manually cleaning up queries using an API

The ability to clean up queries manually in addition to the scheduled clean-up routines is useful when you have a high load of queries in a particular week that are hogging resources that you must free up. The API also runs a VACUUM command on the Query Processor table to reclaim storage that is occupied by dead tuples.

You can send an API request using tools such as cURL or Postman.

API format: [***QUERY-PROCESSOR-ADDRESS***]/api/admin/cleanup/[***EPOCH-TIME***]

Where,
  • [***QUERY-PROCESSOR-ADDRESS***] is the query processor host address
  • [***EPOCH-TIME***] is the Unix epoch time in seconds

Queries that were run before the specified epoch time are purged.

For example:
curl "http://machine1.company.com:30700/api/admin/cleanup/1670006742"