Ways to clean up old queries from the Query Processor tables

Learn how to schedule a query clean-up and how to use the API to manually clean up queries from the following Query Processor tables: vertex_info, dag_details, dag_info, query_details, hive_query, tez_app_info.

Scheduling query clean-up

Both Hive and Impala queries are retained in the backend database for 30 days by default, after which they are cleaned up. You can change the clean-up interval from the Database Catalog configurations. Go to Database Catalogs > > Edit > CONFIGURATIONS > > Hive query processor and select the hue-query-processor.json from the Configuration files drop-down menu.

Add the following line under the dasConf section and specify the time interval in seconds:
 "dasConf": {
    "hue.query-processor.event-pipeline.cleanup-interval-secs": "[***TIME-INTERVAL-IN-SECONDS***]",
    “hue.query-processor.event-pipeline.cleanup.cron.expression” : “[***CRON-EXPRESSION***]”
  },
For example:
“hue.query-processor.event-pipeline.cleanup.cron.expression” : “0 0 2 * * ?”,
“hue.query-processor.event-pipeline.cleanup-interval-secs” : “2592000”

Manually cleaning up queries using an API

The ability to clean up queries manually in addition to the scheduled clean-up routines is useful when you have a high load of queries in a particular week that are hogging resources that you must free up. The API also runs a VACUUM command on the Query Processor table to reclaim storage that is occupied by dead tuples.

You can send an API request using tools such as cURL or Postman.

API format: [***QUERY-PROCESSOR-ADDRESS***]/api/admin/cleanup/[***EPOCH-TIME***]

Where,
  • [***QUERY-PROCESSOR-ADDRESS***] is the query processor host address
  • [***EPOCH-TIME***] is the Unix epoch time in seconds

Queries that were run before the specified epoch time are purged.

For example:
curl "http://machine1.company.com:30700/api/admin/cleanup/1670006742"