Cleaning up old queries, DAG information, and reports data

The DAS Postgres database stores all the queries that you run from the DAS UI or Beeline, along with the data used to generate the DAG information and reports. Over time, this database can grow large. To free up capacity, DAS has a cleanup mechanism that, by default, purges queries and DAG information older than 30 days and reports older than 365 days. You can customize these retention intervals and the cleanup schedule from Cloudera Manager.

To customize the cleanup intervals:
  1. Sign in to Cloudera Manager as an Administrator.
  2. Go to Clusters > Data Analytics Studio service > Configuration and search for eventprocessor_extra.properties.
    The Data Analytics Studio Eventprocessor Advanced Configuration Snippet (Safety Valve) for conf/props/eventprocessor_extra.properties field is displayed. Use this field to set or override DAS properties as key-value pairs.
  3. Add the cleanup.query-info.interval, cleanup.report-info.interval, and cleanup.cron.expression configurations in the event-processing section as shown in the following example:
    event-processing= {
            hive.hook.proto.base-directory= "{{data_analytics_studio_event_processor_hive_base_dir}}",
            tez.history.logging.proto-base-dir= "{{data_analytics_studio_event_processor_tez_base_dir}}",
            meta.info.sync.service.delay.millis= 5000,
            actor.initialization.delay.millis= 20000,
            close.folder.delay.millis= 600000,
            reread.event.max.retries= -1,
            reporting.scheduler.initial.delay.millis= 30000,
            reporting.scheduler.interval.delay.millis= 300000,
            reporting.scheduler.weekly.initial.delay.millis= 60000,
            reporting.scheduler.weekly.interval.delay.millis= 600000,
            reporting.scheduler.monthly.initial.delay.millis= 90000,
            reporting.scheduler.monthly.interval.delay.millis= 900000,
            reporting.scheduler.quarterly.initial.delay.millis= 120000,
            reporting.scheduler.quarterly.interval.delay.millis= 1200000,
            cleanup.query-info.interval= 2592000,
            cleanup.report-info.interval= 31536000,
            cleanup.cron.expression= "0 0 2 * * ?"
      },
    In this example, query data is cleaned up after 2592000 seconds (30 days), report data is cleaned up after 31536000 seconds (365 days), and the cleanup job is triggered every day at 02:00:00 (2 AM) in the server time zone.
  4. Click Save Changes.
  5. Restart the DAS service.
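
Because both interval properties are expressed in seconds, it can help to compute the values from a retention period in days rather than by hand. The following Python snippet is a minimal sketch of that arithmetic, not part of DAS: the helper names days_to_seconds and cleanup_settings are hypothetical, and the output only mirrors the three cleanup lines from the example in step 3.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds in one day


def days_to_seconds(days: int) -> int:
    """Convert a retention period in whole days to seconds."""
    if days <= 0:
        raise ValueError("retention period must be a positive number of days")
    return days * SECONDS_PER_DAY


def cleanup_settings(query_days: int, report_days: int, cron: str) -> str:
    """Render the three cleanup lines in the same form as the step 3 example.

    Hypothetical helper for illustration only; paste the result into the
    event-processing section yourself.
    """
    lines = [
        f"cleanup.query-info.interval= {days_to_seconds(query_days)},",
        f"cleanup.report-info.interval= {days_to_seconds(report_days)},",
        f'cleanup.cron.expression= "{cron}"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the default values shown in the example: 30 and 365 days.
    print(cleanup_settings(30, 365, "0 0 2 * * ?"))
```

With the inputs shown, the sketch prints the same interval values as the example (2592000 and 31536000). To keep queries for only 7 days, for instance, days_to_seconds(7) gives 604800.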