The DAS Postgres database stores all the queries that you run from the DAS UI or
beeline, and all the data that is used to generate the DAG information and reports. Over
time, the database can grow in size. To optimize the available capacity, DAS has a cleanup
mechanism that, by default, purges all the queries and DAG information older than 30 days
and purges old reports after 365 days. However, you can customize the cleanup frequency by
adding the cleanup.query-info.interval and cleanup.report-info.interval configurations, and
the Cron expression cleanup.cron.expression, in the das-event-processor.json file from Ambari.
- From the Ambari UI, go to .
- To customize the cleanup intervals, under Data Analytics Studio Event Processor config
file template, add the three new configurations in the event-processing section as shown
in the following example:
"event-processing": {
"hive.hook.proto.base-directory": "{{data_analytics_studio_event_processor_hive_base_dir}}",
"tez.history.logging.proto-base-dir": "{{data_analytics_studio_event_processor_tez_base_dir}}",
"meta.info.sync.service.delay.millis": 5000,
"actor.initialization.delay.millis": 20000,
"close.folder.delay.millis": 600000,
"reread.event.max.retries": -1,
"reporting.scheduler.initial.delay.millis": 30000,
"reporting.scheduler.interval.delay.millis": 300000,
"reporting.scheduler.weekly.initial.delay.millis": 60000,
"reporting.scheduler.weekly.interval.delay.millis": 600000,
"reporting.scheduler.monthly.initial.delay.millis": 90000,
"reporting.scheduler.monthly.interval.delay.millis": 900000,
"reporting.scheduler.quarterly.initial.delay.millis": 120000,
"reporting.scheduler.quarterly.interval.delay.millis": 1200000,
"cleanup.query-info.interval": 2592000,
"cleanup.report-info.interval": 31536000,
"cleanup.cron.expression": "0 0 2 * * ?"
},
In this example, the query data is cleaned up after 2592000 seconds
(which is equal to 30 days), the report data is cleaned up after 31536000
seconds (which is equal to 365 days), and the cleanup jobs are triggered to
run daily at 02:00:00 (2 AM) in the server time zone.
- Click Save.
- Restart all the required services.
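
The interval values in the example are expressed in seconds, so a retention period chosen in days has to be converted before it is added to the configuration. The following is a minimal sketch of that arithmetic, not part of DAS itself: the helper names (days_to_seconds, cleanup_settings, split_cron) are hypothetical, and the field labels assume the Quartz-style six-field cron layout (seconds, minutes, hours, day of month, month, day of week), which the "?" in the example expression suggests but which this page does not state.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days: int) -> int:
    """Convert a retention period in days to the seconds value used in the config."""
    return days * SECONDS_PER_DAY


def cleanup_settings(query_days: int, report_days: int, cron: str) -> dict:
    """Build the three cleanup keys that go into the event-processing section."""
    return {
        "cleanup.query-info.interval": days_to_seconds(query_days),
        "cleanup.report-info.interval": days_to_seconds(report_days),
        "cleanup.cron.expression": cron,
    }


def split_cron(expression: str) -> dict:
    """Label the fields of a six-field cron expression (assumed Quartz-style layout)."""
    names = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week"]
    fields = expression.split()
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    return dict(zip(names, fields))


# Reproduce the values from the example: 30 days, 365 days, daily at 02:00:00.
settings = cleanup_settings(query_days=30, report_days=365, cron="0 0 2 * * ?")
print(json.dumps(settings, indent=2))
print(split_cron(settings["cleanup.cron.expression"]))
```

Because json.dumps always emits straight double quotes, pasting its output into the config file template avoids the typographic quotes that some editors substitute and that make the JSON invalid.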