The DAS Postgres database stores all the queries that you run from the DAS UI or
beeline, and all the data that is used to generate the DAG information and reports. Over a
period of time, this can grow in size. To optimize the available capacity, DAS has a cleanup
mechanism that, by default, purges all the queries and DAG information older than 30 days
and purges old reports after 365 days. However, you can customize the cleanup frequency from
Cloudera Manager.
To customize the cleanup intervals:
-
Sign in to Cloudera Manager as an Administrator.
-
Go to and search eventprocessor_extra.properties.
The Data Analytics Studio Eventprocessor Advanced Configuration
Snippet (Safety Valve) for
conf/props/eventprocessor_extra.properties field is displayed
and is used to set or override properties for DAS in the form of key value
pairs.
-
Add the
cleanup.query-info.interval
,
cleanup.report-info.interval
, and
cleanup.cron.expression
configurations in the
event-processing
section as shown in the following
example:
event-processing: {
hive.hook.proto.base-directory= "{{data_analytics_studio_event_processor_hive_base_dir}}",
tez.history.logging.proto-base-dir= "{{data_analytics_studio_event_processor_tez_base_dir}}",
meta.info.sync.service.delay.millis= 5000,
actor.initialization.delay.millis= 20000,
close.folder.delay.millis= 600000,
reread.event.max.retries= -1,
reporting.scheduler.initial.delay.millis= 30000,
reporting.scheduler.interval.delay.millis= 300000,
reporting.scheduler.weekly.initial.delay.millis= 60000,
reporting.scheduler.weekly.interval.delay.millis= 600000,
reporting.scheduler.monthly.initial.delay.millis= 90000,
reporting.scheduler.monthly.interval.delay.millis= 900000,
reporting.scheduler.quarterly.initial.delay.millis= 120000,
reporting.scheduler.quarterly.interval.delay.millis= 1200000,
cleanup.query-info.interval= 2592000,
cleanup.report-info.interval= 31536000,
cleanup.cron.expression= "0 0 2 * * ?"
},
In this example, the query data will be cleaned up after 2592000 seconds
(which is equal to 30 days), the report data will be cleaned up after 31536000
seconds (which is equal to 365 days), and the cleanup jobs will be triggered to
run at 02:00:00 hours (or 2 AM), as per the server timezone.
-
Click Save Changes.
-
Restart the DAS service.