Some tables in Hue retain data indefinitely resulting in slower performance or
application crash. Hue does not automatically clean up data from these tables. You can
configure Hue to retain the data for a specific number of days and then schedule a cron job
to clean up these tables at regular intervals for improved performance.
Consider cleaning up old data from the backend Hue
database if you face the following problems while using Hue:
Upgrade times out
Performance is slower than expected
Long time to log in to Hue
SQL query shows a large number of documents in tables
Hue crashes while trying to access saved documents
Back up your database before starting the cleanup activity.
Check the saved documents such as Queries and Workflows for a few users to prevent data
loss. You can also note the sizes of the tables you want to clean up as a reference by
running the following queries:
select count(*) from desktop_document;
select count(*) from desktop_document2;
select count(*) from beeswax_session;
select count(*) from beeswax_savedquery;
select count(*) from beeswax_queryhistory;
select count(*) from oozie_job;
SSH in to an active Hue instance.
Change to the Hue home directory:
cd /opt/cloudera/parcels/CDH/lib/hue
Run the following command as the root user:
DESKTOP_DEBUG=True ./build/env/bin/hue desktop_document_cleanup --keep-days x --cm-managed
The --keep-days property is used to specify the number of
days for which Hue will retain the data in the backend database.
The logs are displayed on the console because DESKTOP_DEBUG
is set to True. Alternatively, you can view the logs from the
following location:
/var/log/hue/desktop_document_cleanup.log
The first run can typically take around 1 minute per 1000 entries in each
table.
Check whether the table size has decreased by running a query as follows:
select count(*) from desktop_document;
If the desktop_document_cleanup command has run successfully,
the table size should decrease.
Set up a cron job that runs at regular intervals to
automate the database cleanup. For example, you can set up a cron job to run daily and
it purges data older than x number of days.