Backing up Hue

Backing up Hue is an automated process that saves the Hue database content. The process places the content in configured logs or data folders based on availability. If, for any reason, you want to manually back up the database, you can choose to do so.

Automatic backup

Automatic backup and restoration for Hue extracts the saved query and query history and loads the information onto the new cluster.

Monitoring backup execution

The backup starts a job to create the database dump file, but it does not wait for the job to complete. If you have a large database, the job can take up to 20 minutes to complete. Ensure you allow enough time for the job to succeed. To monitor execution, you can log into the cluster and monitor the job status under the database catalog namespace using the following command:

kubectl get jobs -n <database catalog id>

The output that shows the status of the Hue backup job looks something like this:

$ kubectl get jobs -n warehouse-1692037411-96hk
NAME                                              COMPLETIONS   DURATION   AGE
hue-backup-ede2b8bd-1d53-4d23-a0f9-87d8ec658f74   1/1           11s        113s
hue-query-processor-db-create-job                 1/1           8s         42h

The job logs contain the upload path where the file have been placed.

Optional manual backup

If anything goes wrong with the automatic backup of Hue, or if you just prefer a manual process, you can back up Hue manually. You can choose to manually save and restore the Hue data to keep the Hue saved queries and query history for your Virtual Warehouses on the new cluster.

One Hue database is shared between all Virtual Warehouses, so you execute the following steps only once.
  1. Find Hue pods and namespaces.
    $ kubectl get pods --all-namespaces --field-selector metadata.name=huebackend-0
  2. Use SSH to access the Hue pod on the virtual warehouse cluster.
    $ kubectl exec -it huebackend-0 -n <virtual warehouse ID> -c hue – /bin/bash
    For example:
    $ kubectl exec -it huebackend-0 -n compute-1668714083-8ms4 -c hue – /bin/bash
  3. Back up Hue data to the /tmp directory on the Hue pod. In the container, your current directory should be /opt/hive.
    $ ./build/env/bin/hue dumpdata -o /tmp/data.json
  4. Copy the backup from the Hue pod to the local machine.
    $ kubectl cp <virtual warehouse ID>/huebackend-0:/tmp/data.json -c hue /tmp/data.json