Restoring Hue

The backup procedure automatically saves the Hue database content and places it in the configured logs or data folder, depending on availability. The restore process loads this saved content into the new Hue deployments.

During a manual or automatic Hue database restore operation, it is critical to block all traffic to the running Hue services. If you cannot bring down the cluster, use the recommended workaround to disable end user access to the cluster endpoints. Failing to do so can cause errors, including key constraint violations, and other issues.

Automatic restoration

Automatic backup and restoration of Hue extracts the saved queries and query history and loads them into the new cluster. Each time the restore operation is called, the Hue database restore job runs and overwrites the current query history and saved queries with the contents of the backup.

Monitoring restoration execution

The restoration starts a job to load the database dump file, but does not wait for the job to complete. If you have a large database, the job can take up to an hour to complete. Ensure you allow enough time for the job to succeed.

To monitor execution, log in to the cluster and check the job status in the database catalog namespace.

$ kubectl get jobs -n <database catalog id>

The output that shows the hue-restore job looks something like this:

$ kubectl get jobs -n warehouse-1692037411-96hk
NAME                                              COMPLETIONS   DURATION   AGE
hue-restore-ede2b8bd-1d53-4d23-a0f9-87d8ec658f74   1/1           11s        113s
hue-query-processor-db-create-job                 1/1           8s         42h
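Because the restoration does not wait for the job, you might prefer to block until the job finishes instead of checking the job list repeatedly. The following is a minimal sketch, assuming a Bash shell and that the job name starts with hue-restore as in the sample output. The helper name wait_for_hue_restore is hypothetical, and the 3600s default only reflects the one-hour estimate given above.

```shell
# Sketch: block until the hue-restore job completes, or give up after a timeout.
# wait_for_hue_restore is a hypothetical helper name, not part of any CDW tooling.
wait_for_hue_restore() {
  local namespace="$1"          # database catalog ID, e.g. warehouse-1692037411-96hk
  local timeout="${2:-3600s}"   # assumption: one hour covers a large database
  local job

  # List jobs as resource/name pairs and pick the first hue-restore job.
  job="$(kubectl get jobs -n "$namespace" -o name | grep -m1 '/hue-restore')" || {
    echo "no hue-restore job found in $namespace" >&2
    return 1
  }

  # kubectl wait exits non-zero if the job is not complete before the timeout.
  kubectl wait --for=condition=complete "$job" -n "$namespace" --timeout="$timeout"
}
```

For example, wait_for_hue_restore warehouse-1692037411-96hk waits up to an hour, and a second argument such as 600s shortens the timeout.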

Manual restoration

You can manually restore the Hue database instance that you backed up.

In the manual backup of Hue, you followed steps to dump the entire Hue database. In the following procedure, you move the Hue backup file from the dump to the new CDW environment.

  1. (Optional) Edit the Hue statefulset if the Hue backup file is larger than 3 GB. Files of this size require a significant amount of memory to restore the database contents.
    Sufficient memory is not available for the Hue pod by default. To successfully load the data, you must increase the Hue pod memory limit.
  2. (Optional) Check the backup file size and configure the memory accordingly.
    $ kubectl edit statefulset huebackend -n <new Virtual Warehouse ID>
  3. Set the hue container memory limit to 24 GB to provide leeway for the load command.
    name: hue 
    resources:
      limits:
        memory: 24G
      requests: 
        cpu: "1"
        memory: 8192M
  4. Copy the Hue backup data to the hue pod on the new Virtual Warehouse cluster.
    For example:
    $ kubectl cp /tmp/data.json compute-1677087760-cj7q/huebackend-0:/tmp/data.json -c hue
  5. Connect to the Hue pod on the new Hive/Impala Virtual Warehouse cluster.
    $ kubectl exec -it huebackend-0 -n <new Virtual Warehouse ID> -c hue -- /bin/bash
  6. Load the data into Hue on the new Virtual Warehouse cluster.
    $ ./build/env/bin/hue loaddata --exclude auth.permission --exclude contenttypes --ignorenonexistent /tmp/data.json
  7. Go to the Hue UI and verify that the saved queries and query history appear in the new CDW environment. Also, verify that Hue is working.
  8. If the Hue UI is not accessible, try the following workarounds.
    • Delete the Hue frontend pod so that it is recreated. Look up the Hue frontend pod ID first, for example huefrontend-5bdc7bc7b8-8lpgj.
      $ kubectl get pods -n <new Virtual Warehouse ID> # lists the pod names
      
      $ kubectl delete pod <Hue frontend pod ID> -n <new Virtual Warehouse ID>
    • Increase virtual warehouse memory if the database size is large.
      ## edit hue backend container to max 16GB (note: there are two containers:
      ## busybox and hue)
      
      $ kubectl edit sts huebackend -n <new Virtual Warehouse ID>
  9. After the data load is finished, revert the container memory limit to the original 8192M setting.
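Steps 4 to 6 can also be run without an interactive shell. The following is a minimal sketch under stated assumptions, not a supported procedure: restore_hue_dump and LARGE_DUMP_BYTES are hypothetical names, "3 GB" is read as 3 GiB, and kubectl exec is assumed to start in the same working directory as the interactive shell so that the relative path ./build/env/bin/hue resolves.

```shell
# Sketch of steps 4 to 6 as one non-interactive helper.
# restore_hue_dump and LARGE_DUMP_BYTES are hypothetical names for this example.
LARGE_DUMP_BYTES=$((3 * 1024 * 1024 * 1024))   # assumption: "3 GB" means 3 GiB

restore_hue_dump() {
  local dump="$1"        # local path to the backup file, e.g. /tmp/data.json
  local namespace="$2"   # new Virtual Warehouse ID
  local pod="huebackend-0"
  local size

  [ -f "$dump" ] || { echo "backup file not found: $dump" >&2; return 1; }

  # Warn when the dump is large enough that the default memory limit may not suffice.
  size=$(( $(wc -c < "$dump") ))
  if [ "$size" -gt "$LARGE_DUMP_BYTES" ]; then
    echo "warning: $dump is larger than 3 GB; raise the hue container memory limit first" >&2
  fi

  # Step 4: copy the dump into the hue container.
  kubectl cp "$dump" "$namespace/$pod:/tmp/data.json" -c hue || return 1

  # Steps 5 and 6: run loaddata directly instead of opening a shell in the pod.
  # Assumes the container's working directory contains ./build/env/bin/hue.
  kubectl exec "$pod" -n "$namespace" -c hue -- \
    ./build/env/bin/hue loaddata --exclude auth.permission --exclude contenttypes \
    --ignorenonexistent /tmp/data.json
}
```

For example, restore_hue_dump /tmp/data.json compute-1677087760-cj7q copies the dump and loads it. The size warning is only a reminder; it does not change the statefulset.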