Manually restoring Hue from a 6GB or larger backup

You can manually restore the Hue database instance that you backed up. Use this procedure when your Hue backup is 6GB or larger.

When you manually backed up Hue, you followed steps to dump the entire Hue database to a file. In the following procedure, you move that backup file to the new CDW environment and restore it.
  • Check that the size of your Hue backup is 6GB or larger.

    If your Hue backup is smaller than 6GB, go to “Manually restoring Hue from a smaller than 6GB backup”.

  • Do not open the Hue web interface prior to completing the steps below.
  • During a manual or automatic Hue database restore operation, it is critical to block all traffic to the running Hue services. If you cannot bring down the cluster, use the recommended workaround to disable end-user access to the cluster endpoints. Failing to do so can result in errors such as key constraint violations and other issues.
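The size check in the first prerequisite can be scripted. The following is a minimal sketch; the file path is a placeholder, and a small stand-in file is created here only so the check is runnable as-is — point `BACKUP_FILE` at your real Hue dump instead.

```shell
# Placeholder: point BACKUP_FILE at your real Hue dump before running.
BACKUP_FILE="$(mktemp -d)/data.json"
# Demo only: create a small stand-in file so the check below is runnable.
head -c 4096 /dev/zero > "${BACKUP_FILE}"

SIZE_BYTES=$(stat -c %s "${BACKUP_FILE}")
THRESHOLD=$((6 * 1024 * 1024 * 1024))   # 6GB in bytes
if [ "${SIZE_BYTES}" -ge "${THRESHOLD}" ]; then
  echo "backup is 6GB or larger: use this procedure"
else
  echo "backup is under 6GB: use the smaller-backup procedure"
fi
```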
  1. Connect to the Hue pod on the new Hive/Impala Virtual Warehouse cluster.
    $ kubectl exec -it huebackend-0 -n <new Virtual Warehouse ID> -c hue -- /bin/bash
                    
  2. Clean the Hue database by running the flush command from the Hue pod.
    ./build/env/bin/hue flush
  3. Split the JSON dump into smaller chunks.
    HUE_BACKUP_ORIG_FILE=data.json # Change to the path of your backup file
    HUE_BACKUP_CHUNKS_DIR=hue_backup_parts # Change if needed
    
    mkdir -p ${HUE_BACKUP_CHUNKS_DIR}
    rm -rf ${HUE_BACKUP_CHUNKS_DIR}/part* || true
    
    jq -cn --stream 'fromstream(1|truncate_stream(inputs))' ${HUE_BACKUP_ORIG_FILE} \
      | split -l 5000 -a 4 -d - ${HUE_BACKUP_CHUNKS_DIR}/part
    find ${HUE_BACKUP_CHUNKS_DIR}/part* -maxdepth 1 -type f ! -name "*.*" -exec sh -c 'jq --slurp "." "${0}" | gzip > "${0}.json.gz"' {} \;
    
    ls -alh ${HUE_BACKUP_CHUNKS_DIR}
    
    tar cvzf ${HUE_BACKUP_ORIG_FILE}.tar.gz ${HUE_BACKUP_CHUNKS_DIR}/part*.json.gz
    
    echo "Generated the chunked backup file"
    
    ls -alh ${HUE_BACKUP_ORIG_FILE}.tar.gz # This is our final output file
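Before moving the tarball, it can be worth confirming that no records were lost during chunking. The following sketch runs the same stream/split/gzip pipeline against a small generated sample (the file names and the 12-record sample are illustrative, not part of the real backup) and checks that the record count across chunks matches the original:

```shell
# Demonstration of the chunking pipeline on a tiny generated sample dump.
WORKDIR=$(mktemp -d)
HUE_BACKUP_ORIG_FILE="${WORKDIR}/sample_data.json"
HUE_BACKUP_CHUNKS_DIR="${WORKDIR}/parts"
mkdir -p "${HUE_BACKUP_CHUNKS_DIR}"

# Stand-in for the Django dump: an array of 12 objects.
jq -cn '[range(12) | {pk: ., model: "demo", fields: {}}]' > "${HUE_BACKUP_ORIG_FILE}"

# Same pipeline as step 3, with 5 records per chunk instead of 5000.
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' "${HUE_BACKUP_ORIG_FILE}" \
  | split -l 5 -a 4 -d - "${HUE_BACKUP_CHUNKS_DIR}/part"
find "${HUE_BACKUP_CHUNKS_DIR}" -type f ! -name '*.*' \
  -exec sh -c 'jq --slurp "." "${0}" | gzip > "${0}.json.gz"' {} \;

# Sanity check: total records across chunks must equal the original count.
TOTAL=$(gzip -dc "${HUE_BACKUP_CHUNKS_DIR}"/part*.json.gz \
  | jq 'length' | awk '{s+=$1} END {print s}')
echo "records after chunking: ${TOTAL}"
```

Against a real backup, compare `TOTAL` with `jq 'length' ${HUE_BACKUP_ORIG_FILE}` run on the original dump.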
  4. Stage the chunked JSON on the new cluster:
    Move the tarball of backup chunks to the cluster pod, then extract it to a directory, for example /tmp/hue_backup_parts.
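One way to move the tarball is kubectl cp from the workstation that holds it, for example kubectl cp data.json.tar.gz <namespace>/huebackend-0:/tmp/ -c hue (pod and container names as in step 1; the namespace and paths are assumptions). The extract step itself can be rehearsed locally; the sketch below builds a placeholder tarball and unpacks it the same way you would on the pod:

```shell
# Rehearsal of the unpack step with a placeholder tarball; on the pod you
# would run the same tar command against the tarball produced in step 3.
WORKDIR=$(mktemp -d)
mkdir -p "${WORKDIR}/hue_backup_parts"
printf '[]' | gzip > "${WORKDIR}/hue_backup_parts/part0000.json.gz"
tar czf "${WORKDIR}/backup.tar.gz" -C "${WORKDIR}" hue_backup_parts

# Extract into the directory the loaddata step will read from.
DEST="${WORKDIR}/extract"
mkdir -p "${DEST}"
tar xzf "${WORKDIR}/backup.tar.gz" -C "${DEST}"
find "${DEST}" -name '*.json.gz'
```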
  5. Run the hue loaddata command on the pod.
    # Update the path below to the extraction directory from the previous step.
    /opt/hive/build/env/bin/hue loaddata --verbosity 3 --exclude auth.permission \
      --exclude contenttypes --ignorenonexistent \
      $(find /tmp/hue_backup_parts -type f -name '*.json.gz')