Troubleshooting upgrade operations

This section provides troubleshooting solutions for errors that can occur during upgrade operations.

Data Lake upgrade - out of memory error

A Data Lake upgrade can fail because the Client Java Heap Size is insufficient. The issue can be resolved by increasing the default Client Java Heap Size value in Cloudera Manager.

Condition

When upgrading the Data Lake from Cloudera Runtime 7.2.18 to one of its service pack versions or to 7.3.1, the upgrade fails with the following out of memory error during backup:
Upgrade not started, datalake backup failed. Failure message: Database: kinit: Client 'hdfs/xxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; kinit: Client 'hbase/xxxxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; kinit: Client 'solr/uxxxxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; moveFromLocal: Failed with java.io.IOException while processing file/directory :[/xxxxxxxx-xxx-xxxx-xxxx-xxxxxxxxf_database_backup/ranger_backup._COPYING_] in method:[java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory];

Cause

The default value of Client Java Heap Size in Bytes, 256 MB, is not sufficient for the upgrade operation.

Remedy

  1. Navigate to your environment in Cloudera Management Console, and click on the Data Lake tab on the environment details page.
  2. Open Cloudera Manager by clicking on the CM URL.
  3. Select Clusters > core_settings in Cloudera Manager.
  4. Click on the Configuration tab.
  5. Search for Client Java Heap Size in Bytes in the search bar.
  6. Increase the value from the default 256 MB to 1 GB (1073741824 bytes).
  7. Click Save changes.
  8. Restart the affected services as prompted by Cloudera Manager to apply the new configuration.
  9. After the affected services are started again, retry the upgrade operation. Monitor the upgrade process to confirm the issue is resolved and the Data Lake upgrade finishes without errors.
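
The same change can also be scripted against the Cloudera Manager REST API instead of the UI steps above. The following is a minimal sketch only: the CM host, API version, cluster name, service name, and the internal property name for Client Java Heap Size in Bytes are assumptions that you need to adjust for your Data Lake.

```python
import requests

# Assumed values for illustration; replace with your own CM host, cluster,
# service, and the property that backs "Client Java Heap Size in Bytes".
CM_BASE = "https://cm-host.example.com:7183/api/v41"
CLUSTER = "my-datalake-cluster"       # assumed cluster name
SERVICE = "core_settings"             # service selected in step 3 above
PROPERTY = "client_java_heapsize"     # assumed internal property name
ONE_GIB = 1024 * 1024 * 1024          # 1073741824 bytes, as in step 6

resp = requests.put(
    f"{CM_BASE}/clusters/{CLUSTER}/services/{SERVICE}/config",
    auth=("admin", "admin-password"),  # your CM credentials
    json={"items": [{"name": PROPERTY, "value": str(ONE_GIB)}]},
    verify="/path/to/cm-ca.pem",       # CM TLS certificate bundle
)
resp.raise_for_status()
print("Updated:", [item["name"] for item in resp.json().get("items", [])])
```

After a scripted change like this, the affected services still need to be restarted (step 8) before you retry the upgrade.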

Data Lake upgrade - Kafka consumer not available yet

After the Data Lake upgrade, the Atlas Hook does not function because the Kafka service is not available yet.

Condition

After upgrading the Data Lake to 7.3.1 or one of its patch versions, the Atlas Hook sometimes does not function when Apache Atlas and Apache Kafka start at the same time, because Atlas cannot connect to Kafka while Kafka is still being set up. When this happens, the following exception can be seen:
Exception in getKafkaConsumer ,WakeupException: null
The Kafka consumer creation should be retried if the Kafka service is unavailable during Atlas startup.

Cause

Atlas makes only three restart attempts, and during this time the Kafka service might not be available yet. As a result, the Atlas Hook does not function and none of the messages from the Kafka topics are consumed.
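
The effect of a small, fixed retry count can be pictured with the following minimal sketch. This is illustrative only, not Atlas's actual implementation: with only three attempts, consumer creation gives up before a slow-starting Kafka broker becomes reachable.

```python
import time

def create_consumer_with_retry(connect, attempts=3, backoff_seconds=10):
    """Illustrative only: try to create a Kafka consumer a fixed number of
    times, sleeping between attempts. 'connect' is any callable that raises
    while the broker is still starting up."""
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except Exception as exc:      # e.g. broker not reachable yet
            last_error = exc
            time.sleep(backoff_seconds)
    # A broker that needs longer than attempts * backoff_seconds to start
    # is never reached, and the hook stops consuming from the Kafka topics.
    raise RuntimeError("Kafka consumer could not be created") from last_error
```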

Remedy

  1. Wait until the Kafka service is available after the upgrade.
  2. Restart Atlas to trigger the reconnection to Kafka.
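
To avoid restarting Atlas too early, you can first confirm that the Kafka broker is reachable. The sketch below is a rough availability check under assumed values: the broker host kafka-broker.example.com and port 9092 are placeholders for your Data Lake's broker endpoint, and a successful TCP connection does not verify Kerberos/SASL readiness.

```python
import socket
import time

BROKER = ("kafka-broker.example.com", 9092)   # assumed broker host and port

def kafka_port_open(address, timeout=5):
    """Return True once the broker port accepts TCP connections."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False

# Poll for up to about ten minutes before restarting Atlas (step 2 above).
for _ in range(60):
    if kafka_port_open(BROKER):
        print("Kafka broker is reachable; Atlas can be restarted.")
        break
    time.sleep(10)
else:
    print("Kafka broker is still unreachable; check the Kafka service first.")
```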