Troubleshooting upgrade operations
This section provides some troubleshooting solutions for errors during upgrade operations.
Data Lake upgrade - out of memory error
A Data Lake upgrade can fail due to insufficient memory for the Client Java Heap Size. You can resolve the issue by increasing the default Client Java Heap Size value in Cloudera Manager.
Condition
Upgrade not started, Data Lake backup failed. Failure message:
Database: kinit: Client 'hdfs/xxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; kinit: Client 'hbase/xxxxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; kinit: Client 'solr/uxxxxxxxxxxxxx.CLOUDERA.SITE' not found in Kerberos database while getting initial credentials; moveFromLocal: Failed with java.io.IOException while processing file/directory :[/xxxxxxxx-xxx-xxxx-xxxx-xxxxxxxxf_database_backup/ranger_backup._COPYING_] in method:[java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Direct buffer memory]
Cause
The default value of Client Java Heap Size in Bytes, 256 MB, is not sufficient for the upgrade operation.
Remedy
- Navigate to your environment in Cloudera Management Console, and click on the Data Lake tab on the environment details page.
- Open Cloudera Manager by clicking on the CM URL.
- Select the HDFS service in Cloudera Manager.
- Click on the Configuration tab.
- Search for Client Java Heap Size in Bytes in the search bar.
- Increase the value from the default 256 MB to 1 GB (for example, 1073741824 bytes). If you prefer to script the change, see the Cloudera Manager API sketch after this list.
- Click Save changes.
- Restart the affected services as prompted by Cloudera Manager to apply the new configuration.
- After the affected services are started again, retry the upgrade operation. Monitor the upgrade process to confirm the issue is resolved and the Data Lake upgrade finishes without errors.
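If you prefer to script the change instead of using the Cloudera Manager UI, the same configuration can also be set through the Cloudera Manager REST API. The following is a minimal sketch only: the host, credentials, cluster and service names, the role config group, and the API name of the Client Java Heap Size in Bytes property are assumptions that you must verify against your own deployment.

```python
import requests

# All values below are placeholders for illustration; verify the cluster name,
# service name, role config group, and property name in your own Cloudera
# Manager deployment before running anything like this.
CM_API = "https://cm-host.example.com:7183/api/v41"
CLUSTER = "my-datalake"
SERVICE = "hdfs"
ROLE_CONFIG_GROUP = "hdfs-GATEWAY-BASE"     # assumed client (gateway) config group name
PROPERTY = "hdfs_client_java_heapsize"      # assumed API name of "Client Java Heap Size in Bytes"
NEW_VALUE = str(1024 * 1024 * 1024)         # 1 GB expressed in bytes (1073741824)

# Update the configuration value on the role config group.
response = requests.put(
    f"{CM_API}/clusters/{CLUSTER}/services/{SERVICE}"
    f"/roleConfigGroups/{ROLE_CONFIG_GROUP}/config",
    json={"items": [{"name": PROPERTY, "value": NEW_VALUE}]},
    auth=("admin", "admin-password"),
)
response.raise_for_status()
print(response.json())
```

After a scripted change like this, you still need to restart the affected services so that the new heap size takes effect.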
Data Lake upgrade - Kafka consumer not available yet
After the Data Lake upgrade, the Atlas Hook does not function because the Kafka service is not available yet.
Condition
Exception in getKafkaConsumer, WakeupException: null
The Kafka consumer creation should be retried if the Kafka service is unavailable during Atlas startup.
Cause
Atlas performs only three restart attempts, and during this time the Kafka service might not be available yet. As a result, the Atlas Hook does not function and none of the messages from the Kafka topics are consumed.
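For illustration only, the sketch below shows the general retry-with-backoff pattern for creating a Kafka consumer while the brokers may still be starting up. It uses the kafka-python client, and the broker address, topic name, and retry limits are assumptions; Atlas itself uses its own Java Kafka client with its own retry settings.

```python
import time

from kafka import KafkaConsumer
from kafka.errors import NoBrokersAvailable

# Illustrative values only; adjust for your environment.
BOOTSTRAP_SERVERS = "broker.example.com:9092"
TOPIC = "ATLAS_HOOK"
MAX_ATTEMPTS = 10

def create_consumer_with_retry():
    """Retry consumer creation with exponential backoff until Kafka is reachable."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return KafkaConsumer(TOPIC, bootstrap_servers=BOOTSTRAP_SERVERS)
        except NoBrokersAvailable:
            wait_seconds = min(2 ** attempt, 60)  # back off, capped at one minute
            print(f"Kafka not reachable (attempt {attempt}/{MAX_ATTEMPTS}); "
                  f"retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
    raise RuntimeError("Kafka did not become available within the retry budget")
```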
Remedy
- Wait until the Kafka service is available after the upgrade (see the sketch after this list for one way to script the wait).
- Restart Atlas to trigger the reconnection to Kafka.
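If you want to script the wait rather than check manually, a simple reachability probe against the broker port can confirm that Kafka is accepting connections before you trigger the Atlas restart. This is a minimal sketch: the broker host, port, and timeouts are assumptions, and a successful TCP connection only shows that the listener is open, not that the Kafka service is fully healthy.

```python
import socket
import time

# Illustrative values only; use a broker host and port from your environment.
BROKER_HOST = "broker.example.com"
BROKER_PORT = 9092

def wait_for_kafka(timeout_seconds=600, poll_interval=15):
    """Poll the broker port until a TCP connection succeeds or the timeout expires."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        try:
            with socket.create_connection((BROKER_HOST, BROKER_PORT), timeout=5):
                print("Kafka broker is reachable; Atlas can be restarted now.")
                return True
        except OSError:
            print("Kafka broker not reachable yet; waiting...")
            time.sleep(poll_interval)
    return False

if __name__ == "__main__":
    if not wait_for_kafka():
        raise SystemExit("Kafka did not become reachable; check the Kafka service first.")
```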
