Known issues in Ozone parcel 718.2.2
You must be aware of the known issues and limitations, the areas of impact, and the workarounds in the Ozone parcel.
Tez Configuration Changes
Make the following configuration changes so that the latest Ozone FS JAR is picked up from the Ozone parcel (when installed):
- CDPD-48540
- For tez.cluster.additional.classpath.prefix (Tez Additional Classpath), set the value to /var/lib/hadoop-hdfs/*.
- CDPD-47605
- In Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, set tez.cluster.additional.classpath.prefix to /var/lib/hadoop-hdfs/*.
Restart the TEZ and HIVE_ON_TEZ services when prompted by Cloudera Manager.
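If you edit the hive-site.xml safety valve as XML rather than through its name and value fields, the entry takes the standard Hadoop <property> form. The following is a minimal Python sketch that renders that element for the property and value listed above so you can check its shape before pasting it. The helper safety_valve_property is hypothetical and is not part of Cloudera Manager.

```python
import xml.etree.ElementTree as ET


def safety_valve_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a compact XML string.

    Hypothetical helper for illustration; Cloudera Manager does not provide it.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Property name and value taken from CDPD-47605
    print(safety_valve_property(
        "tez.cluster.additional.classpath.prefix",
        "/var/lib/hadoop-hdfs/*",
    ))
```

The printed element is a single <property> with the name tez.cluster.additional.classpath.prefix and the value /var/lib/hadoop-hdfs/*.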
Update YARN to use the updated Ozone FS JAR
- CDPD-48500: Ozone parcel activation or installation should handle redeployment of YARN JARs and clean up the cache.
- Perform the following steps:
- Log in to the Cloudera Manager UI
- Navigate to Clusters
- Select the YARN service
- Click Actions
- Click Install YARN Service Dependencies
- Click YARN MapReduce Framework JARs
- Restart the CDP 7.1.8 cluster
Other issues
- SSL handshake fails between two Ozone DataNodes if their certificates are signed by different Ozone Storage Container Managers.
Ozone DataNode certificates are signed by the leader Storage Container Manager. Due to an issue in creating the TrustStore for DataNode-to-DataNode connections, trust cannot be established between two DataNodes whose certificates were signed by different Storage Container Managers. These connections fail with an SSL Handshake Exception. This affects pipeline creation and container replication (including EC container reconstruction). The symptoms vary depending on how many nodes have certificates from a different signer: either these DataNodes do not participate in any Ratis-3 pipeline, or they form pipelines only within groups of DataNodes that share the same signer. Over time, this can lead to imbalanced DataNode usage, and DataNode decommissioning might get stuck if data has to be replicated to a node whose certificate has a different signer.
This problem affects all 7.1.8 Ozone parcel releases. To identify whether a cluster is affected, examine the output of the ozone admin cert list command. Use the -c option to request a large enough number of certificates that all the certificates issued in the system are listed.
If the latest DataNode certificates have different issuers, the cluster is affected.
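The issuer comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions: CertRecord, latest_issuers, and is_affected are hypothetical names, and each record is assumed to have already been parsed from the ozone admin cert list output and filtered to DataNode certificates only. Parsing is left out because the exact output layout is not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical parsed form of one DataNode certificate entry.

    Assumption: the caller has parsed `ozone admin cert list` output into
    these fields and dropped non-DataNode certificates.
    """
    subject: str         # the DataNode the certificate was issued to
    issuer: str          # the Storage Container Manager that signed it
    valid_from: datetime


def latest_issuers(records: list[CertRecord]) -> dict[str, str]:
    """Map each DataNode to the issuer of its most recent certificate."""
    latest: dict[str, CertRecord] = {}
    for rec in records:
        current = latest.get(rec.subject)
        if current is None or rec.valid_from > current.valid_from:
            latest[rec.subject] = rec
    return {subject: rec.issuer for subject, rec in latest.items()}


def is_affected(records: list[CertRecord]) -> bool:
    """True if the latest DataNode certificates have more than one issuer."""
    return len(set(latest_issuers(records).values())) > 1


if __name__ == "__main__":
    sample = [
        CertRecord("dn1.example.com", "scm1", datetime(2023, 1, 1)),
        CertRecord("dn1.example.com", "scm2", datetime(2023, 6, 1)),
        CertRecord("dn2.example.com", "scm2", datetime(2023, 5, 1)),
    ]
    # dn1's newer certificate and dn2's certificate share the issuer scm2
    print(is_affected(sample))  # False
```

Only the most recent certificate per DataNode is compared, because an older certificate from a different signer does not matter once it has been replaced.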
- CDPD-56006: If the Ozone URI contains an incorrect hostname or service ID, the filesystem client does not fail immediately; instead, it retries until the retries are exhausted, and the default number of retries is too high.
- Set ozone.client.failover.max.attempts to a lower value to avoid long retry loops.
- CDPD-49137: Sometimes the Ozone Manager (OM) Kerberos token is not updated, and OM can no longer communicate with the Storage Container Manager (SCM). When this occurs, writes start to fail.
- To fix the issue, restart OM, or set hadoop.kerberos.keytab.login.autorenewal.enabled to true using a safety valve.
- CDPD-49808: Spark jobs against Ozone intermittently fail with the following error:
ERROR spark.SparkContext: [main]: Error initializing SparkContext.
java.lang.IllegalStateException: No filter named
- This is an intermittent failure; the job can be retried.
- CDPD-50678: Deleting a container that has one or more non-empty replicas on DataNodes can leave the container stuck in the deleting state indefinitely. Containers in this state can also prevent decommission or maintenance operations from completing.
- None.
- CDPD-52571/HDDS-8178: CertificateClient and KeyStoresFactory support multiple Sub-CA certificates in the trust chain
- None.
- CDPD-35141: Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source <bucket1> to destination <bucket2> (state=08S01,code=40000) java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source <bucket1> to destination <bucket2>.
- This issue can occur when the source and target buckets in a Hive query are different. Currently, only copying within the same bucket is supported.
- CDPD-60578: [EC] Re-replication fails due to a NullPointerException.
- None.
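For CDPD-49808, because the failure is intermittent, a bounded retry around job startup is usually enough. The following is a minimal Python sketch, assuming job is a hypothetical callable of your own that starts the Spark job and raises an exception whose message contains the "No filter named" text; how the Java exception surfaces in your client is an assumption. Any other error is re-raised immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 5.0,
    transient_marker: str = "No filter named",
) -> T:
    """Call job(), retrying only failures whose message contains transient_marker.

    Hypothetical helper; `job` is assumed to be your own function that
    starts the Spark job.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            # Re-raise anything that is not the known intermittent error,
            # and give up after the last allowed attempt.
            if transient_marker not in str(exc) or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

Retrying only on the matching message keeps genuine configuration errors from being masked by repeated attempts.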
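For CDPD-35141, you can compare the buckets of the source and target locations before running a query. The following sketch assumes both locations are ofs:// URIs of the form ofs://<service>/<volume>/<bucket>/<path>. The helper names ofs_bucket and same_bucket are hypothetical, and other URI schemes (such as o3fs://) are not handled.

```python
from urllib.parse import urlparse


def ofs_bucket(uri: str) -> tuple[str, str, str]:
    """Return (service, volume, bucket) for an ofs://service/volume/bucket/... URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "ofs":
        raise ValueError(f"not an ofs:// URI: {uri}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URI does not name a volume and bucket: {uri}")
    return parsed.netloc, parts[0], parts[1]


def same_bucket(src: str, dst: str) -> bool:
    """True if both URIs resolve to the same service, volume, and bucket."""
    return ofs_bucket(src) == ofs_bucket(dst)


if __name__ == "__main__":
    print(same_bucket("ofs://ozone1/vol1/bucket1/staging/t1",
                      "ofs://ozone1/vol1/bucket1/warehouse/t1"))  # True
    print(same_bucket("ofs://ozone1/vol1/bucket1/t1",
                      "ofs://ozone1/vol1/bucket2/t1"))  # False
```

If same_bucket returns False for a query's source and target, the move is likely to hit this issue.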