Known Issues in HDFS
Learn about the known issues in HDFS, their impact on or changes to functionality, and the available workarounds.
- CDPD-28459: After an upgrade rollback from CDP 7.1.7 to CDH 6, restarting the DataNodes may fail with the following error: ERROR datanode.DataNode: Exception in secureMain java.io.IOException: The path component: '/var/run/hdfs-sockets' in '/var/run/hdfs-sockets/dn' has permissions 0755 uid 39998 and gid 1006. It is not protected because it is owned by a user who is not root and not the effective user: '0'.
- Workaround: On each affected DataNode host, make root the owner of the socket directory named in the error message by running "chown root /var/run/hdfs-sockets". The DataNode then restarts successfully.
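To see every directory on the socket path that root does not own, the following Bash sketch (an illustration, not a Cloudera-provided tool) walks up from a socket path such as /var/run/hdfs-sockets/dn and prints each existing parent directory whose owner UID is not 0. It checks ownership only, relies on GNU coreutils stat -c, and the function name list_non_root_parents is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list parent directories of a DataNode domain-socket path that are
# not owned by root (UID 0). Checks ownership only, not permission bits.
# Requires GNU coreutils `stat -c`. The function name is illustrative.

list_non_root_parents() {
  local dir parent uid
  dir="$(dirname "$1")"
  while :; do
    # Skip components that do not exist on this host.
    if [ -e "$dir" ]; then
      uid="$(stat -c '%u' "$dir")"   # numeric owner UID
      if [ "$uid" != "0" ]; then
        printf '%s uid=%s\n' "$dir" "$uid"
      fi
    fi
    parent="$(dirname "$dir")"
    # dirname of "/" (or ".") returns itself, so stop once it stops changing.
    [ "$parent" = "$dir" ] && break
    dir="$parent"
  done
}

# Default to the socket path quoted in the CDPD-28459 error message.
list_non_root_parents "${1:-/var/run/hdfs-sockets/dn}"
```

Run it on each DataNode host, for example as bash check-socket-dirs.sh /var/run/hdfs-sockets/dn (the script name is a placeholder). Each directory it prints is a candidate for chown root before you restart the DataNode.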
- CDPD-28390: A rolling restart of the HDFS JournalNodes may time out on Ubuntu 20.
- If the restart operation times out, manually stop and restart the NameNode and JournalNode services one at a time.
- OPSAPS-60832: When a DataNode decommission runs for a long time, the Kerberos ticket of the decommission monitor can expire, and it is not renewed automatically. After the ticket expires, the decommission monitor can no longer fetch the state of the DataNode, so Cloudera Manager does not complete the decommission.
- Workaround: Check the decommission state of the DataNode from the command line with hdfs dfsadmin -report.
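For a long-running decommission, it can help to summarize the report per DataNode instead of reading it by hand. The Bash sketch below assumes that each DataNode entry in the hdfs dfsadmin -report output contains a "Hostname:" line followed by a "Decommission Status :" line; verify this against the output of your release. The function name summarize_decommission is hypothetical. On a Kerberized cluster, obtain a valid ticket with kinit first, using credentials that are allowed to run dfsadmin (usually the HDFS superuser).

```shell
#!/usr/bin/env bash
# Sketch: print "<hostname><TAB><decommission status>" for each DataNode
# in `hdfs dfsadmin -report` text read from stdin.
# Assumption: each DataNode block has "Hostname: <host>" before
# "Decommission Status : <status>". The function name is illustrative.

summarize_decommission() {
  awk -F': ' '
    /^Hostname: /             { host = $2 }
    /^Decommission Status : / { print host "\t" $2 }
  '
}

# Query the cluster only when the hdfs CLI is available on this host.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -report | summarize_decommission
fi
```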
- Unsupported Features
The following HDFS features are currently not supported in Cloudera Data Platform:
- ACLs for the NFS gateway (HADOOP-11004)
- Aliyun Cloud Connector (HADOOP-12756)
- Allow HDFS block replicas to be provided by an external storage system (HDFS-9806)
- Consistent standby serving reads (HDFS-12943)
- Cost-Based RPC FairCallQueue (HDFS-14403)
- HDFS Router Based Federation (HDFS-10467)
- More than two NameNodes (HDFS-6440)
- NameNode Federation (HDFS-1052)
- NameNode Port-based Selective Encryption (HDFS-13541)
- Non-Volatile Storage Class Memory (SCM) in HDFS Cache Directives (HDFS-13762)
- OpenStack Swift (HADOOP-8545)
- SFTP FileSystem (HADOOP-5732)
- Storage policy satisfier (HDFS-10285)
- TSB 2022-604: GetContentSummary call performance issues with Apache Ranger HDFS plugin
- With Apache Ranger enabled on the NameNode, getContentSummary calls in the Apache Hadoop Distributed File System (HDFS) can hold the NameNode lock for multiple seconds and can cause a NameNode failover.
- Impact:
- The getContentSummary call is well known to be resource-intensive, and misusing this API can significantly degrade NameNode performance and, by extension, the performance of the entire cluster. Leaving the configuration values listed under "Action required" unset can make this impact worse.
- Depending on the number of files involved, the NameNode global lock can be held for multiple seconds, which delays responses to applications and the processing of other inbound requests.
- Action required
- Customers running CDP 7.1.7.x should apply the following configuration values using Cloudera Manager safety valves.
- Go to Cloudera Manager -> HDFS -> Configuration -> "HDFS Service Advanced Configuration Snippet (Safety Valve) for ranger-hdfs-security.xml". Configure the following in ranger-hdfs-security.xml:
<property> <name>ranger.optimize-subaccess-authorization</name> <value>true</value> </property>
- Go to Cloudera Manager -> HDFS -> Configuration -> "HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml". Configure the following in hdfs-site.xml:
<property> <name>dfs.permissions.ContentSummary.subAccess</name> <value>true</value> </property>
- Restart the HDFS service.
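After the restart, you may want to confirm that both properties actually reached the NameNode's generated configuration. The Bash sketch below checks a configuration directory for the two properties with the value true. The directory is an assumption: on Cloudera Manager hosts, the generated role configuration usually lives in a process directory under /var/run/cloudera-scm-agent/process/, but verify the path on your hosts. The check is a plain text match that expects <value> to follow <name> directly, as in the property snippets above, so treat it as a quick sanity check rather than an XML parser. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the TSB 2022-604 properties are set to "true" in a
# configuration directory. Plain text matching, not real XML parsing.
# Function names and the directory argument are illustrative assumptions.

# has_true_property FILE NAME
# Succeeds if FILE contains <name>NAME</name> immediately followed by
# <value>true</value>, ignoring whitespace and line breaks.
has_true_property() {
  local file="$1" name="$2"
  [ -r "$file" ] || return 1
  tr -d ' \t\r\n' < "$file" | grep -Fq "<name>${name}</name><value>true</value>"
}

# check_tsb_2022_604 DIR: print OK/MISSING per property; fail if any is missing.
check_tsb_2022_604() {
  local conf_dir="$1" rc=0 entry file prop
  for entry in \
      "ranger-hdfs-security.xml ranger.optimize-subaccess-authorization" \
      "hdfs-site.xml dfs.permissions.ContentSummary.subAccess"; do
    file="${entry%% *}"   # text before the first space
    prop="${entry#* }"    # text after the first space
    if has_true_property "$conf_dir/$file" "$prop"; then
      echo "OK      $prop"
    else
      echo "MISSING $prop"
      rc=1
    fi
  done
  return "$rc"
}

# Run only when a configuration directory is passed on the command line.
if [ -n "${1:-}" ]; then
  check_tsb_2022_604 "$1"
fi
```

For example: bash check-tsb-2022-604.sh /var/run/cloudera-scm-agent/process/NNN-hdfs-NAMENODE, where the script name and NNN are placeholders for your own values.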
- Knowledge article
- For the latest update on this issue see the corresponding Knowledge article: TSB 2022-604: GetContentSummary call performance issues with Apache Ranger HDFS plugin