Known Issues in HDFS
Learn about the known issues in HDFS, their impact or changes to functionality, and any available workarounds.
- OPSAPS-60958: The dfs.access.time.precision and dfs.namenode.accesstime.precision parameters are available in Cloudera Manager > HDFS > Configuration.
- You must configure both the dfs.access.time.precision and dfs.namenode.accesstime.precision parameters with the same value, because Cloudera Manager still sends both parameters to the HDFS service configuration; see the sketch below.
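After saving the configuration and redeploying the client configuration, you can verify that both parameters resolve to the same value from the shell. A minimal sketch, run on a host with the HDFS client configuration deployed:

```bash
# Both keys should print the same value, e.g. the HDFS
# default of 3600000 ms (one hour)
hdfs getconf -confKey dfs.access.time.precision
hdfs getconf -confKey dfs.namenode.accesstime.precision
```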
- ENGESC-19334: After configuring multiple NameNodes, clusters with heavy read and write workloads may experience client-side slowness.
- This is caused by the additional NameNode retries or probes that the extra NameNodes introduce.
- OPSAPS-64307: If the JournalNodes on a cluster were restarted recently, and no new fsImage has been created since that restart but the edit logs were rolled while the JournalNodes were restarting, the "Add new NameNode" wizard for the HDFS service might fail to bootstrap the new NameNode.
- If the bootstrap fails during the "Add new NameNode" wizard, complete the following steps (a command-line sketch for steps 2 through 4 follows the list):
- Delete the newly added NameNode and FailoverController
- Move the active HDFS NameNode to safe mode
- Do a Save Namespace operation on the active HDFS NameNode
- Leave safe mode on the active HDFS NameNode
- Try to add the new NameNode again
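Steps 2 through 4 map to the following hdfs dfsadmin commands; this is a minimal sketch, assuming you run it as the HDFS superuser:

```bash
# Step 2: move the active NameNode into safe mode (read-only namespace)
hdfs dfsadmin -safemode enter

# Step 3: save the namespace, creating a fresh fsImage checkpoint
hdfs dfsadmin -saveNamespace

# Step 4: leave safe mode to resume normal operation
hdfs dfsadmin -safemode leave
```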
- OPSAPS-64363: Deleting an additional standby NameNode does not delete the corresponding ZKFC role; you must delete that role manually.
- None
- OPSAPS-63558: Snapshot-diff-based HDFS replications do not provide correct file delete and rename counters through the API.
- The number of files deleted and renamed by DistCp during snapshot-based replications can be checked in the logs DistCp writes to standard error output.
- CDPD-28459: After performing an upgrade rollback from CDP 7.1.7 to CDH6, you may see the following error when restarting the DataNodes: ERROR datanode.DataNode: Exception in secureMain java.io.IOException: The path component: '/var/run/hdfs-sockets' in '/var/run/hdfs-sockets/dn' has permissions 0755 uid 39998 and gid 1006. It is not protected because it is owned by a user who is not root and not the effective user: '0'.
- You must run the command given in the error message, chown root /var/run/hdfs-sockets, as shown below. After this, the DataNode restarts successfully.
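For example, on each affected DataNode host (restarting the role afterwards through Cloudera Manager; this is a sketch, not an exact procedure):

```bash
# The secure DataNode requires the domain-socket parent directory
# to be owned by root; fix the ownership reported in the error
chown root /var/run/hdfs-sockets

# Then restart the DataNode role (for example, from Cloudera Manager)
```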
- CDPD-28390: A rolling restart of the HDFS JournalNodes may time out on Ubuntu 20.
- If the restart operation times out, you can manually stop and restart the NameNode and JournalNode services one by one.
- OPSAPS-60832: When a DataNode decommission runs for a long time and the decommission monitor's Kerberos ticket expires, the ticket is not auto-renewed. The decommission then does not complete in Cloudera Manager, because the decommission monitor fails to fetch the state of the DataNode after the ticket expires.
- The decommission state of the DataNode can be fetched with the hdfs dfsadmin -report CLI command; see the example below.
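For example (the -decommissioning flag restricts the report to DataNodes that are currently decommissioning; the grep filter is only an illustration):

```bash
# Report only DataNodes that are currently decommissioning
hdfs dfsadmin -report -decommissioning

# Or scan the full report for the decommission status lines
hdfs dfsadmin -report | grep -i "decommission status"
```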
- OPSAPS-55788: WebHDFS is always enabled. The Enable WebHDFS checkbox does not take effect.
- None.
- OPSAPS-63299: The Disable HA command for a nameservice does not work if the nameservice has more than two NameNodes defined.
- None
- OPSAPS-63301: The Delete Nameservice command does not delete all the NameNodes belonging to the nameservice if more than two NameNodes are assigned to it.
- None
- CDPD-50044: Datanodes tab loading issue in the NameNode UI
- When you click the Datanodes tab, the message "NameNode is still loading. Redirecting to the Startup Progress page" appears.
Unsupported Features
The following HDFS features are currently not supported in Cloudera Data Platform:
- ACLs for the NFS gateway (HADOOP-11004)
- Aliyun Cloud Connector (HADOOP-12756)
- Allow HDFS block replicas to be provided by an external storage system (HDFS-9806)
- Consistent reads from standby NameNode (HDFS-12943)
- Cost-Based RPC FairCallQueue (HDFS-14403)
- HDFS Router Based Federation (HDFS-10467)
- NameNode Federation (HDFS-1052)
- NameNode Port-based Selective Encryption (HDFS-13541)
- Non-Volatile Storage Class Memory (SCM) in HDFS Cache Directives (HDFS-13762)
- OpenStack Swift (HADOOP-8545)
- SFTP FileSystem (HADOOP-5732)
- Storage policy satisfier (HDFS-10285)
Technical Service Bulletins
- TSB 2023-666: Out of order HDFS snapshot deletion may delete renamed/moved files, which may result in data loss
- Cloudera has discovered a bug in the Apache Hadoop Distributed File System (HDFS) snapshot implementation. Deleting an HDFS snapshot may incorrectly remove files in the .Trash directories or remove renamed files from the current file system state. This is unexpected behavior, because deleting an HDFS snapshot should only delete the files stored in the specified snapshot, not data in the current state.
- Components affected
- HDFS
- Products affected
- Cloudera Data Platform (CDP)
- Cloudera Distribution including Apache Hadoop (CDH)
- Hortonworks Data Platform (HDP)
- Releases affected
- CDP Private Cloud Base 7.1.7 SP2 CHF6 and earlier; 7.1.8 CHF7 and earlier
- All versions of CDP Public Cloud
- All versions of CDH
- All versions of HDP
- Users affected
- Cloudera customers using the HDFS snapshot feature.
- Impact
- When files are removed incorrectly by deleting a snapshot, the standby NameNode checkpoint (or the NameNode checkpoint on non-High Availability clusters) fails with a missing INode file, and the NameNode shuts down with a "Missing INode" error message in the logs, as shown in the example below. This can result in data loss, because current file data stored in HDFS can be removed incorrectly when deleting an HDFS snapshot.
2023-04-14 10:04:11,175 [FSImageSaver for /grid/1/dfs/namenode/current of type IMAGE_AND_EDITS] ERROR namenode.FSImage (FSImageFormatPBINode.java:serializeINodeDirectorySection(765)) - FSImageFormatPBINode#serializeINodeDirectorySection: Dangling child pointer found. Missing INode in inodeMap: id=154614; path=/user/foo/.Trash/Current/file; parent=/user/foo/.Trash/Current
- Severity
- High
- Action required
- Risk Avoidance:
- When deleting multiple snapshots, delete them in order, from the earliest to the latest. This reduces the risk of data loss, because experience has shown that deleting the earliest snapshot in the file system does not cause data loss.
- To determine the snapshot creation order, use the hdfs lsSnapshot <snapshotDir> command, and then sort the output by the snapshot ID. If snapshot A was created earlier than snapshot B, the snapshot ID of A is smaller than the snapshot ID of B. The output format of lsSnapshot is: <permission> <replication> <owner> <group> <length> <modification_time> <snapshot_id> <deletion_status> <path>
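For example, a minimal shell sketch (the /data/projects path and snapshot name are hypothetical; the sort field assumes the modification time prints as two whitespace-separated tokens, so adjust -k if your output differs):

```bash
# List snapshots sorted numerically by snapshot ID, earliest first
hdfs lsSnapshot /data/projects | sort -n -k8

# Delete the earliest snapshot first to avoid out-of-order deletion
hdfs dfs -deleteSnapshot /data/projects snap-2023-01-01
```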
- Upgrade (Highly Recommended)
- CDH, HDP, and CDP Private Cloud Base customers should upgrade to CDP Private Cloud Base 7.1.7 Service Pack (SP) 2 Cumulative Hotfix (CHF) 10, or CDP Private Cloud Base 7.1.8 Cumulative Hotfix (CHF) 10 or later.
- Upcoming CDP Public Cloud Service Pack releases will include the fixes.
- Hotfixes (if any)
CDH or HDP customers should upgrade to one of the fixed CDP releases mentioned above or contact support to request a hotfix for HDFS-16972, HDFS-16975, and HDFS-17045.
- Knowledge article
For the latest update on this issue see the corresponding Knowledge article: TSB 2023-666: Out of order HDFS snapshot deletion may delete renamed/moved files, which may result in data loss