Working with the ABFS Connector

Cloudera Data Platform allows you to configure access from a cluster to Azure so that the cluster can work with cloud data.

Introduction to Azure Storage and the ABFS Connector
The Hadoop-Azure module provides support for the Azure Data Lake Storage Gen2 storage layer through the ABFS connector.

Setting up and configuring the ABFS connector
You must have an Azure storage account with the hierarchical namespace enabled, and an Azure container, before you configure the ABFS connector.

Configuring the ABFS Connector
You can configure access credentials to authorize access to Azure containers in several ways, including IDBroker, Shared Key, Managed Identity, and Shared Access Signature (SAS).

Manifest committer for ABFS and GCS
The Intermediate Manifest committer is a high-performance committer for Spark work that provides performance on ABFS for real-world queries, and performance and correctness on GCS. It also works with other filesystems, including HDFS. However, the design is optimized for object stores where listing operations are slow and expensive.

Performance and Scalability
Like all Azure storage services, the Azure Data Lake Storage Gen2 store offers a fully consistent view of the store, with full Create, Read, Update, and Delete consistency for both data and metadata.

Using ABFS with the CLI
After you configure access in the core-site.xml file, you can access your cluster using the CLI. You can run Hadoop file system commands, run DistCp commands, create Hive tables, and so on (see the example sketches at the end of this overview).

Copying data with Hadoop DistCp
DistCp (distributed copy) is a tool used to copy files in large inter-cluster and intra-cluster environments. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list. A minimal invocation is sketched at the end of this overview.

DistCp and Proxy Settings
When you use DistCp to back up data from an on-premises Hadoop cluster, you may need to configure proxy settings so that the cluster can reach the cloud store. For most stores, these proxy settings are Hadoop configuration options that must be set in core-site.xml or passed as options to the DistCp command.

ADLS Trash Folder Behavior
If the fs.trash.interval property is set to a value other than zero on your cluster and you do not specify the -skipTrash flag with your rm command when you remove files, the deleted files are moved to the trash folder in your ADLS account. The trash folder in your ADLS account is located at adl://your_account.azuredatalakestore.net/user/user_name/.Trash/current/.

Troubleshooting ABFS
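
Example: accessing an ABFS container from the CLI
As a quick illustration of the Shared Key and CLI topics above, the following sketch runs Hadoop file system commands against an ABFS container. The storage account name (myaccount), container name (mycontainer), and access key are placeholders; in practice the account key is normally stored in core-site.xml rather than supplied on the command line.

    # List the root of the container (assumes credentials are
    # already configured in core-site.xml)
    hadoop fs -ls abfs://mycontainer@myaccount.dfs.core.windows.net/

    # Alternatively, supply a Shared Key for a single command with a
    # -D option (myaccount and ACCESS_KEY are placeholders)
    hadoop fs \
      -D fs.azure.account.key.myaccount.dfs.core.windows.net=ACCESS_KEY \
      -ls abfs://mycontainer@myaccount.dfs.core.windows.net/

    # Copy a local file into the container
    hadoop fs -put ./data.csv abfs://mycontainer@myaccount.dfs.core.windows.net/data/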
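
Example: copying and deleting data
Similarly, a minimal DistCp sketch for copying a directory tree from HDFS into ABFS, followed by a delete that bypasses the trash behavior described above. All hostnames, paths, and container names are placeholders.

    # Copy a directory tree from HDFS to an ABFS container
    hadoop distcp \
      hdfs://namenode:8020/user/alice/dataset \
      abfs://mycontainer@myaccount.dfs.core.windows.net/backups/dataset

    # Delete without moving files to the trash folder, regardless of
    # the fs.trash.interval setting
    hdfs dfs -rm -r -skipTrash abfs://mycontainer@myaccount.dfs.core.windows.net/tmp/scratch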
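
Example: enabling the manifest committer for a Spark job
Finally, a hedged sketch of binding the manifest committer to the abfs:// scheme for a Spark job. The factory and committer class names below follow the upstream Hadoop manifest committer documentation; verify them against your CDP release before use.

    # Bind abfs:// to the manifest committer factory and route
    # Spark SQL/Parquet commits through the cloud commit protocol
    spark-submit \
      --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfs=org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory \
      --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
      --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
      my_job.py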