Fixed issues in Cloud Connectors

Review the list of Cloud Connectors issues that are resolved in Cloudera Runtime 7.1.9.

CDPD-48449: distcp -update skips files of same size, name when transferring from HDFS to S3
The DistCp -update option could incorrectly skip files during an incremental update from HDFS to S3 or ABFS when the source and destination files had identical names and sizes. This occurred because checksums cannot be compared between files in different stores. To reduce incorrect skips, DistCp now also compares modification times: if the source file was modified more recently than its corresponding destination file, the file is copied; otherwise, it is skipped.
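The skip decision described for CDPD-48449 can be illustrated with a minimal Python sketch. This is a simplified model and not DistCp's actual code: the FileInfo type and the should_copy function are hypothetical names introduced here, and the sketch assumes that no comparable checksum is available between the two stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Minimal file metadata for this sketch (hypothetical, not a Hadoop class)."""
    size: int     # file length in bytes
    mtime: float  # modification time, in seconds since the epoch


def should_copy(source: FileInfo, dest: Optional[FileInfo]) -> bool:
    """Return True if an incremental update should copy the source file."""
    if dest is None:
        # Nothing at the destination yet, so the file must be copied.
        return True
    if source.size != dest.size:
        # Different sizes always mean the destination is out of date.
        return True
    # Same name and size, and no checksum to compare across stores:
    # fall back to modification time, as described in the entry above.
    return source.mtime > dest.mtime
```

With this model, a source file of the same size that was modified after the destination copy is transferred again, while an older or equally old source file is skipped.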
CDPD-43464: HADOOP-18344 AWS SDK update to 1.12.262 due to CVE-2022-31159
The aws-java-sdk library was updated to 1.12.262+ due to CVE-2022-31159. Note that the S3A connector has never been vulnerable to the CVE, as it does not use the SDK's TransferManager for downloading files.
CDPD-45959: Some tests fail with ssl3_get_server_certificate:certificate verify failed
"fs.azure.ssl.channel.mode" has been set to "Default_JSSE". Switch to "Default" if the version of OpenSSL installed in your OS can successfully negotiate SSL connections with Azure to achieve possibly improved performance.
CDPD-12425: support S3 client side encryption
The S3A connector now supports S3 client-side encryption (S3-CSE). See the documentation for details on how to enable it.
CDPD-29477: HADOOP-17618. ABFS: Partially obfuscate SAS object IDs in Logs
ABFS now partially obfuscates SAS object IDs in logs.
CDPD-35030: HADOOP-18112. Rename operation fails during multi object delete of size more than 1000.
Fixed multi-object delete in S3A when the number of objects is more than 1000.
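For context on CDPD-35030, a single S3 multi-object delete request accepts at most 1000 keys, so larger deletes have to be split into pages. The following Python sketch shows that paging idea only; it is not the HADOOP-18112 change itself, and delete_batches is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence

# S3 limit on the number of keys in one multi-object delete request.
MAX_KEYS_PER_REQUEST = 1000


def delete_batches(
    keys: Sequence[str], page_size: int = MAX_KEYS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive batches of keys, each small enough for one request."""
    if not 1 <= page_size <= MAX_KEYS_PER_REQUEST:
        raise ValueError("page_size must be between 1 and 1000")
    for start in range(0, len(keys), page_size):
        # Each slice holds at most page_size keys; the last one may be shorter.
        yield list(keys[start:start + page_size])
```

For example, 2500 keys are split into three batches, the last of which is shorter than the page size.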
CDPD-46175: HADOOP-18521. ABFS prefetching input stream corruption
Fixed data corruption in the ABFS prefetching input stream.
CDPD-46543: HADOOP-18526. Leak of S3AInstrumentation instances using Hadoop Metrics references
Fixed a leak of S3AInstrumentation instances that were retained through Hadoop Metrics references.
CDPD-56830: HADOOP-18233. Initialization race condition with TemporaryAWSCredentialsProvider
Fixed an initialization race condition with TemporaryAWSCredentialsProvider.
CDPD-35182: HADOOP-17198. Support S3 Access Points.
The S3A connector supports S3 Access Points. An access point must be configured separately for each bucket; for a bucket "NAME", set the option "fs.s3a.bucket.NAME.accesspoint.arn" to the access point ARN. If the option "fs.s3a.accesspoint.required" is set to true, then every bucket must be configured with an access point ARN.
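The per-bucket naming rule for CDPD-35182 can be sketched in Python as follows. The two option names come from the entry above; the helper functions, the plain dictionary standing in for the Hadoop configuration, and any bucket names or ARNs passed to them are hypothetical and are not part of the S3A connector.

```python
from typing import Dict, Iterable, List

# Option name taken from the release note above.
REQUIRED_KEY = "fs.s3a.accesspoint.required"


def access_point_key(bucket: str) -> str:
    """Build the per-bucket option name that holds the access point ARN."""
    return f"fs.s3a.bucket.{bucket}.accesspoint.arn"


def buckets_missing_access_point(
    conf: Dict[str, str], buckets: Iterable[str]
) -> List[str]:
    """List the buckets that lack an ARN when access points are required."""
    if conf.get(REQUIRED_KEY, "false").strip().lower() != "true":
        # Access points are optional, so no bucket is considered misconfigured.
        return []
    return [b for b in buckets if not conf.get(access_point_key(b), "").strip()]
```

A check of this kind could be run against a configuration before a job starts, to report buckets that would otherwise be rejected once access points are required.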

Apache patch information

  • HADOOP-18596
  • HADOOP-13887
  • HADOOP-17618
  • HADOOP-18521
  • HADOOP-18233
  • HADOOP-18526