Using Microsoft Azure Data Lake Store (Gen1 and Gen2) with Apache Hive in CDH

Microsoft Azure Data Lake Store (ADLS) is a massively scalable distributed file system that can be accessed through an HDFS-compatible API. ADLS acts as a persistent storage layer for CDH clusters running on Azure. In contrast to Amazon S3, ADLS more closely resembles native HDFS behavior, providing consistency, a file and directory structure, and POSIX-compliant ACLs.

There are two generations of ADLS: Gen1 and Gen2. For conceptual details, see the Microsoft Azure documentation for each generation.

CDH 5.11 and higher supports using ADLS Gen1 as a storage layer for MapReduce2 (MRv2 or YARN), Hive, Hive-on-Spark, Spark 2.1, and Spark 1.6. Comparable HBase support was added in CDH 5.12.

CDH 6.1 and higher supports using ADLS Gen2 as a storage layer for MapReduce2 (MRv2 or YARN), Hive, Hive-on-Spark, and Spark 2.4.
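
Because both generations are reached through an HDFS-compatible API, a Hive table on ADLS is normally addressed by a file system URI. The sketch below shows the URI form for each generation: adl:// for Gen1, and abfs:// or abfss:// for Gen2. The helper names (adls_uri, external_table_ddl), the account name, the filesystem name, and the table are hypothetical and used only for illustration; they are not part of CDH or Hive. The sketch also assumes that ADLS credentials are already configured on the cluster.

```python
def adls_uri(generation, account, path, filesystem=None, secure=True):
    """Build a Hadoop-style URI for an ADLS location.

    Illustrative helper only; it is not a CDH or Hive API.
    """
    path = path.lstrip("/")
    if generation == 1:
        # Gen1 uses the adl:// scheme and the azuredatalakestore.net endpoint.
        return f"adl://{account}.azuredatalakestore.net/{path}"
    if generation == 2:
        # Gen2 addresses a filesystem (container) inside a storage account.
        if not filesystem:
            raise ValueError("ADLS Gen2 URIs require a filesystem name")
        scheme = "abfss" if secure else "abfs"  # abfss is the TLS variant
        return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{path}"
    raise ValueError(f"unknown ADLS generation: {generation!r}")


def external_table_ddl(table, columns, location):
    """Return a HiveQL statement for an external table stored at `location`."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE EXTERNAL TABLE {table} ({cols}) LOCATION '{location}'"


if __name__ == "__main__":
    # Hypothetical account, filesystem, and table names.
    gen1 = adls_uri(1, "myaccount", "/warehouse/sales")
    gen2 = adls_uri(2, "myaccount", "/warehouse/sales", filesystem="data")
    columns = [("id", "INT"), ("amount", "DOUBLE")]
    print(external_table_ddl("sales_gen1", columns, gen1))
    print(external_table_ddl("sales_gen2", columns, gen2))
```

The generated statements can be pasted into a Hive client. Apart from the URI in the LOCATION clause, the table definition is the same as for a table stored on HDFS.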