Cloud Data Access

Chapter 4. Getting Started with ADLS

Azure Data Lake Store (ADLS) is a file system designed for use as a hyper-scale repository for big data analytic workloads.

The features of ADLS include:

  • Hierarchical filesystem containing folders, which in turn contain data stored as files.

  • Provides unlimited storage without imposing any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake.

  • Compatible with Hadoop Distributed File System (HDFS).

  • Can be accessed by Hadoop applications via the WebHDFS-compatible REST APIs or the ADLS connector.

  • Uses Azure Active Directory (AAD) for identity and access management.
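To make the WebHDFS-compatible access path concrete, here is a minimal sketch that builds a REST URL for an ADLS account. The account name `myaccount` and the path are hypothetical, and the URL layout (`https://<account>.azuredatalakestore.net/webhdfs/v1/<path>?op=<OPERATION>`) is assumed from the WebHDFS convention; check the Azure REST documentation for your account before relying on it.

```python
from urllib.parse import quote, urlencode


def adls_webhdfs_url(account: str, path: str, op: str) -> str:
    """Build a WebHDFS-style REST URL for a file or folder in an ADLS account."""
    # Drop leading/trailing slashes and percent-encode the path ("/" is kept as-is).
    encoded_path = quote(path.strip("/"))
    query = urlencode({"op": op})
    return (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{encoded_path}?{query}"
    )


# Hypothetical account and folder, used only for illustration.
url = adls_webhdfs_url("myaccount", "/data/logs", "LISTSTATUS")
print(url)
```

Because ADLS uses AAD for access management, an actual request to such a URL would also need to carry an AAD-issued OAuth 2.0 bearer token; obtaining that token is outside the scope of this sketch.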

For more general information on ADLS, refer to "Get Started with Azure Data Lake Store Using the Azure Portal" in the Azure documentation.

Overview of Configuring and Using ADLS with HDP

The following table provides an overview of tasks related to configuring and using HDP with ADLS. See the linked topics for more information about specific tasks.

Task | Description
Meet the prerequisites

To use ADLS storage, you must have:

  1. An Azure subscription for Data Lake Store.

  2. An ADLS account. For instructions on how to create one, refer to Microsoft Azure documentation.

Configure authentication

For Hadoop applications to access data stored in your ADLS account, you must configure authentication using either a client credential (analogous to a service principal) or a refresh token (associated with a user).

We recommend that you use the simpler client credential method.
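As an illustration of the client credential method, the following sketch generates a `core-site.xml` fragment with the OAuth 2.0 properties used by the Hadoop ADLS connector (`fs.adl.oauth2.*`). The tenant ID, client ID, and secret are placeholders, and the token endpoint format is an assumption based on the usual AAD layout; verify the exact values against your Azure application registration.

```python
from xml.sax.saxutils import escape


def client_credential_properties(tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Return the Hadoop properties for ADLS client credential authentication."""
    return {
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        # Assumed AAD token endpoint layout; confirm it in the Azure portal.
        "fs.adl.oauth2.refresh.url": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
    }


def to_core_site_xml(properties: dict) -> str:
    """Render properties as a Hadoop <configuration> XML fragment."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Placeholder values; substitute the ones from your own AAD application.
props = client_credential_properties("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET")
print(to_core_site_xml(props))
```

Storing the secret in plain text in `core-site.xml` is shown here only for brevity; in practice you may prefer a Hadoop credential provider so that the secret is not readable from the configuration file.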

Configure optional features

You can optionally configure how user and group information is represented during getFileStatus(), listStatus(), and getAclStatus() calls.
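For example, the Hadoop ADLS connector exposes the `adl.feature.ownerandgroup.enableupn` property, which controls whether owners and groups are reported as friendly names or as AAD object IDs (the default). The sketch below assembles a `hadoop fs -ls` command that sets the property for a single invocation via the generic `-D` option; the account name and path are hypothetical.

```python
UPN_PROPERTY = "adl.feature.ownerandgroup.enableupn"


def ls_command(adl_path: str, friendly_names: bool) -> list:
    """Build a `hadoop fs -ls` command that overrides the UPN setting for one run."""
    flag = "true" if friendly_names else "false"
    return ["hadoop", "fs", "-D", f"{UPN_PROPERTY}={flag}", "-ls", adl_path]


# Hypothetical account and folder, used only for illustration.
cmd = ls_command("adl://myaccount.azuredatalakestore.net/data", friendly_names=True)
print(" ".join(cmd))
```

To apply the setting to all jobs rather than a single command, the same property could instead be set in `core-site.xml`.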

Work with ADLS data

Once you've configured authentication with your data lake, you can access ADLS data from Hive (via external tables) and Spark, and perform other related tasks such as copying data between HDFS and ADLS when needed.
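The tasks above all address ADLS data through `adl://` URIs. The following sketch shows, under assumed names, how such a URI might be built and then used in a Hive external table definition and in a `hadoop distcp` copy from HDFS. The account name, table schema, and HDFS NameNode address are hypothetical.

```python
def adl_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a path in the given ADLS account."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def external_table_ddl(table: str, columns: str, location: str) -> str:
    """Return a HiveQL statement for an external table stored at `location`."""
    return f"CREATE EXTERNAL TABLE {table} ({columns}) LOCATION '{location}'"


def distcp_command(source: str, destination: str) -> list:
    """Build a `hadoop distcp` command that copies source to destination."""
    return ["hadoop", "distcp", source, destination]


# Hypothetical account, table, and cluster addresses.
sales_dir = adl_uri("myaccount", "/warehouse/sales")
print(external_table_ddl("sales", "id INT, amount DOUBLE", sales_dir))
print(" ".join(distcp_command("hdfs://namenode:8020/warehouse/sales", sales_dir)))
```

Swapping the two arguments of `distcp_command` would copy in the other direction, from ADLS back to HDFS.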