Accessing Data on Azure
Hortonworks Data Platform (HDP) supports reading and writing both block blobs and page blobs from/to Windows Azure Storage Blob (WASB) object store, as well as reading and writing files stored in an Azure Data Lake Storage (ADLS) account. This allows you to:
- Persist data using cloud storage services beyond the lifetime of your HDP clusters.
- Load data in Hadoop ecosystem applications directly from Azure storage services, without first importing or uploading data from external resources to HDFS.
- Use other applications (not necessarily in your Hadoop ecosystem) to manipulate the data stored in Azure storage services beyond the lifetime of your HDP clusters.
- Share data between multiple HDP clusters fast and easily by pointing to the same Azure data sets.
- Move or copy data between different Azure storage services or between Azure storage services and HDFS to facilitate different scenarios for big data analytics workloads.
- Back up unlimited archive data at any scale from HDP cluster to fully managed, durable, and highly available Azure storage services.
Accessing Data in ADLS
Azure Data Lake Store (ADLS) is an enterprise-wide hyper-scale repository for big data analytic workloads.
Prerequisites
If you want to use ADLS to store your data, you must enable Azure subscription for Data Lake Store, and then create an Azure Data Lake Store storage account.
Configuring Access to ADLS
ADLS is not supported as a default file system, but access to data in ADLS via the adl connector. To configure access to ADLS from a cluster managed via Cloudbreak use the steps described in How to Configure Authentication with ADLS.
Testing Access to ADLS
To tests access to ADLS, SSH to a cluster node and run a few hadoop fs shell commands against your existing ADLS account.
ADLS access path syntax is:
adl://account_name.azuredatalakestore.net/dir/file
For example, the following Hadoop FileSystem shell commands demonstrate access to a storage account named "myaccount":
hadoop fs -mkdir adl://myaccount.azuredatalakestore.net/testdir
hadoop fs -put testfile adl://myaccount.azuredatalakestore.net/testdir/testfile
To use DistCp against ADLS, use the following syntax:
hadoop distcp [-D hadoop.security.credential.provider.path=localjceks://file/home/user/adls.jceks] hdfs://namenode_hostname:9001/user/foo/007020615 adl://myaccount.azuredatalakestore.net/testDir/
Working with ADLS
For more information about configuring the ADLS connector and working with data stored in ADLS, refer to Cloud Data Access documentation.
Related Links
Cloud Data Access (Hortonworks)
How to Configure Authentication with ADLS (Hortonworks)
Azure Data Lake Store (External)
Create a Storage Account (External)
Get started with Azure Data Lake Store (External)
Accessing Data in WASB
Windows Azure Storage Blob (WASB) is an object store service available on Azure.
Prerequisites
If you want to use Windows Azure Storage Blob to store your data, you must enable Azure subscription for Blob Storage, and then create a storage account.
Configuring Access to WASB
In order to access data stored in your Azure blob storage account, you must configure your storage account access key in core-site.xml
. The configuration property that you must use is fs.azure.account.key.<account name>.blob.core.windows.net
and the value is the access key.
For example the following property should be used for a storage account called "testaccount":
<property>
<name>fs.azure.account.key.testaccount.blob.core.windows.net</name>
<value>TESTACCOUNT-ACCESS-KEY</value>
</property>
You can obtain your access key from the Access keys in your storage account settings.
Testing Access to WASB
To tests access to WASB, SSH to a cluster node and run a few hadoop fs shell commands against your existing WASB account.
WASB access path syntax is:
wasb://container_name@storage_account_name.blob.core.windows.net/dir/file
For example, to access a file called "testfile" located in a directory called "testdir", stored in the container called "testcontainer" on the account called "hortonworks", the URL is:
wasb://testcontainer@hortonworks.blob.core.windows.net/testdir/testfile
You can also use "wasbs" prefix to utilize SSL-encrypted HTTPS access:
wasbs://@ .blob.core.windows.net/dir/file
The following Hadoop FileSystem shell commands demonstrate access to a storage account named "myaccount" and a container named "mycontainer":
hadoop fs -ls wasb://mycontainer@myaccount.blob.core.windows.net/ hadoop fs -mkdir wasb://mycontainer@myaccount.blob.core.windows.net/testDir hadoop fs -put testFile wasb://mycontainer@myaccount.blob.core.windows.net/testDir/testFile hadoop fs -cat wasb://mycontainer@myaccount.blob.core.windows.net/testDir/testFile test file content
Working with WASB
For more information about configuring the WASB connector and working with data stored in WASB, refer to Cloud Data Access documentation.
Related Links
Cloud Data Access (Hortonworks)
Create a Storage Account (External)