Configuring OAuth with the Hadoop CredentialProvider

A more secure way to store your OAuth credentials is with the Hadoop CredentialProvider. When you submit a job, reference the CredentialProvider, which then supplies the OAuth information. Unlike the core-site.xml, the credentials are not stored in plain text.

The following steps describe how to create a credential provider and how to reference it when submitting jobs:

  1. Create a password for the Hadoop Credential Provider and export it to the environment:
    export HADOOP_CREDSTORE_PASSWORD=password
  2. Provision the credentials by running the following commands:
    
    hadoop credential create dfs.adls.oauth2.client.id -provider jceks://hdfs/user/USER_NAME/adls2keyfile.jceks -value client ID
    hadoop credential create dfs.adls.oauth2.credential -provider jceks://hdfs/user/USER_NAME/adls2keyfile.jceks -value client secret
    hadoop credential create dfs.adls.oauth2.refresh.url -provider jceks://hdfs/user/USER_NAME/adls2keyfile.jceks -value refresh URL

    You can omit the -value option and its value and the command will prompt the user to enter the value.

    For more details on the hadoop credential command, see Credential Management (Apache Software Foundation).

  3. Export the password to the environment:
    export HADOOP_CREDSTORE_PASSWORD=password
  4. Reference the credential provider on the command line when you submit a job:
    hadoop <command>
         -Ddfs.adls.oauth2.access.token.provider.type=ClientCredential \
         -Dhadoop.security.credential.provider.path=jceks://hdfs/user/USER_NAME/adls-cred.jceks \
         abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>