Configure Access to GCS from Your Cluster
After obtaining the service account key, perform these steps on your cluster. The steps below assume that your service account key is called google-access-key.json. If you chose a different name, make sure to update the commands accordingly.
Steps
In the Ambari web UI, set the following three properties under custom core-site. To set these properties, navigate to HDFS > Configs > Custom core-site and click Add Property.
Set the following properties:
fs.gs.auth.service.account.email=<VALUE1>
fs.gs.auth.service.account.private.key.id=<VALUE2>
fs.gs.auth.service.account.private.key=<VALUE3>
You can obtain the values for these parameters as follows:
Parameter                                     Value
fs.gs.auth.service.account.email              The client_email field extracted from the credential's JSON file.
fs.gs.auth.service.account.private.key.id     The private_key_id field extracted from the credential's JSON file.
fs.gs.auth.service.account.private.key        The private_key field extracted from the credential's JSON file.

The JSON key file downloaded in the previous step is in plain text. Open the file in your favorite text editor to extract the relevant values needed in the above configuration.
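If you prefer to pull these values from the command line rather than a text editor, here is a minimal sketch using jq (this assumes jq is installed on the node and that the key file is named google-access-key.json):

jq -r '.client_email' google-access-key.json
jq -r '.private_key_id' google-access-key.json
jq -r '.private_key' google-access-key.json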
Note For enhanced security, these values can also be configured using Hadoop CredentialProvider.
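As a sketch of that approach, you could store the private key in a JCEKS credential store and point Hadoop at it; the store path below is only an example, not a required location:

hadoop credential create fs.gs.auth.service.account.private.key -provider jceks://hdfs/user/admin/gcs-keys.jceks

The command prompts for the secret value. You would then add the following property to custom core-site so that Hadoop consults the store:

hadoop.security.credential.provider.path=jceks://hdfs/user/admin/gcs-keys.jceks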
Configure the following properties if they are not already set by Ambari:
fs.gs.working.dir=/
fs.gs.path.encoding=uri-path
Note Setting fs.gs.working.dir configures the initial working directory of a GHFS instance; this should always be set to "/". Setting fs.gs.path.encoding sets the path encoding to be used and allows for spaces in the file name; this should always be set to "uri-path".
Save the configuration change and restart the affected services. Additionally, depending on which services you are using, you must restart other services that access cloud storage, such as Spark Thrift Server, HiveServer2, and Hive Metastore; these are not listed as affected by Ambari, but they require a restart to pick up the configuration changes.
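To confirm that the new properties are visible to clients after the restart, one option (a sketch using the standard hdfs getconf command) is to query the effective configuration from any cluster node:

hdfs getconf -confKey fs.gs.auth.service.account.email
hdfs getconf -confKey fs.gs.working.dir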
Test access to the Google Cloud Storage bucket by running a few commands from any cluster node. For example, you can use the command listed below (replace “mytestbucket” with the name of your bucket):
hadoop fs -ls gs://mytestbucket/
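If the listing works, a few further checks you can run from the same node (again replacing "mytestbucket" with the name of your bucket, and assuming your service account has write access to it):

hadoop fs -mkdir gs://mytestbucket/testdir
hadoop fs -put /etc/hosts gs://mytestbucket/testdir/
hadoop fs -cat gs://mytestbucket/testdir/hosts
hadoop fs -rm -r gs://mytestbucket/testdir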
After performing these steps, you should be able to start working with the Google Cloud Storage bucket(s).