Configure Access to GCS from Your Cluster
After obtaining the service account key, perform these steps on your cluster. The steps below assume that your service account key is called google-access-key.json. If you choose a different name, make sure to update the commands accordingly.
- In Cloudera Manager UI, set the following three properties under hdfs
core-site.xml. Navigate to Clusters > HDFS > Configuration > Advanced >
Add property under Cluster-wide Advanced Configuration Snippet
(Safety Valve) for core-site.xml.
- Set the following properties:
fs.gs.auth.service.account.email=<VALUE1> fs.gs.auth.service.account.private.key.id=<VALUE2> fs.gs.auth.service.account.private.key=<VALUE3>
You can obtain the values for these parameters as follows:
The JSON key file downloaded in the previous step is in plain text. Open the file in your favorite text editor to extract the relevant values needed in the above configuration.Parameter Value fs.gs.auth.service.account.email The client_email
field extracted from the credential's JSON file.fs.gs.auth.service.account.private.key.id The private_key_id
field extracted from the credential's JSON file.fs.gs.auth.service.account.private.key The private_key
field extracted from the credential's JSON file. -
Configure the following properties if they're not already set
by Cloudera Manager
fs.gs.working.dir=/ fs.gs.path.encoding=uri-path
- Set the following properties:
- Save the configuration change and restart affected services. Additionally - depending on what services you are using - you must restart other services that access cloud storage such as Spark Thrift Server, HiveServer2, and Hive Metastore; These will not be listed as affected by Cloudera Manager, but require a restart to pick up the configuration changes.
-
Test access to the Google Cloud Storage bucket by running a few commands from any
cluster node. For example, you can use the command listed below (replace
“mytestbucket” with the name of your bucket):
hadoop fs -ls gs://mytestbucket/
After performing these steps, you should be able to start working with the Google Cloud Storage bucket(s).