Configure Access to GCS from Your Cluster
After obtaining the service account key, perform these steps on your cluster. The steps below assume that your service account key is called google-access-key.json. If you chose a different name, make sure to update the commands accordingly.
Steps
Place the service account key on all nodes of the cluster.
Note the following about the location of the file:

Make sure to use an absolute path, such as /etc/hadoop/conf/google-access-key.json (where google-access-key.json is your JSON key). The path must be the same on all nodes.

In a single-user cluster, /etc/hadoop/conf/google-access-key.json is appropriate. Permissions for the file should be set to 444.

If you need to use this option with a multi-user cluster, place the file in each user's home directory instead: ${USER_HOME}/.credentials/storage.json. Permissions for the file should be set to 400.
There are many ways to place the file on the hosts. For example, you can create a `hosts` file listing all the hosts, one per line, and then run the following:
for host in $(cat hosts); do scp -i <Path_to_ssh_private_key> google-access-key.json <Ssh_user>@"$host":/etc/hadoop/conf/google-access-key.json; done
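After copying the key, it is worth verifying the path and permissions on each node, since a wrong mode or an invalid key file is a common cause of later failures. The following is a minimal sketch; the `check_keyfile` helper and the python3 JSON check are illustrative conveniences, not part of the connector:

```shell
#!/bin/sh
# Sanity-check a service account key file on a node.

check_keyfile() {
  keyfile="$1"
  # The file must exist at the agreed absolute path
  [ -f "$keyfile" ] || { echo "missing: $keyfile"; return 1; }
  # Permissions should be 444 (single-user) or 400 (multi-user)
  mode=$(stat -c '%a' "$keyfile" 2>/dev/null || stat -f '%Lp' "$keyfile")
  case "$mode" in
    444|400) ;;
    *) echo "warning: mode is $mode, expected 444 or 400" ;;
  esac
  # The key must be well-formed JSON
  python3 -m json.tool "$keyfile" >/dev/null 2>&1 || { echo "not valid JSON: $keyfile"; return 1; }
  echo "ok: $keyfile"
}

# Example (run on a cluster node):
# check_keyfile /etc/hadoop/conf/google-access-key.json
```

You can run the check over the same `hosts` file used for the copy, for example via `ssh <Ssh_user>@$host`.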
In the Ambari web UI, set the following properties under Custom core-site.
To set these properties, navigate to HDFS > Configs > Custom core-site and click Add Property for each one. The JSON and the P12 properties cannot be set at the same time.
If using a key in the JSON format (recommended), set the following properties:
fs.gs.auth.service.account.json.keyfile=<Path-to-the-JSON-file>
fs.gs.working.dir=/
fs.gs.path.encoding=uri-path
fs.gs.reported.permissions=777
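Ambari writes Custom core-site properties into core-site.xml, so if you manage configuration by hand the JSON variant would look roughly like the fragment below. The keyfile path shown is the single-user example from this guide; substitute your own.

```xml
<!-- Equivalent core-site.xml entries for the JSON key (sketch) -->
<property>
  <name>fs.gs.auth.service.account.json.keyfile</name>
  <value>/etc/hadoop/conf/google-access-key.json</value>
</property>
<property>
  <name>fs.gs.working.dir</name>
  <value>/</value>
</property>
<property>
  <name>fs.gs.path.encoding</name>
  <value>uri-path</value>
</property>
<property>
  <name>fs.gs.reported.permissions</name>
  <value>777</value>
</property>
```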
If using a key in the P12 format, set the following properties:
fs.gs.auth.service.account.email=<Your-Service-Account-email>
fs.gs.auth.service.account.keyfile=<Path-to-the-p12-file>
fs.gs.working.dir=/
fs.gs.path.encoding=uri-path
fs.gs.reported.permissions=777
Note:

Setting fs.gs.working.dir configures the initial working directory of a GHFS instance. This should always be set to "/".

Setting fs.gs.path.encoding sets the path encoding to be used and allows for spaces in the filename. This should always be set to "uri-path".

Setting fs.gs.reported.permissions sets the permissions reported for file listings when using gs. The default of 700 may be too restrictive for some processes that perform file-based checks.
Save the configuration change and restart the affected services. Depending on which services you use, you must also restart other services that access cloud storage, such as Spark Thrift Server, HiveServer2, and Hive Metastore. Ambari does not list these as affected, but they require a restart to pick up the configuration changes.
Test access to the Google Cloud Storage bucket by running a few commands from any cluster node. For example, list the bucket contents (replace "mytestbucket" with the name of your bucket):
hadoop fs -ls gs://mytestbucket/
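A listing alone does not exercise writes. If you want a slightly fuller smoke test, a sequence like the following checks write, read, and delete against the bucket; the `smoke-test` directory name is just an example, and the commands assume a local file /tmp/smoke.txt exists:

```
hadoop fs -mkdir gs://mytestbucket/smoke-test
hadoop fs -put /tmp/smoke.txt gs://mytestbucket/smoke-test/
hadoop fs -cat gs://mytestbucket/smoke-test/smoke.txt
hadoop fs -rm -r gs://mytestbucket/smoke-test
```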
After performing these steps, you should be able to start working with the Google Cloud Storage bucket(s).