Accessing Cloud DataPDF version

Configure Access to GCS from Your Cluster

After obtaining the service account key, perform these steps on your cluster. The steps below assume that your service account key is called google-access-key.json. If you choose a different name, make sure to update the commands accordingly.

  1. In Cloudera Manager UI, set the following three properties under hdfs core-site.xml. Navigate to Clusters > HDFS > Configuration > Advanced > Add property under Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml.
    • Set the following properties: fs.gs.auth.service.account.email=<VALUE1> fs.gs.auth.service.account.private.key.id=<VALUE2> fs.gs.auth.service.account.private.key=<VALUE3> You can obtain the values for these parameters as follows:
      Parameter Value
      fs.gs.auth.service.account.email The client_email field extracted from the credential's JSON file.
      fs.gs.auth.service.account.private.key.id The private_key_id field extracted from the credential's JSON file.
      fs.gs.auth.service.account.private.key The private_keyfield extracted from the credential's JSON file.
      The JSON key file downloaded in the previous step is in plain text. Open the file in your favorite text editor to extract the relevant values needed in the above configuration.
    • Configure the following properties if they're not already set by Cloudera Manager fs.gs.working.dir=/ fs.gs.path.encoding=uri-path
  2. Save the configuration change and restart affected services. Additionally - depending on what services you are using - you must restart other services that access cloud storage such as Spark Thrift Server, HiveServer2, and Hive Metastore; These will not be listed as affected by Cloudera Manager, but require a restart to pick up the configuration changes.
  3. Test access to the Google Cloud Storage bucket by running a few commands from any cluster node. For example, you can use the command listed below (replace “mytestbucket” with the name of your bucket): hadoop fs -ls gs://mytestbucket/

After performing these steps, you should be able to start working with the Google Cloud Storage bucket(s).