Accessing data with Spark

When you are using Cloudera Data Warehouse, you can access your data over a Java Database Connectivity (JDBC) connection.

JDBC is useful in the following cases:

Use JDBC connections when the data is protected by fine-grained access control.
Use JDBC when the amount of data sent over the network is on the order of tens of thousands of rows.

Add the Python code described below to the session where you want to use the data, and update the code with the location of your data.

Permissions

In addition, check with your administrator that you have the correct permissions to access the data lake. You need a role with read-only access.

Setting up a JDBC connection

When you use a JDBC connection, you read the data through a Hive or Impala Virtual Warehouse. You need to obtain the JDBC connection string and paste it into the script in your session.
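The JDBC URL you paste packs several settings into a single string. Apache Hive documents the general form as `jdbc:hive2://<host>:<port>/<dbName>;<sess_var_list>?<hive_conf_list>#<hive_var_list>`. The minimal sketch below assumes a Hive Virtual Warehouse URL that starts with `jdbc:hive2://` and splits it into its host, database, and session settings so you can check what you copied. `parse_hive_jdbc_url` and `HiveJdbcUrl` are hypothetical helpers, not part of any Cloudera library, and URLs with a different prefix are simply rejected.

```python
from typing import Dict, NamedTuple

PREFIX = "jdbc:hive2://"


class HiveJdbcUrl(NamedTuple):
    """Pieces of a jdbc:hive2:// connection string (hypothetical helper type)."""
    hosts: str                    # host[:port], possibly a comma-separated list
    database: str                 # database name; "default" when left empty
    session_vars: Dict[str, str]  # ";key=value" pairs after the database name


def parse_hive_jdbc_url(url: str) -> HiveJdbcUrl:
    """Split a URL of the form jdbc:hive2://host:port/db;key=value;... (sketch)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"expected a URL starting with {PREFIX!r}")
    rest = url[len(PREFIX):]
    # Everything before the first "/" is the host list.
    hosts, _, path = rest.partition("/")
    # Drop the optional ?hive_conf_list and #hive_var_list sections;
    # this sketch does not parse them.
    path = path.split("?", 1)[0].split("#", 1)[0]
    # The database name comes first, followed by ";"-separated session variables.
    database, *pairs = path.split(";")
    session_vars: Dict[str, str] = {}
    for pair in pairs:
        if not pair:  # tolerate a trailing ";"
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed session variable: {pair!r}")
        session_vars[key] = value
    return HiveJdbcUrl(hosts, database or "default", session_vars)
```

For a made-up URL such as `jdbc:hive2://hs2.example.com:443/default;transportMode=http;ssl=true`, the helper returns hosts `"hs2.example.com:443"`, database `"default"`, and session variables `{"transportMode": "http", "ssl": "true"}`. Inspecting the result is a quick way to confirm that you pasted the complete string, including any settings after the database name.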

In Cloudera Data Warehouse, go to the Hive database containing your data.
From the kebab menu, click Copy JDBC URL.
Paste it into the script in your session.
Enter your user name and password in the script. Rather than hardcoding these values in the script, store them in environment variables.
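Reading the credentials from environment variables keeps them out of the script itself. The following is a minimal sketch that assumes two hypothetical variable names, `JDBC_USER` and `JDBC_PASSWORD`; use whatever names your project defines. It fails early with a message that names every missing variable, instead of failing later when the connection is attempted.

```python
import os
from typing import Mapping, Tuple

# Hypothetical variable names; replace them with the names you set up.
USER_VAR = "JDBC_USER"
PASSWORD_VAR = "JDBC_PASSWORD"


def get_jdbc_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (user, password) read from environment variables.

    Raises RuntimeError naming each variable that is unset or empty.
    """
    missing = [name for name in (USER_VAR, PASSWORD_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))
    return env[USER_VAR], env[PASSWORD_VAR]
```

In the script, `user, password = get_jdbc_credentials()` replaces the hardcoded strings. Passing a plain dictionary as `env` makes the function easy to check without touching the real environment.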
