Example: Connect a Spark session to Hive Metastore in a Data Lake

After the Admin sets up the correct permissions, you can access the Data Lake from your project, as this example shows.

Make sure you have access to a Data Lake containing your data.

  1. Create a project in your ML Workspace.
  2. Create a file named spark-defaults.conf, or update the existing file with the property:
    • For S3: spark.yarn.access.hadoopFileSystems=s3a://STORAGE LOCATION OF ENV>
    • For ADLS: spark.yarn.access.hadoopFileSystems=abfs://STORAGE CONTAINER OF ENV>@STORAGE ACCOUNT OF ENV>
    Use the same location you defined in Data Access.
  3. Start a session (Python or Spark) and start a Spark session.

Setting up the project looks like this:

Now you can run Spark SQL commands. For example, you can:

  • Create a database foodb.

  • List databases and tables.

  • Create a table bartable.

  • Insert data into the table.

  • Query the data from the table.