Configuring dbt-hive adapter and dbt profiles to Hive warehouse

This topic guides you through configuring your Cloudera Data Warehouse Hive environment and dbt profiles to enable seamless data operations.

To effectively use dbt with Hive in a Cloudera environment, you must prepare your Hive Virtual Warehouse and then set up dbt connection profiles. This involves updating your profiles.yml file with connection details, such as host, port, schema, and authentication method.

You must have git and pip installed if you plan to use dbt project examples.
The dbt-hive adapter must be installed. See, dbt adapter installation.
You must have the necessary authentication credentials, such as an LDAP username and password or Kerberos principal and keytab, and the required permissions for your Hive schema.

Create a machine user and set its workload password. This machine user will be the identity running your queries.
1. For steps to create a machine user, see Creating a Machine User.
2. For steps to set the workload password, see Setting the workload password.
note
Record the workload username and password for your machine user, for example, test_user for test_user_machine. These credentials are required for later steps.
Create a Hive Virtual Warehouse.
Connect it to the default database catalog. Choose the virtual warehouse type as Hive.
1. For steps to create a Hive Virtual Warehouse, see Adding a new Virtual Warehouse.
  Once created, ensure the Virtual Warehouse is in the running state.
Copy the JDBC URL for your Virtual Warehouse. From your created Virtual Warehouse, click and click Copy JDBC URL. The URL must look similar to the following example:
```
jdbc:hive2://hs2-dbt-hive.xxxx.xxx/default;transportMode=http;httpPath=cliservice;socketTimeout=60;ssl=true;retries=3;
```
note

This URL is required in later steps.
Locate or create your dbt profiles.yml file.
dbt requires a profile to connect to your data warehouse. This file can be found in a .dbt directory in your home directory. If you did not use dbt before, create the directory using the mkdir ~/.dbt command and then create the profiles.yml file using your preferred text editor. For more information about dbt profiles, see Configure your profile | dbt Developer Hub.
Extract Hive connection details from your JDBC URL.
You need your workload username and workload password. Add the host field in the template from the Cloudera UI. The hostname is hs2-dbt-hive.xxxx.xxx.

Add the following template to your profiles.yml file and replace the placeholder values, such as xxxx.xxx, with your specific details, including the extracted hostname, workload username, and password.

dbt_hive_demo:
  outputs:
    dev_hive:
      type: hive
      use_http_transport: true
      use_ssl: true
      auth_type: ldap
      schema: explain_merge # The database schema for your demo
      threads: 4
      user: test_user # Your workload username
      password: xxxxx # Your workload password
      host: hs2-dbt-hive.xxxx.xxx # Your Hive hostname
      port: 443
      http_path: cliservice
  target: dev_hive

Table 1. Parameters for your profiles.yml
Parameter	Description
type	Specifies the adapter type for Hive.
schema	The database schema dbt will connect to.
host	The hostname or IP address of your HiveServer2.
port	The port for HiveServer2 (default is 443).
auth_type	Specifies the authentication method, such as LDAP.
user	The username used for authentication.
password	The password used for authentication.
use_http_transport	Set to `true` to use HTTP transport instead of binary. The default is false.
http_path	The HTTP endpoint path if `use_http_transport` is set to true, for example, cliservice.
use_ssl	This parameter enables or disables SSL/TLS encryption for the connection.
threads	The number of concurrent connections dbt can open.

Your Cloudera Data Warehouse Hive environment and dbt profiles are now configured. dbt is ready to connect to your Hive Virtual Warehouse.