Configuring dbt-hive adapter and dbt profiles to Hive warehouse

This topic guides you through configuring your Cloudera Data Warehouse Hive environment and dbt profiles to enable seamless data operations.

To effectively use dbt with Hive in a Cloudera environment, you must prepare your Hive Virtual Warehouse and then set up dbt connection profiles. This involves updating your profiles.yml file with connection details, such as host, port, schema, and authentication method.
  • You must have git and pip installed if you plan to use dbt project examples.
  • The dbt-hive adapter must be installed. See, dbt adapter installation.
  • You must have the necessary authentication credentials, such as an LDAP username and password or Kerberos principal and keytab, and the required permissions for your Hive schema.
  1. Create a machine user and set its workload password. This machine user will be the identity running your queries.
    1. For steps to create a machine user, see Creating a Machine User.
    2. For steps to set the workload password, see Setting the workload password.
  2. Create a Hive Virtual Warehouse.
    Connect it to the default database catalog. Choose the virtual warehouse type as Hive.
    1. For steps to create a Hive Virtual Warehouse, see Adding a new Virtual Warehouse.
      Once created, ensure the Virtual Warehouse is in the running state.
  3. Copy the JDBC URL for your Virtual Warehouse. From your created Virtual Warehouse, click and click Copy JDBC URL. The URL must look similar to the following example:
    jdbc:hive2://hs2-dbt-hive.xxxx.xxx/default;transportMode=http;httpPath=cliservice;socketTimeout=60;ssl=true;retries=3;
  4. Locate or create your dbt profiles.yml file.
    dbt requires a profile to connect to your data warehouse. This file can be found in a .dbt directory in your home directory. If you did not use dbt before, create the directory using the mkdir ~/.dbt command and then create the profiles.yml file using your preferred text editor. For more information about dbt profiles, see Configure your profile | dbt Developer Hub.
  5. Extract Hive connection details from your JDBC URL.
    You need your workload username and workload password. Add the host field in the template from the Cloudera UI. The hostname is hs2-dbt-hive.xxxx.xxx.
  6. Add the following template to your profiles.yml file and replace the placeholder values, such as xxxx.xxx, with your specific details, including the extracted hostname, workload username, and password.
    dbt_hive_demo:
      outputs:
        dev_hive:
          type: hive
          use_http_transport: true
          use_ssl: true
          auth_type: ldap
          schema: explain_merge # The database schema for your demo
          threads: 4
          user: test_user # Your workload username
          password: xxxxx # Your workload password
          host: hs2-dbt-hive.xxxx.xxx # Your Hive hostname
          port: 443
          http_path: cliservice
      target: dev_hive
    
    Table 1. Parameters for your profiles.yml
    Parameter Description
    type Specifies the adapter type for Hive.
    schema The database schema dbt will connect to.
    host The hostname or IP address of your HiveServer2.
    port The port for HiveServer2 (default is 443).
    auth_type Specifies the authentication method, such as LDAP.
    user The username used for authentication.
    password The password used for authentication.
    use_http_transport Set to true to use HTTP transport instead of binary. The default is false.
    http_path The HTTP endpoint path if use_http_transport is set to true, for example, cliservice.
    use_ssl This parameter enables or disables SSL/TLS encryption for the connection.
    threads The number of concurrent connections dbt can open.
Your Cloudera Data Warehouse Hive environment and dbt profiles are now configured. dbt is ready to connect to your Hive Virtual Warehouse.