Configuring dbt-impala adapter and dbt profiles to Impala warehouse

This topic guides you through configuring your Cloudera Data Warehouse Impala environment and dbt profiles to enable seamless data operations.

To effectively use dbt with Impala in a Cloudera environment, you must prepare your Impala Virtual Warehouse and then set up dbt connection profiles. This involves updating your profiles.yml file with connection details, such as host, port, schema, and authentication method.
  • You must have git and pip installed if you plan to use dbt project examples.
  • The dbt-impala adapter must be installed. See, dbt adapter installation.
  • You must have the necessary authentication credentials, such as an LDAP username and password or Kerberos principal and keytab, and the required permissions for your Impala schema.

  1. Create a machine user and set its workload password. This machine user will be the identity running your queries.
    1. For steps to create a machine user, see Creating a Machine User.
    2. For steps to set the workload password, see Setting the workload password.
  2. Create a Impala Virtual Warehouse.
    Connect it to the default database catalog. Choose the virtual warehouse type as Impala.
    1. For steps to create a Impala Virtual Warehouse, see Adding a new Virtual Warehouse.
      Once created, ensure the Virtual Warehouse is in the running state.
  3. Copy the JDBC URL for your Virtual Warehouse. From your created Virtual Warehouse, click and click the Copy JDBC URL. The URL must look similar to the following example:
    jdbc:impala://coordinator-impala-xxxxxx.xxx:443/default;AuthMech=3;transportMode=http;httpPath=cliservice;
  4. Locate or create your dbt profiles.yml file.
    dbt requires a profile to connect to your data warehouse. This file can be found in a .dbt directory in your home directory. If you did not use dbt before, create the directory using the mkdir ~/.dbt command and then create the profiles.yml file using your preferred text editor. For more information about dbt profiles, see Configure your profile | dbt Developer Hub.
  5. Extract Impala connection details from your JDBC URL.
    You need the workload credentials and Impala connection details collected earlier. For the host field in your profiles, extract only the Impala hostname from your JDBC URL, do not use the entire URL, the hostname to use is coordinator-impala-xxxxxx.xxx.

    The Impala profiles.yml template typically defaults to port: 21050, which is for direct Impala Thrift connections. If your JDBC URL uses port: 443 and transportMode=http, this indicates the connection is through an HTTP proxy or gateway. In such cases, use port: 443 in your profiles.yml file and set use_http_transport: true along with http_path: cliservice. If connecting directly to the Impala daemon, use port: 21050 and use_http_transport: false.

  6. Add the following template to your profiles.yml file. Replace the placeholder values, such as xxxxxx, with your specific details, including the extracted hostname, workload username, and password.
    dbt_impala_demo:
      outputs:
        dev_cdp:
          type: impala
          host: coordinator-impala-xxxxxx.xxx # Use your extracted Impala hostname
          http_path: cliservice
          port: 443
          auth_type: ldap
          use_http_transport: true
          use_ssl: true
          user: test_user # Your workload username
          password: xxxxxx # Your workload password
          schema: impala_demo # The database schema for your demo
          threads: 4
      target: dev_cdp
    
    Table 1. Parameters for your profiles.yml
    Parameter Description
    type Specifies the adapter type for Impala.
    schema The database schema dbt will connect to.
    host The hostname or IP address of your Impala coordinator.
    port The port for Impala (default is 443).
    auth_type Specifies the authentication method, such as LDAP.
    user The username used for authentication.
    password The password used for authentication.
    use_http_transport Set to true to use HTTP transport instead of binary. The default is false.
    http_path The HTTP endpoint path if use_http_transport is set to true, for example, cliservice.
    use_ssl This parameter enables or disables SSL/TLS encryption for the connection.
    threads The number of concurrent connections dbt can open.
Your Cloudera Data Warehouse Impala environment and dbt profiles are now configured. dbt is ready to connect to your Impala Virtual Warehouse using the specified credentials and settings.