Meeting the prerequisites

Before the Data Analysts can explore and query data, your central or departmental IT must have onboarded to CDP Public Cloud and must meet the requirements listed in this section.

To do in the CDP Management Console

  1. Make sure that the Cloudera Data Lake is created and running. For more information, see Creating an AWS environment with a medium duty data lake using the CLI.
  2. Grant EnvironmentUser role to the Data Analyst user and synchronize the user to FreeIPA. For more information, see Assigning account roles to users and Performing user sync.

To do in Cloudera Data Warehouse (CDW)

  1. Make sure that the CDW environment is activated. For more information, see Activating AWS environments.
  2. Add an S3 bucket as an external bucket in the CDW environment with read-only access.
    1. Go to CDW service > Environments and click its edit icon.
    2. On the Environment Details page, type the name of the AWS bucket you want to configure access to in the Add External S3 Bucket text box and select read-only access mode.
    3. Click Add Bucket and then click Apply to update the CDW environment.
  3. Create a non-default Database Catalog with the Load Demo Data option enabled.
    1. Go to CDW service > Database Catalogs and click Add New.
    2. Specify a name, select an environment, select SDX from the Datalake drop-down list and enable the Load Demo Data option, and click CREATE.
  4. Create an Impala-based Virtual Warehouse with the Enable Data Visualization option and make sure it is in the running state.
    1. Go to CDW service > Virtual Warehouses and click Add New.
    2. Select the type as IMPALA, select the Database Catalog associated with the Virtual Warehouse, select the size, click Enable Data Visualization, and then click CREATE.
  5. Enable the S3 File Browser for Hue.
    1. Go to CDW service > Virtual Warehouses, select your Virtual Warehouse and click edit.
    2. On the Virtual Warehouses page, go to CONFIGURATIONS > Hue and select hue-safety-valve from the drop-down list.
    3. Add the following configuration for Hive or Impala Virtual Warehouse in the space provided and click APPLY:
      [desktop]
      # Remove the file browser from the blocked list of apps.
      # Tweak the app_blacklist property to suit your app configuration.
      app_blacklist=spark,zookeeper,hive,hbase,search,oozie,jobsub,pig,sqoop,security
      
      [aws]
      [[aws_accounts]]
      [[[default]]]
      access_key_id=[***AWS-ACCESS-KEY***]
      secret_access_key=[***SECRET-ACCESS-KEY***]
      region=[***AWS-REGION***]
  6. Enable a link to the Data VIZ application (Cloudera Data Visualization) from the Hue web interface or provide the Data VIZ application URL to the Data Analyst.
    1. Go to CDW service > Virtual Warehouses, select your Virtual Warehouse and click edit.
    2. On the Virtual Warehouses details page, go to CONFIGURATIONS > Hue and select hue-safety-valve from the drop-down list.
    3. Add the following lines in the safety valve and click APPLY:
      [desktop]
        custom_dashboard_url=[***DATA-VIZ-URL***]

To do in Ranger

Grant the required DDL and DML Hadoop SQL policies to the Data Analyst user in Ranger.

  1. Go to CDW service > Database Catalogs, click the more option, and click Open Ranger.
  2. On the Ranger Service Manager page, click Hadoop SQL.
  3. Select the all - url policy.

    The Edit Policy page is displayed.

  4. Under the Add Conditions section, add the users under the Select User column and add permissions such as Create, Alter, Drop, Select, and so on from the Permissions column.
  5. Scroll to the bottom of the page and click Save.

Other prerequisites

  • Data in a file to be ingested is in a structured format, such as CSV, with headers.
  • The data file is available to the Data Analyst on their computer or accessible on a shared drive with permissions already granted.
  • Hue application URL is shared with the Data Analyst, if a link to the Data Visualization application is not enabled in Hue.