Creating tables by importing CSV files from ABFS

You can create tables in Hue by importing CSV files stored in ABFS. Hue automatically detects the schema and the column types, so you can create tables without writing a CREATE TABLE statement.

Only Hue Superusers can access ABFS and import files to create tables.

The maximum file size supported is three gigabytes.
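
If you upload files from your computer, a quick local check can catch oversized files before the upload. The following is a minimal sketch, not part of Hue. The helper names are hypothetical, and it assumes a gigabyte means 10**9 bytes, which this document does not specify.

```python
import os

# Limit stated in this document: three gigabytes.
# ASSUMPTION: 1 GB = 10**9 bytes (the stricter reading); the exact unit
# that Hue applies is not stated here.
MAX_IMPORT_BYTES = 3 * 10**9


def fits_import_limit(size_bytes: int, limit: int = MAX_IMPORT_BYTES) -> bool:
    """Return True if a file of size_bytes is within the import limit."""
    return 0 <= size_bytes <= limit


def local_file_fits(path: str) -> bool:
    """Check the size of a file on the local disk before uploading it."""
    return fits_import_limit(os.path.getsize(path))
```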

  1. Create a storage account from the Microsoft Azure portal. On the Create storage account > Advanced page, enable Data Lake Storage Gen2. This option lets you organize the objects and files in your account into a hierarchy of directories and nested subdirectories, like the file system on your computer.
  2. While registering an Azure environment in CDP Management Console, set the Storage Location Base in the Data Access section as follows:
    abfs://storage-fs@[***AZURE-STORAGE-ACCOUNT-NAME***].dfs.core.windows.net

    This location is used to read and store data.

  3. Enable the ABFS file browser in Hue.
    1. Sign in to Cloudera Data Warehouse.
    2. Go to the Virtual Warehouse from which you want to access the ABFS containers and click its edit icon.
    3. On the Virtual Warehouses detail page, go to the Hue tab and select hue-safety-valve from the drop-down menu.
    4. Add the following configuration for Hive or Impala Virtual Warehouse in the space provided:
      For Hive Virtual Warehouse:
      [desktop]
      # Remove the file browser from the blacklisted apps.
      # Tweak the app_blacklist property to suit your app configuration.
      app_blacklist=oozie,search,hbase,security,pig,sqoop,spark,impala
      [azure]
        [[azure_accounts]]
          [[[default]]]
            client_id=[***AZURE-ACCOUNT-CLIENT-ID***]
            client_secret=[***AZURE-ACCOUNT-CLIENT-SECRET***]
            tenant_id=[***AZURE-ACCOUNT-TENANT-ID***]
      
        [[abfs_clusters]]
          [[[default]]]
            fs_defaultfs=abfs://[***CONTAINER-NAME***]@[***AZURE-STORAGE-ACCOUNT-NAME***].dfs.core.windows.net
            webhdfs_url=https://[***AZURE-STORAGE-ACCOUNT-NAME***].dfs.core.windows.net/
      For Impala Virtual Warehouse:
      [desktop]
      # Remove the file browser from the blacklisted apps.
      # Tweak the app_blacklist property to suit your app configuration.
      app_blacklist=spark,zookeeper,hive,hbase,search,oozie,jobsub,pig,sqoop,security
      [azure]
        [[azure_accounts]]
          [[[default]]]
            client_id=[***AZURE-ACCOUNT-CLIENT-ID***]
            client_secret=[***AZURE-ACCOUNT-CLIENT-SECRET***]
            tenant_id=[***AZURE-ACCOUNT-TENANT-ID***]
      
        [[abfs_clusters]]
          [[[default]]]
            fs_defaultfs=abfs://[***CONTAINER-NAME***]@[***AZURE-STORAGE-ACCOUNT-NAME***].dfs.core.windows.net
            webhdfs_url=https://[***AZURE-STORAGE-ACCOUNT-NAME***].dfs.core.windows.net/

      Make sure that the container name and the Azure storage account name that you specify under the abfs_clusters section are the same as the ones you specified under Data Access > Storage Location Base while activating the Azure environment, so that Hive or Impala has permission to access the uploaded files.

    5. Click Apply in the upper right corner of the page.

      The ABFS File Browser icon appears in the left Assist panel of the Hue web UI after the Virtual Warehouse restarts.
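
Because fs_defaultfs must name the same container and storage account as the Storage Location Base, it can help to compare the two values before you apply the configuration. The following is a minimal sketch using only the Python standard library. The function names and the account name myaccount are hypothetical. Accepting the abfss scheme is an assumption, because this document only uses abfs.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Host suffix used by the URIs shown in this document.
DFS_SUFFIX = ".dfs.core.windows.net"


def parse_abfs_uri(uri: str) -> Tuple[str, str]:
    """Split abfs://<container>@<account>.dfs.core.windows.net[/path]
    into (container, account). Raises ValueError on other shapes."""
    parts = urlsplit(uri.strip())
    # ASSUMPTION: abfss (the TLS variant) is accepted alongside abfs.
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    container, host = parts.username, parts.hostname
    if not container or not host or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"unexpected ABFS URI shape: {uri!r}")
    return container, host[: -len(DFS_SUFFIX)]


def same_storage_target(storage_location_base: str, fs_defaultfs: str) -> bool:
    """True if both URIs name the same container and storage account."""
    return parse_abfs_uri(storage_location_base) == parse_abfs_uri(fs_defaultfs)
```

For example, same_storage_target("abfs://storage-fs@myaccount.dfs.core.windows.net", "abfs://storage-fs@myaccount.dfs.core.windows.net/") returns True, while a different container or account name returns False.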

  1. On the Overview page of the CDW service, select the Virtual Warehouse in which you want to create the table, click the options menu in the upper right corner, and click Open Hue.
  2. From the left Assist panel, click Importer.
  3. On the Importer screen, click .. at the end of the Path field.
    The Choose a file pop-up is displayed.
  4. Type abfs://[***CONTAINER-NAME***] in the address text box and press Enter.
    The ABFS containers created under the Azure storage account are displayed.
    You can narrow down the list of results using the search option.
    If the file is on your computer, you can upload it to ABFS by clicking Upload a file.
  5. Select the CSV file that you want to import into Hue.
    Hue displays a preview of the table along with its format.
    Hue automatically detects the field separator, record separator, and the quote character from the CSV file. To override a setting, select a different value from the drop-down menu.
  6. Click Next.
    On this page, you can set the table destination and partitions, and change the column data types.
  7. Verify the settings and click Submit to create the table.
    The CREATE TABLE query is triggered.
    Hue displays the logs and, when the operation completes successfully, opens the Table Browser, from which you can view the newly created table.
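
To get a feel for what the importer's format and type detection involves, the following sketch detects the field separator and quote character of a small sample and guesses a type for each column. It is not Hue's implementation. It uses Python's standard csv.Sniffer, the function names and sample data are hypothetical, and the three type names are only illustrative Hive and Impala types. It does not detect the record separator.

```python
import csv
import io

# Hypothetical sample: semicolon-separated, with one quoted field.
SAMPLE = 'id;name;price\n1;"Widget; large";9.99\n2;Gadget;12.50\n'


def infer_column_type(values):
    """Guess 'bigint', 'double', or 'string' from a list of cell strings."""
    def all_parse(cast):
        for value in values:
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if values and all_parse(int):
        return "bigint"
    if values and all_parse(float):
        return "double"
    return "string"


def preview(sample):
    """Return (delimiter, quotechar, [(column_name, guessed_type), ...])."""
    # Restrict the candidate separators so the sniffer cannot pick a letter.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(sample), dialect))
    header, data = rows[0], rows[1:]
    # Transpose the data rows into columns; assumes every row has the same width.
    columns = [list(col) for col in zip(*data)]
    types = [infer_column_type(col) for col in columns]
    return dialect.delimiter, dialect.quotechar, list(zip(header, types))
```

Calling preview(SAMPLE) reports a semicolon separator and a double-quote character, with id guessed as bigint, name as string, and price as double. A guess based on a few rows can be wrong for the full file, which is why the importer lets you override the detected settings and column types.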