Create an S3-based table

CDP Public Cloud sets up S3Guard for you when you create a table. In CDP Data Center, you must set up S3Guard yourself before you create an external table whose data source is on S3. S3Guard prevents the data inconsistencies that the S3 eventual consistency model can otherwise cause, such as a newly written file not yet appearing in a directory listing.

This task assumes that you work in CDP Data Center. In this task, you create a partitioned external table and load data from a source on S3. Use the LOCATION clause of the CREATE TABLE statement to specify where the external table data resides. Hive stores the table metadata in the Hive metastore, while the data itself stays on S3.
Before you begin:
  • Set up S3Guard.
  • Set up Hive policies in Ranger to include S3 URLs.
  1. Upload the data source files to S3.
  2. Create an external table based on the data source files. For example:
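    -- Replace BUCKET_NAME with the name of your S3 bucket.
    -- The partition column inv_date_sk is declared only in PARTITIONED BY, not in the column list.
    CREATE EXTERNAL TABLE `inventory`(
      `inv_item_sk` int,
      `inv_warehouse_sk` int,
      `inv_quantity_on_hand` int)
    PARTITIONED BY (
      `inv_date_sk` int)
    STORED AS ORC
    LOCATION
      's3a://BUCKET_NAME/tpcds_bin_partitioned_orc_200.db/inventory';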
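The table is partitioned, so queries may return no rows until the partition directories that already exist on S3 are registered in the metastore. The following is a minimal sketch of one way to do this. It assumes the source files are laid out in Hive-style subdirectories such as `inv_date_sk=2450815/` under the table LOCATION. The partition value 2450815 and its path are hypothetical placeholders, not values taken from this task.

```sql
-- Assumption: data files live under
-- s3a://BUCKET_NAME/tpcds_bin_partitioned_orc_200.db/inventory/inv_date_sk=<value>/

-- Scan the table LOCATION and add any partition directories missing from the metastore.
MSCK REPAIR TABLE inventory;

-- Alternatively, register one partition explicitly (hypothetical value 2450815).
ALTER TABLE inventory ADD IF NOT EXISTS PARTITION (inv_date_sk=2450815)
  LOCATION 's3a://BUCKET_NAME/tpcds_bin_partitioned_orc_200.db/inventory/inv_date_sk=2450815';

-- Check that the partitions are registered and that the data is readable.
SHOW PARTITIONS inventory;
SELECT COUNT(*) FROM inventory;
```

MSCK REPAIR TABLE discovers every partition in one pass, which suits a bulk load. However, it can be slow when the LOCATION holds many directories on S3. ALTER TABLE ... ADD PARTITION avoids that scan when you add partitions one at a time.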