Create an S3-based external table

CDP Public Cloud sets up S3Guard for you when you create a table. This action prevents data inconsistency caused by the S3 eventual consistency model.

In this task, you create a partitioned, external table and load data from the source on S3. You can use the LOCATION clause in the CREATE TABLE to specify the location of external table data. The metadata is stored in the Hive warehouse.
  • Set up Hive policies in Ranger to include S3 URLs.
  1. Put data source files on S3.
  2. Create an external table based on the data source files.
    CREATE EXTERNAL TABLE `inventory`(
      `inv_item_sk` int,
      `inv_warehouse_sk` int,
      `inv_quantity_on_hand` int)
      PARTITIONED BY (
      `inv_date_sk` int) STORED AS ORC
      LOCATION
      's3a://BUCKET_NAME/tpcds_bin_partitioned_orc_200.db/inventory';