Creating an external table

You see how to put your source data on the S3 object store using Hue. From the Hue file browser, you copy the location of the data to create the LOCATION clause in the CREATE EXTERNAL TABLE statement. From Hive or Impala, you run the SQL query to create an external table from Hive or Impala from source data.

In this task, you create a CSV file of data and upload data to the CDP One object store S3. You run the CREATE TABLE statement. The metadata is stored in the metastore in the warehouse. After uploading a file to S3, delete it when you are finished to reduce cost.

Set up Hadoop SQL policies in Ranger to include S3 URLs.
  1. Create a text file that contains CSV (comma-separated values) data.
    For example, create a file called students.csv that has the following lines of text:
    1,jane,doe,senior,mathematics
    2,john,smith,junior,engineering
  2. Log into CDP One, and click All Services > Query & Notebooks > Upload.
  3. In the Hue Editor, click S3.
  4. In the Hue File Browser, click Upload, browse for, and select the CSV file you created.
  5. Click New > Directory, and create a directory named andrena.
  6. Drag the students.csv file to the andrena directory.
  7. Click the andrena directory, check that the CSV is in place, and copy the path.
    For example, copy s3a://cdpsaasdemo-cdp-private-default-1ogvplm/user/andrena.
  8. Create an external table in the default storage format of the SQL engine you are using.
    Hive or Impala example:
    CREATE EXTERNAL TABLE IF NOT EXISTS names_text( 
    student_ID INT,
    FirstName STRING, 
    LastName STRING, 
    year STRING, 
    Major STRING) 
    COMMENT 'Student Names' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
    STORED AS TEXTFILE LOCATION 's3a://cdpsaasdemo-cdp-private-default-1ogvplm/user/andrena';
    From Hive, data is stored in ORC (Optimized Row Columnar) format by default. From Impala, data is stored in Parquet format.
  9. Query the table to see the data.
    SELECT * FROM names_text;
    Output looks something like this:
    +------------------------+-----------------------+----------------------+------------------+-------------------+
    | names_text.student_id  | names_text.firstname  | names_text.lastname  | names_text.year  | names_text.major  |
    +------------------------+-----------------------+----------------------+------------------+-------------------+
    | 1                      | jane                  | doe                  | senior           | mathematics       |
    | 2                      | john                  | smith                | junior           | engineering       |
    +------------------------+-----------------------+----------------------+------------------+-------------------+
  10. In the Hue File Browser, select the file you uploaded, and click Actions > Delete