Creating an Ozone-based external table

You use the LOCATION clause in the CREATE EXTERNAL TABLE statement to create an external table having source data in Ozone.

In this task, you create an external table and load data from a .csv file that is stored in Ozone. You can use the LOCATION clause in the CREATE EXTERNAL TABLE statement to specify the location of the external table data. The metadata is stored in the Hive warehouse.
  • If you are running an insecure cluster, ensure that you set up the necessary policies and permissions in Ranger to give user access to Hive external files in Ozone.
  • You must have created a volume and bucket in Ozone. For example:
    ozone sh volume create vol1
    ozone sh bucket create vol1/bucket1
  • It is recommended that you set certain Hive configurations before querying Hive tables in Ozone.
  1. Create a text file named employee.csv that contains the following records.
    1,Andrew,45000,Technical Manager
    2,Sam,35000,Proof Reader
  2. Move the employee.csv file to a directory called employee_hive in the Ozone filesystem.
  3. Connect to the gateway node of the cluster and on the command line of the cluster, launch Beeline to start the Hive shell.
    beeline -u jdbc:hive2://myhiveserver.com:10000 -n hive -p
    The Hive 3 connection message appears, followed by the Hive prompt for entering queries on the command line.
  4. Create an external table schema definition that specifies the text format and loads data from employee.csv in ofs://ozone1/vol1/bucket1/employee_hive.
    CREATE EXTERNAL TABLE IF NOT EXISTS employee(
      employee_ID INT,
      name STRING,
      salary DOUBLE,
      designation STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 'ofs://ozone1/vol1/bucket1/employee_hive';
  5. Verify that the Hive warehouse stores the employee.csv records in the external table.
    SELECT * FROM employee;