Create, use, and drop an external table
You use an external table, which is a table that Hive does not manage, to import data from a file on a file system into Hive. In contrast to a Hive managed table, an external table keeps its data outside the Hive metastore. The Hive metastore stores only the schema metadata of the external table. Hive does not manage or restrict access to the actual external data.
You need to set up access to external tables in the file system using one of the following methods.
- Set up a Hive HDFS policy in Ranger (recommended) that includes the paths to external table data.
- Put an HDFS ACL in place (see link below).
Store a comma-separated values (CSV) file in HDFS to serve as the data source for the external table.
In this task, you create an external table from CSV data stored on the file system, depicted in the diagram below. Next, because you want Hive to manage and store the actual data in the Hive warehouse, you create a managed table and insert the external table data into it.
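The flow described above can be sketched in HiveQL as follows. This is a minimal illustration, not the exact statements for your cluster: the table names `students_ext` and `students`, the column list, and the HDFS path `/user/hive_user/students` are all hypothetical. Note that LOCATION points to the directory that contains the CSV file, not to the file itself.

```sql
-- Hypothetical names, columns, and path; adjust them to your environment.

-- 1. External table over the CSV data. Hive records only the schema
--    and the location; the files stay where they are.
CREATE EXTERNAL TABLE IF NOT EXISTS students_ext (
  student_id INT,
  name       STRING,
  gpa        DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive_user/students';

-- 2. Managed table with the same columns. Hive owns its data.
CREATE TABLE IF NOT EXISTS students (
  student_id INT,
  name       STRING,
  gpa        DOUBLE
);

-- 3. Copy the external data into the managed table.
INSERT INTO TABLE students
SELECT student_id, name, gpa
FROM students_ext;
```

After the INSERT completes, the managed table holds its own copy of the rows, so later changes to the CSV directory do not affect it.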
This task demonstrates the following Hive principles:
- The LOCATION clause in the CREATE TABLE statement specifies the location of external table data.
- A major difference between an external and a managed (internal) table is the persistence of table data on the file system after a DROP TABLE statement.
  - External table drop: Hive drops only the metadata, consisting mainly of the schema.
  - Managed table drop: Hive deletes the data and the metadata stored in the Hive warehouse.
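The difference in drop behavior can be illustrated with the sketch below. The table names `students_ext` (external) and `students` (managed) are hypothetical, and the comments assume the external table was not created with the `external.table.purge` table property set to `true`, which would cause Hive to delete the files as well.

```sql
-- Hypothetical table names for illustration only.

-- External table: only the metadata is removed from the metastore.
-- The files in the directory named by LOCATION remain on the file system.
DROP TABLE IF EXISTS students_ext;

-- Managed table: Hive removes the metadata and deletes the table data
-- from the Hive warehouse.
DROP TABLE IF EXISTS students;
```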
After dropping an external table, the data is not gone. To retrieve it, you issue another CREATE EXTERNAL TABLE statement to load the data from the file system.
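As a rough sketch of that recovery step, you can re-create the external table over the same directory and query it. The table name, columns, and path below are hypothetical; the schema and LOCATION must match the data that is still on the file system.

```sql
-- Hypothetical names and path; the schema must match the existing CSV files.
CREATE EXTERNAL TABLE IF NOT EXISTS students_ext (
  student_id INT,
  name       STRING,
  gpa        DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive_user/students';

-- The rows are readable again because the files were never deleted.
SELECT * FROM students_ext LIMIT 5;
```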