Using Apache Hive
Also available as:
PDF

Create partitions dynamically

You can configure Hive to create partitions dynamically and then run a query that creates the related directories on the file system, such as HDFS or S3. Hive then separates the data into the directories.

This example assumes you have the following CSV file named employees.csv to use as the data source:
1,jane doe,engineer,service
2,john smith,sales rep,sales
3,naoko murai,service rep,service
4,somporn thong,ceo,sales
5,xi singh,cfo,finance            
  1. Put the CSV file on a file system, for example in HDFS at/user/hive/dataload/employee, and change permissions.
    hdfs dfs -mkdir /user/hive/dataload/employee
    hdfs dfs -chmod 777 /user/hive/dataload/employee
    hdfs dfs -put employees.csv /user/hive/dataload/employee/employees.csv
    hdfs dfs -chmod 777 /user/hive/dataload/employee/employees.csv              
  2. Launch Beeline and in the Hive shell, create an unpartitioned table that holds all the data.
    CREATE EXTERNAL TABLE employees (eid int, name string, position string, dept string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/user/hive/dataload/employee';                   
  3. Check that the data loaded into the employees table.
    SELECT * FROM employees;
    The output, formatted to fit this publication, appears:
    
    +------+---------------+-------------+-------+---------+
    | eid  |    name       | position    | dept  |         |
    +------+---------------+-------------+-------+---------|
    | 1    | jane doe      | engineer    | service         |
    | 2    | john smith    | sales rep   | sales           |
    | 3    | naoko murai   | service rep | service         |
    | 4    | somporn thong | ceo         | sales           |
    | 5    | xi singh      | cfo         | finance         |
    +------+---------------+-------------+-----------------+
  4. Create a partition table.
    CREATE EXTERNAL TABLE EMP_PART (eid int, name string, position string) 
      PARTITIONED BY (dept string);
  5. Set the dynamic partition mode to create partitioned directories of data dynamically when data is inserted.
    SET hive.exec.dynamic.partition.mode=nonstrict;
  6. Insert data from the unpartitioned table (all the data) into the partitioned table , dynamically creating the partitions.
    INSERT INTO TABLE EMP_PART PARTITION (DEPT)
      SELECT eid,name,position,dept FROM employees;
    Partitions are created dynamically.
  7. Check that the partitions were created.
    SHOW PARTITIONS emp_part;
    +----------------+
    |   partition    |
    +----------------+
    | dept=finance   |
    | dept=sales     |
    | dept=service   |
    +----------------+
  8. Check the corresponding partition directories on HDFS.
    hdfs dfs -ls /warehouse/tablespace/external/hive/emp_part
    drwxrwxrwx+  - hive hadoop          0 2019-06-13 22:30 /warehouse/tablespace/external/hive/emp_part/dept=finance 
    drwxrwxrwx+  - hive hadoop          0 2019-06-13 22:30 /warehouse/tablespace/external/hive/emp_part/dept=sales
    drwxrwxrwx+  - hive hadoop          0 2019-06-13 22:30 /warehouse/tablespace/external/hive/emp_part/dept=service           
    You cannot remove the directories in the warehouse owned by hive.