Partitions and performance
A brief description of partitions and the performance benefits includes characters you must avoid when creating a partition. Examples of creating a partition and inserting data in a partition introduce basic partition syntax. Best practices for partitioning are mentioned.
A table you create without partitioning puts the data in a single directory. Partitioning
   divides the data into multiple directories. Queries of one or more columns based on the
   directories can run faster. Lengthy full table scans are avoided. Only data in the relevant
   directory is scanned. For example, a school_records table partitioned on a
    year column, segregates values by year into separate directories. A
    WHERE condiition such as YEAR=2020, YEAR IN
    (2020,2019), or YEAR BETWEEN 2001 AND 2010 scans only the data in the
   appropriate directory to resolve the query. Using partitions typically improves query
   performance.
CREATE TABLE sale(id in, amount decimal) PARTITIONED BY (xdate string, state string);
   To insert data into this table, you specify the partition key for fast loading:
   INSERT INTO sale (xdate='2016-03-08', state='CA') SELECT * FROM staging_table WHERE xdate='2016-03-08' AND state='CA'; You
   do not need to specify dynamic partition columns. Hive generates a partition specification if you
   enable dynamic partitions. Examples of a query on partitioned data
INSERT INTO sale (xdate, state)
SELECT * FROM staging_table;Follow these best practices when you partition tables and query partitioned tables:
- Never partition on a unique ID.
 - Size partitions to greater than or equal to 1 GB on average.
 - Design queries to process not more than 1000 partitions.
 
Invalid Characters in a Partition Name
- colon
 - question mark
 - percent
 
If you use these characters in a partition name, your directories will be named using the URL encoding of these characters, as described in "Why some special characters should not be used in a partition name in Hive/Impala."
