CHAR data type
A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If values are longer than the specified length, Impala truncates any trailing characters.
Syntax: In the column definition of a
CREATE TABLE statement:
The maximum length you can specify is 255.
Semantics of trailing spaces:
When you store a
CHARvalue shorter than the specified length in a table, queries return the value padded with trailing spaces if necessary; the resulting value has the same length as specified in the column definition.
Leading spaces in
CHARare preserved within the data file.
If you store a
CHARvalue containing trailing spaces in a table, those trailing spaces are not stored in the data file. When the value is retrieved by a query, the result could have a different number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the specified length of the column.
If you compare two
CHARvalues that differ only in the number of trailing spaces, those values are considered identical.
When comparing or processing
CAST()truncates any longer string to fit within the defined length. For example:
SELECT CAST('x' AS CHAR(4)) = CAST('x ' AS CHAR(4)); -- Returns TRUE.
CHARvalue is shorter than the specified length, it is padded on the right with spaces until it matches the specified length.
CHAR_LENGTH()returns the length including any trailing spaces.
LENGTH()returns the length excluding trailing spaces.
CONCAT()returns the length including trailing spaces.
Partitioning: This type can be used for partition key columns. Because of the efficiency advantage
of numeric values over character-based values, if the partition key is a string representation of a number,
prefer to use an integer type with sufficient range (
BIGINT, and so
on) where practical.
HBase considerations: This data type cannot be used with HBase tables.
- This type can be read from and written to Parquet files.
- There is no requirement for a particular level of Parquet.
- Parquet files generated by Impala and containing this type can be freely interchanged with other components such as Hive and MapReduce.
- Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files.
Parquet data files might contain values that are longer than allowed by the
CHAR(n)length limit. Impala ignores any extra trailing characters when it processes those values during a query.
Text table considerations: Text data files might contain values that are longer than
allowed for a particular
CHAR(n) column. Any extra
trailing characters are ignored when Impala processes those values during a query. Text data
files can also contain values that are shorter than the defined length limit, and Impala
pads them with trailing spaces up to the specified length. Any text data files produced by
INSERT statements do not include any trailing blanks for
Avro considerations: The Avro specification
allows string values up to 2**64 bytes in length. Impala queries for Avro tables use 32-bit
integers to hold string lengths. In Impala 2.5 and higher, Impala truncates
VARCHAR values in Avro tables to (2**31)-1
bytes. If a query encounters a
STRING value longer than (2**31)-1 bytes in
an Avro table, the query fails. In earlier releases, encountering such long values in an
Avro table could cause a crash.
Compatibility: This type is available using or higher.
Some other database systems make the length specification optional. For Impala, the length is required.
Internal details: Represented in memory as a byte array with the same size as the length specification. Values that are shorter than the specified length are padded on the right with trailing spaces.
Added in: Impala 2.0.0
Column statistics considerations: Because this type has a fixed size, the maximum and average size
fields are always filled in for column statistics, even before you run the
UDF considerations: This type cannot be used for the argument or return type of a user-defined function (UDF) or user-defined aggregate function (UDA).
Currently, the data types
STRUCT cannot be used with Kudu tables.
Performance consideration: The
CHAR type currently does not have the
Impala Codegen support, and we recommend using
CHAR as the performance gain of Codegen
outweighs the benefits of fixed width
Restrictions: Because the blank-padding behavior requires allocating the maximum length
for each value in memory, for scalability reasons, you should avoid declaring
CHAR columns that are much longer than typical values in that column.
All data in
VARCHAR columns must be in a character encoding that
is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use
STRING column to hold it.
When an expression compares a
CHAR with a
CHAR value is implicitly converted to
STRING first, with trailing spaces preserved.
This behavior differs from other popular database systems. To get the expected result of
TRUE, cast the expressions on both sides to
of the appropriate length. For example:
SELECT CAST("foo " AS CHAR(5)) = CAST('foo' AS CHAR(3)); -- Returns TRUE.
This behavior is subject to change in future releases.