VARCHAR data type
A variable-length character type, truncated during processing if necessary to fit within the specified length.
Syntax:
In the column definition of a CREATE TABLE
statement:
column_name VARCHAR(max_length)
The maximum length you can specify is 65,535.
Partitioning: This type can be used for partition key columns. Because of the efficiency advantage
of numeric values over character-based values, if the partition key is a string representation of a number,
prefer to use an integer type with sufficient range (INT
, BIGINT
, and so
on) where practical.
HBase considerations: This data type cannot be used with HBase tables.
Parquet considerations:
- This type can be read from and written to Parquet files.
- There is no requirement for a particular level of Parquet.
- Parquet files generated by Impala and containing this type can be freely interchanged with other components such as Hive and MapReduce.
-
Parquet data files can contain values that are longer than allowed by the
VARCHAR(n)
length limit. Impala ignores any extra trailing characters when it processes those values during a query.
Text table considerations:
Text data files can contain values that are longer than allowed by the
VARCHAR(n)
length limit. Any extra trailing characters are ignored when
Impala processes those values during a query.
Avro considerations: The Avro specification
allows string values up to 2**64 bytes in length. Impala queries for Avro tables use 32-bit
integers to hold string lengths. In Impala 2.5 and higher, Impala truncates
CHAR
and VARCHAR
values in Avro tables to (2**31)-1
bytes. If a query encounters a STRING
value longer than (2**31)-1 bytes in
an Avro table, the query fails. In earlier releases, encountering such long values in an
Avro table could cause a crash.
Schema evolution considerations:
You can use ALTER TABLE ... CHANGE
to switch column data types to and from
VARCHAR
. You can convert from STRING
to
VARCHAR(n)
, or from VARCHAR(n)
to
STRING
, or from CHAR(n)
to
VARCHAR(n)
, or from VARCHAR(n)
to
CHAR(n)
. When switching back and forth between VARCHAR
and CHAR
, you can also change the length value. This schema evolution works the same for
tables using any file format. If a table contains values longer than the maximum length defined for a
VARCHAR
column, Impala does not return an error. Any extra trailing characters are ignored
when Impala processes those values during a query.
Compatibility:
This type is available in Impala version 2.0 or higher.
Internal details: Represented in memory as a byte array with the minimum size needed to represent each value.
Column statistics considerations: Because the values of this type have variable size, none of the
column statistics fields are filled in until you run the COMPUTE STATS
statement.
Kudu considerations:
Currently, the data types CHAR
,
VARCHAR
, ARRAY
, MAP
, and
STRUCT
cannot be used with Kudu tables.
Restrictions:
All data in CHAR
and VARCHAR
columns must be in a character encoding that
is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use
a STRING
column to hold it.
Examples:
The following examples show how long and short VARCHAR
values are treated. Values longer
than the maximum specified length are truncated by CAST()
, or when queried from existing
data files. Values shorter than the maximum specified length are represented as the actual length of the
value, with no extra padding as seen with CHAR
values.
create table varchar_1 (s varchar(1));
create table varchar_4 (s varchar(4));
create table varchar_20 (s varchar(20));
insert into varchar_1 values (cast('a' as varchar(1))), (cast('b' as varchar(1))), (cast('hello' as varchar(1))), (cast('world' as varchar(1)));
insert into varchar_4 values (cast('a' as varchar(4))), (cast('b' as varchar(4))), (cast('hello' as varchar(4))), (cast('world' as varchar(4)));
insert into varchar_20 values (cast('a' as varchar(20))), (cast('b' as varchar(20))), (cast('hello' as varchar(20))), (cast('world' as varchar(20)));
select * from varchar_1;
+---+
| s |
+---+
| a |
| b |
| h |
| w |
+---+
select * from varchar_4;
+------+
| s |
+------+
| a |
| b |
| hell |
| worl |
+------+
[localhost:21000] > select * from varchar_20;
+-------+
| s |
+-------+
| a |
| b |
| hello |
| world |
+-------+
select concat('[',s,']') as s from varchar_20;
+---------+
| s |
+---------+
| [a] |
| [b] |
| [hello] |
| [world] |
+---------+
The following example shows how identical VARCHAR
values compare as equal, even if the
columns are defined with different maximum lengths. Both tables contain 'a'
and
'b'
values. The longer 'hello'
and 'world'
values from the
VARCHAR_20
table were truncated when inserted into the VARCHAR_1
table.
select s from varchar_1 join varchar_20 using (s);
+-------+
| s |
+-------+
| a |
| b |
+-------+
The following examples show how VARCHAR
values are freely interchangeable with
STRING
values in contexts such as comparison operators and built-in functions:
select length(cast('foo' as varchar(100))) as length;
+--------+
| length |
+--------+
| 3 |
+--------+
select cast('xyz' as varchar(5)) > cast('abc' as varchar(10)) as greater;
+---------+
| greater |
+---------+
| true |
+---------+
UDF considerations: This type cannot be used for the argument or return type of a user-defined function (UDF) or user-defined aggregate function (UDA).