Iceberg data types
This reference lists the Iceberg data types and the equivalent SQL data types for the Hive and Impala SQL engines.
Iceberg supported data types
Iceberg data type | SQL data type | Hive | Impala |
---|---|---|---|
binary | BINARY | BINARY | |
boolean | BOOLEAN | BOOLEAN | BOOLEAN |
date | DATE | DATE | DATE |
decimal(P, S) | DECIMAL(P, S) | DECIMAL (P, S) | DECIMAL (P, S) |
double | DOUBLE | DOUBLE | DOUBLE |
fixed(L) | BINARY | Not supported | |
float | FLOAT | FLOAT | FLOAT |
int | TINYINT, SMALLINT, INT | INTEGER | INTEGER |
list | ARRAY | ARRAY | Read only |
long | BIGINT | BIGINT | BIGINT |
map | MAP | MAP | Read only |
string | VARCHAR, CHAR | STRING | STRING |
struct | STRUCT | STRUCT | Read only |
time | STRING | Not supported | |
timestamp | TIMESTAMP | TIMESTAMP | TIMESTAMP (see limitation below) |
timestamptz | TIMESTAMP WITH LOCAL TIME ZONE | Use TIMESTAMP WITH LOCAL TIMEZONE for handling these in queries | Read timestamptz into TIMESTAMP values. Writing not supported. |
uuid | none | STRING. Writing to Parquet is not supported. | Not supported |
Data type limitations
An implicit conversion to an Iceberg type occurs only if there is an exact match; otherwise, a cast is needed. For example, to insert a VARCHAR(N) column into an Iceberg table you need a cast to the VARCHAR type as Iceberg does not support the VARCHAR(N) type. To insert a SMALLINT or TINYINT into an Iceberg table, you need a cast to the INT type as Iceberg does not support these types.
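The exact-match rule can be illustrated with a small helper that wraps a column in a CAST only when its source type has no exact Iceberg match. This is a minimal sketch, not an engine feature: the names `CAST_RULES`, `select_expr`, and `insert_statement` are hypothetical, only the two cases named above (VARCHAR(N) and SMALLINT/TINYINT) are covered, and the exact spelling of the cast target may differ by engine.

```python
import re

# Source SQL types that have no exact Iceberg match, paired with the type to
# cast to. Assumption: only the two cases named in the text are handled here.
CAST_RULES = [
    (re.compile(r"^(TINYINT|SMALLINT)$", re.IGNORECASE), "INT"),
    (re.compile(r"^VARCHAR\(\d+\)$", re.IGNORECASE), "VARCHAR"),
]


def select_expr(column: str, sql_type: str) -> str:
    """Return the column unchanged, or wrapped in CAST when a cast is needed."""
    for pattern, target in CAST_RULES:
        if pattern.match(sql_type.strip()):
            return f"CAST({column} AS {target})"
    return column


def insert_statement(target: str, source: str, columns: dict) -> str:
    """Build an INSERT ... SELECT with casts for the given {column: type} map."""
    exprs = ", ".join(select_expr(name, typ) for name, typ in columns.items())
    return f"INSERT INTO {target} SELECT {exprs} FROM {source}"


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(insert_statement("ice_t", "src", {"id": "BIGINT", "qty": "TINYINT"}))
```

With the hypothetical tables `ice_t` and `src`, the generated statement casts only `qty`, because BIGINT already matches Iceberg's long type exactly.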
Iceberg supports two timestamp types:
- timestamp (without timezone)
- timestamptz (with timezone)
In Spark SQL, the type that TIMESTAMP refers to is controlled by spark.sql.timestampType (the default value is TIMESTAMP_LTZ). When creating an Iceberg table using Spark SQL, if spark.sql.timestampType is set to TIMESTAMP_LTZ, TIMESTAMP is mapped to Iceberg's timestamptz type. If spark.sql.timestampType is set to TIMESTAMP_NTZ, TIMESTAMP is mapped to Iceberg's timestamp type.
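The mapping described here amounts to a two-entry lookup keyed on the configuration value. The sketch below restates it in plain Python; the function name `iceberg_type_for_spark_timestamp` is hypothetical and the code does not call Spark, it only mirrors the rule stated in the text.

```python
# Mapping from the spark.sql.timestampType setting to the Iceberg type that a
# Spark SQL TIMESTAMP column receives, as described in the text.
SPARK_TIMESTAMP_TO_ICEBERG = {
    "TIMESTAMP_LTZ": "timestamptz",
    "TIMESTAMP_NTZ": "timestamp",
}


def iceberg_type_for_spark_timestamp(timestamp_type: str = "TIMESTAMP_LTZ") -> str:
    """Return the Iceberg type for TIMESTAMP under the given setting.

    The default argument mirrors the documented default, TIMESTAMP_LTZ.
    """
    key = timestamp_type.strip().upper()
    if key not in SPARK_TIMESTAMP_TO_ICEBERG:
        raise ValueError(f"unexpected spark.sql.timestampType value: {timestamp_type!r}")
    return SPARK_TIMESTAMP_TO_ICEBERG[key]
```

Calling the helper with no argument reflects the default setting, which yields timestamptz.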
Impala is unable to write to Iceberg tables with timestamptz columns. For interoperability, when creating Iceberg tables from Spark, you can use the Spark configuration spark.sql.timestampType=TIMESTAMP_NTZ.
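Before relying on Impala to write to a table created from Spark, it can help to check the table's Iceberg schema for timestamptz columns. The following is a minimal sketch under stated assumptions: the schema is supplied by hand as a plain mapping of column name to Iceberg type, and `impala_write_blockers` is a hypothetical helper, not part of Impala, Spark, or Iceberg.

```python
def impala_write_blockers(schema: dict) -> list:
    """Return the names of columns whose Iceberg type is timestamptz.

    Per the text, Impala cannot write to Iceberg tables that contain such
    columns. Other restrictions from the type table are not checked here.
    """
    return [
        name
        for name, iceberg_type in schema.items()
        if iceberg_type.strip().lower() == "timestamptz"
    ]


if __name__ == "__main__":
    # Hypothetical schema, for illustration only.
    schema = {"id": "long", "event_time": "timestamptz", "created": "timestamp"}
    blockers = impala_write_blockers(schema)
    if blockers:
        print("Impala cannot write to this table; timestamptz columns:", blockers)
```

An empty result means no timestamptz column stands in the way; a non-empty result suggests recreating the table from Spark with spark.sql.timestampType=TIMESTAMP_NTZ.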
For consistent results across query engines, all the engines must be running in UTC.
Unsupported data types
- TIMESTAMPTZ (only read support)
- TIMESTAMP in tables in AVRO format
- FIXED
- UUID