Iceberg data types

This reference lists the Iceberg data types and a table of the equivalent SQL data types for the Hive and Impala SQL engines.

Iceberg supported data types

Table 1. Iceberg data types and equivalent SQL data types in Hive and Impala

| Iceberg data type | SQL data type | Hive | Impala |
|---|---|---|---|
| binary | | BINARY | BINARY |
| boolean | BOOLEAN | BOOLEAN | BOOLEAN |
| date | DATE | DATE | DATE |
| decimal(P, S) | DECIMAL(P, S) | DECIMAL(P, S) | DECIMAL(P, S) |
| double | DOUBLE | DOUBLE | DOUBLE |
| fixed(L) | | BINARY | Not supported |
| float | FLOAT | FLOAT | FLOAT |
| int | TINYINT, SMALLINT, INT | INTEGER | INTEGER |
| list | ARRAY | ARRAY | Read only |
| long | BIGINT | BIGINT | BIGINT |
| map | MAP | MAP | Read only |
| string | VARCHAR, CHAR | STRING | STRING |
| struct | STRUCT | STRUCT | Read only |
| time | | STRING | Not supported |
| timestamp | TIMESTAMP | TIMESTAMP | TIMESTAMP (see limitation below) |
| timestamptz | TIMESTAMP WITH LOCAL TIME ZONE | Use TIMESTAMP WITH LOCAL TIME ZONE for handling these values in queries | Read timestamptz into TIMESTAMP values; writing not supported |
| uuid | none | STRING; writing to Parquet is not supported | Not supported |

Data type limitations

An implicit conversion to an Iceberg type occurs only when there is an exact match; otherwise, a cast is needed. For example, Iceberg does not support the VARCHAR(N) type, so to insert a VARCHAR(N) column into an Iceberg table you need a cast to a supported string type. Likewise, Iceberg does not support SMALLINT or TINYINT, so to insert values of those types you need a cast to INT.
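As an illustrative sketch (the table and column names are hypothetical), the casts described above look like this in Hive or Impala SQL:

```sql
-- 'customers' is a hypothetical Iceberg table with columns (id INT, name STRING).
-- 'staging_customers' is a hypothetical source table with a SMALLINT id
-- and a VARCHAR(50) name, neither of which Iceberg supports directly.
INSERT INTO customers
SELECT CAST(small_id AS INT),       -- SMALLINT needs an explicit cast to INT
       CAST(short_name AS STRING)   -- VARCHAR(50) needs a cast to a supported string type
FROM staging_customers;
```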

Iceberg supports two timestamp types:
  • timestamp (without timezone)
  • timestamptz (with timezone)
With Spark 3.4, Spark SQL supports both a timestamp with local time zone type (TIMESTAMP_LTZ) and a timestamp without time zone type (TIMESTAMP_NTZ). By default, TIMESTAMP resolves to TIMESTAMP_LTZ; you can change this by setting spark.sql.timestampType.

When you create an Iceberg table using Spark SQL, if spark.sql.timestampType is set to TIMESTAMP_LTZ, TIMESTAMP is mapped to Iceberg's timestamptz type. If spark.sql.timestampType is set to TIMESTAMP_NTZ, TIMESTAMP is mapped to Iceberg's timestamp type.
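The mapping above can be controlled per session in Spark SQL; the following sketch uses a hypothetical table name:

```sql
-- Make Spark's TIMESTAMP mean "without time zone" so the column is
-- written as Iceberg's timestamp type rather than timestamptz.
SET spark.sql.timestampType=TIMESTAMP_NTZ;

-- 'events' is a hypothetical Iceberg table; with the setting above,
-- the ts column maps to Iceberg's timestamp (no time zone) type.
CREATE TABLE events (
  id BIGINT,
  ts TIMESTAMP
) USING iceberg;
```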

Impala cannot write to Iceberg tables that have timestamptz columns. For interoperability, when creating Iceberg tables from Spark, set the Spark configuration spark.sql.timestampType=TIMESTAMP_NTZ.

For consistent results across query engines, all engines must run in UTC.

Unsupported data types

Impala does not support the following Iceberg data types:
  • TIMESTAMPTZ (read support only)
  • TIMESTAMP in tables stored in Avro format
  • FIXED
  • UUID