Hive Warehouse Connector supported types
The Hive Warehouse Connector maps most Apache Hive types to Apache Spark types and vice versa, but there are a few exceptions that you must manage.
Spark-Hive supported types mapping
The following types are supported for access through the Hive Warehouse Connector library:
Spark Type | Hive Type |
---|---|
ByteType | TinyInt |
ShortType | SmallInt |
IntegerType | Integer |
LongType | BigInt |
FloatType | Float |
DoubleType | Double |
DecimalType | Decimal |
StringType* | String, Varchar* |
BinaryType | Binary |
BooleanType | Boolean |
TimestampType** | Timestamp** |
DateType | Date |
ArrayType | Array |
StructType | Struct |
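The supported mappings can also be expressed as plain lookup tables, which can help when checking a schema by hand. The following Python sketch is illustrative only. SPARK_TO_HIVE, HIVE_TO_SPARK, and hive_type_for are hypothetical names, and type names are plain strings rather than Spark or Hive objects. The max_length parameter stands in for the maxLength metadata described in the StringType note. Emitting the length as Varchar(n) is an assumption about how the limit carries over.

```python
from typing import Optional

# Hypothetical lookup tables mirroring the supported-types table.
# Type names are plain strings; nothing here calls the Hive Warehouse Connector.
SPARK_TO_HIVE = {
    "ByteType": "TinyInt",
    "ShortType": "SmallInt",
    "IntegerType": "Integer",
    "LongType": "BigInt",
    "FloatType": "Float",
    "DoubleType": "Double",
    "DecimalType": "Decimal",
    "StringType": "String",  # becomes Varchar when maxLength metadata is present
    "BinaryType": "Binary",
    "BooleanType": "Boolean",
    "TimestampType": "Timestamp",
    "DateType": "Date",
    "ArrayType": "Array",
    "StructType": "Struct",
}

# Reverse direction: both Hive String and Varchar are read as Spark StringType.
HIVE_TO_SPARK = {hive: spark for spark, hive in SPARK_TO_HIVE.items()}
HIVE_TO_SPARK["Varchar"] = "StringType"


def hive_type_for(spark_type: str, max_length: Optional[int] = None) -> str:
    """Return the Hive type name for a supported Spark type name.

    max_length models the maxLength metadata of a Spark StringType column
    (an assumption for illustration); it is ignored for other types.
    """
    if spark_type not in SPARK_TO_HIVE:
        raise ValueError(f"{spark_type} is not a supported Spark type")
    if spark_type == "StringType" and max_length is not None:
        # Assumed: the maxLength value becomes the Varchar length.
        return f"Varchar({max_length})"
    return SPARK_TO_HIVE[spark_type]


print(hive_type_for("LongType"))        # BigInt
print(hive_type_for("StringType"))      # String
print(hive_type_for("StringType", 64))  # Varchar(64)
print(HIVE_TO_SPARK["Varchar"])         # StringType
```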
- * StringType (Spark) and String, Varchar (Hive)
A Hive String or Varchar column is converted to a Spark StringType column. When a Spark StringType column has maxLength metadata, it is converted to a Hive Varchar column; otherwise, it is converted to a Hive String column.
- ** Timestamp (Hive)
A Hive Timestamp column loses sub-microsecond precision when it is converted to a Spark TimestampType column, because Hive timestamps have nanosecond precision while Spark TimestampType has only microsecond precision.
Hive timestamps are interpreted as UTC. When data is read from Hive, timestamps are adjusted to the local timezone of the Spark session. For example, if Spark is running in the America/New_York timezone, the Hive timestamp 2018-06-21 09:00:00 is imported into Spark as 2018-06-21 05:00:00. This is due to the 4-hour difference between America/New_York and UTC on that date, when daylight saving time is in effect.
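Both timestamp effects, the loss of sub-microsecond precision and the shift from UTC to the session timezone, can be modeled with the Python standard library. This is a sketch under stated assumptions, not the connector's code. It treats a Hive timestamp as a UTC string with up to nine fractional digits. It assumes extra digits are truncated, since the text does not say whether the connector truncates or rounds. It uses a fixed UTC-4 offset for America/New_York because that offset applies on 2018-06-21. A real session would use the IANA zone, for example zoneinfo.ZoneInfo("America/New_York"), which also handles the change to UTC-5 in winter. The name hive_to_spark_timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption for this example: America/New_York is UTC-4 on 2018-06-21
# (Eastern Daylight Time). A real session would resolve the IANA zone instead.
EDT = timezone(timedelta(hours=-4), "EDT")


def hive_to_spark_timestamp(hive_value: str, session_tz: tzinfo) -> datetime:
    """Model how a Hive timestamp string appears in a Spark session.

    hive_value is 'YYYY-MM-DD HH:MM:SS[.fraction]' with up to nine fractional
    digits (nanoseconds) and is interpreted as UTC. Only the first six digits
    (microseconds) are kept, assuming truncation. The result is converted to
    session_tz, mirroring the session-timezone adjustment.
    """
    whole, _, frac = hive_value.partition(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad to at least six digits, then drop anything below microseconds.
    micros = int((frac + "000000")[:6])
    return (base + timedelta(microseconds=micros)).astimezone(session_tz)


# The documented example: 09:00 UTC is shown as 05:00 in New York.
print(hive_to_spark_timestamp("2018-06-21 09:00:00", EDT))
# 2018-06-21 05:00:00-04:00

# The last three (nanosecond) digits are lost.
print(hive_to_spark_timestamp("2018-06-21 09:00:00.123456789", EDT))
# 2018-06-21 05:00:00.123456-04:00
```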
Spark-Hive unsupported types
Spark Type | Hive Type |
---|---|
CalendarIntervalType | Interval |
N/A | Char |
MapType | Map |
N/A | Union |
NullType | N/A |
TimestampType | Timestamp With Timezone |
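These types cannot be passed through the connector, so a schema can be checked for them before data is read or written. The following Python sketch flags columns whose type appears in the unsupported table. The function names and the representation of a schema as (column name, type name) pairs are hypothetical. Normalizing parameterized Hive types such as char(3) or map<string,string> to their base name is an illustrative simplification.

```python
import re

# Hypothetical pre-flight check built from the unsupported-types table.
# TimestampType is not listed on the Spark side because it is supported when
# paired with a plain Hive Timestamp; only the with-timezone variant is not.
UNSUPPORTED_SPARK = {"calendarintervaltype", "maptype", "nulltype"}
# Hive DDL spells the union type UNIONTYPE, so both spellings are included.
UNSUPPORTED_HIVE = {"interval", "char", "map", "union", "uniontype",
                    "timestamp with timezone"}


def _base_type(type_name: str) -> str:
    """Strip parameters such as char(3) or map<string,int> and lower-case."""
    return re.split(r"[(<]", type_name, maxsplit=1)[0].strip().lower()


def unsupported_columns(schema, side="spark"):
    """Return the names of columns whose type appears in the unsupported table.

    schema is an iterable of (column_name, type_name) pairs; side selects
    whether type_name is a Spark type ("spark") or a Hive type ("hive").
    """
    if side not in ("spark", "hive"):
        raise ValueError("side must be 'spark' or 'hive'")
    blocked = UNSUPPORTED_SPARK if side == "spark" else UNSUPPORTED_HIVE
    return [name for name, type_name in schema if _base_type(type_name) in blocked]


spark_schema = [("id", "LongType"), ("tags", "MapType"), ("ts", "TimestampType")]
print(unsupported_columns(spark_schema))  # ['tags']

hive_schema = [("id", "bigint"), ("code", "char(3)"), ("attrs", "map<string,string>")]
print(unsupported_columns(hive_schema, side="hive"))  # ['code', 'attrs']
```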