HWC supported types mapping
To create HWC API apps, you must know how the Hive Warehouse Connector (HWC) maps Apache Hive types to Apache Spark types, and vice versa. Being aware of the few unsupported types helps you avoid problems.
Spark-Hive supported types mapping
The following types are supported by the HiveWarehouseConnector library:
Spark Type | Hive Type |
---|---|
ByteType | TinyInt |
ShortType | SmallInt |
IntegerType | Integer |
LongType | BigInt |
FloatType | Float |
DoubleType | Double |
DecimalType | Decimal |
StringType* | String, Varchar* |
BinaryType | Binary |
BooleanType | Boolean |
TimestampType** | Timestamp** |
DateType | Date |
ArrayType | Array |
StructType | Struct |
Notes:
* StringType (Spark) and String, Varchar (Hive)
A Hive String or Varchar column is converted to a Spark StringType column. When a Spark StringType column has maxLength metadata, it is converted to a Hive Varchar column; otherwise, it is converted to a Hive String column.
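The conversion rule above can be sketched as a plain function. This is an illustration of the documented behavior, not HWC source code; the `(type, metadata)` representation of a column is an assumption made for the sketch, while `maxLength` is the metadata key named above.

```python
# Illustrative model of the HWC StringType conversion rule (not the
# connector's actual implementation).

def hive_type_for_string_column(metadata):
    """Return the Hive type chosen for a Spark StringType column."""
    if "maxLength" in metadata:
        # A StringType column with maxLength metadata becomes Varchar.
        return f"varchar({metadata['maxLength']})"
    # Without maxLength metadata, it becomes a plain Hive String.
    return "string"

print(hive_type_for_string_column({"maxLength": 50}))  # varchar(50)
print(hive_type_for_string_column({}))                 # string
```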
** Timestamp (Hive)
A Hive Timestamp column loses sub-microsecond precision when converted to a Spark TimestampType column, because a Spark TimestampType column has microsecond precision, while a Hive Timestamp column has nanosecond precision.
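The precision loss can be shown with simple arithmetic on the fractional second. Truncation is used here for illustration; the source does not state whether the connector truncates or rounds.

```python
# A Hive timestamp carries nanoseconds; Spark's TimestampType stores
# only microseconds, so the last three fractional digits are lost.

hive_nanos = 123_456_789            # fractional second at nanosecond precision
spark_micros = hive_nanos // 1_000  # reduced to microsecond precision

print(spark_micros)  # 123456 -- the trailing 789 nanoseconds are gone
```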
Hive timestamps are interpreted as UTC. When reading data from Hive, timestamps are adjusted according to the local timezone of the Spark session. For example, if Spark is running in the America/New_York timezone, a Hive timestamp 2018-06-21 09:00:00 is imported into Spark as 2018-06-21 05:00:00 due to the 4-hour time difference between America/New_York and UTC.
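The example above can be reproduced with Python's standard-library `zoneinfo`: the Hive value is interpreted as UTC, then rendered in the Spark session's local timezone.

```python
# Interpret the Hive timestamp as UTC, then view it in America/New_York,
# mirroring the adjustment the Spark session applies on read.
from datetime import datetime
from zoneinfo import ZoneInfo

hive_ts = datetime(2018, 6, 21, 9, 0, 0, tzinfo=ZoneInfo("UTC"))
spark_local = hive_ts.astimezone(ZoneInfo("America/New_York"))

print(spark_local.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-06-21 05:00:00
```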
Spark-Hive unsupported types
Spark Type | Hive Type |
---|---|
CalendarIntervalType | Interval |
N/A | Char |
MapType | Map |
N/A | Union |
NullType | N/A |
TimestampType | Timestamp With Timezone |
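One way to avoid runtime surprises is to check a schema against the unsupported list before writing through HWC. The helper below is an assumption for illustration, not part of the HWC API; it takes the schema as simple (name, Spark type name) pairs.

```python
# Hypothetical pre-flight check: flag Spark types that HWC cannot map
# to a Hive type, per the unsupported-types table above.

UNSUPPORTED_SPARK_TYPES = {"CalendarIntervalType", "MapType", "NullType"}

def unsupported_fields(schema):
    """schema: iterable of (field_name, spark_type_name) pairs."""
    return [name for name, t in schema if t in UNSUPPORTED_SPARK_TYPES]

schema = [("id", "LongType"), ("tags", "MapType"), ("note", "StringType")]
print(unsupported_fields(schema))  # ['tags']
```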