HWC supported types mapping

To create HWC API apps, you must know how Hive Warehouse Connector maps Apache Hive types to Apache Spark types, and vice versa. Awareness of a few unsupported types helps you avoid problems.

Spark-Hive supported types mapping

The following types are supported by the HiveWarehouseConnector library:

Spark Type            Hive Type
ByteType              TinyInt
ShortType             SmallInt
IntegerType           Integer
LongType              BigInt
FloatType             Float
DoubleType            Double
DecimalType           Decimal
StringType*           String, Varchar*
BinaryType            Binary
BooleanType           Boolean
TimestampType**       Timestamp**
DateType              Date
ArrayType             Array
StructType            Struct

Notes:

* StringType (Spark) and String, Varchar (Hive)

A Hive String or Varchar column is converted to a Spark StringType column. When a Spark StringType column has maxLength metadata, it is converted to a Hive Varchar column; otherwise, it is converted to a Hive String column.
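The decision rule above can be sketched in plain Python. The helper below is hypothetical (it is not part of the HWC API); it only illustrates how the presence of maxLength metadata on a Spark StringType field determines the Hive column type:

```python
def hive_string_type(field_metadata):
    """Illustrative sketch (not HWC code): pick the Hive type for a
    Spark StringType column based on its metadata.

    A "maxLength" entry yields varchar(n); otherwise the column maps
    to a plain Hive string.
    """
    max_len = field_metadata.get("maxLength")
    return f"varchar({max_len})" if max_len is not None else "string"

# A StringType column with maxLength metadata becomes a Varchar column:
print(hive_string_type({"maxLength": 100}))  # varchar(100)
# Without the metadata, it becomes a String column:
print(hive_string_type({}))                  # string
```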

** Timestamp (Hive)

A Hive Timestamp column loses sub-microsecond precision when converted to a Spark TimestampType column: Spark TimestampType has microsecond precision, while Hive Timestamp has nanosecond precision.

Hive timestamps are interpreted as UTC. When reading data from Hive, timestamps are adjusted according to the local timezone of the Spark session. For example, if Spark is running in the America/New_York timezone, a Hive timestamp 2018-06-21 09:00:00 is imported into Spark as 2018-06-21 05:00:00 due to the 4-hour time difference between America/New_York and UTC.
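The example above can be reproduced with the Python standard library (an illustration of the timezone arithmetic, not of HWC itself): interpret the Hive value as UTC, then render it in the Spark session's local timezone.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The Hive timestamp 2018-06-21 09:00:00, interpreted as UTC:
hive_utc = datetime(2018, 6, 21, 9, 0, 0, tzinfo=timezone.utc)

# A Spark session running in America/New_York sees it shifted by the
# 4-hour offset (EDT is UTC-4 in June):
spark_local = hive_utc.astimezone(ZoneInfo("America/New_York"))

print(spark_local)  # 2018-06-21 05:00:00-04:00
```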

Spark-Hive unsupported types

Spark Type            Hive Type
CalendarIntervalType  Interval
N/A                   Char
MapType               Map
N/A                   Union
NullType              N/A
TimestampType         Timestamp With Timezone