HWC supported types mapping
To create applications that use the HWC API, you need to know how Hive Warehouse Connector (HWC) maps Apache Hive types to Apache Spark types, and vice versa. You also need to know which types HWC does not support, so that you can avoid them in the tables and DataFrames you read and write.
Spark-Hive supported types mapping
The Hive Warehouse Connector library supports the following types:
| Spark Type | Hive Type |
|---|---|
| ByteType | TinyInt |
| ShortType | SmallInt |
| IntegerType | Integer |
| LongType | BigInt |
| FloatType | Float |
| DoubleType | Double |
| DecimalType | Decimal |
| StringType* | String, Varchar* |
| BinaryType | Binary |
| BooleanType | Boolean |
| TimestampType** | Timestamp** |
| DateType | Date |
| ArrayType | Array |
| StructType | Struct |
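For the read direction (Hive to Spark), the table can be captured in a small lookup structure. The sketch below is a hypothetical helper, not part of the HWC API: it works on type-name strings, ignores parameters such as `VARCHAR(20)` or `DECIMAL(10,2)` and element types such as `ARRAY<INT>`, and raises an error for any Hive type that is not listed in the table. `INT` is included alongside `INTEGER` because Hive accepts `INT` as the usual spelling of that type.

```python
# Hypothetical helper (not part of the HWC API): Hive -> Spark type names,
# mirroring the supported-types table. Names are plain strings, not type objects.
HIVE_TO_SPARK = {
    "TINYINT": "ByteType",
    "SMALLINT": "ShortType",
    "INT": "IntegerType",
    "INTEGER": "IntegerType",
    "BIGINT": "LongType",
    "FLOAT": "FloatType",
    "DOUBLE": "DoubleType",
    "DECIMAL": "DecimalType",
    "STRING": "StringType",
    "VARCHAR": "StringType",
    "BINARY": "BinaryType",
    "BOOLEAN": "BooleanType",
    "TIMESTAMP": "TimestampType",
    "DATE": "DateType",
    "ARRAY": "ArrayType",
    "STRUCT": "StructType",
}


def spark_type_for_hive(hive_type: str) -> str:
    """Return the Spark type name for a Hive type name.

    Parameters such as VARCHAR(20) or DECIMAL(10,2) and element types such as
    ARRAY<INT> are stripped before the lookup; unlisted types raise ValueError.
    """
    base = hive_type.strip().upper()
    for delimiter in ("(", "<"):
        base = base.split(delimiter, 1)[0]
    base = base.strip()
    if base not in HIVE_TO_SPARK:
        raise ValueError(f"unsupported or unknown Hive type: {hive_type}")
    return HIVE_TO_SPARK[base]


print(spark_type_for_hive("varchar(20)"))  # StringType
```

Both `STRING` and `VARCHAR(n)` resolve to `StringType`, which is why the write direction needs extra information (the `maxLength` metadata described in the notes) to choose between them.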
Notes:
* StringType (Spark) and String, Varchar (Hive)
A Hive String or Varchar column is converted to a Spark StringType column. When a Spark StringType column has maxLength metadata, it is converted to a Hive Varchar column; otherwise, it is converted to a Hive String column.
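The write-direction rule can be sketched as a small function. This is a hypothetical illustration: it assumes the column metadata is available as a plain mapping and that the key is literally `maxLength`, as the note's wording suggests; check the HWC release you use for the exact key it reads.

```python
from typing import Mapping, Optional


def hive_type_for_string_column(metadata: Optional[Mapping[str, object]] = None) -> str:
    """Sketch of the documented rule for a Spark StringType column.

    Assumption: the length limit is stored under the metadata key "maxLength".
    Returns VARCHAR(n) when that key is present, otherwise STRING.
    """
    max_length = (metadata or {}).get("maxLength")
    if max_length is None:
        return "STRING"
    return f"VARCHAR({int(max_length)})"


print(hive_type_for_string_column({"maxLength": 50}))  # VARCHAR(50)
print(hive_type_for_string_column())                   # STRING
```

In PySpark, column metadata is attached through the fourth argument of `StructField`, for example `StructField("name", StringType(), True, {"maxLength": 50})`; whether HWC honors that exact key is the assumption stated above.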
** Timestamp (Hive)
A Hive Timestamp column loses sub-microsecond precision when it is converted to a Spark TimestampType column, because Hive timestamps have nanosecond precision while Spark TimestampType values have microsecond precision.
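To see what is lost, the sketch below parses a Hive-style timestamp string with a nine-digit fraction into a Python `datetime`, which, like Spark's TimestampType, stores microseconds. It assumes the extra digits are truncated rather than rounded and that the input uses the `YYYY-MM-DD HH:MM:SS[.fraction]` layout; the function name is hypothetical.

```python
from datetime import datetime


def to_microsecond_precision(hive_ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' and keep at most six fractional digits.

    Assumption: digits beyond microseconds are truncated, not rounded.
    """
    whole, _, fraction = hive_ts.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    if fraction:
        # Keep the first six digits (microseconds); right-pad shorter
        # fractions, so ".5" becomes 500000 microseconds.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


ts = to_microsecond_precision("2018-06-21 09:00:00.123456789")
print(ts.microsecond)  # 123456 (the trailing 789 nanoseconds are dropped)
```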
Hive timestamps are interpreted as UTC. When data is read from Hive, timestamps are adjusted to the local time zone of the Spark session. For example, if Spark is running in the America/New_York time zone, the Hive timestamp 2018-06-21 09:00:00 is imported into Spark as 2018-06-21 05:00:00, because America/New_York is 4 hours behind UTC on that date (Eastern Daylight Time).
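The adjustment in this example can be reproduced with Python's standard `zoneinfo` module (Python 3.9 or later, with time zone data installed). The helper name is hypothetical; in Spark itself, the session time zone is normally set with the `spark.sql.session.timeZone` configuration property.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def hive_utc_to_session_time(hive_ts: datetime, session_tz: str) -> datetime:
    """Treat a naive Hive timestamp as UTC and convert it to the session time zone."""
    return hive_ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(session_tz))


summer = hive_utc_to_session_time(datetime(2018, 6, 21, 9, 0, 0), "America/New_York")
print(summer.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-06-21 05:00:00 (EDT, UTC-4)

winter = hive_utc_to_session_time(datetime(2018, 1, 21, 9, 0, 0), "America/New_York")
print(winter.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-01-21 04:00:00 (EST, UTC-5)
```

The winter case shows that the offset is not always 4 hours: it follows daylight saving time in the session's time zone.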
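Spark-Hive unsupported types
The Hive Warehouse Connector library does not support the following types. N/A indicates that there is no corresponding type on that side.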
| Spark Type | Hive Type |
|---|---|
| CalendarIntervalType | Interval |
| N/A | Char |
| MapType | Map |
| N/A | Union |
| NullType | N/A |
| TimestampType | Timestamp With Local Time Zone |
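Before writing a DataFrame through HWC, it can help to scan its schema for the Spark-side types in this table. The sketch below uses a hypothetical, simplified schema representation (type-name strings and tuples, not PySpark objects) and reports every column path that contains CalendarIntervalType, MapType, or NullType. Hive-only entries such as Char, Union, and the time-zone-aware timestamp type have no Spark counterpart here, so they have to be checked on the Hive side.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified schema representation (not PySpark objects):
#   "IntegerType"                      -> a primitive Spark type name
#   ("ArrayType", element_type)        -> an array of element_type
#   ("StructType", {name: field_type}) -> a struct with named fields
#   ("MapType", anything)              -> a map (contents are not inspected)
SparkType = Union[str, Tuple[str, object]]

UNSUPPORTED_SPARK_TYPES = {"CalendarIntervalType", "MapType", "NullType"}


def find_unsupported(schema: Dict[str, SparkType], prefix: str = "") -> List[str]:
    """Return the paths of columns that contain an unsupported Spark type."""
    problems: List[str] = []
    for name, dtype in schema.items():
        problems.extend(_check(dtype, f"{prefix}{name}"))
    return problems


def _check(dtype: SparkType, path: str) -> List[str]:
    if isinstance(dtype, str):
        return [path] if dtype in UNSUPPORTED_SPARK_TYPES else []
    kind, inner = dtype
    if kind in UNSUPPORTED_SPARK_TYPES:
        return [path]
    if kind == "ArrayType":
        # "[]" marks the array element in the reported path.
        return _check(inner, path + "[]")
    if kind == "StructType":
        return find_unsupported(inner, prefix=path + ".")
    raise ValueError(f"unknown type constructor: {kind}")


schema = {
    "id": "LongType",
    "attrs": ("MapType", ("StringType", "StringType")),
    "tags": ("ArrayType", "StringType"),
    "nested": ("StructType", {"ok": "IntegerType", "empty": "NullType"}),
    "items": ("ArrayType", ("MapType", None)),
}
print(find_unsupported(schema))  # ['attrs', 'nested.empty', 'items[]']
```

The walk descends into arrays and structs on the assumption that a nested unsupported type (for example, an array of maps) causes the same problem as a top-level one.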
