Integrating Apache Hive with Kafka, Spark, and BI

Querying Kafka data

You can get useful information, including Kafka record metadata, from a table of Kafka data by using typical Hive queries.
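For example, the following query (a minimal sketch; the table name kafka_table is a placeholder for a Kafka-backed table, not something defined in this document) counts records per Kafka partition by grouping on one of the metadata columns described below:

  -- Count the records in each Kafka partition of a Kafka-backed Hive table
  SELECT `__partition`, COUNT(*) AS record_count
  FROM kafka_table
  GROUP BY `__partition`;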

Each Kafka record consists of a user payload key (byte[]) and value (byte[]), plus the following metadata fields:
  • Partition (int32)
  • Offset (int64)
  • Timestamp (int64)
The Hive row represents the dual composition of Kafka data:
  • The user payload serialized in the value byte array
  • The metadata: key byte array, partition, offset, and timestamp fields
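To make this composition concrete, the following is a sketch of a Kafka-backed table definition. The column names, topic, and broker address are placeholder assumptions; the storage handler class and table properties are those of the Hive Kafka storage handler:

  -- Declared columns map to fields serialized in the Kafka value byte array;
  -- Hive adds the __key, __partition, __offset, and __timestamp metadata
  -- columns automatically.
  CREATE EXTERNAL TABLE kafka_table (
    `timestamp` timestamp,
    `page` string,
    `added` int,
    `deleted` int
  )
  STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
  TBLPROPERTIES (
    'kafka.topic' = 'test-topic',
    'kafka.bootstrap.servers' = 'localhost:9092'
  );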

In the Hive representation of the Kafka record, the key byte array is called __key and is of type binary. You can cast __key at query time. Hive appends __key after the last column derived from the value byte array, and then appends the partition, offset, and timestamp metadata as columns named __partition, __offset, and __timestamp.
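For instance, a query such as the following (again using the hypothetical kafka_table) casts __key to a readable string and selects the metadata columns alongside it:

  -- Cast the binary record key to a string and read the record metadata
  SELECT CAST(`__key` AS STRING) AS record_key,
         `__partition`, `__offset`, `__timestamp`
  FROM kafka_table
  LIMIT 10;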