Apache Hive-Kafka integration
As a Hive user, you can connect to, analyze, and transform data in Kafka from Hive. You can offload data from Kafka to the Hive warehouse.
You connect to Kafka data from Hive by creating an external table that maps to a Kafka topic.
The table definition includes a reference to a Kafka storage handler that makes the connection to
Kafka. On the external table, Hive-Kafka integration supports ad hoc queries, such as questions
about how data in the stream changed over a recent interval of time. You can transform Kafka
data in the following ways:
- Perform data masking.
- Join dimension tables or any stream.
- Aggregate data.
- Change the SerDe encoding of the original stream.
- Create a persistent stream in a Kafka topic.
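The external table and an ad hoc, time-bounded query described above might look like the following minimal sketch. The table name, columns, topic name, and broker address are hypothetical placeholders; the storage handler class, the `kafka.topic` and `kafka.bootstrap.servers` table properties, and the `__timestamp` metadata column are assumed to match the Apache Hive Kafka storage handler shipped with your Hive version.

```sql
-- Hypothetical table, columns, topic, and broker address; adjust to your cluster.
CREATE EXTERNAL TABLE kafka_events (
  `user_id` STRING,
  `page`    STRING,
  `added`   INT
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events-topic",
  "kafka.bootstrap.servers" = "localhost:9092"
);

-- Ad hoc query: count the records that arrived in the last 10 minutes.
-- `__timestamp` is a metadata column added by the storage handler and
-- holds the Kafka record timestamp in milliseconds since the epoch.
SELECT COUNT(*)
FROM kafka_events
WHERE `__timestamp` >
      1000 * to_unix_timestamp(CURRENT_TIMESTAMP - INTERVAL '10' MINUTES);
```

Filtering on `__timestamp` lets the query describe only a recent slice of the stream instead of the whole topic.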
You can achieve exactly-once offloading of data by controlling its position in the stream. The
Hive-Kafka connector supports the following serialization and deserialization formats:
- JsonSerDe (default)
- OpenCSVSerde
- AvroSerDe
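One way to control the read position for exactly-once offloading is to record, per Kafka partition, the highest offset already copied to the warehouse, and to load only newer records. The sketch below assumes a Kafka-backed external table named `kafka_events` with columns `user_id`, `page`, and `added`; that table, the `events_warehouse` target, and the `kafka_events_offsets` tracking table are hypothetical names. The `__partition` and `__offset` metadata columns and the `kafka.serde.class` table property are assumed to be available in your version of the Kafka storage handler.

```sql
-- Hypothetical warehouse target for the offloaded rows.
CREATE TABLE events_warehouse (
  `user_id` STRING,
  `page`    STRING,
  `added`   INT
) STORED AS ORC;

-- Hypothetical tracking table: last offset loaded for each Kafka partition.
CREATE TABLE kafka_events_offsets (
  partition_id INT,
  max_offset   BIGINT,
  insert_time  TIMESTAMP
);

-- Seed the tracking table so that the first load starts at the earliest offset.
INSERT OVERWRITE TABLE kafka_events_offsets
SELECT `__partition`, MIN(`__offset`) - 1, CURRENT_TIMESTAMP
FROM kafka_events
GROUP BY `__partition`, CURRENT_TIMESTAMP;

-- Incremental load: copy only records beyond the stored offsets and
-- advance the stored offsets in the same multi-insert statement.
FROM kafka_events k
JOIN kafka_events_offsets o
  ON (k.`__partition` = o.partition_id AND k.`__offset` > o.max_offset)
INSERT INTO TABLE events_warehouse
  SELECT `user_id`, `page`, `added`
INSERT OVERWRITE TABLE kafka_events_offsets
  SELECT `__partition`, MAX(`__offset`), CURRENT_TIMESTAMP
  GROUP BY `__partition`, CURRENT_TIMESTAMP;

-- Separately: choose a non-default SerDe for the Kafka table.
-- Run this only if the topic actually carries Avro-encoded records.
ALTER TABLE kafka_events
  SET TBLPROPERTIES ("kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
```

Rerunning the incremental load picks up where the previous run stopped, because the join excludes every offset that the tracking table already covers.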