Apache Hive-Kafka integration
As an Apache Hive user, you can connect to, analyze, and transform data in Apache Kafka from Hive. You can offload data from Kafka to the Hive warehouse. Using Hive-Kafka integration, you can perform actions on real-time data and incorporate streamed data into your application.
You connect to Kafka data from Hive by creating an external table that maps to a Kafka topic.
The table definition includes a reference to a Kafka storage handler that connects to Kafka. On
the external table, Hive-Kafka integration supports ad hoc queries, such as questions about data
changes in the stream over a period of time. You can transform Kafka data in the following
ways:
- Perform data masking
- Join Kafka streams with dimension tables or other streams
- Aggregate data
- Change the SerDe encoding of the original stream
- Create a persistent stream in a Kafka topic
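A Kafka-backed external table can be sketched as follows. The storage handler class and the `kafka.topic` and `kafka.bootstrap.servers` properties are part of the Hive-Kafka integration; the table name, columns, topic, and broker address here are hypothetical. The storage handler also adds metadata columns such as `__timestamp` (record timestamp in epoch milliseconds), which an ad hoc query can filter on.

```sql
-- Sketch: map a Hive external table onto a Kafka topic.
-- Topic name, broker address, and columns are illustrative.
CREATE EXTERNAL TABLE kafka_events (
  `user_id` BIGINT,
  `page`    STRING,
  `action`  STRING
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events",
  "kafka.bootstrap.servers" = "broker1:9092"
);

-- Ad hoc query over the last 10 minutes of the stream,
-- filtering on the __timestamp metadata column (epoch millis).
SELECT `action`, COUNT(*) AS cnt
FROM kafka_events
WHERE `__timestamp` > 1000 * (UNIX_TIMESTAMP() - 600)
GROUP BY `action`;
```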
You offload data by controlling the consumer's position in the stream, for example by filtering on record offsets or timestamps. The Hive-Kafka
connector supports the following serialization and deserialization formats:
- JsonSerDe (default)
- OpenCSVSerde
- AvroSerDe
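If the topic is not JSON-encoded, the SerDe can be overridden with the `kafka.serde.class` table property. A minimal sketch, assuming a hypothetical Avro-encoded topic and broker address:

```sql
-- Sketch: override the default JsonSerDe with AvroSerDe
-- via the kafka.serde.class table property.
CREATE EXTERNAL TABLE kafka_events_avro (
  `user_id` BIGINT,
  `page`    STRING
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events_avro",
  "kafka.bootstrap.servers" = "broker1:9092",
  "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
);
```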