You can create an external table in Hive that represents a Kafka stream to query
real-time data in Kafka. You use a storage handler and table properties that map the Hive table to a Kafka topic and broker. If the Kafka data is not in JSON format, you alter the table to specify a serializer-deserializer for the other format.
-
Get the name of the Kafka topic you want to query to use as a table property.
For example: "kafka.topic" = "wiki-hive-topic"
-
Construct the Kafka broker connection string.
For example:
"kafka.bootstrap.servers"="kafka.hostname.com:9092"
-
Create an external table named kafka_table using 'org.apache.hadoop.hive.kafka.KafkaStorageHandler', as shown in the following example:
CREATE EXTERNAL TABLE kafka_table
(`timestamp` timestamp, `page` string, `newPage` boolean,
added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "test-topic", "kafka.bootstrap.servers"="localhost:9092");
-
If the default JSON serializer-deserializer is incompatible with your data,
choose another format in one of the following ways.
- Alter the table to use another supported serializer-deserializer. For
example, if your data is in Avro format, use the Kafka
serializer-deserializer for Avro:
ALTER TABLE kafka_table SET TBLPROPERTIES ("kafka.serde.class"="org.apache.hadoop.hive.serde2.avro.AvroSerDe");
- Create an external table that specifies the other format in the table
definition. For example, create a table named kafka_t_avro that specifies
the Avro format in the table definition:
CREATE EXTERNAL TABLE kafka_t_avro
(`timestamp` timestamp, `page` string, `newPage` boolean,
added int, deleted bigint, delta double)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "test-topic",
"kafka.bootstrap.servers"="localhost:9092",
-- STORE AS AVRO IN KAFKA
"kafka.serde.class"="org.apache.hadoop.hive.serde2.avro.AvroSerDe");