Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.
This use case walks you through the process of creating a data flow that ingests data
from Apache Kafka into Apache Hive. The flow uses the PutHive3Streaming processor to write data into a
Hive table. The processor interacts with the Hive Metastore to determine the storage location of
the Hive table and writes delta files directly to that location, eliminating the need for a
running HiveServer2. After the delta files are written, the NiFi processor updates the table
metadata in the Hive Metastore, ensuring that future queries include the newly ingested data.
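
Under the hood, PutHive3Streaming builds on the Hive Streaming API, which writes through the Hive Metastore rather than HiveServer2. The Java sketch below illustrates that underlying mechanism only; it is not part of the NiFi flow itself. It assumes a transactional (ACID) Hive table, and the database name, table name, and Metastore URI shown are placeholders, not values from this use case.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingException;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class HiveStreamingSketch {
    public static void main(String[] args) throws StreamingException {
        // Point the client at the Hive Metastore; no HiveServer2 is involved.
        HiveConf hiveConf = new HiveConf();
        hiveConf.set("hive.metastore.uris", "thrift://metastore-host:9083"); // placeholder URI

        // Writer that maps delimited records onto the target table's columns.
        StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
                .withFieldDelimiter(',')
                .build();

        // Open a streaming connection to a transactional (ACID) Hive table.
        HiveStreamingConnection connection = HiveStreamingConnection.newBuilder()
                .withDatabase("default")   // placeholder database
                .withTable("customer")     // placeholder table
                .withAgentInfo("streaming-ingest-example")
                .withRecordWriter(writer)
                .withHiveConf(hiveConf)
                .connect();

        // Each transaction writes delta files to the table's storage location and
        // then commits the metadata update that makes the new rows queryable.
        connection.beginTransaction();
        connection.write("1,alice".getBytes());
        connection.write("2,bob".getBytes());
        connection.commitTransaction();

        connection.close();
    }
}
```

In the NiFi flow described in this use case, the PutHive3Streaming processor performs these steps for you; you only configure the Metastore URI, database, and table in the processor properties.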
If you are moving data from a source other
than Kafka, review Getting Started with Apache NiFi for information about building
data flows and about other data ingestion options. There you can also find instructions on building
flows for ingesting data into other CDP components.