Understand the use case

You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.

This use case walks you through creating a data flow that ingests data from Apache Kafka into Apache Hive. The flow uses the PutHive3Streaming processor to write data into a Hive table. The processor interacts with the Hive Metastore to determine the storage location of the target table and writes delta files directly to that location, eliminating the need for a running HiveServer2. After the delta files are written, the processor updates the table metadata in the Hive Metastore so that subsequent queries include the newly ingested data.

If you are moving data from a source other than Kafka, review the Getting Started with Apache NiFi documentation for information on building data flows and on other data ingestion options. You can also find instructions there on building flows to ingest data into other CDP components.
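To make the mechanism concrete, the following minimal sketch shows the kind of streaming write that PutHive3Streaming performs under the hood, using the Hive 3 streaming API (org.apache.hive.streaming). It is an illustration rather than part of the NiFi flow itself: the metastore URI, database, table, and record values are hypothetical placeholders, and the target table is assumed to be a transactional (ACID) Hive table.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Hive configuration pointing at the cluster's metastore;
        // the URI below is a hypothetical placeholder.
        HiveConf conf = new HiveConf();
        conf.set("hive.metastore.uris", "thrift://metastore.example.com:9083");

        // Writer that parses delimited records into the table's columns.
        StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
                .withFieldDelimiter(',')
                .build();

        // Open a streaming connection to the target transactional table
        // (hypothetical database and table names).
        HiveStreamingConnection connection = HiveStreamingConnection.newBuilder()
                .withDatabase("retail")
                .withTable("customer_events")
                .withAgentInfo("nifi-style-ingest-example")
                .withRecordWriter(writer)
                .withHiveConf(conf)
                .connect();

        // Each transaction writes delta files directly to the table's
        // storage location and then commits the metadata change, so the
        // new rows become visible to queries without involving HiveServer2.
        connection.beginTransaction();
        connection.write("1,login,2023-01-01T10:00:00".getBytes());
        connection.write("2,purchase,2023-01-01T10:05:00".getBytes());
        connection.commitTransaction();

        connection.close();
    }
}
```

In the NiFi data flow described in this use case, you do not write this code yourself; the PutHive3Streaming processor handles the connection, transactions, and metadata updates based on its configured properties and controller services.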