Ingesting Data into Apache Hive in CDP Public Cloud

Understand the use case
You can use Apache NiFi to move data from a range of locations into a Data Engineering cluster running Apache Hive in CDP Public Cloud.
Meet the prerequisites
Use this checklist to make sure that you meet all the requirements before you start building your data flow.
Configure the service account
Configure the service account that you will use to ingest data into Hive.
Create IDBroker mapping
To enable your CDP user to use the central authentication features that CDP provides and to exchange credentials for AWS or Azure access tokens, you must map this CDP user to the correct IAM role or Azure Managed Service Identity (MSI). You can add or modify these mappings from the Management Console in your CDP environment.
Create the Hive target table
Before you can ingest data into Apache Hive in CDP Public Cloud, ensure that you have a Hive target table. These steps walk you through creating a simple table. Adapt the instructions to the target table that your data ingest requires.
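As a rough illustration of the kind of table this step produces, the following Python sketch assembles a CREATE TABLE statement. The table name `customer`, its columns, and the helper `build_create_table` are hypothetical. The ORC storage format and the `transactional` table property assume that the table will receive streaming writes; check both against your own environment before using the statement.

```python
def build_create_table(table, columns):
    """Return a Hive CREATE TABLE statement for a transactional ORC table.

    `columns` is a list of (name, hive_type) pairs. Both the table name and
    the columns are illustrative placeholders, not values from your cluster.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {column_sql}\n)\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical two-column target table.
ddl = build_create_table("customer", [("id", "INT"), ("name", "STRING")])
print(ddl)
```

You could run the printed statement from whichever Hive client you normally use, after replacing the placeholder names and types with your own.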
Add Ranger policies
Add Ranger policies to ensure that you have write access to your Hive tables.
Obtain Hive connection details
To enable an Apache NiFi data flow to communicate with Hive, you must obtain the Hive connection details by downloading several client configuration files. The NiFi processors require these files to parse the configuration values and use those values to communicate with Hive.
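To give a sense of what parsing these configuration values involves, the following Python sketch reads name/value pairs from a file in the standard Hadoop `<configuration>` XML layout. It assumes that one of the downloaded files is a `hive-site.xml` in that layout. The sample content, the host name `metastore.example.com`, and the helper `read_hadoop_config` are hypothetical stand-ins, not values from a real cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the Hadoop <configuration> layout. A real file
# downloaded from your cluster will contain many more properties.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
"""


def read_hadoop_config(xml_text):
    """Return a dict mapping property names to values from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is treated as an empty string.
            config[name.strip()] = (prop.findtext("value") or "").strip()
    return config


config = read_hadoop_config(SAMPLE_HIVE_SITE)
print(config["hive.metastore.uris"])
```

A quick check like this can confirm that the files you downloaded contain the connection values you expect before you reference them from NiFi.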
Build the data flow
From the Apache NiFi canvas, set up the elements of your data flow. This involves opening NiFi in CDP Public Cloud, adding processors to your NiFi canvas, and connecting the processors.
Configure the controller services
You can add Controller Services to provide shared services to the processors in your data flow. You will use these Controller Services later when you configure your processors.
Configure the processor for your data source
You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data consumption processor options.
Configure the processor for your data target
You can set up a data flow to move data into many locations. This example assumes that you are moving data into Apache Hive using PutHive3Streaming. If you are moving data into another location, review Getting Started with Apache NiFi for information about how to build a data flow, and about other data ingest processor options.
Start your data flow
Start your data flow to verify that it works and to begin ingesting data.
Verify your data flow
Learn how you can verify the operation of your Hive ingest data flow.
Next steps
Learn what to do once you have moved data into Hive in CDP Public Cloud.