Understand the use case

You can use Apache NiFi to move data from a range of locations, into Kudu in your Real-time Data Mart cluster in CDP Public Cloud.

Time series use cases analyse data obtained during specified intervals, and enable you to improve performance based on available data. Examples include:

  • Optimizing yield or yield quality in a manufacturing plant

  • Dynamically optimizing network capacity during peak load of better telecommunications uptime and services

These use cases require that you store events at a high frequency, while providing ad-hoc query and record update abilities. You can use Apache NiFi data flows into Apache Kudu and Apache Impala in a CDP Public Cloud Real-time Data Mart cluster to accomplish these goals.

Time Series use cases also need flexible ingest, at scale. You can use NiFi in a Data Hub Flow Management cluster, in the same environment, to build a data flow that reads your source data, modifies it to match the Kudu schema, and ingests into your Real-time Data Mart cluster for fast analytics and dashboard serving. Once you have successfully built the data flow in NiFi, you can query the data stored in Kudu by running Impala queries in Hue on the Real-time Data Mart cluster.

This use case walks you through the steps associated with creating an ingest-focused data flow from Apache Kafka in a Streaming cluster in CDP Public Cloud, into Apache Kudu in a Real Time Data Mart cluster, in the same CDP Public Cloud environment. This will get you started with creating a data ingest data flow into Kudu. If you are moving data from a location other than Kafka, review the Getting Started with Apache NiFi for information about how to build a data flow, and about other data ingest processor options.