Configure the processor for your data source

You can set up a data flow to move data from many locations into Apache Hive. This example assumes that you are configuring ConsumeKafkaRecord_2_0. If you are moving data from a location other than Kafka, review Getting Started with Apache NiFi for information about how to build a data flow and about other data consumption processor options.

  • Ensure that you have the Ranger policies required to access the Kafka consumer group. A quick way to verify this access is shown after these notes.
  • This use case demonstrates a data flow that consumes data from Kafka and moves it to Hive. To review how to ingest data to Kafka, see Ingesting Data to Apache Kafka.
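
Before you build the flow, you can optionally confirm that the service account is able to consume from the topic as a member of the consumer group. The following sketch is one way to surface authentication or authorization errors early; it assumes the confluent-kafka Python package and reuses the example broker addresses, credentials, and group ID from the table in step 3. It is not part of the NiFi flow itself.

    # Minimal connectivity check, assuming the confluent-kafka package and the
    # example values from the table in step 3; substitute your own values.
    from confluent_kafka import Consumer, KafkaException

    conf = {
        "bootstrap.servers": (
            "docs-messaging-broker1.cdf-docs.a465-9q4k.cloudera.site:9093,"
            "docs-messaging-broker2.cdf-docs.a465-9q4k.cloudera.site:9093,"
            "docs-messaging-broker3.cdf-docs.a465-9q4k.cloudera.site:9093"
        ),
        "group.id": "nifi-hive-ingest",      # must be covered by your Ranger consumer-group policy
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "srv_nifi-hive-ingest",
        "sasl.password": "Password1!",
        "auto.offset.reset": "latest",
        # Depending on your environment, you may also need to point
        # "ssl.ca.location" at the CA certificate used by the brokers.
    }

    consumer = Consumer(conf)
    try:
        consumer.subscribe(["customer"])
        msg = consumer.poll(timeout=10.0)    # a single poll is enough to surface auth/ACL errors
        if msg is None:
            print("Connected; no new messages within the timeout.")
        elif msg.error():
            raise KafkaException(msg.error())
        else:
            print(f"Read one record from partition {msg.partition()} at offset {msg.offset()}")
    finally:
        consumer.close()
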
  1. Launch the Configure Processor window by right-clicking the processor and selecting Configure.
  2. Click the Properties tab.
  3. Configure ConsumeKafkaRecord_2_0 with the required values.

    The following list includes a description and example value for each property required to configure this ingest data flow. For a complete list of ConsumeKafkaRecord_2_0 properties, see the processor documentation.

    Kafka Brokers
        Description: Provide a comma-separated list of known Kafka brokers in the format <host>:<port>.
        Value: Docs-messaging-broker1.cdf-docs.a465-9q4k.cloudera.site:9093, docs-messaging-broker2.cdf-docs.a465-9q4k.cloudera.site:9093, docs-messaging-broker3.cdf-docs.a465-9q4k.cloudera.site:9093

    Topic Name(s)
        Description: Enter the name of the Kafka topic from which you want to get data.
        Value: customer

    Topic Name Format
        Description: Specify whether the topics are a comma-separated list of topic names or a single regular expression.
        Value: names

    Record Reader
        Description: Specifies the record reader to use for incoming flow files. This can be a range of data formats.
        Value: AvroReader

    Record Writer
        Description: Specifies the format you want to use to serialize data before sending it to the data target.
        Note: Ensure that the source record writer and the target record reader use the same schema.
        Value: CSVRecordSetWriter

    Honor Transactions
        Description: Specifies whether or not NiFi should honor transactional guarantees when communicating with Kafka.
        Value: false

    Security Protocol
        Description: Specifies the protocol used to communicate with brokers. Corresponds to Kafka's 'security.protocol' property.
        Value: SASL_SSL

    SASL Mechanism
        Description: The SASL mechanism to use for authentication. Corresponds to Kafka's 'sasl.mechanism' property.
        Value: plain

    Username
        Description: Specify the username when the SASL Mechanism is PLAIN or SCRAM-SHA-256.
        Value: srv_nifi-hive-ingest

    Password
        Description: Specify the password for the given username when the SASL Mechanism is PLAIN or SCRAM-SHA-256.
        Value: Password1!

    SSL Context Service
        Description: Specifies the SSL Context Service to use for communicating with Kafka.
        Value: default

    Offset Reset
        Description: Allows you to manage the condition when there is no initial offset in Kafka or the current offset no longer exists on the server (for example, because that data has been deleted). Corresponds to Kafka's 'auto.offset.reset' property.
        Value: latest

    Group ID
        Description: A Group ID is used to identify consumers that are within the same consumer group. Corresponds to Kafka's 'group.id' property.
        Value: nifi-hive-ingest
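
If you prefer to script the configuration rather than use the Configure Processor window, the same values can be applied through the NiFi REST API. The following sketch uses Python with the requests library; the endpoint URL, processor ID, and token are placeholders, and the property keys in PROPERTIES are assumptions based on the display names above, so verify them against the keys returned by the initial GET request before sending the update.

    # Minimal sketch: apply selected ConsumeKafkaRecord_2_0 property values through
    # the NiFi REST API. URL, processor ID, and token are placeholders, and the
    # property keys are assumptions; confirm them against the GET response first.
    import requests

    NIFI_API = "https://nifi.example.com:8443/nifi-api"   # placeholder NiFi endpoint
    PROCESSOR_ID = "REPLACE-WITH-PROCESSOR-UUID"          # placeholder processor id
    TOKEN = "REPLACE-WITH-ACCESS-TOKEN"                   # placeholder bearer token
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    PROPERTIES = {
        # Keys mirror the display names in the list above; confirm the exact keys
        # that the processor expects before applying them.
        "Kafka Brokers": "docs-messaging-broker1.cdf-docs.a465-9q4k.cloudera.site:9093",
        "Topic Name(s)": "customer",
        "Group ID": "nifi-hive-ingest",
        "Offset Reset": "latest",
    }

    # Fetch the current processor entity; the revision is required for any update,
    # and config["properties"] shows the exact property keys the processor expects.
    entity = requests.get(f"{NIFI_API}/processors/{PROCESSOR_ID}", headers=HEADERS).json()
    print(sorted(entity["component"]["config"]["properties"].keys()))

    update = {
        "revision": entity["revision"],
        "component": {
            "id": PROCESSOR_ID,
            "config": {"properties": PROPERTIES},
        },
    }
    resp = requests.put(
        f"{NIFI_API}/processors/{PROCESSOR_ID}",
        headers=HEADERS,
        json=update,
    )
    resp.raise_for_status()
    print("Processor updated:", resp.json()["component"]["name"])

Fetching the entity first also returns the current revision, which NiFi requires on every update so that conflicting edits are rejected.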

Configure the processor for your data target.