PutKudu 2.3.0.4.10.0.0-147

Bundle
org.apache.nifi | nifi-kudu-nar
Description
Reads records from an incoming FlowFile using the provided Record Reader, and writes those records to the specified Kudu table. The schema for the Kudu table is inferred from the schema of the Record Reader. If any error occurs while reading records from the input, or while writing records to Kudu, the FlowFile will be routed to failure.
Tags
HDFS, NoSQL, database, kudu, put, record
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
  • Additional Details for PutKudu 2.3.0.4.10.0.0-147

    PutKudu

    Description:

    This processor writes Records to a Kudu table.

    A Record Reader must be supplied to read the records from the FlowFile. The schema supplied to the Record Reader is used to match fields in the Record to the columns of the Kudu table. See the Table Schema section for more.

    Table Name

    When Hive MetaStore integration is enabled for Impala/Kudu, do not use the “impala::” syntax for the table name. Simply use the Hive “dbname.tablename” syntax.

    For example, without HMS integration, you might use

    
        Table Name: impala::default.testtable
    

    With HMS integration, you would simply use

    
        Table Name: default.testtable
    

    Table Schema

    When writing to Kudu, NiFi must map the fields of the Record to the columns of the Kudu table. To do this, it acquires the table schema from Kudu and compares it with the schema provided by the Record Reader: Record field names are matched against the Kudu table column names, and field and column types are compared so that the appropriate type conversions can be applied.

    For example, assuming you have the following data:

    
        {
            "forename": "Jessica",
            "surname": "Smith",
            "employee_id": 123456789
        }
                
    

    With the following schema in the Record Reader:

    
        {
            "type": "record",
            "namespace": "nifi",
            "name": "employee",
            "fields": [
                { "name": "forename", "type": "string" },
                { "name": "surname", "type": "string" },
                { "name": "employee_id", "type": "long" }
            ]
        }
                
    

    With a Kudu table created via Impala using the following create table:

    
        CREATE TABLE employees
        (
            employee_id BIGINT,
            forename STRING,
            surname STRING,
            PRIMARY KEY(employee_id)
        )
        PARTITION BY HASH PARTITIONS 16
        STORED AS KUDU;
                
    

    NiFi will acquire the table schema from Kudu, so it knows the column names and types (e.g. forename STRING, surname STRING, employee_id BIGINT). Then, it matches the Record field names against the Kudu column names (e.g. record forename -> column forename, etc.). Next, it matches the Record data types to the column data types. See the Data Types section for more.

    Where the Record schema and the table schema deviate, there are two options.

    Firstly, the Lowercase Field Names option allows NiFi to handle differences in casing. For example, if your Kudu columns were FORENAME, SURNAME and EMPLOYEE_ID, they would not match the Record schema above, because the comparison is case-sensitive. This option simply converts the names to lowercase for the purpose of comparison. It does not change the Kudu table schema.
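    As a rough illustration, the case-insensitive comparison behaves like the following Python sketch. The function name and structure are purely illustrative, not the processor's internals:

```python
# Illustrative sketch of "Lowercase Field Names" matching; not NiFi source code.
def match_fields(record_fields, kudu_columns, lowercase=False):
    """Return a dict mapping each matched Record field to its Kudu column."""
    if lowercase:
        # Compare lowercased names, but keep the original column names.
        columns = {c.lower(): c for c in kudu_columns}
        return {f: columns[f.lower()] for f in record_fields if f.lower() in columns}
    columns = set(kudu_columns)
    return {f: f for f in record_fields if f in columns}

fields = ["forename", "surname", "employee_id"]
cols = ["FORENAME", "SURNAME", "EMPLOYEE_ID"]
print(match_fields(fields, cols))                  # {} -- nothing matches
print(match_fields(fields, cols, lowercase=True))  # all three fields match
```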

    Secondly, the Handle Schema Drift option allows unmatched fields to be added to the table schema. This does modify the Kudu table schema. For example, if we add a “dateOfBirth” field to the data and Record schema examples above, it would not map to any column in the Kudu table. With this option enabled, NiFi would alter the Kudu table to add a new column called “dateOfBirth” and then insert the Record.
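    Conceptually, schema drift handling amounts to finding the fields with no matching column and adding columns for them before the insert. A minimal Python sketch, with illustrative names only:

```python
# Illustrative sketch of "Handle Schema Drift"; the real processor performs
# the equivalent table alteration against Kudu, not against a local list.
def unmatched_fields(record_fields, kudu_columns):
    """Fields present in the Record schema but absent from the Kudu table."""
    cols = set(kudu_columns)
    return [f for f in record_fields if f not in cols]

record_fields = ["forename", "surname", "employee_id", "dateOfBirth"]
table_columns = ["forename", "surname", "employee_id"]

for field in unmatched_fields(record_fields, table_columns):
    table_columns.append(field)  # stands in for adding a column to the table

print(table_columns)  # dateOfBirth is now a column, so the Record can be inserted
```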

    Data Types

    NiFi data types are mapped to the following Kudu types:

    NiFi Type    Kudu Type
    ---------    ---------
    BOOLEAN      BOOL
    BYTE         INT8
    SHORT        INT16
    INT          INT32
    LONG         INT64
    FLOAT        FLOAT
    DOUBLE       DOUBLE
    DECIMAL      DECIMAL
    TIMESTAMP    UNIXTIME_MICROS
    STRING       STRING
    CHAR         STRING
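    For reference, the mapping above can be restated as a simple lookup table. This is just the documentation table expressed in Python, not an API from the Kudu client or NiFi:

```python
# The NiFi -> Kudu type mapping from the table above, as a dict.
NIFI_TO_KUDU = {
    "BOOLEAN": "BOOL",
    "BYTE": "INT8",
    "SHORT": "INT16",
    "INT": "INT32",
    "LONG": "INT64",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DECIMAL",
    "TIMESTAMP": "UNIXTIME_MICROS",
    "STRING": "STRING",
    "CHAR": "STRING",
}

# e.g. the employee_id field (LONG) from the earlier example maps to INT64:
print(NIFI_TO_KUDU["LONG"])  # INT64
```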
Properties
System Resource Considerations
Resource Description
MEMORY An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in degraded performance.
Relationships
Name Description
success A FlowFile is routed to this relationship after it has been successfully stored in Kudu
failure A FlowFile is routed to this relationship if it cannot be sent to Kudu
Writes Attributes
Name Description
record.count Number of records written to Kudu