Getting Started with Streaming Analytics

Chapter 2. Prepare Your Environment

Deploying Your HDF Clusters

About This Task

Now that you have reviewed the reference architecture and planned the deployment of your trucking application, you can begin installing HDF according to your use case specifications. To fully build the trucking application as described in this Getting Started with Streaming Analytics document, use the following steps.

Steps

  1. Install Ambari.

  2. Install a standalone HDF cluster.

  3. Install a standalone HDP cluster.

You can find more information about your HDF and HDP cluster deployment options in Planning Your Deployment.

You can find more information about which versions of HDP and Ambari you should use with your version of HDF in the HDF Support Matrix.

More Information

Planning Your Deployment

HDF Support Matrix

Registering Schemas in Schema Registry

The trucking application streams CSV events from the two sensors to a single Kafka topic called raw-all_truck_events_csv. NiFi consumes the events from this topic, and then routes, enriches, and delivers them to a set of Kafka topics (truck_events_avro and truck_speed_events_avro) for consumption by the streaming analytics applications. To do this, you must perform the following tasks:

  • Create the three Kafka topics

  • Register a set of schemas in Schema Registry

Create the Kafka Topics

About This Task

Kafka topics are categories or feed names to which records are published.

Steps

  1. Log in to the node where the Kafka broker is running.

  2. Create the Kafka topics using the following commands:

    cd /usr/[hdf|hdp]/current/kafka-broker/bin/
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic raw-all_truck_events_csv;
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic truck_events_avro;
    
    ./kafka-topics.sh \
    --create \
    --zookeeper <zookeeper-host>:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic truck_speed_events_avro;
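
To confirm the topics were created, you can list and describe them from the same directory; <zookeeper-host> is the same placeholder used above.

    ./kafka-topics.sh \
    --list \
    --zookeeper <zookeeper-host>:2181

    ./kafka-topics.sh \
    --describe \
    --zookeeper <zookeeper-host>:2181 \
    --topic raw-all_truck_events_csv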
    

More Information

Apache Kafka Component Guide

Register Schemas for the Kafka Topics

About This Task

Register the schemas for the two sensor streams and the two Kafka topics to which NiFi will publish the enriched events. Registering the Kafka topic schemas is beneficial because Schema Registry provides a centralized schema location, allowing you to stream records into topics without having to attach the schema to each record.

Steps

  1. Go to the Schema Registry UI by selecting the Registry service in Ambari and, under 'Quick Links', selecting 'Registry UI'.

  2. Click the + button to add a schema, schema group, and schema metadata for the Raw Geo Event Sensor Kafka topic.

    1. Enter the following:

      • Name = raw-truck_events_avro

      • Description = Raw Geo events from trucks in Kafka Topic

      • Type = Avro schema provider

      • Schema Group = truck-sensors-kafka

      • Compatibility: BACKWARD

    2. Check the evolve check box.

    3. Copy the schema from here and paste it into the Schema Text area.

    4. Click Save.

  3. Click the + button to add a schema, schema group (exists from previous step), and schema metadata for the Raw Speed Event Sensor Kafka topic.

    1. Enter the following information:

      • Name = raw-truck_speed_events_avro

      • Description = Raw Speed Events from trucks in Kafka Topic

      • Type = Avro schema provider

      • Schema Group = truck-sensors-kafka

      • Compatibility: BACKWARD

    2. Check the evolve check box.

    3. Copy the schema from here and paste it into the Schema Text area.

    4. Click Save.

  4. Click the + button to add a schema, schema group and schema metadata for the Geo Event Sensor Kafka topic:

    1. Enter the following information:

      • Name = truck_events_avro

      • Description = Schema for the Kafka topic named 'truck_events_avro'

      • Type = Avro schema provider

      • Schema Group = truck-sensors-kafka

      • Compatibility: BACKWARD

    2. Check the evolve check box.

    3. Copy the schema from here and paste it into the Schema Text area.

    4. Click Save.

  5. Click the + button to add a schema, schema group (exists from previous step), and schema metadata for the Speed Event Sensor Kafka topic:

    1. Enter the following information:

      • Name = truck_speed_events_avro

      • Description = Schema for the Kafka topic named 'truck_speed_events_avro'

      • Type = Avro schema provider

      • Schema Group = truck-sensors-kafka

      • Compatibility: BACKWARD

    2. Check the evolve check box.

    3. Copy the schema from here and paste it into the Schema Text area.

    4. Click Save.

More Information

If you want to create these schemas programmatically using the Schema Registry client via REST rather than through the UI, you can find examples at this GitHub location.
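As a rough illustration of the REST approach, a schema like the ones above might be registered with curl. The Registry host, port, endpoint paths, and payload fields shown here are assumptions that can vary by Registry version, so verify them against the examples at the GitHub location above.

    # Assumed Registry base URL; the default Registry port may differ in your environment.
    REGISTRY_URL="http://<registry-host>:7788/api/v1/schemaregistry"

    # Create the schema metadata (name, group, type, compatibility), mirroring the UI fields.
    curl -s -X POST "$REGISTRY_URL/schemas" \
      -H 'Content-Type: application/json' \
      -d '{
            "name": "truck_events_avro",
            "schemaGroup": "truck-sensors-kafka",
            "type": "avro",
            "description": "Schema for the Kafka topic named truck_events_avro",
            "compatibility": "BACKWARD",
            "evolve": true
          }'

    # Add the first schema version; schema.avsc stands in for the Avro schema file you would
    # otherwise paste into the Schema Text area (jq -Rs wraps the file content as a JSON string).
    curl -s -X POST "$REGISTRY_URL/schemas/truck_events_avro/versions" \
      -H 'Content-Type: application/json' \
      -d "$(jq -Rs '{description: "initial version", schemaText: .}' schema.avsc)"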

Setting up an Enrichment Store, Creating an HBase Table, and Creating an HDFS Directory

About This Task

To prepare for Performing Predictive Analytics on Streams, you need several HBase and Phoenix tables. This section describes how to set up the HBase and Phoenix tables timesheet and drivers, load them with reference data, and download the custom UDFs and processors that perform the enrichment and normalization.

Install HBase/Phoenix and Download the SAM Extensions

  1. If HBase is not installed, install/add an HBase service.

  2. Ensure that Phoenix is enabled on the HBase Cluster.

  3. Download the Sam-Custom-Extensions.zip and save it to your local machine.

  4. Unzip the contents. Name the unzipped folder $SAM_EXTENSIONS.
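
For reference, the unzip step might look like the following; the target folder name is just an example, and the zip should first be downloaded from the location given in the HDF documentation.

    # Unzip into a working directory and point $SAM_EXTENSIONS at it (folder name is an example).
    unzip Sam-Custom-Extensions.zip -d sam-custom-extensions
    export SAM_EXTENSIONS=$(pwd)/sam-custom-extensions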

Steps for Creating Phoenix Tables and Loading Reference Data

  1. Copy $SAM_EXTENSIONS/scripts.tar.gz to a node where the HBase/Phoenix client is installed.

  2. On that node, untar scripts.tar.gz. This document refers to the extracted directory as $SCRIPTS.

    tar -zxvf scripts.tar.gz
  3. Navigate to the directory containing the Phoenix script that creates the enrichment tables and loads them with reference data.

    cd $SCRIPTS/phoenix
  4. Open the file phoenix_create.sh and replace <ZK_HOST> with the FQDN of your ZooKeeper host.

  5. Make the phoenix_create.sh script executable and run it. Make sure JAVA_HOME is set before you run it.

    ./phoenix_create.sh
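
Steps 4 and 5 might look like the following; the ZooKeeper FQDN and JAVA_HOME path are hypothetical values for your environment.

    cd $SCRIPTS/phoenix

    # Substitute your ZooKeeper FQDN for the <ZK_HOST> placeholder (zk1.example.com is hypothetical).
    sed -i 's/<ZK_HOST>/zk1.example.com/g' phoenix_create.sh

    # Ensure JAVA_HOME points at your JDK (path is an example), then make the script executable and run it.
    export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
    chmod +x phoenix_create.sh
    ./phoenix_create.sh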

Steps for Verifying Data has Populated Phoenix Tables

  1. Start up the sqlline Phoenix client.

    cd /usr/hdp/current/phoenix-client/bin
    
    ./sqlline.py $ZK_HOST:2181:/hbase-unsecure
  2. List all the tables in Phoenix.

    !tables
  3. Query the drivers and timesheet tables.

    select * from drivers;
    select * from timesheet; 

Steps for Starting HBase and Creating an HBase Table

  1. Start HBase. This can be done easily by adding the HDP HBase service using Ambari.

  2. Create a new HBase table by logging in to a node where the HBase client is installed, and then executing the following commands:

    cd /usr/hdp/current/hbase-client/bin
    
    ./hbase shell
    
    create 'violation_events', {NAME => 'events', VERSIONS => 3}
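
If you want to confirm the table was created, you can check it from the same HBase shell session.

    list 'violation_events'
    describe 'violation_events'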
    

Steps for Creating an HDFS Directory

Create the following directory in HDFS and give all users access to it.

  1. Log into a node where HDFS client is installed.

  2. Execute the following commands:

    su hdfs
    
    hadoop fs -mkdir /apps/trucking-app
    
    hadoop fs -chmod 777 /apps/trucking-app
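
To confirm the directory exists with the expected permissions, you can list its parent directory.

    hadoop fs -ls /apps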