Chapter 2. Prepare Your Environment
Deploying Your HDF Clusters
About This Task
Now that you have reviewed the reference architecture and planned the deployment of your trucking application, you can begin installing HDF according to your use case specifications. To fully build the trucking application as described in this Getting Started with Stream Analytics document, use the following steps.
Steps
Install Ambari.
Install a standalone HDF cluster.
Install a standalone HDP cluster.
You can find more information about your HDF and HDP cluster deployment options in Planning Your Deployment.
You can find more information about which versions of HDP and Ambari you should use with your version of HDF in the HDF Support Matrices.
More Information
Registering Schemas in Schema Registry
The trucking application streams CSV events from the two sensors to a single Kafka topic called raw-all_truck_events_csv. NiFi consumes the events from this topic, and then routes, enriches, and delivers them to a set of Kafka topics (truck_events_avro and truck_speed_events_avro) for consumption by the streaming analytics applications. To do this, you must perform the following tasks:
Create the 3 Kafka topics
Register a set of Schemas in Schema Registry
Create the Kafka Topics
About This Task
Kafka topics are categories or feed names to which records are published.
Steps
Log in to the node where the Kafka broker is running.
Create the Kafka topics using the following commands:
cd /usr/[hdf|hdp]/current/kafka-broker/bin/

./kafka-topics.sh --create \
  --zookeeper <zookeeper-host>:2181 \
  --replication-factor 2 \
  --partitions 3 \
  --topic raw-all_truck_events_csv

./kafka-topics.sh --create \
  --zookeeper <zookeeper-host>:2181 \
  --replication-factor 2 \
  --partitions 3 \
  --topic truck_events_avro

./kafka-topics.sh --create \
  --zookeeper <zookeeper-host>:2181 \
  --replication-factor 2 \
  --partitions 3 \
  --topic truck_speed_events_avro
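To confirm that the topics were created before wiring up NiFi, you can list and describe them from the same directory. This is a quick check, assuming the same ZooKeeper connection string used above:

# List all topics; the three trucking topics should appear.
./kafka-topics.sh --list --zookeeper <zookeeper-host>:2181

# Describe one topic to verify its partition count and replication factor.
./kafka-topics.sh --describe \
  --zookeeper <zookeeper-host>:2181 \
  --topic raw-all_truck_events_csv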
Register Schemas for the Kafka Topics
About This Task
Register the schemas for the two sensor streams and the two Kafka topics to which NiFi will publish the enriched events. Registering the Kafka topic schemas is beneficial in several ways. Schema Registry provides a centralized schema location, allowing you to stream records into topics without having to attach the schema to each record.
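The Schema Text area expects an Avro schema definition in JSON. As an illustration only, a simplified, hypothetical geo-event schema is sketched below; its field names are assumptions, so use the schemas provided for the trucking application when you actually register:

# Write an illustrative Avro schema to a local file; its contents could be pasted into the Schema Text area.
cat > sample_truck_geo_event.avsc <<'EOF'
{
  "type": "record",
  "name": "TruckGeoEvent",
  "namespace": "hortonworks.trucking",
  "fields": [
    {"name": "eventTime", "type": "string"},
    {"name": "truckId",   "type": "int"},
    {"name": "driverId",  "type": "int"},
    {"name": "latitude",  "type": "double"},
    {"name": "longitude", "type": "double"}
  ]
}
EOF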
Steps
Go to the Schema Registry UI by selecting the Registry service in Ambari and, under 'Quick Links', selecting 'Registry UI'.
Click the + button to add a schema, schema group, and schema metadata for the Raw Geo Event Sensor Kafka topic. Enter the following:
Name = raw-truck_events_avro
Description = Raw Geo events from trucks in Kafka Topic
Type = Avro schema provider
Schema Group = truck-sensors-kafka
Compatibility: BACKWARD
Check the evolve check box.
Copy the schema from here and paste it into the Schema Text area.
Click Save.
Click the + button to add a schema, schema group (which exists from the previous step), and schema metadata for the Raw Speed Event Sensor Kafka topic. Enter the following information:
Name = raw-truck_speed_events_avro
Description = Raw Speed Events from trucks in Kafka Topic
Type = Avro schema provider
Schema Group = truck-sensors-kafka
Compatibility: BACKWARD
Check the evolve check box.
Copy the schema from here and paste it into the Schema Text area.
Click Save.
Click the + button to add a schema, schema group, and schema metadata for the Geo Event Sensor Kafka topic. Enter the following information:
Name = truck_events_avro
Description = Schema for the Kafka topic named 'truck_events_avro'
Type = Avro schema provider
Schema Group = truck-sensors-kafka
Compatibility: BACKWARD
Check the evolve checkbox.
Copy the schema from here and paste it into the Schema Text area.
Click Save.
Click the + button to add a schema, schema group (which exists from the previous step), and schema metadata for the Speed Event Sensor Kafka topic. Enter the following information:
Name = truck_speed_events_avro
Description = Schema for the Kafka topic named 'truck_speed_events_avro'
Type = Avro schema provider
Schema Group = truck-sensors-kafka
Compatibility: BACKWARD
Check the evolve check box.
Copy the schema from here and paste it into the Schema Text area.
Click Save.
More Information
If you want to create these schemas programmatically using the Schema Registry client via REST rather than through the UI, you can find examples at this Github location.
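As a sketch of that REST approach, the following curl calls create the schema metadata and add a first version for one of the topics. The Registry host, port (7788), and endpoint paths are assumptions based on a default HDF 3.x Registry installation, and the schemaText shown is a trivial placeholder; confirm the paths against your Registry instance and substitute the real schema text before use:

# Hypothetical Registry endpoint; adjust the host and port to your environment.
REGISTRY_URL=http://<registry-host>:7788/api/v1/schemaregistry

# 1. Create the schema metadata (name, type, group, compatibility).
curl -s -X POST "$REGISTRY_URL/schemas" \
  -H "Content-Type: application/json" \
  -d '{"name":"truck_events_avro","type":"avro","schemaGroup":"truck-sensors-kafka","description":"Schema for the Kafka topic named truck_events_avro","compatibility":"BACKWARD","evolve":true}'

# 2. Add the first schema version; schemaText carries the Avro definition as an escaped JSON string.
curl -s -X POST "$REGISTRY_URL/schemas/truck_events_avro/versions" \
  -H "Content-Type: application/json" \
  -d '{"description":"initial version","schemaText":"{\"type\":\"record\",\"name\":\"TruckGeoEvent\",\"fields\":[{\"name\":\"eventTime\",\"type\":\"string\"}]}"}'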
Setting up an Enrichment Store, Creating an HBase Table, and Creating an HDFS Directory
About This Task
To prepare for Performing Predictive Analytics on Streams, you need HBase and Phoenix tables. This section gives you instructions for setting up the HBase and Phoenix tables timesheet and drivers, loading them with reference data, and downloading the custom UDFs and processors that perform the enrichment and normalization.
Install HBase/Phoenix and Download the SAM Extensions
If HBase is not installed, install/add an HBase service.
Ensure that Phoenix is enabled on the HBase Cluster.
Download the Sam-Custom-Extensions.zip and save it to your local machine.
Unzip the contents. Name the unzipped folder $SAM_EXTENSIONS.
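For example, the unzip step might look like the following; the target path is arbitrary and just needs to match what you later use as $SAM_EXTENSIONS:

# Unzip the custom extensions and record the location as $SAM_EXTENSIONS.
unzip Sam-Custom-Extensions.zip -d ~/sam-custom-extensions
export SAM_EXTENSIONS=~/sam-custom-extensions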
Steps for Creating Phoenix Tables and Loading Reference Data
Copy $SAM_EXTENSIONS/scripts.tar.gz to a node where the HBase/Phoenix client is installed.
On that node, untar scripts.tar.gz and name the resulting directory $SCRIPTS:
tar -zxvf scripts.tar.gz
Navigate to the directory containing the Phoenix script, which creates the Phoenix enrichment tables and loads them with reference data:
cd $SCRIPTS/phoenix
Open the file phoenix_create.sh and replace <ZK_HOST> with the FQDN of your ZooKeeper host.
Make the phoenix_create.sh script executable and execute it, making sure JAVA_HOME is set in the script's environment (see the sketch after this step):
./phoenix_create.sh
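A minimal sketch of that step, assuming a typical HDP JDK location (the JAVA_HOME path below is hypothetical; point it at your own JDK):

# Set JAVA_HOME for the script, make it executable, and run it.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112   # hypothetical path; adjust to your JDK
chmod +x phoenix_create.sh
./phoenix_create.sh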
Steps for Verifying Data has Populated Phoenix Tables
Start up the sqlline Phoenix client.
cd /usr/hdp/current/phoenix-client/bin
./sqlline.py $ZK_HOST:2181:/hbase-unsecure
List all the tables in Phoenix.
!tables
Query the drivers and timesheet tables.
select * from drivers;
select * from timesheet;
Steps for Starting HBase and Creating an HBase Table
Start HBase. This can be done by adding the HDP HBase service using Ambari.
Create a new HBase table by logging in to a node where the HBase client is installed, and then execute the following commands:
cd /usr/hdp/current/hbase-client/bin
./hbase shell
create 'violation_events', {NAME => 'events', VERSIONS => 3}
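To verify that the table exists, you can run the following standard commands from the same HBase shell session:

# Confirm the table is present and inspect its column family settings.
list 'violation_events'
describe 'violation_events'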
Steps for Creating an HDFS Directory
Create the following directory in HDFS and give all users access to it.
Log into a node where HDFS client is installed.
Execute the following commands:
su hdfs
hadoop fs -mkdir /apps/trucking-app
hadoop fs -chmod 777 /apps/trucking-app
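To confirm the directory and its permissions, you can list it with the HDFS client from any node where the client is installed:

# The directory should appear with rwxrwxrwx permissions.
hadoop fs -ls -d /apps/trucking-app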