Installing an HDF cluster
Release Notes
Release Notes
Hortonworks DataFlow 3.5.1 Release Notes
Component Support
Component Availability in HDF
Understanding the Apache Component Version
What's New in HDF 3.5.1
Unsupported Features
Technical Preview Features
Community Driven Features
Unsupported HDP Components
Unsupported Customizations
Behavior Changes
Apache Patch Information
NiFi Patches
NiFi Registry Patches
Common Vulnerabilities and Exposures
Download
HDF Repository Locations
HDF Repository Location for IBM Power Systems
Fixed Issues
Known Issues
Third-Party Licenses
Legal Information
Concepts
HDF Platform Overview
Overview
Apache NiFi Overview
What is Apache NiFi?
The core concepts of NiFi
NiFi Architecture
Performance Expectations and Characteristics of NiFi
High Level Overview of Key NiFi Features
Streaming Analytics Manager Overview
Streaming Analytics Manager Overview
Streaming Analytics Manager Modules
Streaming Analytics Manager Taxonomy
Streaming Analytics Manager Personas
Platform Operator Persona
Services, Service Pools and Environments
Application Developer Persona
Component Building Blocks
Sources
Processors
Sinks
Custom Components
Schema Requirements
Analyst Persona
SDK Developer Persona
Schema Registry Overview
Schema Registry Overview
Examples of Interacting with Schema Registry
Schema Registry Use Cases
Use Case 1: Registering and Querying a Schema for a Kafka Topic
Use Case 2: Reading/Deserializing and Writing/Serializing Data from and to a Kafka Topic
Use Case 3: Dataflow Management with Schema-based Routing
Schema Registry Component Architecture
Schema Registry Concepts
Schema Entities
Compatibility Policies
Apache Kafka Overview
Building a High-Throughput Messaging System with Apache Kafka
Apache Kafka Concepts
Apache Storm Overview
Analyzing Streams of Data with Apache Storm
Installation & Upgrade
Installing & Upgrading HDF
Downloading Your Software
Downloading your Software
HDF Repository Locations
HDF Repository Location for IBM Power Systems
Planning Your HDF Deployment
Deployment Scenarios
HDF Cluster Types and Recommendations
Production Cluster Guidelines
Hardware Sizing Recommendations
Sizing your Flow Management cluster
Data flow design
NiFi design
Cluster layout
Disk configuration
Resource intensive processors
Recommendations
Installing an HDF Cluster
Installing Ambari
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Installing the HDF Management Pack on an HDF Cluster
Install an HDF Cluster Using Ambari
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Installing SmartSense
Installing HDF Services on an Existing HDP Cluster
Upgrade Ambari and HDP
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Installing the HDF Management Pack
Update the HDF Base URL
Add HDF Services to an HDP Cluster
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Installing HDF Services on a New HDP Cluster
Installing Ambari
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Deploying an HDP Cluster Using Ambari
Installing an HDP Cluster
Customize Druid Services
Configure Superset
Deploy the Cluster Services using Ambari
Access the Stream Insight Superset UI
Installing the HDF Management Pack
Update the HDF Base URL
Add HDF Services to an HDP Cluster
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Apache Ambari Managed HDF Upgrade
Pre-upgrade tasks
Accessing Ambari Managed HDF software for upgrades
Upgrade paths
Stop HDF Services
Stop Ambari-Dependent Services
Set Kafka Properties
Upgrading Only the HDF Management Pack
Upgrade the Management Pack for an HDF-only Cluster
Upgrade the Management Pack for HDF Services on an HDP Cluster
Upgrade Ambari and the HDF Management Pack
Preparing to Upgrade Ambari
Get the Ambari Repository
Upgrade Ambari Server
Upgrade the Ambari Agents
Upgrade Ambari Metrics
Upgrade SmartSense
Backup and Upgrade Ambari Infra
Upgrade Ambari Log Search
Upgrade the HDF Management Pack
Upgrade the Ambari Database Schema
Upgrading an HDF Cluster
Prerequisites
Registering Your Target Version
Installing Your Target Version
Upgrade HDF
Start Ambari LogSearch and Metrics
Upgrading HDF services on an HDP cluster
Upgrade HDP
Upgrade HDF services
Post-Upgrade Tasks
Check the NiFi Toolkit Symlink
Update NiFi Properties
Review Storm Configurations
Verify Kafka Properties
Installing & Upgrading HDF on IBM Power Systems
Downloading Your Software
Downloading your Software
HDF Repository Locations
HDF Repository Location for IBM Power Systems
Planning Your HDF Deployment on IBM Power Systems
Deployment Scenarios
HDF Cluster Types and Recommendations
Production Cluster Guidelines
Hardware Sizing Recommendations
Installing an HDF Cluster on IBM Power Systems
Installing Ambari
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Installing the HDF Management Pack on an HDF Cluster
Install an HDF Cluster Using Ambari
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Installing SmartSense
Installing HDF Services on an Existing HDP Cluster using IBM Power Systems
Upgrade Ambari and HDP
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Installing the HDF Management Pack
Update the HDF Base URL
Add HDF Services to an HDP Cluster
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Installing HDF Services on a New HDP Cluster using IBM Power Systems
Installing Ambari
Installing Databases
Supported Databases with NiFi Registry
Installing MySQL
Configuring SAM and Schema Registry Metadata Stores in MySQL
Configuring Druid and Superset Metadata Stores in MySQL
Configuring NiFi Registry Metadata Stores in MySQL
Install Postgres
Configure Postgres to Allow Remote Connections
Configure SAM and Schema Registry Metadata Stores in Postgres
Configure Druid and Superset Metadata Stores in Postgres
Configuring NiFi Registry Metadata Stores in Postgres
Specifying an Oracle Database to Use with SAM and Schema Registry
Switching to an Oracle Database After Installation
Deploying an HDP Cluster Using Ambari
Installing an HDP Cluster
Customize Druid Services
Configure Superset
Deploy the Cluster Services using Ambari
Access the Stream Insight Superset UI
Installing the HDF Management Pack
Update the HDF Base URL
Add HDF Services to an HDP Cluster
Configure HDF Components
Configure Schema Registry
Configure SAM
Configuring SAM log search and event sampling
Configure NiFi
Configure NiFi for Atlas Integration
Configure Kafka
Configure Storm
Configure Log Search
Deploy the Cluster Services
Access the UI for Deployed Services
Configuring Schema Registry and SAM for High Availability
Configuring SAM for High Availability
Configuring Schema Registry for High Availability
Apache Ambari Managed HDF Upgrade for IBM Power Systems
Pre-upgrade tasks
Accessing Ambari Managed HDF software for upgrades
Upgrade paths
Stop HDF Services
Stop Ambari-Dependent Services
Set Kafka Properties
Upgrade Ambari and the HDF Management Pack
Preparing to Upgrade Ambari
Get the Ambari Repository
Upgrade Ambari Server
Upgrade the Ambari Agents
Upgrade Ambari Metrics
Upgrade SmartSense
Backup and Upgrade Ambari Infra
Upgrade Ambari Log Search
Upgrade the HDF Management Pack
Upgrade the Ambari Database Schema
Upgrade HDF
Upgrading an HDF Cluster
Prerequisites
Registering Your Target Version
Installing Your Target Version
Upgrade HDF
Start Ambari LogSearch and Metrics
Upgrading HDF services on an HDP cluster
Upgrade HDP
Upgrade HDF services
Post-Upgrade Tasks
Check the NiFi Toolkit Symlink
Update NiFi Properties
Review Storm Configurations
Verify Kafka Properties
Installing HDF Components
Installing and Upgrading Apache NiFi
NiFi Installation
Installing NiFi on Linux
Installing NiFi on Linux
Installing NiFi as a Service
Starting and Stopping NiFi on Linux
Installing NiFi on Windows
Configuring the NiFi MSI
Using a Local User for NiFi Windows Service
Using a Domain User for NiFi Windows Service
Starting and Stopping NiFi on Windows
Launching the User Interface
Docker Installation
Importing Docker
Downloading NiFi using Docker
Running a Docker Container
Standalone Instance, Unsecured
Standalone Instance, Two-Way SSL
Standalone Instance, LDAP
Configuration Information
Upgrading NiFi
Getting ready to upgrade
Preserve customizations prior to upgrade
Preserve your custom processors
Preserve your custom NAR files
Install the new NiFi version
Update the configuration files for your new NiFi installation
Migrating a dataflow with sensitive properties
Restarting the dataflow after upgrade
Apache MiNiFi Quick Start
MiNiFi Java Agent Quick Start
Overview
Before You Begin
Installing and Starting MiNiFi
Installing MiNiFi on Linux
Installing MiNiFi as a Service on Linux
Starting MiNiFi on Linux
Installing MiNiFi on Windows
Configuring the MiNiFi MSI
Using a Local User for MiNiFi Windows Service
Using a Domain User for MiNiFi Windows Service
Starting MiNiFi on Windows
Working with Dataflows
Setting up Your Dataflow
Using Processors Not Packaged with MiNiFi
Securing your Dataflow
Managing MiNiFi
Monitoring Status
Loading a New Dataflow
Stopping MiNiFi
Processors
Default Included Processors with Java Agent
Usable Processors Requiring NAR Addition
How To
Flow Management
Using the Apache NiFi Interface
Terminology
NiFi User Interface
Accessing the UI with Multi-Tenant Authorization
Logging In
Browser Support
Unsupported Browsers
Viewing the UI in Variably Sized Browsers
Building an Apache NiFi DataFlow
Building a DataFlow
Adding Components to the Canvas
Component Versions
Sorting and Filtering Components
Changing Component Versions
Understanding Version Dependencies
Configuring a Processor
Settings Tab
Scheduling Tab
Properties Tab
Comments Tab
Additional Help
Parameters
Parameter Contexts
Adding a Parameter to a Parameter Context
Assigning a Parameter Context to a Process Group
Referencing Parameters
Accessing Parameters
Using Custom Properties with Expression Language
Variables
Referencing Custom Properties via nifi.properties
Controller Services
Adding Controller Services for Reporting Tasks
Adding Controller Services for Dataflows
Enabling/Disabling Controller Services
Reporting Tasks
Connecting Components
Details Tab
Settings
Changing Configuration and Context Menu Options
Bending Connections
Processor Validation
Site-to-Site
Configure Site-to-Site client NiFi instance
Configure Site-to-Site Server NiFi Instance
Example Dataflow
Managing an Apache NiFi DataFlow
Command and Control of the DataFlow
Starting a Component
Stopping a Component
Enabling/Disabling a Component
Remote Process Group Transmission
Individual Port Transmission
Encrypted Content Repository
What is it?
How does it work?
StaticKeyProvider
FileBasedKeyProvider
Key Rotation
Writing and Reading Content Claims
Potential Issues
Encrypted FlowFile Repository
What is it?
How does it work?
StaticKeyProvider
FileBasedKeyProvider
Key Rotation
Writing and Reading FlowFiles
Potential Issues
Experimental Warning
Other Management Features
Navigating an Apache NiFi DataFlow
Navigating within a DataFlow
Component Linking
Component Alignment
Align Vertically
Align Horizontally
Monitoring an Apache NiFi DataFlow
Monitoring of DataFlow
Anatomy of a Processor
Anatomy of a Process Group
Anatomy of a Remote Process Group
Queue Interaction
Summary Page
Historical Statistics of a Component
Versioning an Apache NiFi DataFlow
Versioning a DataFlow
Connecting to a NiFi Registry
Version States
Import a Versioned Flow
Start Version Control
Managing Local Changes
Show Local Changes
Revert Local Changes
Commit Local Changes
Change Version
Stop Version Control
Nested Versioned Flows
Parameters in Versioned Flows
Variables in Versioned Flows
Restricted Components in Versioned Flows
Restricted Controller Service Created in Root Process Group
Restricted Controller Service Created in Process Group
Using Apache NiFi Templates
Templates
Creating a Template
Importing a Template
Instantiating a Template
Managing Templates
Exporting a Template
Removing a Template
Using Apache NiFi Provenance Tools
Data Provenance
Provenance Events
Searching for Events
Details of an Event
Replaying a FlowFile
Viewing FlowFile Lineage
Find Parents
Expanding an Event
Write Ahead Provenance Repository
Backwards Compatibility
Older Existing NiFi Version
Bootstrap.conf
System Properties
Encrypted Provenance Considerations
Encrypted Provenance Repository
What is it?
How does it work?
Writing and Reading Event Records
Potential Issues
Adding functionality to Apache NiFi
Introduction
NiFi Components
Processor API
Supporting API
FlowFile
ProcessSession
ProcessContext
PropertyDescriptor
Validator
ValidationContext
PropertyValue
Relationship
StateManager
ProcessorInitializationContext
ComponentLog
AbstractProcessor API
Processor Initialization
Exposing Processor's Relationships
Exposing Processor Properties
Validating Processor Properties
Responding to Changes in Configuration
Performing the Work
When Processors are Triggered
Component Lifecycle
@OnAdded
@OnEnabled
@OnRemoved
@OnScheduled
@OnUnscheduled
@OnStopped
@OnShutdown
Component Notification
@OnPrimaryNodeStateChange
Restricted
State Manager
Scope
Storing and Retrieving State
Unit Tests
Reporting Processor Activity
Documenting a Component
Documenting Properties
Documenting Relationships
Documenting Capability and Keywords
Documenting FlowFile Attribute Interaction
Documenting Related Components
Advanced Documentation
Provenance Events
Common Processor Patterns
Data Ingress
Data Egress
Route Based on Content (One-to-One)
Route Based on Content (One-to-Many)
Route Streams Based on Content (One-to-Many)
Route Based on Attributes
Split Content (One-to-Many)
Update Attributes Based on Content
Enrich/Modify Content
Error Handling
Exceptions within the Processor
Exceptions within a callback: IOException, RuntimeException
Penalization vs. Yielding
Session Rollback
General Design Considerations
Consider the User
Cohesion and Reusability
Naming Conventions
Processor Behavior Annotations
Data Buffering
Controller Services
Developing a ControllerService
Interacting with a ControllerService
Reporting Tasks
Developing a Reporting Task
UI Extensions
Custom Processor UIs
Content Viewers
Command Line Tools
Testing
Instantiate TestRunner
Add ControllerServices
Set Property Values
Enqueue FlowFiles
Run the Processor
Validate Output
Mocking External Resources
Additional Testing Capabilities
NiFi Archives (NARs)
Per-Instance ClassLoading
Deprecating a Component
How to contribute to Apache NiFi
Technologies
Where to Start?
Supplying a contribution
Contact Us
Using the Apache NiFi Toolkit
Overview
Prerequisites for Running in a Secure Environment
NiFi CLI
Usage
Property/Argument Handling
Security Configuration
Example - Secure NiFi Registry without Proxied-Entity
Example - Secure NiFi Registry with Proxied-Entity
Interactive Usage
Output
Back-Referencing
Adding Commands
Encrypt-Config Tool
Usage
File Manager
Usage
Expected Behavior
Backup
Install
Restore
Flow Analyzer
Usage
Node Manager
Usage
Expected Behavior
Status
Disconnect
Connect
Remove
Notify
Usage
S2S
Usage
TLS Toolkit
Wildcard Certificates
Potential issues with wildcard certificates
Operation Modes
Standalone
Client/Server
Using An Existing Intermediate Certificate Authority (CA)
nifi-cert.pem
nifi-key.key
Signing with Externally-signed CA Certificates
Additional Certificate Commands
ZooKeeper Migrator
Usage
Migrating Between Source and Destination ZooKeepers
ZooKeeper Migration Steps
Using Apache NiFi Registry
Introduction
Browser Support
Unsupported Browsers
Viewing the UI in Variably Sized Browsers
Terminology
NiFi Registry User Interface
Logging In
Manage Flows
View a Flow
Sorting & Filtering Flows
Delete a Flow
Manage Buckets
Sorting & Filtering Buckets
Create a Bucket
Delete a Bucket
Delete Multiple Buckets
Edit a Bucket Name
Bucket Policies
Create a Bucket Policy
Delete a Bucket Policy
Manage Users & Groups
Sorting & Filtering Users/Groups
Add a User
Delete a User
Delete Multiple Users
Edit a User Name
Special Privileges
Grant Special Privileges to a User
Manage Groups
Add an Empty Group
Add User to a Group
Create a New Group with Selected Users
Remove a User from a Group
User Window
Group Window
Other Group Level Actions
Manage Bundles
Upload Bundle
Download Bundle
Bundle Coordinates
Bundle Id
Additional Actions
Tuning your DataFlow
Tuning your dataflow
Timer and event driven thread pools
Viewing the total number of threads in a cluster
Viewing the total number of active threads
Viewing the number of cores
Configuring thread pool size
Concurrent tasks
Run duration
Recommendations
Managing Schemas
Integrating with Schema Registry
Integrating with NiFi
Understanding NiFi Record Based Processing
Setting up the HortonworksSchemaRegistry Controller Service
Adding and Configuring Record Reader and Writer Controller Services
Using Record-Enabled Processors
Integrating with Kafka
Integrating Kafka and Schema Registry Using NiFi Processors
Integrating Kafka and Schema Registry
Integrating with Streaming Analytics Manager
Using Schema Registry
Adding a New Schema
Querying Schemas
Evolving Schema
Streaming Analytics
Using Kafka Streams
Using Kafka Streams
Integrating Hive and Kafka
Apache Hive-Kafka integration
Create a table for a Kafka stream
Querying Kafka data
Query live data from Kafka
Perform ETL by ingesting data from Kafka into Hive
Writing data to Kafka
Write transformed Hive data to Kafka
Set consumer and producer properties as table properties
Kafka storage handler and table properties
Creating Streaming Analytics Manager Data Visualizations using Superset
Creating Visualizations Using Superset
Creating Insight Slices
Adding Insight Slices to a Dashboard
Dashboards for the Trucking IOT App
Adding Custom Builder Components to Streaming Analytics Manager
Adding Custom Builder Components
Adding Custom Processors
Creating Custom Processors
Registering Custom Processors with SAM
Creating a Custom Streaming Application
Adding Custom Functions
Creating UDAFs
Creating UDFs
Building Custom Functions
Uploading Custom Functions to SAM
Building a Streaming Analytics Manager Application
Building an Application
Launch the Stream Builder UI
Add a New Stream Application
Add a Source
Connect Components
Join Multiple Streams
Filter Events in a Stream
Use Aggregate Functions over Windows
Deploying a Stream App
Configure Deployment Settings
Deploy the App
Using Streaming Analytics Manager
Stream Operations
My Applications View
Application Performance Monitoring
Exporting and Importing Stream Applications
Troubleshooting and Debugging a Stream Application
Monitoring SAM Apps and Identifying Performance Issues
Identifying Throughput Bottlenecks
Throughput Improvements for the Kafka Source
Identifying Processor Performance Bottlenecks
Latency Improvements
Debugging an Application through Distributed Log Search
Debugging an Application through Sampling
Setting Up Your Streaming Analytics Manager Environment
Streaming Analytics Manager Environment Setup and Managing Stream Apps
Managing Service Pools
Adding a New Service Pool
Updating Service Pools
Managing Environments
Create New Environment
Editing Environments
Deleting Environments
Mirroring Data Across Clusters with Apache Kafka MirrorMaker
Mirroring Data Between Clusters: Using the MirrorMaker Tool
Running MirrorMaker
Checking Mirroring Progress
Avoiding Data Loss
Running MirrorMaker on Kerberos-Enabled Clusters
Developing Apache Kafka Producers and Consumers
Developing Kafka Producers and Consumers
Creating Apache Kafka Topics
Creating a Kafka Topic
Developing Apache Storm Applications
Developing Apache Storm Applications
Core Storm Concepts
Spouts
Bolts
Stream Groupings
Topologies
Processing Reliability
Workers, Executors, and Tasks
Parallelism
Core Storm Example: RollingTopWords Topology
Trident Concepts
Introductory Example: Trident Word Count
Trident Operations
Filters
Functions
Trident Aggregations
CombinerAggregator
ReducerAggregator
Aggregator
Trident State
Trident Spouts
Achieving Exactly-Once Messaging in Trident
Further Reading about Trident
Moving Data Into and Out of a Storm Topology
Implementing Windowing Computations on Data Streams
Understanding Sliding and Tumbling Windows
Implementing Windowing in Core Storm
Understanding Tuple Timestamps and Out-of-Order Tuples
Understanding Watermarks
Understanding the "at-least-once" Guarantee
Saving the Window State
Implementing Windowing in Trident
Trident Windowing Implementation Details
Sample Trident Application with Windowing
Implementing State Management
Checkpointing
Recovery
Guarantees
Implementing Custom Actions: IStateful Bolt Hooks
Implementing Custom States
Implementing Stateful Windowing
Sample Topology with Saved State
Using Apache Storm to Move Data
Moving Data Into and Out of Apache Storm Using Spouts and Bolts
Ingesting Data from Kafka
KafkaSpout Integration: Core Storm APIs
KafkaSpout Integration: Trident APIs
Tuning KafkaSpout Performance
Configuring Kafka for Use with the Storm-Kafka Connector
Configuring KafkaSpout to Connect to HBase or Hive
Ingesting Data from HDFS
Configuring HDFS Spout
HDFS Spout Example
Streaming Data to Kafka
KafkaBolt Integration: Core Storm APIs
KafkaBolt Integration: Trident APIs
Writing Data to HDFS
Storm-HDFS: Core Storm APIs
Storm-HDFS: Trident APIs
Writing Data to HBase
Writing Data to Hive
Core Storm APIs
Trident APIs
Configuring Connectors for a Secure Cluster
Configuring KafkaSpout for a Secure Kafka Cluster
Configuring Storm-HDFS for a Secure Cluster
Configuring Storm-HBase for a Secure Cluster
Configuring Storm-Hive for a Secure Cluster
Working with Apache Storm Topologies
Packaging Storm Topologies
Deploying and Managing Apache Storm Topologies
Configuring the Storm UI
Using the Storm UI
Monitoring and Debugging an Apache Storm Topology
Enabling Dynamic Log Levels
Setting and Clearing Log Levels Using the Storm UI
Setting and Clearing Log Levels Using the CLI
Enabling Topology Event Logging
Configuring Topology Event Logging
Enabling Event Logging
Viewing Event Logs
Accessing Event Logs on a Secure Cluster
Disabling Event Logs
Extending Event Logging
Enabling Distributed Log Search
Dynamic Worker Profiling
Tuning an Apache Storm Topology
Security
Enabling Kerberos
Enabling Kerberos
Installing and Configuring the KDC
Use an Existing MIT KDC
Use an Existing Active Directory
Use Manual Kerberos Setup
(Optional) Install a new MIT KDC
Installing the JCE
Install the JCE
Enabling Kerberos on Ambari
Cluster Component Configuration Updates
NiFi Authentication
NiFi Authentication
Enabling SSL with a NiFi Certificate Authority
Enabling SSL with Existing Certificates
(Optional) Setting Up Identity Mapping
Generating Client Certificates
Logging into NiFi After Enabling SSL
Configuring NiFi Authentication and Proxying with Apache Knox
Configuring NiFi Authentication and Proxying with Apache Knox
Configuring NiFi for Knox Authentication
Configuring Knox for NiFi
Preparing to Generate Knox Certificates using the TLS Toolkit
Generating Knox Certificates Using the TLS Toolkit
Configuring the Knox SSO Topology
Creating an Advanced Topology
Configuring Knox SSO
Adding a NiFi Policy for Knox
Adding a Policy Using NiFi
Adding a Policy Using Ranger
Accessing NiFi Using Knox
SAM Authentication
SAM Authentication
Logging into SAM for the First Time
Logging In as a Different User
Authorization with Ranger
Authorization with Ranger
Creating Policies for NiFi Access
Creating Policies to View NiFi
Allowing Users Read and Write Access
Create a Kafka Policy
Create a Storm Policy
Creating the Ranger Plugin for HDF Services
Set up NiFi Registry Ranger Plugin
Establish Communication Between NiFi Registry and Ranger
Enable NiFi Registry Ranger Plugin
Confirm Ranger Configuration
Set up Ranger Policies
Deployment Scenarios for NiFi Registry Ranger Plugin
Authorizers.xml Template for NiFi Registry Ranger Plugin
Troubleshooting
NiFi Authorization
NiFi Authorization
Authorizer Configuration
Authorizers.xml Setup
Initial Admin Identity (New NiFi Instance)
Legacy Authorized Users (NiFi Instance Upgrade)
Cluster Node Identities
Configuring Users & Access Policies
Creating Users and Groups
Access Policies
Global Access Policies
Component Level Access Policies
Access Policy Inheritance
Access Policy Configuration Examples
Moving a Processor
Editing a Processor
Creating a Connection
Editing a Connection
SAM Authorization
SAM Authorization
Roles and Permissions
Creating Users and Assigning Them to Roles
Sharing Resources
Sharing an Environment
Sharing an Application
SAM Authorization Limitations
Deploying SAM Applications in a Secure Cluster
Deploying SAM Applications in a Secure Cluster
Connecting to a Secure Service that Supports Delegation Tokens
Connecting to Secure Kafka
Securing SAM - An End-to-End Workflow
Understanding the End-to-End Workflow
Reference
Apache NiFi Record Path Reference
Overview
Structure of a RecordPath
Child Operator
Descendant Operator
Filters
Function Usage
Arrays
Maps
Predicates
Functions
Standalone Functions
substring
substringAfter
substringAfterLast
substringBefore
substringBeforeLast
replace
replaceRegex
concat
fieldName
toDate
toString
toBytes
format
trim
toUpperCase
toLowerCase
base64Encode
base64Decode
padLeft
padRight
Filter Functions
contains
matchesRegex
startsWith
endsWith
not
isEmpty
isBlank
Apache NiFi Expression Language Reference
Overview
Structure of a NiFi Expression
Expression Language in the Application
Escaping Expression Language
Expression Language Editor
Functions
Data Types
Boolean Logic
String Manipulation
Encode/Decode Functions
Searching
Mathematical Operations and Numeric Manipulation
Date Manipulation
Type Coercion
Subjectless Functions
Evaluating Multiple Attributes
Streaming Analytics Manager Configuration Values
Source, Processor, and Sink Configuration Values
Source Configuration Values
Processor Configuration Values
Sink Configuration Values
Apache NiFi Configuration Best Practices
Configuration Best Practices
Port Configuration
NiFi Default Ports
Embedded Zookeeper
Recommended Antivirus Exclusions
Clustering Configuration
Zero-Master Clustering
Why Cluster?
Terminology
Communication within the Cluster
Managing Nodes
Disconnect Nodes
Offload Nodes
Delete Nodes
Decommission Nodes
NiFi CLI Node Commands
Flow Election
Basic Cluster Setup
Troubleshooting
Bootstrap Properties
Notification Services
Email Notification Service
HTTP Notification Service
Proxy Configuration
Analytics Framework
Apache NiFi Security Reference
Security Configuration
TLS Generation Toolkit
User Authentication
Lightweight Directory Access Protocol (LDAP)
Kerberos
OpenID Connect
Apache Knox
Multi-Tenant Authorization
Authorizer Configuration
Authorizers.xml Setup
FileUserGroupProvider
LdapUserGroupProvider
ShellUserGroupProvider
Composite Implementations
FileAccessPolicyProvider
StandardManagedAuthorizer
FileAuthorizer
Initial Admin Identity (New NiFi Instance)
Legacy Authorized Users (NiFi Instance Upgrade)
Cluster Node Identities
Configuring Users & Access Policies
Creating Users and Groups
Access Policies
Viewing Policies on Users
Access Policy Configuration Examples
Encryption Configuration
Key Derivation Functions
Additional Resources
Salt and IV Encoding
NiFi Legacy
OpenSSL PKCS#5 v1.5 EVP_BytesToKey
Bcrypt, Scrypt, PBKDF2
Java Cryptography Extension (JCE) Limited Strength Jurisdiction Policies
Allow Insecure Cryptographic Modes
Encrypted Passwords in Configuration Files
Kerberos Service
Notes
Apache NiFi State Management
State Management
Configuring State Providers
Embedded ZooKeeper Server
ZooKeeper Access Control
Securing ZooKeeper
Kerberizing Embedded ZooKeeper Server
Kerberizing NiFi's ZooKeeper Client
Troubleshooting Kerberos Configuration
ZooKeeper Migrator
Apache NiFi System Properties
System Properties
Upgrade Recommendations
Core Properties
State Management
H2 Settings
FlowFile Repository
Write Ahead FlowFile Repository
Encrypted Write Ahead FlowFile Repository Properties
Volatile FlowFile Repository
RocksDB FlowFile Repository
Swap Management
Content Repository
File System Content Repository Properties
Encrypted File System Content Repository Properties
Volatile Content Repository Properties
Provenance Repository
Write Ahead Provenance Repository Properties
Encrypted Write Ahead Provenance Repository Properties
Persistent Provenance Repository Properties
Volatile Provenance Repository Properties
Component Status Repository
Site to Site Properties
Site to Site Routing Properties for Reverse Proxies
Site to Site protocol sequence
Reverse Proxy Configurations
Site to Site and Reverse Proxy Examples
Web Properties
Security Properties
Identity Mapping Properties
Cluster Common Properties
Cluster Node Properties
ZooKeeper Properties
Kerberos Properties
Analytics Properties
Custom Properties
Apache NiFi Toolkit
Overview
Prerequisites for Running in a Secure Environment
NiFi CLI
Usage
Property/Argument Handling
Security Configuration
Example - Secure NiFi Registry without Proxied-Entity
Example - Secure NiFi Registry with Proxied-Entity
Interactive Usage
Output
Back-Referencing
Adding Commands
Encrypt-Config Tool
Usage
File Manager
Usage
Expected Behavior
Backup
Install
Restore
Flow Analyzer
Usage
Node Manager
Usage
Expected Behavior
Status
Disconnect
Connect
Remove
Notify
Usage
S2S
Usage
TLS Toolkit
Wildcard Certificates
Potential issues with wildcard certificates
Operation Modes
Standalone
Client/Server
Using An Existing Intermediate Certificate Authority (CA)
nifi-cert.pem
nifi-key.key
Signing with Externally-signed CA Certificates
Additional Certificate Commands
ZooKeeper Migrator
Usage
Migrating Between Source and Destination ZooKeepers
ZooKeeper Migration Steps
Administering Apache NiFi Registry
System Requirements
How to install and start NiFi Registry
Security Configuration
User Authentication
Lightweight Directory Access Protocol (LDAP)
Kerberos
Authorization
Authorizer Configuration
Authorizers.xml Setup
StandardManagedAuthorizer
UserGroupProvider
AccessPolicyProvider
Initial Admin Identity (New NiFi Registry Instance)
Access Policies
Bucket Policies
Special Privilege Policies
Encrypted Passwords in Configuration Files
Encrypt-Config Tool
Sensitive Property Key Migration
Bootstrap Properties
Proxy Configuration
Kerberos Service
Notes
System Properties
Web Properties
Security Properties
Identity Mapping Properties
Providers Properties
Alias Properties
Database Properties
Extension Directories
Kerberos Properties
Metadata Database
H2
Postgres
MySQL
Schema Differences & Limitations
Persistence Providers
Flow Persistence Providers
FileSystemFlowPersistenceProvider
GitFlowPersistenceProvider
DatabaseFlowPersistenceProvider
Switching from other Flow Persistence Provider
Data model version of serialized Flow snapshots
Bundle Persistence Providers
FileSystemBundlePersistenceProvider
S3BundlePersistenceProvider
Event Hooks
Shared Event Hook Properties
ScriptEventHookProvider
LoggingEventHookProvider
URL Aliasing
Backup & Recovery
Metadata Database
Persistence Providers
Flow Persistence
Bundle Persistence
Configuration Files
Learning & Training
Getting Started with Apache NiFi
Who is This Guide For?
Terminology Used in This Guide
Downloading and Installing NiFi
Starting NiFi
For Windows Users
For Linux/Mac OS X users
Installing as a Service
I Started NiFi. Now What?
Adding a Processor
Configuring a Processor
Connecting Processors
Starting and Stopping Processors
Getting More Info for a Processor
Other Components
What Processors are Available
Data Transformation
Routing and Mediation
Database Access
Attribute Extraction
System Interaction
Data Ingestion
Data Egress / Sending Data
Splitting and Aggregation
HTTP
Amazon Web Services
Working With Attributes
Common Attributes
Extracting Attributes
Adding User-Defined Attributes
Routing on Attributes
Expression Language / Using Attributes in Property Values
Working With Templates
Monitoring NiFi
Status Bar
Component Statistics
Bulletins
Data Provenance
Event Details
Lineage Graph
Getting Started with Streaming Analytics
Building an End-to-End Stream Application
Understanding the Use Case
Reference Architecture
Prepare Your Environment
Deploying Your Cluster
Registering Schemas in Schema Registry
Create the Kafka Topics
Register Schemas for the Kafka Topics
Setting up an Enrichment Store, Creating an HBase Table, and Creating an HDFS Directory
Creating a Dataflow Application
Data Producer Application Generates Events
NiFi: Create a Dataflow Application
NiFi Controller Services
NiFi Ingests the Raw Sensor Events
Publish Enriched Events to Kafka for Consumption by Analytics Applications
Start the NiFi Flow
Pick your Streaming Engine
Creating a Streaming Analytics Application with SAM
Creating a Stream Analytics Application with SAM
Two Options for Creating the Streaming Analytics Applications
Setting up the Stream Analytics App using the TruckingRefAppEnvEnviornmentBuilder
Configuring and Deploying the Reference Application
Creating a Service Pool and Environment
Creating Your First Application
Creating and Configuring the Kafka Source Stream
Connecting Components
Joining Multiple Streams
Filtering Events in a Stream using Rules
Using Aggregate Functions over Windows
Implementing Business Rules on the Stream
Transforming Data using a Projection Processor
Streaming Alerts to an Analytics Engine for Dashboarding
Streaming Violation Events to an Analytics Engine for Descriptive Analytics
Streaming Violation Events into a Data Lake and Operational Data Store
Deploy a SAM Application
Configure Deployment Settings
Deploy the App
Advanced: Performing Predictive Analytics on the Stream using SAM
Logistical Regression Model
Export the Model into SAM's Model Registry
Enrichment and Normalization of Model Features
Upload Custom Processors and UDFs for Enrichment and Normalization
Upload Custom UDFs
Upload Custom Processors
Scoring the Model in the Stream using a Streaming Split Join Pattern
Streaming Split Join Pattern
Score the Model Using the PMML Processor and Alert
Creating Visualizations Using Superset
Creating Insight Slices
Adding Insight Slices to a Dashboard
Dashboards for the Trucking IOT App
SAM Test Mode
Four Test Cases using SAM's Test Mode
Test Case 1: Testing Normal Event with No Violation Prediction
Analyzing Test Case 1 Results
Test Case 2: Testing Normal Event with Yes Violation Prediction
Analyzing Test Case 2 Results
Test Case 3: Testing Violation Event
Analyzing Test Case 3 Results
Test Case 4: Testing Multiple-Speeding-Events
Analyzing Test Case 4 Results
Running SAM Test Cases as Junit Tests in CI Pipelines
Creating Custom Sources and Sinks
Cloud Use Case: Integration with AWS Kinesis and S3
Registering a Custom Source in SAM for AWS Kinesis
Registering a Custom Sink in SAM for AWS S3
Implementing the SAM App with Kinesis Source and S3 Sink
Stream Operations
My Applications View
Application Performance Monitoring
Exporting and Importing Stream Applications
Troubleshooting and Debugging a Stream Application
Monitoring SAM Apps and Identifying Performance Issues
Identifying Throughput Bottlenecks
Throughput Improvements for the Kafka Source
Identifying Processor Performance Bottlenecks
Latency Improvements
Debugging an Application through Distributed Log Search
Debugging an Application through Sampling
Spark Streaming
Running the Stream Simulator
Managing Kafka with Streams Messaging Manager
SMM Overview
Installing DataPlane Streams Messaging Manager
Enabling Reference Application Cluster for SMM
Monitoring Kafka with SMM
Getting Started with Apache NiFi Registry
Terminology Used in This Guide
Downloading and Installing NiFi Registry
Starting NiFi Registry
For Linux/Unix/Mac OS X users
Installing as a Service
I Started NiFi Registry. Now What?
Create a Bucket
Connect NiFi to the Registry
Start Version Control on a Process Group
Save Changes to a Versioned Flow
Import a Versioned Flow
Deploy the Cluster Services
Finish the wizard and deploy the cluster. After the cluster has been deployed, some services might fail to start. If this happens, start those services individually.
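If the cluster is managed through Ambari, a failed service can also be started through the Ambari REST API rather than the web UI. The following is a minimal sketch under that assumption, not a documented HDF procedure: the host name, cluster name (`hdf`), service name (`NIFI`), and credentials are placeholders, and `build_start_request` is a hypothetical helper. It only builds the request; sending it is left commented out.

```python
import base64
import json
import urllib.request


def build_start_request(base_url, cluster, service, user, password):
    """Build (but do not send) a request asking Ambari to start one service.

    Assumes an Ambari-managed cluster; all argument values are placeholders.
    """
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    # Setting the desired state to STARTED asks Ambari to start the service.
    payload = {
        "RequestInfo": {"context": f"Start {service}"},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
        },
    )


if __name__ == "__main__":
    req = build_start_request(
        "http://ambari.example.com:8080", "hdf", "NIFI", "admin", "admin"
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually send the request
```

Repeat the call once per service that failed to start, replacing the service name each time.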
Parent topic: Configure HDF Components
© 2012–2020, Cloudera, Inc.
Document licensed under the Creative Commons Attribution ShareAlike 4.0 License.