Data Movement and Integration

Contents

1. HDP Data Movement and Integration
    Intended Audience
    Data Movement Components
2. Data Management and Falcon Overview
    Falcon Access
    Understanding Entity Relationships
3. Considerations for Using Falcon
4. Prerequisite to Installing or Upgrading Falcon
    Ambari Install or Upgrade of Falcon
    Manual Install or Upgrade of Falcon
5. Configuring for High Availability
    Configuring Properties and Setting Up Directory Structure for High Availability
    Preparing the Falcon Servers
    Manually Failing Over the Falcon Servers
6. Creating Falcon Entity Definitions
    Replication Between HDP Versions
    Running Falcon in a Secure Environment
    Creating HDFS Directories for Falcon
    Defining Entities Using the Falcon Web UI
        Creating a Cluster Entity Definition Using the Web UI
        Creating a Feed Entity Definition Using the Web UI
        Creating a Process Entity Definition Using the Web UI
        Scheduling or Pausing an Entity Using the Web UI
    Defining Entities Using the CLI
        Creating a Cluster Entity Definition Using the CLI
        Creating a Feed Entity Definition Using the CLI
        Creating a Process Entity Definition Using the CLI
        Submitting and Scheduling an Entity Using the CLI
7. Mirroring Data with Falcon
    Prepare to Mirror Data
    Mirror File System Data Using the Web UI
    Mirror Hive Data Using the Web UI
    Mirror Data Using Snapshots
    Mirror File System Data Using the CLI
8. Replicating Data with Falcon
    Prerequisites
    Replicating Data Using the CLI
        Define the Data Source: Set Up a Source Cluster Entity
        Create the Replication Target: Define a Cluster Entity
        Create the Feed Entity
        Submit and Schedule the Entities
        Confirm Results
9. Mirroring Data with HiveDR in a Secure Environment
    Prepare for Disaster Recovery
        auth-to-local Setting
        Proxy User
        Nameservices
    Configure Properties for HiveDR
    Initialize Data for Hive Replication
    Mirror Hive Data Using the Web UI
10. Enabling Mirroring and Replication with Azure Cloud Services
    Connect the Azure Data Factory to Your On-premises Hadoop Cluster
    Configuring to Copy Files From an On-premises HDFS Store to Azure Blob Storage
11. Using Advanced Falcon Features
    Locating and Managing Entities
    Accessing File Properties from Ambari
    Enabling Transparent Data Encryption
    Putting Falcon in Safe Mode
    Viewing Alerts in Falcon
    Late Data Handling
    Setting a Retention Policy
    Setting a Retry Policy
    Enabling Email Notifications
    Understanding Dependencies in Falcon
    Viewing Dependencies
12. Using Apache Sqoop to Transfer Bulk Data
    Apache Sqoop Connectors
    Sqoop Import Table Commands
    Netezza Connector
    Sqoop-HCatalog Integration
        Controlling Transaction Isolation
        Automatic Table Creation
        Delimited Text Formats and Field and Line Delimiter Characters
        HCatalog Table Requirements
        Support for Partitioning
        Schema Mapping
        Support for HCatalog Data Types
        Providing Hive and HCatalog Libraries for the Sqoop Job
        Examples
    Configuring a Sqoop Action to Use Tez to Load Data into a Hive Table
    Troubleshooting Sqoop
        "Can't find the file" Error When Using a Password File
13. Using HDP for Workflow and Scheduling With Oozie
    ActiveMQ With Oozie and Falcon
    Configuring Pig Scripts to Use HCatalog in Oozie Workflows
        Configuring Individual Pig Actions to Access HCatalog
        Configuring All Pig Actions to Access HCatalog
14. Using Apache Flume for Streaming
15. Troubleshooting
    Falcon Logs
    Falcon Server Failure
    Delegation Token Renewal Issues
    Invalid Entity Schema
    Incorrect Entity
    Bad Config Store Error
    Unable to Set DataSet Entity
    Oozie Jobs
16. Appendix
    Configuring for Disaster Recovery in Releases Prior to HDP 2.5