Chapter 8. Using the Falcon View
Apache Falcon solves enterprise challenges related to Hadoop data replication, business continuity, and lineage tracing by deploying a framework for data management and processing. The Falcon framework can also leverage other HDP components, such as Apache Pig, Apache Hadoop Distributed File System (HDFS), Apache Sqoop, Apache Hive, Apache Spark, and Apache Oozie. Falcon enables this simplified management by providing a framework to define and manage backup, replication, and data transfer.
Hadoop administrators can use the Falcon View to centrally define, schedule, and monitor data management policies. Falcon uses those definitions to auto-generate workflows in Apache Oozie.
This chapter describes the following:
1. Configuring Your Cluster
For the Falcon View to access HDFS, the Ambari Server daemon hosting the view needs to act as the proxy user for HDFS. This allows Ambari to submit requests to HDFS on behalf of the users using the Falcon View. This is critical because the Falcon View stores metadata about each user's Falcon entity definitions in HDFS. It also means that users who access the Falcon View must have a user directory set up in HDFS.
1.1. Setup HDFS Proxy User
To set up an HDFS proxy user for the Ambari Server daemon account, you need to configure the proxy user in the HDFS configuration. This configuration is determined by the account name the ambari-server daemon is running as. For example, if your ambari-server is running as root, you set up an HDFS proxy user for root with the following:
In Ambari Web, browse to Services > HDFS > Configs.
Under the Advanced tab, navigate to the Custom core-site section.
Click Add Property… to add the following custom properties:
hadoop.proxyuser.root.groups="users"
hadoop.proxyuser.root.hosts=ambari-server.hostname
Notice the ambari-server daemon account name root is part of the property name. Be sure to modify this property name for the account name you are running the ambari-server as. For example, if you were running ambari-server daemon under an account name of ambariusr, you would use the following properties instead:
hadoop.proxyuser.ambariusr.groups="users"
hadoop.proxyuser.ambariusr.hosts=ambari-server.hostname
Similarly, if you have configured Ambari Server for Kerberos, be sure to modify this property name for the primary Kerberos principal user. For example, if ambari-server is set up for Kerberos using principal ambari-server@EXAMPLE.COM, you would use the following properties instead:
hadoop.proxyuser.ambari-server.groups="users"
hadoop.proxyuser.ambari-server.hosts=ambari-server.hostname
Save the configuration change and restart the required components as indicated by Ambari.
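The naming rule in the steps above can be sketched as a small shell helper. This is an illustration only, not part of Ambari or Hadoop: the function names proxyuser_name and proxyuser_props are hypothetical, the sketch assumes a Kerberos principal of the usual primary[/instance]@REALM form, and the "users" group value is copied from the examples above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ambari or Hadoop).

# Reduce an account name or Kerberos principal to the name used in the
# property key: strip "@REALM", then any "/instance" part.
proxyuser_name() {
    name="${1%%@*}"
    printf '%s\n' "${name%%/*}"
}

# Print the two Custom core-site properties for the given account or
# principal ($1) and the Ambari Server host name ($2).
proxyuser_props() {
    user="$(proxyuser_name "$1")"
    printf 'hadoop.proxyuser.%s.groups="users"\n' "$user"
    printf 'hadoop.proxyuser.%s.hosts=%s\n' "$user" "$2"
}

proxyuser_props ambari-server@EXAMPLE.COM ambari-server.hostname
```

Passing a plain account name such as root or ambariusr leaves the name unchanged, so the same helper covers the non-Kerberos cases described above.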
1.2. Setup HDFS User Directory
The Falcon View stores user metadata in HDFS. By default, the location in HDFS for this metadata is /user/${username}, where ${username} is the username of the currently logged in user that is accessing the Falcon View.
Important Since many users leverage the default Ambari admin user for getting started with Ambari, the
To create user directories in HDFS, do the following for each user who will use the Falcon View.
Connect to a host in the cluster that includes the HDFS client.
Switch to the hdfs system account user.
su - hdfs
Using the HDFS client, make an HDFS directory for the user. For example, if your username is admin, you would create the following directory.
hadoop fs -mkdir /user/admin
Set the ownership on the newly created directory. For example, if your username is admin, you would make that user the directory owner.
hadoop fs -chown admin:hadoop /user/admin
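When several users need directories, the two commands above can be repeated in a loop. The following is a minimal sketch rather than a supported tool: the function name hdfs_user_dir_cmds and the user names other than admin are hypothetical, and the hadoop group is taken from the example above. The function only prints the commands so that they can be reviewed before being run as the hdfs user.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that create an HDFS user
# directory and set its owner, for each user name given as an argument.
hdfs_user_dir_cmds() {
    for u in "$@"; do
        printf 'hadoop fs -mkdir /user/%s\n' "$u"
        printf 'hadoop fs -chown %s:hadoop /user/%s\n' "$u" "$u"
    done
}

# Review the output first; to execute it, pipe it to "sh" while logged in
# as the hdfs user on a host that has the HDFS client.
hdfs_user_dir_cmds admin user1 user2
```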
2. Installing and Configuring the Falcon View
You must manually copy the .jar file for the Falcon View, then configure Ambari to access the View. You can install the Falcon View in a secure or an unsecure cluster. If using a secure cluster, Ambari and Falcon must be properly configured with Kerberos.
Prerequisites
Apache Falcon must have been installed and configured, and be deployed in Ambari.
For an Ambari-managed installation, Falcon is included as a default service. To deploy the Falcon service, refer to Adding a Service to your Hadoop cluster.
For manual (non-Ambari) installation and setup of Falcon, refer to Installing Apache Falcon, then Adding a Service to your Hadoop cluster.
The users and groups for Falcon must exist in Ambari prior to installing the Falcon View.
Refer to Managing Users and Groups.
Falcon must have been configured as a proxy super user in the oozie-site properties and in the HDFS core-site properties.
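As an illustration of the last prerequisite, proxy super user settings of this kind usually take the form of the properties printed by the sketch below. It rests on assumptions that this section does not state: that the Falcon service runs as the user falcon, and that the wildcard * is acceptable for hosts and groups. Restrict the values as appropriate for your cluster. The function name falcon_proxy_props is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print proxy-user properties for a service user
# (default: falcon). Assumption: wildcard (*) values; narrow them to
# specific hosts and groups where your security policy requires it.
falcon_proxy_props() {
    user="${1:-falcon}"
    echo "# HDFS core-site"
    echo "hadoop.proxyuser.${user}.hosts=*"
    echo "hadoop.proxyuser.${user}.groups=*"
    echo "# oozie-site"
    echo "oozie.service.ProxyUserService.proxyuser.${user}.hosts=*"
    echo "oozie.service.ProxyUserService.proxyuser.${user}.groups=*"
}

falcon_proxy_props falcon
```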
Steps
Copy the Falcon View falcon-ambari-view.jar file from the Falcon server/webapp directory to the Ambari server/views directory.
If the Falcon and Ambari servers are on the same host, use the copy command:
cp /usr/hdp/current/falcon-server/server/webapp/falcon-ambari-view.jar /var/lib/ambari-server/resources/views/
If the Falcon server is on a remote host, use the secure copy command for your operating system.
A key pair might be required. See your operating system documentation for more information about remote copies.
Restart the Ambari server.
ambari-server restart
In Ambari, navigate to user_name > Manage Ambari.
Under Deploy Views, click Views, then click Falcon > Create Instance in the Views list.
Provide the required Details information.
Instance Name: 250 characters max, no spaces, no special characters
Display Name: 250 characters max, including spaces; no special characters; can be the same as the Instance Name
Description: 140 characters max, including spaces; special characters allowed
Note If you enter more than the allowed number of characters, you might see the error message Cannot create instance: Server Error.
Select a cluster configuration.
The Local and Remote fields populate with the names of available clusters. The authentication type for the cluster is automatically recognized.
To use a custom cluster location, enter the Falcon service URI and authentication type of simple or kerberos.
Click Save.
The Permissions section displays at the bottom of the Views page.
(Optional) Set the permissions for access to the view.
Hover over the Views icon to verify that your Falcon View is available in the menu.
Note Do not click on the Falcon link yet. You must make additional configuration changes before you can access the Falcon View.
Click the Ambari icon to return to the Dashboard window, then click the Falcon service and the Configs tab.
Scroll to the Falcon startup.properties section, locate the *.application.services field, and enter the following services immediately above the line org.apache.falcon.metadata.MetadataMappingService:
org.apache.falcon.service.GroupsService,\
org.apache.falcon.service.ProxyUserService,\
Add the proxy user for hosts and groups in the Custom falcon-runtime.properties section.
The proxy user is the user that the Falcon process runs as, typically falcon.
Click Add Property.
Add the following key/value pairs.
Substitute #USER# with the proxy user configured for the Ambari server.
Key=*.falcon.service.ProxyUserService.proxyuser.#USER#.hosts, Value=*
These are the hosts from which #USER# can impersonate other users.
Key=*.falcon.service.ProxyUserService.proxyuser.#USER#.groups, Value=*
These are the groups that the users being impersonated must belong to.
Example 8.1. Substitute #USER#
In the key/value pairs above, if the #USER# is “falcon”, enter *.falcon.service.ProxyUserService.proxyuser.falcon.hosts.
The wildcard value=* (asterisk) is used to allow impersonation from any host or of any user. If you don't use the wildcard character, enter the appropriate host or group values.
Click Save on the information bar at the top of the Configs page.
If you try to leave the page without clicking Save, you see a Warning message. Click Save in the Warning dialog box.
A Restart Required message displays at the top of the Falcon Configs page.
Click Restart > Restart All Affected to restart the Falcon services.
When the restart completes, verify that you can access the Falcon View by clicking Falcon in the Views menu.
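The *.application.services edit in the steps above can be sketched as a text filter, which may help when checking the field value outside Ambari. This is an illustration under assumptions: the function name add_proxy_services and the file name services.txt are hypothetical, and the sketch assumes the field value has one service per line, as the instruction to insert "immediately above the line" suggests. It does not check whether the two services are already present.

```shell
#!/bin/sh
# Hypothetical filter: read a *.application.services value on stdin and
# print it with GroupsService and ProxyUserService inserted immediately
# above the MetadataMappingService line. All other lines pass through
# unchanged. Running it twice would insert the services twice.
add_proxy_services() {
    awk '
        /org\.apache\.falcon\.metadata\.MetadataMappingService/ {
            print "org.apache.falcon.service.GroupsService,\\"
            print "org.apache.falcon.service.ProxyUserService,\\"
        }
        { print }
    '
}

# Example use: save the current field value to services.txt, run the
# filter, and review the result before pasting it back into Ambari.
# add_proxy_services < services.txt
```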
3. Accessing the Falcon Documentation
You can access the Falcon documentation in the Data Movement and Integration guide on the Hortonworks documentation website.