Data Governance

Creating a Cluster Entity

Always specify a cluster entity before defining other elements in your data pipeline. The cluster entity defines where the data and the processes for your data pipeline are stored. For more information, see the cluster entity XSD.

To use the Falcon web UI to define a cluster entity:

  1. At the top of the Falcon web UI page, click Cluster.

    Figure 2.4. New Cluster Configuration Dialog


  2. On the New Cluster page, specify the following values:

    Table 2.1. Cluster Entity Configuration Values

    • Name -- Name of the cluster entity. This is not necessarily the actual cluster name.

    • Colo and Description -- Name and description of the data center.

    • Tags -- Metadata tags for the entity.

    • Access Control List -- The HDFS access permissions for the entity.

    • Interfaces -- The interface types:

      ◦ readonly -- Required for distcp (distributed copy), which is used in replication.

      ◦ write -- Required to write to HDFS.

      ◦ execute -- Required to submit jobs to MapReduce.

      ◦ workflow -- Required. This interface submits Oozie jobs.

      ◦ messaging -- Required to send alerts.

      ◦ registry -- Required to register or deregister partitions in the Hive Metastore and to fetch events on partition availability.

    • Properties -- A name and value for each property.

    • Location -- HDFS paths for the staging, temp, and working directories. For more information, see Prerequisite Setup Steps.


  3. Click Next to view a summary of your cluster entity definition. The XML file is displayed to the right of the summary. Click Edit XML to edit the XML directly.

  4. If you are satisfied with the cluster entity definition, click Save.

  5. To verify that you successfully created the cluster entity, enter the cluster entity name in the Falcon web UI Search well and press Enter. If the cluster entity name appears in the search results, it was successfully created. See Search For and Manage Data Pipeline Entities.
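The values you enter in the wizard map directly to the cluster entity XML shown in the summary step. The following is a minimal sketch of such a definition; all host names, ports, paths, and version numbers are placeholders and must be replaced with values from your own environment:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical cluster entity; all endpoints and paths are placeholders. -->
<cluster name="primaryCluster" description="Primary data center"
         colo="primary-colo" xmlns="uri:falcon:cluster:0.1">
    <interfaces>
        <!-- readonly: used by distcp for replication -->
        <interface type="readonly" endpoint="hftp://namenode.example.com:50070" version="2.2.0"/>
        <!-- write: HDFS write access -->
        <interface type="write" endpoint="hdfs://namenode.example.com:8020" version="2.2.0"/>
        <!-- execute: MapReduce job submission (ResourceManager) -->
        <interface type="execute" endpoint="resourcemanager.example.com:8050" version="2.2.0"/>
        <!-- workflow: Oozie job submission -->
        <interface type="workflow" endpoint="http://oozie.example.com:11000/oozie/" version="4.0.0"/>
        <!-- messaging: alert delivery -->
        <interface type="messaging" endpoint="tcp://activemq.example.com:61616?daemon=true" version="5.1.6"/>
        <!-- registry: Hive Metastore partition registration -->
        <interface type="registry" endpoint="thrift://metastore.example.com:9083" version="0.11.0"/>
    </interfaces>
    <locations>
        <location name="staging" path="/apps/falcon/primaryCluster/staging"/>
        <location name="temp" path="/tmp"/>
        <location name="working" path="/apps/falcon/primaryCluster/working"/>
    </locations>
    <ACL owner="falcon" group="users" permission="0755"/>
    <properties>
        <property name="brokerImplClass" value="org.apache.activemq.ActiveMQConnectionFactory"/>
    </properties>
</cluster>
```

If you prefer the command line to the web UI, a definition file like this can typically be submitted with falcon entity -type cluster -submit -file cluster.xml and listed afterward with falcon entity -type cluster -list.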