FetchCDPObjectStore 2.3.0.4.10.0.0-147

Bundle
com.cloudera | nifi-cdf-objectstore-nar
Description
Retrieves a file from an object store. The content of the incoming FlowFile is replaced by the content of the file in the object store. The file in the store is left intact without any changes being made to it.
Tags
ADLS, AWS, Azure, CDP, GCP, GCS, Google, HCFS, HDFS, S3, fetch, filesystem, get, hadoop, ingest, source
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
  • Additional Details for FetchCDPObjectStore 2.3.0.4.10.0.0-147

    FetchCDPObjectStore

    Description

    FetchCDPObjectStore provides the capability to retrieve files from the object store. In most respects it behaves identically to its HDFS counterpart (FetchHDFS); for those details, please refer to the description of FetchHDFS.

    This processor is easy to chain with ListCDPObjectStore: flow files emitted by ListCDPObjectStore will contain the attributes ${path} and ${filename}. Using the default value ${path}/${filename} for the “Filename” property makes it simple to retrieve the files listed by ListCDPObjectStore from the object store.
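    As a sketch, a FetchCDPObjectStore processor placed downstream of ListCDPObjectStore needs no change to its “Filename” property; the attribute names below assume the usual NiFi list/fetch convention described above:

    ```
    # Hypothetical property setting on FetchCDPObjectStore (this is the
    # default); ${path} and ${filename} are populated by the upstream
    # ListCDPObjectStore for each listed file.
    Filename = ${path}/${filename}
    ```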

    CDP Object Store processors

    FetchCDPObjectStore is part of the CDP Object Store processor family. This comes with a number of consequences, listed below.

    Object Store access

    This processor is designed to ease interactions with the object store associated with the NiFi cluster. In CDP Private Cloud, it can be used to facilitate interactions with HDFS and/or Ozone. In CDP Public Cloud, it can be used to interact with the object store of the underlying cloud provider (S3 for AWS, ADLS for Azure, GCS for Google Cloud, etc.), but not across cloud providers. If the cluster is configured with RAZ, the processor interacts with RAZ to check the Ranger policies when accessing resources in the object store. If RAZ is not enabled, it is possible to leverage the IDBroker mappings to map CDP users to cloud accounts and policies.

    Configuration file

    This processor needs a configuration file containing the connection details for the object store. This should be a Hadoop-style XML file, possibly with additional parameters specific to the given kind of object store and authentication method. Unless specified otherwise, the processor looks for the CDP-default /etc/hadoop/conf/core-site.xml configuration file.

    This configuration contains information specific to the object store provider (for example, Amazon AWS) which, combined with the underlying Hadoop library, provides the capability to connect to different kinds of stores, authenticate with Kerberos, and authorize with Ranger. In the majority of cases, using this default configuration is recommended.

    Users may override the default location by adding a dynamic parameter named “cdp.configuration.resources”. Multiple configuration files may be specified as a comma-separated list. It is important to note, however, that for the additional features provided by the underlying Hadoop library to continue to work, a number of additional configuration parameters are needed.
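    As an illustration, a minimal additional configuration file might look like the following. The file path and bucket name are assumptions for this example; in practice, keeping the CDP-default core-site.xml in the list is what preserves the additional Hadoop-library features mentioned above:

    ```xml
    <!-- Hypothetical /opt/nifi/conf/objectstore-site.xml; the s3a bucket
         name is an example only. Properties follow the standard Hadoop
         configuration file format. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>s3a://example-bucket</value>
      </property>
    </configuration>
    ```

    The processor would then be pointed at it by setting the “cdp.configuration.resources” dynamic parameter to /etc/hadoop/conf/core-site.xml,/opt/nifi/conf/objectstore-site.xml.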

    Storage Location

    If the Storage Location property is not set, the default storage location is used. The default value is defined by the “fs.defaultFS” property of the object store configuration; with the default CDP configuration, this is the Data Lake’s object storage. If the Storage Location property is set, the value of “fs.defaultFS” is ignored, and it is important to adjust the authentication and authorization settings accordingly.
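    For illustration, Storage Location values are object store URIs in the scheme of the underlying provider; the bucket, container, and volume names below are examples only:

    ```
    s3a://example-bucket/                            # AWS S3
    abfs://data@examplestore.dfs.core.windows.net/   # Azure ADLS Gen2
    gs://example-bucket/                             # Google Cloud Storage
    ofs://ozone-service/vol1/bucket1/                # Ozone (CDP Private Cloud)
    ```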

    Dynamic parameters

    This processor supports dynamic parameters. All dynamic parameters, except the protected ones, are passed to the object storage configuration: they are added as additional configuration parameters or, where a parameter already exists, overwrite it. This makes it possible to fine-tune the connection without changing the configuration file. The protected parameters are: “fs.defaultFS”, “cdp.configuration.resources”, and “cdp.configuration.compression”.
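    As a sketch, a non-protected dynamic parameter is passed straight through to the Hadoop configuration. The names below are standard Hadoop s3a client settings; the values are illustrative only:

    ```
    # Added as dynamic parameters on the processor; they override any
    # value set in the configuration file.
    fs.s3a.connection.maximum = 30
    fs.s3a.attempts.maximum   = 5
    ```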

    Compression

    Without further configuration, FetchCDPObjectStore handles compression the same way FetchHDFS does with the “Automatically Detected” value for the “Compression codec” property. In most cases this provides satisfactory behaviour, but it can be overridden: adding the “cdp.configuration.compression” dynamic parameter with a compression codec name as its value overrides this behaviour. The allowed values are the same as for FetchHDFS; other values will result in a validation error.
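    For example, forcing a specific codec instead of automatic detection could look like this (GZIP is used here as an assumed example; consult the “Compression codec” property of FetchHDFS for the full list of accepted names):

    ```
    # Dynamic parameter overriding automatic compression detection
    cdp.configuration.compression = GZIP
    ```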

    Authentication

    This processor supports Kerberos authentication, either via a Kerberos Credential Service or by explicitly providing a CDP Username and CDP Password. Both authenticate against the cluster’s Kerberos service.

Properties
Dynamic Properties
Restrictions
Required Permission Explanation
read distributed filesystem Provides operator the ability to retrieve any file that NiFi has access to in the object store or the local filesystem.
Relationships
Name Description
success FlowFiles will be routed to this relationship once they have been updated with the content of the file from the object store.
comms.failure FlowFiles will be routed to this relationship if the content of the file from the object store cannot be retrieved due to a communications failure. This generally indicates that the Fetch should be tried again.
failure FlowFiles will be routed to this relationship if the content of the file from the object store cannot be retrieved and trying again will likely not be helpful. This would occur, for instance, if the file is not found or if there is a permissions issue.
Writes Attributes
Name Description
hdfs.failure.reason When a FlowFile is routed to 'failure', this attribute is added indicating why the file could not be fetched from HDFS
hadoop.file.url The hadoop url for the file is stored in this attribute.
objectstore.failure.reason When a FlowFile is routed to 'failure', this attribute is added indicating why the file could not be fetched.
See Also