Description

PutCDPCObjectStore provides the capability to upload files. In most aspects it behaves identically to its HDFS counterpart (PutHDFS); for those details, please refer to the PutHDFS documentation.

CDP Object Store processors

PutCDPCObjectStore is part of the CDP Object Store processor family, which has a number of consequences, described below.

Object Store access

This processor is designed to ease interactions with the object store associated with the NiFi cluster. In CDP Private Cloud, it can be used to interact with HDFS and/or Ozone. In CDP Public Cloud, it can be used to interact with the object store of the underlying cloud provider (S3 for AWS, ADLS for Azure, GCS for Google Cloud, etc.), but not across cloud providers. If the cluster is configured with RAZ, the processor interacts with RAZ to check the Ranger policies when accessing resources in the object store. If RAZ is not enabled, it is possible to leverage the IDBroker mappings to map CDP users to cloud accounts and policies.

Configuration file

This processor needs a configuration that contains the connection details for the object store. This should be a Hadoop-style XML file, occasionally with additional parameters specific to the given kind of object store and authentication method. Unless specified otherwise, the processor looks for the CDP-default /etc/hadoop/conf/core-site.xml configuration file.

This configuration contains information specific to the object store provider (for example, Amazon AWS) which, combined with the underlying Hadoop library, provides the capability to connect to different kinds of stores, authenticate with Kerberos, and authorize with Ranger. In the majority of cases, using this default configuration is recommended.
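As an illustration only, a minimal configuration file might look like the snippet below. The bucket name and endpoint are hypothetical; the actual file provisioned by CDP contains many more provider- and security-related properties.

    <configuration>
      <property>
        <!-- Default filesystem, used when the Storage Location property is not set (hypothetical bucket) -->
        <name>fs.defaultFS</name>
        <value>s3a://example-datalake-bucket</value>
      </property>
      <property>
        <!-- Example of an additional, provider-specific parameter -->
        <name>fs.s3a.endpoint</name>
        <value>s3.eu-west-1.amazonaws.com</value>
      </property>
    </configuration>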

Users may override the default location by adding a dynamic parameter named "cdp.configuration.resources", as shown below. It is possible to specify multiple configuration files as a comma-separated list. Note, however, that for the additional features provided by the underlying Hadoop library to continue to work, a number of additional configuration parameters are needed.
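For example, the dynamic property could be set as follows (the second file path is hypothetical):

    cdp.configuration.resources: /etc/hadoop/conf/core-site.xml,/etc/custom/object-store-site.xml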

Storage Location

If the Storage Location property is not set, the default storage location is used. The default value is defined by the "fs.defaultFS" property of the object store configuration. If the default CDP configuration is used, this will be the Data Lake's object storage. If the Storage Location property is set, the value of "fs.defaultFS" is ignored; in that case it is important to adjust the authentication and authorization settings accordingly.
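For example, to write to a bucket other than the Data Lake's default (the bucket name is hypothetical):

    Storage Location: s3a://example-other-bucket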

Dynamic parameters

This processor supports dynamic parameters. All dynamic parameters, except the protected ones, are passed to the object storage configuration. They are added as additional configuration parameters or, if a parameter already exists, they overwrite it. This provides the opportunity to fine-tune the connection without changing the configuration file. The protected parameters are "fs.defaultFS" and "cdp.configuration.resources".
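For example, adding the following dynamic property would override the corresponding Hadoop S3A connection setting from the configuration file (the value is for illustration only):

    fs.s3a.connection.maximum: 200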

Authentication

This processor supports Kerberos authentication via either a Kerberos Credential Service or by explicitly providing a CDP Username and CDP Password. Both methods authenticate against the Kerberos service associated with the cluster.
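For example, the processor could be configured with explicit credentials as follows (the username is hypothetical; alternatively, a Kerberos Credential Service referencing a principal and keytab can be used instead):

    CDP Username: srv_nifi_user
    CDP Password: <password>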