Ingesting Data into Cloud Object Stores with RAZ AuthorizationsPDF version

Understand the use case

Learn how you can use object store processors to build an end-to-end flow that ingests data into Cloud object stores in a CDP environment that is enabled for fine-grained access control through RAZ.

The object store processors are:
  • ListCDPObjectStore: Lists the FlowFiles in the source bucket that you want to ingest.
  • FetchCDPObjectStore: Fetches the FlowFiles from the source bucket. If the files are compressed, this processor uncompresses the files.
  • PutCDPObjectStore: Writes the contents of a FlowFile to the specified directory in the target bucket.
  • DeleteCDPObjectStore: Deletes the ingested data from the source bucket.
These object store processors have the following advantages:
  • The configuration of these processors is simple and easy.
  • These processors can access the underlying object store of the cloud provider where the NiFi cluster is running, If you are running NiFi in Azure, these processors can be used to access ADLS. If you are running NiFi in AWS, these processors can be used to access S3. If you are running NiFi in Google Cloud, these processors can be used to access Google Cloud Storage.
  • These processors are cloud-agnostic. If you have a multi-cloud architecture with, for example, a cluster running in AWS and another running in Azure, you can move this flow from one cloud provider to another with minimal change such as the input bucket.
  • These processors are integrated with RAZ. With RAZ you can define policies in Ranger to specify who has access to what in the object store.
  • Unlike HDFS processors that can only access the buckets that is configured for the data lake for your CDP environment, the object store processors can access any location in the underlying object store.