FetchS3Object

Description:

Retrieves the contents of an S3 Object and writes them to the content of a FlowFile.

Tags:

Amazon, S3, AWS, Get, Fetch

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Bucket | Bucket | ${s3.bucket} | | The S3 Bucket to interact with. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Object Key | Object Key | ${filename} | | The S3 Object Key to use. This is analogous to a filename for traditional file systems. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Region | Region | US West (Oregon) | See the list below | The AWS Region to connect to.
Allowable values for Region:
  • AWS GovCloud (US), AWS Region Code: us-gov-west-1
  • AWS GovCloud (US-East), AWS Region Code: us-gov-east-1
  • US East (N. Virginia), AWS Region Code: us-east-1
  • US East (Ohio), AWS Region Code: us-east-2
  • US West (N. California), AWS Region Code: us-west-1
  • US West (Oregon), AWS Region Code: us-west-2
  • EU (Ireland), AWS Region Code: eu-west-1
  • EU (London), AWS Region Code: eu-west-2
  • EU (Paris), AWS Region Code: eu-west-3
  • EU (Frankfurt), AWS Region Code: eu-central-1
  • EU (Zurich), AWS Region Code: eu-central-2
  • EU (Stockholm), AWS Region Code: eu-north-1
  • EU (Milan), AWS Region Code: eu-south-1
  • EU (Spain), AWS Region Code: eu-south-2
  • Asia Pacific (Hong Kong), AWS Region Code: ap-east-1
  • Asia Pacific (Mumbai), AWS Region Code: ap-south-1
  • Asia Pacific (Hyderabad), AWS Region Code: ap-south-2
  • Asia Pacific (Singapore), AWS Region Code: ap-southeast-1
  • Asia Pacific (Sydney), AWS Region Code: ap-southeast-2
  • Asia Pacific (Jakarta), AWS Region Code: ap-southeast-3
  • Asia Pacific (Melbourne), AWS Region Code: ap-southeast-4
  • Asia Pacific (Tokyo), AWS Region Code: ap-northeast-1
  • Asia Pacific (Seoul), AWS Region Code: ap-northeast-2
  • Asia Pacific (Osaka), AWS Region Code: ap-northeast-3
  • South America (Sao Paulo), AWS Region Code: sa-east-1
  • China (Beijing), AWS Region Code: cn-north-1
  • China (Ningxia), AWS Region Code: cn-northwest-1
  • Canada (Central), AWS Region Code: ca-central-1
  • Canada West (Calgary), AWS Region Code: ca-west-1
  • Middle East (UAE), AWS Region Code: me-central-1
  • Middle East (Bahrain), AWS Region Code: me-south-1
  • Africa (Cape Town), AWS Region Code: af-south-1
  • US ISO East, AWS Region Code: us-iso-east-1
  • US ISOB East (Ohio), AWS Region Code: us-isob-east-1
  • US ISO West, AWS Region Code: us-iso-west-1
  • Israel (Tel Aviv), AWS Region Code: il-central-1
  • Use 's3.region' Attribute: Uses the 's3.region' FlowFile attribute as the region.
AWS Credentials Provider Service | AWS Credentials Provider service | | Controller Service API: AWSCredentialsProviderService. Implementations: AWSCredentialsProviderControllerService, AWSIDBrokerCloudCredentialsProviderControllerService | The Controller Service that is used to obtain AWS credentials provider
Communications Timeout | Communications Timeout | 30 secs | | The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
Version | Version | | | The Version of the Object to download. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
SSL Context Service | SSL Context Service | | Controller Service API: SSLContextService. Implementations: StandardRestrictedSSLContextService, StandardSSLContextService | Specifies an optional SSL Context Service that, if provided, will be used to create connections
Endpoint Override URL | Endpoint Override URL | | | Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints. Supports Expression Language: true (will be evaluated using Environment variables only)
Signer Override | Signer Override | Default Signature | Default Signature; Signature Version 4; Signature Version 2; Custom Signature | The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
Custom Signer Class Name | custom-signer-class-name | | | Fully qualified class name of the custom signer class. The signer must implement com.amazonaws.auth.Signer interface. Supports Expression Language: true (will be evaluated using Environment variables only). This Property is only considered if the [Signer Override] Property has a value of "Custom Signature".
Custom Signer Module Location | custom-signer-module-location | | | Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any). This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: directory, file. Supports Expression Language: true (will be evaluated using Environment variables only). This Property is only considered if the [Signer Override] Property has a value of "Custom Signature".
Encryption Service | encryption-service | | Controller Service API: AmazonS3EncryptionService. Implementation: StandardS3EncryptionService | Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions.
Proxy Configuration Service | proxy-configuration-service | | Controller Service API: ProxyConfigurationService. Implementation: StandardProxyConfigurationService | Specifies the Proxy Configuration Controller Service to proxy network requests.
Requester Pays | requester-pays | False | True: Indicates that the requester consents to pay any charges associated with retrieving objects from the S3 bucket; False: Does not consent to pay requester charges for retrieving objects from the S3 bucket | If true, indicates that the requester consents to pay any charges associated with retrieving objects from the S3 bucket. This sets the 'x-amz-request-payer' header to 'requester'.
Range Start | range-start | | | The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Range Length | range-length | | | The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object. Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
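
The way Range Start and Range Length combine can be sketched in a few lines of Python. This is only an illustration of the semantics described in the table, not the processor's implementation. The function name resolve_byte_range and the use of plain integer byte counts are assumptions made for the example, and the sketch does not handle an empty object or a start position beyond the end of the object.

```python
from typing import Optional, Tuple


def resolve_byte_range(
    object_size: int,
    range_start: Optional[int] = None,
    range_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the inclusive (first_byte, last_byte) pair that would be read.

    Hypothetical helper: mirrors the documented behaviour of the
    Range Start and Range Length properties using plain integers.
    """
    # An empty value or zero starts reading at the beginning of the object.
    start = range_start or 0
    last_byte_of_object = object_size - 1

    # An empty length reads to the end of the object.
    if range_length is None:
        return start, last_byte_of_object

    # A length that extends beyond the end is clamped to the end of the object.
    end = min(start + range_length - 1, last_byte_of_object)
    return start, end
```

For a 1000-byte object, a start of 900 with a length of 500 resolves to bytes 900 through 999, because the requested length runs past the end of the object.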

Relationships:

Name | Description
success | FlowFiles are routed to this Relationship after they have been successfully processed.
failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship.

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
s3.bucket | The name of the S3 bucket
path | The path of the file
absolute.path | The path of the file
filename | The name of the file
hash.value | The MD5 sum of the file
hash.algorithm | MD5
mime.type | If S3 provides the content type/MIME type, this attribute will hold that value
s3.etag | The ETag that can be used to see if the file has changed
s3.exception | The class name of the exception thrown during processor execution
s3.additionalDetails | The S3 supplied detail from the failed operation
s3.statusCode | The HTTP error code (if available) from the failed operation
s3.errorCode | The S3 moniker of the failed operation
s3.errorMessage | The S3 exception message from the failed operation
s3.expirationTime | If the file has an expiration date, this attribute will be set, containing the milliseconds since epoch in UTC time
s3.expirationTimeRuleId | The ID of the rule that dictates this object's expiration time
s3.sseAlgorithm | The server side encryption algorithm of the object
s3.version | The version of the S3 object
s3.encryptionStrategy | The name of the encryption strategy that was used to store the S3 object (if it is encrypted)
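
Because s3.expirationTime holds milliseconds since the epoch, a downstream script usually has to convert it before it is readable. The following Python sketch shows one way to do that. The function name expiration_to_datetime is hypothetical, and the sketch assumes the attribute value arrives as a string of digits.

```python
from datetime import datetime, timezone


def expiration_to_datetime(expiration_millis: str) -> datetime:
    """Convert an s3.expirationTime attribute value to an aware UTC datetime."""
    # FlowFile attribute values are strings, so parse the number first.
    millis = int(expiration_millis)
    # The attribute is in milliseconds; fromtimestamp expects seconds.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```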

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

Example Use Cases:

Use Case:

Fetch a specific file from S3

Configuration:

The "Bucket" property should be set to the name of the S3 bucket that contains the file. Typically this is defined as an attribute on an incoming FlowFile, so this property is set to ${s3.bucket}.

The "Object Key" property denotes the fully qualified filename of the file to fetch. Typically, the FlowFile's filename attribute is used, so this property is set to ${filename}.

The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{S3_REGION}.

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the file.
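
To make the role of ${s3.bucket} and ${filename} concrete, the Python sketch below resolves simple attribute references against a FlowFile's attributes. It is not NiFi's Expression Language engine: it handles only plain ${name} references, the function name resolve_simple_references is hypothetical, the attribute values are invented for illustration, and treating a missing attribute as an empty string is an assumption of the sketch.

```python
import re
from typing import Mapping

# Matches simple references such as ${filename} or ${s3.bucket}.
# References that call functions (they contain ':') are left untouched.
_ATTRIBUTE_REFERENCE = re.compile(r"\$\{([^}:]+)\}")


def resolve_simple_references(value: str, attributes: Mapping[str, str]) -> str:
    """Replace each ${name} with the matching attribute, or "" if it is absent."""
    return _ATTRIBUTE_REFERENCE.sub(
        lambda match: attributes.get(match.group(1), ""), value
    )


# Illustrative attributes of an incoming FlowFile.
flowfile_attributes = {
    "s3.bucket": "example-bucket",
    "filename": "reports/2024/summary.csv",
}

bucket = resolve_simple_references("${s3.bucket}", flowfile_attributes)
object_key = resolve_simple_references("${filename}", flowfile_attributes)
```

With these attributes, the processor would be asked to fetch the key reports/2024/summary.csv from example-bucket.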



Example Use Cases Involving Other Components:

Use Case:

Retrieve all files in an S3 bucket

Keywords:

s3, state, retrieve, fetch, all, stream

Components involved:

Component Type: org.apache.nifi.processors.aws.s3.ListS3

Configuration:

The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{S3_SOURCE_BUCKET}.

The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{S3_SOURCE_REGION}.

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.

The 'success' Relationship of this Processor is then connected to FetchS3Object.



Component Type: org.apache.nifi.processors.aws.s3.FetchS3Object

Configuration:

"Bucket" = "${s3.bucket}"

"Object Key" = "${filename}"

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.

The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.





Use Case:

Retrieve only files from S3 that meet some specified criteria

Keywords:

s3, state, retrieve, filter, select, fetch, criteria

Components involved:

Component Type: org.apache.nifi.processors.aws.s3.ListS3

Configuration:

The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{S3_SOURCE_BUCKET}.

The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like #{S3_SOURCE_REGION}.

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.

The 'success' Relationship of this Processor is then connected to RouteOnAttribute.



Component Type: org.apache.nifi.processors.standard.RouteOnAttribute

Configuration:

If you would like to "OR" together all of the conditions (i.e., the file should be retrieved if any of the conditions are met), set "Routing Strategy" to "Route to 'matched' if any matches".

If you would like to "AND" together all of the conditions (i.e., the file should only be retrieved if all of the conditions are met), set "Routing Strategy" to "Route to 'matched' if all match".

For each condition that you would like to filter on, add a new property. The name of the property should describe the condition. The value of the property should be an Expression Language expression that returns true if the file meets the condition or false if the file does not meet the condition.

Some attributes that you may consider filtering on are:

- filename (the name of the file)

- s3.length (the number of bytes in the file)

- s3.tag.<tag name> (the value of the S3 tag named <tag name>)

- s3.user.metadata.<key name> (the value of the user metadata entry whose key is <key name>)

For example, to fetch only files that are at least 1 MB and have a filename ending in .zip we would set the following properties:

- "Routing Strategy" = "Route to 'matched' if all match"

- "At least 1 MB" = "${s3.length:ge(1000000)}"

- "Ends in .zip" = "${filename:endsWith('.zip')}"

Auto-terminate the unmatched Relationship.

Connect the matched Relationship to the FetchS3Object processor.
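
The routing decision in this example can be modelled in Python as follows. This is a sketch of the logic only, not of how RouteOnAttribute is implemented. The names conditions and route are hypothetical, and the attribute values are assumed to be strings, as FlowFile attributes are.

```python
from typing import Callable, Dict, Mapping

Condition = Callable[[Mapping[str, str]], bool]

# The two example conditions from the text, keyed by their property names.
conditions: Dict[str, Condition] = {
    "At least 1 MB": lambda attrs: int(attrs["s3.length"]) >= 1_000_000,
    "Ends in .zip": lambda attrs: attrs["filename"].endswith(".zip"),
}


def route(attributes: Mapping[str, str], match_all: bool = True) -> str:
    """Return 'matched' or 'unmatched' for one listed object.

    match_all=True models "Route to 'matched' if all match";
    match_all=False models "Route to 'matched' if any matches".
    """
    results = [condition(attributes) for condition in conditions.values()]
    matched = all(results) if match_all else any(results)
    return "matched" if matched else "unmatched"
```

A 10-byte file named a.zip is unmatched under the "all match" strategy but matched under the "any matches" strategy, since only the filename condition holds.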



Component Type: org.apache.nifi.processors.aws.s3.FetchS3Object

Configuration:

"Bucket" = "${s3.bucket}"

"Object Key" = "${filename}"

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.

The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.





Use Case:

Retrieve new files as they arrive in an S3 bucket

Notes:

This method of retrieving files from S3 is more efficient and more cost-effective than using ListS3. It is the pattern recommended by AWS. However, it does require that the S3 bucket be configured to place notifications on an SQS queue when new files arrive. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html

Components involved:

Component Type: org.apache.nifi.processors.aws.sqs.GetSQS

Configuration:

The "Queue URL" must be set to the appropriate URL for the SQS queue. It is recommended that this property be parameterized, using a value such as #{SQS_QUEUE_URL}.

The "Region" property must be set to denote the SQS region that the queue resides in. It's a good idea to parameterize this property by setting it to something like #{SQS_REGION}.

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the queue.

The 'success' relationship is connected to EvaluateJsonPath.



Component Type: org.apache.nifi.processors.standard.EvaluateJsonPath

Configuration:

"Destination" = "flowfile-attribute"

"s3.bucket" = "$.Records[0].s3.bucket.name"

"filename" = "$.Records[0].s3.object.key"

The 'success' relationship is connected to FetchS3Object.
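
The two JSONPath expressions pick the bucket name and object key out of the notification message. The Python sketch below performs the equivalent lookup with the standard json module. The message body is a trimmed, invented example containing only the fields that the expressions touch, so a real notification will carry more fields.

```python
import json

# Trimmed, illustrative notification body; real messages carry more fields.
message_body = """
{
  "Records": [
    {
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "incoming/data.csv", "size": 1024}
      }
    }
  ]
}
"""

notification = json.loads(message_body)
first_record = notification["Records"][0]

# Equivalent of $.Records[0].s3.bucket.name and $.Records[0].s3.object.key
attributes = {
    "s3.bucket": first_record["s3"]["bucket"]["name"],
    "filename": first_record["s3"]["object"]["key"],
}
```

As with the JSONPath expressions, only the first entry in Records is read, so any further records in the same message are ignored by this sketch.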



Component Type: org.apache.nifi.processors.aws.s3.FetchS3Object

Configuration:

"Bucket" = "${s3.bucket}"

"Object Key" = "${filename}"

The "Region" property must be set to the same value as the "Region" property of the GetSQS Processor.

The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.





System Resource Considerations:

None specified.

See Also:

PutS3Object, DeleteS3Object, ListS3