Flow definition overview

The NiFi flow definition used in this quickstart is based on Apache NiFi 1.18.0. You can download this flow definition from here.

The triggerPayload input port is where the Cloudera DataFlow Functions framework ingests the payload of the trigger generated by the cloud provider.

Next, you decide whether to log the attributes and payload of the flowfile in the logs of the function. This is particularly useful when deploying the function for the first time, as it helps you understand what is going on and what is being generated. It would be disabled for a function running in production. Logging is enabled by default, but you can turn it off by setting the following variable in the function’s configuration: logPayload = false

A RouteOnAttribute processor then routes the flowfile to the process group dedicated to the cloud provider where the function is deployed. The attribute containing the cloud provider name is automatically generated by the framework.
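
The routing decision can be pictured with a short Python sketch. This is only an illustration of the idea, not the flow's actual configuration: the attribute name cloud.provider, the provider values, and the "unmatched" fallback are all hypothetical, since this overview does not state the name of the attribute the framework generates.

```python
# Hypothetical attribute name; the real name generated by the framework
# is not given in this overview.
PROVIDER_ATTRIBUTE = "cloud.provider"

# Hypothetical provider values mapped to the process groups described here.
ROUTES = {
    "aws": "AWS Process Group",
    "azure": "Azure Process Group",
    "gcp": "GCP Process Group",
}


def route_flowfile(attributes: dict) -> str:
    """Return the name of the process group that should handle the flowfile."""
    provider = attributes.get(PROVIDER_ATTRIBUTE, "").strip().lower()
    # Anything that does not match a known provider goes to a fallback route.
    return ROUTES.get(provider, "unmatched")
```

In the flow itself, this logic would be expressed as routing rules on the RouteOnAttribute processor rather than as code.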

AWS Process Group

The payload of the flowfile generated by the API Gateway trigger is a JSON document in which one of the fields contains the binary data of your image, Base64-encoded.

This is why you first need to extract the fields you are interested in, then Base64-decode the image so that the binary data representing it becomes the flowfile content. After that, you run the ResizeImage processor, re-encode the result in Base64, and finally generate the JSON payload that the AWS API Gateway expects in order to send the response back to the HTTP client.
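
The decode, resize, re-encode sequence can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processors used in the flow: the field names body, isBase64Encoded, statusCode, and headers assume the API Gateway Lambda proxy integration format, the image/png content type is an assumed default, and resize_image is a hypothetical placeholder standing in for the ResizeImage processor.

```python
import base64
import json


def resize_image(image_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the ResizeImage step; returns the bytes unchanged."""
    return image_bytes


def handle_api_gateway_event(event_json: str, content_type: str = "image/png") -> str:
    """Turn an API Gateway trigger payload into the JSON response payload."""
    event = json.loads(event_json)

    # Extract the field of interest and decode it to get the raw image bytes.
    image_bytes = base64.b64decode(event["body"])

    # Resize the image (placeholder here).
    resized = resize_image(image_bytes)

    # Re-encode the result and wrap it in the response JSON
    # (field names assume the Lambda proxy integration format).
    response = {
        "statusCode": 200,
        "headers": {"Content-Type": content_type},
        "body": base64.b64encode(resized).decode("ascii"),
        "isBase64Encoded": True,
    }
    return json.dumps(response)
```

Setting isBase64Encoded to true in the response is what tells the gateway to decode the body back to binary before returning it to the HTTP client.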

Azure and GCP Process Groups

The binary data is the content of the generated flowfile, so no conversion is needed and you can use the ResizeImage processor directly, based on the flowfile attributes.