Configuring your Lambda function

After you create a function, you can use the built-in configuration options to control its behavior. You can configure additional capabilities, adjust resources associated with your function (such as memory and timeout), and create and edit test events to test your function in the console.

Memory and timeout in runtime configuration

You can configure two important settings, memory and timeout, on the Configuration > General configuration tab of the Lambda function details screen.

The appropriate values for these two settings depend on the data flow to be deployed. As a starting point, it is recommended to allocate 1536-2048 MB (1.5-2 GB) of memory to the Lambda function. Depending on the data flow, this may be more than necessary or less than required; the default value of 512 MB may be perfectly fine for many use cases. You can adjust this value later as you see how much memory your function needs.

While Cloudera Data Flow Functions tend to perform very well during a warm start, a cold start that must source extensions (processors, controller services, and so on) may take several seconds to initialize. It is therefore recommended to set the Timeout to at least 30 seconds. If the data flow reaches out to many other services or performs complex computation, a larger value, even several minutes, may be necessary.

Click Edit if you want to change the default values.

See the corresponding AWS documentation to learn more about timeout duration and about memory sizing, which determines the amount of resources provisioned for your function.
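
The same two settings can also be changed programmatically instead of through the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. The helper names and the function name my-dataflow-function are hypothetical, the default values only mirror the starting points suggested above, and the limits in the checks are the AWS Lambda limits at the time of writing.

```python
def build_general_config(function_name, memory_mb=2048, timeout_seconds=30):
    """Build keyword arguments for Lambda's UpdateFunctionConfiguration call.

    The defaults follow the starting points suggested in this section;
    they are not required values.
    """
    # AWS Lambda accepts 128 to 10240 MB of memory (limit at time of writing).
    if not 128 <= memory_mb <= 10240:
        raise ValueError("memory_mb must be between 128 and 10240")
    # AWS Lambda accepts a timeout of 1 to 900 seconds (limit at time of writing).
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    return {
        "FunctionName": function_name,
        "MemorySize": memory_mb,
        "Timeout": timeout_seconds,
    }


def apply_general_config(function_name, **settings):
    """Send the settings to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so build_general_config works without the SDK

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **build_general_config(function_name, **settings)
    )


if __name__ == "__main__":
    # Hypothetical function name; printing the payload makes no AWS call.
    print(build_general_config("my-dataflow-function", memory_mb=1536))
```

Calling apply_general_config("my-dataflow-function", memory_mb=1536, timeout_seconds=60) would then update the function in place. Keeping the payload builder separate from the AWS call makes it possible to review the values before anything is changed.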

Runtime environment variables

You must configure the function to specify which data flow to run and add any necessary runtime configuration using environment variables.

Navigate to the Configuration tab and select Environment variables from the menu on the left.
Click Edit on the Environment variables pane.

Define your environment variables as key-value pairs.

You can add one or more environment variables to configure the function. The following environment variables are supported:


Variable Name	Description	Required	Default Value
FLOW_CRN	The Cloudera Resource Name (CRN) for the data flow that is to be run. The data flow must be stored in the Cloudera Data Flow Catalog. This CRN should indicate the specific version of the data flow and as such will end with a suffix like /v.1. For more information, see Retrieving data flow CRN.	true	--
DF_PRIVATE_KEY	The Private Key for accessing the Cloudera Data Flow service. The Private Key and Access Key are used to authenticate with the Cloudera Data Flow Service and they must provide the necessary authorizations to access the specified data flow. For more information, see Provisioning Access Key ID and Private Key.	true	--
DF_ACCESS_KEY	The Access Key for accessing the Cloudera Data Flow service. The Private Key and Access Key are used to authenticate with the Cloudera Data Flow Service and they must provide the necessary authorizations to access the specified data flow. For more information, see Provisioning Access Key ID and Private Key.	true	--
INPUT_PORT	The name of the Input Port to use. If the specified data flow has more than one Input Port at the root group level, this environment variable must be specified, indicating the name of the Input Port to queue up the AWS Lambda notification. If there is only one Input Port, this variable is not required. If it is specified, it must properly match the name of the Input Port.	false	--
OUTPUT_PORT	The name of the Output Port to retrieve the results from. If no Output Port exists, the variable does not need to be specified and no data will be returned. If at least one Output Port exists in the data flow, this variable can be used to determine the name of the Output Port whose data will be sent along as the output of the function. For more information on how the appropriate Output Port is determined, see Output ports.	false	--
FAILURE_PORTS	A comma-separated list of Output Ports that exist at the root group level of the data flow. If any FlowFile is sent to one of these Output Ports, the function invocation is considered a failure. For more information, see Output ports.	false	--
DEFAULT_PARAM_CONTEXT	If the data flow uses a Parameter Context, the Lambda function will look for a Secret in the AWS Secrets Manager with the same name. If no Secret can be found with that name, this variable specifies the name of the Secret to default to. For more information, see Parameters.	false	--
PARAM_CONTEXT_*	If the data flow makes use of Parameter Contexts, this environment variable provides a mechanism for mapping a Parameter Context to one or more alternate Secrets in the AWS Secrets Manager, in a comma-separated list. For more information, see Parameters.	false	--
CONTENT_REPO	The contents of the FlowFiles can be stored either in memory, on the JVM heap, or on disk. If this environment variable is set, it specifies the path to a directory where the content should be stored. If it is not specified, the content is held in memory. For more information, see File system for content repository.	false	--
EXTENSIONS_*	The directory to look for custom extensions / NiFi Archives (NARs). For more information, see Providing custom extensions / NARs.	false	--
DF_SERVICE_URL	The Base URL for the Cloudera Data Flow Service.	false	https://api.us-west-1.cdp.cloudera.com/
NEXUS_URL	The Base URL for a Nexus Repository for downloading any NiFi Archives (NARs) needed for running the data flow.	false	https://repository.cloudera.com/artifactory/cloudera-repos/
STORAGE_BUCKET	An S3 bucket in which to look for custom extensions / NiFi Archives (NARs) and resources. For more information, see S3 Bucket storage.	false	--
STORAGE_EXTENSIONS_DIRECTORY	The directory in the S3 bucket to look for custom extensions / NiFi Archives (NARs). For more information, see S3 Bucket storage.	false	extensions
STORAGE_RESOURCES_DIRECTORY	The directory in the S3 bucket to look for custom resources. For more information, see Providing additional resources.	false	resources
WORKING_DIR	The working directory, where NAR files will be expanded.	false	/tmp/working
EXTENSIONS_DOWNLOAD_DIR	The directory to which missing extensions / NiFi Archives (NARs) will be downloaded. For more information, see Providing custom extensions / NARs.	false	/tmp/extensions
EXTENSIONS_DIR_*	It specifies read-only directories that may contain custom extensions / NiFi Archives (NARs). For more information, see Providing custom extensions / NARs.	false	--
DISABLE_STATE_PROVIDER	If true, it disables the DynamoDB data flow state provider, even if the data flow has stateful processors.	false	--
DYNAMODB_STATE_TABLE	The name of the DynamoDB table where the state is stored.	false	nifi_state
DYNAMODB_BILLING_MODE	It sets the DynamoDB Billing Mode (`PROVISIONED` or `PAY_PER_REQUEST`).	false	PAY_PER_REQUEST
DYNAMODB_WRITE_CAPACITY_UNITS	It sets the DynamoDB Write Capacity Units, when using `PROVISIONED` Billing Mode.	true if using `PROVISIONED` Billing Mode	--
DYNAMODB_READ_CAPACITY_UNITS	It sets the DynamoDB Read Capacity Units, when using `PROVISIONED` Billing Mode.	true if using `PROVISIONED` Billing Mode	--
KRB5_FILE	It specifies the filename of the krb5.conf file. This is necessary only if connecting to a Kerberos-protected endpoint. For more information, see Configuring Kerberos.	false	/etc/krb5.conf

When ready, click Save.
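
Before saving, it can help to check a set of variables against the rules in the table above. The following is a minimal sketch of such a local pre-flight check. It is not how the function itself validates its configuration; the helper names are hypothetical, the CRN pattern is an assumption based only on the "/v.1" suffix example, and the sample CRN in the usage section is a placeholder rather than a real CRN.

```python
import re

# Variables marked as required in the table above.
REQUIRED = ("FLOW_CRN", "DF_PRIVATE_KEY", "DF_ACCESS_KEY")

# Default values copied from the table above.
DEFAULTS = {
    "DF_SERVICE_URL": "https://api.us-west-1.cdp.cloudera.com/",
    "NEXUS_URL": "https://repository.cloudera.com/artifactory/cloudera-repos/",
    "STORAGE_EXTENSIONS_DIRECTORY": "extensions",
    "STORAGE_RESOURCES_DIRECTORY": "resources",
    "WORKING_DIR": "/tmp/working",
    "EXTENSIONS_DOWNLOAD_DIR": "/tmp/extensions",
    "DYNAMODB_STATE_TABLE": "nifi_state",
    "DYNAMODB_BILLING_MODE": "PAY_PER_REQUEST",
    "KRB5_FILE": "/etc/krb5.conf",
}


def check_environment(env):
    """Return a list of problems found in a dict of environment variables."""
    problems = [f"{name} is required" for name in REQUIRED if not env.get(name)]

    # Assumption: the version suffix looks like "/v.<number>", as in "/v.1".
    crn = env.get("FLOW_CRN", "")
    if crn and not re.search(r"/v\.\d+$", crn):
        problems.append("FLOW_CRN should end with a version suffix such as /v.1")

    billing = env.get("DYNAMODB_BILLING_MODE", DEFAULTS["DYNAMODB_BILLING_MODE"])
    if billing not in ("PROVISIONED", "PAY_PER_REQUEST"):
        problems.append("DYNAMODB_BILLING_MODE must be PROVISIONED or PAY_PER_REQUEST")
    if billing == "PROVISIONED":
        # Capacity units are required only in PROVISIONED billing mode.
        for name in ("DYNAMODB_WRITE_CAPACITY_UNITS", "DYNAMODB_READ_CAPACITY_UNITS"):
            if not env.get(name):
                problems.append(f"{name} is required when DYNAMODB_BILLING_MODE is PROVISIONED")
    return problems


def effective_environment(env):
    """Overlay the supplied variables on the documented defaults."""
    return {**DEFAULTS, **env}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = {
        "FLOW_CRN": "crn:example:flow/v.1",
        "DF_PRIVATE_KEY": "<private key>",
        "DF_ACCESS_KEY": "<access key>",
    }
    print(check_environment(sample))
    print(effective_environment(sample)["WORKING_DIR"])
```

An empty list from check_environment means that none of the rules encoded here were violated; it does not guarantee that the keys are valid or that the data flow exists in the Catalog.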