Configuring your Lambda function

After you create a function, you can use the built-in configuration options to control its behavior. You can configure additional capabilities, adjust resources associated with your function, such as memory and timeout, or you can also create and edit test events to test your function using the console.

Memory and timeout in runtime configuration

You can configure two very important elements, memory and timeout on the Configuration > General configuration tab of the Lambda function details screen.

The appropriate values for these two settings depend on the data flow to be deployed. Typically, it is recommended to start with 1536-2048 MB (1.5 - 2 GB) memory allocated to Lambda. This amount of memory may be more than necessary or less than required. The default value of 512 MB may be perfectly fine for many use cases. You can adjust this value later as you see how much memory your function needs.

While Cloudera DataFlow Functions tend to perform very well during a warm start, a cold start that must source extensions (processors, controller services, and so on) may take several seconds to initialize. So it is recommended to set the Timeout to at least 30 seconds. If the data flow to run reaches out to many other services or performs complex computation, it may be necessary to use a larger value, even several minutes.

Click Edit if you want to change the default values.

See the corresponding AWS documentation to learn more about timeout duration and memory sizing, which determines the amount of resources provisioned for your function.

Runtime environment variables

You must configure the function to specify which data flow to run and add any necessary runtime configuration using environment variables.

  1. Navigate to the Configuration tab and select Environment variables from the menu on the left.
  2. Click Edit on the Environment variables pane.
  3. Define your environment variables as key-value pairs.
    You can add one or more environment variables to configure the function. The following environment variables are supported:
    Variable Name Description Required Default Value
    FLOW_CRN

    The Cloudera Resource Name (CRN) for the data flow that is to be run.

    The data flow must be stored in the Cloudera DataFlow Catalog. This CRN should indicate the specific version of the data flow and as such will end with a suffix like /v.1.

    For more information, see Retrieving data flow CRN.

    true --
    DF_PRIVATE_KEY

    The Private Key for accessing the Cloudera DataFlow service.

    The Private Key and Access Key are used to authenticate with the Cloudera DataFlow Service and they must provide the necessary authorizations to access the specified data flow.

    For more information, see Provisioning Access Key ID and Private Key.

    true --
    DF_ACCESS_KEY

    The Access Key for accessing the Cloudera DataFlow service.

    The Private Key and Access Key are used to authenticate with the Cloudera DataFlow Service and they must provide the necessary authorizations to access the specified data flow.

    For more information, see Provisioning Access Key ID and Private Key.

    true --
    INPUT_PORT

    The name of the Input Port to use.

    If the specified data flow has more than one Input Port at the root group level, this environment variable must be specified, indicating the name of the Input Port to queue up the AWS Lambda notification. If there is only one Input Port, this variable is not required. If it is specified, it must properly match the name of the Input Port.

    false --
    OUTPUT_PORT

    The name of the Output Port to retrieve the results from.

    If no Output Port exists, the variable does not need to be specified and no data will be returned. If at least one Output Port exists in the data flow, this variable can be used to determine the name of the Output Port whose data will be sent along as the output of the function.

    For more information on how the appropriate Output Port is determined, see Output ports.

    false --
    FAILURE_PORTS

    A comma-separated list of Output Ports that exist at the root group level of the data flow. If any FlowFile is sent to one of these Output Ports, the function invocation is considered a failure.

    For more information, see Output ports.

    false --
    DEFAULT_PARAM_CONTEXT

    If the data flow uses a Parameter Context, the Lambda function will look for a Secret in the AWS Secrets Manager with the same name. If no Secret can be found with that name, this variable specifies the name of the Secret to default to.

    For more information, see Parameters.

    false --
    PARAM_CONTEXT_*

    If the data flow makes use of Parameter Contexts, this environment variable provides a mechanism for mapping a Parameter Context to one or more alternate Secrets in the AWS Secrets Manager, in a comma-separated list.

    For more information, see Parameters.

    false --
    CONTENT_REPO

    The contents of the FlowFiles can be stored either in memory, on the JVM heap, or on disk. If this environment variable is set, it specifies the path to a directory where the content should be stored. If it is not specified, the content is held in memory.

    For more information, see File system for content repository.

    false --
    EXTENSIONS_*

    The directory to look for custom extensions / NiFi Archives (NARs).

    For more information, see Providing custom extensions / NARs.

    false --
    DF_SERVICE_URL

    The Base URL for the Cloudera DataFlow Service.

    false https://api.us-west-1.cdp.cloudera.com/
    NEXUS_URL

    The Base URL for a Nexus Repository for downloading any NiFi Archives (NARs) needed for running the data flow.

    false https://repository.cloudera.com/artifactory/cloudera-repos/
    STORAGE_BUCKET

    An S3 bucket in which to look for custom extensions / NiFi Archives (NARs) and resources.

    For more information, see S3 Bucket storage.

    false --
    STORAGE_EXTENSIONS_DIRECTORY

    The directory in the S3 bucket to look for custom extensions / NiFi Archives (NARs).

    For more information, see S3 Bucket storage.

    false extensions
    STORAGE_RESOURCES_DIRECTORY

    The directory in the S3 bucket to look for custom resources.

    For more information, see Providing additional resources.

    false resources
    WORKING_DIR

    The working directory, where NAR files will be expanded.

    false /tmp/working
    EXTENSIONS_DOWNLOAD_DIR

    The directory to which missing extensions / NiFi Archives (NARs) will be downloaded.

    For more information, see Providing custom extensions / NARs.

    false /tmp/extensions
    EXTENSIONS_DIR_*

    It specifies read-only directories that may contain custom extensions / NiFi Archives (NARs).

    For more information, see Providing custom extensions / NARs.

    false --
    DISABLE_STATE_PROVIDER

    If true, it disables the DynamoDB data flow state provider, even if the data flow has stateful processors.

    false --
    DYNAMODB_STATE_TABLE

    The DynamoDB table name where the state will be stored

    false nifi_state
    DYNAMODB_BILLING_MODE

    It sets the DynamoDB Billing Mode (PROVISIONED or PAY_PER_REQUEST).

    false PAY_PER_REQUEST
    DYNAMODB_WRITE_CAPACITY_UNITS

    It sets the DynamoDB Write Capacity Units, when using PROVISIONED Billing Mode.

    true if using PROVISIONED Billing Mode --
    DYNAMODB_READ_CAPACITY_UNITS

    It sets the DynamoDB Read Capacity Units, when using PROVISIONED Billing Mode.

    true if using PROVISIONED Billing Mode --
    KRB5_FILE

    It specifies the filename of the krb5.conf file. This is necessary only if connecting to a Kerberos-protected endpoint.

    For more information, see Configuring Kerberos.

    false /etc/krb5.conf
  4. When ready, click Save.