Configuring your Google Cloud function

After you have created a function, you can use the built-in configuration options to control its behavior. You can configure additional capabilities, adjust the resources associated with your function (such as memory and timeout), and create and edit test events to test your function from the console.

Memory and timeout in runtime configuration

You can configure two important settings on the Runtime Configuration tab: the timeout and the memory allocated to the cloud function. The appropriate values for these two settings depend on the data flow to be deployed. Typically, it is recommended to start by allocating 1-2 GB of memory. You can adjust this value later as you see how much memory your function needs.

While DataFlow Functions performs very well during a warm start, a cold start that must source extensions may take several seconds to initialize. As a result, it is recommended to set the timeout to at least 60 seconds. If the data flow reaches out to many other services or performs complex computation, a larger value (even several minutes) may be necessary.
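
If you manage the function from the command line rather than the console, the same two settings can also be applied with the gcloud CLI. The following is a minimal sketch, assuming an already deployed function named my-dataflow-function (a placeholder name); all other deployment flags are omitted and follow gcloud's usual deploy behavior:

    # Allocate 1 GB of memory and allow up to 2 minutes per invocation.
    gcloud functions deploy my-dataflow-function \
        --memory=1024MB \
        --timeout=120s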

Runtime environment variables

You must configure the function to specify the data flow to run and add any necessary runtime configuration using environment variables.

  1. Navigate to the Runtime tab and choose Runtime Environment variables in the left-hand menu.
  2. Click Add variable to add the first environment variable.
    You can add one or more environment variables to configure the function. The following environment variables are supported (a gcloud CLI sketch for setting them follows the tables below):

    FLOW_CRN (required)
      The Cloudera Resource Name (CRN) for the data flow that is to be run. The data flow must be stored in the DataFlow Catalog. This CRN should indicate the specific version of the data flow and as such will end with a suffix like /v.1. For more information, see Retrieving data flow CRN.

    DF_PRIVATE_KEY (required)
      The Private Key for accessing the Cloudera DataFlow service. The Private Key and Access Key are used to authenticate with the DataFlow Service and must provide the necessary authorizations to access the specified data flow. For more information, see Provisioning Access Key ID and Private Key.

    DF_ACCESS_KEY (required)
      The Access Key for accessing the Cloudera DataFlow service. The Private Key and Access Key are used to authenticate with the DataFlow Service and must provide the necessary authorizations to access the specified data flow. For more information, see Provisioning Access Key ID and Private Key.

    INPUT_PORT (optional)
      The name of the Input Port to use. If the specified data flow has more than one Input Port at the root group level, this environment variable must be specified, indicating the name of the Input Port on which to queue up the Cloud Function event. If there is only one Input Port, this variable is not required. If it is specified, it must exactly match the name of the Input Port.

    OUTPUT_PORT (optional)
      The name of the Output Port to retrieve the results from. If no Output Port exists, the variable does not need to be specified and no data will be returned. If at least one Output Port exists in the data flow, this variable can be used to determine the name of the Output Port whose data will be sent along as the output of the function. For more information on how the appropriate Output Port is determined, see Output ports.

    FAILURE_PORTS (optional)
      A comma-separated list of Output Ports that exist at the root group level of the data flow. If any FlowFile is sent to one of these Output Ports, the function invocation is considered a failure. For more information, see Output ports.

    DF_SERVICE_URL (optional, default: https://api.us-west-1.cdp.cloudera.com/)
      The Base URL for the Cloudera DataFlow Service.

    NEXUS_URL (optional, default: https://repository.cloudera.com/artifactory/cloudera-repos/)
      The Base URL for a Nexus Repository from which to download any NiFi Archives (NARs) needed for running the data flow.

    WORKING_DIR (optional, default: /tmp/working)
      The working directory where NAR files will be expanded.

    EXTENSIONS_DOWNLOAD_DIR (optional, default: /tmp/extensions)
      The directory to which missing extensions / NiFi Archives (NARs) will be downloaded. For more information, see Providing Custom Extensions / NARs.

    STORAGE_BUCKET (optional)
      A Cloud Storage bucket in which to look for custom extensions / NiFi Archives (NARs) and resources. For more information, see Cloud storage.

    STORAGE_EXTENSIONS_DIRECTORY (optional, default: extensions)
      The directory in the Cloud Storage bucket in which to look for custom extensions / NiFi Archives (NARs). For more information, see Cloud storage.

    STORAGE_RESOURCES_DIRECTORY (optional, default: resources)
      The directory in the Cloud Storage bucket in which to look for custom resources. For more information, see Providing additional resources.

    DISABLE_STATE_PROVIDER (optional)
      If true, disables the Firestore data flow state provider, even if the data flow has stateful processors.

    FIRESTORE_STATE_COLLECTION (optional, default: nifi_state)
      The Firestore collection name where the state will be stored.

    KRB5_FILE (optional, default: /etc/krb5.conf)
      Specifies the filename of the krb5.conf file. This is necessary only if connecting to a Kerberos-protected endpoint. For more information, see Configuring Kerberos.

    The following environment variables apply only to HTTP triggers:

    HEADER_ATTRIBUTE_PATTERN (optional)
      A regular expression capturing all FlowFile attributes from the Output Port that should be returned as HTTP headers, if the trigger type is HTTP. For all other trigger types, this variable is ignored.

    HTTP_STATUS_CODE_ATTRIBUTE (optional)
      A FlowFile attribute name that will set the response HTTP status code in a successful data flow, if the trigger type is HTTP. For all other trigger types, this variable is ignored. See the HTTP trigger section of Google Cloud Function triggers for more information.
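
    If you script the deployment instead of using the console, the same variables can be supplied with the gcloud CLI. This is a minimal sketch, assuming a function named my-dataflow-function (a placeholder) and placeholder values in angle brackets that you would replace with your own CRN and keys:

      # Merge these variables into the function's existing environment.
      # --set-env-vars could be used instead, but it replaces all existing variables.
      gcloud functions deploy my-dataflow-function \
          --update-env-vars "FLOW_CRN=<your-flow-crn>,DF_ACCESS_KEY=<your-access-key>,DF_PRIVATE_KEY=<your-private-key>"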