Cold start

Cold start can be defined as the set-up time required to get a serverless application's environment up and running when it is invoked for the first time within a defined period.

What is a cold start?

The concept of cold start is very important when using Function as a Service solutions, especially if you are developing a customer-facing application that needs to operate in real time. A cold start is the first request that a new function instance handles. It happens because your function is not yet running and the cloud provider needs to provision resources and deploy your code before the processing can begin. This usually means that the processing is going to take much longer than expected.

Before execution, the cloud provider needs to:
  • Provision the container that will run the code
  • Initialize the container / runtime
  • Initialize your function
before the trigger information can be forwarded to the function’s handler.

For more information about the concept of cold start, see Operating Lambda: Performance optimization.

How to avoid a cold start?

Cloud providers offer the option to always have at least one instance of the function running to completely remove the occurrence of a cold start. You can read more about predictable start-up times in Provisioned concurrency for AWS Lambda.

Cold start with DataFlow Functions

When a cold starts happens with DataFlow Functions, the following steps are executed before the trigger payload gets processed:

  • The function retrieves the flow definition from the DataFlow Catalog.
  • The NiFi Stateless runtime is initialized.
  • The flow definition is parsed to list the required NAR files for executing the flow.
  • The required NAR files are downloaded from the specified repository unless the NARs are made available in the local storage attached to the function.
  • The NARs are loaded into the JVM.
  • The flow is initialized and components are started.

The step that might take a significant amount of time is the download of the NAR files. In order to reduce the time required for this step as much as possible, the best option is to list the NAR files required for the flow definition and make those files available to the function.

For the related instructions, see Cloud storage.