Data flow state

When your data flow uses stateful processors such as ListSFTP, you can store processor state in Firestore so that state persists across function invocations. Setting up the Firestore integration helps you avoid losing track of processed files or other state between function executions.

By default, if your data flow contains stateful processors (for example, ListSFTP), their state is stored automatically in a Firestore collection named nifi_state. To use a different collection name, set the FIRESTORE_STATE_COLLECTION environment variable. Integrating with Firestore requires the following additional steps. Without them, the data flow still runs, but state is not preserved between invocations.
  1. In the GCP console, navigate to the Firestore service.
  2. Click Select Native Mode.
  3. Select the appropriate region, and click Create database.
  4. Navigate to the IAM service, and click the Edit (pencil) button on the service account principal that runs your Cloud Function.
  5. Click Add Role, and add the Firebase Admin role.
  6. Click Save.

To disable the Firestore state provider even when your data flow contains stateful processors, set the DISABLE_STATE_PROVIDER environment variable to true.
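
The way the two environment variables interact can be illustrated with a short Python sketch. This is not the function runtime's actual code. The helper name resolve_state_settings is hypothetical, and the case-insensitive handling of "true" is an assumption. Only the variable names and the nifi_state default come from the text above.

```python
import os

# Default collection name, as stated in this section.
DEFAULT_COLLECTION = "nifi_state"


def resolve_state_settings(env=None):
    """Return (state_enabled, collection_name) for a given environment.

    Hypothetical helper for illustration only. Pass a dict to inspect a
    planned configuration, or nothing to read the current process environment.
    """
    env = os.environ if env is None else env

    # Assumption: the value is compared case-insensitively to "true".
    disabled = env.get("DISABLE_STATE_PROVIDER", "").strip().lower() == "true"

    # Fall back to the default when the variable is unset or empty.
    collection = env.get("FIRESTORE_STATE_COLLECTION") or DEFAULT_COLLECTION

    return (not disabled, collection)


if __name__ == "__main__":
    enabled, collection = resolve_state_settings()
    print(f"state provider enabled: {enabled}, collection: {collection}")
```

A check like this can be useful before deployment to confirm which collection a given set of environment variables is expected to select.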