Troubleshooting flow deployment errors

Learn how to recognize and address common errors with your CDF flow deployments.

Setting up kubectl to connect to the DataFlow Kubernetes cluster

It is helpful to have access to the DataFlow Kubernetes cluster using command line tools such as kubectl when you are troubleshooting deployment failures. To set up kubectl access, follow these steps:

  1. Add the AWS IAM role that you will authenticate as to the list of authorized users for DataFlow by selecting the Manage User Access action and entering the ARN in the Add User dialog
  2. Download the Kubeconfig file
  3. Set up your kubectl to use the downloaded kubeconfig file

    export KUBECONFIG=path_to_downloaded_kubeconfig_file

  4. Run kubectl get ns and validate that the output lists the cluster namespaces, including the DataFlow namespaces

With kubectl set up correctly, you can access NiFi and other DataFlow logs directly through the CLI.
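A quick sanity check on the kubeconfig path catches a missing or mistyped KUBECONFIG before any kubectl command fails with a confusing connection error. A minimal sketch, where the check_kubeconfig helper is our own and not part of DataFlow:

```shell
# Hypothetical helper: verify KUBECONFIG is set and points to an existing
# file before running kubectl commands against the DataFlow cluster.
check_kubeconfig() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set"
    return 1
  fi
  if [ ! -f "$KUBECONFIG" ]; then
    echo "KUBECONFIG file not found: $KUBECONFIG"
    return 1
  fi
  echo "KUBECONFIG OK: $KUBECONFIG"
}

# Usage: check_kubeconfig && kubectl get ns
```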

Understanding flow deployment failures

The Flow Deployment process consists of two phases:

  1. Scheduling resources on Kubernetes and creating a new NiFi cluster
  2. Importing, configuring and starting the NiFi flow definition

If your flow deployment fails, check the Event History to see where exactly the issue occurred.

Deployment fails during Phase 1

If the issue occurs during Phase 1 while scheduling resources on Kubernetes and creating the NiFi cluster, you can get more details on why the deployment failed by looking at the DataFlow application logs.

Identify the DataFlow application pod by running:

kubectl get pods --namespace dfx-local

In the output, identify the DataFlow application pod, whose name starts with dfx-local-deployment. Copy its full pod name and run

kubectl logs -f dfx-local-deployment-7f8b466c68-xwrbf -c dfx-local --namespace dfx-local

to view the DataFlow application logs.
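The application pod name changes with every restart, so it can be convenient to look it up instead of copying it by hand. A small sketch; the dfx_app_pod helper is our own and assumes the application pod keeps the dfx-local-deployment name prefix shown above:

```shell
# Hypothetical helper: extract the DataFlow application pod name from
# 'kubectl get pods -o name' output by matching the expected prefix.
dfx_app_pod() {
  grep '^pod/dfx-local-deployment' | head -n 1 | sed 's|^pod/||'
}

# Usage:
#   POD="$(kubectl get pods --namespace dfx-local -o name | dfx_app_pod)"
#   kubectl logs -f "$POD" -c dfx-local --namespace dfx-local
```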

Note: -f will tail the log file as DataFlow is writing to it.

A common reason for flow deployment issues is that the Kubernetes cluster does not have enough resources available for the deployment pods to be scheduled. Pods that are stuck in the Pending state are an indicator that not enough resources are available.
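One way to spot stuck pods quickly is to filter the kubectl get pods output for the Pending status. A sketch, where the pending_pods helper is our own and assumes kubectl's default column layout with STATUS in the third column:

```shell
# Hypothetical helper: print the names of pods whose STATUS column is
# 'Pending', skipping the header row of 'kubectl get pods' output.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Usage: kubectl get pods --namespace dfx-deployment_name-ns | pending_pods
```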

You can explore flow deployments and their resources by running:

kubectl get pods --namespace dfx-deployment_name-ns

A healthy deployment includes one or more NiFi pods (depending on the sizing and scaling settings), a ZooKeeper pod, and a Prometheus pod.

If one of the pods is stuck in Pending, you can explore the pod further and identify potential issues by looking at its events.

For example, if dfx-nifi-0 was stuck in Pending and you wanted to find out why, you would run

kubectl describe pod dfx-nifi-0 -n dfx-deployment_name-ns

to get detailed information about the containers in the pod. Find the Events section and check if there are any messages about why a container could not be scheduled.
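Since kubectl describe output is long, it can help to cut it down to just the Events section at the end. A sketch; the pod_events helper is our own and assumes the Events: header starts at the beginning of a line, as in current kubectl output:

```shell
# Hypothetical helper: print everything from the 'Events:' header to the
# end of 'kubectl describe pod' output.
pod_events() {
  sed -n '/^Events:/,$p'
}

# Usage: kubectl describe pod dfx-nifi-0 -n dfx-deployment_name-ns | pod_events
```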

If the flow deployment failed because of insufficient resources in the Kubernetes cluster, you can increase the Kubernetes cluster size by using the Edit Configuration action of the affected environment.

Deployment fails during Phase 2

If the issue occurs during Phase 2, check the NiFi canvas of the deployment for any error messages. To get there, open the deployment details, click Manage Deployment, and on the Deployment Manager page select the View in NiFi action.

If a processor or controller service failed to start, make sure that you have provided the correct values for the deployment parameters. You can adjust parameter values in the NiFi canvas and restart processors or controller services as needed. Once you have identified the issue, note down the correct parameter values and start a new deployment.

To view the NiFi log for a particular deployment, run the following kubectl command.

kubectl logs -f dfx-nifi-0 -c app-log --namespace dfx-deployment_name-ns

Note: -f will tail the log file as NiFi is writing to it.
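When the log is verbose, filtering for warning and error lines narrows things down. A sketch; the log_errors helper is our own and assumes the log level appears as a space-delimited WARN or ERROR token, which matches NiFi's default log layout:

```shell
# Hypothetical helper: keep only lines containing a standalone ERROR or
# WARN token, as emitted by NiFi's default log pattern.
log_errors() {
  grep -E ' (ERROR|WARN) '
}

# Usage: kubectl logs dfx-nifi-0 -c app-log --namespace dfx-deployment_name-ns | log_errors
```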