Inbound connections

An Inbound Connection Endpoint allows you to stream data from an external source application to a flow.

An Inbound Connection Endpoint provides a stable hostname that can be used to send data to a Cloudera Data Flow deployment located in the same environment. You can create an Inbound Connection Endpoint during deployment, provided that the flow definition supports creating such endpoints.

Endpoints exist within the environment where they were created. They cannot be moved between environments. If the environment is deleted, the endpoint gets deleted as well, and cannot be reused.

One endpoint can be assigned to one deployment at a time. To reassign an existing inbound connection endpoint, you need to first terminate the deployment to which it is currently assigned, then assign the existing endpoint to a new deployment in the NiFi configuration step during flow deployment.

Setting up an Inbound Connection Endpoint is a complex task that affects both how you develop a flow definition in NiFi and how you deploy it in Cloudera Data Flow. Once your flow has been deployed, you need to configure your client, whether it is an external application connecting directly or an external load balancer, to communicate with the Inbound Connection Endpoint of your flow deployment.

You must have the DFAdmin role for the environments where you want to manage resources.
  1. Open Cloudera Data Flow by clicking the Data Flow tile in the Cloudera sidebar.
  2. Select Resources.
  3. Select a workspace.
  4. In the Workspace Resources view, select the Inbound Connections tab.

Adding an inbound connection to an existing deployment

Learn how to add an inbound connection to an existing deployment.

  1. Under Deployment Settings, select the NiFi Configuration tab.
  2. In the Inbound Connections section, select the Allow NiFi to receive data option.

    An Endpoint Hostname, generated from the deployment name, is offered, and port configuration options become available.

  3. You can accept the generated hostname as is, or you can change the prefix.
  4. Add Listening Ports.

    Select a protocol and add a port, then click Add Port. Repeat this step for each port where your flow will listen for incoming data. Using ports with mixed protocols (TCP and UDP) is not allowed.

    When you are finished adding ports, click OK.

  5. Specify Trusted IP Addresses.
    Specify a comma-separated list of trusted CIDRs or IP ranges, then click Add.
    To allow all traffic, select the Allow all traffic option.
  6. To promote the changes you made, click Apply Changes.
    Deployment status changes to Updating.
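
Before pasting a list into the Trusted IP Addresses field, it can help to check its format locally. The following is a minimal sketch and not part of Cloudera Data Flow: the function names is_ipv4_cidr and check_cidr_list are hypothetical, and the check assumes IPv4 CIDR notation only, so it does not cover IP ranges.

```shell
# Hypothetical helpers for pre-checking a Trusted IP Addresses list.
# Assumption: every entry is an IPv4 CIDR block such as 10.0.0.0/8.

# Succeeds when $1 is a syntactically valid IPv4 CIDR block.
is_ipv4_cidr() {
  octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  printf '%s\n' "$1" |
    grep -Eqx "${octet}(\.${octet}){3}/([0-9]|[12][0-9]|3[0-2])"
}

# Succeeds when $1 is a non-empty, comma-separated list of IPv4 CIDR blocks.
check_cidr_list() {
  [ -n "$1" ] || return 1
  status=0
  old_ifs=$IFS
  IFS=','
  for entry in $1; do
    entry=$(printf '%s' "$entry" | tr -d ' ')   # drop spaces around commas
    if ! is_ipv4_cidr "$entry"; then
      printf 'not an IPv4 CIDR block: %s\n' "$entry" >&2
      status=1
    fi
  done
  IFS=$old_ifs
  return "$status"
}
```

For example, check_cidr_list "10.0.0.0/8, 192.168.1.0/24" succeeds, while an entry such as 10.0.0.0/33 is reported on standard error and makes the function fail.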

Renewing the certificate for an inbound connection endpoint

If you need to replace an X.509 certificate for an inbound connection endpoint before it expires, you can do so manually.

You need DFFlowAdmin privilege to perform this action.
  1. Select the Inbound Connection that you want to manage.
  2. Click Renew.
    • To renew the server certificate, select NiFi Inbound SSL Context Service.
    • To renew the client certificate, select Client SSL Context.
    • If you leave Revoke previously issued client certificates unchecked, existing client certificates remain valid and existing clients can continue to connect to your deployment using them. If you select the Revoke previously issued client certificates option, all existing client certificates are invalidated, and you need to add the new certificate to existing clients so that they can keep connecting to your Cloudera Data Flow deployment.
  3. Click Renew & Restart.
    The UI switches to the KPIs and Alerts pane, where you can monitor your deployment as it restarts and the new certificate or certificates become available.
If you have renewed the NiFi Inbound SSL Context Service:
No further action is required.
If you have renewed the Client SSL Context:
After your Cloudera Data Flow deployment has restarted, switch to the NiFi Configuration pane to download the Client Certificate and the Client Private Key. You can then add these to your client.
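
To judge when a renewal is due, you can inspect the expiry of a downloaded client certificate locally. The following is a minimal sketch, assuming that openssl is installed and that the certificate file is in PEM format; the function name cert_expires_within_days is hypothetical.

```shell
# Hypothetical helper: reports whether a PEM certificate expires soon.
# Arguments: $1 = certificate file (PEM), $2 = look-ahead window in days.
# Exit status: 0 = expires within the window, 1 = still valid after the
# window, 2 = bad arguments or a file that openssl cannot parse.
cert_expires_within_days() {
  cert_file=$1
  days=$2
  [ -r "$cert_file" ] || return 2
  case $days in
    ''|*[!0-9]*) return 2 ;;   # the window must be a whole number of days
  esac
  openssl x509 -in "$cert_file" -noout >/dev/null 2>&1 || return 2
  # "openssl x509 -checkend N" exits 0 when the certificate is still
  # valid N seconds from now, and non-zero when it will have expired.
  if openssl x509 -in "$cert_file" -noout \
       -checkend "$((days * 86400))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

A call such as cert_expires_within_days client-certificate-encoded 14 succeeds when the certificate expires within the next 14 days, which could serve as a trigger for renewing the Client SSL Context.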

Reassigning an inbound connection endpoint to a different project

Learn how to reassign an inbound connection to another project.

  • Make sure that you have DFDeveloper permission to perform this task. For information on account and resource roles, see Cloudera Data Flow Authorization.

You cannot reassign an inbound connection that is currently used by a deployment. Before you can reassign it to a different project, you have to terminate the deployment that uses it, making sure that the Delete assigned endpoint hostname option is not selected.

  1. Select the Inbound Connection that you want to reassign.
  2. Click Reassign.
    If the inbound connection is not used by any deployment, the Reassign Resource modal opens.
  3. Select a Project and click Reassign.
  4. Click Apply Changes.

Using Inbound Connections with an external load balancer

Once a Cloudera Data Flow deployment with an Inbound Connection Endpoint is available, you can go on and connect an external load balancer to start sending data.

Inbound Connection Endpoints are created in Cloudera Data Flow with an internal Layer 4 (L4) load balancer (LB). Nevertheless, you can also use your own native Layer 7 (L7) LB (Application Gateway on Azure or Application Load Balancer on AWS) in front of the Cloudera managed L4 LB.

Cloudera recommends achieving this by configuring your L7 LB to use the Cloudera Data Flow deployment LB as a backend. Enabling TLS between your LB and the Cloudera Data Flow LB is recommended, but mTLS is not possible for the backend connection. This means that your Listen Processor (e.g., ListenHTTP) in your NiFi flow cannot be configured with Client Auth = Required when using an external LB as a gateway.

You may configure the listening side of your LB and routing rules according to the requirements of your organization.

Alternatively, you may be required to use an L4 LB provided by your organization in front of the Cloudera managed LB. This is also possible, although Cloudera recommends directly using the Cloudera managed L4 LB when possible.

Typically, when an external load balancer acts as a gateway, the internal managed load balancer should stay private. You can accomplish this by deselecting the “Use Public Endpoint” option when enabling Cloudera Data Flow for your environment, which limits Cloudera Data Flow to using only private subnets for all resources. If public access is needed, provide it by exposing private resources through the external gateway load balancer.

Configuration workflow

Currently, an Inbound Connection Endpoint can only be created during flow deployment, and cannot be reassigned without terminating the flow deployment for which it was created.

To configure an external load balancer, you need to go through the following steps:

Configure an Application Gateway in Azure

Learn about the settings required to set up an Azure Application Gateway to communicate with an Inbound Connection Endpoint.

Create an Azure Application Gateway service (you find it in the Networking services category) using the following settings:

  1. Make the following Backend Pool settings:
    Backend Pool without targets
    Set to No.
    Backend Targets
    IP address or FQDN
    Set to the [*** Inbound Connection Endpoint Hostname ***] acquired from the NiFi settings of the flow deployment where you want to connect with your gateway.

    For example, my-endpoint.inbound.dfx.p8jdxchd.xcu2-8y8x.cloudera.site.

    For all other settings you can keep the default values.
  2. Make the following Backend Settings:
    • If your flow listen processor uses TLS (recommended):

      Backend protocol
      HTTPS
      Trusted root certificate
      Yes
      Backend port
      Match the port of your HTTP Listen Processor.
      For all other settings you can keep the default values.
    • If your flow listen processor does not use TLS:

      Backend protocol
      HTTP
      Backend port
      Match the port of your HTTP Listen Processor.
      For all other settings you can keep the default values.

Tutorial: MiNiFi to Cloudera Data Flow flow deployment

This tutorial walks you through creating an inbound connection endpoint in Cloudera Data Flow used by a flow deployment to receive data from one or more MiNiFi agents managed by Edge Flow Manager.

  1. In a development NiFi environment, create a Controller Service of type StandardRestrictedSSLContextService at the root canvas level and name it Inbound SSL Context Service.
    1. In the Operate palette, click Configuration > Controller Services > Create a new controller service.
    2. Filter for ssl, select StandardRestrictedSSLContextService then click Add.
    3. Click Configure.
    4. On the Settings tab change the Name to Inbound SSL Context Service, then click Apply.

    You do not need to make further configuration on this Controller Service; it acts as a placeholder and will be created with a managed SSL Context when deployed by Cloudera Data Flow.

  2. Create a Process Group on the root canvas to hold your flow definition and give it a name.
    This tutorial uses the name ListenHTTP Flow.
  3. Enter the process group.
  4. Inside the Process Group, add a listen processor.
    This tutorial uses ListenHTTP.
  5. Configure the listen processor:
    Base Path
    This tutorial uses the default contentListener.
    Listening Port
    Define a value that is valid for your use case. This tutorial uses port 9000.
    SSL Context Service
    Select Inbound SSL Context Service.
    Client Authentication
    Select REQUIRED.
    Click Apply.
  6. Connect the ListenHTTP processor to a downstream processor of your choice.
    This tutorial uses LogAttribute, where all relationships terminate.
  7. From the root canvas, right-click the Process Group and select Download flow definition > Without external controller services.
  8. Upload the flow definition JSON to the Flow Catalog of your Cloudera Data Flow deployment.
  9. Deploy the flow.
    1. At the NiFi Configuration step of the Deployment wizard, select Inbound Connections > Allow NiFi to Receive Data to enable inbound connections.
      Accept the automatically created endpoint hostname and automatically discovered port by clicking Next.
    2. At Parameters, click Next.
    3. At Sizing & Scaling select the Extra Small NiFi Node Size then click Next.
    4. Add a KPI on the ListenHTTP processor to monitor how many bytes it is receiving, by clicking Add new KPI.
      Make the following settings:
      KPI Scope
      Processor
      Processor Name
      ListenHTTP
      Metric to Track
      Bytes Received
    5. Review the information provided and click Deploy.

    Soon after the flow deployment has started, the client certificate and private key required for sending data to the NiFi flow become available for the flow deployment that is being created.

  10. Collect the information required to configure your client.
    1. Once the deployment has been created successfully, select it in the Deployments view and click Manage Deployment.

    2. In the Deployment Settings section, navigate to the NiFi Configuration tab to find information about the associated inbound connection endpoint.

    3. Copy the endpoint hostname and port and download the certificate and private key.

  11. Start designing your MiNiFi flow in EFM.

    To design a flow for your MiNiFi C++ agent class:

    1. Copy the downloaded client-private-key-encoded key and client-certificate-encoded.cer certificate files to the host with the running MiNiFi C++ agent, so they are accessible by filepath from the agent.

    2. Create a Service of type SSL Context Service with the following configuration:
      Service Name
      Specify a name for this service. This tutorial uses Client SSL Context Service.
      CA Certificate
      Leave it empty. As Cloudera Data Flow uses Let's Encrypt as a Certificate Authority, the certificate will be accepted automatically, without additional configuration.
      Client Certificate
      [***/PATH/TO/***]client-certificate-encoded.cer

      For example, /opt/minifi/minifi-test/client-certs/client-certificate-encoded.cer.

      Passphrase
      Set no value.
      Private Key
      [***/PATH/TO/***]client-private-key-encoded

      For example, /opt/minifi/minifi-test/client-certs/client-private-key-encoded

      Use System Cert Store
      Keep the default False value.
    3. Click Apply.
    4. Create an InvokeHTTP processor named Send to CDF with the following configuration:
      Automatically Terminated Relationships
      Select all relationships.
      Content-type
      Depends on your flow file data type. This tutorial uses text/plain.
      HTTP Method
      POST
      Remote URL
      https://[***ENDPOINT HOSTNAME COPIED FROM CLOUDERA DATAFLOW FLOW DEPLOYMENT MANAGER***]:9000/contentListener

      For example, https://my-flow.inbound.my-dfx.c94x5i9m.xcu2-8y8z.mycompany.test:9000/contentListener

      SSL Context Service
      Client SSL Context Service
      Leave all other settings with their default values.

    To design a flow for your MiNiFi Java agent class:

    1. Convert the downloaded client-private-key-encoded key and client-certificate-encoded.cer certificate files to a JKS Keystore:
      1. Create a PKCS12 keystore:

        openssl pkcs12 -export -in client-certificate-encoded -inkey client-private-key-encoded -out client-keystore.p12

      2. Convert the PKCS12 keystore to a JKS keystore:

        keytool -importkeystore -srckeystore client-keystore.p12 -srcstoretype pkcs12 -destkeystore client-keystore.jks

    2. Copy the resulting client-keystore.jks file to the host with the running MiNiFi Java agent, so it is accessible by filepath from the agent.
    3. Obtain the CA root cert and add it to truststore client-truststore.jks, by running the following commands:
      wget https://letsencrypt.org/certs/isrgrootx1.pem
      keytool -import -file isrgrootx1.pem -alias isrgrootx1 -keystore client-truststore.jks

      MiNiFi Java requires you to specify an explicit truststore for inbound connections. Remember the password you used for creating client-truststore.jks, as you will need it in the next step.

    4. Create a Service of type Restricted SSL Context Service with the following configuration:
      Service Name
      Specify a name for this service. This tutorial uses Client SSL Context Service.
      Keystore Filename
      [***/PATH/TO/***]client-keystore.jks
      Keystore Password
      [***THE PASSWORD YOU PROVIDED WHEN CREATING THE JKS STORE***]
      Key Password
      [***THE PASSWORD YOU PROVIDED WHEN CREATING THE JKS STORE***]
      Keystore Type
      JKS
      Truststore Filename
      [***/PATH/TO/***]client-truststore.jks
      Truststore Type
      JKS
      Truststore Password
      [***THE PASSWORD YOU PROVIDED WHEN CREATING THE CLIENT TRUSTSTORE***]
    5. Click Apply.
    6. Create an InvokeHTTP processor named Send to CDF with the following configuration:
      Automatically Terminated Relationships
      Select all relationships.
      Content-type
      Depends on your flow file data type. This tutorial uses text/plain.
      HTTP Method
      POST
      Remote URL
      https://[***ENDPOINT HOSTNAME COPIED FROM CLOUDERA DATAFLOW FLOW DEPLOYMENT MANAGER***]:9000/contentListener

      For example, https://my-flow.inbound.my-dfx.c94x5i9m.xcu2-8y8z.mycompany.test:9000/contentListener

      SSL Context Service
      Client SSL Context Service
      Leave all other settings with their default values.
  12. Build the rest of your data flow to read data and send it to your Cloudera Data Flow flow deployment using InvokeHTTP. As a simple example, this tutorial uses the GenerateFlowFile processor, with the following settings:
    Run Schedule
    Set to 10000 ms (10 seconds).
    Custom Text
    The message you type here will be sent to the ListenHTTP Flow you have created, with the frequency specified by Run Schedule. For example, Hello DFX! This is MiNiFi.
    Data Format
    Set to Text.
    Unique FlowFiles
    Set to false.
  13. Connect the GenerateFlowFile processor to the InvokeHTTP processor.
  14. Click Actions > Publish... to publish the flow and start it on your MiNiFi agent.
  15. Select your flow deployment in the Cloudera Data Flow Dashboard and click KPIs.

    Observe that your Cloudera Data Flow flow deployment is now receiving data from MiNiFi.
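
The keystore and truststore commands in the MiNiFi Java steps above prompt for passwords interactively. When scripting the setup, the same conversions can be run without prompts. The following is a minimal sketch, assuming openssl and keytool are installed; the function name build_client_stores is hypothetical, and using one password for all stores is a simplification chosen for this example. Note that passwords passed on the command line may be visible to other users of the host.

```shell
# Hypothetical helper: builds client-keystore.jks and client-truststore.jks
# in the current directory without interactive prompts.
# Arguments: $1 = client certificate (PEM), $2 = client private key (PEM),
#            $3 = CA root certificate (PEM), $4 = store password.
# Returns 2 when an input file is unreadable or the password is too short.
build_client_stores() {
  cert=$1; key=$2; ca=$3; password=$4
  for f in "$cert" "$key" "$ca"; do
    [ -r "$f" ] || { printf 'cannot read %s\n' "$f" >&2; return 2; }
  done
  # keytool rejects store passwords shorter than six characters.
  [ "${#password}" -ge 6 ] || { printf 'password too short\n' >&2; return 2; }

  # PEM certificate and key -> PKCS12 keystore.
  openssl pkcs12 -export -in "$cert" -inkey "$key" \
    -out client-keystore.p12 -passout "pass:$password" &&
  # PKCS12 keystore -> JKS keystore.
  keytool -importkeystore -noprompt \
    -srckeystore client-keystore.p12 -srcstoretype pkcs12 \
    -srcstorepass "$password" \
    -destkeystore client-keystore.jks -deststoretype jks \
    -deststorepass "$password" &&
  # CA root certificate -> JKS truststore.
  keytool -import -noprompt -file "$ca" -alias isrgrootx1 \
    -keystore client-truststore.jks -storetype jks -storepass "$password"
}
```

For example, build_client_stores client-certificate-encoded client-private-key-encoded isrgrootx1.pem followed by a password of your choice produces both store files in the working directory.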

Tutorial: Invoking an HTTP endpoint with curl

This tutorial walks you through invoking an HTTP Inbound Connection Endpoint with curl using the ListenHTTP filter to Kafka ReadyFlow from the ReadyFlow Gallery.

  1. Deploy the ListenHTTP filter to Kafka ReadyFlow.
    1. Navigate to the ReadyFlow Gallery, locate the ListenHTTP filter to Kafka ReadyFlow and click View Added Flow Definition.
    2. Click Deploy and select your target environment to start the Deployment Wizard for the latest version of this ReadyFlow.
    3. Specify a deployment name, for example, Inbound Connections curl, and click Next.
    4. Select the Allow NiFi to receive data checkbox to configure an endpoint host.
    5. Accept the automatically created endpoint hostname and automatically discovered port by clicking Next.
    6. Optional: This ReadyFlow performs schema validation for incoming events using Cloudera’s Schema Registry before sending the events to a Kafka topic. If you have a Streams Messaging cluster available, fill in the Kafka and Schema Registry connection properties.

      If you only want to validate inbound connection endpoint connectivity, enter dummy values for the empty parameters, set the Input and Output format to JSON while keeping the Listening Port set to 7001.

    7. At Sizing & Scaling select the Extra Small NiFi Node Size and click Next.
    8. Add a KPI on the ListenHTTP processor to monitor how many bytes it is receiving, by clicking Add new KPI.

      Make the following settings:
      KPI Scope
      Processor
      Processor Name
      ListenHTTP
      Metric to Track
      Bytes Received
    9. Click Next.
    10. Review the information provided and click Deploy.

      Soon after the flow deployment has started, the client certificate and private key required for sending data to the NiFi flow become available for the flow deployment that is being created.

  2. Collect the information required to configure your client.
    1. Once the deployment has been created successfully, select it in the Deployments view and click Manage Deployment.

    2. In the Deployment Settings section, navigate to the NiFi Configuration tab to find information about the associated inbound connection endpoint.

    3. Copy the endpoint hostname and port and download the certificate and private key.

  3. Create the curl request to validate connectivity to the HTTP inbound connection endpoint.
    Using the endpoint hostname, port, client certificate, and private key, you can now construct a curl command to call the endpoint and validate connectivity:

    curl -v -X POST https://[***ENDPOINT HOSTNAME***]:7001/contentListener --key [***/PATH/TO/***]client-private-key-encoded --cert [***/PATH/TO/***]client-certificate-encoded

    You receive an HTTP 200 response code in output similar to the following, indicating that your client was able to securely connect to the inbound connection endpoint:

    *   Trying 10.36.84.149:7001...
    * Connected to [***ENDPOINT HOSTNAME***] (10.36.84.149) port 7001 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *  CAfile: /Users/mkohs/letsencrypt-stg-root-x1.pem
    *  CApath: none
    * (304) (OUT), TLS handshake, Client hello (1):
    * (304) (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Request CERT (13):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS handshake, Certificate (11):
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    * TLSv1.2 (OUT), TLS handshake, CERT verify (15):
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    * TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
    * ALPN, server did not agree to a protocol
    * Server certificate:
    *  subject: CN=dfx.dtefgqis.xcu2-8y8x.dev.cldr.work
    *  start date: May 11 21:02:20 2022 GMT
    *  expire date: Aug  9 21:02:19 2022 GMT
    *  subjectAltName: host "[***ENDPOINT HOSTNAME***]" matched cert's "[***ENDPOINT HOSTNAME***]"
    *  issuer: C=US; O=(STAGING) Let's Encrypt; CN=(STAGING) Artificial Apricot R3
    *  SSL certificate verify ok.
    > POST /contentListener HTTP/1.1
    > Host: [***ENDPOINT HOSTNAME***]:7001
    > User-Agent: curl/7.79.1
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Date: Thu, 12 May 2022 00:45:57 GMT
    < Content-Type: text/plain
    < Content-Length: 0
    < Server: Jetty(9.4.46.v20220331)
    < 
    * Connection #0 to host [***ENDPOINT HOSTNAME***] left intact
    
  4. Use curl to post data to the HTTP inbound connection endpoint.

    If you want to post events to the NiFi deployment, you can add a header and content definition to the curl request.

    This example sends JSON data following a simple schema to the endpoint:

    curl -v -X POST https://[***ENDPOINT HOSTNAME***]:7001/contentListener \
     --key [***/PATH/TO/***]client-private-key-encoded \
     --cert [***/PATH/TO/***]client-certificate-encoded \
     -H 'Content-Type: application/json' \
     -d '{"created_at":6453,"id":6453,"text":"This is my test event","timestamp_ms":34534,"id_store":12}'

    The NiFi deployment tries to validate the schema against the schema name and Schema Registry that you provided when deploying the ReadyFlow. If you provided dummy values, you receive a response indicating that the flow was unable to look up the schema.
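
For repeated or scripted checks, the curl calls in this tutorial can be wrapped in small functions that print only the HTTP status code. The following is a minimal sketch; the function names listener_url and post_event, and the event file passed to post_event, are hypothetical. The certificate and key arguments are the files downloaded in step 2.

```shell
# Hypothetical helpers for scripted checks against an inbound connection endpoint.

# Prints the HTTPS listener URL for a hostname, port, and base path.
# A leading slash on the base path is tolerated.
listener_url() {
  printf 'https://%s:%s/%s\n' "$1" "$2" "${3#/}"
}

# Posts a JSON file using the client certificate and key, then prints
# only the HTTP status code returned by the listener.
# Arguments: $1 = URL, $2 = client certificate (PEM),
#            $3 = client private key (PEM), $4 = JSON file to send.
post_event() {
  curl --silent --show-error --output /dev/null \
    --write-out '%{http_code}\n' \
    --request POST "$1" --cert "$2" --key "$3" \
    --header 'Content-Type: application/json' \
    --data-binary "@$4"
}
```

A script could build the URL with listener_url, passing the endpoint hostname, 7001, and contentListener, and then call post_event with that URL, the two downloaded files, and a JSON file containing the event.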

Connecting applications to an endpoint

Once a Cloudera Data Flow deployment with inbound connection is available, you can go on and connect an external application to start sending data.

  • A deployment with inbound connection is available.

  • A network connection through which the client can reach the deployment endpoint is available.

  • You have been assigned at least the DFFlowUser role for the environment for which you want to configure the inbound connection.
  1. Select the deployment where you want to send data and go to Deployment Manager > Deployment Settings.
  2. Select the NiFi Configuration tab.
  3. Make a note of the Endpoint Hostname and port.
  4. Click Download Client Certificate.

    The X.509 client certificate downloads to your computer in PEM format.

  5. Click Download Client Private Key to obtain the RSA Private Key.

    The unencrypted RSA Private Key encoded with PKCS8 downloads to your computer in PEM format.

  6. Depending on your client, you may have to convert the certificate and the private key to a different format.

    For example, to convert PEM to PKCS12 format, use the following command:

    openssl pkcs12 -export -in [***DOWNLOADED PEM CERT FILE***] -inkey [***DOWNLOADED PEM PRIVATE KEY***] -out certificate.p12

    To further convert the PKCS12 file to JKS format for a Java client, run the following command:

    keytool -importkeystore -srckeystore [***CERTIFICATE NAME***].p12 -srcstoretype pkcs12 -destkeystore [***DESTINATION KEYSTORE***].jks
  7. Add the certificate file and the private key file to the keystore of your application.
  8. Configure your application to stream data to the Endpoint Hostname, port, and protocol of the deployment.
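
After downloading the client certificate and private key in the steps above, you may want to confirm that the two files belong together before adding them to a keystore, for example after a renewal left several downloads on disk. The following is a minimal sketch, assuming openssl is installed and both files are in PEM format; the function name cert_matches_key is hypothetical.

```shell
# Hypothetical helper: checks that a certificate and a private key belong together
# by comparing the public key embedded in each.
# Arguments: $1 = certificate (PEM), $2 = private key (PEM).
# Exit status: 0 = match, 1 = mismatch, 2 = unreadable or unparsable input.
cert_matches_key() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  # Public key stored in the certificate.
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
  # Public key derived from the private key.
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

Calling cert_matches_key with the downloaded certificate and key files succeeds only when the key pair is consistent.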

TLS keys and certificates

When using Inbound Connection Endpoints, sensitive information, including configuration files that contain passwords, is sent over the network between Cloudera Data Flow (CDF) and external data sources. To secure this transfer, Cloudera strongly recommends that you configure mutual Transport Layer Security (TLS) encryption.

TLS is an industry-standard set of cryptographic protocols for securing communications over a network.

Configuring TLS involves creating a private key and a public key for use by server and client processes to negotiate an encrypted connection. In addition, TLS can use certificates to verify the trustworthiness of keys presented during the negotiation to prevent spoofing and mitigate other potential security issues.
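
To see the certificate that a server presents during the negotiation described above, you can open a TLS connection and print the certificate details. The following is a minimal sketch, assuming openssl is installed and the endpoint is reachable from your machine; the function name show_endpoint_certificate is hypothetical.

```shell
# Hypothetical helper: prints subject, issuer, and validity dates of the
# certificate a TLS server presents.
# Arguments: $1 = hostname, $2 = port.
# Returns 2 when an argument is missing.
show_endpoint_certificate() {
  host=$1
  port=$2
  [ -n "$host" ] && [ -n "$port" ] || {
    printf 'usage: show_endpoint_certificate HOST PORT\n' >&2
    return 2
  }
  # s_client performs the handshake; stdin is closed so it exits right away.
  # The x509 command then decodes the first certificate in its output.
  openssl s_client -connect "$host:$port" -servername "$host" \
    </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -dates
}
```

Run against an inbound connection endpoint hostname and listening port, this prints the same subject, issuer, and date fields that curl reports in verbose mode.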