Custom Python configs

On the Custom Python Configs tab you can reassign and validate Custom Python configs.

You must have the DFAdmin role for the environments where you want to manage resources.
  1. Open Cloudera Data Flow by clicking the Data Flow tile in the Cloudera sidebar.
  2. Select Resources.
  3. Select a workspace.
  4. In the Workspace Resources view select the Custom Python Configs tab.

Validating a custom Python config

You can validate custom Python configs and their storage locations in the Resources view.

  • Make sure that you have DFDeveloper permission to perform this task. For information on account and resource roles, see Cloudera Data Flow Authorization.
  1. Select the Custom Python Config that you want to validate.
  2. Click [More] > Validate.
    Cloudera Data Flow displays a message about the validity of the Python config and its storage location.

Reassigning a custom Python config to a different project

Learn how to reassign a custom Python config to another project.

  • Make sure that you have DFDeveloper permission to perform this task. For information on account and resource roles, see Cloudera Data Flow Authorization.
  1. Select the Custom Python Config that you want to reassign.
  2. Click Reassign.
    If the custom Python config is not used by any deployment, the Reassign Resource modal opens.
  3. Select a Project and click Reassign.
  4. Click Apply Changes.

Python scripts

Relying on Python scripts to perform data transformations within flows is a common pattern for NiFi users. Cloudera Data Flow flow deployments come with Python 3 and the following pre-installed packages: requests and urllib3. You can design your flows to use the pre-installed Python runtime and to install any additional custom packages you require.
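For example, a script can check at runtime which of these packages are importable before relying on them. This is a minimal sketch; the package names are the ones listed above, and the `available` helper is illustrative, not part of Cloudera Data Flow:

```python
import importlib.util

# Report which of the expected packages can be imported in this runtime.
# importlib.util.find_spec only looks the module up; it does not import it.
def available(packages):
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}

print(available(["requests", "urllib3"]))
```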

Upload and run Python scripts in flow deployments

If running your data flow requires executing a Python script, you have to upload it when creating your data flow deployment through the Deployment Wizard or the CLI. Follow these steps to configure your NiFi processors correctly and upload your Python script.

  1. Create your Python script and save it as a file.
    For example:
    #!/usr/bin/python3
    print("Hello, World!")
    
  2. In NiFi, open the flow definition that requires the Python script.
  3. Add and configure an ExecuteStreamCommand processor to run your script.
    Configure the following properties:
    Command Arguments
    provide #{Script}
    Command Path
    provide python
    Leave all other properties at their default values.
  4. If you have edited your data flow in NiFi, download it as a flow definition and import it to Cloudera Data Flow. If you have edited your data flow in the Flow Designer, publish the flow to the Catalog.
  5. Initiate a flow deployment from the Catalog. In the Parameters step of the Deployment Wizard, upload your Python script to the Script parameter. Upload additional files to the AdditionalResources parameter if applicable. Complete the Wizard and submit your deployment request.
Your Python script is uploaded to the flow deployment and executed as part of the data flow.
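By default, ExecuteStreamCommand passes the content of the incoming FlowFile to the command on standard input and uses the command's standard output as the content of the outgoing FlowFile. A slightly more realistic script than the Hello, World! example above might therefore read from stdin and write to stdout. This is a hypothetical sketch; the upper-casing transformation is just a placeholder for your own logic:

```python
#!/usr/bin/python3
# Hypothetical ExecuteStreamCommand-style script: read the FlowFile
# content from stdin, transform it, and write the result to stdout,
# which becomes the content of the outgoing FlowFile.
import sys

def transform(text):
    # Placeholder transformation; replace with your own logic.
    return text.upper()

if __name__ == "__main__":
    sys.stdout.write(transform(sys.stdin.read()))
```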

Install custom Python libraries in flow deployments

If your data flow requires custom Python packages, you can modify your Python script to install these dependencies when it runs in a NiFi processor.

  1. Create a Python script that installs the package you want to add:
    #!/usr/bin/python3
    try: import [***PACKAGE NAME***] as [***IMPORT AS***]
    except ImportError:
        from pip._internal import main as pip
        pip(['install', '--user', '[***PACKAGE NAME***]'])
        import [***PACKAGE NAME***] as [***IMPORT AS***]
    import sys
    file = [***IMPORT AS***].read_csv(sys.stdin)
    Replace [***PACKAGE NAME***] with the name of the package you want to import and [***IMPORT AS***] with the alias you want to use for the package in your script. For example, to install and import pandas:
    #!/usr/bin/python3
    try: import pandas as pd
    except ImportError:
        from pip._internal import main as pip
        pip(['install', '--user', 'pandas'])
        import pandas as pd
    import sys
    file = pd.read_csv(sys.stdin)
  2. In NiFi, open the flow definition that requires custom packages.
  3. Add and configure an ExecuteStreamCommand processor to run your script.
    Configure the following properties:
    Command Arguments
    provide #{Script}
    Command Path
    provide python
    Leave all other properties at their default values.
  4. If you have edited your data flow in NiFi, download it as a flow definition and import it to Cloudera Data Flow. If you have edited your data flow in the Flow Designer, publish the flow to the Catalog.
  5. Initiate a flow deployment from the Catalog. In the Parameters step of the Deployment Wizard, upload your Python script to the Script parameter. Upload additional files to the AdditionalResources parameter if applicable. Complete the Wizard and submit your deployment request.
Your Python script is uploaded to the flow deployment and the required custom libraries are installed when the script is executed as part of the data flow.
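The install-on-import pattern above can also be factored into a reusable helper. This sketch invokes pip as a subprocess via `python -m pip`, the invocation style the pip project recommends for programmatic use, rather than the `pip._internal` entry point used in the example, which is not a stable API across pip versions. The `ensure_package` name is illustrative, not part of Cloudera Data Flow:

```python
import importlib
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it with pip --user first if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Run pip as a subprocess instead of calling pip internals.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user",
             pip_name or module_name]
        )
        return importlib.import_module(module_name)

# json is part of the standard library, so this import succeeds
# without triggering an install.
mod = ensure_package("json")
```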