Install custom Python libraries in flow deployments
If your data flow requires custom Python packages, you can modify your Python script
so that it installs these dependencies itself, and then run the script with a NiFi processor.
Create a Python script that installs the package you want to add:
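#!/usr/bin/python3
import site
import subprocess
import sys

try:
    import [***PACKAGE NAME***] as [***IMPORT AS***]
except ImportError:
    # Install the package with pip, using the interpreter that runs this script
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", "[***PACKAGE NAME***]"])
    # Make the user site-packages directory importable in this process
    site.addsitedir(site.getusersitepackages())
    import [***PACKAGE NAME***] as [***IMPORT AS***]

# Read the incoming FlowFile content from standard input
file = [***IMPORT AS***].read_csv(sys.stdin)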
Replace [***PACKAGE NAME***] with the name of the package you want to install and
import, and [***IMPORT AS***] with the alias you want to use for the package in your
script. For example, the following script uses pandas:
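#!/usr/bin/python3
import site
import subprocess
import sys

try:
    import pandas as pd
except ImportError:
    # Install pandas with pip, using the interpreter that runs this script
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", "pandas"])
    # Make the user site-packages directory importable in this process
    site.addsitedir(site.getusersitepackages())
    import pandas as pd

# Read the incoming FlowFile content from standard input as CSV
file = pd.read_csv(sys.stdin)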
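When a script needs several packages, or when the name you install with pip differs from the name you import, the try/except pattern can be wrapped in a reusable function. The sketch below assumes the script runs with an interpreter that allows `pip install --user`; `ensure_package` and its `pip_name` parameter are hypothetical names, not part of NiFi or Cloudera DataFlow.

```python
import importlib
import site
import subprocess
import sys
from types import ModuleType
from typing import Optional


def ensure_package(import_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Return the module `import_name`, installing it with pip if it is missing.

    pip_name is the distribution name passed to `pip install` when it differs
    from the import name (for example "scikit-learn" provides "sklearn").
    """
    try:
        return importlib.import_module(import_name)
    except ImportError:
        # Run pip as a subprocess of the same interpreter; pip does not offer
        # a supported in-process Python API.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user", pip_name or import_name]
        )
        # The user site-packages directory is only added to sys.path at startup
        # if it already exists, so add it explicitly after a fresh install.
        site.addsitedir(site.getusersitepackages())
        importlib.invalidate_caches()
        return importlib.import_module(import_name)
```

With this helper, `pd = ensure_package("pandas")` would replace the try/except block in the pandas example, and `ensure_package("sklearn", "scikit-learn")` would install scikit-learn but import it as sklearn.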
Open the flow definition that requires the custom packages in NiFi or in the Flow Designer.
Add and configure an ExecuteStreamCommand processor to run
your script.
Set the following properties:
Command Arguments: #{Script} (a reference to the Script parameter that you populate when you deploy the flow)
Command Path: python
Leave all other properties at their default values.
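By default, ExecuteStreamCommand passes the content of each incoming FlowFile to the command's standard input and writes the command's standard output to the content of the outgoing FlowFile. The following minimal sketch illustrates that contract using the standard-library csv module in place of pandas, so it runs without extra packages; the function names `process` and `process_text` and the pass-through logic are illustrative assumptions, not part of the original script.

```python
import csv
import io
from typing import TextIO


def process(src: TextIO, dst: TextIO) -> int:
    """Copy CSV rows from src to dst and return the number of data rows.

    Replace the body of the loop with your own per-row transformation.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty FlowFile: nothing to write
    writer.writerow(header)
    rows = 0
    for row in reader:
        writer.writerow(row)  # hypothetical transformation would go here
        rows += 1
    return rows


def process_text(text: str) -> str:
    """Convenience wrapper for trying process() on an in-memory string."""
    out = io.StringIO()
    process(io.StringIO(text), out)
    return out.getvalue()
```

In the deployed script, the entry point would call `process(sys.stdin, sys.stdout)`. With pandas, the equivalent is `pd.read_csv(sys.stdin)` followed by `DataFrame.to_csv(sys.stdout, index=False)`.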
If you have edited your data flow in NiFi, download it as a flow definition
and import it into Cloudera DataFlow. If you have edited
your data flow in the Flow Designer, publish the flow to the
Catalog.
Initiate a flow deployment from the Catalog. In the
Parameters step of the
Deployment Wizard, upload your Python script to
the Script parameter. If your script needs additional files, upload them to the
AdditionalResources parameter.
Complete the Wizard and submit your deployment
request.
Your Python script is uploaded to the flow deployment and
the required custom libraries are installed when the script is executed as part of the
data flow.