Setting up Python for PyFlink

Before you can use Flink with the Python API, you must either install and configure Python on every relevant node, or create and initialize a Python virtual environment.

To install Python and PyFlink on the nodes:

  1. Connect to the Flink Gateway node using the CLI.
    ssh root@[***FLINK GATEWAY NODE***]
    You are prompted to provide your password to the cluster.
  2. Check the version of Python.
    python --version
    If the command fails or the version is lower than 3.6, install Python.
  3. Create a Python virtual environment using the following command:
    conda create --copy -y -n flink_venv python=3.8
  4. Install PyFlink using the following command:
    python -m pip install apache-flink==1.18.0
  5. Install PyFlink on the YARN NodeManager nodes as well using the same steps.
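
The version check in step 2 can also be scripted, which may be convenient when the same check has to be repeated on several nodes. The following is a minimal sketch, not part of the official procedure: the helper name version_ok is hypothetical, and the 3.6 minimum is taken from the step above.

```shell
# Hypothetical helper: succeeds when a "major.minor[.patch]" version
# string is 3.6 or later.
version_ok() {
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}     # minor version, with any patch level removed
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

# Check whichever interpreter "python" resolves to on this node.
if command -v python >/dev/null 2>&1; then
  current=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if version_ok "$current"; then
    echo "Python $current is sufficient"
  else
    echo "Python $current is too old; install 3.6 or later"
  fi
else
  echo "python not found; install Python 3.6 or later"
fi
```

The comparison is done on the numeric major and minor parts rather than on the raw string, so that a version such as 3.10 is not mistaken for something older than 3.6.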

To create a Python virtual environment that can be deployed with a Flink job:

  1. Connect to the Flink Gateway node using the CLI.
    ssh root@[***FLINK GATEWAY NODE***]
    Provide your workload password when prompted.
  2. Create a Python virtual environment using the following command:
    conda create --copy -y -n flink_venv python=3.8
  3. Activate the newly created virtual environment:
    conda activate flink_venv
  4. Install PyFlink to the flink_venv virtual environment using the following command:
    python -m pip install apache-flink==1.18.0
  5. Create a ZIP archive from the flink_venv virtual environment so it can be deployed with a Flink job:
    cd path/to/flink_venv && zip -r venv.zip .
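
Because the archive in step 5 is created from inside the environment directory, the interpreter ends up at bin/python relative to the root of the ZIP file. Before archiving, it may be worth confirming that the directory really has that layout. The sketch below assumes a hypothetical helper named venv_has_python and a hypothetical VENV_DIR variable; path/to/flink_venv is the same placeholder used in the step above.

```shell
# Hypothetical helper: succeeds when the directory contains an executable
# interpreter at bin/python, the relative path it will have inside the ZIP.
venv_has_python() {
  [ -x "$1/bin/python" ]
}

# VENV_DIR is an assumed variable; set it to the real environment directory.
VENV_DIR="${VENV_DIR:-path/to/flink_venv}"

if venv_has_python "$VENV_DIR"; then
  echo "Found $VENV_DIR/bin/python; the directory is ready to archive"
else
  echo "No executable bin/python under $VENV_DIR; check the path"
fi
```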

When the Python installation is complete, you can submit Flink applications that were created using the Python API.
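
As an illustration of how the archived virtual environment might be referenced at submission time, the sketch below assembles a flink run command using the -pyarch, -pyexec, and -py options of the Flink CLI. The helper name submit_cmd, the job file my_job.py, and the archive path are hypothetical. The sketch also assumes that no target directory is given for the archive, in which case Flink extracts it into a directory named after the archive file. Deployment options for your cluster (for example the YARN target) are not shown and would need to be added.

```shell
# Hypothetical helper: prints a flink run command for a given archive
# and PyFlink job file, without executing it.
submit_cmd() {
  archive=$1
  job=$2
  # The extracted directory is assumed to carry the archive's file name,
  # so the interpreter is addressed as <archive name>/bin/python.
  name=$(basename "$archive")
  echo "flink run -pyarch $archive -pyexec $name/bin/python -py $job"
}

# Print the command for review before running it by hand.
submit_cmd path/to/flink_venv/venv.zip my_job.py
```

Printing the command instead of executing it keeps the sketch safe to try on a node where no job should be started yet.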