Running a Flink job
After developing your application, you can submit your Flink job in YARN per-job or session mode. To submit the Flink job, run the Flink client from the command line using the run command, together with the security parameters and other configuration options.
Submitting a job means uploading the job’s JAR and related dependencies to the Flink cluster and initiating the job execution.
- Per-job mode
Per-job mode means that you run the Flink job in a dedicated YARN application. In this case, each submitted Flink job has its own Flink cluster in YARN, with its own Job Manager and Task Managers, so every job submission creates a new cluster. Because the cluster has to be deployed with every submission, starting the job can take longer.
- Session mode
Session mode means that you run multiple Flink jobs in the same YARN session. In this case, every Flink job shares the cluster, the allocated resources, the Job Manager and the Task Managers. When you run Flink jobs in session mode, the jobs are submitted to a single, long-lived cluster. Job startup is faster than in per-job mode. However, a cluster failure in session mode affects every Flink job, and restoring the jobs from savepoints can take time.
You can set how to run your Flink job with the execution.target setting in the Flink configuration file. By default, execution.target is set to yarn-per-job, but you can change it to yarn-session. Alternatively, you can add the corresponding arguments to the flink run command when submitting the Flink job.
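The following shell sketch shows one way to check which mode a host is configured for before you submit a job. The helper name configured_target and the default file path are assumptions made for illustration; the actual location of the Flink configuration file depends on your deployment.

```shell
# Hypothetical default path; adjust it to where your Flink configuration file lives.
FLINK_CONF="${FLINK_CONF:-/etc/flink/conf/flink-conf.yaml}"

# Print the value of execution.target from the given configuration file.
# Prints nothing if the key is not set. If the key appears more than once,
# the last occurrence is reported.
configured_target() {
  sed -n 's/^execution\.target:[[:space:]]*//p' "$1" | tail -n 1
}

# A configuration line that selects session mode would look like this:
#   execution.target: yarn-session
#
# To override the configured value for a single submission, pass the target
# on the command line instead:
#   flink run -t yarn-session <job_jar_file> <job_parameters>
```

Calling configured_target "$FLINK_CONF" prints the configured mode, if any. Passing -t on the command line is useful when most jobs use the configured mode and only a few need the other one.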
- You have installed and configured the Flink service on your CDP Private Cloud Base cluster.
For more information, see the Adding Flink as a service documentation.
- You have HDFS Gateway, Flink and YARN Gateway roles assigned to the host you are using for Flink submission.
For more information, see the Cloudera Manager documentation.
- You have uploaded the Flink application JAR file and job properties file to the Flink cluster.
Connect using ssh to the cluster where you want to run the Flink application.
Submit the Flink job using the flink run command.
- Per-job mode
    flink run \
      -d \
      -t yarn-per-job \
      -ynm <job_name_in_yarn> \
      -p <job_parallelism> \
      -ys <slots_per_task_manager> \
      -ytm <memory_per_container_in_mb> \
      <job_jar_file> <job_parameters>
    flink run \
      -d \
      -t yarn-per-job \
      -ynm <job_name_in_yarn> \
      -p <job_parallelism> \
      -ys <slots_per_task_manager> \
      -ytm <memory_per_container_in_mb> \
      -py <python_script> <job_parameters>
- Session mode
- Start a Flink session
    flink-yarn-session \
      -d \
      -nm <job_name_in_yarn> \
      -s <slots_per_task_manager> \
      -tm <memory_per_container_in_mb>
The flink-yarn-session command outputs the ID of the corresponding YARN application, which you need to add to the flink run command when submitting the job. For example:
YARN ApplicationID: application_1616633166424_0024
- Submit the Flink job.
- Start a Flink session cluster.
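The session-mode steps above can be sketched in shell as follows. The helper names extract_app_id and session_submit_cmd, and the file name my-job.jar, are hypothetical. The -Dyarn.application.id option follows the upstream Apache Flink documentation for attaching to a running YARN session; this sketch assumes it behaves the same way in your distribution. The helper only prints the command, so you can review it before running it.

```shell
# Read flink-yarn-session output on stdin and print the first YARN application ID found.
extract_app_id() {
  sed -n 's/^.*YARN ApplicationID:[[:space:]]*\(application_[0-9_]*\).*$/\1/p' | head -n 1
}

# Print (but do not execute) a flink run command that targets an existing session.
# $1 = YARN application ID; remaining arguments = job JAR and job parameters.
session_submit_cmd() {
  id="$1"
  shift
  printf 'flink run -d -t yarn-session -Dyarn.application.id=%s %s\n' "$id" "$*"
}

# Example using the sample output line shown above instead of a live session.
app_id=$(printf 'YARN ApplicationID: application_1616633166424_0024\n' | extract_app_id)
session_submit_cmd "$app_id" my-job.jar
```

On a real cluster, you would pipe the output of flink-yarn-session into extract_app_id instead of the sample line, then run the printed command once it looks correct.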