Running a Flink job

Learn how to run a built-in example application in a few simple steps. The application demonstrates how to use the Flink client to submit jobs to YARN.

Before you begin, make sure that the following prerequisites are met:
  • You have deployed the Flink parcel on your CDP Private Cloud Base cluster.
  • You have HDFS Gateway, Flink and YARN Gateway roles assigned to the host you are using for Flink submission. For instructions, see the Cloudera Manager documentation.
  • You have established your HDFS home directory.
The following is a working example of a word count application that reads text from a file in HDFS and counts how many times each word occurs.
> hdfs dfs -put /opt/cloudera/parcels/FLINK/lib/flink/README.txt /tmp
> flink run --detached \
 /opt/cloudera/parcels/FLINK/lib/flink/examples/streaming/WordCount.jar \
 --input hdfs:///tmp/README.txt \
 --output hdfs:///tmp/ReadMe-Counts
> hdfs dfs -tail /tmp/ReadMe-Counts
...
(and,7)
(source,1)
(code,2)
...
The execution.target setting in the Flink configuration file controls how your Flink job runs. By default, execution.target is set to yarn-per-job, but you can change it to yarn-session. Use the per-job configuration for simple jobs, and the session configuration when you work with the SQL client.
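Before you submit a job, it can help to confirm which execution target is currently configured. The following is a minimal sketch, not an official tool. The helper name flink_execution_target is hypothetical, the default path /etc/flink/conf/flink-conf.yaml is an assumption about where the Flink client configuration lives on your host, and the sketch assumes the setting is written as a plain "key: value" line.

```shell
# Hypothetical helper: print the execution.target value found in a Flink
# configuration file. The default path below is an assumption; pass the
# actual location of your configuration file as the first argument.
flink_execution_target() {
  conf_file="${1:-/etc/flink/conf/flink-conf.yaml}"
  # Match uncommented "execution.target: <value>" lines and print only the value.
  # If the key appears more than once, the last occurrence is printed.
  sed -n 's/^[[:space:]]*execution\.target[[:space:]]*:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf_file" | tail -n 1
}
```

Call the helper without arguments to read the assumed default location, or pass the path of the configuration file that your Flink client uses. If nothing is printed, the setting is not present in that file as an uncommented line.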