Configuring HDFS Spout
The following member functions are required for HdfsSpout:
.setReaderType()
Specifies which file reader to use:
To read sequence files, set this to 'seq'.
To read text files, set this to 'text'.
To use a custom file reader class that implements the interface org.apache.storm.hdfs.spout.FileReader, set this to the fully qualified class name.
.withOutputFields()
Specifies the names of the output fields for the spout. The number of fields depends on the reader being used. For convenience, built-in reader types expose a static member called defaultFields that can be used to set this.
.setHdfsUri()
Specifies the HDFS URI of the HDFS NameNode; for example, hdfs://namenodehost:8020.
.setSourceDir()
Specifies the HDFS directory from which to read files; for example, /data/inputdir.
.setArchiveDir()
Specifies the HDFS directory to which a file is moved after it has been completely processed; for example, /data/done. If this directory does not exist, it will be created automatically.
.setBadFilesDir()
Specifies the directory to which a file is moved if an error occurs while parsing its contents; for example, /data/badfiles. If this directory does not exist, it will be created automatically.
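The required settings above can be chained on a single HdfsSpout instance. The following is a minimal sketch, not a definitive configuration. It assumes the storm-hdfs module is on the classpath, that the built-in text reader class is org.apache.storm.hdfs.spout.TextFileReader, and that each setter returns the spout so the calls can be chained. The class name HdfsSpoutExample and the method name buildTextSpout are hypothetical, and the URI and directory values are the example values used above.

```java
import org.apache.storm.hdfs.spout.HdfsSpout;
import org.apache.storm.hdfs.spout.TextFileReader;

// Hypothetical wrapper class, used only to make the fragment complete.
public class HdfsSpoutExample {

    // Builds a spout that reads text files, using the example values from this section.
    public static HdfsSpout buildTextSpout() {
        return new HdfsSpout()
            .setReaderType("text")                           // built-in text file reader
            .withOutputFields(TextFileReader.defaultFields)  // assumed class exposing defaultFields
            .setHdfsUri("hdfs://namenodehost:8020")          // HDFS NameNode URI
            .setSourceDir("/data/inputdir")                  // directory to read files from
            .setArchiveDir("/data/done")                     // completed files are moved here
            .setBadFilesDir("/data/badfiles");               // files with parse errors are moved here
    }
}
```

To read sequence files instead, the reader type would be 'seq' and the output fields would come from the defaultFields member of the corresponding built-in reader class.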
For additional configuration settings, see Apache HDFS spout Configuration Settings.