Configuring HDFS Spout
The following member functions are required for HdfsSpout:
- .setReaderType()
  Specifies which file reader to use:
  - To read sequence files, set this to 'seq'.
  - To read text files, set this to 'text'.
  - If you want to use a custom file reader class that implements the interface org.apache.storm.hdfs.spout.FileReader, set this to the fully qualified class name.
- .withOutputFields()
  Specifies the names of the output fields for the spout. The number of fields depends on the reader being used. For convenience, built-in reader types expose a static member called defaultFields that can be used for setting this.
- .setHdfsUri()
  Specifies the HDFS URI of the HDFS NameNode; for example, hdfs://namenodehost:8020.
- .setSourceDir()
  Specifies the HDFS directory from which to read files; for example, /data/inputdir.
- .setArchiveDir()
  Specifies the HDFS directory to which a file is moved after it has been completely processed; for example, /data/done. If this directory does not exist, it is created automatically.
- .setBadFilesDir()
  Specifies the directory to which a file is moved if an error occurs while parsing its contents; for example, /data/badfiles. If this directory does not exist, it is created automatically.
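The required settings listed above can be chained on a single HdfsSpout instance. The following is a minimal configuration sketch, not a definitive implementation. It assumes the storm-hdfs and storm-core libraries are on the classpath, that the installed version provides these fluent setters, and that the built-in text reader class is org.apache.storm.hdfs.spout.TextFileReader. The class name HdfsSpoutExample, the component ID "hdfs-spout", and the parallelism hint are hypothetical, and the URI and directories are the placeholder values used in the descriptions above.

```java
import org.apache.storm.hdfs.spout.HdfsSpout;
import org.apache.storm.hdfs.spout.TextFileReader;
import org.apache.storm.topology.TopologyBuilder;

// Hypothetical example class; only the HdfsSpout setters come from the list above.
public class HdfsSpoutExample {

    // Builds a spout that reads text files. All values are placeholders.
    static HdfsSpout buildTextSpout() {
        return new HdfsSpout()
                .setReaderType("text")                           // built-in text reader
                .withOutputFields(TextFileReader.defaultFields)  // default fields of that reader
                .setHdfsUri("hdfs://namenodehost:8020")          // NameNode URI
                .setSourceDir("/data/inputdir")                  // files are read from here
                .setArchiveDir("/data/done")                     // fully processed files go here
                .setBadFilesDir("/data/badfiles");               // files with parse errors go here
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // "hdfs-spout" is an assumed component ID; 1 is an assumed parallelism hint.
        builder.setSpout("hdfs-spout", buildTextSpout(), 1);
    }
}
```

To read sequence files instead, the reader type would be 'seq', and the output fields would come from the defaultFields member of the corresponding built-in reader.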
For additional configuration settings, see Apache HDFS spout Configuration Settings.