kite-morphlines-hadoop-core
downloadHdfsFile
The downloadHdfsFile
command (source code) downloads, on startup, zero or more
files or directory trees from HDFS to the local file system. These files are typically
static configuration files that are required by downstream morphline commands, e.g. Avro
schema files, XML join tables, grok dictionaries, etc. Storing such configuration files in
HDFS can help with consistent centralized configuration management across a set of cluster
nodes.
The output directory on the local file system defaults to the current working directory of the current process. If the effective output file or directory already exists it will be deleted and overwritten.
The command provides the following configuration options:
Property Name | Default | Description |
---|---|---|
inputFiles | The HDFS files or directories to download, in the form of a list of HDFS URIs. | |
outputDir | "." | The relative or absolute path of the destination directory on the local file system. Parent directories of that directory will be created automatically. Defaults to the current working directory of the current process. |
Example usage:
downloadHdfsFile {
inputFiles : ["hdfs://c2202.mycompany.com/user/foo/configs/sample-schema.avsc"]
outputDir : "myconfigs"
}
openHdfsFile
The openHdfsFile
command (source code) opens an HDFS file for read and returns
a corresponding Java InputStream.
The morphline record input field _attachment_body must contain the HDFS Path of the file to read. The command replaces the HDFS Path in this field with the corresponding Java InputStream. Said InputStream can then be parsed with other commands, such as readLine or similar.
The command automatically handles gzip files if the file path ends with the ".gz" file name extensions.
Example usage:
openHdfsFile {}