GetHDFSFileInfo 2.3.0.4.10.0.0-147

Bundle
org.apache.nifi | nifi-hadoop-nar
Description
Retrieves a listing of files and directories from HDFS. This processor creates one or more FlowFiles that represent the HDFS file/dir with relevant information. The main purpose of this processor is to provide functionality similar to the HDFS client, i.e. count, du, ls, test, etc. Unlike ListHDFS, this processor is stateless, supports incoming connections, and provides information at the directory level.
Tags
HCFS, HDFS, filesystem, get, hadoop, ingest, list, source
Input Requirement
ALLOWED
Supports Sensitive Dynamic Properties
false
Properties
Relationships
Name Description
failure All failed attempts to access HDFS will be routed to this relationship
original Original FlowFiles are transferred to this relationship
not found If no objects are found, the original FlowFile is transferred to this relationship
success All successfully generated FlowFiles are transferred to this relationship
Writes Attributes
Name Description
hdfs.objectName The name of the file/dir found on HDFS.
hdfs.path The path is set to the absolute path of the object's parent directory on HDFS. For example, if an object is a directory 'foo', under directory '/bar' then 'hdfs.objectName' will have value 'foo', and 'hdfs.path' will be '/bar'
hdfs.type The type of an object. Possible values: directory, file, link
hdfs.owner The user that owns the object in HDFS
hdfs.group The group that owns the object in HDFS
hdfs.lastModified The timestamp of when the object in HDFS was last modified, as milliseconds since midnight Jan 1, 1970 UTC
hdfs.length For files: the number of bytes in the file in HDFS. For directories: the storage space consumed by the directory.
hdfs.count.files For type='directory': the total count of files under this directory. Not populated for other types of HDFS objects.
hdfs.count.dirs For type='directory': the total count of directories under this directory (including itself). Not populated for other types of HDFS objects.
hdfs.replication The number of HDFS replicas for the file
hdfs.permissions The permissions for the object in HDFS. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example rw-rw-r--
hdfs.status A comma-separated list of file/dir paths that could not be listed or accessed. Not set if no errors occurred.
hdfs.full.tree When the destination is 'attribute', populated with the full tree of the HDFS directory in JSON format. WARNING: When a scan finds thousands or millions of objects, huge attribute values could impact the FlowFile repository and GC/heap usage. Use the content destination in such cases.
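The attribute formats described above can be illustrated with a minimal Python sketch of how a downstream consumer might interpret them. This is not part of the processor. The helper names (`full_path`, `last_modified`, `permissions_to_octal`) and the sample timestamp are hypothetical, and the sketch assumes the attributes have already been read into a plain dictionary of strings. It also assumes the permission string contains only `r`, `w`, `x`, and `-`.

```python
import posixpath
from datetime import datetime, timedelta, timezone


def full_path(attrs):
    """Join hdfs.path (the parent directory) with hdfs.objectName."""
    return posixpath.join(attrs["hdfs.path"], attrs["hdfs.objectName"])


def last_modified(attrs):
    """Convert hdfs.lastModified (milliseconds since the epoch) to a UTC datetime."""
    millis = int(attrs["hdfs.lastModified"])
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=millis)


def permissions_to_octal(perms):
    """Convert a 9-character string such as 'rw-rw-r--' to an octal string.

    Assumption: only 'r', 'w', 'x', and '-' appear; special bits are ignored.
    """
    if len(perms) != 9:
        raise ValueError("expected 9 permission characters, got %r" % perms)
    digits = []
    for start in range(0, 9, 3):
        read, write, execute = perms[start:start + 3]
        value = (4 if read == "r" else 0) \
            + (2 if write == "w" else 0) \
            + (1 if execute == "x" else 0)
        digits.append(str(value))
    return "".join(digits)


# Hypothetical attribute values; 'foo', '/bar', and 'rw-rw-r--' mirror the
# examples in the attribute descriptions, the timestamp is made up.
sample = {
    "hdfs.objectName": "foo",
    "hdfs.path": "/bar",
    "hdfs.type": "directory",
    "hdfs.lastModified": "1700000000000",
    "hdfs.permissions": "rw-rw-r--",
}

print(full_path(sample))
print(last_modified(sample).isoformat())
print(permissions_to_octal(sample["hdfs.permissions"]))
```

Because `hdfs.path` holds only the parent directory, the object's absolute path always has to be rebuilt from both attributes, as `full_path` does here.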
See Also