PutHDFS

Description:

Write FlowFile data to the Hadoop Distributed File System (HDFS).

Additional Details...

Tags:

hadoop, HCFS, HDFS, put, copy, filesystem

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Hadoop Configuration Resources
  API Name: Hadoop Configuration Resources
  Description: A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. To use swebhdfs, see the 'Additional Details' section of the PutHDFS documentation.
  This property expects a comma-separated list of file resources.
  Supports Expression Language: true (will be evaluated using Environment variables only)

Kerberos Credentials Service
  API Name: kerberos-credentials-service
  Allowable Values:
    Controller Service API: KerberosCredentialsService
    Implementation: KeytabCredentialsService
  Description: Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos.

Kerberos User Service
  API Name: kerberos-user-service
  Allowable Values:
    Controller Service API: KerberosUserService
    Implementations: KerberosTicketCacheUserService, KerberosPasswordUserService, KerberosKeytabUserService
  Description: Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos.

Kerberos Principal
  API Name: Kerberos Principal
  Description: Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
  Supports Expression Language: true (will be evaluated using Environment variables only)

Kerberos Keytab
  API Name: Kerberos Keytab
  Description: Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
  This property requires exactly one file to be provided.
  Supports Expression Language: true (will be evaluated using Environment variables only)

Kerberos Password
  API Name: Kerberos Password
  Description: Kerberos password associated with the principal.
  Sensitive Property: true

Kerberos Relogin Period
  API Name: Kerberos Relogin Period
  Default Value: 4 hours
  Description: Period of time which should pass before attempting a Kerberos relogin. This property has been deprecated and has no effect on processing. Relogins now occur automatically.
  Supports Expression Language: true (will be evaluated using Environment variables only)

Additional Classpath Resources
  API Name: Additional Classpath Resources
  Description: A comma-separated list of paths to files and/or directories that will be added to the classpath and used for loading native libraries. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
  This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: directory, file.

Directory
  API Name: Directory
  Description: The parent HDFS directory to which files should be written. The directory will be created if it doesn't exist.
  Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

Conflict Resolution Strategy
  API Name: Conflict Resolution Strategy
  Default Value: fail
  Allowable Values:
    • replace: Replaces the existing file if any.
    • ignore: Ignores the flow file and routes it to success.
    • fail: Penalizes the flow file and routes it to failure.
    • append: Appends to the existing file if any, creates a new file otherwise.
  Description: Indicates what should happen when a file with the same name already exists in the output directory.

Writing Strategy
  API Name: writing-strategy
  Default Value: Write and rename
  Allowable Values:
    • Write and rename: The processor writes FlowFile data into a temporary file and renames it after completion. This prevents other processes from reading partially written files.
    • Simple write: The processor writes FlowFile data directly to the destination file. In some cases this might cause other processes to read partially written files.
  Description: Defines the approach for writing the FlowFile data.

Block Size
  API Name: Block Size
  Description: Size of each block as written to HDFS. This overrides the Hadoop Configuration.

IO Buffer Size
  API Name: IO Buffer Size
  Description: Amount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration.

Replication
  API Name: Replication
  Description: Number of times that HDFS will replicate each file. This overrides the Hadoop Configuration.

Permissions umask
  API Name: Permissions umask
  Description: A umask represented as an octal number which determines the permissions of files written to HDFS. This overrides the Hadoop property "fs.permissions.umask-mode". If this property and "fs.permissions.umask-mode" are undefined, the Hadoop default "022" will be used. If the PutHDFS target folder has a default ACL defined, the umask property is ignored by HDFS.

Remote Owner
  API Name: Remote Owner
  Description: Changes the owner of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS super user privilege to change owner.
  Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

Remote Group
  API Name: Remote Group
  Description: Changes the group of the HDFS file to this value after it is written. This only works if NiFi is running as a user that has HDFS super user privilege to change group.
  Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

Compression codec
  API Name: Compression codec
  Default Value: NONE
  Allowable Values:
    • NONE: No compression
    • DEFAULT: Default ZLIB compression
    • BZIP: BZIP compression
    • GZIP: GZIP compression
    • LZ4: LZ4 compression
    • LZO: LZO compression (assumes LD_LIBRARY_PATH has been set and the jar is available)
    • SNAPPY: Snappy compression
    • AUTOMATIC: Will attempt to automatically detect the compression codec.
  Description: No description provided.

Ignore Locality
  API Name: Ignore Locality
  Default Value: false
  Allowable Values:
    • true
    • false
  Description: Directs the HDFS system to ignore locality rules so that data is distributed randomly throughout the cluster.
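
As a worked example of the Permissions umask property listed above, the sketch below shows how an octal umask reduces a base permission mode. It assumes the usual HDFS base modes of 666 for new files and 777 for new directories; the helper name apply_umask and the two constants are hypothetical and are not part of NiFi or Hadoop. It does not model the case where the target folder has a default ACL, in which HDFS ignores the umask.

```python
def apply_umask(base_mode: int, umask: str) -> str:
    """Return the octal permission string left after masking base_mode."""
    mask = int(umask, 8)  # the property value is an octal string such as "022"
    return format(base_mode & ~mask & 0o777, "03o")


FILE_BASE = 0o666  # assumed base mode for new files
DIR_BASE = 0o777   # assumed base mode for new directories

for umask in ("022", "027", "077"):
    print(umask, apply_umask(FILE_BASE, umask), apply_umask(DIR_BASE, umask))
```

With the Hadoop default of "022", a new file ends up with mode 644 and a new directory with mode 755.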

Relationships:

Name | Description
success | Files that have been successfully written to HDFS are transferred to this relationship.
failure | Files that could not be written to HDFS are transferred to this relationship.
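
The way the Conflict Resolution Strategy and Writing Strategy properties map onto these two relationships can be sketched with a local-filesystem stand-in. This is an illustration under stated assumptions, not the processor's implementation: the function put_file, its parameters, and the dot-prefixed temporary file name are all hypothetical, and penalization of the FlowFile under "fail" is not modelled.

```python
import os
import tempfile

STRATEGIES = ("replace", "ignore", "fail", "append")


def put_file(directory, filename, data, strategy="fail", write_and_rename=True):
    """Write data to directory/filename and return the relationship name."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")

    os.makedirs(directory, exist_ok=True)  # the directory is created if missing
    target = os.path.join(directory, filename)

    if os.path.exists(target):
        if strategy == "ignore":
            return "success"   # existing file is left untouched
        if strategy == "fail":
            return "failure"   # the real processor also penalizes the FlowFile
        if strategy == "append":
            with open(target, "ab") as out:
                out.write(data)
            return "success"
        # strategy == "replace": fall through and overwrite the target

    if write_and_rename:
        # Hypothetical temporary name; the target only appears once complete.
        temp = os.path.join(directory, "." + filename)
        with open(temp, "wb") as out:
            out.write(data)
        os.replace(temp, target)
    else:
        # "Simple write": readers may observe a partially written target.
        with open(target, "wb") as out:
            out.write(data)
    return "success"


with tempfile.TemporaryDirectory() as root:
    print(put_file(root, "a.txt", b"one"))             # no conflict yet
    print(put_file(root, "a.txt", b"two"))             # default strategy is fail
    print(put_file(root, "a.txt", b"two", "append"))
    with open(os.path.join(root, "a.txt"), "rb") as result:
        print(result.read())
```

Note that "ignore" reports success without writing anything, so a downstream flow cannot tell from the relationship alone whether the data reached HDFS.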

Reads Attributes:

Name | Description
filename | The name of the file written to HDFS comes from the value of this attribute.

Writes Attributes:

Name | Description
filename | The name of the file written to HDFS is stored in this attribute.
absolute.hdfs.path | The absolute path to the file on HDFS is stored in this attribute.
hadoop.file.url | The Hadoop URL for the file is stored in this attribute.
target.dir.created | Indicates (true/false) whether the target directory was created by the processor.

State management:

This component does not store state.

Restricted:

Required Permission | Explanation
write distributed filesystem | Provides the operator with the ability to delete any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

See Also:

GetHDFS