Supports Expression Language: true (will be evaluated using variable registry only) |
Directory | Directory | | | The directory from which files should be read. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Credentials Service | kerberos-credentials-service | | Controller Service API: KerberosCredentialsService Implementation: KeytabCredentialsService | Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos |
CDP Username | Kerberos Principal | | | CDP user name. It is recommended to create a dedicated Machine User in the CDP User Management UI. Supports Expression Language: true (will be evaluated using variable registry only) |
CDP Password | Kerberos Password | | | Workload password associated with your CDP User. You can set it in the CDP User Management UI. If you don't want to use a workload password, you can use the Kerberos Credentials Service property instead. Sensitive Property: true |
Recurse Subdirectories | Recurse Subdirectories | true | | Indicates whether to list files from subdirectories of the HDFS directory |
Record Writer | record-writer | | Controller Service API: RecordSetWriterFactory Implementations: JsonRecordSetWriter ParquetRecordSetWriter CSVRecordSetWriter ScriptedRecordSetWriter XMLRecordSetWriter FreeFormTextRecordSetWriter AvroRecordSetWriter RecordSetWriterLookup | Specifies the Record Writer to use for creating the listing. If not specified, one FlowFile will be created for each entity that is listed. If the Record Writer is specified, all entities will be written to a single FlowFile. |
File Filter | File Filter | [^\.].* | | Only files whose names match the given regular expression will be picked up |
File Filter Mode | file-filter-mode | Directories and Files | - Directories and Files: Filtering will be applied to the names of directories and files. If Recurse Subdirectories is set to true, only subdirectories with a matching name will be searched for files that match the regular expression defined in File Filter. - Files Only: Filtering will only be applied to the names of files. If Recurse Subdirectories is set to true, the entire subdirectory tree will be searched for files that match the regular expression defined in File Filter. - Full Path: Filtering will be applied by evaluating the regular expression defined in File Filter against the full path of files with and without the scheme and authority. If Recurse Subdirectories is set to true, the entire subdirectory tree will be searched for files in which the full path of the file matches the regular expression defined in File Filter. See 'Additional Details' for more information. | Determines how the regular expression in File Filter will be used when retrieving listings. |
Minimum File Age | minimum-file-age | | | The minimum age that a file must be in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored |
Maximum File Age | maximum-file-age | | | The maximum age that a file can be in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored. Minimum value is 100ms. |
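
The three File Filter Mode values can be illustrated with a minimal sketch. It is not the processor's implementation: Python's `re` module stands in for the Java regular expression engine, the helper `is_listed` and the `s3a://example-bucket` URI are hypothetical, and the sketch assumes that the expression must match the whole name or path rather than a substring.

```python
import re
from urllib.parse import urlparse

# Default value of the File Filter property: any name not starting with a dot.
DEFAULT_FILE_FILTER = r"[^\.].*"


def is_listed(uri: str, base_dir: str, file_filter: str, mode: str) -> bool:
    """Hypothetical helper: decide whether one file passes File Filter.

    Assumes the regular expression must match the whole name or path.
    """
    pattern = re.compile(file_filter)
    full_path = urlparse(uri).path  # path without scheme and authority
    relative = full_path[len(base_dir):].strip("/")
    *subdirs, filename = relative.split("/")

    if mode == "Files Only":
        # Only the file name is tested; subdirectory names are ignored.
        return pattern.fullmatch(filename) is not None
    if mode == "Directories and Files":
        # Every subdirectory name on the way down must match as well.
        return all(pattern.fullmatch(part) for part in subdirs + [filename])
    if mode == "Full Path":
        # Tested with and without the scheme and authority.
        return any(pattern.fullmatch(candidate) for candidate in (uri, full_path))
    raise ValueError(f"unknown File Filter Mode: {mode}")


hidden_dir_file = "s3a://example-bucket/data/in/.staging/part-0.csv"
print(is_listed(hidden_dir_file, "/data/in", DEFAULT_FILE_FILTER, "Files Only"))
print(is_listed(hidden_dir_file, "/data/in", DEFAULT_FILE_FILTER, "Directories and Files"))
```

With the default filter, the file under `.staging` passes in Files Only mode but is skipped in Directories and Files mode, because the subdirectory name starts with a dot.
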
Dynamic Properties:
Supports Sensitive Dynamic Properties: No
Dynamic Properties allow the user to specify both the name and value of a property.
Name | Value | Description |
---|---|---|
A Hadoop client configuration name | The value to set it to | Sets the Hadoop client configuration with the given name, overwriting it if it is already set. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
Relationships:
Name | Description |
---|---|
success | All FlowFiles are transferred to this relationship |
Reads Attributes:
None specified.
Writes Attributes:
Name | Description |
---|---|
filename | The name of the file that was read from object store. |
path | The path is set to the absolute path of the file's directory on the object store. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "/tmp/abc/1/2/3". |
objectstore.owner | The user that owns the file in the object store. |
objectstore.group | The group that owns the file in the object store. |
objectstore.lastModified | The timestamp of when the file in the object store was last modified, as milliseconds since midnight Jan 1, 1970 UTC |
objectstore.length | The number of bytes in the file |
objectstore.replication | The number of replicas for the file |
objectstore.permissions | The permissions for the file in the object store. This is formatted as 3 characters for the owner, 3 for the group, and 3 for other users. For example, rw-rw-r-- |
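
A downstream consumer may want to decode the `objectstore.permissions` and `objectstore.lastModified` attributes. The following is a minimal sketch, assuming the formats described in the table above (nine `r`/`w`/`x`/`-` characters, and milliseconds since the Unix epoch). The helper names `parse_permissions` and `parse_last_modified` are hypothetical and are not part of the processor.

```python
from datetime import datetime, timedelta, timezone


def parse_permissions(text: str) -> int:
    """Convert a string such as 'rw-rw-r--' into numeric mode bits."""
    if len(text) != 9:
        raise ValueError("expected 9 characters: owner, group, other")
    mode = 0
    for index, char in enumerate(text):
        expected = "rwx"[index % 3]
        if char == expected:
            mode |= 1 << (8 - index)  # leftmost character is the highest bit
        elif char != "-":
            raise ValueError(f"unexpected character {char!r} at position {index}")
    return mode


def parse_last_modified(millis: str) -> datetime:
    """Convert milliseconds since midnight Jan 1, 1970 UTC into a datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=int(millis))


# Attribute values as they might appear on a listed FlowFile (made-up sample).
attributes = {
    "objectstore.permissions": "rw-rw-r--",
    "objectstore.lastModified": "1700000000000",
}
print(oct(parse_permissions(attributes["objectstore.permissions"])))
print(parse_last_modified(attributes["objectstore.lastModified"]).isoformat())
```
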
State management:
Scope | Description |
---|---|
CLUSTER | After performing a listing of files, the latest timestamp of all the files listed and the latest timestamp of all the files transferred are both stored. This allows the Processor to list only files that have been added or modified after this date the next time that the Processor is run, without having to store all of the actual filenames/paths which could lead to performance problems. State is stored across the cluster so that this Processor can be run on Primary Node only and if a new Primary Node is selected, the new node can pick up where the previous node left off, without duplicating the data. |
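
The idea of timestamp-based state can be sketched as follows. This is a simplified illustration, not the processor's code: it keeps a single timestamp instead of the two described above, and the names `ListingState` and `list_new_entries` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# One listed entity: (path, last-modified time in ms since the epoch).
Entry = Tuple[str, int]


@dataclass
class ListingState:
    """Stand-in for the state that would be stored across the cluster."""
    latest_timestamp: int = -1


def list_new_entries(entries: Iterable[Entry], state: ListingState) -> List[Entry]:
    """Return only entries modified after the stored timestamp, then advance it."""
    new_entries = [entry for entry in entries if entry[1] > state.latest_timestamp]
    if new_entries:
        state.latest_timestamp = max(timestamp for _, timestamp in new_entries)
    return new_entries


state = ListingState()
first_run = list_new_entries([("/tmp/a", 100), ("/tmp/b", 200)], state)
second_run = list_new_entries([("/tmp/a", 100), ("/tmp/b", 200), ("/tmp/c", 300)], state)
print(first_run)
print(second_run)
```

Only the timestamp is stored, never the file names, which is why the state stays small regardless of how many files have been listed. A new Primary Node that loads the same state would resume from the same timestamp.
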
Restricted:
This component is not restricted.
Input requirement:
This component does not allow an incoming relationship.
System Resource Considerations:
None specified.
See Also:
FetchCDPObjectStore, DeleteCDPObjectStore, PutCDPObjectStore