Extractor configuration properties
Some of the extractor configuration properties.
Configuration Parameter (prefixed with
atlas.s3.extraction ) |
Purpose |
---|---|
.aws.region |
Required. AWS S3 region for extraction. CDP Public Cloud: provided by IDBroker integration. CDP Data Center: the region must be provided. |
|
AWS S3 credentials. CDP Public Cloud: provided by IDBroker integration. CDP Data Center:
|
.whitelist.paths |
Set of data assets to include in metadata extraction. If no paths are
included, the extractor extracts metadata for all buckets. Buckets inside the
region and accessible by the credential used to run the extractor. For public
cloud, this is defined in The list can include buckets or paths; separate multiple values with commas. See Defining what assets to extract metadata for. |
.blacklist.paths |
Set of data assets to exclude from metadata extraction. The list can include buckets or paths; separate multiple values with commas. See Defining what assets to extract metadata for. |
.max.object.per.call |
Number of objects returned in one call to AWS S3 in bulk extraction. Defaults to 1000. |
|
Enable bulk extraction resume. Disabled by default. |
.progress.file |
Name of the progress file that tracks bulk extraction that will be used if the
extraction has to be resumed mid-way after a failure. This file is created in the
Atlas log directory. Defaults to
extractor_progress_file.log |
.max.reconnect.count |
Number of times the extractor will try to reconnect to AWS in case of
credentials expiring or retryable exception from AWS S3. Defaults
to 2. |
.fs.scheme |
File system scheme used in case of AWS S3. At this time, S3a
is the only option. |
.incremental.queueName |
Required for incremental mode. AWS SQS Queue name configured to get the AWS S3 bucket events. |
.incremental.messagesPerRequest |
The number of messages the extractor will try to get from the SQS queue in one call. Defaults to 10. |
.incremental.requestWaitTime |
The interval that the extractor will wait to collect messages so long as the limit set in .incremental.messagesPerRequest hasn't been met. Defaults to 20 seconds. |
.incremental.max.retry |
Maximum retry count if the SQS queue doesn't respond during an extractor request. Defaults to 20. |