Extraction Configuration
Set the following configuration parameters before you perform the metadata extraction.
The configuration file is located at: /opt/cloudera/parcels/CDH/lib/atlas/extractor/adls.conf
| Configuration Parameter | Purpose | Default Value | 
|---|---|---|
| atlas.adls.extraction.account.name | ADLS Gen2 storage account that was created as part of Extraction Prerequisites. | Mandatory. | 
| atlas.adls.extraction.account.key | ADLS account key, used when IDBroker is not configured. | Specify if Knox IDBroker is not configured in CDP. | 
| atlas.adls.extraction.access.token | Access token for token-based authentication. | If Knox IDBroker is not configured in CDP, token-based authentication is required and this parameter must be set. | 
| atlas.adls.extraction.allowlist.paths=abfs://<containername>@<accountname>.dfs.core.windows.net/<path> | Comma-separated ABFS paths or patterns from which ADLS metadata (directory, blob) is extracted. Separate multiple values with commas. Example: abfs://testcontainer@teststorageaccount.dfs.core.windows.net/testdir1/ | |
| atlas.adls.extraction.denylist.paths=abfs://<containername>@<accountname>.dfs.core.windows.net/<path> | Comma-separated ABFS paths or patterns whose ADLS metadata is excluded from extraction. Separate multiple values with commas. Example: abfs://testcontainer@teststorageaccount.dfs.core.windows.net/testdir2/ | |
| atlas.adls.extraction.max.blob.per.call | Number of blobs fetched in one call to Azure ADLS during bulk extraction. | 1000 | 
| atlas.adls.extraction.timeout.per.call.in.sec | Timeout, in seconds, applied to each ADLS SDK call where one is required. | 30 | 
| atlas.adls.extraction.resume.from.progress.file | Whether to resume from the last run after a failure. | false. Set to true when resuming an extraction. | 
| atlas.adls.extraction.progress.file | Progress file that the extraction uses to resume a previous run. | adls_extractor_progress_file.props | 
| atlas.adls.extraction.max.reconnect.count | Specify the maximum number of retries to: | |
| atlas.adls.extraction.fs.system | File system used in Azure ADLS. | abfs | 
| atlas.adls.extraction.incremental.queueNames | List of Account:QueueName entries configured as part of Configuring ADLS Gen2 Storage Queue to receive blob and directory create and delete events. Example: teststorageaccount:testqueue | |
| atlas.adls.extraction.incremental.messagesPerRequest | Number of messages the incremental extractor tries to fetch from the ADLS queue in a single call. | 10. Valid range: 1 to 32. | 
| atlas.adls.extraction.incremental.requestWaitTime | The wait time in seconds in a single call to ADLS Queue to fetch atlas.adls.extraction.incremental.messagesPerRequest messages. | 20 | 
| atlas.adls.extraction.incremental.max.retry | Maximum number of retries when the queue is idle while reading queue messages during incremental extraction. | 20 | 
| atlas.adls.extraction.incremental.delete.needed.for.rename | Whether to delete an entity that has been renamed to a path that should not be created in Atlas according to the allowlist and denylist. | false | 
| atlas.notification.hook.asynchronous | Set to "true" only when extracting a large number of ADLS metadata objects (directory, blob), where publishing messages to the ATLAS_HOOK Kafka topic might lag. | true (asynchronous sending of events). Set to false for synchronous sending of events. | 
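
The parameters in the table can be combined into a single adls.conf file. The fragment below is a minimal sketch, not a definitive configuration. It assumes that Knox IDBroker is not configured, so an account key is supplied, and that the file accepts `#` comment lines. The account, container, path, and queue names are the example values from the table, and `<account-key>` is a placeholder you must replace. The remaining values are the documented defaults.

```properties
# Storage account and credentials (assumes Knox IDBroker is not configured)
atlas.adls.extraction.account.name=teststorageaccount
atlas.adls.extraction.account.key=<account-key>

# Paths to include in and exclude from extraction (example values from the table)
atlas.adls.extraction.allowlist.paths=abfs://testcontainer@teststorageaccount.dfs.core.windows.net/testdir1/
atlas.adls.extraction.denylist.paths=abfs://testcontainer@teststorageaccount.dfs.core.windows.net/testdir2/

# Bulk extraction tuning (documented defaults)
atlas.adls.extraction.max.blob.per.call=1000
atlas.adls.extraction.timeout.per.call.in.sec=30
atlas.adls.extraction.fs.system=abfs

# Resume support (documented defaults; set the first value to true to resume)
atlas.adls.extraction.resume.from.progress.file=false
atlas.adls.extraction.progress.file=adls_extractor_progress_file.props

# Incremental extraction (queue name is the example value; the rest are defaults)
atlas.adls.extraction.incremental.queueNames=teststorageaccount:testqueue
atlas.adls.extraction.incremental.messagesPerRequest=10
atlas.adls.extraction.incremental.requestWaitTime=20
atlas.adls.extraction.incremental.max.retry=20
atlas.adls.extraction.incremental.delete.needed.for.rename=false

# Kafka notification mode (documented default)
atlas.notification.hook.asynchronous=true
```

To list more than one allowlist or denylist path, separate the values with commas on the same line. Keep atlas.adls.extraction.incremental.messagesPerRequest within its documented range of 1 to 32.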
