Running Bulk Extraction

Bulk extraction mechanism employs direct calls on Azure API calls to fetch all blobs and containers in ADLS Gen2 storage account.

If there is a failure while extracting the complete Azure metadata, the bulk extraction must be resumed from the last checkpoint by changing atlas.adls.extraction.resume.from.progress.file=true configuration at adls.conf.

The following command line example runs the bulk extraction. Assuming the mandatory properties are set in the default configuration file, only the parameter to enable bulk mode is required:

/opt/cloudera/parcels/CDH/lib/atlas/extractors/bin/adls-extractor.sh

Or

/opt/cloudera/parcels/CDH/lib/atlas/extractors/bin/adls-extractor.sh -e BULK

Refer to Extraction Configurationfor more details on different optional configurations.