Configuring Atlas Auto-Purging

Learn how to configure the Atlas Auto-Purging mechanism to save resources and increase query efficiency.

  1. Go to Cloudera Manager > Clusters > Configuration > Atlas Server Advanced Configuration Snippet for conf/atlas-application.properties.
  2. Add atlas.delete.process.enabled and atlas.delete.columnlineages.enabled and set both values to true to enable soft-deletion of process entities along with their downstream datasets (tables).
  3. Add atlas.enable.process.soft.delete and set its value to true to activate the soft-deletion stage of Atlas Auto-Purging.
    Atlas identifies the process entities whose outputs are deleted or that have no outputs remaining, and soft-deletes them. These entities are then purged once they become eligible. At each startup, Atlas fetches these entities in batches, repeating until it has scanned all active entities.
  4. Add atlas.soft.delete.enabled.process.type to enable soft-deletion of process and column lineage entities. Set its value to the types of entities to be deleted.
    Available values:
    • hive_column_lineage
    • hive_process
    • impala_column_lineage
  5. Add atlas.cleanup.workers.count and set its value to the default, 2, or higher.
  6. Add atlas.cleanup.worker.batch.size and set its value to the default, 1000.
    This controls the concurrent processes of the soft-deletion workers.
  7. Add atlas.purge.cron.expression and set its value to a Quartz cron expression to control how often Atlas checks for soft-deleted entities to purge.
  8. Add atlas.purge.workers.count and set its value to the default, 2, or higher.
  9. Add atlas.purge.batch.size and set its value to the default, 1000.
    This controls the concurrent processes of the purge workers.
  10. Add atlas.purge.enabled.services and set its value to any or all of the following: hive, impala, spark.
    For example, if hive is used, all Hive service entities are purged, such as Hive tables and columns, Iceberg tables and columns, processes, process executions, and so on.
  11. Add atlas.purge.deleted.entity.retention.days and set its value to at least 30. Atlas auto-purges only those soft-deleted entities that have been deleted for longer than this period. This prevents Atlas from being overwhelmed by purging all soft-deleted entities at once.
Your Atlas Server Advanced Configuration Snippet for conf/atlas-application.properties should look similar to the following:
# Enables the soft-deletion of process entities

atlas.delete.process.enabled=true
atlas.delete.columnlineages.enabled=true

# Controls the soft-deletion of process entities

atlas.enable.process.soft.delete=true
atlas.cleanup.workers.count=2
atlas.cleanup.worker.batch.size=1000
atlas.soft.delete.enabled.process.type=hive_column_lineage,hive_process

# Controls the purge of soft-deleted entities

atlas.purge.cron.expression=0 0/5 * * * ?
atlas.purge.workers.count=2
atlas.purge.batch.size=1000
atlas.purge.enabled.services=hive
atlas.purge.deleted.entity.retention.days=30
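
The value of atlas.purge.cron.expression follows the Quartz field order: seconds, minutes, hours, day of month, month, day of week. The sample value 0 0/5 * * * ? therefore runs the purge check every five minutes. As a hedged variation, the fragment below schedules the check once a day instead. The 02:00 time and the comma-separated service list are assumptions chosen for illustration, not recommended values, so verify the list syntax against your Atlas version before using it.

```properties
# Illustrative variation only; the schedule and service list are assumptions.

# Quartz fields: second minute hour day-of-month month day-of-week
# 0 0 2 * * ?  fires once a day at 02:00:00
atlas.purge.cron.expression=0 0 2 * * ?

# Assumes multiple services are given as a comma-separated list, mirroring
# the format used for atlas.soft.delete.enabled.process.type in the sample.
atlas.purge.enabled.services=hive,impala

# Soft-deleted entities become eligible for purging only after this many days
atlas.purge.deleted.entity.retention.days=30
```

With a daily schedule, entities that pass the retention period are picked up at the next 02:00 run rather than within five minutes.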