Learn how to configure the Atlas Auto-Purging mechanism to save resources and
increase query efficiency.
-
Go to .
-
Add
atlas.delete.process.enabled and
atlas.delete.columnlineages.enabled, and set both values to
true to enable soft-deletion of process entities along with downstream datasets
(tables).
-
Add
atlas.enable.process.soft.delete and set its value to
true to activate the soft-deletion stage of Atlas Auto-Purging.
Atlas identifies the process entities with deleted outputs or with no outputs
remaining and soft-deletes them. These entities are then purged once they
become eligible. At each startup, Atlas fetches these entities in batches,
repeatedly, until it has scanned all active entities.
-
Add
atlas.soft.delete.enabled.process.type to enable the
soft-deletion of process and column lineage entities. Set its value to the
entity type to be deleted, or to a comma-separated list of types.
Available values:
- hive_column_lineage
- hive_process
- impala_column_lineage
-
Add
atlas.cleanup.workers.count and set its value to the
default 2 or greater.
-
Add
atlas.cleanup.worker.batch.size and set its value to
the default 1000.
This controls the concurrent processes of the soft-deletion workers.
-
Add
atlas.purge.cron.expression and set its value to a Quartz
cron expression that controls how often Atlas checks for soft-deleted
entities to purge.
-
Add
atlas.purge.workers.count and set its value to the default
2 or greater.
-
Add
atlas.purge.batch.size and set its value to the default
1000.
This controls the concurrent processes of the purge workers.
-
Add
atlas.purge.enabled.services and set its value to any or
all of the following: hive,
impala, spark.
For example, if hive is used, all Hive service entities
are purged, such as Hive tables and columns, Iceberg tables and columns,
processes, process executions, and so on.
-
Add
atlas.purge.deleted.entity.retention.days and set its
value to at least 30. Atlas auto-purges only those
soft-deleted entities that were deleted longer ago than this period. This
prevents Atlas from being overwhelmed by purging all soft-deleted entities at
once.
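The retention rule from the last step can be illustrated with a small sketch. This is not Atlas code: the helper name `is_purge_eligible` and the idea of comparing a deletion timestamp against the current time are assumptions made only to show how a 30-day retention period separates entities that would be considered for purging from those that would be kept.

```python
from datetime import datetime, timedelta, timezone


def is_purge_eligible(deleted_at, now, retention_days=30):
    """Hypothetical helper (not part of Atlas).

    Returns True only when the entity was soft-deleted longer ago than the
    retention period, mirroring the rule described in the step above.
    """
    return now - deleted_at > timedelta(days=retention_days)


# Illustrative timestamps; real deletion times would come from Atlas itself.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
deleted_45_days_ago = now - timedelta(days=45)
deleted_10_days_ago = now - timedelta(days=10)

print(is_purge_eligible(deleted_45_days_ago, now))
print(is_purge_eligible(deleted_10_days_ago, now))
```

With a retention of 30 days, only the entity deleted 45 days earlier passes the check. How Atlas evaluates the boundary case internally is not stated here, so the strict comparison is an assumption of this sketch.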
Your Atlas Server Advanced Configuration Snippet
for conf/atlas-application.properties looks
similar to the following:
# Enables the soft-deletion of process entities
atlas.delete.process.enabled=true
atlas.delete.columnlineages.enabled=true
# Controls the soft-deletion of process entities
atlas.enable.process.soft.delete=true
atlas.cleanup.workers.count=2
atlas.cleanup.worker.batch.size=1000
atlas.soft.delete.enabled.process.type=hive_column_lineage,hive_process
# Controls the purge of soft-deleted entities
atlas.purge.cron.expression=0 0/5 * * * ?
atlas.purge.workers.count=2
atlas.purge.batch.size=1000
atlas.purge.enabled.services=hive
atlas.purge.deleted.entity.retention.days=30
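
Before pasting a snippet like the one above, it may help to check it against the values named in the steps. The following is a minimal sketch of such a check, not an Atlas or Cloudera tool: the function names `parse_properties` and `check_auto_purge` are hypothetical, and the thresholds (worker counts of 2 or greater, retention of at least 30 days, the listed services and process types, a Quartz expression of 6 or 7 fields) are taken from this procedure and from the general Quartz cron format.

```python
# Hypothetical pre-flight check; none of these names exist in Atlas.
ALLOWED_SERVICES = {"hive", "impala", "spark"}
ALLOWED_PROCESS_TYPES = {"hive_column_lineage", "hive_process", "impala_column_lineage"}
TRUE_FLAGS = (
    "atlas.delete.process.enabled",
    "atlas.delete.columnlineages.enabled",
    "atlas.enable.process.soft.delete",
)

SNIPPET = """\
atlas.delete.process.enabled=true
atlas.delete.columnlineages.enabled=true
atlas.enable.process.soft.delete=true
atlas.cleanup.workers.count=2
atlas.cleanup.worker.batch.size=1000
atlas.soft.delete.enabled.process.type=hive_column_lineage,hive_process
atlas.purge.cron.expression=0 0/5 * * * ?
atlas.purge.workers.count=2
atlas.purge.batch.size=1000
atlas.purge.enabled.services=hive
atlas.purge.deleted.entity.retention.days=30
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def _csv(value):
    """Split a comma-separated value into a set of non-empty items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def check_auto_purge(props):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for flag in TRUE_FLAGS:
        if props.get(flag) != "true":
            problems.append(f"{flag} should be true")
    for key in ("atlas.cleanup.workers.count", "atlas.purge.workers.count"):
        value = props.get(key, "")
        if not value.isdigit() or int(value) < 2:
            problems.append(f"{key} should be 2 or greater")
    days = props.get("atlas.purge.deleted.entity.retention.days", "")
    if not days.isdigit() or int(days) < 30:
        problems.append("atlas.purge.deleted.entity.retention.days should be at least 30")
    services = _csv(props.get("atlas.purge.enabled.services", ""))
    if not services or not services <= ALLOWED_SERVICES:
        problems.append("atlas.purge.enabled.services should list hive, impala, or spark")
    types = _csv(props.get("atlas.soft.delete.enabled.process.type", ""))
    if not types or not types <= ALLOWED_PROCESS_TYPES:
        problems.append("atlas.soft.delete.enabled.process.type has an unknown type")
    # Quartz expressions have 6 fields, plus an optional 7th field for the year.
    if len(props.get("atlas.purge.cron.expression", "").split()) not in (6, 7):
        problems.append("atlas.purge.cron.expression should have 6 or 7 fields")
    return problems


print(check_auto_purge(parse_properties(SNIPPET)))
```

The check only counts cron fields and does not validate their contents, and it does not inspect the batch sizes, since the steps give only their defaults.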