Behavioral Changes in Atlas

Functional adjustments and behavioral updates for Atlas are introduced in Cloudera Runtime 7.3.2, its service packs, and cumulative hotfixes.

Cloudera Runtime 7.3.2 introduces functional adjustments and behavioral updates for Atlas, and includes all service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all functional adjustments in Cloudera Runtime 7.3.1.x, see Behavioral Changes.

Cloudera Runtime 7.3.2

Summary: Automatic purging of soft-deleted entities is introduced
Previous behavior:

Previously, soft-deleted entities could only be purged manually, using the DELETE /api/atlas/admin/purge API call. Additionally, the DELETE /api/atlas/v2/entity/guid/{guid} API call could not delete the column lineage entities of Hive, Impala, and Spark process entities. This could lead to sparse graphs, resulting in reduced query performance.
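As a rough illustration of the per-entity delete call mentioned above, the following Python sketch builds the DELETE request for a single GUID without sending it. The base URL and the sample GUID are hypothetical placeholders, and authentication is deliberately omitted; adapt both to your own cluster.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; replace it with the address of your Atlas server.
ATLAS_BASE_URL = "https://atlas.example.com:31443"


def build_entity_delete_request(guid: str, base_url: str = ATLAS_BASE_URL) -> Request:
    """Build (but do not send) a DELETE request for one entity GUID."""
    # Percent-encode the GUID so it is safe to use as a single path segment.
    path = f"/api/atlas/v2/entity/guid/{quote(guid, safe='')}"
    return Request(base_url.rstrip("/") + path, method="DELETE")


if __name__ == "__main__":
    # The GUID below is a made-up example value.
    request = build_entity_delete_request("hypothetical-guid-1234")
    print(request.get_method(), request.full_url)
```

To actually issue the call, the request would be passed to urllib.request.urlopen together with whatever credentials your Atlas deployment requires.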

New behavior:

A built-in auto-purge mechanism is introduced that purges deleted entities in two stages. The first stage is a soft-delete at each Atlas startup. In the second stage, soft-deleted process entities are purged based on a cron job. For more information, see Atlas Auto-Purging overview.

Summary: Entity attributes details and sparkPlanDescription are no longer sent in the Spark process entity

Previous behavior:

The spark_process entity attributes details and sparkPlanDescription were populated with query plan details, which can contain a large amount of text, often several megabytes. Processing this amount of data can incur unnecessary costs.

New behavior:

The atlas.spark.plan.enabled property is set to false by default. Set it to true to send the details and sparkPlanDescription attributes in the Spark process entity. When these attributes are not sent, Atlas avoids the cost of processing a large amount of data.
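The effect of this flag can be modeled with a small Python sketch. This is only an illustrative model of the behavior described above, not the actual Atlas or Spark connector code; the function name and the sample attribute values are hypothetical.

```python
from typing import Any, Dict

# Names of the plan-related attributes, taken from the description above.
PLAN_ATTRIBUTES = ("details", "sparkPlanDescription")


def spark_process_attributes(
    attributes: Dict[str, Any], plan_enabled: bool = False
) -> Dict[str, Any]:
    """Return the attributes that would be sent for a Spark process entity.

    plan_enabled stands in for atlas.spark.plan.enabled (false by default).
    """
    if plan_enabled:
        # Flag set to true: send everything, including the query plan text.
        return dict(attributes)
    # Default: leave out the potentially very large plan attributes.
    return {k: v for k, v in attributes.items() if k not in PLAN_ATTRIBUTES}


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "name": "example_process",
        "details": "== Parsed Logical Plan == ...",
        "sparkPlanDescription": "Execute InsertIntoHadoopFsRelationCommand ...",
    }
    print(sorted(spark_process_attributes(sample)))
    print(sorted(spark_process_attributes(sample, plan_enabled=True)))
```

With the default setting, only the non-plan attributes remain in the result, which mirrors the reduced payload described above.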