Audit enhancements

When any entity is created in Atlas, the created entity is stored in HBase tables.

Currently, when the created entity is modified or updated, for example, entity core attributes, relationship attributes, custom attributes, or associated classifications, the changed entity is captured by Atlas either as a full entity object or partial object (with only updated attributes or relations) and stored in HBase tables.

While processing update requests, Atlas generates entity audit events and stores them in the HBase table. These audit events have complete entity information in the JSON format. Multiple updates on a single entity results in generating multiple audit events, each of which has a complete entity instance duplicated with minimal changes.

For example, if entity A1 is updated or modified for about five times, for every update, along with the changes (minimal), the entire entity is stored in the HBase tables. This process consumes additional storage space in HBase tables. In simple terms, A1 + the number of times the changes made is the resultant output that is saved in the HBase tables. Even if the changes are minimal (updating an attribute or relations or something that is not significant), Atlas captures the complete entity.

The Atlas audit enhancement optimizes storage space by not capturing the entire entity instance in each audit event. You can configure Atlas to store only differential information in audit events.

You must enable the application flag by setting the parameter using:

Cloudera Manager UI > Atlas > Configuration > Click Advanced (Under Category) > Enter the following parameter and the value under Atlas Server Advanced Configuration Snippet (Safety Valve) for confi/atlas-application.properties text field.

atlas.entity.audit.differential=true

To have significant savings in the HBase table memory footprint, only the difference between the original and updated entity state is captured in Atlas.

Previously, Atlas exhibited full entity as shown in the image:

Currently, Atlas displays only the differences between the original and updated entity state as shown in the image

As a use case, previously when you add data under user-defined-properties, for example, key_1 and val_1, Atlas displayed the same as seen in the image.

Currently Atlas displays only the changed or updated entity values.