Atlas metadata model overview
Atlas' model represents cluster data assets and operations, and is flexible enough to let you represent objects from other sources.
The flexibility Atlas’ metadata model lets you represent the objects and relationships among them so you can produce a map of your data lake. Atlas lets you create new instances of predefined entity types and lets you define new types of entities so you can represent data assets and actions from additional data sources or even services that do not reside in Hadoop. Atlas’ building blocks are entities, relationships, classifications, enumerations, and structures.
Entities are a collection of attributes that model or represent a data asset or data action. Entities are the unit that Atlas returns in search results or shows as nodes in a lineage diagram. Labels are modeled as attributes on a given entity instance; you can add user-defined properties to individual entity instances (without affecting the entity type definition).
Relationships describe connections between two entities. You can create relationship definitions with custom attributes to represent behaviors that are specific to your processes. Changes to relationship definitions require changing the model through the Atlas API.
- Classifications are not part of entity metadata so they are a way to add metadata to entities without updating entity type definitions.
- Classifications can be added to any entity type.
- Atlas can propagate classifications through lineage relationships.
- Classifications can be used in Ranger to drive access policies.
Business Metadata are a set of custom attributes that an administrator can use to extend the metadata stored for each entity type. User access to each set of attributes can be managed using Ranger policies.
Atlas supports defining custom enumerations and data structures as well, similar to those constructs in structured programming languages. Enums can be used in attribute definitions to store lists of predetermined values; structs can be used in attribute definitions to identify more complex data types.