Atlas metadata model overview

Atlas' model represents cluster data assets and operations, and is flexible enough to let you represent objects from other sources.

The flexibility of Atlas’ metadata model lets you represent whatever objects and relationships you need to create a map of your data lake. Atlas lets you create new instances of predefined entity types and define new entity types, so you can represent data assets and actions from additional data sources or even services that do not reside in Hadoop. Atlas’ building blocks are entities, relationships, classifications, enumerations, and structures.

An entity is a collection of attributes that models or represents a data asset or data action. Entities are the units that Atlas returns in search results or shows as nodes in a lineage diagram. Use classifications to add metadata to entities; create relationships to connect entities.
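As a rough illustration of an entity as a typed collection of attributes with labels attached, the following Python sketch builds a payload of the kind the Atlas v2 REST interface is generally understood to accept. The field names (typeName, attributes, classifications) are assumptions based on that JSON shape and should be checked against your Atlas version; make_entity, the qualified name, and the PII classification are hypothetical names used only for this example.

```python
import json


def make_entity(type_name, qualified_name, name, classifications=()):
    """Build an entity payload: a type, its attributes, and optional labels.

    Hypothetical helper; the key names are assumed, not taken from this page.
    """
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {"qualifiedName": qualified_name, "name": name},
            # Each classification is attached by referring to its type name.
            "classifications": [{"typeName": c} for c in classifications],
        }
    }


# An illustrative table entity labeled with an assumed "PII" classification.
payload = make_entity("hive_table", "sales.orders@cluster1", "orders", ["PII"])
print(json.dumps(payload, indent=2))
```

The sketch only constructs the data structure; sending it to an Atlas server, and defining the classification beforehand, are left out.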

Relationships describe connections between two entities. Because relationships are their own type in the Atlas data model, you can create new relationships with custom attributes to represent behaviors that are specific to your organization.
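To make the idea of a relationship as its own type more concrete, here is a minimal Python sketch of a relationship definition with one custom attribute. The key names (relationshipCategory, endDef1, endDef2, attributeDefs) are assumptions modeled on the Atlas v2 type definition JSON and may differ in your release; make_relationship_def, the org_team type, and the dataset_owned_by_team relationship are hypothetical.

```python
def make_end_def(entity_type, attr_name, cardinality="SET"):
    """Describe one endpoint: the entity type and the attribute it exposes."""
    return {
        "type": entity_type,
        "name": attr_name,
        "isContainer": False,
        "cardinality": cardinality,
    }


def make_relationship_def(name, end1, end2, attribute_defs=()):
    """Build a relationship type connecting two entity types (assumed shape)."""
    return {
        "name": name,
        "relationshipCategory": "ASSOCIATION",
        "propagateTags": "NONE",
        "endDef1": end1,
        "endDef2": end2,
        # Custom attributes carry organization-specific details of the link.
        "attributeDefs": list(attribute_defs),
    }


# Hypothetical relationship: a data set is owned by a team since some date.
rel_def = make_relationship_def(
    "dataset_owned_by_team",
    make_end_def("DataSet", "owningTeams"),
    make_end_def("org_team", "ownedDataSets"),
    [{"name": "since", "typeName": "date", "isOptional": True,
      "cardinality": "SINGLE"}],
)
print(rel_def["name"], "->", rel_def["endDef1"]["type"], rel_def["endDef2"]["type"])
```

Keeping the attribute on the relationship rather than on either entity means the same data set can be linked to several teams, each link with its own value.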

Classifications are reusable labels that can be attached to entities. Atlas supports two separate systems of labels. Classifications can be used to describe data, clarify field names, identify status, and record other manual or automated metadata. Glossary terms, which are implemented as classifications but managed separately, associate data assets with formal names for agreed-upon business concepts and contexts. When you build a department-wide or company-wide glossary and use its terms to label data, you create a search structure that allows everyone to access data with a common language. Creating and applying classifications and terms to entities lets you group data assets, mark them based on sensitivity or other access requirements, and label them to allow easier searching. The Atlas user interface uses these labels to make it easy to find data assets marked with a given classification or term.

Atlas also supports defining custom enumerations (enums) and data structures (structs), similar to the corresponding constructs in structured programming languages. Enums can be used in attribute definitions to store lists of predetermined values; structs can be used in attribute definitions and relationship endpoints to identify more complex groupings.
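The following Python sketch shows how an enum and a struct that uses it might be described together. The key names (enumDefs, structDefs, elementDefs, attributeDefs) are assumptions modeled on the Atlas v2 type definition JSON; the helper functions and the retention_level and retention_policy types are hypothetical.

```python
def make_enum_def(name, values):
    """Build an enum type: a fixed, ordered list of allowed values."""
    return {
        "name": name,
        "elementDefs": [{"value": v, "ordinal": i} for i, v in enumerate(values)],
    }


def make_struct_def(name, fields):
    """Build a struct type from a mapping of field name to type name."""
    return {
        "name": name,
        "attributeDefs": [
            {"name": n, "typeName": t, "isOptional": True, "cardinality": "SINGLE"}
            for n, t in fields.items()
        ],
    }


# The struct groups two values; its "level" field is typed by the enum.
typedefs = {
    "enumDefs": [make_enum_def("retention_level", ["SHORT", "STANDARD", "LEGAL_HOLD"])],
    "structDefs": [make_struct_def("retention_policy",
                                   {"level": "retention_level", "days": "int"})],
}
print([e["value"] for e in typedefs["enumDefs"][0]["elementDefs"]])
```

An entity type could then declare an attribute whose type name is retention_policy, so every instance carries the level and the day count as one grouped value.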