Working with Atlas classifications and labels
Add metadata to Atlas entities using labels and classifications.
You can add metadata to Atlas entities to help your organization find, organize, and share your understanding of the data assets that drive business processes. Atlas provides two mechanisms for adding metadata to entities: labels and classifications. Both labels and classifications can be applied to entities to help describe the entity's content, status, or other business-driven value.
- Labels
- Labels are words or phrases (strings) that you can associate with an
entity and reuse for other entities. They are a light-weight way to
add information to an entity so you can find it easily and share your
knowledge about the entity with others.
Anyone can create labels and associate labels with entities.
- Classifications
- Classifications are strings like labels, with added complexity and
structure:
- Atlas includes precise search tools for finding entities using classifications.
- Classifications can automatically propagate to additional entities through lineage relationships.
- You can use classifications to drive access control policies in Ranger.
- You can enrich a classification with attributes in the form of key-value pairs and set the value to describe a particular entity.
In short, use labels for annotating entities; use classifications to involve entities in processes inside and outside Atlas.
More about classifications: attributes
You can add key-value pairs, or attributes to the definition of a classification. A typical use for classification attributes would be to refine the meaning of a general category. Data assets identified with a classification of “PII” or Personally Identifiable Information can have classification attributes that indicate the nature of the information to drive data masking or expiration policies. Columns tagged with “PII” might be further separated into phone numbers, credit card numbers, and “other.” A Ranger policy based on the classification can use the attribute values to identify masks for the phone and credit card numbers and to block columns tagged as “PII” with attribute “Other.”
Planning for classifications and labels
Here are some questions to help you think about how you define classifications and labels in your system:
- Will you use the metadata to drive a workflow outside of Atlas? If so, use a classification. If not, a label may work fine.
- Do you need the text of the metadata to be exact values? You can limit who has the ability to create classifications: you can manage a single list for an organization. The ability to apply a classification to an entity can be controlled separately from defining classifications, which let's you potentially allow more users to applying the "official" classifications. Any users can create and add labels to entities, so it may be more difficult to standardize the content of labels.
- What types of entities do you expect the metadata to apply to? Labels can be added to any entity type; classifications can be defined to apply to a specific entity type. If you define a classification to only to apply to certain entity types, such as table columns, make sure that the name and description helps data stewards use the classification correctly.
- Are you adding metadata to make Atlas searches easier? Consider using classifications with attributes to allow refinement of search results. If your Atlas users are more likely to search using terms, you might consider connecting the classification to a glossary term so searches from either mechanism return the correct results. (Assigning the term to an entity automatically assigns the classification to the same entity.) For both labels and classifications, consider setting guidelines so everyone creating metadata uses the same conventions: underscores or hyphens? detail first or last? table-level metadata or column-level?
- Do you want a classification to follow lineage relationships and be assigned automatically as data is used in other entities? If not, you can use the classification description to help data stewards understand that they should turn off propagation when assigning this classification to entities or terms. Labels do not propagate.