Working with Atlas classifications
Add metadata labels to Atlas entities using classifications
In Atlas, classifications are labels that can be assigned to entities. The flexibility of classifications makes them useful for many applications. For example, you can define classifications to describe the phases in your data preparation processes and assign the classifications to specific assets to mark where they are in the process. You can also define classifications that identify data to block or mask, and then use those classifications in access control policies in Ranger.
Classifications can be simple labels, but they can also be defined with attributes that let you assign values to further describe the entity to which the classification is assigned. A typical use for attributes is to refine the meaning of a general category. For example, data assets classified as “PII” (Personally Identifiable Information) can have classification attributes that indicate the nature of the information, which can drive data masking or expiration policies. Columns tagged with “PII” might be further separated into phone numbers, credit card numbers, and “Other”; a Ranger policy could then define masks for the phone and credit card numbers and simply block columns tagged as “PII” with the attribute “Other”.
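The attribute pattern described above could be expressed as a classification type definition. The following is a minimal sketch, assuming the Atlas v2 REST typedef payload shape (`classificationDefs` containing `attributeDefs`). The attribute name `pii_type` and the helper `build_classification_def` are hypothetical, chosen only for illustration. The code builds the JSON payload and does not submit it; sending it (typically a POST to `/api/atlas/v2/types/typedefs`) depends on your cluster's host and authentication setup.

```python
import json


def build_classification_def(name, description, attribute_names):
    """Build a typedef payload for one classification with optional string attributes.

    Hypothetical helper: the payload shape is assumed from the Atlas v2 REST API.
    """
    attribute_defs = [
        {
            "name": attr,
            "typeName": "string",
            "isOptional": True,       # the classification can be assigned without a value
            "cardinality": "SINGLE",  # one value per assignment
        }
        for attr in attribute_names
    ]
    return {
        "classificationDefs": [
            {
                "name": name,
                "description": description,
                "superTypes": [],
                "attributeDefs": attribute_defs,
            }
        ]
    }


payload = build_classification_def(
    "PII",
    "Personally Identifiable Information. Intended for table columns only.",
    ["pii_type"],  # hypothetical attribute, e.g. "phone", "credit_card", or "other"
)
print(json.dumps(payload, indent=2))
```

Putting the intended scope in the description, as in this sketch, is one way to help data stewards apply the classification to the right kind of entity.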
Here are some questions to ask yourself about your classification choices:
- What entities do you expect the classification to apply to? If you intend it to apply only to table columns, make sure that the name and description help data stewards use the classification correctly.
- Will the classification be used for Atlas searches? Consider including attributes to allow further refinement of search results. If your Atlas users are more likely to search using terms, consider connecting the classification to a glossary term so that searches using either mechanism return the correct results. (Assigning the term to an entity automatically assigns the classification to the same entity.)
- Do you want the classification to follow lineage relationships and be assigned automatically to data that is created from the entities assigned this classification? If not, you can use the classification description to help data stewards understand that they should turn off propagation when assigning this classification to entities or terms.
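
As a sketch of the propagation choice in the last question, the payload used to assign a classification can carry a flag that controls whether the classification follows lineage. This assumes the Atlas v2 REST shape for classification assignment (a list of objects with `typeName`, `attributes`, and `propagate`, typically sent to `/api/atlas/v2/entity/guid/{guid}/classifications`). The helper `build_assignment` and the `pii_type` attribute are hypothetical.

```python
def build_assignment(type_name, attributes=None, propagate=True):
    """Build the request body for assigning one classification to an entity.

    Hypothetical helper: the payload shape is assumed from the Atlas v2 REST API.
    """
    return [
        {
            "typeName": type_name,
            "attributes": attributes or {},
            # False keeps the classification on this entity only, so it is not
            # assigned automatically to data derived from it through lineage.
            "propagate": propagate,
        }
    ]


# Assign "PII" with an attribute value and with propagation turned off.
assignment = build_assignment("PII", {"pii_type": "phone"}, propagate=False)
```

Defaulting `propagate` to `True` in this sketch is a design choice for illustration; check the default behavior of your Atlas version before relying on it.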