Creating tag rules in compute cluster environments
With tag rules, you can apply Apache Atlas classifications to your assets based on regular expressions or on similarity to a set of values in a table.
- To start applying tags, go to Profilers and select your data lake.
- Go to Profilers > Data Compliance > Tag Rules.
- Click + Create Tag Rule.
- Name your tag rule and add a description in General Information.
- Select the tags to be applied from the list of available tags, which is synchronized from the list of Atlas classifications.
If you select a child tag, its parent tag is also selected automatically. By default, if the child tag is applied to a column, the table receives the parent tag.
- Select your Data Pattern Type:
  - Regular Expression: You can upload a text file containing your regular expression or type it directly on the Configure Tag Rule page. To see the required format of the CSV file, click Download Sample Tag Rule. Continue with step 7.
  - Single Column File Upload: Upload a CSV file with values to be matched against the actual values in your tables. After uploading your file, continue with step 11.
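To make the Single Column File Upload option more concrete, the following minimal sketch builds a one-column CSV file and checks what fraction of a column's values appear in it. It is an illustration only: the helper names are hypothetical, the headerless one-value-per-row layout is an assumption, and the exact, case-insensitive comparison stands in for whatever similarity matching the profiler actually performs.

```python
import csv
import io


def write_single_column_csv(values):
    """Return CSV text with one value per row (headerless layout is an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for value in values:
        writer.writerow([value])
    return buf.getvalue()


def read_single_column_csv(text):
    """Load the reference values into a set, normalized for comparison."""
    return {row[0].strip().lower() for row in csv.reader(io.StringIO(text)) if row}


def match_ratio(column_values, reference):
    """Fraction of column values found in the reference set (exact match only)."""
    if not column_values:
        return 0.0
    hits = sum(1 for v in column_values if v.strip().lower() in reference)
    return hits / len(column_values)


csv_text = write_single_column_csv(["Germany", "France"])
reference = read_single_column_csv(csv_text)
ratio = match_ratio(["germany", "Spain", "FRANCE ", "Italy"], reference)
```

Here two of the four sample values are found in the reference list, so `ratio` is 0.5.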
Creating a regular expression based tag rule:
- Optional: Define your regular expression for the table name.
- When using Column Level regular expressions, you can define multiple expressions for both of the following:
  - Column Name
  - Column Values
- Define the Column Value Weightage as a percentage with the slider.
The remaining percentage is the column name weightage. The results of the individual regular expression matches are weighted according to this setting before the final confidence for applying the tag is determined.
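The weighting described above can be sketched as follows. This is a minimal illustration, not the profiler's actual algorithm: the function names are hypothetical, and it assumes that the column name score is 1 when any name expression matches, that the column value score is the fraction of sampled values fully matched by at least one value expression, and that the two scores are combined as a simple weighted sum.

```python
import re


def match_fraction(patterns, values):
    """Fraction of values fully matched by at least one pattern."""
    if not values:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    hits = sum(1 for v in values if any(c.fullmatch(v) for c in compiled))
    return hits / len(values)


def tag_confidence(column_name, column_values, name_patterns,
                   value_patterns, value_weight_pct):
    """Combine name and value scores using the Column Value Weightage (assumed formula)."""
    if not 0 <= value_weight_pct <= 100:
        raise ValueError("value_weight_pct must be between 0 and 100")
    value_weight = value_weight_pct / 100
    name_weight = 1 - value_weight  # the remainder goes to the column name
    name_score = 1.0 if any(
        re.search(p, column_name, re.IGNORECASE) for p in name_patterns
    ) else 0.0
    value_score = match_fraction(value_patterns, column_values)
    return value_weight * value_score + name_weight * name_score


confidence = tag_confidence(
    column_name="customer_email",
    column_values=["a@b.com", "not-an-email", "x@y.org", "q@w.io"],
    name_patterns=[r"e?mail"],
    value_patterns=[r"[^@\s]+@[^@\s]+\.[a-z]+"],
    value_weight_pct=80,
)
```

With the slider at 80 percent, three of four values matching, and the column name matching, the result is approximately 0.8 × 0.75 + 0.2 × 1.0 = 0.8. Moving the slider toward 0 makes the column name dominate the result.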
Tag rule testing:
- Optional: You can sanity-check your tag rule in Test Tag Rule by uploading a sample dataset in CSV format.
- Review all your input before clicking Create Tag Rule.
- Click Confirm to finalize your tag rule.
Your tag rule is created with the Status Disabled, and the Test Status is Test Pending.
- Click > Dry Run.
- Click Run to start an on-demand dry run profiling job on up to 10 tables from your data.
Your tag rule becomes VALIDATED after a successful dry run.
- After the dry run has passed, click > Enable to start using your tag rule on your live data.