Cloudera Navigator Provenance Use Case
- How was this mortgage credit score computed?
- How can I prove that this number on a sales report is correct?
- What data sources were used in this calculation?
How Can I Verify a Value in a Table?
Many business transactions require you to verify that information is correct and that it comes from a reliable source. For example, if you work in a sales organization, you might need to confirm that the figures in a sales report are accurate, that you can trust them, and that you can identify where they originated.
- Log in to the Cloudera Navigator data management UI and click the Search tab.
- Type s_neighbor in the search box.
You see four instances of the s_neighbor field.
- View details of the field in the top_10 table by clicking s_neighbor in the entry with the Parent Path /default/top10.
You see that the parent table is top_10, and the input or upstream source of the data is the salesdata database.
Where did salesdata come from originally? It was imported using Sqoop, with syntax similar to the following; actual arguments vary:
    sqoop import-all-tables -m {{cluster_data.worker_node_hostname.length}} \
      --connect jdbc:mysql://{{cluster_data.manager_node_hostname}}:3306/retail_db \
      --username=admin \
      --password=password \
      --compression-codec=snappy \
      --as-parquetfile \
      --warehouse-dir=/user/hive/warehouse \
      --hive-import
- To see a graphical representation of the relationships among the entities:
  - Click the Lineage tab.
  - In Lineage Options, select Operations and clear any other check boxes.
The lineage diagram shows that s_neighbor can be traced back to the original table salesdata.
- Click the operation entity in the center of the lineage diagram, and see details about it on the lower right side of the lineage window.
Information about the selected entity indicates that the operation is an Impala query. Click the information icon on the Query Text line to see the entire query. This query was used to derive top_10 from the original table.
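The upstream trace that the lineage diagram draws can be pictured as a walk over a small directed graph. The following Python sketch is a conceptual model only; it does not call Cloudera Navigator. The edge map, the entity labels (including impala_query as a stand-in for the operation entity), and both function names are hypothetical and were chosen to mirror the walkthrough.

```python
from collections import deque

# Hypothetical lineage edges: each entity maps to the entities it was derived from.
# The names mirror the walkthrough; the edge set is illustrative, not Navigator output.
UPSTREAM = {
    "top_10.s_neighbor": ["impala_query"],  # the field is produced by an operation
    "impala_query": ["salesdata"],          # the operation reads the original source
    "salesdata": [],                        # no recorded inputs in this toy model
}


def trace_upstream(entity, upstream):
    """Return every entity reachable upstream of `entity`, in breadth-first order."""
    seen = []
    queue = deque(upstream.get(entity, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # skip entities already visited through another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


def original_sources(entity, upstream):
    """Return the upstream entities that have no further inputs (the origins)."""
    return [e for e in trace_upstream(entity, upstream) if not upstream.get(e)]


if __name__ == "__main__":
    print(trace_upstream("top_10.s_neighbor", UPSTREAM))
    print(original_sources("top_10.s_neighbor", UPSTREAM))
```

In this model, verifying a value amounts to starting at the field, following each edge to its inputs, and stopping at entities with no inputs, which is the same path you follow visually in the Lineage tab.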