Cloudera Navigator Overview

The Cloudera Navigator Data Management component is a complete solution for data governance, auditing, and related data management tasks. It is fully integrated with CDH and automatically collects governance artifacts: lineage, metadata, and auditing. Cloudera Navigator lets compliance groups, data stewards, administrators, and others work effectively with data at scale. Working "effectively with data at scale" means that specific data entities are easier to find once the data in clusters holding hundreds of gigabytes, terabytes, or more has been cataloged, for example when proving to regulators that deposit reserves have not been artificially manipulated.

An effective catalog of cluster data supports self-service data discovery for an organization. For example, business users can find all the data associated with particular projects by looking for meaningful labels, without needing to know the low-level structures within the cluster. Finding all the tables or files relevant to a specific project in a given department can be as simple as applying filters on the Search page. For example, to start exploring the finance team's data stored in an organization's Amazon Simple Storage Service (S3), authorized users can select the source type and the tag filters.
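The idea of narrowing a catalog by source type and tag can be illustrated outside the console with a small sketch. It is a toy in-memory model only: the `Entity` class, the `search` function, and the sample entries are hypothetical and do not reflect Cloudera Navigator's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A cataloged data entity (hypothetical model, not Navigator's schema)."""
    name: str
    source_type: str
    tags: frozenset = field(default_factory=frozenset)


def search(catalog, source_type=None, tags=()):
    """Return entities of the given source type that carry every requested tag."""
    wanted = set(tags)
    return [
        e for e in catalog
        if (source_type is None or e.source_type == source_type)
        and wanted <= e.tags
    ]


# Illustrative sample data, invented for this sketch.
catalog = [
    Entity("q3_ledger.csv", "S3", frozenset({"finance", "q3"})),
    Entity("clickstream", "HDFS", frozenset({"marketing"})),
    Entity("payroll", "HIVE", frozenset({"finance", "hr"})),
]

# Combine a source-type filter with a tag filter.
finance_on_s3 = search(catalog, source_type="S3", tags=["finance"])
```

Combining the two filters mirrors the selection described above: the source type restricts where the data lives, and the tag restricts which business label it carries.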

Cloudera Navigator provides this cataloging function for Hadoop clusters through its metadata architecture, which lets organizations devise comprehensive metadata models and apply them automatically as data is ingested into the cluster. Using the Cloudera Navigator console, data stewards and other business users can get an at-a-glance view of cluster data through charts, graphs, and histograms, and can drill down into the details, including lineage diagrams that capture the transformations applied to data entities since origination. By tracing data entities back to their source, lineage diagrams show an entity's provenance and can be used to authenticate values by rendering precisely the transformations that have occurred.
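Tracing an entity back to its source amounts to walking a graph of upstream relationships. The following is a minimal sketch of that idea, assuming lineage is available as a simple mapping from each entity to its direct upstream entities. The `trace_to_sources` function and the entity names are hypothetical and are not part of Cloudera Navigator.

```python
def trace_to_sources(entity, parents):
    """Walk upstream edges and return the entities that have no parents of their own."""
    sources, seen, stack = [], set(), [entity]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        upstream = parents.get(node, [])
        if not upstream:
            sources.append(node)  # nothing upstream, so this is an origin
        stack.extend(upstream)
    return sorted(sources)


# Illustrative lineage, invented for this sketch: each key was derived from its listed inputs.
lineage = {
    "reserve_report": ["daily_balances"],
    "daily_balances": ["raw_deposits", "raw_withdrawals"],
}

origins = trace_to_sources("reserve_report", lineage)
```

In this toy graph, the report's provenance resolves to the two raw inputs, which is the kind of answer a lineage diagram presents visually.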

In addition to its metadata infrastructure, Cloudera Navigator provides auditing infrastructure. Pulling together the data needed for regulatory reports with standard Hadoop tools can take days, but Cloudera Navigator lets organizations find the information quickly through menu-selectable reports, dashboards, and the like. Using the analytics feature to look at HDFS utilization, system administrators can glean meaningful information about data operations that might be tying up system resources, among many other details. With further drill-down, administrators can identify root causes of issues and can proactively monitor for and preempt problems caused by poor organization or consumption of data resources.