Cloudera Data Management

This guide describes how to perform data management using Cloudera Navigator. Data management activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.

Cloudera Navigator is a fully integrated data-management and security system for the Hadoop platform. Cloudera Navigator enables a broad range of stakeholders to work with data at scale:

  • Compliance groups must track and protect access to sensitive data. They must be prepared for an audit, track who accesses data and what they do with it, and ensure that sensitive data is governed and protected.
  • Hadoop administrators and DBAs are responsible for boosting user productivity and cluster performance. They want to see how data is being used and how it can be optimized for future workloads.
  • Data stewards and curators manage and organize data assets at Hadoop scale. They manage the data lifecycle, from ingest to purge.
  • Data scientists and Business Intelligence users need to find the data that matters most. They must be able to explore data, trust what they find, and visualize relationships between data sets.
To address the requirements of all these users, Cloudera Navigator provides the following functionality:
  • Data Management - Provides visibility into and control over the data in Hadoop datastores, and the computations performed on that data. Hadoop administrators, data stewards, and data scientists can use Cloudera Navigator to:
    • Audit data access and verify access privileges - The goal of auditing is to capture a complete and immutable record of all activity within a system. Cloudera Navigator auditing adds secure, real-time audit components to key data and access frameworks. Compliance groups can use Cloudera Navigator to configure, collect, and view audit events that show who accessed data, and how.
    • Search metadata and visualize lineage - Cloudera Navigator metadata management allows DBAs, data stewards, business analysts, and data scientists to search for data entities, define and amend their properties, tag them, and view relationships between datasets.
    • Policies - Data stewards can use Cloudera Navigator policies to define automated actions, based on data access or on a schedule, to add metadata, create alerts, and move or purge data.
    • Analytics - Hadoop administrators can use Cloudera Navigator analytics to examine data usage patterns and create policies based on those patterns.
  • Data Encryption - Data encryption and key management provide a critical layer of protection against potential threats by malicious actors on the network or in the datacenter. Encryption and key management are also requirements for meeting important compliance initiatives and ensuring the integrity of your enterprise data. The following Cloudera Navigator components enable compliance groups to manage encryption:
    • Cloudera Navigator Encrypt transparently encrypts and secures data at rest without requiring changes to your applications and ensures there is minimal performance lag in the encryption or decryption process.
    • Cloudera Navigator Key Trustee Server is an enterprise-grade virtual safe-deposit box that stores and manages cryptographic keys and other security artifacts.
    • Cloudera Navigator Key HSM allows Cloudera Navigator Key Trustee Server to seamlessly integrate with a hardware security module (HSM).

Cloudera Navigator data management and data encryption components can be installed independently.