Cloudera Navigator support for Virtual Private Clusters

Cloudera Manager supports deploying workloads in virtual private compute clusters, which lets administrators access resources for high-demand periods or isolate workloads. In this environment, Cloudera Navigator continues to extract metadata and track audit events from services running on the Base cluster. Navigator does not extract metadata or track audit events from services running on the Compute cluster.

When you use Compute clusters, you define a data context to control how data is shared between a Compute cluster and the Base cluster. Because Compute clusters interact with the Base cluster through the data context, some of the activity that occurs on Compute clusters affects the metadata collected in Navigator. For example, if you create Hive data assets using HiveServer2 or SparkSQL on the Compute cluster and you have Hive in your data context, you will see entities for the new Hive data assets in Navigator. You won't see lineage for how these assets were created because the operations on the Compute cluster are not extracted. You won't see audits for the events that created the assets because audits are not collected from the services running on the Compute cluster.

The following tables describe the behavior of Navigator metadata and audit collection in Base and Compute clusters for the services Navigator supports.

Navigator Auditing in Virtual Private Compute Clusters

No audits appear in Navigator for events that occur on a Compute cluster, with one exception: if Sentry is included in the data context for the cluster, Navigator shows audit events for Sentry actions that are performed in HiveServer2 or Impala on the Compute cluster.

Audit Behavior in Virtual Private Clusters
Audited Service	Compute Cluster	Notes
HBase
HDFS
HiveServer2	Sentry events	Sentry in the Data Context
Hue
Impala	Sentry events	Sentry in the Data Context
Sentry
Solr
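
The audit rule above can be sketched as a small lookup. This is only an illustrative model of the documented behavior, not a Navigator API: the function name compute_cluster_audits, the constant SENTRY_AUDITED_SERVICES, and the idea of passing the data context as a list of service names are all assumptions made for this example.

```python
from typing import Iterable

# Services whose Sentry actions are audited from a Compute cluster,
# per the table above. All other services yield no Compute-cluster audits.
SENTRY_AUDITED_SERVICES = frozenset({"HiveServer2", "Impala"})


def compute_cluster_audits(service: str, data_context: Iterable[str]) -> str:
    """Describe which audit events Navigator shows for a service on a Compute cluster.

    data_context lists the Base-cluster services shared with the Compute cluster.
    Hypothetical helper: it models the documented rule, not a real Navigator call.
    """
    if service in SENTRY_AUDITED_SERVICES and "Sentry" in set(data_context):
        return "Sentry events"
    return "none"


# Example: Impala on a Compute cluster, with and without Sentry in the data context.
print(compute_cluster_audits("Impala", ["HDFS", "Hive", "Sentry"]))
print(compute_cluster_audits("Impala", ["HDFS", "Hive"]))
```

Audits from the Base cluster are unaffected by this rule; the sketch covers only the Compute Cluster column of the table.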

Navigator Metadata and Lineage Extraction in Virtual Private Compute Clusters

No metadata is extracted from services running on a Compute cluster. However, if HDFS or Hive is included in the data context for a Compute cluster, Navigator shows entities that are created or updated from the Compute cluster and stored in HDFS or the Hive Metastore (HMS) on the Base cluster. For example, when actions on a Compute cluster with HDFS in its data context create directories or files, those directories and files are stored in HDFS on the Base cluster. Navigator collects the metadata from the Base cluster HDFS and creates entities for the directories and files. Similarly, when Hive databases, tables, views, or partitions are created or modified by HiveServer2, Impala, or SparkSQL operations on a Compute cluster and Hive is included in the data context for that cluster, the updated metadata is extracted from HMS on the Base cluster and collected by Navigator. Because Navigator does not extract metadata directly from the Compute cluster, the operations and operation executions that created the data assets are not collected; therefore, Navigator does not calculate lineage for these data assets.

Metadata and Lineage Behavior in Virtual Private Clusters
Service Providing Metadata	Metadata: Base Cluster	Metadata: Compute Cluster	Lineage: Base Cluster	Lineage: Compute Cluster	Notes
HDFS					HDFS in the Data Context
HiveServer2
HMS					Hive in the Data Context
Impala
MapReduce (v1 and v2)
Oozie
Pig
Spark (v1 and v2)
Sqoop (v1)
YARN
Cluster
S3					Extraction occurs outside the Base or Compute clusters
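
The metadata and lineage behavior for assets created from a Compute cluster can be summarized in a short sketch. It is a model of the rules described in this section under stated assumptions, not Navigator code: the names Visibility, REQUIRED_CONTEXT_SERVICE, and visibility_for_compute_asset are hypothetical, and the asset kinds are limited to the ones named in the text.

```python
from typing import Iterable, NamedTuple


class Visibility(NamedTuple):
    entities: bool  # Navigator shows entities for the asset
    lineage: bool   # Navigator calculates lineage for the asset


# Asset kinds named in the text, mapped to the service that must be
# in the Compute cluster's data context for entities to appear.
REQUIRED_CONTEXT_SERVICE = {
    "directory": "HDFS",
    "file": "HDFS",
    "database": "Hive",
    "table": "Hive",
    "view": "Hive",
    "partition": "Hive",
}


def visibility_for_compute_asset(asset_kind: str, data_context: Iterable[str]) -> Visibility:
    """Model what Navigator shows for an asset created from a Compute cluster.

    Entities appear only when the backing service (HDFS or Hive) is in the
    data context, because the metadata is then extracted from the Base cluster.
    Lineage is always False: operations on the Compute cluster are not extracted.
    """
    required = REQUIRED_CONTEXT_SERVICE.get(asset_kind)
    entities = required is not None and required in set(data_context)
    return Visibility(entities=entities, lineage=False)


# Example: a Hive table created by SparkSQL on a Compute cluster with Hive in the data context.
print(visibility_for_compute_asset("table", ["HDFS", "Hive"]))
```

The lineage field is hard-coded to False on purpose: according to the text, no data context setting causes Navigator to collect the operations that ran on the Compute cluster.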
