Navigator Data Management in a High Availability Environment

Navigator Metadata Server and Navigator Audit Server do not currently support high availability configurations.

When Cloudera Manager is configured for high availability using a load balancer with an active and a passive Cloudera Management Service, configure a single instance each of Navigator Metadata Server and Navigator Audit Server. As a result, if a failover is triggered, Cloudera Manager may fail over to a host where Cloudera Navigator is not available.

Navigator Configuration in a Highly Available Environment

When Cloudera Manager is configured for high availability, set up Navigator as follows:

Navigator Audit Server

One instance of Navigator Audit Server.
Navigator Audit Server does not provide a mechanism for reconciling or synchronizing two independent audit databases.
RDBMS configured for high availability.
Use database-specific mechanisms to ensure high availability.

Navigator Metadata Server

One instance of Navigator Metadata Server.
Navigator Metadata Server does not provide a mechanism for reconciling or synchronizing two independent storage directories. In a production environment, Navigator Metadata Server is typically installed on a host other than the one where Cloudera Manager is running, to ensure there are enough compute resources and storage.
RDBMS configured for high availability.
Use database-specific mechanisms to ensure high availability.
Solr storage directory.
No high availability configuration is supported.

Navigator Behavior in a High Availability Environment

If a failover is triggered, Cloudera Manager may fail over to a host where Cloudera Navigator is not available. Here's the behavior you can expect if some component (Navigator or otherwise) goes down:

Navigator Audit Server

If Navigator Audit Server stops running, the services being audited continue to queue audit events. When Navigator Audit Server is available again, Cloudera Manager agents collect the events and pass them to Navigator Audit Server. The same behavior occurs if Navigator Audit Server is running but can't access the audit database: Navigator Audit Server stops accepting events from the Cloudera Manager agents, and the events are held on the host where the service is running.

There are two potential problems that can occur if Navigator Audit Server (or its underlying RDBMS instance) stays offline:

Audit events can fill up the local file system on the host where services are running. If Navigator Audit Server is down for a prolonged interval and space becomes an issue, consider archiving audit files elsewhere and replacing them after the audit server processes some of the events.
After 24 or more hours, Navigator Audit Server runs out of pre-created audit tables in its database. If Navigator Audit Server is offline for more than a day, you may need to manually create the missing audit tables before it can resume processing. See Processing a backlog of audit logs.
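
The two risks above can be checked with a small script. The following is a minimal sketch, not a Cloudera tool: the function names, the free-space threshold, and the directory passed to `free_space` are assumptions made for illustration. Only the 24-hour window comes from the text above.

```python
import shutil

# From the text above: pre-created audit tables run out after 24 or more hours.
PRECREATED_TABLE_WINDOW_HOURS = 24


def audit_outage_risks(hours_offline, free_bytes, min_free_bytes):
    """Return the risks from the list above that apply to an outage.

    min_free_bytes is a hypothetical threshold chosen by the operator.
    """
    risks = []
    if free_bytes < min_free_bytes:
        risks.append("disk: queued audit events may fill the local file system")
    if hours_offline >= PRECREATED_TABLE_WINDOW_HOURS:
        risks.append("tables: missing audit tables may need to be created manually")
    return risks


def free_space(path):
    """Free bytes on the file system that holds `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # "." stands in for the directory where a service queues audit events;
    # the real location is deployment-specific and is not given in this document.
    for risk in audit_outage_risks(30, free_space("."), 5 * 1024**3):
        print(risk)
```

Run on each host where audited services are running, a check like this gives early warning before local storage becomes the limiting factor.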

Navigator Metadata Server

When Navigator Metadata Server stops running, the embedded Solr instance stops running, or Navigator Metadata Server can't access its database:

Navigator console is not available.
Metadata extractors stop collecting metadata from supported services.
Lineage relations are not calculated.
Scheduled metadata purge jobs don't run.
Periodic policy jobs don't run.

While metadata for existing data assets—files, tables, partitions, and so on—will be collected when Navigator Metadata Server restarts, there are some circumstances where it is possible to lose metadata:

Operations and operation execution entities from services that use pull extractors (YARN, MapReduce, Sqoop, and Oozie) are extracted from the JobHistory server. If Navigator Metadata Server is not running, the JobHistory server persists the information, and it is collected when Navigator Metadata Server starts again. However, metadata for operation executions can be lost if Navigator Metadata Server is stopped long enough for the JobHistory server logs to be recycled. Operation executions roll up into operation entities, and operations are used to generate lineage, so the effect depends on which executions are missing:
If the missing operation executions correspond to existing operations, there is no impact on lineage relations.
If these operation executions are ongoing, the lineage relations are eventually created after Navigator is restored.
If the missing operation executions are unique, meaning they do not correspond to existing operations and later operation executions don't produce the same operation, lineage may be missing for entities referenced by the missing operation executions.
Operations and operation executions from push extractors, such as those for HiveServer2, Impala, and Spark, are held at the service until Navigator Metadata Server is restarted and are not at risk of loss during a Navigator Metadata Server outage.
Data asset entities for data assets that are created and destroyed during the outage will not appear in Navigator.
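
The cases for operations and operation executions can be summarized as a decision function. This is an illustrative sketch of one reading of the rules above, not Navigator code: the `Impact` names, the `"push"` and `"pull"` labels, and the parameter names are all hypothetical.

```python
from enum import Enum


class Impact(Enum):
    NONE = "no lineage lost"
    DELAYED = "lineage is created after Navigator is restored"
    POSSIBLE_LOSS = "lineage may be missing for referenced entities"


def lineage_impact(extractor, logs_recycled, matches_existing_operation, repeated_later):
    """Classify the effect of an outage on lineage for one operation execution.

    extractor: "push" or "pull" (labels assumed for this sketch).
    logs_recycled: True if the JobHistory server logs were recycled during the outage.
    matches_existing_operation: True if the execution belongs to an operation
        that Navigator already holds.
    repeated_later: True if a later execution produces the same operation.
    """
    # Push extractors hold data at the service, and pull data survives
    # as long as the JobHistory logs have not been recycled.
    if extractor == "push" or not logs_recycled:
        return Impact.NONE
    if matches_existing_operation:
        return Impact.NONE
    if repeated_later:
        return Impact.DELAYED
    return Impact.POSSIBLE_LOSS


if __name__ == "__main__":
    print(lineage_impact("pull", True, False, False).value)
```

Only the last branch, a one-off pull-extracted execution whose logs were recycled, leads to a possible gap in lineage.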

If Navigator Metadata Server (or its underlying RDBMS instance or Solr) stays offline, metadata can fill up the local file system on the host where services are running. If Navigator Metadata Server is down for a prolonged interval and space becomes an issue, consider archiving JobHistory files and service log files elsewhere and replacing them after the metadata server processes some of the metadata.
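
Archiving files and putting them back later can be scripted. The sketch below is a generic file-moving helper, assuming the files sit in a local directory and that no other process is writing to them while they are moved. The directory names and file pattern are placeholders, not locations defined by Navigator.

```python
import shutil
from pathlib import Path


def move_files(src_dir, dst_dir, pattern="*"):
    """Move regular files matching `pattern` from src_dir to dst_dir.

    Returns the names of the moved files in sorted order. Subdirectories
    are left in place.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Hypothetical directories for illustration only.
    archived = move_files("pending", "archive", "*.log")
    print("archived:", archived)
    # Later, once the server has worked through part of the backlog:
    # move_files("archive", "pending", "*.log")
```

Calling the same function with the directories swapped restores the archived files so that they can be processed.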
