Troubleshooting Navigator Data Management
This page contains troubleshooting tips and workarounds for issues that can arise with Cloudera Navigator.
Cloudera Navigator-Cloudera Altus
No metadata or lineage collected from an Altus deployed cluster.
Symptom: No metadata is collected from a cluster instantiated by Cloudera Altus, even though the Cloudera Altus environment has been set up to use a given Amazon S3 bucket and the Cloudera Navigator instance has been configured to read from that same bucket.
Possible cause: Permissions on the Amazon S3 bucket have not been applied correctly.
- Check that permissions have been set properly on the Amazon S3 bucket.
- Verify that the Amazon S3 bucket name has not been changed.
...Metadata export failed com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: D28543E2494BD521), S3 Extended Request ID: ...
The 404 status code and "NoSuchBucket" in the error message indicate that the Telemetry Publisher was unable to write to the Amazon S3 bucket, either because the bucket was misidentified or because permissions were configured incorrectly.
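When scanning a long log for this failure, it can help to pull the status code and error code out of each line automatically. The following is a minimal sketch, assuming the message layout matches the excerpt above. The function names parse_s3_error and suggest_check are hypothetical helpers, not part of Cloudera Navigator or the AWS SDK.

```python
import re

# Matches the "Status Code: ...; Error Code: ..." portion of an
# AmazonS3Exception message, as laid out in the log excerpt above.
_S3_ERROR = re.compile(
    r"Status Code:\s*(?P<status>\d+);\s*Error Code:\s*(?P<code>\w+)"
)


def parse_s3_error(log_line):
    """Return (status_code, error_code) from a log line, or None if absent."""
    match = _S3_ERROR.search(log_line)
    if match is None:
        return None
    return int(match.group("status")), match.group("code")


def suggest_check(log_line):
    """Map a parsed error to the checks listed in this section (hypothetical)."""
    parsed = parse_s3_error(log_line)
    if parsed is None:
        return "no S3 error found in this line"
    status, code = parsed
    if code == "NoSuchBucket":
        return "verify the bucket name, then the bucket permissions"
    if status == 403 or code == "AccessDenied":
        return "check the permissions on the bucket"
    return f"unrecognized S3 error {status} {code}"
```

The mapping in suggest_check only restates the two checks from this section; extend it if other error codes show up in your logs.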
Navigator Audit Server
"No serializer found" error
Symptom: Selecting an Audits page results in the following error message:
No serializer found
Possible cause: A communication failure somewhere between Navigator Metadata Server, Cloudera Manager Server, and Navigator Audit Server. This is a Known Issue; until it is resolved, the error message is not mapped to the underlying failure.
- Log in to the Cloudera Navigator console.
- Enable Navigator API logging.
- Perform the action on the Audits page that raised the error message and see whether the underlying issue is caught in the API log. If the Navigator API logging reveals no additional information, turn on API logging for Cloudera Manager:
- Log in to the Cloudera Manager Admin Console.
- Select Administration > Settings.
- Select Advanced for the Category filter.
- Select the Enable Debugging of API check box. The server log will then contain all API calls. Try your request again and see whether the source error message appears in the response.
Processing a backlog of audit logs
Problem: You have logs that include audit events that were not processed by the Cloudera Navigator Audit Server, such as logs for events that took place before the audit server was online or during a period when the audit server was offline.
Solution: The audit server can process a backlog of logs as follows:
- Back up audit files for all roles on all hosts.
Do this right away: retention periods are configured for these files, and they are deleted as they age out.
- Determine what days were not processed. The audit log files have the UNIX epoch appended to their name:
You want the dates from the oldest and newest files. You can do this by sorting the list and identifying the first and last files. For example, for HDFS audit files:
The oldest: $ ls ./hdfs* | head -1
The newest: $ ls ./hdfs* | tail -1
Depending on your system, you can convert the epoch value to a date using date -r epoch (BSD and macOS) or date -d @epoch (GNU/Linux).
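To turn the oldest and newest epoch values into the list of days that need partitions, a short script can do the conversion for you. This is a minimal sketch that assumes the epoch values are in milliseconds (pass unit="s" for seconds) and that days are counted in UTC; adjust if your audit server uses a different time zone. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_utc(epoch, unit="ms"):
    """Convert a UNIX epoch value to a timezone-aware UTC datetime."""
    seconds = epoch / 1000 if unit == "ms" else epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def days_between(oldest_epoch, newest_epoch, unit="ms"):
    """List every UTC calendar day from the oldest to the newest epoch, inclusive."""
    first = epoch_to_utc(oldest_epoch, unit).date()
    last = epoch_to_utc(newest_epoch, unit).date()
    if last < first:
        raise ValueError("newest epoch is earlier than oldest epoch")
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


if __name__ == "__main__":
    # Epoch values for 2017-04-01 and 2017-04-07 at 00:00 UTC, in milliseconds.
    for day in days_between(1491004800000, 1491523200000):
        print(day.isoformat())
```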
- Create partitions in the Navigator Audit Server database for days that have missing audits.
Create a partition from an existing template table using SQL commands. Use SHOW TABLES to list the template tables. For example, the template table for HDFS is HDFS_AUDIT_EVENTS.
Using MySQL as an example, if you have partitions up to March 31, 2017 and your audit logs include HDFS data for the first week of April, you would run the following SQL commands:
CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_01 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_02 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_03 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_04 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_05 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_06 LIKE HDFS_AUDIT_EVENTS; CREATE TABLE HDFS_AUDIT_EVENTS__2017_04_07 LIKE HDFS_AUDIT_EVENTS;
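For longer gaps, writing one statement per day by hand is error-prone. The sketch below generates the same MySQL-style statements for any date range. It assumes the TEMPLATE__YYYY_MM_DD naming shown in the example above; the function name create_partition_statements is hypothetical.

```python
from datetime import date, timedelta


def create_partition_statements(template, first_day, last_day):
    """Build one CREATE TABLE ... LIKE statement per day, inclusive."""
    if last_day < first_day:
        raise ValueError("last_day is earlier than first_day")
    statements = []
    day = first_day
    while day <= last_day:
        # Partition names follow the pattern TEMPLATE__YYYY_MM_DD.
        name = f"{template}__{day:%Y_%m_%d}"
        statements.append(f"CREATE TABLE {name} LIKE {template};")
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    for stmt in create_partition_statements(
        "HDFS_AUDIT_EVENTS", date(2017, 4, 1), date(2017, 4, 7)
    ):
        print(stmt)
```

Review the generated statements before running them against the Navigator Audit Server database.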
- In the PARTITION_INFO table, add a new row for each of the tables you created in the previous step. Set END_TS for each row, in Unix epoch time (milliseconds in the example below), to 24 hours after the END_TS of the previous entry.
The following example command inserts a row for Feb 14, 2019 into the PARTITION_INFO table with the correct END_TS value. Repeat the command for each partition table:
insert into PARTITION_INFO values (PARTITION_INFO_SEQUENCE.nextval, 'HDFS_AUDIT_EVENTS_2019_02_03', 'HDFS_AUDIT_EVENTS', 1550145600000, null, 0);
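Because each END_TS is simply 24 hours after the previous one, the rows can be generated in a loop. The following is a rough sketch, not a definitive implementation. It assumes you have looked up the END_TS of the latest existing row (previous_end_ts, in milliseconds), that the column order and sequence syntax match the example command above, and that partition tables use the double-underscore naming from the CREATE TABLE example. The function name partition_info_inserts is hypothetical.

```python
from datetime import date, timedelta

DAY_MS = 24 * 60 * 60 * 1000  # 24 hours in milliseconds


def partition_info_inserts(template, first_day, count, previous_end_ts):
    """Build INSERT statements whose END_TS values step 24 hours at a time."""
    statements = []
    for i in range(count):
        day = first_day + timedelta(days=i)
        # Each new row ends 24 hours after the row before it.
        end_ts = previous_end_ts + (i + 1) * DAY_MS
        table = f"{template}__{day:%Y_%m_%d}"
        statements.append(
            "insert into PARTITION_INFO values "
            f"(PARTITION_INFO_SEQUENCE.nextval, '{table}', '{template}', "
            f"{end_ts}, null, 0);"
        )
    return statements


if __name__ == "__main__":
    # Assumed starting point: the last existing partition ends at
    # 2017-04-01 00:00 UTC (1491004800000 ms).
    for stmt in partition_info_inserts(
        "HDFS_AUDIT_EVENTS", date(2017, 4, 1), 7, 1491004800000
    ):
        print(stmt)
```

Compare the generated table names and END_TS values with the existing rows in PARTITION_INFO before inserting anything.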
- Change the Navigator Audit expiration period to be longer than the time period of the logs you want to process.
Set the expiration period in Cloudera Manager. Search for the property “Navigator Audit Server Data Expiration Period”.
- Stop the Cloudera Manager agent on the host running the role for the service whose logs will be restored.
For example, if the logs are for HDFS, stop the agent on the NameNode host. For HiveServer2, stop the Cloudera Manager agent on the node where HiveServer2 is running. See Starting, Stopping, and Restarting Cloudera Manager Agents.
If you aren’t sure which host is involved, the log files contain the UUID of the role that generated them. Go to the host defined for that role. For high-availability clusters, make sure that you stop the role on the active host.
- On that host, copy the backed up audit logs to the audit directory.
The location is typically /var/log/service/audit. For example, /var/log/hadoop-hdfs/audit. The specific locations are configured in the Cloudera Manager audit_event_log_dir properties for each service.
- Move the WAL file out of this directory. (Back it up in a safe location.)
- Restart the Cloudera Manager agent.
On restart, the Cloudera Manager agent will not find the WAL file and will create a new one. It then processes the older audit logs one by one.
- Verify that the audits appear in Navigator.
You can also check the contents of the WAL file to see which logs were processed.
Navigator Metadata Server
Server restart needed
If you see the following errors in the Navigator Metadata Server log, it's highly likely that the server needs to be restarted.
- Embedded Solr isn't running
The following error indicates that Solr isn't running. Restarting Navigator Metadata Server is the best way to make sure Solr is running in the correct context.
2018-11-08 02:24:39,777 ERROR com.cloudera.nav.hive.extractor.AbstractHiveExtractor [CDHExecutor-0-CDHUrlClassLoader@4446b570]: Failed to extract table example_table from database example_db with error: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:7187/solr/nav_elements
java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:7187/solr/nav_elements
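To check whether a Navigator Metadata Server log contains this error, you can scan it with a small script. This is a minimal sketch that assumes the message text matches the excerpt above, including the "occured" spelling used in the log. The function name find_solr_connection_errors is hypothetical, and the path to the log file is left to you.

```python
import re

# Matches the SolrJ error shown above. "occured" is spelled as it appears
# in the log message.
_SOLR_DOWN = re.compile(
    r"SolrServerException: IOException occured when talking to server at: "
    r"(?P<url>\S+)"
)


def find_solr_connection_errors(lines):
    """Return (line_number, solr_url) for each line that reports the error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = _SOLR_DOWN.search(line)
        if match:
            hits.append((number, match.group("url")))
    return hits
```

To use it, open the log file and pass the file object (or a list of lines) to find_solr_connection_errors. Any hit suggests restarting Navigator Metadata Server, as described above.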