Known Issues in Cloudera Manager 7.6.2

Cloudera bug: OPSAPS-59764: Memory leak in the Cloudera Manager agent while downloading parcels.

Using the M2Crypto library in the Cloudera Manager agent to download parcels causes a memory leak.

The Cloudera Manager server requires parcels to install a cluster. If any parcel URL is modified, the server passes the updated information to all the Cloudera Manager agent processes installed on the cluster hosts.

The Cloudera Manager agent then checks for updates regularly by downloading the manifest file available under each of these URLs. However, if a parcel URL is invalid or unreachable, the Cloudera Manager agent reports a 404 error message, and the memory usage of the agent process keeps growing because of a memory leak in the agent's file downloader code.

To prevent this memory leak, ensure that all parcel URLs configured in Cloudera Manager are reachable. To achieve this, delete all unused and unreachable parcels from the Cloudera Manager Parcels page.
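
As a quick check, you can verify that a configured parcel repository URL is reachable by requesting its manifest file. This is only a sketch; replace the placeholder with one of the repository URLs listed on the Parcels page:
curl -sf -o /dev/null [***PARCEL_REPO_URL***]/manifest.json || echo "URL is unreachable; remove it from the Parcels page"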

OPSAPS-63640: Monitoring a high number of Kafka producers might cause Cloudera Manager to slow down and run out of memory
This issue has two workarounds. You can either configure a Kafka producer metric allow list or completely disable producer metrics.
  • Configure a Kafka producer metric allow list:
    A producer metric allow list can be configured by adding the following properties to the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties:
    producer.metrics.whitelist.enabled=true
    producer.metrics.whitelist=[***ALLOW LIST REGEX***]

    Replace [***ALLOW LIST REGEX***] with a regular expression matching the client.id of the producers that you want to add to the allow list. The expression is compiled with the java.util.regex.Pattern class, and the matches() method is applied to each producer's client.id to determine whether it fits the expression.

    Once configured, the metrics of producers whose client.id does not match the regular expression provided in producer.metrics.whitelist are filtered. Kafka no longer reports these metrics through the HTTP metrics endpoint. Additionally, existing metrics of producers whose client.id does not match the regular expression are deleted.

    Because the allow list filters metrics based on the client.id of the producers, ensure that the client.id property is explicitly specified in each producer's configuration. Automatically generated client IDs might cause the number of unnecessary metrics to grow even if an allow list is configured. A minimal configuration sketch follows this list.

  • Completely disable producer metrics:

    Producer metrics can be completely disabled by unchecking the Enable Producer Metrics Kafka service property.
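
For example, assuming you want to keep metrics only for producers whose client.id begins with payments- (the regex and client ID below are illustrative placeholders, not values from your deployment), the safety valve entries and the matching producer configuration could look like this:
    producer.metrics.whitelist.enabled=true
    producer.metrics.whitelist=payments-.*

    # In each producer's configuration, set an explicit client.id so it can match the allow list
    client.id=payments-orders-producer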

Cloudera bug: OPSAPS-63881: When CDP Private Cloud Base is running on RHEL/CentOS/Oracle Linux 8.4, services fail to start because service directories under /var/lib are created with 700 permissions instead of 755.
Run the following command on all managed hosts to change the permissions to 755. Run the command for each affected service directory under /var/lib:
chmod -R 755 [***path_to_service_dir***]
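
To identify the affected service directories before changing them, a command along these lines can help (a sketch only; adjust the path and depth to your layout):
find /var/lib -maxdepth 1 -type d -perm 700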
OPSAPS-65189: Accessing Cloudera Manager through Knox displays the following error:

Bad Message 431 reason: Request Header Fields Too Large

Workaround: Modify the Cloudera Manager Server configuration file /etc/default/cloudera-scm-server to increase the header size in the Java options from the default of 8 KB to 64 KB (65536 bytes), as shown below:
export CMF_JAVA_OPTS="...existing options...
-Dcom.cloudera.server.cmf.WebServerImpl.HTTP_HEADER_SIZE_BYTES=65536
-Dcom.cloudera.server.cmf.WebServerImpl.HTTPS_HEADER_SIZE_BYTES=65536"
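After saving the change, restart the Cloudera Manager Server so that the new Java options take effect, for example on a systemd-based host:
systemctl restart cloudera-scm-server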
OPSAPS-65213: Ending the maintenance mode for a commissioned host with either an Ozone DataNode role or a Kafka Broker role running on it might result in an error.

You might see the following error if you end the maintenance mode from Cloudera Manager while the Ozone or Kafka roles on the host are not decommissioned.

Execute command Recommission and Start on service OZONE-1
Failed to execute command Recommission and Start on service OZONE-1
Recommission and Start
Command Recommission and Start is not currently available for execution.
To resolve this issue, use the Cloudera Manager API to take the host out of maintenance mode:
  1. Log into Cloudera Manager as an Administrator.
  2. Go to Hosts > All Hosts.
  3. Select the host for which you need to end the maintenance mode from the available list and click the link to open the host details page.
  4. Copy the Host ID from the Details section.
  5. Go to Support > API Explorer.
  6. Locate and click the /hosts/{hostId}/commands/exitMaintenanceMode endpoint of the HostsResource API to view the API parameters.
  7. Click Try it out.
  8. Enter the ID of your host in the hostId field.
  9. Click Execute.
  10. Verify that the maintenance mode status is cleared for the host by checking the Server response code.

    The operation is successful if the API response code is 200.
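
The same call can also be issued outside the API Explorer with curl. The following is only a sketch; replace the placeholders with your Cloudera Manager Server host, admin credentials, supported API version (which you can look up with the /api/version endpoint), and the Host ID copied in step 4, and use port 7180 if TLS is not enabled:
curl -u [***ADMIN_USER***]:[***PASSWORD***] -X POST \
    "https://[***CM_SERVER_HOST***]:7183/api/[***API_VERSION***]/hosts/[***HOST_ID***]/commands/exitMaintenanceMode"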

If you need any guidance during this process, contact Cloudera support for further assistance.

Technical Service Bulletins

TSB 2022-597: Cloudera Manager Event server does not clean up old events
The Event Server in Cloudera Manager (CM) does not clean up old events from its index, which can fill up the disk. This leads to incorrect “Event Store Size” health check results.
Component affected:
  • Event Server
Products affected:
  • Cloudera Data Platform (CDP) Private Cloud Base
  • CDP Public Cloud
Releases affected:
  • CDP Public Cloud 7.2.14 (CM 7.6.0), and 7.2.15 (CM 7.6.2)
  • CDP Private Cloud Base 7.1.7 Service Pack (SP) 1 (CM 7.6.1)
Users affected:
  • Users who have Event Server running
Impact:
  • The Event Server’s index eventually fills up the disk on which it is stored.
Action required:
  • Patch: Contact Cloudera support for a patch to address this issue.
  • Workaround
    Suggested workaround instructions:
    1. Stop the Event Server.
    2. Check the path of the Event Server’s index ([eventserver_index_dir]) in Cloudera Manager.
    3. Archive the v4 folder in this path*.
      1. Compress the v4 folder using the following command:
        tar -czvf event_archive.tar.gz ${eventserver_index_dir}/v4
      2. Copy the archived version to an external disk.
      3. Remove the ${eventserver_index_dir}/v4 folder.
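        For example:
        rm -rf ${eventserver_index_dir}/v4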
    4. Start the Event Server**.

      *The archived version can be restored by first archiving the current index as described above, and then extracting the previously archived version with the following steps:

      1. Stop the Event Server.
      2. Copy event_archive.tar.gz to ${eventserver_index_dir}.
      3. Extract event_archive.tar.gz using the following command:
        tar -xvf event_archive.tar.gz

        The extracted v4 folder should be under ${eventserver_index_dir}.

      4. Start the Event Server.***

      ** After the Event Server is restarted, a new index is built; this new index cannot be merged with the previously archived index if that index is later restored.

      *** After the archived index is restored, the Event Server continues to add new events to that index.

    5. Delete the Event Server’s index, which is located under /var/lib/cloudera-scm-eventserver/v4 by default. The parent directory can be changed with the eventserver_index_dir parameter, which does not include the v4 subfolder.
    6. Restart the Event Server.
Monitoring:
  • By default, CM has thresholds to monitor the Event Server’s free disk space using the [eventserver_index_directory_free_space_percentage_thresholds] parameter.

    You can adjust these thresholds by following the Cloudera Manager documentation.

Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2022-597: Cloudera Manager Event server does not clean up old events