Scaling issues pertaining to Logging and Diagnostic bundle collection
The logging and diagnostic bundle collection pipeline cannot process large volumes of data.
Condition
If the cdp-fluentd-aggregator pod repeatedly restarts with an
OOMKilled exit status, the aggregator's memory limit might be
insufficient for the current log volume or number of active workers.
Log messages such as "buffer flush took too long" or "retry flush"
in the aggregator pod logs indicate that the underlying Longhorn
storage is not keeping pace with the incoming log rate.
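The two symptoms above can be checked from the command line. The following is a minimal sketch, not an official procedure. It assumes the namespace cdp and the pod name cdp-fluentd-aggregator-0 used elsewhere in this topic, and it matches the log message text exactly as quoted above; the wording in your actual pod logs may differ. The function names are hypothetical helpers introduced for illustration.

```shell
# Assumed defaults, taken from the commands in this topic; override as needed.
NAMESPACE="${NAMESPACE:-cdp}"
POD="${POD:-cdp-fluentd-aggregator-0}"

# Print the reason each container in the pod was last terminated
# (for example, OOMKilled). Prints nothing if no container has restarted.
last_termination_reasons() {
  kubectl get pod "$POD" -n "$NAMESPACE" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Read log lines on stdin and print how many mention a slow or retried
# buffer flush. The patterns are the message fragments quoted in this topic.
count_flush_warnings() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c -E 'buffer flush took too long|retry flush' || true
}
```

For example, running last_termination_reasons shows whether the last restart was an OOMKilled termination, and kubectl logs "$POD" -n "$NAMESPACE" | count_flush_warnings gives a rough count of flush warnings in the current logs.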
Cause
Scaling issues occur because the logging and diagnostic bundle collection pipeline does not scale dynamically to handle large volumes of logs.
Remedy
Contact Cloudera Support and share the following information:
- The output of the kubectl describe pod cdp-fluentd-aggregator-0 -n cdp command.
- The output of the kubectl logs cdp-fluentd-aggregator-0 -n cdp --previous command, if the pod has restarted.
- The current PVC usage, obtained using the kubectl get pvc -n cdp | grep logs-rwx command.
- The number of active Fluentd aggregator replicas and your approximate log ingestion rate.
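To gather these outputs in one pass before contacting Support, the commands listed above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name, output directory, and file names are hypothetical, and listing pods whose names contain cdp-fluentd-aggregator is only an assumed way to count replicas. The log ingestion rate is not collected and must still be estimated separately.

```shell
# Assumed defaults, taken from the commands listed above; override as needed.
NAMESPACE="${NAMESPACE:-cdp}"
POD="${POD:-cdp-fluentd-aggregator-0}"

# Write each requested output to its own file under the given directory.
collect_aggregator_diagnostics() {
  out_dir="${1:-fluentd-aggregator-diagnostics}"   # hypothetical default name
  mkdir -p "$out_dir" || return 1

  kubectl describe pod "$POD" -n "$NAMESPACE" > "$out_dir/describe-pod.txt" 2>&1

  # --previous only succeeds if a container has restarted; a failure here
  # is captured in the file instead of stopping the collection.
  kubectl logs "$POD" -n "$NAMESPACE" --previous > "$out_dir/previous-logs.txt" 2>&1 || true

  # PVC lines for the logs volume; the file is empty if nothing matches.
  kubectl get pvc -n "$NAMESPACE" | grep logs-rwx > "$out_dir/pvc-logs-rwx.txt" 2>&1 || true

  # Assumption: aggregator replicas share the cdp-fluentd-aggregator name prefix.
  kubectl get pods -n "$NAMESPACE" | grep cdp-fluentd-aggregator > "$out_dir/aggregator-pods.txt" 2>&1 || true

  echo "Diagnostics written to $out_dir"
}
```

Calling collect_aggregator_diagnostics ./support-bundle writes four text files to that directory, which can then be attached to the Support case along with your approximate log ingestion rate.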
