Cloudera Management Service
Activity Monitor
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | | | ACTIVITYMONITOR_role_env_safety_valve | false |
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | activityevents.event.publish.queue.max | 20000 | actmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | activityevents.event.publish.retry.ms | 5000 | actmon_event_publication_retry_period | true |
Java Configuration Options for Activity Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | | | firehose_java_opts | false |
Activity Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only. A string to be inserted into cmon.conf for this role only. | | | firehose_safety_valve | false |
Activity Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | | | log4j_safety_valve | false |
Heap Dump Directory | Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it does not exist. If this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | | true | oom_heap_dump_enabled | true |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | | true | oom_sigkill_enabled | true |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | | true | process_auto_restart | true |
Enable Metric Collection | The Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | | true | process_should_monitor | true |
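The Event Publication Maximum Queue Size row above says that events are dropped once the buffer is full. A minimal Python sketch of that drop-on-full behavior, assuming a simple in-memory queue; `EventBuffer` and `publish` are hypothetical names for illustration and are not part of Cloudera Manager.

```python
import queue


class EventBuffer:
    """Hypothetical bounded buffer that drops events once it is full."""

    def __init__(self, max_size=20000):
        # 20000 mirrors the documented default of
        # actmon_event_publication_queue_size_max.
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def publish(self, event):
        """Buffer an event; return False if it had to be dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # The queue is full (for example, during an outage): drop the event.
            self.dropped += 1
            return False

    def size(self):
        """Number of events currently buffered."""
        return self._queue.qsize()
```

Counting dropped events, as this sketch does, makes it easier to tell whether the queue size is too small for the length of outage you expect to ride out.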
Database
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Database Hostname | Name of host where Activity Monitor's database is running. It is highly recommended that this database is on the same host as the Activity Monitor. If the database is not running on its default port, specify the port number using this syntax: 'host:port' | | localhost | firehose_database_host | false |
Activity Monitor Database Name | Name of the Activity Monitor's database. | | | firehose_database_name | true |
Activity Monitor Database Password | Password for logging in to the Activity Monitor database | db.hibernate.connection.password | | firehose_database_password | false |
Activity Monitor Database Type | Type of database to use for Activity Monitor. | | mysql | firehose_database_type | false |
Activity Monitor Database Username | Username for logging in to the Activity Monitor database. | db.hibernate.connection.username | | firehose_database_user | true |
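The Activity Monitor Database Hostname value accepts either a bare host or the 'host:port' form. A small Python sketch of how such a value could be split; `split_host_port` is a hypothetical helper, and the sketch assumes hostnames or IPv4 addresses only (IPv6 literals are not handled).

```python
def split_host_port(value, default_port=None):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole value is the host name.
        return value, default_port
    return host, int(port)
```

For example, `split_host_port("db.example.com:3307")` yields the host and the non-default port, while `split_host_port("localhost")` leaves the port at whatever default the caller supplies.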
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Logging Threshold | The minimum log level for Activity Monitor logs | | INFO | log_threshold | false |
Activity Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Activity Monitor logs. Typically used by log4j or logback. | | 10 | max_log_backup_index | false |
Activity Monitor Max Log Size | The maximum size, in megabytes, per log file for Activity Monitor logs. Typically used by log4j or logback. | | 200 MiB | max_log_size | false |
Activity Monitor Log Directory | Location of log files for Activity Monitor | | /var/log/cloudera-scm-firehose | mgmt_log_dir | false |
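The maximum log size and the number of rolled backups together bound how much disk space the role's logs can use. A rough Python estimate, assuming the logging framework keeps one active file plus the configured number of rolled backups; `max_log_footprint_mib` is a hypothetical helper, not a Cloudera Manager function.

```python
def max_log_footprint_mib(max_log_size_mib=200, max_log_backup_index=10):
    """Upper bound on log disk usage in MiB (assumed rolling behavior)."""
    # One active log file plus the rolled backups.
    return max_log_size_mib * (max_log_backup_index + 1)
```

With the defaults listed above (200 MiB, 10 backups) this comes to 2200 MiB, which is worth comparing against the free-space thresholds configured for the log directory.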
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Activity Monitor Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Activity Monitor activity monitor pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | | Warning: Never, Critical: Any | activitymonitor_activity_monitor_pipeline_thresholds | false |
Activity Monitor Activity Monitor Pipeline Monitoring Time Period | The time period over which the Activity Monitor activity monitor pipeline will be monitored for dropped messages. | | 5 minute(s) | activitymonitor_activity_monitor_pipeline_window | false |
Activity Monitor Activity Tree Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Activity Monitor activity tree pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | | Warning: Never, Critical: Any | activitymonitor_activity_tree_pipeline_thresholds | false |
Activity Monitor Activity Tree Pipeline Monitoring Time Period | The time period over which the Activity Monitor activity tree pipeline will be monitored for dropped messages. | | 5 minute(s) | activitymonitor_activity_tree_pipeline_window | false |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | | Warning: 50.0 %, Critical: 70.0 % | activitymonitor_fd_thresholds | false |
Activity Monitor Host Health Test | When computing the overall Activity Monitor health, consider the host's health. | | true | activitymonitor_host_health_enabled | false |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | | Warning: 30.0, Critical: 60.0 | activitymonitor_pause_duration_thresholds | false |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | | 5 minute(s) | activitymonitor_pause_duration_window | false |
Activity Monitor Process Health Test | Enables the health test that the Activity Monitor's process state is consistent with the role configuration | | true | activitymonitor_scm_health_enabled | false |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | | true | activitymonitor_web_metric_collection_enabled | false |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | | Warning: 10 second(s), Critical: Never | activitymonitor_web_metric_collection_thresholds | false |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | | true | enable_alerts | false |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | | false | enable_config_alerts | false |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the following fields: alert, rate, periodminutes, threshold, content, exceptiontype. | | {"version": "0", "rules": [{"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | | Warning: Any, Critical: Never | process_swap_memory_thresholds | false |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | | [] | role_triggers | true |
Cloudera Manager Descriptor Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager descriptor was last refreshed. | | Warning: 60000.0, Critical: 120000.0 | scm_descriptor_age_thresholds | false |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | | Warning: Never, Critical: Any | unexpected_exits_thresholds | false |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | | 5 minute(s) | unexpected_exits_window | false |
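The Rules to Extract Events from Log Files setting above describes first-match evaluation: each log message is checked against the rules in order, and the first rule that matches decides the outcome. The Python sketch below illustrates only that ordering. The matching semantics are assumptions, not taken from this page: `threshold` is treated as a minimum severity, and `content` and `exceptiontype` as regular expressions. The `rate` and `periodminutes` fields are not modeled, and `rule_matches` and `first_matching_rule` are hypothetical names.

```python
import re

# Log4j severities in increasing order; used to compare against "threshold".
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def rule_matches(rule, level, message, exception_type=None):
    """Return True if every condition present in the rule holds (assumed semantics)."""
    if "threshold" in rule and LEVELS.index(level) < LEVELS.index(rule["threshold"]):
        return False
    if "content" in rule and not re.search(rule["content"], message):
        return False
    if "exceptiontype" in rule:
        if exception_type is None or not re.search(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, message, exception_type=None):
    """Return the first rule that matches the message, or None."""
    for rule in rules:
        if rule_matches(rule, level, message, exception_type):
            return rule
    return None


# A subset of the default rules, written as Python dictionaries.
RULES = [
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Instead, use .*"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": False, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "WARN"},
]
```

Because evaluation stops at the first match, rule order matters: under these assumptions a WARN-level deprecation message is claimed by the first, narrower rule and never reaches the general WARN rule at the end of the list.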
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | activityevents.event.publish.log.suppress.window.ms | 1 minute(s) | actmon_event_publication_log_suppress_window | true |
Use the Authentication Service to enable Single Sign On | Use the Authentication Service to enable Single Sign On for the Firehose debug servers. Requires a running Authentication Service. | debug.servlet.auth.enabled | false | debug_servlet_auth_enabled | false |
Purge Activities Data at This Age | In Activity Monitor, purge data about MapReduce jobs and aggregate activities when the data reaches this age in hours. By default, Activity Monitor keeps data about activities for 336 hours (14 days). | firehose.activity.purge.duration.hours | 14 day(s) | firehose_activity_purge_duration_hours | false |
Purge Attempts Data at This Age | In the Activity Monitor, purge data about MapReduce attempts when the data reaches this age in hours. Because attempt data may consume large amounts of database space, you may wish to purge it more frequently than activity data. By default, Activity Monitor keeps data about attempts for 336 hours (14 days). | firehose.attempt.purge.duration.hours | 14 day(s) | firehose_attempt_purge_duration_hours | false |
Descriptor Fetch Tries Interval | The interval between fetch tries for SCM descriptor when Cloudera Management Service roles are starting. | mgmt.descriptor.fetch.frequency | 2 second(s) | mgmt_descriptor_fetch_frequency | true |
Descriptor Fetch Max Tries | Maximum number of tries to fetch SCM descriptor when Cloudera Management Service roles are starting. If the roles are not able to get the descriptor in this many tries, they exit. | mgmt.num.descriptor.fetch.tries | 5 | mgmt_num_descriptor_fetch_tries | true |
Purge MapReduce Service Data at This Age | The number of hours of past service-level data to keep in the Activity Monitor database, such as total slots running. The default is to keep data for 336 hours (14 days). | timeseries.expiration.hours | 14 day(s) | timeseries_expiration_hours | false |
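Descriptor Fetch Tries Interval and Descriptor Fetch Max Tries together describe a bounded retry loop: try up to a fixed number of times, wait between attempts, and give up if none succeeds. A minimal Python sketch of that pattern using the defaults from the table (5 tries, 2 seconds); `fetch_with_retries` and its `fetch` callback are hypothetical and do not reflect how the roles are actually implemented.

```python
import time


def fetch_with_retries(fetch, max_tries=5, interval_seconds=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_tries attempts have been made."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except Exception as error:  # sketch only: retry on any failure
            last_error = error
            if attempt < max_tries:
                # Wait before the next attempt, but not after the last one.
                sleep(interval_seconds)
    raise RuntimeError(f"descriptor not fetched after {max_tries} tries") from last_error
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. With the defaults, a caller waits at most about 8 seconds of sleep time (four intervals) before the final failure.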
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | | | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Bind Activity Monitor to Wildcard Address | If enabled, the Activity Monitor binds to the wildcard address ("0.0.0.0") on all of its ports. | | false | amon_bind_wildcard | false |
Activity Monitor Web UI Port | Port for Activity Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8087 | firehose_debug_port | false |
Activity Monitor Web UI HTTPS Port | Port for Activity Monitor's HTTPS Debug page. | debug.servlet.https.port | 9087 | firehose_debug_tls_port | false |
Activity Monitor Listen Port | Port where Activity Monitor is listening for agent messages. | firehose.server.port | 9999 | firehose_listen_port | false |
Activity Monitor Nozzle Port | Port where Activity Monitor's query API is exposed. | nozzle.server.port | 9998 | firehose_nozzle_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Activity Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | | 1 GiB | firehose_heapsize | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
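The Cgroup CPU Shares and Cgroup I/O Weight rows state allowed ranges, and CPU shares only matter relative to the shares of other busy processes on the host. The Python sketch below checks the documented ranges and estimates a role's CPU fraction under contention. Both helpers are hypothetical, and the proportional-share formula is a simplification of how the Linux scheduler treats `cpu.shares`.

```python
def validate_cgroup_settings(cpu_shares=1024, io_weight=500):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not 2 <= cpu_shares <= 262144:
        problems.append("rm_cpu_shares must be between 2 and 262144")
    if not 100 <= io_weight <= 1000:
        problems.append("rm_io_weight must be between 100 and 1000")
    return problems


def cpu_fraction_under_contention(role_shares, all_shares):
    """Approximate CPU fraction for one role when every listed process is busy."""
    return role_shares / sum(all_shares)
```

For instance, a role with 1024 shares competing against processes holding 1024 and 2048 shares would receive roughly a quarter of the CPU time under this model.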
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Kerberos Principal | Kerberos principal used by the Activity Monitor. Note: Activity Monitor should always use the principal used by the Hue service. | | hue | kerberos_role_princ_name | true |
Enable TLS/SSL for Firehose Debug Server | Encrypt communication between clients and Firehose Debug Server using Transport Layer Security (TLS) (formerly known as Secure Socket Layer (SSL)). | debug.servlet.https.enabled | false | ssl_enabled | false |
Firehose Debug Server TLS/SSL Server JKS Keystore File Location | The path to the TLS/SSL keystore file containing the server certificate and private key used for TLS/SSL. Used when Firehose Debug Server is acting as a TLS/SSL server. The keystore must be in JKS format. | debug.servlet.https.keystorePath | | ssl_server_keystore_location | false |
Firehose Debug Server TLS/SSL Server JKS Keystore File Password | The password for the Firehose Debug Server JKS keystore file. | debug.servlet.https.keystorePassword | | ssl_server_keystore_password | false |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | | stacks_collection_directory | false |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Parameter Validation: Activity Monitor Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Environment Advanced Configuration Snippet (Safety Valve) parameter. | | false | role_config_suppression_activitymonitor_role_env_safety_valve | true |
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | | false | role_config_suppression_cdh_version_validator | true |
Suppress Parameter Validation: Activity Monitor Database Hostname | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Database Hostname parameter. | | false | role_config_suppression_firehose_database_host | true |
Suppress Parameter Validation: Activity Monitor Database Name | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Database Name parameter. | | false | role_config_suppression_firehose_database_name | true |
Suppress Parameter Validation: Activity Monitor Database Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Database Password parameter. | | false | role_config_suppression_firehose_database_password | true |
Suppress Parameter Validation: Activity Monitor Database Username | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Database Username parameter. | | false | role_config_suppression_firehose_database_user | true |
Suppress Parameter Validation: Java Configuration Options for Activity Monitor | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Activity Monitor parameter. | | false | role_config_suppression_firehose_java_opts | true |
Suppress Parameter Validation: Activity Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf parameter. | | false | role_config_suppression_firehose_safety_valve | true |
Suppress Parameter Validation: Activity Monitor Kerberos Principal | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Kerberos Principal parameter. | | false | role_config_suppression_kerberos_role_princ_name | true |
Suppress Parameter Validation: Activity Monitor Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Logging Advanced Configuration Snippet (Safety Valve) parameter. | | false | role_config_suppression_log4j_safety_valve | true |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | | false | role_config_suppression_log_event_whitelist | true |
Suppress Parameter Validation: Activity Monitor Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Activity Monitor Log Directory parameter. | | false | role_config_suppression_mgmt_log_dir | true |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | | false | role_config_suppression_oom_heap_dump_dir | true |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | | false | role_config_suppression_role_triggers | true |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Location parameter. | | false | role_config_suppression_ssl_server_keystore_location | true |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Password parameter. | | false | role_config_suppression_ssl_server_keystore_password | true |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | | false | role_config_suppression_stacks_collection_directory | true |
Suppress Health Test: Activity Monitor Pipeline | Whether to suppress the results of the Activity Monitor Pipeline health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_activity_monitor_pipeline | true |
Suppress Health Test: Activity Tree Pipeline | Whether to suppress the results of the Activity Tree Pipeline health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_activity_tree_pipeline | true |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_audit_health | true |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_file_descriptor | true |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_heap_dump_directory_free_space | true |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_host_health | true |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_log_directory_free_space | true |
Suppress Health Test: Pause Duration | Whether to suppress the results of the Pause Duration health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_pause_duration | true |
Suppress Health Test: Cloudera Manager Descriptor Age | Whether to suppress the results of the Cloudera Manager Descriptor Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_scm_descriptor_fetch | true |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_scm_health | true |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_swap_memory_usage | true |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_unexpected_exits | true |
Suppress Health Test: Web Server Status | Whether to suppress the results of the Web Server Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | | false | role_health_suppression_activity_monitor_web_metric_collection | true |
Alert Publisher
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Alert Publisher | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | | | alertpublisher_java_opts | false |
Alert Publisher Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | | | ALERTPUBLISHER_role_env_safety_valve | false |
Alert Publisher Advanced Configuration Snippet (Safety Valve) for alertpublisher.conf | For advanced use only. A string to be inserted into alertpublisher.conf for this role only. | | | alertpublisher_safety_valve | false |
Alert Publisher Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | | | log4j_safety_valve | false |
Heap Dump Directory | Path to directory where heap dumps are generated when java.lang.OutOfMemoryError error is thrown. This directory is automatically created if it does not exist. If this directory already exists, role user must have write access to this directory. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | The Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true |
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alert Publisher Logging Threshold | The minimum log level for Alert Publisher logs | INFO | log_threshold | false | |
Alert Publisher Maximum Log File Backups | The maximum number of rolled log files to keep for Alert Publisher logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Alert Publisher Max Log Size | The maximum size, in megabytes, per log file for Alert Publisher logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Alert Publisher Log Directory | Directory where Alert Publisher will place its log files. | /var/log/cloudera-scm-alertpublisher | mgmt_log_dir | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | alertpublisher_fd_thresholds | false | |
Alert Publisher Host Health Test | When computing the overall Alert Publisher health, consider the host's health. | true | alertpublisher_host_health_enabled | false | |
Alert Publisher Process Health Test | Enables the health test that the Alert Publisher's process state is consistent with the role configuration | true | alertpublisher_scm_health_enabled | false | |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of a set of fields; the default value uses alert, rate, periodminutes, threshold, and exceptiontype. | {"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. | [] | role_triggers | true | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
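The "first matching rule is used" behavior described for Rules to Extract Events from Log Files can be sketched as follows. This is an illustration only, not Cloudera Manager's implementation: the log level ordering, the helper names, the strict-JSON quoting of the default value, and the interpretation of `threshold` (message level at or above it) and `exceptiontype` (regular expression over the exception class name) are all assumptions.

```python
import json
import re

# Log levels from least to most severe (assumed ordering for this sketch).
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

# The default rules written as strict JSON. The quoting is reconstructed
# and may differ from what Cloudera Manager actually stores.
DEFAULT_RULES = json.loads("""
{
  "version": "0",
  "rules": [
    {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}
  ]
}
""")


def rule_matches(rule, level, exception_type=None):
    """Return True if every condition present in the rule holds (assumed semantics)."""
    if "threshold" in rule:
        # Assumption: a message matches when its level is at or above the threshold.
        if LEVELS.index(level) < LEVELS.index(rule["threshold"]):
            return False
    if "exceptiontype" in rule:
        # Assumption: the pattern must match the whole exception class name.
        if exception_type is None:
            return False
        if not re.fullmatch(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, exception_type=None):
    """Evaluate the rules in order; return the index of the first match, or None."""
    for index, rule in enumerate(rules["rules"]):
        if rule_matches(rule, level, exception_type):
            return index
    return None
```

Under these assumptions, an ERROR message carrying a `java.io.IOException` is handled by the second rule (index 1) even though the WARN rule would also match, because evaluation stops at the first match.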
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alerts: Enable Email Alerts | This setting allows you to turn email alert delivery on and off. | mailserver.enabled | true | alert_mailserver_enabled | false |
Alert: Mail From Address | The 'From' address to use for alert emails | noreply@localhost | alert_mailserver_from_address | false | |
Alerts: Mail Server Hostname | The IP address or hostname of the mail server to send alerts to | localhost | alert_mailserver_hostname | true | |
Alerts: Mail Server Password | The password to use to log into the mail server. Warning: this password will be sent over the network to the Alert Publisher host in clear text. In addition, the password will be stored in a plain text file on the Alert Publisher host with restrictive file system permissions. | alert_mailserver_password | false | ||
Alerts: Mail Server Protocol | The protocol to use for sending email alerts. | smtp | alert_mailserver_protocol | true | |
Alerts: Mail Message Recipients | A comma-separated list of email addresses to send alerts to | root@localhost | alert_mailserver_recipients | true | |
Alerts: Mail Server Username | The username to use to log into the mail server | alert_mailserver_username | false | ||
Custom Alert Script | If configured, this script is invoked on the machine hosting the alert publisher role. The script must be readable and executable by the cloudera-scm user. The script is passed, as a single argument, a path to a UTF-8 JSON file containing a list of alerts. Alerts are, by default, batched over time, and the batch size and the batch interval are configurable with the "Alert Publisher: Maximum Batch Size" and "Alert Publisher: Maximum Batch Interval" configuration options. The alerts file is deleted when the script finishes executing. Only one instance of this script is invoked at any given time, and the script must terminate. The standard out and standard error messages from this script are logged to the alert publisher role's log file. | alert.script.path | alert_script_path | false | |
Alert Publisher: Maximum Batch Size | The Alert Publisher can be configured to batch multiple alerts into a single email. This setting specifies the maximum number of alerts that will be batched into a single email (regardless of the batch interval). | alert.aggregate.maxSize | 32 | alertpublisher_aggregate_max_size | false |
Alert Publisher: Maximum Batch Interval | The Alert Publisher can be configured to batch multiple alerts into a single email. This setting specifies the maximum amount of time (in milliseconds) that the Alert Publisher waits before sending an email of the current batch. | alert.aggregate.timeout.millis | 1 minute(s) | alertpublisher_aggregate_timeout | false |
Alerts: Email footer | Optional. If not empty, the text entered here will be inserted verbatim as a footer in HTML and plain-text emails. | alert.email.footer | alertpublisher_email_footer | false | |
Alerts: Email header | Optional. If not empty, the text entered here will be inserted verbatim as a header in HTML and plain-text emails. | alert.email.header | alertpublisher_email_header | false | |
Alerts: Mail Message Format | The format of the email alert message. The 'JSON' format is easy for scripts/programs to parse. The 'HTML' and 'text' formats are designed to be easily read by people. | mail.format | html | mail_format | true |
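A minimal sketch of a script that could be configured as the Custom Alert Script is shown below. It relies only on what the description states: the script receives one argument, the path to a UTF-8 JSON file containing a list of alerts, and it must terminate. The function names and the printed message are hypothetical, and the sketch deliberately does not assume any field names inside an individual alert.

```python
#!/usr/bin/env python3
import json
import sys


def load_alerts(path):
    """Read the UTF-8 JSON alerts file and return its top-level list."""
    with open(path, encoding="utf-8") as handle:
        alerts = json.load(handle)
    if not isinstance(alerts, list):
        raise ValueError("expected a JSON list of alerts")
    return alerts


def main(path):
    """Process one batch of alerts and return a process exit code."""
    alerts = load_alerts(path)
    # Standard output and standard error are logged to the
    # Alert Publisher role's log file, so a short summary is enough.
    print(f"received {len(alerts)} alert(s)")
    return 0


# The script is invoked with exactly one argument (the alerts file path);
# this sketch does nothing when called any other way.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Because the alerts file is deleted when the script finishes and only one instance runs at a time, a script like this should copy anything it needs out of the file and return promptly rather than wait on long-running work.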
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alerts: Mail Server TCP Port | Optional. The TCP port where the mail server is listening. If not specified, defaults to 25 if SMTP is selected, or 465 if SMTPS is selected. | alert_mailserver_port | false | ||
Alerts: Listen Port | Port where the Alert Publisher listens for internal API requests. | alertpublisher.internalapi.port | 10101 | alertpublisher_internalapi_port | false |
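The port fallback described for Alerts: Mail Server TCP Port can be written as a small lookup. This is a sketch; the function and constant names are hypothetical, and the protocol strings `smtp` and `smtps` are assumed to be the values of `alert_mailserver_protocol`.

```python
# Defaults taken from the description above: 25 for SMTP, 465 for SMTPS.
DEFAULT_MAIL_PORTS = {"smtp": 25, "smtps": 465}


def effective_mail_port(protocol, configured_port=None):
    """Return the configured port, or the protocol default when none is set."""
    if configured_port is not None:
        return configured_port
    return DEFAULT_MAIL_PORTS[protocol.lower()]
```

For example, `effective_mail_port("smtps")` yields 465, while an explicitly configured port always takes precedence over the protocol default.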
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Alert Publisher in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 256 MiB | alert_heapsize | false | |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
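The ranges stated in the Resource Management table can be checked before values are submitted. The sketch below uses hypothetical helper names; it validates the documented bounds for `rm_cpu_shares` and `rm_io_weight` and shows the byte value that a 256 MiB heap corresponds to when passed to `-Xmx`.

```python
MIB = 1024 * 1024


def validate_cgroup_settings(cpu_shares=1024, io_weight=500):
    """Return a list of problems found; an empty list means both values are in range."""
    errors = []
    if not 2 <= cpu_shares <= 262144:
        errors.append("rm_cpu_shares must be between 2 and 262144")
    if not 100 <= io_weight <= 1000:
        errors.append("rm_io_weight must be between 100 and 1000")
    return errors


def xmx_flag(heap_bytes):
    """Format a heap size in bytes as a Java -Xmx argument."""
    return f"-Xmx{heap_bytes}"
```

With the defaults from the table, `validate_cgroup_settings()` reports no problems, and `xmx_flag(256 * MIB)` produces `-Xmx268435456`.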
SNMP
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
SNMP Authentication Protocol Pass Phrase | Pass phrase to use for SNMP authentication protocol | alert.snmp.auth.password | alert_snmp_auth_password | false | |
SNMP Authentication Protocol | Authentication algorithm to use for authentication | alert.snmp.auth.protocol | SHA | alert_snmp_auth_protocol | false |
SNMPv2 Community String | Community string to use to identify this service. Generated SNMPv2 traps will use this string for authentication purposes. | alert.snmp.community | alert_snmp_community | false | |
SNMP Retry Count | Number of times to retry before the trap is timed out. If this value is set to '0', the trap is sent only once. | alert.snmp.retries | 0 | alert_snmp_retries | true |
SNMP Server Engine Id | Engine Id to use for authentication and privacy. Engine Id is normally a hexadecimal number (e.g. 8000173e03a0c095f80c68). Engine Id along with pass phrases are used to generate keys for authentication and privacy protocols. | alert.snmp.security.engineid | alert_snmp_security_engineid | false | |
SNMP Security Level | Level of security to use for the SNMPv3 protocol. Currently only 'no authentication' and 'authentication with no privacy' are supported. Select 'SNMPv2' to use 'Community String' based SNMPv2 authentication. | alert.snmp.security.level | SNMPv2 | alert_snmp_security_level | true |
SNMP NMS Hostname | Hostname of the SNMP NMS (network management software). It can be a DNS name or IP address of the host listening for SNMP traps and notifications. For reference, see the Cloudera Manager SNMP MIB. | alert.snmp.server.hostname | alert_snmp_server_hostname | false | |
SNMP Server Port | Port number on which SNMP server is listening. | alert.snmp.server.port | 162 | alert_snmp_server_port | true |
SNMP Timeout | Time to wait before an SNMP trap is resent or timed out. | alert.snmp.timeout | 5 second(s) | alert_snmp_timeout | true |
SNMP Security UserName | Name of a user to use for SNMP security. | alert.snmp.username | alert_snmp_username | false |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Parameter Validation: Alert: Mail From Address | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alert: Mail From Address parameter. | false | role_config_suppression_alert_mailserver_from_address | true | |
Suppress Parameter Validation: Alerts: Mail Server Hostname | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Mail Server Hostname parameter. | false | role_config_suppression_alert_mailserver_hostname | true | |
Suppress Parameter Validation: Alerts: Mail Server Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Mail Server Password parameter. | false | role_config_suppression_alert_mailserver_password | true | |
Suppress Parameter Validation: Alerts: Mail Message Recipients | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Mail Message Recipients parameter. | false | role_config_suppression_alert_mailserver_recipients | true | |
Suppress Parameter Validation: Alerts: Mail Server Username | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Mail Server Username parameter. | false | role_config_suppression_alert_mailserver_username | true | |
Suppress Parameter Validation: Custom Alert Script | Whether to suppress configuration warnings produced by the built-in parameter validation for the Custom Alert Script parameter. | false | role_config_suppression_alert_script_path | true | |
Suppress Parameter Validation: SNMP Authentication Protocol Pass Phrase | Whether to suppress configuration warnings produced by the built-in parameter validation for the SNMP Authentication Protocol Pass Phrase parameter. | false | role_config_suppression_alert_snmp_auth_password | true | |
Suppress Parameter Validation: SNMPv2 Community String | Whether to suppress configuration warnings produced by the built-in parameter validation for the SNMPv2 Community String parameter. | false | role_config_suppression_alert_snmp_community | true | |
Suppress Parameter Validation: SNMP Server Engine Id | Whether to suppress configuration warnings produced by the built-in parameter validation for the SNMP Server Engine Id parameter. | false | role_config_suppression_alert_snmp_security_engineid | true | |
Suppress Parameter Validation: SNMP NMS Hostname | Whether to suppress configuration warnings produced by the built-in parameter validation for the SNMP NMS Hostname parameter. | false | role_config_suppression_alert_snmp_server_hostname | true | |
Suppress Parameter Validation: SNMP Security UserName | Whether to suppress configuration warnings produced by the built-in parameter validation for the SNMP Security UserName parameter. | false | role_config_suppression_alert_snmp_username | true | |
Suppress Parameter Validation: Alerts: Email footer | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Email footer parameter. | false | role_config_suppression_alertpublisher_email_footer | true | |
Suppress Parameter Validation: Alerts: Email header | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alerts: Email header parameter. | false | role_config_suppression_alertpublisher_email_header | true | |
Suppress Parameter Validation: Java Configuration Options for Alert Publisher | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Alert Publisher parameter. | false | role_config_suppression_alertpublisher_java_opts | true | |
Suppress Parameter Validation: Alert Publisher Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alert Publisher Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_alertpublisher_role_env_safety_valve | true | |
Suppress Parameter Validation: Alert Publisher Advanced Configuration Snippet (Safety Valve) for alertpublisher.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alert Publisher Advanced Configuration Snippet (Safety Valve) for alertpublisher.conf parameter. | false | role_config_suppression_alertpublisher_safety_valve | true | |
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Parameter Validation: Alert Publisher Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alert Publisher Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Alert Publisher Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Alert Publisher Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Configuration Validator: SNMP Validator | Whether to suppress configuration warnings produced by the SNMP Validator configuration validator. | false | role_config_suppression_snmp_validator | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_audit_health | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_file_descriptor | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_heap_dump_directory_free_space | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_host_health | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_log_directory_free_space | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_scm_health | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_alert_publisher_unexpected_exits | true |
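The API names in the Suppressions table follow a visible pattern: parameter-validation suppressions prepend `role_config_suppression_` to the lower-cased API name of the parameter, and health-test suppressions prepend `role_health_suppression_` to a health test identifier. The helpers below are a sketch of that observed pattern with hypothetical function names; they are not a documented Cloudera Manager API, and any name they produce should be checked against the table.

```python
def parameter_suppression_name(api_name):
    """Derive the suppression API name for a parameter's built-in validation."""
    return "role_config_suppression_" + api_name.lower()


def health_suppression_name(health_test_id):
    """Derive the suppression API name for a health test identifier."""
    return "role_health_suppression_" + health_test_id
```

For instance, `parameter_suppression_name("alert_script_path")` gives `role_config_suppression_alert_script_path`, which matches the Custom Alert Script row above.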
Event Server
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Event Server | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | eventserver_java_opts | false | ||
Maximum Number of Events Returned by Any Query | The maximum number of events that any query can return. Note: A high value can increase the amount of memory required by Event Server, as well as affect query response times. | eventcatcher.max.query.events | 10000 | eventserver_max_query_events | true |
Maximum Write Queue Length | The maximum number of events that can be queued for write before further requests are rejected | eventcatcher.ingest.pipeline.max | 10000 | eventserver_max_write_queue_size | true |
Number of Core Event Writer Threads | The number of threads that Event Server will use to write events to its store concurrently | eventcatcher.num.ingest.threads | 2 | eventserver_num_pipeline_threads | true |
Event Server Query Timeout | The amount of time, in milliseconds, that Cloudera Manager and the Alert Publisher will wait for the Event Server to respond to a query. | eventserver.query.timeout | 60000 | eventserver_query_timeout | false |
Event Server Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | EVENTSERVER_role_env_safety_valve | false | ||
Event Server Advanced Configuration Snippet (Safety Valve) for eventserver.conf | For advanced use only. A string to be inserted into eventserver.conf for this role only. | eventserver_safety_valve | false | ||
Event Server Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when a java.lang.OutOfMemoryError is thrown. This directory is automatically created if it does not exist. If this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | The Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true |
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Server Logging Threshold | The minimum log level for Event Server logs | INFO | log_threshold | false | |
Event Server Maximum Log File Backups | The maximum number of rolled log files to keep for Event Server logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Event Server Max Log Size | The maximum size, in megabytes, per log file for Event Server logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Event Server Log Directory | Directory where Event Server will place its log files. | /var/log/cloudera-scm-eventserver | mgmt_log_dir | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Event Store Capacity Monitoring Thresholds | The health test thresholds on the number of events in the event store. Specified as a percentage of the maximum number of events in Event Server store. | Warning: 115.0 %, Critical: 130.0 % | eventserver_capacity_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | eventserver_fd_thresholds | false | |
Garbage Collection Duration Thresholds | The health test thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | eventserver_gc_duration_thresholds | false | |
Garbage Collection Duration Monitoring Period | The period to review when computing the moving average of garbage collection time. | 5 minute(s) | eventserver_gc_duration_window | false | |
Event Server Host Health Test | When computing the overall Event Server health, consider the host's health. | true | eventserver_host_health_enabled | false | |
Event Server Index Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Event Server Index Directory. | Warning: 10 GiB, Critical: 5 GiB | eventserver_index_directory_free_space_absolute_thresholds | false | |
Event Server Index Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Event Server Index Directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if an Event Server Index Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | eventserver_index_directory_free_space_percentage_thresholds | false | |
Event Server Process Health Test | Enables the health test that the Event Server's process state is consistent with the role configuration | true | eventserver_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | eventserver_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | eventserver_web_metric_collection_thresholds | false | |
Event Server Write Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Event Server write pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | eventserver_write_pipeline_thresholds | false | |
Event Server Write Pipeline Monitoring Time Period | The time period over which the Event Server write pipeline will be monitored for dropped messages. | 5 minute(s) | eventserver_write_pipeline_window | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the fields that appear in the default value, such as alert, rate, periodminutes, threshold, and exceptiontype. | {"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | role_triggers | true | |
Cloudera Manager Descriptor Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager descriptor was last refreshed. | Warning: 60000.0, Critical: 120000.0 | scm_descriptor_age_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
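
The first-match behaviour described for Rules to Extract Events from Log Files can be sketched in Python. This is only an illustration, not the appender's actual implementation. It assumes that `threshold` is a minimum log level and that `exceptiontype` is a regular expression matched against the exception class name, and it ignores `rate` and `periodminutes`. The names `rule_matches`, `first_matching_rule`, and `LEVELS` are hypothetical, and the quoting of the default value is assumed.

```python
import json
import re

# Event Server default rule set from the table above, transcribed as JSON
# (quoting is assumed; the table shows the value without quotes or braces).
DEFAULT_RULES = json.loads("""
{"version": "0", "rules": [
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
  {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}
]}
""")["rules"]

# Assumed log4j-style severity ordering (higher is more severe).
LEVELS = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}


def rule_matches(rule, level, exception_type=None):
    """Return True if every condition present in the rule is satisfied."""
    if "threshold" in rule and LEVELS[level] < LEVELS[rule["threshold"]]:
        return False
    if "exceptiontype" in rule:
        if exception_type is None:
            return False
        if not re.fullmatch(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, exception_type=None):
    """Evaluate rules in order; return the index of the first match, or None."""
    for index, rule in enumerate(rules):
        if rule_matches(rule, level, exception_type):
            return index
    return None
```

Under these assumptions, an ERROR message carrying an exception is handled by the second rule even though the third rule would also match, because evaluation stops at the first match.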
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alert On Transitions Out of Alerting Health | If set, health events for transitions out of an alertable health level are also considered alerts. For example, consider an entity that is configured to alert when it has bad health. If that entity's health becomes bad, an alert is generated. If this setting is enabled, an alert is also generated when the entity returns to good health. If this setting is disabled, no alert is generated when it returns to good health. Note that an entity must have enable_alerts set to true for health alerts to be generated for it, so also check the per-entity setting that turns on health alerts. | false | eventserver_alert_on_transition_out_of_alerting_health_enabled | false | |
Health Alert Threshold | Threshold at which a health event will be considered an alert. Note that an entity must have enable_alerts set to true for health alerts to be generated for it, so also check the per-entity setting that turns on health alerts. | Bad | eventserver_health_events_alert_threshold | false | |
Event Server Index Directory | Location of the Lucene index for Event Server | eventcatcher.server.lucenedir | /var/lib/cloudera-scm-eventserver | eventserver_index_dir | false |
Maximum Number of Events in the Event Server Store | The maximum size of the Event Server store, in events. Once this size is exceeded, events are deleted, starting with the oldest, until the size of the store returns below this threshold. | eventcatcher.event.capacity | 5000000 | eventserver_max_index_size | true |
Descriptor Fetch Tries Interval | The interval between fetch tries for SCM descriptor when Cloudera Management Service roles are starting. | mgmt.descriptor.fetch.frequency | 2 second(s) | mgmt_descriptor_fetch_frequency | true |
Descriptor Fetch Max Tries | Maximum number of tries to fetch SCM descriptor when Cloudera Management Service roles are starting. If the roles are not able to get the descriptor in this many tries, then they exit. | mgmt.num.descriptor.fetch.tries | 5 | mgmt_num_descriptor_fetch_tries | true |
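
The oldest-first deletion policy described for Maximum Number of Events in the Event Server Store can be illustrated with a bounded queue. This is a toy sketch: the real store is a Lucene index, and the helper name `make_event_store` is hypothetical.

```python
from collections import deque


def make_event_store(capacity=5_000_000):
    """Return a bounded store that discards the oldest event once full."""
    # A deque with maxlen silently drops items from the opposite end
    # when a new item is appended to a full deque.
    return deque(maxlen=capacity)


store = make_event_store(capacity=3)
for event_id in range(5):
    store.append(event_id)  # events 0 and 1 are evicted as 3 and 4 arrive
```

With a capacity of 3, only the three most recent events remain after five are appended.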
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Server Web UI Port | Port for the Event Server's Debug page. Set to -1 to disable debug server. | eventcatcher.server.debug.port | 8084 | eventserver_debug_port | false |
Event Query Port | Port on which the Event Server listens for queries for events. | eventcatcher.server.httpport | 7185 | eventserver_http_port | false |
Event Publish Port | Port on which the Event Server listens for the publication of events. | eventcatcher.server.port | 7184 | eventserver_listen_port | false |
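
The API Name column gives the key under which each setting is addressed programmatically. As a hedged sketch, the helper below builds a JSON body in the name/value item-list shape that the Cloudera Manager REST API uses for configuration updates. The helper name `config_update_payload` is hypothetical, and the endpoint and API version to send the body to are deliberately not shown; take them from the API documentation for your release.

```python
import json


def config_update_payload(settings):
    """Build an item-list JSON body from {api_name: value} pairs.

    Values are sent as strings, which is an assumption of this sketch.
    """
    items = [{"name": name, "value": str(value)} for name, value in settings.items()]
    return json.dumps({"items": items})


# Example: disable the Event Server debug server by setting its port to -1.
payload = config_update_payload({"eventserver_debug_port": -1})
```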
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of EventServer in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | event_server_heapsize | false | |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
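
To make the effect of Cgroup CPU Shares concrete, the sketch below computes the approximate fraction of CPU time a role receives when every competing process group is busy. It assumes the usual proportional-share behaviour of `cpu.shares`; the function name `cpu_fraction` is hypothetical, and the result has no meaning when the host is not under CPU contention.

```python
def cpu_fraction(role_shares, competing_shares):
    """Approximate CPU fraction for a role under full contention.

    role_shares: cpu.shares assigned to this role.
    competing_shares: cpu.shares of the other busy groups on the host.
    """
    for shares in [role_shares, *competing_shares]:
        # Range taken from the Cgroup CPU Shares description above.
        if not 2 <= shares <= 262144:
            raise ValueError(f"cpu.shares out of range: {shares}")
    return role_shares / (role_shares + sum(competing_shares))
```

For example, a role left at the default of 1024 shares that competes with one other group at 1024 shares would get about half of the CPU; raising the role to 2048 shares against two such groups gives the same half.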
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Parameter Validation: Event Server Index Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Event Server Index Directory parameter. | false | role_config_suppression_eventserver_index_dir | true | |
Suppress Parameter Validation: Java Configuration Options for Event Server | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Event Server parameter. | false | role_config_suppression_eventserver_java_opts | true | |
Suppress Parameter Validation: Event Server Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Event Server Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_eventserver_role_env_safety_valve | true | |
Suppress Parameter Validation: Event Server Advanced Configuration Snippet (Safety Valve) for eventserver.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Event Server Advanced Configuration Snippet (Safety Valve) for eventserver.conf parameter. | false | role_config_suppression_eventserver_safety_valve | true | |
Suppress Parameter Validation: Event Server Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Event Server Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Event Server Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Event Server Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_audit_health | true | |
Suppress Health Test: Event Store Size | Whether to suppress the results of the Event Store Size health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_event_store_size | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_file_descriptor | true | |
Suppress Health Test: GC Duration | Whether to suppress the results of the GC Duration health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_gc_duration | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_heap_dump_directory_free_space | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_host_health | true | |
Suppress Health Test: Event Server Index Directory Free Space | Whether to suppress the results of the Event Server Index Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_index_directory_free_space | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_log_directory_free_space | true | |
Suppress Health Test: Cloudera Manager Descriptor Age | Whether to suppress the results of the Cloudera Manager Descriptor Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_scm_descriptor_fetch | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_scm_health | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_unexpected_exits | true | |
Suppress Health Test: Web Server Status | Whether to suppress the results of the Web Server Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_web_metric_collection | true | |
Suppress Health Test: Write Pipeline | Whether to suppress the results of the Write Pipeline health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_event_server_write_pipeline | true |
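
The effect of the Suppress Health Test settings can be sketched as follows: suppressed results are left out before the overall health is computed. The worst-result-wins aggregation, the `Health` ordering, and the test names used here are assumptions made for illustration, not Cloudera Manager's documented algorithm.

```python
from enum import IntEnum


class Health(IntEnum):
    GOOD = 0
    CONCERNING = 1
    BAD = 2


def overall_health(results, suppressed):
    """Return the worst health among the tests that are not suppressed.

    results: mapping of health test name to Health.
    suppressed: set of health test names whose results are ignored.
    """
    considered = [health for name, health in results.items() if name not in suppressed]
    # With no tests left to consider, this sketch reports GOOD.
    return max(considered, default=Health.GOOD)


results = {"gc_duration": Health.BAD, "swap_memory_usage": Health.CONCERNING}
```

Here `overall_health(results, set())` is `Health.BAD`, while suppressing `gc_duration` leaves `Health.CONCERNING` as the overall result.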
Host Monitor
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Host Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | firehose_java_opts | false | ||
Host Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only. A string to be inserted into cmon.conf for this role only. | firehose_safety_valve | false | ||
Host Monitor Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | HOSTMONITOR_role_env_safety_valve | false | ||
Host Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to directory where heap dumps are generated when java.lang.OutOfMemoryError error is thrown. This directory is automatically created if it does not exist. If this directory already exists, role user must have write access to this directory. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true | |
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | health.event.publish.queue.max | 20000 | svcmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | health.event.publish.retry.ms | 5000 | svcmon_event_publication_retry_period | true |
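
The interaction between Event Publication Maximum Queue Size and Event Publication Retry Period can be sketched as a bounded buffer: events are dropped once the buffer is full, and delivery is retried from the head of the buffer. The class `EventBuffer` and its methods are hypothetical; the actual Event Publisher is not described beyond the two settings above.

```python
from collections import deque


class EventBuffer:
    """Bounded in-memory buffer for events awaiting delivery."""

    def __init__(self, max_size=20000):
        self.max_size = max_size
        self.dropped = 0
        self._events = deque()

    def publish(self, event):
        """Buffer an event; drop it and return False if the buffer is full."""
        if len(self._events) >= self.max_size:
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def drain(self, deliver):
        """Deliver events in order, stopping at the first failure.

        deliver: callable returning True on success. The caller is expected
        to wait for the retry period (5000 ms by default) before draining again.
        """
        sent = 0
        while self._events:
            if not deliver(self._events[0]):
                break
            self._events.popleft()
            sent += 1
        return sent
```

During an outage `drain` delivers nothing and the buffer fills, after which `publish` starts counting drops until delivery succeeds again.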
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Logging Threshold | The minimum log level for Host Monitor logs | INFO | log_threshold | false | |
Host Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Host Monitor logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Host Monitor Max Log Size | The maximum size, in megabytes, per log file for Host Monitor logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Host Monitor Log Directory | Location of log files for Host Monitor | /var/log/cloudera-scm-firehose | mgmt_log_dir | false |
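
As a rough sizing check, the log settings above bound the disk space the role's logs can occupy. The sketch assumes the common rolling-file behaviour of one active log file plus the configured number of rolled backups, and treats the size as MiB as shown in the Default Value column. The function name is hypothetical.

```python
MIB = 1024 * 1024


def max_log_footprint_bytes(max_log_size_mib=200, max_backups=10):
    """Upper bound on log disk usage: the active file plus all rolled backups."""
    return (max_backups + 1) * max_log_size_mib * MIB


# With the defaults: 11 files of 200 MiB each, that is 2200 MiB.
default_footprint = max_log_footprint_bytes()
```

With the defaults this is about 2.1 GiB, which is worth comparing against the free space thresholds configured for the log directory.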
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Metrics Aggregation Run Duration Thresholds | The health test thresholds for monitoring the metrics aggregation run duration. | Warning: 10 second(s), Critical: 30 second(s) | aggregation_run_duration_thresholds | false | |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Host Monitor Storage Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Host Monitor Storage Directory. | Warning: 10 GiB, Critical: 5 GiB | firehose_storage_directory_free_space_absolute_thresholds | false | |
Host Monitor Storage Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Host Monitor Storage Directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Host Monitor Storage Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | firehose_storage_directory_free_space_percentage_thresholds | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | hostmonitor_fd_thresholds | false | |
Host Monitor Host Health Test | When computing the overall Host Monitor health, consider the host's health. | true | hostmonitor_host_health_enabled | false | |
Host Monitor Host Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Host Monitor host pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | hostmonitor_host_pipeline_thresholds | false | |
Host Monitor Host Pipeline Monitoring Time Period | The time period over which the Host Monitor host pipeline will be monitored for dropped messages. | 5 minute(s) | hostmonitor_host_pipeline_window | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | hostmonitor_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | hostmonitor_pause_duration_window | false | |
Host Monitor Process Health Test | Enables the health test that the Host Monitor's process state is consistent with the role configuration | true | hostmonitor_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | hostmonitor_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | hostmonitor_web_metric_collection_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the fields that appear in the default value, such as alert, rate, periodminutes, threshold, content, and exceptiontype. | {"version": "0", "rules": [{"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Cloudera Manager Metric Schema Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager metric schema was last refreshed. | Warning: 60000.0, Critical: 120000.0 | metric_schema_age_thresholds_name | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | role_triggers | true | |
Cloudera Manager Descriptor Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager descriptor was last refreshed. | Warning: 60000.0, Critical: 120000.0 | scm_descriptor_age_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
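
The free space thresholds in this table come in absolute and percentage pairs, and the percentage setting is not used when an absolute setting is configured. A minimal sketch of that precedence follows. The function name, the string results, and the use of a strict less-than comparison at the boundary are assumptions; `None` stands for a threshold of Never.

```python
GIB = 1024 ** 3


def free_space_health(free_bytes, capacity_bytes, absolute=None, percentage=None):
    """Classify free space as GOOD, CONCERNING, or BAD.

    absolute: (warning_bytes, critical_bytes) or None if not configured.
    percentage: (warning_pct, critical_pct) or None if not configured.
    """
    if absolute is not None:
        # Absolute thresholds take precedence; percentage is ignored.
        warning, critical = absolute
        value = free_bytes
    elif percentage is not None:
        warning, critical = percentage
        value = 100.0 * free_bytes / capacity_bytes
    else:
        return "GOOD"
    if critical is not None and value < critical:
        return "BAD"
    if warning is not None and value < warning:
        return "CONCERNING"
    return "GOOD"


DEFAULT_ABSOLUTE = (10 * GIB, 5 * GIB)  # Warning: 10 GiB, Critical: 5 GiB
```

For instance, with the default absolute thresholds a filesystem with 8 GiB free is CONCERNING and one with 4 GiB free is BAD, whatever percentage thresholds are also set.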
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Use the Authentication Service to enable Single Sign On | Use the Authentication Service to enable Single Sign On for the Firehose debug servers. Requires a running Authentication Service. | debug.servlet.auth.enabled | false | debug_servlet_auth_enabled | false |
Host Monitor Storage Directory | The directory where Host Monitor data is stored. The Host Monitor stores metric time series and health information. | firehose.storage.base.directory | /var/lib/cloudera-host-monitor | firehose_storage_dir | true |
Time-Series Storage | The approximate amount of disk space dedicated to storing time series and health data. Once the store has reached its maximum size, older data is deleted to make room for newer data. The disk usage is approximate because data is deleted only when the limit is reached. Note that Cloudera Manager stores time-series data at a number of different data granularities, and these granularities have different effective retention periods. Specifically, Cloudera Manager stores metric data as both raw data points and ten-minutely, hourly, six-hourly, daily, and weekly summary data points. Raw data consumes the bulk of the allocated storage space, weekly summaries the least. As such, raw data is retained for the shortest amount of time, while weekly summary points are unlikely to ever be deleted. See the "Storage" tab on the 'Host Monitor' -> 'Charts Library' -> 'Host Monitor Storage' page for more information on how space is consumed within the Host Monitor. This tab also shows information about the amount of data retained and time window covered by each data granularity. | firehose_time_series_storage_bytes | 10 GiB | firehose_time_series_storage_bytes | false |
Health Event Startup Policy | This setting controls whether health events are emitted when this monitoring role is started. If set to "none", then no health events are emitted. If set to "bad" then health events are emitted for subjects with bad or concerning health. If set to "all" then health events are emitted for all subjects for all health values. The default is "bad". | health.event.publish.startup.policy | bad | health_event_publish_startup_policy | false |
Descriptor Fetch Tries Interval | The interval between fetch tries for SCM descriptor when Cloudera Management Service roles are starting. | mgmt.descriptor.fetch.frequency | 2 second(s) | mgmt_descriptor_fetch_frequency | true |
Descriptor Fetch Max Tries | Maximum number of tries to fetch SCM descriptor when Cloudera Management Service roles are starting. If the roles are not able to get the descriptor in this many tries, then they exit. | mgmt.num.descriptor.fetch.tries | 5 | mgmt_num_descriptor_fetch_tries | true |
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | health.event.publish.log.suppress.window.ms | 1 minute(s) | svcmon_event_publication_log_suppress_window | true |
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Web UI Port | Port for Host Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8091 | firehose_debug_port | false |
Host Monitor Web UI HTTPS Port | Port for Host Monitor's HTTPS Debug page. | debug.servlet.https.port | 9091 | firehose_debug_tls_port | false |
Host Monitor Listen Port | Port where Host Monitor is listening for agent messages. | firehose.server.port | 9995 | firehose_listen_port | false |
Host Monitor Nozzle Port | Port where Host Monitor's query API is exposed. | nozzle.server.port | 9994 | firehose_nozzle_port | false |
Bind Host Monitor to Wildcard Address | If enabled, the Host Monitor binds to the wildcard address ("0.0.0.0") on all of its ports. | false | hmon_bind_wildcard | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Host Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | firehose_heapsize | false | |
Maximum Non-Java Memory of Host Monitor | The amount of memory the Host Monitor can use off of the Java heap. | firehose_non_java_memory_bytes | 2 GiB | firehose_non_java_memory_bytes | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
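The ranges stated for the cgroup settings in this table can be checked before a value is submitted. The following is a minimal Python sketch, not part of Cloudera Manager: the function name is hypothetical, the bounds are copied from the Cgroup CPU Shares and Cgroup I/O Weight descriptions, and treating any negative memory limit other than -1 as invalid is an assumption.

```python
# Bounds copied from the descriptions in the Resource Management table.
CPU_SHARES_RANGE = (2, 262144)
IO_WEIGHT_RANGE = (100, 1000)


def validate_cgroup_settings(cpu_shares, io_weight,
                             memory_hard_limit=-1, memory_soft_limit=-1):
    """Return a list of problems; an empty list means the values look valid.

    Hypothetical helper for illustration only.
    """
    problems = []

    low, high = CPU_SHARES_RANGE
    if not low <= cpu_shares <= high:
        problems.append(f"cpu.shares must be between {low} and {high}, got {cpu_shares}")

    low, high = IO_WEIGHT_RANGE
    if not low <= io_weight <= high:
        problems.append(f"blkio.weight must be between {low} and {high}, got {io_weight}")

    limits = (("memory.limit_in_bytes", memory_hard_limit),
              ("memory.soft_limit_in_bytes", memory_soft_limit))
    for name, value in limits:
        # -1 means "no limit"; other negative values are rejected (assumption).
        if value < -1:
            problems.append(f"{name} must be -1 (no limit) or non-negative, got {value}")

    return problems


print(validate_cgroup_settings(1024, 500))   # defaults from the table
print(validate_cgroup_settings(1, 2000))     # both values out of range
```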
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable TLS/SSL for Firehose Debug Server | Encrypt communication between clients and Firehose Debug Server using Transport Layer Security (TLS) (formerly known as Secure Sockets Layer (SSL)). | debug.servlet.https.enabled | false | ssl_enabled | false |
Firehose Debug Server TLS/SSL Server JKS Keystore File Location | The path to the TLS/SSL keystore file containing the server certificate and private key used for TLS/SSL. Used when Firehose Debug Server is acting as a TLS/SSL server. The keystore must be in JKS format. | debug.servlet.https.keystorePath | ssl_server_keystore_location | false | |
Firehose Debug Server TLS/SSL Server JKS Keystore File Password | The password for the Firehose Debug Server JKS keystore file. | debug.servlet.https.keystorePassword | ssl_server_keystore_password | false |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Configuration Validator: Host Monitor Heap Size Validator | Whether to suppress configuration warnings produced by the Host Monitor Heap Size Validator configuration validator. | false | role_config_suppression_firehose_host_monitor_heap_role_validator | true | |
Suppress Configuration Validator: Host Monitor Off Heap Memory Size Validator | Whether to suppress configuration warnings produced by the Host Monitor Off Heap Memory Size Validator configuration validator. | false | role_config_suppression_firehose_host_monitor_non_java_memory_role_validator | true | |
Suppress Parameter Validation: Java Configuration Options for Host Monitor | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Host Monitor parameter. | false | role_config_suppression_firehose_java_opts | true | |
Suppress Parameter Validation: Host Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Host Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf parameter. | false | role_config_suppression_firehose_safety_valve | true | |
Suppress Parameter Validation: Host Monitor Storage Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Host Monitor Storage Directory parameter. | false | role_config_suppression_firehose_storage_dir | true | |
Suppress Parameter Validation: Host Monitor Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Host Monitor Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_hostmonitor_role_env_safety_valve | true | |
Suppress Parameter Validation: Host Monitor Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Host Monitor Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Host Monitor Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Host Monitor Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Location parameter. | false | role_config_suppression_ssl_server_keystore_location | true | |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Password parameter. | false | role_config_suppression_ssl_server_keystore_password | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Health Test: Metrics Aggregation Run Duration Test | Whether to suppress the results of the Metrics Aggregation Run Duration Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_aggregation_run_duration | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_audit_health | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_file_descriptor | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_heap_dump_directory_free_space | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_host_health | true | |
Suppress Health Test: Host Pipeline | Whether to suppress the results of the Host Pipeline health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_host_pipeline | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_log_directory_free_space | true | |
Suppress Health Test: Cloudera Manager Metric Schema Age | Whether to suppress the results of the Cloudera Manager Metric Schema Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_metric_schema_fetch | true | |
Suppress Health Test: Pause Duration | Whether to suppress the results of the Pause Duration health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_pause_duration | true | |
Suppress Health Test: Cloudera Manager Descriptor Age | Whether to suppress the results of the Cloudera Manager Descriptor Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_scm_descriptor_fetch | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_scm_health | true | |
Suppress Health Test: Host Monitor Storage Directory Free Space | Whether to suppress the results of the Host Monitor Storage Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_storage_directory_free_space | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_unexpected_exits | true | |
Suppress Health Test: Web Server Status | Whether to suppress the results of the Web Server Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_host_monitor_web_metric_collection | true |
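The API Name column is the identifier to use when a setting is changed programmatically rather than through the UI. The sketch below only builds a request body and makes no network call. It assumes the target endpoint accepts a list of name/value items, as the Cloudera Manager REST API's configuration endpoints are understood to do; the helper name is hypothetical, and the exact endpoint and payload should be confirmed against the API documentation for your version.

```python
import json


def build_config_items(settings):
    """Turn {api_name: value} into a name/value item list.

    Assumption: values are sent as strings, booleans as "true"/"false".
    Hypothetical helper for illustration only.
    """
    items = []
    for name in sorted(settings):
        value = settings[name]
        if isinstance(value, bool):
            value = "true" if value else "false"
        items.append({"name": name, "value": str(value)})
    return {"items": items}


# API names taken from the Suppressions table above.
suppressions = {
    "role_health_suppression_host_monitor_swap_memory_usage": True,
    "role_config_suppression_oom_heap_dump_dir": False,
}
print(json.dumps(build_config_items(suppressions), indent=2))
```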
Reports Manager
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Extra Space Ratio for Indexing | Reports Manager uses an array to store the HDFS directory tree during indexing. The size of this array is 3 * number of filesystem objects in HDFS * (1 + extra space ratio). Increasing this ratio allows Reports Manager to create the directory tree faster, but consumes more memory. Also, the extra space ratio must be set to a small enough value so that the size of the array is below the maximum allowed in Java, which is 2^31 - 1. | index.space.extra.ratio | 0.2 | headlamp_index_space_extra_ratio | false |
Index Writer Thread Pool Queue Size | Size of the queue to use for holding index writer tasks before they are executed. For faster indexing performance, consider increasing this to a small multiple of the Maximum Index Writer Threads configured value. | index.writer.max.queue.size | 4 | headlamp_index_writer_max_queue_size | false |
Maximum Index Writer Threads | Maximum number of concurrent threads to use when writing the index. For faster indexing performance, consider increasing it to a small multiple of the number of cores on the Reports Manager host. | index.writer.num.threads | 2 | headlamp_index_writer_num_threads | false |
Java Configuration Options for Reports Manager | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | headlamp_java_opts | false | ||
Maximum Document Buffer Size | Amount of memory that can be used for buffering documents before they are flushed to the index. For faster indexing performance, consider increasing this value. | lucene.max.buffer.size.mb | 32 MiB | headlamp_lucene_max_buffer_size_mb | false |
Index Merge Factor | Reports Manager index is built in sections that are merged as the build progresses. This configuration determines how often index sections are merged. With smaller values, less memory is used while indexing, but indexing speed is slower. For faster indexing performance, consider increasing this value. | lucene.merge.factor | 100 | headlamp_lucene_merge_factor | false |
Publish HBase Space Usage | When set, publishes HBase space usage metrics to support HBase usage reporting. This feature is only supported for CDH5+ HBase deployments. | publish.hbase.space | true | headlamp_publish_hbase_metrics | false |
Reports Manager Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when a java.lang.OutOfMemoryError is thrown. This directory is automatically created if it does not exist. If this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting it to false will stop Cloudera Manager agent from publishing any metrics for the corresponding service/roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true | |
Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.db.properties | For advanced use only. A string to be inserted into headlamp.db.properties for this role only. | reportsmanager_db_safety_valve | false | ||
Reports Manager Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | REPORTSMANAGER_role_env_safety_valve | false | ||
Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.conf | For advanced use only. A string to be inserted into headlamp.conf for this role only. | reportsmanager_safety_valve | false |
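The sizing rule given for Extra Space Ratio for Indexing can be worked through numerically. The Python sketch below applies the stated formula, 3 * number of filesystem objects * (1 + extra space ratio), and compares the result with 2^31 - 1. The function names and the object counts are illustrative assumptions, and the limit is treated here as "at most 2^31 - 1".

```python
JAVA_MAX_ARRAY_SIZE = 2**31 - 1  # limit quoted in the description


def index_array_size(num_fs_objects, extra_space_ratio=0.2):
    """Array size per the stated formula: 3 * objects * (1 + ratio)."""
    return 3 * num_fs_objects * (1 + extra_space_ratio)


def max_extra_space_ratio(num_fs_objects):
    """Largest ratio that keeps the array within the limit.

    A negative result means even a ratio of 0 would exceed the limit.
    """
    return JAVA_MAX_ARRAY_SIZE / (3 * num_fs_objects) - 1


# Illustrative object counts, not taken from any real cluster.
for objects in (100_000_000, 600_000_000):
    size = index_array_size(objects)
    fits = size <= JAVA_MAX_ARRAY_SIZE
    print(objects, int(size), "fits" if fits else "too large",
          round(max_extra_space_ratio(objects), 3))
```

With the default ratio of 0.2, 100 million objects stay well under the limit, while 600 million objects do not; for the latter the ratio would have to drop slightly below 0.2.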
Database
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Database Hostname | Name of the host where Reports Manager's database is running. It is highly recommended that this database is on the same host as Reports Manager. If the database is not running on its default port, specify the port number using this syntax: 'host:port' | com.cloudera.headlamp.db.host | localhost | headlamp_database_host | false |
Reports Manager Database Name | The name of the Reports Manager's database. | com.cloudera.headlamp.db.name | headlamp_database_name | true | |
Reports Manager Database Password | The password for Reports Manager's database user account. | com.cloudera.headlamp.db.password | headlamp_database_password | false | |
Reports Manager Database Type | Type of database used for Reports Manager. | com.cloudera.headlamp.db.type | mysql | headlamp_database_type | false |
Reports Manager Database Username | The username to use to log into Reports Manager's database. | com.cloudera.headlamp.db.user | headlamp_database_user | true |
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Logging Threshold | The minimum log level for Reports Manager logs | INFO | log_threshold | false | |
Reports Manager Maximum Log File Backups | The maximum number of rolled log files to keep for Reports Manager logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Reports Manager Max Log Size | The maximum size, in megabytes, per log file for Reports Manager logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Reports Manager Log Directory | Directory where Reports Manager will place its log files. | /var/log/cloudera-scm-headlamp | mgmt_log_dir | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the following fields: alert, rate, periodminutes, threshold, content, exceptiontype. | {"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | reportsmanager_fd_thresholds | false | |
Reports Manager Host Health Test | When computing the overall Reports Manager health, consider the host's health. | true | reportsmanager_host_health_enabled | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | reportsmanager_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | reportsmanager_pause_duration_window | false | |
Reports Manager Process Health Test | Enables the health test that the Reports Manager's process state is consistent with the role configuration | true | reportsmanager_scm_health_enabled | false | |
Reports Manager Working Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Reports Manager Working Directory. | Warning: 10 GiB, Critical: 5 GiB | reportsmanager_scratch_directory_free_space_absolute_thresholds | false | |
Reports Manager Working Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Reports Manager Working Directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Reports Manager Working Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | reportsmanager_scratch_directory_free_space_percentage_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | role_triggers | true | |
Cloudera Manager Descriptor Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager descriptor was last refreshed. | Warning: 60000.0, Critical: 120000.0 | scm_descriptor_age_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
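The first-match behavior described for Rules to Extract Events from Log Files can be illustrated with a short Python sketch. The rule list mirrors the default value shown in the table. The matching semantics are assumptions made for illustration: `threshold` is treated as a minimum severity, `content` and `exceptiontype` as regular expressions that must match the whole string, and rate limiting (`rate`, `periodminutes`) is not modeled. The function names are hypothetical.

```python
import re

# Severity order assumed for comparing a message level with a rule threshold.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

# Mirrors the default rules listed in the table.
DEFAULT_RULES = [
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Instead, use .*"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Use .* instead"},
    {"alert": False, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "WARN"},
]


def rule_matches(rule, level, message, exception_type=None):
    """Check one rule against a log message (assumed semantics)."""
    if "threshold" in rule and LEVELS.index(level) < LEVELS.index(rule["threshold"]):
        return False
    if "content" in rule and not re.fullmatch(rule["content"], message):
        return False
    if "exceptiontype" in rule:
        if exception_type is None or not re.fullmatch(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, message, exception_type=None):
    """Return the index of the first rule that matches, or None."""
    for index, rule in enumerate(rules):
        if rule_matches(rule, level, message, exception_type):
            return index
    return None


print(first_matching_rule(DEFAULT_RULES, "WARN",
                          "foo.bar is deprecated. Instead, use foo.baz"))
print(first_matching_rule(DEFAULT_RULES, "WARN", "slow heartbeat"))
```

Under these assumptions, a WARN-level deprecation message is caught by the second rule, whose rate of 0 appears intended to keep such messages from producing events, before it can reach the general WARN rule at the end of the list.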
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Working Directory | Directory for Reports Manager to use for its working files | scratch.dir | /var/lib/cloudera-scm-headlamp | headlamp_scratch_dir | false |
Reports Manager Update Frequency | Frequency at which Reports Manager refreshes its view of HDFS. | update.frequency.seconds | 1 hour(s) | headlamp_update_frequency_seconds | false |
Descriptor Fetch Tries Interval | The interval between fetch tries for SCM descriptor when Cloudera Management Service roles are starting. | mgmt.descriptor.fetch.frequency | 2 second(s) | mgmt_descriptor_fetch_frequency | true |
Descriptor Fetch Max Tries | Maximum number of tries to fetch SCM descriptor when Cloudera Management Service roles are starting. If the roles are not able to get the descriptor in this many tries, then they exit. | mgmt.num.descriptor.fetch.tries | 5 | mgmt_num_descriptor_fetch_tries | true |
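Descriptor Fetch Tries Interval and Descriptor Fetch Max Tries together describe a bounded retry loop. The Python sketch below shows that pattern with the defaults from the table (5 tries, 2 seconds apart). It is not the roles' actual startup code: `fetch` is a hypothetical callable standing in for the descriptor request, and raising an error stands in for the role exiting.

```python
import time


def fetch_descriptor_with_retries(fetch, max_tries=5, interval_seconds=2.0,
                                  sleep=time.sleep):
    """Call fetch() up to max_tries times, waiting between attempts.

    `fetch` is a hypothetical callable that returns the descriptor or raises.
    `sleep` is injectable so the loop can be exercised without real delays.
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            # Wait only if another attempt will follow.
            if attempt < max_tries:
                sleep(interval_seconds)
    # All tries used up; a real role would exit at this point.
    raise RuntimeError(f"descriptor not fetched after {max_tries} tries") from last_error


# Usage with a stand-in fetch that succeeds on the third call.
attempts = []

def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("descriptor not ready")
    return "descriptor"

print(fetch_descriptor_with_retries(flaky_fetch, sleep=lambda seconds: None))
```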
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Bind Reports Manager to Wildcard Address | If enabled, the Reports Manager binds to the wildcard address ("0.0.0.0") on all of its ports. | false | headlamp_bind_wildcard | false | |
Reports Manager Web UI Port | The port where Reports Manager starts a debug web server. Set to -1 to disable debug server. | debug.server.port | 8083 | headlamp_debug_port | false |
Reports Manager Server Port | The port where Reports Manager listens for requests | server.port | 5678 | headlamp_server_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Reports Manager in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | headlamp_heapsize | false | |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
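The cgroup settings above state explicit bounds: CPU shares between 2 and 262144, I/O weight between 100 and 1000, and -1 for no memory limit. The following Python sketch checks a set of values against those bounds before they are applied. The function name and its parameters are hypothetical, and treating zero or other negative memory limits as invalid is an assumption, not something the table states.

```python
CPU_SHARES_RANGE = (2, 262144)  # documented bounds for cpu.shares
IO_WEIGHT_RANGE = (100, 1000)   # documented bounds for blkio.weight
NO_LIMIT = -1                   # documented "no limit" value for memory limits


def validate_cgroup_settings(cpu_shares, io_weight, memory_hard_limit, memory_soft_limit):
    """Return a list of problems found; an empty list means all values are in range."""
    problems = []
    low, high = CPU_SHARES_RANGE
    if not low <= cpu_shares <= high:
        problems.append(f"cpu.shares {cpu_shares} is outside {low}..{high}")
    low, high = IO_WEIGHT_RANGE
    if not low <= io_weight <= high:
        problems.append(f"blkio.weight {io_weight} is outside {low}..{high}")
    # Assumption: a memory limit is either -1 (no limit) or a positive byte count.
    for label, value in (("hard", memory_hard_limit), ("soft", memory_soft_limit)):
        if value != NO_LIMIT and value <= 0:
            problems.append(f"memory {label} limit {value} must be -1 or positive")
    return problems
```

For example, `validate_cgroup_settings(1024, 500, -1, -1)` uses the documented defaults and reports no problems.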
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Kerberos Principal | Kerberos principal used by Reports Manager. Note: This principal must have administrator and superuser privileges on all HDFS services. | hdfs | kerberos_role_princ_name | true |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Parameter Validation: Reports Manager Database Hostname | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Database Hostname parameter. | false | role_config_suppression_headlamp_database_host | true | |
Suppress Parameter Validation: Reports Manager Database Name | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Database Name parameter. | false | role_config_suppression_headlamp_database_name | true | |
Suppress Parameter Validation: Reports Manager Database Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Database Password parameter. | false | role_config_suppression_headlamp_database_password | true | |
Suppress Parameter Validation: Reports Manager Database Username | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Database Username parameter. | false | role_config_suppression_headlamp_database_user | true | |
Suppress Parameter Validation: Java Configuration Options for Reports Manager | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Reports Manager parameter. | false | role_config_suppression_headlamp_java_opts | true | |
Suppress Parameter Validation: Reports Manager Working Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Working Directory parameter. | false | role_config_suppression_headlamp_scratch_dir | true | |
Suppress Parameter Validation: Reports Manager Kerberos Principal | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Kerberos Principal parameter. | false | role_config_suppression_kerberos_role_princ_name | true | |
Suppress Parameter Validation: Reports Manager Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Reports Manager Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.db.properties | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.db.properties parameter. | false | role_config_suppression_reportsmanager_db_safety_valve | true | |
Suppress Parameter Validation: Reports Manager Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_reportsmanager_role_env_safety_valve | true | |
Suppress Parameter Validation: Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.conf parameter. | false | role_config_suppression_reportsmanager_safety_valve | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_audit_health | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_file_descriptor | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_heap_dump_directory_free_space | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_host_health | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_log_directory_free_space | true | |
Suppress Health Test: Pause Duration | Whether to suppress the results of the Pause Duration health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_pause_duration | true | |
Suppress Health Test: Cloudera Manager Descriptor Age | Whether to suppress the results of the Cloudera Manager Descriptor Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_scm_descriptor_fetch | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_scm_health | true | |
Suppress Health Test: Reports Manager Working Directory Free Space | Whether to suppress the results of the Reports Manager Working Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_scratch_directory_free_space | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_reports_manager_unexpected_exits | true |
Service Monitor
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Service Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | firehose_java_opts | false | ||
Service Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only. A string to be inserted into cmon.conf for this role only. | firehose_safety_valve | false | ||
Service Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to directory where heap dumps are generated when java.lang.OutOfMemoryError error is thrown. This directory is automatically created if it does not exist. If this directory already exists, role user must have write access to this directory. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | The Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true | |
Service Monitor Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | SERVICEMONITOR_role_env_safety_valve | false | ||
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | health.event.publish.queue.max | 20000 | svcmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | health.event.publish.retry.ms | 5000 | svcmon_event_publication_retry_period | true |
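The Event Publication Maximum Queue Size entry above describes a bounded buffer that drops new events once it is full. A minimal Python sketch of that drop-on-full behavior follows. `BoundedEventQueue` and its methods are hypothetical names, and the sketch only illustrates the documented semantics; it says nothing about how the Event Publisher is actually implemented.

```python
from collections import deque


class BoundedEventQueue:
    """Buffer events up to max_size; later events are dropped while the queue is full."""

    def __init__(self, max_size=20000):  # 20000 is the documented default
        self.max_size = max_size
        self._events = deque()
        self.dropped = 0  # number of events rejected because the queue was full

    def publish(self, event):
        """Buffer the event and return True, or drop it and return False."""
        if len(self._events) >= self.max_size:
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def drain(self):
        """Return all buffered events in order and empty the queue."""
        events = list(self._events)
        self._events.clear()
        return events
```

Note that the queue rejects the newest event rather than evicting the oldest one, which matches the wording "subsequent events will be dropped".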
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Service Monitor Logging Threshold | The minimum log level for Service Monitor logs | INFO | log_threshold | false | |
Service Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Service Monitor logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Service Monitor Max Log Size | The maximum size, in megabytes, per log file for Service Monitor logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Service Monitor Log Directory | Location of log files for Service Monitor | /var/log/cloudera-scm-firehose | mgmt_log_dir | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Metrics Aggregation Run Duration Thresholds | The health test thresholds for monitoring the metrics aggregation run duration. | Warning: 10 second(s), Critical: 30 second(s) | aggregation_run_duration_thresholds | false | |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Service Monitor Storage Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Service Monitor Storage Directory. | Warning: 10 GiB, Critical: 5 GiB | firehose_storage_directory_free_space_absolute_thresholds | false | |
Service Monitor Storage Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Service Monitor Storage Directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Service Monitor Storage Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | firehose_storage_directory_free_space_percentage_thresholds | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the following fields, as seen in the default value: alert, rate, periodminutes, threshold, content, exceptiontype. | {"version": 0, "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Cloudera Manager Metric Schema Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager metric schema was last refreshed. | Warning: 60000.0, Critical: 120000.0 | metric_schema_age_thresholds_name | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | role_triggers | true | |
Cloudera Manager Descriptor Age Thresholds | The health test thresholds for monitoring the time since the Cloudera Manager descriptor was last refreshed. | Warning: 60000.0, Critical: 120000.0 | scm_descriptor_age_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | servicemonitor_fd_thresholds | false | |
Heap Size Thresholds | The health test thresholds for the heap used. | Warning: 90.0 %, Critical: 95.0 % | servicemonitor_heap_size_thresholds | false | |
Service Monitor Host Health Test | When computing the overall Service Monitor health, consider the host's health. | true | servicemonitor_host_health_enabled | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | servicemonitor_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | servicemonitor_pause_duration_window | false | |
Service Monitor Role Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Service Monitor role pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | servicemonitor_role_pipeline_thresholds | false | |
Service Monitor Role Pipeline Monitoring Time Period | The time period over which the Service Monitor role pipeline will be monitored for dropped messages. | 5 minute(s) | servicemonitor_role_pipeline_window | false | |
Service Monitor Process Health Test | Enables the health test that the Service Monitor's process state is consistent with the role configuration | true | servicemonitor_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | servicemonitor_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | servicemonitor_web_metric_collection_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false | |
YARN MapReduce Counter Descriptions | This JSON document contains metadata that is used by the Service Monitor's YARN application monitoring feature for YARN-based MapReduce counter handling. Each counter description has the following fields, as seen in the default value: name, units, and, for some counters, attributeName. | [{"name": "org.apache.hadoop.mapreduce.jobcounter.num_failed_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.num_failed_reduces", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.total_launched_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.total_launched_reduces", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.other_local_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.data_local_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.rack_local_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.slots_millis_maps", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.slots_millis_reduces", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.fallow_slots_millis_maps", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.fallow_slots_millis_reduces", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.mb_millis_maps", "units": "mb millis"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.mb_millis_reduces", "units": "mb millis"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.vcores_millis_maps", "units": "vcore millis"}, {"name": "org.apache.hadoop.mapreduce.jobcounter.vcores_millis_reduces", "units": "vcore millis"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.file_bytes_read", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.file_bytes_written", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.file_read_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.file_large_read_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.file_write_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.hdfs_bytes_read", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.hdfs_bytes_written", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.hdfs_read_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.hdfs_large_read_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.hdfs_write_ops", "units": "operations"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.s3a_bytes_read", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.s3a_bytes_written", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.adl_bytes_read", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.filesystemcounter.adl_bytes_written", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.map_input_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.map_output_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.map_output_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.map_output_materialized_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.split_raw_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.combine_input_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.combine_output_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.reduce_input_groups", "units": "groups"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.reduce_shuffle_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.reduce_input_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.reduce_output_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.spilled_records", "units": "records"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.shuffled_maps", "units": "tasks"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.failed_shuffle", "units": "failures"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.merged_map_outputs", "units": "outputs"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.gc_time_millis", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.cpu_milliseconds", "units": "ms"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.physical_memory_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.virtual_memory_bytes", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.taskcounter.committed_heap_bytes", "units": "bytes"}, {"attributeName": "shuffle_errors_bad_id", "name": "shuffle_errors.bad_id", "units": "errors"}, {"attributeName": "shuffle_errors_connection", "name": "shuffle_errors.connection", "units": "errors"}, {"attributeName": "shuffle_errors_io", "name": "shuffle_errors.io_error", "units": "errors"}, {"attributeName": "shuffle_errors_wrong_length", "name": "shuffle_errors.wrong_length", "units": "errors"}, {"attributeName": "shuffle_errors_wrong_map", "name": "shuffle_errors.wrong_map", "units": "errors"}, {"attributeName": "shuffle_errors_wrong_reduce", "name": "shuffle_errors.wrong_reduce", "units": "errors"}, {"name": "org.apache.hadoop.mapreduce.lib.input.fileinputformatcounter.bytes_read", "units": "bytes"}, {"name": "org.apache.hadoop.mapreduce.lib.output.fileoutputformatcounter.bytes_written", "units": "bytes"}] | yarn_application_mapreduce_counters | false |
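The Rules to Extract Events from Log Files entry in the table above says each log message is checked against the rules in order and the first matching rule is used. The Python sketch below illustrates that first-match evaluation with the default rules. Several details are assumptions rather than documented behavior: that `threshold` is a minimum log level, that `content` and `exceptiontype` are regular expressions that must match the whole string, and that a rule with `exceptiontype` only matches messages that carry an exception. The function names are hypothetical.

```python
import re

# Assumed severity ordering of log4j levels, lowest to highest.
LEVELS = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}

# The default rules from the table, written as Python dictionaries.
DEFAULT_RULES = [
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Instead, use .*"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Use .* instead"},
    {"alert": False, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "WARN"},
]


def rule_matches(rule, level, message, exception_type=None):
    """Return True if every condition present in the rule holds (assumed semantics)."""
    if "threshold" in rule and LEVELS[level] < LEVELS[rule["threshold"]]:
        return False
    if "content" in rule and not re.fullmatch(rule["content"], message, re.DOTALL):
        return False
    if "exceptiontype" in rule:
        if exception_type is None:
            return False
        if not re.fullmatch(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, message, exception_type=None):
    """Return (index, rule) for the first rule that matches, or None."""
    for index, rule in enumerate(rules):
        if rule_matches(rule, level, message, exception_type):
            return index, rule
    return None
```

Under these assumptions, a WARN message such as "foo is deprecated. Instead, use bar" is caught by the second rule before it can reach the general WARN rule at the end of the list, which shows why rule order matters.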
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Use the Authentication Service to enable Single Sign On | Use the Authentication Service to enable Single Sign On for the Firehose debug servers. Requires a running Authentication Service. | debug.servlet.auth.enabled | false | debug_servlet_auth_enabled | false |
Impala Storage | The approximate amount of disk space dedicated to storing Impala query data. Once the store has reached its maximum size, older data is deleted to make room for newer queries. The disk usage is approximate because data is deleted only when the limit is reached. | firehose_impala_storage_bytes | 1 GiB | firehose_impala_storage_bytes | false |
Reports Time-series Storage | The approximate amount of disk space dedicated to storing time series for reporting data. Once the store has reached its maximum size, older data is deleted to make room for newer data. The disk usage is approximate because data is deleted only when the limit is reached. See the "Disk Usage" tab on the Service Monitor page for more information on how space is consumed in the Service Monitor. This tab also shows information about the amount of data retained and the time window covered by each data granularity. | firehose_reports_storage_bytes | 1 GiB | firehose_reports_storage_bytes | false |
Service Monitor Storage Directory | The directory where Service Monitor data is stored. The Service Monitor stores metric time series and health information, as well as Impala query and YARN application metadata if Impala and/or YARN are configured. | firehose.storage.base.directory | /var/lib/cloudera-service-monitor | firehose_storage_dir | true |
Time-Series Storage | The approximate amount of disk space dedicated to storing time series and health data. Once the store has reached its maximum size, older data is deleted to make room for newer data. The disk usage is approximate because data is deleted only when the limit is reached. Note that Cloudera Manager stores time-series data at a number of different data granularities, and these granularities have different effective retention periods. Specifically, Cloudera Manager stores metric data as both raw data points and ten-minutely, hourly, six-hourly, daily, and weekly summary data points. Raw data consumes the bulk of the allocated storage space, weekly summaries the least. As such, raw data is retained for the shortest amount of time, while weekly summary points are unlikely to ever be deleted. See the "Storage" tab on the 'Service Monitor' -> 'Charts Library' -> 'Service Monitor Storage' page for more information on how space is consumed within the Service Monitor. This tab also shows information about the amount of data retained and time window covered by each data granularity. | firehose_time_series_storage_bytes | 10 GiB | firehose_time_series_storage_bytes | false |
YARN Storage | The approximate amount of disk space dedicated to storing YARN application data. Once the store has reached its maximum size, older data is deleted to make room for newer applications. The disk usage is approximate because data is deleted only when the limit is reached. | firehose_yarn_storage_bytes | 1 GiB | firehose_yarn_storage_bytes | false |
Health Event Startup Policy | This setting controls whether health events are emitted when this monitoring role is started. If set to "none", then no health events are emitted. If set to "bad" then health events are emitted for subjects with bad or concerning health. If set to "all" then health events are emitted for all subjects for all health values. The default is "bad". | health.event.publish.startup.policy | bad | health_event_publish_startup_policy | false |
Descriptor Fetch Tries Interval | The interval between fetch tries for SCM descriptor when Cloudera Management Service roles are starting. | mgmt.descriptor.fetch.frequency | 2 second(s) | mgmt_descriptor_fetch_frequency | true |
Descriptor Fetch Max Tries | Maximum number of tries to fetch SCM descriptor when Cloudera Management Service roles are starting. If the roles are not able to get the descriptor in this many tries, they exit. | mgmt.num.descriptor.fetch.tries | 5 | mgmt_num_descriptor_fetch_tries | true |
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | health.event.publish.log.suppress.window.ms | 1 minute(s) | svcmon_event_publication_log_suppress_window | true |
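
The Descriptor Fetch Tries Interval and Descriptor Fetch Max Tries settings together describe a bounded retry loop at role startup. The following is a minimal Python sketch of that behavior, not the actual implementation (the roles are Java processes). The `fetch` callable is a hypothetical stand-in for the real descriptor request, and waiting only between attempts is an assumption.

```python
import time
from typing import Callable, Optional


def fetch_descriptor(
    fetch: Callable[[], Optional[dict]],
    max_tries: int = 5,            # default of mgmt.num.descriptor.fetch.tries
    interval_seconds: float = 2,   # default of mgmt.descriptor.fetch.frequency
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` up to `max_tries` times, pausing between attempts.

    `fetch` is a hypothetical stand-in that returns a descriptor dict,
    or None when the descriptor is not yet available.
    """
    for attempt in range(1, max_tries + 1):
        descriptor = fetch()
        if descriptor is not None:
            return descriptor
        if attempt < max_tries:
            # Assumption: the wait happens only between attempts.
            sleep(interval_seconds)
    # The documented behavior is that the role exits; this sketch raises instead.
    raise RuntimeError(f"descriptor not available after {max_tries} tries")
```

Under these assumptions and the default values, a role would give up after four waits of 2 seconds each, plus whatever time the five requests themselves take.
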
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Service Monitor Web UI Port | Port for Service Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8086 | firehose_debug_port | false |
Service Monitor Web UI HTTPS Port | Port for Service Monitor's HTTPS Debug page. | debug.servlet.https.port | 9086 | firehose_debug_tls_port | false |
Service Monitor Listen Port | Port where Service Monitor is listening for agent messages. | firehose.server.port | 9997 | firehose_listen_port | false |
Service Monitor Nozzle Port | Port where Service Monitor's query API is exposed. | nozzle.server.port | 9996 | firehose_nozzle_port | false |
Bind Service Monitor to Wildcard Address | If enabled, the Service Monitor binds to the wildcard address ("0.0.0.0") on all of its ports. | false | smon_bind_wildcard | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Service Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | firehose_heapsize | false | |
Maximum Non-Java Memory of Service Monitor | The amount of memory the Service Monitor can use off of the Java heap. | firehose_non_java_memory_bytes | 2 GiB | firehose_non_java_memory_bytes | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
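
The cgroup rows above state value ranges (CPU shares between 2 and 262144, I/O weight between 100 and 1000) and say that a larger share count yields a larger share of CPU under contention. The Python sketch below checks values against those documented ranges and illustrates the proportional split. The function names are hypothetical, and the proportional formula assumes all competing processes are siblings in the same cgroup level and are all busy.

```python
from typing import List

CPU_SHARES_RANGE = (2, 262144)  # documented bounds for Cgroup CPU Shares
IO_WEIGHT_RANGE = (100, 1000)   # documented bounds for Cgroup I/O Weight


def validate_cgroup_settings(cpu_shares: int = 1024, io_weight: int = 500) -> List[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    low, high = CPU_SHARES_RANGE
    if not low <= cpu_shares <= high:
        problems.append(f"rm_cpu_shares={cpu_shares} is outside {low}..{high}")
    low, high = IO_WEIGHT_RANGE
    if not low <= io_weight <= high:
        problems.append(f"rm_io_weight={io_weight} is outside {low}..{high}")
    return problems


def cpu_fraction_under_contention(own_shares: int, sibling_shares: List[int]) -> float:
    """Fraction of CPU a role receives when it and every sibling are fully busy."""
    return own_shares / (own_shares + sum(sibling_shares))
```

For example, a role with 1024 shares competing against siblings with 1024 and 2048 shares would receive 1024 / 4096 of the CPU time while all three are busy. When the host is not under contention, shares impose no cap.
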
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Role-Specific Kerberos Principal | Kerberos principal used by the Service Monitor roles. | hue | kerberos_role_princ_name | true | |
Enable TLS/SSL for Firehose Debug Server | Encrypt communication between clients and Firehose Debug Server using Transport Layer Security (TLS) (formerly known as Secure Socket Layer (SSL)). | debug.servlet.https.enabled | false | ssl_enabled | false |
Firehose Debug Server TLS/SSL Server JKS Keystore File Location | The path to the TLS/SSL keystore file containing the server certificate and private key used for TLS/SSL. Used when Firehose Debug Server is acting as a TLS/SSL server. The keystore must be in JKS format. | debug.servlet.https.keystorePath | ssl_server_keystore_location | false | |
Firehose Debug Server TLS/SSL Server JKS Keystore File Password | The password for the Firehose Debug Server JKS keystore file. | debug.servlet.https.keystorePassword | ssl_server_keystore_password | false |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stacks traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Parameter Validation: Java Configuration Options for Service Monitor | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Service Monitor parameter. | false | role_config_suppression_firehose_java_opts | true | |
Suppress Parameter Validation: Service Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf parameter. | false | role_config_suppression_firehose_safety_valve | true | |
Suppress Configuration Validator: Service Monitor Heap Size Validator | Whether to suppress configuration warnings produced by the Service Monitor Heap Size Validator configuration validator. | false | role_config_suppression_firehose_service_monitor_heap_role_validator | true | |
Suppress Configuration Validator: Service Monitor Off Heap Memory Size Validator | Whether to suppress configuration warnings produced by the Service Monitor Off Heap Memory Size Validator configuration validator. | false | role_config_suppression_firehose_service_monitor_non_java_memory_role_validator | true | |
Suppress Parameter Validation: Service Monitor Storage Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Storage Directory parameter. | false | role_config_suppression_firehose_storage_dir | true | |
Suppress Parameter Validation: Role-Specific Kerberos Principal | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role-Specific Kerberos Principal parameter. | false | role_config_suppression_kerberos_role_princ_name | true | |
Suppress Parameter Validation: Service Monitor Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Service Monitor Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Parameter Validation: Service Monitor Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_servicemonitor_role_env_safety_valve | true | |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Location parameter. | false | role_config_suppression_ssl_server_keystore_location | true | |
Suppress Parameter Validation: Firehose Debug Server TLS/SSL Server JKS Keystore File Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Firehose Debug Server TLS/SSL Server JKS Keystore File Password parameter. | false | role_config_suppression_ssl_server_keystore_password | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Parameter Validation: YARN MapReduce Counter Descriptions | Whether to suppress configuration warnings produced by the built-in parameter validation for the YARN MapReduce Counter Descriptions parameter. | false | role_config_suppression_yarn_application_mapreduce_counters | true | |
Suppress Health Test: Metrics Aggregation Run Duration Test | Whether to suppress the results of the Metrics Aggregation Run Duration Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_aggregation_run_duration | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_audit_health | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_file_descriptor | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_heap_dump_directory_free_space | true | |
Suppress Health Test: Heap Size | Whether to suppress the results of the Heap Size health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_heap_size | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_host_health | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_log_directory_free_space | true | |
Suppress Health Test: Cloudera Manager Metric Schema Age | Whether to suppress the results of the Cloudera Manager Metric Schema Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_metric_schema_fetch | true | |
Suppress Health Test: Pause Duration | Whether to suppress the results of the Pause Duration health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_pause_duration | true | |
Suppress Health Test: Role Pipeline | Whether to suppress the results of the Role Pipeline health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_role_pipeline | true | |
Suppress Health Test: Cloudera Manager Descriptor Age | Whether to suppress the results of the Cloudera Manager Descriptor Age health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_scm_descriptor_fetch | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_scm_health | true | |
Suppress Health Test: Service Monitor Storage Directory Free Space | Whether to suppress the results of the Service Monitor Storage Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_storage_directory_free_space | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_unexpected_exits | true | |
Suppress Health Test: Web Server Status | Whether to suppress the results of the Web Server Status health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_service_monitor_web_metric_collection | true |
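
The health test suppressions above all share one rule: results of suppressed tests are ignored when the overall health is computed. The Python sketch below illustrates that rule only. The test names are hypothetical, the "good" value and the worst-result-wins rollup are assumptions, and "concerning" and "bad" are the health values named elsewhere in this reference.

```python
from typing import Dict, Set

# Ordered from best to worst. "good" and the worst-result-wins rollup are
# assumptions made for this illustration.
SEVERITY = {"good": 0, "concerning": 1, "bad": 2}


def overall_health(results: Dict[str, str], suppressed: Set[str]) -> str:
    """Roll up per-test results, skipping any test named in `suppressed`."""
    considered = [health for test, health in results.items() if test not in suppressed]
    if not considered:
        return "good"
    return max(considered, key=SEVERITY.__getitem__)


# Hypothetical test names and results.
results = {"heap_size": "bad", "file_descriptor": "good", "swap_memory_usage": "concerning"}
print(overall_health(results, suppressed=set()))
print(overall_health(results, suppressed={"heap_size"}))
```

With nothing suppressed the bad heap size result dominates. Once that test is suppressed, the rollup reflects only the remaining results.
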
Service-Wide
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Cloudera Management Service Service Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of all roles in this service except client configuration. | mgmt_service_env_safety_valve | false | ||
Cloudera Management Service Advanced Configuration Snippet (Safety Valve) for ssl-client.xml | For advanced use only, a string to be inserted into ssl-client.xml. This setting currently applies to the Reports Manager only. | mgmt_ssl_client_safety_valve | false | ||
Small Files Reporting: HDFS Service for Data Staging | Data collection for small files analysis requires a data staging area in HDFS. If you enable data collection for small files reporting, this property sets which HDFS service stages the data. | nav.smallfiles.hdfs.staging.service.name | navigator_small_files_staging_hdfs_service_name | false | |
Small Files Reporting: Enable Data Collection | When Small Files Reporting is enabled, Navigator passes additional metadata to the Telemetry Publisher so the data can be used by Cloudera Workload XM (WXM). This additional data allows WXM to identify Impala query performance issues caused when data is organized into small files in HDFS. Enable this option only when Telemetry Publisher is enabled. | nav.smallfiles.reporting.enabled | false | navigator_smallfiles_enabled | true |
Small Files Reporting: HDFS Staging Location | Data collection for small files analysis requires a data staging area in HDFS. If you enable data collection for small files reporting, this property sets the HDFS location where Small Files Reporting data is staged. If the directory doesn't already exist, Navigator creates it using the same credentials it uses for HDFS extraction from this service. | nav.smallfiles.hdfs.staging.root.path | /user/cloudera/navigator/smallfiles | navigator_smallfiles_hdfs_path | false |
System Group | The group that this service's processes should run as. | cloudera-scm | process_groupname | true | |
System User | The user that this service's processes should run as. | cloudera-scm | process_username | true |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Log Event Capture | When set, each role identifies important log events and forwards them to Cloudera Manager. | true | catch_events | false | |
Enable Service Level Health Alerts | When set, Cloudera Manager will send alerts when the health of this service reaches the threshold specified by the Event Server setting eventserver_health_events_alert_threshold. | false | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Log Event Retry Frequency | The frequency, in seconds, at which the log4j event publication appender retries sending undelivered log events to the Event Server. | 30 | log_event_retry_frequency | false | |
Activity Monitor Role Health Test | When computing the overall MGMT health, consider Activity Monitor's health | true | mgmt_activitymonitor_health_enabled | false | |
Alert Publisher Role Health Test | When computing the overall MGMT health, consider Alert Publisher's health | true | mgmt_alertpublisher_health_enabled | false | |
Cloudera Manager Server Clock Offset Thresholds | The health test thresholds for monitoring the clock offset between the Cloudera Manager Server and the Service Monitor. | Warning: 30 second(s), Critical: 1 minute(s) | mgmt_clock_offset_with_smon_thresholds | false | |
Command Storage Directory Free Space Monitoring Thresholds | The health test thresholds for monitoring the free space on the filesystem that contains the Cloudera Manager Server command storage directory. | Warning: 2 GiB, Critical: 1 GiB | mgmt_command_storage_directory_free_space_absolute_thresholds | false | |
Embedded Database Free Space Monitoring Thresholds | The health test thresholds for monitoring the free space on the volume for the embedded PostgreSQL database optionally running on the Cloudera Manager Server. If the embedded database is not in use, this has no effect. | Warning: 2 GiB, Critical: 1 GiB | mgmt_embedded_database_free_space_absolute_thresholds | false | |
Event Server Role Health Test | When computing the overall MGMT health, consider Event Server's health | true | mgmt_eventserver_health_enabled | false | |
Host Monitor Role Health Test | When computing the overall MGMT health, consider Host Monitor's health | true | mgmt_hostmonitor_health_enabled | false | |
Navigator Audit Server Role Health Test | When computing the overall MGMT health, consider Navigator Audit Server's health | true | mgmt_navigator_health_enabled | false | |
Navigator Metadata Server Role Health Test | When computing the overall MGMT health, consider Navigator Metadata Server's health | true | mgmt_navigatormetaserver_health_enabled | false | |
Reports Manager Role Health Test | When computing the overall MGMT health, consider Reports Manager's health | true | mgmt_reportsmanager_health_enabled | false | |
Service Monitor Role Health Test | When computing the overall MGMT health, consider Service Monitor's health | true | mgmt_servicemonitor_health_enabled | false | |
Telemetry Publisher Role Health Test | When computing the overall MGMT health, consider Telemetry Publisher's health | true | mgmt_telemetrypublisher_health_enabled | false | |
Service Triggers | The configured triggers for this service. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | service_triggers | true | |
Service Monitor Derived Configs Advanced Configuration Snippet (Safety Valve) | For advanced use only, a list of derived configuration properties that will be used by the Service Monitor instead of the default ones. | smon_derived_configs_safety_valve | false |
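
The Service Triggers property takes a JSON-formatted list of triggers. The Python sketch below builds one such value. The field names (`triggerName`, `triggerExpression`, `streamThreshold`, `enabled`) and the sample expression are not given on this page. They are assumptions based on the Cloudera Manager trigger documentation and should be verified against your release before use.

```python
import json

# Assumed field names; verify against the trigger documentation for your release.
trigger = {
    "triggerName": "sample-trigger",
    "triggerExpression": (
        "IF (SELECT fd_open WHERE roleType = DataNode "
        "and last(fd_open) > 500) DO health:bad"
    ),
    "streamThreshold": 10,
    "enabled": "true",
}

# service_triggers holds a JSON list, so the single trigger is wrapped in a list.
service_triggers_value = json.dumps([trigger])
print(service_triggers_value)
```

Building the value with `json.dumps` rather than by hand avoids quoting mistakes in the embedded expression. The default value `[]` is simply the empty list.
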
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Emit Sensitive Data In Stderr | If set, sensitive data, like passwords, are emitted to stderr. | false | mgmt_emit_sensitive_data_in_stderr | true | |
Minimum Kerberos Ticket Validity Period | The minimum Kerberos ticket validity period. The Cloudera Management Services attempt to log in again only after this minimum period of time has elapsed. | tgt.login.validity.period | 1 hour(s) | tgt_login_validity_period | false |
Publishing
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Kafka Service | The Kafka service where Navigator will publish audit events. | navigator_kafka_publishing_service | false |
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
TLS/SSL Client Truststore File Location | Path to the client truststore file used in HTTPS communication. This truststore contains certificates of trusted servers, or of Certificate Authorities trusted to identify servers. If set, this is used to verify certificates in HTTPS communication with CDH services and the Cloudera Manager Server. If not set, the default Java truststore is used to verify certificates. The contents of this truststore can be modified without restarting the Cloudera Management Service roles. By default, changes to its contents are picked up within ten seconds. | ssl.client.truststore.location | ssl_client_truststore_location | false | |
Cloudera Manager Server TLS/SSL Client Trust Store Password | The password for the Cloudera Manager Server TLS/SSL Certificate Trust Store File. This password is not required to access the trust store; this field can be left blank. This password provides optional integrity checking of the file. The contents of trust stores are certificates, and certificates are public information. | ssl.client.truststore.password | ssl_client_truststore_password | false |
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: Activity Monitor Count Validator | Whether to suppress configuration warnings produced by the Activity Monitor Count Validator configuration validator. | false | service_config_suppression_activitymonitor_count_validator | true | |
Suppress Configuration Validator: Alert Publisher Count Validator | Whether to suppress configuration warnings produced by the Alert Publisher Count Validator configuration validator. | false | service_config_suppression_alertpublisher_count_validator | true | |
Suppress Configuration Validator: Event Server Count Validator | Whether to suppress configuration warnings produced by the Event Server Count Validator configuration validator. | false | service_config_suppression_eventserver_count_validator | true | |
Suppress Configuration Validator: Host Monitor Count Validator | Whether to suppress configuration warnings produced by the Host Monitor Count Validator configuration validator. | false | service_config_suppression_hostmonitor_count_validator | true | |
Suppress Configuration Validator: Cloudera Management Service Host Colocation Validator | Whether to suppress configuration warnings produced by the Cloudera Management Service Host Colocation Validator configuration validator. | false | service_config_suppression_mgmt_colocation_validator | true | |
Suppress Parameter Validation: Cloudera Management Service Service Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Cloudera Management Service Service Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | service_config_suppression_mgmt_service_env_safety_valve | true | |
Suppress Parameter Validation: Cloudera Management Service Advanced Configuration Snippet (Safety Valve) for ssl-client.xml | Whether to suppress configuration warnings produced by the built-in parameter validation for the Cloudera Management Service Advanced Configuration Snippet (Safety Valve) for ssl-client.xml parameter. | false | service_config_suppression_mgmt_ssl_client_safety_valve | true | |
Suppress Configuration Validator: Navigator Audit Server Count Validator | Whether to suppress configuration warnings produced by the Navigator Audit Server Count Validator configuration validator. | false | service_config_suppression_navigator_count_validator | true | |
Suppress Parameter Validation: Small Files Reporting: HDFS Staging Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the Small Files Reporting: HDFS Staging Location parameter. | false | service_config_suppression_navigator_smallfiles_hdfs_path | true | |
Suppress Configuration Validator: Navigator Metadata Server Count Validator | Whether to suppress configuration warnings produced by the Navigator Metadata Server Count Validator configuration validator. | false | service_config_suppression_navigatormetaserver_count_validator | true | |
Suppress Parameter Validation: System Group | Whether to suppress configuration warnings produced by the built-in parameter validation for the System Group parameter. | false | service_config_suppression_process_groupname | true | |
Suppress Parameter Validation: System User | Whether to suppress configuration warnings produced by the built-in parameter validation for the System User parameter. | false | service_config_suppression_process_username | true | |
Suppress Configuration Validator: Reports Manager Count Validator | Whether to suppress configuration warnings produced by the Reports Manager Count Validator configuration validator. | false | service_config_suppression_reportsmanager_count_validator | true | |
Suppress Parameter Validation: Service Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Triggers parameter. | false | service_config_suppression_service_triggers | true | |
Suppress Configuration Validator: Service Monitor Count Validator | Whether to suppress configuration warnings produced by the Service Monitor Count Validator configuration validator. | false | service_config_suppression_servicemonitor_count_validator | true | |
Suppress Parameter Validation: Service Monitor Derived Configs Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Service Monitor Derived Configs Advanced Configuration Snippet (Safety Valve) parameter. | false | service_config_suppression_smon_derived_configs_safety_valve | true | |
Suppress Parameter Validation: TLS/SSL Client Truststore File Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the TLS/SSL Client Truststore File Location parameter. | false | service_config_suppression_ssl_client_truststore_location | true | |
Suppress Parameter Validation: Cloudera Manager Server TLS/SSL Client Trust Store Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Cloudera Manager Server TLS/SSL Client Trust Store Password parameter. | false | service_config_suppression_ssl_client_truststore_password | true | |
Suppress Configuration Validator: Telemetry Publisher Count Validator | Whether to suppress configuration warnings produced by the Telemetry Publisher Count Validator configuration validator. | false | service_config_suppression_telemetrypublisher_count_validator | true | |
Suppress Health Test: Activity Monitor Health | Whether to suppress the results of the Activity Monitor Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_activity_monitor_health | true | |
Suppress Health Test: Alert Publisher Health | Whether to suppress the results of the Alert Publisher Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_alert_publisher_health | true | |
Suppress Health Test: Cloudera Manager Server Clock Offset | Whether to suppress the results of the Cloudera Manager Server Clock Offset health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_clock_offset_with_smon | true | |
Suppress Health Test: Command Storage Directory Free Space | Whether to suppress the results of the Command Storage Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_command_storage_directory_free_space | true | |
Suppress Health Test: Embedded Database Free Space | Whether to suppress the results of the Embedded Database Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_embedded_db_free_space | true | |
Suppress Health Test: Event Server Health | Whether to suppress the results of the Event Server Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_event_server_health | true | |
Suppress Health Test: Host Monitor Health | Whether to suppress the results of the Host Monitor Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_host_monitor_health | true | |
Suppress Health Test: Navigator Audit Server Health | Whether to suppress the results of the Navigator Audit Server Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_navigator_health | true | |
Suppress Health Test: Navigator Metadata Server Health | Whether to suppress the results of the Navigator Metadata Server Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_navigatormetaserver_health | true | |
Suppress Health Test: Reports Manager Health | Whether to suppress the results of the Reports Manager Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_reports_manager_health | true | |
Suppress Health Test: Service Monitor Health | Whether to suppress the results of the Service Monitor Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_service_monitor_health | true | |
Suppress Health Test: Telemetry Publisher Health | Whether to suppress the results of the Telemetry Publisher Health health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | service_health_suppression_mgmt_telemetrypublisher_health | true |
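
The suppression properties above are boolean settings keyed by API name. As a minimal sketch, assuming the name/value item-list shape that the Cloudera Manager REST API uses for configuration updates, a request body that suppresses two of these health tests could be built as follows. The helper `suppression_payload` is hypothetical, and the endpoint, API version, and authentication are deliberately not shown; check them against the API documentation for your release.

```python
import json


def suppression_payload(api_names, suppressed=True):
    """Build a config-update body with one item per suppression property.

    The {"items": [{"name": ..., "value": ...}]} shape is an assumption
    based on the Cloudera Manager config-list format.
    """
    value = "true" if suppressed else "false"
    return {"items": [{"name": name, "value": value} for name in api_names]}


payload = suppression_payload([
    "service_health_suppression_mgmt_event_server_health",
    "service_health_suppression_mgmt_host_monitor_health",
])
print(json.dumps(payload, indent=2))
```

Suppressed tests are ignored when overall health is computed, so a payload like this also silences the corresponding alerts.
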
Telemetry Publisher
Categories:
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Telemetry Publisher Export Period | The export period in seconds. | export.period | 1 minute(s) | export_period | true |
Telemetry Publisher Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Telemetry Publisher Data Directory | Storage for tracking persistent state of the role. | data.dir | /var/lib/cloudera-scm-telemetrypublisher | mgmt_data_dir | false |
Heap Dump Directory | Path to directory where heap dumps are generated when java.lang.OutOfMemoryError error is thrown. This directory is automatically created if it does not exist. If this directory already exists, role user must have write access to this directory. If this directory is shared among multiple roles, it should have 1777 permissions. The heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | oom_heap_dump_dir | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates heap dump file when java.lang.OutOfMemoryError is thrown. | true | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Telemetry Publisher Polling Period | The extractor polling period in seconds. | extractor.poll_period | 1 minute(s) | poll_period | true |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Enable Metric Collection | The Cloudera Manager agent monitors each service and each of its roles by publishing metrics to the Cloudera Manager Service Monitor. Setting this to false stops the Cloudera Manager agent from publishing any metrics for the corresponding service or roles. This is usually helpful for services that generate a large number of metrics that Service Monitor is not able to process. | true | process_should_monitor | true | |
Java Configuration Options for Telemetry Publisher | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags, PermGen, or extra debugging flags would be passed here. | telemetrypublisher_java_opts | false | ||
Log and Query Redaction | Telemetry Publisher recommends and by default requires that Log and Query Redaction be enabled for all CDH clusters. If disabled for any cluster, an alert will be raised during role start. Disable this setting to allow running without redaction. | log_query_redaction | true | telemetrypublisher_log_query_redaction | true |
Proxy Support for Telemetry Publisher | When set, Telemetry Publisher sends telemetry through a proxy server. | telemetrypublisher.proxy.enabled | false | telemetrypublisher_proxy_enabled | false |
Proxy Password | Proxy Server Password. This configuration is used only when proxy support is enabled for Telemetry Publisher. | telemetrypublisher.proxy.password | telemetrypublisher_proxy_password | false | |
Proxy Port | Proxy Server Port. This configuration is used only when proxy support is enabled for Telemetry Publisher. | telemetrypublisher.proxy.port | telemetrypublisher_proxy_port | false | |
Proxy Server | Proxy Server Hostname. This configuration is used only when proxy support is enabled for Telemetry Publisher. | telemetrypublisher.proxy.server | telemetrypublisher_proxy_server | false | |
Proxy User | Proxy Server User. This configuration is used only when proxy support is enabled for Telemetry Publisher. | telemetrypublisher.proxy.user | telemetrypublisher_proxy_user | false | |
Telemetry Publisher Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of this role except client configuration. | TELEMETRYPUBLISHER_role_env_safety_valve | false | ||
Telemetry Publisher Advanced Configuration Snippet (Safety Valve) for telemetrypublisher.conf | For advanced use only. A string to be inserted into telemetrypublisher.conf for this role only. | telemetrypublisher_safety_valve | false | ||
Telemetry Publisher Thread Pool Size | The number of parallel threads used for extractor task execution. | extractor.thread_pool_size | 10 | thread_pool_size | true |
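
The proxy properties above are used only when Proxy Support for Telemetry Publisher is enabled. The following is a rough sketch of a pre-flight check over a dictionary of API names and string values. The function `check_proxy_config` is hypothetical and not part of Cloudera Manager, and treating the server and port as mandatory when the proxy is enabled is an assumption made for illustration.

```python
PROXY_KEYS = (
    "telemetrypublisher_proxy_server",
    "telemetrypublisher_proxy_port",
    "telemetrypublisher_proxy_user",
    "telemetrypublisher_proxy_password",
)


def check_proxy_config(config):
    """Return notes about proxy settings in a dict of API name -> string value."""
    notes = []
    enabled = config.get("telemetrypublisher_proxy_enabled", "false") == "true"
    if enabled:
        # Assumption: server and port are needed; user and password are optional.
        for key in PROXY_KEYS[:2]:
            if not config.get(key):
                notes.append(f"{key} is empty but proxy support is enabled")
    else:
        for key in PROXY_KEYS:
            if config.get(key):
                notes.append(f"{key} is set but ignored while proxy support is disabled")
    return notes


example = {
    "telemetrypublisher_proxy_enabled": "false",
    "telemetrypublisher_proxy_server": "proxy.example.com",
}
print(check_proxy_config(example))
```
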
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Telemetry Publisher Logging Threshold | The minimum log level for Telemetry Publisher logs. | INFO | log_threshold | false | |
Telemetry Publisher Maximum Log File Backups | The maximum number of rolled log files to keep for Telemetry Publisher logs. Typically used by log4j or logback. | 10 | max_log_backup_index | false | |
Telemetry Publisher Max Log Size | The maximum size, in megabytes, per log file for Telemetry Publisher logs. Typically used by log4j or logback. | 200 MiB | max_log_size | false | |
Telemetry Publisher Log Directory | Directory where Telemetry Publisher will place its log files. | /var/log/cloudera-scm-telemetrypublisher | mgmt_log_dir | false |
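
The maximum log size and the number of backups together bound how much disk space the role's logs can use. As a small worked example, assuming rolling-file semantics in which one active file is kept alongside the configured number of rolled backups, the worst case for the defaults above can be estimated as follows. The function name is hypothetical.

```python
def worst_case_log_mib(max_log_size_mib, max_backups):
    """Upper bound on log disk usage: the active file plus every rolled backup."""
    return max_log_size_mib * (max_backups + 1)


# Defaults from the table: 200 MiB per file, 10 backups.
print(worst_case_log_mib(200, 10))  # 2200
```

At roughly 2.1 GiB, the default footprint is well under the 10 GiB warning threshold used for log directory free space monitoring.
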
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold. | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Heap Dump Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. | Warning: 10 GiB, Critical: 5 GiB | heap_dump_directory_free_space_absolute_thresholds | false | |
Heap Dump Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's heap dump directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Heap Dump Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | heap_dump_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. If a log message matches multiple rules, the first matching rule is used. Each rule has some or all of the following fields: alert, rate, periodminutes, threshold, content, and exceptiontype. | `{"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]}` | log_event_whitelist | false | |
Process Swap Memory Thresholds | The health test thresholds on the swap memory usage of the process. | Warning: Any, Critical: Never | process_swap_memory_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has the following fields: | [] | role_triggers | true | |
Telemetry Publisher Data Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Telemetry Publisher Data Directory. | Warning: 10 GiB, Critical: 5 GiB | telemetrypublisher_data_directory_free_space_absolute_thresholds | false | |
Telemetry Publisher Data Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's Telemetry Publisher Data Directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Telemetry Publisher Data Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | telemetrypublisher_data_directory_free_space_percentage_thresholds | false | |
Metrics Data Export Failure Thresholds | The health test thresholds for monitoring the data export failure count. | Warning: 3.0 time(s), Critical: 5.0 time(s) | telemetrypublisher_data_export_failure_thresholds | true | |
Telemetry Publisher Data Export Monitoring Time Period | The time period over which the telemetry publisher data export for streams will be monitored for failed export. | 5 minute(s) | telemetrypublisher_data_export_failure_window | true | |
Metrics Data Ingest Failure Thresholds | The health test thresholds for monitoring the data ingest failure count. | Warning: 3.0 time(s), Critical: 5.0 time(s) | telemetrypublisher_data_ingest_failure_thresholds | true | |
Telemetry Publisher Data Ingest Monitoring Time Period | The time period over which the telemetry publisher data ingest for streams will be monitored for failed ingest. | 5 minute(s) | telemetrypublisher_data_ingest_failure_window | true | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | telemetrypublisher_fd_thresholds | false | |
Garbage Collection Duration Thresholds | The health test thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | telemetrypublisher_gc_duration_thresholds | false | |
Garbage Collection Duration Monitoring Period | The period to review when computing the moving average of garbage collection time. | 5 minute(s) | telemetrypublisher_gc_duration_window | false | |
Telemetry Publisher Host Health Test | When computing the overall Telemetry Publisher health, consider the host's health. | true | telemetrypublisher_host_health_enabled | false | |
Telemetry Publisher Process Health Test | Enables the health test that the Telemetry Publisher's process state is consistent with the role configuration. | true | telemetrypublisher_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | telemetrypublisher_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | telemetrypublisher_web_metric_collection_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
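
The Rules to Extract Events from Log Files setting above describes first-match evaluation: each log message is checked against the rules in order, and the first rule that matches decides what happens. The sketch below illustrates that ordering using the default rules. It is not Cloudera's implementation. Treating `threshold` as a minimum log level, requiring `content` and `exceptiontype` to match the whole string as regular expressions, and the helper names are all assumptions, and rate limiting (`rate`, `periodminutes`) is not modeled.

```python
import re

LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

# The default rules from the table, written out as Python dictionaries.
RULES = [
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Instead, use .*"},
    {"alert": False, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Use .* instead"},
    {"alert": False, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": False, "rate": 1, "periodminutes": 1, "threshold": "WARN"},
]


def rule_matches(rule, level, message, exception_type=None):
    """Check one rule; every field present in the rule must match."""
    if "threshold" in rule and LEVELS[level] < LEVELS[rule["threshold"]]:
        return False
    if "content" in rule and not re.fullmatch(rule["content"], message, re.DOTALL):
        return False
    if "exceptiontype" in rule:
        if exception_type is None:
            return False
        if not re.fullmatch(rule["exceptiontype"], exception_type):
            return False
    return True


def first_matching_rule(rules, level, message, exception_type=None):
    """Return (index, rule) for the first matching rule, or None."""
    for index, rule in enumerate(rules):
        if rule_matches(rule, level, message, exception_type):
            return index, rule
    return None


hit = first_matching_rule(RULES, "WARN", "dfs.foo is deprecated. Instead, use dfs.bar")
print(hit[0], hit[1]["rate"])
```

Under these assumptions, a deprecation warning matches the second rule before it reaches the general WARN rule at the end of the list, which is why rule order matters.
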
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Telemetry Publisher Web UI IPaddress. | The IP where Telemetry Publisher starts a debug web server. | telemetry_publisher.debug.server.interface | 0.0.0.0 | telemetry_publisher_debug_server_interface | false |
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Telemetry Publisher Web UI Port. | The port where Telemetry Publisher starts a debug web server. Set to -1 to disable debug server. | telemetry_publisher.debug.port | 10111 | telemetry_publisher_debug_port | false |
Telemetry Publisher Server Port | The port where Telemetry Publisher listens for requests. | telemetry_publisher.server.port | 10110 | telemetry_publisher_server_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
Java Heap Size of TelemetryPublisher in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | telemetry_publisher_heapsize | false |
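
Cgroup CPU Shares is a relative weight, not an absolute quota: it only takes effect when processes compete for CPU. As a rough illustration, assuming every listed process is runnable and competing on the same CPUs, the expected split is each weight divided by the sum of all weights. The function and role names below are hypothetical.

```python
def cpu_fraction_under_contention(shares_by_role):
    """Map each role to its expected CPU fraction when all roles are busy."""
    for role, shares in shares_by_role.items():
        # Range taken from the Cgroup CPU Shares description above.
        if not 2 <= shares <= 262144:
            raise ValueError(f"{role}: shares must be between 2 and 262144")
    total = sum(shares_by_role.values())
    return {role: shares / total for role, shares in shares_by_role.items()}


fractions = cpu_fraction_under_contention({
    "telemetry_publisher": 1024,  # default value from the table
    "other_role": 3072,           # hypothetical competing role
})
print(fractions)
```

When the host is not under contention, a role can use more CPU than its fraction suggests, because shares do not cap usage.
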
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Telemetry Kerberos Principal | Kerberos principal used by Telemetry Publisher to authenticate to all services except HDFS. Note: Telemetry should use the principal used by Hue service if you are using MapReduce1 service in any of the clusters. | hue | kerberos_role_princ_name | true | |
Enable TLS/SSL for Telemetry Publisher | Encrypt communication between clients and Telemetry Publisher using Transport Layer Security (TLS) (formerly known as Secure Socket Layer (SSL)). | telemetrypublisher.http.enable_ssl | false | ssl_enabled | false |
Telemetry Publisher TLS/SSL Server JKS Keystore Key Password | The password that protects the private key contained in the JKS keystore used when Telemetry Publisher is acting as a TLS/SSL server. | telemetrypublisher.ssl.keyManagerPassword | ssl_server_keystore_keypassword | false | |
Telemetry Publisher TLS/SSL Server JKS Keystore File Location | The path to the TLS/SSL keystore file containing the server certificate and private key used for TLS/SSL. Used when Telemetry Publisher is acting as a TLS/SSL server. The keystore must be in JKS format. | telemetrypublisher.ssl.keyStore | ssl_server_keystore_location | false | |
Telemetry Publisher TLS/SSL Server JKS Keystore File Password | The password for the Telemetry Publisher JKS keystore file. | telemetrypublisher.ssl.keyStorePassword | ssl_server_keystore_password | false | |
Telemetry Kerberos Principal for HDFS | Kerberos principal used by Telemetry Publisher to authenticate to HDFS services. Note: This principal must be in the same groups as the principals used by Job History and Spark History Servers. | telemetrypublisher.dfs.user | hdfs | tp_hdfs_kerberos_princ | true |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that is retained. After the retention limit is reached, the oldest data is deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs are placed. If not set, stacks are logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks are collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
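
Stacks Collection Data Retention deletes the oldest data once the retention limit is reached. The sketch below shows one way such size-based pruning can work. It is an illustration under assumed inputs (a list of timestamp and size pairs), not the algorithm Cloudera Manager uses, and `prune_oldest` is a hypothetical name.

```python
def prune_oldest(files, limit_mib):
    """Keep the newest files whose combined size fits within limit_mib.

    files is a list of (timestamp, size_mib) tuples; older entries are
    dropped first, mirroring "the oldest data is deleted".
    """
    kept = []
    total = 0
    for timestamp, size in sorted(files, key=lambda f: f[0], reverse=True):
        if total + size > limit_mib:
            break
        kept.append((timestamp, size))
        total += size
    return sorted(kept)


# Three 40 MiB stack logs against the default 100 MiB retention limit.
print(prune_oldest([(1, 40), (2, 40), (3, 40)], 100))  # [(2, 40), (3, 40)]
```
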
Suppressions
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Suppress Configuration Validator: CDH Version Validator | Whether to suppress configuration warnings produced by the CDH Version Validator configuration validator. | false | role_config_suppression_cdh_version_validator | true | |
Suppress Parameter Validation: Telemetry Kerberos Principal | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Kerberos Principal parameter. | false | role_config_suppression_kerberos_role_princ_name | true | |
Suppress Parameter Validation: Telemetry Publisher Logging Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Logging Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_log4j_safety_valve | true | |
Suppress Parameter Validation: Rules to Extract Events from Log Files | Whether to suppress configuration warnings produced by the built-in parameter validation for the Rules to Extract Events from Log Files parameter. | false | role_config_suppression_log_event_whitelist | true | |
Suppress Parameter Validation: Telemetry Publisher Data Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Data Directory parameter. | false | role_config_suppression_mgmt_data_dir | true | |
Suppress Parameter Validation: Telemetry Publisher Log Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Log Directory parameter. | false | role_config_suppression_mgmt_log_dir | true | |
Suppress Parameter Validation: Heap Dump Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Heap Dump Directory parameter. | false | role_config_suppression_oom_heap_dump_dir | true | |
Suppress Parameter Validation: Role Triggers | Whether to suppress configuration warnings produced by the built-in parameter validation for the Role Triggers parameter. | false | role_config_suppression_role_triggers | true | |
Suppress Parameter Validation: Telemetry Publisher TLS/SSL Server JKS Keystore Key Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher TLS/SSL Server JKS Keystore Key Password parameter. | false | role_config_suppression_ssl_server_keystore_keypassword | true | |
Suppress Parameter Validation: Telemetry Publisher TLS/SSL Server JKS Keystore File Location | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher TLS/SSL Server JKS Keystore File Location parameter. | false | role_config_suppression_ssl_server_keystore_location | true | |
Suppress Parameter Validation: Telemetry Publisher TLS/SSL Server JKS Keystore File Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher TLS/SSL Server JKS Keystore File Password parameter. | false | role_config_suppression_ssl_server_keystore_password | true | |
Suppress Parameter Validation: Stacks Collection Directory | Whether to suppress configuration warnings produced by the built-in parameter validation for the Stacks Collection Directory parameter. | false | role_config_suppression_stacks_collection_directory | true | |
Suppress Parameter Validation: Telemetry Publisher Web UI IPaddress. | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Web UI IPaddress. parameter. | false | role_config_suppression_telemetry_publisher_debug_server_interface | true | |
Suppress Parameter Validation: Java Configuration Options for Telemetry Publisher | Whether to suppress configuration warnings produced by the built-in parameter validation for the Java Configuration Options for Telemetry Publisher parameter. | false | role_config_suppression_telemetrypublisher_java_opts | true | |
Suppress Parameter Validation: Proxy Password | Whether to suppress configuration warnings produced by the built-in parameter validation for the Proxy Password parameter. | false | role_config_suppression_telemetrypublisher_proxy_password | true | |
Suppress Parameter Validation: Proxy Server | Whether to suppress configuration warnings produced by the built-in parameter validation for the Proxy Server parameter. | false | role_config_suppression_telemetrypublisher_proxy_server | true | |
Suppress Parameter Validation: Proxy User | Whether to suppress configuration warnings produced by the built-in parameter validation for the Proxy User parameter. | false | role_config_suppression_telemetrypublisher_proxy_user | true | |
Suppress Parameter Validation: Telemetry Publisher Environment Advanced Configuration Snippet (Safety Valve) | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Environment Advanced Configuration Snippet (Safety Valve) parameter. | false | role_config_suppression_telemetrypublisher_role_env_safety_valve | true | |
Suppress Parameter Validation: Telemetry Publisher Advanced Configuration Snippet (Safety Valve) for telemetrypublisher.conf | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Publisher Advanced Configuration Snippet (Safety Valve) for telemetrypublisher.conf parameter. | false | role_config_suppression_telemetrypublisher_safety_valve | true | |
Suppress Parameter Validation: Telemetry Kerberos Principal for HDFS | Whether to suppress configuration warnings produced by the built-in parameter validation for the Telemetry Kerberos Principal for HDFS parameter. | false | role_config_suppression_tp_hdfs_kerberos_princ | true | |
Suppress Health Test: Data Export Test For Stream Hive-Query-Audits | Whether to suppress the results of the Data Export Test For Stream Hive-Query-Audits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_hive__query__audits_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Hive-Query-Audits | Whether to suppress the results of the Data Ingest Test For Stream Hive-Query-Audits health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_hive__query__audits_data_ingest_failure | true | |
Suppress Health Test: Data Export Test For Stream Impala-Query-Profile | Whether to suppress the results of the Data Export Test For Stream Impala-Query-Profile health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_impala__query__profile_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Impala-Query-Profile | Whether to suppress the results of the Data Ingest Test For Stream Impala-Query-Profile health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_impala__query__profile_data_ingest_failure | true | |
Suppress Health Test: Data Export Test For Stream Oozie-Workflows | Whether to suppress the results of the Data Export Test For Stream Oozie-Workflows health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_oozie__workflows_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Oozie-Workflows | Whether to suppress the results of the Data Ingest Test For Stream Oozie-Workflows health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_oozie__workflows_data_ingest_failure | true | |
Suppress Health Test: Data Export Test For Stream Spark2_on_yarn-Event-Log | Whether to suppress the results of the Data Export Test For Stream Spark2_on_yarn-Event-Log health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_spark2_on_yarn__event__log_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Spark2_on_yarn-Event-Log | Whether to suppress the results of the Data Ingest Test For Stream Spark2_on_yarn-Event-Log health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_spark2_on_yarn__event__log_data_ingest_failure | true | |
Suppress Health Test: Audit Pipeline Test | Whether to suppress the results of the Audit Pipeline Test health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_audit_health | true | |
Suppress Health Test: Telemetry Publisher Data Directory Free Space | Whether to suppress the results of the Telemetry Publisher Data Directory Free Space health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_data_directory_free_space | true | |
Suppress Health Test: File Descriptors | Whether to suppress the results of the File Descriptors health test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_file_descriptor | true | |
Suppress Health Test: GC Duration | Whether to suppress the results of the GC Duration heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_gc_duration | true | |
Suppress Health Test: Heap Dump Directory Free Space | Whether to suppress the results of the Heap Dump Directory Free Space heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_heap_dump_directory_free_space | true | |
Suppress Health Test: Host Health | Whether to suppress the results of the Host Health heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_host_health | true | |
Suppress Health Test: Log Directory Free Space | Whether to suppress the results of the Log Directory Free Space heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_log_directory_free_space | true | |
Suppress Health Test: Process Status | Whether to suppress the results of the Process Status heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_scm_health | true | |
Suppress Health Test: Swap Memory Usage | Whether to suppress the results of the Swap Memory Usage heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_swap_memory_usage | true | |
Suppress Health Test: Unexpected Exits | Whether to suppress the results of the Unexpected Exits heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_unexpected_exits | true | |
Suppress Health Test: Web Server Status | Whether to suppress the results of the Web Server Status heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_telemetrypublisher_web_metric_collection | true | |
Suppress Health Test: Data Export Test For Stream Yarn-Apps | Whether to suppress the results of the Data Export Test For Stream Yarn-Apps heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_yarn__apps_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Yarn-Apps | Whether to suppress the results of the Data Ingest Test For Stream Yarn-Apps heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_yarn__apps_data_ingest_failure | true | |
Suppress Health Test: Data Export Test For Stream Yarn-Jhist | Whether to suppress the results of the Data Export Test For Stream Yarn-Jhist heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_yarn__jhist_data_export_failure | true | |
Suppress Health Test: Data Ingest Test For Stream Yarn-Jhist | Whether to suppress the results of the Data Ingest Test For Stream Yarn-Jhist heath test. The results of suppressed health tests are ignored when computing the overall health of the associated host, role or service, so suppressed health tests will not generate alerts. | false | role_health_suppression_yarn__jhist_data_ingest_failure | true |