Cloudera Management Service
activitymonitordefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | activityevents.event.publish.queue.max | 20000 | actmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | activityevents.event.publish.retry.ms | 5000 | actmon_event_publication_retry_period | true |
Java Configuration Options for Activity Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | | | firehose_java_opts | false |
Activity Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only, a string to be inserted into cmon.conf for this role only. | | | firehose_safety_valve | false |
Activity Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | | | log4j_safety_valve | false |
Heap Dump Directory | Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java process heap size configured for this role. | | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | | false | oom_heap_dump_enabled | true |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | | true | oom_sigkill_enabled | true |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | | true | process_auto_restart | true |
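
The Event Publication Maximum Queue Size and Event Publication Retry Period rows above describe a bounded buffer: events wait in a queue, and once the queue is full, new events are dropped. The following is a minimal Python sketch of that behavior, not Cloudera's implementation. The class name `BoundedEventBuffer` and its methods are hypothetical and exist only for illustration.

```python
from collections import deque


class BoundedEventBuffer:
    """Hypothetical illustration of a size-capped event buffer."""

    def __init__(self, max_size=20000):
        self.max_size = max_size  # mirrors the documented default of 20000
        self.events = deque()
        self.dropped = 0

    def publish(self, event):
        """Buffer an event; drop it and count the drop if the buffer is full."""
        if len(self.events) >= self.max_size:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self, deliver):
        """Deliver buffered events in order; stop at the first failure."""
        delivered = 0
        while self.events:
            if not deliver(self.events[0]):
                # A real publisher would wait for the retry period here.
                break
            self.events.popleft()
            delivered += 1
        return delivered
```

In this sketch a failed delivery leaves the event at the head of the queue, so a later `drain` call (after the retry period) starts with the same event.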
Database
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Database Hostname | Name of host where Activity Monitor's database is running. It is highly recommended that this database is on the same host as the Activity Monitor. If the database is not running on its default port, specify the port number using this syntax: 'host:port' | | localhost | firehose_database_host | false |
Activity Monitor Database Name | Name of the Activity Monitor's database. | | | firehose_database_name | true |
Activity Monitor Database Password | Password for logging in to the Activity Monitor database | db.hibernate.connection.password | | firehose_database_password | false |
Activity Monitor Database Type | Type of database to use for Activity Monitor. | | mysql | firehose_database_type | false |
Activity Monitor Database Username | Username for logging in to the Activity Monitor database. | db.hibernate.connection.username | | firehose_database_user | true |
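
The Activity Monitor Database Hostname row accepts either a bare host or the 'host:port' syntax. As an illustration only, the Python sketch below splits such a value. The function name `split_host_port` is hypothetical, and the sketch assumes plain hostnames or IPv4 addresses (it does not handle bracketed IPv6 literals).

```python
def split_host_port(value, default_port=None):
    """Split 'host' or 'host:port' into a (host, port) tuple.

    If no port is given, default_port is returned unchanged as the port.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole value is the host name.
        return value, default_port
    return host, int(port)
```

For example, `split_host_port("db.example.com:3307")` yields the host and an integer port, while `split_host_port("localhost", 3306)` falls back to the port the caller supplies.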
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Logging Threshold | The minimum log level for Activity Monitor logs | | INFO | log_threshold | false |
Activity Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Activity Monitor logs. Typically used by log4j. | | 10 | max_log_backup_index | false |
Activity Monitor Max Log Size | The maximum size, in megabytes, per log file for Activity Monitor logs. Typically used by log4j. | | 200 MiB | max_log_size | false |
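
The two rolling-log settings above together bound how much disk space the role's logs can occupy. The Python sketch below estimates that bound, assuming the usual rolling-file behavior in which one active log file is kept alongside the configured number of backups. The function name is hypothetical, and real files can slightly exceed the configured size before they roll.

```python
MIB = 1024 * 1024


def max_log_footprint_bytes(max_log_size_mib, max_backup_index):
    """Approximate upper bound: one active file plus the rolled backups."""
    return (max_backup_index + 1) * max_log_size_mib * MIB


# With the documented defaults (200 MiB per file, 10 backups):
default_footprint_mib = max_log_footprint_bytes(200, 10) // MIB
```

With the defaults this comes to 11 files of 200 MiB each, which is useful to compare against the log directory free space thresholds.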
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Activity Monitor Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Activity Monitor activity monitor pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | | Warning: Never, Critical: Any | activitymonitor_activity_monitor_pipeline_thresholds | false |
Activity Monitor Activity Monitor Pipeline Monitoring Time Period | The time period over which the Activity Monitor activity monitor pipeline will be monitored for dropped messages. | | 5 minute(s) | activitymonitor_activity_monitor_pipeline_window | false |
Activity Monitor Activity Tree Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Activity Monitor activity tree pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | | Warning: Never, Critical: Any | activitymonitor_activity_tree_pipeline_thresholds | false |
Activity Monitor Activity Tree Pipeline Monitoring Time Period | The time period over which the Activity Monitor activity tree pipeline will be monitored for dropped messages. | | 5 minute(s) | activitymonitor_activity_tree_pipeline_window | false |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | | Warning: 50.0 %, Critical: 70.0 % | activitymonitor_fd_thresholds | false |
Activity Monitor Host Health Test | When computing the overall Activity Monitor health, consider the host's health. | | true | activitymonitor_host_health_enabled | false |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | | Warning: 30.0, Critical: 60.0 | activitymonitor_pause_duration_thresholds | false |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | | 5 minute(s) | activitymonitor_pause_duration_window | false |
Activity Monitor Process Health Test | Enables the health test that the Activity Monitor's process state is consistent with the role configuration | | true | activitymonitor_scm_health_enabled | false |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | | true | activitymonitor_web_metric_collection_enabled | false |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | | Warning: 10 second(s), Critical: Never | activitymonitor_web_metric_collection_thresholds | false |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | | true | enable_alerts | false |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | | false | enable_config_alerts | false |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the fields that appear in the default value: alert, rate, periodminutes, threshold, content, and exceptiontype. | | {"version": "0", "rules": [{"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. | | [] | role_triggers | true |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | | Warning: Never, Critical: Any | unexpected_exits_thresholds | false |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | | 5 minute(s) | unexpected_exits_window | false |
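
The Rules to Extract Events from Log Files row in this table says that every log message is evaluated against each rule in turn. The Python sketch below shows one plausible reading of that evaluation, using a subset of the default rules. The matching semantics are assumptions made for illustration and are not confirmed by this reference: `threshold` is treated as a minimum severity, `content` and `exceptiontype` as regular expressions, and the first matching rule wins. The names `LEVELS` and `first_matching_rule` are hypothetical.

```python
import json
import re

# Assumed severity ordering, lowest to highest.
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

RULES_JSON = """
{"version": "0", "rules": [
  {"alert": false, "rate": 0, "threshold": "WARN",
   "content": ".* is deprecated. Instead, use .*"},
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
  {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}
]}
"""


def first_matching_rule(rules, level, message, exception_type=None):
    """Return the first rule whose present fields all match, else None."""
    for rule in rules:
        threshold = rule.get("threshold")
        if threshold and LEVELS[level] < LEVELS[threshold]:
            continue  # message is below the rule's minimum severity
        content = rule.get("content")
        if content and not re.fullmatch(content, message):
            continue  # message text does not match the content pattern
        exc = rule.get("exceptiontype")
        if exc and (exception_type is None
                    or not re.fullmatch(exc, exception_type)):
            continue  # rule requires an exception of a matching type
        return rule
    return None


rules = json.loads(RULES_JSON)["rules"]
```

Under these assumptions, a WARN-level deprecation message matches the first rule, whose rate of 0 would suppress the event, while any other WARN message falls through to the last rule.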
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | activityevents.event.publish.log.suppress.window.ms | 1 minute(s) | actmon_event_publication_log_suppress_window | true |
Purge Activities Data at This Age | In Activity Monitor, purge data about MapReduce jobs and aggregate activities when the data reaches this age in hours. By default, Activity Monitor keeps data about activities for 336 hours (14 days). | firehose.activity.purge.duration.hours | 14 day(s) | firehose_activity_purge_duration_hours | false |
Purge Attempts Data at This Age | In the Activity Monitor, purge data about MapReduce attempts when the data reaches this age in hours. Because attempt data may consume large amounts of database space, you may wish to purge it more frequently than activity data. By default, Activity Monitor keeps data about attempts for 336 hours (14 days). | firehose.attempt.purge.duration.hours | 14 day(s) | firehose_attempt_purge_duration_hours | false |
Activity Monitor Log Directory | Location of log files for Activity Monitor | | /var/log/cloudera-scm-firehose | mgmt_log_dir | false |
Purge MapReduce Service Data at This Age | The number of hours of past service-level data to keep in the Activity Monitor database, such as total slots running. The default is to keep data for 336 hours (14 days). | timeseries.expiration.hours | 14 day(s) | timeseries_expiration_hours | false |
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | | | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Activity Monitor Web UI Port | Port for Activity Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8087 | firehose_debug_port | false |
Activity Monitor Listen Port | Port where Activity Monitor is listening for agent messages. | firehose.server.port | 9999 | firehose_listen_port | false |
Activity Monitor Nozzle Port | Port where Activity Monitor's query API is exposed. | nozzle.server.port | 9998 | firehose_nozzle_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Activity Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | | 1 GiB | firehose_heapsize | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
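
The Cgroup CPU Shares row describes a relative weight: under CPU contention, a role's share of CPU time is proportional to its shares divided by the total shares of all competing processes. The Python sketch below works through that arithmetic. The function names and the role names in the usage note are hypothetical.

```python
MIN_SHARES = 2
MAX_SHARES = 262144


def validate_cpu_shares(shares):
    """Check the documented range for the Cgroup CPU Shares setting."""
    return MIN_SHARES <= shares <= MAX_SHARES


def cpu_fraction(shares_by_role, role):
    """Fraction of contended CPU time the given role would receive."""
    total = sum(shares_by_role.values())
    return shares_by_role[role] / total
```

For example, with `{"activitymonitor": 2048, "other": 1024}` the Activity Monitor would receive about two thirds of the CPU while both processes are busy. When the host is not under contention, shares do not limit a process.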
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | | stacks_collection_directory | false |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stacks traces of all threads. When the servlet method is selected that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
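
The API Name column in these tables gives the key used when a setting is changed programmatically. The Python sketch below builds a JSON request body for such a change. It assumes the Cloudera Manager REST API accepts a configuration list of the form `{"items": [{"name": ..., "value": ...}]}`; that shape, and any endpoint or role config group name you send it to, should be checked against the API documentation for your Cloudera Manager version. The function name `config_update_body` is hypothetical.

```python
import json


def config_update_body(settings):
    """Build a JSON body from a dict that maps API names to values."""
    items = []
    for name, value in settings.items():
        if isinstance(value, bool):
            # Render booleans as lowercase 'true'/'false' strings.
            value = "true" if value else "false"
        items.append({"name": name, "value": str(value)})
    return json.dumps({"items": items}, sort_keys=True)


# Example: raise the Activity Monitor heap to 2 GiB and enable heap dumps.
body = config_update_body({
    "firehose_heapsize": 2 * 1024 ** 3,
    "oom_heap_dump_enabled": True,
})
```

The sketch only produces the body; sending it (for example with an HTTP PUT and appropriate credentials) is left out because the exact endpoint is not stated in this reference.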
alertpublisherdefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Alert Publisher | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | | | alertpublisher_java_opts | false |
Alert Publisher Advanced Configuration Snippet (Safety Valve) for alertpublisher.conf | For advanced use only, a string to be inserted into alertpublisher.conf for this role only. | | | alertpublisher_safety_valve | false |
Alert Publisher Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | | | log4j_safety_valve | false |
Heap Dump Directory | Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java process heap size configured for this role. | | /tmp | oom_heap_dump_dir | false |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | | false | oom_heap_dump_enabled | true |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | | true | oom_sigkill_enabled | true |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | | true | process_auto_restart | true |
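
The Heap Dump Directory row recommends that the directory have more free space than the role's maximum Java heap size. The Python sketch below performs that check with the standard library. The function name is hypothetical, and the directory and heap size passed to it stand in for the values configured for the role.

```python
import shutil


def has_room_for_heap_dump(directory, heap_size_bytes):
    """Return True if free space in directory exceeds the heap size."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes > heap_size_bytes
```

For the Alert Publisher defaults, a call such as `has_room_for_heap_dump("/tmp", 256 * 1024 * 1024)` compares the free space in /tmp against the 256 MiB default heap.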
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alert Publisher Logging Threshold | The minimum log level for Alert Publisher logs | | INFO | log_threshold | false |
Alert Publisher Maximum Log File Backups | The maximum number of rolled log files to keep for Alert Publisher logs. Typically used by log4j. | | 10 | max_log_backup_index | false |
Alert Publisher Max Log Size | The maximum size, in megabytes, per log file for Alert Publisher logs. Typically used by log4j. | | 200 MiB | max_log_size | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | | Warning: 50.0 %, Critical: 70.0 % | alertpublisher_fd_thresholds | false |
Alert Publisher Host Health Test | When computing the overall Alert Publisher health, consider the host's health. | | true | alertpublisher_host_health_enabled | false |
Alert Publisher Process Health Test | Enables the health test that the Alert Publisher's process state is consistent with the role configuration | | true | alertpublisher_scm_health_enabled | false |
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | | true | enable_alerts | false |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | | false | enable_config_alerts | false |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the fields that appear in the default value: alert, rate, periodminutes, threshold, and exceptiontype. | | {"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. | | [] | role_triggers | true |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | | Warning: Never, Critical: Any | unexpected_exits_thresholds | false |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | | 5 minute(s) | unexpected_exits_window | false |
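
Several rows in this table pair a Warning and a Critical threshold, for example File Descriptor Monitoring Thresholds at 50.0 % and 70.0 % of the file descriptor limit. The Python sketch below shows how such a pair could map a measurement to a health state. The function name, the state labels, and the use of "greater than or equal" comparisons are assumptions for illustration; passing `None` stands in for a threshold of Never.

```python
def fd_health(open_fds, fd_limit, warning_pct=50.0, critical_pct=70.0):
    """Classify file descriptor usage against percentage thresholds."""
    used_pct = 100.0 * open_fds / fd_limit
    if critical_pct is not None and used_pct >= critical_pct:
        return "CRITICAL"
    if warning_pct is not None and used_pct >= warning_pct:
        return "WARNING"
    return "GOOD"
```

The critical threshold is checked first so that a value above both thresholds reports the more severe state.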
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alerts: Enable Email Alerts | This setting allows you to turn email alert delivery on and off. | mailserver.enabled | true | alert_mailserver_enabled | false |
Alert: Mail From Address | The 'From' address to use for alert emails | | noreply@localhost | alert_mailserver_from_address | false |
Alerts: Mail Server Hostname | The IP address or hostname of the mail server to send alerts to | | localhost | alert_mailserver_hostname | true |
Alerts: Mail Server Password | The password to use to log into the mail server. Warning: this password will be sent over the network to the Alert Publisher host in clear text. In addition, the password will be stored in a plain text file on the Alert Publisher host with restrictive file system permissions. | | | alert_mailserver_password | false |
Alerts: Mail Server Protocol | The protocol to use for sending email alerts. | | smtp | alert_mailserver_protocol | true |
Alerts: Mail Message Recipients | A comma-separated list of email addresses to send alerts to | | root@localhost | alert_mailserver_recipients | true |
Alerts: Mail Server Username | The username to use to log into the mail server | | | alert_mailserver_username | false |
Alert Publisher: Maximum Batch Size | The Alert Publisher can be configured to batch multiple alerts into a single email. This setting specifies the maximum number of alerts that will be batched into a single email (regardless of the batch interval). | alert.aggregate.maxSize | 32 | alertpublisher_aggregate_max_size | false |
Alert Publisher: Maximum Batch Interval | The Alert Publisher can be configured to batch multiple alerts into a single email. This setting specifies the maximum amount of time (in milliseconds) that the Alert Publisher waits before sending an email of the current batch. | alert.aggregate.timeout.millis | 1 minute(s) | alertpublisher_aggregate_timeout | false |
Alerts: Email footer | Optional. If not empty, the text entered here will be inserted verbatim as a footer in HTML and plain-text emails. | alert.email.footer | | alertpublisher_email_footer | false |
Alerts: Email header | Optional. If not empty, the text entered here will be inserted verbatim as a header in HTML and plain-text emails. | alert.email.header | | alertpublisher_email_header | false |
Alerts: Mail Message Format | The format of the email alert message. The 'JSON' format is easy for scripts/programs to parse. The 'HTML' and 'text' formats are designed to be easily read by people. | mail.format | html | mail_format | true |
Alert Publisher Log Directory | Directory where Alert Publisher will place its log files. | | /var/log/cloudera-scm-alertpublisher | mgmt_log_dir | false |
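
The Maximum Batch Size and Maximum Batch Interval rows in this table describe two independent triggers for sending a batched email: the batch fills up, or the oldest pending alert has waited long enough. The Python sketch below models that logic with explicit timestamps. It is an illustration, not the Alert Publisher's code; the class `AlertBatcher` and its methods are hypothetical.

```python
class AlertBatcher:
    """Hypothetical batcher that flushes on size or on elapsed time."""

    def __init__(self, max_size=32, max_interval_ms=60000):
        self.max_size = max_size
        self.max_interval_ms = max_interval_ms
        self.pending = []
        self.first_ts = None  # arrival time of the oldest pending alert

    def add(self, alert, now_ms):
        """Queue an alert; return a full batch if the size limit is hit."""
        if not self.pending:
            self.first_ts = now_ms
        self.pending.append(alert)
        if len(self.pending) >= self.max_size:
            return self._flush()
        return None

    def tick(self, now_ms):
        """Return the pending batch once the interval has elapsed."""
        if self.pending and now_ms - self.first_ts >= self.max_interval_ms:
            return self._flush()
        return None

    def _flush(self):
        batch, self.pending = self.pending, []
        self.first_ts = None
        return batch
```

The size limit applies regardless of the interval, which matches the table's note that the maximum batch size holds "regardless of the batch interval".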
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | | | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alerts: Mail Server TCP Port | Optional. The TCP port where the mail server is listening. If not specified, defaults to 25 if SMTP is selected, or 465 if SMTPS is selected. | | | alert_mailserver_port | false |
Alerts: Listen Port | Port where the Alert Publisher listens for internal API requests. | alertpublisher.internalapi.port | 10101 | alertpublisher_internalapi_port | false |
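
The Alerts: Mail Server TCP Port row states a fallback rule: an explicit port wins, otherwise the port depends on the selected protocol. The Python sketch below expresses that rule. The function and constant names are hypothetical.

```python
DEFAULT_MAIL_PORTS = {"smtp": 25, "smtps": 465}


def mail_server_port(protocol, configured_port=None):
    """Return the configured port, or the protocol default if unset."""
    if configured_port is not None:
        return configured_port
    return DEFAULT_MAIL_PORTS[protocol.lower()]
```

With the default protocol of smtp and no port set, the sketch resolves to port 25.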
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Alert Publisher in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | | 256 MiB | alert_heapsize | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
SNMP
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
SNMP Authentication Protocol Pass Phrase | Pass phrase to use for the SNMP authentication protocol | alert.snmp.auth.password | | alert_snmp_auth_password | false |
SNMP Authentication Protocol | Authentication algorithm to use for authentication | alert.snmp.auth.protocol | SHA | alert_snmp_auth_protocol | false |
SNMPv2 Community String | Community string to use to identify this service. Generated SNMPv2 traps will use this string for authentication purposes. | alert.snmp.community | | alert_snmp_community | false |
SNMP Retry Count | Number of times to retry before the trap is timed out. If this value is set to '0', the trap will be sent only once. | alert.snmp.retries | 0 | alert_snmp_retries | true |
SNMP Server Engine Id | Engine Id to use for authentication and privacy. Engine Id is normally a hexadecimal number (e.g. 8000173e03a0c095f80c68). The Engine Id, along with the pass phrases, is used to generate keys for authentication and privacy protocols. | alert.snmp.security.engineid | | alert_snmp_security_engineid | false |
SNMP Security Level | Level of security to use for the SNMP v3 protocol. Currently only 'no authentication' and 'authentication with no privacy' are supported. Select 'SNMPv2' to use 'Community String' based SNMPv2 authentication. | alert.snmp.security.level | SNMPv2 | alert_snmp_security_level | true |
SNMP NMS Hostname | Hostname of the SNMP NMS (network management software). It can be a DNS name or IP address of the host listening for SNMP traps and notifications. For reference, see the Cloudera Manager SNMP MIB. | alert.snmp.server.hostname | | alert_snmp_server_hostname | false |
SNMP Server Port | Port number on which SNMP server is listening. | alert.snmp.server.port | 162 | alert_snmp_server_port | true |
SNMP Timeout | Time to wait before an SNMP trap is resent or timed out. | alert.snmp.timeout | 5 second(s) | alert_snmp_timeout | true |
SNMP Security UserName | Name of a user to use for SNMP security. | alert.snmp.username | | alert_snmp_username | false |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | | stacks_collection_directory | false |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stacks traces of all threads. When the servlet method is selected that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
eventserverdefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Configuration Options for Event Server | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | | | eventserver_java_opts | false |
Maximum Number of Events Returned by Any Query | The maximum number of events that any query can return. Note: A high value can increase the amount of memory required by Event Server, as well as affect query response times. | eventcatcher.max.query.events | 10000 | eventserver_max_query_events | true |
Maximum Write Queue Length | The maximum number of events that can be queued for write before further requests are rejected | eventcatcher.ingest.pipeline.max | 10000 | eventserver_max_write_queue_size | true |
Number of Core Event Writer Threads | The number of threads that Event Server will use to write events to its store concurrently | eventcatcher.num.ingest.threads | 2 | eventserver_num_pipeline_threads | true |
Event Server Query Timeout | The amount of time, in milliseconds, that Cloudera Manager and the Alert Publisher will wait for the Event Server to respond to a query. | eventserver.query.timeout | 60000 | eventserver_query_timeout | false |
Event Server Advanced Configuration Snippet (Safety Valve) for eventserver.conf | For advanced use only, a string to be inserted into eventserver.conf for this role only. | eventserver_safety_valve | false | ||
Event Server Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when a java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | /tmp | oom_heap_dump_dir | false | |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | false | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true |
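
The Heap Dump Directory row above asks for more free space in the directory than the maximum Java heap size configured for the role. A minimal Python sketch of that check follows. The helper name `heap_dump_dir_has_room` is hypothetical and not part of Cloudera Manager, and the 1 GiB figure is only an illustrative heap size.

```python
import shutil
import tempfile

GIB = 1024 ** 3


def heap_dump_dir_has_room(directory: str, max_heap_bytes: int) -> bool:
    """Return True when the filesystem holding `directory` has more free
    space than the configured maximum Java heap size (hypothetical helper)."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes > max_heap_bytes


if __name__ == "__main__":
    # /tmp is the documented default for oom_heap_dump_dir; the system
    # temporary directory is used here so the sketch runs on any platform.
    dump_dir = tempfile.gettempdir()
    print(heap_dump_dir_has_room(dump_dir, 1 * GIB))
```

A check like this could be run before enabling Dump Heap When Out of Memory, since a heap dump can be roughly as large as the heap itself.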
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Server Logging Threshold | The minimum log level for Event Server logs | INFO | log_threshold | false | |
Event Server Maximum Log File Backups | The maximum number of rolled log files to keep for Event Server logs. Typically used by log4j. | 10 | max_log_backup_index | false | |
Event Server Max Log Size | The maximum size, in megabytes, per log file for Event Server logs. Typically used by log4j. | 200 MiB | max_log_size | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Event Store Capacity Monitoring Thresholds | The health test thresholds on the number of events in the event store. Specified as a percentage of the maximum number of events in Event Server store. | Warning: 115.0 %, Critical: 130.0 % | eventserver_capacity_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | eventserver_fd_thresholds | false | |
Garbage Collection Duration Thresholds | The health test thresholds for the weighted average time spent in Java garbage collection. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | eventserver_gc_duration_thresholds | false | |
Garbage Collection Duration Monitoring Period | The period to review when computing the moving average of garbage collection time. | 5 minute(s) | eventserver_gc_duration_window | false | |
Event Server Host Health Test | When computing the overall Event Server health, consider the host's health. | true | eventserver_host_health_enabled | false | |
Index Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the index directory. | Warning: 10 GiB, Critical: 5 GiB | eventserver_index_directory_free_space_absolute_thresholds | false | |
Index Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the index directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if an Index Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | eventserver_index_directory_free_space_percentage_thresholds | false | |
Event Server Process Health Test | Enables the health test that the Event Server's process state is consistent with the role configuration | true | eventserver_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | eventserver_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | eventserver_web_metric_collection_thresholds | false | |
Event Server Write Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Event Server write pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | eventserver_write_pipeline_thresholds | false | |
Event Server Write Pipeline Monitoring Time Period | The time period over which the Event Server write pipeline will be monitored for dropped messages. | 5 minute(s) | eventserver_write_pipeline_window | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of a set of fields, such as the alert, rate, periodminutes, threshold, and exceptiontype fields used in the default value. | {"version": "0", "rules": [{"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has a fixed set of fields, all of which must be present. | [] | role_triggers | true | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
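
The Rules to Extract Events from Log Files row above says that every log message is evaluated against each rule in turn. The following Python sketch illustrates one way such an evaluation could work. It rests on several assumptions that the table does not confirm: the JSON form of the default value is reconstructed, `threshold` is treated as a minimum log level, `exceptiontype` is treated as a regular expression matched against the exception class name, and the first matching rule wins. Rate limiting through `rate` and `periodminutes` is not modelled, and `first_matching_rule` is a hypothetical name.

```python
import json
import re

# Assumed JSON form of the default value shown in the table; the braces and
# quotes are reconstructed and may differ from the shipped default.
DEFAULT_RULES = json.loads("""
{"version": "0", "rules": [
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
  {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
  {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}
]}
""")

# log4j severity order, lowest to highest.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def first_matching_rule(rules, level, exception_type=None):
    """Return the index of the first rule the message satisfies, else None.

    Assumed semantics: a rule matches when the message level is at least
    the rule's threshold (if any) and the exception class name matches the
    rule's exceptiontype pattern (if any).
    """
    for index, rule in enumerate(rules):
        if "threshold" in rule:
            if LEVELS.index(level) < LEVELS.index(rule["threshold"]):
                continue
        if "exceptiontype" in rule:
            pattern = rule["exceptiontype"]
            if exception_type is None or not re.fullmatch(pattern, exception_type):
                continue
        return index
    return None


if __name__ == "__main__":
    rules = DEFAULT_RULES["rules"]
    print(first_matching_rule(rules, "ERROR", "java.io.IOException"))
```

Under these assumptions, an INFO message without an exception matches no rule and produces no event.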
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Alert On Transitions Out of Alerting Health | If set, the health events for transitions out of an alertable health level will also be considered an alert. For example, consider an entity that is configured to alert when it has bad health. If that entity's health becomes bad, an alert will be generated. If this setting is enabled, an alert will also be generated when it returns to good health. If this setting is disabled, then no alert will be generated when it returns to good health. Note that an entity must have enable_alerts set to true for health alerts to be generated for it; refer to the per-entity setting to turn on health alerts. | false | eventserver_alert_on_transition_out_of_alerting_health_enabled | false | |
Health Alert Threshold | Threshold at which a health event will be considered an alert. Note that an entity must have enable_alerts set to true for health alerts to be generated for it; refer to the per-entity setting to turn on health alerts. | Bad | eventserver_health_events_alert_threshold | false | |
Event Server Index Directory | Location of the Lucene index for Event Server | eventcatcher.server.lucenedir | /var/lib/cloudera-scm-eventserver | eventserver_index_dir | false |
Maximum Number of Events in the Event Server Store | The maximum size of the Event Server store, in events. Once this size is exceeded, events will be deleted, starting with the oldest, until the size of the store returns below this threshold. | eventcatcher.event.capacity | 5000000 | eventserver_max_index_size | true |
Event Server Log Directory | Directory where Event Server will place its log files. | /var/log/cloudera-scm-eventserver | mgmt_log_dir | false |
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Event Server Web UI Port | Port for the Event Server's Debug page. Set to -1 to disable debug server. | eventcatcher.server.debug.port | 8084 | eventserver_debug_port | false |
Event Query Port | Port on which the Event Server listens for queries for events. | eventcatcher.server.httpport | 7185 | eventserver_http_port | false |
Event Publish Port | Port on which the Event Server listens for the publication of events. | eventcatcher.server.port | 7184 | eventserver_listen_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of EventServer in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | event_server_heapsize | false | |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
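
The Cgroup CPU Shares row above describes shares as relative weights that matter only under CPU contention. The sketch below illustrates that proportional split. It assumes every competing process is CPU-bound and sits in a sibling cgroup, and the role names and the function `cpu_fractions_under_contention` are hypothetical. The range check mirrors the documented limits of 2 to 262144.

```python
def cpu_fractions_under_contention(shares_by_role):
    """Map each role to the fraction of CPU time it would receive when all
    roles are competing for CPU, given their cpu.shares values."""
    for role, shares in shares_by_role.items():
        if not 2 <= shares <= 262144:
            raise ValueError(f"{role}: cpu.shares must be between 2 and 262144")
    total = sum(shares_by_role.values())
    return {role: shares / total for role, shares in shares_by_role.items()}


if __name__ == "__main__":
    # A role left at the default of 1024 next to a role given 3072 shares.
    print(cpu_fractions_under_contention({"eventserver": 1024, "other_role": 3072}))
```

When the host is not under contention, shares do not cap a role's CPU usage; the split above applies only while the roles are competing.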
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
hostmonitordefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Legacy Database Hostname | Hostname of the Host Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Host Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | localhost | firehose_database_host | false | |
Host Monitor Legacy Database Name | The name of the Host Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Host Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | firehose_database_name | false | ||
Host Monitor Legacy Database Password | Password for logging into the Host Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Host Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | db.hibernate.connection.password | firehose_database_password | false | |
Host Monitor Legacy Database Type | Type of the Host Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Host Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | mysql | firehose_database_type | false | |
Host Monitor Legacy Database Username | Username for logging into the Host Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Host Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | db.hibernate.connection.username | firehose_database_user | false | |
Java Configuration Options for Host Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | firehose_java_opts | false | ||
Host Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only, a string to be inserted into cmon.conf for this role only. | firehose_safety_valve | false | ||
Host Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when a java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | /tmp | oom_heap_dump_dir | false | |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | false | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | health.event.publish.queue.max | 20000 | svcmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | health.event.publish.retry.ms | 5000 | svcmon_event_publication_retry_period | true |
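
The Event Publication Maximum Queue Size and Event Publication Retry Period rows above describe a bounded buffer that drops new events once it is full and retries delivery after a delay. The Python sketch below shows that behaviour in miniature. The class `BoundedEventBuffer` and its method names are hypothetical and do not reflect the Event Publisher's actual implementation; waiting for the retry period between `drain` calls is left to the caller.

```python
from collections import deque


class BoundedEventBuffer:
    """Buffer events up to max_size; drop anything published while full."""

    def __init__(self, max_size=20000):  # 20000 is the documented default
        self.max_size = max_size
        self._events = deque()
        self.dropped = 0

    def publish(self, event) -> bool:
        """Queue an event, or count it as dropped when the buffer is full."""
        if len(self._events) >= self.max_size:
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def drain(self, deliver) -> int:
        """Deliver buffered events in order and return how many succeeded.

        Stops at the first failed delivery so the caller can wait for the
        retry period (5000 ms by default) before calling drain again.
        """
        delivered = 0
        while self._events:
            if not deliver(self._events[0]):
                break
            self._events.popleft()
            delivered += 1
        return delivered


if __name__ == "__main__":
    buffer = BoundedEventBuffer(max_size=2)
    for name in ("a", "b", "c"):
        buffer.publish(name)
    print(buffer.dropped)
```

Keeping a failed event at the head of the queue preserves delivery order across retries, at the cost of blocking later events during an outage.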
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Logging Threshold | The minimum log level for Host Monitor logs | INFO | log_threshold | false | |
Host Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Host Monitor logs. Typically used by log4j. | 10 | max_log_backup_index | false | |
Host Monitor Max Log Size | The maximum size, in megabytes, per log file for Host Monitor logs. Typically used by log4j. | 200 MiB | max_log_size | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Storage Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the storage directory. | Warning: 10 GiB, Critical: 5 GiB | firehose_storage_directory_free_space_absolute_thresholds | false | |
Storage Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the storage directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Storage Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | firehose_storage_directory_free_space_percentage_thresholds | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | hostmonitor_fd_thresholds | false | |
Host Monitor Host Health Test | When computing the overall Host Monitor health, consider the host's health. | true | hostmonitor_host_health_enabled | false | |
Host Monitor Host Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Host Monitor host pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | hostmonitor_host_pipeline_thresholds | false | |
Host Monitor Host Pipeline Monitoring Time Period | The time period over which the Host Monitor host pipeline will be monitored for dropped messages. | 5 minute(s) | hostmonitor_host_pipeline_window | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | hostmonitor_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | hostmonitor_pause_duration_window | false | |
Host Monitor Process Health Test | Enables the health test that the Host Monitor's process state is consistent with the role configuration | true | hostmonitor_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | hostmonitor_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | hostmonitor_web_metric_collection_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules that govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of a set of fields, such as the alert, rate, periodminutes, threshold, content, and exceptiontype fields used in the default value. | {"version": "0", "rules": [{"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Instead, use .*"}, {"alert": false, "rate": 0, "threshold": "WARN", "content": ".* is deprecated. Use .* instead"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"}, {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"}, {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}]} | log_event_whitelist | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has a fixed set of fields, all of which must be present. | [] | role_triggers | true | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
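
The free-space rows in the Monitoring table above pair an absolute threshold setting with a percentage setting and state that the percentage setting is not used when an absolute one is configured. The Python sketch below illustrates that precedence. It assumes that a value strictly below a threshold trips it, that `None` stands for "Never", and that the resulting health values are named good, concerning, and bad; `free_space_health` is a hypothetical helper.

```python
GIB = 1024 ** 3


def free_space_health(free_bytes, capacity_bytes,
                      abs_warning=10 * GIB, abs_critical=5 * GIB,
                      pct_warning=None, pct_critical=None):
    """Classify free space as 'good', 'concerning', or 'bad'.

    Absolute thresholds take precedence; percentage thresholds are only
    consulted when no absolute threshold is configured.
    """
    if abs_warning is not None or abs_critical is not None:
        value, warning, critical = free_bytes, abs_warning, abs_critical
    else:
        value = 100.0 * free_bytes / capacity_bytes
        warning, critical = pct_warning, pct_critical

    if critical is not None and value < critical:
        return "bad"
    if warning is not None and value < warning:
        return "concerning"
    return "good"


if __name__ == "__main__":
    # 8 GiB free on a 100 GiB filesystem with the documented defaults.
    print(free_space_health(8 * GIB, 100 * GIB))
```

To use percentage thresholds in this sketch, both absolute thresholds have to be passed as `None`, which mirrors the precedence described in the table.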
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Storage Directory | The directory where Host Monitor data is stored. The Host Monitor stores metric time series and health information. | firehose.storage.base.directory | /var/lib/cloudera-host-monitor | firehose_storage_dir | true |
Time-Series Storage | The approximate amount of disk space dedicated to storing time series and health data. Once the store has reached its maximum size, older data will be deleted to make room for newer data. The disk usage is approximate because data is deleted only once the limit has been reached. Note that Cloudera Manager stores time-series data at a number of different data granularities, and these granularities have different effective retention periods. Specifically, Cloudera Manager stores metric data as both raw data points and ten-minutely, hourly, six-hourly, daily, and weekly summary data points. Raw data consumes the bulk of the allocated storage space, weekly summaries the least. As such, raw data is retained for the shortest amount of time, while weekly summary points are unlikely to ever be deleted. See the "Disk Usage" tab on the Host Monitor page for more information on how space is consumed within the Host Monitor. This tab also shows information about the amount of data retained and the time window covered by each data granularity. | firehose_time_series_storage_bytes | 10 GiB | firehose_time_series_storage_bytes | false |
Health Event Startup Policy | This setting controls whether health events are emitted when this monitoring role is started. If set to "none", then no health events are emitted. If set to "bad" then health events are emitted for subjects with bad or concerning health. If set to "all" then health events are emitted for all subjects for all health values. The default is "bad". | health.event.publish.startup.policy | bad | health_event_publish_startup_policy | false |
Host Monitor Log Directory | Location of log files for Host Monitor | /var/log/cloudera-scm-firehose | mgmt_log_dir | false | |
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | health.event.publish.log.suppress.window.ms | 1 minute(s) | svcmon_event_publication_log_suppress_window | true |
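
The Event Publication Log Quiet Time Period row above limits the Event Publisher to one log message per time period. A minimal Python sketch of that kind of suppression follows. The class `QuietPeriodLogger` is hypothetical, and the injectable clock exists only so the behaviour can be demonstrated without waiting; it is not a description of how the Event Publisher is implemented.

```python
import time


class QuietPeriodLogger:
    """Emit at most one message per quiet period and count the rest."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        # 60 seconds corresponds to the documented default of 1 minute.
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_emit = None
        self.suppressed = 0

    def log(self, message, sink=print) -> bool:
        """Send the message to sink unless the quiet period is still active."""
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self.window_seconds:
            self.suppressed += 1
            return False
        self._last_emit = now
        sink(message)
        return True


if __name__ == "__main__":
    logger = QuietPeriodLogger(window_seconds=60.0)
    logger.log("event delivery failed")
    logger.log("event delivery failed again")  # suppressed: inside the window
    print(logger.suppressed)
```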
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Host Monitor Web UI Port | Port for Host Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8091 | firehose_debug_port | false |
Host Monitor Listen Port | Port where Host Monitor is listening for agent messages. | firehose.server.port | 9995 | firehose_listen_port | false |
Host Monitor Nozzle Port | Port where Host Monitor's query API is exposed. | nozzle.server.port | 9994 | firehose_nozzle_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Host Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | firehose_heapsize | false | |
Maximum Non-Java Memory of Host Monitor | The amount of memory the Host Monitor can use off of the Java heap. | firehose_non_java_memory_bytes | 2 GiB | firehose_non_java_memory_bytes | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
reportsmanagerdefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Index Writer Thread Pool Queue Size | Size of the queue to use for holding index writer tasks before they are executed. For faster indexing performance, consider increasing this to a small multiple of Maximum Index Writer Threads configured value. | index.writer.max.queue.size | 4 | headlamp_index_writer_max_queue_size | false |
Maximum Index Writer Threads | Maximum number of concurrent threads to use when writing the index. For faster indexing performance, consider increasing it to a small multiple of number of cores on Reports Manager host. | index.writer.num.threads | 2 | headlamp_index_writer_num_threads | false |
Java Configuration Options for Reports Manager | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | headlamp_java_opts | false | ||
Maximum Document Buffer Size | Amount of memory that may be used for buffering documents before they are flushed to the index. For faster indexing performance, consider increasing this value. | lucene.max.buffer.size.mb | 32 MiB | headlamp_lucene_max_buffer_size_mb | false |
Index Merge Factor | The Reports Manager index is built in sections which are merged as the build progresses. This configuration determines how often index sections are merged. With smaller values, less memory is used while indexing, but indexing is slower. For faster indexing performance, consider increasing this value. | lucene.merge.factor | 100 | headlamp_lucene_merge_factor | false |
Publish HBase Space Usage | When set, publishes HBase space usage metrics to support HBase usage reporting. This feature is only supported for CDH5+ HBase deployments. | publish.hbase.space | true | headlamp_publish_hbase_metrics | false |
Reports Manager Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | /tmp | oom_heap_dump_dir | false | |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | false | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Reports Manager Advanced Configuration Snippet (Safety Valve) for headlamp.conf | For advanced use only, a string to be inserted into headlamp.conf for this role only. | reportsmanager_safety_valve | false |
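
The out-of-memory settings above can be pictured as JVM command-line flags. The following is a minimal sketch only: the `-XX` options are standard HotSpot flags, but the helper `oom_jvm_flags` is hypothetical, and the exact flags that Cloudera Manager generates for these settings are an assumption, not something stated on this page.

```python
def oom_jvm_flags(oom_heap_dump_enabled, oom_heap_dump_dir, oom_sigkill_enabled):
    """Build HotSpot flags that approximate the OOM settings (assumed mapping)."""
    flags = []
    if oom_heap_dump_enabled:
        # Write a heap dump into the configured directory when an OOM occurs.
        flags.append("-XX:+HeapDumpOnOutOfMemoryError")
        flags.append(f"-XX:HeapDumpPath={oom_heap_dump_dir}")
    if oom_sigkill_enabled:
        # Run a command on OOM; %p expands to the JVM's process id.
        flags.append("-XX:OnOutOfMemoryError=kill -9 %p")
    return flags


# Defaults from the table: no heap dump, /tmp, kill on OOM.
print(oom_jvm_flags(False, "/tmp", True))
```

With heap dumps enabled, the directory needs more free space than the configured maximum heap, as the Heap Dump Directory description notes.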
Database
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Database Hostname | Name of the host where Reports Manager's database is running. It is highly recommended that this database is on the same host as Reports Manager. If the database is not running on its default port, specify the port number using this syntax: 'host:port' | com.cloudera.headlamp.db.host | localhost | headlamp_database_host | false |
Reports Manager Database Name | The name of Reports Manager's database. | com.cloudera.headlamp.db.name | headlamp_database_name | true | |
Reports Manager Database Password | The password for Reports Manager's database user account. | com.cloudera.headlamp.db.password | headlamp_database_password | false | |
Reports Manager Database Type | Type of database used for Reports Manager. | com.cloudera.headlamp.db.type | mysql | headlamp_database_type | false |
Reports Manager Database Username | The username to use to log into Reports Manager's database. | com.cloudera.headlamp.db.user | headlamp_database_user | true |
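
The 'host:port' syntax accepted by Reports Manager Database Hostname can be illustrated with a small parser. This is a sketch under stated assumptions: `split_db_host` is a hypothetical helper, not part of Cloudera Manager, and 3306 is simply the conventional MySQL port used here as the fallback. IPv6 literals are not handled.

```python
def split_db_host(value, default_port):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole value is the host name.
        return value, default_port
    return host, int(port)


print(split_db_host("localhost", 3306))
print(split_db_host("db.example.com:3307", 3306))
```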
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Logging Threshold | The minimum log level for Reports Manager logs | INFO | log_threshold | false | |
Reports Manager Maximum Log File Backups | The maximum number of rolled log files to keep for Reports Manager logs. Typically used by log4j. | 10 | max_log_backup_index | false | |
Reports Manager Max Log Size | The maximum size, in megabytes, per log file for Reports Manager logs. Typically used by log4j. | 200 MiB | max_log_size | false |
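
As a rough worked example, the two rollover settings bound the disk space the role's logs can occupy. The sketch below assumes the usual log4j rolling behaviour (one active file plus up to `max_log_backup_index` rolled files); the function name is illustrative only.

```python
MIB = 1024 * 1024


def max_log_footprint_bytes(max_log_size_mib, max_log_backup_index):
    """Upper bound on log disk usage: the active file plus all rolled backups."""
    return (max_log_backup_index + 1) * max_log_size_mib * MIB


# Defaults from the table: 200 MiB per file, 10 backups.
print(max_log_footprint_bytes(200, 10) / MIB)  # 2200.0
```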
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules which govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the following fields: | version: 0, rules: [ alert: false, rate: 1, periodminutes: 1, threshold:FATAL, alert: false, rate: 0, threshold:WARN, content: .* is deprecated. Instead, use .*, alert: false, rate: 0, threshold:WARN, content: .* is deprecated. Use .* instead, alert: false, rate: 1, periodminutes: 2, exceptiontype: .*, alert: false, rate: 1, periodminutes: 1, threshold:WARN ] | log_event_whitelist | false | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | reportsmanager_fd_thresholds | false | |
Reports Manager Host Health Test | When computing the overall Reports Manager health, consider the host's health. | true | reportsmanager_host_health_enabled | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | reportsmanager_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | reportsmanager_pause_duration_window | false | |
Reports Manager Process Health Test | Enables the health test that the Reports Manager's process state is consistent with the role configuration | true | reportsmanager_scm_health_enabled | false | |
Scratch Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the scratch directory. | Warning: 10 GiB, Critical: 5 GiB | reportsmanager_scratch_directory_free_space_absolute_thresholds | false | |
Scratch Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the scratch directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Scratch Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | reportsmanager_scratch_directory_free_space_percentage_thresholds | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has all of the following fields: | [] | role_triggers | true | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false |
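
The default value of Rules to Extract Events from Log Files appears above with its JSON punctuation stripped. The sketch below shows one plausible reconstruction and how rules might be evaluated "in turn". Everything beyond the field names and values visible in the table is an assumption: the quoting of `version`, the first-match-wins evaluation, the treatment of `threshold` as a minimum severity, and the helper `first_matching_rule` are illustrative, not the appender's actual implementation.

```python
import json
import re

# Assumed reconstruction of the flattened default shown in the table.
DEFAULT_RULES = json.loads("""
{
  "version": "0",
  "rules": [
    {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "FATAL"},
    {"alert": false, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Instead, use .*"},
    {"alert": false, "rate": 0, "threshold": "WARN",
     "content": ".* is deprecated. Use .* instead"},
    {"alert": false, "rate": 1, "periodminutes": 2, "exceptiontype": ".*"},
    {"alert": false, "rate": 1, "periodminutes": 1, "threshold": "WARN"}
  ]
}
""")

# log4j severity levels, lowest to highest.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def first_matching_rule(rules, level, message, exception_type=None):
    """Return the first rule whose conditions all match, or None (assumed semantics)."""
    for rule in rules:
        threshold = rule.get("threshold")
        if threshold and LEVELS.index(level) < LEVELS.index(threshold):
            continue  # message is below the rule's minimum severity
        content = rule.get("content")
        if content and not re.fullmatch(content, message):
            continue  # message text does not match the rule's regex
        exc = rule.get("exceptiontype")
        if exc and (exception_type is None or not re.fullmatch(exc, exception_type)):
            continue  # rule applies only to messages carrying a matching exception
        return rule
    return None


rule = first_matching_rule(
    DEFAULT_RULES["rules"], "WARN", "foo is deprecated. Instead, use bar"
)
print(rule["rate"])  # 0
```

Under these assumptions, the deprecation warnings match a rule with a rate of 0 before they reach the general WARN rule, which is why the more specific rules are listed first.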
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Working Directory | Directory for Reports Manager to use for its working files | scratch.dir | /var/lib/cloudera-scm-headlamp | headlamp_scratch_dir | false |
Reports Manager Update Frequency | Frequency with which Reports Manager refreshes its view of HDFS. | update.frequency.seconds | 1 hour(s) | headlamp_update_frequency_seconds | false |
Reports Manager Log Directory | Directory where Reports Manager will place its log files. | /var/log/cloudera-scm-headlamp | mgmt_log_dir | false |
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Reports Manager Web UI Port | The port where Reports Manager starts a debug web server. Set to -1 to disable the debug server. | debug.server.port | 8083 | headlamp_debug_port | false |
Reports Manager Server Port | The port where Reports Manager listens for requests | server.port | 5678 | headlamp_server_port | false |
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Reports Manager in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | headlamp_heapsize | false | |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous as well as page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
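
The documented ranges for the cgroup settings lend themselves to a quick pre-check before a value is submitted. This is an illustrative sketch: `validate_cgroup_settings` is a hypothetical helper, and treating both ranges as inclusive is an assumption.

```python
CPU_SHARES_RANGE = (2, 262144)  # documented range for rm_cpu_shares
IO_WEIGHT_RANGE = (100, 1000)   # documented range for rm_io_weight


def validate_cgroup_settings(rm_cpu_shares=1024, rm_io_weight=500):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    for name, value, (low, high) in (
        ("rm_cpu_shares", rm_cpu_shares, CPU_SHARES_RANGE),
        ("rm_io_weight", rm_io_weight, IO_WEIGHT_RANGE),
    ):
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems


print(validate_cgroup_settings())         # defaults are valid: []
print(validate_cgroup_settings(1, 2000))  # both values rejected
```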
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
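
The retention behaviour described for Stacks Collection Data Retention (delete the oldest data once the limit is reached) can be sketched as follows. This is not Cloudera Manager's implementation: `prune_oldest` is a hypothetical helper, and using file modification time to decide what is "oldest" is an assumption.

```python
from pathlib import Path


def prune_oldest(directory, retention_bytes):
    """Delete the oldest files until the directory's total size fits the limit.

    Returns the names of the files that were removed, oldest first.
    """
    files = sorted(
        (p for p in Path(directory).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(p.stat().st_size for p in files)
    removed = []
    for path in files:
        if total <= retention_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path.name)
    return removed
```

With the default of 100 MiB, a call such as `prune_oldest(stacks_dir, 100 * 1024 * 1024)` would run after each collection, where `stacks_dir` stands for the configured Stacks Collection Directory.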
servicemonitordefaultgroup
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Service Monitor Legacy Database Hostname | Hostname of the Service Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Service Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | localhost | firehose_database_host | false | |
Service Monitor Legacy Database Name | The name of the Service Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Service Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | firehose_database_name | false | ||
Service Monitor Legacy Database Password | Password for logging into the Service Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Service Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | db.hibernate.connection.password | firehose_database_password | false | |
Service Monitor Legacy Database Type | Type of the Service Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Service Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | mysql | firehose_database_type | false | |
Service Monitor Legacy Database Username | Username for logging into the Service Monitor legacy database. The legacy database stores data generated before upgrading to Cloudera Manager 5.0. If this setting is set, the Service Monitor will attempt to migrate data from the legacy database to its current datastore. Once that process has been completed, this configuration can be cleared. | db.hibernate.connection.username | firehose_database_user | false | |
Java Configuration Options for Service Monitor | These arguments will be passed as part of the Java command line. Commonly, garbage collection flags or extra debugging flags would be passed here. | firehose_java_opts | false | ||
Service Monitor Advanced Configuration Snippet (Safety Valve) for cmon.conf | For advanced use only, a string to be inserted into cmon.conf for this role only. | firehose_safety_valve | false | ||
Service Monitor Logging Advanced Configuration Snippet (Safety Valve) | For advanced use only, a string to be inserted into log4j.properties for this role only. | log4j_safety_valve | false | ||
Heap Dump Directory | Path to the directory where heap dumps are generated when java.lang.OutOfMemoryError is thrown. This directory is automatically created if it doesn't exist. However, if this directory already exists, the role user must have write access to it. If this directory is shared among multiple roles, it should have 1777 permissions. Note that the heap dump files are created with 600 permissions and are owned by the role user. The amount of free space in this directory should be greater than the maximum Java Process heap size configured for this role. | /tmp | oom_heap_dump_dir | false | |
Dump Heap When Out of Memory | When set, generates a heap dump file when java.lang.OutOfMemoryError is thrown. | false | oom_heap_dump_enabled | true | |
Kill When Out of Memory | When set, a SIGKILL signal is sent to the role process when java.lang.OutOfMemoryError is thrown. | true | oom_sigkill_enabled | true | |
Automatically Restart Process | When set, this role's process is automatically (and transparently) restarted in the event of an unexpected failure. | true | process_auto_restart | true | |
Event Publication Maximum Queue Size | The maximum size of the queue in which events published from this role will be buffered. If this queue becomes full (for example, due to an outage), subsequent events will be dropped. | health.event.publish.queue.max | 20000 | svcmon_event_publication_queue_size_max | true |
Event Publication Retry Period | If an event cannot be delivered immediately by this role, this value controls how long to wait before Event Publisher retries delivery. | health.event.publish.retry.ms | 5000 | svcmon_event_publication_retry_period | true |
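
The queue behaviour described for Event Publication Maximum Queue Size (buffer up to a limit, then drop) can be sketched with a bounded queue. This is an illustration only: the `EventBuffer` class is hypothetical and says nothing about how the Event Publisher is actually implemented.

```python
import queue


class EventBuffer:
    """Bounded buffer that drops new events once it is full."""

    def __init__(self, max_size=20000):
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def publish(self, event):
        """Enqueue an event; return False and count a drop if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def size(self):
        return self._queue.qsize()


buf = EventBuffer(max_size=2)
print([buf.publish(e) for e in ("a", "b", "c")])  # [True, True, False]
```

During an outage, a consumer would retry delivery every retry period (5000 ms by default) while the buffer absorbs new events up to its limit.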
Logs
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Service Monitor Logging Threshold | The minimum log level for Service Monitor logs | INFO | log_threshold | false | |
Service Monitor Maximum Log File Backups | The maximum number of rolled log files to keep for Service Monitor logs. Typically used by log4j. | 10 | max_log_backup_index | false | |
Service Monitor Max Log Size | The maximum size, in megabytes, per log file for Service Monitor logs. Typically used by log4j. | 200 MiB | max_log_size | false |
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Health Alerts for this Role | When set, Cloudera Manager will send alerts when the health of this role reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold | true | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Storage Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the storage directory. | Warning: 10 GiB, Critical: 5 GiB | firehose_storage_directory_free_space_absolute_thresholds | false | |
Storage Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains the storage directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Storage Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | firehose_storage_directory_free_space_percentage_thresholds | false | |
Log Directory Free Space Monitoring Absolute Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. | Warning: 10 GiB, Critical: 5 GiB | log_directory_free_space_absolute_thresholds | false | |
Log Directory Free Space Monitoring Percentage Thresholds | The health test thresholds for monitoring of free space on the filesystem that contains this role's log directory. Specified as a percentage of the capacity on that filesystem. This setting is not used if a Log Directory Free Space Monitoring Absolute Thresholds setting is configured. | Warning: Never, Critical: Never | log_directory_free_space_percentage_thresholds | false | |
Rules to Extract Events from Log Files | This file contains the rules which govern how log messages are turned into events by the custom log4j appender that this role loads. It is in JSON format, and is composed of a list of rules. Every log message is evaluated against each of these rules in turn to decide whether or not to send an event for that message. Each rule has some or all of the following fields: | version: 0, rules: [ alert: false, rate: 1, periodminutes: 1, threshold:FATAL, alert: false, rate: 0, threshold:WARN, content: .* is deprecated. Instead, use .*, alert: false, rate: 0, threshold:WARN, content: .* is deprecated. Use .* instead, alert: false, rate: 1, periodminutes: 2, exceptiontype: .*, alert: false, rate: 1, periodminutes: 1, threshold:WARN ] | log_event_whitelist | false | |
Role Triggers | The configured triggers for this role. This is a JSON-formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has all of the following fields: | [] | role_triggers | true | |
File Descriptor Monitoring Thresholds | The health test thresholds of the number of file descriptors used. Specified as a percentage of file descriptor limit. | Warning: 50.0 %, Critical: 70.0 % | servicemonitor_fd_thresholds | false | |
Service Monitor Host Health Test | When computing the overall Service Monitor health, consider the host's health. | true | servicemonitor_host_health_enabled | false | |
Pause Duration Thresholds | The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time. | Warning: 30.0, Critical: 60.0 | servicemonitor_pause_duration_thresholds | false | |
Pause Duration Monitoring Period | The period to review when computing the moving average of extra time the pause monitor spent paused. | 5 minute(s) | servicemonitor_pause_duration_window | false | |
Service Monitor Role Pipeline Monitoring Thresholds | The health test thresholds for monitoring the Service Monitor role pipeline. This specifies the number of dropped messages that will be tolerated over the monitoring time period. | Warning: Never, Critical: Any | servicemonitor_role_pipeline_thresholds | false | |
Service Monitor Role Pipeline Monitoring Time Period | The time period over which the Service Monitor role pipeline will be monitored for dropped messages. | 5 minute(s) | servicemonitor_role_pipeline_window | false | |
Service Monitor Process Health Test | Enables the health test that the Service Monitor's process state is consistent with the role configuration | true | servicemonitor_scm_health_enabled | false | |
Web Metric Collection | Enables the health test that the Cloudera Manager Agent can successfully contact and gather metrics from the web server. | true | servicemonitor_web_metric_collection_enabled | false | |
Web Metric Collection Duration | The health test thresholds on the duration of the metrics request to the web server. | Warning: 10 second(s), Critical: Never | servicemonitor_web_metric_collection_thresholds | false | |
Unexpected Exits Thresholds | The health test thresholds for unexpected exits encountered within a recent period specified by the unexpected_exits_window configuration for the role. | Warning: Never, Critical: Any | unexpected_exits_thresholds | false | |
Unexpected Exits Monitoring Period | The period to review when computing unexpected exits. | 5 minute(s) | unexpected_exits_window | false | |
YARN MapReduce Counter Descriptions | This JSON document contains metadata that is used by the Service Monitor's YARN application monitoring feature for YARN-based MapReduce counter handling. Each counter description has the following fields: | [ name: org.apache.hadoop.mapreduce.jobcounter.num_failed_maps, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.num_failed_reduces, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.total_launched_maps, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.total_launched_reduces, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.other_local_maps, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.data_local_maps, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.rack_local_maps, units: tasks , name: org.apache.hadoop.mapreduce.jobcounter.slots_millis_maps, units: ms , name: org.apache.hadoop.mapreduce.jobcounter.slots_millis_reduces, units: ms , name: org.apache.hadoop.mapreduce.jobcounter.fallow_slots_millis_maps, units: ms , name: org.apache.hadoop.mapreduce.jobcounter.fallow_slots_millis_reduces, units: ms , name: org.apache.hadoop.mapreduce.jobcounter.mb_millis_maps, units: mb millis , name: org.apache.hadoop.mapreduce.jobcounter.mb_millis_reduces, units: mb millis , name: org.apache.hadoop.mapreduce.jobcounter.vcores_millis_maps, units: vcore millis , name: org.apache.hadoop.mapreduce.jobcounter.vcores_millis_reduces, units: vcore millis , name: org.apache.hadoop.mapreduce.filesystemcounter.file_bytes_read, units: bytes , name: org.apache.hadoop.mapreduce.filesystemcounter.file_bytes_written, units: bytes , name: org.apache.hadoop.mapreduce.filesystemcounter.file_read_ops, units: operations , name: org.apache.hadoop.mapreduce.filesystemcounter.file_large_read_ops, units: operations , name: org.apache.hadoop.mapreduce.filesystemcounter.file_write_ops, units: operations , name: org.apache.hadoop.mapreduce.filesystemcounter.hdfs_bytes_read, units: bytes , name: org.apache.hadoop.mapreduce.filesystemcounter.hdfs_bytes_written, units: bytes , name: org.apache.hadoop.mapreduce.filesystemcounter.hdfs_read_ops, units: operations , name: org.apache.hadoop.mapreduce.filesystemcounter.hdfs_large_read_ops, units: operations , name: org.apache.hadoop.mapreduce.filesystemcounter.hdfs_write_ops, units: operations , name: org.apache.hadoop.mapreduce.taskcounter.map_input_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.map_output_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.map_output_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.map_output_materialized_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.split_raw_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.combine_input_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.combine_output_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.reduce_input_groups, units: groups , name: org.apache.hadoop.mapreduce.taskcounter.reduce_shuffle_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.reduce_input_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.reduce_output_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.spilled_records, units: records , name: org.apache.hadoop.mapreduce.taskcounter.shuffled_maps, units: tasks , name: org.apache.hadoop.mapreduce.taskcounter.failed_shuffle, units: failures , name: org.apache.hadoop.mapreduce.taskcounter.merged_map_outputs, units: outputs , name: org.apache.hadoop.mapreduce.taskcounter.gc_time_millis, units: ms , name: org.apache.hadoop.mapreduce.taskcounter.cpu_milliseconds, units: ms , name: org.apache.hadoop.mapreduce.taskcounter.physical_memory_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.virtual_memory_bytes, units: bytes , name: org.apache.hadoop.mapreduce.taskcounter.committed_heap_bytes, units: bytes , attributeName: shuffle_errors_bad_id, name: shuffle_errors.bad_id, units: errors , attributeName: shuffle_errors_connection, name: shuffle_errors.connection, units: errors , attributeName: shuffle_errors_io, name: shuffle_errors.io_error, units: errors , attributeName: shuffle_errors_wrong_length, name: shuffle_errors.wrong_length, units: errors , attributeName: shuffle_errors_wrong_map, name: shuffle_errors.wrong_map, units: errors , attributeName: shuffle_errors_wrong_reduce, name: shuffle_errors.wrong_reduce, units: errors , name: org.apache.hadoop.mapreduce.lib.input.fileinputformatcounter.bytes_read, units: bytes , name: org.apache.hadoop.mapreduce.lib.output.fileoutputformatcounter.bytes_written, units: bytes ] | yarn_application_mapreduce_counters | false |
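
Several settings in this table take a "Warning / Critical" pair, for example File Descriptor Monitoring Thresholds (Warning: 50.0 %, Critical: 70.0 %). The sketch below shows one way such a pair could map to a health value. It is an assumption-laden illustration: `health_from_thresholds` is hypothetical, "Never" is modelled as `None`, the comparison is assumed to be greater-than-or-equal, and the mapping of warning to "concerning" and critical to "bad" is inferred from the health values named elsewhere on this page.

```python
def health_from_thresholds(value, warning=None, critical=None):
    """Map a measured value to a health state; None means the threshold is 'Never'."""
    if critical is not None and value >= critical:
        return "bad"
    if warning is not None and value >= warning:
        return "concerning"
    return "good"


# File descriptor usage as a percentage of the limit, with the default thresholds.
for used_pct in (40.0, 55.0, 70.0):
    print(used_pct, health_from_thresholds(used_pct, warning=50.0, critical=70.0))
```

Free-space thresholds work in the opposite direction (lower values are worse), so a real check for those settings would invert the comparisons.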
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Impala Storage | The approximate amount of disk space dedicated to storing Impala query data. Once the store has reached its maximum size, older data will be deleted to make room for newer queries. The disk usage is approximate because we only begin deleting data once we've reached the limit. | firehose_impala_storage_bytes | 1 GiB | firehose_impala_storage_bytes | false |
Service Monitor Storage Directory | The directory where Service Monitor data is stored. The Service Monitor stores metric time series and health information, as well as Impala query and YARN application metadata if Impala and/or YARN are configured. | firehose.storage.base.directory | /var/lib/cloudera-service-monitor | firehose_storage_dir | true |
Time-Series Storage | The approximate amount of disk space dedicated to storing time series and health data. Once the store has reached its maximum size, older data will be deleted to make room for newer data. The disk usage is approximate because we only begin deleting data once we've reached the limit. Note that Cloudera Manager stores time-series data at a number of different data granularities, and these granularities have different effective retention periods. Specifically, Cloudera Manager stores metric data as both raw data points and ten-minutely, hourly, six-hourly, daily, and weekly summary data points. Raw data consumes the bulk of the allocated storage space, weekly summaries the least. As such, raw data is retained for the shortest amount of time, while weekly summary points are unlikely to ever be deleted. See the "Disk Usage" tab on the Service Monitor page for more information on how space is consumed within the Service Monitor. This tab also shows information about the amount of data retained and time window covered by each data granularity. | firehose_time_series_storage_bytes | 10 GiB | firehose_time_series_storage_bytes | false |
YARN Storage | The approximate amount of disk space dedicated to storing YARN application data. Once the store has reached its maximum size, older data will be deleted to make room for newer applications. The disk usage is approximate because we only begin deleting data once we've reached the limit. | firehose_yarn_storage_bytes | 1 GiB | firehose_yarn_storage_bytes | false |
Health Event Startup Policy | This setting controls whether health events are emitted when this monitoring role is started. If set to "none", no health events are emitted. If set to "bad", health events are emitted for subjects with bad or concerning health. If set to "all", health events are emitted for all subjects for all health values. The default is "bad". | health.event.publish.startup.policy | bad | health_event_publish_startup_policy | false |
Service Monitor Log Directory | Location of log files for Service Monitor | /var/log/cloudera-scm-firehose | mgmt_log_dir | false | |
Event Publication Log Quiet Time Period | To avoid producing excessive amounts of log output, the Event Publisher component of this role is limited to emitting one message per time period. This value controls the size of that time period. | health.event.publish.log.suppress.window.ms | 1 minute(s) | svcmon_event_publication_log_suppress_window | true |
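
The Event Publication Log Quiet Time Period row describes a limit of one log message per time period. The following is a minimal sketch of that behavior, not the Event Publisher's actual code: the class name `QuietPeriodLogger` and its methods are hypothetical, and the 60,000 ms window corresponds to the 1 minute default.

```python
class QuietPeriodLogger:
    """Hypothetical helper: allow at most one message per quiet period."""

    def __init__(self, window_ms=60_000):
        self.window_ms = window_ms      # size of the quiet period
        self._last_emit_ms = None       # time of the last emitted message
        self.suppressed = 0             # messages dropped so far

    def should_emit(self, now_ms):
        """Return True if a message may be logged at time now_ms."""
        first = self._last_emit_ms is None
        if first or now_ms - self._last_emit_ms >= self.window_ms:
            self._last_emit_ms = now_ms
            return True
        self.suppressed += 1
        return False


limiter = QuietPeriodLogger(window_ms=60_000)
decisions = [limiter.should_emit(t) for t in (0, 1_000, 59_999, 60_000, 61_000)]
print(decisions)           # [True, False, False, True, False]
print(limiter.suppressed)  # 3
```
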
Performance
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Maximum Process File Descriptors | If configured, overrides the process soft and hard rlimits (also called ulimits) for file descriptors to the configured value. | rlimit_fds | false |
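
To see which soft and hard file descriptor limits a process is currently running with, the limits can be read from inside the process. This sketch uses Python's standard `resource` module (Unix only); the `describe` helper is illustrative and not part of Cloudera Manager.

```python
import resource  # standard library, available on Unix-like systems only

# RLIMIT_NOFILE is the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print(f"soft={describe(soft)} hard={describe(hard)}")
```
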
Ports and Addresses
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Service Monitor Web UI Port | Port for Service Monitor's Debug page. Set to -1 to disable the debug server. | debug.servlet.port | 8086 | firehose_debug_port | false |
Service Monitor Listen Port | Port where Service Monitor is listening for agent messages. | firehose.server.port | 9997 | firehose_listen_port | false |
Service Monitor Nozzle Port | Port where Service Monitor's query API is exposed. | nozzle.server.port | 9996 | firehose_nozzle_port | false |
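
As a rough illustration of how the API Name column might be used, the sketch below validates port overrides and builds a JSON body of name/value items. The `items`/`name`/`value` shape is assumed here to match the configuration list format accepted by the Cloudera Manager REST API; the helper name `build_port_payload` and the override values are hypothetical, and no request is sent.

```python
import json

# Defaults from the table above, keyed by API Name.
DEFAULT_PORTS = {
    "firehose_debug_port": 8086,
    "firehose_listen_port": 9997,
    "firehose_nozzle_port": 9996,
}


def build_port_payload(overrides):
    """Return a JSON string of name/value items (assumed payload shape)."""
    items = []
    for name, port in sorted(overrides.items()):
        if name not in DEFAULT_PORTS:
            raise KeyError(f"unknown port setting: {name}")
        # Only the debug port documents -1 as "disabled".
        disabled = name == "firehose_debug_port" and port == -1
        if not disabled and not 1 <= port <= 65535:
            raise ValueError(f"{name}: {port} is not a valid TCP port")
        items.append({"name": name, "value": str(port)})
    return json.dumps({"items": items})


payload = build_port_payload(
    {"firehose_debug_port": -1, "firehose_nozzle_port": 19996}
)
print(payload)
```
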
Resource Management
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Java Heap Size of Service Monitor in Bytes | Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx. | 1 GiB | firehose_heapsize | false | |
Maximum Non-Java Memory of Service Monitor | The amount of memory the Service Monitor can use off of the Java heap. | firehose_non_java_memory_bytes | 2 GiB | firehose_non_java_memory_bytes | false |
Cgroup CPU Shares | Number of CPU shares to assign to this role. The greater the number of shares, the larger the share of the host's CPUs that will be given to this role when the host experiences CPU contention. Must be between 2 and 262144. Defaults to 1024 for processes not managed by Cloudera Manager. | cpu.shares | 1024 | rm_cpu_shares | true |
Cgroup I/O Weight | Weight for the read I/O requests issued by this role. The greater the weight, the higher the priority of the requests when the host experiences I/O contention. Must be between 100 and 1000. Defaults to 1000 for processes not managed by Cloudera Manager. | blkio.weight | 500 | rm_io_weight | true |
Cgroup Memory Hard Limit | Hard memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.limit_in_bytes | -1 MiB | rm_memory_hard_limit | true |
Cgroup Memory Soft Limit | Soft memory limit to assign to this role, enforced by the Linux kernel. When the limit is reached, the kernel will reclaim pages charged to the process if and only if the host is facing memory pressure. If reclaiming fails, the kernel may kill the process. Both anonymous and page cache pages contribute to the limit. Use a value of -1 B to specify no limit. By default, processes not managed by Cloudera Manager will have no limit. | memory.soft_limit_in_bytes | -1 MiB | rm_memory_soft_limit | true |
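
The effect of Cgroup CPU Shares is relative: under contention, a role's portion of CPU time is its shares divided by the total shares of all competing groups. The sketch below works through that arithmetic under simplifying assumptions (all groups are siblings at the same cgroup level and all are runnable). The role names and share values are made up for illustration.

```python
def cpu_fractions(shares_by_role):
    """Map each role to its fraction of CPU time under full contention."""
    for role, shares in shares_by_role.items():
        # Range documented for the Cgroup CPU Shares setting.
        if not 2 <= shares <= 262144:
            raise ValueError(f"{role}: cpu.shares must be between 2 and 262144")
    total = sum(shares_by_role.values())
    return {role: shares / total for role, shares in shares_by_role.items()}


fractions = cpu_fractions(
    {"service_monitor": 2048, "other_role_a": 1024, "other_role_b": 1024}
)
print(fractions)
# {'service_monitor': 0.5, 'other_role_a': 0.25, 'other_role_b': 0.25}
```

When the host is not under CPU contention, shares impose no cap, so a role can use more than its computed fraction.
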
Stacks Collection
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Stacks Collection Data Retention | The amount of stacks data that will be retained. After the retention limit is reached, the oldest data will be deleted. | stacks_collection_data_retention | 100 MiB | stacks_collection_data_retention | false |
Stacks Collection Directory | The directory in which stacks logs will be placed. If not set, stacks will be logged into a stacks subdirectory of the role's log directory. | stacks_collection_directory | stacks_collection_directory | false | |
Stacks Collection Enabled | Whether or not periodic stacks collection is enabled. | stacks_collection_enabled | false | stacks_collection_enabled | true |
Stacks Collection Frequency | The frequency with which stacks will be collected. | stacks_collection_frequency | 5.0 second(s) | stacks_collection_frequency | false |
Stacks Collection Method | The method that will be used to collect stacks. The jstack option involves periodically running the jstack command against the role's daemon process. The servlet method is available for those roles that have an HTTP server endpoint exposing the current stack traces of all threads. When the servlet method is selected, that HTTP endpoint is periodically scraped. | stacks_collection_method | jstack | stacks_collection_method | false |
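
The retention rule for Stacks Collection Data Retention (delete the oldest data once the limit is reached) can be sketched as follows. This illustrates the policy only and is not Cloudera Manager's implementation; the function name, the tuple layout, and the file names are hypothetical.

```python
MIB = 1024 * 1024


def files_to_delete(files, retention_bytes=100 * MIB):
    """Return names of the oldest files to delete so the total fits the limit.

    files: list of (name, size_bytes, mtime_seconds) tuples.
    """
    total = sum(size for _, size, _ in files)
    doomed = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):  # oldest first
        if total <= retention_bytes:
            break
        doomed.append(name)
        total -= size
    return doomed


stacks_files = [
    ("stacks.3.log", 40 * MIB, 300),
    ("stacks.1.log", 40 * MIB, 100),
    ("stacks.2.log", 40 * MIB, 200),
]
print(files_to_delete(stacks_files))  # ['stacks.1.log']
```
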
service_wide
Advanced
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Cloudera Management Service Environment Advanced Configuration Snippet (Safety Valve) | For advanced use only, key-value pairs (one on each line) to be inserted into a role's environment. Applies to configurations of all roles in this service except client configuration. | mgmt_service_env_safety_valve | false | ||
Cloudera Management Service Advanced Configuration Snippet (Safety Valve) for ssl-client.xml | For advanced use only, a string to be inserted into ssl-client.xml. This setting currently applies to the Reports Manager only. | mgmt_ssl_client_safety_valve | false | ||
System Group | The group that this service's processes should run as. | cloudera-scm | process_groupname | true | |
System User | The user that this service's processes should run as. | cloudera-scm | process_username | true |
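
The environment safety valve takes key-value pairs, one per line. The sketch below parses such a snippet, assuming a `KEY=VALUE` syntax split on the first `=`. The parser and the example variables are illustrative only.

```python
def parse_env_snippet(text):
    """Parse KEY=VALUE lines into a dict; the KEY=VALUE syntax is assumed."""
    env = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        env[key.strip()] = value.strip()
    return env


snippet = "JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8\nTZ=UTC\n"
print(parse_env_snippet(snippet))
# {'JAVA_TOOL_OPTIONS': '-Dfile.encoding=UTF-8', 'TZ': 'UTC'}
```
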
Monitoring
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Enable Log Event Capture | When set, each role identifies important log events and forwards them to Cloudera Manager. | true | catch_events | false | |
Enable Service Level Health Alerts | When set, Cloudera Manager will send alerts when the health of this service reaches the threshold specified by the EventServer setting eventserver_health_events_alert_threshold. | false | enable_alerts | false | |
Enable Configuration Change Alerts | When set, Cloudera Manager will send alerts when this entity's configuration changes. | false | enable_config_alerts | false | |
Log Event Retry Frequency | The frequency, in seconds, with which the log4j event publication appender will retry sending undelivered log events to the Event Server. | 30 | log_event_retry_frequency | false | |
Activity Monitor Role Health Test | When computing the overall MGMT health, consider Activity Monitor's health | true | mgmt_activitymonitor_health_enabled | false | |
Alert Publisher Role Health Test | When computing the overall MGMT health, consider Alert Publisher's health | true | mgmt_alertpublisher_health_enabled | false | |
Cloudera Manager Server Clock Offset Thresholds | The health test thresholds for monitoring the clock offset between the Cloudera Manager Server and the Service Monitor. | Warning: 30 second(s), Critical: 1 minute(s) | mgmt_clock_offset_with_smon_thresholds | false | |
Embedded Database Free Space Monitoring Thresholds | The health test thresholds for monitoring the free space on the volume for the embedded PostgreSQL database optionally running on the Cloudera Manager Server. If the embedded database is not in use, this has no effect. | Warning: 2 GiB, Critical: 1 GiB | mgmt_embedded_database_free_space_absolute_thresholds | false | |
Event Server Role Health Test | When computing the overall MGMT health, consider Event Server's health | true | mgmt_eventserver_health_enabled | false | |
Host Monitor Role Health Test | When computing the overall MGMT health, consider Host Monitor's health | true | mgmt_hostmonitor_health_enabled | false | |
Navigator Audit Server Role Health Test | When computing the overall MGMT health, consider Navigator Audit Server's health | true | mgmt_navigator_health_enabled | false | |
Navigator Metadata Server Role Health Test | When computing the overall MGMT health, consider Navigator Metadata Server's health | true | mgmt_navigatormetaserver_health_enabled | false | |
Reports Manager Role Health Test | When computing the overall MGMT health, consider Reports Manager's health | true | mgmt_reportsmanager_health_enabled | false | |
Service Monitor Role Health Test | When computing the overall MGMT health, consider Service Monitor's health | true | mgmt_servicemonitor_health_enabled | false | |
Service Triggers | The configured triggers for this service. This is a JSON formatted list of triggers. These triggers are evaluated as part of the health system. Every trigger expression is parsed, and if the trigger condition is met, the list of actions provided in the trigger expression is executed. Each trigger has all of the following fields: | [] | service_triggers | true | |
Service Monitor Derived Configs Advanced Configuration Snippet (Safety Valve) | For advanced use only, a list of derived configuration properties that will be used by the Service Monitor instead of the default ones. | smon_derived_configs_safety_valve | false |
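
Threshold pairs such as the Cloudera Manager Server Clock Offset Thresholds (Warning: 30 seconds, Critical: 1 minute) map a measured value to a health state. The sketch below shows one plausible evaluation. It assumes that the warning threshold yields "concerning" health, that the critical threshold yields "bad" health, and that comparisons are inclusive; none of these details are stated in the table.

```python
def clock_offset_health(offset_seconds, warning=30.0, critical=60.0):
    """Classify a clock offset against warning and critical thresholds."""
    magnitude = abs(offset_seconds)  # the offset may be in either direction
    if magnitude >= critical:
        return "bad"
    if magnitude >= warning:
        return "concerning"
    return "good"


for offset in (5, -45, 60):
    print(offset, clock_offset_health(offset))
# 5 good
# -45 concerning
# 60 bad
```
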
Other
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
Minimum Kerberos Ticket Validity Period | The minimum Kerberos ticket validity period. The Cloudera Management Services only attempt to log in again after this minimum period of time has elapsed. | tgt.login.validity.period | 1 hour(s) | tgt_login_validity_period | false |
Security
Display Name | Description | Related Name | Default Value | API Name | Required |
---|---|---|---|---|---|
SSL Client Truststore File Location | Path to the client truststore file used in HTTPS communication. This truststore contains certificates of trusted servers, or of Certificate Authorities trusted to identify servers. If set, this is used to verify certificates in HTTPS communication with CDH services and the Cloudera Manager Server. If not set, the default Java truststore is used to verify certificates. The contents of this truststore can be modified without restarting the Cloudera Management Service roles. By default, changes to its contents are picked up within ten seconds. | ssl.client.truststore.location | ssl_client_truststore_location | false | |
SSL Client Truststore File Password | Password for the client truststore file. | ssl.client.truststore.password | ssl_client_truststore_password | false |