Activity Analyzer
The following configuration properties are available for Activity Analyzer:
Table 3.5. Activity Analyzer Configuration Properties
Property Name | Description | Where to Configure | Guidelines |
---|---|---|---|
phoenix.sink.batch.size |
Activities are batched for better storage performance. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed. Type: int Default Value: 100 |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini |
Increasing the batch size can lower the load on storage and improve storage performance; however, it can delay the availability of data and increase memory pressure. Reducing the batch size can make data available sooner, but has a negative performance impact on the storage layer. |
global.activity.processing.parallelism |
Number of parallel threads that process each activity type. Controls the threads used for Tez, YARN, MR, and HDFS activity data collection. Type: int Default Value: 8 |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini |
Reduce the number of threads if you encounter out-of-memory (OOM) exceptions. |
phoenix.sink.flush.interval.seconds |
Time after which data will be flushed to Phoenix. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed. Type: int Default Value: 30 |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini |
Increase the interval to reduce the number of persist operations to Phoenix only if the number of records to be batched together is much less than 100. |
mr_job.activity.watcher.enabled |
Enables automatic activity analysis for MapReduce jobs. Type: boolean Default Value: true |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini |
Disable only if you do not want to analyze MapReduce jobs. |
mr_job.max.job.size.mb.for.parallel.execution |
Maximum size (in MB) that a MapReduce job can have in order to be executed in parallel. Some large MapReduce jobs may contain thousands of tasks. Such jobs require a lot of memory and put memory pressure on the JVM, especially in multi-threaded execution. Any job with a history file size larger than the value of this parameter is executed in synchronized fashion. This may slow performance, but avoids OOM errors. Any job with a history file size smaller than the value of this parameter is executed in parallel. Type: int Default Value: 500 |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini |
Reduce the parallel execution job size if you encounter OOM errors. |
tez_job.activity.watcher.enabled |
Enables automatic activity analysis for Tez jobs. Type: boolean Default Value: true |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini | Disable only if you do not want to analyze Tez jobs. |
tez_job.tmp.dir |
Temporary location where Tez job information is downloaded. Type: string Default Value: /var/lib/smartsense/activity-analyzer/tez/tmp/ |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini | You can symlink it to a non-root partition or change it to use a directory in a non-root partition. |
yarn_app.activity.watcher.enabled |
Enables automatic activity analysis for YARN apps. Type: boolean Default Value: true |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini | Disable only if you do not want to analyze YARN jobs. |
hdfs.activity.watcher.enabled |
Enables automatic analysis for HDFS files. Type: boolean Default Value: true |
Ambari Config: Activity Analysis Config File: /etc/smartsense-activity/conf/activity.ini | Disable only if you do not want to analyze HDFS fsImage. |
global.activity.analyzer.user |
Defines the user used to read activity data from HDFS and YARN. This user must have read access to all activity data from HDFS, YARN, ATS, and so on. Type: string Default Value: activity_explorer |
Ambari Config: Advanced > Advanced activity-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
activity.explorer.user |
Defines the user used to read pre-analyzed data. This user does not need access to HDFS and YARN. Type: string Default Value: activity_explorer |
Ambari Config: Advanced > Advanced activity-conf Config File: /etc/smartsense-activity/conf/activity.ini | Default value is suitable for all clusters. |
analyzer_jvm_opts |
Allows you to specify multiple JVM options separated by spaces. Type: string Default Value: -Xms128m |
Ambari Config: Advanced > Advanced activity-env Config File: /etc/smartsense-activity/conf/activity-env.sh |
This parameter allows you to add any additional JVM options for executing activity analyzers, for example, for GC tuning. |
analyzer_jvm_heap |
Maximum heap space (in MB) allocated for the Activity Analyzer process. Type: int Default Value: 8192 |
Ambari Config: Advanced > Advanced activity-env Config File: /etc/smartsense-activity/conf/activity-env.sh | Usually 8192 MB is sufficient, but it can be increased if you encounter OOM errors. |
activity_log_dir |
Directory where activity log files are created. Type: string Default Value: /var/log/smartsense-activity |
Ambari Config: Advanced > Advanced activity-log4j Config File: /etc/smartsense-activity/conf/log4j.properties |
Default value is suitable for most clusters. If you change this directory, you must grant read/write/create permissions on the new directory to the activity_analyzer user. |
activity_max_file_size |
Maximum size (in MB) for SmartSense activity log files. Type: int Default Value: 30 |
Ambari Config: Advanced > Advanced activity-log4j Config File: /etc/smartsense-activity/conf/log4j.properties |
Default value is suitable for most clusters. Check available storage capacity before updating this property. |
activity_max_backup_index |
Maximum number of SmartSense activity log files. Type: int Default Value: 10 |
Ambari Config: Advanced > Advanced activity-log4j Config File: /etc/smartsense-activity/conf/log4j.properties | You can increase this number to keep the record of older logs. Check available storage capacity before updating this property. |
global.date.format |
Format in which dates are converted to strings and sometimes persisted. Type: string Default Value: "YYYY-mm-DD" |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
global.activity.status.update.interval.seconds |
Interval (in seconds) after which status of processed/failed/in process activities is updated in DB. Type: int Default Value: 30 |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini | Default value is suitable for all clusters. |
activity.batch.interval.seconds |
Interval for batching activities. Activities are batched for better storage performance. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed. Type: int Default Value: 60 |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini | Increasing the batch interval can lower the load on storage and improve storage performance; however, it can also delay the availability of data and increase memory pressure. Reducing the batch interval can make data available sooner, but has a negative performance impact on the storage layer. |
activity.watcher.enabled |
Enables regular collection of job data for analysis. Type: boolean Default Value: true |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Disable this only if you want to temporarily turn off data collection. |
activity.history.max.back.track.days |
The number of days of history to retrieve job information. Type: int Default Value: 7 |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Increase this number if you have to refer to older jobs. Note that older jobs should have data available in AMS. This is used only during first run after installation. |
phoenix.setup.continue.on.error |
During initial setup, errors in DB setup may occur. This parameter indicates whether to continue if any error occurs. Type: boolean Default Value: false |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
phoenix.setup.drop.existing.tables |
During initial setup, matching tables may be found in the DB (typically from previous install attempts). This parameter determines whether they should be dropped and recreated. By default, the existing tables are kept. Type: boolean Default Value: false |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
phoenix.activity.analyzer.jdbc.url |
JDBC URL used by Activity Analyzer to store its data. Type: string Default Value: (no value) |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Do not change it. It is auto configured based on the cluster setup. |
ams.jdbc.url |
JDBC URL used by Activity Analyzer to fetch data from AMS. Type: string Default Value: (no value) |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Do not change it. It is auto configured based on the cluster setup. |
global.store.job.configs |
Enables storing job-specific configs in AMS after analysis. Type: boolean Default Value: true |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Do not disable it. Keeping it on helps in debugging. |
global.store.tasks |
Enables persisting task-level data in AMS after analysis. Type: boolean Default Value: false |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini | Task-level data can be huge and may overwhelm AMS, so keep it disabled unless absolutely needed. If enabling, disable again later. |
global.store.task.counters |
Enables storing task counter data in the AMS after analysis. Type: boolean Default Value: false |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Task counter data can be huge and may overwhelm AMS, so keep it disabled unless absolutely needed. If you enable it, disable it again later. |
global.activity.fetch.retry.interval.seconds |
Interval (in seconds) between retry attempts to fetch the activity details. Type: int Default Value: 5 |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
global.activity.fetch.retry.attempts |
Number of tries to fetch activities before giving up. Type: int Default Value: 5 |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
Default value is suitable for all clusters. |
global.tmp.dir |
Temporary directory used by activity-analyzer for internal purposes. Type: string Default Value: /var/lib/smartsense/activity-analyzer/tmp/ |
Ambari Config: Advanced > Custom activity-analyzer-conf Config File: /etc/smartsense-activity/conf/activity.ini |
We do not recommend changing this unless you have a very specific requirement. If you use a directory other than the default, verify that permissions are set accordingly. |
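
Several descriptions in the table state the same persistence rule: a batch is written when it reaches phoenix.sink.batch.size or when the configured interval has elapsed, whichever happens first. The following Python sketch illustrates that rule only. It is not the Activity Analyzer implementation: the `ActivityBatcher` class, its `sink` callback, and the injectable `clock` are hypothetical names introduced for illustration, and the defaults of 100 and 30 are taken from the table.

```python
import time
from typing import Callable, List


class ActivityBatcher:
    """Hypothetical batcher: flushes on size or elapsed time, whichever comes first."""

    def __init__(
        self,
        sink: Callable[[List[str]], None],
        batch_size: int = 100,            # mirrors phoenix.sink.batch.size
        interval_seconds: float = 30.0,   # mirrors the update/flush interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink
        self._batch_size = batch_size
        self._interval = interval_seconds
        self._clock = clock
        self._pending: List[str] = []
        self._last_flush = clock()

    def add(self, activity: str) -> None:
        """Queue one activity and flush if either threshold is met."""
        self._pending.append(activity)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Persist the pending batch if it is full or the interval has elapsed."""
        now = self._clock()
        size_reached = len(self._pending) >= self._batch_size
        interval_elapsed = now - self._last_flush >= self._interval
        if not self._pending:
            if interval_elapsed:
                # Nothing to write; restart the timer so the next batch
                # gets a full interval before a time-based flush.
                self._last_flush = now
            return False
        if size_reached or interval_elapsed:
            self._sink(list(self._pending))
            self._pending.clear()
            self._last_flush = now
            return True
        return False
```

In this sketch, a larger `batch_size` or `interval_seconds` means fewer calls to `sink` but more activities held in memory and a longer wait before data becomes visible, which is the trade-off described in the Guidelines column.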