SmartSense Configuration Guidelines

Activity analyzer

The following configuration properties are available for activity analyzer:

Table 1. Activity Analyzer Configuration Properties
Property Name | Description | Where to Configure | Guidelines

phoenix.sink.batch.size

Activities are batched for better storage performance. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed.

Type: int

Default Value: 100

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Increasing batch size can lower the load on storage and improve storage performance; however, it can delay the availability of data and increase memory pressure.

Reducing the batch size can make data available sooner, but has a negative performance impact on the storage layer.
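
The size-or-time rule described for this property can be sketched in Python as follows. This is an illustration only, not the activity analyzer's actual implementation: the `BatchSink` class, its `persist` callback, and the injectable `clock` are hypothetical names introduced here.

```python
import time
from typing import Any, Callable, List


class BatchSink:
    """Buffers activities and persists them when the batch is full or stale."""

    def __init__(self, batch_size: int, max_wait_seconds: float,
                 persist: Callable[[List[Any]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size              # stands in for phoenix.sink.batch.size
        self.max_wait_seconds = max_wait_seconds  # stands in for the time-based trigger
        self.persist = persist                    # hypothetical storage callback
        self.clock = clock
        self.buffer: List[Any] = []
        self.last_flush = clock()

    def add(self, activity: Any) -> None:
        self.buffer.append(activity)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # Persist when either trigger fires, whichever comes first.
        full = len(self.buffer) >= self.batch_size
        stale = self.clock() - self.last_flush >= self.max_wait_seconds
        if self.buffer and (full or stale):
            self.persist(self.buffer)
            self.buffer = []
            self.last_flush = self.clock()
```

A larger `batch_size` means fewer calls to `persist` but a longer-lived, larger in-memory buffer, which is the trade-off the guideline describes.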

global.activity.processing.parallelism

Number of parallel threads that process each activity type. Controls the threads used for Tez, YARN, MR, and HDFS activity data collection.

Type: int

Default Value: 8

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Reduce the number of threads if you encounter out-of-memory (OOM) exceptions.

phoenix.sink.flush.interval.seconds

Time (in seconds) after which data is flushed to Phoenix. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed.

Type: int

Default Value: 30

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Increase this interval to reduce the number of persist operations to Phoenix, but only if the number of records batched together is typically much lower than 100.

mr_job.activity.watcher.enabled

Enables automatic activity analysis for MapReduce jobs.

Type: boolean

Default Value: true

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Disable only if you do not want to analyze MapReduce jobs.

mr_job.max.job.size.mb.for.parallel.execution

Maximum size (in MB) that a MapReduce job can have in order to be executed in parallel.

Some large MapReduce jobs may contain thousands of tasks. Such jobs require a lot of memory and put memory pressure on the JVM, especially in multi-threaded execution.

Any job with a history file size larger than specified in this parameter will be executed in a synchronized fashion. This may reduce performance, but avoids out-of-memory (OOM) errors.

Any job with history file size smaller than specified in this parameter will be executed in parallel.

Type: int

Default Value: 500

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Reduce the parallel execution job size if you encounter OOM errors.
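
The size threshold can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the analyzer's code: the function names and the `analyze` callback are hypothetical, the threshold is assumed to be in MB (as the property name suggests), and a job exactly at the threshold is assumed to run in the synchronized path, which the description above does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

MB = 1024 * 1024


def split_by_size(history_sizes: Dict[str, int],
                  max_parallel_mb: int) -> Tuple[List[str], List[str]]:
    """Return (parallel_jobs, sequential_jobs) from history file sizes in bytes."""
    limit = max_parallel_mb * MB
    parallel = [job for job, size in history_sizes.items() if size < limit]
    sequential = [job for job, size in history_sizes.items() if size >= limit]
    return parallel, sequential


def analyze_all(history_sizes: Dict[str, int],
                analyze: Callable[[str], Any],
                max_parallel_mb: int = 500,
                threads: int = 8) -> Dict[str, Any]:
    """Analyze small jobs on a thread pool and large jobs one at a time."""
    parallel, sequential = split_by_size(history_sizes, max_parallel_mb)
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for job_id, result in zip(parallel, pool.map(analyze, parallel)):
            results[job_id] = result
    # Large jobs run after the pool has finished, so only one of them is in memory at a time.
    for job_id in sequential:
        results[job_id] = analyze(job_id)
    return results
```

Lowering `max_parallel_mb` moves more jobs into the one-at-a-time path, trading throughput for a smaller peak memory footprint.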

tez_job.activity.watcher.enabled

Enables automatic activity analysis for Tez jobs.

Type: boolean

Default Value: true

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Disable only if you do not want to analyze Tez jobs.

tez_job.tmp.dir

Temporary location where Tez job information is downloaded.

Type: string

Default Value:

/var/lib/smartsense/activity-analyzer/tez/tmp/

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

You can symlink it to a non-root partition or change it to use a directory in a non-root partition.
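
Before relocating the directory, it can help to check whether it currently resides on the root partition. The following Python sketch compares device IDs. The helper name `on_root_partition` is hypothetical, and the check is a heuristic that may not distinguish bind mounts or filesystem subvolumes.

```python
import os


def on_root_partition(path: str) -> bool:
    """Return True if path (after resolving symlinks) is on the same device as '/'."""
    real_path = os.path.realpath(path)
    return os.stat(real_path).st_dev == os.stat("/").st_dev
```

For example, `on_root_partition("/var/lib/smartsense/activity-analyzer/tez/tmp/")` would return `False` once the directory is symlinked to a location on a separate partition.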

yarn_app.activity.watcher.enabled

Enables automatic activity analysis for YARN apps.

Type: boolean

Default Value: true

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Disable only if you do not want to analyze YARN apps.

hdfs.activity.watcher.enabled

Enables automatic analysis for HDFS files.

Type: boolean

Default Value: true

Ambari Config:

Activity Analysis

Config File:

/etc/smartsense-activity/conf/activity.ini

Disable only if you do not want to analyze HDFS fsImage.

global.activity.analyzer.user

Defines the user used to read activity data from HDFS and YARN. This user must have read access to all activity data in HDFS, YARN, ATS, and any other activity sources.

Type: string

Default Value: activity_analyzer

Ambari Config:

Advanced > Advanced activity-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

activity.explorer.user

Defines the user used to read pre-analyzed data. This user does not need access to HDFS and YARN.

Type: string

Default Value: activity_explorer

Ambari Config:

Advanced > Advanced activity-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

analyzer_jvm_opts

Allows you to specify multiple JVM options, separated by spaces.

Type: string

Default Value: -Xms128m

Ambari Config:

Advanced > Advanced activity-env

Config File:

/etc/smartsense-activity/conf/activity-env.sh

Use this parameter to add any additional JVM options for running the activity analyzers, for example for GC tuning.

analyzer_jvm_heap

Maximum heap space (in MB) allocated to the Activity Analyzer process.

Type: int

Default Value: 8192

Ambari Config:

Advanced > Advanced activity-env

Config File:

/etc/smartsense-activity/conf/activity-env.sh

Usually 8192 MB is sufficient, but it can be increased if you encounter OOM errors.

activity_log_dir

Directory where activity log files are created.

Type: string

Default Value:

/var/log/smartsense-activity

Ambari Config:

Advanced > Advanced activity-log4j

Config File:

/etc/smartsense-activity/conf/log4j.properties

Default value is suitable for most clusters.

If you change this directory, you must grant read, write, and create permissions on the new directory to the activity_analyzer user.

activity_max_file_size

Maximum size (in MB) for SmartSense activity log files.

Type: int

Default Value: 30

Ambari Config:

Advanced > Advanced activity-log4j

Config File:

/etc/smartsense-activity/conf/log4j.properties

Default value is suitable for most clusters.

Check available storage capacity before updating this property.

activity_max_backup_index

Maximum number of SmartSense activity log files.

Type: int

Default Value: 10

Ambari Config:

Advanced > Advanced activity-log4j

Config File:

/etc/smartsense-activity/conf/log4j.properties

You can increase this number to retain older logs. Check available storage capacity before updating this property.
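
To estimate how much storage the two log settings can consume together, a rough upper bound can be computed as below. This sketch assumes log4j 1.x `RollingFileAppender` semantics, where the backup index counts rolled-over files in addition to the active log file. The function name is hypothetical, and actual files may slightly exceed the configured size.

```python
def max_log_disk_usage_mb(max_file_size_mb: int, max_backup_index: int) -> int:
    """Approximate worst-case disk usage (in MB) for the activity logs."""
    # One active log file plus up to max_backup_index rolled-over files.
    return max_file_size_mb * (max_backup_index + 1)


# With the defaults documented above: 30 MB per file, 10 backups.
print(max_log_disk_usage_mb(30, 10))  # prints 330
```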

global.date.format

Format in which dates are converted to strings and sometimes persisted.

Type: string

Default Value: "YYYY-mm-DD"

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

global.activity.status.update.interval.seconds

Interval (in seconds) after which the status of processed, failed, and in-process activities is updated in the database.

Type: int

Default Value: 30

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

activity.batch.interval.seconds

Interval for batching activities.

Activities are batched for better storage performance. A batch is persisted when either the batch size becomes equal to phoenix.sink.batch.size or activity.status.update.interval.seconds has elapsed.

Type: int

Default Value: 60

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Increasing the batch interval can lower the load on storage and improve storage performance; however, it can also delay the availability of data and increase memory pressure.

Reducing the interval can make data available sooner, but has a negative performance impact on the storage layer.

activity.watcher.enabled

Enables regular collection of job data for analysis.

Type: boolean

Default Value: true

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Disable this only if you want to temporarily turn off data collection.

activity.history.max.back.track.days

Number of days of history for which job information is retrieved.

Type: int

Default Value: 7

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Increase this number if you need to refer to older jobs. Note that older jobs must have data available in AMS. This property is used only during the first run after installation.

phoenix.setup.continue.on.error

During initial setup, errors in DB setup may occur. This parameter indicates whether to continue if any error occurs.

Type: boolean

Default Value: false

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

phoenix.setup.drop.existing.tables

During initial setup, matching tables may be found in the DB (typically from previous install attempts). This parameter determines whether they should be dropped and recreated. By default, the existing entries are kept.

Type: boolean

Default Value: false

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

phoenix.activity.analyzer.jdbc.url

JDBC URL used by activity analyzer to store its data.

Type: string

Default Value: (no value)

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Do not change it. It is auto configured based on the cluster setup.

ams.jdbc.url

JDBC URL used by activity analyzer to fetch data from AMS.

Type: string

Default Value: (no value)

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Do not change it. It is auto configured based on the cluster setup.

global.store.job.configs

Enables storing job-specific configs in AMS after analysis.

Type: boolean

Default Value: true

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Do not disable it. Keeping it on helps in debugging.

global.store.tasks

Enables persisting task-level data in AMS after analysis.

Type: boolean

Default Value: false

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Task-level data can be huge and may overwhelm AMS, so keep it disabled unless absolutely needed. If you enable it, disable it again once the data is no longer needed.

global.store.task.counters

Enables storing task counter data in AMS after analysis.

Type: boolean

Default Value: false

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Task counter data can be huge and may overwhelm AMS, so keep it disabled unless absolutely needed. If you enable it, disable it again once the data is no longer needed.

global.activity.fetch.retry.interval.seconds

Interval (in seconds) between retry attempts to fetch the activity details.

Type: int

Default Value: 5

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.

global.activity.fetch.retry.attempts

Number of tries to fetch activities before giving up.

Type: int

Default Value: 5

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

Default value is suitable for all clusters.
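
The interaction between the retry interval and the number of attempts can be sketched as a simple fixed-delay retry loop. This is an illustrative Python model, not the analyzer's implementation: `fetch_with_retry`, the `fetch` callback, and the injectable `sleep` are hypothetical names, and the sketch assumes the attempt count includes the first try.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T],
                     attempts: int = 5,
                     interval_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch up to `attempts` times, waiting a fixed interval between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad catch is for illustration only
            last_error = error
            if attempt < attempts:
                sleep(interval_seconds)  # no wait after the final failed attempt
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

Under these assumptions and the documented defaults, a fetch that never succeeds is abandoned after five tries and roughly 20 seconds of waiting (four intervals of 5 seconds).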

global.tmp.dir

Temporary directory used by activity-analyzer for internal purposes.

Type: string

Default Value:

/var/lib/smartsense/activity-analyzer/tmp/

Ambari Config:

Advanced > Custom activity-analyzer-conf

Config File:

/etc/smartsense-activity/conf/activity.ini

We do not recommend changing this unless you have a very specific requirement. If you use a directory other than the default, verify that its permissions are set accordingly.