DSS Getting Started

Ambari Dataplane Profiler Configs

From Ambari > Dataplane Profiler > Configs, you can view or update your database or advanced configurations.

Dataplane Profiler Database Configs

From Ambari > Dataplane Profiler > Configs > Database, you can view or update the DataPlane Profiler Database configurations.

Table 1. Database configs
Value Description Example
DP Profiler Database Database type or flavor used for the DSS profiler.

H2

MySQL

POSTGRES

Slick JDBC Driver Class System driver that is used to connect to the database.
Important: Do not modify.

H2: slick.driver.H2Driver$

MySQL: slick.driver.MySQLDriver$

POSTGRES: slick.driver.PostgresDriver$

Database Username The database user that the profiler service uses to connect to the database. This user must be created in the MySQL or Postgres database. profileragent
Database Name Name must be “profileragent”.
Important: Do not modify.
profileragent
Database URL The URL of the DP profiler database.

H2: jdbc:h2:/var/lib/profiler_agent/h2/profileragent;DATABASE_TO_UPPER=false;DB_CLOSE_DELAY=-1

MySQL: jdbc:mysql://hostname:3306/profileragent?autoreconnect=true

POSTGRES: jdbc:postgresql://hostname:5432/profileragent

Database Host Database host name for the Profiler Agent server. <hostname>
JDBC Driver Class Driver name for your profiler database.
Important: Do not modify.

H2: org.h2.Driver

MySQL: com.mysql.jdbc.Driver

POSTGRES: org.postgresql.Driver

Database password The password for your DP database. <your_password>
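Before saving a Database URL, it can help to confirm that the host, port, and database name match your environment. The following is a minimal, illustrative sketch (not part of the profiler) that parses the MySQL and Postgres JDBC URL formats shown above:

```python
import re

# Illustrative parser for the MySQL/Postgres JDBC URL formats shown in
# the table above; it extracts flavor, host, port, and database name.
JDBC_RE = re.compile(
    r"^jdbc:(?P<flavor>mysql|postgresql)://(?P<host>[^:/]+):(?P<port>\d+)/(?P<db>[^?]+)"
)

def parse_jdbc_url(url):
    """Return (flavor, host, port, database) or raise ValueError."""
    m = JDBC_RE.match(url)
    if not m:
        raise ValueError("unrecognized JDBC URL: %s" % url)
    return m.group("flavor"), m.group("host"), int(m.group("port")), m.group("db")

print(parse_jdbc_url("jdbc:postgresql://hostname:5432/profileragent"))
print(parse_jdbc_url("jdbc:mysql://hostname:3306/profileragent?autoreconnect=true"))
```

Note that H2 URLs use a file path rather than host and port, so they do not follow this pattern.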

Dataplane Profiler Advanced Configs

From Ambari > Dataplane Profiler > Configs > Advanced, you can view or update the DataPlane Profiler advanced configurations.

Table 2. Advanced dpprofiler-config
Value Description Example
Cluster Configs

Provides various cluster configurations, including: atlasUrl

rangerAuditDir

metastoreUrl

metastoreKeytab

metastorePrincipal

atlasUrl=application-properties/atlas.rest.address;rangerAuditDir=ranger-env/xasecure.audit.destination.hdfs.dir;metastoreUrl=hive-site/hive.metastore.uris;metastoreKeytab=hive-site/hive.metastore.kerberos.keytab.file;metastorePrincipal=hive-site/hive.metastore.kerberos.principal
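The Cluster Configs value is a semicolon-separated list of key=value pairs, where each value is a "config-file/property" reference into the corresponding Ambari configuration. A small sketch (illustration only, not shipped code) splitting the example value into a dictionary:

```python
# The exact string from the Example column above.
cluster_configs = (
    "atlasUrl=application-properties/atlas.rest.address;"
    "rangerAuditDir=ranger-env/xasecure.audit.destination.hdfs.dir;"
    "metastoreUrl=hive-site/hive.metastore.uris;"
    "metastoreKeytab=hive-site/hive.metastore.kerberos.keytab.file;"
    "metastorePrincipal=hive-site/hive.metastore.kerberos.principal"
)

# Split on ";" into pairs, then on the first "=" into key and value.
pairs = dict(item.split("=", 1) for item in cluster_configs.split(";"))
print(pairs["atlasUrl"])  # application-properties/atlas.rest.address
```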
Job Status Refresh in seconds How often the profiler job status should refresh, in seconds. 15
Autoregister profilers Looks for profilers in the {Profilers local Dir} directory and installs them (if not already installed) at startup. true
Profilers local Dir Local directory for the profilers. /usr/dss/current/profilers
Profilers DWH Dir The HDFS directory where DSS Profilers will store their metrics output. Ensure the dpprofiler user has full access to this directory. /user/dpprofiler/dwh
Profilers Hdfs Dir HDFS directory for the profilers. /apps/dpprofiler/profilers
Refresh table cron

The format is a standard CRON expression.

This will periodically refresh the metrics cache.

0 0/30 * * * ?
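The cron value uses Quartz syntax, which has six required fields (seconds, minutes, hours, day-of-month, month, day-of-week) plus an optional year; "?" means "no specific value". The example above fires at second 0 of every 30th minute. A hedged sketch that labels the fields of such an expression:

```python
# Quartz cron field order: seconds, minutes, hours, day-of-month,
# month, day-of-week, and an optional seventh field for the year.
FIELD_NAMES = ["seconds", "minutes", "hours", "day_of_month",
               "month", "day_of_week", "year"]

def label_quartz_cron(expr):
    """Map each whitespace-separated field to its Quartz field name."""
    fields = expr.split()
    if len(fields) not in (6, 7):
        raise ValueError("Quartz cron needs 6 or 7 fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))

# "0/30" in the minutes field means: starting at minute 0, every 30 minutes.
print(label_quartz_cron("0 0/30 * * * ?"))
```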
Refresh table retry Number of times the profiler agent retries clearing the cache in case of error. 3
Partitioned table location for sensitive tags Metric name where Hive sensitive information is stored in partitioned format.
Important: Do not modify.
hivesensitivitypartitioned
Partitioned table location for all sensitive tags Metric name where Hive sensitive information is stored.
Important: Do not modify.
hivesensitivity
SPNEGO Cookie Name Cookie name that is returned to the client after successful SPNEGO authentication. dpprofiler.spnego.cookie
SPNEGO Signature Secret Secret for signing and verifying the generated cookie after successful authentication. ***some***secret**
Submitter Batch Size Max number of assets to be submitted in one profiler job. 50
Submitter Max Jobs Number of profiler jobs active at a point in time. This is per profiler. 2
Submitter Job Scan Time Time in seconds after which the profiler looks for an asset in the queue and schedules the job if the queue is not empty. 30
Submitter Queue Size Max size of the profiler queue, beyond which any new asset submission request is rejected. 500
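The four submitter settings interact: assets accumulate in a bounded queue, and each scan drains at most one batch into a profiler job. The following is a minimal sketch of that behavior under the default values; it is an illustration of the described semantics, not the profiler's actual implementation:

```python
from collections import deque

QUEUE_SIZE = 500   # Submitter Queue Size: reject submissions beyond this
BATCH_SIZE = 50    # Submitter Batch Size: max assets per profiler job

class SubmitterQueue:
    """Toy model of the submitter queue described above."""

    def __init__(self):
        self.assets = deque()

    def submit(self, asset):
        """Accept an asset, or return False when the queue is full."""
        if len(self.assets) >= QUEUE_SIZE:
            return False
        self.assets.append(asset)
        return True

    def next_batch(self):
        """Drain up to BATCH_SIZE assets for a single profiler job."""
        batch = []
        while self.assets and len(batch) < BATCH_SIZE:
            batch.append(self.assets.popleft())
        return batch

q = SubmitterQueue()
accepted = sum(q.submit(i) for i in range(600))
print(accepted)             # only the first 500 submissions fit
print(len(q.next_batch()))  # one job takes at most 50 assets
```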
Livy Session Config

Specifies the configuration required for the interactive Livy sessions that the profiler creates.

These sessions are periodically swapped with new ones based on their lifetime, which is determined by the following settings:

session.lifetime.minutes - Session lifetime in minutes after creation, after which it is swapped.

session.lifetime.requests - Maximum number of requests a session can process before it is swapped.

session.max.errors - Number of consecutive errors after which the session is swapped.

There are two separate session.config sections (read and write) describing the interactive sessions' Spark configurations. Both share the same schema, and the following Livy session properties can be specified here:

name,heartbeatTimeoutInSecond,driverMemory,driverCores,executorMemory,executorCores,numExecutors,queue

For more information on these properties, refer to the Livy documentation.

The properties session.config.read.timeoutInSeconds and session.config.write.timeoutInSeconds specify timeouts for requests that use an interactive session.

Notes:

session.starting.message and session.dead.message are for internal use.
Important: Do not modify.

It is advisable to use a separate YARN queue for sessions created by the profiler.

session {
    lifetime {
        minutes = 2880
        requests = 500
    }
    max.errors = 20
    starting.message = "java.lang.IllegalStateException: Session is in state starting"
    dead.message = "java.lang.IllegalStateException: Session is in state dead"
    config {
        read {
            name = "dpprofiler-read"
            heartbeatTimeoutInSecond = 172800
            timeoutInSeconds = 90
            driverMemory = "5G"
            driverCores = 4
            executorMemory = "4G"
            executorCores = 2
            numExecutors = 25
            queue = "profilerqueue"
        }
        write {
            name = "dpprofiler-write"
            heartbeatTimeoutInSecond = 172800
            timeoutInSeconds = 90
            driverMemory = "2G"
            driverCores = 2
            executorMemory = "1G"
            executorCores = 1
            numExecutors = 4
            queue = "profilerqueue"
        }
    }
}
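The swap rules above can be summarized as a single predicate: a session is replaced when it exceeds its lifetime in minutes, its request budget, or the allowed run of consecutive errors. A sketch of that logic under the default values shown (illustrative only, not the shipped code):

```python
# Defaults from the session config block above.
LIFETIME_MINUTES = 2880    # session.lifetime.minutes
LIFETIME_REQUESTS = 500    # session.lifetime.requests
MAX_ERRORS = 20            # session.max.errors

def should_swap(age_minutes, requests_served, consecutive_errors):
    """True when any of the three swap conditions is met."""
    return (age_minutes >= LIFETIME_MINUTES
            or requests_served >= LIFETIME_REQUESTS
            or consecutive_errors >= MAX_ERRORS)

print(should_swap(100, 10, 0))   # False: young, healthy session
print(should_swap(3000, 10, 0))  # True: past the 2880-minute lifetime
print(should_swap(100, 10, 20))  # True: too many consecutive errors
```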
Table 3. Advanced dpprofiler-env
Value Description Example
dpprofiler.conf.dir Configuration files directory. /etc/profiler_agent/conf
dpprofiler.data.dir

Data directory. If using H2, data is stored here.

/var/lib/profiler_agent

dpprofiler.http.port

Port where the profiler agent runs. 21900
dpprofiler.kerberos.enabled True if Kerberos is enabled. false
dpprofiler.kerberos.keytab Profiler agent keytab location. /etc/security/keytabs/dpprofiler.kerberos.keytab
dpprofiler.kerberos.principal Profiler agent Kerberos principal.

dpprofiler${principalSuffix}@REALM.COM

principalSuffix is a random string generated by Ambari for a cluster. It is used to uniquely identify services on a cluster when multiple clusters are managed by a single KDC.

dpprofiler.log.dir Log directory. /var/log/profiler_agent
dpprofiler.pid.dir PID directory. /var/run/profiler_agent
dpprofiler.spengo.kerberos.keytab SPNEGO keytab location. /etc/security/keytabs/spnego.service.keytab
dpprofiler.spnego.kerberos.principal SPNEGO Kerberos principal. HTTP/${FQDN}@REALM.COM

FQDN - fully qualified domain name of the machine

logback.content Content for logback.xml.
<configuration>

<conversionRule conversionWord="coloredLevel" converterClass="play.api.libs.logback.ColoredLevel" />

<appender name="FILE" class="ch.qos.logback.core.FileAppender">
<file>{{dpprofiler_log_dir}}/application.log</file>
<encoder>
<pattern>%date [%level] from %logger in %thread - %message%n%xException</pattern>
</encoder>
</appender>

<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%coloredLevel %logger{15} - %message%n%xException{10}</pattern>
</encoder>
</appender>

<appender name="ASYNCFILE" class="ch.qos.logback.classic.AsyncAppender">
<appender-ref ref="FILE" />
</appender>

<appender name="ASYNCSTDOUT" class="ch.qos.logback.classic.AsyncAppender">
<appender-ref ref="STDOUT" />
</appender>

<logger name="play" level="INFO" />
<logger name="application" level="DEBUG" />

<!-- Off these ones as they are annoying, and anyway we manage configuration ourselves -->
<logger name="com.avaje.ebean.config.PropertyMapLoader" level="OFF" />
<logger name="com.avaje.ebeaninternal.server.core.XmlConfigLoader" level="OFF" />
<logger name="com.avaje.ebeaninternal.server.lib.BackgroundThread" level="OFF" />
<logger name="com.gargoylesoftware.htmlunit.javascript" level="OFF" />

<root level="WARN">
<appender-ref ref="ASYNCFILE" />
<appender-ref ref="ASYNCSTDOUT" />
</root>

</configuration>
Table 4. Custom dpprofiler-config
Value Description Example
dpprofiler.user User for the Profiler Agent.
Important: Do not modify.
dpprofiler