On-Premises Installation

After a brief introduction to Workload XM architecture, you prepare to install Workload XM 2.1.3 on-premises. You must meet Java heap requirements by setting properties in Cloudera Manager. You add the WXM package to Cloudera Manager, and deploy the WXM package following step-by-step procedures.

Introduction to Workload XM Installation

Workload XM on-premises must be installed in a dedicated cluster, separate from your development, test, or production workload clusters. This configuration minimizes the impact on your workload clusters and avoids the need to upgrade them to meet the needs of Workload XM.

The following terminology is used in this guide:

  • Workload XM on-premises - A CDP cluster managed by Cloudera Manager where Workload XM is installed on-premises. This cluster is used exclusively for running Workload XM.

  • Workload Cluster - A CDH, CDP, or HDP cluster managed by Cloudera Manager where analytical workloads are run that Workload XM analyzes.

Architecture

Before installing Workload XM, familiarize yourself with the architecture. Workload XM on-premises consists of two or more clusters:
  • The cluster on which Workload XM runs
  • One or more additional workload clusters where Telemetry Publisher is configured

Workload XM on-premises interacts with your other clusters. In the architecture diagram, the cluster on the left runs the Workload XM service, installed on-premises, together with the other required services, which are listed in Installation Prerequisites. The clusters on the right run the workloads. Telemetry data is passed from these clusters to the Workload XM on-premises cluster.



Installation Prerequisites

Before you install Workload XM on-premises, meet the requirements listed in this documentation.

Hardware Requirements

In addition to the requirements for services you have installed, running Workload XM on-premises requires the following hardware:

Required number of nodes: 5

Each node must have:

  • 16 Cores
  • 64 GB RAM
  • 12 TB disk space

A dedicated disk for Telemetry Publisher is recommended. On the host that you designate for the Telemetry Publisher service role, a dedicated disk prevents problems with sending data to WXM from affecting operations other than Telemetry Publisher.

Supported File Systems

The following file systems are supported: HDFS, S3, and ADLS.

Operating System Requirements

Workload XM on-premises supports the following CentOS and RHEL operating system versions: 7.6, 7.5, 7.4, 7.3, 7.2

Supported Cloudera Versions for Running WXM

Workload XM on-premises runs on CDP Private Cloud Base 7.0.3 or later with Cloudera Manager 7.0.3 or later.



Supported Workload Clusters for Analysis

Workloads in the following HDP, CDH, and CDP clusters are supported for analysis in Workload XM:

  • HDP 3.x clusters
  • CDH 5.x clusters:
    • CDH version 5.8 and later
    • Cloudera Manager version 5.15.1 and later
  • CDH 6.x clusters:
    • CDH version 6.1 and later
    • Cloudera Manager version 6.1 and later
  • CDP 7.x clusters:
    • Private Cloud Base 7.0.3 or later clusters
    • Cloudera Manager version 7.1.1 or later

Unsupported Versions

The following versions are not supported:

  • CDH 6.0
  • Cloudera Manager 6.0 and 7.0.3
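
The supported and unsupported version lists above can be checked mechanically before you register a workload cluster. The following Python sketch is one possible reading of those lists, not an official compatibility check. The function names are hypothetical, version strings are assumed to be plain dotted numbers, and "6.0" is assumed to mean any 6.0.x release.

```python
def parse_version(text):
    """Turn '5.15.1' into (5, 15, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# (platform, major version) -> (minimum platform version, minimum Cloudera
# Manager version), transcribed from the supported-cluster list above.
# The list gives no Cloudera Manager minimum for HDP, so it is None here.
MINIMUMS = {
    ("HDP", 3): ((3,), None),
    ("CDH", 5): ((5, 8), (5, 15, 1)),
    ("CDH", 6): ((6, 1), (6, 1)),
    ("CDP", 7): ((7, 0, 3), (7, 1, 1)),
}


def is_unsupported_cm(cm):
    # Assumption: "6.0" covers every 6.0.x release; "7.0.3" is one release.
    return cm[:2] == (6, 0) or cm[:3] == (7, 0, 3)


def is_supported(platform, version, cm_version=None):
    """Return True if the workload cluster matches the lists above."""
    ver = parse_version(version)
    entry = MINIMUMS.get((platform.upper(), ver[0]))
    if entry is None:
        return False  # platform family or major version is not listed
    min_version, min_cm = entry
    if ver < min_version:
        return False
    if min_cm is None:
        return True  # no Cloudera Manager requirement is listed
    if cm_version is None:
        return False  # a Cloudera Manager version is required but unknown
    cm = parse_version(cm_version)
    return cm >= min_cm and not is_unsupported_cm(cm)


if __name__ == "__main__":
    print(is_supported("CDH", "5.8", "5.15.1"))
    print(is_supported("CDH", "6.0", "6.1"))
```

The lookup table keeps the version rules in one place, so a change to the support matrix only requires editing MINIMUMS or is_unsupported_cm.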

Configuring Java Heap Requirements

To ensure the long-term success of a WXM deployment, configure the Java heap as follows:

  • ZooKeeper

    Java Heap Size of ZooKeeper Server >= 4 GB

  • HBase

    Java Heap Size of HBase RegionServer (RegionServer Default Group) >= 16 GB

  • HDFS

    Java Heap Size of NameNode >= 4 GB

  • Phoenix

    Phoenix Query Server Max Heapsize >= 8 GB

To configure these options for the services listed above:

  1. In Cloudera Manager, click Clusters > ZooKeeper > Configuration, and search for java heap.

  2. In Java Heap Size of ZooKeeper Server in Bytes, change the value to 4 GiB.

  3. Save changes.
  4. Repeat these steps for the other required Java heap property settings for HBase, HDFS, and Phoenix listed above.
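
If you record the heap values that you set, a short script can confirm that none falls below the minimums listed above. The following Python sketch assumes that you collect the current values yourself, in bytes, keyed by the setting names used in this section. The dictionary keys and the function name are illustrative, and GB is treated as GiB to match the 4 GiB value entered in step 2.

```python
GIB = 1024 ** 3

# Minimum heap sizes from the list above, in bytes.
MIN_HEAP_BYTES = {
    "Java Heap Size of ZooKeeper Server": 4 * GIB,
    "Java Heap Size of HBase RegionServer": 16 * GIB,
    "Java Heap Size of NameNode": 4 * GIB,
    "Phoenix Query Server Max Heapsize": 8 * GIB,
}


def heap_shortfalls(current_bytes):
    """Return {setting: (current, required)} for every setting that is
    below its minimum. A setting missing from current_bytes counts as 0."""
    shortfalls = {}
    for name, required in MIN_HEAP_BYTES.items():
        current = current_bytes.get(name, 0)
        if current < required:
            shortfalls[name] = (current, required)
    return shortfalls


if __name__ == "__main__":
    observed = dict(MIN_HEAP_BYTES)
    observed["Java Heap Size of NameNode"] = 1 * GIB  # example of a low value
    for name, (current, required) in heap_shortfalls(observed).items():
        print(f"{name}: {current // GIB} GiB set, {required // GIB} GiB required")
```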

Installed Services Requirements

In Cloudera Manager, click Clusters, and check that you have the following required services installed on the cluster where you plan to install Workload XM:

  • HBase
  • HDFS
  • Hive
  • Hue - Hue is optional, but recommended for troubleshooting and data extraction
  • Impala
  • Phoenix
  • ZooKeeper

Downloading Installation Files

From the Cloudera web site, you download files for installing Workload XM On-Premises to a machine on the same network as the on-premises cluster.

  1. Go to https://www.cloudera.com/downloads/workloadxm.html.

    If you do not have access to the downloads site, contact your administrator for login information.

  2. In Choose Installation Type, click WXM Parcel.
  3. In Select Parcel, click EL7 to install on a RHEL 7 host or the equivalent CentOS host.

    Advanced users who install WXM parcels by setting up a local parcel repository can use manifest.json.

  4. Click Download Now, accept the license terms, and click Submit.
  5. Click WXM (Parcel).

    The parcel downloads.

  6. Click WXM (SHA).

    The SHA downloads.

  7. Scroll up on https://www.cloudera.com/downloads/workloadxm.html, and in Installation Type, choose the WXM CSD installation type, and then click Download Now.

  8. Click WXM (CSD) to download the Custom Service Descriptor file for installing the Workload XM service.

    You now have the following files for installing Workload XM On-Premises:

  • Parcel: WXM-2.1.3.2.1.3-b9-7082632-el7.parcel
  • Parcel SHA: WXM-2.1.3.2.1.3-b9-7082632-el7.parcel.sha
  • CSD: WXM-2.1.3.2.1.3-b9-7082632.jar
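
Before copying the parcel to the cluster, you may want to confirm that the download is intact by comparing it with the downloaded SHA file. The following Python sketch assumes that the .sha file contains a hexadecimal digest as its first token and infers the algorithm from the digest length (40 characters for SHA-1, 64 for SHA-256). This guide does not state the file format, so treat both points as assumptions. The function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm):
    """Hash a file in 1 MiB chunks so large parcels are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parcel_matches_sha(parcel_path, sha_path):
    """Compare a parcel with the digest stored in its .sha file."""
    with open(sha_path, "r", encoding="ascii") as handle:
        tokens = handle.read().split()
    if not tokens:
        raise ValueError("the .sha file is empty")
    expected = tokens[0].lower()
    # Assumption: the digest length identifies the hash algorithm.
    algorithm = {40: "sha1", 64: "sha256"}.get(len(expected))
    if algorithm is None:
        raise ValueError("unrecognized digest length: %d" % len(expected))
    return file_digest(parcel_path, algorithm) == expected


if __name__ == "__main__":
    parcel = "WXM-2.1.3.2.1.3-b9-7082632-el7.parcel"
    try:
        print(parcel_matches_sha(parcel, parcel + ".sha"))
    except FileNotFoundError:
        print("Run this in the directory that contains the downloaded files.")
```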

Deploying the WXM Package to Cloudera Manager

From the download machine, you copy the WXM files that you downloaded to the cluster where you plan to install Workload XM on-premises. You copy the files to a Cloudera Manager (CM) directory, effectively deploying WXM to CM.

  1. In Cloudera Manager, check the fully qualified domain name of the Cloudera Manager Server host, and then ssh to that host.

    For example:
    ssh root@khanwxm2-1.khanwxm2.myhost
  2. On the machine where you downloaded the WXM package, copy the parcel and .sha file to the following Cloudera Manager Server host directory on the Workload XM on-premises cluster: /opt/cloudera/parcel-repo/
    For example, on the command line of your machine, in the directory where you downloaded the WXM package:
    scp WXM-2.1.3.2.1.3-b9-7082632-el7.parcel root@<CM Server host>:/opt/cloudera/parcel-repo/
    
    scp WXM-2.1.3.2.1.3-b9-7082632-el7.parcel.sha root@<CM Server host>:/opt/cloudera/parcel-repo/
  3. Copy the CSD named WXM-2.1.3.2.1.3-b9-7082632.jar that you downloaded to the following Cloudera Manager Server host directory: /opt/cloudera/csd
  4. In /opt/cloudera/parcel-repo, set the ownership and permissions of the files you copied there using the following commands:
    chown cloudera-scm:cloudera-scm WXM-*
    chmod 644 WXM-*
  5. In /opt/cloudera/csd, set the ownership and permissions of the file you copied there in the same way as in the previous step.
  6. Restart the Cloudera Manager Server with the following command:
    service cloudera-scm-server restart

    After the restart, you need to log in to Cloudera Manager again.

  7. In Cloudera Manager, click Clusters, and locate the Cloudera Management Service section in the lower left panel.

  8. Click Cloudera Management Service.

  9. In Cloudera Management Service, click Actions > Restart.

  10. In Restart, click Restart.
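
The command-line part of this procedure (steps 2 through 6) can be collected into a reviewable list before you run anything. The following Python sketch only builds and prints the commands. It does not execute them. The function name, the default build string, and the use of ssh to run the ownership, permission, and restart commands remotely are illustrative assumptions. The guide itself runs those commands in a session on the Cloudera Manager Server host.

```python
import shlex

PARCEL_REPO = "/opt/cloudera/parcel-repo/"
CSD_DIR = "/opt/cloudera/csd/"


def deployment_commands(cm_host, build="2.1.3.2.1.3-b9-7082632", user="root"):
    """Return the copy, ownership, permission, and restart commands as
    argument lists. Nothing is executed here."""
    parcel = f"WXM-{build}-el7.parcel"
    csd = f"WXM-{build}.jar"
    target = f"{user}@{cm_host}"
    return [
        ["scp", parcel, f"{target}:{PARCEL_REPO}"],
        ["scp", parcel + ".sha", f"{target}:{PARCEL_REPO}"],
        ["scp", csd, f"{target}:{CSD_DIR}"],
        # The remote shell expands WXM-* on the Cloudera Manager Server host.
        ["ssh", target,
         f"chown cloudera-scm:cloudera-scm {PARCEL_REPO}WXM-* {CSD_DIR}WXM-*"],
        ["ssh", target, f"chmod 644 {PARCEL_REPO}WXM-* {CSD_DIR}WXM-*"],
        ["ssh", target, "service cloudera-scm-server restart"],
    ]


if __name__ == "__main__":
    # "cm.example.com" is a placeholder host name.
    for argv in deployment_commands("cm.example.com"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check the host name and file names against your download before anything changes on the cluster.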

Activating the Workload XM Parcel

  1. In Cloudera Manager, click Hosts > Parcels.



  2. In Parcels, scroll through the Parcel Name list to WXM, and click Distribute for the WXM parcel.

    The Distributed and Activated indicators appear for the WXM parcel.



  3. Click Activate.



  4. Click OK.

    Indicators show the WXM parcel as distributed and activated.

Securing WXM Service Data

Workload XM on-premises stores workload data in HDFS and HBase. The HDFS data is created on the root path and the directories have wxm:impala ownership. You need to configure Kerberos and TLS/SSL for secure storage of this data.

Configuring Kerberos

If you install Workload XM in a Kerberized environment, Workload XM must be able to create Phoenix tables for data storage. To allow Workload XM to create these tables, add the wxm user to the hbase.superuser property.



Configuring TLS

Cloudera recommends that you configure your cluster to use auto-TLS to ease the process of configuring TLS/SSL.

SSL is supported in the following contexts:

  • Between the browser and the Workload XM UI
  • Between the Telemetry Publisher and Workload XM API
  • Between the Workload XM UI and the Workload XM API
  • Between the Workload XM Servers and Impala

Advanced TLS Configuration

Configure the TLS properties based on the edge that you want to encrypt. The tables below list the properties you configure to enable TLS on a number of edge connections. You can encrypt communication between the following components:

  • Browser connected to Workload XM UI
  • Console Server and other REST Clients connected to Admin API Server, API Server, Databus API Server
  • Pipeline Server, Analytic Database Server, Entities Server, Databus Server, SDX Server connected to Impala Server

Encryption between the browser and Workload XM UI

Property                                Value                     Component on which to set property
TLS/SSL Server Private Key File (PEM)   ssl.privatekey.path       Console Server
TLS/SSL Server Certificate File (PEM)   ssl.cert.path             Console Server
TLS/SSL Private Key Password            ssl.privatekey.password   Console Server
Enable TLS/SSL                          ssl.enabled               Console Server

For example, to encrypt communications between the browser and Workload XM UI (column 3), on the Console Server, set the properties to the values shown in the table.

The following table shows the parameters to set and where to set them to encrypt a connection from the Console Server and other REST clients to several servers: Admin API Server, API Server, and Databus API Server.

Encryption between the Console Server plus other REST clients and the Admin API Server, API Server, and Databus API Server

Property                                    Value                     Component on which to set property
TLS/SSL Certificate Trust Store File        ssl.cacert.path           Console Server
TLS/SSL Certificate Trust Store File        ssl.trustStore.path       Admin API Server
Enable TLS/SSL                              ssl.enabled               Admin API Server, API Server, Databus API Server
TLS/SSL Server JKS Keystore File Location   ssl.keyStore.path         Admin API Server, API Server, Databus API Server
TLS/SSL Server JKS Keystore File Password   ssl.keyStore.password     Admin API Server, API Server, Databus API Server
TLS/SSL Server JKS Keystore Key Password    ssl.keyManager.password   Admin API Server, API Server, Databus API Server

The following table shows the parameters to set and where to set them to encrypt a connection from the Pipeline Server and several other servers to the Phoenix Server and Impala Server.

Encryption between the Pipeline Server and several other servers to the Impala Server

Property                              Value                     Component on which to set property
TLS/SSL Client Trust Store File       ssl.trustStore.path       Pipelines Server, Analytic Database Server, Entities Server, Databus Server, SDX Server
TLS/SSL Client Trust Store Password   ssl.trustStore.password   Pipelines Server, Analytic Database Server, Entities Server, Databus Server, SDX Server

Adding Safety Valves

In addition to setting the ZooKeeper maxClientCnxns property to 300, you need to add properties to the hbase-site.xml and phoenix-site.xml files using the Cloudera Manager Safety Valve. You must have the Cluster Administrator or Full Administrator role to complete this task.

  1. If you have not yet set ZooKeeper maxClientCnxns to 300, do so now, and save changes.
  2. In Cloudera Manager, click Clusters > HBase > Configuration, search for snippet, and find the following safety valve: HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml



  3. Click View as XML and add the following text before or after the existing text:
    <property>
      <name>hbase.regionserver.wal.codec</name>
      <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
      <description>Set hbase.regionserver.wal.codec to enable custom Write Ahead Log ("WAL") edits to be written</description>
    </property>
    <property>
      <name>hbase.region.server.rpc.scheduler.factory.class</name>
      <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
      <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
    </property>
    <property>
      <name>hbase.rpc.controllerfactory.class</name>
      <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
      <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
    </property>
    <property>
      <name>phoenix.functions.allowUserDefinedFunctions</name>
      <value>true</value>
      <description>enable UDF functions</description>
    </property>
    <property>
      <name>phoenix.queryserver.serialization</name>
      <value>JSON</value>
      <description>serialization format between client and query server</description>
    </property>
    <property>
      <name>hbase.server.keyvalue.maxsize</name>
      <value>52428800</value>
      <description>limits max file size for blobs</description>
    </property>
    <property>
      <name>phoenix.schema.isNamespaceMappingEnabled</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.ipc.server.max.callqueue.size</name>
      <value>2147483</value>
    </property>

    Save changes.

  4. Search for the WAL Codec Class property and confirm that the property is set to the following value:
    org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec



  5. Search for Maximum Size of HBase Client KeyValue, and set the property to 50 MiB.



  6. Change the HBase RegionServer Handler Count to 40.

  7. Change the HStore Blocking Store Files to 100.



  8. If you are installing Workload XM in a Kerberized environment, make sure the wxm user is added to the hbase.superuser property, as described in Configuring Kerberos.



    Save changes.

  9. Click Clusters > Phoenix > Configuration, search for snippet and find the following safety valve: Query Server Advanced Configuration Snippet (Safety Valve) for phoenix-site.xml.



  10. Click View as XML and enter the following text:
    <property>
       <name>phoenix.queryserver.serialization</name>
       <value>JSON</value>
       <description>serialization format between client and query server</description>
    </property>
    <property>
       <name>phoenix.schema.isNamespaceMappingEnabled</name>
       <value>true</value>
    </property>

    Save changes.
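
Because safety valve text is pasted by hand, it can help to confirm that a snippet is well-formed XML and contains the expected properties before you save it. The following Python sketch parses a snippet with the standard library and reports properties that are missing or have a different value. The function names and the REQUIRED_PHOENIX dictionary are illustrative and are not part of Cloudera Manager.

```python
import xml.etree.ElementTree as ET


def parse_properties(snippet):
    """Parse a run of <property> elements into a {name: value} dictionary.

    The snippet has no single root element, so it is wrapped in a temporary
    <configuration> element. Malformed XML raises ET.ParseError.
    """
    root = ET.fromstring(f"<configuration>{snippet}</configuration>")
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            raise ValueError("found a <property> without a <name>")
        properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


# The two properties entered for phoenix-site.xml in step 10.
REQUIRED_PHOENIX = {
    "phoenix.queryserver.serialization": "JSON",
    "phoenix.schema.isNamespaceMappingEnabled": "true",
}


def missing_or_wrong(snippet, required):
    """Return {name: found value or None} for each required property that
    is absent from the snippet or set to a different value."""
    found = parse_properties(snippet)
    return {
        name: found.get(name)
        for name, value in required.items()
        if found.get(name) != value
    }


if __name__ == "__main__":
    pasted = """
    <property>
       <name>phoenix.queryserver.serialization</name>
       <value>JSON</value>
    </property>
    """
    print(missing_or_wrong(pasted, REQUIRED_PHOENIX))
```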

Restart Stale Configurations

  1. Click CLOUDERA Manager in the upper-left corner to go to Home.



  2. On Cloudera Manager > Status, from the cluster options menu, click Deploy Client Configuration.

  3. In Deploy Client Configuration, click Deploy Client Configuration.

  4. Monitor the progress of the client configuration deployment until you see the success message.

    Indicators appear for services that need to be restarted.



  5. Restart Phoenix and HBase. From the options menu, you can restart the services individually. Alternatively, click the Stale Configuration indicator of a service, and then click Restart Stale Services > Restart Now.



  6. Click Finish.

Add Phoenix Query Server Hosts

  1. Click Clusters > Phoenix, and from the Actions menu, select Add Role Instances.

  2. Click inside the text box labeled Query Server xn.
  3. Select all hosts in the Hostname column to add the Query Server role to any node that does not have one.

  4. Click OK, and then Continue to finish this task.
  5. In Clusters > Phoenix > Instances > Actions, start all Query Servers.

Deploying Workload XM

You must meet a few prerequisites before you can successfully deploy Workload XM. Check that you activated the WXM parcel and added a Query Server role to each host in the on-premises installation cluster. To make the deployment go smoothly, note the names of one Phoenix Query Server host and one Impala Daemon host.

Prerequisites

  • Check that you distributed and activated the WXM parcel.
  • Add Phoenix Query Server hosts

    Click Hosts > Roles and look for the Phoenix Query Server (QS) role on each host. If any host does not have the QS role, add one as described in Add Phoenix Query Server Hosts.

  • Phoenix Query Server host name
    • In Cloudera Manager > Clusters > Phoenix > Instances, make a note of the name of one of your Query Server hosts.

  • Impala Daemon host name
    • In Clusters > Impala > Instances, find the name of one of the Impala Daemon hosts, and make a note of it.

Select Dependencies

  1. Click CLOUDERA Manager to go to Home.

  2. In Status, on the cluster options menu at the top, click Add Service.



  3. Select Workload XM in the list of services.

  4. Click Continue.

Assign Roles, Review Changes

Assuming Query Server roles exist on all your nodes, you skip assigning roles now; otherwise, do these steps as described in Add Phoenix Query Server Hosts.

  1. In Add Workload XM Service - Assign Roles, click Continue.
  2. In Add Workload XM Service to Cluster 1 - Review Changes, in Phoenix Query Server Host click +.

      1. Add the Phoenix Query Server host name that you noted as a prerequisite for this procedure.
      2. In Impala Daemon Host, enter the host name that you noted as a prerequisite for this procedure.



  3. Scroll down the Review Changes page, review the Java heap requirements of the servers and the other properties that you can adjust, and then click Continue.

You can see how much memory is allocated to each component. If there is insufficient memory, Cloudera Manager lowers the default and shows a warning. The following table contains the recommended heap sizes for various components:

Service                    Heap Size
Analytic Database Server   16 GB
API Server                 4 GB
Admin API Server           2 GB
Baseline Server            8 GB
Databus API Server         4 GB
Databus Server             2 GB
Entities Server            8 GB
Pipelines Server           16 GB
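
To judge whether a node has enough memory for the roles you plan to place on it, you can add up the recommended heap sizes from the table. The following Python sketch is a rough planning aid only. It counts Java heap for the listed WXM roles and ignores memory used by other services and the operating system. The function name is hypothetical, and roles that have no entry in the table (for example Console Server and SDX Server) are reported separately rather than guessed.

```python
# Recommended heap sizes from the table above, in GB.
RECOMMENDED_HEAP_GB = {
    "Analytic Database Server": 16,
    "API Server": 4,
    "Admin API Server": 2,
    "Baseline Server": 8,
    "Databus API Server": 4,
    "Databus Server": 2,
    "Entities Server": 8,
    "Pipelines Server": 16,
}


def total_heap_gb(roles):
    """Return (total recommended heap in GB, roles with no recommendation)."""
    total = 0
    unlisted = []
    for role in roles:
        if role in RECOMMENDED_HEAP_GB:
            total += RECOMMENDED_HEAP_GB[role]
        else:
            unlisted.append(role)
    return total, unlisted


if __name__ == "__main__":
    node_roles = ["Databus API Server", "Databus Server", "SDX Server"]
    total, unlisted = total_heap_gb(node_roles)
    print(f"Recommended heap: {total} GB; no recommendation for: {unlisted}")
```

Placing all eight listed roles on a single node adds up to 60 GB of heap, which is close to the 64 GB of RAM required per node and is one reason to spread roles across nodes.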

At the bottom of the Review Changes page, you can edit these properties:

  • Console Service TLS/SSL Server Private Key File. Enter the location of your private key file.
  • Console Service TLS/SSL Server Certificate File. Enter the location of the certificate.
  • Console Service TLS/SSL Private Key password. Enter the password for your private key.
  • Leave the Console Service TLS/SSL Server CA Certificate text box blank.



Finish Deploying Workload XM

  1. Watch the progress of the deployment, and when complete, click Continue.



  2. When complete, click Finish.

  3. In Cloudera Management Service, click Actions > Restart.

  4. Click Restart.

    The Workload XM service appears in the list of services.


  5. Lay out components as described in Laying Out Components.

Laying Out Components

Cloudera recommends that you horizontally scale WXM by placing components on nodes as shown in the table below. You can improve performance by placing components as recommended.

The following table shows an example of how to lay out components for scalability:

Node roles:
  • Node 1: All master components of all services
  • Nodes 2, 3, 4: Worker nodes + ZK + WXM processing components
  • Node 5: Worker nodes + WXM processing components + WXM UI

A hyphen (-) means that no roles of the service are placed on that node.

Service              Node 1                     Nodes 2, 3, 4              Node 5

Cloudera Management  Alert Publisher            -                          -
                     Event Server
                     Host Monitor
                     Reports Manager
                     Service Monitor

HBase                Gateway                    Gateway                    Gateway
                     Master                     RegionServer               RegionServer
                     Thrift Server (optional)

HDFS                 Balancer                   DataNode                   DataNode
                     Gateway                    Gateway                    Gateway
                     NameNode
                     NFS Gateway (optional)
                     SecondaryNameNode

Hive                 Gateway                    Gateway                    Gateway
                     Metastore Server
                     HiveServer2

Hue (Optional)       Load Balancer              -                          -
                     Hue Server

Impala               Catalog Server             Impala Daemon              Impala Daemon
                     StateStore

Phoenix              Query Server               Query Server               Query Server

WXM                  -                          Analytic Database Server   Admin API Server
                                                Baseline Server            Analytic Database Server
                                                Databus API Server         API Server
                                                Databus Server             Baseline Server
                                                Entities Server            Console Server
                                                Pipelines Server           Databus API Server
                                                SDX Server                 Databus Server
                                                                           Entities Server
                                                                           Pipelines Server
                                                                           SDX Server

ZooKeeper            -                          Server                     -

To lay out components:

  1. In Cloudera Manager, click Hosts > Roles.

    The roles assigned to each node appear.


  2. Optionally spread WXM roles throughout the cluster to leverage resources. The table above shows an example layout.
  3. Configure multiple Phoenix Query Server hosts. The number of Phoenix Query Server hosts should be proportional to the number of WXM roles. For example, if you have roles on 5 nodes, at least 5 query servers are recommended for Phoenix. WXM internally balances loads on those hosts. You can configure only one host for Impala.

  4. Observe the following guidelines when laying out components:
    • One node must include all WXM role types (node 5 in the table).
    • Databus API Server, Databus Server, Analytic Database Server, Baseline Server, Entities Server, SDX Server, and Pipelines Server can scale out to multiple nodes (Nodes 2, 3, 4 in the table).
    • When scaling out these role types, group roles as follows:
      • Databus API Server, Databus Server
      • Analytic Database Server, Baseline Server, Entities Server, SDX Server, and Pipelines Server

      For example, if you add a new Databus API Server, you must also add a new Databus Server to that same node.
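
The guidelines above can be expressed as a small check on a planned layout. The following Python sketch is one interpretation of those guidelines: it assumes that whenever a node hosts any role from a group, it must host every role in that group, and that at least one node must host all ten WXM role types shown for Node 5 in the table. The function name and the node names are hypothetical.

```python
# The ten WXM role types listed for Node 5 in the layout table.
WXM_ROLE_TYPES = {
    "Admin API Server", "Analytic Database Server", "API Server",
    "Baseline Server", "Console Server", "Databus API Server",
    "Databus Server", "Entities Server", "Pipelines Server", "SDX Server",
}

# Roles that scale out together, from the grouping guideline above.
SCALE_OUT_GROUPS = [
    {"Databus API Server", "Databus Server"},
    {"Analytic Database Server", "Baseline Server", "Entities Server",
     "SDX Server", "Pipelines Server"},
]


def layout_problems(layout):
    """Check {node name: set of WXM roles} and return a list of problems."""
    problems = []
    if not any(WXM_ROLE_TYPES <= roles for roles in layout.values()):
        problems.append("no node hosts every WXM role type")
    for node, roles in sorted(layout.items()):
        for group in SCALE_OUT_GROUPS:
            present = group & roles
            if present and present != group:
                missing = ", ".join(sorted(group - roles))
                problems.append(f"{node}: missing {missing}")
    return problems


if __name__ == "__main__":
    processing = SCALE_OUT_GROUPS[0] | SCALE_OUT_GROUPS[1]
    planned = {
        "node2": set(processing),
        "node3": {"Databus API Server"},  # incomplete group on purpose
        "node5": set(WXM_ROLE_TYPES),
    }
    for problem in layout_problems(planned):
        print(problem)
```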

Troubleshooting

Failure Creating Phoenix Schema

You may encounter a stack trace that looks like this:

Role Log

at org.apache.phoenix.shaded.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.apache.phoenix.shaded.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.apache.phoenix.shaded.org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.apache.phoenix.shaded.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.apache.phoenix.shaded.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: ERROR 725 (43M08): Cannot create schema because config phoenix.schema.isNamespaceMappingEnabled for enabling name space mapping isn't enabled. schemaName=SIGMA_DB

To resolve this error, add the following property to the hbase-site.xml safety valve:

<property><name>phoenix.schema.isNamespaceMappingEnabled</name><value>true</value></property>

After you add the property, redeploy the client configurations and restart HBase and the dependent services.

Installation Fails with Message

  1. The installation fails with a message to the effect that a "getpwnam()" error was found.

    Verify whether the parcel is correctly distributed and activated.

  2. The parcels page shows the following errors while distributing the parcel:



    This error happens when the parcel for Workload XM was placed in an incorrect directory and cloudera-scm-server was restarted. Remove the parcel from any unwanted directory and place it back in the parcel-repo directory. Then restart cloudera-scm-server and cloudera-scm-agent using the following commands:
    • service cloudera-scm-server restart
    • service cloudera-scm-agent restart