Data Steward Studio Installation and Upgrade

Install the Data Plane Profiler Agent

DSS requires that the DP Profiler Agent be installed on all clusters. The Profiler is installed on the Ambari host by using an Ambari management pack (MPack). An MPack bundles service definitions, stack definitions, and stack add-on service definitions.

This task must be completed on all clusters to be used with DSS.
You must have root access to the Ambari Server host node to perform this task.
Prior to starting installation, you must have downloaded the required repository tarballs from the Hortonworks customer portal, following the instructions provided as part of the product procurement process. The repository tarballs for the Data Plane Profiler agent are different from the DSS app repository tarballs.
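The two prerequisites above (root access and a downloaded file) can be checked before you begin. The following is only a minimal sketch: the function names `is_root_uid` and `file_ready` are hypothetical helpers, not part of Ambari or DSS, and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Preflight sketch. Function names and example paths are hypothetical.

# Succeeds when the given numeric user ID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Succeeds when the downloaded file exists, is readable, and is not empty.
file_ready() {
    [ -r "$1" ] && [ -s "$1" ]
}

# Example usage (replace the placeholder path with your download location):
#   is_root_uid "$(id -u)" || echo "Run this on the Ambari host as root" >&2
#   file_ready /path/to/mpack-file || echo "MPack file missing or empty" >&2
```

Passing the user ID as an argument, rather than calling `id -u` inside the function, keeps the check easy to try out on any machine.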
  1. Log in as root to an Ambari host on a cluster.
    ssh root@<ambari-ip-address>
  2. Install the Data Plane Profiler MPack by running the following command, replacing <mpack-file-name> with the name of the MPack.
    ambari-server install-mpack --mpack <mpack-file-name> --verbose 
  3. Restart the Ambari server.
    ambari-server restart
  4. Launch Ambari in a browser and log in.
    Default credentials are:
    • Username: admin
    • Password: admin
  5. Click Admin > Manage Ambari.

  6. Click Versions, and then do the following on the Versions page:
    1. Click the HDP version in the Name column.
    2. Change the Base URL path for the DSS service to point to the local repository, for example:
    URLs shown are for example purposes only. Actual URLs might be different.
  7. Click the Ambari logo to return to the main Ambari page.
  8. In the Ambari Services navigation pane, click Actions > Add Service.

    The Add Service Wizard displays.
  9. On the Choose Services page of the Wizard, select the Dataplane Profiler service to install in Ambari, and then follow the on-screen instructions.
    Other required services are automatically selected.
  10. When prompted to confirm the addition of dependent services, confirm all of them.
    This adds other required services.
  11. On the Assign Masters page, you can accept the default settings.
  12. On the Customize Services page, fill out the database details and other required fields that are highlighted.
    Make sure to enter the credentials that you set while configuring the external database. Change the default username, profileragent, to the value set in the external database.
    Make sure that the database driver for the external database you configured is available on the machine.
  13. Complete the remaining installation wizard steps and exit the wizard.
  14. Ensure that all components required for your DataPlane Platform have started successfully.
    On the installation verification screen, an earlier version of the DSS repositories might appear in the labels. You can ignore the version number shown in the labels and proceed.
  15. Enable Knox SSO for DP Profiler Agent.
    1. Set dpprofiler.sso.knox.enabled to true in the Advanced dpprofiler-env section of the DP Profiler Configs in Ambari.
    2. Run the following CLI command to export the Knox certificate:
      $JAVA_HOME/bin/keytool -export -alias gateway-identity -rfc -file knox-pub-key.cert -keystore /usr/hdp/current/knox-server/data/security/keystores/gateway.jks

      When prompted, enter the Knox master password.

    3. After exporting the certificate, paste its contents into the dpprofiler.sso.knox.public.key field under the Advanced dpprofiler-env properties of the DP Profiler Configs in Ambari.
  16. Open the quick link of the profiler for service verification.
  17. Add /profilers to the quick link URL.
    If the quick link is xyz:21900, change it to xyz:21900/profilers.
    For non-Kerberized clusters, this request returns the list of all registered profilers. For Kerberos-enabled clusters where Knox is not enabled for the DP Profiler Agent, you see an HTTP 401 response, which is expected.
  18. After you install the profiler agent by using the Add Service Wizard in Ambari, the NodeManager hosts do not have the dpprofiler user. For Ambari to create this user automatically, restart all NodeManagers by going to Services > YARN > Restart NodeManagers. NodeManagers can be restarted in a rolling fashion; the Ambari UI shows restart batching options.
    During DP Profiler Agent installation, two new Atlas types, dss_hive_column_profile_data and dss_hive_table_profile_data, are registered. These types contain attributes that store metrics computed by DSS profilers. In addition, the existing Atlas types hive_table and hive_column are updated with an additional attribute, profileData. For the hive_table type, profileData is a reference to dss_hive_table_profile_data; for the hive_column type, profileData is a reference to dss_hive_column_profile_data.
    As part of installation of DataPlane Profiler Agent on HDP 3.x versions, make sure you enter the details of DP Profiler extra JARs when prompted as part of the advanced dpprofiler-env properties. To get the value of the version of the JARs, log in to the Livy machine and navigate to this location:
    Extract the details of the exact location with specific version details and paste in the Ambari section. Enter the value of the property as follows:
  19. If TDE zones are set up in the cluster and any of the following locations fall within a TDE zone, the dpprofiler user must have Decrypt_EEK access to the key or keys used to encrypt that zone.
    • /user/dpprofiler
    • /ranger/audit/hiveServer2
    • /apps/dpprofiler
    • all locations of Hive tables
  20. In the Advanced dpprofiler-config section of the DP Profiler service in Ambari, make sure you enter the ZooKeeper connection string details.
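For the Knox certificate export in step 15, it can help to confirm that the exported file is a PEM certificate before pasting it into Ambari. The sketch below relies only on the fact that keytool's -rfc option writes PEM text delimited by BEGIN CERTIFICATE and END CERTIFICATE lines; the function name `looks_like_pem_cert` is a hypothetical helper.

```shell
#!/bin/sh
# Sanity check for the file written by the keytool export in step 15.
# looks_like_pem_cert is a hypothetical helper name.

# Succeeds when the file is non-empty, starts with a PEM certificate
# header, and contains the matching footer.
looks_like_pem_cert() {
    [ -s "$1" ] || return 1
    head -n 1 "$1" | grep -q '^-----BEGIN CERTIFICATE-----' || return 1
    grep -q '^-----END CERTIFICATE-----' "$1"
}

# Example usage:
#   looks_like_pem_cert knox-pub-key.cert && cat knox-pub-key.cert
```

The check only inspects the file's shape. It does not verify that the certificate belongs to the Knox gateway identity.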
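The verification in steps 16 and 17 can also be run from a command line. This is a sketch under assumptions: the helper names `profilers_url` and `explain_status` are hypothetical, the `http://` scheme in the usage comment is assumed, and treating status 200 as "profiler list returned" is an inference from step 17 rather than something the procedure states.

```shell
#!/bin/sh
# Helpers for the /profilers check in steps 16 and 17.
# profilers_url and explain_status are hypothetical names.

# Appends /profilers to a quick link, dropping one trailing slash first.
profilers_url() {
    printf '%s/profilers\n' "${1%/}"
}

# Maps an HTTP status code to the outcomes described in step 17.
explain_status() {
    case "$1" in
        200) echo "OK: list of registered profilers returned" ;;
        401) echo "401: expected on Kerberos-enabled clusters without Knox for the agent" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Example usage (the scheme is an assumption; adjust for your cluster):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "http://$(profilers_url xyz:21900)")
#   explain_status "$code"
```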
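For step 19, one way to find out whether a location falls inside a TDE zone is to compare it against the zone list. The sketch below assumes that `hdfs crypto -listZones` prints one zone per line as a path followed by a key name; the function name `zone_key_for_path` is hypothetical. Hive table locations are not enumerated here and would need to be checked separately.

```shell
#!/bin/sh
# Finds the encryption key for the zone that contains a path (step 19).
# zone_key_for_path is a hypothetical helper name.

# Reads "zone-path key-name" lines on stdin and prints the key of the
# zone that contains the given path. Returns 1 when no zone matches.
zone_key_for_path() {
    path=$1
    while read -r zone key rest; do
        [ -n "$zone" ] || continue
        # Compare with trailing slashes so /user/dp does not match /user/dpprofiler.
        case "$path/" in
            "${zone%/}/"*) printf '%s\n' "$key"; return 0 ;;
        esac
    done
    return 1
}

# Example usage (run as a user allowed to list zones):
#   for p in /user/dpprofiler /ranger/audit/hiveServer2 /apps/dpprofiler; do
#       hdfs crypto -listZones | zone_key_for_path "$p"
#   done
```

Each key name printed is a key for which the dpprofiler user needs Decrypt_EEK access.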
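Before entering the value in step 20, the connection string can be checked for obvious typos. This sketch assumes the common ZooKeeper form of comma-separated host:port pairs with no chroot suffix; the exact format that the DP Profiler property accepts is not stated in this procedure, and `valid_zk_connect_string` is a hypothetical helper name.

```shell
#!/bin/sh
# Format check for a ZooKeeper connection string (step 20).
# valid_zk_connect_string is a hypothetical helper name.

# Succeeds when the argument is a comma-separated list of host:port pairs.
valid_zk_connect_string() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9._-]+:[0-9]+(,[A-Za-z0-9._-]+:[0-9]+)*$'
}

# Example usage (host names are placeholders):
#   valid_zk_connect_string "zk1.example.com:2181,zk2.example.com:2181" \
#       || echo "Check the ZooKeeper connection string" >&2
```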