Upgrading HDP Manually

Configure, Start, and Validate Apache Mahout

Before you upgrade Apache Mahout, you must first upgrade your HDP components to the latest version (in this case, 2.4.0). This section assumes that you have already done so. If you have not, return to Getting Ready to Upgrade and Upgrade 2.0 Components for instructions on upgrading your HDP components to 2.4.0.

Replace your configuration after upgrading. Copy /etc/mahout/conf from the template to the conf directory on each Mahout host.

To validate Mahout:

  1. Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.

  2. Export the required environment variables for Mahout:

    export JAVA_HOME="your_jdk_home_install_location_here"
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export MAHOUT_HOME=/usr/hdp/current/mahout-client
    export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
    export CLASSPATH="$CLASSPATH":$MAHOUT_HOME

  3. Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.

  4. Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory.

    hdfs dfs -mkdir /user/testuser/testdata
    hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata

  5. Enter the mahout command to convert the plain text file sample-test.txt into a sequence file stored in the output directory mahouttest:

    mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
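
The hdfs and mahout commands in the steps above rely on the variables exported in step 2, so it can help to confirm that each one is set and points to a real directory before running them. The following is a minimal sketch of such a check, not something shipped with HDP or Mahout; the function name check_mahout_env is hypothetical, and the check only tests that the directories exist, not that they contain a working installation.

```shell
#!/bin/sh
# Illustrative pre-flight check (assumption: not part of HDP or Mahout).
# check_mahout_env verifies that JAVA_HOME, HADOOP_HOME, and MAHOUT_HOME
# are set and that each one names an existing directory.
# It prints one line per problem and returns non-zero if any were found.
check_mahout_env() {
    local status=0 name value
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME; do
        # Look up the variable whose name is stored in $name.
        # eval is safe here because $name comes only from the fixed list above.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "BAD PATH: $name=$value is not a directory"
            status=1
        fi
    done
    return "$status"
}
```

One way to use it is to define the function in the testuser's shell after the exports in step 2, then run `check_mahout_env && echo "environment looks usable"` before continuing with step 4. Any MISSING or BAD PATH line indicates an export to correct first.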