Command Line Upgrade

Configure, Start, and Validate Apache Mahout

Note

The su commands in this section use keywords to represent the Service user. For example, "hdfs" is used to represent the HDFS Service user. If you are using another name for your Service users, you need to substitute your Service user name in each of the su commands.

After upgrading, replace your configuration: copy /etc/mahout/conf from the template to the conf directory on each Mahout host.
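The copy can be sketched as a small helper that first backs up the existing configuration. This is only an illustration: the function name install_mahout_conf is hypothetical, and both directories are passed as arguments because the guide does not state where the template lives on your hosts.

```shell
#!/usr/bin/env bash
# Sketch only: back up the current conf directory, then copy the template into it.
# install_mahout_conf is a hypothetical helper, not an HDP or Mahout command.

install_mahout_conf() {
  local template_dir="$1"   # directory holding the template configuration (assumption: you know its path)
  local conf_dir="$2"       # target conf directory, for example /etc/mahout/conf

  if [ ! -d "$template_dir" ]; then
    echo "FAIL: template directory $template_dir not found" >&2
    return 1
  fi

  # Keep a timestamped copy of the existing configuration, if there is one.
  if [ -d "$conf_dir" ]; then
    cp -a "$conf_dir" "${conf_dir}.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi

  mkdir -p "$conf_dir" || return 1
  # The trailing "/." copies the directory contents, including dotfiles.
  cp -a "$template_dir/." "$conf_dir/"
}
```

Run it on each Mahout host as a user with write access to the conf directory, for example `install_mahout_conf <template_dir> /etc/mahout/conf`, substituting your own template path.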

To validate Apache Mahout:

  1. Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.

  2. Export the required environment variables for Mahout:

    export JAVA_HOME="your_jdk_home_install_location_here"
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export MAHOUT_HOME=/usr/hdp/current/mahout-client
    export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
    export CLASSPATH="$CLASSPATH":$MAHOUT_HOME

  3. Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.

  4. Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory.

    hdfs dfs -mkdir /user/testuser/testdata
    hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata

  5. Use Mahout to convert the plain-text file sample-test.txt into a sequence file in the output directory mahouttest.

    mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest --charset utf-8
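After the conversion in step 5, it can help to confirm that the output directory was actually created. The following is a minimal sketch, assuming the hdfs client is on the PATH as set up in step 2; the function name verify_mahout_output and its messages are illustrative and not part of Mahout or HDP.

```shell
#!/usr/bin/env bash
# Sketch only: check that the seqdirectory output directory exists in HDFS.
# verify_mahout_output is a hypothetical helper name.

verify_mahout_output() {
  # Default to the output directory used in step 5.
  local out_dir="${1:-/user/testuser/mahouttest}"

  # "hdfs dfs -test -d" exits 0 only if the path exists and is a directory.
  if ! hdfs dfs -test -d "$out_dir"; then
    echo "FAIL: $out_dir does not exist in HDFS" >&2
    return 1
  fi

  # List the directory so the generated sequence file(s) can be inspected.
  hdfs dfs -ls "$out_dir"
  echo "OK: $out_dir exists"
}
```

Call it as `verify_mahout_output /user/testuser/mahouttest` while logged in as testuser. If the directory is present, Mahout's seqdumper utility (`mahout seqdumper -i <file>`) can then be used to print the records of a generated sequence file.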