Command Line Upgrade

Configure and Validate Apache Mahout

Note

The su commands in this section use keywords to represent the Service user. For example, "hdfs" is used to represent the HDFS Service user. If you are using another name for your Service users, you need to substitute your Service user name in each of the su commands.

Restore your configuration after upgrading: if you backed up /etc/mahout/conf, copy it from the backup to the conf directory on each Mahout host.
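
The restore step above could be scripted roughly as follows. This is a minimal sketch, not part of the documented procedure: the function name restore_mahout_conf and the backup location in the example invocation are hypothetical, so substitute the path where your backup was actually stored. The function does nothing if no backup directory exists, matching the "if it existed" condition.

```shell
# Hypothetical helper (not part of HDP): restore a backed-up Mahout conf directory.
# $1 = backup directory, $2 = target conf directory. Both are supplied by the caller.
restore_mahout_conf() {
    backup_dir="$1"
    conf_dir="$2"
    if [ ! -d "$backup_dir" ]; then
        # No backup was taken, so there is nothing to copy.
        echo "no backup found at $backup_dir; nothing to restore"
        return 0
    fi
    mkdir -p "$conf_dir"
    # Copy the directory contents, preserving permissions and timestamps.
    cp -Rp "$backup_dir"/. "$conf_dir"/
    echo "restored $backup_dir to $conf_dir"
}

# Example invocation; the backup path is a placeholder:
# restore_mahout_conf /path/to/backup/mahout/conf /etc/mahout/conf
```

Run it on every Mahout host, typically as a user with write access to /etc/mahout/conf.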

To validate Apache Mahout:

  1. Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.

  2. Export the required environment variables for Mahout:

    export JAVA_HOME="your_jdk_home_install_location_here"
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export MAHOUT_HOME=/usr/hdp/current/mahout-client
    export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
    export CLASSPATH="$CLASSPATH":$MAHOUT_HOME
  3. Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.

  4. Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory.

    hdfs dfs -mkdir /user/testuser/testdata
    hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata
  5. Enter the mahout command to convert the plain text file sample-test.txt into a sequence file stored in the output directory mahouttest:

    mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
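
As an optional sanity check around the steps above, the following sketch confirms that the exported variables point at real directories before you run mahout, and shows one way to inspect the result afterwards. The helper name check_dirs is hypothetical and not part of Mahout or HDP. The hdfs dfs -ls command is a standard HDFS shell command, but the names of the files it lists depend on your Mahout version and execution mode, so none are assumed here.

```shell
# Hypothetical helper: for each variable NAME given as an argument, report
# whether it is set and whether its value is an existing directory.
# Returns non-zero if any variable fails the check.
check_dirs() {
    status=0
    for name in "$@"; do
        # Indirectly read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name ok: $value"
        fi
    done
    return $status
}

# Before step 5, verify the environment exported in step 2:
# check_dirs JAVA_HOME HADOOP_HOME MAHOUT_HOME

# After step 5, list the output directory to confirm that it is not empty:
# hdfs dfs -ls /user/testuser/mahouttest
```

If check_dirs reports a missing directory, recheck the paths exported in step 2 before running the mahout command.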