18. Configure, Start, and Validate Apache Mahout

After upgrading, replace your Mahout configuration: copy /etc/mahout/conf from the template to the conf directory on each Mahout host.
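The copy can be scripted so that the existing configuration is backed up first. The following is a minimal sketch, not a documented procedure. The function name copy_mahout_conf is hypothetical, and both directories are passed as arguments because this guide does not name the template location.

```shell
# Sketch only: copy_mahout_conf is a hypothetical helper, and both directory
# arguments are placeholders to be supplied by the administrator.
copy_mahout_conf() {
    template_dir=$1   # directory that holds the template configuration files
    conf_dir=$2       # live configuration directory, for example /etc/mahout/conf

    if [ ! -d "$template_dir" ]; then
        echo "no such template directory: $template_dir" >&2
        return 1
    fi

    # Keep a timestamped backup of any existing configuration.
    if [ -d "$conf_dir" ]; then
        cp -Rp "$conf_dir" "$conf_dir.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi

    # Copy the template contents (including dotfiles) over the live directory.
    mkdir -p "$conf_dir" && cp -Rp "$template_dir"/. "$conf_dir"/
}
```

Note that this sketch overlays the template files on the existing directory and does not delete files that are absent from the template. It would be run on each Mahout host with the template directory as the first argument and /etc/mahout/conf as the second.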

To validate Mahout:

  1. Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.

  2. Export the required environment variables for Mahout:

    export JAVA_HOME="your_jdk_home_install_location_here"
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export MAHOUT_HOME=/usr/hdp/current/mahout-client
    export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
    export CLASSPATH="$CLASSPATH":$MAHOUT_HOME

  3. Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.

  4. Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory.

    hdfs dfs -mkdir /user/testuser/testdata
    hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata

  5. Enter the mahout command to convert the plain text file sample-test.txt into a sequence file stored in the output directory mahouttest:

    mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
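
The steps above do not include a check of the environment or of the result. The following is a minimal sketch of two such checks, assuming a POSIX shell. The helper names check_mahout_env and check_seq_output are hypothetical and are not part of Mahout or HDP; the only external commands used are hdfs dfs -test -d and hdfs dfs -ls.

```shell
# Sketch only: check_mahout_env and check_seq_output are hypothetical helper
# names, not part of Mahout or HDP.

# Return 0 if JAVA_HOME, HADOOP_HOME, and MAHOUT_HOME each name an existing
# directory; otherwise report each problem on stderr and return 1.
check_mahout_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}

# Return 0 if the given HDFS path exists as a directory, and list its contents.
check_seq_output() {
    hdfs dfs -test -d "$1" && hdfs dfs -ls "$1"
}
```

For example, check_mahout_env could be run after step 2, and check_seq_output /user/testuser/mahouttest after step 5. A nonzero exit status from either one suggests that the corresponding step should be repeated.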
