Configure, Start, and Validate Apache Mahout
Before you can upgrade Apache Mahout, you must first upgrade your HDP components to the latest version (in this case, 2.4.2). This section assumes that upgrade is already complete. If it is not, return to Getting Ready to Upgrade and Upgrade 2.0 Components for instructions on how to upgrade your HDP components to 2.4.2.
Replace your configuration after upgrading. Copy /etc/mahout/conf from the template to the conf directory on the Mahout hosts.
To validate Mahout:
Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.
Export the required environment variables for Mahout:
export JAVA_HOME="your_jdk_home_install_location_here"
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export MAHOUT_HOME=/usr/hdp/current/mahout-client
export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
export CLASSPATH="$CLASSPATH":$MAHOUT_HOME
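Before continuing, it can help to confirm that the exported variables point at directories that exist on this host. The following is a minimal sketch; check_mahout_env is a hypothetical helper name introduced here for illustration and is not part of Mahout or HDP.

```shell
# Sketch: confirm that the Mahout-related variables are set and point at
# existing directories. check_mahout_env is a hypothetical helper name.
check_mahout_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name OK: $value"
        fi
    done
    # Non-zero if any variable was unset or pointed at a missing directory.
    return $status
}
```

Run check_mahout_env in the same shell session in which you exported the variables. A non-zero exit status means at least one variable needs to be corrected.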
Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.
Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory:
hdfs dfs -mkdir /user/testuser/testdata
hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata
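To confirm that the transfer succeeded before running Mahout, you can test for the file in HDFS. The sketch below assumes the paths used in this section; verify_hdfs_upload is a hypothetical helper name, and it relies only on the standard hdfs dfs -test -e check.

```shell
# Sketch: report whether the uploaded sample file exists in HDFS.
# verify_hdfs_upload is a hypothetical helper name; the default path is
# the one used in this section.
verify_hdfs_upload() {
    target="${1:-/user/testuser/testdata/sample-test.txt}"
    # "hdfs dfs -test -e" exits with 0 when the path exists.
    if hdfs dfs -test -e "$target"; then
        echo "found: $target"
    else
        echo "missing: $target"
        return 1
    fi
}
```

Call verify_hdfs_upload with no argument to check the default path, or pass a different HDFS path as the first argument.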
Enter the mahout command to convert the plain text file sample-test.txt into a sequence file stored in the output directory mahouttest:
mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
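Once the seqdirectory command finishes, a quick way to check that it produced output is to list the output directory in HDFS. This is a minimal sketch under the assumption that the output path is the one given above; check_seq_output is a hypothetical helper name, and the exact file names Mahout writes into the directory are not assumed here.

```shell
# Sketch: confirm the seqdirectory output directory exists and list it.
# check_seq_output is a hypothetical helper name; the default path is the
# --output value used in the command above.
check_seq_output() {
    out_dir="${1:-/user/testuser/mahouttest}"
    # "hdfs dfs -test -d" exits with 0 when the path is a directory.
    if ! hdfs dfs -test -d "$out_dir"; then
        echo "output directory not found: $out_dir"
        return 1
    fi
    # List the generated files so their names and sizes can be inspected.
    hdfs dfs -ls "$out_dir"
}
```

An empty listing, or a non-zero exit status, suggests the conversion did not complete and the mahout command output should be reviewed.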