Validate Mahout
To validate Mahout:
Create a test user named "testuser" on the client host, the Linux cluster, and in HDFS, then log in to the client host as that user.
hdfs dfs -put /tmp/sample-test.txt /user/testuser
Export the required environment variables for Mahout:
export JAVA_HOME=<your jdk home install location here>
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export MAHOUT_HOME=/usr/hdp/current/mahout-client
export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
export CLASSPATH="$CLASSPATH":$MAHOUT_HOME
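Before continuing, it can help to confirm that the exported variables are set and point to real directories. The following is a minimal sketch of such a check; the function name check_mahout_env is hypothetical and is not part of Mahout or HDP.

```shell
# Hypothetical helper (not part of Mahout or HDP): report any required
# variable that is unset or does not point to an existing directory.
check_mahout_env() {
    rc=0
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            rc=1
        fi
    done
    return $rc
}
```

Calling check_mahout_env after the export lines returns 0 when all three variables are usable and 1 otherwise, with one message per problem written to standard error.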
Upload a few megabytes of natural-language plain text to the client host as /tmp/sample-test.txt.
Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory:
hdfs dfs -mkdir /user/testuser/testdata
hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata
Create a Mahout test output directory:
hdfs dfs -mkdir /user/testuser/mahouttest
Use the following command to convert the plain text file sample-test.txt into a sequence file in the output directory mahouttest. The -ow option overwrites any existing output in that directory:
mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
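To check the result of the conversion, one option is to confirm that the output directory exists in HDFS and list its contents. The following is a sketch under that assumption; the function name check_mahout_output is hypothetical, and the default path is the mahouttest directory used above.

```shell
# Hypothetical helper (not part of Mahout or HDP): confirm that the
# seqdirectory output directory exists in HDFS, then list its contents.
check_mahout_output() {
    out_dir="${1:-/user/testuser/mahouttest}"
    # "hdfs dfs -test -d" exits with 0 only when the path is a directory.
    if hdfs dfs -test -d "$out_dir"; then
        hdfs dfs -ls "$out_dir"
    else
        echo "Output directory not found in HDFS: $out_dir" >&2
        return 1
    fi
}
```

Running check_mahout_output with no arguments inspects /user/testuser/mahouttest. A non-empty listing suggests that Mahout wrote output there; an error message and a return value of 1 mean the directory is missing.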