Configure, Start, and Validate Apache Mahout
Replace your configuration after upgrading. Copy /etc/mahout/conf from the template to the conf directory on the Mahout hosts.
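The copy step could be scripted along the following lines. This is only a sketch: the helper name replace_mahout_conf and the .bak suffix are illustrative, and the template location is passed in as an argument because the text above does not say where the template lives.

```shell
# Sketch: back up the current conf directory, then copy the template over it.
# replace_mahout_conf is a hypothetical helper; the caller supplies both paths.
replace_mahout_conf() {
    template_dir=${1%/}
    conf_dir=${2%/}

    if [ ! -d "$template_dir" ]; then
        echo "template directory not found: $template_dir" >&2
        return 1
    fi

    # Keep a timestamped copy of the existing configuration, if there is one.
    if [ -d "$conf_dir" ]; then
        backup="$conf_dir.bak.$(date +%Y%m%d%H%M%S)"
        mkdir -p "$backup" || return 1
        cp -R "$conf_dir"/. "$backup"/ || return 1
    fi

    mkdir -p "$conf_dir" || return 1
    # Copy the template contents (including dotfiles) into the conf directory.
    cp -R "$template_dir"/. "$conf_dir"/
}
```

On a Mahout host this might be called as replace_mahout_conf <template_dir> /etc/mahout/conf, where <template_dir> stands for wherever the upgraded template was unpacked.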
To validate Apache Mahout:
Create a test user named "testuser" in the Linux cluster and in HDFS, and log in as that user.
hdfs dfs -put /tmp/sample-test.txt /user/testuser
Export the required environment variables for Mahout:
export JAVA_HOME="your_jdk_home_install_location_here"
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export MAHOUT_HOME=/usr/hdp/current/mahout-client
export PATH="$PATH":$HADOOP_HOME/bin:$MAHOUT_HOME/bin
export CLASSPATH="$CLASSPATH":$MAHOUT_HOME
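Before continuing, it may be worth confirming that the exported variables resolve to real directories. The check below is a minimal sketch; check_mahout_env is a hypothetical helper name, not something shipped with Mahout or HDP.

```shell
# Sketch: confirm the Mahout-related variables are set and point at directories.
# check_mahout_env is a hypothetical helper, not part of Mahout or HDP.
check_mahout_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name does not point to a directory: $value" >&2
            status=1
        fi
    done
    return $status
}
```

Running check_mahout_env after the exports returns 0 only when all three variables name existing directories; otherwise it prints each problem to standard error.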
Upload a few megabytes of natural-language plain text to the Linux server as /tmp/sample-test.txt.
Transfer the sample-test.txt file to a subdirectory of the testuser's HDFS home directory.
hdfs dfs -mkdir /user/testuser/testdata
hdfs dfs -put /tmp/sample-test.txt /user/testuser/testdata
Enter the mahout command to convert the plain text file sample-test.txt into a sequence file stored in the output directory mahouttest:
mahout seqdirectory --input /user/testuser/testdata --output /user/testuser/mahouttest -ow --charset utf-8
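Once the command finishes, one way to confirm that it produced output is to check the target directory in HDFS. The following is a sketch under assumptions: verify_seq_output is a hypothetical helper, and it relies only on the standard hdfs dfs -test and hdfs dfs -ls commands.

```shell
# Sketch: check that the seqdirectory output directory exists, then list it.
# verify_seq_output is a hypothetical helper; the default path matches the
# --output value used in the mahout command above.
verify_seq_output() {
    out_dir=${1:-/user/testuser/mahouttest}

    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs command not found on PATH" >&2
        return 1
    fi

    # -test -d returns 0 when the path exists and is a directory.
    if ! hdfs dfs -test -d "$out_dir"; then
        echo "output directory missing: $out_dir" >&2
        return 1
    fi

    # Show what the conversion wrote.
    hdfs dfs -ls "$out_dir"
}
```

If the listing shows one or more files, the conversion most likely succeeded. Mahout's seqdumper utility (mahout seqdumper -i with the path of one of the listed files) can then be used to inspect the contents.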