Copy Sample Tweets to HDFS

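  1. Copy the provided sample tweets to HDFS. These tweets are used to demonstrate the batch indexing capabilities of Cloudera Search. The commands use jdoe as an example user and EXAMPLE.COM as an example Kerberos realm; substitute your own values: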
    • Security Enabled:
      kinit hdfs@EXAMPLE.COM
      hdfs dfs -mkdir -p /user/jdoe
      hdfs dfs -chown jdoe:jdoe /user/jdoe
      kinit jdoe@EXAMPLE.COM
      hdfs dfs -mkdir -p /user/jdoe/indir
      hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/jdoe/indir/
      hdfs dfs -ls /user/jdoe/indir
    • Security Disabled:
      sudo -u hdfs hdfs dfs -mkdir -p /user/jdoe
      sudo -u hdfs hdfs dfs -chown jdoe:jdoe /user/jdoe
      hdfs dfs -mkdir -p /user/jdoe/indir
      hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/jdoe/indir/
      hdfs dfs -ls /user/jdoe/indir
  2. Ensure that outdir exists in HDFS and is empty. The -f option suppresses the error that -rm otherwise reports if outdir does not exist yet:
    hdfs dfs -rm -r -f -skipTrash /user/jdoe/outdir
    hdfs dfs -mkdir /user/jdoe/outdir
    hdfs dfs -ls /user/jdoe/outdir
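
If you expect to repeat step 2 between indexing runs, it can be wrapped in a small shell function. The following is a minimal sketch, not part of the Cloudera tooling: the function name reset_outdir and the variable HDFS_USER_DIR are hypothetical, and the sketch assumes the hdfs client is on the PATH and that you already hold valid credentials on a secure cluster. It uses hdfs dfs -test -d to check for the directory before removing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: HDFS_USER_DIR and reset_outdir are illustrative names,
# not part of Cloudera Search. Defaults to the example user directory.
HDFS_USER_DIR="${HDFS_USER_DIR:-/user/jdoe}"

reset_outdir() {
  local outdir="$HDFS_USER_DIR/outdir"

  # -test -d exits with 0 only if the path exists and is a directory,
  # so the removal is skipped on a first run.
  if hdfs dfs -test -d "$outdir"; then
    hdfs dfs -rm -r -skipTrash "$outdir" || return 1
  fi

  # Recreate the directory and list it to confirm that it is empty.
  hdfs dfs -mkdir -p "$outdir" || return 1
  hdfs dfs -ls "$outdir"
}
```

After sourcing the file, run reset_outdir. Listing an empty directory prints nothing, so no output from the final command indicates that outdir is ready.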

The sample tweets are now in HDFS and ready to be indexed. Continue to the next section to index the sample tweets.