
Copy Sample Tweets to HDFS

  1. Copy the provided sample tweets to HDFS. These tweets will be used to demonstrate the batch indexing capabilities of Cloudera Search:
    • Security Enabled:
      kinit hdfs@EXAMPLE.COM
      
      hdfs dfs -mkdir -p /user/jdoe
      
      hdfs dfs -chown jdoe:jdoe /user/jdoe
      
      kinit jdoe@EXAMPLE.COM
      
      hdfs dfs -mkdir -p /user/jdoe/indir
      
      hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/jdoe/indir/
      
      hdfs dfs -ls /user/jdoe/indir
      
    • Security Disabled:
      sudo -u hdfs hdfs dfs -mkdir -p /user/jdoe
      
      sudo -u hdfs hdfs dfs -chown jdoe:jdoe /user/jdoe
      
      hdfs dfs -mkdir -p /user/jdoe/indir
      
      hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/jdoe/indir/
      
      hdfs dfs -ls /user/jdoe/indir
      
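  2. Ensure that outdir exists in HDFS and is empty. If outdir does not exist yet, the rm command reports that there is no such file or directory; you can ignore that message: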
    hdfs dfs -rm -r -skipTrash /user/jdoe/outdir
    
    hdfs dfs -mkdir /user/jdoe/outdir
    
    hdfs dfs -ls /user/jdoe/outdir
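
Step 2 can also be wrapped in a small script so that it is safe to repeat. The following is a minimal sketch, not part of the tutorial itself: the HDFS_HOME_DIR and DRY_RUN variables and the run and reset_outdir helpers are hypothetical names introduced for this example. It also assumes the -f option of hdfs dfs -rm, which suppresses the error reported when outdir does not exist yet, and mkdir -p, which does not fail if the directory is already present.

```shell
#!/usr/bin/env bash
# Sketch: recreate the tutorial's HDFS output directory (step 2).
# HDFS_HOME_DIR and DRY_RUN are hypothetical variables for this example only.

HDFS_HOME_DIR="${HDFS_HOME_DIR:-/user/jdoe}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove any existing outdir, create it again, and list it to confirm it is empty.
reset_outdir() {
  outdir="$HDFS_HOME_DIR/outdir"
  # -f: do not report an error if outdir does not exist yet.
  run hdfs dfs -rm -r -f -skipTrash "$outdir"
  run hdfs dfs -mkdir -p "$outdir"
  run hdfs dfs -ls "$outdir"
}

reset_outdir
```

By default the script only prints the three commands it would run. Setting DRY_RUN=0 executes them against the cluster, and on a security-enabled cluster that requires a valid Kerberos ticket for the user (for example, obtained with kinit as in step 1).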
    

The sample tweets are now in HDFS and ready to be indexed. Continue to the next section to index the sample tweets.
