Copy sample tweets to HDFS

Copy the provided sample tweets to HDFS. These tweets are used to demonstrate the batch indexing capabilities of Cloudera Search.

  1. Copy the provided sample tweets to HDFS:
    Security Enabled:
    1. kinit [***hdfs@EXAMPLE.COM***]
    2. hdfs dfs -mkdir -p /user/[***USER***]
    3. hdfs dfs -chown [***USER***]:[***GROUP***] /user/[***USER***]
    4. kinit [***USER@EXAMPLE.COM***]
    5. hdfs dfs -mkdir -p /user/[***USER***]/indir
    6. hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/[***USER***]/indir/
    7. hdfs dfs -ls /user/[***USER***]/indir
    Security Disabled: Run the following commands as [***USER***]. The first two commands use sudo to act as the hdfs superuser, so [***USER***] needs sudo rights:
    sudo -u hdfs hdfs dfs -mkdir -p /user/[***USER***]
    sudo -u hdfs hdfs dfs -chown [***USER***]:[***GROUP***] /user/[***USER***]
    hdfs dfs -mkdir -p /user/[***USER***]/indir
    hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro /user/[***USER***]/indir/
    hdfs dfs -ls /user/[***USER***]/indir
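  2. Ensure that outdir exists in HDFS and is empty. If outdir does not exist yet, the rm command reports "No such file or directory"; this is expected and you can continue: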
    hdfs dfs -rm -r -skipTrash /user/[***USER***]/outdir
    hdfs dfs -mkdir /user/[***USER***]/outdir
    hdfs dfs -ls /user/[***USER***]/outdir
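
The security-disabled commands above can be collected into a small script, which may help when repeating the setup on several hosts. The following is only a sketch under stated assumptions: the names TARGET_USER, TARGET_GROUP, DRY_RUN, run, stage_sample_tweets, and reset_outdir are hypothetical and not part of Cloudera Search, and the default value jdoe stands in for [***USER***] and [***GROUP***]. It also adds the -f option to the rm command, which the original steps do not use, so that a missing outdir is not treated as an error.

```shell
#!/usr/bin/env bash
# Sketch of the security-disabled path. All variable and function names
# below are hypothetical helpers, not part of Cloudera Search.

# Stand-ins for [***USER***] and [***GROUP***]; override them in the environment.
TARGET_USER="${TARGET_USER:-jdoe}"
TARGET_GROUP="${TARGET_GROUP:-jdoe}"
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: create the home directory and copy the sample tweets into indir.
stage_sample_tweets() {
  run sudo -u hdfs hdfs dfs -mkdir -p "/user/${TARGET_USER}" || return 1
  run sudo -u hdfs hdfs dfs -chown "${TARGET_USER}:${TARGET_GROUP}" "/user/${TARGET_USER}" || return 1
  run hdfs dfs -mkdir -p "/user/${TARGET_USER}/indir" || return 1
  # The source path is left unquoted so the local shell expands the glob
  # when the sample files are present.
  run hdfs dfs -put /opt/cloudera/parcels/CDH/share/doc/search*/examples/test-documents/sample-statuses-*.avro "/user/${TARGET_USER}/indir/" || return 1
  run hdfs dfs -ls "/user/${TARGET_USER}/indir"
}

# Step 2: recreate outdir so that it exists and is empty.
reset_outdir() {
  # -f is an addition to the original command: it suppresses the error
  # that rm reports when outdir does not exist yet.
  run hdfs dfs -rm -r -f -skipTrash "/user/${TARGET_USER}/outdir" || return 1
  run hdfs dfs -mkdir "/user/${TARGET_USER}/outdir" || return 1
  run hdfs dfs -ls "/user/${TARGET_USER}/outdir"
}
```

After sourcing the script, running `stage_sample_tweets && reset_outdir` with the default DRY_RUN=1 prints the commands for review; setting DRY_RUN=0 executes them against the cluster. The dry-run default is a deliberate choice, because step 2 deletes outdir with -skipTrash and cannot be undone.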

The sample tweets are now in HDFS and ready to be indexed. Continue to the next section to index the sample tweets.