Indexing a File Containing Tweets with Flume HTTPSource
HTTPSource lets you ingest data into Solr by POSTing a file over HTTP. HTTPSource passes the data through a channel to a sink, in this case a SolrSink. For more information, see Flume Solr BlobHandler Configuration Options.
- Delete all existing documents in Solr:
$ sudo /etc/init.d/flume-ng-agent stop
$ solrctl collection --deletedocs collection3
- Comment out TwitterSource in /etc/flume-ng/conf/flume.conf and uncomment HTTPSource:
# comment out "agent.sources = twitterSrc"
# uncomment "agent.sources = httpSrc"
- Restart the Flume Agent:
$ sudo /etc/init.d/flume-ng-agent restart
- Send a file containing tweets to the HTTPSource:
$ curl --data-binary \
  @/usr/share/doc/search-0.1.4/examples/test-documents/sample-statuses-20120906-141433-medium.avro \
  'http://127.0.0.1:5140?resourceName=sample-statuses-20120906-141433-medium.avro' \
  --header 'Content-Type:application/octet-stream' --verbose
- Check the log for status or errors:
$ cat /var/log/flume-ng/flume.log
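The configuration edit in the second step can also be previewed from the shell. The following is a minimal sketch, not part of the shipped tooling: the function name `switch_to_http_source` is hypothetical, and it assumes the two `agent.sources` lines appear in flume.conf as quoted in that step (optionally with different spacing around `=`). It prints the modified configuration to standard output and leaves the file itself untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a Flume config with the
# "agent.sources = twitterSrc" line commented out and the
# "agent.sources = httpSrc" line uncommented.
# Assumption: both lines exist in the file in roughly that form.
switch_to_http_source() {
  sed \
    -e 's/^[[:space:]]*\(agent\.sources[[:space:]]*=[[:space:]]*twitterSrc\)/# \1/' \
    -e 's/^[[:space:]]*#[[:space:]]*\(agent\.sources[[:space:]]*=[[:space:]]*httpSrc\)/\1/' \
    "$1"
}
```

Running `switch_to_http_source /etc/flume-ng/conf/flume.conf` and reviewing the output before changing the real file keeps the edit easy to check. Other lines, such as per-source properties, pass through unchanged.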
Use the Cloudera Search GUI at http://localhost:8983/solr/collection3/select?q=*%3A*&wt=json&indent=true to verify that new tweets have been ingested into Solr, as expected.
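The same check can be done from the command line by reading the `numFound` field of Solr's JSON response. The sketch below is illustrative only: the helper name `num_found` is hypothetical, and it assumes the response contains a `"numFound":<number>` pair on a single line, as Solr's JSON writer normally produces.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr JSON select response on stdin and
# print the first numFound value (the number of matching documents).
num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example, `curl -s 'http://localhost:8983/solr/collection3/select?q=*%3A*&wt=json&rows=0' | num_found` should print a nonzero count once the tweets have been indexed. The `rows=0` parameter asks Solr to return only the count, not the documents.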