Create a collection for tweets

In this part of the Cloudera Search tutorial, you create a collection for tweets.

The remaining examples in the tutorial use the same collection, so make sure that you follow these instructions carefully.
  1. On a host with Solr Server installed, make sure that the SOLR_ZK_ENSEMBLE environment variable is set in /etc/solr/conf/solr-env.sh.
    For example:
    cat /etc/solr/conf/solr-env.sh
    export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr

    This is automatically set on hosts with a Solr Server or Gateway role in Cloudera Manager.

  2. If you are using Kerberos, kinit as the user that has privileges to create the collection.
    For example:
    kinit solr@EXAMPLE.COM

    Replace solr@EXAMPLE.COM with your user name and Kerberos realm name respectively.

  3. Generate the configuration files for the collection, including the tweet-specific managed-schema:
    solrctl instancedir --generate $HOME/cloudera_tutorial_tweets_config
                     cp /opt/cloudera/parcels/CDH/share/doc/search*/search-crunch/solr/collection1/conf/schema.xml $HOME/cloudera_tutorial_tweets_config/conf/managed_schema
    To the overwrite confirmation prompt:
    cp: overwrite 'cloudera_tutorial_tweets_config/conf/managed-schema'?
    type 'y'.
  4. Upload the configuration to ZooKeeper:
    solrctl config --upload cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_config
  5. Create a new collection with two shards (specified by the -s parameter) using the named configuration (specified by the -c parameter):
    solrctl collection --create cloudera_tutorial_tweets -s 2 -c cloudera_tutorial_tweets_config
  6. Verify that the collection is live. Open the Solr admin web interface in a browser by accessing the relevant URL:
    • https://search01.example.com:8985/solr/#/~cloud
    If you have Kerberos authentication enabled on your cluster, enter the credentials for the solr@EXAMPLE.COM principal when prompted. Replace search01.example.com with the name of any host running the Solr Server process. Look for the cloudera_tutorial_tweets collection to verify that it exists.
  7. Prepare the configuration for use with MapReduce:
    cp -r $HOME/cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_mr_config