Create a Collection for Tweets

  1. On a host with Solr Server installed, make sure that the SOLR_ZK_ENSEMBLE environment variable is set in /etc/solr/conf/solr-env.sh.
    For example:
    cat /etc/solr/conf/solr-env.sh
    export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr

    Cloudera Manager sets this variable automatically on hosts with a Solr Server or Gateway role.

  2. If you are using Kerberos, run kinit as a user who has privileges to create the collection:
    kinit solr@EXAMPLE.COM

    Replace EXAMPLE.COM with your Kerberos realm name.

  3. Generate the configuration files for the collection, including the tweet-specific schema.xml:
    solrctl instancedir --generate $HOME/cloudera_tutorial_tweets_config
    cp /opt/cloudera/parcels/CDH/share/doc/search*/search-crunch/solr/collection1/conf/schema.xml $HOME/cloudera_tutorial_tweets_config/conf
  4. Upload the configuration to ZooKeeper:
    • Security Enabled:
      solrctl --jaas $HOME/jaas.conf instancedir --create cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_config
    • Security Disabled:
      solrctl instancedir --create cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_config
  5. Create a new collection with two shards (specified by the -s parameter) using the named configuration (specified by the -c parameter):
    solrctl collection --create cloudera_tutorial_tweets -s 2 -c cloudera_tutorial_tweets_config
  6. Verify that the collection is live. Open the Solr admin web interface in a browser by accessing the relevant URL:
    • TLS Enabled: https://search01.example.com:8985/solr/#/~cloud
    • TLS Disabled: http://search01.example.com:8983/solr/#/~cloud
    Replace search01.example.com with the name of any host running the Solr Server process. If Kerberos authentication is enabled on your cluster, enter the credentials for the solr@EXAMPLE.COM principal when prompted. The collection is live if cloudera_tutorial_tweets appears in the list of collections.
  7. Prepare the configuration for use with MapReduce:
    cp -r $HOME/cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_mr_config
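
The checks in steps 1, 6, and 7 can also be run from the command line. The following is a minimal sketch, not part of the product: the function names check_zk_ensemble, collection_exists, and configs_match are hypothetical helpers introduced here for illustration. The sketch assumes that solrctl collection --list prints one collection per line with the collection name as the first whitespace-separated field. On a Kerberos-enabled cluster, run kinit as in step 2 before calling collection_exists.

```shell
# Hypothetical helpers for checking the results of the procedure above.
# They only define functions; nothing runs until you call them.

# Step 1: report whether SOLR_ZK_ENSEMBLE is defined by a solr-env.sh file.
# Usage: check_zk_ensemble [path-to-solr-env.sh]
check_zk_ensemble() {
  env_file="${1:-/etc/solr/conf/solr-env.sh}"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 2
  fi
  # Source the file in a subshell so the calling shell is left unchanged,
  # and unset the variable first so an inherited value cannot mask a gap.
  zk_value=$(unset SOLR_ZK_ENSEMBLE; . "$env_file" >/dev/null 2>&1; printf '%s' "${SOLR_ZK_ENSEMBLE:-}") || zk_value=""
  if [ -z "$zk_value" ]; then
    echo "SOLR_ZK_ENSEMBLE is not set in $env_file" >&2
    return 1
  fi
  echo "SOLR_ZK_ENSEMBLE=$zk_value"
}

# Step 6: succeed if the named collection appears in the solrctl listing.
# Assumption: the collection name is the first field on its output line.
# Usage: collection_exists cloudera_tutorial_tweets
collection_exists() {
  solrctl collection --list |
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Step 7: succeed if two configuration directories have identical contents.
# Usage: configs_match DIR1 DIR2
configs_match() {
  diff -rq "$1" "$2" >/dev/null
}
```

For example, after completing step 7:

    check_zk_ensemble
    collection_exists cloudera_tutorial_tweets && echo "collection found"
    configs_match $HOME/cloudera_tutorial_tweets_config $HOME/cloudera_tutorial_tweets_mr_config && echo "configs identical"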