Managing Cloudera Search Configuration

Cloudera Search configuration is primarily controlled by several configuration files, some of which are stored in Apache ZooKeeper:

  • solr.xml

    This file is stored in ZooKeeper, and controls global properties for Apache Solr. To edit this file, you must download it from ZooKeeper, make your changes, and then upload the modified file back to ZooKeeper using the solrctl cluster command. For information about the solr.xml file, see Solr.xml 4.4 and beyond in the Solr Wiki.

  • solrconfig.xml

    Each collection in Solr uses a solrconfig.xml file, stored in ZooKeeper, to control collection behavior. For information about the solrconfig.xml file, see solrconfig.xml in the Solr Wiki.

  • schema.xml

    This file, also stored in ZooKeeper and assigned to a collection, defines the schema for the documents you are indexing. For example, it specifies which fields to index, the expected data type for each field, the default field to query when the field is unspecified, and so on. For information about the schema.xml file, see SchemaXml in the Solr Wiki.

  • core.properties

    Unlike other configuration files, this file is stored in the local filesystem rather than ZooKeeper, and is used for core discovery. For more information on this process and the structure of the file, see Core Discovery (4.4 and beyond) in the Solr Wiki.
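The download, edit, and upload cycle for solr.xml described above can be sketched as a small shell function. This is a minimal sketch, assuming that your version of solrctl provides the --get-solrxml and --put-solrxml options of the solrctl cluster command (check solrctl --help to confirm). The function name edit_solrxml and the idea of passing the change as a sed expression are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: download solr.xml, apply one edit, upload the result.
# Assumes "solrctl cluster" supports --get-solrxml and --put-solrxml.
#   $1 = local path for the downloaded copy
#   $2 = sed expression describing the change (placeholder for your own edit)
edit_solrxml() {
  local workfile="$1" sed_expr="$2"

  # Download the current solr.xml from ZooKeeper
  solrctl cluster --get-solrxml "$workfile" || return 1

  # Keep a backup of the original, then write the edited copy
  cp "$workfile" "$workfile.bak" || return 1
  sed "$sed_expr" "$workfile.bak" > "$workfile" || return 1

  # Upload the modified file back to ZooKeeper
  solrctl cluster --put-solrxml "$workfile"
}
```

The backup copy (the work file name with .bak appended) stays on the local filesystem, so the previous solr.xml can be uploaded again if the change causes problems.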

Managing Configuration Using config or instancedir

The solrctl utility includes the config and instancedir commands for managing configuration. Configs and instance directories refer to the same thing: named configuration sets used by collections, as specified by the -c <configName> option of the solrctl collection --create command.

Although configs and instance directories are functionally identical from the perspective of the Solr server, there are a number of important administrative differences between these two implementations:

Config and instancedir Comparison

  • Security
    • instancedir: No ZooKeeper security support. Any user can create, delete, or modify an instancedir directly in ZooKeeper. Because instancedir updates ZooKeeper directly, it is the client's responsibility to add the proper ACLs, which can be cumbersome.
  • Creation method
    • Config: Generated from existing configs or instance directories in ZooKeeper using the ConfigSets API.
    • instancedir: Manually edited locally and re-uploaded directly to ZooKeeper using the solrctl utility.
  • Template support
    • Config: Several predefined templates are available. These can be used as the basis for creating additional configs. Additional templates can be created by creating configs that are immutable. Mutable configs that use a managed schema can only be modified using the Schema API as opposed to being manually edited. As a result, configs are less flexible, but they are also less error-prone than instance directories.
    • instancedir: One standard template.
  • Sentry support
    • Config: Configs include a number of templates, each with Sentry-enabled and non-Sentry-enabled versions. To enable Sentry, choose a Sentry-enabled template.
    • instancedir: Instance directories include a single template that supports enabling Sentry. To enable Sentry with instancedirs, overwrite the original solrconfig.xml file with solrconfig.xml.secure as described in Enabling Solr as a Client for the Sentry Service Using the Command Line.

Managing Configs

You can manage configuration objects directly using the solrctl config command, which is a wrapper script for the ConfigSets API.

Configs are named configuration sets that you can reference when creating collections. The solrctl config command syntax is as follows:

solrctl config [--create <name> <baseConfig> [-p <name>=<value>]...]
               [--delete <name>]
  • --create <name> <baseConfig>: Creates a new config based on an existing config. The config is created with the specified <name>, using <baseConfig> as the template. For more information about config templates, see Config Templates.
    • -p <name>=<value>: Overrides a <baseConfig> setting. The only config property that you can override is immutable, so the possible options are -p immutable=true and -p immutable=false. If you are copying an immutable config, such as a template, use -p immutable=false to make sure that you can edit the new config.
  • --delete <name>: Deletes the specified config. You cannot delete an immutable config without accessing ZooKeeper directly as the solr super user.

If you are using Apache Sentry, you must have permissions for the specific config you are creating or deleting, as well as the special admin collection.
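To show how a config is referenced when creating a collection, the following is a minimal sketch that copies a template into an editable config and then creates a collection that uses it. The function name create_collection_from_template and its argument order are hypothetical. The -s (number of shards) and -c (config name) options of solrctl collection --create are assumed to be available in your solrctl version.

```shell
# Hypothetical helper: derive an editable config from a template, then
# create a collection that uses the new config.
#   $1 = collection name   $2 = new config name
#   $3 = base config/template   $4 = number of shards (default 1)
create_collection_from_template() {
  local collection="$1" config="$2" template="$3" shards="${4:-1}"

  # immutable=false so that the copy can be modified later
  solrctl config --create "$config" "$template" -p immutable=false || return 1

  # Reference the new config by name when creating the collection
  solrctl collection --create "$collection" -s "$shards" -c "$config"
}
```

For example, create_collection_from_template logs logsConfig managedTemplate 2 would create a config named logsConfig from managedTemplate and a two-shard collection named logs that uses it. If the config step fails, the collection is not created.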

Managing Instance Directories

An instance directory is a named set of configuration files. You can generate an instance directory template locally, edit the configuration, and then upload the directory to ZooKeeper as a named configuration set. You can then reference this named configuration set when creating a collection.

Creating configuration sets using instance directories cannot be restricted using Sentry. If you want to control access to configuration sets, you must enable ZooKeeper ACLs and use configs instead.

The solrctl instancedir command syntax is as follows:

solrctl instancedir [--generate <path> [-schemaless]]
                    [--create <name> <path>]
                    [--update <name> <path>]
                    [--get <name> <path>]
                    [--delete <name>]
                    [--list]
  • --generate <path>: Generates an instance directory template on the local filesystem at <path>. The configuration files are located in the conf subdirectory under <path>.
  • --create <name> <path>: Uploads a copy of the instance directory from <path> on the local filesystem to ZooKeeper. If an instance directory with the specified <name> already exists, this command fails. Use --update to modify existing instance directories.
  • --update <name> <path>: Overwrites an existing instance directory in ZooKeeper using the specified files on the local filesystem. This command is analogous to first running --delete <name> followed by --create <name> <path>.
  • --get <name> <path>: Downloads the specified instance directory from ZooKeeper to the specified path on the local filesystem. You can then edit the configuration and re-upload it using --update.
  • --delete <name>: Deletes the specified instance directory from ZooKeeper.
  • --list: Lists existing instance directories as well as configs created by the solrctl config command.
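Because --create fails when the name already exists and --update is meant for existing instance directories, a script has to choose between them. The following is a minimal sketch of that choice. The function name publish_instancedir is hypothetical, and the sketch assumes that solrctl instancedir --list prints one name per line, which you should verify against your own output.

```shell
# Hypothetical helper: upload a local instance directory to ZooKeeper,
# using --create for a new name and --update for an existing one.
# Assumes "solrctl instancedir --list" prints one name per line.
#   $1 = instance directory name   $2 = local path
publish_instancedir() {
  local name="$1" path="$2"

  # The configuration files are expected in the conf subdirectory
  if [ ! -d "$path/conf" ]; then
    echo "no conf subdirectory under $path" >&2
    return 1
  fi

  # Match the whole line exactly so that "logs" does not match "logs2"
  if solrctl instancedir --list | grep -qxF "$name"; then
    solrctl instancedir --update "$name" "$path"
  else
    solrctl instancedir --create "$name" "$path"
  fi
}
```

A typical sequence would be to run solrctl instancedir --generate $HOME/myconfig, edit the files under $HOME/myconfig/conf, and then call publish_instancedir myconfig $HOME/myconfig.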

Securing Configs with ZooKeeper ACLs and Sentry

You can restrict access to configuration sets by setting ZooKeeper ACLs on all znodes under and including /solr and using Sentry to control access to the ConfigSets API. Sentry requires Kerberos authentication.

The solrctl instancedir command interacts directly with ZooKeeper, and therefore cannot be protected by Sentry. Because the solrctl config command is a wrapper script for the ConfigSets API, it can be protected by Sentry.

To force users to use the ConfigSets API, you must set all ZooKeeper znodes under and including /solr to read-only for every user except the solr user:

  1. Create a jaas.conf file containing the following:
    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=false
      useTicketCache=true
      principal="solr@EXAMPLE.COM";
    };
    

    Replace EXAMPLE.COM with your Kerberos realm name.

  2. Set the LOG4J_PROPS environment variable to a log4j.properties file:
    export LOG4J_PROPS=/etc/zookeeper/conf/log4j.properties
  3. Set the ZKCLI_JVM_FLAGS environment variable:
    export ZKCLI_JVM_FLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf \
    -DzkACLProvider=org.apache.solr.common.cloud.ConfigAwareSaslZkACLProvider \
    -Droot.logger=INFO,console"
  4. Authenticate as the solr user:
    kinit solr@EXAMPLE.COM

    Replace EXAMPLE.COM with your Kerberos realm name.

  5. Run the zkcli.sh script as follows:
    • Cloudera Manager:
      /opt/cloudera/parcels/CDH/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr
    • Unmanaged:
      /usr/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr

    Replace zk01.example.com with the hostname of a ZooKeeper server.

After completing these steps, you cannot run commands such as solrctl instancedir --create or solrctl instancedir --delete without first authenticating as the solr@EXAMPLE.COM super user principal. Unauthenticated users can still run solrctl instancedir --list and solrctl instancedir --get, because those commands only perform read operations against ZooKeeper.
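One way to confirm that the ACLs were applied is to read them back from ZooKeeper. The following is a minimal sketch, assuming that the zookeeper-client wrapper script is on your PATH and accepts the standard ZooKeeper CLI -server option and getAcl command. The function name show_znode_acl is hypothetical, and the exact output format depends on your ZooKeeper version.

```shell
# Hypothetical helper: print the ACL of a znode so that you can check it
# after running updateacls.
# Assumes the zookeeper-client wrapper is installed and on PATH.
#   $1 = ZooKeeper host:port   $2 = znode (default /solr)
show_znode_acl() {
  local zkhost="$1" znode="${2:-/solr}"

  if ! command -v zookeeper-client >/dev/null 2>&1; then
    echo "zookeeper-client not found on PATH" >&2
    return 1
  fi

  # getAcl is a read operation against the znode
  zookeeper-client -server "$zkhost" getAcl "$znode"
}
```

For example, show_znode_acl zk01.example.com:2181 prints the ACL entries for /solr. Based on the steps above, you would expect full permissions for the solr user and read-only access for everyone else.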

After setting ZooKeeper ACLs, you must configure Sentry to allow users to create and delete configs. For instructions on configuring Sentry for configs, see Configuring Sentry Authorization for Cloudera Search.

Config Templates

Configs can be declared immutable, which means they cannot be deleted and their schema cannot be updated through the Schema API. Immutable configs serve as uneditable templates on which additional configs are based. After a config is made immutable, you cannot change it back without accessing ZooKeeper directly as the solr super user (or as the solr@EXAMPLE.COM principal, if you are using Kerberos).

Solr provides a set of immutable config templates. These templates are available only after Solr initialization, so after an upgrade they do not appear until Solr is initialized or re-initialized. The templates are:

Available Config Templates and Attributes

Template Name             Supports Schema API  Uses Schemaless Solr  Supports Sentry
predefinedTemplate        No                   No                    No
managedTemplate           Yes                  No                    No
schemalessTemplate        Yes                  Yes                   No
predefinedTemplateSecure  No                   No                    Yes
managedTemplateSecure     Yes                  No                    Yes
schemalessTemplateSecure  Yes                  Yes                   Yes

Config templates are managed using the solrctl config command. For example:

  • To create a new config based on the predefinedTemplateSecure template:
    solrctl config --create newConfig predefinedTemplateSecure -p immutable=false
  • To create a new template (immutable config) from an existing config:
    solrctl config --create newTemplate existingConfig -p immutable=true
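The six template names follow a regular pattern: a schema mode (predefined, managed, or schemaless) followed by Template, with a Secure suffix for the Sentry-enabled versions. The following is a minimal sketch that builds the name from those two choices. The function name config_template_name and its argument values are hypothetical conveniences, not part of solrctl.

```shell
# Hypothetical helper: map the desired features to one of the six
# predefined config template names.
#   $1 = schema mode: predefined | managed | schemaless
#   $2 = Sentry support: yes | no
config_template_name() {
  local mode="$1" sentry="$2"

  # Reject anything that is not one of the three known modes
  case "$mode" in
    predefined|managed|schemaless) ;;
    *) echo "unknown schema mode: $mode" >&2; return 1 ;;
  esac

  # Sentry-enabled templates carry the Secure suffix
  if [ "$sentry" = "yes" ]; then
    echo "${mode}TemplateSecure"
  else
    echo "${mode}Template"
  fi
}
```

The result can be passed directly as the base config, for example solrctl config --create newConfig "$(config_template_name managed yes)" -p immutable=false.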

Updating the Schema in a Solr Collection

If your collection was configured using an instance directory, you can download the instance directory, edit schema.xml, then re-upload it to ZooKeeper. For instructions, see Managing Instance Directories.

If your collection was configured using a config, you can update the schema using the Schema API. For information on using the Schema API, see the Schema API section of Apache Solr Reference Guide 4.10 (PDF).
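As an illustration of a Schema API update, the following is a minimal sketch that adds one field by posting a JSON array to the schema/fields endpoint of a collection, which is the request style used by the Solr 4.x Schema API. The function name add_schema_field, the stored=true setting, and the example host, collection, field, and type names are assumptions for illustration only. Consult the Schema API section of the reference guide for the requests that your Solr version supports.

```shell
# Hypothetical helper: add one field to a managed schema through the
# Schema API (Solr 4.x style: POST a JSON array to /schema/fields).
#   $1 = Solr base URL, for example http://search01.example.com:8983/solr
#   $2 = collection name   $3 = field name   $4 = field type
add_schema_field() {
  local base_url="$1" collection="$2" field="$3" type="$4"

  curl --silent --show-error \
       -X POST -H 'Content-type: application/json' \
       --data-binary "[{\"name\":\"$field\",\"type\":\"$type\",\"stored\":true}]" \
       "$base_url/$collection/schema/fields"
}
```

For example, add_schema_field http://search01.example.com:8983/solr logs request_time tlong would request a new tlong field named request_time in the logs collection. This only works for a mutable config that uses a managed schema. On a Kerberos-enabled cluster, curl typically also needs the --negotiate -u : options after you run kinit.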