Managing Cloudera Search Configuration
Cloudera Search configuration is primarily controlled by several configuration files, some of which are stored in Apache ZooKeeper:
- solr.xml
This file is stored in ZooKeeper, and controls global properties for Apache Solr. To edit this file, you must download it from ZooKeeper, make your changes, and then upload the modified file back to ZooKeeper using the solrctl cluster command. For information about the solr.xml file, see Solr Configuration Files and Solr Cores and solr.xml in the Solr documentation.
- solrconfig.xml
Each collection in Solr uses a solrconfig.xml file, stored in ZooKeeper, to control collection behavior. For information about the solrconfig.xml file, see Solr Configuration Files and Configuring solrconfig.xml in the Solr documentation.
- managed-schema or schema.xml
In CDH 6, Cloudera recommends using a managed schema, and making schema changes using the Schema API (Apache Solr documentation). Collections in CDH 6 use either a managed schema or the legacy schema.xml file. These files, also stored in ZooKeeper and assigned to a collection, define the schema for the documents you are indexing. For example, they specify which fields to index, the expected data type for each field, the default field to query when the field is unspecified, and so on. For information about managed-schema and schema.xml, see Schema Factory Definition in SolrConfig in the Solr documentation.
- core.properties
Unlike other configuration files, this file is stored in the local filesystem rather than ZooKeeper, and is used for core discovery. For more information on this process and the structure of the file, see Defining core.properties in the Solr documentation.
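The download, edit, and upload cycle described for solr.xml can be sketched as a small script. This is a minimal illustration, not a definitive procedure: it assumes the --get-solrxml and --put-solrxml options of the solrctl cluster command, uses /tmp/solr.xml as a placeholder working copy, and wraps each command in a hypothetical run helper that only prints the command unless DRY_RUN=0 is set.

```shell
set -eu

DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only (default), 0 = execute them
SOLR_XML="${SOLR_XML:-/tmp/solr.xml}"    # placeholder path for the local working copy

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

update_solr_xml() {
  # 1. Download the current solr.xml from ZooKeeper.
  run solrctl cluster --get-solrxml "$SOLR_XML"
  # 2. Edit the local copy (skipped in dry-run mode).
  if [ "$DRY_RUN" != "1" ]; then "${EDITOR:-vi}" "$SOLR_XML"; fi
  # 3. Upload the modified file back to ZooKeeper.
  run solrctl cluster --put-solrxml "$SOLR_XML"
}

update_solr_xml
```

Running the script with DRY_RUN=0 executes the commands for real, so it is worth reviewing the printed commands first.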
Managing Configuration Using Configs or Instance Directories
The solrctl utility includes the config and instancedir commands for managing configuration. Configs and instance directories refer to the same thing: named configuration sets used by collections, as specified by the solrctl collection --create <collectionName> -c <configName> command.
Although configs and instance directories are functionally identical from the perspective of the Solr server, there are a number of important administrative differences between these two implementations:
Attribute | Config | Instance Directory |
---|---|---|
Security | | |
Creation method | Generated from existing configs or instance directories in ZooKeeper using the ConfigSets API. | Manually edited locally and re-uploaded directly to ZooKeeper using solrctl utility. |
Template support | | One standard template. |
Sentry support | Configs include a number of templates, each with Sentry-enabled and non-Sentry-enabled versions. To enable Sentry, choose a Sentry-enabled template. | Instance directories include a single template that supports enabling Sentry. To enable Sentry with instancedirs, overwrite the original solrconfig.xml file with solrconfig.xml.secure as described in Enabling Sentry for a Solr Collection. |
Managing Configs
You can manage configuration objects directly using the solrctl config command, which is a wrapper script for the ConfigSets API.
Configs are named configuration sets that you can reference when creating collections. The solrctl config command syntax is as follows:
solrctl config [--create <name> <baseConfig> [-p <name>=<value>]...] [--delete <name>]
- --create <name> <baseConfig>: Creates a new config based on an existing config. The config is created with the specified <name>, using <baseConfig> as the template. For more information about config templates, see Config Templates.
- -p <name>=<value>: Overrides a <baseConfig> setting. The only config property that you can override is immutable, so the possible options are -p immutable=true and -p immutable=false. If you are copying an immutable config, such as a template, use -p immutable=false to make sure that you can edit the new config.
- --delete <name>: Deletes the specified config. You cannot delete an immutable config without accessing ZooKeeper directly as the solr super user.
If you are using Apache Sentry, you must have permissions for the specific config you are creating or deleting, as well as the admin=collections privilege object.
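As an illustration of how a config ties into a collection, the following sketch copies a template into an editable config and then creates a collection that references it. The names logs_config and logs and the shard count of 2 are hypothetical, the -s option of solrctl collection --create is assumed to set the number of shards, and the run helper is a hypothetical dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

create_collection_from_template() {
  config_name="$1"; template="$2"; collection="$3"; shards="$4"
  # Copy the immutable template into a new, editable config.
  run solrctl config --create "$config_name" "$template" -p immutable=false
  # Create a collection that uses the new config.
  run solrctl collection --create "$collection" -s "$shards" -c "$config_name"
}

# Hypothetical names; replace them with your own.
create_collection_from_template logs_config managedTemplate logs 2
```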
Managing Instance Directories
An instance directory is a named set of configuration files. You can generate an instance directory template locally, edit the configuration, and then upload the directory to ZooKeeper as a named configuration set. You can then reference this named configuration set when creating a collection.
Creating configuration sets using instance directories cannot be restricted using Sentry. If you want to control access to configuration sets, you must enable ZooKeeper ACLs and use configs instead.
The solrctl instancedir command syntax is as follows:
solrctl instancedir [--generate <path> [-schemaless]] [--create <name> <path>] [--update <name> <path>] [--get <name> <path>] [--delete <name>] [--list]
- --generate <path>: Generates an instance directory template on the local filesystem at <path>. The configuration files are located in the conf subdirectory under <path>.
- -schemaless: Generates a schemaless instance directory template. For more information on schemaless support, see Schemaless Mode Overview and Best Practices.
- --create <name> <path>: Uploads a copy of the instance directory from <path> on the local filesystem to ZooKeeper. If an instance directory with the specified <name> already exists, this command fails. Use --update to modify existing instance directories.
- --update <name> <path>: Overwrites an existing instance directory in ZooKeeper using the specified files on the local filesystem. This command is analogous to first running --delete <name> followed by --create <name> <path>.
- --get <name> <path>: Downloads the specified instance directory from ZooKeeper to the specified path on the local filesystem. You can then edit the configuration and re-upload it using --update.
- --delete <name>: Deletes the specified instance directory from ZooKeeper.
- --list: Lists existing instance directories as well as configs created by the solrctl config command.
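The options above combine into two typical workflows: publishing a new instance directory, and revising one that already exists in ZooKeeper. The following is a rough sketch of both. The name books_config and the paths under /tmp are hypothetical, and the run helper is a hypothetical dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# First-time setup: generate a template locally, then upload it under a name.
publish_instancedir() {
  name="$1"; path="$2"
  run solrctl instancedir --generate "$path"
  # Edit the files under "$path/conf" here before uploading.
  run solrctl instancedir --create "$name" "$path"
}

# Later change: download the current copy, edit it, and overwrite the original.
revise_instancedir() {
  name="$1"; path="$2"
  run solrctl instancedir --get "$name" "$path"
  # Edit the files under "$path/conf" here before re-uploading.
  run solrctl instancedir --update "$name" "$path"
  # Confirm that the instance directory is still listed.
  run solrctl instancedir --list
}

# Hypothetical name and paths; replace them with your own.
publish_instancedir books_config /tmp/books_config
revise_instancedir books_config /tmp/books_config_edit
```

Collections that already use the instance directory may need to be reloaded (for example, with solrctl collection --reload) before they pick up the updated files. That step is an assumption not covered in the option list above.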
Securing Configs with ZooKeeper ACLs and Sentry
You can restrict access to configuration sets by setting ZooKeeper ACLs on all znodes under and including /solr and using Sentry to control access to the ConfigSets API. Sentry requires Kerberos authentication.
The solrctl instancedir command interacts directly with ZooKeeper, and therefore cannot be protected by Sentry. Because the solrctl config command is a wrapper script for the ConfigSets API, it can be protected by Sentry.
To force users to use the ConfigSets API, you must set all ZooKeeper znodes under and including /solr to read-only for every user except the solr user:
- Create a jaas.conf file containing the following:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  principal="solr@EXAMPLE.COM";
};
Replace EXAMPLE.COM with your Kerberos realm name.
- Set the LOG4J_PROPS environment variable to a log4j.properties file:
export LOG4J_PROPS=/etc/zookeeper/conf/log4j.properties
- Set the ZKCLI_JVM_FLAGS environment variable:
export ZKCLI_JVM_FLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf \
-DzkACLProvider=org.apache.solr.common.cloud.SaslZkACLProvider \
-Droot.logger=INFO,console"
- Authenticate as the solr user:
kinit solr@EXAMPLE.COM
Replace EXAMPLE.COM with your Kerberos realm name.
- Run the zkcli.sh script as follows:
- Cloudera Manager Deployment:
/opt/cloudera/parcels/CDH/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr
- Unmanaged Deployment:
/usr/lib/solr/bin/zkcli.sh -zkhost zk01.example.com:2181 -cmd updateacls /solr
Replace zk01.example.com with the hostname of a ZooKeeper server.
After completing these steps, you cannot run commands such as solrctl instancedir --create or solrctl instancedir --delete without first authenticating as the solr@EXAMPLE.COM super user principal. Unauthenticated users can still run solrctl instancedir --list and solrctl instancedir --get, because those commands only perform read operations against ZooKeeper.
After setting ZooKeeper ACLs, you must configure Sentry to allow users to create and delete configs. For instructions on configuring Sentry for configs, see Configuring Sentry Authorization for Cloudera Search.
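For convenience, the ZooKeeper ACL steps above can be collected into one script. The sketch below is only an outline under stated assumptions: the default values for REALM, ZK_HOST, and ZKCLI mirror the examples in the steps, /tmp/solr-jaas.conf is a placeholder location, and the run helper is a hypothetical dry-run wrapper that prints the kinit and zkcli.sh commands unless DRY_RUN=0 is set. Note that the jaas.conf file is written even in dry-run mode.

```shell
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only (default), 0 = execute them
REALM="${REALM:-EXAMPLE.COM}"
ZK_HOST="${ZK_HOST:-zk01.example.com:2181}"
# Cloudera Manager path; unmanaged deployments use /usr/lib/solr/bin/zkcli.sh.
ZKCLI="${ZKCLI:-/opt/cloudera/parcels/CDH/lib/solr/bin/zkcli.sh}"
JAAS_CONF="${JAAS_CONF:-/tmp/solr-jaas.conf}"   # placeholder location

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Write the JAAS client configuration for the solr principal in the given realm.
write_jaas_conf() {
  cat > "$1" <<EOF
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  principal="solr@$2";
};
EOF
}

write_jaas_conf "$JAAS_CONF" "$REALM"
export LOG4J_PROPS=/etc/zookeeper/conf/log4j.properties
export ZKCLI_JVM_FLAGS="-Djava.security.auth.login.config=$JAAS_CONF \
-DzkACLProvider=org.apache.solr.common.cloud.SaslZkACLProvider \
-Droot.logger=INFO,console"

# Authenticate as the solr user, then apply the ACLs to /solr and its children.
run kinit "solr@$REALM"
run "$ZKCLI" -zkhost "$ZK_HOST" -cmd updateacls /solr
```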
Config Templates
Configs can be declared as immutable, which means they cannot be deleted or have their schema updated by the Schema API. Immutable configs serve as read-only templates that form the basis for additional configs. After a config is made immutable, you cannot change it back without accessing ZooKeeper directly as the solr super user (or as the solr@EXAMPLE.COM principal, if you are using Kerberos).
Solr provides a set of immutable config templates. These templates are only available after Solr initialization, so templates are not available in upgrades until after Solr is initialized or re-initialized. Templates include:
Template Name | Supports Schema API | Uses Schemaless Solr | Supports Sentry |
---|---|---|---|
managedTemplate | Yes | No | No |
schemalessTemplate | Yes | Yes | No |
managedTemplateSecure | Yes | No | Yes |
schemalessTemplateSecure | Yes | Yes | Yes |
Config templates are managed using the solrctl config command. For example:
- To create a new config based on the managedTemplateSecure template:
solrctl config --create newConfig managedTemplateSecure -p immutable=false
- To create a new template (immutable config) from an existing config:
solrctl config --create newTemplate existingConfig -p immutable=true
Updating the Schema in a Solr Collection
If your collection was configured using an instance directory, you can download the instance directory, edit schema.xml, then re-upload it to ZooKeeper. For instructions, see Managing Instance Directories.
If your collection was configured using a config, you can update the schema using the Schema API. For information on using the Schema API, see Schema API in the Apache Solr Reference Guide.
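As a rough illustration of a Schema API update, the sketch below builds an add-field request and posts it to a collection's schema endpoint. The host solr01.example.com, port 8983, the collection name logs, and the field name category are all hypothetical, and the request assumes that the collection's schema already defines a string field type. The --negotiate -u : options are included on the assumption that the cluster uses Kerberos. By default the script only prints the request; set DRY_RUN=0 to send it.

```shell
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the request only (default), 0 = send it
SOLR_URL="${SOLR_URL:-http://solr01.example.com:8983/solr}"   # hypothetical Solr host
COLLECTION="${COLLECTION:-logs}"                              # hypothetical collection

# Build the JSON body for the Schema API add-field command.
add_field_payload() {
  printf '{"add-field":{"name":"%s","type":"%s","stored":true}}' "$1" "$2"
}

# Add a field to the collection's managed schema.
add_field() {
  payload="$(add_field_payload "$1" "$2")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "POST $SOLR_URL/$COLLECTION/schema $payload"
  else
    # --negotiate -u : makes curl use the current Kerberos ticket (SPNEGO).
    curl --negotiate -u : -X POST -H 'Content-type: application/json' \
      --data-binary "$payload" "$SOLR_URL/$COLLECTION/schema"
  fi
}

# Hypothetical field; "string" must exist as a field type in the schema.
add_field category string
```

Because immutable configs cannot have their schema updated through the Schema API, a request like this only succeeds for collections whose config was created with immutable=false.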