Migrating Cloudera Search Configuration Before Upgrading to CDH 6

Because Cloudera Search is included in CDH, upgrading CDH upgrades Cloudera Search. If you are upgrading to CDH 6 from CDH 5, and you are using Cloudera Search, you must complete some preparatory work.

Cloudera Search in CDH 6 uses Apache Solr 7, which has some incompatibilities with previous Solr versions. To facilitate the upgrade, Cloudera provides a Solr configuration migration script, solr-upgrade.sh. This script is included with Cloudera Manager 6 and CDH 6. You must upgrade to Cloudera Manager 6 before you can complete these procedures.

Before You Begin

Before upgrading:

  • Make sure you are running Cloudera Manager 6.0 or higher. For instructions on upgrading to Cloudera Manager 6, see Upgrading Cloudera Manager.
  • If you are using Apache Sentry with policy files, complete the steps in Migrating from Sentry Policy Files to the Sentry Service.
  • Stop making changes to your Cloudera Search environment. Make sure that no configuration changes are made to Cloudera Search for the duration of the migration and upgrade. This includes adding or removing Solr Server hosts, moving Solr Server roles between hosts, changing hostnames, and so on.
  • Plan for an outage for any applications or services that use your Search deployment until you have completed re-indexing after the upgrade.
  • If you are using Kerberos, create a jaas.conf file for the Search service superuser (solr by default). For instructions on creating a jaas.conf file, see Configuring a jaas.conf File.
  • If you are using the Lily HBase Indexer service, stop writing to any tables that are indexed into Solr, and stop the Lily HBase Indexer service (Key-Value Store Indexer service > Actions > Stop).
  • If you are using the Lily HBase Indexer service with Apache Sentry policy files:
    • If you are upgrading from a CDH version lower than 5.14.0, disable Sentry for the Lily HBase Indexer service before upgrading. To do so, uncheck the box labeled Enable Sentry Authorization using Policy Files (Key-Value Store Indexer service > Configuration > Category > Policy File Based Sentry).
    • If you are upgrading from CDH 5.14.0 or higher, migrate to the Sentry Service before upgrading. CDH versions lower than 5.14.0 do not support using the Sentry Service with the HBase Indexer service. For instructions, see Migrating HBase Indexer Sentry Policy Files to the Sentry Service.
  • Do not create, delete, or modify any collections for the duration of the migration and upgrade.

You can continue indexing to existing collections (except Lily HBase indexing) until otherwise instructed.

Solr Configuration Migration Script Overview

Despite widespread enterprise adoption, Solr lacks automated upgrade tooling. It has long been a challenge to understand the implications of a Solr upgrade. Solr admins were required to review the Solr release notes and manually identify configuration changes needed to address incompatibilities or to take advantage of new features. Additionally, admins had to determine whether they could upgrade existing indexes, or if they had to re-index the raw data.

Starting in Cloudera Enterprise 6, Cloudera provides a Solr configuration migration script to simplify the upgrade process by providing upgrade instructions tailored to your configuration. These instructions can help you answer the following questions:

  • Does my Solr configuration use any configurations that are incompatible with the new version? If so, which ones?
  • For each incompatibility, what do I need to do to address it? Where can I get more information about this incompatibility, and why it was introduced?
  • Are there any changes in Lucene or Solr that require me to do a full re-index, or is it sufficient to upgrade the index? For all upgrades to CDH 6 from CDH 5, re-indexing is required.

This tool is built on an Extensible Stylesheet Language Transformations (XSLT) engine. The upgrade rules, implemented as XSLT transformations, can identify incompatibilities and, in some cases, fix them automatically.

In general, incompatibilities are categorized as follows:

  • ERROR: The removal of a Lucene or Solr configuration element (such as a field type) is marked as ERROR in the validation output. These types of incompatibilities typically result in failure to start the Solr service or load the core. To address this, you must manually fix the Solr configuration.
  • WARNING: The deprecation of a configuration element in the new Solr version is marked as WARNING in the validation output. In general, these types of incompatibilities do not prevent starting the Solr service or loading cores, but may prevent applications from using new Lucene or Solr features (or bug fixes). You can choose to fix these incompatibilities by changing the Solr configuration, based on your knowledge of the application.
  • INFO: Incompatibilities that can be fixed automatically by rewriting the Solr configuration are marked INFO in the validation output. This can include incompatibilities in the underlying Lucene implementation (for example, LUCENE-6058) that would require rebuilding the index instead of upgrading it. Typically, these incompatibilities do not result in failure to start the Solr service or load cores, but may affect query result accuracy or consistency of the underlying indexed data.

Running the Solr Configuration Migration Script

The Solr configuration migration script, solr-upgrade.sh, is included with CDH 6 and Cloudera Manager 6 agent software. This enables you to run the script after upgrading to Cloudera Manager 6, but before upgrading to CDH 6.

Make sure to run the script on a host that is assigned a Solr Server or Solr service Gateway role. Confirm that the SOLR_ZK_ENSEMBLE environment variable is set in /etc/solr/conf/solr-env.sh:

cat /etc/solr/conf/solr-env.sh
export SOLR_ZK_ENSEMBLE=zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181/solr
export SENTRY_CONF_DIR=/etc/solr/conf.cloudera.SOLR-1/sentry-conf
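The check above can also be scripted. The following is a minimal sketch, not part of solr-upgrade.sh: the function name check_zk_ensemble is hypothetical, and it assumes that sourcing solr-env.sh in a subshell is safe on your host.

```shell
# Hypothetical helper: report whether an environment file sets SOLR_ZK_ENSEMBLE.
check_zk_ensemble() {
  local env_file="${1:-/etc/solr/conf/solr-env.sh}"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 2
  fi
  # Source the file in a subshell so the current shell's environment is untouched.
  (
    unset SOLR_ZK_ENSEMBLE
    . "$env_file" >/dev/null 2>&1
    if [ -n "${SOLR_ZK_ENSEMBLE:-}" ]; then
      echo "SOLR_ZK_ENSEMBLE=$SOLR_ZK_ENSEMBLE"
    else
      echo "SOLR_ZK_ENSEMBLE is not set in $env_file" >&2
      exit 1
    fi
  )
}
```

Calling check_zk_ensemble with no argument reads /etc/solr/conf/solr-env.sh. A non-zero return status means the variable is missing or the file is unreadable.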

The script is located at:

  • Cloudera Manager 6 Agent: /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh
  • CDH 6 Parcels: /opt/cloudera/parcels/CDH/lib/solr/solr-upgrade/solr-upgrade.sh
  • CDH 6 Packages: /usr/lib/solr/solr-upgrade/solr-upgrade.sh
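
If you are not sure which of these locations applies to a host, a small helper can print the first one that exists. This is a sketch only: find_solr_upgrade_script is a hypothetical name, and the default paths are the three listed above.

```shell
# Hypothetical helper: print the first solr-upgrade.sh found among the
# documented locations, or among the paths passed as arguments.
find_solr_upgrade_script() {
  if [ "$#" -eq 0 ]; then
    set -- \
      /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh \
      /opt/cloudera/parcels/CDH/lib/solr/solr-upgrade/solr-upgrade.sh \
      /usr/lib/solr/solr-upgrade/solr-upgrade.sh
  fi
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "solr-upgrade.sh not found" >&2
  return 1
}
```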

When running the script included with Cloudera Manager 6 Agent, you must specify the location of the CDH 5 Solr binaries using the CDH_SOLR_HOME environment variable. If you are using parcels, Solr binaries are located at /opt/cloudera/parcels/CDH/lib/solr. For package installations, the location is /usr/lib/solr.

For example:

export CDH_SOLR_HOME=/opt/cloudera/parcels/CDH/lib/solr
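
As an alternative to hard-coding the path, the following sketch selects the parcel location when it exists and falls back to the package location. The function name detect_cdh_solr_home is hypothetical, and the sketch assumes that only one installation method is in use on the host.

```shell
# Hypothetical helper: print the Solr home for a parcel or package install.
# The default paths are the two locations named in the text above.
detect_cdh_solr_home() {
  local parcel_home="${1:-/opt/cloudera/parcels/CDH/lib/solr}"
  local package_home="${2:-/usr/lib/solr}"
  if [ -d "$parcel_home" ]; then
    echo "$parcel_home"
  elif [ -d "$package_home" ]; then
    echo "$package_home"
  else
    echo "no Solr installation found" >&2
    return 1
  fi
}

# Example (uncomment to use):
# CDH_SOLR_HOME="$(detect_cdh_solr_home)" && export CDH_SOLR_HOME
```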

For your reference, the solr-upgrade.sh command syntax is as follows. The appropriate command arguments are provided in later steps.

./solr-upgrade.sh help

Usage: ./solr-upgrade.sh command [command-arg]
Options:
    --zk   zk_ensemble
    --jaas jaas.conf
    --debug Prints error output of calls
    --trace Prints executed commands
Commands:
  help
  download-metadata -d dest_dir
  validate-metadata -c metadata_dir
  bootstrap-config -c metadata_dir
  config-upgrade [--dry-run] -c conf_path -t conf_type -u upgrade_processor_conf -d result_dir [-v]
  bootstrap-collections -c metadata_folder_path -d local_work_dir -h hdfs_work_dir
Parameters:
  -c <arg>     This parameter specifies the path of Solr configuration to be operated upon.
  -t <arg>     This parameter specifies the type of Solr configuration to be validated and
               transformed.The tool currently supports schema.xml, solrconfig.xml and solr.xml
  -d <arg>     This parameter specifies the directory path where the result of the command
               should be stored.
  -h <arg>     This parameter specifies the HDFS directory path where the result of the command
               should be stored on HDFS. Eg. /solr-backup
  -u <arg>     This parameter specifies the path of the Solr upgrade processor configuration.
  --dry-run    This command will perform compatibility checks for the specified Solr configuration.
  -v           This parameter enables printing XSLT compiler warnings on the command output.

Migrating Cloudera Search Configuration for Compatibility with CDH 6

Use the following procedures to migrate your Cloudera Search configuration and upgrade to CDH 6. The provided migration script cannot upgrade the Lucene index files. After upgrading, you must re-index your collections. For more information, see Reindexing in Solr in the Apache Solr wiki. The upgrade process is as follows:

Set Configuration Properties in Cloudera Manager

As part of the CDH 6 upgrade, Cloudera Manager backs up and validates your migrated Solr configuration to ensure that it is compatible with CDH 6. For the upgrade to succeed, you must designate one of your Solr Server hosts to perform these actions, and specify HDFS and local directories for the backup and migrated configuration files, respectively:

  1. In the Cloudera Manager Admin Console, go to Solr service > Configuration.
  2. In the Search field, type upgrade to filter the configuration parameters.
  3. For the Solr Server for Upgrade property, select one of your Solr Server hosts. When you run the migration script, make sure to run it on this host.
  4. For the Upgrade Backup Directory property, specify a directory in HDFS. This directory must exist and be readable and writable by the Solr service superuser (solr by default). Examples in these procedures use /cdh6-solr-upgrade/backup for this HDFS directory.
  5. For the Upgrade Metadata Directory property, specify a directory on the local filesystem of the host you selected as the Solr Server for Upgrade. When you run the migration script, make sure that you copy the migrated configuration to this directory. This directory must also be readable by the Solr service superuser. Examples in these procedures use /cdh6-solr-metadata/migrated-config for this local directory.
  6. Enter a Reason for change, and then click Save Changes to commit the changes.
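
Because the Upgrade Backup Directory must already exist in HDFS, you might create it before continuing. The sketch below is illustrative: prepare_backup_dir is a hypothetical name, and it assumes that the HDFS superuser is hdfs and that sudo -u hdfs works on your cluster (on a Kerberos-enabled cluster you would authenticate as an HDFS superuser instead). By default it only prints the commands so that you can review them.

```shell
# Hypothetical helper: create the HDFS backup directory and give it to the
# Solr superuser. Commands are printed by default; set RUN= (empty) to run them.
# Assumption: the HDFS superuser account is "hdfs".
prepare_backup_dir() {
  local dir="${1:-/cdh6-solr-upgrade/backup}"
  local owner="${2:-solr}"
  local run="${RUN-echo}"
  $run sudo -u hdfs hdfs dfs -mkdir -p "$dir"
  $run sudo -u hdfs hdfs dfs -chown -R "$owner" "$dir"
}
```

For example, prepare_backup_dir prints the two commands for /cdh6-solr-upgrade/backup, and RUN= prepare_backup_dir executes them.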

Back Up Solr Configuration and Data

Before upgrading to CDH 6, back up your Solr collections using the following procedure. This allows you to roll back to the pre-upgrade state if any problems occur during the upgrade process.

  1. Make sure that the HDFS and ZooKeeper services are running.
  2. Stop the Solr service (Solr service > Actions > Stop). If you see a message about stopping dependent services, click Cancel and stop the dependent services first, and then stop the Solr service.
  3. Back up the Solr configuration metadata (Solr service > Actions > Backup Solr Configuration Meta-data for Upgrade). Make sure that the directory you specified for the Upgrade Backup Directory configuration property exists in HDFS and is writable by the Search superuser (solr by default).
  4. Start the Solr service (Solr service > Actions > Start).
  5. Start any dependent services that you stopped.

Migrate the Configuration

The migration tool supports migrating schema.xml (and managed-schema), solrconfig.xml, and solr.xml configuration files. Run these commands on the host you selected as the Solr Server for Upgrade in Set Configuration Properties in Cloudera Manager.

  1. Set the CDH_SOLR_HOME environment variable for your installation method:
    • Parcels:
      export CDH_SOLR_HOME=/opt/cloudera/parcels/CDH/lib/solr
    • Packages:
      export CDH_SOLR_HOME=/usr/lib/solr
  2. Set the JAVA_HOME environment variable to the JDK you are using. For example:
    export JAVA_HOME="/usr/java/jdk1.8.0_141-cloudera"
  3. Create a working directory for the migration:
    mkdir $HOME/cdh6-solr-migration
  4. If you have enabled Kerberos, run kinit with the Solr service superuser. For example:
    kinit solr@EXAMPLE.COM
  5. Download the current (CDH 5) Solr configuration from ZooKeeper:
    /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh download-metadata -d $HOME/cdh6-solr-migration

    If you have enabled Kerberos and configured ZooKeeper access control lists (ACLs), specify your JAAS configuration file by adding the --jaas parameter to the command. For example:

    /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh --jaas $HOME/solr-jaas.conf download-metadata -d $HOME/cdh6-solr-migration
  6. Initialize a directory for the migrated configuration as a copy of the current config:
    cp -r $HOME/cdh6-solr-migration $HOME/cdh6-migrated-solr-config
  7. Migrate the solr.xml file:
    1. Run the migration script:
      /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh config-upgrade -t solrxml -c $HOME/cdh6-solr-migration/solr.xml -u /opt/cloudera/cm/solr-upgrade/validators/solr_4_to_7_processors.xml -d /tmp

      If you have enabled Kerberos, specify your JAAS configuration file by appending --jaas /path/to/solr-jaas.conf to the command.

    2. If the script reports any incompatibilities, fix them in the working directory ($HOME/cdh6-solr-migration/solr.xml in this example) and then re-run the script. Each time you run the script, the files in the output directory (/tmp in this example) are overwritten. Repeat until the script outputs no incompatibilities and the solr.xml migration is successful. For example:
      Validating solrxml...
      No configuration errors found...
      No configuration warnings found...
      
      Following incompatibilities will be fixed by auto-transformations (using --upgrade command):
          * System property used to define SOLR server port has changed from solr.port to jetty.port
      
      Solr solrxml validation is successful. Please review /tmp/solrxml_validation.html for more details.
      
      Applying auto transformations...
      
      The upgraded configuration file is available at /tmp/solr.xml
  8. Copy the migrated solr.xml file to the migrated configuration directory:
    cp /tmp/solr.xml $HOME/cdh6-migrated-solr-config
  9. For each collection configuration set in $HOME/cdh6-solr-migration/configs/, migrate the configuration and schema. Each time you run the script, the output file is overwritten. Do not proceed to the next collection until the migration is successful and you have copied the migrated file to its final destination.
    1. Run the migration script for solrconfig.xml. For example, for a configuration set named tweets_config:
      /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh config-upgrade -t solrconfig -c $HOME/cdh6-solr-migration/configs/tweets_config/conf/solrconfig.xml -u /opt/cloudera/cm/solr-upgrade/validators/solr_4_to_7_processors.xml -d /tmp
    2. If the script reports any incompatibilities, fix them in the working directory ($HOME/cdh6-solr-migration/configs/tweets_config/conf/solrconfig.xml in this example) and then re-run the script. Repeat until the script outputs no incompatibilities and the solrconfig.xml migration is successful. You should see a message similar to the following:
      Solr solrconfig validation is successful. Please review /tmp/solrconfig_validation.html for more details.
      
      Applying auto transformations...
      
      The upgraded configuration file is available at /tmp/solrconfig.xml
    3. Copy the migrated solrconfig.xml file to the collection configuration directory in the migrated directory. For example:
      cp /tmp/solrconfig.xml $HOME/cdh6-migrated-solr-config/configs/tweets_config/conf/
    4. Run the migration script for schema.xml (or managed-schema). For example, for a configuration set named tweets_config:
      /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh config-upgrade -t schema -c $HOME/cdh6-solr-migration/configs/tweets_config/conf/schema.xml -u /opt/cloudera/cm/solr-upgrade/validators/solr_4_to_7_processors.xml -d /tmp
    5. If the script reports any incompatibilities, fix them in the working directory ($HOME/cdh6-solr-migration/configs/tweets_config/conf/schema.xml in this example) and then re-run the script. Repeat until the script outputs no incompatibilities and the schema.xml migration is successful.
    6. Copy the migrated schema.xml file to the collection configuration directory in the migrated directory. For example:
      cp /tmp/schema.xml $HOME/cdh6-migrated-solr-config/configs/tweets_config/conf/
    7. Repeat for all configuration sets in $HOME/cdh6-solr-migration/configs/.
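
When you have many configuration sets, it can help to list the commands from the steps above for each one before you start. The following sketch only prints the commands, because each run might require manual fixes before you continue. The function name print_config_upgrade_commands is hypothetical, and passing managed-schema to -c when schema.xml is absent is an assumption based on the step above.

```shell
# Hypothetical helper: print the config-upgrade commands for every
# configuration set under <migration_dir>/configs. Nothing is executed.
print_config_upgrade_commands() {
  local migration_dir="${1:-$HOME/cdh6-solr-migration}"
  local script=/opt/cloudera/cm/solr-upgrade/solr-upgrade.sh
  local rules=/opt/cloudera/cm/solr-upgrade/validators/solr_4_to_7_processors.xml
  local conf_dir schema_file
  for conf_dir in "$migration_dir"/configs/*/conf; do
    # Skip the unexpanded pattern when no configuration set matches.
    [ -d "$conf_dir" ] || continue
    schema_file="$conf_dir/schema.xml"
    # Assumption: a configuration set without schema.xml uses managed-schema.
    [ -f "$schema_file" ] || schema_file="$conf_dir/managed-schema"
    echo "$script config-upgrade -t solrconfig -c $conf_dir/solrconfig.xml -u $rules -d /tmp"
    echo "$script config-upgrade -t schema -c $schema_file -u $rules -d /tmp"
  done
}
```

Run each printed command in order, and copy the result from /tmp to the migrated directory before you run the next command, because the output files are overwritten on every run.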

Validate the Migrated Configuration

The solr-upgrade.sh script includes a validate-metadata command that you can run against the migrated Solr configuration and metadata to make sure that they can be used to re-initialize the Solr service after the upgrade. The script performs a series of checks to make sure that:

  • Required configuration files (such as solr.xml, clusterstate.json, and collection configuration sets) are present.
  • The configuration files are compatible with the Solr version being upgraded to (Solr 7, in this case).

For example:

/opt/cloudera/cm/solr-upgrade/solr-upgrade.sh validate-metadata -c $HOME/cdh6-migrated-solr-config

If you have enabled Kerberos, specify your JAAS configuration file by appending --jaas /path/to/solr-jaas.conf to the command.

If the validation is successful, the script outputs a message similar to the following:

Validation successful for metadata in /home/solruser/cdh6-migrated-solr-config

If the validation fails, you can revisit the steps in Migrate the Configuration.
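
If you save the validation output to a file, you can check it for the success message in a script. This is a sketch under assumptions: validation_succeeded is a hypothetical name, the log path in the example is arbitrary, and the match relies on the message wording shown above, which might differ in your version.

```shell
# Hypothetical helper: succeed only if saved validate-metadata output
# contains the success message shown in the example above.
validation_succeeded() {
  local log_file="$1"
  grep -q 'Validation successful for metadata in ' "$log_file"
}

# Example (run on the host selected as the Solr Server for Upgrade):
# /opt/cloudera/cm/solr-upgrade/solr-upgrade.sh validate-metadata \
#   -c "$HOME/cdh6-migrated-solr-config" 2>&1 | tee /tmp/validate-metadata.log
# validation_succeeded /tmp/validate-metadata.log || echo "validation failed"
```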

Test the Migrated Configuration on a CDH 6 Cluster

Although the configuration migration script addresses known incompatibilities, there might be incompatible configurations that are not detected by the script. Do not upgrade your production Cloudera Search environments to CDH 6 until you have tested the migrated configuration with CDH 6 on a testing or development cluster.

To test the migrated configuration on a CDH 6 cluster, you can either provision a new host in your upgraded Cloudera Manager 6 environment and add a CDH 6 cluster using that host, or you can continue to Upgrade to CDH 6 and upgrade a development or test cluster to CDH 6. You can then use the CDH 6 cluster to test the migrated configuration to make sure that it works on CDH 6 before upgrading your production environment.

Copy the Migrated Configuration to the Upgrade Metadata Directory

After you have successfully migrated the configuration, and verified that the updated configuration works in CDH 6, you must copy it to the directory you designated as the Upgrade Metadata Directory in Set Configuration Properties in Cloudera Manager (/cdh6-solr-metadata/migrated-config in this example), and change the ownership to the Solr service superuser. For example:

sudo mkdir -p /cdh6-solr-metadata/migrated-config
sudo cp -r $HOME/cdh6-migrated-solr-config/* /cdh6-solr-metadata/migrated-config
sudo chown -R solr:solr /cdh6-solr-metadata
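
After copying, you can confirm that the Upgrade Metadata Directory matches the migrated configuration. The following is a minimal sketch that uses diff -r. The function name verify_metadata_copy is hypothetical, and the defaults are the example directories used in these procedures.

```shell
# Hypothetical helper: compare the migrated configuration with its copy
# in the Upgrade Metadata Directory.
verify_metadata_copy() {
  local src="${1:-$HOME/cdh6-migrated-solr-config}"
  local dest="${2:-/cdh6-solr-metadata/migrated-config}"
  if diff -r "$src" "$dest" >/dev/null; then
    echo "copy matches"
  else
    echo "copy differs from $src" >&2
    return 1
  fi
}
```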

If you have made any changes to the configuration after testing on a CDH 6 cluster, make sure that you copy the updated configuration from the CDH 6 cluster to the CDH 5 cluster you are upgrading.

Upgrade to CDH 6

After completing all of these procedures, upgrade to CDH 6 by following the regular process documented in Upgrading the CDH Cluster. After the upgrade is complete, continue to Re-Indexing Solr Collections After Upgrading to CDH 6.