Apache Ambari Major Upgrade

Back up and Upgrade Ambari Infra and Ambari Log Search

The Ambari Infra Solr instance is used to index data for Atlas, Ranger, and Log Search. Ambari Infra in Ambari 2.6 uses Solr 5, while Ambari Infra in Ambari 2.7 uses Solr 7. Because moving from Solr 5 to Solr 7 involves on-disk format changes and collection-specific schema changes, indexed data must be backed up from Solr 5, migrated, and restored into Solr 7. The Ambari Infra Solr components must also be upgraded. Scripts are available for both tasks, as explained below.

This process will be broken up into four steps:

Generate Migration Config

The migration utility requires some basic information about your cluster and this step will generate a configuration file that captures that information.

Back up Ambari Infra Solr Data

This process will backup all indexed data either to a node-local disk, shared disk (NFS mount), or HDFS filesystem.

Remove existing collections & Upgrade Binaries

This step will remove the Solr 5 collections, upgrade Ambari Infra to Solr 7, and create the new collections with the upgraded schema required by HDP 3.1 services. This step will also upgrade LogSearch binaries if they are installed.

Migrate & Restore

This step will migrate the backed up data to the new format required by Solr 7 and restore it into the new collections. This step is completed after the HDP 3.1 upgrade, in the Post-upgrade Steps section of the upgrade guide.

Generate Migration Config

The utility used in this process is included in the ambari-infra-solr-client package. This package must be upgraded before the utility can be run. To do this:

  1. SSH into a host that has an Infra Solr Instance installed on it. You can locate this host by going to the Ambari Web UI and clicking Hosts. Click on the Filter icon and type Infra Solr Instance: All to find each host that has an Infra Solr Instance installed on it.

  2. Upgrade the ambari-infra-solr-client package.

    yum clean all
    yum upgrade ambari-infra-solr-client -y
  3. You can now proceed to configuring and running the migration tool from the same host.

    Run the following commands as root, or with a user that has sudo access:

    Export the variable that will hold the full path and filename of the configuration file.

    export CONFIG_INI_LOCATION=ambari_solr_migration.ini
  4. Run the migrationConfigGenerator.py script, located in the /usr/lib/ambari-infra-solr-client/ directory, with the following parameters:

    --ini-file $CONFIG_INI_LOCATION

    This is the previously exported environment variable that holds the path and filename of the configuration file that will be generated.

    --host ambari.hortonworks.local

    This should be the hostname of the Ambari Server.

    --port 8080

    This is the port of the Ambari Server. If the Ambari Server is configured to use HTTPS, please use the HTTPS port and add the -s parameter to configure HTTPS as the communication protocol.

    --cluster cl1

    This is the name of the cluster that is being managed by Ambari. To find the name of your cluster, look in the upper right corner of the Ambari Web UI, just to the left of the background operations and alerts.

    --username admin

    This is the name of a user that is an “Ambari Admin”.

    --password admin

    This is the password of the aforementioned user.

    --backup-base-path=/my/path

    This is the location where the backed up data will be stored. Data will be backed up to this local directory path on each host that is running an Infra Solr instance in the cluster. So, if you have 3 Infra Solr server instances and you use --backup-base-path=/home/solr/backup, this directory will be created on all 3 hosts and the data for that host will be backed up to this path.

    If you are using a shared file system that is mounted on each Infra Solr instance in the cluster, please use the --shared-drive parameter instead of --backup-base-path. The value of this parameter should be the path to the mounted drive that will be used for the backup. When this option is chosen, a directory will be created in this path for each Ambari Infra Solr instance with the backed up data. For example, if you had an NFS mount /export/solr on each host, you would use --shared-drive=/export/solr. Only use this option if this path exists and is shared amongst all hosts that are running the Ambari Infra Solr.

    --java-home /usr/jdk64/jdk1.8.0_112

    This should point to a valid Java 1.8 JDK that is available at the same path on each host in the cluster that is running an Ambari Infra Solr instance.

    If the Ranger Audit collection is being stored in HDFS, please add the following parameter, --ranger-hdfs-base-path

    The value of this parameter should be set to the path in HDFS where the Solr collection for the Ranger Audit data has been configured to store its data.

    Example: --ranger-hdfs-base-path=/user/infra-solr

    Example Invocations:

    If using HTTPS for the Ambari Server:

    /usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py \
    --ini-file $CONFIG_INI_LOCATION \
    --host c7401.ambari.apache.org \
    --port 8443 -s \
    --cluster cl1 \
    --username admin \
    --password admin \
    --backup-base-path=/home/solr/backup \
    --java-home /usr/jdk64/jdk1.8.0_112 

    If using HTTP for the Ambari Server:

    /usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py \
    --ini-file $CONFIG_INI_LOCATION \
    --host c7401.ambari.apache.org \
    --port 8080 \
    --cluster cl1 \
    --username admin \
    --password admin \
    --backup-base-path=/home/solr/backup \
    --java-home /usr/jdk64/jdk1.8.0_112 

    Ensure the script completes cleanly and that no warning text (highlighted in yellow) is visible. If any yellow warnings are displayed, review and address them before proceeding.
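Step 1 above locates Infra Solr hosts through the Ambari Web UI; the same list can also be retrieved from the Ambari REST API. A minimal sketch, assuming HTTP on port 8080, a cluster named cl1, and admin/admin credentials (substitute your own values):

```shell
# Build the REST API URL for all host components of type INFRA_SOLR.
# Host, port, cluster name, and credentials below are placeholders.
AMBARI_HOST=ambari.hortonworks.local
AMBARI_PORT=8080
CLUSTER=cl1
URL="http://${AMBARI_HOST}:${AMBARI_PORT}/api/v1/clusters/${CLUSTER}/host_components?HostRoles/component_name=INFRA_SOLR"

# Print the host_name of every Infra Solr Instance in the cluster.
curl -s -u admin:admin "$URL" | grep '"host_name"'
```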

Back up Ambari Infra Solr Data

Once the configuration file has been generated, it’s recommended to review the ini file created by the process. There is a configuration section for each collection that was detected. If, for whatever reason, you do not want to back up a specific collection, you can set enabled = false and the collection will not be backed up. Ensure that enabled = true is set for all of the collections you do wish to back up. Only the Atlas and Ranger collections will be backed up; Log Search will not be backed up.
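For illustration, the per-collection sections you are reviewing might resemble the following. This is a hypothetical excerpt; the exact section names and fields are produced by migrationConfigGenerator.py for your cluster, so consult your generated file for the actual layout:

```ini
; Hypothetical excerpt of ambari_solr_migration.ini.
; Set enabled = false on any collection you do not want backed up.
[ranger_collection]
enabled = true

[atlas_collections]
enabled = true
```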

To execute the backup, run the following command from the same host on which you generated the configuration file:

# /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh \
--ini-file $CONFIG_INI_LOCATION \
--mode backup | tee backup_output.txt 

During this process, the script will generate Ambari tasks that are visible in the Background Operations dialog in the Ambari Server.

Once the process has completed, please retain the output of the script for your records. This output will be helpful when debugging any issues that may occur during the migration process, and the output contains information regarding the number of documents and size of each backed up collection.

Remove Existing Collections & Upgrade Binaries

Once the data has been backed up, the old collections need to be deleted, and the Ambari Infra Solr and Log Search (if installed) components need to be upgraded. To do all of that, run the following script:

# /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh \
--ini-file $CONFIG_INI_LOCATION \
--mode delete | tee delete_output.txt 

During this process, the script will generate Ambari tasks that are visible in the Background Operations dialog in the Ambari Server.

Once the process has completed, please retain the output of the script for your records. This output will be helpful when debugging any issues that may occur during the migration process.

Next Steps

Migrating Atlas Data