Backup and Upgrade Ambari Infra
The Ambari Infra Solr instance is used to index data for Atlas, Ranger, and Log Search. Ambari Infra in Ambari 2.6 uses Solr 5; Ambari Infra in Ambari 2.7 uses Solr 7. Because the on-disk format and the collection-specific schemas changed between these versions, indexed data must be backed up from Solr 5, migrated, and restored into Solr 7. The Ambari Infra Solr components must also be upgraded. Scripts are available for both tasks and are explained below.
This process is broken into four steps:
- Generate Migration Config
The migration utility requires some basic information about your cluster. This step generates a configuration file that captures that information.
- Back up Ambari Infra Solr Data
This step backs up all indexed data to a node-local disk, a shared disk (NFS mount), or an HDFS filesystem.
- Remove Existing Collections & Upgrade Binaries
This step removes the Solr 5 collections, upgrades Ambari Infra to Solr 7, and creates the new collections with the upgraded schema required by HDP 3.0 services. It also upgrades the Log Search binaries if they are installed.
- Migrate & Restore
This step migrates the backed-up data to the new format required by Solr 7 and restores the data into the new collections. It is performed after the HDP 3.0 upgrade has been completed, in the Post-upgrade Steps section of the upgrade guide.
Generate Migration Config
The utility used in this process is included in the ambari-infra-solr-client package. This package must be upgraded before the utility can be run. To do this:
- SSH into a host that has an Infra Solr Instance installed on it. You can locate this host by going to the Ambari Web UI and clicking Hosts. Click the Filter icon and type Infra Solr Instance: All to find each host that has an Infra Solr Instance installed on it.
- Upgrade the ambari-infra-solr-client package.
yum clean all
yum upgrade ambari-infra-solr-client -y
- If you are using a custom username for running Infra Solr (that is, a username other than ‘infra-solr’), additional scripts need to be downloaded. Only in that case, perform the following steps:
- wget --no-check-certificate -O /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py https://raw.githubusercontent.com/apache/ambari/trunk/ambari-infra/ambari-infra-solr-client/src/main/python/migrationConfigGenerator.py
- chmod +x /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py
- wget --no-check-certificate -O /usr/lib/ambari-infra-solr-client/migrationHelper.py https://raw.githubusercontent.com/apache/ambari/trunk/ambari-infra/ambari-infra-solr-client/src/main/python/migrationHelper.py
- chmod +x /usr/lib/ambari-infra-solr-client/migrationHelper.py
- You can now proceed to configuring and running the migration tool from the same
host.
Run the following commands as root, or with a user that has sudo access:
Export the variable that will hold the full path and filename of the configuration file.
export CONFIG_INI_LOCATION=ambari_solr_migration.ini
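The text asks for a full path, but the example value is a bare filename, which resolves against whatever directory each later command happens to run from. A minimal sketch, assuming the file should live in the current working directory (that location is an assumption, not a requirement of the tool), exports an absolute path instead:

```shell
# Assumption: the configuration file should live in the current working directory.
# Any writable location works, as long as the exported value is an absolute path.
CONFIG_INI_LOCATION="$(pwd)/ambari_solr_migration.ini"
export CONFIG_INI_LOCATION

# Print the value so it can be checked before the generator is run.
echo "CONFIG_INI_LOCATION=$CONFIG_INI_LOCATION"
```

With an absolute path, the backup and delete steps later in this procedure find the same file even if they are started from a different directory.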
When you run the script in the next step, ensure that it completes cleanly and that no yellow warning text is visible. If warnings do appear, review them before continuing.
- Run the migrationConfigGenerator.py script, located in the /usr/lib/ambari-infra-solr-client/ directory, with the following parameters:
- --ini-file $CONFIG_INI_LOCATION
This is the previously exported environment variable that holds the path and filename of the configuration file that will be generated.
- --host ambari.hortonworks.local
This should be the hostname of the Ambari Server.
- --port 8080
This is the port of the Ambari Server. If the Ambari Server is configured to use HTTPS, use the HTTPS port and add the -s parameter to configure HTTPS as the communication protocol.
- --cluster cl1
This is the name of the cluster that is being managed by Ambari. To find the name of your cluster, look in the upper right-hand corner of the Ambari Web UI, just to the left of the background operations and alerts.
- --username admin
This is the name of a user that is an “Ambari Admin”.
- --password admin
This is the password of the aforementioned user.
- --backup-base-path=/my/path
This is the location where the backed-up data will be stored. Data is backed up to this local directory path on each host that is running an Infra Solr instance in the cluster. For example, if you have 3 Infra Solr server instances and you use --backup-base-path=/home/solr/backup, this directory is created on all 3 hosts and the data for each host is backed up to this path on that host.
If you are using a shared file system that is mounted on each Infra Solr instance in the cluster, use the --shared-drive parameter instead of --backup-base-path. The value of this parameter should be the path to the mounted drive that will be used for the backup. When this option is chosen, a directory containing the backed-up data is created in this path for each Ambari Infra Solr instance. For example, if you had an NFS mount /export/solr on each host, you would use --shared-drive=/export/solr. Only use this option if this path exists and is shared amongst all hosts that are running Ambari Infra Solr.
- --java-home /usr/jdk64/jdk1.8.0_112
This should point to a valid Java 1.8 JDK that is available at the same path on each host in the cluster that is running an Ambari Infra Solr instance.
- --ranger-hdfs-base-path
Add this parameter only if the Ranger Audit collection is being stored in HDFS. The value of this parameter should be set to the path in HDFS where the Solr collection for the Ranger Audit data has been configured to store its data.
Example:
--ranger-hdfs-base-path=/user/infra-solr
Example Invocations:
If using HTTPS for the Ambari Server:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8443 -s --cluster cl1 --username admin --password admin --backup-base-path=/my/path --java-home /usr/jdk64/jdk1.8.0_112
If using HTTP for the Ambari Server:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8080 --cluster cl1 --username admin --password admin --backup-base-path=/my/path --java-home /usr/jdk64/jdk1.8.0_112
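Neither invocation above uses the --shared-drive or --ranger-hdfs-base-path parameters. The following is a dry-run sketch that assembles a command line from the parameters described in this section and prints it rather than executing it. The variables AMBARI_PROTOCOL, SHARED_DRIVE, BACKUP_BASE_PATH, and RANGER_HDFS_BASE_PATH are hypothetical helpers introduced only for illustration; the host, ports, cluster name, credentials, and paths are the placeholder values used elsewhere in this section.

```shell
#!/bin/sh
# Dry run: build the generator command line and print it instead of running it.
CONFIG_INI_LOCATION="${CONFIG_INI_LOCATION:-ambari_solr_migration.ini}"
AMBARI_PROTOCOL="https"                   # hypothetical helper: "http" or "https"
SHARED_DRIVE="/export/solr"               # leave empty to use --backup-base-path
BACKUP_BASE_PATH="/my/path"
RANGER_HDFS_BASE_PATH="/user/infra-solr"  # leave empty if Ranger audits are not in HDFS

# Collect the arguments in the positional parameters (portable, no arrays needed).
set -- /usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py \
  --ini-file "$CONFIG_INI_LOCATION" --host c7401.ambari.apache.org

# HTTPS needs the HTTPS port plus the -s switch.
if [ "$AMBARI_PROTOCOL" = "https" ]; then
  set -- "$@" --port 8443 -s
else
  set -- "$@" --port 8080
fi

set -- "$@" --cluster cl1 --username admin --password admin

# --shared-drive is used instead of --backup-base-path, never alongside it.
if [ -n "$SHARED_DRIVE" ]; then
  set -- "$@" "--shared-drive=$SHARED_DRIVE"
else
  set -- "$@" "--backup-base-path=$BACKUP_BASE_PATH"
fi

# Only added when the Ranger Audit collection stores its data in HDFS.
if [ -n "$RANGER_HDFS_BASE_PATH" ]; then
  set -- "$@" "--ranger-hdfs-base-path=$RANGER_HDFS_BASE_PATH"
fi

set -- "$@" --java-home /usr/jdk64/jdk1.8.0_112

GENERATOR_CMD="$*"
echo "$GENERATOR_CMD"
```

Review the printed command before running it. To execute it directly from the sketch, the final echo line could be replaced with "$@".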
Back up Ambari Infra Solr Data
Once the configuration file has been generated, review the ini file created by the process. It contains a configuration section for each collection that was detected. If you do not want to back up a specific collection, set enabled = false in its section and the collection will be skipped. Ensure that enabled = true is set for every collection you do wish to back up. Only the Atlas and Ranger collections will be backed up. Log Search will not be backed up.
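To make that review quicker, the enabled flags can be listed per section. The sketch below is one possible approach, not part of the migration tooling: list_enabled_flags is a hypothetical helper, and the sample file with its section names is illustrative only, so check the generated file for the real section names.

```shell
# Illustrative sample; the real file is the one at $CONFIG_INI_LOCATION.
sample_ini="$(mktemp)"
cat > "$sample_ini" <<'EOF'
[ranger_collection]
enabled = true
[atlas_collections]
enabled = false
EOF

# Print every section header together with the value of its "enabled" key.
list_enabled_flags() {
  awk '
    /^\[.*\]$/ { section = $0 }          # remember the current section header
    /^[ \t]*enabled[ \t]*=/ {
      split($0, kv, "=")                 # kv[2] holds the value after "="
      gsub(/[ \t]/, "", kv[2])           # strip surrounding whitespace
      print section " enabled=" kv[2]
    }
  ' "$1"
}

ENABLED_REPORT="$(list_enabled_flags "$sample_ini")"
echo "$ENABLED_REPORT"
rm -f "$sample_ini"
```

Against the real configuration, the call would be list_enabled_flags "$CONFIG_INI_LOCATION".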
To execute the backup, run the following command from the same host on which you generated the configuration file:
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode backup | tee backup_output.txt
During this process, the script will generate Ambari tasks that are visible in the Background Operations dialog in the Ambari Server.
Once the process has completed, please retain the output of the script for your records. This output will be helpful when debugging any issues that may occur during the migration process, and the output contains information regarding the number of documents and size of each backed up collection.
Remove Existing Collections & Upgrade Binaries
Once the data has been backed up, the old collections need to be deleted, and the Ambari Infra Solr and Log Search (if installed) components need to be upgraded. To do all of that, run the following script:
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode delete | tee delete_output.txt
During this process, the script will generate Ambari tasks that are visible in the Background Operations dialog in the Ambari Server.
Once the process has completed, please retain the output of the script for your records. This output will be helpful when debugging any issues that may occur during the migration process.
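Because delete mode removes the Solr 5 collections, it may be worth confirming that the backup output was actually captured before starting it. The sketch below shows one such guard; backup_log_ready is a hypothetical helper, and the demonstration runs against temporary files so that nothing on the cluster is touched. It only confirms that output was saved. It does not prove the backup succeeded, so the log contents still need to be read.

```shell
# Succeeds only when the given file exists and is not empty.
backup_log_ready() {
  [ -s "$1" ]
}

# Demonstration with temporary files standing in for the real log.
demo_dir="$(mktemp -d)"
printf 'backup finished\n' > "$demo_dir/backup_output.txt"
: > "$demo_dir/empty_output.txt"

if backup_log_ready "$demo_dir/backup_output.txt"; then
  READY_RESULT="proceed"
else
  READY_RESULT="stop"
fi

if backup_log_ready "$demo_dir/empty_output.txt"; then
  EMPTY_RESULT="proceed"
else
  EMPTY_RESULT="stop"
fi

echo "captured log: $READY_RESULT, empty log: $EMPTY_RESULT"
rm -rf "$demo_dir"
```

In practice the check would be made against the backup_output.txt file written by the backup step.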