Upgrading the Cluster’s Underlying OS
HDP 3.1.4 no longer supports RHEL/CentOS/OEL 6, SLES 11, or Debian 7, so all hosts in the cluster must be on a supported operating system before you start the upgrade from HDP 2.6.x to HDP 3.1.4. For many organizations, this process takes time and requires orchestration between multiple teams. The following two high-level approaches describe how to move from one major operating system version to another:
- In-Place & Restore:
Perform an in-place OS refresh and use Ambari's Recover Host feature
- Move & Decom:
Move Masters and Decom/Recom Workers
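Whichever approach is chosen, it helps to know first which hosts are still on one of the unsupported releases. The following is a minimal sketch of such an inventory check, not part of HDP or Ambari. It assumes you already have a mapping of host name to an OS string in a family-plus-major-version form such as `centos6`; that string format, the family names, and the function names are illustrative assumptions to adapt to whatever your inventory tooling reports.

```python
import re

# Releases named above as no longer supported by HDP 3.1.4, as
# (family, major version). The family spellings are assumptions.
UNSUPPORTED = {
    ("redhat", 6), ("centos", 6), ("oraclelinux", 6),
    ("suse", 11), ("debian", 7),
}


def split_os_type(os_type):
    """Split a string such as 'centos6' into ('centos', 6)."""
    match = re.fullmatch(r"([a-z]+)(\d+)", os_type.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized os_type: {os_type!r}")
    return match.group(1), int(match.group(2))


def hosts_needing_os_upgrade(inventory):
    """Return the sorted host names whose OS release is in UNSUPPORTED."""
    return sorted(
        host for host, os_type in inventory.items()
        if split_os_type(os_type) in UNSUPPORTED
    )


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {
        "master1.example.com": "centos6",
        "worker1.example.com": "centos7",
        "worker2.example.com": "suse11",
    }
    for host in hosts_needing_os_upgrade(inventory):
        print(host)
```

The check is kept as a pure function over a dictionary so that it can be fed from any source, such as a CMDB export or a configuration management fact dump.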
Each option has pros and cons. The high-level decision criteria for each are as follows:
In-Place & Restore
This option is best suited to medium to large clusters (25 or more nodes) whose operational teams have environment automation experience and have followed best practices when setting up component High Availability and HDP directory structures (such as ensuring HDP component data and metadata are not stored on the root volume).
This option involves going through each host in the cluster, ensuring that important metadata and data are stored on a volume that is not used by the operating system, and leveraging component high availability to maintain maximum cluster availability. Each host is visited in turn: the host is shut down, the operating system volume is refreshed with the new version of the chosen operating system, the host is configured with the same IP address and hostname, all volumes are re-mounted, and the Ambari Agent is installed and configured. Once the host has re-joined the cluster, the Ambari Recover Host functionality is used to re-install, re-configure, and start services. Because the component data and metadata are stored on a volume that is not used by the operating system, no data is lost and the host performs just as it did before, but with a new operating system version.
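The key precondition for this option is that no component data or metadata directory lives on the operating system volume. As a rough sketch of how that could be checked, the helper below takes a list of configured directories and the host's mount points and reports which directories fall on the root volume, using a longest-prefix match. The function names and example paths are hypothetical, and the sketch assumes `/` is always mounted and does not resolve symbolic links.

```python
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains path ('/' if none)."""
    path = posixpath.normpath(path)
    best = "/"
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        # Match whole path components only, so '/grid/1' does not
        # claim '/grid/10/...'.
        prefix = mp if mp == "/" else mp + "/"
        if (path == mp or path.startswith(prefix)) and len(mp) > len(best):
            best = mp
    return best


def dirs_on_root_volume(data_dirs, mount_points):
    """Return the directories whose containing mount point is '/'."""
    return [d for d in data_dirs if mount_point_for(d, mount_points) == "/"]


if __name__ == "__main__":
    # Hypothetical mount points and component directories.
    mounts = ["/", "/grid/0", "/grid/1"]
    dirs = ["/grid/0/hadoop/hdfs/data", "/hadoop/hdfs/namenode"]
    for d in dirs_on_root_volume(dirs, mounts):
        print(f"on root volume, relocate before OS refresh: {d}")
```

On a live Linux host the mount points could be read from `/proc/mounts`; any directory the helper reports would be wiped by the OS refresh and should be moved to a separate volume first.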
Move & Decom
This option is best suited to smaller clusters (under 25 nodes) where operational teams may not have access to operating system and configuration management automation tooling, or have not yet followed best practices when setting up HDP directory structures (such as ensuring HDP component data and metadata are not stored on the root volume).
This option involves decommissioning worker nodes and replacing them with worker nodes that run the new operating system version. For master nodes, the move master operation is used to move all masters off a host and onto a new host that runs the new operating system version. Decommissioning worker nodes can take a great deal of time, depending on the density of the nodes, and move master operations require many cluster services to be restarted. This is therefore a time-consuming process that requires multiple periods of downtime, but it does not require any operating system level operations.
Next Steps