In-Place Upgrade HDP 2 to CDP Private Cloud Base
HDP to CDP Upgrade Overview
In-Place Upgrade Overview
CDP Upgrade Readiness
How much time should I plan for to complete my upgrade?
Cluster environment readiness
Disk space and mountpoint considerations
Downloading and Publishing the Package Repository
Downloading and Publishing the Parcel Repository
Hadoop Users (user:group) and Kerberos Principals
Sample data ingestion
Expediting the Hive upgrade
Overview of the expedited Hive upgrade
Handle Missing Table or Partition Locations
Managed Table Location Mapping
Check SERDE Definitions and Availability
Make Tables SparkSQL Compatible
Understanding the Hive upgrade
Modifying the HSMM to prevent migration
Ambari and HDP Upgrade Checklist
Ambari upgrade checklist
Download cluster blueprints without hosts
HDP upgrade checklist
Checklist for large clusters
Kerberos cluster
Before upgrading any cluster
Managing MPacks
Changes to Ambari and HDP services
HDP Core component version changes
Upgrading the cluster's underlying OS
In-Place and Restore
Move and Decommission
Upgrading Ambari
Before you upgrade Ambari
Setting up a local repository
Updating Ambari repo files
Updating HDP repo files
Case study for setting up an HDP-GPL local repository
Setting up local repository with temporary internet access
Case study for setting up local repository
Update version repository base urls
Preparing Ambari Repository Configuration File to use Local Repository
Backup Ambari
Ambari Behavioral changes
Ambari Properties backup
Review Ambari UI and the Quick Links
Upgrade to Ambari 7.1.x.0
Download cluster blueprints
Mandatory Post-Upgrade Tasks
Upgrading Ambari Metrics System and SmartSense
Upgrading Ambari Metrics
Backup Ambari-Metrics
Upgrading SmartSense
Upgrading HDP to Cloudera Runtime 7.1.x
HDP Prerequisites
Upgrade process
Kerberos cluster
Before upgrading any cluster
Backup HDP Cluster
Backup and Restore Databases
Backup Ranger
Backup Ranger Admin Database
Backup Ranger KMS Database
Backup Atlas
Backup HBase tables
Backup Ambari Infra Solr
Backup Hive
Backup HBase
Backup Kafka
Backup Oozie
Backup Knox
Backup Logsearch
Backup Zeppelin
Backup HDFS
Backup ZooKeeper
Backup Ambari
Backup databases
Before you upgrade
Checkpoint HDFS
Register software repositories
HDP Intermediate bits for 7.1.x.0 Repositories
Software download matrix for HDP 2.6.5 to CDP 7.1.x
AM2CM legacy tools download
Install software on the hosts
Preparing the services for upgrade
Backing up Ambari infra data
Back up and upgrade Ambari infra and Ambari Log Search
Generate migration configuration
Back up Ambari Infra Solr data
Remove Existing Collections and Upgrade Binaries
Upgrading Ambari Infra
Overview of the Migration of the Atlas and Infra Solr Data
Preparing Atlas for upgrade
Place Atlas in migration mode
Ranger Service connection with Oracle database
Ranger admin password
Preparing HBase for upgrade
Remove PREFIX_TREE data block encoding
Validate HFile
Preparing HDP Search for upgrade
Before you begin
Download Solr configuration from HDP Search ZooKeeper
Transition Solr configuration
Cloudera Manager versions 7.1.1 to 7.2.4
Cloudera Manager versions 7.3.1 or higher
Validate the configuration
Test the configuration
Preparing Hive for upgrade
Take a Mandatory Snapshot of Hive Tables
Make Tables SparkSQL Compatible
Download the Pre-Upgrade Tool JAR for Compaction
Get a Kerberos Ticket
Run Compaction on Hive Tables
Save Hive Metastore by Dumping
Capture Information about Multiple HiveServers
Hive Pre-Upgrade Tool Command Help
Preparing the backend HMS database for upgrade
Preparing Kafka for upgrade
Extract Kafka broker ID
Preparing Zeppelin for upgrade
Perform the HDP upgrade
Perform express upgrade
Post-HDP-upgrade tasks
Update Ranger passwords
Atlas Migration and HBase Hook settings
Ambari Metrics and LogSearch
Ambari infra-migrate and restore
Upload HDFS entity information
Custom Spark SQL Warehouse Directory
Hive post-HDP-upgrade tasks
Checking and correcting Hive table locations
Preventing SparkSQL incompatibility
Correct Hive File Locations
Handle Missing Table or Partition Locations
Managed Table Location Mapping
Check SERDE Definitions and Availability
Verify Zeppelin settings in Ambari
Search post-HDP-upgrade tasks
Backup Infra Solr collections
Troubleshooting the HDP upgrade
Hive Metastore corrupt
Missing Hive tables
YARN Registry DNS instance fails to start
Ambari Metrics System (AMS) does not start
Ranger MySQL collation
Rollback Ambari to 2.6.5
Rollback HDP Services
Overview
ZooKeeper
Ambari-Metrics
Ambari Infra Solr
Ranger
Restore Ranger Admin Database
Restore Ranger KMS Database
HDFS
YARN
HBase
Kafka
Atlas
Restore HBase Tables
Restore ATLAS_ENTITY_AUDIT_EVENTS table
Hive
Spark
Oozie
Knox
Zeppelin
Log Search
Transitioning to Cloudera Manager
Pre-transition steps
Databases
Kerberos
Kerberos principal
Atlas migration
HDFS
Preparing HDFS
Backup the non-default Rack Awareness Topology script
Spark
Spark2/Livy
Ranger
Kerberos - Optional task
Solr
Cloudera Manager Installation and Setup
Installing JDBC Driver
Proxy Cloudera Manager through Apache Knox
Transitioning HDP to Cloudera Private Cloud Base
Transitioning HDP 2.6.5 cluster to CDP Private Cloud Base 7.1.x cluster using the AM2CM tool
Post transition steps
Enable Auto Start setting
Kerberos Principal for Cloudera Manager Server
ZooKeeper
Delete ZNODES
Ranger
Ranger KMS
Add Ranger policies for components on the CDP Cluster
Ranger Installation in High Availability with Load Balancer
Create composite keytab for Ranger HA
Set maximum retention days for Ranger audits
HDFS
Ports
TLS/SSL
HDFS HA
Custom Topology
Add Balancer Role to HDFS
Other review configurations for HDFS
Configuring HDFS properties to optimize log collection
Solr
Restore Solr collections on CDP cluster
Kafka
Change Kafka port value
Kafka cluster Kerberos
Unsetting Kafka Protocol version
YARN
Start job history
YARN MapReduce framework jars
YARN NodeManager
YARN NodeManager CGroups
Reset ZNode ACLs
Placement rules evaluation engine
Converting old mapping rule format to JSON-based placement rule format
YARN owner permission
YARN MapReduce parameter
Spark
Livy2
Enabling Spark on YARN for Atlas
Enabling SAC manually on Spark
Tez
Hive
Setting up Hive metastore for Atlas
Identifying and fixing invalid Hive schema versions
Fixing statistics
Advanced configuration snippet (Safety valve)
Remove Hive Ranger property
HBase
HBase RegionServer heap size
Hue
Installing Python 3.8
Installing Python 3.8 on CentOS 7 for Hue
Installing Python 3.8 on RHEL 8 for Hue
Installing Python 3.8 on SLES 12 for Hue
Installing Python 3.8 on Ubuntu 18 for Hue
Installing the psycopg2 Python package for PostgreSQL database
Installing MySQL client for MySQL databases
Installing MySQL client for MariaDB databases
Oozie
Validate Database URL
Installing the new Shared Libraries
Update Oozie properties
Access Oozie load balancer URL
Oozie Load Balancer configuration
Atlas advanced configuration snippet (Safety valve)
Migrating Atlas data
Phoenix
Map Phoenix schemas to HBase namespaces
Starting all services
Hive Policy Additions
Knox
Topology migration
Migrate Credential Aliases
Migrate signing key
Configure Apache Knox authentication for AD/LDAP
Client Configurations
Securing ZooKeeper
Zeppelin Shiro configurations
Migrating Spark workloads to CDP
Spark 1.6 to Spark 2.4 Refactoring
Handling prerequisites
Spark 1.6 to Spark 2.4 changes
New Spark entry point SparkSession
Dataframe API registerTempTable deprecated
union replaces unionAll
Empty schema not supported
Referencing a corrupt JSON/CSV record
Dataset and DataFrame API explode deprecated
CSV header and schema match
Table properties support
CREATE OR REPLACE VIEW and ALTER VIEW not supported
Managed table location
Write to Hive bucketed tables
Rounding in arithmetic operations
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Compiling and running a Java-based job
Compiling and running a Scala-based job
Running a Python-based job
Running a job interactively
Post-migration tasks
Spark 2.3 to Spark 2.4 Refactoring
Handling prerequisites
Spark 2.3 to Spark 2.4 changes
Empty schema not supported
CSV header and schema match
Table properties support
Managed table location
Precedence of set operations
HAVING without GROUP BY
CSV bad record handling
Spark 2.4 CSV example
Configuring storage locations
Querying Hive managed tables from Spark
Compiling and running Spark workloads
Post-migration tasks
Apache Hive Expedited Migration Tasks
Preparing tables for migration
Creating a list of tables to migrate
Migrating tables to CDP
Apache Hive Changes in CDP
Hive Configuration Property Changes
Key syntax changes
Handling table reference syntax
Add Backticks to Table References
LOCATION and MANAGEDLOCATION clauses
Key semantic changes and workarounds
Casting timestamps
Casting invalid dates
Changing incompatible column types
Understanding CREATE TABLE behavior
Configuring legacy CREATE TABLE behavior
Disabling Partition Type Checking
Dropping partitions
Handling output of greatest and least functions
Renaming tables
TRUNCATE TABLE on an external table
Hive unsupported interfaces and features
Changes to CDH Hive Tables
Changes to HDP Hive tables
Apache Hive Post-Upgrade Tasks
Customizing critical Hive configurations
Setting Hive Configuration Overrides
Hive Configuration Requirements and Recommendations
Removing the LLAP Queue
Configuring HiveServer for ETL using YARN queues
Removing Hive on Spark Configurations
Configuring authorization to tables
Setting up access control lists
Configure encryption zone security
Configure edge nodes as gateways
Spark integration with Hive
Configure HiveServer HTTP mode
Configuring HMS for high availability
Installing Hive on Tez and adding a HiveServer role
Updating Hive and Impala JDBC/ODBC drivers
Getting the JDBC driver
Getting the ODBC driver
Configuring External Authentication for Cloudera Manager
Additional Services
Installing DAS using Ambari
Check cluster configuration for Hive and Tez
Add the DAS service
DAS post-installation tasks
Additional configuration tasks
Setting up the tmp directory
Configuring DAS for SSL/TLS
Set up trusted CA certificate
Set up self-signed certificates
Configure SSL/TLS in Ambari
Configuring user authentication in Ambari
Configuring user authentication using Knox SSO
Configuring user authentication using Knox proxy
Configuring user authentication using SPNEGO
Enabling logout option for secure clusters
Troubleshooting DAS installation
Problem area: Queries page
Your queries are not appearing on the Queries page
Query column is empty, yet you can see the DAG ID and Application ID
Query column is not empty, but you cannot see the DAG ID and Application ID
You cannot view queries from other users
Problem area: Compose page
You cannot see your databases or the query editor is missing
You cannot view new databases and tables, or cannot see changes to existing databases or tables
Replication failure in the DAS Event Processor
Problem area: Reports page
DAS service installation fails with the "python files missing" message
DAS does not log me out as expected, or I stay logged in longer than the time specified in the Ambari configuration
Getting a 401 - Unauthorized access error message while accessing DAS
Setting up quick links for the DAS UI
Installing DAS using Cloudera Manager
Adding Hue service with Cloudera Manager
Install and configure MySQL database
Add the Hue service using Cloudera Manager
Enable Kerberos for authentication
Integrate Hue with Knox
Grant Ranger permissions to new users or groups
Adding Query Processor service to a cluster
Applications Upgrade
Procedure to Rollback from CDP 7.1.7 SP1 to CDP 7.1.7