What's new in Cloudera Runtime 7.1.9 SP1

Learn about the new features and improvements to components in Cloudera Runtime 7.1.9 SP1.

Upgrade CDP 7.1.9 to CDP 7.1.9 SP1

You can perform an in-place upgrade from CDP 7.1.9 to CDP 7.1.9 SP1. For more information, see CDP to CDP documentation.

Upgrade CDP 7.1.8 latest cumulative hotfix to CDP 7.1.9 SP1

You can perform an in-place upgrade from the latest CDP 7.1.8 cumulative hotfix to CDP 7.1.9 SP1. For more information, see CDP to CDP documentation.

Upgrade CDP 7.1.7 SP3 to CDP 7.1.9 SP1

You can perform an in-place upgrade from CDP 7.1.7 SP3 to CDP 7.1.9 SP1. For more information, see CDP to CDP documentation.

Upgrade CDH 6.3.x to CDP 7.1.9 SP1

You can perform an in-place upgrade from CDH 6.3.x to CDP 7.1.9 SP1. For more information, see CDH 6 to CDP documentation.

Upgrade CDH 6.2.x to CDP 7.1.9 SP1

You can perform an in-place upgrade from CDH 6.2.x to CDP 7.1.9 SP1. For more information, see CDH 6 to CDP documentation.

Upgrade HDP 3 to CDP 7.1.9 SP1

You can perform an in-place one-stage upgrade from HDP 3 to CDP 7.1.9 SP1 using CMA 3.3. For more information, see HDP 3 to CDP documentation.

Upgrade HDP 2 to CDP 7.1.9 SP1

You can perform an in-place upgrade from HDP 2 to CDP 7.1.9 SP1 using CMA 3.3. However, you must first upgrade from HDP 2 to CDP 7.1.8 or CDP 7.1.7 SP2, and then upgrade to CDP 7.1.9 SP1. For more information, see HDP 2 to CDP documentation.

Rollback CDP 7.1.9 SP1 to CDP 7.1.9

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to CDP 7.1.9. The rollback restores your CDP cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to CDP 7.1.9 documentation.

Rollback CDP 7.1.9 SP1 to CDP 7.1.8 latest cumulative hotfix

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to the latest CDP 7.1.8 cumulative hotfix. The rollback restores your CDP cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to CDP 7.1.8 latest cumulative hotfix documentation.

Rollback CDP 7.1.9 SP1 to CDP 7.1.7 SP3

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to CDP 7.1.7 SP3. The rollback restores your CDP cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to CDP 7.1.7 SP3 documentation.

Rollback CDP 7.1.9 SP1 to CDH 6.3.x

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to CDH 6.3.x. The rollback restores your CDH cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to CDH 6.3.x documentation.

Rollback CDP 7.1.9 SP1 to CDH 6.2.x

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to CDH 6.2.x. The rollback restores your CDH cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to CDH 6.2.x documentation.

Rollback CDP 7.1.9 SP1 to HDP 3 or HDP 2

You can downgrade or roll back an upgrade from CDP Private Cloud Base 7.1.9 SP1 to HDP 3 or HDP 2. The rollback restores your HDP cluster to the state it was in before the upgrade, including the Kerberos and TLS/SSL configurations. For more information, see Rollback CDP 7.1.9 SP1 to HDP 3 or HDP 2 documentation.

Atlas

Compression algorithm changed for Atlas HBase tables
The default compression algorithm for Atlas HBase tables was changed from Gzip to Snappy. This can result in up to 50% lower compaction time. For more information, see the compressionType property in HBase entities created in Atlas.
Atlas diagnostic bundle introduced
Atlas can provide the necessary diagnostic data in a single GZ file inside the diagnostic zip package to help you and Cloudera Support troubleshoot problems. For more information on downloading and sending the compressed logs from Cloudera Manager, see Manually Triggering Collection and Transfer of Diagnostic Data to Cloudera.
Spark connector available
With the new Spark connector, you can run Spark jobs for Atlas even when Kafka brokers are not available. For more information, see Spark connector configuration in Apache Atlas.
Relationship Search is configurable
A new property is introduced in Atlas to configure Relationship Search. By default, Relationship Search is disabled to save time when starting or restarting Atlas. For more information, see Using Relationship search.
Chinese, Japanese and Korean characters supported in searches
Chinese, Japanese and Korean characters are supported in the following search scenarios:
  • Any type-based searches, including filters combined with the AND and OR operators
  • Searches with multiple attribute filters, such as classification and description text
  • Searches involving tags and operators
  • Searches with custom attribute filters
  • Searches including the "*" wildcard character
For more information on searching in Atlas, see Using Free-text Search.
Atlas On-prem to On-prem Replication (Technical Preview)

You can replicate governance metadata and data lineage provided by Cloudera Atlas using replication policies in Replication Manager.

This feature is available in this version of CDP but is not ready for production deployment. Cloudera encourages you to explore this technical preview feature in non-production environments and provide feedback on your experiences through the Cloudera Community Forums or through your account managers and field team contacts. For more information regarding limitations and unsupported features, see Atlas replication policies (technical preview).

Custom audit filter updates
The DELETE API call was updated for single rule deletion. For more information, see Using custom audit filters.
Audit aging enhancements and bug fixes
  • Added a note on the use of regular expressions when configuring custom audit aging. For more information, see Using custom audit aging.
  • Added a note that, by default, audit aging through the REST API is disabled. For more information, see Using audit aging.
  • Changed the default behavior of the atlas.audit.sweep.out.entity.types and atlas.audit.sweep.out.action.types properties. They are no longer applicable by default when the sweep out option is enabled. For more information, see Using Sweep out configurations.

Hue

Added support for Python 3.10
Hue is now supported with Python 3.10 on Ubuntu 22, SLES 15 SP4, and SLES 15 SP5 operating systems. See Installing Python 3.
Added support for PostgreSQL 16
Hue supports PostgreSQL 16 as its backend database.
Added support for Oracle 21c and 23c
Hue supports Oracle 23c (LTS, Latest) and Oracle 21c (Latest) as its backend database. For the supported cx_Oracle information and installation instructions, see Using Oracle database with Hue.
Security improvements
  • Upgraded the Interactive Python (IPython) command-line shell to version 8.10.0 to address CVE-2023-24816.
  • Upgraded Python to version 3.10 to resolve ReDoS (Regular expression Denial of Service) attacks and address CVE-2022-42969.
  • Upgraded SQLParse to version 0.4.4 to prevent ReDoS attacks and address CVE-2023-30608.
Added DEBUG-level logging in Hue server logs
The Hue server logs now contain DEBUG-level information in addition to INFO-level information. Overall, Cloudera has fixed multiple logging issues. Only INFO-level logging is enabled by default; however, you can turn on DEBUG-level logging from Cloudera Manager. See Enabling debug logging for Hue server logs.
Requirement to install the psycopg2 package from source on a FIPS-enabled cluster
If you use PostgreSQL as the backend database for Hue on a FIPS cluster on RHEL 8, you must install psycopg2 version 2.9.5 or higher from source on all Hue hosts, because the psycopg2-binary package bundles its own libssl library file, which does not support FIPS. See Installing the psycopg2 Python package for PostgreSQL database on a FIPS cluster (RHEL 8).
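For instance, a minimal sketch of a source install on a RHEL 8 Hue host; the exact package manager commands and pip executable are assumptions that depend on your environment and Python installation:
# Install build prerequisites, then build psycopg2 from source via pip.
sudo dnf install -y gcc python3-devel postgresql-devel
sudo pip3 install "psycopg2>=2.9.5"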
Upgrade considerations
If you are using MySQL as a backend database for Hue, then you must perform the upgrade tasks in the following order to avoid potential issues:
  1. Install Python 3.10 on all Hue hosts.
  2. Install the MySQL Client package.
  3. Stop the Hue service.
  4. Upgrade Cloudera Manager to CM 7.11.3 CHF7.
  5. Start Cloudera Manager.
  6. Upgrade CDP Private Cloud Base to 7.1.9 SP1.
  7. Start the Hue service.
For detailed instructions about each step, see Upgrading CDP Private Cloud Base to a higher version.
Zero downtime upgrade (ZDU) considerations
If you are using the ZDU approach to upgrade to 7.1.9 SP1, you must start Hue only after upgrading Cloudera Manager to 7.11.3 CHF 7, Python to version 3.10, and CDP Private Cloud Base to 7.1.9 SP1. Hue does not start if you upgrade only Cloudera Manager and Python but not CDP.

Hive

Added support for Hive DatabaseType in JDBC storage handler
You can now connect to Apache Hive using JdbcStorageHandler to access the Hive data source, as shown in the sketch below. For more information, see Using JdbcStorageHandler to query external database.
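The following is a minimal sketch of an external table that reads from a remote Hive instance over JDBC; the connection URL, credentials, and table and column names are hypothetical, and the plaintext password is shown only for brevity:
CREATE EXTERNAL TABLE remote_orders (
  order_id INT,
  amount   DOUBLE
)
STORED BY 'org.apache.hive.storage.handler.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "HIVE",
  "hive.sql.jdbc.driver" = "org.apache.hive.jdbc.HiveDriver",
  "hive.sql.jdbc.url" = "jdbc:hive2://remote-hs2.example.com:10000/default",
  "hive.sql.dbcp.username" = "hive_user",
  "hive.sql.dbcp.password" = "hive_password",
  "hive.sql.table" = "orders"
);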

HBase

HBase supports truncating the regions in a table
You can now truncate individual regions of an HBase table using the truncate_region command.
The command syntax is as follows:
truncate_region 'REGIONNAME'
truncate_region 'ENCODED_REGIONNAME'
For example,
hbase:008:0> list_regions 'employee'
                                                   SERVER_NAME |                                                REGION_NAME |  START_KEY |    END_KEY |  SIZE |   REQ |   LOCALITY |
 ------------------------------------------------------------- | ---------------------------------------------------------- | ---------- | ---------- | ----- | ----- | ---------- |
 ccycloud-4.nightly-7x-by.root.comops.site,22101,1718869191555 |  employee,,1718877308795.66828b0fe6ceda3e28608617eb6f6b3f. |            |          2 |     1 |     2 |        1.0 |
 ccycloud-2.nightly-7x-by.root.comops.site,22101,1718869191308 | employee,2,1718877308795.ff9b19452fecea6353694583e3473b5b. |          2 |            |     1 |     2 |        1.0 |
 2 rows
Took 0.1088 seconds
hbase:014:0> truncate_region 'employee,2,1718877308795.ff9b19452fecea6353694583e3473b5b.'
Took 0.6236 seconds
hbase:010:0> truncate_region 'ff9b19452fecea6353694583e3473b5b'
Took 0.6500 seconds

Kafka

Configurations to customize replication-records-lag metric calculation

Three new properties are introduced that enable you to control how SRM calculates the replication-records-lag metric. This metric provides information regarding the replication lag based on offsets. The metric is available both on the cluster and the topic level. The new properties are introduced because the calculation of the metric with default configurations might add latency to replications and impact SRM performance. While these properties are configured in Cloudera Manager, they do not have dedicated configuration entries. Instead, you add them to Streams Replication Manager's Replication Configs.

replication.records.lag.calc.enabled (default: true)
Controls whether the replication-records-lag metric is calculated. This metric provides information regarding the replication lag based on offsets. The metric is available both on the cluster and the topic level. The calculation of this metric might add latency to replications and impact SRM performance. If you are experiencing performance issues, you can try setting this property to false to disable the calculation of replication-records-lag. Alternatively, you can try fine-tuning how SRM calculates replication-records-lag with the replication.records.lag.calc.period.ms and replication.records.lag.end.offset.timeout.ms properties.

replication.records.lag.calc.period.ms (default: 0)
Controls how frequently SRM calculates the replication-records-lag metric. The default value of 0 means that the metric is calculated continuously. Cloudera recommends configuring this property to 15000 ms (15 seconds) or higher if you are experiencing issues related to the calculation of replication-records-lag. A calculation frequency of 15 seconds or more results in the metric being available for consumption without any significant impact on SRM performance.

replication.records.lag.end.offset.timeout.ms (default: 60000)
Specifies the Kafka end offset timeout value used for calculating the replication-records-lag metric. Setting this property to a value lower than the default 60000 ms (1 minute) might reduce the latency of calculating replication-records-lag; however, the calculation might fail. A value higher than the default can help avoid metric calculation failures, but might increase replication latency and lower SRM performance.
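As an illustration, a hypothetical snippet that could be added to Streams Replication Manager's Replication Configs, applying the tuning recommendations above (the values are examples, not defaults):
replication.records.lag.calc.period.ms=15000
replication.records.lag.end.offset.timeout.ms=120000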
Kafka Connect fails to start on a FIPS cluster
New properties are introduced for Kafka and Kafka Connect. These properties enable you to fine-tune the retry behavior of the Kafka and Kafka Connect Ranger plugins. Configuring these properties can help you avoid retry and timeout-related communication failures between the Kafka and Ranger services. The properties introduced are as follows:
  • Ranger Kafka Plugin Policy Rest Client Retry Interval Ms
  • Ranger Kafka Plugin Policy Rest Client Max Retry Attempts
  • Ranger Kafka Connect Plugin Policy Rest Client Retry Interval Ms
  • Ranger Kafka Connect Plugin Policy Rest Client Max Retry Attempts
For more information, see Kafka Properties in Cloudera Runtime 7.1.9.

Kudu

Added support for Python 3.10
Kudu now supports Python 3.10. For more information regarding Python setup, see Kudu Python client.
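The following is a minimal sketch that exercises the Kudu Python client under Python 3.10; the master host, port, and table name are hypothetical and assume an existing table:
# Connect to a Kudu cluster and read back all rows from an existing table.
import kudu

client = kudu.connect(host='kudu-master.example.com', port=7051)
table = client.table('my_table')
scanner = table.scanner()
scanner.open()
print(scanner.read_all_tuples())  # rows are returned as a list of tuples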

Impala

Added support for ZDU
Impala now supports zero-downtime upgrades (ZDU) in High Availability clusters. For details, see Configuring components before starting ZDU.
Added support for Impala High Availability
Impala now supports High Availability (HA) by deploying pairs of StateStore and Catalog instances in primary/standby mode, ensuring continuous operation during failures. For details, see Configuring Impala for High Availability.

Key HSM

CDPD-66512: Enabled FIPS for Luna 7 server
Added support for Key HSM with a FIPS-enabled Luna server.

Key Trustee Server

KT-7452: Support added for RHEL 9
Added RHEL 9 support for KTS. You must manually install Python 2 and the crontabs package to run keytrustee-server on RHEL 9.
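For example, you can install the crontabs package on a RHEL 9 host as follows; Python 2 is not shipped in the RHEL 9 repositories, so it must be obtained separately (for example, built from source):
sudo dnf install -y crontabs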

Navigator Encrypt

CDPD-61686: Support added for RHEL 9.2
Added RHEL 9.2 support for Navigator Encrypt.
KT-7440: Support added for Ubuntu 22
Added Ubuntu 22 support for NavEncrypt.
KT-7513: Navigator Encrypt generates keytab file when working with KMS
When using KMS as the key manager, Navigator Encrypt generates a keytab file that it uses to authenticate with Ranger KMS.

Ranger

There are no new features for Ranger.

Ranger KMS

CDPD-66512: Enabled FIPS for Luna 7 server
Added support for Ranger KMS and Ranger KMS KTS with a FIPS-enabled Luna server.
CDPD-67530: Support added for FIPS on JDK 17
Ranger KMS now supports FIPS on JDK 17.
CDPD-67536: Support added for Python 3.10
Ranger KMS now supports Python 3.10.
CDPD-68973/CDPD-68284: Support added for the Oracle 23c database
Ranger KMS supports the Oracle 23c database.
CDPD-69178: Support for ext libraries in higher versions of Java
During Luna client installation, you need to copy the Luna API JAR files to the JRE ext library. This ext directory does not exist in Java 9 and higher, including Java 11. Cloudera added support for ext libraries in higher versions of Java.

Sqoop

Secure options to provide Hive password during a Sqoop import
When you import data into Hive using Sqoop and LDAP authentication is enabled for Hive, having to set the Hive password directly on the command line poses a potential vulnerability. Passwords provided in plaintext on the command line are susceptible to unauthorized access or interception, compromising sensitive credentials and, subsequently, the security of the entire data transfer process.

Learn about the secure options that you can use to provide the Hive password during Sqoop-Hive imports instead of the earlier way of providing the password as plaintext in the command-line interface. For more information, see Secure options to provide Hive password during a Sqoop import.

CDS 3.3 Powered by Apache Spark

Apache Spark 3 integration with Schema Registry
Apache Spark 3 integrated with Schema Registry provides a library that leverages Schema Registry for managing Spark schemas and for serializing and deserializing messages in Spark data sources and sinks. For more information, see Apache Spark 3 integration with Schema Registry.

YARN Queue Manager

Support for YARN Queue Manager data migration from H2 to PostgreSQL database
You can now migrate YARN Queue Manager from an H2 database to a PostgreSQL database after installation or upgrade. All your existing data in the H2 database is transferred to PostgreSQL, and YARN Queue Manager then establishes connections using the PostgreSQL database.
For more information, see Migrating from H2 to PostgreSQL database in YARN Queue Manager.
YARN Queue Manager UI behavior in mixed resource allocation mode
The mixed resource allocation mode in YARN is supported through safety valves. If you open the Queue Manager UI or access the Queue Manager APIs while this mode is active, Queue Manager blocks access and informs you that mixed resource allocation is active and that Queue Manager is inaccessible until complete compatibility is achieved.
For more information, see YARN Queue Manager UI behavior in mixed resource allocation mode.