Documentation Errata in Cloudera Runtime 7.1.7 SP1

Review the major enhancements, changes, additions, and corrections to the components in Cloudera Runtime 7.1.7 SP1, and learn how these improvements benefit you.

Hive

HWC secure access mode: As part of the Cloudera Runtime 7.1.7 SP1 release, Hive Warehouse Connector (HWC) introduces a secure access mode that offers fine-grained access control (FGAC), including column masking and row filtering, to secure managed (ACID), or even external, Hive table data that you query from Spark. Secure access mode requires you to set up an HDFS staging location to temporarily store the Hive files that users need to read from Spark. For details, see Reading data through HWC secure access mode.

Hue

  • You can now create a Hue username that is up to 150 characters long. The previous limit of 30 characters no longer applies.
  • You may not be able to use the pip command in CDP releases 7.1.7 and later. Running pip may fail with the following error: “ImportError: cannot import name chardet”. For a workaround, see Unable to use pip command in CDP.

Kudu

A new tool, kudu master unsafe_rebuild, has been added to reconstruct the master catalog from tablet metadata collected from tablet servers. You can use it in emergencies to restore access to tables when all masters are unavailable.

Ranger

Ranger has added support for the GCP Cloud and CipherTrust HSMs and enhanced the encryption algorithms supported for the Luna HSM. For details, see Integrating Components for Encrypting Data at Rest.

Platform Support Enhancements

New database versions: MariaDB 10.5 and PostgreSQL 14. For more information, see Cloudera Support Matrix.

Streams Replication Manager

The SRM Driver can now write the origin offset into the record header
SRM now supports a diagnostic feature in which the source offset of each replicated record is written into the record headers. You can turn the feature on by setting copy.source.offset.in.header.enabled to true. When enabled, the source offset is written into a header named mm2-source-offset in binary format. The schema of the header payload is available in the connect:mirror-client package; the class name is org.apache.kafka.connect.mirror.SourceOffsets. This feature is recommended only for diagnostic purposes, because the additional header increases the size of the replica topic.
SRM now waits for latest offset syncs and does not set the consumer offset into the future
The MirrorCheckpointConnector now checks the latest message in the offset sync topic at startup, and does not emit a checkpoint message until it has read all messages from the beginning of the topic up to and including that last message.
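
The startup behavior described above can be illustrated with a minimal Python sketch. This is not SRM's implementation; the CheckpointGate class and its method names are hypothetical and exist only to show the idea of holding back checkpoints until the offset sync backlog has been read.

```python
class CheckpointGate:
    """Holds back checkpoint emission until the offset sync backlog is read.

    Hypothetical illustration only; not part of SRM or Kafka.
    """

    def __init__(self, latest_sync_offset_at_startup):
        # Offset of the last message found in the offset sync topic at startup.
        # None means the topic was empty, so there is nothing to wait for.
        self.target = latest_sync_offset_at_startup
        self.ready = latest_sync_offset_at_startup is None

    def on_sync_record(self, offset):
        # Called for each offset sync record, read in order from the beginning.
        if self.target is not None and offset >= self.target:
            self.ready = True

    def may_emit_checkpoint(self):
        # Checkpoints are allowed only after the backlog has been consumed.
        return self.ready


gate = CheckpointGate(latest_sync_offset_at_startup=2)
for offset in range(3):
    gate.on_sync_record(offset)
    print(offset, gate.may_emit_checkpoint())
```

In this sketch the gate opens only once the record at the offset observed at startup has been processed, so a checkpoint is never computed from a partially read offset sync topic.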

As part of this improvement, a new configuration property, emit.checkpoints.end.offset.protection, is introduced. When this property is enabled, the MirrorCheckpointTask checks the end offset of the replicated topic before emitting a checkpoint, and limits the replicated offset to at most that value. With this behavior enabled, SRM no longer encounters the issue where, in certain situations, the replicated offset could be higher than the end offset of the replicated topic, producing a negative lag. The property is enabled by default, but can be configured using the Streams Replication Manager's Replication Configs property.
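
The effect of the end offset protection can be shown with a small arithmetic sketch in Python. The function names below are hypothetical and only model the behavior described above (capping the replicated offset at the end offset of the replicated topic); they are not SRM code.

```python
def protect_checkpoint_offset(translated_offset, replica_end_offset):
    # Hypothetical helper: cap the offset to be checkpointed at the
    # end offset of the replicated topic.
    return min(translated_offset, replica_end_offset)


def consumer_lag(end_offset, committed_offset):
    # Lag as usually computed: end offset minus the committed offset.
    return end_offset - committed_offset


replica_end = 100   # assumed end offset of the replicated topic
translated = 120    # assumed translated offset that overshoots the end

print(consumer_lag(replica_end, translated))
print(consumer_lag(replica_end, protect_checkpoint_offset(translated, replica_end)))
```

Without the cap, the example yields a lag of -20; with the cap, the lag is 0. Offsets that are already at or below the end offset pass through unchanged.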