Behavioral changes in Apache Hive

Learn about the change in certain functionality of Hive that has resulted in a change in behavior from the previously released version to this version of Cloudera Runtime.

Summary:
Change in the way dates are parsed from string by ignoring trailing invalid characters
Previous behavior:
HIVE-20007 introduced changes in the way dates were parsed from strings. SQL functions or date operations involving invalid dates returned "null".
New behavior:
HIVE-27586 extracts and returns a valid date from a string value if there is a valid date prefix in the string. This fix partially restores the behavior changes introduced as part of HIVE-20007 and also makes the current behavior of handling trailing invalid characters more consistent.
The following table illustrates the behavior changes before and after the fix:
String value Behavior (before HIVE-20007) Previous behavior (after HIVE-20007) Current behavior (after HIVE-27586)
2023-08-03_16:02:00 2023-08-03 null 2023-08-03
2023-08-03-16:02:00 2023-08-03 null 2023-08-03
2023-08-0316:02:00 2024-06-11 null 2023-08-03
03-08-2023 0009-02-12 null 0003-08-20
2023-08-03 GARBAGE 2023-08-03 2023-08-03 2023-08-03
2023-08-03TGARBAGE 2023-08-03 2023-08-03 2023-08-03
2023-08-03_GARBAGE 2023-08-03 null 2023-08-03
This change affects various Hive SQL functions and operators that accept dates from string values, such as CAST (V AS DATE), CAST (V AS TIMESTAMP), TO_DATE, DATE_ADD, DATE_DIFF, WEEKOFYEAR, DAYOFWEEK, and TRUNC.
Summary:
Change in the way date and timestamp values are parsed.
Previous behavior:
Some of the Hive date and timestamp functions used the SimpleDateFormat class for formatting and parsing date and timestamp. For more information, refer to the SimpleDateFormat class Javadocs.
New behavior:
The following Hive date and timestamp functions are now enhanced to use the DateTimeFormatter class for printing and parsing date and timestamp objects. For more information, refer to the DateTimeFormatter class Javadocs.
  • unix_timestamp(): This function is enhanced to use the DateTimeFormatter class for String format dates instead of the SimpleDateFormat class. For details, see HIVE-25458.
  • from_unixtime(): This function is now enhanced to consider leap seconds. For details, see HIVE-25403.
  • date_format(): This function previously returned the output in UTC time zone and is now enhanced to display the default user session time zone. For details, see HIVE-25093.
  • cast(): This function is enhanced to display NULL when an incorrect date or timestamp is casted. Prior to this enhancement, when an incorrect date was casted, the function returned a converted value. For example, cast ('2020-20-20' as date) resulted in '2021-08-20' instead of NULL.

    This is because the DateTimeFormatter class that is used to parse string into date or timestamp was set to ResolverStyle.LENIENT. This is now updated to use ResolverStyle.STRICT and returns NULL when an invalid date or timestamp is casted. For details, see HIVE-25306.

For more details about the syntax and behavior of these functions, see Hive LanguageManual UDF.

Summary:
Change in default value of the hive.server2.tez.initialize.default.sessions property
Previous behavior:
The default value for the hive.server2.tez.initialize.default.sessions property is set to "true"
New behavior:
The default value for the hive.server2.tez.initialize.default.sessions property is changed to "false" to prevent queries from waiting on the same Tez AM pool and thereby improving query performance.

If there are multiple queries running, you might notice that the queries are taking longer to complete because the default value for hive.server2.tez.sessions.per.default.queue is 1, which means only one query can run at a time. Therefore, depending on your resource availability and query concurrency/load on the server, you can set hive.server2.tez.initialize.default.sessions to "true" and increase the value of hive.server2.tez.sessions.per.default.queue.

Summary:
Change in default value of the hive.optimize.sort.dynamic.partition.threshold property
Previous behavior:
The default value for the hive.optimize.sort.dynamic.partition.threshold property is set to "0"
New behavior:
The default value for the hive.optimize.sort.dynamic.partition.threshold property is changed to "-1" to disable cost-based optimization entirely. This helps in addressing performance issues while loading data to Hive tables.