Apache Hive Incompatible Changes and Limitations

Metastore schema upgrade: CDH 5.2.0 includes Hive version 0.13.1. Upgrading from an earlier Hive version to Hive 0.13.1 or later requires a metastore schema upgrade.

CDH 5 includes a new offline tool called schematool; Cloudera recommends you use this tool to upgrade your metastore schema. See Using the Hive Schema Tool in CDH for more information.
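For example, with the metastore service stopped, an upgrade of a MySQL-backed metastore might look like the following; the -dbType value here is an assumption and must match the database that backs your metastore (for example mysql, postgres, oracle, or derby):

    schematool -dbType mysql -upgradeSchema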

Hive upgrade: Upgrading Hive from an earlier CDH 5.x release to CDH 5.2 or later requires several manual steps. Follow the upgrade guide closely. See Upgrading Hive.

Incompatible changes between any earlier CDH version and CDH 5.4.x:

  • CDH 5.2.0 and later clients cannot communicate with CDH 5.1.x and earlier servers. This means that you must upgrade the server before the clients.
  • As of CDH 5.2.0, DESCRIBE DATABASE returns two additional fields: owner_name and owner_type. The command continues to behave as expected if you identify fields by their (string) names, but can produce unexpected results if you identify fields by numeric index.
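    For example (a sketch; the pre-0.13 column list shown is based on the Hive documentation):

      DESCRIBE DATABASE default;
      -- Hive 0.12 and earlier: db_name, comment, location
      -- Hive 0.13 and later:   db_name, comment, location, owner_name, owner_type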
  • CDH 5.2.0 implements HIVE-6248, which includes some backward-incompatible changes to the HCatalog API.
  • The CDH 5.2 Hive JDBC driver is not wire-compatible with the CDH 5.1 version of HiveServer2. Make sure you upgrade Hive clients and all other Hive hosts in tandem: the server first, and then the clients.
  • HiveServer 1 is deprecated as of CDH 5.3, and will be removed in a future release of CDH. Users of HiveServer 1 should upgrade to HiveServer 2 as soon as possible. For more information, see HiveServer 2.
  • org.apache.hcatalog is deprecated as of CDH 5.3. All client-facing classes were moved from org.apache.hcatalog to org.apache.hive.hcatalog as of CDH 5.0 and the deprecated classes in org.apache.hcatalog will be removed altogether in a future release. If you are still using org.apache.hcatalog, you should move to org.apache.hive.hcatalog immediately.
  • Date partition columns: As of Hive 0.13, implemented in CDH 5.2, Hive validates the format of dates in partition columns if the columns are stored as the date type. A partition column containing a date in an invalid format can be neither used nor dropped after you upgrade to CDH 5.2 or higher. To avoid this problem, do one of the following:
    • Fix any invalid dates before you upgrade (see the example after the query below). Hive expects dates in partition columns to be in the form YYYY-MM-DD.
    • Store dates in partition columns as strings or integers.
    To find all partition-column values stored as dates, run the following SQL query against the Hive metastore database:
    SELECT "DBS"."NAME", "TBLS"."TBL_NAME", "PARTITION_KEY_VALS"."PART_KEY_VAL"
    FROM "PARTITION_KEY_VALS"
      INNER JOIN "PARTITIONS" ON "PARTITION_KEY_VALS"."PART_ID" = "PARTITIONS"."PART_ID"
      INNER JOIN "PARTITION_KEYS" ON "PARTITION_KEYS"."TBL_ID" = "PARTITIONS"."TBL_ID"
      INNER JOIN "TBLS" ON "TBLS"."TBL_ID" = "PARTITIONS"."TBL_ID"
      INNER JOIN "DBS" ON "DBS"."DB_ID" = "TBLS"."DB_ID"
        AND "PARTITION_KEYS"."INTEGER_IDX" ="PARTITION_KEY_VALS"."INTEGER_IDX"
        AND "PARTITION_KEYS"."PKEY_TYPE" = 'date';
  • Decimal precision and scale: As of CDH 5.4, Hive support for decimal precision and scale changes as follows:
    1. When decimal is used as a type, it means decimal(10, 0) rather than a precision of 38 with a variable scale.
    2. When Hive cannot determine the precision and scale of a decimal type (for example, a non-generic user-defined function (UDF) whose evaluate() method returns decimal), a precision and scale of (38, 18) is assumed. In previous versions, a precision of 38 and a variable scale were assumed. Cloudera recommends that you develop generic UDFs instead and specify the exact precision and scale.
    3. When a decimal value is assigned or cast to a different decimal type, rounding is used to handle cases in which the precision of the value is greater than that of the target decimal type, as long as the integer portion of the value can be preserved. In previous versions, if the value's precision was greater than 38 (the only allowed precision for the decimal type), the value was set to null, regardless of whether the integer portion could be preserved.
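    The following sketch illustrates changes 1 and 3 in HiveQL (table and column names are hypothetical; the exact rounding of specific values depends on your Hive version):

      -- Change 1: a bare decimal now means decimal(10,0), not precision 38 with variable scale.
      CREATE TABLE prices (amount decimal);    -- equivalent to decimal(10,0)

      -- Change 3: when the value's precision exceeds the target type's, Hive rounds
      -- as long as the integer portion still fits ...
      SELECT CAST(123.456 AS decimal(5,1));    -- returns 123.5 (rounded), not NULL

      -- ... and returns NULL only when the integer portion cannot be preserved.
      SELECT CAST(12345.6 AS decimal(3,1));    -- returns NULL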
  • Deprecation of HivePassThrough SerDe formats: As of CDH 5.4, HIVE-8910 changes how storage handlers use the HivePassThroughOutputFormat class. The change removes the class's empty default constructor, which causes org.apache.hadoop.util.ReflectionUtils.newInstance to fail with a NoSuchMethodException. The workaround is to re-create the affected Hive tables without the HivePassThrough SerDe formats.
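    A minimal sketch of the workaround, assuming a hypothetical table my_table created through a storage handler (the handler class name is also hypothetical; reproduce the clauses of your original table definition):

      -- Capture the existing definition, then drop and re-create the table
      -- so the metastore records the storage handler's own formats:
      SHOW CREATE TABLE my_table;
      DROP TABLE my_table;
      CREATE TABLE my_table (id INT, val STRING)
        STORED BY 'com.example.MyStorageHandler';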