Hive Configuration Requirements and Recommendations

You need to set certain Hive and HiveServer (HS2) configuration properties after upgrading. You review recommendations for setting up CDP Private Cloud Base for your needs, and understand which configurations remain unchanged after upgrading, which impact performance, and default values.

Requirements and Recommendations

The following table includes the Hive service and HiveServer properties that the upgrade process changes. Other property values (not shown) are carried over unchanged from CDH or HDP to CDP
  • Set After Upgrade column: properties you need to manually configure after the upgrade to CDP. Pre-existing customized values are not preserved after the upgrade.
  • Default Recommended column: properties that the upgrade process changes to a new value that you are strongly advised to use.
  • Impacts Performance column: properties changed by the upgrade process that you set to tune performance.
  • Safety Value Overrides column: How the upgrade process handles Safety Valve overrides.
    • Disregards: the upgrade process removes any old CDH Safety Valve configuration snippets from the new CDP configuration.
    • Preserves means the upgrade process carries over any old CDH snippets to the new CDP configuration.
    • Not applicable means the value of the old parameter is preserved.
  • Visible in CM column: property is visible in Cloudera Manager after upgrading.

    If a property is not visible, and you want to configure it, use the Cloudera Manager Safety Valve (see link below) to safely add the parameter to the correct file, for example to a cluster-wide, hive-site.xml file.

Table 1.
Property Set After Upgrade Default Recommended Impacts Performance New Feature Safety Valve Overrides Visible in CM
datanucleus.connectionPool.maxPoolSize Preserve
datanucleus.connectionPoolingType Disregard
hive.async.log.enabled Disregard Not applicable Preserve Preserve
hive.cbo.enable Disregard Disregard
hive.compactor.worker.threads Disregard
hive.compute.query.using.stats Disregard
hive.conf.hidden.list Disregard
hive.conf.restricted.list Disregard
hive.default.fileformat.managed Disregard
hive.default.rcfile.serde Preserve
hive.driver.parallel.compilation Disregard
hive.exec.dynamic.partition.mode Disregard
hive.exec.max.dynamic.partitions Preserve
hive.exec.max.dynamic.partitions.pernode Preserve Disregard
hive.exec.reducers.max ✓ or other prime number Not applicable
hive.execution.engine Disregard
hive.fetch.task.conversion Not applicable
hive.fetch.task.conversion.threshold Not appliable
hive.hashtable.key.count.adjustment Preserve
hive.limit.optimize.enable Disregard
hive.limit.pushdown.memory.usage Not Applicable
hive.mapjoin.hybridgrace.hashtable Disregard
hive.mapred.reduce.tasks.speculative.execution Disregard
hive.metastore.aggregate.stats.cache.enabled Disregard
hive.metastore.disallow.incompatible.col.type.changes Disregard Disregard
hive.metastore.event.message.factory Disregard
hive.metastore.uri.selection Disregard
hive.metastore.warehouse.dir Preserve
hive.optimize.metadataonly Disregard
hive.optimize.point.lookup.min Disregard
hive.prewarm.numcontainers Disregard
hive.script.operator.env.blacklist Disregard Disregard Disregard
hive.server2.enable.doAs Disregard
hive.server2.idle.session.timeout Not applicable
hive.server2.max.start.attempts Preserve Preserve Disregard
hive.server2.tez.initialize.default.sessions Disregard
hive.server2.thrift.max.worker.threads Not Applicable
hive.server2.thrift.resultset.max.fetch.size Preserve
hive.service.metrics.file.location Disregard
hive.stats.column.autogather Disregard
hive.stats.deserialization.factor Disregard Disregard Disregard
hive.tez.bucket.pruning Disregard
hive.tez.container.size Disregard
hive.tez.exec.print.summary Disregard
hive.txn.manager Disregard
hive.vectorized.execution.mapjoin.minmax.enabled Disregard Disregard
hive.vectorized.use.row.serde.deserialize Disregard