List of configurations copied from the base cluster to CDW on Private Cloud

The Cloudera Data Warehouse (CDW) data service on Private Cloud has different configurations than the base cluster. When you activate an environment in CDW, configurations such as default file format, compression type, and transactional type are copied from the base cluster to CDW by default. This enables workload migration from base clusters to CDW data service.

Understanding the scenarios in which the configurations are copied from base to CDW

If you upgrade the platform from 1.5.0 to 1.5.1, for example, then the configuration of an existing environments stays the same as before. The configurations are not copied from the base cluster. To copy configurations from the base cluster, you must reactivate the environment.

On CDW environments that have received the base cluster configurations: If you change the configurations on the base cluster, refresh the Virtual Warehouse to obtain the updates base-cluster configurations by clicking > Refresh on the Virtual Warehouse tile.

The CDW web interface displays all the current configurations. If the Impala or the Hive on Tez service does not exist on the base cluster or, if the specific configuration is empty on the base cluster, then the default values from the Virtual Warehouse are used.

If you do not want to use the base cluster configuration, then you can disable the Copy configurations from base cluster to CDW option from the Advanced Configurations > Advanced Settings page before activating the environment.

The following table provides the list of base cluster Impala configurations that are be copied to CDW upon activating the environment:
Table 1. Base cluster configuration for Impala
Configuration category Base cluster configuration Description
Default query option (default_query_options) default_file_format The default file format for the CREATE TABLE statement, for example Parquet. The default value is Parquet.
default_transactional_type The default transactional type, for example insert_only or none. Creates insert-only ACID tables by default. Does not apply to external tables. Default value is insert_only.
timezone Defines the timezone used for conversions between UTC and the local time. If not set, Impala uses the system time zone where the coordinator Impalad runs. As query options are not sent to the Coordinator immediately, the timezones are validated only when the query runs.
parquet_array_resolution Controls the behavior of the indexed-based resolution for nested arrays in Parquet.
parquet_fallback_schema_resolution Allows Impala to look up columns within Parquet files by column name, rather than column order, when necessary. The allowed values are: POSITION (0) and NAME (1).
allow_erasure_coded_files Enables or disables the support for erasure coded files in Impala. The default value is false. When set to false, Impala returns an error when a query requires scanning an erasure coded file.
max_row_size Ensures that Impala can process rows of at least the specified size. Applies when constructing intermediate or final rows in the result set. Used to prevent out-of-control memory use when accessing columns containing huge strings.
compression_codec The underlying compression for Parquet data files when Impala writes them using the INSERT statement.
Startup option for Impala daemon (impalad) fe_service_threads Specifies the maximum number of concurrent client connections or threads allowed to serve client requests. If this option is not set on the base cluster, then the default value used is 96.

Ensure that the value of this property is at least 96. A lower value can degrade performance.

Timeout options idle_query_timeout Sets the idle query timeout value for the session, in seconds. It is copied from the base cluster if it is greater than 0. If this option is not set on the base cluster, then the default value is 600.
idle_session_timeout The time in seconds after which an idle session is cancelled. It is copied from the base cluster if it is greater than 0. If this option is not set on the base cluster, then the default value is 1200.
TLS/SSL version and ciphers ssl_minimum_version Controls the allowed versions of TLS/SSL used by Impala. Starting with Impala 4.0, the default value is tlsv1.2.
ssl_cipher_list Used to specify the allowed set of TLS ciphers that are used by Impala.
The following table provides the list of base cluster Hive on Tez configurations that are copied to CDW upon activating the environment:
Table 2. Base cluster configuration for Hive on Tez
Base cluster configuration Description
hive.create.as.insert.only Used to specify whether the eligible tables should be created as ACID insert-only tables by default. Does not apply to external tables that use storage handlers. If this property is not set on the base cluster, then the default value is true.
hive.create.as.acid Used to specify whether the eligible tables should be created as full ACID tables by default. Does not apply to external tables that use storage handlers. If this property is not set on the base cluster, then the default value is true.
hive.default.fileformat The default file format for the CREATE TABLE statement. The default value is TextFile.
hive.default.fileformat.managed The default file format for the CREATE TABLE statement applied to the managed tables only. External tables are created with default file format. The default value is ORC.
hive.local.time.zone Sets the timezone for displaying and interpreting time stamps. If the value of this property is either set to LOCAL, is not specified, or is an incorrect timezone, then the system default timezone is used.
hive.external.table.purge.default If set to true, it sets external.table.purge=true on the newly created external tables, which indicates that the table data should be deleted when the table is dropped. If set to false, it maintains the existing behavior in which the external tables do not delete data when the table is dropped.