List of configurations copied from the base cluster to CDW on Private Cloud
The Cloudera Data Warehouse (CDW) data service on Private Cloud has different configurations than the base cluster. When you activate an environment in CDW, configurations such as default file format, compression type, and transactional type are copied from the base cluster to CDW by default. This enables workload migration from base clusters to CDW data service.
Understanding the scenarios in which the configurations are copied from base to CDW
For example, if you upgrade the platform from 1.5.0 to 1.5.1, the configuration of existing environments stays the same as before. The configurations are not copied from the base cluster. To copy configurations from the base cluster, you must reactivate the environment.
On CDW environments that have received the base cluster configurations: if you change the configurations on the base cluster, refresh the Virtual Warehouse from its tile to obtain the updated base cluster configurations. The CDW web interface displays all the current configurations. If the Impala or the Hive on Tez service does not exist on the base cluster, or if a specific configuration is empty on the base cluster, then the default values from the Virtual Warehouse are used.
If you do not want to use the base cluster configuration, you can disable the Copy configurations from base cluster to CDW option before activating the environment.
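
The fallback rules described above can be illustrated with a minimal Python sketch. It is not CDW code: the function name `effective_config`, its parameters, and the dictionary representation of the settings are hypothetical and chosen only for illustration. The sketch assumes that a missing service is represented by `None` and that an "empty" configuration means a missing or blank value.

```python
from typing import Mapping, Optional


def effective_config(
    vw_defaults: Mapping[str, str],
    base_cluster: Optional[Mapping[str, Optional[str]]],
    copy_from_base: bool = True,
) -> dict:
    """Illustrative model of which values a Virtual Warehouse ends up with.

    vw_defaults    -- Virtual Warehouse defaults for the copied settings
    base_cluster   -- base cluster values, or None if the service
                      (Impala or Hive on Tez) does not exist there
    copy_from_base -- models the "Copy configurations from base cluster
                      to CDW" option (hypothetical parameter name)
    """
    result = dict(vw_defaults)

    # Option disabled, or service missing on the base cluster:
    # keep the Virtual Warehouse defaults untouched.
    if not copy_from_base or base_cluster is None:
        return result

    for key in vw_defaults:
        value = base_cluster.get(key)
        # An empty setting on the base cluster is not copied.
        if value is not None and value.strip() != "":
            result[key] = value
    return result


# Default values below are the Hive defaults listed in this topic.
defaults = {
    "hive.default.fileformat": "TextFile",
    "hive.default.fileformat.managed": "ORC",
}
base = {"hive.default.fileformat": "Parquet", "hive.default.fileformat.managed": ""}
print(effective_config(defaults, base))
```

In this model, only `hive.default.fileformat` is taken from the base cluster, because the other setting is empty there.
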
| Configuration category | Base cluster configuration | Description |
|---|---|---|
| Default query option (default_query_options) | default_file_format | The default file format for the CREATE TABLE statement, for example Parquet. The default value is Parquet. |
| | default_transactional_type | The default transactional type, for example insert_only or none. Creates insert-only ACID tables by default. Does not apply to external tables. The default value is insert_only. |
| | timezone | Defines the timezone used for conversions between UTC and the local time. If not set, Impala uses the system time zone where the coordinator Impalad runs. As query options are not sent to the Coordinator immediately, the timezones are validated only when the query runs. |
| | parquet_array_resolution | Controls the behavior of the index-based resolution for nested arrays in Parquet. |
| | parquet_fallback_schema_resolution | Allows Impala to look up columns within Parquet files by column name, rather than column order, when necessary. The allowed values are: POSITION (0) and NAME (1). |
| | allow_erasure_coded_files | Enables or disables the support for erasure coded files in Impala. The default value is false. When set to false, Impala returns an error when a query requires scanning an erasure coded file. |
| | max_row_size | Ensures that Impala can process rows of at least the specified size. Applies when constructing intermediate or final rows in the result set. Used to prevent out-of-control memory use when accessing columns containing huge strings. |
| | compression_codec | The underlying compression for Parquet data files when Impala writes them using the INSERT statement. |
| Startup option for Impala daemon (impalad) | fe_service_threads | Specifies the maximum number of concurrent client connections or threads allowed to serve client requests. If this option is not set on the base cluster, then the default value used is 96. Ensure that the value of this property is at least 96. A lower value can degrade performance. |
| Timeout options | idle_query_timeout | Sets the idle query timeout value for the session, in seconds. It is copied from the base cluster if it is greater than 0. If this option is not set on the base cluster, then the default value is 600. |
| | idle_session_timeout | The time in seconds after which an idle session is cancelled. It is copied from the base cluster if it is greater than 0. If this option is not set on the base cluster, then the default value is 1200. |
| TLS/SSL version and ciphers | ssl_minimum_version | Controls the allowed versions of TLS/SSL used by Impala. Starting with Impala 4.0, the default value is tlsv1.2. |
| | ssl_cipher_list | Used to specify the allowed set of TLS ciphers that are used by Impala. |
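
The numeric rules in the table (timeouts copied only when greater than 0, and the minimum of 96 for fe_service_threads) can be sketched in Python as follows. This is only an illustration of the documented rules, not the CDW implementation. The function names are hypothetical, and the sketch assumes that a base cluster value of 0 or less is treated the same way as an unset value, which the table does not state explicitly.

```python
from typing import Optional, Tuple

# Defaults as listed in the table above.
DEFAULT_IDLE_QUERY_TIMEOUT_S = 600
DEFAULT_IDLE_SESSION_TIMEOUT_S = 1200
DEFAULT_FE_SERVICE_THREADS = 96


def copied_timeout(base_value: Optional[int], default: int) -> int:
    """Use the base cluster timeout only if it is greater than 0.

    Assumption: an unset (None) or non-positive base value falls back
    to the documented default.
    """
    if base_value is not None and base_value > 0:
        return base_value
    return default


def copied_fe_service_threads(base_value: Optional[int]) -> Tuple[int, bool]:
    """Return (value, meets_minimum) for fe_service_threads.

    The value is the base cluster setting, or 96 if it is not set.
    meets_minimum is False when the value is below the recommended 96.
    """
    value = DEFAULT_FE_SERVICE_THREADS if base_value is None else base_value
    return value, value >= DEFAULT_FE_SERVICE_THREADS


print(copied_timeout(300, DEFAULT_IDLE_QUERY_TIMEOUT_S))     # base value is used
print(copied_timeout(None, DEFAULT_IDLE_SESSION_TIMEOUT_S))  # default is used
print(copied_fe_service_threads(64))                         # flagged as below 96
```
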
| Base cluster configuration | Description |
|---|---|
| hive.create.as.insert.only | Used to specify whether the eligible tables should be created as ACID insert-only tables by default. Does not apply to external tables that use storage handlers. If this property is not set on the base cluster, then the default value is true. |
| hive.create.as.acid | Used to specify whether the eligible tables should be created as full ACID tables by default. Does not apply to external tables that use storage handlers. If this property is not set on the base cluster, then the default value is true. |
| hive.default.fileformat | The default file format for the CREATE TABLE statement. The default value is TextFile. |
| hive.default.fileformat.managed | The default file format for the CREATE TABLE statement applied to the managed tables only. External tables are created with default file format. The default value is ORC. |
| hive.local.time.zone | Sets the timezone for displaying and interpreting time stamps. If the value of this property is either set to LOCAL, is not specified, or is an incorrect timezone, then the system default timezone is used. |
| hive.external.table.purge.default | If set to true, it sets external.table.purge=true on the newly created external tables, which indicates that the table data should be deleted when the table is dropped. If set to false, it maintains the existing behavior in which the external tables do not delete data when the table is dropped. |
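
The fallback behavior described for hive.local.time.zone can be approximated with the short Python sketch below. It is an illustration only: Hive performs its own validation, and the sketch substitutes Python's `zoneinfo` module (Python 3.9 or later) as a stand-in for that check. The function name `resolve_hive_time_zone` and the `system_default` parameter are hypothetical.

```python
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_hive_time_zone(value: Optional[str], system_default: str) -> str:
    """Return the timezone name that would be in effect.

    Falls back to system_default when the property is not specified,
    is set to LOCAL, or does not name a known timezone.
    """
    if value is None or value.strip() == "" or value.strip() == "LOCAL":
        return system_default

    name = value.strip()
    try:
        # Stand-in validity check (assumption): look the name up in the
        # tz database available to Python.
        ZoneInfo(name)
    except (KeyError, ValueError, OSError):
        # Unknown or malformed timezone name: use the system default.
        return system_default
    return name


print(resolve_hive_time_zone("LOCAL", "UTC"))
print(resolve_hive_time_zone("Not/AZone", "UTC"))
```

Both calls fall back to the supplied system default. A valid name such as `America/Los_Angeles` would be returned unchanged, provided the tz database is available on the machine running the sketch.
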