Apache Impala Incompatible Changes and Limitations

The Impala version covered by this documentation library contains the following incompatible changes. These are things such as file format changes, removed features, or changes to implementation, default configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.

Even added SQL statements or clauses can produce incompatibilities, if you have databases, tables, or columns whose names conflict with the new keywords. See Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.

Incompatible Changes Introduced in CDH 5.14.x / Impala 2.11.x

The following new or changed behaviors could cause issues during upgrades or cause changed behavior from prior releases, possibly requiring changes to source code:

  • In the [impala] section of the .impalarc file, you can specify 0 or 1 wherever a Boolean value is required. Under the [impala.query_options] section, values that are expected to be Boolean must be true, false, 0, or 1, otherwise Impala reports an error. Formerly, unrecognized Boolean values would only cause a warning.

  • Arithmetic expressions involving both DECIMAL and FLOAT or DOUBLE arguments now produce results of DECIMAL type instead of DOUBLE.

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for CDH 5.14.

Incompatible Changes Introduced in CDH 5.13.x / Impala 2.10.x

The following new or changed behaviors could cause issues during upgrades or cause changed behavior from prior releases, possibly requiring changes to source code:

  • The buffer pool feature changes the way memory is allocated during a query, and the thresholds at which operators within a query might spill to disk. Although in general the result is less memory usage and less spilling, it is possible that some queries that worked before (even if those queries were slow and inefficient) might now fail, requiring you to adjust query options related to the buffer pool. In particular, if a table contains rows whose total data size is greater than 512 KB, you must increase the setting for the MAX_ROW_SIZE query option.

    After upgrading to CDH 5.13 / Impala 2.10, follow the instructions in CDH 5.13 / Impala 2.10 to check if your queries are affected by these changes and to modify your configuration settings if so. This advice is especially important for any users who have increased the --read_size configuration setting from its default value of 8 MB.
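As a rough illustration of the MAX_ROW_SIZE adjustment described above, the following impala-shell sketch raises the limit for one session before querying a table with very large rows. The table name wide_rows, the column id, and the 1 MB value are hypothetical; pick a value based on the largest row in your own data.

```sql
-- Hypothetical example: wide_rows is an assumed table whose rows can exceed 512 KB.
-- The value is given in bytes: 1048576 bytes = 1 MB.
SET MAX_ROW_SIZE=1048576;

-- A sort has to buffer whole rows, so it is a typical place for the limit to matter.
SELECT * FROM wide_rows ORDER BY id LIMIT 10;
```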

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for CDH 5.13.

Incompatible Changes Introduced in CDH 5.12.x / Impala 2.9.x

For the full list of issues closed in this release, including any that introduce behavior changes or incompatibilities, see the changelog for CDH 5.12.

Impala Incompatible Changes Introduced in CDH 5.11.x / Impala 2.8.x

Because the core Impala functionality in CDH 5.11.x is very close to that of CDH 5.10.x, there are no new incompatible changes for this release.

Impala Incompatible Changes Introduced in CDH 5.10.x / Impala 2.8.x

  • Llama support is removed completely from Impala. Related flags (--enable_rm) and query options (such as V_CPU_CORES) remain but do not have any effect.

    If --enable_rm is passed to Impala, a warning is printed to the log on startup.

  • The syntax related to Kudu tables includes a number of new reserved words, such as COMPRESSION, DEFAULT, and ENCODING, that might conflict with names of existing tables, columns, or other identifiers from older Impala versions. See Impala Reserved Words for the full list of reserved words.

  • The DDL syntax for Kudu tables, particularly in the CREATE TABLE statement, is different from the special impala_next fork that was previously used for accessing Kudu tables from Impala:

    • The DISTRIBUTE BY clause is now PARTITION BY.

    • The INTO N BUCKETS clause is now PARTITIONS N.

    • The SPLIT ROWS clause is replaced by different syntax for specifying the ranges covered by each partition.

  • The DESCRIBE output for Kudu tables includes several extra columns.

  • Non-primary-key columns can contain NULL values by default. The SHOW CREATE TABLE output for these columns displays the NULL attribute. There was a period during early experimental versions of Impala + Kudu where non-primary-key columns had the NOT NULL attribute by default.

  • The IGNORE keyword that was present in early experimental versions of Impala + Kudu is no longer present. The behavior of the IGNORE keyword is now the default: DML statements continue with warnings, instead of failing with errors, if they encounter conditions such as "primary key already exists" for an INSERT statement or "primary key already deleted" for a DELETE statement.

  • The replication factor for Kudu tables must be an odd number.

  • A UDF compiled into an LLVM IR bitcode module (.bc) might encounter a runtime error when native code generation is turned off by setting the query option DISABLE_CODEGEN=1. This issue also applies when running a built-in or native UDF with more than 20 arguments. See IMPALA-4432 for details. As a workaround, either turn native code generation back on with the query option DISABLE_CODEGEN=0, or use the regular UDF compilation path that does not produce an IR module.
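To make the Kudu DDL changes in this list more concrete, here is a minimal sketch of CREATE TABLE statements in the newer syntax. The table and column names are hypothetical, and the partition counts and ranges are arbitrary; the statements assume a cluster where Impala is already configured to reach Kudu.

```sql
-- Hash partitioning: the clause that formerly used DISTRIBUTE BY ... INTO N BUCKETS.
CREATE TABLE kudu_events_example (
  id BIGINT,
  event_name STRING,          -- non-primary-key column, nullable by default
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- Range partitioning: the replacement for the former SPLIT ROWS clause.
CREATE TABLE kudu_ranges_example (
  id BIGINT,
  payload STRING,
  PRIMARY KEY (id)
)
PARTITION BY RANGE (id) (
  PARTITION VALUES < 100,
  PARTITION 100 <= VALUES < 200,
  PARTITION 200 <= VALUES
)
STORED AS KUDU;

-- Inspect the extra DESCRIBE columns and the NULL attributes mentioned above.
DESCRIBE kudu_events_example;
SHOW CREATE TABLE kudu_events_example;
```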

Impala Incompatible Changes Introduced in CDH 5.9.x / Impala 2.7.x

  • Bug fixes related to parsing of floating-point values (IMPALA-1731 and IMPALA-3868) can change the results of casting strings that represent invalid floating-point values. For example, a string value beginning or ending with inf, such as 1.23inf or infinite, is now converted to NULL when interpreted as a floating-point value. Formerly, such strings were interpreted as the special "infinity" value when converting from string to floating-point. Similarly, now only the string NaN (case-sensitive) is interpreted as the special "not a number" value. String values containing multiple dots, such as 3..141 or 3.1.4.1, are now interpreted as NULL rather than being converted to valid floating-point values.
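The following queries are one way to check how a given release treats the string forms mentioned above. The comments restate the behavior described in this section rather than captured output.

```sql
-- Per the description above, these invalid forms are converted to NULL.
SELECT CAST('1.23inf' AS DOUBLE);
SELECT CAST('infinite' AS DOUBLE);
SELECT CAST('3..141' AS DOUBLE);
SELECT CAST('3.1.4.1' AS DOUBLE);

-- Only this exact, case-sensitive spelling is described as "not a number".
SELECT CAST('NaN' AS DOUBLE);
```

Running the same statements before and after the upgrade on a test cluster shows whether any data-loading logic depends on the old conversions.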

Impala Incompatible Changes Introduced in CDH 5.8.x / Impala 2.6.x

  • The default for the RUNTIME_FILTER_MODE query option is changed to GLOBAL (the highest setting).

  • The RUNTIME_BLOOM_FILTER_SIZE setting is now only used as a fallback if statistics are not available; otherwise, Impala uses the statistics to estimate the appropriate size to use for each filter.

  • Admission control and dynamic resource pools are enabled by default. When upgrading from an earlier release, you must turn on these settings yourself if they are not already enabled. See Admission Control and Query Queuing for details about admission control.

  • Impala reserves some new keywords, in preparation for support for Kudu syntax: buckets, delete, distribute, hash, ignore, split, and update.

  • For Kerberized clusters, the Catalog service now uses the Kerberos principal instead of the operating system user that runs the catalogd daemon. This eliminates the requirement to configure a hadoop.user.group.static.mapping.overrides setting to put the OS user into the Sentry administrative group, on clusters where the principal and the OS user name for this user are different.

  • The mechanism for interpreting DECIMAL literals is improved, no longer going through an intermediate conversion step to DOUBLE:

    • Casting a DECIMAL value to TIMESTAMP produces a more precise value for the TIMESTAMP than formerly.

    • Certain function calls involving DECIMAL literals now succeed, when formerly they failed due to lack of a function signature with a DOUBLE argument.

  • Improved type accuracy for CASE return values. If all WHEN clauses of the CASE expression are of CHAR type, the final result is also CHAR instead of being converted to STRING.

  • The initial release of CDH 5.7 / Impala 2.5 sometimes has a higher peak memory usage than in previous releases while reading Parquet files. The following query options might help to reduce memory consumption in the Parquet scanner:

    • Reduce the number of scanner threads, for example: set num_scanner_threads=30

    • Reduce the batch size, for example: set batch_size=512

    • Increase the memory limit, for example: set mem_limit=64g

    You can track the status of the fix for this issue at IMPALA-3662.

  • The S3_SKIP_INSERT_STAGING query option, which is enabled by default, increases the speed of INSERT operations for S3 tables. The speedup applies to regular INSERT, but not INSERT OVERWRITE. The tradeoff is the possibility of inconsistent output files left behind if a node fails during INSERT execution. See S3_SKIP_INSERT_STAGING Query Option (CDH 5.8 or higher only) for details.
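If the new defaults in this list do not suit a particular workload, the query options named above can be overridden for a session in impala-shell. This is a minimal sketch; the values are illustrative, not recommendations.

```sql
-- Choose a setting other than the new GLOBAL default
-- (LOCAL is one of the accepted values: OFF, LOCAL, GLOBAL).
SET RUNTIME_FILTER_MODE=LOCAL;

-- Fallback filter size in bytes; per the text, used only when statistics are unavailable.
SET RUNTIME_BLOOM_FILTER_SIZE=1048576;

-- Give up the faster S3 INSERT path in exchange for the staging step.
SET S3_SKIP_INSERT_STAGING=false;
```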

Certain features are turned off by default, to avoid regressions or unexpected behavior following an upgrade. Consider turning on these features after suitable testing:

  • Impala now recognizes the auth_to_local setting, specified through the HDFS configuration setting hadoop.security.auth_to_local. This feature is disabled by default; to enable it, specify --load_auth_to_local_rules=true in the impalad configuration settings.

  • A new query option, PARQUET_ANNOTATE_STRINGS_UTF8, makes Impala include the UTF-8 annotation metadata for STRING, CHAR, and VARCHAR columns in Parquet files created by INSERT or CREATE TABLE AS SELECT statements.

  • A new query option, PARQUET_FALLBACK_SCHEMA_RESOLUTION, lets Impala locate columns within Parquet files based on column name rather than ordinal position. This enhancement improves interoperability with applications that write Parquet files with a different order or subset of columns than are used in the Impala table.
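The two Parquet query options above can be tried out per session after suitable testing. A short sketch, assuming an impala-shell session:

```sql
-- Write the UTF-8 annotation for STRING, CHAR, and VARCHAR columns
-- in Parquet files produced by INSERT or CREATE TABLE AS SELECT.
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;

-- Resolve Parquet columns by name rather than by ordinal position.
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
```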

Impala Incompatible Changes Introduced in CDH 5.7.x / Impala 2.5.x

  • The admission control default limit for concurrent queries (the max requests setting) is now unlimited instead of 200.

  • Multiplying a mixture of DECIMAL and FLOAT or DOUBLE values now returns DOUBLE rather than DECIMAL. This change avoids some cases where an intermediate value would underflow or overflow and become NULL unexpectedly. The results of multiplying DECIMAL and FLOAT or DOUBLE might now be slightly less precise than before. Previously, the intermediate types and thus the final result depended on the exact order of the values of different types being multiplied, which made the final result values difficult to reason about.

  • Previously, the _ and % wildcard characters for the LIKE operator would not match characters on the second or subsequent lines of multi-line string values. The fix for issue IMPALA-2204 causes the wildcard matching to apply to the entire string for values containing embedded \n characters. This could cause different results than in previous Impala releases for identical queries on identical data.

  • Formerly, all Impala UDFs and UDAs required running the CREATE FUNCTION statements to re-create them after each catalogd restart. In CDH 5.7 / Impala 2.5 and higher, functions written in C++ are persisted across restarts, and the requirement to re-create functions only applies to functions written in Java. Adapt any function-reloading logic that you have added to your Impala environment.

  • CREATE TABLE LIKE no longer inherits HDFS caching settings from the source table.

  • The SHOW DATABASES statement now returns two columns rather than one. The second column includes the associated comment string, if any, for each database. Adjust any application code that examines the list of databases and assumes the result set contains only a single column.

  • The output of the SHOW FUNCTIONS statement includes two new columns, showing the kind of the function (for example, BUILTIN) and whether or not the function persists across catalog server restarts. For example, the SHOW FUNCTIONS output for the _impala_builtins database starts with:

    +--------------+-------------------------------------------------+-------------+---------------+
    | return type  | signature                                       | binary type | is persistent |
    +--------------+-------------------------------------------------+-------------+---------------+
    | BIGINT       | abs(BIGINT)                                     | BUILTIN     | true          |
    | DECIMAL(*,*) | abs(DECIMAL(*,*))                               | BUILTIN     | true          |
    | DOUBLE       | abs(DOUBLE)                                     | BUILTIN     | true          |
    ...
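The following statements are a small sketch for spot-checking several of the changes in this list on a test cluster. The string literal is made up for the example, and the comments restate the behavior described above rather than captured output.

```sql
-- LIKE wildcards are described as now matching across embedded \n characters.
SELECT 'first line\nsecond line' LIKE '%second%';

-- Mixed DECIMAL and DOUBLE multiplication is described as returning DOUBLE.
SELECT typeof(CAST(2.5 AS DECIMAL(5,2)) * CAST(1.5 AS DOUBLE));

-- Result sets that gained columns; check any code that parses them.
SHOW DATABASES;
SHOW FUNCTIONS IN _impala_builtins LIKE 'abs*';
```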
    

Impala Incompatible Changes Introduced in CDH 5.6.x / Impala 2.4.x

The Impala feature set for CDH 5.6 is the same as for CDH 5.5. Therefore, there are no incompatible changes for Impala introduced in CDH 5.6.

Impala Incompatible Changes Introduced in CDH 5.5.x / Impala 2.3.x

  • If Impala encounters a Parquet file that is invalid because of an incorrect magic number, the query skips the file. This change is caused by the fix for issue IMPALA-2130. Previously, Impala would attempt to read the file despite the possibility that the file was corrupted.

  • Previously, calls to overloaded built-in functions could treat parameters as DOUBLE or FLOAT when no overload had a signature that matched the exact argument types. Now Impala prefers the function signature with DECIMAL parameters in this case. This change avoids a possible loss of precision in function calls such as greatest(0, 99999.8888); now both parameters are treated as DECIMAL rather than DOUBLE, avoiding any loss of precision in the fractional value. This could cause slightly different results than in previous Impala releases for certain function calls.

  • Formerly, adding or subtracting a large interval value to a TIMESTAMP could produce a nonsensical result. Now when the result goes outside the range of TIMESTAMP values, Impala returns NULL.

  • Formerly, it was possible to accidentally create a table with identical row and column delimiters. This could happen unintentionally, when specifying one of the delimiters and using the default value for the other. Now an attempt to use identical delimiters still succeeds, but displays a warning message.

  • Formerly, Impala could include snippets of table data in log files by default, for example when reporting conversion errors for data values. Now any such log messages are only produced at higher logging levels that you would enable only during debugging.
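Two of the changes above can be observed directly with short queries. This is a sketch only; the comments restate the behavior described in this section.

```sql
-- Shows which type Impala picks for the overloaded function call;
-- per the text, DECIMAL is now preferred over DOUBLE here.
SELECT typeof(greatest(0, 99999.8888));

-- Arithmetic that leaves the TIMESTAMP range is described as returning NULL.
SELECT CAST('9999-12-31 00:00:00' AS TIMESTAMP) + INTERVAL 1 YEAR;
```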

Impala Incompatible Changes Introduced in CDH 5.4.x

Changes to File Handling

Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any files with extensions .tmp or .copying are not considered part of the Impala table. The suffix matching is case-insensitive, so for example Impala ignores both .copying and .COPYING suffixes.

The log rotation feature in Impala 2.2.0 and higher means that older log files are now removed by default. The default is to preserve the latest 10 log files for each severity level, for each Impala-related daemon. If you have set up your own log rotation processes that expect older files to be present, either adjust your procedures or change the Impala -max_log_files setting. See Rotating Impala Logs for details.

Changes to Prerequisites

The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also worked on SSSE3-enabled processors.

Incompatible Changes Introduced in CDH 5.3.x

Changes to Prerequisites

Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to CDH 5.3 / Impala 2.1.

Changes to Output Format

The "small query" optimization feature introduces some new information in the EXPLAIN plan, which you might need to account for if you parse the text of the plan output.

New Reserved Words

New SQL syntax introduces additional reserved words: FOR, GRANT, REVOKE, ROLE, ROLES, INCREMENTAL. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.
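If an existing schema uses one of these words as an identifier, backtick quoting keeps queries working, and renaming removes the conflict permanently. A minimal sketch, assuming a hypothetical table named role with a column named for:

```sql
-- Quote the conflicting identifiers so existing queries keep working.
SELECT `for`, id FROM `role`;

-- Or rename the column and the table to avoid the conflict entirely.
ALTER TABLE `role` CHANGE `for` purpose STRING;
ALTER TABLE `role` RENAME TO user_roles;
```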

Incompatible Changes Introduced in Impala 2.0.5 / CDH 5.2.6

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.4 / CDH 5.2.5

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.3 / CDH 5.2.4

Incompatible Changes Introduced in Impala 2.0.2 / CDH 5.2.3

No incompatible changes.

Incompatible Changes Introduced in Impala 2.0.1 / CDH 5.2.1

  • The INSERT statement has always left behind a hidden work directory inside the data directory of the table. Formerly, this hidden work directory was named .impala_insert_staging. In Impala 2.0.1 and later, this directory name is changed to _impala_insert_staging. (While HDFS tools are expected to treat names beginning with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory, adjust them to use the new name.

  • The abs() function now takes a broader range of numeric types as arguments, and the return type is the same as the argument type.

  • Shorthand notation for character classes in regular expressions, such as \d for digit, is now available again in regular expression operators and functions such as regexp_extract() and regexp_replace(). Some other differences in regular expression behavior remain between Impala 1.x and Impala 2.x releases. See Incompatible Changes Introduced in Impala 2.0.0 / CDH 5.2.0 for details.
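As an illustration of the shorthand character classes, the queries below use \d with the two functions named above. The input string is made up; note that the backslash is doubled inside the SQL string literal.

```sql
-- Extract the first run of digits using the \d shorthand.
SELECT regexp_extract('order-12345-x', '\\d+', 0);

-- Replace each digit with a placeholder character.
SELECT regexp_replace('order-12345-x', '\\d', '#');

-- An explicit character class avoids the shorthand entirely,
-- which may be useful for code that must also run on Impala 2.0.0.
SELECT regexp_extract('order-12345-x', '[0-9]+', 0);
```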

Incompatible Changes Introduced in Impala 2.0.0 / CDH 5.2.0

Changes to Prerequisites

Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check the CPU level of the hosts in your cluster before upgrading to CDH 5.2 / Impala 2.0.

Changes to Query Syntax

The new syntax where query hints are allowed in comments causes some changes in the way comments are parsed in the impala-shell interpreter. Previously, you could end a -- comment line with a semicolon and impala-shell would treat that as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to the Impala daemon, where it is flagged as an error.

Impala 2.0 and later uses a different support library for regular expression parsing than in earlier Impala versions. Now, Impala uses the Google RE2 library rather than Boost for evaluating regular expressions. This implementation change causes some differences in the allowed regular expression syntax, and in the way certain regex operators are interpreted. The following are some of the major differences (not necessarily a complete list):

  • .*? notation for non-greedy matches is now supported, where it was not in earlier Impala releases.

  • By default, ^ and $ now match only begin/end of buffer, not begin/end of each line. This behavior can be overridden in the regex itself using the m flag.

  • By default, . does not match newline. This behavior can be overridden in the regex itself using the s flag.

  • \Z is not supported.

  • \< and \> for start of word and end of word are not supported.

  • Lookahead and lookbehind are not supported.

  • Shorthand notation for character classes, such as \d for digit, is not recognized. (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
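The flag overrides and non-greedy matching listed above can be written inline in the pattern. The following sketch uses RE2's inline flag syntax, (?m) and (?s), with made-up input strings; it is meant as a starting point for testing existing patterns, not as a complete compatibility check.

```sql
-- (?m) makes ^ and $ match at the start and end of each line.
SELECT regexp_extract('first\nsecond', '(?m)^second$', 0);

-- (?s) lets . match a newline character.
SELECT regexp_extract('first\nsecond', '(?s)first.second', 0);

-- Non-greedy matching: stops at the first closing bracket.
SELECT regexp_extract('<a><b>', '<.*?>', 0);
```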

Changes to Output Format

In Impala 2.0 and later, user() returns the full Kerberos principal string, such as user@example.com, in a Kerberized environment.

The changed format for the user name in secure environments is also reflected where the user name is displayed in the output of the PROFILE command.

In the output from SHOW FUNCTIONS, SHOW AGGREGATE FUNCTIONS, and SHOW ANALYTIC FUNCTIONS, arguments and return types of arbitrary DECIMAL scale and precision are represented as DECIMAL(*,*). Formerly, these items were displayed as DECIMAL(-1,-1).

Changes to Query Options

The PARQUET_COMPRESSION_CODEC query option has been replaced by the COMPRESSION_CODEC query option. See COMPRESSION_CODEC Query Option (CDH 5.2 or higher only) for details.

Changes to Configuration Options

The meaning of the --idle_query_timeout configuration option is changed, to accommodate the new QUERY_TIMEOUT_S query option. Rather than setting an absolute timeout period that applies to all queries, it now sets a maximum timeout period, which can be adjusted downward for individual queries by specifying a value for the QUERY_TIMEOUT_S query option. In sessions where no QUERY_TIMEOUT_S query option is specified, the --idle_query_timeout timeout period applies the same as in earlier versions.
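For example, a session can shorten its own idle timeout below the server-wide ceiling. In this sketch the 30-second value is arbitrary, and the comment assumes the impalad daemons were started with a larger --idle_query_timeout value.

```sql
-- Applies only to this session; the --idle_query_timeout startup option
-- still acts as the upper bound.
SET QUERY_TIMEOUT_S=30;
```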

The --strict_unicode option of impala-shell was removed. To avoid problems with Unicode values in impala-shell, define the following locale setting before running impala-shell:

export LC_CTYPE=en_US.UTF-8

New Reserved Words

Some new SQL syntax requires the addition of new reserved words: ANTI, ANALYTIC, OVER, PRECEDING, UNBOUNDED, FOLLOWING, CURRENT, ROWS, RANGE, CHAR, VARCHAR. As always, see Impala Reserved Words for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.

Changes to Data Files

The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have implications for the sizes of Parquet files produced by INSERT and CREATE TABLE AS SELECT statements.

Although older Impala releases typically produced files that were smaller than the old default size of 1 GB, now the file size matches more closely whatever value is specified for the PARQUET_FILE_SIZE query option. Thus, if you use a non-default value for this setting, the output files could be larger than before. They still might be somewhat smaller than the specified value, because Impala makes conservative estimates about the space needed to represent each column as it encodes the data.

When you do not specify an explicit value for the PARQUET_FILE_SIZE query option, Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file size to be somewhat larger if needed to accommodate the layout for wide tables, that is, tables with hundreds or thousands of columns.

This change is unlikely to affect memory usage while writing Parquet files, because Impala does not pre-allocate the memory needed to hold the entire Parquet block.
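A session that needs a specific target size can set PARQUET_FILE_SIZE explicitly before writing data. This is a sketch with hypothetical table names (parquet_sales, staging_sales) and an arbitrary 128 MB target.

```sql
-- 134217728 bytes = 128 MB.
SET PARQUET_FILE_SIZE=134217728;

-- Hypothetical tables: the files written by this statement should land
-- close to, but possibly somewhat under, the requested size.
INSERT OVERWRITE TABLE parquet_sales SELECT * FROM staging_sales;
```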

Incompatible Changes Introduced in Impala 1.4.4 / CDH 5.1.5

No incompatible changes.

Incompatible Changes Introduced in Impala 1.4.3 / CDH 5.1.4

No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.

Incompatible Changes Introduced in Impala 1.4.2 / CDH 5.1.3

None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.

Incompatible Changes Introduced in Impala 1.4.1 / CDH 5.1.2

None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.

Incompatible Changes Introduced in Impala 1.4.0 / CDH 5.1.0

  • There is a slight change to required security privileges in the Sentry framework. To create a new object, now you need the ALL privilege on the parent object. For example, to create a new table, view, or function requires having the ALL privilege on the database containing the new object. See Privilege Model and Object Hierarchy for a full list of operations and associated privileges.

  • With the ability of ORDER BY queries to process unlimited amounts of data with no LIMIT clause, the query options DEFAULT_ORDER_BY_LIMIT and ABORT_ON_DEFAULT_LIMIT_EXCEEDED are now deprecated and have no effect. See ORDER BY Clause for details about improvements to the ORDER BY clause.

  • There are some changes to the list of reserved words. See Impala Reserved Words for the most current list. The following keywords are new:

    • API_VERSION
    • BINARY
    • CACHED
    • CLASS
    • PARTITIONS
    • PRODUCED
    • UNCACHED

    The following were formerly reserved keywords, but are no longer reserved:

    • COUNT
    • GROUP_CONCAT
    • NDV
    • SUM

  • The fix for issue IMPALA-973 changes the behavior of the INVALIDATE METADATA statement regarding nonexistent tables. In Impala 1.4.0 and higher, the statement returns an error if the specified table is not in the metastore database at all. It completes successfully if the specified table is in the metastore database but not yet recognized by Impala, for example if the table was created through Hive. Formerly, you could issue this statement for a completely nonexistent table, with no error.

Incompatible Changes Introduced in Impala 1.3.3 / CDH 5.0.5

No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with Impala.

Incompatible Changes Introduced in Impala 1.3.2 / CDH 5.0.4

With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.

Incompatible Changes Introduced in Impala 1.3.1 / CDH 5.0.3

  • In Impala 1.3.1 and higher, the REGEXP and RLIKE operators now match a regular expression string that occurs anywhere inside the target string, the same as if the regular expression was enclosed on each side by .*. See REGEXP Operator for examples. Previously, these operators only succeeded when the regular expression matched the entire target string. This change improves compatibility with the regular expression support for popular database systems. There is no change to the behavior of the regexp_extract() and regexp_replace() built-in functions.

  • The result set for the SHOW FUNCTIONS statement includes a new first column, with the data type of the return value. See SHOW Statement for examples.
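The difference in REGEXP and RLIKE matching can be seen with a pair of queries. The sketch below uses a made-up literal; the comments restate the behavior described above. Adding ^ and $ anchors is one way to keep the former whole-string semantics.

```sql
-- Matches in Impala 1.3.1 and higher because 'cd' occurs inside the string;
-- formerly this required the pattern to match the entire string.
SELECT 'abcdef' REGEXP 'cd';

-- Anchors restrict the match to the whole string again.
SELECT 'abcdef' REGEXP '^cd$';
SELECT 'abcdef' RLIKE '^abcdef$';
```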

Incompatible Changes Introduced in Impala 1.3.0 / CDH 5.0.0

  • The EXPLAIN_LEVEL query option now accepts numeric options from 0 (most concise) to 3 (most verbose), rather than only 0 or 1. If you formerly used SET EXPLAIN_LEVEL=1 to get detailed explain plans, switch to SET EXPLAIN_LEVEL=3. If you used the mnemonic keyword (SET EXPLAIN_LEVEL=verbose), you do not need to change your code because now level 3 corresponds to verbose. See EXPLAIN_LEVEL Query Option for details about the allowed explain levels, and Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles for usage information.

  • The keyword DECIMAL is now a reserved word. If you have any databases, tables, columns, or other objects already named DECIMAL, quote any references to them using backticks (``) to avoid name conflicts with the keyword.

  • The query option formerly named YARN_POOL is now named REQUEST_POOL to reflect its broader use with the Impala admission control feature. See REQUEST_POOL Query Option for information about the option, and Admission Control and Query Queuing for details about its use with the admission control feature.

  • There are some changes to the list of reserved words. See Impala Reserved Words for the most current list.

    • The names of aggregate functions are no longer reserved words, so you can have databases, tables, columns, or other objects named AVG, MIN, and so on without any name conflicts.

    • The internal function names DISTINCTPC and DISTINCTPCSA are no longer reserved words, although DISTINCT is still a reserved word.

    • The keywords CLOSE_FN and PREPARE_FN are now reserved words. See CREATE FUNCTION Statement for their role in the CREATE FUNCTION statement, and Thread-Safe Work Area for UDFs for usage information.

  • The HDFS property dfs.client.file-block-storage-locations.timeout was renamed to dfs.client.file-block-storage-locations.timeout.millis, to emphasize that the unit of measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the minimum value for this setting 10000. On systems not managed by Cloudera Manager, you might need to edit the hdfs-site.xml file in the Impala configuration directory for the new name and minimum value.
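The SQL-level changes in this list might translate into script updates along the following lines. All object names here (t1, legacy_prices, reporting_pool) are hypothetical.

```sql
-- Formerly SET EXPLAIN_LEVEL=1 gave the detailed plan; level 3 is now the most verbose.
SET EXPLAIN_LEVEL=3;
EXPLAIN SELECT COUNT(*) FROM t1;

-- The option formerly named YARN_POOL.
SET REQUEST_POOL=reporting_pool;

-- A column named DECIMAL must now be quoted with backticks.
SELECT `decimal` FROM legacy_prices;
```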

Incompatible Changes Introduced in Impala 1.2.4

There are no incompatible changes introduced in Impala 1.2.4.

Previously, after creating a table in Hive, you had to issue the INVALIDATE METADATA statement with no table name, a potentially expensive operation on clusters with many databases, tables, and partitions. Starting in Impala 1.2.4, you can issue the statement INVALIDATE METADATA table_name for a table newly created through Hive. Loading the metadata for only this one table is faster and involves less network overhead. Therefore, you might revisit your setup DDL scripts to add the table name to INVALIDATE METADATA statements, in cases where you create and populate the tables through Hive before querying them through Impala.
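A setup script that follows this advice might look like the sketch below, where sales_db.new_orders stands in for a hypothetical table just created and populated through Hive.

```sql
-- Load metadata for the one new table instead of invalidating everything.
INVALIDATE METADATA sales_db.new_orders;

-- The table is then queryable through Impala.
SELECT COUNT(*) FROM sales_db.new_orders;
```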

Incompatible Changes Introduced in Impala 1.2.3

Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible changes. See Incompatible Changes Introduced in Impala 1.2.2 if you are upgrading from Impala 1.2.1 or 1.1.x.

Incompatible Changes Introduced in Impala 1.2.2

The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code, or schema objects such as tables or views:

  • With the addition of the CROSS JOIN keyword, you might need to rewrite any queries that refer to a table named CROSS or use the name CROSS as a table alias:

    -- Formerly, 'cross' in this query was an alias for t1
    -- and it was a normal join query.
    -- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
    -- is not interpreted as a table alias, and the query
    -- uses the special CROSS JOIN processing rather than a
    -- regular join.
    select * from t1 cross join t2...
    
    -- Now if CROSS is used in other context such as a table or column name,
    -- use backticks to escape it.
    create table `cross` (x int);
    select * from `cross`;

  • Formerly, a DROP DATABASE statement in Impala would not remove the top-level HDFS directory for that database. The DROP DATABASE statement has been enhanced to remove that directory. (You still need to drop all the tables inside the database first; this change only applies to the top-level directory for the entire database.)

  • The keyword PARQUET is introduced as a synonym for PARQUETFILE in the CREATE TABLE and ALTER TABLE statements, because that is the common name for the file format (as opposed to SequenceFile and RCFile, where the "File" suffix is part of the name). Documentation examples have been changed to prefer the new shorter keyword. The PARQUETFILE keyword is still available for backward compatibility with older Impala versions.

  • New overloads are available for several operators and built-in functions, allowing you to insert their result values into smaller numeric columns such as INT, SMALLINT, TINYINT, and FLOAT without using a CAST() call. If you remove the CAST() calls from INSERT statements, those statements might not work with earlier versions of Impala.
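As a brief illustration of the PARQUET synonym described in this list, both of the following statements create a Parquet table. The table names are hypothetical.

```sql
-- Shorter keyword, preferred in Impala 1.2.2 and higher.
CREATE TABLE parquet_example_new (x INT, s STRING) STORED AS PARQUET;

-- Older keyword, still accepted for backward compatibility.
CREATE TABLE parquet_example_old (x INT, s STRING) STORED AS PARQUETFILE;
```

Scripts that must also run against releases earlier than Impala 1.2.2 would need to keep the PARQUETFILE spelling.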

Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read Incompatible Changes Introduced in Impala 1.2.1 for things to note about upgrading to Impala 1.2.x in general.

Incompatible Changes Introduced in Impala 1.2.1

The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code, or schema objects such as tables or views:

  • In Impala 1.2.1 and higher, all NULL values come at the end of the result set for ORDER BY ... ASC queries, and at the beginning of the result set for ORDER BY ... DESC queries. In effect, NULL is considered greater than all other values for sorting purposes. The original Impala behavior always put NULL values at the end, even for ORDER BY ... DESC queries. The new behavior in Impala 1.2.1 makes Impala more compatible with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting behavior for NULL by adding the clause NULLS FIRST or NULLS LAST at the end of the ORDER BY clause.

    See NULL for more information.
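
The sorting rules above can be sketched as follows. The table t1 and its nullable column x are hypothetical. The LIMIT clause is included on the assumption that the Impala release in use still requires it for ORDER BY queries, as early releases did; drop it if your version does not need it.

```sql
-- Hypothetical table t1 with a nullable column x.

-- Default placement in Impala 1.2.1 and higher: NULL sorts as the greatest value.
SELECT x FROM t1 ORDER BY x ASC LIMIT 100;    -- NULLs at the end
SELECT x FROM t1 ORDER BY x DESC LIMIT 100;   -- NULLs at the beginning

-- Explicit placement, independent of sort direction.
SELECT x FROM t1 ORDER BY x DESC NULLS LAST LIMIT 100;   -- matches the original Impala behavior
SELECT x FROM t1 ORDER BY x ASC NULLS FIRST LIMIT 100;
```

Queries written against the original behavior that rely on NULL values appearing last in descending sorts are the ones most likely to need the explicit NULLS LAST clause.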

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements.

Incompatible Changes Introduced in Impala 1.2.0 (Beta)

There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).

Because Impala 1.2.0 is bundled with the CDH 5 beta download and depends on specific levels of Apache Hadoop components supplied with CDH 5, you can install it only in combination with the CDH 5 beta.

The new catalogd service might require changes to any user-written scripts that stop, start, or restart Impala services, install or upgrade Impala packages, or issue REFRESH or INVALIDATE METADATA statements.

The new resource management feature interacts with both YARN and Llama services, which are available in CDH 5. These services are set up for you automatically in a Cloudera Manager (CM) environment. For information about setting up the YARN and Llama services, see the instructions for YARN and Llama in the CDH 5 Documentation. See Resource Management for Impala for usage information for Impala resource management.

Incompatible Changes Introduced in Impala 1.1.1

There are no incompatible changes in Impala 1.1.1.

Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now that Parquet support is available for Hive 0.10, reusing existing Impala Parquet data files in Hive requires updating the table metadata. Use the following command if you are already running Impala 1.1.1:

ALTER TABLE table_name SET FILEFORMAT PARQUETFILE;

If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:

ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";

Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.

As usual, make sure to upgrade the Impala LZO support package to the latest level at the same time as you upgrade the Impala server.

Incompatible Change Introduced in Impala 1.1

  • The REFRESH statement now requires a table name; in Impala 1.0, the table name was optional. This syntax change is part of the internal rework to make REFRESH a true Impala SQL statement so that it can be called through the JDBC and ODBC APIs. REFRESH now reloads the metadata immediately, rather than marking it for update the next time any affected table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire Impala metadata catalog, is available through the new INVALIDATE METADATA statement. INVALIDATE METADATA can be specified with a table name to affect a single table, or without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next time it is requested during the processing for a SQL statement. See REFRESH Statement and INVALIDATE METADATA Statement for the latest details about these statements.
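
A minimal sketch of the statement forms described above, using a hypothetical table name t1:

```sql
-- Impala 1.1 and higher: REFRESH requires a table name and reloads
-- that table's metadata immediately.
REFRESH t1;

-- Equivalent of the Impala 1.0 form of REFRESH with no table name:
-- marks the entire metadata catalog as stale.
INVALIDATE METADATA;

-- Marks the metadata for a single table as stale; it is reloaded the
-- next time a SQL statement needs it.
INVALIDATE METADATA t1;
```

Scripts written for Impala 1.0 that issue REFRESH with no arguments are the ones that need updating, typically by switching to INVALIDATE METADATA or by naming each affected table.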

Incompatible Changes Introduced in Impala 1.0

  • If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the impala-lzo-cdh4 package to the latest level. See Using LZO-Compressed Text Files for details.
  • Cloudera Manager 4.5.2 and higher only supports Impala 1.0 and higher, and vice versa. If you upgrade to Impala 1.0 or higher managed by Cloudera Manager, you must also upgrade Cloudera Manager to version 4.5.2 or higher. If you upgrade from an earlier version of Cloudera Manager, and were using Impala, you must also upgrade Impala to version 1.0 or higher. The beta versions of Impala are no longer supported as of the release of Impala 1.0.