- IMPALA-532: Impala should tolerate bad locale settings
- 7.3.1 and its higher versions
- If the
LC_* environment variables specify an
unsupported locale, Impala does not start.
- Add
LC_ALL="C" to the environment
settings for both the Impala daemon and the Statestore daemon.
- IMPALA-691: Process mem limit does not account for the JVM's
memory usage
- 7.3.1 and its higher versions
- Some memory allocated by the JVM used internally by Impala is
not counted against the memory limit for the impalad daemon.
- To monitor overall memory usage, use the top command, or
add the memory figures in the Impala web UI /memz tab to the JVM memory usage shown
on the /metrics tab.
- IMPALA-635: Avro Scanner fails to parse some schemas
- 7.3.1 and its higher versions
- The default value in an Avro schema must match the first type in the
union. For example, if the default value is null, then the first type in the
UNION must be "null".
- Swap the order of the types in the union in the schema specification.
For example, use ["null", "string"] instead of ["string", "null"].
Note that files written with the problematic schema must be
rewritten with the new schema because Avro files have embedded schemas.
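As an illustration of the reordered union, the following shell sketch writes a schema file in which "null" comes first so that it matches the null default. The file name, record name, and field name are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Write a hypothetical Avro schema whose nullable field lists "null" first,
# so the first type in the union matches the default value of null.
schema_file="${TMPDIR:-/tmp}/example_record.avsc"

cat > "$schema_file" <<'EOF'
{
  "type": "record",
  "name": "example_record",
  "fields": [
    {"name": "comment", "type": ["null", "string"], "default": null}
  ]
}
EOF

echo "Wrote $schema_file"
```

Existing data files still carry the old embedded schema, so they would need to be rewritten separately.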
- IMPALA-1024: Impala BE cannot parse Avro schema that contains a
trailing semi-colon
- 7.3.1 and its higher versions
- If an Avro table has a schema definition with a trailing
semicolon, Impala encounters an error when the table is queried.
- Remove the trailing semicolon from the Avro schema.
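One possible way to strip the semicolon from a schema file is a small sed filter. This is a minimal sketch, not an Impala tool: the function name is hypothetical, and it assumes the stray semicolon sits at the end of the last line of the schema text.

```shell
#!/bin/sh
# Remove a trailing semicolon (and any whitespace after it) from the last
# line of the text read on standard input. Earlier lines are left untouched.
strip_trailing_semicolon() {
    sed -e '$ s/;[[:space:]]*$//'
}

# Example: a one-line schema that ends with a stray semicolon.
printf '%s\n' '{"type": "string"};' | strip_trailing_semicolon
```

If blank lines follow the semicolon, this filter does not remove it, so check the result before using the cleaned schema.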
- IMPALA-1652: Incorrect results with basic predicate on CHAR
typed column
- 7.3.1 and its higher versions
- When comparing a
CHAR column value to a string
literal, the literal value is not blank-padded and so the comparison might fail when it
should match.
- Use the
RPAD() function to blank-pad
literals compared with CHAR columns to the expected length.
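The workaround might look like the statement built below. The table t1 and its CHAR(10) column c are hypothetical, and the sketch only prints the statement; running it requires a host where impala-shell is installed.

```shell
#!/bin/sh
# Hypothetical table t1 with a column c declared as CHAR(10).
# RPAD() pads the literal 'abc' with spaces to 10 characters so that it
# has the same length as the stored CHAR(10) values.
query="SELECT * FROM t1 WHERE c = RPAD('abc', 10, ' ')"

# Print the statement. On a host with impala-shell installed it could be
# run with: impala-shell -q "$query"
printf '%s\n' "$query"
```

The second argument to RPAD() should equal the declared length of the CHAR column.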
- IMPALA-1821: Casting scenarios with invalid/inconsistent
results
- 7.3.1 and its higher versions
- Using a
CAST() function to convert large
literal values to smaller types, or to convert special values such as
NaN or Inf, produces values not consistent with
other database systems. This could lead to unexpected results from queries.
- None
- IMPALA-2005: A failed CTAS does not drop the table if the insert
fails
- 7.3.1 and its higher versions
- If a
CREATE TABLE AS SELECT operation
successfully creates the target table but an error occurs while querying the source
table or copying the data, the new table is left behind rather than being dropped.
- Drop the new table manually after a failed
CREATE TABLE AS SELECT operation.
- IMPALA-3509: Breakpad minidumps can be very large when the
thread count is high
- 7.3.1 and its higher versions
- The size of the breakpad minidump files grows linearly with the
number of threads. By default, each thread adds 8 KB to the minidump size. Minidump
files could consume significant disk space when the daemons have a high number of
threads.
- Add --minidump_size_limit_hint_kb=size to set a soft upper
limit on the size of each minidump file. If the minidump file would exceed that limit,
Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread
information is captured for the first 20 threads, then 2 KB per thread after that.) The
minidump file can still grow larger than the "hinted" size. For example, if you have
10,000 threads, the minidump file can be more than 20 MB.
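The 10,000-thread figure can be reproduced with a rough estimate. The sketch below counts only the per-thread data described above (8 KB for each of the first 20 threads, 2 KB for each thread after that) and ignores everything else stored in a minidump. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate the thread-related part of a size-limited minidump, in KB:
# the first 20 threads at 8 KB each, every further thread at 2 KB.
estimate_limited_minidump_kb() {
    threads=$1
    full=20
    if [ "$threads" -le "$full" ]; then
        echo $(( threads * 8 ))
    else
        echo $(( full * 8 + (threads - full) * 2 ))
    fi
}

estimate_limited_minidump_kb 10000   # prints 20120
```

20 × 8 KB plus 9,980 × 2 KB gives 20,120 KB, which is consistent with a file of more than 20 MB.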
- IMPALA-4978: Impala requires FQDN from hostname command on
Kerberized clusters
- 7.3.1 and its higher versions
- The method Impala uses to retrieve the host name while
constructing the Kerberos principal is the
gethostname() system call.
This function might not always return the fully qualified domain name, depending on the
network configuration. If the daemons cannot determine the FQDN, Impala does not start
on a Kerberized cluster.
- Test if a host is affected by checking whether the output
of the hostname command includes the FQDN. On hosts where
hostname returns only the short name, pass the command-line flag --hostname=fully_qualified_domain_name
in the startup options of all Impala-related daemons.
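The host check can be scripted along the following lines. This is a minimal sketch under a stated assumption: a name counts as fully qualified when it contains a dot with text on both sides, which is a heuristic rather than the test Impala itself applies. The function name is hypothetical.

```shell
#!/bin/sh
# Heuristic (an assumption of this sketch): treat a name as fully
# qualified when it contains a dot with text on both sides.
is_fqdn() {
    case "$1" in
        .*|*.) return 1 ;;
        *.*)   return 0 ;;
        *)     return 1 ;;
    esac
}

# Fall back to uname -n on systems without a hostname command.
name=$(hostname 2>/dev/null || uname -n)

if is_fqdn "$name"; then
    echo "hostname returns a fully qualified name: $name"
else
    echo "hostname returns a short name: $name (consider setting --hostname)"
fi
```

Run the check on every host that runs an Impala daemon, because the result depends on each host's network configuration.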
- IMPALA-6671: Metadata operations block read-only operations on
unrelated tables
- 7.3.1 and its higher versions
- Metadata operations that change the state of a table, like
COMPUTE STATS or ALTER TABLE RECOVER PARTITIONS, may delay
metadata propagation of unrelated unloaded tables triggered by statements like
DESCRIBE or SELECT queries.
- None
- IMPALA-7072: Impala does not support Heimdal Kerberos
- None
- CDPD-28139: Set spark.hadoop.hive.stats.autogather to false by
default
- If you query a table that contains data ingested using Spark and you are concerned
about the quality of the query plan, run COMPUTE STATS on the table after each ETL
operation, because the numRows value created by Spark could be incorrect. COMPUTE STATS
also computes other statistics, such as the Number of Distinct Values (NDV) and the NULL
count, which are needed for good selectivity estimates.
- For example, when a user ingests data from a file into a partition of an existing
table using Spark and spark.hadoop.hive.stats.autogather is not explicitly set to false,
the numRows value associated with this partition would be 0 even though the file
contains at least one row. To avoid this, set
"spark.hadoop.hive.stats.autogather=false" in the "Spark Client Advanced Configuration
Snippet (Safety Valve) for spark-conf/spark-defaults.conf" in Spark's CM Configuration
section.
- IMPALA-2422: % escaping does not work correctly when occurs at
the end in a LIKE clause
- 7.3.1 and its higher versions
- If the final character in the RHS argument of a
LIKE operator is an escaped \% character, it does
not match a % final character of the LHS argument.
- None
- IMPALA-2603: Crash:
impala::Coordinator::ValidateCollectionSlots
- A query could encounter a serious error if it includes multiple
nested levels of
INNER JOIN clauses involving subqueries.
- None
- IMPALA-3094: Incorrect result due to constant evaluation in
query with outer join
- 7.3.1 and its higher versions
- An
OUTER JOIN query could omit some
expected result rows due to a constant such as FALSE in another join
clause. For example:
EXPLAIN SELECT 1 FROM alltypestiny a1
INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+
- CDPD-41138: According to
https://github.com/hunterhacker/jdom/issues/189, the fix for CVE-2021-33813
specifically addresses code that relied on
setFeature("http://xml.org/sax/features/external-general-entities", false): the setting
was not applied correctly, so such code remained vulnerable. However, code that used
setExpandEntities(false) is not vulnerable to CVE-2021-33813.
- 7.2.16.300, 7.2.17.100, 7.2.18
- 7.2.16.0
- The sources for rome 0.9, found at
http://www.java2s.com/Code/Jar/r/Downloadrome09sourcesjar.htm (they are no longer
available at https://java.net/), use both setFeature and setExpandEntities to
prevent XXE attacks. Rome in particular is therefore not believed to be vulnerable to
this issue, and jdom 1.0 is included only for rome 0.9.
- None
- Impala known limitation when querying compacted tables
- 7.3.1 and its higher versions
- When the compaction process deletes the files for a table from the underlying HDFS
location, the Impala service does not detect the changes because compaction does not
allocate new write IDs. When the same table is then queried from Impala, the query
throws a 'File does not exist' exception that looks similar to
this:
Query Status: Disk I/O error on <node>:22000: Failed to open HDFS file hdfs://nameservice1/warehouse/tablespace/managed/hive/<database>/<table>/xxxxx
Error(2): No such file or directory Root cause: RemoteException: File does not exist: /warehouse/tablespace/managed/hive/<database>/<table>/xxxx
- Run the REFRESH or INVALIDATE METADATA statement on the affected table to resolve the
'File does not exist' exception.
- Impala Virtual Warehouses might produce an error when querying
transactional (ACID) tables
- If you query transactional (ACID) tables with an
Impala Virtual Warehouse while compaction runs on the compacting Hive Virtual Warehouse,
the query might fail. The compaction process deletes files, and the Impala Virtual
Warehouse might not be aware of the deletion. When the Impala Virtual Warehouse then
attempts to read a deleted file, an error can occur. This situation occurs
randomly.
- Run the
INVALIDATE METADATA statement on
the transactional (ACID) table to refresh the metadata. This fixes the problem until the
next compaction occurs.
- IMPALA-5605: Configuration to prevent crashes caused by thread
resource limits
- Impala could encounter a serious error due to resource usage
under very high concurrency. The error message is similar to:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
- To prevent such errors, configure each host running an
impalad daemon with the following settings:
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
Add the following lines in
/etc/security/limits.conf:
impala soft nproc 262144
impala hard nproc 262144
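To see whether a host already meets the kernel values listed above, a read-only check such as the following can help. It is a minimal sketch: the function name and the OK/LOW/SKIP output format are hypothetical, and the script reads the settings without changing them.

```shell
#!/bin/sh
# Report whether a kernel setting meets the recommended value.
# Arguments: path to the /proc/sys file, recommended minimum.
check_setting() {
    path=$1
    want=$2
    if [ ! -r "$path" ]; then
        echo "SKIP $path (not readable)"
        return 0
    fi
    have=$(cat "$path")
    if [ "$have" -ge "$want" ]; then
        echo "OK   $path = $have"
    else
        echo "LOW  $path = $have (recommended $want)"
    fi
}

check_setting /proc/sys/kernel/threads-max 2000000
check_setting /proc/sys/kernel/pid_max 2000000
check_setting /proc/sys/vm/max_map_count 8000000
```

Any line that starts with LOW marks a setting that is below the recommended value on that host.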
- IMPALA-9350: Ranger audit logs for applying column masking
policies missing
- Impala is not producing these logs.
- None
- IMPALA-1792: ImpalaODBC: Can not get the value in the
SQLGetData(m-x th column) after the SQLBindCol(m th column)
- If the ODBC
SQLGetData is called on a series of
columns, the function calls must follow the same order as the columns. For example, if
data is fetched from column 2 then column 1, the SQLGetData call for
column 1 returns NULL.
- Fetch columns in the same order they are defined in the
table.