Potential SQL Issues
Lists the most common mistakes made during SQL statement creation that are identified by Workload XM as potential issues. You can find the potential issues associated with a query on the Data Warehouse Queries page in the Performance Issues region of the query details page.
Potential SQL Issue | Impact and Recommendation |
---|---|
>5 table joins or > 10 join conditions found. | Possible performance impact depending on table sizes, partitioning keys, filter and join conditions that are specified in the query. To address this issue, denormalize tables to eliminate the need for joins. |
>10 columns present in GROUP BY list. | Possible performance impact, depending on the number of distinct groups and the memory configuration. This issue is not raised if the source platform is Impala. To address this issue, evaluate the memory requirements for the query. |
>10 Inline Views present in query. | Possible performance impact, depending on the memory configuration, especially if complex expressions are present in inline views on Impala. To address this issue, evaluate the memory requirements and materialize inline views. |
>50 query blocks present in large query. | Possible performance impact, depending on the memory configuration. To address this issue, evaluate the query memory requirements, split the query into smaller queries, and materialize duplicate blocks. |
>2000 expressions found in WHERE clause of a single query. | This is a hard limit enforced by Impala. The query fails if
it contains >2000 expressions. To address this issue,
consolidate expressions by replacing repetitive sequences with
single operators like IN or
BETWEEN . |
Cartesian or CROSS join found. | Performance impact if tables are large. To address this issue, rewrite the query to add join conditions and eliminate Cartesian joins. |
High cardinality GROUP BY column found. | Possible performance impact, depending on the number of distinct groups and the memory configuration. To address this issue, evaluate the memory requirements for the query. |
Joins across large tables found. | Possible performance impact, depending on the partitioning
keys, filter and join conditions that are specified in the query.
To determine the cause, evaluate the EXPLAIN
output on Impala. To address this issue, evaluate the filter
conditions, join conditions, and the memory requirements for the
query and for possible table partitioning strategies. |
Join on a large table found. | Possible performance impact, depending on the partitioning
keys, filter and join conditions that are specified in the query.
To determine the cause, evaluate the EXPLAIN
output on Hive or Impala. To address this issue, evaluate the
filter conditions, the join conditions, the memory
requirements for the query, and the possible table partitioning
strategies. |
Many single-row inserts found. | Possible performance impact because singleton inserts create too many small files instead of creating fewer large files. To address this issue, batch inserts together to prevent creating too many small data files. |
Popular CASE expression across queries found. | Possible performance improvement. Consider materializing the
CASE expression. |
Popular filter conditions found. | Possible performance impact if the tables are large and are not partitioned. To address this issue, evaluate the possible table partitioning strategies on these filter conditions. |
Popular inline views across queries found. | Possible performance impact, depending on the memory configuration, especially if complex expressions are used in inline views on Impala. To address this issue, consider materializing the inline view. |
Popular subqueries across queries found. | Possible performance improvement. Consider materializing the subqueries. |
Query has no filters. | Possible performance impact if the result set that is returned is very large. To address this issue, rewrite the query to add filtering conditions and to reduce the size of the result set that is returned. |
Query on partitioned table is missing filters on partitioning columns. | Possible performance impact if the tables are large. To address this issue, rewrite the query to add filtering conditions. |
Query with filter conditions on a large table found. | Possible performance impact if the tables are large and are not partitioned. To address this issue, evaluate the possible table partitioning strategies on these filter conditions. |
Query with inline views found. | Possible performance impact, depending on the memory configuration, especially if complex expressions are used in inline views on Impala. To address this issue, if the inline view is duplicated, evaluate whether materializing the inline view is advantageous. |
Table might contain too many partitions (>30K). | Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete. |
Table might contain too many partitions (>50K). | Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete. |
Table might contain too many partitions (>100K). | Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete. |
Unsupported commands: UPDATE or DELETE. | UPDATE and DELETE are not
supported on CDH. These queries fail. To address this issue,
rewrite the query to use views or partitioning strategies that
mimic UPDATE and DELETE . Use of
partition-based INSERT with
OVERWRITE or using SELECT with
views over main and delta tables are common
workaround strategies. |