ReferencePDF version

Potential SQL Issues

Lists the most common mistakes made during SQL statement creation that are identified by Workload XM as potential issues. You can find the potential issues associated with a query on the Data Warehouse Queries page in the Performance Issues region of the query details page.

Table 1. Common SQL Issues
Potential SQL Issue Impact and Recommendation
>5 table joins or > 10 join conditions found. Possible performance impact depending on table sizes, partitioning keys, filter and join conditions that are specified in the query. To address this issue, denormalize tables to eliminate the need for joins.
>10 columns present in GROUP BY list. Possible performance impact, depending on the number of distinct groups and the memory configuration. This issue is not raised if the source platform is Impala. To address this issue, evaluate the memory requirements for the query.
>10 Inline Views present in query. Possible performance impact, depending on the memory configuration, especially if complex expressions are present in inline views on Impala. To address this issue, evaluate the memory requirements and materialize inline views.
>50 query blocks present in large query. Possible performance impact, depending on the memory configuration. To address this issue, evaluate the query memory requirements, split the query into smaller queries, and materialize duplicate blocks.
>2000 expressions found in WHERE clause of a single query. This is a hard limit enforced by Impala. The query fails if it contains >2000 expressions. To address this issue, consolidate expressions by replacing repetitive sequences with single operators like IN or BETWEEN.
Cartesian or CROSS join found. Performance impact if tables are large. To address this issue, rewrite the query to add join conditions and eliminate Cartesian joins.
High cardinality GROUP BY column found. Possible performance impact, depending on the number of distinct groups and the memory configuration. To address this issue, evaluate the memory requirements for the query.
Joins across large tables found. Possible performance impact, depending on the partitioning keys, filter and join conditions that are specified in the query. To determine the cause, evaluate the EXPLAIN output on Impala. To address this issue, evaluate the filter conditions, join conditions, and the memory requirements for the query and for possible table partitioning strategies.
Join on a large table found. Possible performance impact, depending on the partitioning keys, filter and join conditions that are specified in the query. To determine the cause, evaluate the EXPLAIN output on Hive or Impala. To address this issue, evaluate the filter conditions, the join conditions, the memory requirements for the query, and the possible table partitioning strategies.
Many single-row inserts found. Possible performance impact because singleton inserts create too many small files instead of creating fewer large files. To address this issue, batch inserts together to prevent creating too many small data files.
Popular CASE expression across queries found. Possible performance improvement. Consider materializing the CASE expression.
Popular filter conditions found. Possible performance impact if the tables are large and are not partitioned. To address this issue, evaluate the possible table partitioning strategies on these filter conditions.
Popular inline views across queries found. Possible performance impact, depending on the memory configuration, especially if complex expressions are used in inline views on Impala. To address this issue, consider materializing the inline view.
Popular subqueries across queries found. Possible performance improvement. Consider materializing the subqueries.
Query has no filters. Possible performance impact if the result set that is returned is very large. To address this issue, rewrite the query to add filtering conditions and to reduce the size of the result set that is returned.
Query on partitioned table is missing filters on partitioning columns. Possible performance impact if the tables are large. To address this issue, rewrite the query to add filtering conditions.
Query with filter conditions on a large table found. Possible performance impact if the tables are large and are not partitioned. To address this issue, evaluate the possible table partitioning strategies on these filter conditions.
Query with inline views found. Possible performance impact, depending on the memory configuration, especially if complex expressions are used in inline views on Impala. To address this issue, if the inline view is duplicated, evaluate whether materializing the inline view is advantageous.
Table might contain too many partitions (>30K). Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete.
Table might contain too many partitions (>50K). Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete.
Table might contain too many partitions (>100K). Can crash the Hive metastore. To address this issue, re-evaluate the partitioning key strategy. Queries that access many partitions might not complete.
Unsupported commands: UPDATE or DELETE. UPDATE and DELETE are not supported on CDH. These queries fail. To address this issue, rewrite the query to use views or partitioning strategies that mimic UPDATE and DELETE. Use of partition-based INSERT with OVERWRITE or using SELECT with views over main and delta tables are common workaround strategies.

We want your opinion

How can we improve this page?

What kind of feedback do you have?