Property changes affecting ordered or sorted subqueries and views
Order by/sort by without limit in subqueries is not supported in Hive 3.
Hive 1 and 2
You can use order by/sort by functionality in subqueries.
For example, the optimizer orders a subquery of the following table:
+--------------------+---------------------+
| test_groupby.col1 | test_groupby.col2 |
+--------------------+---------------------+
| test | 2 |
| test | 3 |
| test1 | 5 |
+--------------------+---------------------+
select * from (select * from test_groupby order by col2 desc) t2;
+----------+-----------+
| t2.col1 | t2.col1 |
+----------+-----------+
| test1 | 5 |
| test | 3 |
| test | 2 |
+----------+-----------+
Hive 3
The optimizer removes order by/sort by without limit in subqueries and views. For example:
select * from (select * from test_groupby order by col2 desc) t2;
+----------+-----------+
| t2.col1 | t2.col1 |
+----------+-----------+
| test | 2 |
| test | 3 |
| test1 | 5 |
+----------+-----------+
Action Required
If the outer query has to perform some functional logic based on order of a subquery,
the following query in Hive 2 returns a different result from Hive 3:
select col1 from (select * from test_groupby order by col2t desc ) t2 limit 1;
You must rewrite the query or set the following property to restore previous behavior:
set hive.remove.orderby.in.subquery=False
Distribution Affected
CDH5, CDH6