@silundong
Contributor

This PR makes two changes:

  1. During decorrelation, Sort will always be rewritten as [Filter-]Window, not only when both ORDER BY and LIMIT/OFFSET are present, because ORDER BY, LIMIT and OFFSET have to be enforced per value of the outer bindings instead of globally.
  2. For the JIRA case, the bug was that when rewriting Sort into Filter-Window, the decorrelator forgot to set the frame clause for the window function, causing ROW_NUMBER() to be computed incorrectly. This has now been fixed.
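
To illustrate the first point, here is a minimal, self-contained sketch in plain Java. It is not the decorrelator's code: the class and method names are hypothetical, and the (deptno, empno) rows are illustrative. It only contrasts a LIMIT applied per value of the correlation key, which is what a ROW_NUMBER() window partitioned by that key followed by a filter gives, with a LIMIT applied once to the whole input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerBindingLimit {
  /**
   * Emulates Filter-Window: number the rows within each partition (row[0] is
   * the correlation key) and keep row numbers in (offset, offset + fetch].
   */
  static List<int[]> limitPerPartition(List<int[]> rows, int offset, int fetch) {
    Map<Integer, Integer> rowNumbers = new HashMap<>();
    List<int[]> kept = new ArrayList<>();
    for (int[] row : rows) {
      // Plays the role of ROW_NUMBER() OVER (PARTITION BY deptno).
      int rn = rowNumbers.merge(row[0], 1, Integer::sum);
      if (rn > offset && rn <= offset + fetch) {
        kept.add(row);
      }
    }
    return kept;
  }

  /** A single global LIMIT/OFFSET, which ignores the correlation key. */
  static List<int[]> limitGlobally(List<int[]> rows, int offset, int fetch) {
    int from = Math.min(offset, rows.size());
    int to = Math.min(offset + fetch, rows.size());
    return new ArrayList<>(rows.subList(from, to));
  }

  public static void main(String[] args) {
    // Illustrative (deptno, empno) rows; not taken from the PR's tests.
    List<int[]> emp = Arrays.asList(
        new int[] {10, 7782}, new int[] {10, 7839},
        new int[] {20, 7369}, new int[] {20, 7566},
        new int[] {30, 7499});
    System.out.println(limitPerPartition(emp, 0, 1).size()); // 3: one row per deptno
    System.out.println(limitGlobally(emp, 0, 1).size());     // 1: one row in total
  }
}
```

With a global limit, only one department would find a matching row after decorrelation; the per-partition version keeps one candidate row for every outer binding.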


!ok

select dname, (select empno from emp where dept.deptno = emp.deptno limit 1) from dept where deptno = 10;
Contributor

Shouldn't we have an ORDER BY in the sub-query (before the LIMIT 1) to make sure that the query result is deterministic?

Contributor Author

This test is intended to demonstrate that Sort needs to be rewritten as a [Filter-]Window regardless of whether it has an ORDER BY or LIMIT/OFFSET. As I updated in the comment:

because now the order/limit/offset has to be enforced per value of the outer bindings instead of globally.

This is exactly what @asolimando was concerned about in his comment below. In practice there is probably almost no need to write a query this way; can I call this an edge case?


# [CALCITE-7382] The TopDownGeneralDecorrelator returns an error result when a subquery contains a LIMIT 1
!use scott
SELECT dname, (SELECT emp.comm FROM "scott".emp where dept.deptno = emp.deptno ORDER BY emp.comm limit 1) FROM "scott".dept;
Member

Suggested change:
- SELECT dname, (SELECT emp.comm FROM "scott".emp where dept.deptno = emp.deptno ORDER BY emp.comm limit 1) FROM "scott".dept;
+ SELECT dname, (SELECT emp.comm FROM "scott".emp where dept.deptno = emp.deptno ORDER BY emp.comm LIMIT 1) FROM "scott".dept;

builder.filter(conditions);
} else {
builder.sortLimit(sort.offset, sort.fetch, builder.fields(shiftCollation));
// the Sort have to be changed during rewriting because now the order/limit/offset has to be
Member

Q: was this a shortcoming of the original paper or an implementation issue?

Contributor Author

@silundong Jan 18, 2026

The paper's original wording is:

Subqueries with ORDER BY and LIMIT or OFFSET have to be changed....

It doesn't say what to do when only LIMIT or OFFSET is present. However, I believe that even with only LIMIT or OFFSET, the Sort should be rewritten as a window function without an ORDER BY clause.
This matches the test case that @rubenada mentioned in his comment. I ran the same test in the umbra-db interface, and it does indeed rewrite the LIMIT as a window function. It seems our approach aligns with the paper's intent.

@xiedeyantu
Member

After using the changes made in this PR, this case no longer works properly.

# [CALCITE-709] Errors with LIMIT inside scalar sub-query
!use scott
select deptno, (select sum(empno) from "scott".emp where deptno = dept.deptno limit 0) as x from "scott".dept;
+--------+---+
| DEPTNO | X |
+--------+---+
|     10 |   |
|     20 |   |
|     30 |   |
|     40 |   |
+--------+---+
(4 rows)

!ok

Caused by: java.lang.ArithmeticException: Value 38216 out of range
at org.apache.calcite.linq4j.tree.Primitive.checkRoundedRange(Primitive.java:387)
at org.apache.calcite.linq4j.tree.Primitive.numberValue(Primitive.java:564)
at org.apache.calcite.linq4j.tree.Primitive.numberValueRoundDown(Primitive.java:539)
at Baz$3.apply(Unknown Source)
at Baz$3.apply(Unknown Source)
at Baz$3.apply(Unknown Source)
at org.apache.calcite.adapter.enumerable.BasicAggregateLambdaFactory$AccumulatorAdderSeq.apply(BasicAggregateLambdaFactory.java:81)
at org.apache.calcite.linq4j.EnumerableDefaults.groupBy_(EnumerableDefaults.java:1177)
at org.apache.calcite.linq4j.EnumerableDefaults.groupBy(EnumerableDefaults.java:782)
at org.apache.calcite.linq4j.DefaultEnumerable.groupBy(DefaultEnumerable.java:312)
at Baz.bind(Unknown Source)
at org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:367)
at org.apache.calcite.jdbc.CalciteConnectionImpl.enumerable(CalciteConnectionImpl.java:335)
at org.apache.calcite.jdbc.CalciteMetaImpl._createIterable(CalciteMetaImpl.java:609)
at org.apache.calcite.jdbc.CalciteMetaImpl.createIterable(CalciteMetaImpl.java:600)
at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:184)
at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:64)
at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:43)

@silundong
Contributor Author

@xiedeyantu Hi, I don't think this is caused by the decorrelation algorithm. It looks like a loss of significant digits during type casting. The failure seems to come from the Aggregate when computing SUM. I tested:

SELECT SUM(empno) FROM emp GROUP BY deptno;

and it fails the same way. Perhaps a new ticket needs to be created to record it.

@xiedeyantu
Member

Thanks! I have filed a new Jira issue, CALCITE-7384, to track this.

@xiedeyantu
Member

CALCITE-7384 is an old problem that can be worked around with cast(col as bigint), and it has been discussed many times. We can ignore it for this PR.
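
For readers following along, the overflow and the cast workaround can be reproduced with a small plain-Java sketch. It assumes that empno is a SMALLINT and that SUM accumulates in the column's own type; the helper names are hypothetical and only loosely mirror the range check seen in the stack trace. The input values are the department 30 employee numbers from the standard scott sample data, chosen on the assumption that this is the group that overflowed.

```java
class SmallintSumOverflow {
  /** Hypothetical helper: narrows to short, failing when the value does not fit. */
  static short toShortChecked(long value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new ArithmeticException("Value " + value + " out of range");
    }
    return (short) value;
  }

  /** SUM whose accumulator keeps the column's narrow type. */
  static short sumAsSmallint(short[] values) {
    short acc = 0;
    for (short v : values) {
      acc = toShortChecked((long) acc + v);
    }
    return acc;
  }

  /** SUM over widened values, as cast(col as bigint) would produce. */
  static long sumAsBigint(short[] values) {
    long acc = 0;
    for (short v : values) {
      acc += v;
    }
    return acc;
  }

  public static void main(String[] args) {
    short[] empnos = {7499, 7521, 7654, 7698, 7844, 7900};
    System.out.println(sumAsBigint(empnos)); // 46116
    try {
      sumAsSmallint(empnos);
    } catch (ArithmeticException e) {
      // The running sum passes 32767 on the fifth value.
      System.out.println(e.getMessage()); // Value 38216 out of range
    }
  }
}
```

The narrow accumulator fails as soon as the running total exceeds the SMALLINT maximum of 32767, while the widened sum completes, which is consistent with the failure being independent of decorrelation.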

!ok

# [CALCITE-7382] The TopDownGeneralDecorrelator returns an error result when a subquery contains a LIMIT 1
!use scott
Member

We can add a note to record where the currently processed case originated.

# This case comes from sub-query.iq [CALCITE-6652]
