Skip to content

Modification to allow Linq subqueries to have paging (OrderBy/Take) #2480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 2, 2020

Conversation

ngbrown
Copy link
Contributor

@ngbrown ngbrown commented Aug 13, 2020

Fixes #2479

Adds a test with the general pattern I was trying to accomplish.

The fix involves removing the copying of OrderBy in PagingRewriter. All existing tests still passed.

@bahusoid
Copy link
Member

bahusoid commented Aug 13, 2020

To me simply removing if block looks as very suspicious fix. It's true it's not really required inside your exists subquery.

But I'm pretty sure if subquery returns some results - order by is required to be added. I meant something like this:

var query = db.Orders
	.Where(x => x.Employee.EmployeeId > 5)
	.OrderBy(x => x.OrderId)
	.Take(2);

var listFuture = db.OrderLines
	.Where(
		x =>
			//Check only first order with not empty shipper from subquery
			query.Where(x2 => x2.Shipper != null).First() == x.Order)
	.ToFuture();

@bahusoid
Copy link
Member

Maybe fix should skip order by processing only if queryModel.ResultOperators consists of only bool operators (like Any/All/Contains...)

@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 13, 2020

@bahusoid So based on the test results, that deleted portion of the code was needed for SQL engines other than MS SQL Server.

I haven't ever work on the LINQ or AST areas of the code before, so I'm pretty out of my depth. I did lots of watching with the debugger stepping through, and saw that this place was what was duplicating the OrderBy but I don't really understand why the PagingRewriter` visitor is needed.

I'm trying to avoid multiple round trips with the database with the nested query. Replicating the conditions multiple times isn't straightforward as the inner query has lots of options and filters that get applied leading up to being re-used across multiple ToFuture() fetches. Since FetchMany doesn't work with paging, I fetch all the associated lists and items through separate queries and then assemble the results in the view model. This example is the minimum to trigger the exception in the unit tests.

@bahusoid

This comment has been minimized.

@bahusoid
Copy link
Member

This includes fixing some tests in CriteriaAssertFixture so they wouldn't report a failure when really passing in ReSharper.

Can you move this change to separate PR?

@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 15, 2020

I've moved the CriteriaAssertFixture fixture change into #2489.

@bahusoid, your example test, PagedProductsWithOuterWhereClause2 didn't fail as expected without the order by cloning on SQL Server.

@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 15, 2020

Maybe fix should skip order by processing only if queryModel.ResultOperators consists of only bool operators (like Any/All/Contains...)

Are you thinking something like this?

if (!queryModel.BodyClauses.OfType<OrderByClause>().Any() &&
    queryModel.BodyClauses.Any(x => (x as WhereClause)?.Predicate.Type != typeof(bool)))

It seems to work, but not sure if it's too broad. It's still not right in non-sql server engines.

@bahusoid
Copy link
Member

didn't fail as expected

Ok.. Sorry I didn't test it. And I just realized it would still be a query with IN on identifier column - so changing OrderBy doesn't make any difference for SQL Server. But the problem is the following. Now generated query looks like:

    select
   *
    from
        Products product0_ 
    where
        product0_.UnitsInStock>@p0 
        and (
            product0_.ProductId in (
                SELECT
                    TOP (@p1) ProductId 
                FROM
                    (select
                        product1_.ProductId,
                        ROW_NUMBER() OVER(
                    ORDER BY
                        product1_.UnitPrice,
                        product1_.ProductId) as __hibernate_sort_row 
                    from
                        Products product1_) as query 
                WHERE
                    query.__hibernate_sort_row > @p2 
                ORDER BY
                    query.__hibernate_sort_row)
            );

With your fix as you can see ORDER BY is only placed inside IN subquery. There is no ORDER BY for main query. And most DBs apply default sorting rules for results - so test is failing. And the fact that SQL Server returns those records in order supplied by IN subquery is not documented behavior.

@bahusoid
Copy link
Member

Are you thinking something like this?

No I meant something like this:

if (!queryModel.BodyClauses.OfType<OrderByClause>().Any()
	//There is no need to apply 'order by` if subquery is wrapped in some bool expression (like exists)
	&& !(queryModel.ResultOperators.Count == 1
		&& (queryModel.ResultOperators[0] is AnyResultOperator || queryModel.ResultOperators[0] is AllResultOperator)))
{
	var orderByClauses = subQueryModel.BodyClauses.OfType<OrderByClause>();
	foreach (var orderByClause in orderByClauses)
		queryModel.BodyClauses.Add(orderByClause);
}

Though I also don't know this parts of code. So not sure if it's a good fix.

@ngbrown ngbrown force-pushed the bug/2479 branch 3 times, most recently from 7b95d7c to bd753c2 Compare August 15, 2020 22:54
@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 16, 2020

I've added additional tests for Contains and All. Each test was red without the fix and are green with it.

I tested locally with both MSSQL and SQLite. That seems to cover the differences between engines.

@ngbrown ngbrown marked this pull request as ready for review August 16, 2020 05:13
@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 17, 2020

I've made all the suggested changes and this is ready for further and wider review.

@hazzik
Copy link
Member

hazzik commented Aug 28, 2020

Ideally we should get this query:

select
    orderline0_.OrderLineId as orderlineid1_4_,
    orderline0_.OrderId as orderid2_4_,
    orderline0_.ProductId as productid3_4_,
    orderline0_.UnitPrice as unitprice4_4_,
    orderline0_.Quantity as quantity5_4_,
    orderline0_.Discount as discount6_4_ 
from
    OrderLines orderline0_ 
where
	orderline0_.OrderId in (
        select
            TOP (2) order2_.OrderId 
        from
            Orders order2_ 
        where
            order2_.EmployeeId>5
        order by
            order2_.OrderId asc
	)
order by
    orderline0_.OrderLineId asc;

@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 28, 2020

@hazzik I assume you are referring to OrdersWithSubquery9(). With this proposed pull request, it now generates the following:

DECLARE @p0 AS INT = 5;
DECLARE @p1 AS INT = 2;

select
    orderline0_.OrderLineId as orderlineid1_4_,
    orderline0_.OrderId as orderid2_4_,
    orderline0_.ProductId as productid3_4_,
    orderline0_.UnitPrice as unitprice4_4_,
    orderline0_.Quantity as quantity5_4_,
    orderline0_.Discount as discount6_4_ 
from
    OrderLines orderline0_ 
where
    exists (
        select
            order1_.OrderId 
        from
            Orders order1_ 
        where
            order1_.OrderId=orderline0_.OrderId 
            and (
                order1_.OrderId in (
                    select
                        TOP (@p0) order2_.OrderId 
                    from
                        Orders order2_ 
                    where
                        order2_.EmployeeId>@p1 
                    order by
                        order2_.OrderId asc
                )
            )
        ) 
    order by
        orderline0_.OrderLineId asc;

Your query is slightly more efficient in the query planner.

Screenshot - 2020-08-28_15-34-34

I assumed the slightly less efficient output is a side effect of the translation from Linq to SQL.


I agree that the SQL could be more efficient, but I think that could be another effort and another pull request. This pull request is just to get the query working in SQL Server with the existing SQL generation. It was already working in other engines that were ignoring the extra order by.

@ngbrown

This comment has been minimized.

@ngbrown
Copy link
Contributor Author

ngbrown commented Aug 30, 2020

In a very narrow case, I got something that can generate the more efficient SQL (ngbrown-forks@1386cf4):

else if (queryModel.ResultOperators.Count == 1 && queryModel.ResultOperators[0] is AnyResultOperator &&
         queryModel.BodyClauses.Count == 1 && queryModel.BodyClauses[0] is WhereClause whereClause &&
         whereClause.Predicate is BinaryExpression whereClausePredicate &&
         new[] {whereClausePredicate.Left, whereClausePredicate.Right}.Count(x => x.NodeType == ExpressionType.MemberAccess) == 1)
{
	var joinOnBodyClause = whereClausePredicate.Right.NodeType == ExpressionType.MemberAccess
		? whereClausePredicate.Right
		: whereClausePredicate.Left;
	var cro = new ContainsResultOperator(joinOnBodyClause);

	queryModel.BodyClauses.Clear();
	foreach (var orderByClause in subQueryModel.BodyClauses)
	{
		queryModel.BodyClauses.Add(orderByClause);
	}

	queryModel.ResultOperators.Clear();
	foreach (var resultOperator in subQueryModel.ResultOperators)
	{
		queryModel.ResultOperators.Add(resultOperator);
	}

	queryModel.ResultOperators.Add(cro);
	queryModel.ResultTypeOverride = typeof (bool);
}
PagingRewriter output:
from Order x in value(NHibernate.Linq.NhQueryable`1[NHibernate.DomainModel.Northwind.Entities.Order]) where ([x].Employee.EmployeeId > 5) orderby [x].OrderId asc select [x] => Take(2) => Contains([x].Order)

Maybe there's a more idomatic way to do this?

In general, I still don't think this line of optimization is really worth it.

@hazzik
Copy link
Member

hazzik commented Sep 1, 2020

Ok, it is likely that we're trying to fix query in a wrong place. It should be done somewhere else. If you remove .Take(...) from OrdersWithSubquery9 test then it will fail with the same error.

@ngbrown
Copy link
Contributor Author

ngbrown commented Sep 1, 2020

@hazzik If you manually modify the ideal query text to remove the TOP of an IN sub-query you will get the same error. It's expected for MS SQL Server to fail queries that have an ordered SQL sub-query without a TOP.

Can you help me with the correctness of the optimization code if that's what's needed? Or if the double sub-query, with the slight inefficiency, is fine, are there any other things to check for before merging?

@hazzik
Copy link
Member

hazzik commented Sep 1, 2020

@hazzik If you manually modify the ideal query text to remove the TOP of an IN sub-query you will get the same error.

Yes, I know.

It's expected for MS SQL Server to fail queries that have an ordered SQL sub-query without a TOP.

There is already some code which removes order-by for the count(*) result operators. I think it should do the same for other.

@hazzik
Copy link
Member

hazzik commented Sep 1, 2020

Can you help me with the correctness of the optimization code if that's what's needed?

It's incorrect. What required (and I think it is outside of scope of this PR) is to rewrite something.Any(y => y == other) to something.Contains(other).

The correct way is to detect if one side of BinaryExpression is QuerySourceReferenceExpression and it is referencing the QueryModel.MainFromClause then other side is an argument to ContainsResultOperator.

@bahusoid
Copy link
Member

bahusoid commented Sep 2, 2020

The correct way is to detect if one side of BinaryExpression is QuerySourceReferenceExpression and it is referencing the QueryModel.MainFromClause then other side is an argument to ContainsResultOperator.

Code should also check that's it's simple single ==/!= call and not something like Any(o => o.SomeValue > x.SomeValue). I'm not sure it's worth the trouble considering that user can use Contains himself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When using a paged sub-query in Linq, generates incorrect SQL
3 participants