Enabling sub-query filtering and sorting

I believe sub-query filtering and sorting is an incredibly useful feature, and I have been able to figure out a partial solution. Posting here so hopefully people can build off of it. **NOTE**: This solution takes advantage of the `RelationshipLoader` class for resolving sub-queries, so it will only work if the `batching` option on the Node's `Meta` class is set to `True`. The sample code below builds off of the changes discussed  in #419, but should be easily adaptable if people haven't implemented those changes.

# Issue
Our UI follows a backend for frontend pattern, and on one page we want to display information about an entity, along with recent events related to that entity. Ideally, we could use a query akin to this:
```graphql
query EntityWithEvents($entity_name: String!) {
  entity(filter: {name: {eq: $entity_name}}) {
    edges {
      node {
        value1
        value2
        events(filter: {timestamp: {gt: "<past day's events>"}}, sort: TIMESTAMP_DESC) {
          edges {
            node {
              eventValue1
              eventValue2
            }
          }
        }
      }
    }
  }
}
```

The problem is that with the current code, the `filter` and `sort` arguments to `events` are essentially ignored. As a result, we have no choice but to query *all* of events, and filter and sort them manually. This results in extraneous frontend logic, and massively increases load times and response sizes for the page, as we are loading the entire event history for the entity. We can see the following (summarized) statements emitted by sqlalchemy when we run this query:
```logs
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.id ASC;
```

## Potential Workaround
One possible workaround is to set an `order_by` on the [relationship definition](https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship.params.order_by) itself, and then pass the `first` argument to `events` to limit the response to only the first N results. The statements emitted in this case are:
```logs
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.timestamp DESC;
```
However, this approach has the following problems:
- The `first` argument is not a direct analogue of SQL's `OFFSET` - it will still load the full event history into memory in order to generate a cursor, and then truncate the results (NOTE: depending on the database engine, this approach *may* use a database cursor rather than sending all data to the graphql server. I'm not familiar enough with cursors to verify this). So this is still a fundamentally slow query.
- Returning first N results is semantically different from returning all results since a given time. Some days might have 0 events, while others will have 10+. If our goal is to show *all* events since the day before, then this approach is untenable.
- Having to set the sort as part of the relationship definition rather than specifying as part of the query itself is inflexible and relies on developer awareness, rather than explicit ordering.

# Solution
 To fix the issue, we will implement our own `RelationshipLoader` class, modifying it from the original to accept sort/filter args during construction, then patch in a modified `get_batch_resolver` definition that uses our new implementation. 

Usually, the `_load_for_path` internal function called by `batch_load_fn` will ignore any filtering or sorting set on the `query_context` arg. It does, however, respect filtering requested via the `loadopt` parameter, in order to implement `with_loader_criteria()` semantics described [here](https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#loader-option-criteria). The easiest way to hijack this flow is to construct our own loader (a `defaultload`) and steal it's loader options (stored in `context[0]`). `_load_for_path` *does* respect the `order_by` prop of the relationship that was passed in to initially (and will only resolve that reference at call-time), so we can simply update `relationship_prop.order_by` with the requested order right before calling. Note that, while the `RelationshipLoader` is constructed on a per-arg basis, the `relationship_prop` is still shared by all instances, so we have to set this value *right before* calling in order to avoid conflicts. In the case where no `sort` arg is passed, we store a cache of default sort orders the first time each `relationship_prop` is passed in, which is set prior to calling.

**NOTE**: the `patch_relationship_loader()` function *must* be called before constructing your `graphene.Schema` object. 

Specific changes are noted in the comments.
```python
# patch_batching.py
from asyncio import get_event_loop
from typing import Any, Callable, Coroutine

import sqlalchemy
from graphene_sqlalchemy.batching import DataLoader
from sqlalchemy.orm import Session, defaultload, strategies
from sqlalchemy.util import immutabledict
from sqlalchemy.util._concurrency_py3k import greenlet_spawn


def patch_relationship_loader() -> None:
    """
    Call this function once prior to any other setup code
    (specifically, prior to creating your `graphene.Schema` object)
    """
    import graphene_sqlalchemy.fields, graphene_sqlalchemy.converter
    graphene_sqlalchemy.fields.get_batch_resolver = get_batch_resolver
    graphene_sqlalchemy.converter.get_batch_resolver = get_batch_resolver


class RelationshipLoader(DataLoader):
    cache = False
    DEFAULT_PROP_ORDER_CACHE: dict[Any, Any] = {}
    "Cache to ensure that default sort order is used when no `sort` arg is passed"

    def __init__(self, relationship_prop: Any, selectin_loader: Any,
                 args: dict[Any, Any]) -> None:
        super().__init__()
        if relationship_prop not in RelationshipLoader.DEFAULT_PROP_ORDER_CACHE:
            RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
                relationship_prop] = relationship_prop.order_by

        self.relationship_prop = relationship_prop
        self.selectin_loader = selectin_loader
        self.args = args

        sort = args.get('sort')
        if sort is None:
            # Check cache if no sort if specified
            self.order_by = RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
                relationship_prop]
        else:
            # Sort handling copied from `SQLAlchemyConnectionField.get_query`
            if not isinstance(sort, list):
                sort = [sort]
            sort_args = []
            for item in sort:
                import enum
                from graphene_sqlalchemy.utils import EnumValue
                if isinstance(item, enum.Enum):
                    sort_args.append(item.value.value)
                elif isinstance(item, EnumValue):
                    sort_args.append(item.value)
                else:
                    sort_args.append(item)
            self.order_by = sort_args

        filter = args.get('filter')
        if filter is None:
            self.loadopt = None
        else:
            filters = type(filter).execute_filters(None, filter)[1]
            # Need `InstrumentedAttribute` (type of `relationship_prop`) to utilize `and_` filtering. Better way to access this?
            rel_attr = getattr(self.relationship_prop.parent.entity,
                               self.relationship_prop.key)
            # Use SA's public relationship-loading-strategy functions to generate a `_AttributeStrategyLoad` object for us
            self.loadopt = defaultload(rel_attr.and_(
                *filters)).context[0]

    async def batch_load_fn(self, parents: Any) -> list[Any]:
        """
        This function is the corrected version of the batch load function
        to use for relationships. The only differences are that,
        rather than calling `selectin_loader._load_for_path` directly, it
        calls it asynchronously using `greenlet_spawn`; it also sets
        `self.relationship_prop.order_by` to enable sorting (referenced via
        `self.selectin_loader`) and passes `self.loadopt` to enable filtering.

        Other extraneous logic has been removed for clarity. See the original
        code at
            https://github.com/graphql-python/graphene-sqlalchemy/blob/eb9c663cc0e314987397626573e3d2f940bea138/graphene_sqlalchemy/batching.py#L39
        """
        child_mapper = self.relationship_prop.mapper
        parent_mapper = self.relationship_prop.parent
        session = Session.object_session(parents[0])

        for parent in parents:
            assert session is Session.object_session(parent)
            assert session and parent not in session.dirty

        states = [(sqlalchemy.inspect(parent), True) for parent in parents]

        query_context = None
        if session:
            self.relationship_prop.order_by = self.order_by
            parent_mapper_query = session.query(parent_mapper.entity)
            query_context = parent_mapper_query._compile_context()
            await greenlet_spawn(
                self.selectin_loader._load_for_path,
                query_context,
                parent_mapper._path_registry,
                states,
                None,
                child_mapper,
                self.loadopt,
                None,  # recursion depth can be none
                immutabledict(),  # default value for selectinload->lazyload
            )
        result = [
            getattr(parent, self.relationship_prop.key) for parent in parents
        ]
        return result


# Cache this across `batch_load_fn` calls
# This is so SQL string generation is cached under-the-hood via `bakery`
# Caching the relationship loader for each relationship prop and args combo (can't use dict because `args` is not hashable)
RELATIONSHIP_LOADERS_CACHE: list[RelationshipLoader] = []


def get_batch_resolver(
        relationship_prop: Any
) -> Callable[[Any, Any], Coroutine[Any, Any, None]]:
    """Get the resolve function for the given relationship.
       Changed from default implementation to pass `args` in and consider `args` while caching."""

    def _get_loader(relationship_prop: Any,
                    args: dict[Any, Any]) -> RelationshipLoader:
        """Retrieve the cached loader of the given relationship."""
        loop = get_event_loop()
        RELATIONSHIP_LOADERS_CACHE[:] = [
            l for l in RELATIONSHIP_LOADERS_CACHE if l.loop == loop
        ]
        loader = next(
            (l for l in RELATIONSHIP_LOADERS_CACHE
             if l.relationship_prop == relationship_prop and l.args == args),
            None)
        if loader is None:
            selectin_loader = strategies.SelectInLoader(
                relationship_prop, (("lazy", "selectin"), ))
            loader = RelationshipLoader(
                relationship_prop=relationship_prop,
                selectin_loader=selectin_loader,
                args=args,
            )
            RELATIONSHIP_LOADERS_CACHE.append(loader)
        return loader

    async def resolve(root: Any, info: Any, **args: Any) -> None:
        return await _get_loader(relationship_prop, args).load(root)

    return resolve
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enabling sub-query filtering and sorting #422

Issue

Potential Workaround

Solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Enabling sub-query filtering and sorting #422

Description

Issue

Potential Workaround

Solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions