Description
I believe sub-query filtering and sorting is an incredibly useful feature, and I have been able to figure out a partial solution. Posting here so hopefully people can build off of it. NOTE: This solution takes advantage of the RelationshipLoader
class for resolving sub-queries, so it will only work if the batching
option on the Node's Meta
class is set to True
. The sample code below builds off of the changes discussed in #419, but should be easily adaptable if people haven't implemented those changes.
Issue
Our UI follows a backend for frontend pattern, and on one page we want to display information about an entity, along with recent events related to that entity. Ideally, we could use a query akin to this:
query EntityWithEvents($entity_name: String!) {
entity(filter: {name: {eq: $entity_name}}) {
edges {
node {
value1
value2
events(filter: {timestamp: {gt: "<past day's events>"}}, sort: TIMESTAMP_DESC) {
edges {
node {
eventValue1
eventValue2
}
}
}
}
}
}
}
The problem is that with the current code, the filter
and sort
arguments to events
are essentially ignored. As a result, we have no choice but to query all of events, and filter and sort them manually. This results in extraneous frontend logic, and massively increases load times and response sizes for the page, as we are loading the entire event history for the entity. We can see the following (summarized) statements emitted by sqlalchemy when we run this query:
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.id ASC;
Potential Workaround
One possible workaround is to set an order_by
on the relationship definition itself, and then pass the first
argument to events
to limit the response to only the first N results. The statements emitted in this case are:
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.timestamp DESC;
However, this approach has the following problems:
- The
first
argument is not a direct analogue of SQL'sOFFSET
- it will still load the full event history into memory in order to generate a cursor, and then truncate the results (NOTE: depending on the database engine, this approach may use a database cursor rather than sending all data to the graphql server. I'm not familiar enough with cursors to verify this). So this is still a fundamentally slow query. - Returning first N results is semantically different from returning all results since a given time. Some days might have 0 events, while others will have 10+. If our goal is to show all events since the day before, then this approach is untenable.
- Having to set the sort as part of the relationship definition rather than specifying as part of the query itself is inflexible and relies on developer awareness, rather than explicit ordering.
Solution
To fix the issue, we will implement our own RelationshipLoader
class, modifying it from the original to accept sort/filter args during construction, then patch in a modified get_batch_resolver
definition that uses our new implementation.
Usually, the _load_for_path
internal function called by batch_load_fn
will ignore any filtering or sorting set on the query_context
arg. It does, however, respect filtering requested via the loadopt
parameter, in order to implement with_loader_criteria()
semantics described here. The easiest way to hijack this flow is to construct our own loader (a defaultload
) and steal it's loader options (stored in context[0]
). _load_for_path
does respect the order_by
prop of the relationship that was passed in to initially (and will only resolve that reference at call-time), so we can simply update relationship_prop.order_by
with the requested order right before calling. Note that, while the RelationshipLoader
is constructed on a per-arg basis, the relationship_prop
is still shared by all instances, so we have to set this value right before calling in order to avoid conflicts. In the case where no sort
arg is passed, we store a cache of default sort orders the first time each relationship_prop
is passed in, which is set prior to calling.
NOTE: the patch_relationship_loader()
function must be called before constructing your graphene.Schema
object.
Specific changes are noted in the comments.
# patch_batching.py
from asyncio import get_event_loop
from typing import Any, Callable, Coroutine
import sqlalchemy
from graphene_sqlalchemy.batching import DataLoader
from sqlalchemy.orm import Session, defaultload, strategies
from sqlalchemy.util import immutabledict
from sqlalchemy.util._concurrency_py3k import greenlet_spawn
def patch_relationship_loader() -> None:
"""
Call this function once prior to any other setup code
(specifically, prior to creating your `graphene.Schema` object)
"""
import graphene_sqlalchemy.fields, graphene_sqlalchemy.converter
graphene_sqlalchemy.fields.get_batch_resolver = get_batch_resolver
graphene_sqlalchemy.converter.get_batch_resolver = get_batch_resolver
class RelationshipLoader(DataLoader):
cache = False
DEFAULT_PROP_ORDER_CACHE: dict[Any, Any] = {}
"Cache to ensure that default sort order is used when no `sort` arg is passed"
def __init__(self, relationship_prop: Any, selectin_loader: Any,
args: dict[Any, Any]) -> None:
super().__init__()
if relationship_prop not in RelationshipLoader.DEFAULT_PROP_ORDER_CACHE:
RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
relationship_prop] = relationship_prop.order_by
self.relationship_prop = relationship_prop
self.selectin_loader = selectin_loader
self.args = args
sort = args.get('sort')
if sort is None:
# Check cache if no sort if specified
self.order_by = RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
relationship_prop]
else:
# Sort handling copied from `SQLAlchemyConnectionField.get_query`
if not isinstance(sort, list):
sort = [sort]
sort_args = []
for item in sort:
import enum
from graphene_sqlalchemy.utils import EnumValue
if isinstance(item, enum.Enum):
sort_args.append(item.value.value)
elif isinstance(item, EnumValue):
sort_args.append(item.value)
else:
sort_args.append(item)
self.order_by = sort_args
filter = args.get('filter')
if filter is None:
self.loadopt = None
else:
filters = type(filter).execute_filters(None, filter)[1]
# Need `InstrumentedAttribute` (type of `relationship_prop`) to utilize `and_` filtering. Better way to access this?
rel_attr = getattr(self.relationship_prop.parent.entity,
self.relationship_prop.key)
# Use SA's public relationship-loading-strategy functions to generate a `_AttributeStrategyLoad` object for us
self.loadopt = defaultload(rel_attr.and_(
*filters)).context[0]
async def batch_load_fn(self, parents: Any) -> list[Any]:
"""
This function is the corrected version of the batch load function
to use for relationships. The only differences are that,
rather than calling `selectin_loader._load_for_path` directly, it
calls it asynchronously using `greenlet_spawn`; it also sets
`self.relationship_prop.order_by` to enable sorting (referenced via
`self.selectin_loader`) and passes `self.loadopt` to enable filtering.
Other extraneous logic has been removed for clarity. See the original
code at
https://github.com/graphql-python/graphene-sqlalchemy/blob/eb9c663cc0e314987397626573e3d2f940bea138/graphene_sqlalchemy/batching.py#L39
"""
child_mapper = self.relationship_prop.mapper
parent_mapper = self.relationship_prop.parent
session = Session.object_session(parents[0])
for parent in parents:
assert session is Session.object_session(parent)
assert session and parent not in session.dirty
states = [(sqlalchemy.inspect(parent), True) for parent in parents]
query_context = None
if session:
self.relationship_prop.order_by = self.order_by
parent_mapper_query = session.query(parent_mapper.entity)
query_context = parent_mapper_query._compile_context()
await greenlet_spawn(
self.selectin_loader._load_for_path,
query_context,
parent_mapper._path_registry,
states,
None,
child_mapper,
self.loadopt,
None, # recursion depth can be none
immutabledict(), # default value for selectinload->lazyload
)
result = [
getattr(parent, self.relationship_prop.key) for parent in parents
]
return result
# Cache this across `batch_load_fn` calls
# This is so SQL string generation is cached under-the-hood via `bakery`
# Caching the relationship loader for each relationship prop and args combo (can't use dict because `args` is not hashable)
RELATIONSHIP_LOADERS_CACHE: list[RelationshipLoader] = []
def get_batch_resolver(
relationship_prop: Any
) -> Callable[[Any, Any], Coroutine[Any, Any, None]]:
"""Get the resolve function for the given relationship.
Changed from default implementation to pass `args` in and consider `args` while caching."""
def _get_loader(relationship_prop: Any,
args: dict[Any, Any]) -> RelationshipLoader:
"""Retrieve the cached loader of the given relationship."""
loop = get_event_loop()
RELATIONSHIP_LOADERS_CACHE[:] = [
l for l in RELATIONSHIP_LOADERS_CACHE if l.loop == loop
]
loader = next(
(l for l in RELATIONSHIP_LOADERS_CACHE
if l.relationship_prop == relationship_prop and l.args == args),
None)
if loader is None:
selectin_loader = strategies.SelectInLoader(
relationship_prop, (("lazy", "selectin"), ))
loader = RelationshipLoader(
relationship_prop=relationship_prop,
selectin_loader=selectin_loader,
args=args,
)
RELATIONSHIP_LOADERS_CACHE.append(loader)
return loader
async def resolve(root: Any, info: Any, **args: Any) -> None:
return await _get_loader(relationship_prop, args).load(root)
return resolve