Description
Admittedly, this isn't so much an issue as it is a request for guidance. It might turn into a feature request.
I'm working on a graphql schema that will source information from a Mongo database and a neo4j database. The documents I'm storing in Mongo are rather large (up to 2MB in some cases), and they are highly related. It was easiest to provide traversal between objects via a graph database, and since we're not terribly concerned with most of the schema of the documents in Mongo, the combination makes sense for our use case.
The dataloader library is a big help in optimizing our Mongo queries for loading documents, but I'm currently at a loss as to how to use neo4j to determine which documents I need to load. The general order of operations for our data looks something like this:
- Load first-level documents from Mongo.
- For a second-level portion of a query, collect the entire set of IDs from the results of the first-level query, make a single neo4j query for all related IDs for that relationship type, and map those results back to the source IDs (yielding, roughly, a mapping from each source ID to a list of related IDs).
- Load second-level documents from Mongo, with the IDs retrieved from the previous step.
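The three steps above can be sketched roughly as follows. This is only an illustration of the batching shape, not real driver code: `fetchDocuments` and `relatedIdsFor` are hypothetical in-memory stand-ins for the Mongo batch query and the single neo4j relationship query.

```java
import java.util.*;
import java.util.stream.*;

public class TwoLevelLoad {

    // Stand-in for a Mongo batch query: id -> document (hypothetical).
    static Map<String, String> fetchDocuments(Collection<String> ids) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (String id : ids) docs.put(id, "doc:" + id);
        return docs;
    }

    // Stand-in for a single neo4j query returning, for every source id,
    // the list of related ids for one relationship type (hypothetical).
    static Map<String, List<String>> relatedIdsFor(Collection<String> sourceIds) {
        Map<String, List<String>> related = new LinkedHashMap<>();
        for (String id : sourceIds) related.put(id, List.of(id + "-a", id + "-b"));
        return related;
    }

    public static void main(String[] args) {
        // 1. Load first-level documents from Mongo.
        Map<String, String> firstLevel = fetchDocuments(List.of("1", "2"));

        // 2. One neo4j round trip for the whole set of source ids,
        //    yielding the source-id -> related-ids mapping.
        Map<String, List<String>> mapping = relatedIdsFor(firstLevel.keySet());

        // 3. One Mongo round trip for the union of related ids.
        Set<String> secondLevelIds = mapping.values().stream()
                .flatMap(List::stream)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        Map<String, String> secondLevel = fetchDocuments(secondLevelIds);

        System.out.println(mapping);            // {1=[1-a, 1-b], 2=[2-a, 2-b]}
        System.out.println(secondLevel.size()); // 4
    }
}
```

The point of the shape is that each level costs a constant number of round trips (one to neo4j, one to Mongo) regardless of how many source objects there are.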
The dataloader library makes this easy if you can derive the IDs of related documents from a source object, but in my case I can't determine them without a neo4j query. The best I know how to do at the moment is to run a neo4j query for each individual source object in its datafetcher, which still leaves me with roughly an n+1 problem.
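For contrast, the per-object approach just described looks like the sketch below, with a counter making the n+1 cost visible: one (stubbed, hypothetical) neo4j call per source object instead of one call for the whole batch.

```java
import java.util.*;

public class PerObjectLoad {
    static int neo4jQueryCount = 0;

    // Stand-in for a single-source neo4j query (hypothetical, not a driver call).
    static List<String> relatedIdsFor(String sourceId) {
        neo4jQueryCount++;
        return List.of(sourceId + "-a", sourceId + "-b");
    }

    public static void main(String[] args) {
        List<String> sourceIds = List.of("1", "2", "3");
        Map<String, List<String>> mapping = new LinkedHashMap<>();

        // One neo4j round trip per source object -- this is the n+1 shape.
        for (String id : sourceIds) {
            mapping.put(id, relatedIdsFor(id));
        }

        System.out.println(neo4jQueryCount); // 3 queries for 3 sources
    }
}
```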
Any guidance on how to optimize for bulk loading in this scenario?