Description
I have written a Neo4j client library that uses this driver under the hood. (https://drivine.org/).
One of the use-cases is streaming without back-pressure problems. Back-pressure arises when the source produces data faster than the sink can process it, for example writing too quickly into a file stream, say if we were producing a report that presents COVID-19 cases for a region.
The library includes an API as follows:
openCursor<T>(spec: QuerySpecification<T>): Promise<Cursor<T>>;
...whereby you can open a `Cursor<T>` representing a set of results. This cursor is `AsyncIterable` (it supports `for await (const item of cursor)`) and, more importantly, can turn itself into a Node.js `Readable` stream.
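A minimal sketch of how such a cursor can work (the `Cursor` class and `asStream` method here are illustrative, not Drivine's actual implementation); Node's `Readable.from` wraps any async iterable in a pull-based stream:

```typescript
import { Readable } from "stream";

// Illustrative sketch only: a cursor backed by any AsyncIterable source.
class Cursor<T> implements AsyncIterable<T> {
  constructor(private readonly source: AsyncIterable<T>) {}

  // Supports `for await (const item of cursor)`.
  [Symbol.asyncIterator](): AsyncIterator<T> {
    return this.source[Symbol.asyncIterator]();
  }

  // Readable.from() wraps an async iterable in a pull-based Readable,
  // so items are only produced as the stream's consumer asks for them.
  asStream(): Readable {
    return Readable.from(this, { objectMode: true });
  }
}
```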
When these streams are piped together, it is the sink stream that pulls from the source, at the rate the sink can handle. So there are no back-pressure problems, no excessive memory usage, and no exceeding of the high-water mark. This is good for many use-cases. We can easily create an Observable from a stream too.
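As a sketch of the piping described above, `stream.pipeline` lets Node's back-pressure machinery pace the source: the sink's `write()` callback controls how fast data is pulled. The `writeReport` name is made up for illustration:

```typescript
import { Readable, Writable } from "stream";
import { pipeline } from "stream/promises";

// Illustration: pipeline() propagates back-pressure, so the source is
// only read as fast as the sink signals that it has drained.
async function writeReport(
  rows: AsyncIterable<string>,
  sink: Writable
): Promise<void> {
  await pipeline(Readable.from(rows, { objectMode: true }), sink);
}
```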
How it is currently implemented:
The way this is implemented is to fetch data from Neo4j in batches using SKIP and LIMIT.
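Roughly, the batching can be sketched like this (illustrative only; `runQuery` stands in for executing a Cypher query such as `... RETURN n SKIP $skip LIMIT $limit`, and is not a real driver API):

```typescript
// Sketch of SKIP/LIMIT paging as an async generator. Each batch is
// fetched lazily, only after the consumer has pulled through the
// previous one.
async function* batched<T>(
  runQuery: (skip: number, limit: number) => Promise<T[]>,
  batchSize = 1000
): AsyncGenerator<T> {
  for (let skip = 0; ; skip += batchSize) {
    const batch = await runQuery(skip, batchSize);
    for (const item of batch) {
      yield item;
    }
    if (batch.length < batchSize) {
      return; // last (partial or empty) page reached
    }
  }
}
```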
I wanted to see if I could replace this with the driver's streaming capabilities; however, from what I understand, RxJS has no capability for handling back-pressure. It is a push-based library. Right?
For pull-through we should use the companion IxJS (https://github.com/ReactiveX/IxJS) instead. Pulling through shifts the onus of handling back-pressure onto the source, which would need to hold the entire result set and emit items when requested. This should not be a problem, as the source needs to hold the results in memory for a period of time in any case.
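For illustration, a pull-through source along those lines could buffer the full result set and hand out items only when the consumer requests the next one. This sketch uses a plain async generator, the protocol IxJS builds on (the `BufferedResult` name is hypothetical):

```typescript
// Sketch: the source holds the complete result set; items are emitted
// one at a time, and only when the consumer asks for the next one.
class BufferedResult<T> implements AsyncIterable<T> {
  constructor(private readonly results: T[]) {}

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    for (const item of this.results) {
      yield item; // suspended here until the consumer calls next()
    }
  }
}
```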
So how about supporting pull-through streaming at the driver level? Either as `Readable` streams or with IxJS. (Or apparently there is a way to do it in RxJS: https://itnext.io/lossless-backpressure-in-rxjs-b6de30a1b6d4 <--- I'm still digesting this article.)