[HttpFoundation] Add StreamedJsonResponse for efficient JSON streaming #47709
Conversation
Force-pushed 9a34c82 to bdd5bab (compare)
Force-pushed 80f75ec to aba67fa (compare)
would it be reasonable to consider a "compute json inline" approach, rather than end-users taking care of unique identifiers? e.g. `$lazyJson = ['key' => fn () => yield from $heavy];`
@ro0NL this would force re-implementing the whole JSON encoding in userland.
we could array-walk the structure first, thus keeping the unique placeholders an implementation detail.
@ro0NL if you do that, you are not streaming JSON anymore, defeating the whole purpose of this PR.
the idea is to split the generators from the structure, preserving the remaining logic. But this is an extra step, yes, thus perhaps less ideal.
@ro0NL interesting input. As the structure array is mostly small, it could be possible. But we would need to look at what difference this makes in performance. I hacked something together using
@alexander-schranz be careful when implementing this.
@stof great hint, I think.
now that we have first-class callables, I would say yes. You can convert any callable to a closure using this feature.
Okay, I don't need to check for that then.

```php
return new StreamedJsonResponse(
    [
        '_embedded' => [
            'articles' => $this->findArticles('Article'), // returns a \Generator which will generate a list of data
        ],
    ],
);
```

The diff between the old and the new implementation is not big. I also updated the example repository to use the new class.
Force-pushed 2480746 to 7d8700f (compare)
Force-pushed 3453946 to 3de6fc7 (compare)
I propose to add the content from the README of your prototype application to the PR header 👍🏻
@OskarStark added. I think the PR is blocked until the 6.3 branch is created?
thanks
Yes
Force-pushed 3de6fc7 to 626eafe (compare)
@dunglas that sounds very interesting. For now I would stay with the implementation as it is: it gives a very low-resource solution without the HttpFoundation package requiring any kind of serializer. A serializer/normalizer can still be used inside the Generator, and with the current implementation of this class that also stays very low on resource usage, because it does not try to serialize all objects at once but one after the other, so it never needs to keep more than one object in memory, as long as the ORM loading allows that.
Shall we move forward on this one?
Force-pushed 626eafe to a3ee766 (compare)
@chalasr rebased. Not sure what is open or required to get this merged :)
Force-pushed a3ee766 to ecc5355 (compare)
Let's iterate, thanks @alexander-schranz!
🎉 Thanks to you all for the great feedback and ideas. I think we got a great solution out of it, with better DX than I could have imagined when creating the pull request. @chalasr that sounds great :)
…medJsonResponse (alexander-schranz) This PR was merged into the 6.3 branch.

Discussion
----------

[HttpFoundation] Fix problem with empty generator in StreamedJsonResponse

| Q | A |
| ------------- | --- |
| Branch? | 6.3 (Feature `StreamedJsonResponse`: #47709) |
| Bug fix? | yes |
| New feature? | no |
| Deprecations? | no |
| Tickets | Fix - was reported to me on Slack by `@norkunas` |
| License | MIT |
| Doc PR | symfony/symfony-docs#... |

Currently, when the Generator is empty, the returned JSON is invalid, which should not happen. So this adds a test case and a fix for the problem with the empty generator.

Commits
-------

39bb6b6 Fix problem with empty generator in StreamedJsonResponse
…onse` (alexander-schranz) This PR was squashed before being merged into the 6.3 branch.

Discussion
----------

[HttpFoundation] Add documentation for `StreamedJsonResponse`

Docs for: symfony/symfony#47709

# TODO

- [x] Example of Flush Handling

Commits
-------

8a285e3 [HttpFoundation] Add documentation for `StreamedJsonResponse`
…medJsonResponse (Jeroeny) This PR was merged into the 6.4 branch.

Discussion
----------

[HttpFoundation] Support root-level Generator in StreamedJsonResponse

| Q | A |
| ------------- | --- |
| Branch? | 6.4 |
| Bug fix? | no |
| New feature? | yes |
| Deprecations? | no |
| License | MIT |

Currently the `StreamedJsonResponse` only supports streaming nested Generators within an array data structure. However, if a response is a list of items (for example database entities) at the root level, this isn't usable. I think both use cases can be supported with the change in this PR. The root-level generator doesn't account for additional nested generators yet; I could add that by doing `is_array($item)` and then calling the recursive placeholder logic.

Link to the first PR that introduced StreamedJsonResponse: #47709

~~Also something I noticed: I only got intermediate output when adding a `flush()` call after each item has been echoed (with a `sleep(1)` after each item to see it output the parts individually).~~ Edit: I see the class' PhpDoc describes this and it's probably expected to be done in userland implementations.

Commits
-------

05e582f support root-level Generator in StreamedJsonResponse
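As a side note to the two follow-ups quoted above, here is a minimal sketch of how a root-level Generator combined with userland flushing could look. The `withPeriodicFlush()` helper and the `$articleGenerator` variable are hypothetical, and passing a Generator at the root level assumes the 6.4 change described above.

```php
use Symfony\Component\HttpFoundation\StreamedJsonResponse;

/**
 * Hypothetical helper: re-yields items from any iterable and flushes
 * the output buffer every $flushFrequency items.
 */
function withPeriodicFlush(iterable $items, int $flushFrequency = 500): \Generator
{
    $count = 0;
    foreach ($items as $key => $item) {
        yield $key => $item;

        if (++$count % $flushFrequency === 0) {
            flush();
        }
    }
}

// $articleGenerator is an assumed \Generator of serializable items,
// passed at the root level of the response data.
$response = new StreamedJsonResponse(withPeriodicFlush($articleGenerator));
```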
When big data sets are streamed via a JSON API it can sometimes be difficult to keep resource usage low. For this I experimented with a different way of streaming data for JSON responses. It uses a combination of a structured array and generators, which gives a much better result. More can be read about it here: https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine.

I thought it could be a great addition to Symfony itself, to make this kind of response easier to build and APIs more performant.
Usage
First Version (replaced)
Updated Version (thanks to @ro0NL for the idea):
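The code snippets of this "Usage" section were not preserved in the export; the following is a minimal sketch of the updated usage, based on the example quoted earlier in the conversation (the controller and the `findArticles()` helper are assumptions):

```php
use Symfony\Component\HttpFoundation\StreamedJsonResponse;

final class ArticleController
{
    public function list(): StreamedJsonResponse
    {
        return new StreamedJsonResponse([
            '_embedded' => [
                // any \Generator is streamed item by item instead of being buffered
                'articles' => $this->findArticles(),
            ],
        ]);
    }

    /**
     * @return \Generator<int, array{id: int, title: string}>
     */
    private function findArticles(): \Generator
    {
        // in a real project this would iterate e.g. a Doctrine toIterable() result
        yield ['id' => 1, 'title' => 'Article 1'];
        yield ['id' => 2, 'title' => 'Article 2'];
    }
}
```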
As proposed by @OskarStark, here is the full content of the blog post "Efficient JSON Streaming with Symfony and Doctrine":
Efficient JSON Streaming with Symfony and Doctrine
After reading a tweet about how we provide only a few items (max. 100) over our JSON APIs while providing 4K images for our websites, I started thinking about why this is the case.
To see the main difference, we first need to know how images are streamed. Web servers today mostly use the sendfile feature, which is very efficient because it can stream a file chunk by chunk and does not need to load the whole data into memory.
So I asked myself how we can achieve the same mechanism for our JSON APIs, with a little experiment.
As an example we will have a look at a basic entity which has the
following fields defined:
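The original field list did not survive the export; as a stand-in, assume a minimal Doctrine entity along these lines (the exact fields `id`, `title` and `description` are an assumption):

```php
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
class Article
{
    // assumed fields; the original listing was not preserved
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column(type: 'integer')]
    private ?int $id = null;

    #[ORM\Column(type: 'string', length: 255)]
    private string $title = '';

    #[ORM\Column(type: 'text')]
    private string $description = '';
}
```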
The response of our API should look like the following:
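The original response example is also missing; based on the `_embedded`/`articles` structure used throughout this PR, it presumably looked roughly like this (field values are placeholders):

```json
{
    "_embedded": {
        "articles": [
            { "id": 1, "title": "Article 1", "description": "..." },
            { "id": 2, "title": "Article 2", "description": "..." }
        ]
    }
}
```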
Normally to provide this API we would do something like this:
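The original snippet is likewise missing; a sketch of the classic approach, assuming the `Article` entity above, a hypothetical `ArticleListAction` and a plain `JsonResponse`, might look like this:

```php
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\JsonResponse;

final class ArticleListAction
{
    public function __construct(private EntityManagerInterface $entityManager)
    {
    }

    public function __invoke(): JsonResponse
    {
        // load all articles at once into memory ...
        $articles = $this->entityManager->createQueryBuilder()
            ->select('a.id', 'a.title', 'a.description')
            ->from(Article::class, 'a')
            ->getQuery()
            ->getResult();

        // ... and encode the whole structure in one go
        return new JsonResponse([
            '_embedded' => [
                'articles' => $articles,
            ],
        ]);
    }
}
```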
In most cases we will add some pagination to the endpoint so our responses are not too big.
Making the API more efficient
But there is also a way to stream this response efficiently.

First of all we need to adjust how we load the articles. This can be done by replacing `getResult` with the more efficient `toIterable`.

Still, the whole JSON needs to be in memory to send it. So we also need to refactor how we create our response: we will replace our `JsonResponse` with the `StreamedResponse` object.

But the `json` format is not the best format for streaming, so we need to add some hacks to make it streamable.

First we define the basic structure of our JSON. Instead of the `$articles` we use a placeholder, which lets us split the encoded string into a `$before` and an `$after` variable.

Now we first send the `$before` part. Then we stream the articles one by one; here we have to keep the comma in mind, which needs to be added after every article but not after the last one. Also we add an additional `flush` after every 500 elements. After that we also send the `$after` part.

The result

So at the end the whole action looks like the following:
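The original code block was lost in the export. Below is a minimal sketch of the whole action, reworking the `ArticleListAction` sketched earlier; the `__PLACEHOLDER__` marker and the field selection are assumptions that only illustrate the described steps (placeholder split, comma handling, periodic `flush()`):

```php
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\StreamedResponse;

final class ArticleListAction
{
    public function __construct(private EntityManagerInterface $entityManager)
    {
    }

    public function __invoke(): StreamedResponse
    {
        $response = new StreamedResponse(function () {
            // load the articles lazily instead of all at once
            $articles = $this->entityManager->createQueryBuilder()
                ->select('a.id', 'a.title', 'a.description')
                ->from(Article::class, 'a')
                ->getQuery()
                ->toIterable();

            // define the structure with a placeholder where the list will go
            $structure = json_encode(['_embedded' => ['articles' => ['__PLACEHOLDER__']]]);
            [$before, $after] = explode('"__PLACEHOLDER__"', $structure, 2);

            // send everything before the list
            echo $before;

            $count = 0;
            foreach ($articles as $article) {
                // add the comma separator before every article except the first one
                if ($count > 0) {
                    echo ',';
                }

                echo json_encode($article);

                // flush the output buffer every 500 elements
                if (++$count % 500 === 0) {
                    flush();
                }
            }

            // send everything after the list
            echo $after;
        });

        $response->headers->set('Content-Type', 'application/json');

        return $response;
    }
}
```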
The metrics for 100000 Articles (nginx + php-fpm 7.4 - Macbook Pro 2013):
This way we did not only reduce the memory usage on our server, we also made the response faster. The memory usage was measured here with `memory_get_usage` and `memory_get_peak_usage`, the "Time to first Byte" via the browser, and the response times over curl.
Updated 2022-10-02 - (symfony serve + php-fpm 8.1 - Macbook Pro 2021)
While there is not much difference in time for a single response, the real gain is the lower memory usage, which kicks in when you have a lot of simultaneous requests. On my machine that is more than 150 simultaneous requests - a high value that will be a lot lower on a normal server. While 150 simultaneous requests crash the old implementation, the new implementation still works with 220 simultaneous requests, which means about ~46% more requests are possible.
Reading Data in JavaScript
As we stream the data, we should also treat it the same way in our JavaScript on the other end - the data needs to be read in a streamed way. Here I'm just following the example from the Fetch API: Processing a text file line by line.

So if we look at our `script.js`, we split the object line by line and append it to our table. This method is definitely not the way JSON should be read and parsed; it is only shown as an example of how the response could be read from a stream.
Conclusion
The implementation looks a little hacky; for maintainability it could be moved into its own Factory which creates this kind of response. Example:
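The example code did not survive the export either; a minimal sketch of such a factory, under the same assumptions as above (the class name `StreamedJsonResponseFactory` and the `__PLACEHOLDER__` marker are hypothetical), could look like this:

```php
use Symfony\Component\HttpFoundation\StreamedResponse;

final class StreamedJsonResponseFactory
{
    /**
     * @param array    $structure   JSON structure containing $placeholder (as a single list element) where the items belong
     * @param iterable $items       the items to stream into the placeholder position
     * @param string   $placeholder marker string used inside $structure
     */
    public function create(array $structure, iterable $items, string $placeholder = '__PLACEHOLDER__'): StreamedResponse
    {
        $response = new StreamedResponse(function () use ($structure, $items, $placeholder) {
            [$before, $after] = explode('"' . $placeholder . '"', json_encode($structure), 2);

            echo $before;

            $count = 0;
            foreach ($items as $item) {
                if ($count > 0) {
                    echo ',';
                }

                echo json_encode($item);

                if (++$count % 500 === 0) {
                    flush();
                }
            }

            echo $after;
        });

        $response->headers->set('Content-Type', 'application/json');

        return $response;
    }
}
```

Usage would then be something like `$factory->create(['_embedded' => ['articles' => ['__PLACEHOLDER__']]], $articles);`.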
The JavaScript part is definitely not ready for production, and if you use it you should probably create your own content type, e.g. `application/json+stream`, so you only parse the JSON this way when you know it is really in this line-by-line format.

There are maybe better libraries like `JSONStream` to read the data, but at the current state I did not test them out. Let me know if somebody has experience with that and has solutions for it.

At least, what I think everybody should use when providing lists is `toIterable`, when possible, when loading your data via Doctrine, and selecting specific fields instead of using the ORM, to avoid the hydration process into objects.

Let me know what you think about this experiment and how you are currently providing your JSON data.

The whole experiment can be checked out and tested yourself via this repository.
Attend the discussion about this on Twitter.
Update 2022-09-27
Added a StreamedJsonResponse class and tried to contribute this implementation to the Symfony core.
#47709
Update 2022-10-02
Updated some statistics with a new machine and Apache Benchmark tests for concurrent requests.