[HttpFoundation] Add StreamedJsonResponse for efficient JSON streaming #47709

Merged

alexander-schranz
Contributor

@alexander-schranz alexander-schranz commented Sep 27, 2022

| Q             | A
| ------------- | ---
| Branch?       | 6.2
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #...
| License       | MIT
| Doc PR        | symfony/symfony-docs#17301

When large datasets are streamed via a JSON API, it can be difficult to keep resource usage low. I experimented with a different way of streaming data for JSON responses: it combines a structured array with generators, which gave much better results.

More details can be found here: https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine.

I thought it could be a great addition to Symfony itself, making this kind of response easier to build and APIs more performant.

Usage

First Version (replaced)
class ArticleListAction {
    public function __invoke(EntityManagerInterface  $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return new StreamedJsonResponse(
            // JSON structure containing placeholder identifiers
            [
                '_embedded' => [
                    'articles' => '__articles__',
                ],
            ],
            // map of placeholder identifier => generator that replaces it
            [
                '__articles__' => $articles,
            ]
        );
    }

    private function findArticles(EntityManagerInterface  $entityManager): \Generator
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}

Updated Version (thanks to @ro0NL for the idea):

class ArticleListAction {
    public function __invoke(EntityManagerInterface  $entityManager): Response
    {
        return new StreamedJsonResponse(
            // JSON structure containing generators, which are streamed
            [
                '_embedded' => [
                    // findArticles() returns a generator, which is streamed item by item
                    'articles' => $this->findArticles($entityManager),
                ],
            ],
        );
    }

    private function findArticles(EntityManagerInterface  $entityManager): \Generator
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}

As proposed by @OskarStark, here is the full content of the blog post "Efficient JSON Streaming with Symfony and Doctrine":

Efficient JSON Streaming with Symfony and Doctrine

After reading a tweet pointing out that we provide only a few items (max. 100)
over our JSON APIs while serving 4K images on our websites, I wondered why
that is the case.

To understand the main difference, we first need to know how images are streamed.
Web servers today mostly use the sendfile feature, which is very
efficient because it streams a file chunk by chunk and does not need to load
the whole file into memory.

So I asked myself how we could achieve the same mechanism for our
JSON APIs, and tried a little experiment.

As an example we will have a look at a basic entity which has the
following fields defined:

  • id: int
  • title: string
  • description: text

The response of our API should look like the following:

{
  "_embedded": {
    "articles": [
      {
        "id": 1,
        "title": "Article 1",
        "description": "Description 1\nMore description text ..."
      },
      ...
    ]
  }
}

Normally to provide this API we would do something like this:

<?php

namespace App\Controller;

use App\Entity\Article;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Response;

class ArticleListAction
{
    public function __invoke(EntityManagerInterface $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return JsonResponse::fromJsonString(json_encode([
            'embedded' => [
                'articles' => $articles,
            ],
            'total' => 100_000,
        ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE));
    }

    // normally this method would live in a repository
    private function findArticles(EntityManagerInterface  $entityManager): iterable
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->getResult();
    }
}

In most cases we will add some pagination to the endpoint so our responses are not too big.

Making the API more efficient

But there is also a way to stream this response efficiently.

First of all, we need to adjust how we load the articles. This can be done by replacing
getResult with the more efficient toIterable:

-        return $queryBuilder->getQuery()->getResult();
+        return $queryBuilder->getQuery()->toIterable();

Still, the whole JSON needs to be in memory before it can be sent, so we also need to refactor
how we create our response. We will replace our JsonResponse with a
StreamedResponse object.

return new StreamedResponse(function() use ($articles) {
    // stream json
}, 200, ['Content-Type' => 'application/json']);

But the JSON format is not the best format for streaming, so we need a few hacks
to make it streamable.

First, we define the basic structure of our JSON this way:

$jsonStructure = json_encode([
    'embedded' => [
        'articles' => ['__REPLACES_ARTICLES__'],
    ],
    'total' => 100_000,
], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

Instead of $articles we use a placeholder, at which we split the string into
a $before and an $after part:

[$before, $after] = explode('"__REPLACES_ARTICLES__"', $jsonStructure, 2);

Now we are first sending the $before:

echo $before . PHP_EOL;

Then we stream the articles one by one. Here we need to keep the comma in mind, which
we have to add between articles but not after the last one:

foreach ($articles as $count => $article) {
    if ($count !== 0) {
        echo ',' . PHP_EOL; // if not first element we need a separator
    }

    echo json_encode($article, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}

Also we will add an additional flush after every 500 elements:

if ($count % 500 === 0 && $count !== 100_000) { // flush response after every 500
    flush();
}

After that we will also send the $after part:

echo PHP_EOL;
echo $after;

The result

So at the end the whole action looks like the following:

<?php

namespace App\Controller;

use App\Entity\Article;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\HttpFoundation\StreamedResponse;

class ArticleListAction
{
    public function __invoke(EntityManagerInterface  $entityManager): Response
    {
        $articles = $this->findArticles($entityManager);

        return new StreamedResponse(function() use ($articles) {
            // defining our json structure but replaces the articles with a placeholder
            $jsonStructure = json_encode([
                'embedded' => [
                    'articles' => ['__REPLACES_ARTICLES__'],
                ],
                'total' => 100_000,
            ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

            // split by placeholder
            [$before, $after] = explode('"__REPLACES_ARTICLES__"', $jsonStructure, 2);

            // send first before part of the json
            echo $before . PHP_EOL;

            // stream article one by one as own json
            foreach ($articles as $count => $article) {
                if ($count !== 0) {
                    echo ',' . PHP_EOL; // if not first element we need a separator
                }

                if ($count % 500 === 0 && $count !== 100_000) { // flush response after every 500
                    flush();
                }

                echo json_encode($article, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
            }

            // send the after part of the json as last
            echo PHP_EOL;
            echo $after;
        }, 200, ['Content-Type' => 'application/json']);
    }

    private function findArticles(EntityManagerInterface  $entityManager): iterable
    {
        $queryBuilder = $entityManager->createQueryBuilder();
        $queryBuilder->from(Article::class, 'article');
        $queryBuilder->select('article.id')
            ->addSelect('article.title')
            ->addSelect('article.description');

        return $queryBuilder->getQuery()->toIterable();
    }
}

The metrics for 100000 Articles (nginx + php-fpm 7.4 - Macbook Pro 2013):

|                    | Old Implementation | New Implementation
| ------------------ | ------------------ | ------------------
| Memory Usage       | 49.53 MB           | 2.10 MB
| Memory Usage Peak  | 59.21 MB           | 2.10 MB
| Time to first Byte | 478ms              | 28ms
| Time               | 2.335 s            | 0.584 s

This way we not only reduced the memory usage on our server,
we also made the response faster. The memory usage was
measured with memory_get_usage and memory_get_peak_usage.
The "Time to first Byte" value was taken from the browser, and the
response times were measured with curl.

Updated 2022-10-02 - (symfony serve + php-fpm 8.1 - Macbook Pro 2021)

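|                           | Old Implementation | New Implementation
| ------------------------- | ------------------ | ------------------
| Memory Usage              | 64.21 MB           | 2.10 MB
| Memory Usage Peak         | 73.89 MB           | 2.10 MB
| Time to first Byte        | 0.203 s            | 0.049 s
| Updated Time (2022-10-02) | 0.233 s            | 0.232 s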

While there is not much difference in the time for a single response,
the real gain is the lower memory usage, which kicks in when
you have a lot of simultaneous requests. On my machine that is above 150 simultaneous
requests, which is a high value; on a normal server the limit will be a lot lower.

While 150 simultaneous requests crash the old implementation,
the new implementation still works with 220 simultaneous requests, which
means about 47% more requests are possible ((220 - 150) / 150 ≈ 0.47).

Reading Data in JavaScript

As we stream the data, we should also read it in a streamed way in our
JavaScript on the other end.

Here I'm just following the "Processing a text file line by line" example from the Fetch API documentation.

If we look at our script.js, we split the response
line by line and append each object to our table. This is definitely not
how JSON should be read and parsed. It is only meant as an example of
how the response could be read from a stream.
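
To make the line-by-line idea concrete without the browser, here is a minimal PHP sketch of the same reading strategy. It is not the script.js from the repository: `readStreamedItems` is a hypothetical helper, and it assumes the exact layout produced by the action above (first line is the `$before` part, every following line is one JSON object with an optional trailing comma, last line is the `$after` part).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical reader for the hand-rolled stream format: yields one decoded
 * item per line, without ever holding the whole document in memory.
 *
 * @param resource $stream any readable stream (HTTP body, file, php://memory)
 */
function readStreamedItems($stream): \Generator
{
    $isFirstLine = true;

    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");

        if ($isFirstLine) {
            $isFirstLine = false; // skip the "before" part, e.g. {"embedded":{"articles":[
            continue;
        }

        if ($line === '' || $line[0] !== '{') {
            continue; // skip empty lines and the "after" part, e.g. ]},"total":100000}
        }

        // Drop the separator comma, then decode this single item.
        yield json_decode(rtrim($line, ','), true, 512, JSON_THROW_ON_ERROR);
    }
}

// Usage with an in-memory stream standing in for the HTTP response body.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "{\"embedded\":{\"articles\":[\n{\"id\":1},\n{\"id\":2}\n]},\"total\":2}");
rewind($stream);

$items = iterator_to_array(readStreamedItems($stream), false);
```

As with the JavaScript version, this only works because the producer puts exactly one item on each line. It is not a general JSON stream parser.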

Conclusion

The implementation looks a little hacky. For maintainability, it could
be moved into its own factory which creates this kind of response.

Example:

return StreamedResponseFactory::create(
    [
        'embedded' => [
            'articles' => ['__REPLACES_ARTICLES__'],
        ],
        'total' => 100_000,
    ],
    ['__REPLACES_ARTICLES__' => $articles]
);

The JavaScript part is definitely not ready for production,
and if you use it you should probably create your own content type, e.g.
application/json+stream, so that you parse the JSON this way
only when you know it really is in this line-by-line format.
There may be better libraries such as JSONStream
to read the data, but so far I have not tested them. Let me know
if somebody has experience with that and has solutions for it.

At least what I think everybody should do when providing lists
is to use toIterable where possible when loading
data via Doctrine, and to select specific fields instead
of using the ORM, to avoid hydrating objects.

Let me know what you think about this experiment and how you currently are
providing your JSON data.

The whole experiment can be checked out and tested via this repository.

Join the discussion about this on Twitter.

Update 2022-09-27

Added a StreamedJsonResponse class and am
trying to contribute this implementation to the Symfony core.

#47709

Update 2022-10-02

Updated some statistics with a new machine and Apache Benchmark tests for concurrent requests.

@carsonbot carsonbot added this to the 6.2 milestone Sep 27, 2022
@carsonbot carsonbot changed the title Add StreamedJsonResponse for efficient JSON streaming [HttpFoundation] Add StreamedJsonResponse for efficient JSON streaming Sep 27, 2022
@alexander-schranz
Contributor Author

The error in the tests of Stopwatch is unrelated to the pull request.

@alexander-schranz alexander-schranz force-pushed the feature/streamed-json-response branch from 80f75ec to aba67fa Compare September 28, 2022 17:13
@OskarStark OskarStark changed the title [HttpFoundation] Add StreamedJsonResponse for efficient JSON streaming [HttpFoundation] Add StreamedJsonResponse for efficient JSON streaming Sep 29, 2022
@OskarStark OskarStark requested a review from dunglas September 29, 2022 08:36
@ro0NL
Contributor

ro0NL commented Sep 29, 2022

would it be reasonable to consider a "compute json inline" approach, rather than end-users taking care of unique identifiers

$lazyJson = ['key' => fn() => yield from $heavy];

@stof
Member

stof commented Sep 29, 2022

@ro0NL this would force us to re-implement the whole JSON encoding in userland

@ro0NL
Contributor

ro0NL commented Sep 29, 2022

we could array walk the structure first, thus keeping the unique placeholders an implementation detail.

@stof
Member

stof commented Sep 29, 2022

@ro0NL if you do that, you are not streaming json anymore, defeating the whole purpose of this PR.

@ro0NL
Contributor

ro0NL commented Sep 29, 2022

the idea is to split the generators from the structure, preserving remaining logic. But this is an extra step yes, thus less ideal perhaps.

@alexander-schranz
Contributor Author

@ro0NL interesting input. As I think the structure array is mostly small it could be possible. But we would need to have a look at what difference this would be in the performance.

I hacked something together using array_walk_recursive: https://3v4l.org/tndhO. I will have a deeper look at it in the evening or over the next days.

@stof
Member

stof commented Sep 29, 2022

@alexander-schranz be careful when implementing this. is_callable would turn some strings into placeholders instead of outputting them.

@alexander-schranz
Contributor Author

@stof great hint, I think $item instanceof Closure should then do the job?

@stof
Member

stof commented Sep 29, 2022

now that we have first class callables, I would say yes. You can convert any callable to a closure using this feature.
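
To illustrate the point discussed above, here is a small sketch (not taken from the PR) of why `is_callable` is too broad for detecting lazy values in a data structure, while an `instanceof \Closure` check is safe. The variable names are made up for the example. The first-class callable syntax requires PHP 8.1 or later.

```php
<?php

declare(strict_types=1);

// A plain string that happens to name a built-in function.
$value = 'strlen';

// is_callable() accepts it, so ordinary string data could be mistaken for a lazy value.
$looksCallable = is_callable($value);             // true

// A string is never a Closure instance, so this check leaves string data alone.
$stringIsClosure = $value instanceof \Closure;    // false

// First-class callable syntax (PHP 8.1+) converts any callable into a Closure.
$closure = strlen(...);
$closureIsClosure = $closure instanceof \Closure; // true

var_dump($looksCallable, $stringIsClosure, $closureIsClosure);
```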

@alexander-schranz
Contributor Author

Okay, I don't need to check for closures or callables. I just need to check for \Generator instances, because the closures have already been called by then. That is very important: for example, when the database connection is not available, the exception needs to be thrown in the controller and must not be thrown after status code 200 has already been sent:

return new StreamedJsonResponse(
    [
        '_embedded' => [
            'articles' => $this->findArticles('Article'), // returns a \Generator which will generate a list of data
        ],
    ],
);

The difference between the old and new implementation is not big: it takes just about 0.0000128 s to do the array_walk_recursive and replace the generators. It also had no visible effect on memory usage. The tested arrays are really small, but I think that will mostly be the case for this kind of response.

I also updated the example repository to use the new class under /symfony-articles.json: https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine if somebody wants to experiment with it.
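
For readers who want to see the idea in isolation, the following is a minimal sketch of the approach described in this thread: walk the structure once, swap every generator for a placeholder, encode the small remaining structure, and stream each generator at its placeholder position. It is not the code merged in this PR. `streamJsonSketch` and the placeholder string are hypothetical, the placeholder is assumed never to occur in the real data, and every generator is assumed to produce a list.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: echoes $structure as JSON, streaming every nested
 * \Generator item by item instead of materializing it first.
 */
function streamJsonSketch(array $structure): void
{
    $placeholder = '__sketch_placeholder__'; // assumption: never occurs in the data
    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE;

    // Collect the generators in document order and swap each for the placeholder.
    $generators = [];
    array_walk_recursive($structure, function (&$item) use (&$generators, $placeholder): void {
        if ($item instanceof \Generator) {
            $generators[] = $item;
            $item = $placeholder;
        }
    });

    // Encode the (small) remaining structure once and split it at the placeholders.
    $parts = explode('"' . $placeholder . '"', json_encode($structure, $flags));

    foreach ($generators as $index => $generator) {
        echo $parts[$index], '[';

        $isFirst = true;
        foreach ($generator as $row) {
            // Separator goes before every item except the first one.
            echo $isFirst ? '' : ',', json_encode($row, $flags);
            $isFirst = false;
        }

        echo ']'; // an empty generator still yields a valid, empty JSON list
    }

    echo $parts[count($generators)];
}

// Usage: a plain generator stands in for the Doctrine toIterable() result.
$articles = (function (): \Generator {
    yield ['id' => 1, 'title' => 'Article 1'];
    yield ['id' => 2, 'title' => 'Article 2'];
})();

streamJsonSketch(['_embedded' => ['articles' => $articles], 'total' => 2]);
```

Because `array_walk_recursive` and `json_encode` both traverse the array depth-first in insertion order, the n-th collected generator lines up with the n-th placeholder in the encoded string.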

@alexander-schranz alexander-schranz force-pushed the feature/streamed-json-response branch 2 times, most recently from 2480746 to 7d8700f Compare October 24, 2022 19:08
@alexander-schranz alexander-schranz force-pushed the feature/streamed-json-response branch from 3453946 to 3de6fc7 Compare October 24, 2022 20:50
@OskarStark
Contributor

I propose to add the content from the README of your prototype application to the PR header 👍🏻

@alexander-schranz
Contributor Author

@OskarStark added.

Think PR is blocked until 6.3 branch is created?

@OskarStark
Contributor

> @OskarStark added.

thanks

> Think PR is blocked until 6.3 branch is created?

Yes

@dunglas
Member

dunglas commented Nov 25, 2022

For the record, @mtarld @soyuka and I are working on a new component that will be an alternative to json_encode/json_decode and to the Symfony Serializer, and that will natively support JSON streaming (for encoding and decoding). Maybe it will be possible to use this component in this PR.

@alexander-schranz
Contributor Author

@dunglas that sounds very interesting. For now I would stay with the current implementation: it gives a very low-resource solution without requiring the HttpFoundation package to depend on any kind of serializer. A serializer/normalizer can still be used inside the generator already. With the current implementation of this class that is also very low on resource usage, as it does not try to serialize all objects at once but one after the other, and so does not need to keep more than one object in memory, as long as the ORM loading allows that.

@chalasr
Member

chalasr commented Dec 29, 2022

Shall we move forward on this one?

@alexander-schranz alexander-schranz force-pushed the feature/streamed-json-response branch from 626eafe to a3ee766 Compare December 29, 2022 13:35
@alexander-schranz
Contributor Author

@chalasr rebased. Not sure what is open or required to get this merged :)

@chalasr chalasr force-pushed the feature/streamed-json-response branch from a3ee766 to ecc5355 Compare December 29, 2022 13:44
@chalasr
Member

chalasr commented Dec 29, 2022

Let's iterate, thanks @alexander-schranz!

@chalasr chalasr merged commit f43cd26 into symfony:6.3 Dec 29, 2022
@alexander-schranz alexander-schranz deleted the feature/streamed-json-response branch December 29, 2022 13:48
@alexander-schranz
Contributor Author

🎉 Thank you all for the great feedback and ideas. I think we got a great solution out of it, with a better DX than I could think of when I created the pull request.

@chalasr that sounds great :)

@fabpot fabpot mentioned this pull request May 1, 2023
nicolas-grekas added a commit that referenced this pull request May 16, 2023
…medJsonResponse (alexander-schranz)

This PR was merged into the 6.3 branch.

Discussion
----------

[HttpFoundation] Fix problem with empty generator in StreamedJsonResponse

| Q             | A
| ------------- | ---
| Branch?       | 6.3 (Feature `StreamedJsonResponse`: #47709)
| Bug fix?      | yes
| New feature?  | no <!-- please update src/**/CHANGELOG.md files -->
| Deprecations? | no <!-- please update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tickets       | Fix - was reported to me on Slack by `@norkunas`
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

Currently, when the Generator is empty, the output is invalid JSON, which should not happen. This adds a test case and a fix for the problem with the empty generator.

Commits
-------

39bb6b6 Fix problem with empty generator in StreamedJsonResponse
javiereguiluz added a commit to symfony/symfony-docs that referenced this pull request Jun 6, 2023
…onse` (alexander-schranz)

This PR was squashed before being merged into the 6.3 branch.

Discussion
----------

[HttpFoundation] Add documentation for `StreamedJsonResponse`

Docs for: symfony/symfony#47709

# TODO

- [x] Example of Flush Handling

Commits
-------

8a285e3 [HttpFoundation] Add documentation for `StreamedJsonResponse`
fabpot added a commit that referenced this pull request Oct 1, 2023
…medJsonResponse (Jeroeny)

This PR was merged into the 6.4 branch.

Discussion
----------

[HttpFoundation] Support root-level Generator in StreamedJsonResponse

| Q             | A
| ------------- | ---
| Branch?       | 6.4
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| License       | MIT

Currently the `StreamedJsonResponse` only supports streaming nested Generators within an array data structure.
However if a response is a list of items (for example database entities) on the root level, this isn't usable.
I think both usecases can be supported with the change in this PR.

The root level generator doesn't account for additional nested generators yet. I could add that by doing `is_array($item)` and then calling the recursive placeholder logic.

Link to first PR that introduced StreamedJsonResponse: #47709

~~Also something I noticed is I only got intermediate output, when adding a `flush()` call after each item has been echo'd (with a `sleep(1)` after each item to see it output the parts individually).~~ Edit: I see the class' PhpDoc describes this and it's probably expected to be done in userland implementations.
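
Since flushing is left to userland, one possible way to handle it is to wrap the data source in a generator that calls `flush()` at a fixed interval. The sketch below is only an illustration under that assumption: `flushEvery` is a hypothetical helper and not part of Symfony, and the interval of 500 is taken from the blog post above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wrapper: passes items through unchanged and calls flush()
 * after every $every items have been consumed.
 */
function flushEvery(iterable $items, int $every = 500): \Generator
{
    $count = 0;

    foreach ($items as $key => $item) {
        yield $key => $item;

        // Execution resumes here once the consumer asks for the next item,
        // i.e. after the previous item has been written to the output.
        if (++$count % $every === 0) {
            flush();
        }
    }
}

// Usage: wrap any iterable before handing it to the response.
$wrapped = flushEvery(['a' => 1, 'b' => 2, 'c' => 3], 2);
$result = iterator_to_array($wrapped);
```

Note that `flush()` only pushes PHP's system write buffer. Active output buffers and buffering in the web server are outside its control.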

Commits
-------

05e582f support root-level Generator in StreamedJsonResponse
symfony-splitter pushed a commit to symfony/http-foundation that referenced this pull request Oct 1, 2023
05e582f1a3 support root-level Generator in StreamedJsonResponse