From 09fd70c00738bb65dc32b5e11c7774ff049e8b74 Mon Sep 17 00:00:00 2001 From: Tom Van Looy Date: Mon, 27 Apr 2020 17:26:57 +0200 Subject: [PATCH 1/2] Messenger process managers Add information about mysql timeouts, supervisor FATAL, systemd user services. Add docs for #13617 --- messenger.rst | 145 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 139 insertions(+), 6 deletions(-) diff --git a/messenger.rst b/messenger.rst index 0194b11fd11..f1b4b256aad 100644 --- a/messenger.rst +++ b/messenger.rst @@ -486,23 +486,30 @@ Deploying to Production On production, there are a few important things to think about: -**Use Supervisor to keep your worker(s) running** +**Use a Process Manager like Supervisor or systemd to keep your worker(s) running** You'll want one or more "workers" running at all times. To do that, use a - process control system like :ref:`Supervisor `. + process control system like :ref:`Supervisor ` + or :ref:`systemd `. **Don't Let Workers Run Forever** Some services (like Doctrine's ``EntityManager``) will consume more memory over time. So, instead of allowing your worker to run forever, use a flag like ``messenger:consume --limit=10`` to tell your worker to only handle 10 - messages before exiting (then Supervisor will create a new process). There + messages before exiting (then the process manager will create a new process). There are also other options like ``--memory-limit=128M`` and ``--time-limit=3600``. +**Stopping Workers That Encounter Errors** + If a worker dependency like your database server is down, or timeout is reached, + you can try to add :ref:`reconnect logic `, or just quit + the worker if it receives too many errors with the ``--failure-limit`` option of + the ``messenger:consume`` command. + **Restart Workers on Deploy** Each time you deploy, you'll need to restart all your worker processes so that they see the newly deployed code. To do this, run ``messenger:stop-workers`` on deployment. This will signal to each worker that it should finish the message - it's currently handling and should shut down gracefully. Then, Supervisor will create - new worker processes. The command uses the :ref:`app ` + it's currently handling and should shut down gracefully. Then, the process manager + will create new worker processes. The command uses the :ref:`app ` cache internally - so make sure this is configured to use an adapter you like. **Use the Same Cache Between Deploys** @@ -695,8 +702,49 @@ Next, tell Supervisor to read your config and start your workers: See the `Supervisor docs`_ for more details. +It is possible to end up in a situation where the supervisor job gets into a +FATAL (too many start retries) state when trying to restart when something is +not yet available. You can prevent this by wrapping the Symfony script with a +shell script and sleep when the script fails: + +.. code-block:: bash + + #!/usr/bin/env bash + + # Supervisor sends TERM to services when stopped. + # This wrapper has to pass the signal to it's child. + # Note that we send TERM (graceful) instead of KILL (immediate). + _term() { + echo "[GOT TERM, SIGNALING CHILD]" + kill -TERM "$child" 2>/dev/null + exit 1 + } + + trap _term SIGTERM + + # Execute console.php with whatever arguments were specified to this script + "$@" & + child=$! + wait "$child" + rc=$? + + # Delay to prevent supervisor from restarting too fast on failure + sleep 30 + + # Return with the exit code of the wrapped process + exit $rc + +The supervisor job would then look like this: + +.. code-block:: ini + + ;/etc/supervisor/conf.d/messenger-worker.conf + [program:messenger-consume] + command=/path/to/your/app/bin/console_wrapper php /path/to/your/app/bin/console messenger:consume async --time-limit=3600" + ... + Graceful Shutdown -~~~~~~~~~~~~~~~~~ +................. If you install the `PCNTL`_ PHP extension in your project, workers will handle the ``SIGTERM`` POSIX signal to finish processing their current message before @@ -712,6 +760,88 @@ of the desired grace period in seconds) in order to perform a graceful shutdown: [program:x] stopwaitsecs=20 +.. _messenger-systemd: + +Systemd Configuration +~~~~~~~~~~~~~~~~~~~~~ + +While Supervisor is a great tool, it has the disadvantage that you need system +access to run it. Systemd has become the standard on most Linux distributions, +and has a good alternative called *user services*. + +Systemd user service configuration files typically live in a ``~/.config/systemd/user`` +directory. For example, you can create a new ``messenger-worker.service`` file. Or a +``messenger-worker@.service`` file if you want more instances running at the same time: + +.. code-block:: ini + + [Unit] + Description=Symfony messenger-consume %i + + [Service] + ExecStart=php /path/to/your/app/bin/console messenger:consume async --time-limit=3600 + Restart=always + RestartSec=30 + + [Install] + WantedBy=default.target + +Now, tell systemd to enable and start one worker: + +.. code-block:: terminal + + $ systemctl --user enable messenger-worker@1.service + $ systemctl --user start messenger-worker@1.service + + # to enable and start 20 workers + $ systemctl --user enable messenger-worker@{1..20}.service + $ systemctl --user start messenger-worker@{1..20}.service + +If you change your service config file, you need to reload the daemon: + +.. code-block:: terminal + + $ systemctl --user daemon-reload + +To restart all your consumers: + +.. code-block:: terminal + + $ systemctl --user restart messenger-consume@*.service + +The systemd user instance is only started after the first login of the +particular user. Consumer often need to start on system boot instead. +Enable lingering on the user to activate that behavior: + +.. code-block:: terminal + + $ loginctl enable-linger + +Logs are managed by journald and can be worked with using the journalctl +command: + +.. code-block:: terminal + + # follow logs of consumer nr 11 + $ journalctl -f --user-unit messenger-consume@11.service + + # follow logs of all consumers + $ journalctl -f --user-unit messenger-consume@* + + # follow all logs from your user services + $ journalctl -f _UID=$UID + +See the `systemd docs`_ for more details. + +.. note:: + + You either need elevated privileges for the ``journalctl`` command, or add + your user to the systemd-journal group: + + .. code-block:: terminal + + $ sudo usermod -a -G systemd-journal + Stateless Worker ~~~~~~~~~~~~~~~~ @@ -2190,6 +2320,8 @@ middleware and *only* include your own: If a middleware service is abstract, a different instance of the service will be created per bus. +.. _middleware-doctrine: + Middleware for Doctrine ~~~~~~~~~~~~~~~~~~~~~~~ @@ -2377,6 +2509,7 @@ Learn more .. _`streams`: https://redis.io/topics/streams-intro .. _`Supervisor docs`: http://supervisord.org/ .. _`PCNTL`: https://www.php.net/manual/book.pcntl.php +.. _`systemd docs`: https://www.freedesktop.org/wiki/Software/systemd/ .. _`SymfonyCasts' message serializer tutorial`: https://symfonycasts.com/screencast/messenger/transport-serializer .. _`Long polling`: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-short-and-long-polling.html .. _`Visibility Timeout`: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html From 156058489891b657339b34310204f34bb0e17619 Mon Sep 17 00:00:00 2001 From: Wouter de Jong Date: Tue, 6 Sep 2022 17:28:42 +0200 Subject: [PATCH 2/2] Add documentation of Supervisor BACKOFF strategy This feature seems to be the official way to handle delays in restart. --- messenger.rst | 55 +++++++++++++-------------------------------------- 1 file changed, 14 insertions(+), 41 deletions(-) diff --git a/messenger.rst b/messenger.rst index f1b4b256aad..20f1136de94 100644 --- a/messenger.rst +++ b/messenger.rst @@ -676,11 +676,25 @@ times: startsecs=0 autostart=true autorestart=true + startretries=10 process_name=%(program_name)s_%(process_num)02d Change the ``async`` argument to use the name of your transport (or transports) and ``user`` to the Unix user on your server. +.. caution:: + + During a deployment, something might be unavailable (e.g. the + database) causing the consumer to fail to start. In this situation, + Supervisor will try ``startretries`` number of times to restart the + command. Make sure to change this setting to avoid getting the command + in a FATAL state, which will never restart again. + + Each restart, Supervisor increases the delay by 1 second. For instance, if + the value is ``10``, it will wait 1 sec, 2 sec, 3 sec, etc. This gives the + service a total of 55 seconds to become available again. Increase the + ``startretries`` setting to cover the maximum expected downtime. + If you use the Redis Transport, note that each worker needs a unique consumer name to avoid the same message being handled by multiple workers. One way to achieve this is to set an environment variable in the Supervisor configuration @@ -702,47 +716,6 @@ Next, tell Supervisor to read your config and start your workers: See the `Supervisor docs`_ for more details. -It is possible to end up in a situation where the supervisor job gets into a -FATAL (too many start retries) state when trying to restart when something is -not yet available. You can prevent this by wrapping the Symfony script with a -shell script and sleep when the script fails: - -.. code-block:: bash - - #!/usr/bin/env bash - - # Supervisor sends TERM to services when stopped. - # This wrapper has to pass the signal to it's child. - # Note that we send TERM (graceful) instead of KILL (immediate). - _term() { - echo "[GOT TERM, SIGNALING CHILD]" - kill -TERM "$child" 2>/dev/null - exit 1 - } - - trap _term SIGTERM - - # Execute console.php with whatever arguments were specified to this script - "$@" & - child=$! - wait "$child" - rc=$? - - # Delay to prevent supervisor from restarting too fast on failure - sleep 30 - - # Return with the exit code of the wrapped process - exit $rc - -The supervisor job would then look like this: - -.. code-block:: ini - - ;/etc/supervisor/conf.d/messenger-worker.conf - [program:messenger-consume] - command=/path/to/your/app/bin/console_wrapper php /path/to/your/app/bin/console messenger:consume async --time-limit=3600" - ... - Graceful Shutdown .................