Skip to content

bpo-39104: Fix hanging ProcessPoolExecutor on shutdown nowait with pickling failure #17670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tomMoral
Copy link
Contributor

@tomMoral tomMoral commented Dec 20, 2019

As reported initially by @rad-pat in #6084, the following script causes a deadlock.

from concurrent.futures import ProcessPoolExecutor


class ObjectWithPickleError():
    """Triggers a RuntimeError when sending job to the workers"""

    def __reduce__(self):
        raise RuntimeError()


if __name__ == "__main__":
    e = ProcessPoolExecutor()
    f = e.submit(id, ObjectWithPickleError())
    e.shutdown(wait=False)
    f.result()  # Deadlock on get

This is caused by the fact that the main process is closing communication channels that might be necessary to the queue_management_thread later. To avoid this, this PR let the queue_management_thread manage all the closing.

https://bugs.python.org/issue39104

Automerge-Triggered-By: @pitrou

Copy link

@rad-pat rad-pat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment. I had also started trying to make a solution, but perhaps I didn't grasp the entire problem. Please see rad-pat@188c222, including a more basic test without the pickle error.

@@ -87,7 +87,12 @@ def close(self):
self._reader.close()

def wakeup(self):
self._writer.send_bytes(b"")
try:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a _closed flag might be better than catching the exception? You could also then check the flag in the clear method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, this looks better.

…tdown(wait=False)

When a ProcessPoolExecutor was created, a job added and then shutdown with
wait=False, an OSError: handle is closed error was raised by the ThreadWakeup
class. The threadwakeup should not be closed on shutdown if wait is False,
but in the shutdown_worker method.
@tomMoral
Copy link
Contributor Author

Sorry for the delay @rad-pat !

I cherry picked your solution to add more tests and add the closed flag.

@rad-pat
Copy link

rad-pat commented Jan 17, 2020

Thanks, it looks good to me @tomMoral, just need some core dev interest now and hopefully we can get this resolved

@rad-pat
Copy link

rad-pat commented Feb 11, 2020

@pitrou I believe you have an interest in reviewing PRs for multiprocessing, is there any chance of a review of this one? Thanks.

@csabella csabella requested a review from pitrou February 11, 2020 11:54
@pitrou pitrou changed the title bpo-39104 Fix hanging ProcessPoolExcutor on shutdown nowait with pickling failure bpo-39104: Fix hanging ProcessPoolExcutor on shutdown nowait with pickling failure Feb 13, 2020
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the PR. Some comments below.

@@ -339,6 +346,8 @@ def shutdown_worker():

# Release the queue's resources as soon as possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a requirement for this PR, but it seems to me that the _queue_management_worker function has grown quite long and complicated. Do you think it could be turned into an object with a bunch of short and readable helper methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this would be a very much easier to read indeed. I will try to make a new PR with this change when I get a chance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward to your PR then @tomMoral :-)

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@rad-pat
Copy link

rad-pat commented Feb 14, 2020

@tomMoral , I've hopefully made all the requested changes here https://github.com/rad-pat/cpython/tree/FIX_hanging_interpreter_executor_shutdown_nowait, but I can't make a PR to your repo. Would you please be able to pull in the changes?

@tomMoral
Copy link
Contributor Author

@tomMoral , I've hopefully made all the requested changes here rad-pat/cpython@FIX_hanging_interpreter_executor_shutdown_nowait , but I can't make a PR to your repo. Would you please be able to pull in the changes?

Thanks @rad-pat , I updated the PR with your change and made a couple of adjustments!

@rad-pat
Copy link

rad-pat commented Feb 15, 2020

Great stuff, thanks @tomMoral , I'm hoping/pushing for this fix because it's giving me a production issue right now. So, comment per bot's request ... I have made the requested changes; please review again

@pitrou pitrou changed the title bpo-39104: Fix hanging ProcessPoolExcutor on shutdown nowait with pickling failure bpo-39104: Fix hanging ProcessPoolExecutor on shutdown nowait with pickling failure Feb 16, 2020
@miss-islington
Copy link
Contributor

@tomMoral: Status check is done, and it's a failure ❌ .

@miss-islington
Copy link
Contributor

@tomMoral: Status check is done, and it's a success ✅ .

@miss-islington miss-islington merged commit a5cbab5 into python:master Feb 16, 2020
colesbury referenced this pull request in colesbury/nogil Oct 6, 2021
…ckling failure (GH-17670)

As reported initially by @rad-pat in #6084, the following script causes a deadlock.

```
from concurrent.futures import ProcessPoolExecutor


class ObjectWithPickleError():
    """Triggers a RuntimeError when sending job to the workers"""

    def __reduce__(self):
        raise RuntimeError()


if __name__ == "__main__":
    e = ProcessPoolExecutor()
    f = e.submit(id, ObjectWithPickleError())
    e.shutdown(wait=False)
    f.result()  # Deadlock on get
```

This is caused by the fact that the main process is closing communication channels that might be necessary to the `queue_management_thread` later. To avoid this, this PR let the `queue_management_thread` manage all the closing.



https://bugs.python.org/issue39104



Automerge-Triggered-By: @pitrou
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants