-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-39104: Fix hanging ProcessPoolExecutor on shutdown nowait with pickling failure #17670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-39104: Fix hanging ProcessPoolExecutor on shutdown nowait with pickling failure #17670
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment. I had also started trying to make a solution, but perhaps I didn't grasp the entire problem. Please see rad-pat@188c222, including a more basic test without the pickle error.
Lib/concurrent/futures/process.py
Outdated
@@ -87,7 +87,12 @@ def close(self): | |||
self._reader.close() | |||
|
|||
def wakeup(self): | |||
self._writer.send_bytes(b"") | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a _closed flag might be better than catching the exception? You could also then check the flag in the clear method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right, this looks better.
…tdown(wait=False) When a ProcessPoolExecutor was created, a job added and then shutdown with wait=False, an OSError: handle is closed error was raised by the ThreadWakeup class. The threadwakeup should not be closed on shutdown if wait is False, but in the shutdown_worker method.
Sorry for the delay @rad-pat ! I cherry picked your solution to add more tests and add the closed flag. |
Thanks, it looks good to me @tomMoral, just need some core dev interest now and hopefully we can get this resolved |
@pitrou I believe you have an interest in reviewing PRs for multiprocessing, is there any chance of a review of this one? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for the PR. Some comments below.
@@ -339,6 +346,8 @@ def shutdown_worker(): | |||
|
|||
# Release the queue's resources as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a requirement for this PR, but it seems to me that the _queue_management_worker
function has grown quite long and complicated. Do you think it could be turned into an object with a bunch of short and readable helper methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this would be a very much easier to read indeed. I will try to make a new PR with this change when I get a chance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to your PR then @tomMoral :-)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
@tomMoral , I've hopefully made all the requested changes here https://github.com/rad-pat/cpython/tree/FIX_hanging_interpreter_executor_shutdown_nowait, but I can't make a PR to your repo. Would you please be able to pull in the changes? |
Thanks @rad-pat , I updated the PR with your change and made a couple of adjustments! |
Great stuff, thanks @tomMoral , I'm hoping/pushing for this fix because it's giving me a production issue right now. So, comment per bot's request ... I have made the requested changes; please review again |
Fix typos
@tomMoral: Status check is done, and it's a failure ❌ . |
@tomMoral: Status check is done, and it's a success ✅ . |
…ckling failure (GH-17670) As reported initially by @rad-pat in #6084, the following script causes a deadlock. ``` from concurrent.futures import ProcessPoolExecutor class ObjectWithPickleError(): """Triggers a RuntimeError when sending job to the workers""" def __reduce__(self): raise RuntimeError() if __name__ == "__main__": e = ProcessPoolExecutor() f = e.submit(id, ObjectWithPickleError()) e.shutdown(wait=False) f.result() # Deadlock on get ``` This is caused by the fact that the main process is closing communication channels that might be necessary to the `queue_management_thread` later. To avoid this, this PR let the `queue_management_thread` manage all the closing. https://bugs.python.org/issue39104 Automerge-Triggered-By: @pitrou
As reported initially by @rad-pat in #6084, the following script causes a deadlock.
This is caused by the fact that the main process is closing communication channels that might be necessary to the
queue_management_thread
later. To avoid this, this PR let thequeue_management_thread
manage all the closing.https://bugs.python.org/issue39104
Automerge-Triggered-By: @pitrou