Skip to content

[Windows] Prevent socket callback from falsely detecting a connection close #578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

lxbndr
Copy link
Contributor

@lxbndr lxbndr commented Oct 1, 2021

This fixes a hard-to-reproduce issue we did observe in our live product (powered by Swift on Windows). The application communicates with the backend a lot, and some network requests were randomly timing out. All attempts to reproduce that behavior synthetically failed, but in the end we got some low but stable repro rate and had a chance to debug this.

It appeared that sometimes Dispatch catches FD_READ event, but ioctlsocket says that we have nothing to read at the moment of event processing. That is perfectly ok, non-zero result is not guaranteed there, as the reading could occur asynchronously.

But 0 is also used as a special value in _dispatch_event_merge as a marker of a closed socket. Such sockets are exclude from any further handling, making the connection hang.

The suggested solution avoids read/write processing with 0 available bytes, except for FD_CLOSE event, when we really need to poke all handlers to actually shut down the socket. The FD_READ event with 0 bytes becomes a no-op and doesn't do anything.

We applied this fix locally more than a year ago and since then the connection behavior seems to be stable.

@lxbndr
Copy link
Contributor Author

lxbndr commented Oct 1, 2021

CC: @compnerd

Copy link
Member

@compnerd compnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that #581 might be the change that this is intending to make - which seems like it should be correct. A racy read is possible between WSAEnumNetworkEvents and ioctlsocket. I'm not sure how to create a decent test case for that though.

_dispatch_muxnote_retain(dmn);
if (!PostQueuedCompletionStatus(hPort, dwBytesAvailable,
(ULONG_PTR)DISPATCH_PORT_SOCKET_READ, (LPOVERLAPPED)dmn)) {
DISPATCH_INTERNAL_CRASH(GetLastError(),
"PostQueuedCompletionStatus");
}
}
if (lNetworkEvents & FD_WRITE) {
if (((lNetworkEvents & FD_WRITE) && dwBytesAvailable > 0) || (lNetworkEvents & FD_CLOSE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense - dwBytesAavailable is guaranteed 1 if only a write event occurred. This condition doesn't seem correct. It should be either a write or a close event occurred I believe.

// Here and below we have to double-check dwBytesAvailable and make sure we don't
// wake up read/write handlers with 0 value for live (not closed) socket, because
// that would force handler to shut down.
if (((lNetworkEvents & FD_READ) && dwBytesAvailable > 0) || (lNetworkEvents & FD_CLOSE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems plausible to occur. But FD_READ is already set when FD_CLOSE is set, so this is rather confusing.

@lxbndr
Copy link
Contributor Author

lxbndr commented Oct 11, 2021

Closing in favor of #581

@lxbndr lxbndr closed this Oct 11, 2021
@lxbndr lxbndr deleted the not-a-close branch October 11, 2021 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants