Skip to content

Fix the read timeout implementation in NettyStream #636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 20, 2021

Conversation

stIncMale
Copy link
Member

Both the new approach and the original one achieve the guarantee that
there are no concurrent read timeouts scheduled for a channel.
This is an essential property needed for a timeout implementation,
let us call it "at-most-one".

The original approach of achieving the at-most-one property:

  • Schedule timeouts only by an event loop thread.
  • If another thread needs to schedule a timeout, it submits a new scheduleTimeout task
    to the channel's event loop (asynchronous timeout scheduling).
  • The scheduleTimeout task schedules a new timeout if none is scheduled.
    The original approach allowed executions in which a scheduleTimeout task runs after
    completion of the read operation that submitted the task,
    which resulted in unexpected timeout exceptions.

The new approach achieves the at-most-one property by using a lock.
As a result, timeouts can be scheduled by any thread and there is no
asynchronous timeout scheduling. This means we cannot miss removing a timeout
because it was submitted for scheduling, but has not been scheduled yet.

Other notable changes:

  • [related bug fix] NettyStream now must always respect the requested additional timeout.
    The original implementation had a chance to ignore the requested additional timeout
    and schedule a timeout with the default delay. This is again due to the asynchronous
    timeout scheduling in the original approach.
  • [performance optimization] Public read methods do not schedule timeouts anymore
    if the requested number of bytes is available right away.
  • [performance optimization] Netty channel handlers do not try to schedule timeouts anymore,
    timeouts may be scheduled only by the public read methods.
    Trying to schedule timeouts from the method handleReadResponse was unnecessary
    even in the original approach.
  • [performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader
    objects each time a new piece of data arrives for a pending reader.
  • [code improvement] The fields NettyStream.pendingReader, pendingException are always written/read
    inside synchronized blocks that use the same NettyStream object, so they can be plain.
    Marking them volatile is unnecessary and potentially misleading.
  • [code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed.

JAVA-3920

@stIncMale stIncMale requested review from rozza and jyemin January 16, 2021 20:13
@stIncMale
Copy link
Member Author

Almost all improvements described in the "Other notable changes" are done because of the suggestions expressed by @konrad-kaminski either verbally or via the proposed code. Thank you, Konrad!

Both the new approach and the original one achieve the guarantee that
there are no concurrent read timeouts scheduled for a channel.
This is an essential property needed for a timeout implementation,
let us call it "at-most-one".

The original approach of achieving the at-most-one property:
- Schedule timeouts only by an event loop thread.
- If another thread needs to schedule a timeout, it submits a new scheduleTimeout task
to the channel's event loop (asynchronous timeout scheduling).
- The scheduleTimeout task schedules a new timeout if none is scheduled.
The original approach allowed executions in which a scheduleTimeout task runs after
completion of the read operation that submitted the task,
which resulted in unexpected timeout exceptions.

The new approach achieves the at-most-one property by using a lock.
As a result, timeouts can be scheduled by any thread and there is no
asynchronous timeout scheduling. This means we cannot miss removing a timeout
because it was submitted for scheduling, but has not been scheduled yet.

Other notable changes:
- [related  bug fix] NettyStream now must always respect the requested additional timeout.
The original implementation had a chance to ignore the requested additional timeout
and schedule a timeout with the default delay. This is again due to the asynchronous
timeout scheduling in the original approach.
- [performance optimization] Public read methods do not schedule timeouts anymore
if the requested number of bytes is available right away.
- [performance optimization] Netty channel handlers do not try to schedule timeouts anymore,
timeouts may be scheduled only by the public read methods.
Trying to schedule timeouts from the method handleReadResponse was unnecessary
even in the original approach.
- [performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader
objects each time a new piece of data arrives for a pending reader.
- [code improvement] The fields NettyStream.pendingReader, pendingException are always written/read
inside synchronized blocks that use the same NettyStream object, so they can be plain.
Marking them volatile is unnecessary and potentially misleading.
- [code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed.

JAVA-3920
@stIncMale stIncMale force-pushed the JAVA-3920_noHandler branch from a98fd50 to c0bcc26 Compare January 17, 2021 00:21
Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its looking good.

Some style nits, also you can remove any commented out assert lines of code.

*/
final class NettyStream implements Stream {
private static final String READ_HANDLER_NAME = "ReadTimeoutHandler";
private static final int NO_SCHEDULE_TIMEOUT = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 defaults to no timeout can this also be zero?

Also can we name this to be explicit that it represents no timeout for reads?
NO_READ_TIMEOUT = 0

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
bootstrap.handler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(final SocketChannel ch) {
final ChannelPipeline pipeline = ch.pipeline();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: No need for local final variables.

Copy link
Member Author

@stIncMale stIncMale Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use a reference to the channel's pipeline in multiple places in this method. While calling ch.pipeline() each time we need it is correct, it is an unnecessary call, because we know that the method returns the same reference. We may make some assumptions about the method implementation/performance by looking at the source code, but we should treat it as subject for change, and something that we may believe is "free" for now, may not be free in the future.

Additionally, not calling a method multiple times reduces the number of usages that IDE finds when we want to find usages of the method. This simplifies code navigation.

As for the final modifier on local variables, it's a simple way to tell the reader that the variable is assigned once and only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase has been created by many authors, it looks homogenous because all the developers have tread lightly on the codebase and kept the code style consistent. Using final for local variables isn't a style that has been adopted in the past.

However, with the base line now being java 8 and the need for local final variable in closures perhaps this is a style we should adopt. It is more explicit as you say.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to consider final local variables as a standard going forward, but currently it's our standard not to, so let's stick with that for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that IntellJ now differentiates between mutated and effectively final local variables by underlining the former.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the final modifier. Is there a document with the style specified (not all style aspects are checked automatically)? I apologize for making not compliant changes, but I have not seen the document that specifies it. If there is none, I'll be picking up the style as I go.

ByteBuf buffer = null;
Throwable exceptionResult = null;
synchronized (this) {
exceptionResult = pendingException;
PendingReader pendingReader = this.pendingReader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Does this need to be made a local variable? If so seems we've used localPendingReader elsewhere to make it clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I am using local variables for shared state is to reduce the number of read usages we can see in the code (including the "Find usages" search in an IDE). The fewer code pieces include reads from or writes to a shared variable, the easier it is (at least to me) to think about the correctness of concurrent code.

The reason I did not call it localPendingReader is because local adds verbosity without adding any information: we can immediately see that this is a local variable on the line it is declared, and on the other lines where it is used IntelliJ IDEA IDE differentiates local and object variables (by default by using different boldness of the font).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cognitive load and cost to a code base by reusing the same named variable. It also adds future risk to the codebase as the localization of the variable may be changed.

Adding local does add verbosity but I disagree because it does add extra information, its explicit. Explicit should be preferred over implicit. Especially as you can't guarantee users have the same IDE or even settings in their IDE as you.

So please make it explicit, it costs nothing and improves the readability of the code for future development.

Copy link
Collaborator

@jyemin jyemin Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our practice generally in this code base is to minimize use of local variables. I can see the argument here in favor, since this is state that is protected by a lock, but at the same time I do find the style odd, particularly lines 288-290 below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the pendingReader local variable.

if (pendingReader == null) {//called by a public read method
pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
this.pendingReader = pendingReader;
} else {//called by a Netty channel handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else statement can be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it can be, but does removing it make the code clearer? This code is not so trivial, and having comments that express what's going on, or even state some invariants and possible interpretations in case they are violated, appears to be useful. What are the downsides of having code comments like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't inline comment in the code because its considered a code smell. The code should be readable in intent, however there are scenarios where you may want to add comments. In general methods / classes have comments describing the logic.

I'll leave this one to you thinks best, It may be useful in x years when we revisit this code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this else statement I don't think it adds to the code as contains no logic.

A single comment checking is if(pendingReader == null) should suffice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want the commented asserts to remain as documentation, how about also commenting out the else that contains them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the else block and the comments within.

Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, continued the discussion for localPendingReader & pendingReader and the empty else block. Happy to discuss over zoom if its easier.

if (pendingReader == null) {//called by a public read method
pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
this.pendingReader = pendingReader;
} else {//called by a Netty channel handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this else statement I don't think it adds to the code as contains no logic.

A single comment checking is if(pendingReader == null) should suffice.

Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic seems sound. Nice job with that. Most of my comments are about style.

Can you think of any way to test this code? Certainly our testing of the previous code was insufficient, and we haven't improved matters at all in that respect.

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
bootstrap.handler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(final SocketChannel ch) {
final ChannelPipeline pipeline = ch.pipeline();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to consider final local variables as a standard going forward, but currently it's our standard not to, so let's stick with that for now.

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
bootstrap.handler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(final SocketChannel ch) {
final ChannelPipeline pipeline = ch.pipeline();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that IntellJ now differentiates between mutated and effectively final local variables by underlining the former.

ByteBuf buffer = null;
Throwable exceptionResult = null;
synchronized (this) {
exceptionResult = pendingException;
PendingReader pendingReader = this.pendingReader;
Copy link
Collaborator

@jyemin jyemin Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our practice generally in this code base is to minimize use of local variables. I can see the argument here in favor, since this is state that is protected by a lock, but at the same time I do find the style odd, particularly lines 288-290 below.

if (pendingReader == null) {//called by a public read method
pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
this.pendingReader = pendingReader;
} else {//called by a Netty channel handler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want the commented asserts to remain as documentation, how about also commenting out the else that contains them?

}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline required at end, I believe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our .editorconfig contains insert_final_newline = false, so Idea does not allow me to end a file with a newline character. Should I also change the .editorconfig?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange, since most (all?) existing code ends in a newline. Since this file did end in a newline let's add it back, and we can figure out the .editorconfig issue separately

ch.pipeline().addLast(READ_HANDLER_NAME, new ReadTimeoutHandler(readTimeout));
if (readTimeout > NO_SCHEDULE_TIME) {
readTimeoutMillis = readTimeout;
/* We need at least one handler before (in the inbound evaluation order) the InboundBufferHandler,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this comment. Where are we firing exception events? What would be the observed behavior if we didn't have this handler?

Copy link
Member Author

@stIncMale stIncMale Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we firing exception events?
ctx.fireExceptionCaught(ReadTimeoutException.INSTANCE) fires it.

The only way for us to fire an exception event in Netty not as a result of handing an inbound/outbound event in ChannelInboundHandler/ChannelOutboundHandler, but on demand, is to call ChannelHandlerContext.fireExceptionCaught. This method results "in having the ChannelInboundHandler.exceptionCaught(ChannelHandlerContext, Throwable) method called of the next ChannelInboundHandler contained in the ChannelPipeline of the Channel". So we need to store a reference to a context of a handler (we store it in ReadTimeoutTask.ctx), which is a legit approach: "You can keep the ChannelHandlerContext for later use, such as triggering an event outside the handler methods, even from a different thread.".

Now we need to decide which handler's context we will use. Our InboundBufferHandler knows how to process exceptions, so we need timeout exception events to be forwarded to this handler, which means we cannot use its context for firing such events (this follows from the docs quoted above). 👉 At this point you may read the comment again, and it will probably be clear this time. Hope this helps, but if not, let's have a call.

What would be the observed behavior if we didn't have this handler?

We would still need a context, and since InboundBufferHandler is the only handler that is always in the pipeline, we would pick its context. As a result, InboundBufferHandler would not handle timeout exceptions because its context would be forwarding them to the next handler, which does not exist, and they would be "discarded silently, or logged if it needs your attention" (in which cases Netty considers that an exception "needs attention" is unspecified).

@stIncMale
Copy link
Member Author

Can you think of any way to test this code?

  1. We need a test that calls reads with variable added timeout on a NettyStream object and checks that reads do not fail because of a premature timeout. Since we cannot have concurrent pending readers on a single NettyStream, such a test needs to use many NettyStream objects to increase the number of concurrent attempts. This increases our chances to spot a problem without having to run the test for too long. Apparently, we also need a server that sends data with variable delays. It is important to note that while a test like this is probabilistic (may not spot the problem), it does not fail spuriously, i.e., when code is correct, because of the lack of hard real-time constraints (provided that we use System.nanoTime to measure time intervals, as it is guaranteed to be monotonic).
  2. It would also be nice to have a way to check that reads do not fail due to timeouts if data is made available in a timely manner. Unlike 1., such tests are allowed to fail spuriously, i.e., fail even if the code is correct, because hard real-time computing is not what we do. However, if we set the max (soft) server delay as 2 seconds, while the min read timeout value is 30 seconds, then timing out almost certainly means we hit an actual code problem, e.g., a deadlock, rather than just experienced the expected behaviour of a non-real-time computing system.

As an alternative to writing and starting/stopping a test server bound to a loopback interface, we may try using EmbeddedChannel. Netty has it specifically to facilitate testing. From what I can see in the "Netty in action" book, this channel is supposed to be used for testing one or multiple channel handlers (it accepts an array of handlers as one of its constructor arguments and builds an internal pipeline), and I am not sure it can be used as a transport implementation. In other words, I am not sure we can specify it via Bootstrap.channel(Class<? extends C> channelClass). However, we can try, this will require changing NettyStream(...Class<? extends SocketChannel> socketChannelClass...) to NettyStream(...Class<? extends Channel> channelClass...) and changing the return type of the NettyStream.getSocketChannelClass method.

Writing such a test is not exactly a trivial task, so if we decide to write it, I am proposing to have a separate task in Jira. Should I create one?

@jyemin jyemin requested review from rozza and jyemin January 20, 2021 12:35
Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed about the testing. Let's push it off for now.

Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @stIncMale - great work.

Unfortunately, we don't have a fixed style guide so somethings will be pick up as you go along. However, its a testament to the pr that its mostly style nits are being discussed.

@stIncMale
Copy link
Member Author

The ticket to add tests: JAVA-3946.

@stIncMale stIncMale merged commit 4a76d98 into mongodb:master Jan 20, 2021
@stIncMale stIncMale deleted the JAVA-3920_noHandler branch January 20, 2021 19:13
stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Jan 20, 2021
stIncMale added a commit that referenced this pull request Jan 20, 2021
jfitzu pushed a commit to nextworld-tools/mongo-java-driver that referenced this pull request Mar 3, 2021
Both the new approach and the original one achieve the guarantee that
there are no concurrent read timeouts scheduled for a channel.
This is an essential property needed for a timeout implementation,
let us call it "at-most-one".

The original approach of achieving the at-most-one property:
- Schedule timeouts only by an event loop thread.
- If another thread needs to schedule a timeout, it submits a new scheduleTimeout task
to the channel's event loop (asynchronous timeout scheduling).
- The scheduleTimeout task schedules a new timeout if none is scheduled.
The original approach allowed executions in which a scheduleTimeout task runs after
completion of the read operation that submitted the task,
which resulted in unexpected timeout exceptions.

The new approach achieves the at-most-one property by using a lock.
As a result, timeouts can be scheduled by any thread and there is no
asynchronous timeout scheduling. This means we cannot miss removing a timeout
because it was submitted for scheduling, but has not been scheduled yet.

Other notable changes:
- [related  bug fix] NettyStream now must always respect the requested additional timeout.
The original implementation had a chance to ignore the requested additional timeout
and schedule a timeout with the default delay. This is again due to the asynchronous
timeout scheduling in the original approach.
- [performance optimization] Public read methods do not schedule timeouts anymore
if the requested number of bytes is available right away.
- [performance optimization] Netty channel handlers do not try to schedule timeouts anymore,
timeouts may be scheduled only by the public read methods.
Trying to schedule timeouts from the method handleReadResponse was unnecessary
even in the original approach.
- [performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader
objects each time a new piece of data arrives for a pending reader.
- [code improvement] The fields NettyStream.pendingReader, pendingException are always written/read
inside synchronized blocks that use the same NettyStream object, so they can be plain.
Marking them volatile is unnecessary and potentially misleading.
- [code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed.

JAVA-3920
jfitzu added a commit to nextworld-tools/mongo-java-driver that referenced this pull request Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants