Fix the read timeout implementation in NettyStream #636

stIncMale · 2021-01-16T20:13:00Z

Both the new approach and the original one achieve the guarantee that
there are no concurrent read timeouts scheduled for a channel.
This is an essential property needed for a timeout implementation,
let us call it "at-most-one".

The original approach of achieving the at-most-one property:

Schedule timeouts only by an event loop thread.
If another thread needs to schedule a timeout, it submits a new scheduleTimeout task
to the channel's event loop (asynchronous timeout scheduling).
The scheduleTimeout task schedules a new timeout if none is scheduled.
The original approach allowed executions in which a scheduleTimeout task runs after
completion of the read operation that submitted the task,
which resulted in unexpected timeout exceptions.

The new approach achieves the at-most-one property by using a lock.
As a result, timeouts can be scheduled by any thread and there is no
asynchronous timeout scheduling. This means we cannot miss removing a timeout
because it was submitted for scheduling, but has not been scheduled yet.

Other notable changes:

[related bug fix] NettyStream now must always respect the requested additional timeout.
The original implementation had a chance to ignore the requested additional timeout
and schedule a timeout with the default delay. This is again due to the asynchronous
timeout scheduling in the original approach.
[performance optimization] Public read methods do not schedule timeouts anymore
if the requested number of bytes is available right away.
[performance optimization] Netty channel handlers do not try to schedule timeouts anymore,
timeouts may be scheduled only by the public read methods.
Trying to schedule timeouts from the method handleReadResponse was unnecessary
even in the original approach.
[performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader
objects each time a new piece of data arrives for a pending reader.
[code improvement] The fields NettyStream.pendingReader, pendingException are always written/read
inside synchronized blocks that use the same NettyStream object, so they can be plain.
Marking them volatile is unnecessary and potentially misleading.
[code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed.

JAVA-3920

stIncMale · 2021-01-16T21:06:27Z

Almost all improvements described in the "Other notable changes" are done because of the suggestions expressed by @konrad-kaminski either verbally or via the proposed code. Thank you, Konrad!

Both the new approach and the original one achieve the guarantee that there are no concurrent read timeouts scheduled for a channel. This is an essential property needed for a timeout implementation, let us call it "at-most-one". The original approach of achieving the at-most-one property: - Schedule timeouts only by an event loop thread. - If another thread needs to schedule a timeout, it submits a new scheduleTimeout task to the channel's event loop (asynchronous timeout scheduling). - The scheduleTimeout task schedules a new timeout if none is scheduled. The original approach allowed executions in which a scheduleTimeout task runs after completion of the read operation that submitted the task, which resulted in unexpected timeout exceptions. The new approach achieves the at-most-one property by using a lock. As a result, timeouts can be scheduled by any thread and there is no asynchronous timeout scheduling. This means we cannot miss removing a timeout because it was submitted for scheduling, but has not been scheduled yet. Other notable changes: - [related bug fix] NettyStream now must always respect the requested additional timeout. The original implementation had a chance to ignore the requested additional timeout and schedule a timeout with the default delay. This is again due to the asynchronous timeout scheduling in the original approach. - [performance optimization] Public read methods do not schedule timeouts anymore if the requested number of bytes is available right away. - [performance optimization] Netty channel handlers do not try to schedule timeouts anymore, timeouts may be scheduled only by the public read methods. Trying to schedule timeouts from the method handleReadResponse was unnecessary even in the original approach. - [performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader objects each time a new piece of data arrives for a pending reader. - [code improvement] The fields NettyStream.pendingReader, pendingException are always written/read inside synchronized blocks that use the same NettyStream object, so they can be plain. Marking them volatile is unnecessary and potentially misleading. - [code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed. JAVA-3920

rozza

Its looking good.

Some style nits, also you can remove any commented out assert lines of code.

rozza · 2021-01-18T11:29:23Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

 */
 final class NettyStream implements Stream {
-    private static final String READ_HANDLER_NAME = "ReadTimeoutHandler";
+    private static final int NO_SCHEDULE_TIMEOUT = -1;


0 defaults to no timeout can this also be zero?

Also can we name this to be explicit that it represents no timeout for reads?
NO_READ_TIMEOUT = 0

rozza · 2021-01-18T11:31:50Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
            bootstrap.handler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(final SocketChannel ch) {
+                    final ChannelPipeline pipeline = ch.pipeline();


Nit: No need for local final variables.

We need to use a reference to the channel's pipeline in multiple places in this method. While calling ch.pipeline() each time we need it is correct, it is an unnecessary call, because we know that the method returns the same reference. We may make some assumptions about the method implementation/performance by looking at the source code, but we should treat it as subject for change, and something that we may believe is "free" for now, may not be free in the future.

Additionally, not calling a method multiple times reduces the number of usages that IDE finds when we want to find usages of the method. This simplifies code navigation.

As for the final modifier on local variables, it's a simple way to tell the reader that the variable is assigned once and only once.

This codebase has been created by many authors, it looks homogenous because all the developers have tread lightly on the codebase and kept the code style consistent. Using final for local variables isn't a style that has been adopted in the past.

However, with the base line now being java 8 and the need for local final variable in closures perhaps this is a style we should adopt. It is more explicit as you say.

I'm happy to consider final local variables as a standard going forward, but currently it's our standard not to, so let's stick with that for now.

Also, note that IntellJ now differentiates between mutated and effectively final local variables by underlining the former.

I removed the final modifier. Is there a document with the style specified (not all style aspects are checked automatically)? I apologize for making not compliant changes, but I have not seen the document that specifies it. If there is none, I'll be picking up the style as I go.

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

rozza · 2021-01-18T11:41:15Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

        ByteBuf buffer = null;
        Throwable exceptionResult = null;
        synchronized (this) {
            exceptionResult = pendingException;
+            PendingReader pendingReader = this.pendingReader;


Nit: Does this need to be made a local variable? If so seems we've used localPendingReader elsewhere to make it clearer.

The reason I am using local variables for shared state is to reduce the number of read usages we can see in the code (including the "Find usages" search in an IDE). The fewer code pieces include reads from or writes to a shared variable, the easier it is (at least to me) to think about the correctness of concurrent code.

The reason I did not call it localPendingReader is because local adds verbosity without adding any information: we can immediately see that this is a local variable on the line it is declared, and on the other lines where it is used IntelliJ IDEA IDE differentiates local and object variables (by default by using different boldness of the font).

There is a cognitive load and cost to a code base by reusing the same named variable. It also adds future risk to the codebase as the localization of the variable may be changed.

Adding local does add verbosity but I disagree because it does add extra information, its explicit. Explicit should be preferred over implicit. Especially as you can't guarantee users have the same IDE or even settings in their IDE as you.

So please make it explicit, it costs nothing and improves the readability of the code for future development.

Our practice generally in this code base is to minimize use of local variables. I can see the argument here in favor, since this is state that is protected by a lock, but at the same time I do find the style odd, particularly lines 288-290 below.

Removed the pendingReader local variable.

rozza · 2021-01-18T11:42:08Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

+                    if (pendingReader == null) {//called by a public read method
+                        pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
+                        this.pendingReader = pendingReader;
+                    } else {//called by a Netty channel handler


This else statement can be removed

I agree that it can be, but does removing it make the code clearer? This code is not so trivial, and having comments that express what's going on, or even state some invariants and possible interpretations in case they are violated, appears to be useful. What are the downsides of having code comments like this?

We generally don't inline comment in the code because its considered a code smell. The code should be readable in intent, however there are scenarios where you may want to add comments. In general methods / classes have comments describing the logic.

I'll leave this one to you thinks best, It may be useful in x years when we revisit this code.

Regarding this else statement I don't think it adds to the code as contains no logic.

A single comment checking is if(pendingReader == null) should suffice.

If you really want the commented asserts to remain as documentation, how about also commenting out the else that contains them?

Removed the else block and the comments within.

JAVA-3920

rozza

Looks good, continued the discussion for localPendingReader & pendingReader and the empty else block. Happy to discuss over zoom if its easier.

rozza · 2021-01-19T09:23:05Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

+                    if (pendingReader == null) {//called by a public read method
+                        pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
+                        this.pendingReader = pendingReader;
+                    } else {//called by a Netty channel handler


Regarding this else statement I don't think it adds to the code as contains no logic.

A single comment checking is if(pendingReader == null) should suffice.

jyemin

The logic seems sound. Nice job with that. Most of my comments are about style.

Can you think of any way to test this code? Certainly our testing of the previous code was insufficient, and we haven't improved matters at all in that respect.

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

jyemin · 2021-01-19T22:43:56Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
            bootstrap.handler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(final SocketChannel ch) {
+                    final ChannelPipeline pipeline = ch.pipeline();


I'm happy to consider final local variables as a standard going forward, but currently it's our standard not to, so let's stick with that for now.

jyemin · 2021-01-19T22:45:33Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

@@ -143,6 +188,7 @@ private void initializeChannel(final AsyncCompletionHandler<Void> handler, final
            bootstrap.handler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(final SocketChannel ch) {
+                    final ChannelPipeline pipeline = ch.pipeline();


Also, note that IntellJ now differentiates between mutated and effectively final local variables by underlining the former.

jyemin · 2021-01-19T22:50:01Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

        ByteBuf buffer = null;
        Throwable exceptionResult = null;
        synchronized (this) {
            exceptionResult = pendingException;
+            PendingReader pendingReader = this.pendingReader;


Our practice generally in this code base is to minimize use of local variables. I can see the argument here in favor, since this is state that is protected by a lock, but at the same time I do find the style odd, particularly lines 288-290 below.

jyemin · 2021-01-19T22:51:39Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

+                    if (pendingReader == null) {//called by a public read method
+                        pendingReader = new PendingReader(numBytes, handler, scheduleReadTimeout(readTimeoutTask, readTimeoutMillis));
+                        this.pendingReader = pendingReader;
+                    } else {//called by a Netty channel handler


If you really want the commented asserts to remain as documentation, how about also commenting out the else that contains them?

jyemin · 2021-01-19T22:51:56Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

    }
-}
+}


Newline required at end, I believe

Our .editorconfig contains insert_final_newline = false, so Idea does not allow me to end a file with a newline character. Should I also change the .editorconfig?

That's strange, since most (all?) existing code ends in a newline. Since this file did end in a newline let's add it back, and we can figure out the .editorconfig issue separately

jyemin · 2021-01-19T22:54:04Z

driver-core/src/main/com/mongodb/connection/netty/NettyStream.java

-                        ch.pipeline().addLast(READ_HANDLER_NAME, new ReadTimeoutHandler(readTimeout));
+                    if (readTimeout > NO_SCHEDULE_TIME) {
+                        readTimeoutMillis = readTimeout;
+                        /* We need at least one handler before (in the inbound evaluation order) the InboundBufferHandler,


I don't really understand this comment. Where are we firing exception events? What would be the observed behavior if we didn't have this handler?

Where are we firing exception events?
ctx.fireExceptionCaught(ReadTimeoutException.INSTANCE) fires it.

The only way for us to fire an exception event in Netty not as a result of handing an inbound/outbound event in ChannelInboundHandler/ChannelOutboundHandler, but on demand, is to call ChannelHandlerContext.fireExceptionCaught. This method results "in having the ChannelInboundHandler.exceptionCaught(ChannelHandlerContext, Throwable) method called of the next ChannelInboundHandler contained in the ChannelPipeline of the Channel". So we need to store a reference to a context of a handler (we store it in ReadTimeoutTask.ctx), which is a legit approach: "You can keep the ChannelHandlerContext for later use, such as triggering an event outside the handler methods, even from a different thread.".

Now we need to decide which handler's context we will use. Our InboundBufferHandler knows how to process exceptions, so we need timeout exception events to be forwarded to this handler, which means we cannot use its context for firing such events (this follows from the docs quoted above). 👉 At this point you may read the comment again, and it will probably be clear this time. Hope this helps, but if not, let's have a call.

What would be the observed behavior if we didn't have this handler?

We would still need a context, and since InboundBufferHandler is the only handler that is always in the pipeline, we would pick its context. As a result, InboundBufferHandler would not handle timeout exceptions because its context would be forwarding them to the next handler, which does not exist, and they would be "discarded silently, or logged if it needs your attention" (in which cases Netty considers that an exception "needs attention" is unspecified).

JAVA-3920

stIncMale · 2021-01-20T04:46:16Z

Can you think of any way to test this code?

We need a test that calls reads with variable added timeout on a NettyStream object and checks that reads do not fail because of a premature timeout. Since we cannot have concurrent pending readers on a single NettyStream, such a test needs to use many NettyStream objects to increase the number of concurrent attempts. This increases our chances to spot a problem without having to run the test for too long. Apparently, we also need a server that sends data with variable delays. It is important to note that while a test like this is probabilistic (may not spot the problem), it does not fail spuriously, i.e., when code is correct, because of the lack of hard real-time constraints (provided that we use System.nanoTime to measure time intervals, as it is guaranteed to be monotonic).
It would also be nice to have a way to check that reads do not fail due to timeouts if data is made available in a timely manner. Unlike 1., such tests are allowed to fail spuriously, i.e., fail even if the code is correct, because hard real-time computing is not what we do. However, if we set the max (soft) server delay as 2 seconds, while the min read timeout value is 30 seconds, then timing out almost certainly means we hit an actual code problem, e.g., a deadlock, rather than just experienced the expected behaviour of a non-real-time computing system.

As an alternative to writing and starting/stopping a test server bound to a loopback interface, we may try using EmbeddedChannel. Netty has it specifically to facilitate testing. From what I can see in the "Netty in action" book, this channel is supposed to be used for testing one or multiple channel handlers (it accepts an array of handlers as one of its constructor arguments and builds an internal pipeline), and I am not sure it can be used as a transport implementation. In other words, I am not sure we can specify it via Bootstrap.channel(Class<? extends C> channelClass). However, we can try, this will require changing NettyStream(...Class<? extends SocketChannel> socketChannelClass...) to NettyStream(...Class<? extends Channel> channelClass...) and changing the return type of the NettyStream.getSocketChannelClass method.

Writing such a test is not exactly a trivial task, so if we decide to write it, I am proposing to have a separate task in Jira. Should I create one?

jyemin

Agreed about the testing. Let's push it off for now.

rozza

LGTM @stIncMale - great work.

Unfortunately, we don't have a fixed style guide so somethings will be pick up as you go along. However, its a testament to the pr that its mostly style nits are being discussed.

JAVA-3920

stIncMale · 2021-01-20T15:47:02Z

The ticket to add tests: JAVA-3946.

The original PR mongodb#636 JAVA-3920

The original PR #636 JAVA-3920

Both the new approach and the original one achieve the guarantee that there are no concurrent read timeouts scheduled for a channel. This is an essential property needed for a timeout implementation, let us call it "at-most-one". The original approach of achieving the at-most-one property: - Schedule timeouts only by an event loop thread. - If another thread needs to schedule a timeout, it submits a new scheduleTimeout task to the channel's event loop (asynchronous timeout scheduling). - The scheduleTimeout task schedules a new timeout if none is scheduled. The original approach allowed executions in which a scheduleTimeout task runs after completion of the read operation that submitted the task, which resulted in unexpected timeout exceptions. The new approach achieves the at-most-one property by using a lock. As a result, timeouts can be scheduled by any thread and there is no asynchronous timeout scheduling. This means we cannot miss removing a timeout because it was submitted for scheduling, but has not been scheduled yet. Other notable changes: - [related bug fix] NettyStream now must always respect the requested additional timeout. The original implementation had a chance to ignore the requested additional timeout and schedule a timeout with the default delay. This is again due to the asynchronous timeout scheduling in the original approach. - [performance optimization] Public read methods do not schedule timeouts anymore if the requested number of bytes is available right away. - [performance optimization] Netty channel handlers do not try to schedule timeouts anymore, timeouts may be scheduled only by the public read methods. Trying to schedule timeouts from the method handleReadResponse was unnecessary even in the original approach. - [performance optimization] NettyStream does not produce excessive garbage by re-creating PendingReader objects each time a new piece of data arrives for a pending reader. - [code improvement] The fields NettyStream.pendingReader, pendingException are always written/read inside synchronized blocks that use the same NettyStream object, so they can be plain. Marking them volatile is unnecessary and potentially misleading. - [code improvement] ReadTimeoutHandler was removed because it wasn't acting as a handler and was not needed. JAVA-3920

…)" This reverts commit 47b31e0.

stIncMale requested review from rozza and jyemin January 16, 2021 20:13

stIncMale mentioned this pull request Jan 16, 2021

Fix: Improper read timeout handling. #621

Closed

stIncMale force-pushed the JAVA-3920_noHandler branch from 9f9aa2c to a98fd50 Compare January 17, 2021 00:04

stIncMale force-pushed the JAVA-3920_noHandler branch from a98fd50 to c0bcc26 Compare January 17, 2021 00:21

rozza requested changes Jan 18, 2021

View reviewed changes

Suggested changes

c17939f

JAVA-3920

rozza reviewed Jan 19, 2021

View reviewed changes

jyemin requested changes Jan 19, 2021

View reviewed changes

Suggested changes

826ac45

JAVA-3920

jyemin requested review from rozza and jyemin January 20, 2021 12:35

jyemin requested changes Jan 20, 2021

View reviewed changes

rozza approved these changes Jan 20, 2021

View reviewed changes

Add the newline character sequence at the end of NettyStream.java

05afdb5

JAVA-3920

jyemin approved these changes Jan 20, 2021

View reviewed changes

stIncMale merged commit 4a76d98 into mongodb:master Jan 20, 2021

stIncMale deleted the JAVA-3920_noHandler branch January 20, 2021 19:13

stIncMale mentioned this pull request Jan 20, 2021

Backport a fix of the read timeout implementation in NettyStream #640

Merged

stIncMale added a commit to stIncMale/mongo-java-driver that referenced this pull request Jan 20, 2021

Backport a fix of the read timeout implementation in NettyStream

d835d20

The original PR mongodb#636 JAVA-3920

stIncMale added a commit that referenced this pull request Jan 20, 2021

Backport a fix of the read timeout implementation in NettyStream

9ea346b

The original PR #636 JAVA-3920

This was referenced Jan 20, 2021

Small improvements. stIncMale/mongo-java-driver#1

Closed

Fix the read timeout implementation in NettyStream #635

Closed

jfitzu added a commit to nextworld-tools/mongo-java-driver that referenced this pull request Mar 3, 2021

Revert "Fix the read timeout implementation in NettyStream (mongodb#636…

815bf1b

…)" This reverts commit 47b31e0.

Fix the read timeout implementation in NettyStream #636

Fix the read timeout implementation in NettyStream #636

Uh oh!

Conversation

stIncMale commented Jan 16, 2021

Uh oh!

stIncMale commented Jan 16, 2021

Uh oh!

rozza left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stIncMale Jan 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jyemin Jan 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rozza left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jyemin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jyemin Jan 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stIncMale Jan 19, 2021 •

edited

Loading

jyemin Jan 19, 2021 •

edited

Loading

jyemin Jan 19, 2021 •

edited

Loading

stIncMale Jan 20, 2021 •

edited

Loading