Skip to content

Likely race that can prevent to recycle closed connections #1542

Closed
@pfreixes

Description

@pfreixes

We have upgraded our mysql client from 1.2.0 to 1.7.1 recently, and after a while we have observed that one of our instances that was running the client started to fail for all of the operations with an error that was saying invalid connection.

This error seemed to come from here

Our initial guess was that an unhealthy TCP connection was used all the time, and for some reason the TCP connection was not automatically recycled due to its supposed closing state.

We have found out that from the 1.2.0 version to the 1.7.1 there was this PR which changed drastically the way that the mysql driver was informing to the SQL Golang connection pool that connections were considered in bad state, and they should be recycled from the pool.

While the original changes seemed legit, with the propose of preventing retries for operations that might already touched the server, we found out that this was also changed. So returning the internal ErrInvalidConn in case the buffer could not be taken.

We could not yet reproduce the issue but we believe that some scenarios could lead to a situation where the length of the buffer is not 0, and the interpolation can not be done. For example consider the following scenario:

  • A new query is being done
  • data sent by the client is being processed by the Mysql server
  • Mysql returns a response
  • The client reads N bytes, which involves the header + some data from the response which is already in the buffer. But the overall packet len is not yet in the buffer. So the buffer will contain the header + some extra data
  • A RST packet is sent by the Mysql server, or an Proxy that could be placed in the middle
  • A subsequent read will get as a result an error since the RST package would take precedence over the reading of the remaining bytes of the packet len. At that point we would be leaving some orphan data into the buffer.
  • the ErrInvalidConn will be returned in that case
  • A new query is being done and reuses the same connection
  • When the interpolation function is being called, it fails since there are still some remaing bytes in the buffer from the previous operation.

As I said we could not yet probe that this was the root cause of the original issue, but seemed a plausible scenario that could have lead to the situation that we experimented.

Also you might want to consider if its semantically correct to return the ErrInvalidConn when the sever was not yet touched, considering maybe on just returning the driver.ErrBadConn which would automatically resolve any issue with unhealthy TCP connections.

Let me know what you think, we are more than happy to create a PR with this change if you believe that its worth it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions