Skip to content

GODRIVER-3516 Fix the issue in getOrQueueForIdleConn where connectionPerished conn.… #2051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

linfeip
Copy link

@linfeip linfeip commented May 14, 2025

GODRIVER-3516

…isAlive involves network requests, which can cause the p.idleMu.Lock() when obtaining conn from the pool to be blocked for too long.

Summary

the lock in getOrQueueForIdleConn will cause connection acquisition to block under high concurrency due to the IO request in c.nc.Read(b[:]) inside isAlive.

func (p *pool) getOrQueueForIdleConn(w *wantConn) bool {
        p.idleMu.Lock()
	defer p.idleMu.Unlock()

	// Try to deliver an idle connection from the idleConns stack first.
	for len(p.idleConns) > 0 {
		conn := p.idleConns[len(p.idleConns)-1]
		p.idleConns = p.idleConns[:len(p.idleConns)-1]

		if conn == nil {
			continue
		}

		if reason, perished := connectionPerished(conn); perished {
     ...
}

func (c *connection) isAlive() bool {
	if c.nc == nil {
		return false
	}
      ...
	err := c.nc.SetReadDeadline(time.Now().Add(1 * time.Millisecond))
	if err != nil {
		return false
	}
	var b [1]byte
	_, err = c.nc.Read(b[:])
	return errors.Is(err, os.ErrDeadlineExceeded)
}

Background & Motivation

Before modifying isAlive

image
image
756dcec288ac87b30a40a38327d9c1b4

After

image
image

…isAlive involves network requests, which can cause the p.idleMu.Lock() when obtaining conn from the pool to be blocked for too long.
@linfeip linfeip requested a review from a team as a code owner May 14, 2025 08:35
@linfeip linfeip requested a review from prestonvasquez May 14, 2025 08:35
@linfeip
Copy link
Author

linfeip commented May 14, 2025

I think it is unnecessary to verify whether this conn is alive here, because even if the connection is alive when it is obtained, there is no guarantee that the conn will still be alive when it is passed to the caller. The connection may have been disconnected between these two steps.

@linfeip
Copy link
Author

linfeip commented May 18, 2025

@prestonvasquez ping

@prestonvasquez prestonvasquez changed the title Fix the issue in getOrQueueForIdleConn where connectionPerished conn.… GODRIVER-3516 Fix the issue in getOrQueueForIdleConn where connectionPerished conn.… May 19, 2025
@prestonvasquez
Copy link
Member

@linfeip Thank you for the contribution! Would you mind also removing the connection.IsAlive() method since it's no longer being used? Otherwise this LGTM.

@mongodb-drivers-pr-bot mongodb-drivers-pr-bot bot added the priority-3-low Low Priority PR for Review label May 19, 2025
Copy link
Contributor

API Change Report

No changes found!

@linfeip
Copy link
Author

linfeip commented May 20, 2025

@linfeip Thank you for the contribution! Would you mind also removing the connection.IsAlive() method since it's no longer being used? Otherwise this LGTM.

@prestonvasquez Alright, I have removed the unused method connection.isAlive()

prestonvasquez
prestonvasquez previously approved these changes May 20, 2025
@@ -169,7 +169,7 @@ type reason struct {
// connectionPerished checks if a given connection is perished and should be removed from the pool.
func connectionPerished(conn *connection) (reason, bool) {
switch {
case conn.closed() || !conn.isAlive():
case conn.closed():
// A connection would only be closed if it encountered a network error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linfeip, I appreciate your contribution! Would you mind updating the comment to match the changes? Otherwise, it looks good to me!

Copy link
Author

@linfeip linfeip May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qingyang-hu Alright, I've already updated the comments.

@limpo1989
Copy link

limpo1989 commented May 21, 2025

Is there a plan to backport to v1.17.x ? @qingyang-hu

@prestonvasquez
Copy link
Member

prestonvasquez commented May 21, 2025

@limpo1989 This has been scheduled for the 2.2.2 and 1.17.4 patch releases.

qingyang-hu
qingyang-hu previously approved these changes May 21, 2025
Copy link
Collaborator

@qingyang-hu qingyang-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

prestonvasquez
prestonvasquez previously approved these changes May 21, 2025
Copy link
Collaborator

@qingyang-hu qingyang-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linfeip, one more issue. Can you please remove the test case, "TestPool_checkOut/discards_connections_closed_by_the_server_side", at

t.Run("discards connections closed by the server side", func(t *testing.T) {
? It is no longer needed.

@linfeip linfeip dismissed stale reviews from prestonvasquez and qingyang-hu via fdfa57e May 22, 2025 00:08
@linfeip
Copy link
Author

linfeip commented May 22, 2025

@linfeip, one more issue. Can you please remove the test case, "TestPool_checkOut/discards_connections_closed_by_the_server_side", at

t.Run("discards connections closed by the server side", func(t *testing.T) {

? It is no longer needed.

@qingyang-hu I have removed the test cases that are no longer in use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-3-low Low Priority PR for Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants