
Reuse existing connection when executing each background job #5837


Merged

jtgeibel merged 3 commits into master from share-background-worker-connection on Jan 8, 2023

Conversation


@jtgeibel jtgeibel commented Jan 5, 2023

This is a backport of some work done in #4892. Once this lands it should be easy to finally land the port to diesel 2.0!

The 2nd commit includes 2 quick fixes to avoid obtaining 2 connections at the same time (which is now necessary to pass tests). We could do some refactoring to reuse an existing connection instead of dropping and reobtaining one, but these endpoints aren't performance critical so I'm not too worried about it.

@jtgeibel jtgeibel mentioned this pull request Jan 5, 2023

Turbo87 commented Jan 5, 2023

I'm not sure I fully understand the implications of this PR yet. Do I read this correctly that all the background jobs now share a single database connection? Doesn't that prevent us from running jobs in parallel? 🤔

@jtgeibel jtgeibel changed the title from "Share background worker connection with the background job" to "Reuse existing connection when executing each background job" on Jan 5, 2023

jtgeibel commented Jan 5, 2023

It will still be possible to run jobs in parallel. The previous design used 2 connections per completed task: one connection selects and locks a job row, then sits idle while the job (if needed) obtains its own connection to do its work. That is what was causing an issue with our test framework: we were using a reentrant mutex so that each of these connections was actually the same &PgConnection when running tests (but not in production). Now that we need a &mut PgConnection, there is no safe way to obtain 2 unique references to the same value, even though we stay within the same thread the whole time.
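The borrowing constraint can be shown with a plain-Rust sketch (using a hypothetical `PgConnection` stand-in, not diesel's actual type):

```rust
// Hypothetical stand-in for diesel's connection type (illustration only).
pub struct PgConnection {
    pub queries_run: u32,
}

// Diesel 1.x style: queries take `&PgConnection`, and shared references
// may alias, so the worker and the job could (via a reentrant mutex in
// tests) both look at the very same connection.
pub fn shared_borrows_alias(conn: &PgConnection) -> (u32, u32) {
    let worker_view: &PgConnection = conn;
    let job_view: &PgConnection = conn; // fine: both are `&`
    (worker_view.queries_run, job_view.queries_run)
}

// Diesel 2.0 style: every query needs `&mut PgConnection`. Two live
// `&mut` borrows of the same connection are rejected at compile time:
//
//     let worker_view = &mut conn;
//     let job_view = &mut conn; // error[E0499]: cannot borrow `conn`
//                               // as mutable more than once at a time
pub fn unique_borrow(conn: &mut PgConnection) -> u32 {
    conn.queries_run += 1;
    conn.queries_run
}
```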

The new design is that each connection locks a job and then enters an inner transaction so that the job can reuse the database connection. In the worst case, the job only needs access to the deserialized JSON and this connection still sits idle while the task is executed.
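A rough sketch of that worker loop, with hypothetical stand-ins for the connection and job types (not the actual crates.io or diesel code):

```rust
// Minimal model of a connection that counts nested transactions.
pub struct Conn {
    pub transaction_depth: u32,
}

impl Conn {
    // Run `f` inside a (sub)transaction, tracking nesting depth.
    pub fn transaction<T, E>(
        &mut self,
        f: impl FnOnce(&mut Conn) -> Result<T, E>,
    ) -> Result<T, E> {
        self.transaction_depth += 1; // BEGIN / SAVEPOINT
        let result = f(self);
        self.transaction_depth -= 1; // COMMIT or ROLLBACK
        result
    }
}

/// One worker iteration: a single connection both locks the job row
/// (outer transaction) and is lent to the job (inner transaction).
pub fn run_one_job(
    conn: &mut Conn,
    job: impl FnOnce(&mut Conn) -> Result<(), ()>,
) -> Result<(), ()> {
    conn.transaction(|conn| {
        // A `SELECT ... FOR UPDATE SKIP LOCKED` would lock the job row here.
        conn.transaction(|conn| {
            // The job reuses the same connection instead of taking a
            // second one from the pool.
            job(conn)
        })
        // On success the job row would be deleted; on error the inner
        // rollback undoes only the job's own changes.
    })
}
```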

The risk, addressed in the 3rd commit, is that we need to take care not to return a connection to the pool while it is still within a transaction. That could happen if the background job panics, so we manually do enough rollbacks to get the connection into the expected state. (Before, if the job panicked, we still had the untouched original connection holding the row locked so that we could update the failure count. And if the job had its own connection, then during a panic I assume the PgConnection is either rolled back correctly or discarded instead of being returned to the pool.)
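The panic-recovery idea can be sketched with `std::panic::catch_unwind` and a hypothetical connection stand-in that tracks its transaction depth (illustrative only, not the real implementation):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical connection stand-in tracking open transactions.
pub struct Conn {
    pub transaction_depth: u32,
}

impl Conn {
    pub fn rollback(&mut self) {
        self.transaction_depth = self.transaction_depth.saturating_sub(1);
    }
}

/// Run a job and, if it panics, roll back any transactions it left
/// open so the connection is safe to hand back to the pool.
pub fn run_job_safely(conn: &mut Conn, job: impl FnOnce(&mut Conn)) -> Result<(), ()> {
    let depth_before = conn.transaction_depth;
    let outcome = catch_unwind(AssertUnwindSafe(|| job(&mut *conn)));
    if outcome.is_err() {
        // Manually roll back until the connection is at the expected
        // depth again before it goes back to the pool.
        while conn.transaction_depth > depth_before {
            conn.rollback();
        }
        return Err(());
    }
    Ok(())
}
```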


weiznich commented Jan 5, 2023

> The risk, addressed in the 3rd commit, is that we need to take care not to return a connection to the pool while it is still within a transaction. That could happen if the background job panics, so we manually do enough rollbacks to get the connection into the expected state. (Before, if the job panicked, we still had the untouched original connection holding the row locked so that we can update the failure count. And if the job had its own connection, then during a panic I assume the PgConnection is either rolled back correctly or discarded instead of being returned to the pool.)

To be clear here: Diesel 1.4 will happily return a connection from a panicked thread or with an open transaction to the connection pool. Diesel 2.0 introduced several checks to prevent this behavior.

@Turbo87 Turbo87 added C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-backend ⚙️ labels Jan 5, 2023

bors commented Jan 5, 2023

☔ The latest upstream changes (presumably #5859) made this pull request unmergeable. Please resolve the merge conflicts.

@Turbo87 Turbo87 force-pushed the share-background-worker-connection branch from 2e3637d to e5ee424 on January 5, 2023 at 20:37

Turbo87 commented Jan 5, 2023

@jtgeibel I've rebased the PR to fix the merge conflicts.

Any thoughts on the "infinite loop" comment above?


bors commented Jan 6, 2023

☔ The latest upstream changes (presumably #5558) made this pull request unmergeable. Please resolve the merge conflicts.

@jtgeibel jtgeibel force-pushed the share-background-worker-connection branch from a3ba69b to ac700fa on January 7, 2023 at 01:38

jtgeibel commented Jan 7, 2023

As discussed in today's meeting, I looked into mem::forget-ing the connection to avoid the potential infinite loop. In order to drop the value (and not a reference) it would take a lot of careful refactoring to propagate the error to the right place while handling all the edge cases. Since this is temporary and a robust solution should be much easier with diesel 2.0, I've opted to just leak the thread as well. Meanwhile, in the rare case that we hit this, our existing monitoring should pick up on a job that runs for too long.
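Leaking instead of returning can be sketched with `std::mem::forget` and a hypothetical pooled-connection type (the pool here is just a counter of returned connections, not the real r2d2/diesel types):

```rust
use std::mem;

// Hypothetical pooled-connection stand-in: dropping it would normally
// return it to the pool (modelled here as a return counter).
pub struct PooledConn<'a> {
    pool: &'a std::cell::Cell<u32>,
    pub broken: bool,
}

impl<'a> PooledConn<'a> {
    pub fn new(pool: &'a std::cell::Cell<u32>, broken: bool) -> Self {
        PooledConn { pool, broken }
    }
}

impl Drop for PooledConn<'_> {
    fn drop(&mut self) {
        // Dropping hands the connection back to the pool.
        self.pool.set(self.pool.get() + 1);
    }
}

/// If the job left the connection in an unknown state, leak it so its
/// `Drop` impl never runs and it is never returned to the pool.
pub fn finish(conn: PooledConn<'_>) {
    if conn.broken {
        mem::forget(conn); // deliberately leaked, never pooled again
    }
    // otherwise: dropped normally here and returned to the pool
}
```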

This commit is the initial backport from my `diesel2` branch.
It should be possible to handle this much better in diesel 2.0.
@Turbo87 Turbo87 force-pushed the share-background-worker-connection branch from ac700fa to 0888c89 on January 8, 2023 at 20:01

Turbo87 commented Jan 8, 2023


jtgeibel commented Jan 8, 2023

Looks good, merging. One thing your PRs got me thinking about: due to the implicit update of the token's last_used_at, we won't correctly update this column if the auth check is passed a read-only connection. When used with AuthCheck::only_cookie() this isn't an issue, as the update will run and fail, and a later check rejects the authentication because it wasn't via a cookie. Everything ends up consistent.

I checked, and there is only one place where we pass a read-only connection to a default auth check: when passing a following query parameter to the search endpoint. This could probably be made a cookie-only option, though I'm not too worried about updating last_used_at in this particular case.

@jtgeibel jtgeibel merged commit 8283354 into master Jan 8, 2023
@jtgeibel jtgeibel deleted the share-background-worker-connection branch January 8, 2023 20:57

Turbo87 commented Jan 8, 2023

Indeed, that's a good point! 👍

Turbo87 added a commit to Turbo87/crates.io that referenced this pull request Jan 10, 2023
…ound-worker-connection"

This reverts commit 8283354, reversing
changes made to ba69940.
jtgeibel added a commit that referenced this pull request Jan 11, 2023
Revert "Merge pull request #5837 from rust-lang/share-background-worker-connection"
jtgeibel added a commit to jtgeibel/crates.io that referenced this pull request Jan 12, 2023
…er-changes"

This reverts commit 838ac1d, reversing
changes made to a732c1f.

This reintroduces changes that were originally landed in rust-lang#5837 and
reverted in rust-lang#5909.