Allow the database to be set in read only mode #1670

sgrif · 2019-03-15T00:14:05Z

This is in preparation for our next database upgrade. Last time we
needed to do this, we took the whole site down. This means that for 30
minutes, cargo build just didn't work. This is no longer something we
consider acceptable. While we could just take everything down except
cargo build, we can do better.

The upgrade process is basically:

- Set up a new replica of the primary database
- Wait until the replica is "caught up" (< 200 commits behind)
- Take down the site
- Upgrade the replica in place
- Verify the upgrade was successful
- Swap over to the replica
- Bring the site back up

There's no reason that we actually need to bring the site down though,
we just need to stop writing to the primary during the upgrade process.

This does that by setting the default transaction mode to READ ONLY, and
handling the error we will get back if we try to do a write.
Unfortunately, Diesel doesn't expose any API to give us the underlying
database error code, so we have to match on the error message instead.
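As a rough illustration of matching on the error message, here is a minimal sketch using only the standard library. The names `Response`, `is_read_only_error`, and `respond` are hypothetical and not taken from this PR, and the message shape is assumed to be PostgreSQL's "cannot execute ... in a read-only transaction" text.

```rust
/// Outcome of a request, reduced to the HTTP status we would send.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum Response {
    Ok,
    ServiceUnavailable, // 503
    InternalError,      // 500
}

/// Returns true if a database error message looks like PostgreSQL's
/// read-only transaction failure, for example
/// "cannot execute INSERT in a read-only transaction".
/// Matching on text is a workaround for not having the error code.
pub fn is_read_only_error(message: &str) -> bool {
    message.starts_with("cannot execute")
        && message.ends_with("in a read-only transaction")
}

/// Maps the result of a write to a response: a read-only failure becomes
/// a 503, any other database error stays a 500.
pub fn respond(result: Result<(), String>) -> Response {
    match result {
        Ok(()) => Response::Ok,
        Err(message) if is_read_only_error(&message) => Response::ServiceUnavailable,
        Err(_) => Response::InternalError,
    }
}
```

The two assumptions listed below are what make a check like this safe: if the application never sets a transaction mode by hand, a read-only error can only mean the database itself is in read only mode.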

This code assumes that:

- We are never manually setting a transaction to READ WRITE
- We are never manually setting a transaction to READ ONLY, and
  expecting write errors to bubble up to the user

We're not currently doing either of those things, and I can't imagine
that we'll start doing either one in the future.

During testing I found some issues with how this interacts with token
auth. The middleware was assuming that any error from
User::find_by_api_token was diesel::NotFound, and would proceed to
run the request. (This finally explains a spurious failure I've been
getting about a poisoned transaction; it was probably a deadlock in the
API key update.)

Because our tests run in a single transaction, we need to wrap that
update to ensure that the error doesn't poison the transaction. If we
didn't do anything else, we'd just assume that any user trying to do
something with token auth was not logged in, and would give a 403
instead of a 503. To fix this, I've set it up to fall back to a simple
find if the update fails.
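The fallback and the stricter error handling in the middleware can be sketched roughly as follows. This is a standard-library-only model, not the actual Diesel code: `DbError`, `User`, and both functions are hypothetical stand-ins, and the two closures stand for "update the token and return the user" and "plain lookup".

```rust
/// Simplified stand-in for the database errors the middleware can see.
#[derive(Debug, PartialEq)]
pub enum DbError {
    NotFound,
    ReadOnly,
    Other(String),
}

#[derive(Debug, PartialEq, Clone)]
pub struct User {
    pub id: i32,
}

/// Tries the lookup that also updates the token. If that fails for any
/// reason (for example because the database is read only), falls back to
/// a plain read-only lookup.
pub fn find_by_api_token<U, F>(update_and_find: U, find_only: F) -> Result<User, DbError>
where
    U: FnOnce() -> Result<User, DbError>,
    F: FnOnce() -> Result<User, DbError>,
{
    update_and_find().or_else(|_| find_only())
}

/// Only `NotFound` means "not logged in". Every other error is propagated
/// instead of being treated as a missing user.
pub fn authenticate(lookup: Result<User, DbError>) -> Result<Option<User>, DbError> {
    match lookup {
        Ok(user) => Ok(Some(user)),
        Err(DbError::NotFound) => Ok(None),
        Err(other) => Err(other),
    }
}
```

With this shape, a token lookup during read only mode still identifies the user through the fallback, and an unexpected failure surfaces as an error rather than as an anonymous request.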

This does not make the download endpoint work when we're in read only
mode, there is a separate issue for that which will come in a separate
PR. I have, however, added an explicit test to ensure it works in read
only mode.

sgrif · 2019-03-15T01:01:51Z

Test failure started after rebase since swirl returns Box<dyn Error + Send + Sync> from enqueue, and we're wrapping that in internal. Need to fix how that gets handled, since our From impl for CargoError never gets applied. Will not be able to fix until tomorrow, should not affect review.

The return type of `enqueue` is `Box<dyn Error + Send + Sync>`. Unfortunately, `Box<dyn Error>` does not implement `Error`, so our `From` impl doesn't apply. We're also not able to add an explicit `From` impl (which is odd, coherence rules should let us write that impl). I've opted to just add an explicit conversion function that we can use here. We should probably check that there's nowhere else we're passing an error object to `human` or `internal`, but that will come in a separate PR.
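A minimal sketch of the situation, using only the standard library. `AppError`, `from_boxed`, `enqueue`, `run`, and `parse` are hypothetical names standing in for the real types and functions, which are not shown in this thread.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical application error type. It deliberately does not implement
/// `Error` itself, so the blanket impl below does not overlap with the
/// standard library's `impl<T> From<T> for T`.
#[derive(Debug)]
pub struct AppError {
    message: String,
}

impl AppError {
    pub fn message(&self) -> &str {
        &self.message
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

/// Blanket conversion for concrete error types. It does not cover
/// `Box<dyn Error + Send + Sync>`, because that box does not implement `Error`.
impl<E: Error + Send + Sync + 'static> From<E> for AppError {
    fn from(error: E) -> Self {
        AppError { message: error.to_string() }
    }
}

/// Explicit conversion function for boxed errors.
pub fn from_boxed(error: Box<dyn Error + Send + Sync>) -> AppError {
    AppError { message: error.to_string() }
}

/// Stand-in for a job queue call that returns a boxed error.
pub fn enqueue(fail: bool) -> Result<(), Box<dyn Error + Send + Sync>> {
    if fail {
        Err("queue unavailable".into())
    } else {
        Ok(())
    }
}

/// `?` alone cannot convert the boxed error, so it is mapped explicitly first.
pub fn run(fail: bool) -> Result<(), AppError> {
    enqueue(fail).map_err(from_boxed)?;
    Ok(())
}

/// A concrete error (`ParseIntError`) goes through the blanket impl via `?`.
pub fn parse(input: &str) -> Result<i32, AppError> {
    Ok(input.parse::<i32>()?)
}
```

The explicit function keeps call sites short (`map_err(from_boxed)`) without needing a second `From` impl.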

jtgeibel · 2019-03-17T17:00:01Z

Looks great to me. I love how clean this is, and I agree that it is extremely unlikely that our code will break either of these assumptions in the future.

@bors r+

bors · 2019-03-17T17:00:03Z

📌 Commit 1759db6 has been approved by jtgeibel

bors · 2019-03-17T17:00:10Z

⌛ Testing commit 1759db6 with merge ade9a8e...

bors · 2019-03-17T17:23:06Z

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing ade9a8e to master...

This is an error that applications are likely to want to handle. A database may be read only during maintenance, or for brief periods after failing over to a hot replica. Applications designed to handle this may wish to transform this error into a user visible HTTP 503 response, or skip certain optional writes. By providing this error variant, we allow them to gracefully do so, without having to resort to parsing error messages as was needed in rust-lang/crates.io#1670

sgrif force-pushed the sg-read-only-db branch from effc112 to 52862fa Compare March 15, 2019 00:28

serayuzgur mentioned this pull request Mar 15, 2019

Plugin sends too many concurrent requests to crates.io serayuzgur/crates#42

Closed

bors merged commit 1759db6 into rust-lang:master Mar 17, 2019

sgrif deleted the sg-read-only-db branch March 18, 2019 18:30

sgrif mentioned this pull request Mar 28, 2019

Add DatabaseErrorKind::ReadOnlyTransaction diesel-rs/diesel#2025

Merged
