-
Notifications
You must be signed in to change notification settings - Fork 649
Allow the database to be set in read only mode #1670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is in preparation for our next database upgrade. Last time we needed to do this, we took the whole site down. This means that for 30 minutes, `cargo build` just didn't work. This is no longer something we consider acceptable. While we could just take everything down *except* `cargo build`, we can do better. The upgrade process is basically: - Set up a new replica of the primary database - Wait until the replica is "caught up" (< 200 commits behind) - Take down the site - Upgrade the replica in place - Verify the upgrade was successful - Swap over to the replica - Bring the site back up There's no reason that we actually need to bring the site down though, we just need to stop writing to the primary during the upgrade process. This does that by setting the default transaction mode to READ ONLY, and handling the error we will get back if we try to do a write. Unfortunately, Diesel doesn't expose any API to give us the underlying database error code, so we have to match on the error message instead. This code assumes that: - We are never manually setting a transaction to `READ WRITE` - We are never manually setting a transaction to `READ ONLY`, and expecting write errors to bubble up to the user We're not currently doing either of those things, and I can't imagine that we'll start doing either one in the future. During testing I found some issues with how this interacts with token auth. The middleware was assuming that *any* error from `User::find_by_api_token` was `diesel::NotFound`, and would proceed to run the request (this finally explains a spurious failure I've been getting about a poisoned transaction, it was probably a deadlock in the API key update) Because our tests run in a single transaction, we need to wrap that update to ensure that the error doesn't poison the transaction. If we didn't do anything else, we'd just assume that any user trying to do something with token auth was not logged in, and would give a 403 instead of a 503. To fix this, I've set it up to fall back to a simple find if the update fails. This does *not* make the download endpoint work when we're in read only mode, there is a separate issue for that which will come in a separate PR. I have, however, added an explicit test to ensure it works in read only mode.
Test failure started after rebase since |
The return type of `enqueue` is `Box<dyn Error + Send + Sync>`. Unfortunately, `Box<dyn Error>` does not implement `Error`, so our from impl doesn't apply. We're also not able to add an explicit `From` impl either (which is odd, coherence rules should let us write that impl). I've opted to just add an explicit converstion function that we can use here. We should probably check that there's nowhere else we're passing an error object to `human` or `internal`, but that will come in a separate PR
Looks great to me. I love how clean this is, and I agree that it is extremely unlikely that our code will break either of these assumptions in the future. @bors r+ |
📌 Commit 1759db6 has been approved by |
Allow the database to be set in read only mode This is in preparation for our next database upgrade. Last time we needed to do this, we took the whole site down. This means that for 30 minutes, `cargo build` just didn't work. This is no longer something we consider acceptable. While we could just take everything down *except* `cargo build`, we can do better. The upgrade process is basically: - Set up a new replica of the primary database - Wait until the replica is "caught up" (< 200 commits behind) - Take down the site - Upgrade the replica in place - Verify the upgrade was successful - Swap over to the replica - Bring the site back up There's no reason that we actually need to bring the site down though, we just need to stop writing to the primary during the upgrade process. This does that by setting the default transaction mode to READ ONLY, and handling the error we will get back if we try to do a write. Unfortunately, Diesel doesn't expose any API to give us the underlying database error code, so we have to match on the error message instead. This code assumes that: - We are never manually setting a transaction to `READ WRITE` - We are never manually setting a transaction to `READ ONLY`, and expecting write errors to bubble up to the user We're not currently doing either of those things, and I can't imagine that we'll start doing either one in the future. During testing I found some issues with how this interacts with token auth. The middleware was assuming that *any* error from `User::find_by_api_token` was `diesel::NotFound`, and would proceed to run the request (this finally explains a spurious failure I've been getting about a poisoned transaction, it was probably a deadlock in the API key update) Because our tests run in a single transaction, we need to wrap that update to ensure that the error doesn't poison the transaction. If we didn't do anything else, we'd just assume that any user trying to do something with token auth was not logged in, and would give a 403 instead of a 503. To fix this, I've set it up to fall back to a simple find if the update fails. This does *not* make the download endpoint work when we're in read only mode, there is a separate issue for that which will come in a separate PR. I have, however, added an explicit test to ensure it works in read only mode.
☀️ Test successful - checks-travis |
This is an error that applications are likely to want to handle. A database may be read only during maintenance, or for brief periods after failing over to a hot replica. Applications designed to handle this may wish to transform this error into a user visible HTTP 503 response, or skip certain optional writes. By providing this error variant, we allow them to gracefully do so, without having to resort to parsing error messages as was needed in rust-lang/crates.io#1670
This is an error that applications are likely to want to handle. A database may be read only during maintenance, or for brief periods after failing over to a hot replica. Applications designed to handle this may wish to transform this error into a user visible HTTP 503 response, or skip certain optional writes. By providing this error variant, we allow them to gracefully do so, without having to resort to parsing error messages as was needed in rust-lang/crates.io#1670
This is an error that applications are likely to want to handle. A database may be read only during maintenance, or for brief periods after failing over to a hot replica. Applications designed to handle this may wish to transform this error into a user visible HTTP 503 response, or skip certain optional writes. By providing this error variant, we allow them to gracefully do so, without having to resort to parsing error messages as was needed in rust-lang/crates.io#1670
This is an error that applications are likely to want to handle. A database may be read only during maintenance, or for brief periods after failing over to a hot replica. Applications designed to handle this may wish to transform this error into a user visible HTTP 503 response, or skip certain optional writes. By providing this error variant, we allow them to gracefully do so, without having to resort to parsing error messages as was needed in rust-lang/crates.io#1670
This is in preparation for our next database upgrade. Last time we
needed to do this, we took the whole site down. This means that for 30
minutes,
cargo build
just didn't work. This is no longer something weconsider acceptable. While we could just take everything down except
cargo build
, we can do better.The upgrade process is basically:
There's no reason that we actually need to bring the site down though,
we just need to stop writing to the primary during the upgrade process.
This does that by setting the default transaction mode to READ ONLY, and
handling the error we will get back if we try to do a write.
Unfortunately, Diesel doesn't expose any API to give us the underlying
database error code, so we have to match on the error message instead.
This code assumes that:
READ WRITE
READ ONLY
, andexpecting write errors to bubble up to the user
We're not currently doing either of those things, and I can't imagine
that we'll start doing either one in the future.
During testing I found some issues with how this interacts with token
auth. The middleware was assuming that any error from
User::find_by_api_token
wasdiesel::NotFound
, and would proceed torun the request (this finally explains a spurious failure I've been
getting about a poisoned transaction, it was probably a deadlock in the
API key update)
Because our tests run in a single transaction, we need to wrap that
update to ensure that the error doesn't poison the transaction. If we
didn't do anything else, we'd just assume that any user trying to do
something with token auth was not logged in, and would give a 403
instead of a 503. To fix this, I've set it up to fall back to a simple
find if the update fails.
This does not make the download endpoint work when we're in read only
mode, there is a separate issue for that which will come in a separate
PR. I have, however, added an explicit test to ensure it works in read
only mode.