Skip to content

Allow the database to be set in read only mode #1670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 17, 2019

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Mar 15, 2019

This is in preparation for our next database upgrade. Last time we
needed to do this, we took the whole site down. This means that for 30
minutes, cargo build just didn't work. This is no longer something we
consider acceptable. While we could just take everything down except
cargo build, we can do better.

The upgrade process is basically:

  • Set up a new replica of the primary database
  • Wait until the replica is "caught up" (< 200 commits behind)
  • Take down the site
  • Upgrade the replica in place
  • Verify the upgrade was successful
  • Swap over to the replica
  • Bring the site back up

There's no reason that we actually need to bring the site down though,
we just need to stop writing to the primary during the upgrade process.

This does that by setting the default transaction mode to READ ONLY, and
handling the error we will get back if we try to do a write.
Unfortunately, Diesel doesn't expose any API to give us the underlying
database error code, so we have to match on the error message instead.

This code assumes that:

  • We are never manually setting a transaction to READ WRITE
  • We are never manually setting a transaction to READ ONLY, and
    expecting write errors to bubble up to the user

We're not currently doing either of those things, and I can't imagine
that we'll start doing either one in the future.

During testing I found some issues with how this interacts with token
auth. The middleware was assuming that any error from
User::find_by_api_token was diesel::NotFound, and would proceed to
run the request (this finally explains a spurious failure I've been
getting about a poisoned transaction, it was probably a deadlock in the
API key update)

Because our tests run in a single transaction, we need to wrap that
update to ensure that the error doesn't poison the transaction. If we
didn't do anything else, we'd just assume that any user trying to do
something with token auth was not logged in, and would give a 403
instead of a 503. To fix this, I've set it up to fall back to a simple
find if the update fails.

This does not make the download endpoint work when we're in read only
mode, there is a separate issue for that which will come in a separate
PR. I have, however, added an explicit test to ensure it works in read
only mode.

This is in preparation for our next database upgrade. Last time we
needed to do this, we took the whole site down. This means that for 30
minutes, `cargo build` just didn't work. This is no longer something we
consider acceptable. While we could just take everything down *except*
`cargo build`, we can do better.

The upgrade process is basically:

- Set up a new replica of the primary database
- Wait until the replica is "caught up" (< 200 commits behind)
- Take down the site
- Upgrade the replica in place
- Verify the upgrade was successful
- Swap over to the replica
- Bring the site back up

There's no reason that we actually need to bring the site down though,
we just need to stop writing to the primary during the upgrade process.

This does that by setting the default transaction mode to READ ONLY, and
handling the error we will get back if we try to do a write.
Unfortunately, Diesel doesn't expose any API to give us the underlying
database error code, so we have to match on the error message instead.

This code assumes that:

- We are never manually setting a transaction to `READ WRITE`
- We are never manually setting a transaction to `READ ONLY`, and
  expecting write errors to bubble up to the user

We're not currently doing either of those things, and I can't imagine
that we'll start doing either one in the future.

During testing I found some issues with how this interacts with token
auth. The middleware was assuming that *any* error from
`User::find_by_api_token` was `diesel::NotFound`, and would proceed to
run the request (this finally explains a spurious failure I've been
getting about a poisoned transaction, it was probably a deadlock in the
API key update)

Because our tests run in a single transaction, we need to wrap that
update to ensure that the error doesn't poison the transaction. If we
didn't do anything else, we'd just assume that any user trying to do
something with token auth was not logged in, and would give a 403
instead of a 503. To fix this, I've set it up to fall back to a simple
find if the update fails.

This does *not* make the download endpoint work when we're in read only
mode, there is a separate issue for that which will come in a separate
PR. I have, however, added an explicit test to ensure it works in read
only mode.
@sgrif sgrif force-pushed the sg-read-only-db branch from effc112 to 52862fa Compare March 15, 2019 00:28
@sgrif
Copy link
Contributor Author

sgrif commented Mar 15, 2019

Test failure started after rebase since swirl returns Box<dyn Error + Send + Sync> from enqueue, and we're wrapping that in internal. Need to fix how that gets handled, since our From impl for CargoError never gets applied. Will not be able to fix until tomorrow, should not affect review.

The return type of `enqueue` is `Box<dyn Error + Send + Sync>`.
Unfortunately, `Box<dyn Error>` does not implement `Error`, so our from
impl doesn't apply. We're also not able to add an explicit `From` impl
either (which is odd, coherence rules should let us write that impl).

I've opted to just add an explicit converstion function that we can use
here. We should probably check that there's nowhere else we're passing
an error object to `human` or `internal`, but that will come in a
separate PR
@jtgeibel
Copy link
Member

Looks great to me. I love how clean this is, and I agree that it is extremely unlikely that our code will break either of these assumptions in the future.

@bors r+

@bors
Copy link
Contributor

bors commented Mar 17, 2019

📌 Commit 1759db6 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Mar 17, 2019

⌛ Testing commit 1759db6 with merge ade9a8e...

bors added a commit that referenced this pull request Mar 17, 2019
Allow the database to be set in read only mode

This is in preparation for our next database upgrade. Last time we
needed to do this, we took the whole site down. This means that for 30
minutes, `cargo build` just didn't work. This is no longer something we
consider acceptable. While we could just take everything down *except*
`cargo build`, we can do better.

The upgrade process is basically:

- Set up a new replica of the primary database
- Wait until the replica is "caught up" (< 200 commits behind)
- Take down the site
- Upgrade the replica in place
- Verify the upgrade was successful
- Swap over to the replica
- Bring the site back up

There's no reason that we actually need to bring the site down though,
we just need to stop writing to the primary during the upgrade process.

This does that by setting the default transaction mode to READ ONLY, and
handling the error we will get back if we try to do a write.
Unfortunately, Diesel doesn't expose any API to give us the underlying
database error code, so we have to match on the error message instead.

This code assumes that:

- We are never manually setting a transaction to `READ WRITE`
- We are never manually setting a transaction to `READ ONLY`, and
  expecting write errors to bubble up to the user

We're not currently doing either of those things, and I can't imagine
that we'll start doing either one in the future.

During testing I found some issues with how this interacts with token
auth. The middleware was assuming that *any* error from
`User::find_by_api_token` was `diesel::NotFound`, and would proceed to
run the request (this finally explains a spurious failure I've been
getting about a poisoned transaction, it was probably a deadlock in the
API key update)

Because our tests run in a single transaction, we need to wrap that
update to ensure that the error doesn't poison the transaction. If we
didn't do anything else, we'd just assume that any user trying to do
something with token auth was not logged in, and would give a 403
instead of a 503. To fix this, I've set it up to fall back to a simple
find if the update fails.

This does *not* make the download endpoint work when we're in read only
mode, there is a separate issue for that which will come in a separate
PR. I have, however, added an explicit test to ensure it works in read
only mode.
@bors
Copy link
Contributor

bors commented Mar 17, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing ade9a8e to master...

@bors bors merged commit 1759db6 into rust-lang:master Mar 17, 2019
@sgrif sgrif deleted the sg-read-only-db branch March 18, 2019 18:30
sgrif added a commit to diesel-rs/diesel that referenced this pull request Mar 28, 2019
This is an error that applications are likely to want to handle. A
database may be read only during maintenance, or for brief periods after
failing over to a hot replica.

Applications designed to handle this may wish to transform this error
into a user visible HTTP 503 response, or skip certain optional writes.
By providing this error variant, we allow them to gracefully do so,
without having to resort to parsing error messages as was needed in
rust-lang/crates.io#1670
sgrif added a commit to diesel-rs/diesel that referenced this pull request Mar 28, 2019
This is an error that applications are likely to want to handle. A
database may be read only during maintenance, or for brief periods after
failing over to a hot replica.

Applications designed to handle this may wish to transform this error
into a user visible HTTP 503 response, or skip certain optional writes.
By providing this error variant, we allow them to gracefully do so,
without having to resort to parsing error messages as was needed in
rust-lang/crates.io#1670
pietgeursen pushed a commit to pietgeursen/diesel that referenced this pull request May 16, 2019
This is an error that applications are likely to want to handle. A
database may be read only during maintenance, or for brief periods after
failing over to a hot replica.

Applications designed to handle this may wish to transform this error
into a user visible HTTP 503 response, or skip certain optional writes.
By providing this error variant, we allow them to gracefully do so,
without having to resort to parsing error messages as was needed in
rust-lang/crates.io#1670
disconsented pushed a commit to NarrativeApp/diesel that referenced this pull request Nov 26, 2020
This is an error that applications are likely to want to handle. A
database may be read only during maintenance, or for brief periods after
failing over to a hot replica.

Applications designed to handle this may wish to transform this error
into a user visible HTTP 503 response, or skip certain optional writes.
By providing this error variant, we allow them to gracefully do so,
without having to resort to parsing error messages as was needed in
rust-lang/crates.io#1670
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants