Skip to content

Remove crate_downloads #1693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 1, 2019
Merged

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Mar 22, 2019

⚠️ This PR contains destructive migrations. The commits in this PR must be deployed separately ⚠️

I've been looking through pg:diagnose on our new database, and we have two indexes on crate_downloads which are either unused or low scan, high write. I'd looked into these indexes previously and determined that we needed them (presumably this was before we introduced recent_crate_downloads), but seeing one flagged as unused was surprising so I gave this another look.

It turns out the only place we're still using crate_downloads directly is a place that should have been using recent_crate_downloads anyway. With that code changed to use recent_crate_downloads, the only place that crate_downloads is used is to populate recent_crate_downloads. We can just as easily populate that view from version_downloads instead.

It turns out that crate_downloads and its indexes accounts for 23% (!!!!) of our total database size, so there's a huge incentive to get rid of it. The only downside to doing this is that refreshing recent_crate_downloads takes longer if we do it on version_downloads. This is both due to the larger table size, and the lack of an index we can use for this (we could add an index to make it faster, but that's not worth it). When based on version_downloads, it takes 6 seconds instead of 600ms to populate. Ultimately this doesn't matter, since we do this off the web server anyway. Now that we're on PG 11, I also intend to partition the version_downloads table which will get our performance faster than 600ms when done.

@sgrif sgrif force-pushed the sg-remove-crate-downloads branch from 510fee6 to c1606d6 Compare March 22, 2019 17:57
@bors
Copy link
Contributor

bors commented Mar 28, 2019

☔ The latest upstream changes (presumably #1699) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@jtgeibel jtgeibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but not merging since this requires coordination with deployment.

sgrif added 2 commits April 1, 2019 09:58
The only place we're ever using `crate_downloads` directly is in a place
that we should have been using `recent_crate_downloads`. Once we make
that change, `crate_downloads` is only ever used to populate
`recent_crate_downloads`, which can just as easily be populated from
`version_downloads` instead.

Populating it from `version_downloads` is a bit slower since the table
is so much larger, and we don't have an index on it which we can use for
this query. However, this doesn't really matter since we run this
asynchronously. Now that we're on PG 11, I also intend to partition
`version_downloads` on date which will make populating the view much
faster once again.

Note that I have not removed `crate_downloads` yet. That needs to be
deployed separately, as it's a destructive change which would break
running code, so we can't run that migration until the code which is no
longer relying on `crate_downloads` is deployed
No part of our code base relies on this table any longer, so we can just
remove it.
@sgrif
Copy link
Contributor Author

sgrif commented Apr 1, 2019

@bors r=jtgeibel

@sgrif sgrif force-pushed the sg-remove-crate-downloads branch from c1606d6 to f6702c6 Compare April 1, 2019 15:59
@bors
Copy link
Contributor

bors commented Apr 1, 2019

📌 Commit f6702c6 has been approved by jtgeibel

@bors
Copy link
Contributor

bors commented Apr 1, 2019

⌛ Testing commit f6702c6 with merge 22f8632...

bors added a commit that referenced this pull request Apr 1, 2019
Remove `crate_downloads`

⚠️ This PR contains destructive migrations. The commits in this PR must be deployed separately ⚠️

I've been looking through `pg:diagnose` on our new database, and we have two indexes on `crate_downloads` which are either unused or low scan, high write. I'd looked into these indexes previously and determined that we needed them (presumably this was before we introduced `recent_crate_downloads`), but seeing one flagged as unused was surprising so I gave this another look.

It turns out the only place we're still using `crate_downloads` directly is a place that should have been using `recent_crate_downloads` anyway. With that code changed to use `recent_crate_downloads`, the only place that `crate_downloads` is used is to populate `recent_crate_downloads`. We can just as easily populate that view from `version_downloads` instead.

It turns out that `crate_downloads` and its indexes accounts for 23% (!!!!) of our total database size, so there's a huge incentive to get rid of it. The only downside to doing this is that refreshing `recent_crate_downloads` takes longer if we do it on `version_downloads`. This is both due to the larger table size, and the lack of an index we can use for this (we could add an index to make it faster, but that's not worth it). When based on `version_downloads`, it takes 6 seconds instead of 600ms to populate. Ultimately this doesn't matter, since we do this off the web server anyway. Now that we're on PG 11, I also intend to partition the `version_downloads` table which will get our performance faster than 600ms when done.
@bors
Copy link
Contributor

bors commented Apr 1, 2019

☀️ Test successful - checks-travis
Approved by: jtgeibel
Pushing 22f8632 to master...

@bors bors merged commit f6702c6 into rust-lang:master Apr 1, 2019
@sgrif sgrif deleted the sg-remove-crate-downloads branch April 1, 2019 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants