-
Notifications
You must be signed in to change notification settings - Fork 649
Remove crate_downloads
#1693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove crate_downloads
#1693
Conversation
510fee6
to
c1606d6
Compare
☔ The latest upstream changes (presumably #1699) made this pull request unmergeable. Please resolve the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but not merging since this requires coordination with deployment.
The only place we're ever using `crate_downloads` directly is in a place that we should have been using `recent_crate_downloads`. Once we make that change, `crate_downloads` is only ever used to populate `recent_crate_downloads`, which can just as easily be populated from `version_downloads` instead. Populating it from `version_downloads` is a bit slower since the table is so much larger, and we don't have an index on it which we can use for this query. However, this doesn't really matter since we run this asynchronously. Now that we're on PG 11, I also intend to partition `version_downloads` on date which will make populating the view much faster once again. Note that I have not removed `crate_downloads` yet. That needs to be deployed separately, as it's a destructive change which would break running code, so we can't run that migration until the code which is no longer relying on `crate_downloads` is deployed
No part of our code base relies on this table any longer, so we can just remove it.
@bors r=jtgeibel |
c1606d6
to
f6702c6
Compare
📌 Commit f6702c6 has been approved by |
Remove `crate_downloads`⚠️ This PR contains destructive migrations. The commits in this PR must be deployed separately⚠️ I've been looking through `pg:diagnose` on our new database, and we have two indexes on `crate_downloads` which are either unused or low scan, high write. I'd looked into these indexes previously and determined that we needed them (presumably this was before we introduced `recent_crate_downloads`), but seeing one flagged as unused was surprising so I gave this another look. It turns out the only place we're still using `crate_downloads` directly is a place that should have been using `recent_crate_downloads` anyway. With that code changed to use `recent_crate_downloads`, the only place that `crate_downloads` is used is to populate `recent_crate_downloads`. We can just as easily populate that view from `version_downloads` instead. It turns out that `crate_downloads` and its indexes accounts for 23% (!!!!) of our total database size, so there's a huge incentive to get rid of it. The only downside to doing this is that refreshing `recent_crate_downloads` takes longer if we do it on `version_downloads`. This is both due to the larger table size, and the lack of an index we can use for this (we could add an index to make it faster, but that's not worth it). When based on `version_downloads`, it takes 6 seconds instead of 600ms to populate. Ultimately this doesn't matter, since we do this off the web server anyway. Now that we're on PG 11, I also intend to partition the `version_downloads` table which will get our performance faster than 600ms when done.
☀️ Test successful - checks-travis |
I've been looking through
pg:diagnose
on our new database, and we have two indexes oncrate_downloads
which are either unused or low scan, high write. I'd looked into these indexes previously and determined that we needed them (presumably this was before we introducedrecent_crate_downloads
), but seeing one flagged as unused was surprising so I gave this another look.It turns out the only place we're still using
crate_downloads
directly is a place that should have been usingrecent_crate_downloads
anyway. With that code changed to userecent_crate_downloads
, the only place thatcrate_downloads
is used is to populaterecent_crate_downloads
. We can just as easily populate that view fromversion_downloads
instead.It turns out that
crate_downloads
and its indexes accounts for 23% (!!!!) of our total database size, so there's a huge incentive to get rid of it. The only downside to doing this is that refreshingrecent_crate_downloads
takes longer if we do it onversion_downloads
. This is both due to the larger table size, and the lack of an index we can use for this (we could add an index to make it faster, but that's not worth it). When based onversion_downloads
, it takes 6 seconds instead of 600ms to populate. Ultimately this doesn't matter, since we do this off the web server anyway. Now that we're on PG 11, I also intend to partition theversion_downloads
table which will get our performance faster than 600ms when done.