Skip to content

Recent download stats by minor version #1941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simonrw
Copy link
Contributor

@simonrw simonrw commented Dec 5, 2019

Implement an implementation of the design suggested and sketched out in #1036.

This PR implements the basic functionality of showing a downloads graph, but there are still a few issues:

  1. the data available cheaply from the existing ember model shows total downloads, not downloads over the last 90 days which is likely more useful for the purposes suggested in Download breakdown by compatible versions #1036;
  2. the graph has not been tested for crates with an extreme number of major.minor versions, e.g. >10. There may need to be some filtering if this is the case, when the number of crates is high, or only return the top 10 downloaded versions even if they are not consecutive;
  3. it might be good to let the user filter based on some parameters, e.g. let them:
    • select the timespan desired, e.g. "all time", or "recent"
    • select the type of grouping, e.g. major, major.minor

I have added a basic test for the title of the plot, however I have not implemented a test for the graph itself. I don't think there is a test for the graph on the crate page, so perhaps this is not so bad?


I would appreciate some guidance on how to get the last 90 days downloaded stats from the models. I presume as it's not in the crate model itself, either:

  1. we add it to the crate model, update the API to return it, or
  2. we add an additional network fetch, or
  3. something I've not thought of.

Option 1 feels like returning a lot of nearly duplicate data that is only used in one place. This also adds scope creep to the API endpoint. Option 2 would add additional complexity of further async network requests that may not be required.

Add a new component which is tasked with taking the `.versions` array of
the `crate` model and showing a bar graph of the downloaded versions,
broken up by `major.minor` semver groups.

This component is heavily influenced by the `download-graph` component,
which performs a more complex function on the crate home page.

Issue rust-lang#1036
Added a test to show that the graph title is visible on the page.
@rust-highfive
Copy link

r? @sgrif

(rust_highfive has picked a reviewer for you, use r? to override)

@simonrw
Copy link
Contributor Author

simonrw commented Dec 6, 2019

For context the page looks like this:

Screenshot from 2019-12-06 08-33-33

My original draft design is in the issue #1036

Before the crate versions would be sorted by string comparisons, which
does not give the logical sorting order in a semver-compatible way.
Users will want to see their most recent crates at the top.

This is particularly true of the original use case, which is: "when can
I drop support for crates older than version n"? The graph should help
with this.
@carols10cents carols10cents assigned carols10cents and unassigned sgrif Dec 9, 2019
@carols10cents
Copy link
Member

This is really neat!! Thank you for working on this!!!

I tried out a few crates and the graphs are showing some interesting info, and they even look great on mobile widths!

mobile view of rand crate's graph

To do some stress testing, I found some crates that have a large number of versions-- in particular, rustc-ap-serialize has released 631 major versions, so its graph looks like this:

rustc-ap-serialize's graph

This makes me wonder if we should limit the number of versions at some point to a number that can be displayed in a visible way on the graph?

I also noticed in your mockups on the issue that the labels on the y-axis are "0.4.x", "0.3.x", etc but the labels on the y-axis using this PR are "0.4.0", "0.3.0", etc-- which can be a little confusing when, say, only "0.0.1" exists but "0.0.0" doesn't actually exist:

Screen Shot 2019-12-09 at 9 27 33 AM

I think changing the labels to be what you have in the mockup would be great, is that something you just haven't had a chance to do yet?

I would appreciate some guidance on how to get the last 90 days downloaded stats from the models. I presume as it's not in the crate model itself, either:

1. we add it to the crate model, update the API to return it, or

2. we add an additional network fetch, or

3. something I've not thought of.

Option 1 feels like returning a lot of nearly duplicate data that is only used in one place. This also adds scope creep to the API endpoint. Option 2 would add additional complexity of further async network requests that may not be required.

I was refreshing my memory about what the existing download graph does-- it shows downloads over the last 90 days, and uses the data returned from http://localhost:4200/api/v1/crates/{crate-name}/downloads. It's only the top 5 versions, and the download numbers are broken down by day.

I think to add a graph similar to the one you've added but for the last 90 days, we should create a new API endpoint that returns just the information you need-- all downloads from the last 90 days for all versions of the crate, grouped by minor version (further grouping those by major version could be done on the frontend).

We could make it such that this network request is only performed if you visit the http://localhost:4200/crates/{crate-name}/versions page, so I don't think that one extra network request for providing useful information when it's being shown should be a problem.

What do you think?

All these thoughts were from playing around with this PR; going to take a look at the code now :) Thank you so much!

Copy link
Member

@carols10cents carols10cents left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought about the version axis labels?

@simonrw
Copy link
Contributor Author

simonrw commented Dec 9, 2019

This is really neat!! Thank you for working on this!!!

I tried out a few crates and the graphs are showing some interesting info, and they even look great on mobile widths!

Great, glad to hear it! I had not specifically tweaked this or targeted this use case, it is a consequence of copying your crate downloads graph which is responsive already.

To do some stress testing, I found some crates that have a large number of versions-- in particular, rustc-ap-serialize has released 631 major versions, so its graph looks like this:

This makes me wonder if we should limit the number of versions at some point to a number that can be displayed in a visible way on the graph?

Yes I had considered this as well. We can choose between the most downloaded N versions, or the N most recent versions. I think the N most downloaded would suit the best as this serves the original use case of "which versions can I stop maintaining", and is probably the most interesting to people.

An extension could be to let the user choose what the N is in this case, though initially I think we should pick.

I also noticed in your mockups on the issue that the labels on the y-axis are "0.4.x", "0.3.x", etc but the labels on the y-axis using this PR are "0.4.0", "0.3.0", etc-- which can be a little confusing when, say, only "0.0.1" exists but "0.0.0" doesn't actually exist:

I think changing the labels to be what you have in the mockup would be great, is that something you just haven't had a chance to do yet?

Yeah as you saw in the code, I needed to add a ".0" section to get semver to parse it correctly. I will give it a go to see if it parses 0.1.x for example. Otherwise it is a presentation problem and I can update the presentation code.

I would appreciate some guidance on how to get the last 90 days downloaded stats from the models. I presume as it's not in the crate model itself, either:

1. we add it to the crate model, update the API to return it, or

2. we add an additional network fetch, or

3. something I've not thought of.

Option 1 feels like returning a lot of nearly duplicate data that is only used in one place. This also adds scope creep to the API endpoint. Option 2 would add additional complexity of further async network requests that may not be required.

I was refreshing my memory about what the existing download graph does-- it shows downloads over the last 90 days, and uses the data returned from http://localhost:4200/api/v1/crates/{crate-name}/downloads. It's only the top 5 versions, and the download numbers are broken down by day.

I think to add a graph similar to the one you've added but for the last 90 days, we should create a new API endpoint that returns just the information you need-- all downloads from the last 90 days for all versions of the crate, grouped by minor version (further grouping those by major version could be done on the frontend).

We could make it such that this network request is only performed if you visit the http://localhost:4200/crates/{crate-name}/versions page, so I don't think that one extra network request for providing useful information when it's being shown should be a problem.

What do you think?

That sounds like a good plan. I think you're right, the scope of the new endpoint will be fairly limited so only people that view the versions page will be accessing that endpoint.

I will try to get something going fairly soon.

All these thoughts were from playing around with this PR; going to take a look at the code now :) Thank you so much!

Glad to help, I'm super excited to give back to the Rust community!

The `semver` library requires versions to be numeric for the `major`,
`minor` and `patch` variables. This means that `0.2.x` is not a valid
version.

Including the `.0` suffix is confusing for the crate owner or user, as
the will wonder where the patch version crates are.

Instead, make it clear that we are grouping by minor patch version by
explicitally using the placeholder ".x".
@simonrw
Copy link
Contributor Author

simonrw commented Dec 9, 2019

If you're happy to wait a little bit, I will give the additional endpoint a go. Do you think it makes sense to have such a graph? Would you prefer a graph for all time, or the last 90 days?

@bjorn3
Copy link
Member

bjorn3 commented Dec 9, 2019

Maybe group 1.0.x and 1.1.x together as 1.x.y, because a 1.0.x requirement is satisfied by a 1.1.x version.

@bjorn3
Copy link
Member

bjorn3 commented Dec 9, 2019

Also 0.0.x and 0.0.y are not compatible.

This is an endpoint that gives the recent versions (defaults to 90 days)
for each semver version.

This route is needed to power the front end graph showing recent
downloads for a crate, broken down by crate versions.
@simonrw
Copy link
Contributor Author

simonrw commented Dec 9, 2019

I've just pushed some changes implementing a backend API endpoint. I'd appreciate a good review of it for what can be improved as I'm not too familiar with Diesel. I've tried to take inspiration from existing code. I've also added a test, which I'm not confident I've done very well..

@carols10cents
Copy link
Member

@bjorn3

Maybe group 1.0.x and 1.1.x together as 1.x.y, because a 1.0.x requirement is satisfied by a 1.1.x version.

While that is true for the default version specification without any operator (which has the same behavior as using a caret), library authors might be interested in people using tilde requirements.

Also 0.0.x and 0.0.y are not compatible.

You are correct-- I thought they were, according to Cargo when using caret specifications, but it turns out there's an exception when the minor version is 0.. Given that Cargo's default version when running cargo new is 0.1.0, I expect this situation to be relatively rare in practice (I can run some queries to be sure). I'm not saying we shouldn't break these out separately for correctness's sake, just that if it turns out to be prohibitively complex or expensive, we should consider the rarity of this case. I'm going to take a look at the current code now and see. Any thoughts on this @mindriot101 ?

@bjorn3
Copy link
Member

bjorn3 commented Dec 11, 2019

I expect this situation to be relatively rare in practice

goblin has been using 0.0.x until recently.

Copy link
Member

@carols10cents carols10cents left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions, but this is heading in the right direction!!

pub fn recent_downloads(req: &mut dyn Request) -> AppResult<Response> {
use diesel::dsl::*;

let ndays = 90;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's time to extract this magic number into a constant to have one source of truth that defines what we mean by "recent", because this will be the third spot (in addition to here and here).

It'd be nice if there was a way to use the same value that was used in this migration, but I don't think there's a way to do that. Maybe just add a comment to the constant that its value must be the same as what's in the SQL function, and if we decide to change the value, we'll need a new migration?

Even if we eventually allow this value to be configurable, the default value should probably match what we use in other places.

Mind making that extraction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to look into this now unfortunately, but yes I think this would be a good idea to extract. I'm happy to take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now moved this constant into the Config struct which seems a reasonable place to put it. We don't want the user to change it, but it is something that can be accessed from all request handlers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now moved this constant into the Config struct which seems a reasonable place to put it. We don't want the user to change it, but it is something that can be accessed from all request handlers.

VersionDownload::create_or_increment(v1, conn).unwrap();

VersionDownload::create_or_increment(v2, conn).unwrap();
VersionDownload::create_or_increment(v2, conn).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think this test setup is fine, but it's a bit verbose and a bit hard to match up with the assertions.

We have a function on the crate builder that sets recent downloads on a crate (by adding downloads to an arbitrary version of the crate), I think we could do something similar on the version builder to enable adding downloads to a particular version?

CrateBuilder::version can also take a VersionBuilder rather than just a version string, so the test setup could get rid of these highlighted lines and do something like this instead:

let _krate = CrateBuilder::new("foo_versions", user.id)
            .version(VersionBuilder::new("0.5.1").recent_downloads(3))
            .version(VersionBuilder::new("1.0.0").recent_downloads(2))
            .expect_build(conn);

I'm also wondering if we should be testing that this endpoint doesn't include non-recent download stats, so we'd need a way to set older downloads. The crate builder isn't a great example for this as it kind of cheats. VersionDownload also doesn't have a nice function for setting downloads for a particular date because usually we only want to set downloads for today. The populate utility script for setting up test data has an example of doing this, though.

What do you think? Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the tests to try and add clarity to what is going on. The example you gave was close to working, what is the correct api is:

let _krate = CrateBuilder::new("foo_versions", user.id)
            .version(VersionBuilder::new("0.5.1")).recent_downloads(3)
            .version(VersionBuilder::new("1.0.0")).recent_downloads(2)
            .expect_build(conn);

however this only adds downloads to the most recently added downloaded version (i.e. 1.0.0). I had to use diesel to insert rows into the version_downloads table manaully, though I tried to add clarity.

I'm also wondering if we should be testing that this endpoint doesn't include non-recent download stats, so we'd need a way to set older downloads. The crate builder isn't a great example for this as it kind of cheats. VersionDownload also doesn't have a nice function for setting downloads for a particular date because usually we only want to set downloads for today. The populate utility script for setting up test data has an example of doing this, though.

I also added this to the test, so now it sets up two versions within 90 days and two versions older than 90 days. I also added some checking that the versions come back sorted, which should make the tests stable and reliable.

@simonrw
Copy link
Contributor Author

simonrw commented Dec 11, 2019

@carols10cents thank you so much for this feedback, and suggestions on how to continue. This will really help me understand the crates.io code base better.

This uses the diesel API to perform the correct merging and matching in
the database, rather than the previous hacky way of building the
relationships in Rust.

Thanks @carols10cents for the pointers.
This relies on the serde implementation of semver::Version
Update some of the schema, improve the readability of the test, and test
the recent download fetching by adding additional download entries
before the 90 day cutoff.
We now use the magic value "90" in three places in the code. Instead of
that, this commit moves this to the `Config` struct.  This is a
centralised place to put it, that all request handlers can see it. We
initialise to 90 days by default.

I have not let the user override this, as bad things will happen. I was
advised that this value must not get out of sync with
`migrations/2018-04-24-145128_create_recent_crate_downloads/up.sql`.
@carols10cents
Copy link
Member

(Again, apologies if I'm looking at this before you're ready, I saw there were more commits and wanted to check in)

I accidentally tried this out on a snapshot of production that I downloaded over 90 days ago, so that no crate's versions had any recent downloads! In this case, the recent_downloads API returns:

{"downloads":[],"meta":{"crate":"some_crate","ndays":90}}

and on the page, instead of the graph, I see an error message that says Data column(s) for axis #0 cannot be of type string.

Because we have crawlers and mirrors and such, it's probably rare that no version in the past 90 days would have any downloads, but we should probably handle this situation a bit nicer juuuust in case.

Now to go get a more recent snapshot so I can actually test this ;)

@carols10cents
Copy link
Member

Ahh, I see some TODOs in there. Carry on!

@simonrw
Copy link
Contributor Author

simonrw commented Dec 23, 2019

Hi @carols10cents thank you for testing this edge case I had not considered. I am still working on this, but happy for any feedback. I am unlikely to get time this week though, but next week I have some time.

@carols10cents
Copy link
Member

No problem!! Not trying to rush you, I'm just trying to make sure we don't leave any PRs without reviews for too long :) Take as much time as you need.

Also prevent variable overshadowing from the outer scope
The fixture sets up two versions. Therefore this expectation is that two
bars should exist on the graph.
@simonrw
Copy link
Contributor Author

simonrw commented Dec 24, 2019

Hi all, I'd love to get some help on how to implement a test for @carols10cents's example where no downloads have been performed within 90 days. In particular this commented out test. I'm not familiar enough with the javascript testing ecosystem. I think I need to update some of the fixtures in the mirage dir. Can anyone give me any pointers?

@carols10cents
Copy link
Member

Hi all, I'd love to get some help on how to implement a test for @carols10cents's example where no downloads have been performed within 90 days. In particular this commented out test. I'm not familiar enough with the javascript testing ecosystem. I think I need to update some of the fixtures in the mirage dir. Can anyone give me any pointers?

Hi! What do you think about something like this, in the mirage config?

  this.get('/crates/:crate_id/recent_downloads', (schema, request) => {
    let crate = request.params.crate_id;
    
    var downloads = [
      { version: '0.1.0', downloads: 32 }, 
      { version: '0.2.0', downloads: 128 }
    ];  

    if (crate == 'norecentdownloads') {
      downloads = [];
    }
    
    return { downloads: downloads, meta: { crate: crate, ndays: 90 } };
  });

This lint fixing broke some tests, as I was trying to use the global
"@model" variable, rather than the "model" attribute for the helper.
This fixes the tests.

This reverts commit 263e64d.
@bors
Copy link
Contributor

bors commented Jan 10, 2020

☔ The latest upstream changes (presumably #2112) made this pull request unmergeable. Please resolve the merge conflicts.

assert.dom('.graph svg g g g g text[text-anchor=end]').exists({ count: 2 });
});

/* TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using QUnit.todo make sense here?

@simonrw
Copy link
Contributor Author

simonrw commented Feb 15, 2020

Hi all, sorry for the lack of input recently. I'm having trouble creating a good test that exercises the situation when no recent downloads are found. I think this is an edge case, however, I'd prefer a test for it.

The expected behaviour is that it should gracefully show text saying that no recent downloads are available. So far I have handled this case (i.e. it does not throw an unhandled exception) but nothing further. I'd like a hand implementing the test.

The advice in carols10cents's post gets me close I think, but other tests fail. I think this is an issue with my lack of javascript and ember testing knowledge. I would really appreciate some help.

The alternative is that we do not have an automated test for this case and rely on manual testing. I can edit my local database to cause this case for a particular crate for example.

@bors
Copy link
Contributor

bors commented Mar 24, 2020

☔ The latest upstream changes (presumably #2308) made this pull request unmergeable. Please resolve the merge conflicts.

@simonrw
Copy link
Contributor Author

simonrw commented Mar 22, 2021

Sorry it's been so long with this, but IIRC I was having trouble with the JS tests. Can anyone help? cc @carols10cents ?

I realise that my PR needs to be rebased, which I can do later.

Thanks!

@@ -67,3 +69,53 @@ pub fn downloads(req: &mut dyn Request) -> AppResult<Response> {
meta,
}))
}

/// Handles the `GET /crates/:crate_id/recent_downloads` route.
pub fn recent_downloads(req: &mut dyn Request) -> AppResult<Response> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtgeibel any thoughts on this? it might conflict with our plans to move the historical download data to S3 🤔

@Turbo87 Turbo87 added A-backend ⚙️ A-frontend 🐹 C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works labels Jun 20, 2021
@Turbo87
Copy link
Member

Turbo87 commented Jun 25, 2021

hi @mindriot101!

first of all, sorry for the long radio silence on this PR. the team has been quite busy with just keeping the service running at all, and there hasn't been much capacity for implementing and reviewing new features.

we did however talk about this PR today in the team meeting. the code generally looks good, but we had a lot of performance issues with the version downloads database table lately and it looks like this feature would increase the load on that table even further. since this is more of a nice-to-have feature and not something that we would consider mission-critical, we will unfortunately have to reject this feature for the reasons above 😞

and please don't take this personally, we would love more PRs from you in the future, but we unfortunately need to be mindful with our limited server resources right now.

@Turbo87 Turbo87 closed this Jun 25, 2021
@simonrw
Copy link
Contributor Author

simonrw commented Jun 25, 2021

That's absolutely fine @Turbo87 . Thanks for the update 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ A-frontend 🐹 C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants