modified rugby analytics case study with increased arviz usage #62
Conversation
@OriolAbril I had a few doubts
OriolAbril commented on 2021-03-30T14:16:11Z: The try/except looks like something from the times of Python 2 compatibility. It should be replaced by a plain import:
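The actual import in the notebook cell is not shown in the thread; as a hypothetical illustration of the pattern being flagged, here is a Python-2-era fallback import and the plain import that replaces it:

```python
# Hypothetical example of the flagged pattern: a Python-2-era
# compatibility fallback. cPickle only exists on Python 2, so on
# Python 3 the except branch always runs and the try/except is dead weight.
try:
    import cPickle as pickle  # Python 2 only
except ImportError:
    import pickle  # Python 3

# The suggested replacement is simply the plain import:
import pickle
```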
OriolAbril commented on 2021-03-30T14:16:12Z: Not sure why or how it has appeared, but this unnamed column, which is a duplicate of the previous one, should not be present.

almostmeenal commented on 2021-03-30T15:50:08Z: Fixed that by changing

```python
try:
    df_all = pd.read_csv("../data/rugby.csv")
except:
    df_all = pd.read_csv(pm.get_data("rugby.csv"), index_col=0)
```

to

```python
df_all = pd.read_csv(pm.get_data("../data/rugby.csv"), index_col=0)
```

In the earlier version, the try branch did not have index_col=0, which was causing the trouble.
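A minimal sketch of how the spurious unnamed column arises (the CSV content here is a toy stand-in, not the actual rugby data): when a DataFrame is saved with its index and read back without index_col=0, pandas resurrects the saved index as an extra "Unnamed: 0" column.

```python
import io
import pandas as pd

# Toy stand-in for rugby.csv: the first, unnamed column is a saved index.
csv = ",home_team,away_team\n0,Wales,Italy\n1,France,England\n"

# Without index_col=0 the saved index becomes an extra "Unnamed: 0" column.
df_bad = pd.read_csv(io.StringIO(csv))

# With index_col=0 the first column is used as the index, as intended.
df_good = pd.read_csv(io.StringIO(csv), index_col=0)
```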
OriolAbril commented on 2021-03-30T14:16:13Z: Unlike in the ArviZ example, here we should use
We should also use
OriolAbril commented on 2021-03-30T14:16:13Z: Here we only need the trace; the posterior predictive is computed later (we'll get to that too) and the prior is not used. We should therefore use
OriolAbril commented on 2021-03-30T14:16:14Z: This goes for all the plots, and in general: you'll see warnings if you try to use ArviZ plots directly on PyMC3 MultiTrace objects (the return type of
For both better functionality and performance, one should convert to InferenceData (or use
OriolAbril commented on 2021-03-30T14:16:15Z: This cell should be deleted, and the one below should be simplified to
OriolAbril commented on 2021-03-30T14:16:15Z: Gelman-Rubin is not used anymore; it should be
OriolAbril commented on 2021-03-30T14:16:16Z: You can get this with
OriolAbril commented on 2021-03-30T14:16:16Z: We do not want to convert the xarray dataset we get from
Also try using
Note: this applies to both this cell and the 2 cells below.
OriolAbril commented on 2021-03-30T14:16:17Z: We don't need this anymore; now that all the plots have the actual team in their title, there is no need to have the number encoding at hand.

almostmeenal commented on 2021-03-30T16:16:57Z: I was thinking I'll replace teams with an array instead of a df in that case.
OriolAbril commented on 2021-03-30T14:16:17Z: Here we should use

```python
with model:
    data.extend(
        az.from_pymc3(posterior_predictive=pm.sample_posterior_predictive(data)),
        inplace=True,
    )
```

which will add the posterior predictive samples to the inferencedata object as

almostmeenal commented on 2021-03-30T18:32:28Z: I did that, although I checked https://arviz-devs.github.io/arviz/api/generated/arviz.InferenceData.extend.html?highlight=data%20extend#arviz.InferenceData.extend and inplace is not a valid argument; I got that error as well.

OriolAbril commented on 2021-03-30T18:38:27Z: True, I misremembered.

almostmeenal commented on 2021-03-30T18:43:21Z: Hi, so I made the changes except these next two cells for the "sim_table"; I'll read that notebook and see how to work with xarrays, I'm still new to that. I'll make a commit for the rest of the changes and you can have a look; meanwhile I'll fix these as well. Thanks for the patient and detailed responses; I understand I don't know a lot of things and made many naive errors.

OriolAbril commented on 2021-03-30T20:09:30Z: This sounds great! Thanks!
> i understand I don't know a lot of things and made many naive errors.

We have little to no documentation on that, which is one of the reasons we need to update everything urgently, so we don't expect anyone to know all this; even between maintainers, we are constantly asking about the parts of the library we don't know too much about. There are no stupid questions, so don't hesitate to ask anything.
OriolAbril commented on 2021-03-30T14:16:18Z: Looks great; in this particular case, I would add
OriolAbril commented on 2021-03-30T14:16:19Z: Given the extent of the changes in data processing and visualization, I would update this to something along the lines of "author Peadar Coyle, updated by Meenal Jhajharia to use ArviZ and xarray". No need to leave your email.
I added several comments with ReviewNB that are also added to GitHub, but don't read them on GitHub, as there it's not possible to see which cells and changes they refer to.
@OriolAbril I couldn't make a commit because of black failing as a pre-commit hook; I'll look into it and resolve it, then you can check the cells. Just give me some time.
You should try recommitting again right after the first time it fails. You can also run
Yeah, I thought that was supposed to happen, so when I tried the next time it showed the same error again; when I pushed, it just showed everything up to date. Also, right now I'm a little stuck with how to do the sim table thing in xarray instead of pandas.
@OriolAbril okay, so now it's done
I've used ArviZ inference data to extend the posterior predictive part as you mentioned, but I've stuck to DataFrames for generating the sim table. I tried using xarray operations to calculate 3*home_won and do the grouping, but it was getting complicated (maybe it's just me). I think it's fairly reasonable to use pandas for generating that table: all the data stays in arviz.InferenceData, but its manipulation for viewing can be done with pandas.

Also, I kept the constant data and calculated it with the posterior predictive, because I feel it's good practice in general for simple manipulation of ArviZ data; having the constant part, which contains home_team etc., stored too makes it all happen in ArviZ and reduces the need to get data from the initial csv/df.

Let me know if I should change anything else.
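A minimal pandas sketch of the kind of league-table aggregation described above (team names, match counts, and column names are illustrative, not the notebook's actual data):

```python
import pandas as pd

# Hypothetical mini sim table input: one row per simulated match, with
# league points already assigned (3 = win, 2 = tie, 0 = loss).
sims = pd.DataFrame({
    "home_team": ["Wales", "France", "Wales"],
    "away_team": ["Italy", "England", "France"],
    "home_win": [3, 2, 0],
    "away_win": [0, 2, 3],
})

# Sum each team's points over its home and away matches, then combine.
home = sims.groupby("home_team")["home_win"].sum()
away = sims.groupby("away_team")["away_win"].sum()
table = home.add(away, fill_value=0).sort_values(ascending=False)
print(table)
```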
Oh, I see. Yes, I said commit above, but you first have to make sure to add the modified files before committing again, or use
I have skimmed through the changes and they look amazing, thanks! There are some nits to polish, but nothing too big. I'll review later on ReviewNB. Regarding the sim table: if you don't mind, I can give it a go. Now that everything else uses xarray and ArviZ, it will be much easier for me to jump in and focus only on this problematic computation. Using pandas is fine, but I think that here everything will be much simpler once converted to xarray, because the data is inherently multidimensional, and trying to get pandas to play nicely with it takes more time than the job itself. If so, I would do this after giving you time to work on the minor changes still needed, so there are no git conflicts, and push directly to this PR.
OriolAbril commented on 2021-04-02T15:48:39Z: Add a semicolon at the end (like in
OriolAbril commented on 2021-04-02T15:48:40Z: The title should be "HDI" or "Posterior HDI" instead of HPD, which is an acronym we have not introduced and which is not part of the library "corpus". Also, minor comment: we don't need
OriolAbril commented on 2021-04-02T15:48:41Z: Do you want to add the code that uses the extra
If so, we should update this to

[...] over a total of 120k simulations, 30 per sample in the posterior.

If not, we should update to "... over a total of 4000 simulations, one per sample in the posterior".
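Where the two figures in this comment come from, assuming the common PyMC3 default of 4 chains of 1000 draws each (the chain/draw split is an assumption; the 4000-sample total is stated in the comment):

```python
# Posterior sample count, assuming 4 chains x 1000 draws.
chains, draws = 4, 1000
samples = chains * draws                 # 4000 posterior samples

# With the optional extra simulations: 30 per posterior sample.
sims_per_sample = 30
total_sims = samples * sims_per_sample   # 120_000, i.e. "120k simulations"
print(samples, total_sims)
```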
OriolAbril commented on 2021-04-02T15:48:42Z: I don't like what black is doing here. Can you try reformatting to

```python
pp["home_win"] = (
    (pp["home_points"] > pp["away_points"]) * 3  # home team wins and gets 3 points
    + (pp["home_points"] == pp["away_points"]) * 2  # tie -> home team gets 2 points
)
pp["away_win"] = (
    (pp["home_points"] < pp["away_points"]) * 3
    + (pp["home_points"] == pp["away_points"]) * 2
)
```

and see if it likes it better?
OriolAbril commented on 2021-04-02T15:48:43Z: The xlabel should be updated; we are plotting all ranks now. Maybe "Probability of finishing in each rank (including ties)"? The ties note we need to keep.
The notebook looks amazing now!
All remaining comments are mostly styling. I'll merge after this last review iteration. Thanks for all the patience and for addressing all the comments.
gives an error:

```
/Users/almostmeenal/anaconda3/envs/p/lib/python3.9/site-packages/seaborn/axisgrid.py:64: UserWarning: This figure was using constrained_layout==True, but that is incompatible with subplots_adjust and/or tight_layout: setting constrained_layout==False.
  self.fig.tight_layout(*args, **kwargs)
```

I think this is happening after the ArviZ style. Also, I need to change the font size; is there something you suggest doing about this error?

Update: fixed it with
@OriolAbril took care of all the comments, except the one with nbqa-black; tried twice, apparently it's adamant
Wow, the reformat with the comments is terrible. Turn formatting off for this cell then; the style guide shows how: https://github.com/pymc-devs/pymc3/wiki/PyMC3-Jupyter-Notebook-Style-Guide
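One way to exclude a region from black's formatting is the `# fmt: off` / `# fmt: on` markers that black itself supports; whether the linked style guide recommends exactly this mechanism or a per-cell nbqa exclusion is not shown in the thread. A self-contained sketch with toy data:

```python
import pandas as pd

# Toy posterior-predictive points, standing in for the notebook's pp data.
pp = pd.DataFrame({"home_points": [20, 13], "away_points": [10, 13]})

# fmt: off
# black leaves the hand-formatted layout (and inline comments) alone here.
pp["home_win"] = (
    (pp["home_points"] > pp["away_points"]) * 3   # win -> 3 points
    + (pp["home_points"] == pp["away_points"]) * 2  # tie -> 2 points
)
# fmt: on
```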
done
It looks like the notebook is not fully executed now; maybe you saved and pushed before it finished?
OriolAbril commented on 2021-04-02T18:34:35Z: Sorry, I didn't notice before,
The only problem is that, without changing the font, the x labels are collapsing into each other and are not at all legible.
Try
Thanks, this worked! This PR took insane patience, thank you for that. The notebook finally looks good, I think. Also, in case you missed it in the git notifications, I dropped an email too about random GSoC stuff.
Great! I see the change in the first cell with the seaborn FacetGrid, but not in the second. You should edit the 2nd cell (currently labeled In [13]) and then do a "Restart kernel & run all"; all cells should be run sequentially to make sure everything works as expected.
did
If this is done and ready to be merged, can I (?) finally remove my local branch? Right now I'm constantly shifting between two branches.
I still don't see all cells executed sequentially; there are some >30 in the middle of the notebook. I can rerun it myself too if you prefer.

Once the PR is merged, GitHub itself will show you a button to delete the branch. But switching between branches is good; you should get familiar with that, work on a single feature/change per PR, and have each PR submitted from its own branch.
Makes sense, yeah.

Ah yeah, I didn't save the notebook after the re-run, my bad. It should be fixed in the last commit I made.
Thanks for the PR!
I tried to convert some of the plots to ArviZ wherever possible, removed a few warnings caused by deprecated plot names, and added the inference data as an ArviZ one.

Still working on: adding priors etc.

Addresses issue #49 and aims to advance it to the "Best Practices" column.