-
-
Notifications
You must be signed in to change notification settings - Fork 272
Update dirichlet mixture model notebooks #212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update dirichlet mixture model notebooks #212
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-18T19:25:39Z ost_pdf_contribs = sp.stats.norm.pdf( np.atleast_3d(x_plot), posterior["mu"].T.values[:, np.newaxis, :], 1.0 / np.sqrt(posterior["lambda_"] * posterior["tau"]).T.values[:, np.newaxis, :], )
Is there a better way to code for the plots than using I tried passing the xarray dataset of all the data variables from I thought that might be because the OriolAbril commented on 2021-08-24T18:11:42Z There are some ways to make scipy.stats.norm compatible with xarray inputs. If you have the inferencedata saved we can discuss that tomorrow. chiral-carbon commented on 2021-08-26T06:22:32Z didnt get a chance to ask about using xarray with scipy.stats.norm functions, do you have any quick links I could refer to? OriolAbril commented on 2021-08-26T14:04:04Z https://arviz-devs.github.io/arviz/user_guide/pymc3_refitting_xr_lik.html is the example closest to this situation, then also check the apply_ufunc docs and this answer: https://stackoverflow.com/questions/58719696/how-to-apply-a-xarray-u-function-over-netcdf-and-return-a-2d-array-multiple-new/62012973#62012973 |
@@ -65,8 +65,9 @@ | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried running the notebook first without making any major changes (except using numpy random Generator for setting the seed), but I get an error here. The summary_multinomial
dataframe that I get is not the same as before, because,
creation of the column ess_mean_per_sec
depends on a column called ess_mean
. However, here we don't get the columns ess_mean
and ess_sd
, is there something obvious I missed here?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arviz summary now uses ess_bulk
as default. They are not exactly the same but nearly. For the purposes of these notebook we can just change ess_mean
for ess_bulk
View / edit / reply to this conversation on ReviewNB Sayam753 commented on 2021-08-19T13:37:35Z I think the equation
Reference - https://en.wikipedia.org/wiki/Dirichlet_process#The_stick-breaking_process chiral-carbon commented on 2021-08-19T16:54:07Z ah thanks! OriolAbril commented on 2021-08-24T18:03:25Z I don't know if this comes from wikipedia or if it's maybe a typo, but I find using both lowercase omega and w in the same set of equations to be extremely confusing and a recipe for disaster. chiral-carbon commented on 2021-08-25T07:15:33Z will change the variable for weights to \mu OriolAbril commented on 2021-08-25T08:10:00Z how about weights if that is what the variable represents? chiral-carbon commented on 2021-08-25T21:23:36Z wouldn't 'weights' be a little too big for a variable name? will change to that though if you think that would make things clearer |
View / edit / reply to this conversation on ReviewNB Sayam753 commented on 2021-08-19T13:37:36Z Line #3. ax.plot(x_plot, sample_cdfs[0], c="gray", alpha=0.75, label="DP sample CDFs") Line #4. ax.plot(x_plot, sample_cdfs[1:].T, c="gray", alpha=0.75)
This can be combined in one chiral-carbon commented on 2021-08-19T16:54:54Z thanks, will try running this. OriolAbril commented on 2021-08-24T18:04:48Z arviz dev version will have an ecdf plot very soon which could be used in this notebook https://github.com/arviz-devs/arviz/pull/1753 chiral-carbon commented on 2021-08-25T07:23:29Z then it would be better to wait for that I think because on contracting
Line #3. ax.plot(x_plot, sample_cdfs[0], c="gray", alpha=0.75, label="DP sample CDFs") Line #4. ax.plot(x_plot, sample_cdfs[1:].T, c="gray", alpha=0.75) to ax.plot(x_plot, sample_cdfs.T, c="gray", alpha=0.75, label="DP sample CDFs")
I end up getting multiple labels for |
ah thanks! View entire conversation on ReviewNB |
thanks, will try running this. View entire conversation on ReviewNB |
I don't know if this comes from wikipedia or if it's maybe a type, but I find using both lowercase omega and w in the same set of equations to be extremely confusing and a recipe for disaster. View entire conversation on ReviewNB |
arviz dev version will have an ecdf plot very soon which could be used in this notebook https://github.com/arviz-devs/arviz/pull/1753 View entire conversation on ReviewNB |
There are some ways to make scipy.stats.norm compatible with xarray inputs. If you have the inferencedata saved we can discuss that tomorrow. View entire conversation on ReviewNB |
@@ -65,8 +65,9 @@ | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use coords. This example tries to be more down-to-earth even if using simulated data so the number of clusters (k again) corresponds to the tree species, so it would be great to use as coord values real tree names :). For the number of observations, here they now correspond to the number of forests, so again, as there are only 10 "forests" we could also use real forest names.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have used coords/dims, what do you mean by using real tree names? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the dimension should be tree
, not num_trees
because the number of trees is an integer, k
in this case. tree_num
or tree_id
could also work, but I'd much rather have non-integer coordinate values now that they make sense. Here we count trees on different forests, so we can use the names of the trees we count and the names of the forests where we have counted these trees. The tree
dimension for example could have ["pine", "oak", "cedar", "cottonwook"...
and the coordinate values for the forest
dimension can also be real forest names. For example cool catalan forests: https://www.descobrir.cat/ca/notices/2021/03/7-boscos-espectaculars-per-descobrir-a-catalunya-4742.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it, that's a cool idea. will add that!
@@ -65,8 +65,9 @@ | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #3. draws=int(5e3), chains=4, step=pm.Metropolis(), return_inferencedata=True
5000
is shorter than int(5e3)
and I find it more clear
Reply via ReviewNB
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-24T18:15:48Z k represents the number of components, n the number of observations, so |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-24T18:15:49Z truncating (...) to thirty components |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-24T18:15:49Z Line #5. ax.bar(plot_w - 0.5, posterior["w"].mean(axis=1), width=1.0, lw=0) Here we want the mean over the chain and draw dimensions, so we don't really need to use the stacked posterior, and as a general rule, we should not use |
will change the variable for weights to mu View entire conversation on ReviewNB |
then it would be better to wait for that I think because on contracting
Line #3. ax.plot(x_plot, sample_cdfs[0], c="gray", alpha=0.75, label="DP sample CDFs") Line #4. ax.plot(x_plot, sample_cdfs[1:].T, c="gray", alpha=0.75) to ax.plot(x_plot, sample_cdfs.T, c="gray", alpha=0.75, label="DP sample CDFs")
I end up getting multiple labels for View entire conversation on ReviewNB |
how about weights if that is what the variable represents? View entire conversation on ReviewNB |
wouldn't 'weights' be a little too big for a variable name? will change to that though if you think that would make things clearer View entire conversation on ReviewNB |
didnt get a chance to ask about using xarray with scipy.stats.norm functions, do you have any quick links I could refer to? View entire conversation on ReviewNB |
https://arviz-devs.github.io/arviz/user_guide/pymc3_refitting_xr_lik.html is the example closest to this situation, then also check the apply_ufunc docs and this answer: https://stackoverflow.com/questions/58719696/how-to-apply-a-xarray-u-function-over-netcdf-and-return-a-2d-array-multiple-new/62012973#62012973 View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-01T18:52:29Z Line #2. frac = pm.Dirichlet("frac", a=np.ones(k)) frac has dim |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-01T18:52:30Z This is looping over trees so it should use chiral-carbon commented on 2021-09-03T13:37:47Z even though I initialize |
even though I initialize View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-03T14:08:50Z Here you are using chiral-carbon commented on 2021-09-03T15:02:39Z using
I'm not sure why this is happening OriolAbril commented on 2021-09-03T15:53:03Z It's probably using |
using
I'm not sure why this is happening View entire conversation on ReviewNB |
It's probably using View entire conversation on ReviewNB |
30e7e7a
to
a842d8f
Compare
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-22T00:42:06Z the references here should be cited with the cite role so they are then also incorporated into the bibliography automatically. |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-09-22T00:42:07Z the theme already adds a table of contents automatically, there is no longer need for a manual agenda |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-09-22T13:11:07Z Ohlssen, et al. provide justification for truncation, showing that
the link here points to http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf which gives an error. I'm also not sure which paper they are referring to from searching the keywords in the URL itself. it most likely is by this author but I'm not sure which paper this is supposed to cite. chiral-carbon commented on 2021-09-23T11:56:37Z hey @AustinRochford , could you maybe let me know which paper you meant to reference with http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf as the link now seems to be broken? |
hey @AustinRochford , could you maybe let me know which paper you meant to reference with http://fisher.osu.edu/~schroeder.9/AMIS900/Ohlssen2006.pdf as the link now seems to be broken? View entire conversation on ReviewNB |
8355d56
to
6e688b5
Compare
correct notation in text fixes and changes initialize trees and forests for dims use from_pymc3 changes
0ea547f
to
78ce655
Compare
Addresses issues #100 and #101 and aims to advance the notebooks to best practices.