WIP: graph representation of model #1683

stevenjkern · 2017-01-19T19:06:28Z

This is an attempt to address #1547, but it is neither clever nor fancy.

I have added an attribute to most of the Distribution classes that lists the names of the distribution's parameters, and a graph attribute (a networkx.DiGraph) to the Model. The graph is populated each time a new variable is added to the model, by inspecting the values of the added distribution's parameters. I am totally open to suggestions for other approaches to accomplish this.
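As a rough illustration of the approach described above (a sketch, not the code in this PR): `ToyDistribution`, `ToyVariable`, `ToyModel`, and `add_var` are hypothetical stand-ins, and a plain dict of parent sets takes the place of the networkx.DiGraph. Each distribution class lists its parameter names in `param_names`; when a variable is added, the model inspects those parameters and records an edge from every parameter that is itself a model variable.

```python
class ToyDistribution:
    """Hypothetical stand-in for a pymc3 Distribution."""
    param_names = []  # names of the attributes that hold parameters

    def __init__(self, **params):
        for name in self.param_names:
            setattr(self, name, params[name])


class ToyNormal(ToyDistribution):
    param_names = ["mu", "sd"]


class ToyHalfCauchy(ToyDistribution):
    param_names = ["beta"]


class ToyVariable:
    def __init__(self, name, distribution):
        self.name = name
        self.distribution = distribution


class ToyModel:
    def __init__(self):
        # node name -> set of parent node names (stand-in for a DiGraph)
        self.graph = {}

    def add_var(self, name, distribution):
        var = ToyVariable(name, distribution)
        self.graph[name] = set()
        for param in distribution.param_names:
            value = getattr(distribution, param)
            # Only parameters that are model variables become edges;
            # constants such as 0.0 are ignored.
            if isinstance(value, ToyVariable):
                self.graph[name].add(value.name)
        return var


model = ToyModel()
mu = model.add_var("mu", ToyNormal(mu=0.0, sd=5.0))
tau = model.add_var("tau", ToyHalfCauchy(beta=5.0))
obs = model.add_var("obs", ToyNormal(mu=mu, sd=tau))
print(model.graph)
```

This toy only handles parameters that are variables directly. Parameters given as expressions built from several variables would need an extra traversal step that is not shown here.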

Below is an example of the graph in pymc3/examples/gelman_schools.py:

(pymc3_dev)skern pymc3[enh/model-to-graph]$ python -i pymc3/examples/gelman_schools.py 
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -44.489: 100%|█████████████████████████████████████████████████████████████████| 200000/200000 [00:11<00:00, 17496.64it/s]
Finished [100%]: Average ELBO = -44.474
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 418.49it/s]
/home/skern/.local/git_repos/pymc3/pymc3/stats.py:202: UserWarning: Estimated shape parameter of Pareto distribution is
        greater than 0.7 for one or more samples.
        You should consider using a more robust model, this is
        because importance sampling is less likely to work well if the marginal
        posterior and LOO posterior are very different. This is more likely to
        happen with a non-robust model and highly influential observations.
  happen with a non-robust model and highly influential observations.""")
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> pos = nx.spectral_layout(schools.graph)
>>> nx.draw_networkx_nodes(schools.graph, pos)
<matplotlib.collections.PathCollection object at 0x7fa3c5098410>
>>> nx.draw_networkx_edges(schools.graph, pos)
<matplotlib.collections.LineCollection object at 0x7fa3d56dcc10>
>>> labels = {node: str(node) for node in schools.graph.nodes()}
>>> nx.draw_networkx_labels(schools.graph, pos, labels)
{eta: <matplotlib.text.Text object at 0x7fa3c50ab910>, obs: <matplotlib.text.Text object at 0x7fa3c50ba490>, tau_log_: <matplotlib.text.Text object at 0x7fa3c50ba050>, mu: <matplotlib.text.Text object at 0x7fa3c50abbd0>}
>>> plt.show()


ferrine · 2017-01-19T20:51:35Z

Wow, great! I think the layout needs to be prettier, and variable parameters associated with a random variable should be detected.

ColCarroll · 2017-01-20T14:50:33Z

I agree with ferrine -- this is a really cool PR!

I would prefer the API to be decoupled from the Model class, so that in your above example, you could instead just call pm.plot_network(schools, layout='spectral', labels=True) (or something like that!)

I suspect you could derive all the necessary information from the model (including, probably, the variable parameter names), rather than updating state, but I haven't tried this myself yet.

stevenjkern · 2017-01-20T17:56:49Z

Thanks @ferrine and @ColCarroll, I can extract the graph building logic entirely into its own function that takes the model as an argument and exposes all of the networkx layouts (at least the ones that don't depend on GraphViz/pygraphviz).

Also tests, obviously.


twiecki · 2017-01-30T22:32:15Z

@stevenjkern Does the plot this creates still look like the one above?

stevenjkern · 2017-01-30T23:15:42Z

@twiecki, I've added the non-pygraphviz-dependent layouts as optional layouts for the matplotlib plotting that networkx exposes (circular, shell, spectral, spring, force-directed, random). An example below is from the lasso_missing.py example drawn with a circular layout. The example above was a spring layout, for reference. But the plotting function currently makes the drawing of the plot via matplotlib optional and returns the graph object that can be passed onto whatever networkx-supporting plotting utility the user desires, e.g. Graphcanvas*, Bokeh.

I am planning on getting back to this PR later this week to cover the distributions that I'd missed and to add tests.

* Full disclosure: I am the maintainer of Graphcanvas.
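A minimal sketch of the "always return the graph, draw only on request" design described above. `ToyVar`, `ToyModel`, `model_graph`, and the `draw` flag are hypothetical names for illustration, not the PR's API. Node and edge lists stand in for the networkx graph, and the drawing branch imports networkx and matplotlib lazily so they are only needed when a plot is actually requested.

```python
from collections import namedtuple

# Hypothetical stand-ins: each variable records the names of its parents.
ToyVar = namedtuple("ToyVar", ["name", "parents"])
ToyModel = namedtuple("ToyModel", ["variables"])


def model_graph(model, draw=False, layout="circular"):
    """Derive nodes and (parent, child) edges without mutating the model."""
    nodes = [v.name for v in model.variables]
    edges = [(p, v.name) for v in model.variables for p in v.parents]

    if draw:
        # Optional dependencies, imported only when drawing is requested.
        import matplotlib.pyplot as plt
        import networkx as nx

        layouts = {
            "circular": nx.circular_layout,
            "shell": nx.shell_layout,
            "spectral": nx.spectral_layout,
            "spring": nx.spring_layout,
            "random": nx.random_layout,
        }
        g = nx.DiGraph()
        g.add_nodes_from(nodes)
        g.add_edges_from(edges)
        nx.draw_networkx(g, pos=layouts[layout](g))
        plt.show()

    return nodes, edges


schools = ToyModel([
    ToyVar("mu", []),
    ToyVar("tau", []),
    ToyVar("obs", ["mu", "tau"]),
])
nodes, edges = model_graph(schools)  # no plotting libraries required
```

Returning the plain structure keeps the caller free to hand it to any other plotting utility, which is the point made in the comment above.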

rtbs-dev · 2017-02-08T15:30:59Z

Interesting, I've been searching for a way to get the pymc2 graphing ability to work in pymc3, functionality like this would be much appreciated for things like Bayesian belief networks (like BayesFusion's Genie).

I actually just saw something relevant to this on r/python, but with the difference that the with pm.Model() as model syntax would be able to take a NetworkX DAG object as input and build up the model relationships that way. Sort of a rapid prototyping via the NX graph.

Something like this possible/useful?

rtbs-dev · 2017-02-08T18:26:43Z

@stevenjkern It should be relatively easy to add a method that sets the "positions" of the nodes so that each level of the hierarchy appears in order w.r.t the other levels. Should make reading these graphs much easier (more like a flow chart) instead of the defaults like circular.

Would that be helpful? I'll try to mock up an example in the next few days here.
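One way the level-ordered positions suggested above could be computed in pure Python is sketched below. This is an assumption about how such a layout might work, not the mock-up mentioned in the comment: `layered_positions` is a hypothetical helper that takes a dict mapping each node to its parents and places every node on a row given by the length of the longest path from a root.

```python
def layered_positions(parents):
    """Map each node to (x, y) with one row per hierarchy level.

    `parents` maps node -> iterable of parent nodes and must describe a
    DAG in which every parent also appears as a key.
    """
    depth = {}

    def level(node):
        # Level = length of the longest path from any root to this node.
        if node not in depth:
            depth[node] = 1 + max((level(p) for p in parents[node]), default=-1)
        return depth[node]

    rows = {}
    for node in parents:
        rows.setdefault(level(node), []).append(node)

    pos = {}
    for lvl, members in rows.items():
        for i, node in enumerate(sorted(members)):
            # Spread the row evenly over (0, 1); roots get the largest y.
            x = (i + 1) / (len(members) + 1)
            pos[node] = (x, -lvl)
    return pos


pos = layered_positions({
    "mu": [],
    "tau": [],
    "eta": [],
    "theta": ["mu", "tau", "eta"],
    "obs": ["theta"],
})
```

The resulting dict of node to `(x, y)` has the shape that the networkx drawing functions accept as their `pos` argument, so no GraphViz dependency is involved.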

twiecki · 2017-02-08T19:49:39Z

Dask has nice graphs, what do they use?

stevenjkern · 2017-02-08T19:54:57Z

I believe Dask uses dot and GraphViz for layouts. We can add a tree layout as a separate function without much trouble. I just didn't expose networkx's tree layout because it is dependent upon pygraphviz and GraphViz and I didn't want to have to add another requirement to the package beyond networkx.

fonnesbeck · 2017-02-08T20:02:11Z

Dask does use GraphViz, and it's a nightmare. The Python bindings are very poorly maintained, and in fact they have stopped installing it with Dask on Conda (or are in the process of doing so). The graphs are pretty when they work, and there desperately needs to be a project to replace it. In the meantime, I am not sure what the best replacement is.

PaulSorenson · 2017-02-09T01:37:35Z

I have run into trouble with pydot in the past but I noticed that the pypi version 1.2.3 is py2/3 and fairly recent.

rtbs-dev · 2017-02-09T14:00:40Z

@fonnesbeck You're right, yesterday I tried recreating my old pygraphviz setup on a fresh conda install, and it was terrible. The graphviz install itself stopped adding PATH vars to the registry for windows users, so that in and of itself is an extra layer of difficulty if you were to use it as a dependency.

Any solution will probably need to either rely solely on networkX, or maybe output a tikz object of some kind to be rendered in a markdown cell (for jupyter users, anyway). I'm trying my hand at a pure NX solution based somewhat off of this answer on stackoverflow

stevenjkern · 2017-02-09T14:27:39Z

If all we really need is a tree layout, we probably don't need to bring in any extra libraries that we aren't enthusiastic about. In Graphcanvas there is a non-pygraphviz implementation of a moderately attractive tree layout. I can't speak to its code quality as it predates my involvement in the project by several years, but it works, it's fairly speedy, and something like it could be implemented here without much trouble.

stevenjkern · 2017-02-13T05:07:14Z

I spent some time putting together a tree layout using just networkx. How does this look? The model is from the lasso_block_update.ipynb notebook in the docs.

twiecki · 2017-02-13T07:56:20Z

@stevenjkern Looks much better!

rtbs-dev · 2017-02-13T15:12:12Z

@stevenjkern looking great. To recapture some of the functionality PyMC2 had (as shown here), it would be nice to alter the node shape based on whether a node is stochastic, deterministic (which, as far as I can tell, aren't currently tracked by the tree-builder, nor do I really know how they could be), or observed (perhaps just a check on whether the observed variable is defined?). That would pretty much complete the visualization of the model itself.

The only other feature I can think of is perhaps a coloring based on the sampled posterior mean values (for use after sampling, of course). That may or may not be a far-future kind of feature though.
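A small sketch of how per-kind node shapes might be organised. networkx's `draw_networkx_nodes` takes one `node_shape` per call, so nodes need to be grouped by kind and drawn group by group. The `STYLES` mapping and `group_by_kind` helper are hypothetical, the shape and colour choices are purely illustrative, and how each node's kind would be detected from a real model is left open here, as it is in the comment above.

```python
# Keyword arguments for nx.draw_networkx_nodes; the choices are illustrative.
STYLES = {
    "stochastic": {"node_shape": "o", "node_color": "white"},
    "deterministic": {"node_shape": "s", "node_color": "white"},
    "observed": {"node_shape": "o", "node_color": "lightgray"},
}


def group_by_kind(kinds):
    """Group node names by kind so each group can be drawn in one call."""
    groups = {}
    for node, kind in kinds.items():
        if kind not in STYLES:
            raise ValueError("unknown node kind: %r" % kind)
        groups.setdefault(kind, []).append(node)
    return groups


groups = group_by_kind({
    "mu": "stochastic",
    "theta": "deterministic",
    "obs": "observed",
    "tau": "stochastic",
})

# With a networkx graph `g` and a position dict `pos`, drawing would then be:
# for kind, nodelist in groups.items():
#     nx.draw_networkx_nodes(g, pos, nodelist=nodelist, **STYLES[kind])
```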

mrocklin · 2017-02-14T21:25:50Z

FYI I believe that the graphviz conda packages were improved yesterday. Graphviz may now be less painful.

bkanuka · 2017-12-05T11:53:52Z

@stevenjkern Any update on this? I'm building a pretty complex model and this would be helpful for me!

rtbs-dev · 2017-12-05T13:43:31Z

@bkanuka Not sure if this will be robust enough for your model, but in tandem with a PMML parser there's some decent visualization ability through NetworkX here. There's quite a few limitations atm, but the visualization auto-parses latex-style variable names and observed types.

twiecki · 2017-12-05T14:41:24Z

@tbsexton Those are beautiful visualizations! Do you think with this PR we could get something similar in PyMC3?

ColCarroll · 2017-12-05T14:46:07Z

@mrocklin had some very slick in-notebook graphs in his https://github.com/mrocklin/streamz library. See the inline examples here: https://streamz.readthedocs.io/en/latest/ , and I think they were implemented with minimal fuss.

mrocklin · 2017-12-05T14:49:01Z

FWIW I use and now recommend graphviz for static node-link diagrams. I haven't seen users complain of installation issues for a while now, so I assume that those problems have been sorted out. The only issue I've encountered recently when working with students directly is that people pip install graphviz, but don't install the system library, which can confuse them. We've worked around this with an informative error message.
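The kind of informative error described above could look roughly like the following. This is a sketch under assumptions, not the check used in any particular library: `require_dot` is a hypothetical helper that looks for the Graphviz `dot` executable with the standard library's `shutil.which`, and the lookup function is injectable only to make the sketch easy to exercise.

```python
import shutil


def require_dot(which=shutil.which):
    """Return the path to the `dot` executable or raise a helpful error."""
    path = which("dot")
    if path is None:
        raise RuntimeError(
            "The Graphviz `dot` executable was not found on PATH. "
            "`pip install graphviz` installs only the Python bindings; "
            "install the Graphviz system package as well."
        )
    return path
```

Calling `require_dot()` before attempting to render would turn an otherwise confusing failure into a message that names the missing system library.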


rtbs-dev · 2017-12-05T14:55:36Z

@mrocklin That's great to hear. I went ahead and used graphviz in the above-mentioned visualizations, just to get reasonable hierarchical tree-layouts.

It does work much better now than it used to, except for:

1. the conflicting pip/conda requirements between packages (i.e. networkx uses pydot? sometimes pygraphviz? Many different options at varying states of deprecation), and
2. It has been very difficult for my windows/non-linux coworkers to be able to get graphviz working nicely. I've resorted to docker in some cases, but that's a whole other animal. On the other hand, the entire graphviz website looks much newer/better upon recent inspection and I don't see the "removal of windows support" notice anymore...very good to see.

@twiecki Definitely possible. Just needs some tweaking to the networkX graph representation, and allowing a dependency on graphviz/pydot (from networkx.drawing.nx_pydot import graphviz_layout)

zaxtax · 2018-01-08T12:03:15Z

Is there any opposition to using Daft (http://daft-pgm.org)?

fonnesbeck · 2018-01-08T14:13:39Z

@zaxtax I don't think there is opposition to any particular package. My impression with Daft is that it is a little harder to automate than PyDot.

rtbs-dev · 2018-01-08T15:24:21Z

I did do a bit of playing around with daft (https://github.com/usnistgov/pmml_pymcBN/blob/master/tests/weld_full/daft_net_png.py). As far as my experience with it goes (this was ~6 months ago), I like the package a lot for having only a dependency on matplotlib.

However, I believe it currently only supports drawing nodes with the patches.Ellipse object, which for my original use-case wasn't quite enough (often, deterministic nodes are square). Additionally, there's no default lay-out algorithm implemented, which means one must define the position of each node manually. If something like the dot package's algorithm could be hacked into a daft layout routine, I'd be all for using this as a pure-python solution.

zaxtax · 2018-01-08T16:13:44Z

There is a PR for adding rectangular nodes. The lack of a default layout algorithm is the real bummer.


rtbs-dev · 2018-01-08T16:44:20Z

If that's the case, what's the overall feeling of combining something like grandalf and Daft, to get pretty nice looking layouts in pure-python?

In all honesty, it may be worth submitting a PR to daft with a layout module dependent on grandalf, and then adding this functionality to pymc3.

fonnesbeck · 2018-01-08T16:49:57Z

@tbsexton That's probably a good idea.

I would not be too hung up on dependencies for something like this. I think it's worth it to get nicer diagrams, plus this module (like our plotting modules) would likely be an optional component anyway.

zaxtax · 2018-01-08T19:04:03Z

@tbsexton @fonnesbeck how actively maintained is daft these days?

As it's only one file, we could vendor it.

stevenjkern · 2018-12-11T22:03:24Z

Closing as superseded by #3049

stevenjkern added 3 commits January 19, 2017 11:51

ENH: add param_names attribute to most distributions

70323fc

ENH: add node and edges to Model.graph attribute each time a variable…

0897c64

… is added

FIX: remove unneccesary parameter names

67276ba

stevenjkern added 3 commits January 21, 2017 21:25

MAINT: extract graph construction functionality to pymc3.plots.plot_n…

e48d789

…etwork function

MAINT: add networkx to requirements

ec4cbca

MAINT: add networkx to create_testenv.sh installation list

a27e303

fonnesbeck added enhancements WIP labels Jan 27, 2017

stevenjkern added 2 commits February 12, 2017 22:23

ENH: better model creation, tree layout

66f8081

Merge branch 'master' into enh/model-to-graph

843bbf9

twiecki mentioned this pull request Feb 13, 2017

Is there a way to represent a model as a DAG in Pymc3? #590

Closed

rtbs-dev mentioned this pull request Feb 21, 2017

Composing categorical distributions #1790

Closed

twiecki added the beginner friendly label Oct 24, 2017

ColCarroll mentioned this pull request Jun 22, 2018

Add graphviz model graphs #3049

Merged

twiecki mentioned this pull request Sep 26, 2018

Make draw_values draw from the joint distribution #3214

Closed

stevenjkern closed this Dec 11, 2018
