WIP: graph representation of model #1683

Closed
wants to merge 8 commits into from

stevenjkern

This is an attempt to address #1547, but it is neither clever nor fancy.

I have added an attribute to most of the Distribution classes that lists the names of the distribution's parameters, and a graph attribute (a networkx.DiGraph) to the Model. The graph is populated each time a new variable is added to the model, by inspecting the values of the added distribution's parameters. I am totally open to suggestions for other approaches to accomplish this.

Below is an example of the graph in pymc3/examples/gelman_schools.py:

(pymc3_dev)skern pymc3[enh/model-to-graph]$ python -i pymc3/examples/gelman_schools.py 
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -44.489: 100%|█████████████████████████████████████████████████████████████████| 200000/200000 [00:11<00:00, 17496.64it/s]
Finished [100%]: Average ELBO = -44.474
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:02<00:00, 418.49it/s]
/home/skern/.local/git_repos/pymc3/pymc3/stats.py:202: UserWarning: Estimated shape parameter of Pareto distribution is
        greater than 0.7 for one or more samples.
        You should consider using a more robust model, this is
        because importance sampling is less likely to work well if the marginal
        posterior and LOO posterior are very different. This is more likely to
        happen with a non-robust model and highly influential observations.
  happen with a non-robust model and highly influential observations.""")
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> pos = nx.spectral_layout(schools.graph)
>>> nx.draw_networkx_nodes(schools.graph, pos)
<matplotlib.collections.PathCollection object at 0x7fa3c5098410>
>>> nx.draw_networkx_edges(schools.graph, pos)
<matplotlib.collections.LineCollection object at 0x7fa3d56dcc10>
>>> labels = {node: str(node) for node in schools.graph.nodes()}
>>> nx.draw_networkx_labels(schools.graph, pos, labels)
{eta: <matplotlib.text.Text object at 0x7fa3c50ab910>, obs: <matplotlib.text.Text object at 0x7fa3c50ba490>, tau_log_: <matplotlib.text.Text object at 0x7fa3c50ba050>, mu: <matplotlib.text.Text object at 0x7fa3c50abbd0>}
>>> plt.show()

gelman_schools_graph
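
The bookkeeping described in this comment can be sketched without PyMC3 at all. The snippet below is only an illustration: `ToyVariable`, `ToyNormal`, `ToyModel`, and `param_names` are hypothetical names, not the PR's actual classes or attributes, and a plain dict of sets stands in for the networkx.DiGraph. It records an edge from a parent variable to a new variable whenever one of the new variable's parameters is itself a variable.

```python
# Hypothetical stand-ins for illustration only; these are not PyMC3 classes.
class ToyVariable:
    def __init__(self, name):
        self.name = name


class ToyNormal:
    # Names of the parameters to inspect (assumed attribute name).
    param_names = ("mu", "sd")

    def __init__(self, mu, sd):
        self.mu = mu
        self.sd = sd


class ToyModel:
    def __init__(self):
        # Adjacency mapping: parent name -> set of child names.
        self.graph = {}

    def add_variable(self, name, dist):
        """Register a variable and add an edge from each parent variable."""
        self.graph.setdefault(name, set())
        for param in dist.param_names:
            value = getattr(dist, param)
            # Only parameters that are themselves variables become parents;
            # constants such as 0.0 are ignored.
            if isinstance(value, ToyVariable):
                self.graph.setdefault(value.name, set()).add(name)
        return ToyVariable(name)


model = ToyModel()
mu = model.add_variable("mu", ToyNormal(0.0, 10.0))
tau = model.add_variable("tau", ToyNormal(0.0, 5.0))
obs = model.add_variable("obs", ToyNormal(mu, tau))

# Flatten the adjacency mapping into a sorted list of (parent, child) edges.
edges = sorted((p, c) for p, kids in model.graph.items() for c in kids)
print(edges)
```

Parameters that are expressions of several variables would need a deeper traversal than the direct `isinstance` check used here; this sketch does not attempt that.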

@ferrine
Member

ferrine commented Jan 19, 2017

Wow, great! I think the layout needs to be prettier, and variable parameters associated with one random variable should be detected.

@ColCarroll
Member

I agree with ferrine -- this is a really cool PR!

I would prefer the API to be decoupled from the Model class, so that in your above example, you could instead just call pm.plot_network(schools, layout='spectral', labels=True) (or something like that!)

I suspect you could derive all the necessary information from the model (including, probably, the variable parameter names), rather than updating state, but I haven't tried this myself yet.

@stevenjkern
Author

stevenjkern commented Jan 20, 2017

Thanks @ferrine and @ColCarroll, I can extract the graph building logic entirely into its own function that takes the model as an argument and exposes all of the networkx layouts (at least the ones that don't depend on GraphViz/pygraphviz).

Also tests, obviously.

@twiecki
Member

twiecki commented Jan 30, 2017

@stevenjkern Does the plot this creates still look like the one above?

@stevenjkern
Author

@twiecki, I've added the non-pygraphviz-dependent layouts as optional layouts for the matplotlib plotting that networkx exposes (circular, shell, spectral, spring, force-directed, random). An example below is from the lasso_missing.py example drawn with a circular layout. The example above was a spring layout, for reference. But the plotting function currently makes the drawing of the plot via matplotlib optional and returns the graph object that can be passed onto whatever networkx-supporting plotting utility the user desires, e.g. Graphcanvas*, Bokeh.

I am planning on getting back to this PR later this week to cover the distributions that I'd missed and to add tests.

image

* Full disclosure: I am the maintainer of Graphcanvas.
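
As a rough sketch of the API shape described in this comment (a standalone function, a layout chosen by name, drawing optional, the graph returned to the caller), something like the following could work. Everything here is an assumption for illustration: `model_graph`, `LAYOUTS`, and the `draw` callback are hypothetical names, the function takes a list of edges rather than a real model, and the two layouts are simple pure-Python stand-ins rather than the networkx implementations.

```python
import math
import random


def circular_layout(nodes):
    """Place nodes evenly on the unit circle."""
    n = len(nodes)
    return {
        node: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def random_layout(nodes, seed=0):
    """Place nodes uniformly at random in the unit square (seeded)."""
    rng = random.Random(seed)
    return {node: (rng.random(), rng.random()) for node in nodes}


# Registry of layouts selectable by name (hypothetical).
LAYOUTS = {"circular": circular_layout, "random": random_layout}


def model_graph(edges, layout="circular", draw=None):
    """Build an adjacency mapping and node positions from (parent, child) edges.

    Drawing is optional: pass a callable as `draw` to render the result,
    or leave it as None and use the returned objects with any plotting tool.
    """
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    if layout not in LAYOUTS:
        raise ValueError("unknown layout: %r" % (layout,))
    pos = LAYOUTS[layout](sorted(graph))
    if draw is not None:
        draw(graph, pos)
    return graph, pos


graph, pos = model_graph([("mu", "obs"), ("tau", "obs")], layout="circular")
```

Keeping the drawing step behind an optional argument means the function has no hard dependency on matplotlib, and the model itself carries no graph state.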

@rtbs-dev

rtbs-dev commented Feb 8, 2017

Interesting, I've been searching for a way to get the pymc2 graphing ability to work in pymc3, functionality like this would be much appreciated for things like Bayesian belief networks (like BayesFusion's Genie).

I actually just saw something relevant to this on r/python, but with the difference that the with pm.Model() as model syntax would be able to take a NetworkX DAG object as input and build up the model relationships that way. Sort of a rapid prototyping via the NX graph.

Something like this possible/useful?

@rtbs-dev

rtbs-dev commented Feb 8, 2017

@stevenjkern It should be relatively easy to add a method that sets the "positions" of the nodes so that each level of the hierarchy appears in order w.r.t the other levels. Should make reading these graphs much easier (more like a flow chart) instead of the defaults like circular.

Would that be helpful? I'll try to mock up an example in the next few days here.
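
A minimal sketch of such a level-based layout, assuming the model graph is available as a list of (parent, child) edges and is acyclic. `layered_positions` is a hypothetical helper, not part of this PR; it returns a dict mapping each node to an (x, y) pair, the same shape as the `pos` argument passed to the networkx drawing calls earlier in this thread.

```python
from collections import defaultdict, deque


def layered_positions(edges):
    """Assign (x, y) positions so that each hierarchy level shares one row.

    A node's level is the length of the longest path from any root to it,
    so every parent is drawn above all of its children.
    """
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        if child not in children[parent]:
            children[parent].add(child)
            indegree[child] += 1

    # Kahn-style topological pass, propagating the longest-path depth.
    level = {node: 0 for node in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in sorted(children[node]):
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")

    # Spread the nodes of each level evenly along x; deeper levels go lower.
    by_level = defaultdict(list)
    for node in sorted(nodes):
        by_level[level[node]].append(node)
    pos = {}
    for depth, members in by_level.items():
        for i, node in enumerate(members):
            pos[node] = ((i + 1) / (len(members) + 1), -depth)
    return pos


pos = layered_positions(
    [("mu", "theta"), ("tau", "theta"), ("theta", "obs"), ("mu", "obs")]
)
```

Using the longest path rather than the shortest keeps "obs" below "theta" even though "mu" also points at it directly, which is what gives the flow-chart reading order.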

@twiecki
Member

twiecki commented Feb 8, 2017

Dask has nice graphs, what do they use?

image

@stevenjkern
Author

I believe Dask uses dot and GraphViz for layouts. We can add a tree layout as a separate function without much trouble. I just didn't expose networkx's tree layout because it is dependent upon pygraphviz and GraphViz and I didn't want to have to add another requirement to the package beyond networkx.

@fonnesbeck
Member

Dask does use GraphViz, and it's a nightmare. The Python bindings are very poorly maintained, and in fact they have stopped installing it with Dask on Conda (or are in the process of doing so). The graphs are pretty when they work, and there desperately needs to be a project to replace it. In the meantime, I am not sure what the best replacement is.

@PaulSorenson
Contributor

I have run into trouble with pydot in the past but I noticed that the pypi version 1.2.3 is py2/3 and fairly recent.

@rtbs-dev

rtbs-dev commented Feb 9, 2017

@fonnesbeck You're right, yesterday I tried recreating my old pygraphviz setup on a fresh conda install, and it was terrible. The graphviz install itself stopped adding PATH vars to the registry for Windows users, so that in and of itself is an extra layer of difficulty if you were to use it as a dependency.

Any solution will probably need to either rely solely on NetworkX, or maybe output a tikz object of some kind to be rendered in a markdown cell (for Jupyter users, anyway). I'm trying my hand at a pure NX solution based somewhat off of this answer on Stack Overflow.

@stevenjkern
Author

If all we really need is a tree layout, we probably don't need to bring in any extra libraries that we aren't enthusiastic about. In Graphcanvas there is a non-pygraphviz implementation of a moderately attractive tree layout. I can't speak to its code quality as it predates my involvement in the project by several years, but it works, it's fairly speedy, and something like it could be implemented here without much trouble.

@stevenjkern
Author

I spent some time putting together a tree layout using just networkx. How does this look? The model is from the lasso_block_update.ipynb notebook in the docs.

image

@twiecki
Member

twiecki commented Feb 13, 2017

@stevenjkern Looks much better!

@rtbs-dev

@stevenjkern looking great. To recapture some of the functionality that PyMC2 had (as shown here), it would be nice to alter the node shape based on whether a node is stochastic, deterministic (which, as far as I can tell, aren't currently tracked by the tree-builder, nor do I really know how they could be), or observed (perhaps just a check on whether the observed variable is defined?). That would pretty much complete the visualization of the model itself.

The only other feature I can think of is perhaps a coloring based on the sampled posterior mean values (for use after sampling, of course). That may or may not be a far-future kind of feature though.
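
One way the node-shape idea could be prototyped, assuming the sets of observed and deterministic variable names are already known (how to obtain them from a model is exactly the open question raised above). `NODE_SHAPES`, `node_kind`, and `group_nodes_by_shape` are hypothetical names, and the shape chosen for each kind is arbitrary; the marker codes are matplotlib scatter markers.

```python
# Mapping from node kind to a matplotlib marker code (arbitrary choice).
NODE_SHAPES = {"stochastic": "o", "deterministic": "s", "observed": "D"}


def node_kind(name, observed, deterministic):
    """Classify a node; anything not listed is treated as stochastic."""
    if name in observed:
        return "observed"
    if name in deterministic:
        return "deterministic"
    return "stochastic"


def group_nodes_by_shape(nodes, observed=(), deterministic=()):
    """Return {marker: [node names]}, preserving the input order."""
    groups = {}
    for name in nodes:
        shape = NODE_SHAPES[node_kind(name, observed, deterministic)]
        groups.setdefault(shape, []).append(name)
    return groups


groups = group_nodes_by_shape(
    ["mu", "tau", "eta", "theta", "obs"],
    observed={"obs"},
    deterministic={"theta"},
)
print(groups)
```

Each group could then be drawn with its own call to nx.draw_networkx_nodes, passing the list as nodelist and the marker as node_shape, since that function takes a single marker per call.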

@mrocklin

FYI I believe that the graphviz conda packages were improved yesterday. Graphviz may now be less painful.

@bkanuka

bkanuka commented Dec 5, 2017

@stevenjkern Any update on this? I'm building a pretty complex model and this would be helpful for me!

@rtbs-dev

rtbs-dev commented Dec 5, 2017

@bkanuka Not sure if this will be robust enough for your model, but in tandem with a PMML parser there's some decent visualization ability through NetworkX here. There's quite a few limitations atm, but the visualization auto-parses latex-style variable names and observed types.

@twiecki
Member

twiecki commented Dec 5, 2017

@tbsexton Those are beautiful visualizations! Do you think with this PR we could get something similar in PyMC3?

@ColCarroll
Member

@mrocklin had some very slick in-notebook graphs in his https://github.com/mrocklin/streamz library. See the inline examples here: https://streamz.readthedocs.io/en/latest/ , and I think they were implemented with minimal fuss.

@mrocklin

mrocklin commented Dec 5, 2017 via email

@rtbs-dev

rtbs-dev commented Dec 5, 2017

@mrocklin That's great to hear. I went ahead and used graphviz in the above-mentioned visualizations, just to get reasonable hierarchical tree layouts.

It does work much better now than it used to, except for

  1. the conflicting pip/conda requirements between packages (i.e. networkx uses pydot? sometimes pygraphviz? Many different options at varying states of deprecation.) and
  2. It has been very difficult for my windows/non-linux coworkers to be able to get graphviz working nicely. I've resorted to docker in some cases, but that's a whole other animal. On the other hand, the entire graphviz website looks much newer/better upon recent inspection and I don't see the "removal of windows support" notice anymore...very good to see.

@twiecki Definitely possible. Just needs some tweaking to the networkX graph representation, and allowing a dependency on graphviz/pydot (from networkx.drawing.nx_pydot import graphviz_layout)

@zaxtax
Contributor

zaxtax commented Jan 8, 2018

Is there any opposition to using Daft (http://daft-pgm.org)?

@fonnesbeck
Member

@zaxtax I don't think there is opposition to any particular package. My impression with Daft is that it is a little harder to automate than PyDot.

@rtbs-dev

rtbs-dev commented Jan 8, 2018

I did do a bit of playing around with daft. As far as my experience with it (this is ~ 6mo ago) I like the package a lot for having only a dependency on matplotlib.

However, I believe it currently only supports drawing nodes with the patches.Ellipse object, which for my original use case wasn't quite enough (often, deterministic nodes are square). Additionally, there's no default layout algorithm implemented, which means one must define the position of each node manually. If something like the dot package's algorithm could be hacked into a daft layout routine, I'd be all for using this as a pure-python solution.

@zaxtax
Contributor

zaxtax commented Jan 8, 2018 via email

@rtbs-dev

rtbs-dev commented Jan 8, 2018

If that's the case, what's the overall feeling of combining something like grandalf and Daft, to get pretty nice looking layouts in pure-python?

In all honesty, it may be worth submitting a PR to daft with a layout module dependent on grandalf, and then adding this functionality to pymc3.

@fonnesbeck
Member

@tbsexton That's probably a good idea.

I would not be too hung up on dependencies for something like this. I think it's worth it to get nicer diagrams, plus this module (like our plotting modules) would likely be an optional component anyway.

@zaxtax
Contributor

zaxtax commented Jan 8, 2018

@tbsexton @fonnesbeck how actively maintained is daft these days?

As it's only one file, we could vendor it.

@stevenjkern
Author

Closing as superseded by #3049

10 participants