Skip to content

Text and SQLite backends for PyMC3 (update) #500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Mar 1, 2014

These changes are based on the discussion in #449. Please let me know your thoughts.

@twiecki
Copy link
Member

twiecki commented Mar 1, 2014

======================================================================

ERROR: Failure: ImportError (No module named mock)

----------------------------------------------------------------------

Traceback (most recent call last):

File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/loader.py", line 413, in loadTestsFromName

addr.filename, addr.module)

File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath

return self.importFromDir(dir_path, fqname)

File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir

mod = load_module(part_fqname, fh, filename, desc)

File "/home/travis/build/pymc-devs/pymc/pymc/tests/test_ndarray_backend.py", line 6, in <module>

import mock

ImportError: No module named mock

Probably need to add mock to the travis build.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 1, 2014

Thanks. It's pre-installed [1] on travis, but I didn't consider that
we're using conda now.

[1] http://docs.travis-ci.com/user/languages/python/#Pre-installed-packages

kyleam added 7 commits March 1, 2014 11:49
This is only needed for python 2 because mock is in stdlib for python 3
(unittest.mock).
This commit contains a new backend for sampling and selecting values.
Non-backend files have been changed to work with the new backend.

This commit also merges the `sample` and `psample` functions. `sample`
now takes a keyword argument `njobs`, and if this is over one, the
multiprocessing version is used.
@twiecki
Copy link
Member

twiecki commented Mar 1, 2014

Is this good to merge then? I know @jsalvatier reviewed this and it certainly looks high quality.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 1, 2014

Is this good to merge then? I know @jsalvatier reviewed this and it
certainly looks high quality.

Sorry, I should have indicated that when I opened up the new PR. I think
it'd be nice to give the new version some time for review and
discussion.

@fonnesbeck
Copy link
Member

I notice that if you try running another chain using an existing backend, the chain variable does not get incremented in the backend (nor does the draw get reset to zero). It would be nice to have the backend check for the highest chain number before sampling, so that we can arbitrarily go back and add chains to a database without confusing which chain the samples came from.

Great work, BTW!

@fonnesbeck
Copy link
Member

Never mind. Now I see that the chain variable is set via sample. That's probably the best approach.

@fonnesbeck
Copy link
Member

After running a few tests, I think this looks really good. A couple of things:

  • we should add an example (or modify an existing one) that uses the SQLite backend
  • it might be nice to have a shortcut for using the backend with a default name. At present we have to pass trace = pymc.backends.SQLite('my_backend') to the sample call. Might be nice to be able to simply have trace="sqlite" do that for you, with a default name like "MCMC" or "model" for the database name.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 19, 2014

Thanks for the suggestions. I'll push an update soon.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 20, 2014

OK, I've pushed updates that incorporate your suggestions.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 20, 2014

As the Travis failures show, I only ran a subset of the tests locally.

@kyleam
Copy link
Contributor Author

kyleam commented Mar 20, 2014

The Travis issues should be fixed now.

I'm moving this under main so that it doesn't run with "test_examples".
This could be set up like the other examples, with a run definition that
allows for a short version, but I'd prefer not to for a couple of
reasons.

1. Everything aside from the SQLite backend is the same as
   "hierarchical.py", so it isn't testing much more for the time added
   to the run.
2. This results in an SQLite file, so a cleanup should be added
   somewhere if it is run with "test_examples".
njobs = 1

data = np.random.normal(size=(2, 20))
model = pm.Model()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just get a stock model from tests/models.py or at least move this model into there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.


def test_multitrace_init_unique_chains(self):
trace0 = mock.Mock()
trace0.chain = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to set trace0.chain? That seems undesirable. In general it would be nice to mostly test external behavior and internal behavior less.

@jsalvatier
Copy link
Member

Kyle, I want to thank you for your hard work and patience. You've put a lot of effort into this and it shows. I really like the external interface to the trace objects. Its simple and functional.

I think I might have opinions about how the backends are implemented, but its clear to me that I'm not going to put in the effort to help make those changes right now. So I'm in favor of merging this soon.

@kyleam
Copy link
Contributor Author

kyleam commented Apr 7, 2014

(Since several of your comments address the same thing, I'll just respond in a general comment.)

I think there's a trade-off between having isolated/fast tests and relying too much on internal details and mocking. Based on your comments (and because my tests are the first place mocking occurs in the test suite), I'd guess we have different preferences on this. I've tried to also add higher level tests (like those in the dump/load and selection tests). For now, I'd prefer to keep these unit tests and just extend the higher level tests as desired. The unit tests add a very small amount of time to the total test suite, and they can be deleted later if they become problematic.

@kyleam
Copy link
Contributor Author

kyleam commented Apr 7, 2014

I'll incorporate some of your suggestions, and then rebase this (since it will won't apply cleanly to recent changes in master) and open a new PR.

@jsalvatier
Copy link
Member

That's true, about fast tests. Our tests are generally pretty slow.

I look forward to seeing your changes.

@kyleam kyleam closed this Apr 9, 2014
@kyleam kyleam deleted the pymc3-backends-2 branch May 25, 2014 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants