Skip to content

[MRG] Added random_states #35

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 29, 2016

Conversation

bhargavvader
Copy link
Contributor

@bhargavvader bhargavvader commented Sep 19, 2016

Added options to pass a random_seed for all the algorithms which use it.
Addresses #33, and will help the tests for #26.

@bhargavvader
Copy link
Contributor Author

ping @perimosocordiae.
Could you please review this?

Copy link
Contributor

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the docs, the object type should be a numpy RandomState.

"""
the random state object to be passed must be a random.seed() object and not a numpy random seed
"""
state = random_state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random_state object isn't used in this function, so I'd drop it entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it used because random is used in random.randint and random.sample?
I did not mention numpy on purpose, because while numpy randomstate isn't being used, a random seed still is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. We should convert those calls to use the numpy random module. Then, the same technique used in the other methods will apply.

"""
the random state object to be passed must be a numpy random seed
"""
state = random_state
n = len(all_labels)
num_ignored = max(0, n - num_preserved)
idx = np.random.randint(n, size=num_ignored)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of passing in a random state is to generate the random numbers we need from it. So in this case, it would look like:

def random_subset(all_labels, num_preserved=np.inf, random_state=np.random):
   ...
   idx = random_state.randint(n, size=num_ignored)
   ...

We can use the np.random module as the default value, which means it'll call np.random.randint by default.
This wouldn't support passing in a raw numeric seed, but that's ok by me: users can just pass in np.random.RandomState(1234) if they want to use seed 1234.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, cool. I'll add the np.random as the default in all the other classes as well.

@bhargavvader
Copy link
Contributor Author

@perimosocordiae could you let me know how to handle random seeds in the chunks method? It uses random but not the numpy one so the seed passed is different.

Could you also have a look at the other classes to see if it is fine?

@bhargavvader
Copy link
Contributor Author

@perimosocordiae , I addressed the comments and added np.random as default.

chunks = -np.ones_like(self.known_label_idx, dtype=int)
uniq, lookup = np.unique(self.known_labels, return_inverse=True)
all_inds = [set(np.where(lookup==c)[0]) for c in xrange(len(uniq))]
idx = 0
while idx < num_chunks and all_inds:
c = random.randint(0, len(all_inds)-1)
c = np.random.randint(0, high=len(all_inds)-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but we need to be using random_state here:

c = random_state.randint(0, high=len(all_inds)-1)

(Same comment for all the other places where we're directly calling functions on np.random in this module.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course - I feel rather silly now. Pushing the right commit!

@bhargavvader
Copy link
Contributor Author

@perimosocordiae, again, incorporated suggestions and changes (silly, stupid oversights).

@perimosocordiae
Copy link
Contributor

Looking much better, thanks! The only remaining issue is the positive_negative_pairs method (and its helper method _pairs) on the Constraints object. These methods should have a random_state keyword argument just like the others.

@bhargavvader
Copy link
Contributor Author

Ah, I hadn't noticed it. Made the changes.

@bhargavvader bhargavvader changed the title Added random_states [MRG] Added random_states Sep 29, 2016
@perimosocordiae perimosocordiae merged commit 890b6a8 into scikit-learn-contrib:master Sep 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants