[MRG] Refactor the metric() method #152
@@ -1,3 +1,5 @@
from numpy.linalg import cholesky
from scipy.spatial.distance import euclidean, _validate_vector
from sklearn.base import BaseEstimator
from sklearn.utils.validation import _is_arraylike
from sklearn.metrics import roc_auc_score
@@ -34,6 +36,13 @@ def score_pairs(self, pairs):
    -------
    scores: `numpy.ndarray` of shape=(n_pairs,)
      The score of every pair.

    See Also
    --------
    get_metric : a method that returns a function to compute the metric between
      two points. The difference is that it works on two 1D arrays and cannot
      use a preprocessor. Besides, the returned function is independent of
      the metric learner and hence is not modified if the metric learner is.
    """

  def check_preprocessor(self):
@@ -85,6 +94,24 @@ def _prepare_inputs(self, X, y=None, type_of_inputs='classic',
                                 tuple_size=getattr(self, '_tuple_size', None),
                                 **kwargs)

  @abstractmethod
  def get_metric(self):
    """Returns a function that returns the learned metric between two points.
Review comment: Returns a function that takes as input two 1D arrays and outputs the learned metric score on these two points?
Reply: Agreed, it's clearer indeed.
    This function will be independent from the metric learner that learned it
    (it will not be modified if the initial metric learner is modified).
Review comment: maybe add that the returned function can be directly plugged into the […]
Reply: Agreed, will do

    Returns
    -------
    metric_fun : function
      The function described above.

    See Also
    --------
    score_pairs : a method that returns the metric between several pairs of
Review comment: the metric score
Reply: Agreed, will do
      points. But this is a method of the metric learner and therefore can
Review comment: -But +Unlike get_metric
Reply: Agreed, will do
      change if the metric learner changes. Besides, it can use the metric
      learner's preprocessor, and works on concatenated arrays.
    """


class MetricTransformer(six.with_metaclass(ABCMeta)):
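As an aside (not part of this diff), a standalone function with this signature can be passed wherever a callable metric on two 1D arrays is expected. A minimal sketch, assuming NCA as the learner (any Mahalanobis metric learner exposing the new get_metric would work) and scikit-learn's KNeighborsClassifier, which accepts a callable metric:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from metric_learn import NCA  # assumed example learner

    rng = np.random.RandomState(0)
    X = rng.randn(40, 3)
    y = rng.randint(0, 2, 40)

    nca = NCA().fit(X, y)
    # The returned function takes two 1D arrays and returns the learned
    # distance, which matches scikit-learn's callable-metric signature.
    knn = KNeighborsClassifier(n_neighbors=3, metric=nca.get_metric())
    knn.fit(X, y)
    print(knn.predict(X[:5]))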
@@ -146,6 +173,16 @@ def score_pairs(self, pairs):
    -------
    scores: `numpy.ndarray` of shape=(n_pairs,)
      The learned Mahalanobis distance for every pair.

    See Also
    --------
    get_metric : a method that returns a function to compute the metric between
Review comment: same updates as above
      two points. The difference is that it works on two 1D arrays and cannot
      use a preprocessor. Besides, the returned function is independent of
      the metric learner and hence is not modified if the metric learner is.

    :ref:`mahalanobis_distances` : The section of the project documentation
      that describes Mahalanobis Distances.
    """
    pairs = check_input(pairs, type_of_inputs='tuples',
                        preprocessor=self.preprocessor_,
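To make the contrast drawn in these See Also notes concrete, here is a small sketch (not part of the diff) assuming a fitted Mahalanobis metric learner named model and two 1D points a and b of shape (n_features,):

    import numpy as np

    pairs = np.array([[a, b]])         # score_pairs works on an array of pairs...
    scores = model.score_pairs(pairs)  # ...is bound to the learner and can use its preprocessor

    metric_fun = model.get_metric()    # standalone function on two 1D arrays
    d = metric_fun(a, b)               # stays valid even if model is modified later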
@@ -177,8 +214,57 @@ def transform(self, X):
                              accept_sparse=True)
    return X_checked.dot(self.transformer_.T)

  def metric(self):
Review comment: Maybe we should keep this for now but mark it as deprecated, and point to the new […]
Reply: Yes, I agree, I forgot about that
    return self.transformer_.T.dot(self.transformer_)
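A sketch of what that deprecation could look like inside the same mixin (the exact message and the method it should point to are assumptions, since the comment above is truncated):

    import warnings

    def metric(self):
      """Deprecated. Use `get_mahalanobis_matrix` instead (assumed target)."""
      warnings.warn("`metric` is deprecated and will be removed in a future "
                    "release; use `get_mahalanobis_matrix` instead.",
                    DeprecationWarning)
      return self.transformer_.T.dot(self.transformer_)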
  def get_metric(self):
    """Returns a function that returns the learned metric between two points.
    This function will be independent from the metric learner that learned it
    (it will not be modified if the initial metric learner is modified).

    Returns
    -------
    metric_fun : function
      The function described above.

    See Also
    --------
    score_pairs : a method that returns the metric between several pairs of
Review comment: again
Reply: yes, will do
      points. But this is a method of the metric learner and therefore can
      change if the metric learner changes. Besides, it can use the metric
      learner's preprocessor, and works on concatenated arrays.

    :ref:`mahalanobis_distances` : The section of the project documentation
      that describes Mahalanobis Distances.
    """
    transformer_T = self.transformer_.T.copy()

    def metric_fun(u, v):
      """This function computes the metric between u and v, according to the
      previously learned metric.

      Parameters
      ----------
      u : array-like, shape=(n_features,)
        The first point involved in the distances computation.
Review comment: distance
Reply: Thanks, will do
      v : array-like, shape=(n_features,)
        The second point involved in the distances computation.
Review comment: distance
Reply: will do
      Returns
      -------
      distance: float
        The distance between u and v according to the new metric.
      """
      u = _validate_vector(u)
      v = _validate_vector(v)
Review comment: Here I use scipy's _validate_vector function (used in functions like […]
Reply: Yeah, it means it's subject to change and we shouldn't depend on it.
Reply: Alright, I'll replace it by something else
      return euclidean(u.dot(transformer_T), v.dot(transformer_T))
    return metric_fun
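One possible public replacement for scipy's private _validate_vector helper (a sketch only; the thread above does not settle on a specific alternative, and _check_1d is a hypothetical name):

    import numpy as np

    def _check_1d(x):
      # Plain-NumPy stand-in for scipy's private _validate_vector:
      # coerce to a float array and reject anything that is not 1D.
      x = np.asarray(x, dtype=float)
      if x.ndim != 1:
        raise ValueError("Input vectors should be 1D arrays.")
      return x

    # Inside metric_fun, `u = _check_1d(u)` and `v = _check_1d(v)` would then
    # replace the calls to _validate_vector.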

  def get_mahalanobis_matrix(self):
    """Returns a copy of the Mahalanobis matrix learned by the metric learner.

    Returns
    -------
    M : `numpy.ndarray`, shape=(n_components, n_features)
      The copy of the learned Mahalanobis matrix.
    """
    return self.transformer_.T.dot(self.transformer_).copy()
Review comment: There's no need for a […]
Reply: That's right thanks, I don't know why I left the copy there...
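As the reply notes, the trailing .copy() is redundant: numpy's dot already allocates a new array, so the return line can simply be

    return self.transformer_.T.dot(self.transformer_)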


class _PairsClassifierMixin(BaseMetricLearner):
@@ -3,7 +3,7 @@
import pytest
import numpy as np
from numpy.testing import assert_array_almost_equal
-from scipy.spatial.distance import pdist, squareform
+from scipy.spatial.distance import pdist, squareform, euclidean
from sklearn import clone
from sklearn.utils import check_random_state
from sklearn.utils.testing import set_random_state
@@ -167,3 +167,47 @@ def test_embed_is_linear(estimator, build_dataset):
                            model.transform(X[10:20]))
  assert_array_almost_equal(model.transform(5 * X[:10]),
                            5 * model.transform(X[:10]))

@pytest.mark.parametrize('estimator, build_dataset', metric_learners,
                         ids=ids_metric_learners)
def test_get_metric_equivalent_to_transform_and_euclidean(estimator,
                                                           build_dataset):
  """Tests that the get_metric method of mahalanobis metric learners is the
  euclidean distance in the transformed space
  """
  rng = np.random.RandomState(42)
  input_data, labels, _, X = build_dataset()
  model = clone(estimator)
  set_random_state(model)
  model.fit(input_data, labels)
  metric = model.get_metric()
  n_features = X.shape[1]
  a, b = (rng.randn(n_features), rng.randn(n_features))
  euc_dist = euclidean(model.transform(a[None]), model.transform(b[None]))
  assert (euc_dist - metric(a, b)) / euc_dist < 1e-15
Review comment: Here I put 1e-15 because it fails for 1e-16 (I guess the transform plus euclidean distance give slightly different results from my implementation (transform of the difference plus sqrt of sum of squares)). But I think it's still OK, right?
Reply: Yeah, this is fine.
Reply: Or you could use […]
Reply: Yes, I agree it's better to use a built-in function. I just saw that […]
Reply: yes, relative error is probably better
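For reference, a sketch of how this check could use a built-in helper with a relative tolerance (the thread is truncated, so the exact function is an assumption; numpy.isclose is one option), reusing euc_dist, metric, a and b from the test above:

    import numpy as np

    # np.isclose checks |x - y| <= atol + rtol * |y|, so with atol=0 this is
    # a purely relative comparison.
    assert np.isclose(euc_dist, metric(a, b), rtol=1e-15, atol=0)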


@pytest.mark.parametrize('estimator, build_dataset', metric_learners,
                         ids=ids_metric_learners)
def test_get_metric_is_pseudo_metric(estimator, build_dataset):
  """Tests that the get_metric method of mahalanobis metric learners returns a
  pseudo-metric (metric but without one side of the equivalence of
  the identity of indiscernables property)
  """
  rng = np.random.RandomState(42)
  input_data, labels, _, X = build_dataset()
  model = clone(estimator)
  set_random_state(model)
  model.fit(input_data, labels)
  metric = model.get_metric()

  n_features = X.shape[1]
  a, b, c = (rng.randn(n_features) for _ in range(3))
Review comment: perhaps it would be more convincing to test that these are true on a set of random triplets (a, b, c), instead of a single one
Reply: Yes, I agree, will do
  assert metric(a, b) >= 0  # positivity
  assert metric(a, b) == metric(b, a)  # symmetry
  # one side of identity indiscernables: x == y => d(x, y) == 0. The other
  # side is not always true for Mahalanobis distances.
Review comment: I'm not exactly sure what this comment means. Can you elaborate?
Reply: Mahalanobis distances are only a "pseudo" metric because they do not satisfy the "identity of indiscernables": […]
Reply: Yes, […]
  assert metric(a, a) == 0
  # triangular inequality
  assert metric(a, c) <= metric(a, b) + metric(b, c)
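A sketch of the multi-triplet version suggested in the review above (not the final code from the PR; the number of triplets is arbitrary):

    # Check the pseudo-metric properties on several random triplets rather
    # than on a single one.
    for _ in range(10):
      a, b, c = (rng.randn(n_features) for _ in range(3))
      assert metric(a, b) >= 0  # positivity
      assert metric(a, b) == metric(b, a)  # symmetry
      assert metric(a, a) == 0  # one side of identity of indiscernables
      assert metric(a, c) <= metric(a, b) + metric(b, c)  # triangular inequality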
Review comment: The difference with score_pairs
Reply: Agreed, will do