Implementation of MMC #61
Conversation
I'm not sure whether PGDM is the correct name for this algorithm. The survey by Yang introduces this method by Xing et al. as "Supervised Global Distance Metric Learning" and then extends this approach in a subsection to "Probabilistic Global Distance Metric Learning". But the paper and the reference implementation referred to in #13 are not probabilistic at all. Maybe we should change the name to just "Global Distance Metric Learning", which could be abbreviated as GDML, or to MMC, as suggested by Weinberger et al. What do you think?
I agree that the wish list was wrongly conflating this approach with the probabilistic version from the survey, and I'm happy to rename this version to something better. Despite the referenced usage, I think MMC is a bit of a bad name, as the original paper uses clustering as an example application, not the entire basis for the work. Do any other citations propose nice names for this algorithm? In lieu of any better ideas, GDML can work, even though it's pretty uninformative.
I agree with you that both MMC and GDML are not the most informative names. Unfortunately, I haven't spotted any other term for this algorithm in the existing literature yet, and Xing et al. avoid giving their algorithm any name at all (besides just "metric learning"). Although this algorithm is certainly applicable to more than just clustering, one could also argue that learning a metric particularly suitable for clustering is an essential part of the method's objective: it minimizes the total intra-class distance while keeping the total inter-class distance larger than a given margin, so the classes would be tightly clustered in the space induced by such a metric. However, that might, of course, also be the case for many other metric learning algorithms. I leave this decision to you. As soon as you notify me about the final name, I will rename everything and update this PR.
Very nice addition! I especially appreciate the thorough comments and docstrings.
test/metric_learn_test.py
  b.append(j)
else:
  c.append(i)
  d.append(j)
This can be made a bit nicer with vectorization:
mask = self.iris_labels[None] == self.iris_labels[:,None]
a, b = np.nonzero(np.triu(mask, k=1))
c, d = np.nonzero(np.triu(~mask, k=1))
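To see what the masks select, here is a tiny illustration with made-up labels (not from the test suite): mask[i, j] is True when points i and j share a label, and np.triu(..., k=1) keeps only the upper triangle, so each unordered pair is counted once and self-pairs are excluded.
labels = np.array([0, 0, 1])
mask = labels[None] == labels[:, None]
a, b = np.nonzero(np.triu(mask, k=1))   # a = [0], b = [1]: the similar pair
c, d = np.nonzero(np.triu(~mask, k=1))  # c = [0, 1], d = [2, 2]: the dissimilar pairs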
test/metric_learn_test.py
  d.append(j)

# Full metric
pgdm = PGDM(convergence_threshold = 0.01)
Style nitpick: don't leave spaces around the = in keyword arguments (here and throughout).
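i.e., the call quoted above would become:
pgdm = PGDM(convergence_threshold=0.01)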
metric_learn/pgdm.py
from .constraints import Constraints


# hack around lack of axis kwarg in older numpy versions
This code is used in ITML as well, so let's pull this hack out into a new file, metric_learn/_util.py.
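A rough sketch of what that shared helper could look like (the function name vector_norm and the numpy 1.8 version cutoff are assumptions here, not necessarily what the PR ends up with):
# metric_learn/_util.py
import numpy as np

if tuple(int(v) for v in np.__version__.split('.')[:2]) < (1, 8):
  # hack around lack of axis kwarg in older numpy versions
  def vector_norm(X):
    # fall back to a row-wise apply of np.linalg.norm
    return np.apply_along_axis(np.linalg.norm, 1, X)
else:
  def vector_norm(X):
    return np.linalg.norm(X, axis=1)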
metric_learn/pgdm.py

# check to make sure that no two constrained vectors are identical
a,b,c,d = constraints
ident = _vector_norm(X[a] - X[b]) > 1e-9
In this usage, ident is actually a mask where True implies non-identical. Maybe non_ident instead?
I totally agree. I've copied this part from ITML. We should also change it there.
metric_learn/pgdm.py
ident = _vector_norm(X[a] - X[b]) > 1e-9
a, b = a[ident], b[ident]
ident = _vector_norm(X[c] - X[d]) > 1e-9
c, d = c[ident], d[ident]
We should also ensure that our constraint vectors aren't empty after this sanity check.
metric_learn/pgdm.py
Adapted from Matlab code at http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
"""

from __future__ import print_function, absolute_import
Importing division is a good idea as well.
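The future import at the top of the module would then read:
from __future__ import print_function, absolute_import, division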
metric_learn/pgdm.py
t = w.dot(A.ravel() / 100.0)

w1 = w / np.linalg.norm(w) # make `w` a unit vector
t1 = t / np.linalg.norm(w) # distance from origin to `w^T*x=t` plane
We should compute the norm of w once and reuse it.
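For example (the variable name norm_w is just for illustration):
norm_w = np.linalg.norm(w)
w1 = w / norm_w  # make `w` a unit vector
t1 = t / norm_w  # distance from origin to `w^T*x=t` plane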
Also, style nitpick: use two spaces between the end of the code and the comment character:
foo()  # comment
metric_learn/pgdm.py
obj_previous = self._fD(X, c, d, A_old) # g(A_old)
obj = self._fD(X, c, d, A) # g(A)

if ((obj > obj_previous) or (cycle == 0)) and (satisfy):
Don't need so many parens here:
if satisfy and (obj > obj_previous or cycle == 0):
metric_learn/pgdm.py
if ((obj > obj_previous) or (cycle == 0)) and (satisfy):

  # If projection of 1 and 2 is successful, and such projection
  # imprives objective function, slightly increase learning rate
imprives -> improves
metric_learn/pgdm.py
dim = X.shape[1]
diff = X[c] - X[d]
M = np.einsum('ij,ik->ijk', diff, diff) # outer products of all rows in `diff`
dist = np.sqrt(np.sum(M * A[None,:,:], axis = (1,2)))
This line might also benefit from conversion to an np.einsum call.
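For instance, something along these lines should be equivalent (the implicit-output spelling np.einsum('ijk,jk', M, A) adopted later in the PR does the same thing):
dist = np.sqrt(np.einsum('ijk,jk->i', M, A))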
Okay, I can live with MMC.
Thanks for your valuable comments! I hope to find time to implement the requested changes by next week.
@perimosocordiae I've addressed your change requests and renamed the algorithm to MMC.
Looking good, just a few more small comments.
metric_learn/mmc.py
if len(a) == 0:
  raise RuntimeError('No similarity constraints given for MMC.')
if len(c) == 0:
  raise RuntimeError('No dissimilarity constraints given for MMC.')
I think a ValueError makes more sense here. We should also specify that no non-trivial constraints were provided, otherwise the user might be confused.
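Something like the following, for example (the exact message wording is just a suggestion):
if len(a) == 0:
  raise ValueError('No non-trivial similarity constraints given for MMC.')
if len(c) == 0:
  raise ValueError('No non-trivial dissimilarity constraints given for MMC.')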
metric_learn/mmc.py
sum_deri = np.sum(M / (2 * (dist[:,None,None] + 1e-6)), axis = 0)
M = np.einsum('ij,ik->ijk', diff, diff) # outer products of all rows in `diff`
dist = np.sqrt(np.einsum('ijk,jk', M, A)) # equivalent to: np.sqrt(np.sum(M * A[None,:,:], axis=(1,2)))
sum_deri = np.einsum('ijk,i->jk', M, 0.5 / (dist + 1e-6)) # equivalent to: np.sum(M / (2 * (dist[:,None,None] + 1e-6)), axis=0)
Lines shouldn't go longer than 80 chars, so I'd put the comments on the previous line.
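For instance, the two long lines above could be wrapped as:
# outer products of all rows in `diff`
M = np.einsum('ij,ik->ijk', diff, diff)
# equivalent to: np.sqrt(np.sum(M * A[None,:,:], axis=(1,2)))
dist = np.sqrt(np.einsum('ijk,jk', M, A))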
metric_learn/mmc.py
sum_deri2 = np.einsum(
  'ijk,i',
  np.einsum('ij,ik->ijk', diff_sq, diff_sq),
  -0.25 / np.maximum(1e-6, dist**3)
Nested calls to np.einsum are pretty hard to read. I think the previous version was fine, but you could also pull out the inner einsum to a local variable if this version is significantly faster.
It's also possible to do einsum with more than two arrays, but that can have unintended consequences (see numpy/numpy#5366). As of numpy 1.12 the situation is a bit better (see numpy/numpy#5488), but I don't think we need to go down that route.
I don't think so either. However, this version is about 4 times faster, since it does not require broadcasting dist from 1-D to 3-D. So I will just pull out the inner einsum to a local variable.
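That refactoring might look roughly like this (the name outer_prods is just for illustration):
outer_prods = np.einsum('ij,ik->ijk', diff_sq, diff_sq)
sum_deri2 = np.einsum('ijk,i', outer_prods, -0.25 / np.maximum(1e-6, dist**3))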
I have found an even faster way (about 10 times faster) that computes this in a single einsum call, without an intermediate result:
sum_deri2 = np.einsum(
  'ij,ik->jk',
  diff_sq,
  diff_sq / (-4 * np.maximum(1e-6, dist**3))[:,None]
)
@perimosocordiae Thanks, fixed :)
Merged. Thanks for the PR, @Callidior!
Thanks for your helpful comments! But please be aware that the metric learned by this algorithm defines a distance (x-y)^T * M * (x-y), not (x-y)^T * inv(M) * (x-y) (see the note in the PR description below).
This PR implements Probabilistic Global Distance Metric Learning (PGDM), mentioned in #13 and sometimes also referred to as MMC, originally proposed by Xing et al.
The implementation is mainly a translation of the Matlab reference code into Python, optimized a bit to take advantage of vectorization and numpy. This makes it 6 times faster than the reference code on the iris dataset.
Both variants of PGDM proposed in the original paper are supported: learning a full or a diagonal Mahalanobis matrix.
The implementation has been validated against the results obtained from the reference code on the iris dataset. Corresponding unit tests are provided.
The class separation score of less than 0.15 on the iris dataset is one of the best scores currently reported in the unit tests for this dataset.
Note that PGDM learns only a positive semi-definite metric M, which precludes obtaining a transformer L with L.T*L=M using the Cholesky decomposition. Instead, PGDM overrides transformer() and uses an eigenvector decomposition M = V * W * V.T to obtain a transformer L = V.T * W^(-1/2).

Important note: PGDM learns a distance function (x-y)^T * M * (x-y), as opposed to (x-y)^T * inv(M) * (x-y) as currently stated in the documentation of metric_learn. Thus, this PR depends on solving issue #57, e.g., by merging PR #60. If you decide to stick to the current definition of a "metric", PGDM.metric() would need to be changed in order to return the inverse of the learned matrix.
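As an aside, here is a minimal sketch of obtaining a transformer L with L.T*L = M from the eigendecomposition of a positive semi-definite matrix. This only illustrates the general technique and is not the PR's actual code; in particular, whether M or its inverse should be decomposed depends on how issue #57 is resolved.
import numpy as np

def transformer_from_psd_metric(M):
  # hypothetical helper for illustration: M is assumed symmetric PSD
  w, V = np.linalg.eigh(M)  # M = V * diag(w) * V.T
  w = np.maximum(w, 0)  # clip tiny negative eigenvalues caused by round-off
  return V.T * np.sqrt(w)[:, None]  # scale rows of V.T so that L.T.dot(L) == M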