Add description of algorithms to the doc #178
Conversation
Hi, thanks for the PR! I will review as soon as possible (most likely early next week)
Thanks a lot for your contribution @hansen7 ! I just did a pass too with a few comments and nitpicks
@bellet and @perimosocordiae feel free to react on LMNN, on the convexity and slack variables remarks
doc/supervised.rst
Outdated
sharing the same label, and :math:`x_l` are all the other instances within
that region with different labels, :math:`\eta_{ij}, y_{ij} \in \{0, 1\}`
are both the indicators, :math:`\eta_{ij}` represents :math:`x_{j}` is the
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
- k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
+ k nearest neighbors (with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
doc/supervised.rst
Outdated
@@ -46,12 +46,31 @@ LMNN

Large-margin nearest neighbor metric learning.

- `LMNN` learns a Mahanalobis distance metric in the kNN classification
+ `LMNN` learns a Mahalanobis distance metric in the kNN classification
setting using semidefinite programming. The learned metric attempts to keep
It's not in your modifications, but I just realized this would be more correct than what we have currently:
- setting using semidefinite programming. The learned metric attempts to keep
- k-nearest neighbors in the same class, while keeping examples from different
+ setting using semidefinite programming. The learned metric attempts to keep close
+ k-nearest neighbors from the same class, while keeping examples from different
doc/supervised.rst
Outdated
are both the indicators, :math:`\eta_{ij}` represents :math:`x_{j}` is the
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
- is Hinge loss. In the optimization process, the second term is replaced
+ is the Hinge loss. In the optimization process, the second term is replaced
doc/supervised.rst
Outdated
|
.. math::

\min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
- \min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
+ \min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
doc/supervised.rst
Outdated
setting using semidefinite programming. The learned metric attempts to keep
k-nearest neighbors in the same class, while keeping examples from different
classes separated by a large margin. This algorithm makes no assumptions about
the distribution of the data.

The distance is learned using the following convex optimization:
Since we are optimizing on L, I think our problem is not necessarily convex (it's the M version that is). I am not sure we should say semidefinite programming either (above, l. 49-50).
Yes. The problem is not convex, so replace by
"The distance is learned by solving the following optimization problem"
doc/supervised.rst
Outdated
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
by the slack variables :math:`\xi_{ijk}` for the sake of convexity.
I am not sure in our implementation we use slack variables, since we optimize on L, and in the paper they mention slack variables only when optimizing on M (and for "sake of convexity", also same as above, I think it's convex only if we optimize on M, which we don't).
Indeed we are not, we simply solve the problem by (sub)gradient descent, so this sentence should be removed
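To make the (sub)gradient point concrete, here is a minimal sketch of the LMNN loss over L and one subgradient, assuming precomputed target-neighbor pairs and impostor triplets (the helper names and structure are assumptions, not the actual metric-learn implementation):

```python
import numpy as np

def lmnn_loss_and_subgradient(L, X, target_neighbors, impostors, c=1.0):
    """Hypothetical sketch: LMNN loss over the linear map L and a subgradient.

    target_neighbors: iterable of (i, j) pairs where x_j is a target neighbor
        of x_i (same label).
    impostors: iterable of (i, j, k) triplets where x_k has a different label
        than x_i.
    """
    loss = 0.0
    grad = np.zeros_like(L)

    def sq_dist(a, b):
        d = L.dot(X[a] - X[b])
        return d.dot(d)

    # pull term: sum_ij eta_ij ||L(x_i - x_j)||^2
    for i, j in target_neighbors:
        diff = X[i] - X[j]
        loss += sq_dist(i, j)
        grad += 2 * np.outer(L.dot(diff), diff)

    # push term: hinge on margin violations; the subgradient is zero whenever
    # the hinge is inactive, hence "(sub)gradient descent"
    for i, j, k in impostors:
        violation = 1 + sq_dist(i, j) - sq_dist(i, k)
        if violation > 0:
            loss += c * violation
            dj, dk = X[i] - X[j], X[i] - X[k]
            grad += 2 * c * (np.outer(L.dot(dj), dj) - np.outer(L.dot(dk), dk))

    return loss, grad

# one descent step on L (learning rate purely illustrative):
# L = L - 0.01 * lmnn_loss_and_subgradient(L, X, target_neighbors, impostors)[1]
```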
metric_learn/nca.py
Outdated
@@ -1,6 +1,40 @@
"""
The latex part below does not render well on my computer, but with this it's solved:
""" | |
r""" |
metric_learn/nca.py
Outdated
@@ -1,6 +1,40 @@
"""
Neighborhood Components Analysis (NCA)
Ported to Python from https://github.com/vomjom/nca

Neighborhood Components Analysis (`NCA`) is a distance
Now that we're at it, I know that it's not completely done in metric-learn yet, but we should try to adopt scikit-learn conventions for the user guide, i.e.:
- We should add short descriptions of the algorithms (without mathematical formulation) in the docstring of the class we want to document
- We should put the full description in the user guide

So I think what you could do is:
- put a reference at the top of metric_learn.nca.rst like this: https://github.com/scikit-learn/scikit-learn/blame/4140657700cc55830347c871134c8e982d29fab5/doc/modules/neighbors.rst#L515-L517
- put the following description in the docstring of the class NCA (at the top of the file we should just keep a title, the authors, and the encoding) with this line as a reference to the user guide: https://github.com/scikit-learn/scikit-learn/blob/4140657700cc55830347c871134c8e982d29fab5/sklearn/neighbors/nca.py#L37
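As a rough illustration of the convention being described (the reference label `<nca>` is an assumption for this sketch, not a confirmed label in the repo):

```python
class NCA:
    """Neighborhood Components Analysis (NCA)

    A short, math-free summary of the algorithm goes here; the full
    mathematical description lives in the user guide section that the
    reference below points to.

    Read more in the :ref:`User Guide <nca>`.
    """
```

On the user guide side, a `.. _nca:` label placed just above the section title (the same pattern suggested for LMNN further down) is what makes the `:ref:` link resolve.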
I agree, the full doc with math descriptions in the docstrings is too much
metric_learn/rca.py
Outdated
@@ -7,6 +7,19 @@
Those relevant dimensions are estimated using "chunklets",
subsets of points that are known to belong to the same class.

For a training set with :math:`n` training points in :math:`k`
Same remark as above here
metric_learn/sdml.py
Outdated
An efficient sparse metric learning in high-dimensional space via
L1-penalized log-determinant regularization.
ICML 2009
double regularization: L1-penalized on the off-diagonal elements of Mahalanobis
Same remark here
Hi @hansen7, any reason you are closing this? I was planning to have a look today
@hansen7 I interpreted you closing the branch as a mistake, so I have reviewed your PR. Please confirm that you still want to work on this - otherwise we will take it from there (we need this for our next release).
There are some changes to make, but this is definitely a good start. When you make the modifications in the rst files make sure you also update the docstrings accordingly
doc/supervised.rst
Outdated
@@ -46,12 +46,31 @@ LMNN

Large-margin nearest neighbor metric learning.

- `LMNN` learns a Mahanalobis distance metric in the kNN classification
+ `LMNN` learns a Mahalanobis distance metric in the kNN classification
setting using semidefinite programming. The learned metric attempts to keep
I would remove the reference to semidefinite programming, as we are actually not solving the problem in the PSD matrix but in the unconstrained linear transformation matrix
doc/supervised.rst
Outdated
setting using semidefinite programming. The learned metric attempts to keep
k-nearest neighbors in the same class, while keeping examples from different
classes separated by a large margin. This algorithm makes no assumptions about
the distribution of the data.

The distance is learned using the following convex optimization:
Yes. The problem is not convex, so replace by
"The distance is learned by solving the following optimization problem"
doc/supervised.rst
Outdated
|
\min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
c\sum_{i, j, k}\eta_{ij}(1-y_{ij})[1+||\mathbf{L}(x_i-x_j)||^2-||
\mathbf{L}(x_i-x_l)||^2]_+)
x_l should be x_k
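With that fix (and optimizing over :math:`\mathbf{L}` as discussed above), a sketch of the corrected objective, keeping the notation already used in the diff and the impostor indicator as in the original LMNN paper (where :math:`y_{ik}=0` when :math:`x_i` and :math:`x_k` have different labels), would be:

.. math::

    \min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
    c\sum_{i, j, k}\eta_{ij}(1-y_{ik})[1+||\mathbf{L}(x_i-x_j)||^2-||\mathbf{L}(x_i-x_k)||^2]_+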
doc/supervised.rst
Outdated
c\sum_{i, j, k}\eta_{ij}(1-y_{ij})[1+||\mathbf{L}(x_i-x_j)||^2-||
\mathbf{L}(x_i-x_l)||^2]_+)

where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
We should avoid using the word "target" for x_i, as in LMNN the term "target neighbors" is used for x_j
doc/supervised.rst
Outdated
\mathbf{L}(x_i-x_l)||^2]_+)

where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
sharing the same label, and :math:`x_l` are all the other instances within
x_k
doc/weakly_supervised.rst
Outdated
@@ -317,6 +335,20 @@ implicit assumptions of MMC is that all classes form a compact set, i.e.,
follow a unimodal distribution, which restricts the possible use-cases of this
method. However, it is one of the earliest and a still often cited technique.

This is the first Mahalanobis distance learning method, the algorithm aims at
maximizing the sum of distances between all the instances from the dissimilar
this is not consistent with the first paragraph. While the two problems are equivalent, it is better to stick to the one which is actually solved in the code (which is to minimize the sum of distances between similar pairs while keeping the sum of distances between dissimilar pairs larger than 1).
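For reference, the formulation actually solved, as described here, could be sketched in the notation of the quoted doc as (the exact form in the code may differ slightly):

.. math::

    \min_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{S}} d^2_{\mathbf{M}}(x_i, x_j)
    \qquad \text{s.t.} \qquad \sum_{(x_i, x_j)\in\mathbf{D}} d_{\mathbf{M}}(x_i, x_j) \geq 1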
Yes, I agree that more doc for additional algorithms can be added in a next PR. It would be nice to be able to merge this soon. @hansen7 do you expect to have time to address the small comments?
Hi @wdevazelhes @bellet, thanks for the strong support! I have resolved the listed issues, and added the mathematical descriptions of all the metric learning algorithms but the cov. Please see attached; for the LaTeX part, it all works locally, and I have standardised the notations:
- uppercase bolded letters for matrices
- lowercase bolded letters for vectors
- uppercase unbolded letters for sets
- lowercase unbolded letters for scalars
doc/weakly_supervised.rst
Outdated
.. math::

\max_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{D}}
d_{\mathbf{M}}(x_i, x_j)\qquad \qquad \text{s.t.} \qquad
for some reason the \text{s.t.} does not show properly on my locally compiled version of the doc
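If the culprit is that `\text` needs an amsmath-style extension that the local build lacks (an assumption, not a confirmed diagnosis), one workaround is to fall back to `\mathrm`, which is core LaTeX:

.. math::

    d_{\mathbf{M}}(x_i, x_j) \qquad \mathrm{s.t.} \qquad
    \sum_{(x'_i, x'_j)\in\mathbf{S}} d^2_{\mathbf{M}}(x'_i, x'_j) \leq 1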
doc/weakly_supervised.rst
Outdated
|
\max_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{D}}
d_{\mathbf{M}}(x_i, x_j)\qquad \qquad \text{s.t.} \qquad
\sum_{(x'_i, x'_j)\in\mathbf{S}} d^2_{\mathbf{M}}(x'_i, x'_j) \leq 1
the primes are not necessary, it is clear without them as they come from the different S
metric_learn/lmnn.py
Outdated
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
by the slack variables :math:`\xi_{ijk}` for the sake of convexity.
We do not, see discussion above
metric_learn/nca.py
Outdated
@@ -1,6 +1,40 @@
"""
Neighborhood Components Analysis (NCA)
Ported to Python from https://github.com/vomjom/nca

Neighborhood Components Analysis (`NCA`) is a distance
I agree, the full doc with math descriptions in the docstrings is too much
Sorry @bellet, I sort of mixed up the procedures for pushing the code... I thought my additions had already been added; I will check all the previous comments and make adjustments this weekend.
Hi @hansen7 do you expect to have some time to work on this soon?
Yes... I will do it this weekend, sorry for the delay...
Please see the most recent push... I have noticed that the arrangement of the doc is quite different. Plus, is it more systematic to add all the mathematical interpretations of the algorithms for
@wdevazelhes @bellet, I have deleted the math part from the docstring, and used the same style for referencing as sklearn. But I am not sure about using
Thanks for the update! I have a few small change requests.
I think it would indeed be great to have a basic mathematical description for each algorithm. I think RCA is okay as it is - but there is currently nothing for itml, lfda, lsml and mlkr
doc/supervised.rst
Outdated
\mathbf{L}(x_i-x_l)||^2]_+)

- where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
+ where :math:`x_i` is an data point, :math:`x_j` are its k nearest neighbors
a data point
- double regularization: L1-penalized on the off-diagonal elements of Mahalanobis
- matrix :math:`\mathbf{M}` and the log-determinant divergence between
+ double regularization: an L1-penalization on the off-diagonal elements of the
+ Mahalanobis matrix :math:`\mathbf{M}`, and a log-determinant divergence between
+ :math:`\mathbf{M}` and :math:`\mathbf{M_0}` (set as either :math:`\mathbf{I}`
+ or :math:`\mathbf{\Omega}^{-1}`, where :math:`\mathbf{\Omega}` is the
+ covariance matrix).
empirical covariance matrix
doc/weakly_supervised.rst
Outdated
|
.. math::

\min_{\mathbf{M}} = \text{tr}((M_0 + \eta XLX^{T})\cdot M) - \log\det M
need to bold all matrices
doc/weakly_supervised.rst
Outdated
\min_{\mathbf{M}} = \text{tr}((M_0 + \eta XLX^{T})\cdot M) - \log\det M
+ \lambda ||M||_{1, off}

where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]`, :math:`\mathbf{L = D − K}` is
where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]` is the training data
doc/weakly_supervised.rst
Outdated
:math:`\mathbf{K}` is the incidence matrix to encode the (dis)similarity
information as :math:`\mathbf{K}_{ij} = 1` if :math:`(x_i,x_j)\in \mathbf{S}`,
:math:`\mathbf{K}_{ij} = -1` if :math:`(x_i,x_j)\in \mathbf{D}`,
:math:`||\cdot||_{1, off}` is the off-diagonal L1 norm of :math:`\mathbf{M}`.
there are a couple of issues here:
- please inverse the order: start with K, then D, then L so that everything needed has been defined when you introduce them
- D for dissimilar pairs conflicts with the matrix D. I suggest not to introduce new notations and simply say things like :math:`\mathbf{K}_{ij} = 1` if :math:`(x_i,x_j)` is a similar pair
doc/weakly_supervised.rst
Outdated
+ \lambda ||M||_{1, off}

where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]`, :math:`\mathbf{L = D − K}` is
the Laplacian matrix, :math:`\mathbf{D}` is a diagonal matrix whose diagonal
second occurrence of "diagonal" is not needed
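Putting these remarks together, a sketch of how the corrected SDML passage might read (bold matrices, training data defined, and K, D, L introduced in that order; this is only an illustration of the requested edits, not the final wording):

.. math::

    \min_{\mathbf{M}} \text{tr}((\mathbf{M}_0 + \eta \mathbf{X}\mathbf{L}\mathbf{X}^{T})\cdot \mathbf{M}) - \log\det \mathbf{M} + \lambda ||\mathbf{M}||_{1, off}

where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]` is the training data, :math:`\mathbf{K}_{ij} = 1` if :math:`(x_i, x_j)` is a similar pair and :math:`\mathbf{K}_{ij} = -1` if it is a dissimilar pair, :math:`\mathbf{D}` is a diagonal matrix whose entries are the sums of the rows of :math:`\mathbf{K}`, :math:`\mathbf{L} = \mathbf{D} - \mathbf{K}` is the Laplacian matrix, and :math:`||\cdot||_{1, off}` is the off-diagonal L1 norm of :math:`\mathbf{M}`.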
doc/weakly_supervised.rst
Outdated
@@ -327,18 +348,17 @@ Side-Information, Xing et al., NIPS 2002

`MMC` minimizes the sum of squared distances between similar examples, while
enforcing the sum of distances between dissimilar examples to be greater than a
- certain margin. This leads to a convex and, thus, local-minima-free
+ certain margin, default is 1. This leads to a convex and, thus, local-minima-free
we do not provide a way to change this margin (which is only a scaling factor anyway), so please simply say greater than 1.
Hi @hansen7, yes, thanks for the update! I think your PR is almost ready for merging:
You did it well, there's just a need to replace
Regarding
Also, there's the following problem that has appeared: I noticed that when we have a reference for the title, the link of the algorithm in the description is now broken (like when we have
And I guess as soon as you have resolved @bellet's comments, we should be able to merge the PR.
Also, there are conflicts that you need to resolve (but it's ok, just in the gitignore you should accept both changes; I think we want to gitignore idea and DS_Store files).
Hi @wdevazelhes and @bellet, thanks for the support! I have resolved all the comments listed above, please check the most recent branch! I have added mathematical descriptions for all the metric learning algorithms but the cov.
Thanks for the work, @hansen7 !
I have indicated a few simple changes to make to clarify the formulations, then I think we are good to merge
doc/supervised.rst
Outdated
\,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\

here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
matrix :math:`\mathbf{A}`:
the affinity matrix is not defined. You should make clear that it is given as input to the algorithm, and explain its semantics. This is key as it is the part which enforces the "locality"
done
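For readers, one common choice for the LFDA affinity matrix is the local-scaling affinity (this particular definition is an assumption here, not necessarily the exact one used in metric-learn):

.. math::

    \mathbf{A}_{i,j} = \exp\left(-\frac{||x_i - x_j||^2}{\sigma_i \sigma_j}\right)

where :math:`\sigma_i` is a local scale, e.g. the distance from :math:`x_i` to its k-th nearest neighbor; a larger :math:`\mathbf{A}_{i,j}` means :math:`x_i` and :math:`x_j` are treated as close, which is what enforces the locality mentioned above.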
doc/supervised.rst
Outdated
|
\hat{y}_i = \frac{\sum_{j\neq i}y_jk_{ij}}{\sum_{j\neq i}k_{ij}}

The tractable property has enabled the distance metric learning problem to
I would suggest to remove this paragraph and the following equation, to be consistent with other algorithms for which we have not described the algorithm but only the problem formulation
done
doc/weakly_supervised.rst
Outdated
|
.. math::

\min_\mathbf{A} \textbf{KL}(p(\mathbf{x}; \mathbf{A}_0) || p(\mathbf{x};
it would be better to write the optimization problem in terms of the logdet divergence, which gives a more "concrete" formulation than the KL version
done
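For reference, the LogDet form of ITML the reviewer has in mind is usually written along these lines (a sketch based on the original ITML paper, not necessarily the notation finally used in the doc):

.. math::

    \min_{\mathbf{M}\succeq 0} \; D_{\ell d}(\mathbf{M}, \mathbf{M}_0) \qquad \text{s.t.} \qquad
    d^2_{\mathbf{M}}(x_i, x_j) \leq u \;\; \text{for similar pairs}, \quad
    d^2_{\mathbf{M}}(x_i, x_j) \geq l \;\; \text{for dissimilar pairs}

where :math:`D_{\ell d}(\mathbf{M}, \mathbf{M}_0) = \text{tr}(\mathbf{M}\mathbf{M}_0^{-1}) - \log\det(\mathbf{M}\mathbf{M}_0^{-1}) - d` is the LogDet divergence.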
doc/weakly_supervised.rst
Outdated
|
.. math::

L(d(\mathbf{x}_a, \mathbf{x}_b) < d(\mathbf{x}_c, \mathbf{x}_d)) =
We could get rid of the notation L, which is redundant with H. You could define directly H(d_\mathbf{M}(\mathbf{x}_a, \mathbf{x}_b), which is what you use in the final problem formulation. Of course you can rename H by L then ;-)
done
I think you have forgotten this one
nevermind, got confused
doc/weakly_supervised.rst
Outdated
where :math:`\mathbf{X}=[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n]` is
the training data, incidence matrix :math:`\mathbf{K}_{ij} = 1` if
:math:`(\mathbf{x}_i, \mathbf{x}_j)` is a similar pair, otherwise -1. The
Laplacian matrix :math:`\mathbf{L}` is calculated from :math:`\mathbf{K}`
I think it is better to give the definition of K, even though it requires to define D. Without this, it is hard to understand.
To be concise you could write something like:
The Laplacian matrix :math:`\mathbf{L}=\mathbf{D}-\mathbf{K}` is calculated from :math:`\mathbf{K}` and :math:`\mathbf{D}`, a diagonal matrix whose entries are the sums of the row elements of :math:`\mathbf{K}`.
done
doc/supervised.rst
Outdated
`NCA` is a distance metric learning algorithm which aims to improve the
accuracy of nearest neighbors classification compared to the standard
Euclidean distance. The algorithm directly maximizes a stochastic variant
of the leave-one-out k-nearest neighbors(KNN) score on the training set.
- of the leave-one-out k-nearest neighbors(KNN) score on the training set.
+ of the leave-one-out k-nearest neighbors (KNN) score on the training set.
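For context, the stochastic leave-one-out objective that NCA maximizes can be sketched (following the original NCA paper) as:

.. math::

    \max_{\mathbf{L}} \sum_i \sum_{j : y_j = y_i} p_{ij}, \qquad
    p_{ij} = \frac{\exp(-||\mathbf{L}x_i - \mathbf{L}x_j||^2)}{\sum_{k \neq i}\exp(-||\mathbf{L}x_i - \mathbf{L}x_k||^2)}, \quad p_{ii} = 0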
@@ -41,17 +41,36 @@ the covariance matrix of the input data. This is a simple baseline method.

.. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936
You should leave the reference for LMNN here (right now the link from the docstring to the user guide doesn't work), this way:
.. _lmnn:
LMNN
--------
done
@@ -80,16 +99,43 @@ The two implementations differ slightly, and the C++ version is more complete.
-margin -nearest-neighbor-classification>`_ Kilian Q. Weinberger, John
Blitzer, Lawrence K. Saul
Same here
done
@@ -116,16 +162,54 @@ classification.
.. [2] Wikipedia entry on Neighborhood Components Analysis
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
Same here
done
@@ -155,13 +239,54 @@ LFDA is solved as a generalized eigenvalue problem.
MLKR
Same here
done
@@ -151,18 +151,50 @@ tuples you're working with (pairs, triplets...). See the docstring of the
Algorithms
==================
Same here
done
@@ -196,8 +228,60 @@ programming.
LSML
same here
done
@@ -228,8 +312,31 @@ Residual
SDML
Same here
done
@@ -263,14 +370,28 @@ L1-penalized log-determinant regularization
RCA
Same here
done
@@ -301,21 +421,33 @@ of points that are known to belong to the same class.
.. [3] 'Learning a Mahalanobis metric from equivalence constraints', JMLR
2005
Same here
done
Thanks for the guide! There are still some format improvements that can be pursued, such as alignments. By the way, what is the plan for the next version? Will there be new algorithms added?
Good job @hansen7 , LGTM
No, there will be no new algorithms, the next version will be mostly about API changes (more scikit-learn like API, allowing to GridSearch the Weakly Supervised Algorithms for instance), as well as fixes on algorithms like SDML, and of course, a better documentation, a lot coming from this PR :)
@wdevazelhes Cool, can I also join the development part? I have noticed that the next version of scikit-learn also adds implementations for metric learning as well:
Hi @hansen7, thanks for the update. Can you please resolve the conflicts and we are good to go. You're very welcome to contribute more to metric-learn :-) Besides what's already mentioned in the current issues, adding more algorithms and improving scalability by adding stochastic optimization solvers would be very useful.
It seems there are some encoding issues, is it okay for me to just add
Looking at the CI report, it seems that you have inserted a non-ASCII character (probably due to copy/paste) in line 4-5 of
Fixed. Merging
Thanks a lot, @hansen7! You are very welcome to contribute more :-)
Fixes #172
Add some descriptions of the algorithms, as discussed in issue #172.