
Add description of algorithms to the doc #178


Merged
merged 10 commits into scikit-learn-contrib:master from the hc branch on May 3, 2019

Conversation

hansen7
Contributor

@hansen7 hansen7 commented Mar 6, 2019

Fixes #172
Add some descriptions of the algorithms, as discussed in issue #172.

@bellet
Member

bellet commented Mar 7, 2019

Hi, thanks for the PR! I will review as soon as possible (most likely early next week)

Member

@wdevazelhes wdevazelhes left a comment

Thanks a lot for your contribution @hansen7! I just did a pass too, with a few comments and nitpicks.

@bellet and @perimosocordiae feel free to react on LMNN, on the convexity and slack variables remarks

sharing the same label, and :math:`x_l` are all the other instances within
that region with different labels, :math:`\eta_{ij}, y_{ij} \in \{0, 1\}`
are both the indicators, :math:`\eta_{ij}` represents :math:`x_{j}` is the
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
Member

Suggested change
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
k nearest neighbors (with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`

@@ -46,12 +46,31 @@ LMNN

Large-margin nearest neighbor metric learning.

`LMNN` learns a Mahanalobis distance metric in the kNN classification
`LMNN` learns a Mahalanobis distance metric in the kNN classification
setting using semidefinite programming. The learned metric attempts to keep
Member

It's not in your modifications, but I just realized this would be more accurate than what we currently have:

-  setting using semidefinite programming. The learned metric attempts to keep
-  k-nearest neighbors in the same class, while keeping examples from different
+  setting using semidefinite programming. The learned metric attempts to keep close
+  k-nearest neighbors from the same class, while keeping examples from different

are both the indicators, :math:`\eta_{ij}` represents :math:`x_{j}` is the
k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
Member

Suggested change
is Hinge loss. In the optimization process, the second term is replaced
is the Hinge loss. In the optimization process, the second term is replaced


.. math::

\min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
Member

Suggested change
\min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
\min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +

setting using semidefinite programming. The learned metric attempts to keep
k-nearest neighbors in the same class, while keeping examples from different
classes separated by a large margin. This algorithm makes no assumptions about
the distribution of the data.

The distance is learned using the following convex optimization:
Member

Since we are optimizing on L, I think our problem is not necessarily convex (it's the M version that is). I am not sure we should say semidefinite programming either (above, l. 49-50).

Member

Yes. The problem is not convex, so replace by
"The distance is learned by solving the following optimization problem"

k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
by the slack variables :math:`\xi_{ijk}` for the sake of convexity.
Member

I am not sure in our implementation we use slack variables, since we optimize on L and in the paper they mention slack variables only when optimizing on M (and for "sake of convexity", also same as above, I think it's convex only if we optimize on M which we don't)

Member

Indeed we are not, we simply solve the problem by (sub)gradient descent, so this sentence should be removed
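To make the objective and the (sub)gradient-descent remark concrete, here is a minimal, illustrative NumPy sketch of evaluating the LMNN loss for a linear map L (hypothetical helper shapes and names; this naive loop is not metric-learn's actual solver):

import numpy as np

def lmnn_loss(L, X, y, target_neighbors, c=1.0):
    # Illustrative only: evaluate the LMNN objective for a linear map L.
    # `target_neighbors[i]` is assumed to hold the indices of the k
    # same-class ("target") neighbors of X[i].
    LX = X @ L.T
    pull, push = 0.0, 0.0
    for i, neighbors in enumerate(target_neighbors):
        impostors = np.where(y != y[i])[0]  # differently-labeled points
        for j in neighbors:
            d_ij = np.sum((LX[i] - LX[j]) ** 2)
            pull += d_ij  # first term: pull target neighbors close
            for k in impostors:  # second term: hinge on impostors
                d_ik = np.sum((LX[i] - LX[k]) ** 2)
                push += max(0.0, 1.0 + d_ij - d_ik)
    return pull + c * push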

@@ -1,6 +1,40 @@
"""
Member

The LaTeX part below does not render well on my computer, but with this it's solved:

Suggested change
"""
r"""

@@ -1,6 +1,40 @@
"""
Neighborhood Components Analysis (NCA)
Ported to Python from https://github.com/vomjom/nca

Neighborhood Components Analysis (`NCA`) is a distance
Member

Now that we're at it, I know that it's not completely done in metric-learn yet, but we should try to adopt scikit-learn conventions for the user guide, i.e. :

  • We should add short descriptions of the algorithms (without mathematical formulation) in the docstring of the class we want to document
  • We should put the full description in the user guide

So I think what you could do is :

Member

I agree, the full doc with math descriptions in the docstrings is too much

@@ -7,6 +7,19 @@
Those relevant dimensions are estimated using "chunklets",
subsets of points that are known to belong to the same class.

For a training set with :math:`n` training points in :math:`k`
Member

Same remark as above here

An efficient sparse metric learning in high-dimensional space via
L1-penalized log-determinant regularization.
ICML 2009
double regularization: L1-penalized on the off-diagonal elements of Mahalanobis
Member

Same remark here

@hansen7 hansen7 closed this Mar 14, 2019
@hansen7 hansen7 deleted the hc branch March 14, 2019 14:12
@hansen7 hansen7 restored the hc branch March 14, 2019 14:14
@bellet
Member

bellet commented Mar 14, 2019

Hi @hansen7, any reason you are closing this? I was planning to have a look today

Member

@bellet bellet left a comment

@hansen7 I interpreted you closing the branch as a mistake, so I have reviewed your PR. Please confirm that you still want to work on this - otherwise we will take it from there (we need this for our next release).

There are some changes to make, but this is definitely a good start. When you make the modifications in the rst files make sure you also update the docstrings accordingly

@@ -46,12 +46,31 @@ LMNN

Large-margin nearest neighbor metric learning.

`LMNN` learns a Mahanalobis distance metric in the kNN classification
`LMNN` learns a Mahalanobis distance metric in the kNN classification
setting using semidefinite programming. The learned metric attempts to keep
Member

I would remove the reference to semidefinite programming, as we are actually not solving the problem in the PSD matrix but in the unconstrained linear transformation matrix



\min_\mathbf{M}\sum_{i, j}\eta_{ij}||\mathbf{L}(x_i-x_j)||^2 +
c\sum_{i, j, k}\eta_{ij}(1-y_{ij})[1+||\mathbf{L}(x_i-x_j)||^2-||
\mathbf{L}(x_i-x_l)||^2]_+)
Member

x_l should be x_k

c\sum_{i, j, k}\eta_{ij}(1-y_{ij})[1+||\mathbf{L}(x_i-x_j)||^2-||
\mathbf{L}(x_i-x_l)||^2]_+)

where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
Member

We should avoid using the word "target" for x_i, as in LMNN the term "target neighbors" is used for x_j

\mathbf{L}(x_i-x_l)||^2]_+)

where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
sharing the same label, and :math:`x_l` are all the other instances within
Member

x_k

@@ -317,6 +335,20 @@ implicit assumptions of MMC is that all classes form a compact set, i.e.,
follow a unimodal distribution, which restricts the possible use-cases of this
method. However, it is one of the earliest and a still often cited technique.

This is the first Mahalanobis distance learning method, the algorithm aims at
maximizing the sum of distances between all the instances from the dissimilar
Member

This is not consistent with the first paragraph. While the two problems are equivalent, it is better to stick to the one which is actually solved in the code (which is to minimize the sum of distances between similar pairs while keeping the sum of distances between dissimilar pairs larger than 1).

Contributor Author

Yes, I agree that more doc for additional algorithms can be added in a next PR. It would be nice to be able to merge this soon. @hansen7 do you expect to have time to address the small comments?

Hi @wdevazelhes, @bellet, thanks for the strong support! I have resolved the listed issues and added mathematical descriptions for all the metric learning algorithms except the covariance baseline. Please see the attached changes; the LaTeX parts all render locally, and I have standardised the notation:

  1. uppercase bolded letters for matrices
  2. lowercase bolded letters for vectors
  3. uppercase unbolded letters for sets
  4. lowercase unbolded letters for scalars

.. math::

\max_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{D}}
d_{\mathbf{M}}(x_i, x_j)\qquad \qquad \text{s.t.} \qquad
Member

for some reason the \text{s.t.} does not show properly on my locally compiled version of the doc


\max_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{D}}
d_{\mathbf{M}}(x_i, x_j)\qquad \qquad \text{s.t.} \qquad
\sum_{(x'_i, x'_j)\in\mathbf{S}} d^2_{\mathbf{M}}(x'_i, x'_j) \leq 1
Member

the primes are not necessary; it is clear without them, as those pairs come from the different set S
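With the primes dropped (and \mathrm{s.t.} as a possible fallback if \text keeps misrendering), the displayed problem would simply read:

.. math::

    \max_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(x_i, x_j)\in\mathbf{D}}
    d_{\mathbf{M}}(x_i, x_j)\qquad \mathrm{s.t.} \qquad
    \sum_{(x_i, x_j)\in\mathbf{S}} d^2_{\mathbf{M}}(x_i, x_j) \leq 1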

k nearest neighbors(with same labels) of :math:`x_{i}`, :math:`y_{ij}=0`
indicates :math:`x_{i}, x_{j}` belong to different class, :math:`[\cdot]_+`
is Hinge loss. In the optimization process, the second term is replaced
by the slack variables :math:`\xi_{ijk}` for the sake of convexity.
Member

We do not, see discussion above


@bellet bellet reopened this Mar 14, 2019
@hansen7
Contributor Author

hansen7 commented Mar 15, 2019

@hansen7 I interpreted you closing the branch as a mistake, so I have reviewed your PR. Please confirm that you still want to work on this - otherwise we will take it from there (we need this for our next release).

There are some changes to make, but this is definitely a good start. When you make the modifications in the rst files make sure you also update the docstrings accordingly

Sorry @bellet, I sort of mixed up the procedure for pushing the code. I thought my additions had already been added; I will go through all the previous comments and make adjustments this weekend.

@bellet
Member

bellet commented Mar 22, 2019

Hi @hansen7 do you expect to have some time to work on this soon?

@hansen7
Contributor Author

hansen7 commented Mar 22, 2019

Hi @hansen7 do you expect to have some time to work on this soon?

yes... I will do it this weekend, sorry for the delay...

@hansen7
Contributor Author

hansen7 commented Mar 28, 2019

Please see the most recent push. I have noticed that the arrangement of the doc is quite different. Also, would it be more systematic to add the mathematical descriptions of the algorithms for itml, lfda, lsml, mlkr and rca as well?

@hansen7
Contributor Author

hansen7 commented Mar 28, 2019

@wdevazelhes @bellet, I have deleted the math parts from the docstrings and used the same referencing style as sklearn. But I am not sure I am using Read more in the :ref:`User Guide <nca>`. properly; I am not sure whether User Guide here should be replaced by some other keyword or not...

Member

@bellet bellet left a comment

Thanks for the update! I have a few small change requests.
I think it would indeed be great to have a basic mathematical description for each algorithm. I think RCA is okay as it is - but there is currently nothing for itml, lfda, lsml and mlkr

\mathbf{L}(x_i-x_l)||^2]_+)

where :math:`x_i` is the 'target', :math:`x_j` are its k nearest neighbors
where :math:`x_i` is an data point, :math:`x_j` are its k nearest neighbors
Member

a data point

double regularization: L1-penalized on the off-diagonal elements of Mahalanobis
matrix :math:`\mathbf{M}` and the log-determinant divergence between
double regularization: an L1-penalization on the off-diagonal elements of the
Mahalanobis matrix :math:`\mathbf{M}`, and a log-determinant divergence between
:math:`\mathbf{M}` and :math:`\mathbf{M_0}` (set as either :math:`\mathbf{I}`
or :math:`\mathbf{\Omega}^{-1}`, where :math:`\mathbf{\Omega}` is the
covariance matrix).
Member

empirical covariance matrix


.. math::

\min_{\mathbf{M}} = \text{tr}((M_0 + \eta XLX^{T})\cdot M) - \log\det M
Member

need to bold all matrices

\min_{\mathbf{M}} = \text{tr}((M_0 + \eta XLX^{T})\cdot M) - \log\det M
+ \lambda ||M||_{1, off}

where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]`, :math:`\mathbf{L = D − K}` is
Member

where :math:\mathbf{X}=[x_1, x_2, ..., x_n] is the training data

:math:`\mathbf{K}` is the incidence matrix to encode the (dis)similarity
information as :math:`\mathbf{K}_{ij} = 1` if :math:`(x_i,x_j)\in \mathbf{S}`,
:math:`\mathbf{K}_{ij} = -1` if :math:`(x_i,x_j)\in \mathbf{D}`,
:math:`||\cdot||_{1, off}` is the off-diagonal L1 norm of :math:`\mathbf{M}`.
Member

there are a couple of issues here:

  • please inverse the order: start with K, then D, then L so that everything needed has been defined when you introduce them
  • D for dissimilar pairs conflicts with the matrix D. I suggest not to introduce new notations and simply say things like :math:\mathbf{K}_{ij} = 1 if :math:(x_i,x_j) is a similar pair

+ \lambda ||M||_{1, off}

where :math:`\mathbf{X}=[x_1, x_2, ..., x_n]`, :math:`\mathbf{L = D − K}` is
the Laplacian matrix, :math:`\mathbf{D}` is a diagonal matrix whose diagonal
Member

second occurrence of "diagonal" is not needed
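Putting the remarks on these SDML hunks together (all matrices bolded; dropping what looks like a stray equals sign after the min), the objective would look roughly like:

.. math::

    \min_{\mathbf{M}} \text{tr}\left((\mathbf{M}_0 + \eta \mathbf{X}\mathbf{L}\mathbf{X}^{T})\cdot \mathbf{M}\right)
    - \log\det \mathbf{M} + \lambda ||\mathbf{M}||_{1, off}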

@@ -327,18 +348,17 @@ Side-Information, Xing et al., NIPS 2002

`MMC` minimizes the sum of squared distances between similar examples, while
enforcing the sum of distances between dissimilar examples to be greater than a
certain margin. This leads to a convex and, thus, local-minima-free
certain margin, default is 1. This leads to a convex and, thus, local-minima-free
Member

we do not provide a way to change this margin (which is only a scaling factor anyway), so please simply say greater than 1.

@wdevazelhes
Member

Hi @hansen7, yes, thanks for the update!

I think your PR is almost ready for merging:
For RCA you already have the mathematical formulation in both the User Guide and the docstring; what you just need to do is remove it from the docstring and replace it with Read more in the :ref:`User Guide <rca>` (and add the .. _rca: reference in the User Guide), as you did for the others.

But I am not sure I am using Read more in the :ref:`User Guide <nca>`. properly; I am not sure whether User Guide here should be replaced by some other keyword or not...

You did it well; you just need to replace :ref:`User Guide <nca>` by :ref:`User Guide <lmnn>` for LMNN, for instance.
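For instance, a stripped-down, hypothetical sketch of what the LMNN docstring would then look like (just to show where the reference line goes):

class LMNN:
    """Large Margin Nearest Neighbor (LMNN).

    LMNN learns a Mahalanobis distance metric in the kNN classification
    setting, keeping the k-nearest neighbors of the same class close while
    pushing differently-labeled examples away by a large margin.

    Read more in the :ref:`User Guide <lmnn>`.
    """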

Regarding ITML, LSML, LFDA and MLKR (and Covariance), which right now have no mathematical formulation: yes, eventually we will add one for the release, but if you don't have time to work on it, that's OK, we'll deal with them in another PR.

Also, there's one problem that has appeared: I noticed that when we have a reference on the title (like .. _nca:), the link on the algorithm name in the description is now broken (the link `NCA` now points to the head of the paragraph (.. _nca:) instead of the docstring). The previous commit has an example of how to fix it: replace `ALGOXXX` in the User Guide by :py:class:`ALGOXXX <metric_learn.algoxxx.ALGOXXX>`. You could do that for every algorithm that has a reference like .. _nca: in the User Guide.

And I guess as soon as you have resolved @bellet's comments, we should be able to merge the PR

Also, there are conflicts that you need to resolve (but it's OK: in the gitignore you should just accept both changes, since I think we want to gitignore idea and DS_Store files).

@bellet
Member

bellet commented Apr 15, 2019

Yes, I agree that more doc for additional algorithms can be added in a next PR. It would be nice to be able to merge this soon. @hansen7 do you expect to have time to address the small comments?

@hansen7
Contributor Author

hansen7 commented Apr 16, 2019

Yes, I agree that more doc for additional algorithms can be added in a next PR. It would be nice to be able to merge this soon. @hansen7 do you expect to have time to address the small comments?

Hi @wdevazelhes and @bellet, thanks for the support! I have resolved all the comments listed above, please check the most recent branch! I have added mathematical descriptions for all the metric learning algorithms except the covariance baseline, fixed the reference issues, deleted the redundant math parts from the docstrings, and formalised the notation in the LaTeX math parts:

  1. bolded uppercase letters for matrices
  2. bolded lowercase letters for vectors
  3. unbolded uppercase letters for sets
  4. unbolded lowercase letters for scalars

Member

@bellet bellet left a comment

Thanks for the work, @hansen7!
I have indicated a few simple changes to make to clarify the formulations; then I think we are good to merge.

\,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\

here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
matrix :math:`\mathbf{A}`:
Member

The affinity matrix is not defined. You should make clear that it is given as input to the algorithm, and explain its semantics. This is key, as it is the part which enforces the "locality".

Contributor Author

done
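(For readers of this thread: a common choice for this affinity, e.g. the local-scaling heuristic used in the LFDA paper, is something like

.. math::

    \mathbf{A}_{i,j} = \exp\left(-\frac{||x_i - x_j||^2}{\sigma_i\,\sigma_j}\right)

where :math:`\sigma_i` is a local scale, e.g. the distance from :math:`x_i` to its k-th nearest neighbor; this is what makes the objective emphasize nearby pairs.)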


\hat{y}_i = \frac{\sum_{j\neq i}y_jk_{ij}}{\sum_{j\neq i}k_{ij}}

The tractable property has enabled the distance metric learning problem to
Member

I would suggest to remove this paragraph and the following equation, to be consistent with other algorithms for which we have not described the algorithm but only the problem formulation

Contributor Author

done
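(For context, in MLKR the weights :math:`k_{ij}` above are typically Gaussian kernel values computed in the learned space, and the quantity minimized is the leave-one-out squared error:

.. math::

    k_{ij} = \exp\left(-||\mathbf{L}(x_i - x_j)||^2\right), \qquad
    \mathcal{L} = \sum_i (\hat{y}_i - y_i)^2

)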


.. math::

\min_\mathbf{A} \textbf{KL}(p(\mathbf{x}; \mathbf{A}_0) || p(\mathbf{x};
Member

it would be better to write the optimization problem in terms of the logdet divergence, which gives a more "concrete" formulation than the KL version

Contributor Author

done
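(For reference, the LogDet divergence mentioned here is usually written as

.. math::

    D_{\ell d}(\mathbf{M}, \mathbf{M}_0) = \text{tr}(\mathbf{M}\mathbf{M}_0^{-1})
    - \log\det(\mathbf{M}\mathbf{M}_0^{-1}) - d

where d is the dimension; up to constants and scaling it equals the KL divergence between the corresponding Gaussians.)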


.. math::

L(d(\mathbf{x}_a, \mathbf{x}_b) < d(\mathbf{x}_c, \mathbf{x}_d)) =
Member

We could get rid of the notation L, which is redundant with H. You could define directly H(d_\mathbf{M}(\mathbf{x}_a, \mathbf{x}_b)), which is what you use in the final problem formulation. Of course you can rename H by L then ;-)

Contributor Author

done

Member

I think you have forgotten this one

Member

nevermind, got confused

where :math:`\mathbf{X}=[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n]` is
the training data, incidence matrix :math:`\mathbf{K}_{ij} = 1` if
:math:`(\mathbf{x}_i, \mathbf{x}_j)` is a similar pair, otherwise -1. The
Laplacian matrix :math:`\mathbf{L}` is calculated from :math:`\mathbf{K}`
Member

I think it is better to give the definition of K, even though it requires to define D. Without this, it is hard to understand

To be concise you could write something like:
The Laplacian matrix :math:\mathbf{L}=\mathbf{D}-\mathbf{K} is calculated from :math:\mathbf{K} and :math:\mathbf{D}, a diagonal matrix whose entries are the sums of the row elements of :math:\mathbf{K}.

Contributor Author

done

Member

@wdevazelhes wdevazelhes left a comment

Good job @hansen7 ! This will be a great addition to metric-learn for the next release

You just need to put back the references of type .. _nca: in the documentation, otherwise the "See User Guide" links won't work, and address @bellet's comments

Otherwise, LGTM

`NCA` is a distance metric learning algorithm which aims to improve the
accuracy of nearest neighbors classification compared to the standard
Euclidean distance. The algorithm directly maximizes a stochastic variant
of the leave-one-out k-nearest neighbors(KNN) score on the training set.
Member

Suggested change
of the leave-one-out k-nearest neighbors(KNN) score on the training set.
of the leave-one-out k-nearest neighbors (KNN) score on the training set.
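(For context, the stochastic leave-one-out score mentioned in this hunk is the expected number of correctly classified training points under a softmax over distances; a standard way to write it is

.. math::

    p_{ij} = \frac{\exp(-||\mathbf{L}x_i - \mathbf{L}x_j||^2)}
    {\sum_{l\neq i}\exp(-||\mathbf{L}x_i - \mathbf{L}x_l||^2)}, \qquad p_{ii} = 0,
    \qquad \max_\mathbf{L} \sum_i \sum_{j:y_j=y_i} p_{ij}

)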

@@ -41,17 +41,36 @@ the covariance matrix of the input data. This is a simple baseline method.

.. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936


Member

@wdevazelhes wdevazelhes Apr 18, 2019

You should keep the reference for LMNN here (right now the link from the docstring to the user guide doesn't work), this way:

.. _lmnn:

LMNN
--------

Contributor Author

done

@@ -80,16 +99,43 @@ The two implementations differ slightly, and the C++ version is more complete.
-margin -nearest-neighbor-classification>`_ Kilian Q. Weinberger, John
Blitzer, Lawrence K. Saul


Member

Same here

Contributor Author

done

@@ -116,16 +162,54 @@ classification.
.. [2] Wikipedia entry on Neighborhood Components Analysis
https://en.wikipedia.org/wiki/Neighbourhood_components_analysis


Member

Same here

Contributor Author

done

@@ -155,13 +239,54 @@ LFDA is solved as a generalized eigenvalue problem.
MLKR
Member

Same here

Contributor Author

done

@@ -151,18 +151,50 @@ tuples you're working with (pairs, triplets...). See the docstring of the
Algorithms
==================


Member

Same here

Contributor Author

done

@@ -196,8 +228,60 @@ programming.
LSML
Member

same here

Contributor Author

done

@@ -228,8 +312,31 @@ Residual
SDML
Member

Same here

Contributor Author

done

@@ -263,14 +370,28 @@ L1-penalized log-determinant regularization
RCA
Member

Same here

Contributor Author

done

@@ -301,21 +421,33 @@ of points that are known to belong to the same class.
.. [3]'Learning a Mahalanobis metric from equivalence constraints', JMLR
2005


Member

Same here

Contributor Author

done

@hansen7
Contributor Author

hansen7 commented Apr 27, 2019

Thanks for the guidance! There are still some formatting improvements that could be pursued, such as alignments. btw, what is the plan for the next version, will there be new algorithms added?

@wdevazelhes
Member

Good job @hansen7, LGTM.
You should just resolve the conflicts by merging master into your branch.

btw, what is the plan for the next version, will there be new algorithms added?

No, there will be no new algorithms, the next version will be mostly about API changes (more scikit-learn like API, allowing to GridSearch the Weakly Supervised Algorithms for instance), as well as fixes on algorithms like SDML, and of course, a better documentation, a lot coming from this PR :)

@hansen7
Contributor Author

hansen7 commented Apr 29, 2019

No, there will be no new algorithms, the next version will be mostly about API changes (more scikit-learn like API, allowing to GridSearch the Weakly Supervised Algorithms for instance), as well as fixes on algorithms like SDML, and of course, a better documentation, a lot coming from this PR :)

@wdevazelhes Cool, can I also join the development? I have noticed that the next version of scikit-learn also adds an implementation for metric learning as well:
sklearn.neighbors.NeighborhoodComponentsAnalysis

@bellet
Member

bellet commented Apr 30, 2019

Hi @hansen7, thanks for the update. Can you please resolve the conflicts and we are good to go.

You're very welcome to contribute more to metric-learn :-) Besides what's already mentioned in the current issues, adding more algorithms and improving scalability by adding stochastic optimization solvers would be very useful.

@hansen7
Contributor Author

hansen7 commented May 2, 2019

It seems there are some encoding issues; is it okay for me to just add # coding: utf-8 at the head of each script?

@bellet
Member

bellet commented May 3, 2019

Looking at the CI report, it seems that you have inserted a non-ASCII character (probably due to copy/paste) in lines 4-5 of itml.py. I think it is what appears to be a non-standard hyphen in "Kullback-Leibler". Please try to simply replace it with a normal hyphen.

@bellet
Member

bellet commented May 3, 2019

Fixed. Merging

@bellet bellet merged commit d4badc8 into scikit-learn-contrib:master May 3, 2019
@bellet
Member

bellet commented May 3, 2019

Thanks a lot, @hansen7 ! You are very welcome to contribute more :-)

@bellet bellet changed the title from Update_Doc_hc to Add description of algorithms to the doc May 3, 2019