Add description of algorithms to the doc #178
Changes from all commits
@@ -7,6 +7,3 @@ htmlcov/
.cache/
.pytest_cache/
doc/auto_examples/*
-coverage
-.coverage
-.coverage*
@@ -41,17 +41,37 @@ the covariance matrix of the input data. This is a simple baseline method.

.. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936

.. _lmnn:

LMNN
-----

-Large-margin nearest neighbor metric learning.
+Large Margin Nearest Neighbor Metric Learning
+(:py:class:`LMNN <metric_learn.lmnn.LMNN>`)

-`LMNN` learns a Mahanalobis distance metric in the kNN classification
-setting using semidefinite programming. The learned metric attempts to keep
-k-nearest neighbors in the same class, while keeping examples from different
-classes separated by a large margin. This algorithm makes no assumptions about
+`LMNN` learns a Mahalanobis distance metric in the kNN classification
+setting. The learned metric attempts to keep close k-nearest neighbors
+from the same class, while keeping examples from different classes
+separated by a large margin. This algorithm makes no assumptions about
the distribution of the data.

+The distance is learned by solving the following optimization problem:
+
+.. math::
+
+      \min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L(x_i-x_j)}||^2 +
+      c\sum_{i, j, l}\eta_{ij}(1-y_{il})[1+||\mathbf{L(x_i-x_j)}||^2-||
+      \mathbf{L(x_i-x_l)}||^2]_+
+
+where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one
+of its k nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
+are all the other instances within that region with different labels.
+:math:`\eta_{ij}, y_{il} \in \{0, 1\}` are both indicators:
+:math:`\eta_{ij}=1` if :math:`\mathbf{x}_{j}` is one of the k nearest
+neighbors (with the same label) of :math:`\mathbf{x}_{i}`, :math:`y_{il}=0`
+indicates that :math:`\mathbf{x}_{i}` and :math:`\mathbf{x}_{l}` belong to
+different classes, and :math:`[\cdot]_+=\max(0, \cdot)` is the hinge loss.

.. topic:: Example Code:

::
@@ -80,16 +100,44 @@ The two implementations differ slightly, and the C++ version is more complete.
-margin -nearest-neighbor-classification>`_ Kilian Q. Weinberger, John
Blitzer, Lawrence K. Saul

.. _nca:

Review comment: Same here
Reply: done

NCA
---

-Neighborhood Components Analysis (`NCA`) is a distance metric learning
-algorithm which aims to improve the accuracy of nearest neighbors
-classification compared to the standard Euclidean distance. The algorithm
-directly maximizes a stochastic variant of the leave-one-out k-nearest
-neighbors (KNN) score on the training set. It can also learn a low-dimensional
-linear embedding of data that can be used for data visualization and fast
-classification.
+Neighborhood Components Analysis (:py:class:`NCA <metric_learn.nca.NCA>`)
+
+`NCA` is a distance metric learning algorithm which aims to improve the
+accuracy of nearest neighbors classification compared to the standard
+Euclidean distance. The algorithm directly maximizes a stochastic variant
+of the leave-one-out k-nearest neighbors (KNN) score on the training set.
+It can also learn a low-dimensional linear transformation of data that can
+be used for data visualization and fast classification.

+`NCA` uses the decomposition :math:`\mathbf{M} = \mathbf{L}^T\mathbf{L}` and
+defines the probability :math:`p_{ij}` that :math:`\mathbf{x}_j` is the
+neighbor of :math:`\mathbf{x}_i` as the softmax likelihood of the
+Mahalanobis distance:
+
+.. math::
+
+      p_{ij} = \frac{\exp(-|| \mathbf{Lx}_i - \mathbf{Lx}_j ||_2^2)}
+      {\sum_{l\neq i}\exp(-||\mathbf{Lx}_i - \mathbf{Lx}_l||_2^2)},
+      \qquad p_{ii}=0
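For illustration, these probabilities can be computed with a few lines of
NumPy for a fixed :math:`\mathbf{L}` (a sketch only; the name
``nca_probabilities`` is made up here and this is not the code used by
`NCA`)::

    import numpy as np

    def nca_probabilities(L, X):
        """Neighbor-selection probabilities p_ij for a fixed linear map L."""
        Xt = X @ L.T
        d2 = ((Xt[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # ||Lx_i - Lx_j||^2
        np.fill_diagonal(d2, np.inf)      # enforces p_ii = 0 since exp(-inf) = 0
        e = np.exp(-d2)
        return e / e.sum(axis=1, keepdims=True)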

+Then the probability that :math:`\mathbf{x}_i` will be correctly classified
+by the stochastic nearest neighbors rule is:
+
+.. math::
+
+      p_{i} = \sum_{j:j\neq i, y_j=y_i}p_{ij}
Review comment: again here, it does not show properly
Reply: it works for me too here

+The optimization problem is to find the matrix :math:`\mathbf{L}` that
+maximizes the sum of the probabilities of being correctly classified:
+
+.. math::
+
+      \mathbf{L} = \arg\max_{\mathbf{L}} \sum_i p_i

Review comment: argmax does not show properly. maybe try
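Continuing the illustrative sketch above (it reuses ``nca_probabilities`` and
the NumPy import; the names and toy data are again assumptions, not
metric-learn code), :math:`p_i` and the objective :math:`\sum_i p_i` for a
fixed :math:`\mathbf{L}` are::

    def nca_objective(L, X, y):
        """sum_i p_i, the quantity NCA maximizes over L."""
        p = nca_probabilities(L, X)          # from the sketch above
        same = (y[:, None] == y[None, :])
        np.fill_diagonal(same, False)        # restrict to j != i
        return (p * same).sum()              # = sum_i sum_{j != i, y_j = y_i} p_ij

    rng = np.random.RandomState(0)
    X, y = rng.randn(30, 5), rng.randint(0, 3, 30)
    print(nca_objective(np.eye(5), X, y))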

.. topic:: Example Code:
@@ -116,16 +164,55 @@ classification.
.. [2] Wikipedia entry on Neighborhood Components Analysis
   https://en.wikipedia.org/wiki/Neighbourhood_components_analysis

.. _lfda:

Review comment: Same here
Reply: done

LFDA
----

-Local Fisher Discriminant Analysis (LFDA)
+Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.lfda.LFDA>`)

`LFDA` is a linear supervised dimensionality reduction method. It is
-particularly useful when dealing with multimodality, where one ore more classes
+particularly useful when dealing with multi-modality, where one or more classes
consist of separate clusters in input space. The core optimization problem of
LFDA is solved as a generalized eigenvalue problem.

+The algorithm defines the Fisher local within-/between-class scatter matrices
+:math:`\mathbf{S}^{(w)}/ \mathbf{S}^{(b)}` in a pairwise fashion:
+
+.. math::
+
+      \mathbf{S}^{(w)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(w)}(\mathbf{x}_i -
+      \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,\\
+      \mathbf{S}^{(b)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(b)}(\mathbf{x}_i -
+      \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,
+
+where
+
+.. math::
+
+      W_{ij}^{(w)} = \left\{\begin{aligned}0 \qquad y_i\neq y_j \\
+      \,\,\mathbf{A}_{i,j}/n_l \qquad y_i = y_j\end{aligned}\right.\\
+      W_{ij}^{(b)} = \left\{\begin{aligned}1/n \qquad y_i\neq y_j \\
+      \,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.
+
+where :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
+matrix :math:`\mathbf{A}`, which can be calculated with local scaling methods,
+and :math:`n_l` is the number of training points in the class of
+:math:`\mathbf{x}_i`.
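As an illustration, the two scatter matrices can be assembled as follows,
assuming the affinity matrix :math:`\mathbf{A}` has already been computed
(e.g. with local scaling); this is a sketch with made-up names, not the
`LFDA` implementation::

    import numpy as np

    def lfda_scatters(X, y, A):
        """Local within/between-class scatter matrices from an affinity matrix A."""
        n = X.shape[0]
        Ww = np.zeros((n, n))
        Wb = np.full((n, n), 1.0 / n)
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            n_l = len(idx)                        # class size
            block = np.ix_(idx, idx)
            Ww[block] = A[block] / n_l
            Wb[block] = A[block] * (1.0 / n - 1.0 / n_l)
        # 1/2 sum_ij W_ij (x_i-x_j)(x_i-x_j)^T  ==  X^T (D - W) X  for symmetric W,
        # with D the diagonal matrix of row sums of W
        def scatter(W):
            D = np.diag(W.sum(axis=1))
            return X.T @ (D - W) @ X
        return scatter(Ww), scatter(Wb)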

+Then the learning problem becomes deriving the LFDA transformation matrix
+:math:`\mathbf{T}_{LFDA}`:
+
+.. math::
+
+      \mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
+      [\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
+      \mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
+
+That is, it is looking for a transformation matrix :math:`\mathbf{T}` such that
+nearby data pairs in the same class are made close and the data pairs in
+different classes are separated from each other; far apart data pairs in the
+same class are not imposed to be close.
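Since the core problem is solved as a generalized eigenvalue problem (as noted
above), the columns of :math:`\mathbf{T}` can be taken as the leading
generalized eigenvectors of :math:`\mathbf{S}^{(b)}\mathbf{v} =
\lambda\mathbf{S}^{(w)}\mathbf{v}`. A sketch, assuming :math:`\mathbf{S}^{(w)}`
is positive definite (in practice a small ridge may be needed) and reusing the
scatters from the previous sketch; this is not metric-learn's exact
implementation::

    import numpy as np
    from scipy.linalg import eigh     # generalized symmetric eigensolver

    def lfda_directions(Sw, Sb, n_components):
        """Top generalized eigenvectors, used as the columns of T."""
        eigvals, eigvecs = eigh(Sb, Sw)            # solves Sb v = lambda Sw v
        order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
        return eigvecs[:, order[:n_components]]    # shape (n_features, n_components)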

.. topic:: Example Code:

::
@@ -151,17 +238,50 @@ LFDA is solved as a generalized eigenvalue problem.
<https://gastrograph.com/resources/whitepapers/local-fisher
-discriminant-analysis-on-beer-style-clustering.html#>`_ Yuan Tang.

.. _mlkr:

MLKR
----

Review comment: Same here
Reply: done

-Metric Learning for Kernel Regression.
+Metric Learning for Kernel Regression (:py:class:`MLKR <metric_learn.mlkr.MLKR>`)

`MLKR` is an algorithm for supervised metric learning, which learns a
-distance function by directly minimising the leave-one-out regression error.
+distance function by directly minimizing the leave-one-out regression error.
This algorithm can also be viewed as a supervised variation of PCA and can be
used for dimensionality reduction and high dimensional data visualization.

+Theoretically, `MLKR` can be applied with many types of kernel functions and
+distance metrics; the exposition here focuses on the Gaussian kernel and the
+Mahalanobis metric, as in the original paper. The Gaussian kernel is denoted as:
+
+.. math::
+
+      k_{ij} = \frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{d(\mathbf{x}_i,
+      \mathbf{x}_j)}{\sigma^2})
+
+where :math:`d(\cdot, \cdot)` is the squared distance under the chosen metric.
+For the Mahalanobis metric it is :math:`d(\mathbf{x}_i,
+\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||^2`, where the
+transformation matrix :math:`\mathbf{A}` is derived from the decomposition of
+the Mahalanobis matrix :math:`\mathbf{M=A^TA}`.
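For concreteness, the kernel weights can be computed as follows for a fixed
:math:`\mathbf{A}` (an illustrative sketch; ``mlkr_kernel`` is a made-up name,
:math:`\sigma^2=1`, and the constant factor is dropped because it cancels in
the prediction below)::

    import numpy as np

    def mlkr_kernel(A, X):
        """Gaussian kernel weights k_ij for a fixed matrix A."""
        Xt = X @ A.T
        d = ((Xt[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # ||A(x_i - x_j)||^2
        K = np.exp(-d)
        np.fill_diagonal(K, 0.0)    # exclude the point itself (leave-one-out)
        return K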

+Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
+:math:`\sigma^2=1` for the sake of simplicity. `MLKR` uses the cumulative
+leave-one-out quadratic regression error of the training samples as the
+loss function:
+
+.. math::
+
+      \mathcal{L} = \sum_i(y_i - \hat{y}_i)^2
+
+where the prediction :math:`\hat{y}_i` is derived from kernel regression by
+computing a weighted average over all the other training samples:
+
+.. math::
+
+      \hat{y}_i = \frac{\sum_{j\neq i}y_jk_{ij}}{\sum_{j\neq i}k_{ij}}
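Continuing the sketch (it reuses ``mlkr_kernel`` from above; the toy data and
names are assumptions), the leave-one-out predictions and the loss are::

    def mlkr_loss(A, X, y):
        """Leave-one-out quadratic regression error for a fixed matrix A."""
        K = mlkr_kernel(A, X)               # k_ii is already zeroed above
        y_hat = (K @ y) / K.sum(axis=1)     # weighted average over j != i
        return ((y - y_hat) ** 2).sum()     # sum_i (y_i - y_hat_i)^2

    rng = np.random.RandomState(0)
    X = rng.randn(40, 4)
    y = X[:, 0] + 0.1 * rng.randn(40)       # toy regression target
    print(mlkr_loss(np.eye(4), X, y))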

.. topic:: Example Code:

::
@@ -193,7 +313,6 @@ generated from the labels information and passed to the underlying algorithm.
.. todo:: add more details about that (see issue
   `<https://github.com/metric-learn/metric-learn/issues/135>`_)


.. topic:: Example Code:

::
Review comment: You should leave the reference for LMNN here (right now the link from the docstring to the user guide doesn't work), this way:
Reply: done