
Commit b015258

Author: William de Vazelhes (committed)

Merge branch 'master' into feat/dim_red_for_all

# Conflicts:
#	metric_learn/rca.py
#	test/test_base_metric.py
#	test/test_utils.py

2 parents (5579c9a + 187b59e), commit b015258

20 files changed: +754 −253 lines

.gitignore

Lines changed: 0 additions & 3 deletions
@@ -7,6 +7,3 @@ htmlcov/
 .cache/
 .pytest_cache/
 doc/auto_examples/*
-coverage
-.coverage
-.coverage*

.travis.yml

Lines changed: 6 additions & 1 deletion
@@ -8,7 +8,12 @@ python:
 before_install:
   - sudo apt-get install liblapack-dev
   - pip install --upgrade pip pytest
-  - pip install wheel cython numpy scipy scikit-learn codecov pytest-cov
+  - pip install wheel cython numpy scipy codecov pytest-cov
+  - if [[ $TRAVIS_PYTHON_VERSION == "3.6" ]]; then
+      pip install scikit-learn;
+    else
+      pip install scikit-learn==0.20.3;
+    fi
   - if [[ ($TRAVIS_PYTHON_VERSION == "3.6") ||
         ($TRAVIS_PYTHON_VERSION == "2.7")]]; then
       pip install git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8;
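
For reference, the version gate the added lines implement amounts to the following minimal Python sketch (illustrative only, not part of the commit; the ``required_sklearn_spec`` helper is hypothetical)::

    import sys

    def required_sklearn_spec(python_version=sys.version_info):
        """Mirror the Travis gate above: Python 3.6 gets the latest
        scikit-learn, older interpreters get the 0.20.3 pin."""
        if (python_version.major, python_version.minor) == (3, 6):
            return "scikit-learn"
        return "scikit-learn==0.20.3"

    print(required_sklearn_spec())
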

README.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Metric Learning algorithms in Python.
 **Dependencies**
 
 - Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
+- numpy, scipy, scikit-learn>=0.20.3
 
 **Optional dependencies**

doc/getting_started.rst

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Alternately, download the source repository and run:
 **Dependencies**
 
 - Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
+- numpy, scipy, scikit-learn>=0.20.3
 
 **Optional dependencies**
 

doc/supervised.rst

Lines changed: 136 additions & 17 deletions
@@ -41,17 +41,37 @@ the covariance matrix of the input data. This is a simple baseline method.
 
 .. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936
 
+.. _lmnn:
+
 LMNN
 -----
 
-Large-margin nearest neighbor metric learning.
+Large Margin Nearest Neighbor Metric Learning
+(:py:class:`LMNN <metric_learn.lmnn.LMNN>`)
 
-`LMNN` learns a Mahanalobis distance metric in the kNN classification
-setting using semidefinite programming. The learned metric attempts to keep
-k-nearest neighbors in the same class, while keeping examples from different
-classes separated by a large margin. This algorithm makes no assumptions about
+`LMNN` learns a Mahalanobis distance metric in the kNN classification
+setting. The learned metric attempts to keep close k-nearest neighbors
+from the same class, while keeping examples from different classes
+separated by a large margin. This algorithm makes no assumptions about
 the distribution of the data.
 
+The distance is learned by solving the following optimization problem:
+
+.. math::
+
+    \min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L(x_i-x_j)}||^2 +
+    c\sum_{i, j, l}\eta_{ij}(1-y_{ij})[1+||\mathbf{L(x_i-x_j)}||^2-||
+    \mathbf{L(x_i-x_l)}||^2]_+
+
+where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one
+of its k nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
+are all the other instances within that region with different labels.
+:math:`\eta_{ij}, y_{ij} \in \{0, 1\}` are both indicators:
+:math:`\eta_{ij}` indicates that :math:`\mathbf{x}_{j}` is one of the k nearest
+neighbors (with the same label) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
+indicates that :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different
+classes, and :math:`[\cdot]_+=\max(0, \cdot)` is the hinge loss.
+
 .. topic:: Example Code:
 
 ::
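
For reference, the objective above can be evaluated directly for a fixed linear map. The sketch below is illustrative only (not part of the patched docs): it uses a hypothetical ``lmnn_loss`` helper that picks the k target neighbors by input-space distance and sums the pull and push terms::

    import numpy as np

    def lmnn_loss(L, X, y, k=2, c=1.0):
        """Evaluate the LMNN objective above for a fixed linear map L.

        Target neighbors (eta_ij = 1) are the k same-class points closest
        to x_i in the input space; the hinge term pushes differently
        labeled points (y_il = 0) outside the margin around each target
        neighbor."""
        Xt = X @ L.T                                           # rows are L x_i
        d = ((Xt[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)   # transformed squared distances
        d_in = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # input-space distances
        pull, push = 0.0, 0.0
        n = len(X)
        for i in range(n):
            same = np.where((y == y[i]) & (np.arange(n) != i))[0]
            targets = same[np.argsort(d_in[i, same])[:k]]      # eta_ij = 1
            impostors = np.where(y != y[i])[0]                 # y_il = 0
            for j in targets:
                pull += d[i, j]
                push += np.maximum(0, 1 + d[i, j] - d[i, impostors]).sum()
        return pull + c * push

    # Tiny usage example with the identity map (plain Euclidean distance).
    X = np.array([[0., 0.], [0., 1.], [3., 0.], [3., 1.]])
    y = np.array([0, 0, 1, 1])
    print(lmnn_loss(np.eye(2), X, y, k=1))
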
@@ -80,16 +100,44 @@ The two implementations differ slightly, and the C++ version is more complete.
 -margin -nearest-neighbor-classification>`_ Kilian Q. Weinberger, John
 Blitzer, Lawrence K. Saul
 
+.. _nca:
+
 NCA
 ---
 
-Neighborhood Components Analysis (`NCA`) is a distance metric learning
-algorithm which aims to improve the accuracy of nearest neighbors
-classification compared to the standard Euclidean distance. The algorithm
-directly maximizes a stochastic variant of the leave-one-out k-nearest
-neighbors (KNN) score on the training set. It can also learn a low-dimensional
-linear embedding of data that can be used for data visualization and fast
-classification.
+Neighborhood Components Analysis (:py:class:`NCA <metric_learn.nca.NCA>`)
+
+`NCA` is a distance metric learning algorithm which aims to improve the
+accuracy of nearest neighbors classification compared to the standard
+Euclidean distance. The algorithm directly maximizes a stochastic variant
+of the leave-one-out k-nearest neighbors (KNN) score on the training set.
+It can also learn a low-dimensional linear transformation of data that can
+be used for data visualization and fast classification.
+
+It uses the decomposition :math:`\mathbf{M} = \mathbf{L}^T\mathbf{L}` and
+defines the probability :math:`p_{ij}` that :math:`\mathbf{x}_j` is the
+neighbor of :math:`\mathbf{x}_i` by calculating the softmax likelihood of
+the Mahalanobis distance:
+
+.. math::
+
+    p_{ij} = \frac{\exp(-|| \mathbf{Lx}_i - \mathbf{Lx}_j ||_2^2)}
+    {\sum_{l\neq i}\exp(-||\mathbf{Lx}_i - \mathbf{Lx}_l||_2^2)},
+    \qquad p_{ii}=0
+
+Then the probability that :math:`\mathbf{x}_i` will be correctly classified
+by the stochastic nearest neighbors rule is:
+
+.. math::
+
+    p_{i} = \sum_{j:j\neq i, y_j=y_i}p_{ij}
+
+The optimization problem is to find the matrix :math:`\mathbf{L}` that
+maximizes the sum of the probabilities of being correctly classified:
+
+.. math::
+
+    \mathbf{L} = \text{argmax}\sum_i p_i
 
 .. topic:: Example Code:
 
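
For reference, the probabilities above are straightforward to compute for a fixed map. The sketch below is illustrative only (not part of the patched docs) and uses a hypothetical ``nca_softmax_scores`` helper to evaluate :math:`p_{ij}` and :math:`p_i` with NumPy::

    import numpy as np

    def nca_softmax_scores(L, X, y):
        """Compute the NCA probabilities p_ij and p_i described above
        for a fixed linear map L (not the fitted solution)."""
        Z = X @ L.T                                          # embedded points L x_i
        d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # squared distances
        np.fill_diagonal(d, np.inf)                          # enforces p_ii = 0
        p = np.exp(-d)
        p /= p.sum(axis=1, keepdims=True)                    # softmax over neighbors
        same_class = (y[:, None] == y[None, :])
        p_i = (p * same_class).sum(axis=1)                   # prob. of correct classification
        return p, p_i

    X = np.array([[0., 0.], [0.1, 0.], [2., 2.], [2.1, 2.]])
    y = np.array([0, 0, 1, 1])
    p, p_i = nca_softmax_scores(np.eye(2), X, y)
    print(p_i, p_i.sum())   # the NCA objective is the sum of the p_i
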
@@ -116,16 +164,55 @@ classification.
 .. [2] Wikipedia entry on Neighborhood Components Analysis
    https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
 
+.. _lfda:
+
 LFDA
 ----
 
-Local Fisher Discriminant Analysis (LFDA)
+Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.lfda.LFDA>`)
 
 `LFDA` is a linear supervised dimensionality reduction method. It is
-particularly useful when dealing with multimodality, where one ore more classes
+particularly useful when dealing with multi-modality, where one or more classes
 consist of separate clusters in input space. The core optimization problem of
 LFDA is solved as a generalized eigenvalue problem.
 
+
+The algorithm defines the Fisher local within-/between-class scatter matrices
+:math:`\mathbf{S}^{(w)}/ \mathbf{S}^{(b)}` in a pairwise fashion:
+
+.. math::
+
+    \mathbf{S}^{(w)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(w)}(\mathbf{x}_i -
+    \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,\\
+    \mathbf{S}^{(b)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(b)}(\mathbf{x}_i -
+    \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,\\
+
+where
+
+.. math::
+
+    W_{ij}^{(w)} = \left\{\begin{aligned}0 \qquad y_i\neq y_j \\
+    \,\,\mathbf{A}_{i,j}/n_l \qquad y_i = y_j\end{aligned}\right.\\
+    W_{ij}^{(b)} = \left\{\begin{aligned}1/n \qquad y_i\neq y_j \\
+    \,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\
+
+here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
+matrix :math:`\mathbf{A}`, which can be calculated with local scaling methods.
+
+The learning problem then becomes deriving the LFDA transformation matrix
+:math:`\mathbf{T}_{LFDA}`:
+
+.. math::
+
+    \mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
+    [\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
+    \mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
+
+That is, it looks for a transformation matrix :math:`\mathbf{T}` such that
+nearby data pairs in the same class are made close and data pairs in
+different classes are separated from each other; far apart data pairs in the
+same class are not forced to be close.
+
 .. topic:: Example Code:
 
 ::
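
For reference, the scatter matrices and the generalized eigenvalue problem above can be reproduced with NumPy/SciPy. The sketch below is illustrative only (not part of the patched docs); for simplicity it uses a constant affinity :math:`\mathbf{A}_{i,j}=1` instead of local scaling, and the ``lfda_scatters`` helper is hypothetical::

    import numpy as np
    from scipy.linalg import eigh

    def lfda_scatters(X, y, A):
        """Pairwise weights W^(w), W^(b) and scatter matrices S^(w), S^(b)
        from the formulas above; A is the affinity matrix supplied by the
        caller (the real algorithm obtains it via local scaling)."""
        n = len(X)
        same = (y[:, None] == y[None, :])
        n_l = np.array([np.sum(y == yi) for yi in y])       # size of x_i's class
        Ww = np.where(same, A / n_l[:, None], 0.0)
        Wb = np.where(same, A * (1.0 / n - 1.0 / n_l[:, None]), 1.0 / n)
        diff = X[:, None, :] - X[None, :, :]                # all pairwise x_i - x_j
        Sw = 0.5 * np.einsum('ij,ijk,ijl->kl', Ww, diff, diff)
        Sb = 0.5 * np.einsum('ij,ijk,ijl->kl', Wb, diff, diff)
        return Sw, Sb

    # Toy data with a constant affinity A = 1 for simplicity.
    X = np.array([[0., 0.], [0., 1.], [4., 0.], [4., 1.], [8., 0.], [8., 1.]])
    y = np.array([0, 0, 1, 1, 0, 0])
    Sw, Sb = lfda_scatters(X, y, np.ones((len(X), len(X))))
    # T solves the generalized eigenproblem S^(b) t = lambda S^(w) t.
    evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(Sw.shape[0]))   # small ridge for stability
    T = evecs[:, np.argsort(evals)[::-1]].T                    # rows = projection axes
    print(T @ X.T)
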
@@ -151,17 +238,50 @@ LFDA is solved as a generalized eigenvalue problem.
 <https://gastrograph.com/resources/whitepapers/local-fisher
 -discriminant-analysis-on-beer-style-clustering.html#>`_ Yuan Tang.
 
+.. _mlkr:
 
 MLKR
 ----
 
-Metric Learning for Kernel Regression.
+Metric Learning for Kernel Regression (:py:class:`MLKR <metric_learn.mlkr.MLKR>`)
 
 `MLKR` is an algorithm for supervised metric learning, which learns a
-distance function by directly minimising the leave-one-out regression error.
+distance function by directly minimizing the leave-one-out regression error.
 This algorithm can also be viewed as a supervised variation of PCA and can be
 used for dimensionality reduction and high dimensional data visualization.
 
+Theoretically, `MLKR` can be applied with many types of kernel functions and
+distance metrics; the exposition hereafter focuses on the particular instance
+of a Gaussian kernel and a Mahalanobis metric, as these are the ones used in
+the empirical development. The Gaussian kernel is denoted as:
+
+.. math::
+
+    k_{ij} = \frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{d(\mathbf{x}_i,
+    \mathbf{x}_j)}{\sigma^2})
+
+where :math:`d(\cdot, \cdot)` is the squared distance under some metric;
+in the Mahalanobis case it is :math:`d(\mathbf{x}_i,
+\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||^2`, where the
+transformation matrix :math:`\mathbf{A}` is derived from the decomposition
+of the Mahalanobis matrix :math:`\mathbf{M=A^TA}`.
+
+Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
+:math:`\sigma^2=1` for the sake of simplicity. Here we use the cumulative
+leave-one-out quadratic regression error of the training samples as the
+loss function:
+
+.. math::
+
+    \mathcal{L} = \sum_i(y_i - \hat{y}_i)^2
+
+where the prediction :math:`\hat{y}_i` is derived from kernel regression by
+calculating a weighted average of all the training samples:
+
+.. math::
+
+    \hat{y}_i = \frac{\sum_{j\neq i}y_jk_{ij}}{\sum_{j\neq i}k_{ij}}
+
 .. topic:: Example Code:
 
 ::
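
For reference, the leave-one-out loss above is easy to evaluate for a fixed transformation :math:`\mathbf{A}`. The sketch below is illustrative only (not part of the patched docs); the ``mlkr_loo_loss`` helper is hypothetical and drops the constant kernel factor, which cancels in the weighted average::

    import numpy as np

    def mlkr_loo_loss(A, X, y):
        """Leave-one-out quadratic regression loss from the formulas above,
        evaluated for a fixed transformation A (with sigma^2 = 1)."""
        Z = X @ A.T
        d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # d(x_i, x_j) = ||A(x_i - x_j)||^2
        k = np.exp(-d)                                      # constant factor cancels in y_hat
        np.fill_diagonal(k, 0.0)                            # exclude j = i (leave-one-out)
        y_hat = (k @ y) / k.sum(axis=1)
        return ((y - y_hat) ** 2).sum()

    # Toy regression data: y depends only on the first feature, so a map A
    # that down-weights the second feature should give a lower loss.
    rng = np.random.RandomState(0)
    X = rng.randn(30, 2)
    y = X[:, 0] ** 2
    print(mlkr_loo_loss(np.eye(2), X, y))
    print(mlkr_loo_loss(np.diag([1.0, 0.1]), X, y))
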
@@ -193,7 +313,6 @@ generated from the labels information and passed to the underlying algorithm.
 .. todo:: add more details about that (see issue `<https://github
 .com/metric-learn/metric-learn/issues/135>`_)
 
-
 .. topic:: Example Code:
 
 ::
