
Commit d00196d

bellet authored and perimosocordiae committed

[MRG] Documentation: introduction to metric learning (#145)

* modified index, intro
* cosmit
* cosmit
* add use-cases and a few nitpicks
* cosmit

1 parent b386057

File tree: 2 files changed, +149 −40 lines changed

doc/index.rst

Lines changed: 9 additions & 2 deletions
@@ -2,8 +2,15 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-Welcome to metric-learn's documentation !
------------------------------------------
+Metric-learn contains efficient Python implementations of several
+popular supervised and weakly-supervised metric learning algorithms. The API
+of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine learning in
+Python. This allows the use of all the scikit-learn routines (for
+pipelining, model selection, etc.) with metric learning algorithms.
+
+Documentation outline
+---------------------
 
 .. toctree::
    :maxdepth: 2

doc/introduction.rst

Lines changed: 140 additions & 38 deletions
@@ -1,38 +1,140 @@
-============
-Introduction
-============
-
-Distance metrics are widely used in the machine learning literature.
-Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
-the domain.
-Distance metric learning (or simply, metric learning) is the sub-field of
-machine learning dedicated to automatically construct task-specific distance
-metrics from (weakly) supervised data.
-The learned distance metric often corresponds to a Euclidean distance in a new
-embedding space, hence distance metric learning can be seen as a form of
-representation learning.
-
-This package contains a efficient Python implementations of several popular
-metric learning algorithms, compatible with scikit-learn. This allows to use
-all the scikit-learn routines for pipelining and model selection for
-metric learning algorithms.
-
-
-Currently, each metric learning algorithm supports the following methods:
-
-- ``fit(...)``, which learns the model.
-- ``metric()``, which returns a Mahalanobis matrix
-  :math:`M = L^{\top}L` such that distance between vectors ``x`` and
-  ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
-- ``transformer_from_metric(metric)``, which returns a transformation matrix
-  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
-  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
-  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
-  in which standard Euclidean distances may be used.
-- ``transform(X)``, which applies the aforementioned transformation.
-- ``score_pairs(pairs)`` which returns the distance between pairs of
-  points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
-  2, n_features)``, or it can be a 2D array-like of pairs indicators of
-  shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
-  details).
+========================
+What is Metric Learning?
+========================
+
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
+domain. However, it is often difficult to design metrics that are well-suited
+to the particular data and task of interest.
+
+Distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data, in a machine learning manner. The learned distance metric can
+then be used to perform various tasks (e.g., k-NN classification, clustering,
+information retrieval).
+
+Problem Setting
+===============
+
+Metric learning problems fall into two main categories depending on the type
+of supervision available about the training data:
+
+- :doc:`Supervised learning <supervised>`: the algorithm has access to
+  a set of data points, each of them belonging to a class (label) as in a
+  standard classification problem.
+  Broadly speaking, the goal in this setting is to learn a distance metric
+  that puts points with the same label close together while pushing away
+  points with different labels.
+- :doc:`Weakly supervised learning <weakly_supervised>`: the
+  algorithm has access to a set of data points with supervision only
+  at the tuple level (typically pairs, triplets, or quadruplets of
+  data points). A classic example of such weaker supervision is a set of
+  positive and negative pairs: in this case, the goal is to learn a distance
+  metric that puts positive pairs close together and negative pairs far away.
+
+Based on the above (weakly) supervised data, the metric learning problem is
+generally formulated as an optimization problem where one seeks to find the
+parameters of a distance function that optimize some objective function
+measuring the agreement with the training data.
+
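For concreteness, here is a minimal editorial sketch (not part of the commit) of what the weakly-supervised pair setting above looks like in code. It assumes the ``MMC`` pair learner from metric-learn with a ``fit(pairs, y)`` interface, and uses the 3D pairs format described in the method list at the bottom of this page::

    import numpy as np
    from metric_learn import MMC

    # Four pairs of 3-dimensional points: shape (n_pairs, 2, n_features).
    pairs = np.array([[[1.2, 7.5, 1.3], [1.3, 1.5, 8.9]],
                      [[4.2, 2.1, 0.1], [4.0, 2.4, 0.2]],
                      [[9.6, 3.6, 5.4], [1.1, 8.6, 5.0]],
                      [[6.6, 6.0, 7.3], [6.5, 5.8, 7.1]]])
    # +1 marks a positive (similar) pair, -1 a negative (dissimilar) pair.
    y_pairs = np.array([-1, 1, -1, 1])

    mmc = MMC()              # assumed pair-learner interface: fit(pairs, y)
    mmc.fit(pairs, y_pairs)  # learns a metric from the pair supervision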
+Mahalanobis Distances
+=====================
+
+In the metric-learn package, all algorithms currently implemented learn
+so-called Mahalanobis distances. Given a real-valued parameter matrix
+:math:`L` of shape ``(num_dims, n_features)``, where ``n_features`` is the
+number of features describing the data, the Mahalanobis distance associated
+with :math:`L` is defined as follows:
+
+.. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
+
+In other words, a Mahalanobis distance is a Euclidean distance after a
+linear transformation of the feature space defined by :math:`L` (taking
+:math:`L` to be the identity matrix recovers the standard Euclidean distance).
+Mahalanobis distance metric learning can thus be seen as learning a new
+embedding space of dimension ``num_dims``. Note that when ``num_dims`` is
+smaller than ``n_features``, this achieves dimensionality reduction.
+
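As an illustration (an editorial sketch, assuming nothing beyond NumPy), the definition above is easy to check directly::

    import numpy as np

    rng = np.random.RandomState(42)
    L = rng.rand(2, 3)                    # num_dims=2, n_features=3
    x, x_prime = rng.rand(3), rng.rand(3)

    # D(x, x') = sqrt((Lx - Lx')^T (Lx - Lx'))
    diff = L @ x - L @ x_prime
    d = np.sqrt(diff @ diff)

    # ...which is just the Euclidean distance in the embedding space:
    assert np.isclose(d, np.linalg.norm(L @ x - L @ x_prime))

    # Taking L to be the identity recovers the standard Euclidean distance.
    L_id = np.eye(3)
    assert np.isclose(np.linalg.norm(L_id @ x - L_id @ x_prime),
                      np.linalg.norm(x - x_prime))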
+Strictly speaking, Mahalanobis distances are "pseudo-metrics": they satisfy
+three of the `properties of a metric <https://en.wikipedia.org/wiki/Metric_
+(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not
+necessarily the identity of indiscernibles.
+
+.. note::
+
+  Mahalanobis distances can also be parameterized by a `positive semi-definite
+  (PSD) matrix
+  <https://en.wikipedia.org/wiki/Positive-definite_matrix#Positive_semidefinite>`_
+  :math:`M`:
+
+  .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+
+  Using the fact that a PSD matrix :math:`M` can always be decomposed as
+  :math:`M=L^\top L` for some :math:`L`, one can show that both
+  parameterizations are equivalent. In practice, an algorithm may thus solve
+  the metric learning problem with respect to either :math:`M` or :math:`L`.
+
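A quick numerical check of this equivalence (again an editorial sketch, here using an eigendecomposition to obtain one valid :math:`L` from :math:`M`)::

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.rand(3, 3)
    M = A.T @ A                        # an arbitrary PSD matrix

    # M = V diag(w) V^T with w >= 0, so L = diag(sqrt(w)) V^T gives L^T L = M.
    w, V = np.linalg.eigh(M)
    L = np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    assert np.allclose(L.T @ L, M)

    # Both parameterizations yield the same distance.
    x, x_prime = rng.rand(3), rng.rand(3)
    d_M = np.sqrt((x - x_prime) @ M @ (x - x_prime))
    d_L = np.linalg.norm(L @ (x - x_prime))
    assert np.isclose(d_M, d_L)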
+Use-cases
+=========
+
+There are many use-cases for metric learning. We list here a few popular
+examples (for code illustrating some of these use-cases, see the
+:doc:`examples <auto_examples/index>` section of the documentation):
+
+- `Nearest neighbors models
+  <https://scikit-learn.org/stable/modules/neighbors.html>`_: the learned
+  metric can be used to improve nearest neighbors learning models for
+  classification, regression, anomaly detection...
+- `Clustering <https://scikit-learn.org/stable/modules/clustering.html>`_:
+  metric learning provides a way to bias the clusters found by algorithms like
+  K-Means towards the intended semantics.
+- Information retrieval: the learned metric can be used to retrieve the
+  elements of a database that are semantically closest to a query element.
+- Dimensionality reduction: metric learning may be seen as a way to reduce the
+  data dimension in a (weakly) supervised setting.
+- More generally, the learned transformation :math:`L` can be used to project
+  the data into a new embedding space before feeding it into another machine
+  learning algorithm.
+
+The API of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine
+learning in Python. This makes it easy to pipeline metric learners with other
+scikit-learn estimators to realize the above use-cases, to perform joint
+hyperparameter tuning, etc.
+
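To make the pipelining point concrete, here is a short editorial sketch (not from the commit) chaining a supervised metric learner with a scikit-learn k-NN classifier. It assumes the ``LMNN`` learner and the ``fit``/``transform`` interface listed at the bottom of this page::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    from metric_learn import LMNN   # assumed supervised metric learner

    X, y = load_iris(return_X_y=True)

    # The metric is learned on the training folds only; k-NN then operates
    # in the learned embedding space.
    pipe = make_pipeline(LMNN(), KNeighborsClassifier(n_neighbors=3))
    print(cross_val_score(pipe, X, y, cv=3))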
+Further reading
+===============
+
+For more information about metric learning and its applications, one can refer
+to the following resources:
+
+- **Tutorial:** `Similarity and Distance Metric Learning with Applications to
+  Computer Vision
+  <http://researchers.lille.inria.fr/abellet/talks/metric_learning_tutorial_ECML_PKDD.pdf>`_ (2015)
+- **Surveys:** `A Survey on Metric Learning for Feature Vectors and Structured
+  Data <https://arxiv.org/pdf/1306.6709.pdf>`_ (2013), `Metric Learning: A
+  Survey <http://dx.doi.org/10.1561/2200000019>`_ (2012)
+- **Book:** `Metric Learning
+  <http://dx.doi.org/10.2200/S00626ED1V01Y201501AIM030>`_ (2015)
+
+.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
+.. =============================================
+
+.. Currently, each metric learning algorithm supports the following methods:
+
+.. - ``fit(...)``, which learns the model.
+.. - ``metric()``, which returns a Mahalanobis matrix
+..   :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
+..   ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
+.. - ``transformer_from_metric(metric)``, which returns a transformation matrix
+..   :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
+..   data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
+..   :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
+..   in which standard Euclidean distances may be used.
+.. - ``transform(X)``, which applies the aforementioned transformation.
+.. - ``score_pairs(pairs)``, which returns the distance between pairs of
+..   points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
+..   2, n_features)``, or it can be a 2D array-like of pairs indicators of
+..   shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
+..   details).
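Although the method list above is commented out pending its move to the supervised/weakly-supervised pages, an editorial sketch of how those methods fit together may help. The exact names (``metric()``, ``score_pairs``) follow that list and may differ in other releases of metric-learn::

    import numpy as np
    from sklearn.datasets import load_iris
    from metric_learn import LMNN    # assumed supervised learner

    X, y = load_iris(return_X_y=True)

    lmnn = LMNN()
    lmnn.fit(X, y)                   # fit(...): learns the model

    M = lmnn.metric()                # metric(): Mahalanobis matrix M = L^T L
    X_e = lmnn.transform(X)          # transform(X): maps X to the learned space

    # score_pairs(pairs): distances for a 3D array of shape
    # (n_pairs, 2, n_features); here, the first two points of X.
    d = lmnn.score_pairs(np.array([[X[0], X[1]]]))

    # The pair distance is the Euclidean distance after transformation.
    assert np.isclose(d[0], np.linalg.norm(X_e[0] - X_e[1]))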
