========================
What is Metric Learning?
========================

Many approaches in machine learning require a measure of distance between data
points. Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
domain. However, it is often difficult to design metrics that are well-suited
to the particular data and task of interest.

Distance metric learning (or simply, metric learning) aims at automatically
constructing task-specific distance metrics from (weakly) supervised data. The
learned distance metric can then be used to perform various tasks (e.g., k-NN
classification, clustering, information retrieval).

Problem Setting
===============

Metric learning problems fall into two main categories depending on the type
of supervision available about the training data (a short sketch of both data
layouts follows the list):

- :doc:`Supervised learning <supervised>`: the algorithm has access to
  a set of data points, each of them belonging to a class (label) as in a
  standard classification problem.
  Broadly speaking, the goal in this setting is to learn a distance metric
  that puts points with the same label close together while pushing away
  points with different labels.
- :doc:`Weakly supervised learning <weakly_supervised>`: the
  algorithm has access to a set of data points with supervision only
  at the tuple level (typically pairs, triplets, or quadruplets of
  data points). A classic example of such weaker supervision is a set of
  positive and negative pairs: in this case, the goal is to learn a distance
  metric that puts positive pairs close together and negative pairs far away.
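
As a purely illustrative sketch (the toy values and the +1/-1 pair-label
convention are assumptions, not a prescription of this package's API), the
two kinds of training data might look like this in NumPy::

    import numpy as np

    # Supervised setting: points with class labels, as in classification.
    X = np.array([[1.0, 2.0], [2.3, 1.1], [9.2, 8.5], [8.7, 9.1]])
    y = np.array([0, 0, 1, 1])

    # Weakly supervised setting (pairs): a 3D array of shape
    # (n_pairs, 2, n_features), with +1 / -1 marking positive / negative pairs.
    pairs = np.array([[[1.0, 2.0], [2.3, 1.1]],   # positive pair
                      [[1.0, 2.0], [9.2, 8.5]]])  # negative pair
    y_pairs = np.array([1, -1])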

Based on the above (weakly) supervised data, the metric learning problem is
generally formulated as an optimization problem where one seeks to find the
parameters of a distance function that optimize some objective function
measuring the agreement with the training data.
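
To make this concrete, one generic example of such a formulation (a hedged
illustration, not the exact objective of any particular algorithm in this
package) is a margin-based pairwise objective: given a set :math:`\mathcal{S}`
of positive pairs and a set :math:`\mathcal{D}` of negative pairs, solve

.. math:: \min_{\theta} \sum_{(x_i, x_j) \in \mathcal{S}} D_\theta(x_i, x_j)^2
   + \lambda \sum_{(x_i, x_j) \in \mathcal{D}}
   \left[m - D_\theta(x_i, x_j)\right]_+

where :math:`D_\theta` is the parameterized distance, :math:`m` is a margin
and :math:`[\cdot]_+` denotes the positive part.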

Mahalanobis Distances
=====================

In the metric-learn package, all algorithms currently implemented learn
so-called Mahalanobis distances. Given a real-valued parameter matrix
:math:`L` of shape ``(num_dims, n_features)`` where ``n_features`` is the
number of features describing the data, the Mahalanobis distance associated
with :math:`L` is defined as follows:

.. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}

In other words, a Mahalanobis distance is a Euclidean distance after a
linear transformation of the feature space defined by :math:`L` (taking
:math:`L` to be the identity matrix recovers the standard Euclidean distance).
Mahalanobis distance metric learning can thus be seen as learning a new
embedding space of dimension ``num_dims``. Note that when ``num_dims`` is
smaller than ``n_features``, this achieves dimensionality reduction.
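
A minimal NumPy sketch of this definition (purely illustrative, not part of
the package API)::

    import numpy as np

    rng = np.random.RandomState(0)
    L = rng.randn(2, 5)                  # num_dims = 2, n_features = 5
    x, x_prime = rng.randn(5), rng.randn(5)

    diff = L @ x - L @ x_prime
    # The Mahalanobis distance parameterized by L ...
    d_mahalanobis = np.sqrt(diff @ diff)
    # ... is the Euclidean distance between the linearly embedded points.
    d_euclidean = np.linalg.norm(diff)
    assert np.isclose(d_mahalanobis, d_euclidean)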

Strictly speaking, Mahalanobis distances are "pseudo-metrics": they satisfy
three of the `properties of a metric <https://en.wikipedia.org/wiki/Metric_
(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not
necessarily the identity of indiscernibles.
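
For instance (an illustrative sketch with made-up values), a rank-deficient
:math:`L` can map two distinct points to the same embedding, giving them
distance zero::

    import numpy as np

    L = np.array([[1.0, 0.0]])     # keeps only the first coordinate
    x = np.array([1.0, 0.0])
    y = np.array([1.0, 5.0])       # differs from x only in the dropped one
    print(np.linalg.norm(L @ x - L @ y))  # 0.0, although x != y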

.. note::

   Mahalanobis distances can also be parameterized by a `positive semi-definite
   (PSD) matrix
   <https://en.wikipedia.org/wiki/Positive-definite_matrix#Positive_semidefinite>`_
   :math:`M`:

   .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}

   Using the fact that a PSD matrix :math:`M` can always be decomposed as
   :math:`M=L^\top L` for some :math:`L`, one can show that both
   parameterizations are equivalent. In practice, an algorithm may thus solve
   the metric learning problem with respect to either :math:`M` or :math:`L`.
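
The equivalence can be checked numerically (an illustrative sketch; here
:math:`M` is built positive definite so that a Cholesky factorization
exists)::

    import numpy as np

    rng = np.random.RandomState(0)
    L = rng.randn(5, 5)
    M = L.T @ L                       # PSD (and a.s. positive definite) here
    x, x_prime = rng.randn(5), rng.randn(5)
    diff = x - x_prime

    d_from_M = np.sqrt(diff @ M @ diff)   # distance computed from M ...
    d_from_L = np.linalg.norm(L @ diff)   # ... equals the one computed from L
    assert np.isclose(d_from_M, d_from_L)

    # A transformation can be recovered from M, e.g. via Cholesky: M = C C^T,
    # so taking L' = C^T gives M = L'^T L' and hence the same distances.
    L_prime = np.linalg.cholesky(M).T
    assert np.isclose(d_from_M, np.linalg.norm(L_prime @ diff))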

Use-cases
=========

There are many use-cases for metric learning. We list here a few popular
examples (for code illustrating some of these use-cases, see the
:doc:`examples <auto_examples/index>` section of the documentation):

- `Nearest neighbors models
  <https://scikit-learn.org/stable/modules/neighbors.html>`_: the learned
  metric can be used to improve nearest neighbors learning models for
  classification, regression, anomaly detection...
- `Clustering <https://scikit-learn.org/stable/modules/clustering.html>`_:
  metric learning provides a way to bias the clusters found by algorithms like
  K-Means towards the intended semantics.
- Information retrieval: the learned metric can be used to retrieve the
  elements of a database that are semantically closer to a query element.
- Dimensionality reduction: metric learning may be seen as a way to reduce the
  data dimension in a (weakly) supervised setting.
- More generally, the learned transformation :math:`L` can be used to project
  the data into a new embedding space before feeding it into another machine
  learning algorithm.

The API of metric-learn is compatible with `scikit-learn
<https://scikit-learn.org/>`_, the leading library for machine learning in
Python. This makes it easy to pipeline metric learners with other scikit-learn
estimators to realize the above use-cases, to perform joint hyperparameter
tuning, etc.
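
For instance, a metric learner can be chained with a k-nearest-neighbors
classifier (a hedged sketch assuming the ``LMNN`` estimator from this package
with default parameters; check the package documentation for exact names)::

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from metric_learn import LMNN

    X, y = load_iris(return_X_y=True)

    # The metric learner acts as a transformer: it learns L from (X, y) and
    # maps the data to the learned space before the k-NN step.
    pipe = Pipeline([('metric_learner', LMNN()),
                     ('classifier', KNeighborsClassifier())])
    pipe.fit(X, y)
    print(pipe.score(X, y))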

Further reading
===============

For more information about metric learning and its applications, one can refer
to the following resources:

- **Tutorial:** `Similarity and Distance Metric Learning with Applications to
  Computer Vision
  <http://researchers.lille.inria.fr/abellet/talks/metric_learning_tutorial_ECML_PKDD.pdf>`_ (2015)
- **Surveys:** `A Survey on Metric Learning for Feature Vectors and Structured
  Data <https://arxiv.org/pdf/1306.6709.pdf>`_ (2013), `Metric Learning: A
  Survey <http://dx.doi.org/10.1561/2200000019>`_ (2012)
- **Book:** `Metric Learning
  <http://dx.doi.org/10.2200/S00626ED1V01Y201501AIM030>`_ (2015)

.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
.. =============================================

.. Currently, each metric learning algorithm supports the following methods:

.. - ``fit(...)``, which learns the model.
.. - ``metric()``, which returns a Mahalanobis matrix
..   :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
..   ``y`` can be computed as :math:`\sqrt{\left(x-y\right)^{\top}M\left(x-y\right)}`.
.. - ``transformer_from_metric(metric)``, which returns a transformation matrix
..   :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
..   data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
..   :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
..   in which standard Euclidean distances may be used.
.. - ``transform(X)``, which applies the aforementioned transformation.
.. - ``score_pairs(pairs)``, which returns the distance between pairs of
..   points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
..   2, n_features)``, or it can be a 2D array-like of pairs indicators of
..   shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
..   details).