
Changes in documentation. Rephrasing, fixed examples, standardized notation, etc. #274

Merged
merged 10 commits on Jan 20, 2020
1 change: 1 addition & 0 deletions README.rst
@@ -26,6 +26,7 @@ metric-learn contains efficient Python implementations of several popular superv

- For SDML, using skggm will allow the algorithm to solve problematic cases
(install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
- For running the examples only: matplotlib

**Installation/Setup**
3 changes: 2 additions & 1 deletion doc/getting_started.rst
@@ -10,7 +10,7 @@ Run ``pip install metric-learn`` to download and install from PyPI.
Alternately, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.
- ``pytest test`` to run all tests.

**Dependencies**

@@ -21,6 +21,7 @@ Alternately, download the source repository and run:

- For SDML, using skggm will allow the algorithm to solve problematic cases
(install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
- For running the examples only: matplotlib

Quick start
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -52,7 +52,7 @@ Documentation outline

auto_examples/index

:ref:`genindex` | :ref:`modindex` | :ref:`search`
:ref:`genindex` | :ref:`search`

.. |Travis-CI Build Status| image:: https://api.travis-ci.org/scikit-learn-contrib/metric-learn.svg?branch=master
:target: https://travis-ci.org/scikit-learn-contrib/metric-learn
30 changes: 15 additions & 15 deletions doc/supervised.rst
@@ -131,13 +131,13 @@ The distance is learned by solving the following optimization problem:
c\sum_{i, j, l}\eta_{ij}(1-y_{ij})[1+||\mathbf{L(x_i-x_j)}||^2-||
\mathbf{L(x_i-x_l)}||^2]_+)

where :math:`\mathbf{x}_i` is an data point, :math:`\mathbf{x}_j` is one
of its k nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one
of its k-nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
are all the other instances within that region with different labels,
:math:`\eta_{ij}, y_{ij} \in \{0, 1\}` are both indicator variables,
:math:`\eta_{ij}` represents :math:`\mathbf{x}_{j}` is the k nearest
neighbors(with same labels) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
indicates :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different class,
:math:`\eta_{ij}=1` indicates that :math:`\mathbf{x}_{j}` is one of the
k-nearest neighbors (with the same label) of :math:`\mathbf{x}_{i}`, and
:math:`y_{ij}=0` indicates that :math:`\mathbf{x}_{i}` and :math:`\mathbf{x}_{j}`
belong to different classes, and
:math:`[\cdot]_+=\max(0, \cdot)` is the Hinge loss.
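
To make the notation concrete, here is a small NumPy sketch of the pull/push structure of this loss on a toy dataset (an illustration only, not metric-learn's implementation; for simplicity every same-class point is treated as a target neighbour, the impostor index :math:`l` runs over differently-labelled points, and the trade-off constant ``c`` is chosen arbitrarily)::

    import numpy as np

    # toy data: two classes of two points each
    X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]])
    y = np.array([0, 0, 1, 1])
    L = np.eye(2)   # a candidate linear transformation
    c = 0.5         # pull/push trade-off constant (arbitrary here)

    def sq_dist(a, b):
        d = L @ (X[a] - X[b])
        return d @ d

    pull, push = 0.0, 0.0
    for i in range(len(X)):
        for j in range(len(X)):
            if i == j or y[i] != y[j]:
                continue                  # eta_ij = 1 only for target neighbours
            pull += sq_dist(i, j)         # pull target neighbours together
            for l in range(len(X)):
                if y[l] == y[i]:
                    continue              # impostors have a different label
                push += max(0.0, 1 + sq_dist(i, j) - sq_dist(i, l))  # hinge term
    loss = pull + c * push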

.. topic:: Example Code:
@@ -235,7 +235,7 @@ the sum of probability of being correctly classified:

Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.LFDA>`)

`LFDA` is a linear supervised dimensionality reduction method. It is
`LFDA` is a linear supervised dimensionality reduction method which effectively combines the ideas of `Linear Discriminant Analysis <https://en.wikipedia.org/wiki/Linear_discriminant_analysis>`_ and Locality-Preserving Projection. It is
particularly useful when dealing with multi-modality, where one or more classes
consist of separate clusters in input space. The core optimization problem of
LFDA is solved as a generalized eigenvalue problem.
@@ -261,18 +261,18 @@ where
\,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\

here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
matrix :math:`\mathbf{A}`:, which can be calculated with local scaling methods.
matrix :math:`\mathbf{A}`, which can be calculated with local scaling methods; :math:`n` and :math:`n_l` are the total number of points and the number of points in class :math:`l`, respectively.

Then the learning problem becomes deriving the LFDA transformation matrix
:math:`\mathbf{T}_{LFDA}`:
:math:`\mathbf{L}_{LFDA}`:

.. math::

\mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
[\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
\mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
\mathbf{L}_{LFDA} = \arg\max_\mathbf{L}
[\text{tr}((\mathbf{L}^T\mathbf{S}^{(w)}
\mathbf{L})^{-1}\mathbf{L}^T\mathbf{S}^{(b)}\mathbf{L})]

That is, it is looking for a transformation matrix :math:`\mathbf{T}` such that
That is, it is looking for a transformation matrix :math:`\mathbf{L}` such that
nearby data pairs in the same class are made close and the data pairs in
different classes are separated from each other; far apart data pairs in the
same class are not imposed to be close.
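
A minimal sketch of that last step, assuming the scatter matrices :math:`\mathbf{S}^{(w)}` and :math:`\mathbf{S}^{(b)}` have already been built (the toy values below are placeholders, not metric-learn's internal code)::

    import numpy as np
    from scipy.linalg import eigh

    S_w = np.array([[2.0, 0.3], [0.3, 1.0]])     # within-class scatter (toy)
    S_b = np.array([[1.5, 0.2], [0.2, 0.5]])     # between-class scatter (toy)

    n_components = 1
    eigvals, eigvecs = eigh(S_b, S_w)            # solves S_b v = lambda S_w v
    order = np.argsort(eigvals)[::-1]            # keep the largest eigenvalues
    L_lfda = eigvecs[:, order[:n_components]].T  # rows of L span the embedding
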
@@ -326,9 +326,9 @@ empirical development. The Gaussian kernel is denoted as:

where :math:`d(\cdot, \cdot)` is the squared distance under some metric,
here in the fashion of Mahalanobis, it should be :math:`d(\mathbf{x}_i,
\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||`, the transition
matrix :math:`\mathbf{A}` is derived from the decomposition of Mahalanobis
matrix :math:`\mathbf{M=A^TA}`.
\mathbf{x}_j) = ||\mathbf{L}(\mathbf{x}_i - \mathbf{x}_j)||`, the transformation
matrix :math:`\mathbf{L}` is derived from the decomposition of the Mahalanobis
matrix :math:`\mathbf{M=L^TL}`.
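
For concreteness, a small NumPy sketch of this decomposition; a Cholesky factor is one possible (non-unique) choice of :math:`\mathbf{L}`, and the values are illustrative::

    import numpy as np

    M = np.array([[2.0, 0.5], [0.5, 1.0]])     # a toy positive-definite Mahalanobis matrix
    L = np.linalg.cholesky(M).T                # then M = L^T L

    x_i, x_j = np.array([1.0, 2.0]), np.array([0.0, 1.0])
    d_mahal = np.sqrt((x_i - x_j) @ M @ (x_i - x_j))
    d_transformed = np.linalg.norm(L @ (x_i - x_j))
    assert np.isclose(d_mahal, d_transformed)  # same distance, computed two ways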

Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
:math:`\sigma^2=1` for the sake of simplicity. Here we use the cumulative
37 changes: 17 additions & 20 deletions doc/weakly_supervised.rst
@@ -367,36 +367,36 @@ other methods, `ITML` does not rely on an eigenvalue computation or
semi-definite programming.


Given a Mahalanobis distance parameterized by :math:`A`, its corresponding
Given a Mahalanobis distance parameterized by :math:`M`, its corresponding
multivariate Gaussian is denoted as:

.. math::
p(\mathbf{x}; \mathbf{A}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{A}
p(\mathbf{x}; \mathbf{M}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{M}
(\mathbf{x}, \mu))
= \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{A}
= \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{M}
(\mathbf{x} - \mu))

where :math:`Z` is the normalization constant, the inverse of Mahalanobis
matrix :math:`\mathbf{A}^{-1}` is the covariance of the Gaussian.
matrix :math:`\mathbf{M}^{-1}` is the covariance of the Gaussian.
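
As a quick check of this correspondence, here is a hedged NumPy/SciPy sketch showing that the density above is, up to the constant :math:`Z`, a Gaussian with covariance :math:`\mathbf{M}^{-1}` (toy values, not part of metric-learn)::

    import numpy as np
    from scipy.stats import multivariate_normal

    M = np.array([[2.0, 0.3], [0.3, 1.0]])   # toy Mahalanobis matrix
    mu = np.zeros(2)
    x = np.array([0.7, -0.2])

    d_M = (x - mu) @ M @ (x - mu)
    unnormalised = np.exp(-0.5 * d_M)
    Z = (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(np.linalg.inv(M)))
    gaussian = multivariate_normal(mean=mu, cov=np.linalg.inv(M)).pdf(x)
    assert np.isclose(gaussian, unnormalised / Z)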

Given pairs of similar points :math:`S` and pairs of dissimilar points
:math:`D`, the distance metric learning problem is to minimize the LogDet
divergence, which is equivalent to minimizing :math:`\textbf{KL}(p(\mathbf{x};
\mathbf{A}_0) || p(\mathbf{x}; \mathbf{A}))`:
\mathbf{M}_0) || p(\mathbf{x}; \mathbf{M}))`:

.. math::

\min_\mathbf{A} D_{\ell \mathrm{d}}\left(A, A_{0}\right) =
\operatorname{tr}\left(A A_{0}^{-1}\right)-\log \operatorname{det}
\left(A A_{0}^{-1}\right)-n\\
\text{subject to } \quad d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j)
\min_\mathbf{M} D_{\ell \mathrm{d}}\left(M, M_{0}\right) =
\operatorname{tr}\left(M M_{0}^{-1}\right)-\log \operatorname{det}
\left(M M_{0}^{-1}\right)-n\\
\text{subject to } \quad d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j)
\leq u \qquad (\mathbf{x}_i, \mathbf{x}_j)\in S \\
d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
\mathbf{x}_j)\in D


where :math:`u` and :math:`l` are the upper and the lower bounds of distance
for similar and dissimilar pairs respectively, and :math:`\mathbf{A}_0`
for similar and dissimilar pairs respectively, and :math:`\mathbf{M}_0`
is the prior distance metric, set to the identity matrix by default, and
:math:`D_{\ell \mathrm{d}}(\cdot)` is the LogDet divergence.
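
For intuition about the objective, a short NumPy sketch of :math:`D_{\ell \mathrm{d}}` as written above (an illustration; metric-learn minimizes this implicitly through its Bregman-projection updates)::

    import numpy as np

    def logdet_divergence(M, M0):
        P = M @ np.linalg.inv(M0)
        n = M.shape[0]
        return np.trace(P) - np.log(np.linalg.det(P)) - n

    M0 = np.eye(2)                              # identity prior (the default)
    M = np.array([[1.5, 0.2], [0.2, 0.8]])      # a candidate metric
    print(logdet_divergence(M, M0))             # zero only when M equals M0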

@@ -518,17 +518,14 @@ as the Mahalanobis matrix.

from metric_learn import RCA

pairs = [[[1.2, 7.5], [1.3, 1.5]],
[[6.4, 2.6], [6.2, 9.7]],
[[1.3, 4.5], [3.2, 4.6]],
[[6.2, 5.5], [5.4, 5.4]]]
y = [1, 1, -1, -1]

# in this task we want points whose first feature is close to be closer
# to each other, regardless of the second feature
X = [[-0.05, 3.0], [0.05, -3.0],
     [0.1, -3.55], [-0.1, 3.55],
     [-0.95, -0.05], [0.95, 0.05],
     [0.4, 0.05], [-0.4, -0.05]]
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

rca = RCA()
rca.fit(pairs, y)
rca.fit(X, chunks)

.. topic:: References:

2 changes: 1 addition & 1 deletion examples/plot_metric_learning_examples.py
@@ -175,7 +175,7 @@ def plot_tsne(X, y, colormap=plt.cm.Paired):
#
# ITML uses a regularizer that automatically enforces a positive
# semi-definite matrix condition - the LogDet divergence. It uses soft
# must-link or cannot like constraints, and a simple algorithm based on
# must-link or cannot-link constraints, and a simple algorithm based on
# Bregman projections. Unlike LMNN, ITML will implicitly enforce points from
# the same class to belong to the same cluster, as you can see below.
#
27 changes: 20 additions & 7 deletions metric_learn/itml.py
@@ -198,13 +198,16 @@ class ITML(_BaseITML, _PairsClassifierMixin):

Examples
--------
>>> from metric_learn import ITML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> itml = ITML_Supervised(num_constraints=200)
>>> itml.fit(X, Y)
>>> from metric_learn import ITML
>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
...          [[6.4, 2.6], [6.2, 9.7]],
...          [[1.3, 4.5], [3.2, 4.6]],
...          [[6.2, 5.5], [5.4, 5.4]]]
>>> y = [1, 1, -1, -1]
>>> # in this task we want points whose first feature is close to be
>>> # closer to each other, regardless of the second feature
>>> itml = ITML()
>>> itml.fit(pairs, y)

References
----------
@@ -335,6 +338,16 @@ class ITML_Supervised(_BaseITML, TransformerMixin):
The linear transformation ``L`` deduced from the learned Mahalanobis
metric (See function `components_from_metric`.)

Examples
--------
>>> from metric_learn import ITML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> itml = ITML_Supervised(num_constraints=200)
>>> itml.fit(X, Y)

See Also
--------
metric_learn.ITML : The original weakly-supervised algorithm
26 changes: 19 additions & 7 deletions metric_learn/lsml.py
@@ -186,13 +186,15 @@ class LSML(_BaseLSML, _QuadrupletsClassifierMixin):

Examples
--------
>>> from metric_learn import LSML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> lsml = LSML_Supervised(num_constraints=200)
>>> lsml.fit(X, Y)
>>> from metric_learn import LSML
>>> quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
...                [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
...                [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
...                [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
>>> # we want to make closer points where the first feature is close, and
>>> # further apart if the second feature is close
>>> lsml = LSML()
>>> lsml.fit(quadruplets)

References
----------
@@ -290,6 +292,16 @@ class LSML_Supervised(_BaseLSML, TransformerMixin):
prior. In any case, `random_state` is also used to randomly sample
constraints from labels.

Examples
--------
>>> from metric_learn import LSML_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> lsml = LSML_Supervised(num_constraints=200)
>>> lsml.fit(X, Y)

Attributes
----------
n_iter_ : `int`
27 changes: 20 additions & 7 deletions metric_learn/mmc.py
@@ -426,13 +426,16 @@ class MMC(_BaseMMC, _PairsClassifierMixin):

Examples
--------
>>> from metric_learn import MMC_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> mmc = MMC_Supervised(num_constraints=200)
>>> mmc.fit(X, Y)
>>> from metric_learn import MMC
>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
...          [[6.4, 2.6], [6.2, 9.7]],
...          [[1.3, 4.5], [3.2, 4.6]],
...          [[6.2, 5.5], [5.4, 5.4]]]
>>> y = [1, 1, -1, -1]
>>> # in this task we want points whose first feature is close to be
>>> # closer to each other, regardless of the second feature
>>> mmc = MMC()
>>> mmc.fit(pairs, y)

References
----------
@@ -552,6 +555,16 @@ class MMC_Supervised(_BaseMMC, TransformerMixin):
samples, and pairs of dissimilar samples by taking different class
samples. It then passes these pairs to `MMC` for training.

Examples
--------
>>> from metric_learn import MMC_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> mmc = MMC_Supervised(num_constraints=200)
>>> mmc.fit(X, Y)

Attributes
----------
n_iter_ : `int`
25 changes: 18 additions & 7 deletions metric_learn/rca.py
@@ -62,13 +62,14 @@ class RCA(MahalanobisMixin, TransformerMixin):

Examples
--------
>>> from metric_learn import RCA_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
>>> rca.fit(X, Y)
>>> from metric_learn import RCA
>>> X = [[-0.05, 3.0], [0.05, -3.0],
...      [0.1, -3.55], [-0.1, 3.55],
...      [-0.95, -0.05], [0.95, 0.05],
...      [0.4, 0.05], [-0.4, -0.05]]
>>> chunks = [0, 0, 1, 1, 2, 2, 3, 3]
>>> rca = RCA()
>>> rca.fit(X, chunks)

References
------------------
@@ -196,6 +197,16 @@ class RCA_Supervised(RCA):
A pseudo random number generator object or a seed for it if int.
It is used to randomly sample constraints from labels.

Examples
--------
>>> from metric_learn import RCA_Supervised
>>> from sklearn.datasets import load_iris
>>> iris_data = load_iris()
>>> X = iris_data['data']
>>> Y = iris_data['target']
>>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
>>> rca.fit(X, Y)

Attributes
----------
components_ : `numpy.ndarray`, shape=(n_components, n_features)