Commit 1b40c3b

grudloff authored and bellet committed
Changes in documentation. Rephrasing, fixed examples, standardized notation, etc. (#274)
* Multiple changes to the documentation. Rephrasing, fixed examples and standardized notation, and others.
* Forgot to change one A to L
* Replaced broken modindex link for module list
* fixed compliance with flake8
* Fixed typos, misplaced example, etc
* No new bullet and rectification
* remove modules index link
* add "respectively"
* fix rca examples
* fix rca examples again
1 parent f48a55d commit 1b40c3b

10 files changed (+114 / -66 lines)

README.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ metric-learn contains efficient Python implementations of several popular superv

 - For SDML, using skggm will allow the algorithm to solve problematic cases
   (install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
+  ``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
 - For running the examples only: matplotlib

 **Installation/Setup**
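For context, a minimal sketch (not part of this commit) of the SDML use case that the skggm dependency targets; the ``SDML_Supervised`` constructor argument ``num_constraints`` is assumed from the metric-learn API of this era and may differ between versions::

    # Hedged sketch: SDML uses skggm's solver when it is installed, and
    # otherwise falls back to scikit-learn's graphical lasso, which can
    # struggle on ill-conditioned problems.
    from metric_learn import SDML_Supervised
    from sklearn.datasets import load_iris

    iris_data = load_iris()
    X = iris_data['data']
    Y = iris_data['target']

    sdml = SDML_Supervised(num_constraints=200)  # assumed signature
    sdml.fit(X, Y)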

doc/getting_started.rst

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@ Run ``pip install metric-learn`` to download and install from PyPI.
 Alternately, download the source repository and run:

 - ``python setup.py install`` for default installation.
-- ``python setup.py test`` to run all tests.
+- ``pytest test`` to run all tests.

 **Dependencies**

@@ -21,6 +21,7 @@ Alternately, download the source repository and run:

 - For SDML, using skggm will allow the algorithm to solve problematic cases
   (install from commit `a0ed406 <https://github.com/skggm/skggm/commit/a0ed406586c4364ea3297a658f415e13b5cbdaf8>`_).
+  ``pip install 'git+https://github.com/skggm/skggm.git@a0ed406586c4364ea3297a658f415e13b5cbdaf8'`` to install the required version of skggm from GitHub.
 - For running the examples only: matplotlib

 Quick start
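As a rough orientation for the Quick start section touched above, a hedged sketch (not the documentation's own example) of the typical supervised workflow; the ``NCA`` constructor argument is an assumption about the API of this era::

    # Learn an NCA metric on iris and embed the data into the learned space.
    from metric_learn import NCA
    from sklearn.datasets import load_iris

    iris_data = load_iris()
    X = iris_data['data']
    Y = iris_data['target']

    nca = NCA(max_iter=1000)        # assumed constructor argument
    nca.fit(X, Y)
    X_embedded = nca.transform(X)   # data mapped by the learned transformation L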

doc/index.rst

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ Documentation outline

    auto_examples/index

-:ref:`genindex` | :ref:`modindex` | :ref:`search`
+:ref:`genindex` | :ref:`search`

 .. |Travis-CI Build Status| image:: https://api.travis-ci.org/scikit-learn-contrib/metric-learn.svg?branch=master
    :target: https://travis-ci.org/scikit-learn-contrib/metric-learn

doc/supervised.rst

Lines changed: 15 additions & 15 deletions
@@ -131,13 +131,13 @@ The distance is learned by solving the following optimization problem:
   c\sum_{i, j, l}\eta_{ij}(1-y_{ij})[1+||\mathbf{L(x_i-x_j)}||^2-||
   \mathbf{L(x_i-x_l)}||^2]_+)

-where :math:`\mathbf{x}_i` is an data point, :math:`\mathbf{x}_j` is one
-of its k nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
+where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one
+of its k-nearest neighbors sharing the same label, and :math:`\mathbf{x}_l`
 are all the other instances within that region with different labels,
 :math:`\eta_{ij}, y_{ij} \in \{0, 1\}` are both the indicators,
-:math:`\eta_{ij}` represents :math:`\mathbf{x}_{j}` is the k nearest
-neighbors(with same labels) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
-indicates :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different class,
+:math:`\eta_{ij}` represents :math:`\mathbf{x}_{j}` is the k-nearest
+neighbors (with same labels) of :math:`\mathbf{x}_{i}`, :math:`y_{ij}=0`
+indicates :math:`\mathbf{x}_{i}, \mathbf{x}_{j}` belong to different classes,
 :math:`[\cdot]_+=\max(0, \cdot)` is the Hinge loss.

 .. topic:: Example Code:
@@ -235,7 +235,7 @@ the sum of probability of being correctly classified:

 Local Fisher Discriminant Analysis (:py:class:`LFDA <metric_learn.LFDA>`)

-`LFDA` is a linear supervised dimensionality reduction method. It is
+`LFDA` is a linear supervised dimensionality reduction method which effectively combines the ideas of `Linear Discriminant Analysis <https://en.wikipedia.org/wiki/Linear_discriminant_analysis>` and Locality-Preserving Projection . It is
 particularly useful when dealing with multi-modality, where one ore more classes
 consist of separate clusters in input space. The core optimization problem of
 LFDA is solved as a generalized eigenvalue problem.
@@ -261,18 +261,18 @@ where
   \,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\

 here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
-matrix :math:`\mathbf{A}`:, which can be calculated with local scaling methods.
+matrix :math:`\mathbf{A}`:, which can be calculated with local scaling methods, `n` and `n_l` are the total number of points and the number of points per cluster `l` respectively.

 Then the learning problem becomes derive the LFDA transformation matrix
-:math:`\mathbf{T}_{LFDA}`:
+:math:`\mathbf{L}_{LFDA}`:

 .. math::

-  \mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
-  [\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
-  \mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
+  \mathbf{L}_{LFDA} = \arg\max_\mathbf{L}
+  [\text{tr}((\mathbf{L}^T\mathbf{S}^{(w)}
+  \mathbf{L})^{-1}\mathbf{L}^T\mathbf{S}^{(b)}\mathbf{L})]

-That is, it is looking for a transformation matrix :math:`\mathbf{T}` such that
+That is, it is looking for a transformation matrix :math:`\mathbf{L}` such that
 nearby data pairs in the same class are made close and the data pairs in
 different classes are separated from each other; far apart data pairs in the
 same class are not imposed to be close.
@@ -326,9 +326,9 @@ empirical development. The Gaussian kernel is denoted as:

 where :math:`d(\cdot, \cdot)` is the squared distance under some metrics,
 here in the fashion of Mahalanobis, it should be :math:`d(\mathbf{x}_i,
-\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||`, the transition
-matrix :math:`\mathbf{A}` is derived from the decomposition of Mahalanobis
-matrix :math:`\mathbf{M=A^TA}`.
+\mathbf{x}_j) = ||\mathbf{L}(\mathbf{x}_i - \mathbf{x}_j)||`, the transition
+matrix :math:`\mathbf{L}` is derived from the decomposition of Mahalanobis
+matrix :math:`\mathbf{M=L^TL}`.

 Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
 :math:`\sigma^2=1` for the sake of simplicity. Here we use the cumulative
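Since these hunks standardize the notation around the transformation matrix :math:`\mathbf{L}`, a hedged LFDA usage sketch (not part of the commit) may help connect the math to the API; the ``k`` and ``n_components`` arguments and the ``components_`` attribute are assumed from the metric-learn API of this era::

    # Fit LFDA on iris and recover the learned transformation matrix L_{LFDA}.
    from metric_learn import LFDA
    from sklearn.datasets import load_iris

    iris_data = load_iris()
    X = iris_data['data']
    Y = iris_data['target']

    lfda = LFDA(k=2, n_components=2)   # assumed constructor arguments
    lfda.fit(X, Y)
    L = lfda.components_               # the learned transformation matrix
    X_embedded = lfda.transform(X)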

doc/weakly_supervised.rst

Lines changed: 17 additions & 20 deletions
@@ -367,36 +367,36 @@ other methods, `ITML` does not rely on an eigenvalue computation or
 semi-definite programming.


-Given a Mahalanobis distance parameterized by :math:`A`, its corresponding
+Given a Mahalanobis distance parameterized by :math:`M`, its corresponding
 multivariate Gaussian is denoted as:

 .. math::
-  p(\mathbf{x}; \mathbf{A}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{A}
+  p(\mathbf{x}; \mathbf{M}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{M}
   (\mathbf{x}, \mu))
-  = \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{A}
+  = \frac{1}{Z}\exp(-\frac{1}{2}((\mathbf{x} - \mu)^T\mathbf{M}
   (\mathbf{x} - \mu))

 where :math:`Z` is the normalization constant, the inverse of Mahalanobis
-matrix :math:`\mathbf{A}^{-1}` is the covariance of the Gaussian.
+matrix :math:`\mathbf{M}^{-1}` is the covariance of the Gaussian.

 Given pairs of similar points :math:`S` and pairs of dissimilar points
 :math:`D`, the distance metric learning problem is to minimize the LogDet
 divergence, which is equivalent as minimizing :math:`\textbf{KL}(p(\mathbf{x};
-\mathbf{A}_0) || p(\mathbf{x}; \mathbf{A}))`:
+\mathbf{M}_0) || p(\mathbf{x}; \mathbf{M}))`:

 .. math::

-  \min_\mathbf{A} D_{\ell \mathrm{d}}\left(A, A_{0}\right) =
-  \operatorname{tr}\left(A A_{0}^{-1}\right)-\log \operatorname{det}
-  \left(A A_{0}^{-1}\right)-n\\
-  \text{subject to } \quad d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j)
+  \min_\mathbf{A} D_{\ell \mathrm{d}}\left(M, M_{0}\right) =
+  \operatorname{tr}\left(M M_{0}^{-1}\right)-\log \operatorname{det}
+  \left(M M_{0}^{-1}\right)-n\\
+  \text{subject to } \quad d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j)
   \leq u \qquad (\mathbf{x}_i, \mathbf{x}_j)\in S \\
-  d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
+  d_\mathbf{M}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
   \mathbf{x}_j)\in D


 where :math:`u` and :math:`l` is the upper and the lower bound of distance
-for similar and dissimilar pairs respectively, and :math:`\mathbf{A}_0`
+for similar and dissimilar pairs respectively, and :math:`\mathbf{M}_0`
 is the prior distance metric, set to identity matrix by default,
 :math:`D_{\ell \mathrm{d}}(\cdot)` is the log determinant.

@@ -518,17 +518,14 @@ as the Mahalanobis matrix.

   from metric_learn import RCA

-  pairs = [[[1.2, 7.5], [1.3, 1.5]],
-           [[6.4, 2.6], [6.2, 9.7]],
-           [[1.3, 4.5], [3.2, 4.6]],
-           [[6.2, 5.5], [5.4, 5.4]]]
-  y = [1, 1, -1, -1]
-
-  # in this task we want points where the first feature is close to be closer
-  # to each other, no matter how close the second feature is
+  X = [[-0.05, 3.0],[0.05, -3.0],
+       [0.1, -3.55],[-0.1, 3.55],
+       [-0.95, -0.05],[0.95, 0.05],
+       [0.4, 0.05],[-0.4, -0.05]]
+  chunks = [0, 0, 1, 1, 2, 2, 3, 3]

   rca = RCA()
-  rca.fit(pairs, y)
+  rca.fit(X, chunks)

 .. topic:: References:

examples/plot_metric_learning_examples.py

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ def plot_tsne(X, y, colormap=plt.cm.Paired):
 #
 # ITML uses a regularizer that automatically enforces a Semi-Definite
 # Positive Matrix condition - the LogDet divergence. It uses soft
-# must-link or cannot like constraints, and a simple algorithm based on
+# must-link or cannot-link constraints, and a simple algorithm based on
 # Bregman projections. Unlike LMNN, ITML will implicitly enforce points from
 # the same class to belong to the same cluster, as you can see below.
 #

metric_learn/itml.py

Lines changed: 20 additions & 7 deletions
@@ -198,13 +198,16 @@ class ITML(_BaseITML, _PairsClassifierMixin):

 Examples
 --------
->>> from metric_learn import ITML_Supervised
->>> from sklearn.datasets import load_iris
->>> iris_data = load_iris()
->>> X = iris_data['data']
->>> Y = iris_data['target']
->>> itml = ITML_Supervised(num_constraints=200)
->>> itml.fit(X, Y)
+>>> from metric_learn import ITML
+>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
+>>>          [[6.4, 2.6], [6.2, 9.7]],
+>>>          [[1.3, 4.5], [3.2, 4.6]],
+>>>          [[6.2, 5.5], [5.4, 5.4]]]
+>>> y = [1, 1, -1, -1]
+>>> # in this task we want points where the first feature is close to be
+>>> # closer to each other, no matter how close the second feature is
+>>> itml = ITML()
+>>> itml.fit(pairs, y)

 References
 ----------
@@ -335,6 +338,16 @@ class ITML_Supervised(_BaseITML, TransformerMixin):
 The linear transformation ``L`` deduced from the learned Mahalanobis
 metric (See function `components_from_metric`.)

+Examples
+--------
+>>> from metric_learn import ITML_Supervised
+>>> from sklearn.datasets import load_iris
+>>> iris_data = load_iris()
+>>> X = iris_data['data']
+>>> Y = iris_data['target']
+>>> itml = ITML_Supervised(num_constraints=200)
+>>> itml.fit(X, Y)
+
 See Also
 --------
 metric_learn.ITML : The original weakly-supervised algorithm
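A hedged follow-up sketch (not part of the commit) showing how the fitted pairs classifier from the new ``ITML`` example can be used on unseen pairs, assuming the ``_PairsClassifierMixin`` API of this era::

    new_pairs = [[[1.0, 7.0], [1.1, 1.0]]]   # hypothetical test pair
    itml.predict(new_pairs)                  # +1 (similar) or -1 (dissimilar)
    itml.decision_function(new_pairs)        # signed score relative to the learned threshold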

metric_learn/lsml.py

Lines changed: 19 additions & 7 deletions
@@ -186,13 +186,15 @@ class LSML(_BaseLSML, _QuadrupletsClassifierMixin):

 Examples
 --------
->>> from metric_learn import LSML_Supervised
->>> from sklearn.datasets import load_iris
->>> iris_data = load_iris()
->>> X = iris_data['data']
->>> Y = iris_data['target']
->>> lsml = LSML_Supervised(num_constraints=200)
->>> lsml.fit(X, Y)
+>>> from metric_learn import LSML
+>>> quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
+>>>                [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
+>>>                [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
+>>>                [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
+>>> # we want to make closer points where the first feature is close, and
+>>> # further if the second feature is close
+>>> lsml = LSML()
+>>> lsml.fit(quadruplets)

 References
 ----------
@@ -290,6 +292,16 @@ class LSML_Supervised(_BaseLSML, TransformerMixin):
 prior. In any case, `random_state` is also used to randomly sample
 constraints from labels.

+Examples
+--------
+>>> from metric_learn import LSML_Supervised
+>>> from sklearn.datasets import load_iris
+>>> iris_data = load_iris()
+>>> X = iris_data['data']
+>>> Y = iris_data['target']
+>>> lsml = LSML_Supervised(num_constraints=200)
+>>> lsml.fit(X, Y)
+
 Attributes
 ----------
 n_iter_ : `int`
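A hedged note on the new ``LSML`` example above (not part of the commit): each quadruplet ``[a, b, c, d]`` encodes the constraint that ``a`` and ``b`` should end up closer than ``c`` and ``d``. Assuming the quadruplets-classifier API of this era, new quadruplets can then be checked against the learned metric::

    test_quadruplets = [[[1.0, 7.0], [1.1, 1.2], [6.0, 2.0], [6.1, 9.0]]]  # hypothetical
    lsml.predict(test_quadruplets)   # +1 if the first pair is predicted closer, else -1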

metric_learn/mmc.py

Lines changed: 20 additions & 7 deletions
@@ -426,13 +426,16 @@ class MMC(_BaseMMC, _PairsClassifierMixin):

 Examples
 --------
->>> from metric_learn import MMC_Supervised
->>> from sklearn.datasets import load_iris
->>> iris_data = load_iris()
->>> X = iris_data['data']
->>> Y = iris_data['target']
->>> mmc = MMC_Supervised(num_constraints=200)
->>> mmc.fit(X, Y)
+>>> from metric_learn import MMC
+>>> pairs = [[[1.2, 7.5], [1.3, 1.5]],
+>>>          [[6.4, 2.6], [6.2, 9.7]],
+>>>          [[1.3, 4.5], [3.2, 4.6]],
+>>>          [[6.2, 5.5], [5.4, 5.4]]]
+>>> y = [1, 1, -1, -1]
+>>> # in this task we want points where the first feature is close to be
+>>> # closer to each other, no matter how close the second feature is
+>>> mmc = MMC()
+>>> mmc.fit(pairs, y)

 References
 ----------
@@ -552,6 +555,16 @@ class MMC_Supervised(_BaseMMC, TransformerMixin):
 samples, and pairs of dissimilar samples by taking different class
 samples. It then passes these pairs to `MMC` for training.

+Examples
+--------
+>>> from metric_learn import MMC_Supervised
+>>> from sklearn.datasets import load_iris
+>>> iris_data = load_iris()
+>>> X = iris_data['data']
+>>> Y = iris_data['target']
+>>> mmc = MMC_Supervised(num_constraints=200)
+>>> mmc.fit(X, Y)
+
 Attributes
 ----------
 n_iter_ : `int`

metric_learn/rca.py

Lines changed: 18 additions & 7 deletions
@@ -62,13 +62,14 @@ class RCA(MahalanobisMixin, TransformerMixin):

 Examples
 --------
->>> from metric_learn import RCA_Supervised
->>> from sklearn.datasets import load_iris
->>> iris_data = load_iris()
->>> X = iris_data['data']
->>> Y = iris_data['target']
->>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
->>> rca.fit(X, Y)
+>>> from metric_learn import RCA
+>>> X = [[-0.05, 3.0],[0.05, -3.0],
+>>>      [0.1, -3.55],[-0.1, 3.55],
+>>>      [-0.95, -0.05],[0.95, 0.05],
+>>>      [0.4, 0.05],[-0.4, -0.05]]
+>>> chunks = [0, 0, 1, 1, 2, 2, 3, 3]
+>>> rca = RCA()
+>>> rca.fit(X, chunks)

 References
 ------------------
@@ -196,6 +197,16 @@ class RCA_Supervised(RCA):
 A pseudo random number generator object or a seed for it if int.
 It is used to randomly sample constraints from labels.

+Examples
+--------
+>>> from metric_learn import RCA_Supervised
+>>> from sklearn.datasets import load_iris
+>>> iris_data = load_iris()
+>>> X = iris_data['data']
+>>> Y = iris_data['target']
+>>> rca = RCA_Supervised(num_chunks=30, chunk_size=2)
+>>> rca.fit(X, Y)
+
 Attributes
 ----------
 components_ : `numpy.ndarray`, shape=(n_components, n_features)
