From 61856b31aa3458a86e277317b5ee1ae5548aa738 Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Wed, 21 Nov 2018 17:01:14 +0100
Subject: [PATCH 01/32] Create some text to initialize the PR

---
 doc/metric_learn.preprocessor.rst | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 doc/metric_learn.preprocessor.rst

diff --git a/doc/metric_learn.preprocessor.rst b/doc/metric_learn.preprocessor.rst
new file mode 100644
index 00000000..96cdbf46
--- /dev/null
+++ b/doc/metric_learn.preprocessor.rst
@@ -0,0 +1 @@
+Just some text to initialize the PR
\ No newline at end of file

From d5dd5178ad416f33dedb365037481cfe2ca9e211 Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Mon, 3 Dec 2018 11:53:49 +0100
Subject: [PATCH 02/32] DOC: add doc outline

---
 doc/outline.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 doc/outline.md

diff --git a/doc/outline.md b/doc/outline.md
new file mode 100644
index 00000000..301bfb38
--- /dev/null
+++ b/doc/outline.md
@@ -0,0 +1,35 @@
+documentation outline:
+
+
+- Getting started/Quick Start:
+
+  - Explanation of what metric learning is, and what is the purpose of this package
+  - installation
+  - a very quick example on how to import an algo (supervised or not ?) and how to do fit (and predict ?) (and split train and test) on some custom dataset (maybe sklearn.datasets.load_lfw_pairs ?)
+
+- User Guide/List of algorithms:
+
+  - Supervised Metric Learning: (add links to examples/images from examples at the right place in the description)
+    - Problem setting
+    - Input data (+ see Preprocessor section)
+    - What you can do after fit (transform...)
+
+  - Weakly Supervised Metric Learning: (add links to examples/images from examples at the right place in the description)
+    - Problem setting
+    - Input data (+ See Preprocessor section)
+    - What you can do after fit (predict/score, transform...)
+    - Scikit-learn compatibility (compatible with grid search + link to example of grid search)
+
+  - Usage of the Preprocessor:
+    - Purpose (performance)
+    - Use (as an argument "preprocessor" in every metric learner)
+
+
+- Examples/Tutorials:
+  - One example with faces (prediction if same/different person)
+  - One example of grid search to compare different algorithms (mmc, itml etc)
+  - Clustering with side information
+  - Instance retrieval
+
+- API:
+  - doc automatically generated by docstrings
\ No newline at end of file

From b54ee3496123f6b065dd8769df67798c22f0c544 Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Mon, 3 Dec 2018 11:59:59 +0100
Subject: [PATCH 03/32] DOC: Add data visualisation to possible examples

---
 doc/outline.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/outline.md b/doc/outline.md
index 301bfb38..8f20a39c 100644
--- a/doc/outline.md
+++ b/doc/outline.md
@@ -30,6 +30,7 @@ documentation outline:
   - One example of grid search to compare different algorithms (mmc, itml etc)
   - Clustering with side information
   - Instance retrieval
+  - Data visualisation
 
 - API:
   - doc automatically generated by docstrings
\ No newline at end of file

From 7495b68a7031b3f755b68cb44d27662bb52c56aa Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Tue, 11 Dec 2018 15:51:27 +0100
Subject: [PATCH 04/32] Update documentation outline

---
 doc/outline.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/doc/outline.md b/doc/outline.md
index 8f20a39c..afa372cc 100644
--- a/doc/outline.md
+++ b/doc/outline.md
@@ -13,12 +13,21 @@ documentation outline:
     - Problem setting
     - Input data (+ see Preprocessor section)
     - What you can do after fit (transform...)
+    - Scikit-learn compatibility (compatible with grid search + link to example of grid search)
+    - List of algorithms + a more detailed description of each of them than
+      the one in the docstring
 
   - Weakly Supervised Metric Learning: (add links to examples/images from examples at the right place in the description)
     - Problem setting
    - Input data (+ See Preprocessor section)
    - What you can do after fit (predict/score, transform...)
    - Scikit-learn compatibility (compatible with grid search + link to example of grid search)
+      (more detailed than for supervised because more complicated)
+    - List of algorithms + a more detailed description of each of them than
+      the one in the docstring
+
+  - Somewhere: some section explaining Mahalanobis Metric Learning
+    (properties of the learned matrix etc)
 
   - Usage of the Preprocessor:
     - Purpose (performance)
     - Use (as an argument "preprocessor" in every metric learner)
@@ -30,7 +39,7 @@ documentation outline:
   - One example of grid search to compare different algorithms (mmc, itml etc)
   - Clustering with side information
   - Instance retrieval
-  - Data visualisation
+  - Dimensionality reduction
 
 - API:
   - doc automatically generated by docstrings
\ No newline at end of file

From 18325cde2d8a84ecfcb87490034d5984fdbacccf Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Thu, 13 Dec 2018 11:27:36 +0100
Subject: [PATCH 05/32] Add doc from master

---
 doc/conf.py                       |  2 +-
 doc/index.rst                     | 35 ++++++++++++++++++++++++++-----
 doc/metric_learn.covariance.rst   |  1 +
 doc/metric_learn.itml.rst         |  1 +
 doc/metric_learn.lfda.rst         |  1 +
 doc/metric_learn.lmnn.rst         |  1 +
 doc/metric_learn.lsml.rst         |  1 +
 doc/metric_learn.mlkr.rst         |  1 +
 doc/metric_learn.mmc.rst          |  1 +
 doc/metric_learn.nca.rst          |  1 +
 doc/metric_learn.preprocessor.rst |  1 -
 doc/metric_learn.rca.rst          |  1 +
 doc/metric_learn.sdml.rst         |  1 +
 doc/outline.md => outline.md      |  0
 14 files changed, 41 insertions(+), 7 deletions(-)
 delete mode 100644 doc/metric_learn.preprocessor.rst
 rename doc/outline.md => outline.md (100%)

diff --git a/doc/conf.py b/doc/conf.py
index 1c8beeab..dff9ce47 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -15,7 +15,7 @@
 
 # General information about the project.
 project = u'metric-learn'
-copyright = u'2015-2017, CJ Carey and Yuan Tang'
+copyright = u'2015-2018, CJ Carey and Yuan Tang'
 author = u'CJ Carey and Yuan Tang'
 version = '0.4.0'
 release = '0.4.0'
diff --git a/doc/index.rst b/doc/index.rst
index f50781fe..36a6e80c 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -13,20 +13,45 @@ metrics.
 This package contains efficient Python implementations of several popular
 metric learning algorithms.
 
+Supervised Algorithms
+---------------------
+Supervised metric learning algorithms take as inputs points `X` and target
+labels `y`, and learn a distance matrix that makes points from the same class
+(for classification) or with close target value (for regression) close to
+each other, and points from different classes or with distant target values
+far away from each other.
+
 .. toctree::
-   :caption: Algorithms
    :maxdepth: 1
 
    metric_learn.covariance
   metric_learn.lmnn
-   metric_learn.itml
-   metric_learn.sdml
-   metric_learn.lsml
   metric_learn.nca
   metric_learn.lfda
+   metric_learn.mlkr
+
+Weakly-Supervised Algorithms
+----------------------------
+Weakly supervised algorithms work on weaker information about the data points
+than supervised algorithms. Rather than labeled points, they take as input
+similarity judgments on tuples of data points, for instance pairs of similar
+and dissimilar points. Refer to the documentation of each algorithm for its
+particular form of input data.
+
+.. toctree::
+   :maxdepth: 1
+
+   metric_learn.itml
+   metric_learn.lsml
+   metric_learn.sdml
   metric_learn.rca
+   metric_learn.mmc
+
+Note that each weakly-supervised algorithm has a supervised version of the form
+`*_Supervised` where similarity constraints are generated from
+the labels information and passed to the underlying algorithm.
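+
+For instance, here is a minimal sketch of how one of these supervised
+versions might be used (a sketch only, assuming the iris dataset and the
+default way of generating constraints from the labels):
+
+>>> from metric_learn import ITML_Supervised
+>>> from sklearn.datasets import load_iris
+>>>
+>>> X, y = load_iris(return_X_y=True)
+>>> itml = ITML_Supervised(num_constraints=200)
+>>> itml.fit(X, y)  # similarity constraints are generated from y internally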
-Each metric supports the following methods: +Each metric learning algorithm supports the following methods: - ``fit(...)``, which learns the model. - ``transformer()``, which returns a transformation matrix diff --git a/doc/metric_learn.covariance.rst b/doc/metric_learn.covariance.rst index 92326cc0..493878c1 100644 --- a/doc/metric_learn.covariance.rst +++ b/doc/metric_learn.covariance.rst @@ -6,6 +6,7 @@ Covariance metric (baseline method) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.itml.rst b/doc/metric_learn.itml.rst index d6fb2221..addb4c76 100644 --- a/doc/metric_learn.itml.rst +++ b/doc/metric_learn.itml.rst @@ -6,6 +6,7 @@ Information Theoretic Metric Learning (ITML) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.lfda.rst b/doc/metric_learn.lfda.rst index 95cde90d..41088a68 100644 --- a/doc/metric_learn.lfda.rst +++ b/doc/metric_learn.lfda.rst @@ -6,6 +6,7 @@ Local Fisher Discriminant Analysis (LFDA) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.lmnn.rst b/doc/metric_learn.lmnn.rst index 4062bfa0..bc65161e 100644 --- a/doc/metric_learn.lmnn.rst +++ b/doc/metric_learn.lmnn.rst @@ -6,6 +6,7 @@ Large Margin Nearest Neighbor (LMNN) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.lsml.rst b/doc/metric_learn.lsml.rst index c6c8ede9..0deae4e6 100644 --- a/doc/metric_learn.lsml.rst +++ b/doc/metric_learn.lsml.rst @@ -6,6 +6,7 @@ Least Squares Metric Learning (LSML) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.mlkr.rst b/doc/metric_learn.mlkr.rst index a2f36c4f..f71697de 100644 --- a/doc/metric_learn.mlkr.rst +++ b/doc/metric_learn.mlkr.rst @@ -6,6 +6,7 @@ Metric Learning for Kernel Regression (MLKR) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.mmc.rst b/doc/metric_learn.mmc.rst index f3ddaa9e..bb9031ba 100644 --- a/doc/metric_learn.mmc.rst +++ b/doc/metric_learn.mmc.rst @@ -6,6 +6,7 @@ Mahalanobis Metric Learning for Clustering (MMC) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.nca.rst b/doc/metric_learn.nca.rst index 6a2675e5..7a4ee2c4 100644 --- a/doc/metric_learn.nca.rst +++ b/doc/metric_learn.nca.rst @@ -6,6 +6,7 @@ Neighborhood Components Analysis (NCA) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.preprocessor.rst b/doc/metric_learn.preprocessor.rst deleted file mode 100644 index 96cdbf46..00000000 --- a/doc/metric_learn.preprocessor.rst +++ /dev/null @@ -1 +0,0 @@ -Just some text to initialize the PR \ No newline at end of file diff --git a/doc/metric_learn.rca.rst b/doc/metric_learn.rca.rst index 2430cd82..027d583b 100644 --- a/doc/metric_learn.rca.rst +++ b/doc/metric_learn.rca.rst @@ -6,6 +6,7 @@ Relative Components Analysis (RCA) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/metric_learn.sdml.rst b/doc/metric_learn.sdml.rst index 83570483..3e350a70 100644 --- 
a/doc/metric_learn.sdml.rst +++ b/doc/metric_learn.sdml.rst @@ -6,6 +6,7 @@ Sparse Determinant Metric Learning (SDML) :undoc-members: :inherited-members: :show-inheritance: + :special-members: __init__ Example Code ------------ diff --git a/doc/outline.md b/outline.md similarity index 100% rename from doc/outline.md rename to outline.md From 0ddaee3fe24ab9d7d187d45815e64f87d59da45a Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Mon, 17 Dec 2018 09:56:25 +0100 Subject: [PATCH 06/32] DOC: add beginning of doc tree --- doc/auto_examples/index.rst | 0 doc/auto_examples/test.py | 2 ++ doc/getting_started.rst | 6 +++++ doc/index.rst | 48 ++++++++++++++++++++++++------------- doc/introduction.rst | 5 ++++ doc/mahalanobis.rst | 3 +++ doc/preprocessor.rst | 5 ++++ doc/supervised.rst | 17 +++++++++++++ doc/user_guide.rst | 16 +++++++++++++ doc/weakly_supervised.rst | 35 +++++++++++++++++++++++++++ 10 files changed, 121 insertions(+), 16 deletions(-) create mode 100644 doc/auto_examples/index.rst create mode 100644 doc/auto_examples/test.py create mode 100644 doc/getting_started.rst create mode 100644 doc/introduction.rst create mode 100644 doc/mahalanobis.rst create mode 100644 doc/preprocessor.rst create mode 100644 doc/supervised.rst create mode 100644 doc/user_guide.rst create mode 100644 doc/weakly_supervised.rst diff --git a/doc/auto_examples/index.rst b/doc/auto_examples/index.rst new file mode 100644 index 00000000..e69de29b diff --git a/doc/auto_examples/test.py b/doc/auto_examples/test.py new file mode 100644 index 00000000..392fb27a --- /dev/null +++ b/doc/auto_examples/test.py @@ -0,0 +1,2 @@ +################################### +# This is an example of python code \ No newline at end of file diff --git a/doc/getting_started.rst b/doc/getting_started.rst new file mode 100644 index 00000000..b915f968 --- /dev/null +++ b/doc/getting_started.rst @@ -0,0 +1,6 @@ +############### +Getting started +############### + + .. note:: Put some getting started content here (installation, and some + very simple example) \ No newline at end of file diff --git a/doc/index.rst b/doc/index.rst index 36a6e80c..8c2ec1f0 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -21,14 +21,11 @@ labels `y`, and learn a distance matrix that make points from the same class each other, and points from different classes or with distant target values far away from each other. -.. toctree:: - :maxdepth: 1 - - metric_learn.covariance - metric_learn.lmnn - metric_learn.nca - metric_learn.lfda - metric_learn.mlkr +- `Covariance `_ +- `LMNN `_ +- `NCA `_ +- `LFDA `_ +- `MLKR `_ Weakly-Supervised Algorithms -------------------------- @@ -38,14 +35,11 @@ similarity judgments on tuples of data points, for instance pairs of similar and dissimilar points. Refer to the documentation of each algorithm for its particular form of input data. -.. toctree:: - :maxdepth: 1 - - metric_learn.itml - metric_learn.lsml - metric_learn.sdml - metric_learn.rca - metric_learn.mmc +- `ITML `_ +- `LSML `_ +- `SDML `_ +- `RCA `_ +- `MMC `_ Note that each weakly-supervised algorithm has a supervised version of the form `*_Supervised` where similarity constraints are generated from @@ -91,6 +85,20 @@ more complete. Navigation ---------- + +.. toctree:: + :maxdepth: 2 + :hidden: + + getting_started + +.. toctree:: + :maxdepth: 2 + :hidden: + :caption: User Guide + + user_guide + :ref:`genindex` | :ref:`modindex` | :ref:`search` .. toctree:: @@ -99,6 +107,14 @@ Navigation Package Overview +.. 
toctree:: + :maxdepth: 2 + :hidden: + :caption: Tutorial - Examples + + auto_examples/index + + .. |PyPI version| image:: https://badge.fury.io/py/metric-learn.svg :target: http://badge.fury.io/py/metric-learn .. |License| image:: http://img.shields.io/:license-mit-blue.svg?style=flat diff --git a/doc/introduction.rst b/doc/introduction.rst new file mode 100644 index 00000000..15fc9a12 --- /dev/null +++ b/doc/introduction.rst @@ -0,0 +1,5 @@ +============ +Introduction +============ + +.. note:: put some introduction here \ No newline at end of file diff --git a/doc/mahalanobis.rst b/doc/mahalanobis.rst new file mode 100644 index 00000000..fab8901e --- /dev/null +++ b/doc/mahalanobis.rst @@ -0,0 +1,3 @@ +=========================== +Mahalanobis Metric Learning +=========================== \ No newline at end of file diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst new file mode 100644 index 00000000..65c8b570 --- /dev/null +++ b/doc/preprocessor.rst @@ -0,0 +1,5 @@ +============ +Preprocessor +============ + +.. note:: explain the preprocessor here \ No newline at end of file diff --git a/doc/supervised.rst b/doc/supervised.rst new file mode 100644 index 00000000..80236bbf --- /dev/null +++ b/doc/supervised.rst @@ -0,0 +1,17 @@ +========================== +Supervised Metric Learning +========================== + +Problem Setting +=============== + +Input data +========== + +Machine Learning pipeline +========================= + +.. note:: Everything about training, predicting etc + +List of algorithms +================== diff --git a/doc/user_guide.rst b/doc/user_guide.rst new file mode 100644 index 00000000..a55f0768 --- /dev/null +++ b/doc/user_guide.rst @@ -0,0 +1,16 @@ +.. title:: User guide: contents + +.. _user_guide: + +========== +User Guide +========== + +.. toctree:: + :numbered: + + introduction.rst + supervised.rst + weakly_supervised.rst + mahalanobis.rst + preprocessor.rst \ No newline at end of file diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst new file mode 100644 index 00000000..8bb4b2f9 --- /dev/null +++ b/doc/weakly_supervised.rst @@ -0,0 +1,35 @@ +================================= +Weakly Supervised Metric Learning +================================= + +Problem Setting +=============== + +Input data +========== + +Machine Learning pipeline +========================= + +.. note:: Everything about training, predicting etc + +List of algorithms +================== + +1. ITML +------- + +Some description about :class:`metric_learn.itml.ITML` + + +2. LSML +------- + +3. SDML +------- + +4. RCA +------ + +5. 
MMC
+------

From e99865245a5a03fde37b5ef96122902a8c5247aa Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Mon, 17 Dec 2018 14:52:21 +0100
Subject: [PATCH 07/32] DOC: add some beginning of example to get started with
 the examples section using sphinx-gallery

---
 doc/auto_examples/test.py |  2 --
 doc/conf.py               |  1 +
 examples/README.txt       |  4 ++++
 examples/plot_lfw.py      | 44 +++++++++++++++++++++++++++++++++++++++
 examples/sandwich.py      |  4 ++++
 5 files changed, 53 insertions(+), 2 deletions(-)
 delete mode 100644 doc/auto_examples/test.py
 create mode 100644 examples/README.txt
 create mode 100644 examples/plot_lfw.py

diff --git a/doc/auto_examples/test.py b/doc/auto_examples/test.py
deleted file mode 100644
index 392fb27a..00000000
--- a/doc/auto_examples/test.py
+++ /dev/null
@@ -1,2 +0,0 @@
-###################################
-# This is an example of python code
\ No newline at end of file
diff --git a/doc/conf.py b/doc/conf.py
index dff9ce47..f65411eb 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -7,6 +7,7 @@
   'sphinx.ext.viewcode',
   'sphinx.ext.mathjax',
   'numpydoc',
+  'sphinx_gallery.gen_gallery'
 ]
 
 templates_path = ['_templates']
diff --git a/examples/README.txt b/examples/README.txt
new file mode 100644
index 00000000..9497791a
--- /dev/null
+++ b/examples/README.txt
@@ -0,0 +1,4 @@
+Examples
+========
+
+Below is a gallery of examples of metric-learn use cases.
\ No newline at end of file
diff --git a/examples/plot_lfw.py b/examples/plot_lfw.py
new file mode 100644
index 00000000..7489bba6
--- /dev/null
+++ b/examples/plot_lfw.py
@@ -0,0 +1,44 @@
+# -*- coding: utf-8 -*-
+"""
+Learning on pairs
+=================
+"""
+
+##################################################################################
+# Let's import a dataset of pairs of images from scikit-learn.
+
+from sklearn.datasets import fetch_lfw_pairs
+from sklearn.utils import shuffle
+
+dataset = fetch_lfw_pairs()
+pairs, y = shuffle(dataset.pairs, dataset.target, random_state=42)
+y = 2*y - 1  # we want +1 to indicate similar pairs and -1 dissimilar pairs
+
+######################################################################################
+# Let's print a pair of dissimilar points:
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+label = -1
+first_pair_idx = np.where(y==label)[0][0]
+fig, ax = plt.subplots(ncols=2, nrows=1)
+for i, img in enumerate(pairs[first_pair_idx]):
+    ax[i].imshow(img, cmap='Greys_r')
+fig.suptitle('Pair n°{}, Label: {}\n\n'.format(first_pair_idx, label))
+######################################################################################
+# Now let's print a pair of similar points:
+
+label = 1
+first_pair_idx = np.where(y==label)[0][0]
+fig, ax = plt.subplots(ncols=2, nrows=1)
+for i, img in enumerate(pairs[first_pair_idx]):
+    ax[i].imshow(img, cmap='Greys_r')
+fig.suptitle('Pair n°{}, Label: {}\n\n'.format(first_pair_idx, label))
+###############################################################################
+# Let's reshape the dataset so that it is indeed a 3D array of size ``(n_tuples, 2, n_features)``,
+# and print the first three elements
+
+
+pairs = pairs.reshape(pairs.shape[0], 2, -1)
+print(pairs[:3])
\ No newline at end of file
diff --git a/examples/sandwich.py b/examples/sandwich.py
index 08ec17c5..0e7658d3 100644
--- a/examples/sandwich.py
+++ b/examples/sandwich.py
@@ -1,4 +1,8 @@
+# -*- coding: utf-8 -*-
 """
+Sandwich demo
+=============
+
 Sandwich demo based on code from http://nbviewer.ipython.org/6576096
 """
 

From 41b91822d5d4fec63571f9b9718370ee59867a6c Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Mon, 17 Dec 2018 14:54:03 +0100
Subject: [PATCH 08/32] DOC: modify gitignore to ignore auto_examples

---
 .gitignore                  | 1 +
 doc/auto_examples/index.rst | 0
 2 files changed, 1 insertion(+)
 delete mode 100644 doc/auto_examples/index.rst

diff --git a/.gitignore b/.gitignore
index 4c81e9fa..c532a6cb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,3 +5,4 @@ dist/
 .coverage
 htmlcov/
 .cache/
+doc/auto_examples/*
diff --git a/doc/auto_examples/index.rst b/doc/auto_examples/index.rst
deleted file mode 100644
index e69de29b..00000000

From 0adb3c0305b71b0d62df6c8b77bc243910b8846a Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Wed, 19 Dec 2018 12:01:20 +0100
Subject: [PATCH 09/32] WIP: add preprocessor section and some section about
 weakly supervised learners and copy the previous content to the different
 sections

---
 doc/conf.py               |   3 +
 doc/getting_started.rst   |  36 ++++-
 doc/index.rst             |  96 +------------
 doc/introduction.rst      |  25 +++-
 doc/mahalanobis.rst       |   3 -
 doc/preprocessor.rst      | 127 +++++++++++++++-
 doc/supervised.rst        | 170 ++++++++++++++++++++--
 doc/user_guide.rst        |   1 -
 doc/weakly_supervised.rst | 295 +++++++++++++++++++++++++++++++++++++-
 examples/plot_lfw.py      |  44 ------
 10 files changed, 641 insertions(+), 159 deletions(-)
 delete mode 100644 doc/mahalanobis.rst
 delete mode 100644 examples/plot_lfw.py

diff --git a/doc/conf.py b/doc/conf.py
index f65411eb..ed476edd 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -32,3 +32,6 @@
 html_static_path = ['_static']
 htmlhelp_basename = 'metric-learndoc'
 
+# Option to only need single backticks to refer to symbols
+default_role = 'any'
+
diff --git a/doc/getting_started.rst b/doc/getting_started.rst
index b915f968..30a645de 100644
--- a/doc/getting_started.rst
+++
b/doc/getting_started.rst @@ -2,5 +2,37 @@ Getting started ############### - .. note:: Put some getting started content here (installation, and some - very simple example) \ No newline at end of file +Installation and Setup +====================== + +Run ``pip install metric-learn`` to download and install from PyPI. + +Alternately, download the source repository and run: + +- ``python setup.py install`` for default installation. +- ``python setup.py test`` to run all tests. + +**Dependencies** + +- Python 2.7+, 3.4+ +- numpy, scipy, scikit-learn +- (for running the examples only: matplotlib) + +**Notes** + +If a recent version of the Shogun Python modular (``modshogun``) library +is available, the LMNN implementation will use the fast C++ version from +there. The two implementations differ slightly, and the C++ version is +more complete. + + +Quick start +=========== + +>>> from metric_learn import NCA +>>> from sklearn.datasets import load_iris +>>> from sklearn.model_selection import cross_val_score +>>> +>>> X, y = load_iris(return_X_y=True) +>>> nca = NCA(n_components=2) +>>> cross_val_score(nca, X, y) \ No newline at end of file diff --git a/doc/index.rst b/doc/index.rst index 8c2ec1f0..baedb26d 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -2,118 +2,30 @@ metric-learn: Metric Learning in Python ======================================= |License| |PyPI version| -Distance metrics are widely used in the machine learning literature. -Traditionally, practicioners would choose a standard distance metric -(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of -the domain. -Distance metric learning (or simply, metric learning) is the sub-field of -machine learning dedicated to automatically constructing optimal distance -metrics. - -This package contains efficient Python implementations of several popular -metric learning algorithms. - -Supervised Algorithms ---------------------- -Supervised metric learning algorithms take as inputs points `X` and target -labels `y`, and learn a distance matrix that make points from the same class -(for classification) or with close target value (for regression) close to -each other, and points from different classes or with distant target values -far away from each other. - -- `Covariance `_ -- `LMNN `_ -- `NCA `_ -- `LFDA `_ -- `MLKR `_ - -Weakly-Supervised Algorithms --------------------------- -Weakly supervised algorithms work on weaker information about the data points -than supervised algorithms. Rather than labeled points, they take as input -similarity judgments on tuples of data points, for instance pairs of similar -and dissimilar points. Refer to the documentation of each algorithm for its -particular form of input data. - -- `ITML `_ -- `LSML `_ -- `SDML `_ -- `RCA `_ -- `MMC `_ - -Note that each weakly-supervised algorithm has a supervised version of the form -`*_Supervised` where similarity constraints are generated from -the labels information and passed to the underlying algorithm. - -Each metric learning algorithm supports the following methods: - -- ``fit(...)``, which learns the model. -- ``transformer()``, which returns a transformation matrix - :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a - data matrix :math:`X \in \mathbb{R}^{n \times d}` to the - :math:`D`-dimensional learned metric space :math:`X L^{\top}`, - in which standard Euclidean distances may be used. -- ``transform(X)``, which applies the aforementioned transformation. 
-``metric()``, which returns a Mahalanobis matrix
-  :math:`M = L^{\top}L` such that distance between vectors ``x`` and
-  ``y`` can be computed as :math:`\left(x-y\right)M\left(x-y\right)`.
-
-
-Installation and Setup
-======================
-
-Run ``pip install metric-learn`` to download and install from PyPI.
-
-Alternately, download the source repository and run:
-
-- ``python setup.py install`` for default installation.
-- ``python setup.py test`` to run all tests.
-
-**Dependencies**
-
-- Python 2.7+, 3.4+
-- numpy, scipy, scikit-learn
-- (for running the examples only: matplotlib)
-
-**Notes**
-
-If a recent version of the Shogun Python modular (``modshogun``) library
-is available, the LMNN implementation will use the fast C++ version from
-there. The two implementations differ slightly, and the C++ version is
-more complete.
-
-Navigation
-----------
-
+Welcome to metric-learn's documentation!
+========================================
 .. toctree::
    :maxdepth: 2
-   :hidden:
 
    getting_started
 
 .. toctree::
    :maxdepth: 2
-   :hidden:
-   :caption: User Guide
 
    user_guide
 
-:ref:`genindex` | :ref:`modindex` | :ref:`search`
-
 .. toctree::
-   :maxdepth: 4
-   :hidden:
+   :maxdepth: 2
 
    Package Overview
 
 .. toctree::
    :maxdepth: 2
-   :hidden:
-   :caption: Tutorial - Examples
 
    auto_examples/index
 
+:ref:`genindex` | :ref:`modindex` | :ref:`search`
 
 .. |PyPI version| image:: https://badge.fury.io/py/metric-learn.svg
    :target: http://badge.fury.io/py/metric-learn
 .. |License| image:: http://img.shields.io/:license-mit-blue.svg?style=flat
diff --git a/doc/introduction.rst b/doc/introduction.rst
index 15fc9a12..67a83251 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -2,4 +2,27 @@
 Introduction
 ============
 
-.. note:: put some introduction here
\ No newline at end of file
+Distance metrics are widely used in the machine learning literature.
+Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
+the domain.
+Distance metric learning (or simply, metric learning) is the sub-field of
+machine learning dedicated to automatically constructing optimal distance
+metrics.
+
+This package contains efficient Python implementations of several popular
+metric learning algorithms.
+
+
+Each metric learning algorithm supports the following methods:
+
+- ``fit(...)``, which learns the model.
+- ``transformer()``, which returns a transformation matrix
+  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
+  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
+  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
+  in which standard Euclidean distances may be used.
+- ``transform(X)``, which applies the aforementioned transformation.
+- ``metric()``, which returns a Mahalanobis matrix
+  :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
+  ``y`` can be computed as :math:`\left(x-y\right)^{\top} M\left(x-y\right)`.
\ No newline at end of file
diff --git a/doc/mahalanobis.rst b/doc/mahalanobis.rst
deleted file mode 100644
index fab8901e..00000000
--- a/doc/mahalanobis.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-===========================
-Mahalanobis Metric Learning
-===========================
\ No newline at end of file
diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst
index 65c8b570..b1b891bb 100644
--- a/doc/preprocessor.rst
+++ b/doc/preprocessor.rst
@@ -1,5 +1,130 @@
+.. _preprocessor:
+
 ============
 Preprocessor
 ============
 
-..
note:: explain the preprocessor here \ No newline at end of file +Estimators in metric-learn all have a ``preprocessor`` option at instantiation. +Filling this argument allows them to take more compact input representation +when fitting, predicting etc... + +Two types of objects can be put in this argument: + +Array-like +---------- +You can specify ``preprocessor=X`` where ``X`` is an array-like containing the +dataset of points. In this case, the estimator will be able to take as +inputs an array-like of indices, replacing under the hood each index by the +corresponding sample. + + +Example with a supervised metric learner: + +>>> from metric_learn import NCA +>>> +>>> X = np.array([[-0.7 , -0.23], +>>> [-0.43, -0.49], +>>> [ 0.14, -0.37]]) # array of 3 samples of 2 features +>>> points_indices = np.array([2, 0, 1, 0]) +>>> y = np.array([1, 0, 1, 1]) +>>> +>>> nca = NCA(preprocessor=X) +>>> nca.fit(points_indices, y) +>>> # under the hood the algorithm will create +>>> # points = np.array([[ 0.14, -0.37], +>>> # [-0.7 , -0.23], +>>> # [-0.43, -0.49], +>>> # [ 0.14, -0.37]]) and fit on it + + +Example with a weakly supervised metric learner: + +>>> from metric_learn import MMC +>>> X = np.array([[-0.7 , -0.23], +>>> [-0.43, -0.49], +>>> [ 0.14, -0.37]]) # array of 3 samples of 2 features +>>> pairs_indices = np.array([[2, 0], [1, 0]]) +>>> y_pairs = np.array([1, -1]) +>>> +>>> mmc = MMC(preprocessor=X) +>>> mmc.fit(pairs_indices, y_pairs) +>>> # under the hood the algorithm will create +>>> # pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]], +>>> # [[-0.43, -0.49], [-0.7 , -0.23]]]) and fit on it + +Callable +-------- +Instead, you can provide a callable in the argument ``preprocessor``. +Then the estimator will accept indicators of points instead of points. +Under the hood, the estimator will call this callable on the indicators you +provide as input when fitting, predicting etc... +Using a callable can be really useful to represent lazily a dataset of +images stored on the file system for instance. +The callable should take as an input an array-like, and return a 2D +array-like. For supervised learners it will be applied on the whole array of +indicators at once, and for weakly supervised learners it will be applied +on each column of the array of tuples. + +Example with a supervised metric learner: + +The callable should take as input an array-like, and return a 2D array-like. + +>>> def find_images(arr): +>>> X = np.array([[-0.7 , -0.23], +>>> [-0.43, -0.49], +>>> [ 0.14, -0.37]]) # array of 3 samples of 2 features +>>> result = [] +>>> for img_path in arr: +>>> result.append(X[int(img_path[3:5])]) +>>> # transforms 'img01.png' into X[1] +>>> return np.array(result) +>>> images_paths = ['img01.png', 'img00.png', 'img02.png'] +>>> y = np.array([1, 0, 1]) +>>> +>>> nca = NCA(preprocessor=find_images) +>>> nca.fit(images_paths, y) +>>> # under the hood preprocessor(indicators) will be called + + +Example with a weakly supervised metric learner: + +The given callable should take as input an array-like, and return a +2D array-like. It will be called on each column of the input tuples of +indicators. 
+
+>>> def find_images(arr):
+>>>     X = np.array([[-0.7 , -0.23],
+>>>                   [-0.43, -0.49],
+>>>                   [ 0.14, -0.37]])  # array of 3 samples of 2 features
+>>>     result = []
+>>>     for img_path in arr:
+>>>         result.append(X[int(img_path[3:5])])
+>>>         # transforms 'img01.png' into X[1]
+>>>     return np.array(result)
+>>> pairs_images_paths = [['img02.png', 'img00.png'],
+>>>                       ['img01.png', 'img00.png']]
+>>> y_pairs = np.array([1, -1])
+>>>
+>>> mmc = MMC(preprocessor=find_images)
+>>> mmc.fit(pairs_images_paths, y_pairs)
+>>> # under the hood preprocessor(pairs_indicators[i]) will be called for each
+>>> # i in [0, 1]
+
+
+.. note:: Note that when you fill the ``preprocessor`` option, it allows you
+   to give more compact inputs, but the classical way of providing inputs
+   stays valid (2D array-like for ``X`` for supervised learners and 3D
+   array-like of tuples for weakly supervised learners).
+
+   Example: This would work:
+
+   >>> from metric_learn import MMC
+   >>> X = np.array([[-0.7 , -0.23],
+   >>>               [-0.43, -0.49],
+   >>>               [ 0.14, -0.37]])  # array of 3 samples of 2 features
+   >>> pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
+   >>>                   [[-0.43, -0.49], [-0.7 , -0.23]]])
+   >>> y_pairs = np.array([1, -1])
+   >>>
+   >>> mmc = MMC(preprocessor=X)
+   >>> mmc.fit(pairs, y_pairs)
diff --git a/doc/supervised.rst b/doc/supervised.rst
index 80236bbf..1e54103b 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -2,16 +2,170 @@
 Supervised Metric Learning
 ==========================
 
-Problem Setting
-===============
+Supervised metric learning algorithms take as inputs points `X` and target
+labels `y`, and learn a distance matrix that makes points from the same class
+(for classification) or with close target value (for regression) close to
+each other, and points from different classes or with distant target values
+far away from each other.
 
-Input data
+Scikit-learn compatibility
 ==========================
 
-Machine Learning pipeline
-=========================
+All supervised algorithms are scikit-learn `Estimators`, so they are
+compatible with Pipelining and scikit-learn model selection routines.
 
-.. note:: Everything about training, predicting etc
+Algorithms
+==========
 
-List of algorithms
-==================
+Covariance
+----------
+
+.. todo:: Covariance is unsupervised, so its doc should not be here.
+
+:class:`Covariance` does not "learn" anything, rather it calculates
+the covariance matrix of the input data. This is a simple baseline method.
+
+.. rubric:: Example Code:
+
+::
+
+    from metric_learn import Covariance
+    from sklearn.datasets import load_iris
+
+    iris = load_iris()['data']
+
+    cov = Covariance().fit(iris)
+    x = cov.transform(iris)
+
+.. rubric:: References:
+On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936
+
+LMNN
+-----
+
+Large-margin nearest neighbor metric learning. (Weinberger 2005)
+
+:class:`LMNN` learns a Mahalanobis distance metric in the kNN
+classification setting using semidefinite programming. The learned metric
+attempts to keep k-nearest neighbors in the same class, while keeping examples
+from different classes separated by a large margin. This algorithm makes no
+assumptions about the distribution of the data.
+
+.. rubric:: Example Code:
+
+::
+
+    import numpy as np
+    from metric_learn import LMNN
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    lmnn = LMNN(k=5, learn_rate=1e-6)
+    lmnn.fit(X, Y, verbose=False)
+
+If a recent version of the Shogun Python modular (``modshogun``) library
+is available, the LMNN implementation will use the fast C++ version from
+there.
Otherwise, the included pure-Python version will be used.
+The two implementations differ slightly, and the C++ version is more complete.
+
+.. rubric:: References:
+
+`Distance Metric Learning for Large Margin Nearest Neighbor Classification `_ Kilian Q. Weinberger, John Blitzer, Lawrence K. Saul
+
+NCA
+---
+
+Neighborhood Components Analysis (:class:`NCA`) is a distance
+metric learning algorithm which aims to improve the accuracy of nearest
+neighbors classification compared to the standard Euclidean distance. The
+algorithm directly maximizes a stochastic variant of the leave-one-out
+k-nearest neighbors (KNN) score on the training set. It can also learn a
+low-dimensional linear embedding of data that can be used for data
+visualization and fast classification.
+
+.. rubric:: Example Code:
+
+::
+
+    import numpy as np
+    from metric_learn import NCA
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    nca = NCA(max_iter=1000, learning_rate=0.01)
+    nca.fit(X, Y)
+
+.. rubric:: References:
+
+.. [1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov.
+"Neighbourhood Components Analysis". Advances in Neural Information
+Processing Systems. 17, 513-520, 2005.
+http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
+
+.. [2] Wikipedia entry on Neighborhood Components Analysis
+https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
+
+LFDA
+----
+
+Local Fisher Discriminant Analysis (LFDA)
+
+:class:`LFDA` is a linear supervised dimensionality reduction
+method. It is particularly useful when dealing with multimodality, where one
+or more classes consist of separate clusters in input space. The core
+optimization problem of LFDA is solved as a generalized eigenvalue problem.
+
+.. rubric:: Example Code:
+
+::
+
+    import numpy as np
+    from metric_learn import LFDA
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    lfda = LFDA(k=2, dim=2)
+    lfda.fit(X, Y)
+
+.. rubric:: References:
+
+`Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis `_ Masashi Sugiyama.
+
+`Local Fisher Discriminant Analysis on Beer Style Clustering `_ Yuan Tang.
+
+
+MLKR
+----
+
+:class:`MLKR` is an algorithm for supervised metric learning,
+which learns a distance function by directly minimising the leave-one-out
+regression error. This algorithm can also be viewed as a supervised variation
+of PCA and can be used for dimensionality reduction and high dimensional data
+visualization.
+
+.. rubric:: Example Code:
+
+::
+
+    from metric_learn import MLKR
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    mlkr = MLKR()
+    mlkr.fit(X, Y)
+
+.. rubric:: References:
+
+`Metric Learning for Kernel Regression `_ Kilian Q. Weinberger, Gerald Tesauro
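+
+.. rubric:: Example Code:
+
+The scikit-learn compatibility mentioned at the top of this page can be
+sketched as follows (a minimal sketch only: it assumes NCA's default
+parameters are adequate for the data at hand, and relies on scikit-learn's
+standard pipeline mechanics rather than an officially documented recipe):
+
+::
+
+    from metric_learn import NCA
+    from sklearn.datasets import load_iris
+    from sklearn.model_selection import cross_val_score
+    from sklearn.neighbors import KNeighborsClassifier
+    from sklearn.pipeline import make_pipeline
+
+    X, y = load_iris(return_X_y=True)
+
+    # chain the learned transformation with a kNN classifier, then
+    # cross-validate the whole pipeline like any scikit-learn estimator
+    knn_pipeline = make_pipeline(NCA(), KNeighborsClassifier())
+    scores = cross_val_score(knn_pipeline, X, y)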
diff --git a/doc/user_guide.rst b/doc/user_guide.rst
index a55f0768..fb7060ce 100644
--- a/doc/user_guide.rst
+++ b/doc/user_guide.rst
@@ -12,5 +12,4 @@ User Guide
    introduction.rst
    supervised.rst
    weakly_supervised.rst
-   mahalanobis.rst
    preprocessor.rst
\ No newline at end of file
diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index 8bb4b2f9..b4ee2970 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -2,34 +2,315 @@
 Weakly Supervised Metric Learning
 =================================
 
-Problem Setting
-===============
+Weakly supervised algorithms work on weaker information about the data points
+than supervised algorithms. Rather than labeled points, they take as input
+similarity judgments on tuples of data points, for instance pairs of similar
+and dissimilar points. Refer to the documentation of each algorithm for its
+particular form of input data.
+
 
 Input data
 ==========
 
-Machine Learning pipeline
-=========================
+In the following paragraph we talk about tuples for the sake of generality.
+These can be pairs, triplets, quadruplets etc, depending on what algorithm we
+use.
 
-.. note:: Everything about training, predicting etc
+Basic form
+----------
+Every weakly supervised algorithm will take as input tuples of points, and if needed labels for these tuples.
 
-List of algorithms
-==================
+The `tuples` argument is the first argument of every method (like the X
+argument for classical algorithms in scikit-learn). The second argument is
+the label of the tuple: what it is depends on the algorithm used. For
+instance for pairs learners ``y`` is a label indicating if the pair is of
+similar samples or dissimilar samples.
+
+Then one can fit a Weakly Supervised Metric Learner on these tuples, like this:
+
+>>> my_algo.fit(tuples, y)
+
+Like in a classical setting where we split the points ``X`` between train and test, here we split the ``tuples`` between train and test.
+
+>>> from sklearn.model_selection import train_test_split
+>>> pairs_train, pairs_test, y_train, y_test = train_test_split(pairs, y)
+
+There are two data structures that can be used to represent tuples in
+metric-learn:
+
+3D array of tuples
+------------------
+
+The most intuitive way to represent tuples is to provide the algorithm with a
+3D array-like of tuples of shape ``(n_tuples, tuple_size, n_features)``, where
+``n_tuples`` is the number of tuples, ``tuple_size`` is the number of elements
+in a tuple (2 for pairs, 3 for triplets for instance), and ``n_features`` is
+the number of features of each point.
+
+.. rubric:: Example:
+   Here is an artificial dataset of 4 pairs of 2 points of 3 features each:
+
+>>> import numpy as np
+>>> tuples = np.array([[[-0.12, -1.21, -0.20],
+>>>                     [+0.05, -0.19, -0.05]],
+>>>
+>>>                    [[-2.16, +0.11, -0.02],
+>>>                     [+1.58, +0.16, +0.93]],
+>>>
+>>>                    [[+1.58, +0.16, +0.93 ],  # same as tuples[1, 1, :]
+>>>                     [+0.89, -0.34, +2.41]],
+>>>
+>>>                    [[-0.12, -1.21, -0.20 ],  # same as tuples[0, 0, :]
+>>>                     [-2.16, +0.11, -0.02]]])  # same as tuples[1, 0, :]
+>>> y = np.array([-1, 1, 1, -1])
+
+.. warning:: This way of specifying pairs is not recommended for a large number
+   of tuples, as it is redundant (see the comments in the example) and hence
+   takes a lot of memory. Indeed each feature vector of a point will be
+   replicated as many times as a point is involved in a tuple. The second way
+   to specify pairs is more efficient.
+
+
+2D array of indicators + preprocessor
+-------------------------------------
+
+Instead of forming each point in each tuple, a more efficient representation
+would be to keep the dataset of points ``X`` aside, and just represent tuples
+as a collection of tuples of *indices* from the points in ``X``. Since we lose
+the feature dimension there, the resulting array is 2D.
+
+.. rubric:: Example: An equivalent representation of the above pairs would be:
+
+>>> X = np.array([[-0.12, -1.21, -0.20],
+>>>               [+0.05, -0.19, -0.05],
+>>>               [-2.16, +0.11, -0.02],
+>>>               [+1.58, +0.16, +0.93],
+>>>               [+0.89, -0.34, +2.41]])
+>>>
+>>> tuples_indices = np.array([[0, 1],
+>>>                            [2, 3],
+>>>                            [3, 4],
+>>>                            [0, 2]])
+>>> y = np.array([-1, 1, 1, -1])
+
+In order to fit metric learning algorithms with this type of input, we need
+to give the original dataset of points ``X`` to the estimator so that it
+knows what point the indices refer to. We do this when initializing the
+estimator, through the argument `preprocessor`.
+
+.. rubric:: Example:
+
+>>> from metric_learn import MMC
+>>> mmc = MMC(preprocessor=X)
+>>> mmc.fit(tuples_indices, y)
+
+
+.. note::
+
+   Instead of an array-like, you can give a callable in the argument
+   ``preprocessor``, which will go fetch and form the tuples. This allows you
+   to give more general indicators than just indices from an array (for
+   instance paths in the filesystem, names of records in a database etc...) See
+   section :ref:`preprocessor` for more details on how to use the preprocessor.
+
+
+Scikit-learn compatibility
+==========================
+
+Weakly supervised estimators are compatible with scikit-learn routines for
+model selection (grid-search, cross-validation etc). See the scoring section
+for more details on what scoring is used in the case of Weakly Supervised
+Metric Learning.
+
+.. rubric:: Example
+
+>>> from metric_learn import MMC
+>>> from sklearn.datasets import load_iris
+>>> from sklearn.model_selection import cross_val_score
+>>> rng = np.random.RandomState(42)
+>>> X, _ = load_iris(return_X_y=True)
+>>> # let's sample 30 random pairs and labels of pairs
+>>> pairs_indices = rng.randint(X.shape[0], size=(30, 2))
+>>> y = rng.randint(2, size=30)
+>>> mmc = MMC(preprocessor=X)
+>>> cross_val_score(mmc, pairs_indices, y)
+
+Scoring
+=======
+
+Some default scoring functions are implemented in metric-learn, depending on
+which kind of tuples you work on. See the docstring of the `score` method of
+the estimator you use.
+
+
+Algorithms
+==================
+
+Note that each weakly-supervised algorithm has a supervised version of the form
+`*_Supervised` where similarity tuples are generated from
+the labels information and passed to the underlying algorithm.
+
+.. todo:: add more details on `_Supervised` classes
+
 1. ITML
 -------
 
-Some description about :class:`metric_learn.itml.ITML`
+Information Theoretic Metric Learning, Kulis et al., ICML 2007
+
+`ITML` minimizes the differential relative entropy between two multivariate
+Gaussians under constraints on the distance function,
+which can be formulated into a Bregman optimization problem by minimizing the
+LogDet divergence subject to linear constraints.
+This algorithm can handle a wide variety of constraints and can optionally
+incorporate a prior on the distance function.
+Unlike some other methods, ITML does not rely on an eigenvalue computation
+or semi-definite programming.
+
+..
rubric:: Example Code: + +:: + + from metric_learn import ITML_Supervised + from sklearn.datasets import load_iris + + iris_data = load_iris() + X = iris_data['data'] + Y = iris_data['target'] + + itml = ITML_Supervised(num_constraints=200) + itml.fit(X, Y) + +.. rubric:: References: + +`Information-theoretic Metric Learning `_ Jason V. Davis, et al. +Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/ 2. LSML ------- +`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared +Residual + +.. rubric:: Example Code: + +:: + + from metric_learn import LSML_Supervised + from sklearn.datasets import load_iris + + iris_data = load_iris() + X = iris_data['data'] + Y = iris_data['target'] + + lsml = LSML_Supervised(num_constraints=200) + lsml.fit(X, Y) + +.. rubric:: References: + +Liu et al. +"Metric Learning from Relative Comparisons by Minimizing Squared Residual". +ICDM 2012. + +Adapted from https://gist.github.com/kcarnold/5439917 +Paper: http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf + + 3. SDML ------- +`SDML`: An efficient sparse metric learning in high-dimensional space via +L1-penalized log-determinant regularization + +.. rubric:: Example Code: + +:: + + from metric_learn import SDML_Supervised + from sklearn.datasets import load_iris + + iris_data = load_iris() + X = iris_data['data'] + Y = iris_data['target'] + + sdml = SDML_Supervised(num_constraints=200) + sdml.fit(X, Y) + +.. rubric:: References: + +Qi et al. +An efficient sparse metric learning in high-dimensional space via +L1-penalized log-determinant regularization. +ICML 2009 + +Adapted from https://gist.github.com/kcarnold/5439945 +Paper: http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/icml09-guojun.pdf + + 4. RCA ------ +Relative Components Analysis (RCA) + +`RCA` learns a full rank Mahalanobis distance metric based on a +weighted sum of in-class covariance matrices. +It applies a global linear transformation to assign large weights to +relevant dimensions and low weights to irrelevant dimensions. +Those relevant dimensions are estimated using "chunklets", +subsets of points that are known to belong to the same class. + +.. rubric:: Example Code: + +:: + + from metric_learn import RCA_Supervised + from sklearn.datasets import load_iris + + iris_data = load_iris() + X = iris_data['data'] + Y = iris_data['target'] + + rca = RCA_Supervised(num_chunks=30, chunk_size=2) + rca.fit(X, Y) + +.. rubric:: References: +`Adjustment learning and relevant component analysis `_ Noam Shental, et al. +'Learning distance functions using equivalence relations', ICML 2003 +'Learning a Mahalanobis metric from equivalence constraints', JMLR 2005 + 5. MMC ------ + +Mahalanobis Metric Learning with Application for Clustering with +Side-Information, Xing et al., NIPS 2002 + +`MMC` minimizes the sum of squared distances between similar examples, while +enforcing the sum of distances between dissimilar examples to be greater than a +certain margin. This leads to a convex and, thus, local-minima-free +optimization problem that can be solved efficiently. However, the algorithm +involves the computation of eigenvalues, which is the main speed-bottleneck. +Since it has initially been designed for clustering applications, one of the +implicit assumptions of MMC is that all classes form a compact set, i.e., +follow a unimodal distribution, which restricts the possible use-cases of this +method. However, it is one of the earliest and a still often cited technique. 
+
+Adapted from Matlab code at http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
+
+.. rubric:: Example Code:
+
+::
+
+    from metric_learn import MMC_Supervised
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    mmc = MMC_Supervised(num_constraints=200)
+    mmc.fit(X, Y)
+
+.. rubric:: References:
+
+`Distance metric learning with application to clustering with side-information `_ Xing, Jordan, Russell, Ng.
\ No newline at end of file
diff --git a/examples/plot_lfw.py b/examples/plot_lfw.py
deleted file mode 100644
index 7489bba6..00000000
--- a/examples/plot_lfw.py
+++ /dev/null
@@ -1,44 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-Learning on pairs
-=================
-"""
-
-##################################################################################
-# Let's import a dataset of pairs of images from scikit-learn.
-
-from sklearn.datasets import fetch_lfw_pairs
-from sklearn.utils import shuffle
-
-dataset = fetch_lfw_pairs()
-pairs, y = shuffle(dataset.pairs, dataset.target, random_state=42)
-y = 2*y - 1  # we want +1 to indicate similar pairs and -1 dissimilar pairs
-
-######################################################################################
-# Let's print a pair of dissimilar points:
-
-import matplotlib.pyplot as plt
-import numpy as np
-
-label = -1
-first_pair_idx = np.where(y==label)[0][0]
-fig, ax = plt.subplots(ncols=2, nrows=1)
-for i, img in enumerate(pairs[first_pair_idx]):
-    ax[i].imshow(img, cmap='Greys_r')
-fig.suptitle('Pair n°{}, Label: {}\n\n'.format(first_pair_idx, label))
-######################################################################################
-# Now let's print a pair of similar points:
-
-label = 1
-first_pair_idx = np.where(y==label)[0][0]
-fig, ax = plt.subplots(ncols=2, nrows=1)
-for i, img in enumerate(pairs[first_pair_idx]):
-    ax[i].imshow(img, cmap='Greys_r')
-fig.suptitle('Pair n°{}, Label: {}\n\n'.format(first_pair_idx, label))
-###############################################################################
-# Let's reshape the dataset so that it is indeed a 3D array of size ``(n_tuples, 2, n_features)``,
-# and print the first three elements
-
-
-pairs = pairs.reshape(pairs.shape[0], 2, -1)
-print(pairs[:3])
\ No newline at end of file

From 26306ba7407eb8c932d1e163e1199ae599272551 Mon Sep 17 00:00:00 2001
From: William de Vazelhes
Date: Wed, 19 Dec 2018 18:28:05 +0100
Subject: [PATCH 10/32] A few style improvements (text wrapping to line limit,
 better references formatting ...)

---
 doc/supervised.rst        | 104 ++++++++++++++++-------------
 doc/weakly_supervised.rst | 137 ++++++++++++++++++++------------------
 2 files changed, 132 insertions(+), 109 deletions(-)

diff --git a/doc/supervised.rst b/doc/supervised.rst
index 1e54103b..cde5dc60 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -4,9 +4,9 @@ Supervised Metric Learning
 
 Supervised metric learning algorithms take as inputs points `X` and target
 labels `y`, and learn a distance matrix that makes points from the same class
-(for classification) or with close target value (for regression) close to
-each other, and points from different classes or with distant target values
-far away from each other.
+(for classification) or with close target value (for regression) close to each
+other, and points from different classes or with distant target values far away
+from each other.
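+
+As a minimal sketch of this idea (an illustration only, assuming LMNN's
+default parameters rather than a prescribed recipe), one can map the points
+to the learned space and compare Euclidean distances there with distances in
+the original space:
+
+::
+
+    from metric_learn import LMNN
+    from sklearn.datasets import load_iris
+    from sklearn.metrics import pairwise_distances
+
+    X, y = load_iris(return_X_y=True)
+    lmnn = LMNN().fit(X, y)
+
+    # distances in the learned space: same-class points should tend to be
+    # closer to each other than they were in the original space
+    X_learned = lmnn.transform(X)
+    dist_before = pairwise_distances(X)
+    dist_after = pairwise_distances(X_learned)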
Scikit-learn compatibility
 ==========================
 
 All supervised algorithms are scikit-learn `Estimators`, so they are
 compatible with Pipelining and scikit-learn model selection routines.
 
 Algorithms
 ==========
 
 Covariance
 ----------
 
 .. todo:: Covariance is unsupervised, so its doc should not be here.
 
 :class:`Covariance` does not "learn" anything, rather it calculates
 the covariance matrix of the input data. This is a simple baseline method.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -37,21 +37,22 @@ the covariance matrix of the input data. This is a simple baseline method.
 
     cov = Covariance().fit(iris)
     x = cov.transform(iris)
 
-.. rubric:: References:
-On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936
+.. topic:: References:
+
+  .. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936
 
 LMNN
 -----
 
-Large-margin nearest neighbor metric learning. (Weinberger 2005)
+Large-margin nearest neighbor metric learning.
 
-:class:`LMNN` learns a Mahalanobis distance metric in the kNN
-classification setting using semidefinite programming. The learned metric
-attempts to keep k-nearest neighbors in the same class, while keeping examples
-from different classes separated by a large margin. This algorithm makes no
-assumptions about the distribution of the data.
+:class:`LMNN` learns a Mahalanobis distance metric in the kNN classification
+setting using semidefinite programming. The learned metric attempts to keep
+k-nearest neighbors in the same class, while keeping examples from different
+classes separated by a large margin. This algorithm makes no assumptions about
+the distribution of the data.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -71,22 +72,26 @@ is available, the LMNN implementation will use the fast C++ version from
 there. Otherwise, the included pure-Python version will be used.
 The two implementations differ slightly, and the C++ version is more complete.
 
-.. rubric:: References:
+.. topic:: References:
 
-`Distance Metric Learning for Large Margin Nearest Neighbor Classification `_ Kilian Q. Weinberger, John Blitzer, Lawrence K. Saul
+  .. [1] `Distance Metric Learning for Large Margin Nearest Neighbor
+     Classification
+     `_ Kilian Q. Weinberger, John
+     Blitzer, Lawrence K. Saul
 
 NCA
 ---
 
-Neighborhood Components Analysis (:class:`NCA`) is a distance
-metric learning algorithm which aims to improve the accuracy of nearest
-neighbors classification compared to the standard Euclidean distance. The
-algorithm directly maximizes a stochastic variant of the leave-one-out
-k-nearest neighbors (KNN) score on the training set. It can also learn a
-low-dimensional linear embedding of data that can be used for data
-visualization and fast classification.
+Neighborhood Components Analysis (:class:`NCA`) is a distance metric learning
+algorithm which aims to improve the accuracy of nearest neighbors
+classification compared to the standard Euclidean distance. The algorithm
+directly maximizes a stochastic variant of the leave-one-out k-nearest
+neighbors (KNN) score on the training set. It can also learn a low-dimensional
+linear embedding of data that can be used for data visualization and fast
+classification.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -101,27 +106,27 @@ visualization and fast classification.
 
     nca = NCA(max_iter=1000, learning_rate=0.01)
     nca.fit(X, Y)
 
-.. rubric:: References:
+.. topic:: References:
 
-.. [1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov.
-"Neighbourhood Components Analysis".
+  .. [1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov.
+     "Neighbourhood Components Analysis".
Advances in Neural Information
+     Processing Systems. 17, 513-520, 2005.
+     http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf
 
-.. [2] Wikipedia entry on Neighborhood Components Analysis
-https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
+  .. [2] Wikipedia entry on Neighborhood Components Analysis
+     https://en.wikipedia.org/wiki/Neighbourhood_components_analysis
 
 LFDA
 ----
 
 Local Fisher Discriminant Analysis (LFDA)
 
-:class:`LFDA` is a linear supervised dimensionality reduction
-method. It is particularly useful when dealing with multimodality, where one
-or more classes consist of separate clusters in input space. The core
-optimization problem of LFDA is solved as a generalized eigenvalue problem.
+:class:`LFDA` is a linear supervised dimensionality reduction method. It is
+particularly useful when dealing with multimodality, where one or more classes
+consist of separate clusters in input space. The core optimization problem of
+LFDA is solved as a generalized eigenvalue problem.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -136,23 +141,28 @@ optimization problem of LFDA is solved as a generalized eigenvalue problem.
 
     lfda = LFDA(k=2, dim=2)
     lfda.fit(X, Y)
 
-.. rubric:: References:
+.. topic:: References:
 
-`Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis `_ Masashi Sugiyama.
+  .. [1] `Dimensionality Reduction of Multimodal Labeled Data by Local
+     Fisher Discriminant Analysis `_ Masashi Sugiyama.
 
-`Local Fisher Discriminant Analysis on Beer Style Clustering `_ Yuan Tang.
+  .. [2] `Local Fisher Discriminant Analysis on Beer Style Clustering
+     `_ Yuan Tang.
 
 
 MLKR
 ----
 
+Metric Learning for Kernel Regression.
+
-:class:`MLKR` is an algorithm for supervised metric learning,
-which learns a distance function by directly minimising the leave-one-out
-regression error. This algorithm can also be viewed as a supervised variation
-of PCA and can be used for dimensionality reduction and high dimensional data
-visualization.
+:class:`MLKR` is an algorithm for supervised metric learning, which learns a
+distance function by directly minimising the leave-one-out regression error.
+This algorithm can also be viewed as a supervised variation of PCA and can be
+used for dimensionality reduction and high dimensional data visualization.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -166,6 +176,8 @@ visualization.
 
     mlkr = MLKR()
     mlkr.fit(X, Y)
 
-.. rubric:: References:
+.. topic:: References:
 
-`Metric Learning for Kernel Regression `_ Kilian Q. Weinberger, Gerald Tesauro
+  .. [1] `Metric Learning for Kernel Regression `_ Kilian Q. Weinberger,
+     Gerald Tesauro
diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index b4ee2970..7e20e236 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -12,26 +12,27 @@ particular form of input data.
 
 Input data
 ==========
 
-In the following paragraph we talk about tuples for the sake of generality.
-These can be pairs, triplets, quadruplets etc, depending on what algorithm we
-use.
+In the following paragraph we talk about tuples for the sake of generality. These
+can be pairs, triplets, quadruplets etc, depending on what algorithm we use.
 
 Basic form
 ----------
-Every weakly supervised algorithm will take as input tuples of points, and if needed labels for these tuples.
+Every weakly supervised algorithm will take as input tuples of points, and if
+needed labels for these tuples.
The `tuples` argument is the first argument of every method (like the X
-argument for classical algorithms in scikit-learn). The second argument is
-the label of the tuple: what it is depends on the algorithm used. For
-instance for pairs learners ``y`` is a label indicating if the pair is of
-similar samples or dissimilar samples.
+argument for classical algorithms in scikit-learn). The second argument is the
+label of the tuple: what it is depends on the algorithm used. For instance for
+pairs learners ``y`` is a label indicating if the pair is of similar samples or
+dissimilar samples.
 
 Then one can fit a Weakly Supervised Metric Learner on this tuple, like this:
 
 >>> my_algo.fit(tuples, y)
 
-Like in a classical setting we split the points ``X`` between train and test, here we split the ``tuples`` between train and test.
+Just as in a classical setting we split the points ``X`` between train and
+test, here we split the ``tuples`` between train and test.
 
 >>> from sklearn.model_selection import train_test_split
 >>> pairs_train, pairs_test, y_train, y_test = train_test_split(pairs, y)
 
@@ -48,7 +49,7 @@ The most intuitive way to represent tuples is to provide the algorithm with a
 in a tuple (2 for pairs, 3 for triplets for instance), and ``n_features`` is
 the number of features of each point.
 
-.. rubric:: Example:
+.. topic:: Example:
 Here is an artificial dataset of 4 pairs of 2 points of 3 features each:
 
 >>> import numpy as np
@@ -80,7 +81,7 @@ would be to keep the dataset of points ``X`` aside, and just represent tuples
 as a collection of tuples of *indices* from the points in ``X``. Since we lose
 the feature dimension there, the resulting array is 2D.
 
-.. rubric:: Example: An equivalent representation of the above pairs would be:
+.. topic:: Example: An equivalent representation of the above pairs would be:
 
 >>> X = np.array([[-0.12, -1.21, -0.20],
 >>>               [+0.05, -0.19, -0.05],
 >>>               [+0.92, -0.89, -0.04],
 >>>               [-0.83, -0.36, -0.67],
 >>>               [+0.45, -0.20, -0.44]])
 >>> pairs_indices = np.array([[2, 0],
 >>>                           [1, 3],
 >>>                           [4, 2],
 >>>                           [0, 2]])
 >>> y = np.array([-1, 1, 1, -1])
 
@@ -94,12 +95,12 @@ the feature dimension there, the resulting array is 2D.
 
-In order to fit metric learning algorithms with this type of input, we need
-to give the original dataset of points ``X`` to the estimator so that it
-knows what point the indices refer to. We do this when initializing the
-estimator, through the argument `preprocessor`.
+In order to fit metric learning algorithms with this type of input, we need to
+give the original dataset of points ``X`` to the estimator so that it knows
+what point the indices refer to. We do this when initializing the estimator,
+through the argument `preprocessor`.
 
-.. rubric:: Example:
+.. topic:: Example:
 
 >>> from metric_learn import MMC
 >>> mmc = MMC(preprocessor=X)
 
@@ -123,7 +124,7 @@ model selection (grid-search, cross-validation etc). See the scoring section
 for more details on what scoring is used in the case of Weakly Supervised
 Metric Learning.
 
-.. rubric:: Example
+.. topic:: Example
 
 >>> from metric_learn import MMC
 >>> from sklearn.datasets import load_iris
 
@@ -139,17 +140,17 @@ Metric Learning.
 
 Scoring
 =======
 
-Some default scoring are implemented in metric-learn, depending on which
-kind of tuples you work on. See the docstring of the `score` method of the
-estimator you use.
+Some default scoring functions are implemented in metric-learn, depending on
+which kind of tuples you work on. See the docstring of the `score` method of
+the estimator you use.
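+
+.. topic:: Example:
+
+As a quick sketch of how such a default score can be used (reusing ``X``,
+``pairs`` and ``y`` from the examples above, and assuming a pairs learner
+such as `MMC` whose `score` method is the one described here):
+
+>>> from metric_learn import MMC
+>>> from sklearn.model_selection import train_test_split
+>>> pairs_train, pairs_test, y_train, y_test = train_test_split(pairs, y)
+>>> mmc = MMC(preprocessor=X)
+>>> mmc.fit(pairs_train, y_train)
+>>> mmc.score(pairs_test, y_test)  # default score of the fitted pairs learner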
Algorithms ================== Note that each weakly-supervised algorithm has a supervised version of the form -`*_Supervised` where similarity tuples are generated from -the labels information and passed to the underlying algorithm. +`*_Supervised` where similarity tuples are generated from the labels +information and passed to the underlying algorithm. .. todo:: add more details on `_Supervised` classes @@ -159,15 +160,14 @@ the labels information and passed to the underlying algorithm. Information Theoretic Metric Learning, Kulis et al., ICML 2007 `ITML` minimizes the differential relative entropy between two multivariate -Gaussians under constraints on the distance function, -which can be formulated into a Bregman optimization problem by minimizing the -LogDet divergence subject to linear constraints. -This algorithm can handle a wide variety of constraints and can optionally -incorporate a prior on the distance function. -Unlike some other methods, ITML does not rely on an eigenvalue computation -or semi-definite programming. +Gaussians under constraints on the distance function, which can be formulated +into a Bregman optimization problem by minimizing the LogDet divergence subject +to linear constraints. This algorithm can handle a wide variety of constraints +and can optionally incorporate a prior on the distance function. Unlike some +other methods, ITML does not rely on an eigenvalue computation or semi-definite +programming. -.. rubric:: Example Code: +.. topic:: Example Code: :: @@ -181,10 +181,14 @@ or semi-definite programming. itml = ITML_Supervised(num_constraints=200) itml.fit(X, Y) -.. rubric:: References: +.. topic:: References: -`Information-theoretic Metric Learning `_ Jason V. Davis, et al. -Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/ + .. [1] `Information-theoretic Metric Learning `_ Jason V. Davis, + et al. + + .. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/ + itml/ 2. LSML @@ -193,7 +197,7 @@ Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/ `LSML`: Metric Learning from Relative Comparisons by Minimizing Squared Residual -.. rubric:: Example Code: +.. topic:: Example Code: :: @@ -207,14 +211,13 @@ Residual lsml = LSML_Supervised(num_constraints=200) lsml.fit(X, Y) -.. rubric:: References: +.. topic:: References: -Liu et al. -"Metric Learning from Relative Comparisons by Minimizing Squared Residual". -ICDM 2012. + .. [1] Liu et al. + "Metric Learning from Relative Comparisons by Minimizing Squared + Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf -Adapted from https://gist.github.com/kcarnold/5439917 -Paper: http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf + .. [2] Adapted from https://gist.github.com/kcarnold/5439917 3. SDML @@ -223,7 +226,7 @@ Paper: http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf `SDML`: An efficient sparse metric learning in high-dimensional space via L1-penalized log-determinant regularization -.. rubric:: Example Code: +.. topic:: Example Code: :: @@ -237,15 +240,15 @@ L1-penalized log-determinant regularization sdml = SDML_Supervised(num_constraints=200) sdml.fit(X, Y) -.. rubric:: References: +.. topic:: References: -Qi et al. -An efficient sparse metric learning in high-dimensional space via -L1-penalized log-determinant regularization. -ICML 2009 + .. [1] Qi et al. + An efficient sparse metric learning in high-dimensional space via + L1-penalized log-determinant regularization. ICML 2009. 
+      http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/icml09-guojun.pdf
 
-Adapted from https://gist.github.com/kcarnold/5439945
-Paper: http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/icml09-guojun.pdf
+    .. [2] Adapted from https://gist.github.com/kcarnold/5439945
 
 
 4. RCA
 ------
 
 Relevant Components Analysis (RCA)
 
-`RCA` learns a full rank Mahalanobis distance metric based on a
-weighted sum of in-class covariance matrices.
-It applies a global linear transformation to assign large weights to
-relevant dimensions and low weights to irrelevant dimensions.
-Those relevant dimensions are estimated using "chunklets",
-subsets of points that are known to belong to the same class.
+`RCA` learns a full rank Mahalanobis distance metric based on a weighted sum of
+in-class covariance matrices. It applies a global linear transformation to
+assign large weights to relevant dimensions and low weights to irrelevant
+dimensions. Those relevant dimensions are estimated using "chunklets", subsets
+of points that are known to belong to the same class.
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -274,10 +276,15 @@ subsets of points that are known to belong to the same class.
     rca = RCA_Supervised(num_chunks=30, chunk_size=2)
     rca.fit(X, Y)
 
-.. rubric:: References:
-`Adjustment learning and relevant component analysis `_ Noam Shental, et al.
-'Learning distance functions using equivalence relations', ICML 2003
-'Learning a Mahalanobis metric from equivalence constraints', JMLR 2005
+.. topic:: References:
+    .. [1] `Adjustment learning and relevant component analysis
+      `_ Noam Shental, et al.
+
+    .. [2] 'Learning distance functions using equivalence relations', ICML 2003
+
+    .. [3] 'Learning a Mahalanobis metric from equivalence constraints', JMLR
+      2005
 
 5. MMC
 ------
 
 Mahalanobis Metric Learning with Application for Clustering with
 Side-Information, Xing et al., NIPS 2002
 
 `MMC` minimizes the sum of squared distances between similar examples, while
 enforcing the sum of distances between dissimilar examples to be greater than a
 certain margin.
 This leads to a convex and, thus, local-minima-free optimization problem that
 can be solved efficiently.
 However, the algorithm involves the computation of eigenvalues, which is the
 main speed-bottleneck.
 Since it has initially been designed for clustering applications, one of the
 implicit assumptions of MMC is that all classes form a compact set, i.e.,
 follow a unimodal distribution, which restricts the possible use-cases of this
 method. However, it is one of the earliest and a still often cited technique.
 
-Adapted from Matlab code at http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
+Adapted from Matlab code at
+http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
 
-.. rubric:: Example Code:
+.. topic:: Example Code:
 
 ::
 
@@ -311,6 +319,9 @@ Adapted from Matlab code at http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/co
 
     mmc = MMC_Supervised(num_constraints=200)
    mmc.fit(X, Y)
 
-.. rubric:: References:
+.. topic:: References:
 
-`Distance metric learning with application to clustering with side-information `_ Xing, Jordan, Russell, Ng.
\ No newline at end of file
+    .. [1] `Distance metric learning with application to clustering with
+      side-information `_ Xing, Jordan, Russell, Ng.
\ No newline at end of file

From 4eb84953a2bee24b76dff1da3e5b58028b6d2ca3 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Wed, 19 Dec 2018 18:43:25 +0100
Subject: [PATCH 11/32] Address
 https://github.com/metric-learn/metric-learn/pull/133#pullrequestreview-186611132

---
 doc/preprocessor.rst | 48 +++++++++++++++-----------------------------
 examples/README.txt  |  2 +-
 outline.md           | 45 -----------------------------------------
 3 files changed, 17 insertions(+), 78 deletions(-)
 delete mode 100644 outline.md

diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst
index b1b891bb..6ca2040f 100644
--- a/doc/preprocessor.rst
+++ b/doc/preprocessor.rst
@@ -69,38 +69,21 @@ Example with a supervised metric learner:
 
 The callable should take as input an array-like, and return a 2D array-like.
 
->>> def find_images(arr):
->>>     X = np.array([[-0.7 , -0.23],
->>>                   [-0.43, -0.49],
->>>                   [ 0.14, -0.37]])  # array of 3 samples of 2 features
->>>     result = []
->>>     for img_path in arr:
->>>         result.append(X[int(img_path[3:5])])
->>>         # transforms 'img01.png' into X[1]
->>>     return np.array(result)
->>> images_paths = ['img01.png', 'img00.png', 'img02.png']
->>> y = np.array([1, 0, 1])
+>>> def find_images(file_paths):
+>>>     # each file contains a small image datapoint; imread is assumed imported (e.g. from matplotlib.pyplot)
+>>>     return np.row_stack([imread(f).ravel() for f in file_paths])
 >>>
 >>> nca = NCA(preprocessor=find_images)
->>> nca.fit(images_paths, y)
+>>> nca.fit(['img01.png', 'img00.png', 'img02.png'], [1, 0, 1])
 >>> # under the hood preprocessor(indicators) will be called
 
 
 Example with a weakly supervised metric learner:
 
 The given callable should take as input an array-like, and return a
-2D array-like. It will be called on each column of the input tuples of
-indicators.
-
->>> def find_images(arr):
->>>     X = np.array([[-0.7 , -0.23],
->>>                   [-0.43, -0.49],
->>>                   [ 0.14, -0.37]])  # array of 3 samples of 2 features
->>>     result = []
->>>     for img_path in arr:
->>>         result.append(X[int(img_path[3:5])])
->>>         # transforms 'img01.png' into X[1]
->>>     return np.array(result)
+2D array-like, as before. It will be called on each column of the input
+tuples of indicators.
+
 >>> pairs_images_paths = [['img02.png', 'img00.png'],
 >>>                       ['img01.png', 'img00.png']]
 >>> y_pairs = np.array([1, -1])
@@ -113,18 +96,19 @@ indicators.
 
 .. note:: Note that when you fill the ``preprocessor`` option, it allows you
    to give more compact inputs, but the classical way of providing inputs
-   stays valid (2D array-like for ``X`` for supervised learners and 3D
-   array-like of tuples for weakly supervised learners).
+   stays valid (2D array-like for supervised learners and 3D array-like of
+   tuples for weakly supervised learners). If a classical input
+   is provided, the metric learner will not use the preprocessor.
- Example: This would work: + Example: This will work: >>> from metric_learn import MMC - >>> X = np.array([[-0.7 , -0.23], - >>> [-0.43, -0.49], - >>> [ 0.14, -0.37]]) # array of 3 samples of 2 features + >>> def preprocessor_wip(array): + >>> return NotImplementedError("This preprocessor does nothing yet.") + >>> >>> pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]], >>> [[-0.43, -0.49], [-0.7 , -0.23]]]) >>> y_pairs = np.array([1, -1]) >>> - >>> mmc = MMC(preprocessor=X) - >>> mmc.fit(pairs, y_pairs) + >>> mmc = MMC(preprocessor=preprocessor_wip) + >>> mmc.fit(pairs, y_pairs) # preprocessor_wip will not be called here diff --git a/examples/README.txt b/examples/README.txt index 9497791a..10dbe0d5 100644 --- a/examples/README.txt +++ b/examples/README.txt @@ -1,4 +1,4 @@ Examples ======== -Below is a gallery of example of metric-learn use cases. \ No newline at end of file +Below is a gallery of example metric-learn use cases. \ No newline at end of file diff --git a/outline.md b/outline.md deleted file mode 100644 index afa372cc..00000000 --- a/outline.md +++ /dev/null @@ -1,45 +0,0 @@ -documentation outline: - - -- Getting started/Quick Start: - - - Explanation of what metric learning is, and what is the purpose of this package - - installation - - a very quick example on how to import an algo (supervised or not ?) and how to do fit (and predict ?) (and split train and test) on some custom dataset (maybe sklearn.datasets.load_lfw_pairs ?) - -- User Guide/List of algorithms: - - - Supervised Metric Learning: (add links to examples/images from examples at the right place in the description) - - Problem setting - - Input data (+ see Preprocessor section) - - What you can do after fit (transform...) - - Scikit-learn compatibility (compatible with grid search + link to example of grid search) - - List of algorithms + a more detailed description of each of them than - the one in the docstring - - - Weakly Supervised Metric Learning: (add links to examples/images from examples at the right place in the description) - - Problem setting - - Input data (+ See Preprocessor section) - - What you can do after fit (predict/score, tranform...) - - Scikit-learn compatibility (compatible with grid search + link to example of grid search) - (more detailed than for supervised because more complicated) - - List of algorithms + a more detailed description of each of them than - the one in the docstring - - - Somewhere: some section explaining Mahalanobis Metric Learning - (properties of the learned matrix etc) - - - Usage of the Preprocessor: - - Purpose (performance) - - Use (as an argument "preprocessor" in every metric learner) - - -- Examples/Tutorials: - - One example with faces (prediction if same/different person) - - One example of grid search to compare different algorithms (mmc, itml etc) - - Clustering with side information - - Instance retrieval - - Dimensionality reduction - -- API: - - doc automatically generated by docstrings \ No newline at end of file From 3db2653aec9ae33849dc6c971242a40fd4b9480f Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 11:09:27 +0100 Subject: [PATCH 12/32] raise instead of return --- doc/preprocessor.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst index 6ca2040f..76f12a43 100644 --- a/doc/preprocessor.rst +++ b/doc/preprocessor.rst @@ -104,7 +104,7 @@ tuples of indicators. 
 >>> from metric_learn import MMC
 >>> def preprocessor_wip(array):
->>>     return NotImplementedError("This preprocessor does nothing yet.")
+>>>     raise NotImplementedError("This preprocessor does nothing yet.")
 >>>
 >>> pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
 >>>                   [[-0.43, -0.49], [-0.7 , -0.23]]])
 >>> y_pairs = np.array([1, -1])
 >>>
 >>> mmc = MMC(preprocessor=preprocessor_wip)
 >>> mmc.fit(pairs, y_pairs)  # preprocessor_wip will not be called here

From 3891b93869a77fcc13581eb1b8116174b1541a1e Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 12:05:48 +0100
Subject: [PATCH 13/32] Fix quickstart example

---
 doc/getting_started.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/getting_started.rst b/doc/getting_started.rst
index 30a645de..040adedc 100644
--- a/doc/getting_started.rst
+++ b/doc/getting_started.rst
@@ -29,10 +29,15 @@ more complete.
 Quick start
 ===========
 
+This example loads the iris dataset, and evaluates a k-nearest neighbors
+algorithm on an embedding space learned with `NCA`.
+
 >>> from metric_learn import NCA
 >>> from sklearn.datasets import load_iris
 >>> from sklearn.model_selection import cross_val_score
+>>> from sklearn.neighbors import KNeighborsClassifier
+>>> from sklearn.pipeline import make_pipeline
 >>>
 >>> X, y = load_iris(return_X_y=True)
->>> nca = NCA(n_components=2)
->>> cross_val_score(nca, X, y)
\ No newline at end of file
+>>> clf = make_pipeline(NCA(), KNeighborsClassifier())
+>>> cross_val_score(clf, X, y)

From 7dcfb547e458c3d36cfea03ee76f8b1474363157 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 12:09:38 +0100
Subject: [PATCH 14/32] Emphasize scikit-learn compatibility

---
 doc/introduction.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 67a83251..74198613 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -10,7 +10,9 @@ Distance metric learning (or simply, metric learning) is the sub-field of
 machine learning dedicated to automatically constructing optimal distance
 metrics.
 
-This package contains efficient Python implementations of several popular
+This package contains efficient Python implementations of several popular
+metric learning algorithms, compatible with scikit-learn. This allows one to
+use all the scikit-learn routines for pipelining and model selection for
 metric learning algorithms.

From 1b83569ef28db580435d479abe59812e96120e64 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 13:24:42 +0100
Subject: [PATCH 15/32] Update introduction with new methods

---
 doc/introduction.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 74198613..3e057e6a 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -16,15 +16,16 @@ use all the scikit-learn routines for pipelining and model selection for
 metric learning algorithms.
 
-Each metric learning algorithm supports the following methods:
+Currently, each metric learning algorithm supports the following methods:
 
 - ``fit(...)``, which learns the model.
-- ``transformer()``, which returns a transformation matrix
+- ``metric()``, which returns a Mahalanobis matrix
+  :math:`M = L^{\top}L` such that the distance between vectors ``x`` and
+  ``y`` can be computed as :math:`\sqrt{\left(x-y\right)^{\top} M \left(x-y\right)}`.
+- ``transformer_from_metric(metric)``, which returns a transformation matrix
   :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a data
   matrix :math:`X \in \mathbb{R}^{n \times d}` to the
   :math:`D`-dimensional learned metric space :math:`X L^{\top}`, in which
   standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation. -- ``metric()``, which returns a Mahalanobis matrix - :math:`M = L^{\top}L` such that distance between vectors ``x`` and - ``y`` can be computed as :math:`\left(x-y\right)M\left(x-y\right)`. \ No newline at end of file +- ``score_pairs`` which returns the similarity of pairs of points. \ No newline at end of file From 70f16a9bee7da0503b0bd899f59afa5be627e2f5 Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 13:30:58 +0100 Subject: [PATCH 16/32] address https://github.com/metric-learn/metric-learn/pull/133#discussion_r243007352 --- doc/preprocessor.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst index 76f12a43..ef43d391 100644 --- a/doc/preprocessor.rst +++ b/doc/preprocessor.rst @@ -13,9 +13,9 @@ Two types of objects can be put in this argument: Array-like ---------- You can specify ``preprocessor=X`` where ``X`` is an array-like containing the -dataset of points. In this case, the estimator will be able to take as -inputs an array-like of indices, replacing under the hood each index by the -corresponding sample. +dataset of points. In this case, the fit/predict/score/etc... methods of the +estimator will be able to take as inputs an array-like of indices, replacing +under the hood each index by the corresponding sample. Example with a supervised metric learner: From ed0a00ea105d865e93425f8f92b0d3d84243e38f Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 13:37:21 +0100 Subject: [PATCH 17/32] explain what happens when preprocessor=None --- doc/preprocessor.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst index ef43d391..fb35d3c5 100644 --- a/doc/preprocessor.rst +++ b/doc/preprocessor.rst @@ -8,7 +8,11 @@ Estimators in metric-learn all have a ``preprocessor`` option at instantiation. Filling this argument allows them to take more compact input representation when fitting, predicting etc... -Two types of objects can be put in this argument: +If ``preprocessor=None``, no preprocessor will be used and the user must +provide the classical representation to the fit/predict/score/etc... methods of +the estimators (see the documentation of the particular estimator to know what +type of input it accepts). Otherwise, two types of objects can be put in this +argument: Array-like ---------- From 868d42bf228a1f607ff568d16cb5733501dde8c9 Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 13:46:39 +0100 Subject: [PATCH 18/32] Precisions in doc about the input accepted by the preprocessor --- doc/preprocessor.rst | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst index fb35d3c5..01d64810 100644 --- a/doc/preprocessor.rst +++ b/doc/preprocessor.rst @@ -58,21 +58,18 @@ Example with a weakly supervised metric learner: Callable -------- -Instead, you can provide a callable in the argument ``preprocessor``. -Then the estimator will accept indicators of points instead of points. -Under the hood, the estimator will call this callable on the indicators you -provide as input when fitting, predicting etc... -Using a callable can be really useful to represent lazily a dataset of -images stored on the file system for instance. -The callable should take as an input an array-like, and return a 2D -array-like. 
For supervised learners it will be applied on the whole array of
-indicators at once, and for weakly supervised learners it will be applied
-on each column of the array of tuples.
+Instead, you can provide a callable in the argument ``preprocessor``. Then the
+estimator will accept indicators of points instead of points. Under the hood,
+the estimator will call this callable on the indicators you provide as input
+when fitting, predicting etc... Using a callable can be really useful to
+lazily represent a dataset of images stored on the file system, for instance.
+The callable should take as input a 1D array-like, and return a 2D
+array-like. For supervised learners it will be applied on the whole 1D array of
+indicators at once, and for weakly supervised learners it will be applied on
+each column of the 2D array of tuples.
 
 Example with a supervised metric learner:
 
-The callable should take as input an array-like, and return a 2D array-like.
-
 >>> def find_images(file_paths):
 >>>     # each file contains a small image datapoint; imread is assumed imported (e.g. from matplotlib.pyplot)
 >>>     return np.row_stack([imread(f).ravel() for f in file_paths])
@@ -84,10 +81,6 @@ The callable should take as input an array-like, and return a 2D array-like.
 
 Example with a weakly supervised metric learner:
 
-The given callable should take as input an array-like, and return a
-2D array-like, as before. It will be called on each column of the input
-tuples of indicators.
-
 >>> pairs_images_paths = [['img02.png', 'img00.png'],
 >>>                       ['img01.png', 'img00.png']]
 >>> y_pairs = np.array([1, -1])

From 1fe33576d9b7fcb08ba8cd94120b6a3d9e3991db Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 13:48:29 +0100
Subject: [PATCH 19/32] address
 https://github.com/metric-learn/metric-learn/pull/133#discussion_r243008557

---
 doc/preprocessor.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst
index 01d64810..7efcf588 100644
--- a/doc/preprocessor.rst
+++ b/doc/preprocessor.rst
@@ -58,7 +58,7 @@ Example with a weakly supervised metric learner:
 Callable
 --------
 
-Instead, you can provide a callable in the argument ``preprocessor``. Then the
+Alternatively, you can provide a callable as ``preprocessor``. Then the
 estimator will accept indicators of points instead of points. Under the hood,
 the estimator will call this callable on the indicators you provide as input
 when fitting, predicting etc... Using a callable can be really useful to

From 70f16a9bee7da0503b0bd899f59afa5be627e2f5 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 14:06:57 +0100
Subject: [PATCH 20/32] Better formulation of sentences

---
 doc/preprocessor.rst      |  2 +-
 doc/weakly_supervised.rst | 19 ++++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst
index 7efcf588..2586abfa 100644
--- a/doc/preprocessor.rst
+++ b/doc/preprocessor.rst
@@ -10,7 +10,7 @@ when fitting, predicting etc...
 
 If ``preprocessor=None``, no preprocessor will be used and the user must
 provide the classical representation to the fit/predict/score/etc... methods of
-the estimators (see the documentation of the particular estimator to know what
+the estimators (see the documentation of the particular estimator to know the
Otherwise, two types of objects can be put in this argument: diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst index 7e20e236..c13a2d24 100644 --- a/doc/weakly_supervised.rst +++ b/doc/weakly_supervised.rst @@ -13,7 +13,8 @@ Input data ========== In the following paragraph we talk about tuples for sake of generality. These -can be pairs, triplets, quadruplets etc, depending on what algorithm we use. +can be pairs, triplets, quadruplets etc, depending on the particular metric +learning algorithm we use. Basic form ---------- @@ -23,9 +24,9 @@ needed labels for theses tuples. The `tuples` argument is the first argument of every method (like the X argument for classical algorithms in scikit-learn). The second argument is the -label of the tuple: what it is depends on the algorithm used. For instance for -pairs learners ``y`` is a label indicating if the pair is of similar samples or -dissimilar samples. +label of the tuple: its semantic depends on the algorithm used. For instance +for pairs learners ``y`` is a label indicating whether the pair is of similar +samples or dissimilar samples. Then one can fit a Weakly Supervised Metric Learner on this tuple, like this: @@ -97,7 +98,7 @@ the feature dimension there, the resulting array is 2D. In order to fit metric learning algorithms with this type of input, we need to give the original dataset of points ``X`` to the estimator so that it knows -what point the indices refer to. We do this when initializing the estimator, +the points the indices refer to. We do this when initializing the estimator, through the argument `preprocessor`. .. topic:: Example: @@ -121,7 +122,7 @@ Scikit-learn compatibility Weakly supervised estimators are compatible with scikit-learn routines for model selection (grid-search, cross-validation etc). See the scoring section -for more details on what scoring is used in the case of Weakly Supervised +for more details on the scoring used in the case of Weakly Supervised Metric Learning. .. topic:: Example @@ -140,9 +141,9 @@ Metric Learning. Scoring ======= -Some default scoring are implemented in metric-learn, depending on which kind -of tuples you work on. See the docstring of the `score` method of the estimator -you use. +Some default scoring are implemented in metric-learn, depending on the kind of +tuples you're working with (pairs, triplets...). See the docstring of the +`score` method of the estimator you use. Algorithms From 16ba60acadbf35ce57c756dcd1046568d67398e3 Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 14:09:50 +0100 Subject: [PATCH 21/32] change title formatting in index --- doc/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/index.rst b/doc/index.rst index baedb26d..9dbcd9b0 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -3,7 +3,7 @@ metric-learn: Metric Learning in Python |License| |PyPI version| Welcome to metric-learn's documentation ! -========================================= +----------------------------------------- .. 
    :maxdepth: 2

From 95f07023b66e18edbf4fbb325d4374d0f2ff2b03 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 14:16:32 +0100
Subject: [PATCH 22/32] Fix references and some numbering issues

---
 doc/supervised.rst        |  6 +++---
 doc/weakly_supervised.rst | 22 +++++++++++-----------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/doc/supervised.rst b/doc/supervised.rst
index cde5dc60..aa6046aa 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -178,6 +178,6 @@ used for dimensionality reduction and high dimensional data visualization.
 
 .. topic:: References:
 
-    .. [1] `Information-theoretic Metric Learning `_ Jason V. Davis,
-      et al.
+    .. [1] `Metric Learning for Kernel Regression `_ Kilian Q. Weinberger,
+      Gerald Tesauro
 
diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index c13a2d24..15a23d9d 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -155,10 +155,10 @@ information and passed to the underlying algorithm.
 
 .. todo:: add more details on `_Supervised` classes
 
-1. ITML
--------
+ITML
+----
 
-Information Theoretic Metric Learning, Kulis et al., ICML 2007
+Information Theoretic Metric Learning, Davis et al., ICML 2007
 
 `ITML` minimizes the differential relative entropy between two multivariate
 Gaussians under constraints on the distance function, which can be formulated
@@ -192,8 +192,8 @@ programming.
 
-2. LSML
--------
+LSML
+----
 
 `LSML`: Metric Learning from Relative Comparisons by Minimizing Squared
 Residual
@@ -221,8 +221,8 @@ Residual
 
-3. SDML
--------
+SDML
+----
 
 `SDML`: An efficient sparse metric learning in high-dimensional space via
 L1-penalized log-determinant regularization
@@ -252,8 +252,8 @@ L1-penalized log-determinant regularization
 
-4. RCA
-------
+RCA
+---
 
 Relevant Components Analysis (RCA)
@@ -287,8 +287,8 @@ of points that are known to belong to the same class.
 
-5. MMC
-------
+MMC
+---
 
 Mahalanobis Metric Learning with Application for Clustering with
 Side-Information, Xing et al., NIPS 2002

From 6cb328f190f5bd3863ef9610c6fb66e186fa8104 Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 14:25:25 +0100
Subject: [PATCH 23/32] Reformat link to preprocessor

---
 doc/preprocessor.rst      | 2 +-
 doc/weakly_supervised.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/preprocessor.rst b/doc/preprocessor.rst
index 2586abfa..ad1ffd8f 100644
--- a/doc/preprocessor.rst
+++ b/doc/preprocessor.rst
@@ -1,4 +1,4 @@
-:ref:`preprocessor`
+.. _preprocessor_section:
 
 ============
 Preprocessor
 ============
 
diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index 15a23d9d..1daf441d 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -114,7 +114,7 @@ through the argument `preprocessor`.
    ``preprocessor``, which will go fetch and form the tuples. This allows to
    give more general indicators than just indices from an array (for instance
    paths in the filesystem, name of records in a database etc...) See section
-   :ref:`preprocessor` for more details on how to use the preprocessor.
+   :ref:`preprocessor_section` for more details on how to use the preprocessor.
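+
+.. topic:: Example:
+
+As a sketch of such a more general indicator (the in-memory ``RECORDS`` dict
+below is a hypothetical stand-in for e.g. a database of records), a callable
+preprocessor can fetch and form the points from record names:
+
+>>> RECORDS = {'rec_0': [-0.12, -1.21, -0.20],
+>>>            'rec_1': [+0.05, -0.19, -0.05]}
+>>> def fetch_records(names):
+>>>     # stack the feature vector stored under each record name
+>>>     return np.array([RECORDS[n] for n in names])
+>>> mmc = MMC(preprocessor=fetch_records)
+>>> mmc.fit(np.array([['rec_0', 'rec_1'], ['rec_1', 'rec_0']]),
+>>>         np.array([1, -1]))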
 Scikit-learn compatibility

From ff4d30e220566e6ccf5fad2b2a8f2407f9d0f7ba Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 14:29:19 +0100
Subject: [PATCH 24/32] Fix automatic link to algorithms for the supervised
 section

---
 doc/supervised.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/supervised.rst b/doc/supervised.rst
index aa6046aa..1730cde0 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -22,7 +22,7 @@ Covariance
 
 .. todo:: Covariance is unsupervised, so its doc should not be here.
 
-:class:`Covariance` does not "learn" anything, rather it calculates
+`Covariance` does not "learn" anything, rather it calculates
 the covariance matrix of the input data. This is a simple baseline method.
 
 .. topic:: Example Code:
@@ -46,7 +46,7 @@ LMNN
 
 Large-margin nearest neighbor metric learning.
 
-:class:`LMNN` learns a Mahalanobis distance metric in the kNN classification
+`LMNN` learns a Mahalanobis distance metric in the kNN classification
 setting using semidefinite programming. The learned metric attempts to keep
 k-nearest neighbors in the same class, while keeping examples from different
 classes separated by a large margin. This algorithm makes no assumptions about
@@ -83,7 +83,7 @@ The two implementations differ slightly, and the C++ version is more complete.
 NCA
 ---
 
-Neighborhood Components Analysis (:class:`NCA`) is a distance metric learning
+Neighborhood Components Analysis (`NCA`) is a distance metric learning
 algorithm which aims to improve the accuracy of nearest neighbors
 classification compared to the standard Euclidean distance. The algorithm
 directly maximizes a stochastic variant of the leave-one-out k-nearest
@@ -121,7 +121,7 @@ LFDA
 
 Local Fisher Discriminant Analysis (LFDA)
 
-:class:`LFDA` is a linear supervised dimensionality reduction method. It is
+`LFDA` is a linear supervised dimensionality reduction method. It is
 particularly useful when dealing with multimodality, where one or more classes
 consist of separate clusters in input space. The core optimization problem of
@@ -157,7 +157,7 @@ MLKR
 
 Metric Learning for Kernel Regression.
 
-:class:`MLKR` is an algorithm for supervised metric learning, which learns a
+`MLKR` is an algorithm for supervised metric learning, which learns a
 distance function by directly minimising the leave-one-out regression error.
 This algorithm can also be viewed as a supervised variation of PCA and can be
 used for dimensionality reduction and high dimensional data visualization.

From 37cd11c55fbc3b52ca15d630cf695d1623023f0a Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 14:58:11 +0100
Subject: [PATCH 25/32] Reformatting and adding examples about supervised
 version in the end of the weakly supervised version

---
 doc/weakly_supervised.rst | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index 1daf441d..174b2e47 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -149,12 +149,6 @@ tuples you're working with (pairs, triplets...). See the docstring of the
 Algorithms
 ==================
 
-Note that each weakly-supervised algorithm has a supervised version of the form
-`*_Supervised` where similarity tuples are generated from the labels
-information and passed to the underlying algorithm.
-
-.. todo:: add more details on `_Supervised` classes
-
 ITML
 ----
@@ -278,6 +272,7 @@ of points that are known to belong to the same class.
     rca.fit(X, Y)
 
 .. topic:: References:
+
     .. [1] `Adjustment learning and relevant component analysis
       `_ Noam Shental, et al.
 
@@ -303,9 +298,6 @@ implicit assumptions of MMC is that all classes form a compact set, i.e.,
 follow a unimodal distribution, which restricts the possible use-cases of this
 method. However, it is one of the earliest and a still often cited technique.
 
-Adapted from Matlab code at
-http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
-
 .. topic:: Example Code:
 
 ::
@@ -325,4 +317,32 @@
 
 .. topic:: References:
 
     .. [1] `Distance metric learning with application to clustering with
       side-information `_ Xing, Jordan, Russell, Ng.
\ No newline at end of file
+      -with-side-information.pdf>`_ Xing, Jordan, Russell, Ng.
+
+    .. [2] Adapted from Matlab code `here `_.
+
+
+_Supervised version
+--------------------
+
+Note that each weakly-supervised algorithm has a supervised version of the form
+`*_Supervised` where similarity tuples are generated from the labels
+information and passed to the underlying algorithm.
+
+.. todo:: add more details about that (see issue https://github
+   .com/metric-learn/metric-learn/issues/135)
+
+
+.. topic:: Example Code:
+
+::
+
+    from metric_learn import MMC_Supervised
+    from sklearn.datasets import load_iris
+
+    iris_data = load_iris()
+    X = iris_data['data']
+    Y = iris_data['target']
+
+    mmc = MMC_Supervised(num_constraints=200)
+    mmc.fit(X, Y)
\ No newline at end of file

From 6eee8629f1cb678df3f3563fe4f42edcac2ae9ee Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 15:12:16 +0100
Subject: [PATCH 26/32] add precisions in the intro

---
 doc/introduction.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 74198613..6ed962d6 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -7,8 +7,11 @@ Traditionally, practitioners would choose a standard distance
 metric (Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
 domain.
 Distance metric learning (or simply, metric learning) is the sub-field of
-machine learning dedicated to automatically constructing optimal distance
-metrics.
+machine learning dedicated to automatically constructing task-specific distance
+metrics from (weakly) supervised data.
+The learned distance metric often corresponds to a Euclidean distance in a new
+embedding space, hence distance metric learning can be seen as a form of
+representation learning.
 
 This package contains efficient Python implementations of several popular

From ed0a00ea105d865e93425f8f92b0d3d84243e38f Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 15:19:37 +0100
Subject: [PATCH 27/32] add precisions for score_pairs in the intro

---
 doc/introduction.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 6ed962d6..079b82d0 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -31,4 +31,8 @@ Currently, each metric learning algorithm supports the following methods:
   standard Euclidean distances may be used.
 - ``transform(X)``, which applies the aforementioned transformation.
-- ``score_pairs`` which returns the similarity of pairs of points.
\ No newline at end of file +- ``score_pairs(pairs)`` which returns the distance between pairs of + points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs, + 2, n_features)``, or it can be a 2D array-like of pairs indicators of + shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more + details). From d49ba6807076b0d59d101af397a2ff6e54d12f4c Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 16:53:33 +0100 Subject: [PATCH 28/32] Change examples for weakly supervised section --- doc/weakly_supervised.rst | 90 ++++++++++++++++++++++++--------------- 1 file changed, 55 insertions(+), 35 deletions(-) diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst index 174b2e47..b1f384b7 100644 --- a/doc/weakly_supervised.rst +++ b/doc/weakly_supervised.rst @@ -166,15 +166,19 @@ programming. :: - from metric_learn import ITML_Supervised - from sklearn.datasets import load_iris + from metric_learn import ITML - iris_data = load_iris() - X = iris_data['data'] - Y = iris_data['target'] + pairs = [[[1.2, 5.5], [1.3, 4.5]], + [[6.4, 4.6], [6.2, 3.7]], + [[1.3, 4.5], [6.4, 4.6]], + [[1.2, 5.5], [6.2, 5.4]]] - itml = ITML_Supervised(num_constraints=200) - itml.fit(X, Y) + y = [1, 1, -1, -1] + # we want to make closer points where the first feature is close, and + # further if the second feature is close + + itml = ITML() + itml.fit(pairs, y) .. topic:: References: @@ -196,15 +200,18 @@ Residual :: - from metric_learn import LSML_Supervised - from sklearn.datasets import load_iris + from metric_learn import LSML - iris_data = load_iris() - X = iris_data['data'] - Y = iris_data['target'] + quadruplets = [[[1.2, 5.5], [1.3, 4.5], [6.4, 4.6], [6.2, 3.7]], + [[1.3, 4.5], [6.4, 4.6], [1.2, 5.5], [6.2, 5.4]], + [[3.2, 5.5], [3.3, 4.5], [8.4, 4.6], [8.2, 3.7]], + [[3.3, 4.5], [8.4, 4.6], [3.2, 5.5], [8.2, 5.4]]] - lsml = LSML_Supervised(num_constraints=200) - lsml.fit(X, Y) + # we want to make closer points where the first feature is close, and + # further if the second feature is close + + lsml = LSML() + lsml.fit(quadruplets) .. topic:: References: @@ -225,15 +232,19 @@ L1-penalized log-determinant regularization :: - from metric_learn import SDML_Supervised - from sklearn.datasets import load_iris + from metric_learn import SDML - iris_data = load_iris() - X = iris_data['data'] - Y = iris_data['target'] + pairs = [[[1.2, 5.5], [1.3, 4.5]], + [[6.4, 4.6], [6.2, 3.7]], + [[1.3, 4.5], [6.4, 4.6]], + [[1.2, 5.5], [6.2, 5.4]]] - sdml = SDML_Supervised(num_constraints=200) - sdml.fit(X, Y) + y = [1, 1, -1, -1] + # we want to make closer points where the first feature is close, and + # further if the second feature is close + + sdml = SDML() + sdml.fit(pairs, y) .. topic:: References: @@ -261,15 +272,20 @@ of points that are known to belong to the same class. :: - from metric_learn import RCA_Supervised - from sklearn.datasets import load_iris + from metric_learn import RCA - iris_data = load_iris() - X = iris_data['data'] - Y = iris_data['target'] + pairs = [[[1.2, 5.5], [1.3, 4.5]], + [[6.4, 4.6], [6.2, 3.7]], + [[1.3, 4.5], [6.4, 4.6]], + [[1.2, 5.5], [6.2, 5.4]]] + + y = [1, 1, -1, -1] + # we want to make closer points where the first feature is close, and + # further if the second feature is close + + rca = RCA() + rca.fit(pairs, y) - rca = RCA_Supervised(num_chunks=30, chunk_size=2) - rca.fit(X, Y) .. topic:: References: @@ -302,15 +318,19 @@ method. However, it is one of the earliest and a still often cited technique. 
 ::
 
-    from metric_learn import MMC_Supervised
-    from sklearn.datasets import load_iris
+    from metric_learn import MMC
 
-    iris_data = load_iris()
-    X = iris_data['data']
-    Y = iris_data['target']
+    pairs = [[[1.2, 5.5], [1.3, 4.5]],
+             [[6.4, 4.6], [6.2, 3.7]],
+             [[1.3, 4.5], [6.4, 4.6]],
+             [[1.2, 5.5], [6.2, 5.4]]]
 
-    mmc = MMC_Supervised(num_constraints=200)
-    mmc.fit(X, Y)
+    y = [1, 1, -1, -1]
+    # we want to make closer points where the first feature is close, and
+    # further if the second feature is close
+
+    mmc = MMC()
+    mmc.fit(pairs, y)
 
 .. topic:: References:

From c1075843f2322c9ebff8c6857eee716e37591a3d Mon Sep 17 00:00:00 2001
From: William de Vazelhes 
Date: Thu, 20 Dec 2018 17:26:38 +0100
Subject: [PATCH 29/32] add _Supervised section in Supervised section

---
 doc/supervised.rst        | 26 ++++++++++++++++++++++++++
 doc/weakly_supervised.rst | 28 ++--------------------------
 2 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/doc/supervised.rst b/doc/supervised.rst
index 1730cde0..32dba84b 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -181,3 +181,29 @@ used for dimensionality reduction and high dimensional data visualization.
 
     .. [1] `Metric Learning for Kernel Regression `_ Kilian Q. Weinberger,
       Gerald Tesauro
+
+
+Supervised versions of weakly-supervised algorithms
+---------------------------------------------------
+
+Note that each :ref:`weakly-supervised algorithm <weakly_supervised_section>`
+has a supervised version of the form `*_Supervised` where similarity tuples are
+generated from the labels information and passed to the underlying algorithm.
+
+.. todo:: add more details about that (see issue `135 <https://github.com/metric-learn/metric-learn/issues/135>`_)
+
+
+..
topic:: Example Code: - -:: - - from metric_learn import MMC_Supervised - from sklearn.datasets import load_iris - - iris_data = load_iris() - X = iris_data['data'] - Y = iris_data['target'] - - mmc = MMC_Supervised(num_constraints=200) - mmc.fit(X, Y) \ No newline at end of file From c1075843f2322c9ebff8c6857eee716e37591a3d Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 17:35:03 +0100 Subject: [PATCH 30/32] change examples in weakly supervised section --- doc/weakly_supervised.rst | 65 ++++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 32 deletions(-) diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst index 1e80068c..deae9b40 100644 --- a/doc/weakly_supervised.rst +++ b/doc/weakly_supervised.rst @@ -170,14 +170,15 @@ programming. from metric_learn import ITML - pairs = [[[1.2, 5.5], [1.3, 4.5]], - [[6.4, 4.6], [6.2, 3.7]], - [[1.3, 4.5], [6.4, 4.6]], - [[1.2, 5.5], [6.2, 5.4]]] - + pairs = [[[1.2, 7.5], [1.3, 1.5]], + [[6.4, 2.6], [6.2, 9.7]], + [[1.3, 4.5], [3.2, 4.6]], + [[6.2, 5.5], [5.4, 5.4]]] y = [1, 1, -1, -1] - # we want to make closer points where the first feature is close, and - # further if the second feature is close + + # in this task we want points where the first feature is close to be closer + # to each other, no matter how close the second feature is + itml = ITML() itml.fit(pairs, y) @@ -204,10 +205,10 @@ Residual from metric_learn import LSML - quadruplets = [[[1.2, 5.5], [1.3, 4.5], [6.4, 4.6], [6.2, 3.7]], - [[1.3, 4.5], [6.4, 4.6], [1.2, 5.5], [6.2, 5.4]], - [[3.2, 5.5], [3.3, 4.5], [8.4, 4.6], [8.2, 3.7]], - [[3.3, 4.5], [8.4, 4.6], [3.2, 5.5], [8.2, 5.4]]] + quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]], + [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]], + [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]], + [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]] # we want to make closer points where the first feature is close, and # further if the second feature is close @@ -236,14 +237,14 @@ L1-penalized log-determinant regularization from metric_learn import SDML - pairs = [[[1.2, 5.5], [1.3, 4.5]], - [[6.4, 4.6], [6.2, 3.7]], - [[1.3, 4.5], [6.4, 4.6]], - [[1.2, 5.5], [6.2, 5.4]]] - + pairs = [[[1.2, 7.5], [1.3, 1.5]], + [[6.4, 2.6], [6.2, 9.7]], + [[1.3, 4.5], [3.2, 4.6]], + [[6.2, 5.5], [5.4, 5.4]]] y = [1, 1, -1, -1] - # we want to make closer points where the first feature is close, and - # further if the second feature is close + + # in this task we want points where the first feature is close to be closer + # to each other, no matter how close the second feature is sdml = SDML() sdml.fit(pairs, y) @@ -276,14 +277,14 @@ of points that are known to belong to the same class. from metric_learn import RCA - pairs = [[[1.2, 5.5], [1.3, 4.5]], - [[6.4, 4.6], [6.2, 3.7]], - [[1.3, 4.5], [6.4, 4.6]], - [[1.2, 5.5], [6.2, 5.4]]] - + pairs = [[[1.2, 7.5], [1.3, 1.5]], + [[6.4, 2.6], [6.2, 9.7]], + [[1.3, 4.5], [3.2, 4.6]], + [[6.2, 5.5], [5.4, 5.4]]] y = [1, 1, -1, -1] - # we want to make closer points where the first feature is close, and - # further if the second feature is close + + # in this task we want points where the first feature is close to be closer + # to each other, no matter how close the second feature is rca = RCA() rca.fit(pairs, y) @@ -322,14 +323,14 @@ method. However, it is one of the earliest and a still often cited technique. 
from metric_learn import MMC - pairs = [[[1.2, 5.5], [1.3, 4.5]], - [[6.4, 4.6], [6.2, 3.7]], - [[1.3, 4.5], [6.4, 4.6]], - [[1.2, 5.5], [6.2, 5.4]]] - + pairs = [[[1.2, 7.5], [1.3, 1.5]], + [[6.4, 2.6], [6.2, 9.7]], + [[1.3, 4.5], [3.2, 4.6]], + [[6.2, 5.5], [5.4, 5.4]]] y = [1, 1, -1, -1] - # we want to make closer points where the first feature is close, and - # further if the second feature is close + + # in this task we want points where the first feature is close to be closer + # to each other, no matter how close the second feature is mmc = MMC() mmc.fit(pairs, y) From 9de2e9c9a8984e35504002abf089021e0f3f1aa8 Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 17:47:44 +0100 Subject: [PATCH 31/32] fix empty module contents --- doc/metric_learn.rst | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/doc/metric_learn.rst b/doc/metric_learn.rst index 70a99a04..c2472408 100644 --- a/doc/metric_learn.rst +++ b/doc/metric_learn.rst @@ -1,8 +1,8 @@ metric_learn package ==================== -Submodules ----------- +Module Contents +--------------- .. toctree:: @@ -16,11 +16,3 @@ Submodules metric_learn.nca metric_learn.rca metric_learn.sdml - -Module contents ---------------- - -.. automodule:: metric_learn - :members: - :undoc-members: - :show-inheritance: From 13711225949358ac35eeea157679847d333f3dcd Mon Sep 17 00:00:00 2001 From: William de Vazelhes Date: Thu, 20 Dec 2018 18:22:49 +0100 Subject: [PATCH 32/32] rename sandwich.py into plot_sandwich.py to be found by sphinx-gallery --- examples/{sandwich.py => plot_sandwich.py} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename examples/{sandwich.py => plot_sandwich.py} (100%) diff --git a/examples/sandwich.py b/examples/plot_sandwich.py similarity index 100% rename from examples/sandwich.py rename to examples/plot_sandwich.py