Add documentation for the new API #133
Changes from 18 commits
@@ -5,3 +5,4 @@ dist/
.coverage
htmlcov/
.cache/
doc/auto_examples/*
@@ -0,0 +1,42 @@
###############
Getting started
###############

Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.

Alternatively, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)

**Notes**

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.

Quick start
===========

This example loads the iris dataset, and evaluates a k-nearest neighbors
algorithm on an embedding space learned with `NCA`.

>>> from metric_learn import NCA
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.pipeline import make_pipeline
>>>
>>> X, y = load_iris(return_X_y=True)
>>> clf = make_pipeline(NCA(), KNeighborsClassifier())
>>> cross_val_score(clf, X, y)
@@ -2,78 +2,31 @@ metric-learn: Metric Learning in Python
=======================================
|License| |PyPI version|

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing optimal distance
metrics.

This package contains efficient Python implementations of several popular
metric learning algorithms.

Welcome to metric-learn's documentation!
=========================================

Review comment: this appears with same title level as the main title
(metric-learn: Metric Learning in Python)
Reply: That's right, done

.. toctree::
   :caption: Algorithms
   :maxdepth: 1

   metric_learn.covariance
   metric_learn.lmnn
   metric_learn.itml
   metric_learn.sdml
   metric_learn.lsml
   metric_learn.nca
   metric_learn.lfda
   metric_learn.rca

Each metric supports the following methods:

- ``fit(...)``, which learns the model.
- ``transformer()``, which returns a transformation matrix
  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
  in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``metric()``, which returns a Mahalanobis matrix
  :math:`M = L^{\top}L` such that the squared distance between vectors ``x``
  and ``y`` can be computed as :math:`\left(x-y\right)^{\top} M\left(x-y\right)`.

.. toctree::
   :maxdepth: 2

   getting_started

Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.

Alternatively, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

.. toctree::
   :maxdepth: 2

   user_guide

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)

.. toctree::
   :maxdepth: 2

   Package Overview <metric_learn>

**Notes**

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.

.. toctree::
   :maxdepth: 2

   auto_examples/index

Navigation
----------

:ref:`genindex` | :ref:`modindex` | :ref:`search`

.. toctree::
   :maxdepth: 4
   :hidden:

   Package Overview <metric_learn>

.. |PyPI version| image:: https://badge.fury.io/py/metric-learn.svg
   :target: http://badge.fury.io/py/metric-learn
.. |License| image:: http://img.shields.io/:license-mit-blue.svg?style=flat
@@ -0,0 +1,31 @@
============
Introduction
============

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing optimal distance
metrics.

Review comment: I suggest two changes here:
Reply: I agree, done
Reply: I agree, for now I just added this sentence, but indeed we should
emphasize this more, maybe in the future with some section in the docs, and
also maybe this will get clearer with examples of metric learners used as
transformers (examples of dimensionality reduction for instance)

This package contains efficient Python implementations of several popular
metric learning algorithms, compatible with scikit-learn. This makes it
possible to use all the scikit-learn routines for pipelining and model
selection with metric learning algorithms.

Currently, each metric learning algorithm supports the following methods:

- ``fit(...)``, which learns the model.

  Review comment: missing a few other generic methods, like
  Reply: Indeed, thanks. I've added

- ``metric()``, which returns a Mahalanobis matrix
  :math:`M = L^{\top}L` such that the squared distance between vectors ``x``
  and ``y`` can be computed as :math:`\left(x-y\right)^{\top} M\left(x-y\right)`.
- ``transformer_from_metric(metric)``, which returns a transformation matrix
  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
  in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``score_pairs``, which returns the similarity of pairs of points.

  Review comment: Maybe clarify the inputs to
  Reply: That's right, done
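To make the relation between ``metric()`` and the learned space concrete, here is a small numpy sketch; the matrix ``L`` below is invented for illustration only and is not produced by any metric-learn estimator:

```python
import numpy as np

# A made-up transformation matrix L (not learned by any estimator).
L = np.array([[1.5, 0.0],
              [0.5, 2.0]])
M = L.T @ L          # the Mahalanobis matrix that metric() would return

x = np.array([1.0, 2.0])
y = np.array([0.0, -1.0])

# Squared Euclidean distance between the transformed points ...
d2_embedding = np.sum((L @ x - L @ y) ** 2)
# ... equals the Mahalanobis form (x - y)^T M (x - y).
d2_mahalanobis = (x - y) @ M @ (x - y)
assert np.isclose(d2_embedding, d2_mahalanobis)
```

This is exactly why standard Euclidean distances can be used in the learned metric space: the linear map folds the Mahalanobis matrix into the coordinates.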
@@ -0,0 +1,118 @@
.. _preprocessor:

============
Preprocessor
============

Estimators in metric-learn all have a ``preprocessor`` option at instantiation.

Review comment: maybe briefly explain default behavior (when
Reply: I agree, done

Filling this argument allows them to take a more compact input representation
when fitting, predicting, etc.

If ``preprocessor=None``, no preprocessor will be used and the user must
provide the classical representation to the fit/predict/score/etc. methods of
the estimators (see the documentation of the particular estimator to know what
type of input it accepts). Otherwise, two types of objects can be passed as
this argument:

Array-like
----------
You can specify ``preprocessor=X`` where ``X`` is an array-like containing the
dataset of points. In this case, the fit/predict/score/etc. methods of the
estimator will be able to take as inputs an array-like of indices, replacing
under the hood each index by the corresponding sample.

Example with a supervised metric learner:

>>> import numpy as np
>>> from metric_learn import NCA
>>>
>>> X = np.array([[-0.7 , -0.23],
...               [-0.43, -0.49],
...               [ 0.14, -0.37]])  # array of 3 samples of 2 features
>>> points_indices = np.array([2, 0, 1, 0])
>>> y = np.array([1, 0, 1, 1])
>>>
>>> nca = NCA(preprocessor=X)
>>> nca.fit(points_indices, y)
>>> # under the hood the algorithm will create
>>> # points = np.array([[ 0.14, -0.37],
>>> #                    [-0.7 , -0.23],
>>> #                    [-0.43, -0.49],
>>> #                    [ 0.14, -0.37]]) and fit on it
Example with a weakly supervised metric learner:

>>> import numpy as np
>>> from metric_learn import MMC
>>>
>>> X = np.array([[-0.7 , -0.23],
...               [-0.43, -0.49],
...               [ 0.14, -0.37]])  # array of 3 samples of 2 features
>>> pairs_indices = np.array([[2, 0], [1, 0]])
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=X)
>>> mmc.fit(pairs_indices, y_pairs)
>>> # under the hood the algorithm will create
>>> # pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
>>> #                   [[-0.43, -0.49], [-0.7 , -0.23]]]) and fit on it
|
||
Callable | ||
-------- | ||
Instead, you can provide a callable in the argument ``preprocessor``. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively, you can provide a callable as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better indeed, done |
||
Then the estimator will accept indicators of points instead of points. | ||
Under the hood, the estimator will call this callable on the indicators you | ||
provide as input when fitting, predicting etc... | ||
Using a callable can be really useful to represent lazily a dataset of | ||
images stored on the file system for instance. | ||
The callable should take as an input an array-like, and return a 2D | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. take as input a 1D array-like? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes indeed, thanks |
||
array-like. For supervised learners it will be applied on the whole array of | ||
indicators at once, and for weakly supervised learners it will be applied | ||
on each column of the array of tuples. | ||
|
||
Example with a supervised metric learner: | ||
|
||
The callable should take as input an array-like, and return a 2D array-like. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a repeat from above? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, removed it |
||
|
||
>>> def find_images(file_paths): | ||
>>> # each file contains a small image to use as an input datapoint | ||
>>> return np.row_stack([imread(f).ravel() for f in file_paths]) | ||
>>> | ||
>>> nca = NCA(preprocessor=find_images) | ||
>>> nca.fit(['img01.png', 'img00.png', 'img02.png'], [1, 0, 1]) | ||
>>> # under the hood preprocessor(indicators) will be called | ||
|
Example with a weakly supervised metric learner:

The callable will be called on each column of the input tuples of indicators,
and should, as before, return a 2D array-like.

Review comment: Again a repeat?
Reply: Yep, removed it

>>> from metric_learn import MMC
>>>
>>> pairs_images_paths = [['img02.png', 'img00.png'],
...                       ['img01.png', 'img00.png']]
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=find_images)
>>> mmc.fit(pairs_images_paths, y_pairs)
>>> # under the hood preprocessor(pairs_indicators[i]) will be called for each
>>> # i in [0, 1]
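The column-wise application of the callable can be sketched without metric-learn; ``fetch`` below is a made-up stand-in for a preprocessor callable such as ``find_images`` (illustrative only, the actual internals may differ):

```python
import numpy as np

# Toy "storage": indicator i maps to the i-th 2-feature sample.
storage = np.array([[-0.7 , -0.23],
                    [-0.43, -0.49],
                    [ 0.14, -0.37]])

def fetch(indicators):
    # Hypothetical preprocessor callable: takes a 1D array-like of
    # indicators, returns a 2D array of the corresponding samples.
    return storage[np.asarray(indicators)]

pairs_indicators = np.array([[2, 0],
                             [1, 0]])  # 2 pairs of indicators

# Apply the callable to each column of the tuples, then reassemble
# a 3D array of shape (n_pairs, 2, n_features):
pairs = np.stack([fetch(pairs_indicators[:, i])
                  for i in range(pairs_indicators.shape[1])], axis=1)
print(pairs.shape)  # (2, 2, 2)
```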
.. note:: When you fill the ``preprocessor`` option, it allows you to give
   more compact inputs, but the classical way of providing inputs remains
   valid (2D array-like for supervised learners and 3D array-like of tuples
   for weakly supervised learners). If a classical input is provided, the
   metric learner will not use the preprocessor.

Example: this will work:

>>> import numpy as np
>>> from metric_learn import MMC
>>>
>>> def preprocessor_wip(array):
...     raise NotImplementedError("This preprocessor does nothing yet.")
>>>
>>> pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
...                   [[-0.43, -0.49], [-0.7 , -0.23]]])
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=preprocessor_wip)
>>> mmc.fit(pairs, y_pairs)  # preprocessor_wip will not be called here
Review comment: maybe just add a sentence or two to briefly describe what the
code snippet does (compute cross validation score of NCA on iris dataset)
Reply: Agreed, I've added a quick description and also modified the example
since it didn't work in fact (for now we don't have a scoring for
cross-validation on supervised metric learners), so I updated the example
with a pipeline nca + knn