Add documentation for the new API #133
Changes from 18 commits
@@ -5,3 +5,4 @@ dist/
.coverage
htmlcov/
.cache/
doc/auto_examples/*
@@ -0,0 +1,42 @@
###############
Getting started
###############

Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.

Alternatively, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)

**Notes**

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.

Quick start
===========

This example loads the iris dataset, and evaluates a k-nearest neighbors
algorithm on an embedding space learned with `NCA`.

>>> from metric_learn import NCA
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.pipeline import make_pipeline
>>>
>>> X, y = load_iris(return_X_y=True)
>>> clf = make_pipeline(NCA(), KNeighborsClassifier())
>>> cross_val_score(clf, X, y)
@@ -2,78 +2,31 @@ metric-learn: Metric Learning in Python
=======================================
|License| |PyPI version|

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing optimal distance
metrics.

This package contains efficient Python implementations of several popular
metric learning algorithms.

Welcome to metric-learn's documentation!
=========================================

Review comment: this appears with same title level as the main title
(metric-learn: Metric Learning in Python)
Reply: That's right, done

.. toctree::
   :caption: Algorithms
   :maxdepth: 1

   metric_learn.covariance
   metric_learn.lmnn
   metric_learn.itml
   metric_learn.sdml
   metric_learn.lsml
   metric_learn.nca
   metric_learn.lfda
   metric_learn.rca

Each metric supports the following methods:

- ``fit(...)``, which learns the model.
- ``transformer()``, which returns a transformation matrix
  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
  in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``metric()``, which returns a Mahalanobis matrix
  :math:`M = L^{\top}L` such that the squared distance between vectors ``x``
  and ``y`` can be computed as :math:`\left(x-y\right)^{\top} M\left(x-y\right)`.

.. toctree::
   :maxdepth: 2

   getting_started

Installation and Setup
======================

Run ``pip install metric-learn`` to download and install from PyPI.

Alternatively, download the source repository and run:

- ``python setup.py install`` for default installation.
- ``python setup.py test`` to run all tests.

.. toctree::
   :maxdepth: 2

   user_guide

**Dependencies**

- Python 2.7+, 3.4+
- numpy, scipy, scikit-learn
- (for running the examples only: matplotlib)

.. toctree::
   :maxdepth: 2

   Package Overview <metric_learn>

**Notes**

If a recent version of the Shogun Python modular (``modshogun``) library
is available, the LMNN implementation will use the fast C++ version from
there. The two implementations differ slightly, and the C++ version is
more complete.

.. toctree::
   :maxdepth: 2

   auto_examples/index

Navigation
----------

:ref:`genindex` | :ref:`modindex` | :ref:`search`

.. toctree::
   :maxdepth: 4
   :hidden:

   Package Overview <metric_learn>

.. |PyPI version| image:: https://badge.fury.io/py/metric-learn.svg
   :target: http://badge.fury.io/py/metric-learn
.. |License| image:: http://img.shields.io/:license-mit-blue.svg?style=flat
@@ -0,0 +1,31 @@
============
Introduction
============

Distance metrics are widely used in the machine learning literature.
Traditionally, practitioners would choose a standard distance metric
(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
the domain.
Distance metric learning (or simply, metric learning) is the sub-field of
machine learning dedicated to automatically constructing optimal distance
metrics.

Review comment: I suggest two changes here:
Reply: I agree, done
Reply: I agree, for now I just added this sentence, but indeed we should
emphasize this more, maybe in the future with some section in the docs, and
also maybe this will get clearer with examples of metric learners used as
transformers (examples of dimensionality reduction for instance)

This package contains efficient Python implementations of several popular
metric learning algorithms, compatible with scikit-learn. This makes it
possible to use all the scikit-learn routines for pipelining and model
selection with metric learning algorithms.

Currently, each metric learning algorithm supports the following methods:

- ``fit(...)``, which learns the model.

  Review comment: missing a few other generic methods, like
  Reply: Indeed, thanks. I've added

- ``metric()``, which returns a Mahalanobis matrix
  :math:`M = L^{\top}L` such that the squared distance between vectors ``x``
  and ``y`` can be computed as :math:`\left(x-y\right)^{\top} M\left(x-y\right)`.
- ``transformer_from_metric(metric)``, which returns a transformation matrix
  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
  in which standard Euclidean distances may be used.
- ``transform(X)``, which applies the aforementioned transformation.
- ``score_pairs``, which returns the similarity of pairs of points.

  Review comment: Maybe clarify the inputs to
  Reply: That's right, done
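To make the relation between ``metric()`` and the learned space concrete, here is a small numpy sketch; the matrix ``L`` below is invented for illustration only and is not produced by any metric-learn estimator:

```python
import numpy as np

# A made-up transformation matrix L (not learned by any estimator).
L = np.array([[1.5, 0.0],
              [0.5, 2.0]])
M = L.T @ L          # the Mahalanobis matrix that metric() would return

x = np.array([1.0, 2.0])
y = np.array([0.0, -1.0])

# Squared Euclidean distance between the transformed points ...
d2_embedding = np.sum((L @ x - L @ y) ** 2)
# ... equals the Mahalanobis form (x - y)^T M (x - y).
d2_mahalanobis = (x - y) @ M @ (x - y)
assert np.isclose(d2_embedding, d2_mahalanobis)
```

This is exactly why standard Euclidean distances can be used in the learned metric space: the linear map folds the Mahalanobis matrix into the coordinates.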
@@ -0,0 +1,118 @@
.. _preprocessor:

============
Preprocessor
============

Estimators in metric-learn all have a ``preprocessor`` option at instantiation.

Review comment: maybe briefly explain default behavior (when
Reply: I agree, done

Filling this argument allows them to take a more compact input representation
when fitting, predicting, etc.

If ``preprocessor=None``, no preprocessor will be used and the user must
provide the classical representation to the fit/predict/score/etc. methods of
the estimators (see the documentation of the particular estimator to know what
type of input it accepts). Otherwise, two types of objects can be passed as
this argument:

Array-like
----------
You can specify ``preprocessor=X`` where ``X`` is an array-like containing the
dataset of points. In this case, the fit/predict/score/etc. methods of the
estimator will be able to take as inputs an array-like of indices, replacing
under the hood each index by the corresponding sample.

Example with a supervised metric learner:

>>> import numpy as np
>>> from metric_learn import NCA
>>>
>>> X = np.array([[-0.7 , -0.23],
...               [-0.43, -0.49],
...               [ 0.14, -0.37]])  # array of 3 samples of 2 features
>>> points_indices = np.array([2, 0, 1, 0])
>>> y = np.array([1, 0, 1, 1])
>>>
>>> nca = NCA(preprocessor=X)
>>> nca.fit(points_indices, y)
>>> # under the hood the algorithm will create
>>> # points = np.array([[ 0.14, -0.37],
>>> #                    [-0.7 , -0.23],
>>> #                    [-0.43, -0.49],
>>> #                    [ 0.14, -0.37]]) and fit on it
Example with a weakly supervised metric learner:

>>> import numpy as np
>>> from metric_learn import MMC
>>>
>>> X = np.array([[-0.7 , -0.23],
...               [-0.43, -0.49],
...               [ 0.14, -0.37]])  # array of 3 samples of 2 features
>>> pairs_indices = np.array([[2, 0], [1, 0]])
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=X)
>>> mmc.fit(pairs_indices, y_pairs)
>>> # under the hood the algorithm will create
>>> # pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
>>> #                   [[-0.43, -0.49], [-0.7 , -0.23]]]) and fit on it
|
||
Callable | ||
-------- | ||
Instead, you can provide a callable in the argument ``preprocessor``. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively, you can provide a callable as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better indeed, done |
||
Then the estimator will accept indicators of points instead of points. | ||
Under the hood, the estimator will call this callable on the indicators you | ||
provide as input when fitting, predicting etc... | ||
Using a callable can be really useful to represent lazily a dataset of | ||
images stored on the file system for instance. | ||
The callable should take as an input an array-like, and return a 2D | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. take as input a 1D array-like? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes indeed, thanks |
||
array-like. For supervised learners it will be applied on the whole array of | ||
indicators at once, and for weakly supervised learners it will be applied | ||
on each column of the array of tuples. | ||
|
||
Example with a supervised metric learner: | ||
|
||
The callable should take as input an array-like, and return a 2D array-like. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a repeat from above? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, removed it |
||
|
||
>>> def find_images(file_paths): | ||
>>> # each file contains a small image to use as an input datapoint | ||
>>> return np.row_stack([imread(f).ravel() for f in file_paths]) | ||
>>> | ||
>>> nca = NCA(preprocessor=find_images) | ||
>>> nca.fit(['img01.png', 'img00.png', 'img02.png'], [1, 0, 1]) | ||
>>> # under the hood preprocessor(indicators) will be called | ||
|
Example with a weakly supervised metric learner:

The callable will be called on each column of the input tuples of indicators,
and should, as before, return a 2D array-like.

Review comment: Again a repeat?
Reply: Yep, removed it

>>> from metric_learn import MMC
>>>
>>> pairs_images_paths = [['img02.png', 'img00.png'],
...                       ['img01.png', 'img00.png']]
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=find_images)
>>> mmc.fit(pairs_images_paths, y_pairs)
>>> # under the hood preprocessor(pairs_indicators[i]) will be called for each
>>> # i in [0, 1]
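The column-wise application of the callable can be sketched without metric-learn; ``fetch`` below is a made-up stand-in for a preprocessor callable such as ``find_images`` (illustrative only, the actual internals may differ):

```python
import numpy as np

# Toy "storage": indicator i maps to the i-th 2-feature sample.
storage = np.array([[-0.7 , -0.23],
                    [-0.43, -0.49],
                    [ 0.14, -0.37]])

def fetch(indicators):
    # Hypothetical preprocessor callable: takes a 1D array-like of
    # indicators, returns a 2D array of the corresponding samples.
    return storage[np.asarray(indicators)]

pairs_indicators = np.array([[2, 0],
                             [1, 0]])  # 2 pairs of indicators

# Apply the callable to each column of the tuples, then reassemble
# a 3D array of shape (n_pairs, 2, n_features):
pairs = np.stack([fetch(pairs_indicators[:, i])
                  for i in range(pairs_indicators.shape[1])], axis=1)
print(pairs.shape)  # (2, 2, 2)
```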
.. note:: When you fill the ``preprocessor`` option, it allows you to give
   more compact inputs, but the classical way of providing inputs remains
   valid (2D array-like for supervised learners and 3D array-like of tuples
   for weakly supervised learners). If a classical input is provided, the
   metric learner will not use the preprocessor.

Example: this will work:

>>> import numpy as np
>>> from metric_learn import MMC
>>>
>>> def preprocessor_wip(array):
...     raise NotImplementedError("This preprocessor does nothing yet.")
>>>
>>> pairs = np.array([[[ 0.14, -0.37], [-0.7 , -0.23]],
...                   [[-0.43, -0.49], [-0.7 , -0.23]]])
>>> y_pairs = np.array([1, -1])
>>>
>>> mmc = MMC(preprocessor=preprocessor_wip)
>>> mmc.fit(pairs, y_pairs)  # preprocessor_wip will not be called here
Review comment: maybe just add a sentence or two to briefly describe what the
code snippet does (compute cross validation score of NCA on iris dataset)
Reply: Agreed, I've added a quick description and also modified the example
since it didn't work in fact (for now we don't have a scoring for
cross-validation on supervised metric learners), so I updated the example
with a pipeline nca + knn