
Commit edad55d

wdevazelhes authored and bellet committed
[MRG+1] Threshold for pairs learners (#168)
* add some tests for testing that different scores work using the scoring function
* ENH: Add tests and basic threshold implementation
* Add support for LSML and more generally quadruplets
* Make CalibratedClassifierCV work (for preprocessor case) thanks to classes_
* Fix some tests and PEP8 errors
* change the sign in decision function
* Add docstring for threshold_ and classes_ in the base _PairsClassifier class
* remove quadruplets from the test with scikit learn custom scorings
* Remove argument y in quadruplets learners and lsml
* FIX fix docstrings of decision functions
* FIX the threshold by taking the opposite (to be adapted to the decision function)
* Fix tests to have no y for quadruplets' estimator fit
* Remove isin to be compatible with old numpy versions
* Fix threshold so that it has a positive value and add small test
* Fix threshold for itml
* FEAT: Add calibrate_threshold and tests
* MAINT: remove starred syntax for compatibility with older versions of python
* Remove debugging prints and make tests for ITML pass, while waiting for #175 to be solved
* FIX: from __future__ import division to pass tests for python 2.7
* Add some documentation for calibration
* DOC: fix style
* Address most comments from aurelien's reviews
* Remove classes_ attribute and test for CalibratedClassifierCV
* Rename make_args_inc_quadruplets into remove_y_quadruplets
* TST: Fix remaining threshold into min_rate
* Remove default_threshold and put calibrate_threshold instead
* Use calibrate_threshold for ITML, and remove description
* ENH: use calibrate_threshold by default and display its parameters from the fit method
* Add a small test to test automatic calibration
* Update documentation of the default threshold
* Inverse sense for threshold comparison to be more intuitive
* Address remaining review comments
* MAINT: Rename threshold_params into calibration_params
* TST: Add test for extreme cases
* MAINT: rename threshold_params into calibration_params
* MAINT: rename threshold_params into calibration_params
* FIX: Make tests work, and add the right threshold (mean between lowest accepted value and highest rejected value), and max + 1 or min - 1 for extreme points
* Go back to previous version of finding the threshold
* Extract method for validating calibration parameters
* Validate calibration params before fit
* Address #168 (comment)
1 parent b28933c commit edad55d

11 files changed, +1066 -148 lines changed

doc/weakly_supervised.rst

Lines changed: 83 additions & 34 deletions
@@ -148,8 +148,47 @@ tuples you're working with (pairs, triplets...). See the docstring of the
 `score` method of the estimator you use.
 
 
+Learning on pairs
+=================
+
+Some metric learning algorithms learn on pairs of samples. In this case, one
+should provide the algorithm with ``n_samples`` pairs of points, with a
+corresponding target containing ``n_samples`` values being either +1 or -1.
+These values indicate whether the given pairs are similar points or
+dissimilar points.
+
+
+.. _calibration:
+
+Thresholding
+------------
+In order to predict whether a new pair represents similar or dissimilar
+samples, we need to set a distance threshold, so that points closer (in the
+learned space) than this threshold are predicted as similar, and points further
+away are predicted as dissimilar. Several methods are possible for this
+thresholding.
+
+- **At fit time**: The threshold is set with `calibrate_threshold` (see
+  below) on the trainset. You can specify the calibration parameters directly
+  in the `fit` method with the `threshold_params` parameter (see the
+  documentation of the `fit` method of any metric learner that learns on pairs
+  of points for more information). This method can cause a little bit of
+  overfitting. If you want to avoid that, calibrate the threshold after
+  fitting, on a validation set.
+
+- **Manual**: calling `set_threshold` will set the threshold to a
+  particular value.
+
+- **Calibration**: calling `calibrate_threshold` will calibrate the
+  threshold to achieve a particular score on a validation set, the score
+  being among the classical scores for classification (accuracy, f1 score...).
+
+
+See also: `sklearn.calibration`.
+
+
 Algorithms
-==================
+==========
 
 ITML
 ----
@@ -192,39 +231,6 @@ programming.
     .. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/
        itml/
 
-
-LSML
-----
-
-`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared
-Residual
-
-.. topic:: Example Code:
-
-::
-
-    from metric_learn import LSML
-
-    quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
-                   [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
-                   [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
-                   [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
-
-    # we want to make closer points where the first feature is close, and
-    # further if the second feature is close
-
-    lsml = LSML()
-    lsml.fit(quadruplets)
-
-.. topic:: References:
-
-    .. [1] Liu et al.
-       "Metric Learning from Relative Comparisons by Minimizing Squared
-       Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf
-
-    .. [2] Adapted from https://gist.github.com/kcarnold/5439917
-
-
 SDML
 ----
 
@@ -343,3 +349,46 @@ method. However, it is one of the earliest and a still often cited technique.
        -with-side-information.pdf>`_ Xing, Jordan, Russell, Ng.
     .. [2] Adapted from Matlab code `here <http://www.cs.cmu
        .edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz>`_.
+
+Learning on quadruplets
+=======================
+
+A type of information even weaker than pairs is information about relative
+comparisons between pairs. The user should provide the algorithm with a
+quadruplet of points, where the two first points are closer than the two
+last points. No target vector (``y``) is needed, since the supervision is
+already in the order that points are given in the quadruplet.
+
+Algorithms
+==========
+
+LSML
+----
+
+`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared
+Residual
+
+.. topic:: Example Code:
+
+::
+
+    from metric_learn import LSML
+
+    quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
+                   [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
+                   [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
+                   [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
+
+    # we want to make closer points where the first feature is close, and
+    # further if the second feature is close
+
+    lsml = LSML()
+    lsml.fit(quadruplets)
+
+.. topic:: References:
+
+    .. [1] Liu et al.
+       "Metric Learning from Relative Comparisons by Minimizing Squared
+       Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf
+
+    .. [2] Adapted from https://gist.github.com/kcarnold/5439917
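
The thresholding rule described in the new "Thresholding" section of doc/weakly_supervised.rst can be illustrated with a small standalone sketch. This is not metric-learn's implementation. Plain Euclidean distance stands in for a learned metric, the helper names (`pair_distance`, `predict`, `calibrate_threshold_sketch`) are hypothetical, accuracy is the only score considered, and treating a distance equal to the threshold as "similar" is an assumption made here for simplicity.

```python
import math


def pair_distance(pair):
    """Euclidean distance between the two points of a pair.

    Assumption: this stands in for the distance in a learned space.
    """
    first, second = pair
    return math.dist(first, second)


def predict(pairs, threshold):
    """Label a pair +1 (similar) if its distance is at most the threshold,
    and -1 (dissimilar) otherwise."""
    return [1 if pair_distance(p) <= threshold else -1 for p in pairs]


def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the target."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def calibrate_threshold_sketch(pairs, y):
    """Return the candidate threshold with the best accuracy on (pairs, y).

    Candidates are every observed distance, plus one value below the
    smallest distance so that "everything is dissimilar" is also tried.
    Hypothetical helper, not the library's calibrate_threshold.
    """
    distances = sorted({pair_distance(p) for p in pairs})
    candidates = [distances[0] - 1.0] + distances
    return max(candidates, key=lambda t: accuracy(y, predict(pairs, t)))


# Validation pairs with +1 (similar) / -1 (dissimilar) targets.
pairs_valid = [
    ([0.0, 0.0], [0.0, 1.0]),    # distance 1.0
    ([0.0, 0.0], [3.0, 4.0]),    # distance 5.0
    ([1.0, 1.0], [1.0, 3.0]),    # distance 2.0
    ([2.0, 0.0], [10.0, 6.0]),   # distance 10.0
]
y_valid = [1, -1, 1, -1]

threshold = calibrate_threshold_sketch(pairs_valid, y_valid)
predictions = predict(pairs_valid, threshold)
print(threshold, predictions)
```

Calibrating on held-out pairs, as in this sketch, corresponds to the "Calibration" option in the added documentation. Running the same search on the training pairs corresponds to the "At fit time" option, which the documentation warns can overfit slightly.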
