diff --git a/.gitignore b/.gitignore
index a51c1a82..449f70ea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,3 @@ htmlcov/
 .cache/
 .pytest_cache/
 doc/auto_examples/*
-coverage
-.coverage
-.coverage*
diff --git a/doc/supervised.rst b/doc/supervised.rst
index 26934a47..83bf4449 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -41,17 +41,37 @@ the covariance matrix of the input data. This is a simple baseline method.

 .. [1] On the Generalized Distance in Statistics, P.C.Mahalanobis, 1936

+.. _lmnn:
+
 LMNN
 -----

-Large-margin nearest neighbor metric learning.
+Large Margin Nearest Neighbor Metric Learning
+(:py:class:`LMNN `)

-`LMNN` learns a Mahanalobis distance metric in the kNN classification
-setting using semidefinite programming. The learned metric attempts to keep
-k-nearest neighbors in the same class, while keeping examples from different
-classes separated by a large margin. This algorithm makes no assumptions about
+`LMNN` learns a Mahalanobis distance metric in the kNN classification
+setting. The learned metric attempts to keep close k-nearest neighbors
+from the same class, while keeping examples from different classes
+separated by a large margin. This algorithm makes no assumptions about
 the distribution of the data.

+The distance is learned by solving the following optimization problem:
+
+.. math::
+
+    \min_\mathbf{L}\sum_{i, j}\eta_{ij}||\mathbf{L(x_i-x_j)}||^2 +
+    c\sum_{i, j, l}\eta_{ij}(1-y_{ij})[1+||\mathbf{L(x_i-x_j)}||^2-||
+    \mathbf{L(x_i-x_l)}||^2]_+
+
+where :math:`\mathbf{x}_i` is a data point, :math:`\mathbf{x}_j` is one of
+its k-nearest neighbors sharing the same label, and the :math:`\mathbf{x}_l`
+are all the other instances within that region with different labels.
+:math:`\eta_{ij}, y_{ij} \in \{0, 1\}` are both indicator variables:
+:math:`\eta_{ij} = 1` indicates that :math:`\mathbf{x}_{j}` is one of the
+k-nearest neighbors (with the same label) of :math:`\mathbf{x}_{i}`, and
+:math:`y_{ij} = 0` indicates that :math:`\mathbf{x}_{i}` and
+:math:`\mathbf{x}_{j}` belong to different classes.
+:math:`[\cdot]_+ = \max(0, \cdot)` is the Hinge loss.
+
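+As a rough illustration (this is not the library's optimizer; the helper name
+``lmnn_objective`` and its inputs are chosen here for exposition only), the
+objective above can be evaluated with NumPy for a fixed :math:`\mathbf{L}`,
+a list of target-neighbor pairs and a list of impostor triplets:
+
+::
+
+    import numpy as np
+
+    def lmnn_objective(L, X, target_pairs, impostor_triplets, c=1.0):
+        # pull term: squared distances to target neighbors of the same class
+        pull = sum(np.sum((L @ (X[i] - X[j]))**2) for i, j in target_pairs)
+        # push term: hinge on differently-labeled points invading the margin
+        push = 0.0
+        for i, j, l in impostor_triplets:
+            d_ij = np.sum((L @ (X[i] - X[j]))**2)
+            d_il = np.sum((L @ (X[i] - X[l]))**2)
+            push += max(0.0, 1.0 + d_ij - d_il)
+        return pull + c * push
+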
 .. topic:: Example Code:

 ::

@@ -80,16 +100,44 @@ The two implementations differ slightly, and the C++ version is more complete.
 -margin
 -nearest-neighbor-classification>`_ Kilian Q. Weinberger, John Blitzer,
 Lawrence K. Saul

+.. _nca:
+
 NCA
 ---

-Neighborhood Components Analysis (`NCA`) is a distance metric learning
-algorithm which aims to improve the accuracy of nearest neighbors
-classification compared to the standard Euclidean distance. The algorithm
-directly maximizes a stochastic variant of the leave-one-out k-nearest
-neighbors (KNN) score on the training set. It can also learn a low-dimensional
-linear embedding of data that can be used for data visualization and fast
-classification.
+Neighborhood Components Analysis (:py:class:`NCA `)
+
+`NCA` is a distance metric learning algorithm which aims to improve the
+accuracy of nearest neighbors classification compared to the standard
+Euclidean distance. The algorithm directly maximizes a stochastic variant
+of the leave-one-out k-nearest neighbors (KNN) score on the training set.
+It can also learn a low-dimensional linear transformation of data that can
+be used for data visualization and fast classification.
+
+`NCA` uses the decomposition :math:`\mathbf{M} = \mathbf{L}^T\mathbf{L}` and
+defines the probability :math:`p_{ij}` that :math:`\mathbf{x}_j` is the
+neighbor of :math:`\mathbf{x}_i` by calculating the softmax likelihood of
+the Mahalanobis distance:
+
+.. math::
+
+    p_{ij} = \frac{\exp(-|| \mathbf{Lx}_i - \mathbf{Lx}_j ||_2^2)}
+    {\sum_{l\neq i}\exp(-||\mathbf{Lx}_i - \mathbf{Lx}_l||_2^2)},
+    \qquad p_{ii}=0
+
+Then the probability that :math:`\mathbf{x}_i` will be correctly classified
+by the stochastic nearest neighbors rule is:
+
+.. math::
+
+    p_{i} = \sum_{j:j\neq i, y_j=y_i}p_{ij}
+
+The optimization problem is to find the matrix :math:`\mathbf{L}` that
+maximizes the sum of the probabilities of being correctly classified:
+
+.. math::
+
+    \mathbf{L} = \text{argmax}\sum_i p_i
+
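+The following NumPy sketch (illustrative only, not the library's
+implementation) computes :math:`p_{ij}`, :math:`p_i` and the objective for a
+given :math:`\mathbf{L}`:
+
+::
+
+    import numpy as np
+
+    def nca_objective(L, X, y):
+        LX = X @ L.T
+        d2 = np.sum((LX[:, None, :] - LX[None, :, :])**2, axis=-1)
+        np.fill_diagonal(d2, np.inf)          # enforces p_ii = 0
+        p = np.exp(-d2)
+        p /= p.sum(axis=1, keepdims=True)     # p_ij
+        p_i = np.where(y[:, None] == y[None, :], p, 0.0).sum(axis=1)
+        return p_i.sum()                      # quantity maximized over L
+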
 .. topic:: Example Code:

@@ -116,16 +164,55 @@ classification.
 .. [2] Wikipedia entry on Neighborhood Components Analysis
    https://en.wikipedia.org/wiki/Neighbourhood_components_analysis

+.. _lfda:
+
 LFDA
 ----

-Local Fisher Discriminant Analysis (LFDA)
+Local Fisher Discriminant Analysis (:py:class:`LFDA `)

 `LFDA` is a linear supervised dimensionality reduction method. It is
-particularly useful when dealing with multimodality, where one ore more classes
+particularly useful when dealing with multi-modality, where one or more classes
 consist of separate clusters in input space. The core optimization problem of
 LFDA is solved as a generalized eigenvalue problem.
+
+The algorithm defines the Fisher local within-/between-class scatter matrices
+:math:`\mathbf{S}^{(w)}/ \mathbf{S}^{(b)}` in a pairwise fashion:
+
+.. math::
+
+    \mathbf{S}^{(w)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(w)}(\mathbf{x}_i -
+    \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,\\
+    \mathbf{S}^{(b)} = \frac{1}{2}\sum_{i,j=1}^nW_{ij}^{(b)}(\mathbf{x}_i -
+    \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T,\\
+
+where
+
+.. math::
+
+    W_{ij}^{(w)} = \left\{\begin{aligned}0 \qquad y_i\neq y_j \\
+    \,\,\mathbf{A}_{i,j}/n_l \qquad y_i = y_j\end{aligned}\right.\\
+    W_{ij}^{(b)} = \left\{\begin{aligned}1/n \qquad y_i\neq y_j \\
+    \,\,\mathbf{A}_{i,j}(1/n-1/n_l) \qquad y_i = y_j\end{aligned}\right.\\
+
+here :math:`\mathbf{A}_{i,j}` is the :math:`(i,j)`-th entry of the affinity
+matrix :math:`\mathbf{A}`, which can be calculated with local scaling methods.
+
+The learning problem then becomes deriving the LFDA transformation matrix
+:math:`\mathbf{T}_{LFDA}`:
+
+.. math::
+
+    \mathbf{T}_{LFDA} = \arg\max_\mathbf{T}
+    [\text{tr}((\mathbf{T}^T\mathbf{S}^{(w)}
+    \mathbf{T})^{-1}\mathbf{T}^T\mathbf{S}^{(b)}\mathbf{T})]
+
+That is, it is looking for a transformation matrix :math:`\mathbf{T}` such that
+nearby data pairs in the same class are made close and the data pairs in
+different classes are separated from each other; far apart data pairs in the
+same class are not imposed to be close.
+
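+For concreteness, here is an illustrative NumPy sketch of the scatter matrices
+(a plain Gaussian affinity is used in place of the local scaling method, and
+the class size :math:`n_l` is read from the labels; this is not the library's
+implementation):
+
+::
+
+    import numpy as np
+
+    def lfda_scatter_matrices(X, y, sigma=1.0):
+        n = X.shape[0]
+        d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
+        A = np.exp(-d2 / sigma**2)            # affinity (simple Gaussian here)
+        same = (y[:, None] == y[None, :])
+        n_l = np.array([np.sum(y == c) for c in y])   # class size per sample
+        Ww = np.where(same, A / n_l[:, None], 0.0)
+        Wb = np.where(same, A * (1.0 / n - 1.0 / n_l[:, None]), 1.0 / n)
+        diff = X[:, None, :] - X[None, :, :]
+        Sw = 0.5 * np.einsum('ij,ijk,ijl->kl', Ww, diff, diff)
+        Sb = 0.5 * np.einsum('ij,ijk,ijl->kl', Wb, diff, diff)
+        return Sw, Sb
+
+    # T is then given by the leading generalized eigenvectors of
+    # Sb v = lambda Sw v, e.g. via scipy.linalg.eigh(Sb, Sw)
+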
 .. topic:: Example Code:

 ::

@@ -151,17 +238,50 @@ LFDA is solved as a generalized eigenvalue problem.
 `_ Yuan Tang.

+.. _mlkr:
 MLKR
 ----

-Metric Learning for Kernel Regression.
+Metric Learning for Kernel Regression (:py:class:`MLKR `)

 `MLKR` is an algorithm for supervised metric learning, which learns a
-distance function by directly minimising the leave-one-out regression error.
+distance function by directly minimizing the leave-one-out regression error.
 This algorithm can also be viewed as a supervised variation of PCA and can be
 used for dimensionality reduction and high dimensional data visualization.

+Theoretically, `MLKR` can be applied with many types of kernel functions and
+distance metrics. Here we focus the exposition on the particular combination
+of a Gaussian kernel and a Mahalanobis metric, as these are the ones used
+here. The Gaussian kernel is denoted as:
+
+.. math::
+
+    k_{ij} = \frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{d(\mathbf{x}_i,
+    \mathbf{x}_j)}{\sigma^2})
+
+where :math:`d(\cdot, \cdot)` is the squared distance under the chosen metric.
+In the Mahalanobis case it is :math:`d(\mathbf{x}_i,
+\mathbf{x}_j) = ||\mathbf{A}(\mathbf{x}_i - \mathbf{x}_j)||^2`, where the
+transformation matrix :math:`\mathbf{A}` is derived from the decomposition
+of the Mahalanobis matrix :math:`\mathbf{M=A^TA}`.
+
+Since :math:`\sigma^2` can be integrated into :math:`d(\cdot)`, we can set
+:math:`\sigma^2=1` for the sake of simplicity. Here we use the cumulative
+leave-one-out quadratic regression error of the training samples as the
+loss function:
+
+.. math::
+
+    \mathcal{L} = \sum_i(y_i - \hat{y}_i)^2
+
+where the prediction :math:`\hat{y}_i` is derived from kernel regression by
+calculating a weighted average of all the other training samples:
+
+.. math::
+
+    \hat{y}_i = \frac{\sum_{j\neq i}y_jk_{ij}}{\sum_{j\neq i}k_{ij}}
+
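+Put together, the loss can be sketched in a few lines of NumPy (illustrative
+only; the constant factor of the kernel is dropped since it cancels in
+:math:`\hat{y}_i`):
+
+::
+
+    import numpy as np
+
+    def mlkr_loss(A, X, y):
+        AX = X @ A.T
+        d2 = np.sum((AX[:, None, :] - AX[None, :, :])**2, axis=-1)
+        K = np.exp(-d2)              # Gaussian kernel with sigma^2 = 1
+        np.fill_diagonal(K, 0.0)     # leave-one-out: exclude j = i
+        y_hat = (K @ y) / K.sum(axis=1)
+        return np.sum((y - y_hat)**2)
+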
 .. topic:: Example Code:

 ::

@@ -193,7 +313,6 @@ generated from the labels information and passed to the underlying algorithm.

 .. todo:: add more details about that (see issue ``_)

-
 .. topic:: Example Code:

 ::
diff --git a/doc/weakly_supervised.rst b/doc/weakly_supervised.rst
index 6bf6f993..93720ffc 100644
--- a/doc/weakly_supervised.rst
+++ b/doc/weakly_supervised.rst
@@ -190,18 +190,55 @@ See also: `sklearn.calibration`.

 Algorithms
 ==========

+.. _itml:
+
 ITML
 ----

-Information Theoretic Metric Learning, Davis et al., ICML 2007
+Information Theoretic Metric Learning (:py:class:`ITML `)

-`ITML` minimizes the differential relative entropy between two multivariate
-Gaussians under constraints on the distance function, which can be formulated
-into a Bregman optimization problem by minimizing the LogDet divergence subject
-to linear constraints. This algorithm can handle a wide variety of constraints
+`ITML` minimizes the (differential) relative entropy, aka Kullback–Leibler
+divergence, between two multivariate Gaussians subject to constraints on the
+associated Mahalanobis distance, which can be formulated into a Bregman
+optimization problem by minimizing the LogDet divergence subject to
+linear constraints. This algorithm can handle a wide variety of constraints
 and can optionally incorporate a prior on the distance function. Unlike some
-other methods, ITML does not rely on an eigenvalue computation or semi-definite
-programming.
+other methods, `ITML` does not rely on an eigenvalue computation or
+semi-definite programming.
+
+Given a Mahalanobis distance parameterized by :math:`A`, its corresponding
+multivariate Gaussian is denoted as:
+
+.. math::
+
+    p(\mathbf{x}; \mathbf{A}) = \frac{1}{Z}\exp(-\frac{1}{2}d_\mathbf{A}
+    (\mathbf{x}, \mu))
+    = \frac{1}{Z}\exp(-\frac{1}{2}(\mathbf{x} - \mu)^T\mathbf{A}
+    (\mathbf{x} - \mu))
+
+where :math:`Z` is the normalization constant and the inverse of the
+Mahalanobis matrix, :math:`\mathbf{A}^{-1}`, is the covariance of the Gaussian.
+
+Given pairs of similar points :math:`S` and pairs of dissimilar points
+:math:`D`, the distance metric learning problem is to minimize the LogDet
+divergence, which is equivalent to minimizing :math:`\textbf{KL}(p(\mathbf{x};
+\mathbf{A}_0) || p(\mathbf{x}; \mathbf{A}))`:
+
+.. math::
+
+    \min_\mathbf{A} D_{\ell \mathrm{d}}\left(A, A_{0}\right) =
+    \operatorname{tr}\left(A A_{0}^{-1}\right)-\log \operatorname{det}
+    \left(A A_{0}^{-1}\right)-n\\
+    \text{subject to } \quad d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j)
+    \leq u \qquad (\mathbf{x}_i, \mathbf{x}_j)\in S \\
+    d_\mathbf{A}(\mathbf{x}_i, \mathbf{x}_j) \geq l \qquad (\mathbf{x}_i,
+    \mathbf{x}_j)\in D
+
+where :math:`u` and :math:`l` are the upper and the lower bounds on the
+distance for similar and dissimilar pairs respectively, :math:`\mathbf{A}_0`
+is the prior distance metric, set to the identity matrix by default, and
+:math:`D_{\ell \mathrm{d}}(\cdot)` is the LogDet divergence.
+
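+As an illustration (not the library's solver), the LogDet divergence and the
+bound constraints can be checked directly with NumPy:
+
+::
+
+    import numpy as np
+
+    def logdet_divergence(A, A0):
+        P = A @ np.linalg.inv(A0)
+        return np.trace(P) - np.linalg.slogdet(P)[1] - A.shape[0]
+
+    def d_A(A, x, y):
+        return (x - y) @ A @ (x - y)
+
+    def within_bounds(A, similar, dissimilar, u, l):
+        # similar / dissimilar are lists of (x_i, x_j) pairs
+        return (all(d_A(A, x, y) <= u for x, y in similar) and
+                all(d_A(A, x, y) >= l for x, y in dissimilar))
+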
 .. topic:: Example Code:

@@ -231,11 +268,124 @@ programming.

 .. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/
    itml/

+
+.. _lsml:
+
+LSML
+----
+
+Metric Learning from Relative Comparisons by Minimizing Squared Residual
+(:py:class:`LSML `)
+
+`LSML` proposes a simple, yet effective, algorithm that minimizes a convex
+objective function corresponding to the sum of squared residuals of
+constraints. This algorithm uses constraints in the form of relative
+distance comparisons, which is especially useful when pairwise constraints
+are not natural to obtain and algorithms based on pairwise constraints
+therefore cannot be deployed. Furthermore, its sparsity extension leads to
+more stable estimation when the dimension is high and only a small amount
+of constraints is given.
+
+The loss function for each constraint
+:math:`d(\mathbf{x}_a, \mathbf{x}_b) < d(\mathbf{x}_c, \mathbf{x}_d)` is
+denoted as:
+
+.. math::
+
+    H(d_\mathbf{M}(\mathbf{x}_a, \mathbf{x}_b)
+    - d_\mathbf{M}(\mathbf{x}_c, \mathbf{x}_d))
+
+where :math:`H(\cdot)` is the squared Hinge loss function defined as:
+
+.. math::
+
+    H(x) = \left\{\begin{aligned}0 \qquad x\leq 0 \\
+    \,\,x^2 \qquad x>0\end{aligned}\right.\\
+
+The summed loss function :math:`L(C)` is the simple sum over all constraints
+:math:`C = \{(\mathbf{x}_a , \mathbf{x}_b , \mathbf{x}_c , \mathbf{x}_d)
+: d(\mathbf{x}_a , \mathbf{x}_b) < d(\mathbf{x}_c , \mathbf{x}_d)\}`. The
+original paper suggests a weighted sum, since the confidence or probability
+of each constraint might differ. However, for the sake of simplicity, and
+assuming no extra knowledge is provided, we use the simple sum here, as the
+authors did in their experiments.
+
+The distance metric learning problem then becomes minimizing the summed loss
+function of all constraints plus a regularization term w.r.t. the prior
+knowledge:
+
+.. math::
+
+    \min_\mathbf{M}(D_{ld}(\mathbf{M, M_0}) + \sum_{(\mathbf{x}_a,
+    \mathbf{x}_b, \mathbf{x}_c, \mathbf{x}_d)\in C}H(d_\mathbf{M}(
+    \mathbf{x}_a, \mathbf{x}_b) - d_\mathbf{M}(\mathbf{x}_c, \mathbf{x}_d))\\
+
+where :math:`\mathbf{M}_0` is the prior metric matrix, set to the identity
+by default, and :math:`D_{ld}(\mathbf{\cdot, \cdot})` is the LogDet
+divergence:
+
+.. math::
+
+    D_{ld}(\mathbf{M, M_0}) = \text{tr}(\mathbf{MM_0}) - \text{logdet}
+    (\mathbf{M})
+
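+A minimal NumPy sketch of this objective (illustrative only; the helper name
+``lsml_loss`` is chosen here for exposition) reads:
+
+::
+
+    import numpy as np
+
+    def lsml_loss(M, quadruplets, M0):
+        def d(x, y):
+            return (x - y) @ M @ (x - y)
+        # squared hinge on each comparison d(a, b) < d(c, e)
+        hinge = sum(max(0.0, d(a, b) - d(c, e))**2
+                    for a, b, c, e in quadruplets)
+        # LogDet regularizer towards the prior M0, as defined above
+        reg = np.trace(M @ M0) - np.linalg.slogdet(M)[1]
+        return reg + hinge
+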
+.. topic:: Example Code:
+
+::
+
+    from metric_learn import LSML
+
+    quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
+                   [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
+                   [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
+                   [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]
+
+    # we want to make closer points where the first feature is close, and
+    # further if the second feature is close
+
+    lsml = LSML()
+    lsml.fit(quadruplets)
+
+.. topic:: References:
+
+    .. [1] Liu et al.
+       "Metric Learning from Relative Comparisons by Minimizing Squared
+       Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf
+
+    .. [2] Adapted from https://gist.github.com/kcarnold/5439917
+
+.. _sdml:
+
 SDML
 ----

-`SDML`: An efficient sparse metric learning in high-dimensional space via
-L1-penalized log-determinant regularization
+Sparse High-Dimensional Metric Learning
+(:py:class:`SDML `)
+
+`SDML` is an efficient sparse metric learning method for high-dimensional
+spaces, using double regularization: an L1-penalization on the off-diagonal
+elements of the Mahalanobis matrix :math:`\mathbf{M}`, and a log-determinant
+divergence between :math:`\mathbf{M}` and :math:`\mathbf{M_0}` (set as either
+:math:`\mathbf{I}` or :math:`\mathbf{\Omega}^{-1}`, where
+:math:`\mathbf{\Omega}` is the covariance matrix).
+
+The resulting optimization problem over the semidefinite matrix
+:math:`\mathbf{M}` is convex:
+
+.. math::
+
+    \min_{\mathbf{M}} \text{tr}((\mathbf{M}_0 + \eta \mathbf{XLX}^{T})
+    \cdot \mathbf{M}) - \log\det \mathbf{M} + \lambda ||\mathbf{M}||_{1, off}
+
+where :math:`\mathbf{X}=[\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n]` is
+the training data, and the incidence matrix :math:`\mathbf{K}` has
+:math:`\mathbf{K}_{ij} = 1` if :math:`(\mathbf{x}_i, \mathbf{x}_j)` is a
+similar pair and :math:`-1` otherwise. The Laplacian matrix
+:math:`\mathbf{L}=\mathbf{D}-\mathbf{K}` is calculated from
+:math:`\mathbf{K}` and :math:`\mathbf{D}`, a diagonal matrix whose entries
+are the sums of the rows of :math:`\mathbf{K}`. :math:`||\cdot||_{1, off}`
+is the off-diagonal L1 norm.
+
 .. topic:: Example Code:

@@ -265,18 +415,33 @@ L1-penalized log-determinant regularization

 .. [2] Adapted from https://gist.github.com/kcarnold/5439945

+.. _rca:
 RCA
 ---

-Relative Components Analysis (RCA)
+Relative Components Analysis (:py:class:`RCA `)

 `RCA` learns a full rank Mahalanobis distance metric based on a weighted sum of
-in-class covariance matrices. It applies a global linear transformation to
-assign large weights to relevant dimensions and low weights to irrelevant
-dimensions. Those relevant dimensions are estimated using "chunklets", subsets
+in-chunklets covariance matrices. It applies a global linear transformation to
+assign large weights to relevant dimensions and low weights to irrelevant
+dimensions. Those relevant dimensions are estimated using "chunklets", subsets
 of points that are known to belong to the same class.

+For a training set with :math:`n` training points in :math:`k` chunklets, the
+algorithm is efficient since it simply amounts to computing
+
+.. math::
+
+    \mathbf{C} = \frac{1}{n}\sum_{j=1}^k\sum_{i=1}^{n_j}
+    (\mathbf{x}_{ji}-\hat{\mathbf{m}}_j)
+    (\mathbf{x}_{ji}-\hat{\mathbf{m}}_j)^T
+
+where chunklet :math:`j` consists of :math:`\{\mathbf{x}_{ji}\}_{i=1}^{n_j}`
+with a mean :math:`\hat{\mathbf{m}}_j`. The inverse :math:`\mathbf{C}^{-1}`
+is used as the Mahalanobis matrix.
+
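+This computation is short enough to sketch directly (illustrative only, not
+the library's implementation):
+
+::
+
+    import numpy as np
+
+    def rca_mahalanobis_matrix(chunklets):
+        # chunklets: list of arrays, each of shape (n_j, n_features)
+        n = sum(len(c) for c in chunklets)
+        d = chunklets[0].shape[1]
+        C = np.zeros((d, d))
+        for c in chunklets:
+            centered = c - c.mean(axis=0)     # x_ji - m_hat_j
+            C += centered.T @ centered
+        C /= n
+        return np.linalg.inv(C)               # used as the Mahalanobis matrix
+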
 .. topic:: Example Code:

 ::

@@ -295,7 +460,6 @@ of points that are known to belong to the same class.

     rca = RCA()
     rca.fit(pairs, y)

-
 .. topic:: References:

 .. [1] `Adjustment learning and relevant component analysis
    `_ Noam Shental, et al.

 .. [3]'Learning a Mahalanobis metric from equivalence constraints', JMLR 2005

+.. _mmc:
+
 MMC
 ---

-Mahalanobis Metric Learning with Application for Clustering with
-Side-Information, Xing et al., NIPS 2002
-
-`MMC` minimizes the sum of squared distances between similar examples, while
-enforcing the sum of distances between dissimilar examples to be greater than a
-certain margin. This leads to a convex and, thus, local-minima-free
-optimization problem that can be solved efficiently. However, the algorithm
-involves the computation of eigenvalues, which is the main speed-bottleneck.
-Since it has initially been designed for clustering applications, one of the
-implicit assumptions of MMC is that all classes form a compact set, i.e.,
-follow a unimodal distribution, which restricts the possible use-cases of this
-method. However, it is one of the earliest and a still often cited technique.
+Metric Learning with Application to Clustering with Side Information
+(:py:class:`MMC `)
+
+`MMC` minimizes the sum of squared distances between similar points, while
+enforcing the sum of distances between dissimilar ones to be greater than one.
+This leads to a convex and, thus, local-minima-free optimization problem that
+can be solved efficiently.
+However, the algorithm involves the computation of eigenvalues, which is the
+main speed-bottleneck. Since it has initially been designed for clustering
+applications, one of the implicit assumptions of MMC is that all classes form
+a compact set, i.e., follow a unimodal distribution, which restricts the
+possible use-cases of this method. However, it is one of the earliest and a
+still often cited technique.
+
+The algorithm aims at minimizing the sum of squared distances between all the
+similar points, while constraining the sum of distances between dissimilar
+points:
+
+.. math::
+
+    \min_{\mathbf{M}\in\mathbb{S}_+^d}\sum_{(\mathbf{x}_i,
+    \mathbf{x}_j)\in S} d^2_{\mathbf{M}}(\mathbf{x}_i, \mathbf{x}_j)
+    \qquad \qquad \text{s.t.} \qquad \sum_{(\mathbf{x}_i, \mathbf{x}_j)
+    \in D} d_{\mathbf{M}}(\mathbf{x}_i, \mathbf{x}_j) \geq 1
+
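+For illustration (not the library's solver), the objective and the constraint
+can be evaluated as follows, with ``d2`` being the squared Mahalanobis
+distance:
+
+::
+
+    import numpy as np
+
+    def mmc_objective_and_constraint(M, similar, dissimilar):
+        def d2(x, y):
+            return (x - y) @ M @ (x - y)
+        objective = sum(d2(x, y) for x, y in similar)
+        constraint = sum(np.sqrt(d2(x, y)) for x, y in dissimilar)
+        return objective, constraint          # constraint must stay >= 1
+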
 .. topic:: Example Code:

diff --git a/metric_learn/itml.py b/metric_learn/itml.py
index 9b6dccb2..6cb34313 100644
--- a/metric_learn/itml.py
+++ b/metric_learn/itml.py
@@ -1,16 +1,17 @@
-"""
-Information Theoretic Metric Learning, Kulis et al., ICML 2007
-
-ITML minimizes the differential relative entropy between two multivariate
-Gaussians under constraints on the distance function,
-which can be formulated into a Bregman optimization problem by minimizing the
-LogDet divergence subject to linear constraints.
-This algorithm can handle a wide variety of constraints and can optionally
-incorporate a prior on the distance function.
-Unlike some other methods, ITML does not rely on an eigenvalue computation
-or semi-definite programming.
-
-Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/
+r"""
+Information Theoretic Metric Learning (ITML)
+
+`ITML` minimizes the (differential) relative entropy, aka Kullback-Leibler
+divergence, between two multivariate Gaussians subject to constraints on the
+associated Mahalanobis distance, which can be formulated into a Bregman
+optimization problem by minimizing the LogDet divergence subject to
+linear constraints. This algorithm can handle a wide variety of constraints
+and can optionally incorporate a prior on the distance function. Unlike some
+other methods, `ITML` does not rely on an eigenvalue computation or
+semi-definite programming.
+
+Read more in the :ref:`User Guide `.
+
 """

 from __future__ import print_function, absolute_import
diff --git a/metric_learn/lfda.py b/metric_learn/lfda.py
index 2feff211..2ca085d4 100644
--- a/metric_learn/lfda.py
+++ b/metric_learn/lfda.py
@@ -1,14 +1,13 @@
-"""
-Local Fisher Discriminant Analysis (LFDA)
+r"""
+Local Fisher Discriminant Analysis (LFDA)
+
+LFDA is a linear supervised dimensionality reduction method. It is
+particularly useful when dealing with multimodality, where one or more classes
+consist of separate clusters in input space. The core optimization problem of
+LFDA is solved as a generalized eigenvalue problem.

-Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction
-Sugiyama, ICML 2006
+Read more in the :ref:`User Guide `.

-LFDA is a linear supervised dimensionality reduction method.
-It is particularly useful when dealing with multimodality,
-where one ore more classes consist of separate clusters in input space.
-The core optimization problem of LFDA is solved as a generalized
-eigenvalue problem.
 """
 from __future__ import division, absolute_import
 import numpy as np
diff --git a/metric_learn/lmnn.py b/metric_learn/lmnn.py
index f9cd0e91..9e606c56 100644
--- a/metric_learn/lmnn.py
+++ b/metric_learn/lmnn.py
@@ -1,11 +1,14 @@
-"""
-Large-margin nearest neighbor metric learning. (Weinberger 2005)
+r"""
+Large Margin Nearest Neighbor Metric Learning (LMNN)
+
+LMNN learns a Mahalanobis distance metric in the kNN classification
+setting. The learned metric attempts to keep close k-nearest neighbors
+from the same class, while keeping examples from different classes
+separated by a large margin. This algorithm makes no assumptions about
+the distribution of the data.
+
+Read more in the :ref:`User Guide `.

-LMNN learns a Mahanalobis distance metric in the kNN classification setting
-using semidefinite programming.
-The learned metric attempts to keep k-nearest neighbors in the same class,
-while keeping examples from different classes separated by a large margin.
-This algorithm makes no assumptions about the distribution of the data.
 """

 #TODO: periodic recalculation of impostors, PCA initialization
diff --git a/metric_learn/lsml.py b/metric_learn/lsml.py
index 536719ba..1d66cbc0 100644
--- a/metric_learn/lsml.py
+++ b/metric_learn/lsml.py
@@ -1,10 +1,17 @@
-"""
-Liu et al.
-"Metric Learning from Relative Comparisons by Minimizing Squared Residual".
-ICDM 2012.
+r"""
+Metric Learning from Relative Comparisons by Minimizing Squared Residual (LSML)
+
+`LSML` proposes a simple, yet effective, algorithm that minimizes a convex
+objective function corresponding to the sum of squared residuals of
+constraints. This algorithm uses constraints in the form of relative
+distance comparisons, which is especially useful when pairwise constraints
+are not natural to obtain and algorithms based on pairwise constraints
+therefore cannot be deployed. Furthermore, its sparsity extension leads to
+more stable estimation when the dimension is high and only a small amount
+of constraints is given.
+
+Read more in the :ref:`User Guide `.

-Adapted from https://gist.github.com/kcarnold/5439917
-Paper: http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf
 """

 from __future__ import print_function, absolute_import, division
diff --git a/metric_learn/mlkr.py b/metric_learn/mlkr.py
index 74a21a82..927c64e3 100644
--- a/metric_learn/mlkr.py
+++ b/metric_learn/mlkr.py
@@ -1,10 +1,13 @@
-"""
-Metric Learning for Kernel Regression (MLKR), Weinberger et al.,
+r"""
+Metric Learning for Kernel Regression (MLKR)
+
+MLKR is an algorithm for supervised metric learning, which learns a
+distance function by directly minimizing the leave-one-out regression error.
+This algorithm can also be viewed as a supervised variation of PCA and can be
+used for dimensionality reduction and high dimensional data visualization.
+
+Read more in the :ref:`User Guide `.

-MLKR is an algorithm for supervised metric learning, which learns a distance
-function by directly minimising the leave-one-out regression error. This
-algorithm can also be viewed as a supervised variation of PCA and can be used
-for dimensionality reduction and high dimensional data visualization.
 """
 from __future__ import division, print_function
 import time
diff --git a/metric_learn/mmc.py b/metric_learn/mmc.py
index 346db2f8..eb7dc529 100644
--- a/metric_learn/mmc.py
+++ b/metric_learn/mmc.py
@@ -1,19 +1,19 @@
-"""
-Mahalanobis Metric Learning with Application for Clustering with Side-Information, Xing et al., NIPS 2002
+r"""
+Metric Learning with Application to Clustering with Side Information (MMC)

-MMC minimizes the sum of squared distances between similar examples,
-while enforcing the sum of distances between dissimilar examples to be
-greater than a certain margin.
-This leads to a convex and, thus, local-minima-free optimization problem
-that can be solved efficiently.
+MMC minimizes the sum of squared distances between similar points, while
+enforcing the sum of distances between dissimilar ones to be greater than one.
+This leads to a convex and, thus, local-minima-free optimization problem that
+can be solved efficiently.
 However, the algorithm involves the computation of eigenvalues, which is the
-main speed-bottleneck.
-Since it has initially been designed for clustering applications, one of the
-implicit assumptions of MMC is that all classes form a compact set, i.e.,
-follow a unimodal distribution, which restricts the possible use-cases of
-this method. However, it is one of the earliest and a still often cited technique.
+main speed-bottleneck. Since it has initially been designed for clustering
+applications, one of the implicit assumptions of MMC is that all classes form
+a compact set, i.e., follow a unimodal distribution, which restricts the
+possible use-cases of this method. However, it is one of the earliest and a
+still often cited technique.
+
+Read more in the :ref:`User Guide `.

-Adapted from Matlab code at http://www.cs.cmu.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz
 """

 from __future__ import print_function, absolute_import, division
diff --git a/metric_learn/nca.py b/metric_learn/nca.py
index 5abe52e3..7139f0ff 100644
--- a/metric_learn/nca.py
+++ b/metric_learn/nca.py
@@ -1,6 +1,15 @@
-"""
-Neighborhood Components Analysis (NCA)
-Ported to Python from https://github.com/vomjom/nca
+r"""
+Neighborhood Components Analysis (NCA)
+
+NCA is a distance metric learning algorithm which aims to improve the
+accuracy of nearest neighbors classification compared to the standard
+Euclidean distance. The algorithm directly maximizes a stochastic variant
+of the leave-one-out k-nearest neighbors (KNN) score on the training set.
+It can also learn a low-dimensional linear transformation of data that can
+be used for data visualization and fast classification.
+
+Read more in the :ref:`User Guide `.
+
 """

 from __future__ import absolute_import
diff --git a/metric_learn/rca.py b/metric_learn/rca.py
index c9fedd59..88538e8b 100644
--- a/metric_learn/rca.py
+++ b/metric_learn/rca.py
@@ -1,14 +1,14 @@
-"""Relative Components Analysis (RCA)
+r"""
+Relative Components Analysis (RCA)

-RCA learns a full rank Mahalanobis distance metric based on a
-weighted sum of in-class covariance matrices.
-It applies a global linear transformation to assign large weights to
-relevant dimensions and low weights to irrelevant dimensions.
-Those relevant dimensions are estimated using "chunklets",
-subsets of points that are known to belong to the same class.
+RCA learns a full rank Mahalanobis distance metric based on a weighted sum of
+in-chunklets covariance matrices. It applies a global linear transformation to
+assign large weights to relevant dimensions and low weights to irrelevant
+dimensions. Those relevant dimensions are estimated using "chunklets", subsets
+of points that are known to belong to the same class.
+
+Read more in the :ref:`User Guide `.

-'Learning distance functions using equivalence relations', ICML 2003
-'Learning a Mahalanobis metric from equivalence constraints', JMLR 2005
 """

 from __future__ import absolute_import
diff --git a/metric_learn/sdml.py b/metric_learn/sdml.py
index e9828d07..b300b9ac 100644
--- a/metric_learn/sdml.py
+++ b/metric_learn/sdml.py
@@ -1,11 +1,15 @@
-"""
-Qi et al.
-An efficient sparse metric learning in high-dimensional space via
-L1-penalized log-determinant regularization.
-ICML 2009
+r"""
+Sparse High-Dimensional Metric Learning (SDML)
+
+SDML is an efficient sparse metric learning method for high-dimensional
+spaces, using double regularization: an L1-penalization on the off-diagonal
+elements of the Mahalanobis matrix :math:`\mathbf{M}`, and a log-determinant
+divergence between :math:`\mathbf{M}` and :math:`\mathbf{M_0}` (set as either
+:math:`\mathbf{I}` or :math:`\mathbf{\Omega}^{-1}`, where
+:math:`\mathbf{\Omega}` is the covariance matrix).
+
+Read more in the :ref:`User Guide `.

-Adapted from https://gist.github.com/kcarnold/5439945
-Paper: http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/icml09-guojun.pdf
 """

 from __future__ import absolute_import