[MRG] Improve docstrings: add them for Constraints class and methods and fix minor problems #280

Merged
merged 3 commits into from
Mar 4, 2020
118 changes: 59 additions & 59 deletions metric_learn/_util.py
@@ -448,45 +448,45 @@ def _initialize_components(n_components, input, y=None, init='auto',
The input labels (or not if there are no labels).

init : string or numpy array, optional (default='auto')
Initialization of the linear transformation. Possible options are
'auto', 'pca', 'lda', 'identity', 'random', and a numpy array of shape
(n_features_a, n_features_b).

'auto'
Depending on ``n_components``, the most reasonable initialization
will be chosen. If ``n_components <= n_classes`` we use 'lda' (see
the description of 'lda' init), as it uses labels information. If
not, but ``n_components < min(n_features, n_samples)``, we use 'pca',
as it projects data onto meaningful directions (those of higher
variance). Otherwise, we just use 'identity'.

'pca'
``n_components`` principal components of the inputs passed
to :meth:`fit` will be used to initialize the transformation.
(See `sklearn.decomposition.PCA`)

'lda'
``min(n_components, n_classes)`` most discriminative
components of the inputs passed to :meth:`fit` will be used to
initialize the transformation. (If ``n_components > n_classes``,
the rest of the components will be zero.) (See
`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`).
This initialization is possible only if `has_classes == True`.

'identity'
The identity matrix. If ``n_components`` is strictly smaller than the
dimensionality of the inputs passed to :meth:`fit`, the identity
matrix will be truncated to the first ``n_components`` rows.

'random'
The initial transformation will be a random array of shape
`(n_components, n_features)`. Each value is sampled from the
standard normal distribution.

numpy array
n_features_b must match the dimensionality of the inputs passed to
:meth:`fit` and n_features_a must be less than or equal to that.
If ``n_components`` is not None, n_features_a must match it.

verbose : bool
Whether to print the details of the initialization or not.
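
As an aside for readers of this docstring: a minimal sketch of the 'auto' selection rule described above. The helper name `choose_auto_init` is hypothetical and this is not the library's internal code; it only mirrors the rule as stated.

import numpy as np

def choose_auto_init(X, y, n_components):
    # Illustrative only: pick an init string following the rule above.
    # Assumes n_components is an int and y is None when there are no labels.
    n_samples, n_features = X.shape
    n_classes = len(np.unique(y)) if y is not None else 0
    if y is not None and n_components <= n_classes:
        return 'lda'       # uses label information
    if n_components < min(n_features, n_samples):
        return 'pca'       # projects onto high-variance directions
    return 'identity'
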
@@ -606,26 +606,26 @@ def _initialize_metric_mahalanobis(input, init='identity', random_state=None,
The input samples (can be tuples or regular samples).

init : string or numpy array, optional (default='identity')
Specification for the matrix to initialize. Possible options are
'identity', 'covariance', 'random', and a numpy array of shape
(n_features, n_features).

'identity'
An identity matrix of shape (n_features, n_features).

'covariance'
The (pseudo-)inverse covariance matrix (raises an error if the
covariance matrix is not definite and `strict_pd == True`)

'random'
A random positive definite (PD) matrix of shape
`(n_features, n_features)`, generated using
`sklearn.datasets.make_spd_matrix`.

numpy array
A PSD matrix (or strictly PD if strict_pd==True) of
shape (n_features, n_features), that will be used as such to
initialize the metric, or set the prior.

random_state : int or `numpy.RandomState` or None, optional (default=None)
A pseudo random number generator object or a seed for it if int. If
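
For illustration, a standalone sketch of what the three string options above correspond to numerically. This is not the function's internal code and the variable names are made up; only the constructions themselves follow the descriptions.

import numpy as np
from sklearn.datasets import make_spd_matrix

X = np.random.RandomState(0).randn(100, 5)
n_features = X.shape[1]

M_identity = np.eye(n_features)                          # 'identity'
M_covariance = np.linalg.pinv(np.cov(X, rowvar=False))   # 'covariance' (pseudo-inverse)
M_random = make_spd_matrix(n_features, random_state=0)   # 'random' SPD matrix
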
12 changes: 6 additions & 6 deletions metric_learn/base_metric.py
@@ -154,12 +154,12 @@ def transform(self, X):
Parameters
----------
X : (n x d) matrix
Data to transform.

Returns
-------
transformed : (n x d) matrix
Input data transformed to the metric space by :math:`XL^{\\top}`
"""


@@ -180,7 +180,7 @@ class MahalanobisMixin(six.with_metaclass(ABCMeta, BaseMetricLearner,
Attributes
----------
components_ : `numpy.ndarray`, shape=(n_components, n_features)
The learned linear transformation ``L``.
"""

def score_pairs(self, pairs):
@@ -313,9 +313,9 @@ class _PairsClassifierMixin(BaseMetricLearner):
Attributes
----------
threshold_ : `float`
If the distance metric between two points is lower than this threshold,
points will be classified as similar, otherwise they will be
classified as dissimilar.
"""

_tuple_size = 2 # number of points in a tuple, 2 for pairs
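
As a hedged illustration of how such a threshold turns pair distances into similar/dissimilar labels (the arrays, the threshold value and the +1/-1 convention below are made up for the example, not taken from the library):

import numpy as np

distances = np.array([0.3, 1.2, 0.7])  # hypothetical pair distances
threshold = 0.8                        # hypothetical learned threshold_
predictions = np.where(distances < threshold, 1, -1)
print(predictions)                     # [ 1 -1  1]
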
74 changes: 70 additions & 4 deletions metric_learn/constraints.py
@@ -12,17 +12,60 @@

class Constraints(object):
"""
Class to build constraints from labels.
Class to build constraints from labeled data.

See more in the :ref:`User Guide <supervised_version>`
See more in the :ref:`User Guide <supervised_version>`.

Parameters
----------
partial_labels : `numpy.ndarray` of ints, shape=(n_samples,)
Array of labels, with -1 indicating unknown label.

Attributes
----------
partial_labels : `numpy.ndarray` of ints, shape=(n_samples,)
Array of labels, with -1 indicating unknown label.
"""

def __init__(self, partial_labels):
'''partial_labels : int arraylike, -1 indicating unknown label'''
partial_labels = np.asanyarray(partial_labels, dtype=int)
self.partial_labels = partial_labels
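
A short usage example of the constructor documented above, on a made-up partially labeled array (-1 marks an unknown label):

import numpy as np
from metric_learn.constraints import Constraints

partial_labels = np.array([0, 0, 1, 1, -1, 2, 2, -1])
constraints = Constraints(partial_labels)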

def positive_negative_pairs(self, num_constraints, same_length=False,
random_state=None):
"""
Generates positive pairs and negative pairs from labeled data.

Positive pairs are formed by randomly drawing ``num_constraints`` pairs of
points with the same label. Negative pairs are formed by randomly drawing
``num_constraints`` pairs of points with different labels.

In the case where it is not possible to generate enough positive or
negative pairs, a smaller number of pairs will be returned with a warning.

Parameters
----------
num_constraints : int
Number of positive and negative constraints to generate.

same_length : bool, optional (default=False)
If True, forces the number of positive and negative pairs to be
equal by ignoring some pairs from the larger set.

random_state : int or numpy.RandomState or None, optional (default=None)
A pseudo random number generator object or a seed for it if int.

Returns
-------
a : array-like, shape=(n_constraints,)
1D array of indicators for the left elements of positive pairs.

b : array-like, shape=(n_constraints,)
1D array of indicators for the right elements of positive pairs.

c : array-like, shape=(n_constraints,)
1D array of indicators for the left elements of negative pairs.

d : array-like, shape=(n_constraints,)
1D array of indicators for the right elements of negative pairs.
"""
random_state = check_random_state(random_state)
a, b = self._pairs(num_constraints, same_label=True,
random_state=random_state)
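
A minimal usage sketch of this method on toy labels; the toy data and the requested number of constraints are made up, and the returned arrays index into `partial_labels`:

import numpy as np
from metric_learn.constraints import Constraints

partial_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
constraints = Constraints(partial_labels)
# (a[i], b[i]) share a label, (c[i], d[i]) have different labels.
a, b, c, d = constraints.positive_negative_pairs(num_constraints=5,
                                                 random_state=42)
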
@@ -60,7 +103,30 @@ def _pairs(num_constraints, same_label=True, max_iter=10,

def chunks(self, num_chunks=100, chunk_size=2, random_state=None):
"""
the random state object to be passed must be a numpy random seed
Generates chunks from labeled data.

Each of ``num_chunks`` chunks is composed of ``chunk_size`` points from
the same class drawn at random. Each point can belong to at most 1 chunk.

In the case where there are not enough points to generate ``num_chunks``
chunks of size ``chunk_size``, a ValueError will be raised.

Parameters
----------
num_chunks : int, optional (default=100)
Number of chunks to generate.

chunk_size : int, optional (default=2)
Number of points in each chunk.

random_state : int or numpy.RandomState or None, optional (default=None)
A pseudo random number generator object or a seed for it if int.

Returns
-------
chunks : array-like, shape=(n_samples,)
1D array of chunk indicators, where -1 indicates that the point does not
belong to any chunk.
"""
random_state = check_random_state(random_state)
chunks = -np.ones_like(self.partial_labels, dtype=int)
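
A minimal usage sketch of this method on toy labels (two classes of four labeled points each plus one unlabeled point, so four chunks of size two can be formed):

import numpy as np
from metric_learn.constraints import Constraints

partial_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, -1])
constraints = Constraints(partial_labels)
# chunk_assignments[i] is the chunk index of point i, or -1 if unassigned.
chunk_assignments = constraints.chunks(num_chunks=4, chunk_size=2,
                                       random_state=42)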