[MRG] Remove preprocessing the data for RCA #194
@@ -44,20 +44,18 @@ class RCA(MahalanobisMixin, TransformerMixin):
   The learned linear transformation ``L``.
   """

-  def __init__(self, num_dims=None, pca_comps=None, preprocessor=None):
+  def __init__(self, num_dims=None, pca_comps='deprecated', preprocessor=None):
     """Initialize the learner.

     Parameters
     ----------
     num_dims : int, optional
       embedding dimension (default: original dimension of data)

-    pca_comps : int, float, None or string
-      Number of components to keep during PCA preprocessing.
-      If None (default), does not perform PCA.
-      If ``0 < pca_comps < 1``, it is used as
-      the minimum explained variance ratio.
-      See sklearn.decomposition.PCA for more details.
+    pca_comps : Not used
+      .. deprecated:: 0.5.0
+        `pca_comps` was deprecated in version 0.5.0 and will
+        be removed in 0.6.0.

     preprocessor : array-like, shape=(n_samples, n_features) or callable
       The preprocessor to call to get tuples from indices. If array-like,
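The hunk above deprecates `pca_comps` by switching its default to a string sentinel and warning at fit time when a caller passes any other value. A minimal self-contained sketch of that pattern (the `Estimator` class here is hypothetical, not part of metric-learn):

```python
import warnings


class Estimator:
    """Sketch of deprecating a constructor parameter via a sentinel
    default, as in the diff above. `Estimator` is a hypothetical class."""

    def __init__(self, num_dims=None, pca_comps='deprecated'):
        # Store the raw values; scikit-learn-style estimators defer
        # validation and warnings to fit().
        self.num_dims = num_dims
        self.pca_comps = pca_comps

    def fit(self, X):
        # Any explicit value, including None, differs from the sentinel
        # and triggers the deprecation warning.
        if self.pca_comps != 'deprecated':
            warnings.warn('"pca_comps" parameter is not used.'
                          ' It has been deprecated in version 0.5.0 and will'
                          ' be removed in 0.6.0', DeprecationWarning)
        return self
```

The string sentinel (rather than `None`) lets the estimator distinguish "user passed nothing" from "user explicitly passed `None`", which was a meaningful value for the old parameter.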
@@ -98,26 +96,24 @@ def fit(self, X, chunks):
       When ``chunks[i] == -1``, point i doesn't belong to any chunklet.
       When ``chunks[i] == j``, point i belongs to chunklet j.
     """
+    if self.pca_comps != 'deprecated':
+      warnings.warn('"pca_comps" parameter is not used.'
+                    ' It has been deprecated in version 0.5.0 and will be'
+                    ' removed in 0.6.0', DeprecationWarning)
+
     X = self._prepare_inputs(X, ensure_min_samples=2)

-    # PCA projection to remove noise and redundant information.
-    if self.pca_comps is not None:
-      pca = decomposition.PCA(n_components=self.pca_comps)
-      X_t = pca.fit_transform(X)
-      M_pca = pca.components_
-    else:
-      X_t = X - X.mean(axis=0)
> Reviewer: why is this centering step gone?

> Author: I guess because we should remove any pre-processing step, but I agree I didn't talk about it at all, maybe we should keep the

> Reviewer: Fair enough (I double-checked, and this centering is not part of standard RCA). Finally, have you checked the influence of removing the centering step on the examples?
-      M_pca = None
-
     chunks = np.asanyarray(chunks, dtype=int)
-    chunk_mask, chunked_data = _chunk_mean_centering(X_t, chunks)
+    chunk_mask, chunked_data = _chunk_mean_centering(X, chunks)

     inner_cov = np.atleast_2d(np.cov(chunked_data, rowvar=0, bias=1))
-    dim = self._check_dimension(np.linalg.matrix_rank(inner_cov), X_t)
+    dim = self._check_dimension(np.linalg.matrix_rank(inner_cov), X)

     # Fisher Linear Discriminant projection
-    if dim < X_t.shape[1]:
-      total_cov = np.cov(X_t[chunk_mask], rowvar=0)
+    if dim < X.shape[1]:
+      total_cov = np.cov(X[chunk_mask], rowvar=0)
       tmp = np.linalg.lstsq(total_cov, inner_cov)[0]
       vals, vecs = np.linalg.eig(tmp)
       inds = np.argsort(vals)[:dim]
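With the PCA/centering preprocessing gone, `fit` now passes the raw `X` straight to `_chunk_mean_centering`, which centers each chunklet on its own mean before the inner covariance is computed. A rough sketch of what that helper is assumed to do (the actual metric-learn implementation may differ in details):

```python
import numpy as np


def chunk_mean_centering(data, chunks):
    """Center each chunklet on its own mean.

    Sketch of the `_chunk_mean_centering` helper referenced in the diff;
    the real metric-learn implementation may differ. Points with chunk
    label -1 belong to no chunklet and are excluded.
    """
    chunks = np.asanyarray(chunks, dtype=int)
    chunk_mask = chunks != -1                 # points assigned to a chunklet
    chunk_data = data[chunk_mask].astype(float)
    chunk_labels = chunks[chunk_mask]
    for c in np.unique(chunk_labels):
        members = chunk_labels == c
        # Subtract this chunklet's mean from its members.
        chunk_data[members] -= chunk_data[members].mean(axis=0)
    return chunk_mask, chunk_data
```

The centered, stacked chunklets are then what `np.cov(chunked_data, rowvar=0, bias=1)` turns into the within-chunklet (inner) covariance.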
@@ -127,9 +123,6 @@ def fit(self, X, chunks):
     else:
       self.transformer_ = _inv_sqrtm(inner_cov).T

-    if M_pca is not None:
-      self.transformer_ = np.atleast_2d(self.transformer_.dot(M_pca))
-
     return self
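The removed `M_pca` composition folded the PCA projection into the learned transformation so that `transformer_` could be applied to raw inputs directly. As a sanity check on that linear algebra, composing the two maps into one matrix is the same as applying them in sequence (all names below are illustrative stand-ins, not metric-learn code):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 5)

# Hypothetical stand-ins, named after the removed code: M_pca projects
# 5-d inputs to 3-d, and L is a map learned in the projected space.
M_pca = np.linalg.qr(rng.randn(5, 3))[0].T   # (3, 5), orthonormal rows
L = rng.randn(3, 3)

# Folding the two maps into one matrix, as the removed line did.
composed = np.atleast_2d(L.dot(M_pca))       # (3, 5)

# Applying the composed map equals projecting with M_pca, then mapping
# with L.
assert np.allclose(X.dot(composed.T), X.dot(M_pca.T).dot(L.T))
```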
> Author: Note that this code was also giving a PCA initialization at the same time, so for now we'll remove it. I think I'll do the PR about initialization before merging this PR into master; we can then merge it into this PR to keep the same possibility of initializing with PCA.