
Inconsistent behaviour of SDML when using skggm with fixed seed. #272

Closed
@grudloff

Description


I was using SDML_Supervised() for a subsequent 2D visualization with UMAP (similar to t-SNE) and got large differences in the results on every fit with the same data. Fixing the seed makes no difference. I tracked the problem down to the call to quic() that is made when skggm is installed; reviewing skggm's code, I found that it uses a fixed seed, yet the results of that function still vary on every call.

Note: I am using the latest version of skggm; I will try to reproduce later with the version indicated in the documentation.
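To isolate the problem from metric-learn entirely, here is a minimal sketch that calls skggm's quic() twice on the same empirical covariance. The lam value and the position of the precision matrix in the returned tuple are assumptions based on skggm's documented interface:

from inverse_covariance import quic  # skggm
import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(100, 5)
emp_cov = np.cov(X, rowvar=False)

# Theta (the estimated precision matrix) is assumed to be the first
# element of the tuple returned by quic(); lam=0.1 is an arbitrary choice
Theta1 = quic(emp_cov, lam=0.1)[0]
Theta2 = quic(emp_cov, lam=0.1)[0]

# if quic() is deterministic, this should print exactly 0.0
print(np.max(np.abs(Theta1 - Theta2)))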

Steps/Code to Reproduce

from metric_learn import SDML_Supervised
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

X, y = load_wine(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

sdml = SDML_Supervised(random_state=42)
X_transform = sdml.fit_transform(X_train, y_train)
# fit twice on identical data; a deterministic estimator should give zero difference
print(np.sum(np.abs(X_transform - sdml.fit_transform(X_train, y_train))))
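To check whether the non-determinism is in the learned metric itself rather than in the transform step, the Mahalanobis matrices of two separate fits can also be compared directly. This sketch reuses X_train and y_train from the snippet above and assumes the get_mahalanobis_matrix() accessor that metric-learn's Mahalanobis estimators expose:

# compare the learned metrics of two separately fitted estimators
sdml_a = SDML_Supervised(random_state=42).fit(X_train, y_train)
sdml_b = SDML_Supervised(random_state=42).fit(X_train, y_train)
M_a = sdml_a.get_mahalanobis_matrix()
M_b = sdml_b.get_mahalanobis_matrix()

# expected 0.0 for a deterministic fit with a fixed random_state
print(np.max(np.abs(M_a - M_b)))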

Expected Results

The two SDML fits should produce the same result, so the printed difference should be zero.

Actual Results

The printed difference is large, on the order of 100 to 300.

Versions

Linux-5.0.0-37-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Metric-Learn 0.5.0
Skggm 0.2.8
