Description
I was using SDML_Supervised() to learn a metric for a subsequent 2D visualization with UMAP (similar to t-SNE), and got large differences in the results on every fit with the same data. Fixing the seed doesn't make a difference. I tracked the problem down to the call to quic() made when skggm is installed; reviewing their code, I found that a seed is fixed there, yet the results from that function still vary on every call.
Note: I am using the latest version of skggm; I will try to reproduce later with the version indicated in the documentation.
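For reference, the nondeterminism can likely be checked in isolation from metric-learn. Below is a minimal sketch, assuming skggm's quic(S, lam) takes an empirical covariance matrix and returns the estimated precision matrix first (which is how metric-learn's SDML consumes it); the exact call here is my assumption, not taken from the report:

import numpy as np
from inverse_covariance import quic

rng = np.random.RandomState(42)
samples = rng.randn(100, 13)
emp_cov = np.cov(samples, rowvar=False)  # empirical covariance, as SDML builds internally

# Two identical calls on the same input; a nonzero difference would place
# the nondeterminism inside quic() itself rather than in metric-learn
theta_1 = quic(emp_cov, lam=0.01)[0]
theta_2 = quic(emp_cov, lam=0.01)[0]
print(np.max(np.abs(theta_1 - theta_2)))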
Steps/Code to Reproduce
from metric_learn import SDML_Supervised
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# Load the wine dataset and make a fixed train/test split
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit SDML twice with the same seed and compare the transformed data
sdml = SDML_Supervised(random_state=42)
X_transform = sdml.fit_transform(X_train, y_train)
print(np.sum(np.abs(X_transform - sdml.fit_transform(X_train, y_train))))
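As noted in the description, fixing the seed does not remove the variation. A sketch of that check (the helper name is hypothetical), seeding both the estimator and NumPy's global RNG before each fit:

import numpy as np
from metric_learn import SDML_Supervised

def fit_seeded(X, y, seed=42):  # hypothetical helper for this check
    np.random.seed(seed)                       # reset the global RNG state too
    sdml = SDML_Supervised(random_state=seed)  # seed the estimator itself
    return sdml.fit_transform(X, y)

# Per the description above, the printed difference stays nonzero when skggm is installed
print(np.sum(np.abs(fit_seeded(X_train, y_train) - fit_seeded(X_train, y_train))))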
Expected Results
The two SDML fits should produce identical results, so the printed difference should be zero.
Actual Results
The printed difference is large, on the order of 100 to 300, and changes on every run.
Versions
Linux-5.0.0-37-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
NumPy 1.18.1
SciPy 1.4.1
Scikit-Learn 0.22.1
Metric-Learn 0.5.0
Skggm 0.2.8