Describe the bug
Using a `FunctionTransformer` that adds a column to a DataFrame raises a `SettingWithCopyWarning` when the pipeline is used in `cross_validate`.
This is closely related to #8723, but I think that in this case we should do something about it, since:
- `FunctionTransformer` is a stateless transformer;
- such a pipeline could be a valid one.
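The mechanism can be sketched with pandas alone. This sketch assumes that each cross-validation fold reaches the transformer as a positional row subset of `X` (taken here with `X.iloc`), which is a simplification of what scikit-learn's cross-validation helpers do internally. The toy DataFrame and fold indices are made up. pandas remembers that such a subset was derived from `X`, so writing a new column into it in place is what gets flagged. Whether the warning is actually emitted depends on the pandas version: releases that use Copy-on-Write by default may not emit `SettingWithCopyWarning` at all.

```python
import warnings

import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (hypothetical values).
X = pd.DataFrame(
    {"age": [22.0, np.nan, 35.0, 54.0], "sex": ["male", "female", "female", "male"]}
)

# Assumption: a fold is a positional row subset of X, a simplified
# stand-in for how cross-validation splits the data.
train_idx = np.array([0, 1, 3])
X_fold = X.iloc[train_idx]


def processing(X):
    # Same in-place column assignment as in the report.
    X.loc[:, "new_age"] = X["age"].fillna(0).astype(int)
    return X


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    processing(X_fold)

# On pandas versions without Copy-on-Write this typically lists SettingWithCopyWarning.
print([w.category.__name__ for w in caught])
print(X_fold["new_age"].tolist())  # [22, 0, 54]
print("new_age" in X.columns)  # False: only the fold was modified
```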
Steps/Code to Reproduce
```python
from sklearn.compose import make_column_transformer
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)


def processing(X):
    # dummy processing step that should be harmless: add a column in place
    X.loc[:, 'new_age'] = X["age"].fillna(0).astype(int)
    return X


func_trans = FunctionTransformer(processing)
encoder = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing"),
        OneHotEncoder(handle_unknown='ignore')
    ), ["pclass", "sex", "embarked"]),
    (make_pipeline(SimpleImputer(), StandardScaler()), ["age", "new_age"])
)
pipeline = make_pipeline(
    func_trans,
    encoder,
    LogisticRegression()
)

print("Fit the pipeline on the entire data")
pipeline.fit(X, y)

print("Apply cross-validation")
cross_val_score(pipeline, X, y)
```
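Until this is handled on the scikit-learn side, one user-side workaround is to make the function avoid mutating its input: build and return a new DataFrame, for example with `DataFrame.assign`, instead of assigning into the frame it receives (calling `X = X.copy()` first should have the same effect). The sketch below runs a similar pipeline on a small synthetic DataFrame so it works offline. The column values, the `embarked` categories, and `cv=5` are assumptions, and the categorical branch is reduced to a bare `OneHotEncoder` because the synthetic data has no missing categories. It counts any `SettingWithCopyWarning` raised during cross-validation.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def processing(X):
    # Return a new DataFrame instead of writing into the one we were given,
    # so neither the caller's data nor a cross-validation fold is mutated.
    return X.assign(new_age=X["age"].fillna(0).astype(int))


# Synthetic stand-in for the Titanic data (hypothetical values).
rng = np.random.default_rng(0)
n = 60
age = rng.uniform(1, 80, size=n)
age[::7] = np.nan  # a few missing ages
X = pd.DataFrame(
    {
        "age": age,
        "sex": rng.choice(["male", "female"], size=n),
        "embarked": rng.choice(["S", "C", "Q"], size=n),
    }
)
y = np.tile([0, 1], n // 2)  # balanced labels so every stratified fold has both classes

pipeline = make_pipeline(
    FunctionTransformer(processing),
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
        (make_pipeline(SimpleImputer(), StandardScaler()), ["age", "new_age"]),
    ),
    LogisticRegression(),
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = cross_val_score(pipeline, X, y, cv=5)

# Match by name so the check also works on pandas versions without this class.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(scores.shape, len(copy_warnings))
print("new_age" in X.columns)
```

With this version of `processing`, no `SettingWithCopyWarning` should be counted, and `X` itself no longer gains a `new_age` column as a side effect of fitting the pipeline.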
Expected Results
Some `SettingWithCopyWarning` warnings are raised.