Raising SettingWithCopyWarning in cross_validate #16191

Closed
@glemaitre

Describe the bug

Using a FunctionTransformer that adds a column to a DataFrame raises a SettingWithCopyWarning when used in cross_validate.

This is closely related to #8723, but I think that in this case we should do something about it because:

  • FunctionTransformer is a stateless transformer
  • such a pipeline could be a valid pipeline

Steps/Code to Reproduce

from sklearn.datasets import fetch_openml

X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)

def processing(X):
    # trigger a dummy processing which should be ok
    X.loc[:, 'new_age'] = X["age"].fillna(0).astype(int)
    return X

from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

func_trans = FunctionTransformer(processing)
encoder = make_column_transformer(
    (make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing"),
        OneHotEncoder(handle_unknown='ignore')
    ), ["pclass", "sex", "embarked"]),
    (make_pipeline(SimpleImputer(), StandardScaler()), ["age", "new_age"])
)
pipeline = make_pipeline(
    func_trans,
    encoder,
    LogisticRegression()
)

print("Fit the pipeline on the entire data")
pipeline.fit(X, y)

from sklearn.model_selection import cross_val_score

print("Apply cross-validation")
cross_val_score(pipeline, X, y)
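
To narrow down which input triggers the warning, the same processing function can be exercised with pandas alone. This is only a sketch: the small DataFrame is hypothetical stand-in data, the helper `warning_names` is not part of any library, and it assumes that cross-validation hands the pipeline a row subset obtained by positional indexing. Whether a warning is recorded for the subset is likely to depend on the pandas version and its copy-on-write settings.

```python
import warnings

import pandas as pd


def processing(X):
    # Same in-place column assignment as in the reproducer above.
    X.loc[:, "new_age"] = X["age"].fillna(0).astype(int)
    return X


def warning_names(frame):
    """Run processing() on `frame` and return the names of the warning categories recorded."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        processing(frame)
    return [w.category.__name__ for w in caught]


# Hypothetical stand-in data, not the Titanic dataset.
df = pd.DataFrame({"age": [22.0, None, 35.0, 58.0]})

on_full = warning_names(df.copy())             # a standalone frame
on_subset = warning_names(df.iloc[[0, 2, 3]])  # a row subset, as a CV split might produce

print("full frame:", on_full)
print("row subset:", on_subset)
```

Comparing the two printed lists shows whether the warning is tied to the subset rather than to the transformer itself.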

Actual Results

The cross_val_score call emits SettingWithCopyWarning.
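
Until this is handled on the scikit-learn side, one possible user-side workaround is to avoid writing into the input frame at all. The sketch below uses `DataFrame.assign`, which returns a new DataFrame; it is an illustration under that assumption, not the fix adopted for this issue, and the small frame is hypothetical stand-in data.

```python
import pandas as pd


def processing(X):
    # assign() builds and returns a new DataFrame, so the (possibly sliced)
    # input passed in by cross-validation is never modified in place.
    return X.assign(new_age=X["age"].fillna(0).astype(int))


# Hypothetical stand-in data, not the Titanic dataset.
df = pd.DataFrame({"age": [22.0, None, 35.0, 58.0], "sex": ["m", "f", "f", "m"]})

fold = df.iloc[[0, 1, 2]]  # row subset, similar to a cross-validation split
out = processing(fold)

print(list(out.columns))
print(list(fold.columns))
```

This version of `processing` can be passed to FunctionTransformer exactly as in the reproducer; the trade-off is an extra copy of the data on every call.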
