
API: DataFrame.agg has no partial failure #40211

@rhshadrach

Description

Synthetic example:

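import pandas as pd

def foo(x):
    # transform-style UDF: fails on column "a", returns other columns unchanged
    if x.name == "a":
        raise ValueError
    return x

def bar(x):
    # agg-style UDF: fails on column "a", reduces other columns to their sum
    if x.name == "a":
        raise ValueError
    return x.sum()

df = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [1, 2, 3]}
)
print(df.transform([foo]))
print(df.agg([bar]))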

The transform call results in

    b
  foo
0   1
1   2
2   3

whereas the agg call fails outright with the ValueError raised by bar. With a list of functions, Series/DataFrame.apply just calls agg, so it behaves the same way. I also tried to find an example of transform-like partial failure using DataFrame.groupby with apply/agg/transform, but wasn't able to (the code paths are a bit complex here, so I've resorted to black-box testing).
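To make the contrast concrete, here is a minimal sketch of the partial-failure semantics that transform exhibits in the example: apply each function column by column and silently drop any column on which it raises. The helper `transform_partial` is hypothetical and is not pandas' actual implementation; it only reproduces the observable result shown above.

```python
import pandas as pd


def transform_partial(df, funcs):
    """Hypothetical sketch: apply each func per column, dropping columns that raise."""
    results = {}
    for col in df.columns:
        for func in funcs:
            try:
                # Each func receives a Series whose .name is the column label
                results[(col, func.__name__)] = func(df[col])
            except Exception:
                # Partial failure: the failing column is silently skipped
                continue
    if not results:
        # Only when every column fails does the caller see an error
        raise ValueError("all columns failed")
    # Tuple keys produce (column, func) MultiIndex columns
    return pd.DataFrame(results)


def foo(x):
    if x.name == "a":
        raise ValueError
    return x


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
print(transform_partial(df, [foo]))
```

Under agg's current behavior, the equivalent loop would simply let the first exception propagate instead of catching it.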

My thinking is that having transform fail outright in the example above is the better way to go. It would be simpler from a code perspective and avoids silent failure, although it may force users to drop nuisance columns themselves. I'll also mention that one of the things I'd like to work toward is having, e.g.,
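If transform failed outright, a user who relied on the silent drop would instead exclude the problematic column up front. A minimal sketch using the standard `DataFrame.drop`; `foo` is the UDF from the synthetic example, redefined so the snippet runs on its own.

```python
import pandas as pd


def foo(x):
    if x.name == "a":
        raise ValueError
    return x


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

# Explicitly drop the nuisance column instead of relying on transform to skip it
result = df.drop(columns="a").transform([foo])
print(result)
```

The failure then becomes visible at the call site, and the user decides which columns to exclude.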

df.transform(['mean'])

have the same performance as

df.transform('mean')

and allowing transform to partially fail like this would mean that such an optimization needs a fallback.
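A minimal sketch of the kind of optimization described above, using a hypothetical helper `transform_list_fast`: when every entry in the list is a string, make one frame-level call per function and stitch the results into the `(column, func)` MultiIndex layout that the list form returns. The `try`/`except` shows where the fallback comes in: if partial failure were allowed, a single bad column would sink the whole frame-level call, so the code would have to retry via the per-column list-like path. The usage below uses `'cumsum'` because it is shape-preserving and therefore a valid transform.

```python
import pandas as pd


def transform_list_fast(df, funcs):
    """Hypothetical fast path for df.transform([...]) when all funcs are strings."""
    if not all(isinstance(f, str) for f in funcs):
        # UDFs: defer to the regular list-like path
        return df.transform(funcs)
    try:
        # One vectorized call per function on the whole frame
        pieces = {f: df.transform(f) for f in funcs}
    except Exception:
        # Fallback: with partial failure allowed, retry column by column
        return df.transform(funcs)
    # concat gives (func, column); swap to (column, func) like the list form
    out = pd.concat(pieces, axis=1).swaplevel(axis=1)
    # Reorder so all funcs for the first column come first, and so on
    order = [(col, f) for col in df.columns for f in funcs]
    return out[order]


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
fast = transform_list_fast(df, ["cumsum"])
print(fast)
```

If transform instead always failed outright, the `except` branch could simply re-raise, and no second code path would be needed.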

cc @jorisvandenbossche @jreback @jbrockmendel

Labels: API - Consistency, Apply, Needs Discussion