Description
Synthetic example:
```python
import pandas as pd

# Both functions fail only when applied to column "a".
def foo(x):
    if x.name == "a":
        raise ValueError
    return x

def bar(x):
    if x.name == "a":
        raise ValueError
    return x.sum()

df = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [1, 2, 3]}
)
print(df.transform([foo]))
print(df.agg([bar]))
```
The `transform` call silently drops column `a` and results in

```
    b
  foo
0   1
1   2
2   3
```
whereas the `agg` call fails outright due to the `raise ValueError`. In this case Series/DataFrame `apply` just calls `agg`.
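To make the two behaviors concrete, here is a minimal pure-pandas sketch of the semantics being compared. `transform_partial` and `transform_strict` are hypothetical helpers, not pandas API. The partial version mimics the drop-the-failing-column behavior shown above, assuming (for illustration only) that `TypeError` and `ValueError` are the exceptions treated as "nuisance" failures; the strict version lets any exception propagate, like `agg` does here.

```python
import pandas as pd


def foo(x):
    # Same function as in the example: fails only for column "a".
    if x.name == "a":
        raise ValueError
    return x


def transform_partial(df, funcs):
    """Hypothetical helper: apply every func to every column, silently
    dropping a column if any func raises on it (partial failure)."""
    pieces = {}
    for col in df.columns:
        try:
            col_result = {(col, f.__name__): f(df[col]) for f in funcs}
        except (TypeError, ValueError):
            # Assumed nuisance column: dropped without any warning.
            continue
        pieces.update(col_result)
    # Tuple keys produce (column, function) MultiIndex columns.
    return pd.DataFrame(pieces)


def transform_strict(df, funcs):
    """Hypothetical helper: same output shape, but any exception propagates."""
    pieces = {(col, f.__name__): f(df[col]) for col in df.columns for f in funcs}
    return pd.DataFrame(pieces)


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
print(transform_partial(df, [foo]))  # only the ("b", "foo") column survives

try:
    transform_strict(df, [foo])
except ValueError:
    print("transform_strict raised ValueError")
```

The strict helper is shorter precisely because it has no per-column error handling to get right.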
I also tried to find a similar example of partial failure using `DataFrame.groupby` with `apply`/`agg`/`transform`, but wasn't able to (the code paths here are a bit complex, so I've resorted to black-box testing).
My thinking is that having `transform` fail outright in the example above is the better way to go. It would be simpler from a code perspective and avoids silent failure, although it would mean users may need to drop nuisance columns themselves. I'll also mention that one of the things I'd like to work toward is having e.g. `df.transform(['mean'])` have the same performance as `df.transform('mean')`; allowing partial failure in `transform` like this means there would need to be a fallback for that.
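A rough sketch of why partial failure forces a fallback, assuming a list of function *names*: the fast path does one frame-wide `df.transform(name)` call per name and only relabels the columns, so it should cost roughly the same as the single-name call. The helper `transform_list` and its `allow_partial_failure` flag are hypothetical, not pandas API, and the example uses the shape-preserving `"cumsum"` as its function name.

```python
import pandas as pd


def transform_list(df, funcs, allow_partial_failure=True):
    """Hypothetical sketch of a list-of-names transform with a fast path.

    Fast path: one frame-wide df.transform(name) call per name, relabelled
    to (column, function) MultiIndex columns.
    Fallback: only needed when partial failure is allowed; it retries
    column by column and drops the columns that raise.
    """
    try:
        frames = {f: df.transform(f) for f in funcs}
        out = pd.concat(frames, axis=1)    # columns: (function, column)
        out = out.swaplevel(0, 1, axis=1)  # columns: (column, function)
        # Order columns column-major: (col1, f1), (col1, f2), (col2, f1), ...
        return out[[(c, f) for c in df.columns for f in funcs]]
    except (TypeError, ValueError):
        if not allow_partial_failure:
            raise  # strict semantics: the error surfaces, no fallback
        pieces = {}
        for col in df.columns:
            try:
                col_result = {(col, f): df[col].transform(f) for f in funcs}
            except (TypeError, ValueError):
                continue  # drop the failing column
            pieces.update(col_result)
        return pd.DataFrame(pieces)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(transform_list(df, ["cumsum"]))
```

Under fail-outright semantics the whole `except` branch reduces to re-raising, so only the fast path has to be maintained.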