make Series.apply() return Series or DataFrame based on callable #343
pandas-stubs/core/series.pyi
self, func: Callable, convertDType: _bool = ..., args: tuple = ..., **kwds
) -> Series | DataFrame: ...
self,
func: Callable[..., Scalar],
At least for these overloads it would be fine for the first `func` to return `Hashable`, as `Series` is not hashable. But I'm not sure whether that is correct: does a function that returns a `tuple` (which is hashable) produce a `Series` or a `DataFrame`? And what about a function returning a `list` (not hashable and not in `Scalar`)? I don't mind not covering `list` if that is too messy (or having a fallback overload where `func` returns `object`).
import pandas as pd

s = pd.Series([-10, 2, 2, 3.4, 10, 10])


def retlist(x) -> list:
    return [x, 2 * x]


def rettuple(x) -> tuple:
    return (x, 2 * x)


r1 = s.apply(retlist)
print(r1)
print(type(r1))
r2 = s.apply(rettuple)
print(r2)
print(type(r2))
produces
0 [-10.0, -20.0]
1 [2.0, 4.0]
2 [2.0, 4.0]
3 [3.4, 6.8]
4 [10.0, 20.0]
5 [10.0, 20.0]
dtype: object
<class 'pandas.core.series.Series'>
0 (-10.0, -20.0)
1 (2.0, 4.0)
2 (2.0, 4.0)
3 (3.4, 6.8)
4 (10.0, 20.0)
5 (10.0, 20.0)
dtype: object
<class 'pandas.core.series.Series'>
I'm guessing `func` could return any object except a `Series`, and `apply` would still return a `Series`. The problem here is that there really isn't a way to say "match everything except `Series`".
I changed it to use `Hashable` in the next commit.
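To illustrate the idea discussed above, here is a minimal sketch of overloads keyed on the callable's return type. It does not reproduce the actual pandas-stubs signatures: `ToySeries` and `ToyFrame` are hypothetical stand-ins, and the extra parameters of `Series.apply()` are omitted. Because the stand-in series is unhashable, a `Hashable`-returning overload cannot also match a series-returning callable.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable
from typing import Any, overload


class ToyFrame:
    """Hypothetical stand-in for DataFrame (not the real class)."""

    def __init__(self, rows: list[ToySeries]) -> None:
        self.rows = rows


class ToySeries:
    """Hypothetical stand-in for Series; made unhashable on purpose."""

    __hash__ = None  # type: ignore[assignment]

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    # A callable returning a series produces a frame ...
    @overload
    def apply(self, func: Callable[..., ToySeries]) -> ToyFrame: ...
    # ... while a callable returning any hashable value produces a series.
    @overload
    def apply(self, func: Callable[..., Hashable]) -> ToySeries: ...

    def apply(self, func):
        # Toy runtime behaviour that mirrors the two overloads above.
        results = [func(v) for v in self.values]
        if results and all(isinstance(r, ToySeries) for r in results):
            return ToyFrame(results)
        return ToySeries(results)


s = ToySeries([1, 2, 3])
as_series = s.apply(lambda x: (x, 2 * x))            # tuple is hashable
as_frame = s.apply(lambda x: ToySeries([x, 2 * x]))  # series-returning callable
print(type(as_series).__name__, type(as_frame).__name__)
```

A callable returning a `list` matches neither overload in this sketch, which is the gap the `Hashable | list` suggestion and the `object` fallback idea are meant to close.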
I rarely use `apply`: how are `np.ndarray` and `pd.Index` handled? If that is a common use case, it might be nice to add them to the return values (same for `list`: `Hashable | list`).
I would also be happy to wait and add only those non-`Hashable` types that turn out to be popular (that is, reported in issues).
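As a rough aid for deciding which return types a `Hashable` overload would and would not cover, the following sketch checks a few built-in values against `collections.abc.Hashable` at runtime. The `candidates` mapping is purely illustrative, and the check only looks at the outer object, so a tuple that contains a list still counts as `Hashable` here.

```python
from collections.abc import Hashable

# Illustrative sample of values a callable passed to apply() might return.
candidates = {
    "int": 1,
    "str": "a",
    "tuple": (1, 2),
    "frozenset": frozenset({1}),
    "list": [1, 2],
    "set": {1},
    "dict": {"a": 1},
}

# isinstance() against the Hashable ABC checks for a usable __hash__ method.
hashable_by_name = {
    name: isinstance(value, Hashable) for name, value in candidates.items()
}

for name, is_hashable in hashable_by_name.items():
    print(f"{name:10s} Hashable={is_hashable}")
```

In this sample, `list`, `set`, and `dict` are the types that fall outside `Hashable`, so they are the ones an extra union member or fallback overload would have to pick up.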
Typically, you use `apply` to call a function on each item of a `Series`, so the function returns a single object that becomes a member of the new `Series`.
Since you typically don't store arbitrary objects in a `Series` (the values are usually floats, ints, strs, etc.), treating the values as `Scalar` makes sense. `Hashable` expands that, but it probably only extends it to rarely used objects that would be stored in the `Series`.
Thanks @Dr-Irv!
`test_series_apply()` to use `assert_type()` and added tests corresponding to the issue
There are probably other cases we should catch in testing, but this is a good change, IMHO.