make Series.apply() return Series or DataFrame based on callable #343

Merged: 3 commits merged into pandas-dev:main on Sep 30, 2022

Dr-Irv (Collaborator) commented Sep 29, 2022:

There are probably other cases we should catch in testing, but this is a good change, IMHO.

Dr-Irv requested a review from twoertwein on September 29, 2022 22:06

Review comment on this part of the diff:

```
self, func: Callable, convertDType: _bool = ..., args: tuple = ..., **kwds
) -> Series | DataFrame: ...
self,
func: Callable[..., Scalar],
```

twoertwein (Member):

At least for these overloads, it would be fine for the first func to return Hashable, since Series is not hashable. But I'm not sure whether that is true: does a function that returns a tuple (which is hashable) produce a Series or a DataFrame? What about a function returning a list (not hashable and not in Scalar)? I don't mind not covering list if that is too messy (or we could add a fallback overload where func returns object).
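A small illustration of what `Hashable` does and does not cover, using two hypothetical helpers written only for this note (`hashable_by_abc` and `hashable_at_runtime`, not part of the PR). The first is the `collections.abc.Hashable` check, which only asks whether the type defines a non-`None` `__hash__`; the second actually calls `hash()`.

```python
from collections.abc import Hashable


def hashable_by_abc(obj: object) -> bool:
    # Structural check: does the type define a non-None __hash__?
    return isinstance(obj, Hashable)


def hashable_at_runtime(obj: object) -> bool:
    # Behavioral check: does hash() actually succeed on this value?
    try:
        hash(obj)
    except TypeError:
        return False
    return True


for value in [(1.0, 2.0), [1.0, 2.0], (1.0, [2.0]), 3.4, "abc"]:
    print(repr(value), hashable_by_abc(value), hashable_at_runtime(value))
```

Under a `Callable[..., Hashable]` overload, a tuple-returning function would therefore be typed as producing a Series, while a list-returning one would not match and would need its own overload (or the `object` fallback mentioned in the comment). The structural check is also shallow: a tuple that contains a list still counts as `Hashable` even though `hash()` fails on it.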

Dr-Irv (Collaborator, Author):

```python
import pandas as pd

s = pd.Series([-10, 2, 2, 3.4, 10, 10])


def retlist(x) -> list:
    # Returns a list: neither hashable nor a Scalar
    return [x, 2 * x]


def rettuple(x) -> tuple:
    # Returns a tuple: hashable, but not a Scalar
    return (x, 2 * x)


r1 = s.apply(retlist)
print(r1)
print(type(r1))

r2 = s.apply(rettuple)
print(r2)
print(type(r2))
```

produces:

```
0    [-10.0, -20.0]
1        [2.0, 4.0]
2        [2.0, 4.0]
3        [3.4, 6.8]
4      [10.0, 20.0]
5      [10.0, 20.0]
dtype: object
<class 'pandas.core.series.Series'>
0    (-10.0, -20.0)
1        (2.0, 4.0)
2        (2.0, 4.0)
3        (3.4, 6.8)
4      (10.0, 20.0)
5      (10.0, 20.0)
dtype: object
<class 'pandas.core.series.Series'>
```

I'm guessing you could return any object except a Series and still get a Series back. The problem is that there really isn't a way to say "match everything except Series" in a type annotation.

I changed it to use Hashable in the next commit.
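Because `pd.Series` is unhashable, pairing "callable returns `Hashable`, result is a Series" with "callable returns a Series, result is a DataFrame" approximates the missing "everything except Series" type. The sketch below models that idea with hypothetical `FakeSeries` and `FakeFrame` classes; it is not the pandas-stubs source, only an illustration under the assumption that the overloads are split this way. `FakeSeries` sets `__hash__ = None` to mimic the unhashability of Series, and its runtime `apply` follows the pandas rule that per-element Series results are assembled into a frame.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable
from typing import Any, overload


class FakeFrame:
    """Hypothetical stand-in for pd.DataFrame: just a list of rows."""

    def __init__(self, rows: list[list[Any]]) -> None:
        self.rows = rows


class FakeSeries:
    """Hypothetical stand-in for pd.Series; only apply and hashability are modeled."""

    # Unhashable like pd.Series, so a callable returning FakeSeries
    # cannot match the Callable[..., Hashable] overload.
    __hash__ = None  # type: ignore[assignment]

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    @overload
    def apply(self, func: Callable[..., Hashable]) -> FakeSeries: ...
    @overload
    def apply(self, func: Callable[..., FakeSeries]) -> FakeFrame: ...
    def apply(self, func: Callable[..., Any]) -> FakeSeries | FakeFrame:
        results = [func(v) for v in self.values]
        # Runtime rule: Series-valued results are stacked into a frame.
        if results and isinstance(results[0], FakeSeries):
            return FakeFrame([r.values for r in results])
        return FakeSeries(results)


s = FakeSeries([1, 2, 3])
doubled = s.apply(lambda x: 2 * x)                 # int is Hashable -> FakeSeries
pairs = s.apply(lambda x: FakeSeries([x, 2 * x]))  # FakeSeries result -> FakeFrame
```

A type checker picks the first overload whose `func` parameter accepts the given callable, so the order matters only if the two return types overlap; here they do not, because `FakeSeries` is not `Hashable`.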

twoertwein (Member):

I rarely use apply: how are np.ndarray and pd.Index handled? If that is a common use case, it might be nice to add them to the accepted return types of func (same for list: Hashable | list).

I would also be happy to wait and add only those non-Hashable types that turn out to be popular (i.e., that are reported in issues).
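A quick runtime check of the two types asked about, using the same data as the `retlist`/`rettuple` example (a sketch, not from the PR; exact printed dtypes and class names may differ across pandas versions). Neither `np.ndarray` nor `pd.Index` is hashable, yet an element-wise function returning either one should still come back as a Series of object dtype holding one container per element, just like the list case.

```python
import numpy as np
import pandas as pd

s = pd.Series([-10, 2, 2, 3.4, 10, 10])

# Each call returns a non-hashable container (ndarray / Index) per element.
r_arr = s.apply(lambda x: np.array([x, 2 * x]))
r_idx = s.apply(lambda x: pd.Index([x, 2 * x]))

for r in (r_arr, r_idx):
    # Inspect the result container type, its dtype, and the element type.
    print(type(r).__name__, r.dtype, type(r.iloc[0]).__name__)
```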

Dr-Irv (Collaborator, Author):

Typically, you use apply to call a function on each item of a Series, and that function returns a single object that becomes an element of the new Series.

Since you typically don't store arbitrary objects in a Series (the values are usually floats, ints, strs, etc.), using Scalar for the values of the resulting Series makes sense. Hashable broadens that, but probably only to objects that are rarely stored in a Series.
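To make the two cases concrete, a minimal sketch (not from the PR) on the data from the earlier example: an element-wise function that returns a scalar yields a Series, whereas one that returns a Series per element yields a DataFrame, which is the split the overloads are meant to capture. The column names `x` and `double` are arbitrary.

```python
import pandas as pd

s = pd.Series([-10, 2, 2, 3.4, 10, 10])

# Typical use: one scalar per element, so the result is a Series.
absolute = s.apply(abs)

# A Series per element is expanded into a DataFrame (one row per element).
expanded = s.apply(lambda x: pd.Series({"x": x, "double": 2 * x}))

print(type(absolute).__name__, type(expanded).__name__)
```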

twoertwein merged commit 903d6d7 into pandas-dev:main on Sep 30, 2022

twoertwein (Member):

Thanks @Dr-Irv!

Dr-Irv deleted the apply branch on December 28, 2022 15:26
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close these issues:
Use better typing for Series.apply based on return type of the callable determining return type of Series.apply()

2 participants