BUG: Fix calling groupBy(...).apply(func) on an empty dataframe invokes func #48579
Merged
Commits:
432ffba fix to maintain consistency for apply UDF on empty inputs (ntachukwu)
519fa10 use DataFrame instead of pd.DataFrame for test (ntachukwu)
f734276 change apply function to only handle TypeError for when df is empty a… (ntachukwu)
7285e60 Merge branch 'main' into apply_empty_input (ntachukwu)
1ad10e1 improve test for udfs on empty inputs (ntachukwu)
2964658 fix typo (ntachukwu)
98b3030 remove unrelated test (ntachukwu)
405e28a Merge branch 'main' into apply_empty_input (ntachukwu)
6003dc3 Merge branch 'main' into apply_empty_input (ntachukwu)
60941f4 change test for empty df (ntachukwu)
862a58f Merge branch 'apply_empty_input' of github.com:Th3nn3ss/pandas into a… (ntachukwu)
ee08f0f fix test for udf on empty df (ntachukwu)
348b14a add to whatsnew documentation (ntachukwu)
336c6c7 Merge branch 'main' into apply_empty_input (ntachukwu)
6972fd4 fix whatsnew v1.5.1 documentation (ntachukwu)
Should the `try/except IndexError` only be for `data.iloc[:0]`? A user's `f` could throw an `IndexError` which we don't necessarily want to capture here?
Do we have a test that could demonstrate such an instance? I am having a little bit of an issue coming up with one.
Maybe a UDF like:
[code snippet not captured]
should still raise the `IndexError` and not be caught by this `try/except`?
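To make the concern concrete, here is a minimal sketch in plain Python, not the pandas code under review. The names `udf_with_bug` and `apply_broad` are hypothetical, and a list slice stands in for `data.iloc[:0]`. It shows how a handler wrapped around both the slice and the call to `f` can hide an `IndexError` that originates in the user's function.

```python
def udf_with_bug(rows):
    """Hypothetical user function with an unrelated indexing bug."""
    lookup = [10, 20, 30]
    return lookup[5]  # raises IndexError no matter what rows contains


def apply_broad(f, rows, fallback):
    """Stand-in wrapper: one try block covers the slice and the call to f."""
    try:
        return f(rows[:0])
    except IndexError:
        # Reached for errors raised inside f as well, so the bug is hidden.
        return fallback


result = apply_broad(udf_with_bug, [1, 2, 3], fallback="empty result")
print(result)
```

The caller gets the fallback value and never learns that `udf_with_bug` failed, which is the side effect the comment above is worried about.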
In version 1.3.5, before PR #44092 was merged, the `IndexError` is not raised (I just tested it out). I believe the idea was that, regardless of the UDF, if the groupby has no groups then an empty DataFrame is always returned.
Okay. Nonetheless, if we know `data` is empty and `.iloc[:0]` will raise an `IndexError`, it might be more explicit to use:
[code snippet not captured]
I just want to avoid any unintended side effects of `except IndexError` catching the exception from `f` and not `data.iloc[:0]`.
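A sketch of the kind of alternative being discussed, again in plain Python with hypothetical names (`apply_with_guard`, `first_item`) and a list standing in for `data`. It assumes the suggestion amounts to checking for emptiness up front instead of relying on `except IndexError`; the snippet actually proposed is not shown here, so treat this only as one possible reading.

```python
def first_item(rows):
    """Hypothetical user function that fails on input with no rows."""
    return rows[0]


def apply_with_guard(f, rows, fallback):
    """Skip f when the input is empty; otherwise let f's errors propagate."""
    if not rows:  # plays the role of an explicit emptiness check on data
        return fallback
    return f(rows)


print(apply_with_guard(first_item, [], fallback=None))      # f is never called
print(apply_with_guard(first_item, [7, 8], fallback=None))  # f runs normally
```

With a guard like this there is no exception handler at all, so any `IndexError` raised by `f` on non-empty input reaches the caller unchanged.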
Awesome idea. Will try it out.
Using `not data.empty` fails the tests for #44092.
Which one in particular?
I don't believe `data.iloc[:0]` should ever raise on any frame; is there an example where it does?
I do think we need to decide on a general pattern of behavior for pandas methods that take a UDF and operate on an empty frame. I'm planning on putting a proposal together, but it will be some time (maybe a week or two). In the meantime, it seems to me that catching some error types like `IndexError` but not others might be confusing for users. I believe this code was added in #44092 (cc @jbrockmendel) to handle cases where `apply` was called with a pandas method (e.g. "skew"). Can we instead add this block to just those particular methods?
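As a quick check of the first point, the sketch below slices a few illustrative frames with `.iloc[:0]`. The sample frames are made up for this example. Positional slices in pandas follow Python slice semantics, so a zero-length slice is expected to return an empty frame rather than raise.

```python
import pandas as pd

# Illustrative frames: no columns at all, columns without rows, and populated.
frames = {
    "no columns": pd.DataFrame(),
    "columns only": pd.DataFrame(columns=["a", "b"]),
    "populated": pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}),
}

heads = {}
for label, frame in frames.items():
    # A zero-length positional slice keeps the columns and drops every row.
    heads[label] = frame.iloc[:0]
    print(label, heads[label].shape)
```

This only covers three simple cases, so it does not prove that no frame can raise here.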
Works for me.