Skip to content

Bug: DataFrame.clip fixed for lower > upper #52365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

m-ganko
Copy link
Contributor

@m-ganko m-ganko commented Apr 2, 2023

Fixed behavior of DataFrame.clip for lower > upper arguments. Changed to be consistent with numpy - all equal to upper. Main reason of bug was undocumented swap of arguments for scalars and for list-like arguments comparison with initial DataFrame for both bounds.

m-ganko added 4 commits April 2, 2023 00:09
* Before when lower > upper arguments were swapped, changed to be consistent
with numpy, all equal to upper
* test added
* DataFrame.clip fixed for lower > upper with list-like arguments
@m-ganko
Copy link
Contributor Author

m-ganko commented Apr 2, 2023

Looks like some tests are failing after merging in main branch. I will look at this.

@mroeschke
Copy link
Member

I personally think raising is the most informative behavior, but open to thoughts from other core team members

@m-ganko
Copy link
Contributor Author

m-ganko commented Apr 3, 2023

It is worth mentioning that Series.clip also works in the same way as numpy.clip. I think it is rather logical result, in first step we are clipping lower numbers, in second step upper numbers. We end up with all numbers equal to upper.

Another similar example from pandas api, pandas.date_range - when we pass start > end there is no exception, but empty DatetimeIndex.

@mroeschke
Copy link
Member

mroeschke commented Apr 3, 2023

Another similar example from pandas api, pandas.date_range - when we pass start > end there is no exception, but empty DatetimeIndex.

pd.date_range makes sense to be more flexible in this sense because the freq can be negative allowing start > end to produce a non-empty date range

@m-ganko
Copy link
Contributor Author

m-ganko commented Apr 12, 2023

Ok, so I guess we also need to change Series.clip method to raise exception.

I am not sure which exception it should raise, some existing pandas exception? I guess ValueError might be appropriate.

@m-ganko
Copy link
Contributor Author

m-ganko commented Apr 13, 2023

I have started implementation and I think there is too much to make validation for scalars, lists and DataFrame arguments.

So if you insist on doing this, I can close this PR and unassign from issue.

@mroeschke
Copy link
Member

I would suggest closing this PR now until there is some consensus on the main issue on how to handle this scenario

@mroeschke
Copy link
Member

Thanks for the PR, but I think this scenario needs a little more discussion before moving forward so closing

@mroeschke mroeschke closed this Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: df.clip does not handle inverted lower/upper bounds consistently
2 participants