Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently (pandas 2.2.0), groupby raises a relatively cryptic error message when the grouping column contains object values that cannot be sorted against each other.
The default value of the sort argument in groupby is True, which understandably can't be honored for columns whose values can't be ordered (e.g. values of different types).
Feature Description
Current behavior:
import pandas as pd
df = pd.DataFrame({'a': [False, 'string', True], 'b': [1, 2, 3]})
df.groupby('a').describe() # works
df = pd.DataFrame({'a': [False, (1, 2,), True], 'b': [1, 2, 3]})
df.groupby('a').describe() # fails!
# TypeError: '<' not supported between instances of 'tuple' and 'bool'
It would be nice if pandas simply fell back to not sorting the output. As the case with the 'string' value illustrates, pandas is already lenient: it orders booleans and strings in the same column without complaining, even though Python itself cannot compare them. It would be convenient if this behavior were extended to other types.
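To make the proposed fallback concrete, here is a minimal plain-Python sketch of the idea. It is not how pandas implements groupby; the helper name order_group_keys is hypothetical and only illustrates "sort when possible, otherwise keep the order of first appearance".

```python
def order_group_keys(keys, sort=True):
    """Return the unique group keys, sorted when that is possible.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # dict.fromkeys drops duplicates while keeping first-appearance order.
    unique = list(dict.fromkeys(keys))
    if not sort:
        return unique
    try:
        return sorted(unique)
    except TypeError:
        # The keys are not mutually comparable (e.g. tuple vs. bool),
        # so fall back to the unsorted, first-appearance order.
        return unique


print(order_group_keys([3, 1, 2, 1]))           # [1, 2, 3]
print(order_group_keys([False, (1, 2), True]))  # [False, (1, 2), True]
```

With this behavior, the second example from above would return its groups in the order False, (1, 2), True instead of raising a TypeError.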
Alternative Solutions
I realize that the suggested change may be contentious, since sort=True explicitly requests sorted output. For applications that rely on the output actually being sorted, it might be better to stay strict and keep raising an error. In that case, however, the error message should hint that sort=False would solve the issue. Then again, sorting is currently supported between some non-comparable values (string vs. bool), so for consistency I would suggest that the sort argument be ignored when the keys cannot be sorted.
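The stricter alternative (keep the error, but add a hint) could look roughly like the following plain-Python sketch. The function name sort_group_keys_strict and the wording of the message are assumptions made for illustration; this is not existing pandas code.

```python
def sort_group_keys_strict(keys):
    """Sort the unique group keys or raise a TypeError with a hint.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Drop duplicates while keeping first-appearance order.
    unique = list(dict.fromkeys(keys))
    try:
        return sorted(unique)
    except TypeError as err:
        # Keep the original message and append an actionable hint.
        raise TypeError(
            f"{err}. The group keys cannot be sorted; "
            "pass sort=False to groupby to skip sorting."
        ) from err


try:
    sort_group_keys_strict([False, (1, 2), True])
except TypeError as exc:
    print(exc)
```

This keeps the guarantee that sort=True never silently returns unsorted output, while pointing the user at the workaround.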
Additional Context
No response