Commit 92b546b

rhshadrach authored and cbpygit committed
REGR: groupby with Decimal and NA values (pandas-dev#56522)
1 parent 6abb38b commit 92b546b

3 files changed: 24 additions, 2 deletions

doc/source/whatsnew/v2.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -659,11 +659,11 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_count` would sort by proportions rather than frequencies when ``sort=True`` and ``normalize=True`` (:issue:`55951`)
 - Bug in :meth:`DataFrame.asfreq` and :meth:`Series.asfreq` with a :class:`DatetimeIndex` with non-nanosecond resolution incorrectly converting to nanosecond resolution (:issue:`55958`)
 - Bug in :meth:`DataFrame.ewm` when passed ``times`` with non-nanosecond ``datetime64`` or :class:`DatetimeTZDtype` dtype (:issue:`56262`)
+- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` where grouping by a combination of ``Decimal`` and NA values would fail when ``sort=True`` (:issue:`54847`)
 - Bug in :meth:`DataFrame.resample` not respecting ``closed`` and ``label`` arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
 - Bug in :meth:`DataFrame.resample` when resampling on a :class:`ArrowDtype` of ``pyarrow.timestamp`` or ``pyarrow.duration`` type (:issue:`55989`)
 - Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
 - Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.MonthBegin` (:issue:`55271`)
--
 
 Reshaping
 ^^^^^^^^^
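The new whatsnew entry describes a failure when group keys mixing Decimal values and NA values are sorted. As a rough, pandas-free sketch of why catching TypeError alone can be insufficient, the snippet below shows that Python's decimal module raises two different exceptions depending on what a Decimal is compared against: TypeError for None, and decimal.InvalidOperation for a Decimal NaN under the default context. Whether pandas hits exactly this comparison internally is an assumption here, and ordering_error is a hypothetical helper used only for illustration.

```python
import decimal
from decimal import Decimal


def ordering_error(a, b):
    """Return the exception type raised by `a < b`, or None if it succeeds.

    Hypothetical helper for illustration only; not part of pandas.
    """
    try:
        _ = a < b
    except Exception as exc:  # report whichever exception the comparison raises
        return type(exc)
    return None


# Decimal vs None: the types are unorderable, so Python raises TypeError.
print(ordering_error(Decimal(1), None))            # <class 'TypeError'>

# An ordering comparison involving a Decimal NaN signals InvalidOperation,
# which the default decimal context traps, so that exception is raised instead.
print(ordering_error(Decimal(1), Decimal("NaN")))  # <class 'decimal.InvalidOperation'>

# Ordinary Decimal values compare without error.
print(ordering_error(Decimal(1), Decimal(2)))      # None
```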

pandas/core/algorithms.py

Lines changed: 2 additions & 1 deletion
@@ -4,6 +4,7 @@
 """
 from __future__ import annotations
 
+import decimal
 import operator
 from textwrap import dedent
 from typing import (
@@ -1514,7 +1515,7 @@ def safe_sort(
         try:
             sorter = values.argsort()
             ordered = values.take(sorter)
-        except TypeError:
+        except (TypeError, decimal.InvalidOperation):
             # Previous sorters failed or were not applicable, try `_sort_mixed`
             # which would work, but which fails for special case of 1d arrays
             # with tuples.
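The change to safe_sort widens the except clause so that a failed argsort falls through to the existing fallback path for both exception types. The general pattern can be sketched in plain Python as below. sort_with_fallback is a hypothetical helper, and its fallback (moving None and Decimal NaN values to the end) is an assumption chosen for illustration; it is not a copy of pandas' `_sort_mixed` or `_sort_tuples`.

```python
import decimal
from decimal import Decimal


def sort_with_fallback(values):
    """Sort values, falling back to a coarser strategy if comparisons fail.

    Hypothetical sketch of a try/except-fallback sort; not pandas' safe_sort.
    """
    try:
        # Fast path: rely on the elements' own ordering.
        return sorted(values)
    except (TypeError, decimal.InvalidOperation):
        # TypeError: unorderable types, e.g. Decimal vs None.
        # decimal.InvalidOperation: ordering comparison involving a Decimal NaN.
        def is_missing(v):
            return v is None or (isinstance(v, Decimal) and v.is_nan())

        # Assumes the non-missing values are mutually comparable; if not,
        # this second sorted() call raises as well.
        present = sorted(v for v in values if not is_missing(v))
        missing = [v for v in values if is_missing(v)]
        return present + missing


print(sort_with_fallback([Decimal(2), None, Decimal(1)]))
# [Decimal('1'), Decimal('2'), None]
```

Listing both exception types in one tuple keeps a single fallback path, which mirrors the shape of the one-line change in the diff.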

pandas/tests/groupby/test_groupby.py

Lines changed: 21 additions & 0 deletions
@@ -1,4 +1,5 @@
 from datetime import datetime
+import decimal
 from decimal import Decimal
 import re
 
@@ -3313,3 +3314,23 @@ def test_depr_grouper_attrs(attr):
     msg = f"{attr} is deprecated"
     with tm.assert_produces_warning(FutureWarning, match=msg):
         getattr(gb.grouper, attr)
+
+
+@pytest.mark.parametrize("test_series", [True, False])
+def test_decimal_na_sort(test_series):
+    # GH#54847
+    # We catch both TypeError and decimal.InvalidOperation exceptions in safe_sort.
+    # If this next assert raises, we can just catch TypeError
+    assert not isinstance(decimal.InvalidOperation, TypeError)
+    df = DataFrame(
+        {
+            "key": [Decimal(1), Decimal(1), None, None],
+            "value": [Decimal(2), Decimal(3), Decimal(4), Decimal(5)],
+        }
+    )
+    gb = df.groupby("key", dropna=False)
+    if test_series:
+        gb = gb["value"]
+    result = gb.grouper.result_index
+    expected = Index([Decimal(1), None], name="key")
+    tm.assert_index_equal(result, expected)
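One detail of the new test: it guards the widened except clause with `assert not isinstance(decimal.InvalidOperation, TypeError)`. Because decimal.InvalidOperation is a class, isinstance asks whether that class object is a TypeError instance, which is false for any exception class, so this assert cannot fail. If the intent stated in the test comment is to detect InvalidOperation becoming a TypeError subclass, issubclass expresses that check. The snippet below contrasts the two using standard Python semantics only.

```python
import decimal

# isinstance() tests whether an object is an instance of a class. The object
# here is the exception class itself, an instance of `type`, not of TypeError,
# so this is False regardless of the class hierarchy.
print(isinstance(decimal.InvalidOperation, TypeError))        # False

# issubclass() inspects the class hierarchy, which is what the test comment
# is concerned with. InvalidOperation derives from ArithmeticError
# (via decimal.DecimalException), not from TypeError.
print(issubclass(decimal.InvalidOperation, TypeError))        # False
print(issubclass(decimal.InvalidOperation, ArithmeticError))  # True
```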
