Commit ae0d302

BUG: DataFrame.stack not handling NaN in MultiIndex columns correct (#39507)
1 parent caf482d commit ae0d302

3 files changed: +33 -12 lines changed

doc/source/whatsnew/v1.3.0.rst

Lines changed: 1 addition & 0 deletions
@@ -413,6 +413,7 @@ Reshaping
 - :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`)
 - Bug in :meth:`DataFrame.join` not assigning values correctly when having :class:`MultiIndex` where at least one dimension is from dtype ``Categorical`` with non-alphabetically sorted categories (:issue:`38502`)
 - :meth:`Series.value_counts` and :meth:`Series.mode` return consistent keys in original order (:issue:`12679`, :issue:`11227` and :issue:`39007`)
+- Bug in :meth:`DataFrame.stack` not handling ``NaN`` in :class:`MultiIndex` columns correct (:issue:`39481`)
 - Bug in :meth:`DataFrame.apply` would give incorrect results when used with a string argument and ``axis=1`` when the axis argument was not supported and now raises a ``ValueError`` instead (:issue:`39211`)
 - Bug in :meth:`DataFrame.sort_values` not reshaping index correctly after sorting on columns, when ``ignore_index=True`` (:issue:`39464`)
 - Bug in :meth:`DataFrame.append` returning incorrect dtypes with combinations of ``ExtensionDtype`` dtypes (:issue:`39454`)

pandas/core/reshape/reshape.py

Lines changed: 10 additions & 12 deletions
@@ -629,16 +629,12 @@ def _convert_level_number(level_num, columns):

     # tuple list excluding level for grouping columns
     if len(frame.columns.levels) > 2:
-        tuples = list(
-            zip(
-                *[
-                    lev.take(level_codes)
-                    for lev, level_codes in zip(
-                        this.columns.levels[:-1], this.columns.codes[:-1]
-                    )
-                ]
-            )
-        )
+        levs = []
+        for lev, level_codes in zip(this.columns.levels[:-1], this.columns.codes[:-1]):
+            if -1 in level_codes:
+                lev = np.append(lev, None)
+            levs.append(np.take(lev, level_codes))
+        tuples = list(zip(*levs))
         unique_groups = [key for key, _ in itertools.groupby(tuples)]
         new_names = this.columns.names[:-1]
         new_columns = MultiIndex.from_tuples(unique_groups, names=new_names)
@@ -650,7 +646,9 @@ def _convert_level_number(level_num, columns):
     new_data = {}
     level_vals = this.columns.levels[-1]
     level_codes = sorted(set(this.columns.codes[-1]))
-    level_vals_used = level_vals[level_codes]
+    level_vals_nan = level_vals.insert(len(level_vals), None)
+
+    level_vals_used = np.take(level_vals_nan, level_codes)
     levsize = len(level_codes)
     drop_cols = []
     for key in unique_groups:
@@ -671,7 +669,7 @@ def _convert_level_number(level_num, columns):

         if slice_len != levsize:
             chunk = this.loc[:, this.columns[loc]]
-            chunk.columns = level_vals.take(chunk.columns.codes[-1])
+            chunk.columns = level_vals_nan.take(chunk.columns.codes[-1])
             value_slice = chunk.reindex(columns=level_vals_used).values
         else:
             if frame._is_homogeneous_type and is_extension_array_dtype(
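
The change to reshape.py appears to rest on one detail: a missing value in a MultiIndex level is stored as code -1, and taking position -1 from the level array returns its last element rather than a missing value. Appending None to the level makes that last element the intended placeholder. The following is a minimal NumPy-only sketch of that idea, not the pandas implementation. The helper names (group_tuples_old, group_tuples_new, unique_consecutive) and the hard-coded levels and codes are illustrative assumptions, modeled on the columns used in the test added by this commit.

```python
import itertools

import numpy as np

# Assumed inputs: levels and codes of the first two column levels for the
# columns (0, None, None), (0, 2, 0), (0, 2, 1), (0, 3, 0), (0, 3, 1).
# A code of -1 marks a missing value in that level.
levels = [np.array([0]), np.array([2, 3])]
codes = [np.array([0, 0, 0, 0, 0]), np.array([-1, 0, 0, 1, 1])]


def group_tuples_old(levels, codes):
    """Pre-fix behaviour: code -1 wraps around to the last level value."""
    taken = [np.take(lev, lev_codes).tolist() for lev, lev_codes in zip(levels, codes)]
    return list(zip(*taken))


def group_tuples_new(levels, codes):
    """Post-fix behaviour: append None so that code -1 selects None."""
    taken = []
    for lev, lev_codes in zip(levels, codes):
        if -1 in lev_codes:
            lev = np.append(lev, None)  # object array whose last item is None
        taken.append(np.take(lev, lev_codes).tolist())
    return list(zip(*taken))


def unique_consecutive(tuples):
    """Collapse runs of equal tuples, as itertools.groupby does in the diff."""
    return [key for key, _ in itertools.groupby(tuples)]


old_groups = unique_consecutive(group_tuples_old(levels, codes))
new_groups = unique_consecutive(group_tuples_new(levels, codes))

# Same idea for the innermost level: sorting the codes puts -1 first,
# so the placeholder leads the list of values that are actually used.
level_vals = np.array([0, 1])
level_codes = sorted(set([-1, 0, 1, 0, 1]))
level_vals_used = np.take(np.append(level_vals, None), level_codes).tolist()

print(old_groups)       # [(0, 3), (0, 2), (0, 3)]
print(new_groups)       # [(0, None), (0, 2), (0, 3)]
print(level_vals_used)  # [None, 0, 1]
```

In this sketch the old path mislabels the first column group as (0, 3), so that label appears twice, while the new path yields (0, None), (0, 2), (0, 3), which matches the column labels the new test expects.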

pandas/tests/frame/test_stack_unstack.py

Lines changed: 22 additions & 0 deletions
@@ -1931,3 +1931,25 @@ def test_unstack_with_level_has_nan(self):
         )

         tm.assert_index_equal(result, expected)
+
+    def test_stack_nan_in_multiindex_columns(self):
+        # GH#39481
+        df = DataFrame(
+            np.zeros([1, 5]),
+            columns=MultiIndex.from_tuples(
+                [
+                    (0, None, None),
+                    (0, 2, 0),
+                    (0, 2, 1),
+                    (0, 3, 0),
+                    (0, 3, 1),
+                ],
+            ),
+        )
+        result = df.stack(2)
+        expected = DataFrame(
+            [[0.0, np.nan, np.nan], [np.nan, 0.0, 0.0], [np.nan, 0.0, 0.0]],
+            index=Index([(0, None), (0, 0), (0, 1)]),
+            columns=Index([(0, None), (0, 2), (0, 3)]),
+        )
+        tm.assert_frame_equal(result, expected)
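
To see why these particular columns exercise the bug, it can help to inspect how the MultiIndex stores the missing entries. The snippet below is a small exploratory sketch and is not part of the commit; the variable names are illustrative, and it assumes a pandas version in which MultiIndex exposes `levels` and `codes`.

```python
import pandas as pd

# Same column tuples as in the test above.
columns = pd.MultiIndex.from_tuples(
    [(0, None, None), (0, 2, 0), (0, 2, 1), (0, 3, 0), (0, 3, 1)]
)

# The level values never include the missing entry ...
second_level_values = list(columns.levels[1])
# ... it is represented only by a -1 in the codes.
second_level_codes = columns.codes[1].tolist()

print(second_level_values)
print(second_level_codes)
```

Because the missing label exists only as a -1 code, any lookup that treats the code as an ordinary position will pick up a real level value instead, which is the situation the test guards against.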
