Description
Run the code below and notice that the column values end up under the wrong labels.
This is due to stack() failing to preserve the order in the MultiIndex.
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
values = np.arange(5)
data = np.vstack([['b{}'.format(x) for x in values],   # b0, b1, ..
                  ['a{}'.format(x) for x in values]])  # a0, a1, ..
df = pd.DataFrame(data.T, columns=['b', 'a'])
df.columns.name = 'first'
# Call pd.concat to get the 2-level MultiIndex *unsorted* columns.
# The bug seems to happen when having one of these unsorted MultiIndexes.
second_level_dict = {'x': df}
multi_level_df = pd.concat(second_level_dict, axis=1)
multi_level_df.columns.names = ['second', 'first']
# Sort the columns, i.e. [a, b] instead of [b, a].
sorted_cols_df = multi_level_df.reindex(sorted(multi_level_df.columns), axis=1)
print('Before the restack:')
print(sorted_cols_df)
# Stack and unstack; the result should be the same as the input.
# The stack() call is what triggers the bug; sorted_cols_df.stack() alone also exposes the problem.
restacked = sorted_cols_df.stack(['first', 'second']).unstack(['first', 'second'])
print()
print('Restacked:')
print(restacked)
print('(Notice the swapped column values)')
Output
$ python pandas_bug.py
Before the restack:
second   x
first    a   b
0       a0  b0
1       a1  b1
2       a2  b2
3       a3  b3
4       a4  b4

Restacked:
first    a   b
second   x   x
0       b0  a0   <-- notice the swapped values
1       b1  a1
2       b2  a2
3       b3  a3
4       b4  a4
(Notice the swapped column values)
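The swap above can also be detected programmatically instead of by eye. The following is a minimal sketch, assuming (as in the reproduction) that every value starts with the label of its 'first' level, e.g. 'a0' belongs under 'a'. The helper name `mislabeled_columns` and the two small frames are hypothetical and only for illustration; they are not part of pandas or of the script above.

```python
import pandas as pd


def mislabeled_columns(frame, level="first"):
    """Return the column labels whose values do not start with their `level` label.

    Hypothetical helper: relies on the convention used in this report that
    a value such as 'a0' belongs under the column labelled 'a'.
    """
    labels = frame.columns.get_level_values(level)
    bad = []
    for position, label in enumerate(labels):
        # Select by position so the check does not depend on label lookup.
        values = frame.iloc[:, position].astype(str)
        if not values.str.startswith(label).all():
            bad.append(frame.columns[position])
    return bad


columns = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b")], names=["second", "first"]
)
# A correctly labelled frame and a hand-built frame with swapped values.
good = pd.DataFrame([["a0", "b0"], ["a1", "b1"]], columns=columns)
swapped = pd.DataFrame([["b0", "a0"], ["b1", "a1"]], columns=columns)

print(mislabeled_columns(good))     # -> []
print(mislabeled_columns(swapped))  # -> [('x', 'a'), ('x', 'b')]
```

Calling `mislabeled_columns(restacked)` on the frame from the script would then return a non-empty list on an affected pandas version, which makes the report easy to turn into a regression test.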
Output of pd.show_versions()
I've reproduced this on both 0.21 and 0.20.
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-97-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None