Description
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame([
    [4, 2, 'x'],
    [3, 1, 'y'],
],
columns=['A', 'B', 'C']).set_index(['A', 'B'])
print(df)
# Consider this DataFrame:
#
# >      C
# > A B
# > 4 2  x
# > 3 1  y
# Iterating over the group works if both levels
# of the multi-index are used for grouping.
for idx, group in df.groupby(level=[0, 1], sort=False):
    print(idx)
# > (4, 2)
# > (3, 1)
# However, grouping by only one level
# unexpectedly sorts the groups, even with sort=False.
for idx, group in df.groupby(level=0, sort=False):
    print(idx)
# > 3
# > 4
# If the DataFrame has a single-level index,
# it works correctly.
df2 = pd.DataFrame([
    [4, 2, 'x'],
    [3, 1, 'y'],
],
columns=['A', 'B', 'C']).set_index(['A'])
print(df2)
# >    B  C
# > A
# > 4  2  x
# > 3  1  y
for idx, group in df2.groupby(level=0, sort=False):
    print(idx)
# > 4
# > 3
Problem description
DataFrame.groupby() has a sort parameter that selects whether the groups should be sorted by their keys.
However, if the DataFrame has a MultiIndex and the grouping uses only one of its levels, the result is sorted regardless of the value of sort.
Grouping by more than one level works.
Passing the single level as a list, level=[0], does not fix the problem.
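To make the report easier to check across versions, the behaviour described above can be tested programmatically instead of by reading the printed output. This is only a minimal sketch: the names expected_order and observed_order are made up for illustration, and it assumes that "unsorted" means the order in which the level-0 values first appear in the index.

```python
import pandas as pd

df = pd.DataFrame(
    [[4, 2, 'x'], [3, 1, 'y']],
    columns=['A', 'B', 'C'],
).set_index(['A', 'B'])

# Assumption: with sort=False the groups should come out in the order
# in which the level-0 values first appear in the index.
expected_order = df.index.get_level_values(0).unique().tolist()

# Order in which groupby actually yields the group keys.
observed_order = [key for key, _ in df.groupby(level=0, sort=False)]

print("expected:", expected_order)
print("observed:", observed_order)
# False on a version affected by this issue, True otherwise.
print("order preserved:", observed_order == expected_order)
```

On the version listed below, the last line should print False according to the output shown in the code sample; the result on other versions is not claimed here.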
Expected Output
for idx, group in df.groupby(level=0, sort=False):
    print(idx)
should yield
4
3
Output of pd.show_versions()
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_GB.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.21.0.dev+450.g6eadb87fe
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None