
Groupby on a column with tuples creates a multiindex, yielding unexpected behaviour. #21340

Closed
@masasin

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 11, (10, 3)), columns=['num1', 'num2', 'num3'])
df['category_tuple'] = [(0, 1), (0, 1), (0, 1), (2, 3), (2, 3), (2, 3), (2, 3), (0, 4), (5,), (6,)]
df['category_string'] = list('aaabbbbcde')
df = df[['category_tuple', 'category_string', 'num1', 'num2', 'num3']]

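# Grouping by the string column works:
# df.groupby('category_string').apply(lambda x: x.sort_values(by='num1'))
# Grouping by the tuple column raises the ValueError described below:
df.groupby('category_tuple').apply(lambda x: x.sort_values(by='num1'))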

Problem description

Grouping by category_tuple raises this error: ValueError: Names should be list-like for a MultiIndex

This behaviour changed somewhere between 0.19.0 and 0.19.1. It can be sidestepped by casting the tuples to strings before doing the groupby, but the behaviour is definitely unexpected.
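
For reference, the string-cast workaround mentioned above could look roughly like the sketch below. The fixed seed, the `category_key` name, and the use of `map(str)` for the cast are assumptions made for illustration; they are not part of the original report.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed (an addition) so the run is repeatable
df = pd.DataFrame(rng.randint(0, 11, (10, 3)), columns=['num1', 'num2', 'num3'])
df['category_tuple'] = [(0, 1), (0, 1), (0, 1), (2, 3), (2, 3), (2, 3), (2, 3), (0, 4), (5,), (6,)]

# Hypothetical helper key: each tuple rendered as a string, e.g. (0, 1) -> "(0, 1)".
# Grouping by strings avoids any attempt to interpret the tuples as MultiIndex levels.
category_key = df['category_tuple'].map(str).rename('category_key')

# Same operation as in the report, grouped by the string key instead of the tuples.
result = df.groupby(category_key).apply(lambda x: x.sort_values(by='num1'))
print(result)
```

The original tuples stay in the category_tuple column; only the group labels in the index become strings.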

Grouping by category_string works, but grouping by category_tuple does not. Other operations break in the same way (e.g., describe()). I'd expect both to behave the same way, and not to create a MultiIndex unless I request one.
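
To check whether a given pandas installation is affected for describe() as well, something like the following sketch may help. The `try_describe` helper and the fixed seed are hypothetical additions, and the outcome for category_tuple depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed (an addition) so the run is repeatable
df = pd.DataFrame(rng.randint(0, 11, (10, 3)), columns=['num1', 'num2', 'num3'])
df['category_tuple'] = [(0, 1), (0, 1), (0, 1), (2, 3), (2, 3), (2, 3), (2, 3), (0, 4), (5,), (6,)]
df['category_string'] = list('aaabbbbcde')


def try_describe(frame, key):
    """Hypothetical helper: return 'ok', or the exception text if describe() fails."""
    try:
        frame.groupby(key)[['num1', 'num2', 'num3']].describe()
    except Exception as exc:  # broad on purpose: the failure mode may differ by version
        return f"{type(exc).__name__}: {exc}"
    return 'ok'


# Compare the two grouping columns side by side on the installed version.
outcome = {key: try_describe(df, key) for key in ('category_string', 'category_tuple')}
print(pd.__version__, outcome)
```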

Expected Output

                  category_tuple category_string  num1  num2  num3
category_tuple                                                   
(0, 1)          0         (0, 1)               a     0     0    10
                2         (0, 1)               a     0     7     1
                1         (0, 1)               a     6     7     2
(2, 3)          4         (2, 3)               b     1     3     0
                6         (2, 3)               b     3     6     8
                3         (2, 3)               b     6    10     3
                5         (2, 3)               b     6     0     1
(0, 4)          7         (0, 4)               c     7     7     5
(5,)            8           (5,)               d     4     5     7
(6,)            9           (6,)               e     9     1     5

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None

pandas: 0.22.0
pytest: None
pip: 10.0.0
setuptools: 28.8.0
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels

Bug
Groupby
MultiIndex
Needs Tests: Unit test(s) needed to prevent regressions
Nested Data: Data where the values are collections (lists, sets, dicts, objects, etc.).
good first issue
