groupby w/ only NULL-groups crashes since 0.23 #21624

Closed
@crepererum

Description

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import pandas.testing as pdt


# works:
df1 = pd.DataFrame({
    'g': [None, 'a'],
    'x': 1,
})
actual1 = df1.groupby('g')['x'].transform('sum')
expected1 = pd.Series([np.nan, 1.], name='x')

pdt.assert_series_equal(
    actual1,
    expected1,
)


# crashes:
df2 = pd.DataFrame({
    'g': [None],
    'x': 1,
})
# the next line crashes with "ValueError: Length of passed values is 1, index implies 0"
actual2 = df2.groupby('g')['x'].transform('sum')
expected2 = pd.Series([np.nan], name='x')

pdt.assert_series_equal(
    actual2,
    expected2,
)

Problem description

groupby drops rows whose key is NULL in any of the group columns. For those rows, the result of transform is NULL (NaN for floats, NaT for time, None for objects). The question is what should happen when every row has a NULL key, i.e. there are only "NULL groups":

  • before pandas 0.23: an object column with None objects is created
  • pandas 0.23: crash (ValueError: Length of passed values is 1, index implies 0)
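
To make the "NULL groups are ignored" point concrete, here is a small sketch that only counts groups. The frame names `df_mixed` and `df_null` are made up for this illustration. It relies on the default groupby behaviour of dropping NULL keys, so the all-NULL frame ends up with no groups at all, which is presumably where the "index implies 0" in the error message comes from.

```python
import pandas as pd

# Illustrative frames mirroring the two cases from the code sample.
df_mixed = pd.DataFrame({'g': [None, 'a'], 'x': 1})
df_null = pd.DataFrame({'g': [None], 'x': 1})

# Rows with a NULL key are not assigned to any group.
n_groups_mixed = df_mixed.groupby('g').ngroups
n_groups_null = df_null.groupby('g').ngroups

# Group sizes only cover the non-NULL keys.
sizes_mixed = df_mixed.groupby('g').size()

print(n_groups_mixed, n_groups_null)
print(sizes_mixed)
```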

Expected Output

An all-NULL column whose NULL value matches the input data type (NaN for floats, NaT for time, None for objects).
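
Until this is fixed, a possible user-side workaround is to handle the NULL keys explicitly. The sketch below is only that: `null_safe_transform_sum` is a hypothetical helper, not a pandas API and not the proposed fix. It assumes a single key column and a numeric value column, and it always returns float64 so that NaN can represent the NULL rows.

```python
import numpy as np
import pandas as pd
import pandas.testing as pdt


def null_safe_transform_sum(df, key, value):
    """Hypothetical helper: group-wise sum broadcast back to every row,
    with NaN for rows whose key is NULL (assumes a numeric value column)."""
    mask = df[key].notna()
    if not mask.any():
        # Only NULL keys: skip groupby entirely and return an all-NaN column.
        return pd.Series(np.nan, index=df.index, name=value, dtype='float64')
    # Transform only the rows with a valid key ...
    sums = df.loc[mask].groupby(key)[value].transform('sum')
    # ... then restore the original index; the missing rows become NaN.
    return sums.astype('float64').reindex(df.index)


df_mixed = pd.DataFrame({'g': [None, 'a'], 'x': 1})
df_null = pd.DataFrame({'g': [None], 'x': 1})

result_mixed = null_safe_transform_sum(df_mixed, 'g', 'x')
result_null = null_safe_transform_sum(df_null, 'g', 'x')

pdt.assert_series_equal(result_mixed, pd.Series([np.nan, 1.], name='x'))
pdt.assert_series_equal(result_null, pd.Series([np.nan], name='x'))
```

Filtering first keeps the helper independent of how a given pandas version treats NULL keys inside transform. Other dtypes (NaT, None) would need their own fill value.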

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.49-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.1
pytest: 3.4.0
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.9.0
xarray: None
IPython: 5.7.0
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.0
xlrd: None
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.8.0
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: 0.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None
