Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
import pandas.testing as pdt
# works:
df1 = pd.DataFrame({
    'g': [None, 'a'],
    'x': 1,
})
actual1 = df1.groupby('g')['x'].transform('sum')
expected1 = pd.Series([np.nan, 1.], name='x')
pdt.assert_series_equal(
    actual1,
    expected1,
)
# crashes:
df2 = pd.DataFrame({
    'g': [None],
    'x': 1,
})
# the next line crashes with "ValueError: Length of passed values is 1, index implies 0"
actual2 = df2.groupby('g')['x'].transform('sum')
expected2 = pd.Series([np.nan], name='x')
pdt.assert_series_equal(
    actual2,
    expected2,
)
Problem description
groupby ignores groups that contain NULL elements in any of the group columns. In that case, the results of transform are NULL (NaN for floats, NaT for time, None for objects). The question is what happens if there are only "NULL groups":
- before pandas 0.23: an object column with None objects is created
- pandas 0.23: crash (ValueError: Length of passed values is 1, index implies 0)
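The behaviour described above can be inspected with a small sketch. It assumes only the public groupby API used in the code sample; the helper name probe_all_null is made up for illustration, and its result depends on the installed pandas version, so no outcome is claimed for it here.

```python
import pandas as pd

df = pd.DataFrame({"g": [None, "a"], "x": [1, 1]})

# Rows whose key is NULL are left out of the groups entirely ...
group_keys = df.groupby("g").size().index.tolist()

# ... yet transform still returns one value per input row, with a
# NULL placeholder for the rows that belong to no group.
transformed = df.groupby("g")["x"].transform("sum")
null_mask = transformed.isna().tolist()


def probe_all_null():
    """Hypothetical helper: run the all-NULL case and report the outcome.

    Returns the transformed Series if pandas handles the case, or the
    ValueError instance if it raises one (as reported for pandas 0.23).
    """
    only_null = pd.DataFrame({"g": [None], "x": [1]})
    try:
        return only_null.groupby("g")["x"].transform("sum")
    except ValueError as exc:
        return exc


outcome = probe_all_null()
print(group_keys, null_mask, type(outcome).__name__)
```

Returning the exception instead of re-raising it makes it easy to run the same probe under several pandas versions and compare the results.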
Expected Output
A NULL column whose dtype matches the input data type.
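As a rough illustration of what such NULL columns look like, the following sketch builds one single-row column for each NULL flavour named in the problem description. The variable names are illustrative only, and the sketch does not claim which dtype transform should pick for a given input.

```python
import numpy as np
import pandas as pd

# One NULL column per flavour mentioned in the problem description.
null_float = pd.Series([np.nan], dtype="float64")        # NaN for floats
null_time = pd.Series([pd.NaT], dtype="datetime64[ns]")  # NaT for time
null_object = pd.Series([None], dtype=object)            # None for objects

for column in (null_float, null_time, null_object):
    print(column.dtype, column.isna().all())
```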
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.49-moby
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.1
pytest: 3.4.0
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.9.0
xarray: None
IPython: 5.7.0
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.0
xlrd: None
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.8.0
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: 0.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None