DataFrame.groupby().std() fails on filtered DataFrame #16174

Closed
@edhalter

Code Sample, a copy-pastable example if possible

from pandas import DataFrame

# Three rows sharing one groupby key; the filter below drops the first row.
dicts = [
    {'filter_col': False, 'groupby_col': True, 'bool_col': True, 'float_col': 10.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 20.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 30.5},
]
df = DataFrame(dicts)
df_filter = df[df['filter_col'] == True]
dfgb = df_filter.groupby('groupby_col')
dfgb.std()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/site-packages/pandas/core/groupby.py", line 1055, in std
    return np.sqrt(self.var(ddof=ddof))
AttributeError: 'bool' object has no attribute 'sqrt'

Problem description

Required elements for the error to appear are:

  • groupby() is applied to a filtered DataFrame, not an original DataFrame
  • std(), not another aggregate function (e.g. mean()), is called on the DataFrameGroupBy object
  • the DataFrame contains a column of type bool
  • there are at least 2 rows w/ the same value of the .groupby() column (here, 'groupby_col')

In the more complicated real-world data where I ran into the error, I also saw an exception complaining about type float:

AttributeError: 'float' object has no attribute 'sqrt'

However, even in that case, deleting the bool column would resolve the issue.
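Both messages look like what NumPy produces when np.sqrt is handed an object-dtype array: it then tries to call a sqrt method on each Python element. The sketch below only illustrates that NumPy behaviour; that the result of var() ends up as object dtype in this case is an assumption, not something confirmed above. The array contents are made up for illustration, and the exception type depends on the NumPy version (older releases raise AttributeError, newer ones TypeError).

```python
import numpy as np

# Hypothetical stand-in for a mixed bool/float result stored as object dtype.
values = np.array([True, 0.5], dtype=object)

try:
    np.sqrt(values)
    outcome = "no error"
except (AttributeError, TypeError) as exc:
    # Python bool and float objects have no .sqrt() method.
    outcome = type(exc).__name__
    print(outcome, exc)

# Casting to a numeric dtype first lets the ufunc run normally.
as_float = np.sqrt(values.astype(float))
print(as_float)
```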

Presumably I'll be able to work around the issue by calling .std() on individual columns of the DataFrameGroupBy object, but it seems like pandas should be able to handle this case w/o choking.
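A minimal sketch of that per-column workaround, using the same data as the code sample. Selecting 'float_col' before aggregating means only a float column reaches std(); I have not verified this on 0.19.1, so treat it as an assumption rather than a confirmed fix.

```python
import pandas as pd

rows = [
    {'filter_col': False, 'groupby_col': True, 'bool_col': True, 'float_col': 10.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 20.5},
    {'filter_col': True, 'groupby_col': True, 'bool_col': True, 'float_col': 30.5},
]
df = pd.DataFrame(rows)
df_filter = df[df['filter_col']]
dfgb = df_filter.groupby('groupby_col')

# Aggregate a single numeric column instead of the whole frame.
float_std = dfgb['float_col'].std()
print(float_std)
```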

Expected Output

             bool_col  filter_col  float_col
groupby_col                                 
True              0.0         0.0    7.07107
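As a sanity check on these numbers, the same sample standard deviations (ddof=1, the pandas default for std()) can be computed with the standard library alone. The two rows that survive the filter have float_col values 20.5 and 30.5, and both bool columns are True in each; the variable names are just for illustration.

```python
import statistics

# Values from the two rows where filter_col is True, with bools cast to float.
float_col = [20.5, 30.5]
bool_col = [float(True), float(True)]
filter_col = [float(True), float(True)]

# statistics.stdev is the sample standard deviation (n - 1 in the denominator).
expected = {
    'bool_col': statistics.stdev(bool_col),
    'filter_col': statistics.stdev(filter_col),
    'float_col': statistics.stdev(float_col),
}
print(expected)
```

For float_col this is sqrt(((20.5 - 25.5)**2 + (30.5 - 25.5)**2) / 1) = sqrt(50), which rounds to 7.07107.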

Output of pd.show_versions()

INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.16-gentoo
machine: x86_64
processor: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 7.1.2
setuptools: 30.4.0
Cython: 0.25.1
numpy: 1.10.4
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None
