Description
A small, complete example of the issue
import pandas as pd
import numpy as np
dd = dict(a=np.arange(9), b=np.repeat(np.arange(3), 3))
df = pd.DataFrame(dd)
# Problematic line below
df.groupby('b', as_index=False).apply(np.std)
# The line below has the same problem
df.groupby('b', as_index=False).std()
# When as_index=False is not passed in, it works as expected.
# The following two (commented-out) lines also work exactly as expected
# (an example of the group-by/apply working for some operations):
# df.groupby('b', as_index=False).apply(np.mean)
# df.groupby('b', as_index=False).mean()
Expected Output
          a    b
0  0.816497  0.0
1  0.816497  1.0
2  0.816497  2.0
Actual Output
          a    b
0  0.816497  0.0
1  0.816497  0.0
2  0.816497  0.0
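For reference, the value 0.816497 in column a appears to be the population standard deviation (ddof=0, NumPy's default) of each group: every group holds three consecutive integers, so the result is sqrt(2/3). The short check below is only a sketch that rebuilds the groups by hand with NumPy. The names `groups`, `pop_std` and `sample_std` are made up for illustration. Note that pandas' own `.std()` defaults to ddof=1, which would give 1.0 for these groups instead.

```python
import numpy as np

a = np.arange(9)
# b = np.repeat(np.arange(3), 3) puts rows 0-2, 3-5 and 6-8 into groups 0, 1, 2,
# so reshaping a to 3x3 gives one group per row.
groups = a.reshape(3, 3)

pop_std = np.std(groups, axis=1)             # ddof=0, as np.std does by default
sample_std = np.std(groups, axis=1, ddof=1)  # ddof=1, the pandas .std() default

print(pop_std)     # population standard deviation per group
print(sample_std)  # sample standard deviation per group
```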
When as_index=False is passed into groupby, finding the standard deviation doesn't work as expected. When as_index=True, everything works as expected.
Finding the mean works as expected in both cases.
I have also been able to reproduce the problem on Linux with the same version of pandas.
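Since the default as_index=True path is reported above to behave correctly, one possible workaround is to group with the default and then move the group keys back into a column with `reset_index()`. This is a minimal sketch, not a fix for the underlying bug. It assumes ddof=0 is wanted (to match `np.std`), and the variable name `result` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(a=np.arange(9), b=np.repeat(np.arange(3), 3)))

# Group with the default as_index=True so that 'b' becomes the index,
# compute the per-group standard deviation of 'a', then turn 'b' back
# into an ordinary column.
# ddof=0 mirrors np.std; GroupBy.std would otherwise use ddof=1.
result = df.groupby('b')['a'].std(ddof=0).reset_index()

print(result)
```

The columns come out in the order b, a; `result[['a', 'b']]` reorders them if the layout of the expected output above is needed.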
Output of pd.show_versions()
> INSTALLED VERSIONS
> ------------------
> commit: None
> python: 3.5.2.final.0
> python-bits: 64
> OS: Darwin
> OS-release: 15.6.0
> machine: x86_64
> processor: i386
> byteorder: little
> LC_ALL: en_US
> LANG: en_US.UTF8
>
> pandas: 0.18.0
> nose: None
> pip: 8.1.2
> setuptools: 28.5.0
> Cython: None
> numpy: 1.11.0
> scipy: 0.17.0
> statsmodels: None
> xarray: None
> IPython: 5.1.0
> sphinx: 1.3.5
> patsy: None
> dateutil: 2.5.3
> pytz: 2016.3
> blosc: None
> bottleneck: None
> tables: None
> numexpr: None
> matplotlib: 1.5.1
> openpyxl: None
> xlrd: 0.9.4
> xlwt: None
> xlsxwriter: None
> lxml: None
> bs4: 4.4.1
> html5lib: None
> httplib2: None
> apiclient: None
> sqlalchemy: None
> pymysql: None
> psycopg2: None
> jinja2: 2.8
> boto: None