DataFrame Groupby Apply Returning Unexpected Result? #12652

Closed
@jaradc

Why doesn't applying the addColumn function to a DataFrameGroupBy object (columns grouped by their second level, along axis=1) return the expected output below?

Code Sample, a copy-pasteable example if possible

import pandas as pd

df = pd.DataFrame({
  ('C', 'julian'): [0.258185, 0.52591899999999991, 0.17491099999999998, 0.94083099999999997, 0.70193700000000003, 0.189361, 0.90364500000000003, 0.56848199999999993, 0.44919799999999993, 0.39054899999999998],
  ('B', 'geoffrey'): [0.27970200000000001, 0.54119799999999996, 0.36436499999999999, 0.73802900000000005, 0.85527000000000009, 0.37441099999999999, 0.87378500000000003, 0.062140000000000001, 0.008404, 0.171458], 
  ('A', 'julian'): [0.20408199999999999, 0.263235, 0.196243, 0.52878500000000006, 0.85351699999999997, 0.23979699999999998, 0.98073399999999999, 0.59194199999999997, 0.81816699999999998, 0.21742399999999998], 
  ('B', 'julian'): [0.79572500000000002, 0.507324, 0.65340799999999999, 0.65416000000000007, 0.803087, 0.94354400000000005, 0.85009699999999988, 0.56629799999999997, 0.28205000000000002, 0.47193299999999999], 
  ('A', 'geoffrey'): [0.073676000000000005, 0.096733, 0.028613, 0.831569, 0.26324999999999998, 0.069519000000000011, 0.29041400000000001, 0.088387000000000007, 0.061483000000000003, 0.42760200000000004], 
  ('C', 'geoffrey'): [0.25811200000000001, 0.75765199999999999, 0.92473300000000003, 0.29447299999999998, 0.26469799999999999, 0.84664699999999993, 0.11871300000000001, 0.87206399999999995, 0.65837000000000001, 0.23442600000000002]},
  columns=pd.MultiIndex.from_tuples([('A','julian'),('A','geoffrey'), ('B','julian'),('B','geoffrey'), ('C','julian'),('C','geoffrey')]))

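def addColumn(grouped):
    # Each group holds the columns that share one second-level label, e.g. 'julian'.
    name = grouped.columns[0][1]
    # Append a ('sum', name) column holding the row-wise sum of the group's columns.
    grouped['sum', name] = grouped.sum(axis=1)
    return grouped

# Group the columns by their second level (the name) and apply addColumn to each group.
result = df.groupby(level=1, axis=1).apply(addColumn)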

Expected Output

      A         B         C       sum         A         B         C       sum  
   geoffrey  geoffrey  geoffrey  geoffrey    julian    julian    julian    julian  
0  0.073676  0.279702  0.258112  0.611491  0.204082  0.795725  0.258185  1.257992  
1  0.096733  0.541198  0.757652  1.395584  0.263235  0.507324  0.525919  1.296478  
2  0.028613  0.364365  0.924733  1.317710  0.196243  0.653408  0.174911  1.024561  
3  0.831569  0.738029  0.294473  1.864071  0.528785  0.654160  0.940831  2.123776  
4  0.263250  0.855270  0.264698  1.383219  0.853517  0.803087  0.701937  2.358542  
5  0.069519  0.374411  0.846647  1.290578  0.239797  0.943544  0.189361  1.372703  
6  0.290414  0.873785  0.118713  1.282912  0.980734  0.850097  0.903645  2.734476  
7  0.088387  0.062140  0.872064  1.022590  0.591942  0.566298  0.568482  1.726721  
8  0.061483  0.008404  0.658370  0.728257  0.818167  0.282050  0.449198  1.549415  
9  0.427602  0.171458  0.234426  0.833486  0.217424  0.471933  0.390549  1.079906  
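
For comparison, here is a minimal sketch of one way to build the layout shown above without calling apply on a column-wise groupby. It is not a diagnosis of the behaviour reported here, only a possible workaround. The helper name add_sum_columns and the small stand-in frame are assumptions made for illustration; the values are not the data from the code sample.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same column layout as the issue's df.
# The values are illustrative only, not the original data.
columns = pd.MultiIndex.from_tuples(
    [("A", "julian"), ("A", "geoffrey"),
     ("B", "julian"), ("B", "geoffrey"),
     ("C", "julian"), ("C", "geoffrey")]
)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=columns)


def add_sum_columns(frame):
    """Hypothetical helper: append a ('sum', name) column per second-level label."""
    pieces = []
    for name in sorted(frame.columns.get_level_values(1).unique()):
        # Select every column whose second level equals `name`, keeping both levels.
        block = frame.xs(name, axis=1, level=1, drop_level=False).copy()
        # Row-wise sum of that block, stored under ('sum', name).
        block[("sum", name)] = block.sum(axis=1)
        pieces.append(block)
    # Put the per-name blocks side by side, in sorted name order.
    return pd.concat(pieces, axis=1)


result = add_sum_columns(df)
print(result)
```

Looping over the second-level labels keeps each block's columns together, so the sum column lands directly after the columns it summarises, as in the expected output.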

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: None
setuptools: 15.0
Cython: 0.22
numpy: 1.9.3
scipy: 0.16.0c1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.4.2
bs4: 4.1.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8
