Closed
Description
Why doesn't applying the addColumn function to a DataFrameGroupBy object return the expected output below?
Code Sample, a copy-pasteable example if possible
import pandas as pd
df = pd.DataFrame({
('C', 'julian'): [0.258185, 0.52591899999999991, 0.17491099999999998, 0.94083099999999997, 0.70193700000000003, 0.189361, 0.90364500000000003, 0.56848199999999993, 0.44919799999999993, 0.39054899999999998],
('B', 'geoffrey'): [0.27970200000000001, 0.54119799999999996, 0.36436499999999999, 0.73802900000000005, 0.85527000000000009, 0.37441099999999999, 0.87378500000000003, 0.062140000000000001, 0.008404, 0.171458],
('A', 'julian'): [0.20408199999999999, 0.263235, 0.196243, 0.52878500000000006, 0.85351699999999997, 0.23979699999999998, 0.98073399999999999, 0.59194199999999997, 0.81816699999999998, 0.21742399999999998],
('B', 'julian'): [0.79572500000000002, 0.507324, 0.65340799999999999, 0.65416000000000007, 0.803087, 0.94354400000000005, 0.85009699999999988, 0.56629799999999997, 0.28205000000000002, 0.47193299999999999],
('A', 'geoffrey'): [0.073676000000000005, 0.096733, 0.028613, 0.831569, 0.26324999999999998, 0.069519000000000011, 0.29041400000000001, 0.088387000000000007, 0.061483000000000003, 0.42760200000000004],
('C', 'geoffrey'): [0.25811200000000001, 0.75765199999999999, 0.92473300000000003, 0.29447299999999998, 0.26469799999999999, 0.84664699999999993, 0.11871300000000001, 0.87206399999999995, 0.65837000000000001, 0.23442600000000002]},
columns=pd.MultiIndex.from_tuples([('A','julian'),('A','geoffrey'), ('B','julian'),('B','geoffrey'), ('C','julian'),('C','geoffrey')]))
def addColumn(grouped):
    name = grouped.columns[0][1]
    grouped['sum', name] = grouped.sum(axis=1)
    #print(grouped)
    return grouped
result = df.groupby(level=1, axis=1).apply(addColumn)
Expected Output
          A         B         C       sum         A         B         C       sum
   geoffrey  geoffrey  geoffrey  geoffrey    julian    julian    julian    julian
0  0.073676  0.279702  0.258112  0.611491  0.204082  0.795725  0.258185  1.257992
1  0.096733  0.541198  0.757652  1.395584  0.263235  0.507324  0.525919  1.296478
2  0.028613  0.364365  0.924733  1.317710  0.196243  0.653408  0.174911  1.024561
3  0.831569  0.738029  0.294473  1.864071  0.528785  0.654160  0.940831  2.123776
4  0.263250  0.855270  0.264698  1.383219  0.853517  0.803087  0.701937  2.358542
5  0.069519  0.374411  0.846647  1.290578  0.239797  0.943544  0.189361  1.372703
6  0.290414  0.873785  0.118713  1.282912  0.980734  0.850097  0.903645  2.734476
7  0.088387  0.062140  0.872064  1.022590  0.591942  0.566298  0.568482  1.726721
8  0.061483  0.008404  0.658370  0.728257  0.818167  0.282050  0.449198  1.549415
9  0.427602  0.171458  0.234426  0.833486  0.217424  0.471933  0.390549  1.079906
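For reference, here is a minimal sketch of one way to build a frame with the layout above without calling apply on a column-wise groupby. It does not explain the behaviour reported here. It assumes the same two-level column layout as the code sample; the random values are only a stand-in for the real data, and the names `cols`, `pieces`, and `sub` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): same column layout as the report, random values.
cols = pd.MultiIndex.from_product([["A", "B", "C"], ["julian", "geoffrey"]])
df = pd.DataFrame(np.random.default_rng(0).random((10, 6)), columns=cols)

pieces = []
# Walk over the second column level by hand instead of groupby(level=1, axis=1).
for name in sorted(df.columns.get_level_values(1).unique()):
    # Keep only the columns whose second level equals `name`; copy so that
    # adding a column does not touch the original frame.
    sub = df.loc[:, df.columns.get_level_values(1) == name].copy()
    # Row-wise sum of that person's columns, stored under ('sum', name).
    sub[("sum", name)] = sub.sum(axis=1)
    pieces.append(sub)

# Put the per-name blocks side by side: geoffrey's columns, then julian's.
result = pd.concat(pieces, axis=1)
print(result.columns.tolist())
```

Each block is built from an explicit copy, so nothing here depends on what apply passes to the function or on how it recombines the pieces.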
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.17.1
nose: None
pip: None
setuptools: 15.0
Cython: 0.22
numpy: 1.9.3
scipy: 0.16.0c1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.4.2
bs4: 4.1.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8