Suppose I have this simple DataFrame:
In [91]: df = pd.DataFrame({'ref':[1, 2, 3], 'symbol':['GOOGL', 'IBM', 'MSFT'], 'price':[585.93, 180.72, 42.03]}, columns=['ref', 'symbol', 'price'])
In [92]: df
Out[92]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03
In [93]: df.dtypes
Out[93]:
ref         int64
symbol     object
price     float64
dtype: object
If I aggregate over an object column and then reset the index, everything comes back as expected:
In [95]: df.groupby('symbol').first().reset_index()
Out[95]:
  symbol  ref   price
0  GOOGL    1  585.93
1    IBM    2  180.72
2   MSFT    3   42.03
Similarly when aggregating over an integer column:
In [96]: df.groupby('ref').first().reset_index()
Out[96]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03
Now let's say I have an empty DataFrame. Aggregating over an object produces what I expect:
In [97]: df.query('price > 1000').groupby('symbol').first().reset_index()
Out[97]:
Empty DataFrame
Columns: [symbol, ref, price]
Index: []
But aggregating over an integer gives me index as the column name instead of the expected ref!
In [98]: df.query('price > 1000').groupby('ref').first().reset_index()
Out[98]:
Empty DataFrame
Columns: [index, symbol, price]
Index: []
^^^^^
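To check this on a given pandas version without reading the repr by eye, something like the following sketch could be used. It prints the index name right after the aggregation and the column names after reset_index() for both grouping keys. The helper name columns_after_roundtrip is made up for illustration, and I use a boolean mask instead of query() only to keep the script free of optional dependencies. The output depends on the pandas version, so none is shown.

```python
import pandas as pd

df = pd.DataFrame(
    {'ref': [1, 2, 3],
     'symbol': ['GOOGL', 'IBM', 'MSFT'],
     'price': [585.93, 180.72, 42.03]},
    columns=['ref', 'symbol', 'price'],
)

# Same empty selection as df.query('price > 1000')
empty = df[df['price'] > 1000]


def columns_after_roundtrip(frame, key):
    """Hypothetical helper: group by `key`, take first(), then reset the index.

    Returns the index name seen after the aggregation and the column
    names seen after reset_index().
    """
    grouped = frame.groupby(key).first()
    return grouped.index.name, list(grouped.reset_index().columns)


for key in ('symbol', 'ref'):
    index_name, cols = columns_after_roundtrip(empty, key)
    print(key, '-> index name:', index_name, '| columns:', cols)
```

If the index name printed for ref is None, the name was already lost in the aggregation step, before reset_index() ran.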
Interestingly, setting the index in a non-aggregating function does the right thing:
In [100]: df.query('price > 1000').set_index('ref').reset_index()
Out[100]:
Empty DataFrame
Columns: [ref, symbol, price]
Index: []
So aggregating over an integer column in an empty DataFrame removes the index name, and reset_index() then falls back to its default column name, index. This was discovered in pandas 0.14.0.
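In the meantime, a possible workaround is to put the index name back by hand before resetting the index. This is only a sketch: first_per_group is a hypothetical helper, not part of pandas, and it assumes a single grouping column whose name should become the index name.

```python
import pandas as pd


def first_per_group(frame, key):
    """Hypothetical helper: first row per group, with `key` returned as a column."""
    result = frame.groupby(key).first()
    # Re-apply the index name in case the aggregation dropped it
    # (harmless when the name is already set).
    result.index.name = key
    return result.reset_index()


df = pd.DataFrame(
    {'ref': [1, 2, 3],
     'symbol': ['GOOGL', 'IBM', 'MSFT'],
     'price': [585.93, 180.72, 42.03]},
    columns=['ref', 'symbol', 'price'],
)

# Same empty selection as df.query('price > 1000')
empty = df[df['price'] > 1000]

print(list(first_per_group(empty, 'ref').columns))
print(list(first_per_group(df, 'symbol').columns))
```

Since set_index('ref').reset_index() already behaves correctly on an empty frame, restoring the name should be enough for reset_index() to produce the ref column.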