BUG: Aggregating over an integer on an empty DataFrame removes the index name #7574

Closed
@chrisaycock

Suppose I have this simple DataFrame:

In [91]: df = pd.DataFrame({'ref':[1, 2, 3], 'symbol':['GOOGL', 'IBM', 'MSFT'], 'price':[585.93, 180.72, 42.03]}, columns=['ref', 'symbol', 'price'])

In [92]: df
Out[92]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03

In [93]: df.dtypes
Out[93]:
ref         int64
symbol     object
price     float64
dtype: object

If I aggregate over an object and then reset the index, everything comes back as expected:

In [95]: df.groupby('symbol').first().reset_index()
Out[95]:
  symbol  ref   price
0  GOOGL    1  585.93
1    IBM    2  180.72
2   MSFT    3   42.03

The same holds when aggregating over an integer:

In [96]: df.groupby('ref').first().reset_index()
Out[96]:
   ref symbol   price
0    1  GOOGL  585.93
1    2    IBM  180.72
2    3   MSFT   42.03

Now let's say I have an empty DataFrame. Aggregating over an object produces what I expect:

In [97]: df.query('price > 1000').groupby('symbol').first().reset_index()
Out[97]:
Empty DataFrame
Columns: [symbol, ref, price]
Index: []

But aggregating over an integer gives me 'index' as the column name instead of the expected 'ref':

In [98]: df.query('price > 1000').groupby('ref').first().reset_index()
Out[98]:
Empty DataFrame
Columns: [index, symbol, price]
Index: []
          ^^^^^
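
A minimal sketch for checking where the name goes missing: it inspects the index name of the aggregated frame before reset_index() is called. The helper name grouped_index_name is hypothetical, and the empty frame is built with a boolean mask instead of query(), which should select the same (zero) rows. No expected output is given for the empty cases, since that depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ref": [1, 2, 3],
        "symbol": ["GOOGL", "IBM", "MSFT"],
        "price": [585.93, 180.72, 42.03],
    },
    columns=["ref", "symbol", "price"],
)


def grouped_index_name(frame, key):
    """Hypothetical helper: index name of frame.groupby(key).first()."""
    return frame.groupby(key).first().index.name


# Boolean-mask equivalent of df.query('price > 1000'); selects no rows.
empty = df[df["price"] > 1000]

print(grouped_index_name(df, "ref"))        # non-empty frame, integer key
print(grouped_index_name(empty, "symbol"))  # empty frame, object key
print(grouped_index_name(empty, "ref"))     # empty frame, integer key (the reported case)
```

If the last call prints None while the others print the key, the name is being dropped by the aggregation itself rather than by reset_index().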

Interestingly, setting the index in a non-aggregating function does the right thing:

In [100]: df.query('price > 1000').set_index('ref').reset_index()
Out[100]:
Empty DataFrame
Columns: [ref, symbol, price]
Index: []

So aggregating over an integer column of an empty DataFrame drops the index name. I found this in pandas 0.14.0.
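
Until the behaviour is fixed, one possible workaround is to restore the index name explicitly before resetting the index. This is only a sketch under that assumption, not the library's fix; first_per_group is a hypothetical helper name, and the empty frame is again built with a boolean mask rather than query().

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ref": [1, 2, 3],
        "symbol": ["GOOGL", "IBM", "MSFT"],
        "price": [585.93, 180.72, 42.03],
    },
    columns=["ref", "symbol", "price"],
)


def first_per_group(frame, key):
    """Hypothetical helper: groupby(key).first() with the key restored as a column."""
    agg = frame.groupby(key).first()
    # Re-assert the index name in case the aggregation dropped it
    # (a no-op when the name is already correct).
    agg.index.name = key
    return agg.reset_index()


empty = df[df["price"] > 1000]  # selects no rows

print(list(first_per_group(df, "ref").columns))
print(list(first_per_group(empty, "ref").columns))
```

Setting the name unconditionally keeps the helper harmless on versions where the name is already preserved.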