Closed
Description
In a DataFrame without a categorical column, the following comparisons work as expected:
import pandas as pd
df = pd.DataFrame({'id': [None] * 3, 'spam': [None] * 3})
df['spam'] == 'spam'
df.groupby('id').first()['spam'] == 'spam'
However, when a column is Categorical, a groupby on the all-null column behaves unexpectedly:
df['spam'] = df['spam'].astype('category')
df['spam'] == 'spam' # works as expected
df.groupby('id').first()['spam'] == 'spam' # raises TypeError: invalid type comparison
It looks like the groupby converts the column dtypes in the result to float64:
>>> df.groupby('id').first().info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 1 columns):
spam 0 non-null float64
dtypes: float64(1)
memory usage: 0.0 bytes
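A possible workaround, as a minimal sketch rather than a fix for the groupby behavior itself: remember the categorical dtype before grouping and cast the result column back if the groupby changed it. The names `original_dtype`, `result`, and `mask` are only illustrative, and this assumes that casting the empty result back to the original dtype is acceptable for your use case.

```python
import pandas as pd

# Reproduce the setup from the report: all-null columns, 'spam' categorical
df = pd.DataFrame({'id': [None] * 3, 'spam': [None] * 3})
df['spam'] = df['spam'].astype('category')

# Remember the dtype before grouping (illustrative variable name)
original_dtype = df['spam'].dtype

result = df.groupby('id').first()

# Cast back only if the groupby changed the dtype (e.g. to float64)
if result['spam'].dtype != original_dtype:
    result['spam'] = result['spam'].astype(original_dtype)

# The comparison is now made against a categorical column again
mask = result['spam'] == 'spam'
```

On pandas versions where the groupby already preserves the categorical dtype, the `if` branch is simply skipped.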