Closed
Description
Should pandas preserve the input dtype when doing
boolean indexing, if possible?
It's a pretty limited case, in that all values in a column have to be non-null
and they all have to be integers (just cast as floats).
This is actually a little tricky to implement, as it has to be done column by column (and it's possible that blocks have to be split).
The question is whether users would find this surprising (currently, dtypes
in the output of boolean indexing operations are cast to float64/object).
In [24]: df = pd.DataFrame(dict(
a = pd.Series([1]*3,dtype='int32'),
b = pd.Series([1]*3,dtype='float32')),
index=range(3))
In [25]: df.ix[2,1] = 0
In [26]: df
Out[26]:
a b
0 1 1
1 1 1
2 1 0
In [27]: df.dtypes
Out[27]:
a int32
b float32
Dtype: object
In [28]: df[df>0]
Out[28]:
a b
0 1 1
1 1 1
2 1 NaN
##### column a could be preserved as its original int input dtype,
which is possible in this case #####
In [29]: df[df>0].dtypes
Out[29]:
a float64
b float32
Dtype: object
### possible output ###
Out[29]:
a int32
b float32
Dtype: object
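
For illustration, here is a minimal user-side sketch of the column-by-column check described above. It is not how pandas implements (or would implement) this internally. `restore_dtypes` is a hypothetical helper name, and the rule it applies (cast back only when the column has no nulls and the round trip leaves every value unchanged) is an assumption about what "if possible" should mean.

```python
import numpy as np
import pandas as pd


def restore_dtypes(result, original):
    """Hypothetical helper: cast each column of `result` back to its dtype
    in `original` when that can be done without losing information."""
    out = result.copy()
    for col in out.columns:
        target = original[col].dtype
        if out[col].dtype == target:
            continue  # nothing to restore
        # A column that gained NaN cannot go back to an integer dtype.
        if out[col].isna().any():
            continue
        candidate = out[col].astype(target)
        # Accept the cast only if converting back reproduces every value.
        if (candidate.astype(out[col].dtype) == out[col]).all():
            out[col] = candidate
    return out


df = pd.DataFrame(
    {
        "a": pd.Series([1, 1, 1], dtype="int32"),
        "b": pd.Series([1, 1, 0], dtype="float32"),
    }
)

masked = df[df > 0]                  # row 2 of column b becomes NaN
restored = restore_dtypes(masked, df)
print(restored.dtypes)
```

Column b keeps its NaN and its float32 dtype; column a is cast back to int32 only if the boolean indexing step upcast it in the first place.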