ENH: should boolean indexing preserve input dtypes where possible? #2794

Closed
@jreback

Should pandas preserve the input dtype when doing
boolean indexing, if possible?

It's a pretty limited case: every value in a column has to be non-null,
and they all have to be integers (just cast as floats).

This is actually a little tricky to implement, as it has to be done column by column (and it's possible that blocks have to be split).

The question here is whether this would be what the user expects. (Currently all dtypes
on boolean output operations are cast to float64/object.)

In [24]: df = pd.DataFrame(dict(
  a = pd.Series([1]*3,dtype='int32'), 
  b = pd.Series([1]*3,dtype='float32')),
index=range(3))

In [25]: df.ix[2,1] = 0

In [26]: df
Out[26]: 
   a  b
0  1  1
1  1  1
2  1  0

In [27]: df.dtypes
Out[27]: 
a      int32
b    float32
Dtype: object

In [28]: df[df>0]
Out[28]: 
   a   b
0  1   1
1  1   1
2  1 NaN

##### preserve column a as the original int input dtype if it is possible, which in this case it is #####
In [29]: df[df>0].dtypes
Out[29]: 
a    float64
b    float32
Dtype: object

### possible output ###
Out[29]: 
a    int32
b    float32
Dtype: object
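
For illustration, the column-by-column check described above could be sketched as a post-processing step on the masked result. This is only a sketch, not how pandas implements it internally: `restore_int_dtypes` is a hypothetical helper name, and it assumes the result has the same columns as the input. It casts a column back only when the original dtype was an integer type, the masked column has no NaN (or other non-finite) values, and the cast is lossless.

```python
import numpy as np
import pandas as pd


def restore_int_dtypes(result, original):
    """Hypothetical helper: cast columns of `result` back to the integer
    dtype they had in `original`, but only when the cast loses nothing."""
    out = result.copy()
    for col in out.columns:
        orig_dtype = original[col].dtype
        # Only columns that started out as integers are candidates.
        if not np.issubdtype(orig_dtype, np.integer):
            continue
        values = out[col]
        # NaN (or inf) cannot be stored in a NumPy integer column.
        if not np.isfinite(values.to_numpy(dtype="float64")).all():
            continue
        # Cast back only if every value survives the round trip unchanged.
        if (values == values.astype(orig_dtype)).all():
            out[col] = values.astype(orig_dtype)
    return out


df = pd.DataFrame(
    {
        "a": pd.Series([1] * 3, dtype="int32"),
        "b": pd.Series([1] * 3, dtype="float32"),
    },
    index=range(3),
)
df.iloc[2, 1] = 0

masked = df[df > 0]                        # column b gets a NaN in row 2
restored = restore_int_dtypes(masked, df)  # column a is int32 in the result
print(restored.dtypes)
```

Doing the check per column keeps the float column `b` untouched, since it has to hold the NaN anyway. Inside pandas the same idea would operate on blocks rather than columns, which is where the block-splitting concern comes from.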
