Refactor DataFrame.replace to dispatch on types #5541

Closed
@nspies

Description

I'm having trouble figuring out how DataFrame.replace() is supposed to work. I'm not sure if this is a bug or a documentation issue.

In [1]: import pandas

In [2]: df = pandas.DataFrame({"col1":range(5), "col2":[0.5]*3+[1.0]*2})

In [3]: df
Out[3]: 
   col1  col2
0     0   0.5
1     1   0.5
2     2   0.5
3     3   1.0
4     4   1.0

In [4]: df.replace(1.0, "a")
Out[4]: 
  col1 col2
0    0  0.5
1    a  0.5
2    2  0.5
3    3    a
4    4    a

In [5]: df.replace(1.0, "a").replace(0.5, "b")
Out[5]: 
  col1 col2
0    0    b
1    a    b
2    2    b
3    3    a
4    4    a

So far, so good: everything makes sense. But I would have expected the following to produce the same result as above:

In [6]: df.replace({1.0:"a", 0.5:"b"})
Out[6]: 
  col1 col2
0    b    b
1    a    a
2    2    b
3    3    a
4    4    b

As you can see, col2 now comes back as alternating "b" and "a" instead of being replaced according to each cell's value. From a quick browse of the source code, it seems that the dictionary-replacement option should give the same outcome as the following list form (which gives what I would have expected):

In [15]: df.replace([1.0, 0.5], ["a", "b"])
Out[15]: 
  col1 col2
0    0    b
1    a    b
2    2    b
3    3    a
4    4    a

I'm not sure what the to_replace=dict option is supposed to do, but (at least in pandas v0.12.0) it doesn't do what I would have expected.
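
To make "what I would have expected" concrete, here is a minimal plain-Python sketch of the semantics I had in mind: every cell is looked up in the mapping on its own, and cells that are not keys are left alone. This is only a model of the expected result, not how pandas implements replace(); the helper name replace_by_mapping is made up for this example, and the frame is represented as a plain dict of column lists.

```python
# Plain-Python model of the expected semantics (NOT pandas internals).
# `replace_by_mapping` is a hypothetical helper named for this sketch only.

def replace_by_mapping(columns, mapping):
    """Return a copy of `columns` with each cell looked up in `mapping`.

    Cells that are not keys of `mapping` are returned unchanged.
    """
    return {
        name: [mapping.get(value, value) for value in values]
        for name, values in columns.items()
    }


# Same data as `df` above, written as a dict of column lists.
data = {"col1": list(range(5)), "col2": [0.5] * 3 + [1.0] * 2}

# In Python, 1 == 1.0 and both hash alike, so the integer 1 in col1
# matches the key 1.0, just as it does in In [4] above.
expected = replace_by_mapping(data, {1.0: "a", 0.5: "b"})

print(expected["col1"])
print(expected["col2"])
```

Under this model the dict form, the list form in In [15], and the chained calls in In [5] would all agree, because the replacement depends only on each cell's value and never on its position.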

Whether this is a bug or not, the df.replace() method needs better documentation. It's not enough to include a disclaimer that "This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works."
