Description
I'm having trouble figuring out how DataFrame.replace() is supposed to work. I'm not sure if this is a bug or a documentation issue.
In [1]: import pandas
In [2]: df = pandas.DataFrame({"col1":range(5), "col2":[0.5]*3+[1.0]*2})
In [3]: df
Out[3]:
   col1  col2
0     0   0.5
1     1   0.5
2     2   0.5
3     3   1.0
4     4   1.0
In [4]: df.replace(1.0, "a")
Out[4]:
  col1 col2
0    0  0.5
1    a  0.5
2    2  0.5
3    3    a
4    4    a
In [5]: df.replace(1.0, "a").replace(0.5, "b")
Out[5]:
  col1 col2
0    0    b
1    a    b
2    2    b
3    3    a
4    4    a
So far, so good. But I would have expected the following call to produce the same result as the chained calls above:
In [6]: df.replace({1.0:"a", 0.5:"b"})
Out[6]:
  col1 col2
0    b    b
1    a    a
2    2    b
3    3    a
4    4    b
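As you can see, col2 comes back as alternating "b" and "a" regardless of the original values, and row 0 of col1 is replaced with "b" even though its value (0) is not a key in the dictionary. From a quick browse of the source code, it seems that the dictionary-replacement option should result in the same outcome as the following (which gives what I would have expected):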
In [15]: df.replace([1.0, 0.5], ["a", "b"])
Out[15]:
  col1 col2
a    0    b
b    a    b
c    2    b
d    3    a
e    4    a
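To make the comparison easier to reproduce, here is a minimal script that runs the three call forms side by side and reports whether they agree. The helper `as_lists` is a name I made up for this sketch. It only prints the comparison, since the outcome for the dict form may depend on the pandas version installed.

```python
import pandas as pd

# Same frame as in the session above.
df = pd.DataFrame({"col1": list(range(5)), "col2": [0.5] * 3 + [1.0] * 2})

# The three call forms that I would expect to be equivalent.
chained = df.replace(1.0, "a").replace(0.5, "b")
dict_form = df.replace({1.0: "a", 0.5: "b"})
list_form = df.replace([1.0, 0.5], ["a", "b"])


def as_lists(frame):
    """Hypothetical helper: plain dict of column name -> list of values."""
    return {name: frame[name].tolist() for name in frame.columns}


print("pandas", pd.__version__)
print("dict form matches chained:", as_lists(dict_form) == as_lists(chained))
print("list form matches chained:", as_lists(list_form) == as_lists(chained))
```

On the version reported here, the first comparison is the one that comes out different.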
I'm not sure what passing a dict as to_replace is supposed to do, but (at least in pandas v0.12.0) it doesn't do what I would have expected.
Whether this is a bug or not, the df.replace() method needs better documentation. It's not enough to include a disclaimer that "This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works."
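For reference, the behavior I would expect from the dict form can be written as a plain element-wise lookup. This is only a sketch of my expectation in pure Python, not a description of how pandas implements it; `replace_values` is a hypothetical name.

```python
def replace_values(column, mapping):
    """Swap each element that equals a key in mapping for the mapped value."""
    # dict.get falls back to the element itself when it is not a key.
    return [mapping.get(value, value) for value in column]


col1 = list(range(5))
col2 = [0.5] * 3 + [1.0] * 2
mapping = {1.0: "a", 0.5: "b"}

expected = {
    "col1": replace_values(col1, mapping),
    "col2": replace_values(col2, mapping),
}
print(expected)
# {'col1': [0, 'a', 2, 3, 4], 'col2': ['b', 'b', 'b', 'a', 'a']}
```

This matches the output of the chained calls in Out[5], including the integer 1 in col1 being replaced, because 1 == 1.0 in Python.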