Description
Dear developers,
I find it confusing how DataFrame.replace decides whether to interpret the to_replace values as literal strings or as regular expressions.
When you pass a flat dictionary of from and to values, they are interpreted as literal strings and work as expected. If you pass a nested dictionary with the same values, they are interpreted as regexes, even if regex=False is passed explicitly. This causes a problem if you want to replace values that contain special characters.
Let's see an example:
@buddha[J:T26]|19> df = pd.DataFrame({'a' : ['()', 'something else']})
@buddha[J:T26]|20> df
<20>
a
0 ()
1 something else
[2 rows x 1 columns]
@buddha[J:T26]|21> df.replace({'()' : 'parantheses'})
<21>
a
0 parantheses
1 something else
[2 rows x 1 columns]
@buddha[J:T26]|22> df.replace({'a' : {'()' : 'parantheses'}})
<22>
a
0 parantheses(parantheses)parantheses
1 paranthesessparanthesesoparanthesesmparanthese...
[2 rows x 1 columns]
As you can see, in the first case the () was replaced as expected. In the second case, even though the only thing I changed was to specify the column in which the replacement should occur, the parentheses were treated as a regex. The same happens if I pass regex=False.
My current workaround is to make sure that the from/to values are valid regexes.
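The output in the second case is what you would expect if '()' were handed to the regex engine unescaped: as a pattern it is an empty capture group, which matches the empty string at every position. A minimal sketch with the standard library's re.sub (not pandas internals, which I have not inspected) reproduces the first row:

```python
import re

# '()' as a regex is an empty group: it matches the empty string
# before every character and once more at the end of the input.
result = re.sub('()', 'parantheses', '()')
print(result)  # parantheses(parantheses)parantheses
```

The same pattern applied to 'something else' inserts the replacement between every character, which is consistent with the truncated second row above.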
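One way to apply this workaround is sketched below, assuming the nested dictionary is going to be treated as regexes anyway: escape the keys with re.escape and pass regex=True explicitly. The names literal_map and escaped_map are only illustrative, and I have not checked this on every pandas version.

```python
import re
import pandas as pd

df = pd.DataFrame({'a': ['()', 'something else']})

# Illustrative literal mapping that should apply to column 'a' only.
literal_map = {'()': 'parentheses'}

# Escape each key so that regex metacharacters are matched literally.
escaped_map = {re.escape(old): new for old, new in literal_map.items()}

# Ask for regex matching explicitly, now that the keys are safe patterns.
result = df.replace({'a': escaped_map}, regex=True)
print(result)
```

Note that a regex replacement substitutes matching substrings, so if only whole-cell matches are wanted, the escaped keys could additionally be anchored with ^ and $.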
Cheers,
Adam
P.S.: Pandas is awesome, thanks a lot for all the effort!
@Buddha[J:T26]|24> pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 23 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.13.1
Cython: None
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.2.0
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
sqlalchemy: 0.8.4
lxml: 3.3.1
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None