Ambiguity in DataFrame.replace regex handling

Dear developers,

I find it confusing how DataFrame.replace decides whether to interpret the to_replace values as plain strings or as regular expressions.

When you pass a flat dictionary of from/to values, they are interpreted as plain string literals and work as expected. If you pass a nested dictionary with the same values, they are interpreted as regexes, even if regex=False is passed explicitly. This causes a problem if you want to replace values that contain special characters.

Let's see an example:

``` python

@buddha[J:T26]|19> df = pd.DataFrame({'a' : ['()', 'something else']})
@buddha[J:T26]|20> df
              <20>
                a
0              ()
1  something else

[2 rows x 1 columns]
@buddha[J:T26]|21> df.replace({'()' : 'parantheses'})
              <21>
                a
0     parantheses
1  something else

[2 rows x 1 columns]
@buddha[J:T26]|22> df.replace({'a' : {'()' : 'parantheses'}})
              <22>
                                                   a
0                parantheses(parantheses)parantheses
1  paranthesessparanthesesoparanthesesmparanthese...

[2 rows x 1 columns]
```

As you can see, in the first case the () was replaced as expected. In the second case, even though the only change was to specify the column in which the replacement should occur, the () was treated as a regex. The same happens if I add regex=False to the call.
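To illustrate why the second output looks the way it does, here is a minimal sketch using only the standard `re` module (not pandas internals, which I have not inspected). As a regex, `()` is an empty group that matches the empty string at every position, so a regex substitution inserts the replacement between all characters. The replacement string `X` is just a placeholder for illustration.

``` python
import re

# As a regex, "()" is an empty capture group: it matches the empty string
# at every position, so the replacement is inserted around each character.
as_regex = re.sub("()", "X", "()")

# Escaping the pattern makes the parentheses literal characters again.
as_literal = re.sub(re.escape("()"), "X", "()")

print(as_regex)    # X(X)X
print(as_literal)  # X
```

This matches the shape of the result above, where the replacement text appears before, between, and after the two parentheses.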

The current workaround for me is to make sure that the from values are written as regexes, with any special characters escaped.
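A minimal sketch of that workaround, assuming that the nested-dictionary form is matched as a regex: escape the literal with `re.escape` and pass `regex=True` explicitly, so the behaviour does not depend on how `regex=False` is handled. The variable names are only for illustration.

``` python
import re

import pandas as pd

df = pd.DataFrame({"a": ["()", "something else"]})

# Escape the literal so that "(" and ")" lose their regex meaning,
# then ask for regex matching explicitly.
pattern = re.escape("()")
result = df.replace({"a": {pattern: "parentheses"}}, regex=True)

print(result)
```

With the escaped pattern, only the literal `()` in column `a` is replaced and the other row is left untouched.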

Cheers,
Adam

P.S.: Pandas is awesome, thanks a lot for all the effort!

```
@buddha[J:T26]|24> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 23 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: None
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.2.0
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
sqlalchemy: 0.8.4
lxml: 3.3.1
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
```

