Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.replace.html
Documentation problem
From https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.replace.html I quote:
regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.
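For comparison, the alternative form from the second sentence of the quoted docstring passes the patterns through regex and leaves to_replace as None. The following is a minimal sketch on a made-up three-row column, assuming a recent pandas release. It uses only the to_replace, value and regex parameters named in the docstring.

```python
import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame({"C": ["apple", "banana", "cherry"]})

# Documented alternative: to_replace stays None and the pattern goes in regex
out_single = df.replace(regex=r"^ba", value="BA")

# regex may also be a list of patterns; each match is replaced with the same value
out_list = df.replace(regex=[r"^ap", r"^ch"], value="X")

print(out_single["C"].tolist())  # ['apple', 'BAnana', 'cherry']
print(out_list["C"].tolist())    # ['Xple', 'banana', 'Xerry']
```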
I want to draw your attention to the first condition: "If this is True then to_replace must be a string."
However, in all the pandas versions I've tried, this doesn't appear to be the case: the to_replace argument need not be a string when regex=True is supplied, and passing a nested dict as to_replace appears to work as expected.
MWE:
```python
import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['apple', 'banana', 'cherry', 'date', 'elderberry'],
}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Test regex replacements using a nested dictionary: {column: {pattern: replacement}}
nested_dict = {
    'C': {
        r'^[a-d]': 'fruit_',     # Replace a leading 'a'-'d' character with 'fruit_'
        r'[^aeiou]+$': 'berry',  # Replace a trailing run of non-vowel characters with 'berry'
    }
}

# Apply the regex replacements; note that to_replace is a dict, not a string
df.replace(to_replace=nested_dict, regex=True, inplace=True)

# Display the modified DataFrame with regex replacements
print("\nModified DataFrame:")
print(df)
```
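The nested dict is not the only non-string to_replace that is accepted together with regex=True. The following is a small sketch on a made-up column, assuming a recent pandas release. It shows a list of patterns paired element-wise with a list of values, and a flat pattern-to-replacement dict with value omitted.

```python
import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame({"C": ["apple", "banana", "cherry"]})

# to_replace is a list of patterns, paired element-wise with a list of values
out_list = df.replace(to_replace=[r"^ap", r"^ba"], value=["AP", "BA"], regex=True)

# to_replace is a flat {pattern: replacement} dict, applied to every column
out_dict = df.replace(to_replace={r"rry$": "RRY"}, regex=True)

print(out_list["C"].tolist())  # ['APple', 'BAnana', 'cherry']
print(out_dict["C"].tolist())  # ['apple', 'banana', 'cheRRY']
```

Both calls run without raising, which further suggests that the "must be a string" condition in the docstring is stricter than the actual behaviour.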
Suggested fix for documentation
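Either update the documentation to drop (or qualify) the statement that to_replace must be a string when regex=True, or change the API behaviour so that it matches the documentation.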