
DOC: DataFrame.replace() regex contract inconsistent with usage #55570

Closed
@schivmeister

Description


Pandas version checks

  • I have checked that the issue still exists on the latest version of the docs on main.

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.replace.html

Documentation problem

From https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.replace.html I quote:

regex : bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.
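For reference, the second form described in this passage passes the patterns through regex and leaves to_replace at its default of None. The sketch below is modelled on the regex-only examples on the same documentation page; the DataFrame is hypothetical sample data, not taken from the report.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"A": ["bat", "foo", "bait"], "B": ["abc", "bar", "xyz"]})

# Patterns are supplied through `regex`; `to_replace` stays at its default (None).
# A dict maps each pattern to its own replacement string.
via_regex_dict = df.replace(regex={r"^ba.$": "new", "foo": "xyz"})

# A list of patterns shares the single `value`.
via_regex_list = df.replace(regex=[r"^ba.$", "foo"], value="new")

print(via_regex_dict)
print(via_regex_list)
```

In both calls "bait" is left alone because `^ba.$` only matches three-character strings starting with "ba".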

I want to bring your attention to the first condition:

If this is True then to_replace must be a string.

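However, in every pandas version I have tried, this does not hold: to_replace need not be a string when regex=True is passed, and a nested dict passed as to_replace works as expected. The description of to_replace on the same page even states that regular expressions can be nested inside a nested dict, which contradicts the "must be a string" condition.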
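A minimal sketch of non-string to_replace values combined with regex=True, using hypothetical sample data (not from the report). It covers a list of patterns, a flat dict, and a nested dict; none of these calls raises, despite the "must be a string" wording.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"A": ["bat", "foo", "bait"], "B": ["abc", "bar", "xyz"]})

# to_replace as a list of patterns, each replaced with the same value
as_list = df.replace(to_replace=[r"^ba.$", r"^fo+$"], value="new", regex=True)

# to_replace as a flat dict {pattern: replacement}, applied to every column
as_flat_dict = df.replace(to_replace={r"^ba.$": "new", r"^fo+$": "xyz"}, regex=True)

# to_replace as a nested dict {column: {pattern: replacement}}, column "A" only
as_nested_dict = df.replace(to_replace={"A": {r"^ba.$": "new"}}, regex=True)

print(as_list, as_flat_dict, as_nested_dict, sep="\n\n")
```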

MWE:

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['apple', 'banana', 'cherry', 'date', 'elderberry']
}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Regex replacements expressed as a nested dict: {column: {pattern: replacement}}
nested_dict = {
    'C': {
        r'^[a-d]': 'fruit_',     # Replace a leading 'a'-'d' character with 'fruit_'
        r'[^aeiou]+$': 'berry',  # Replace a trailing run of non-vowel characters with 'berry'
    }
}

# Apply regex replacements using the nested dictionary
df.replace(to_replace=nested_dict, regex=True, inplace=True)

# Display the modified DataFrame with regex replacements
print("\nModified DataFrame:")
print(df)
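Because regex replacement substitutes only the matched part of each string (re.sub semantics), the MWE rewrites prefixes and suffixes rather than whole words. Below is a self-checking variant of the MWE that returns a new frame instead of using inplace=True. It assumes the patterns for a column are applied one after another in dict order; under that assumption, column C becomes fruit_pple, fruit_anana, fruit_heberry, fruit_ate, elderbeberry, while A and B are untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": ["apple", "banana", "cherry", "date", "elderberry"],
})

nested_dict = {
    "C": {
        r"^[a-d]": "fruit_",     # rewrites only a leading a-d character
        r"[^aeiou]+$": "berry",  # rewrites only the trailing run of non-vowels
    }
}

# Same call as the MWE, but the result is returned rather than applied in place
result = df.replace(to_replace=nested_dict, regex=True)
print(result["C"].tolist())
```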

Suggested fix for documentation

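Either update the documentation so that regex=True no longer requires to_replace to be a string (a list or a dict, including a nested dict, is accepted in practice), or change the API behaviour to enforce the documented contract.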

Metadata


Assignees

No one assigned

    Labels

    Docs, Needs Triage (issue that has not been reviewed by a pandas team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
