Should str.replace accept a compiled expression? #15446

Closed
@gwerbin

The current solution is to call str.replace(<compiled_re>.pattern, flags=<compiled_re>.flags), which is relatively ugly and verbose in my opinion.

Here's a contrived example of removing stopwords and normalizing whitespace afterwards:

import pandas as pd
import re

some_names = pd.Series(["three weddings and a funeral", "the big lebowski", "florence and the machine"])

stopwords = ["the", "a", "and"]
# re.IGNORECASE is an argument to re.compile, not to str.format
stopwords_re = re.compile(r"(\s+)?\b({})\b(\s+)?".format("|".join(stopwords)), re.IGNORECASE)
whitespace_re = re.compile(r"\s+")

# desired code:
# some_names.str.replace(stopwords_re, " ").str.strip().str.replace(whitespace_re, " ")

# actual code:
some_names.\
    str.replace(stopwords_re.pattern, " ", flags=stopwords_re.flags).\
    str.strip().str.replace(whitespace_re.pattern, " ", flags=whitespace_re.flags)
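In the meantime, the unpacking of `.pattern` and `.flags` can be hidden behind a small wrapper. This is only a sketch of the workaround, not a proposed pandas API: `replace_kwargs` is a hypothetical name, and it assumes a pandas version whose `str.replace` takes the `pat`, `repl`, `flags` and `regex` keywords.

```python
import re


def replace_kwargs(pat, repl):
    """Hypothetical helper: build keyword arguments for Series.str.replace.

    Accepts either a plain string pattern or a compiled regular expression.
    """
    if isinstance(pat, re.Pattern):
        # Unpack the compiled pattern so its flags travel with it.
        return {"pat": pat.pattern, "repl": repl, "flags": pat.flags, "regex": True}
    # Plain strings are passed through and treated as regular expressions.
    return {"pat": pat, "repl": repl, "regex": True}
```

With this, the chain above could be written as `some_names.str.replace(**replace_kwargs(stopwords_re, " "))`, which is shorter but still an indirection that native support for compiled patterns would make unnecessary.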

Why do I think this is better?

  1. It's nice to compile commonly used regular expressions once and have them carry their flags around with them (this also allows the use of "verbose" regular expressions)
  2. It's not that compiled regular expressions should quack like strings... it's that in this case we're making strings quack like compiled regular expressions, but at the same time not letting those compiled regular expressions quack their own quack.
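To illustrate point 1, here is a minimal sketch using only the standard `re` module. The name `stopwords_verbose_re` and the sample title are made up for illustration; the pattern is a simplified variant of the stopword expression above. Because the pattern is written in verbose mode, its `.pattern` string is only meaningful together with its `.flags`, which is why the two have to be passed around as a pair.

```python
import re

# A "verbose" pattern: whitespace and comments inside it are ignored
# only because re.VERBOSE is set at compile time.
stopwords_verbose_re = re.compile(
    r"""
    \s*                # optional leading whitespace
    \b(the|a|and)\b    # one of the stopwords, as a whole word
    \s*                # optional trailing whitespace
    """,
    re.VERBOSE | re.IGNORECASE,
)

# The compiled object applies its own flags when used directly.
cleaned = stopwords_verbose_re.sub(" ", "The Big Lebowski").strip()
print(cleaned)
```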

Is there a good reason not to implement this?
