ENH: note interpretation of pattern for str.contains as regex #44811

Closed
@nealmcb

Is your feature request related to a problem?

I didn't understand this error message:

>>> contests[contests.contest_name.str.contains("Proposition 120 (STATUTORY)")]
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

Only after fiddling around and searching did it finally dawn on me that the groups it was talking about weren't pandas groupby groups, but regular expression groups. I was using pandas groupby elsewhere in the project, but had no intention of using regular expressions at all. Since Python's __contains__ functionality (the in operator) doesn't use regular expressions, and people who aren't thinking in those terms may have no idea what "match groups" are, I suggest a more helpful message.

The fix in my case was easy: either escaping the parentheses with backslashes so they weren't interpreted as grouping metacharacters, or passing regex=False, as explained in "How do I select by partial string from a pandas DataFrame".
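
For anyone hitting the same warning, here is a minimal sketch of the two fixes. The small contests DataFrame is a made-up stand-in for my real data, and the variable names are only for illustration. It also shows a side effect I hadn't noticed at first: when the pattern is read as a regular expression, the parentheses are not matched literally, so the row I actually wanted is not selected.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the real `contests` DataFrame.
contests = pd.DataFrame(
    {
        "contest_name": [
            "Proposition 120 (STATUTORY)",
            "Proposition 120 STATUTORY",
            "Mayor",
        ]
    }
)
pattern = "Proposition 120 (STATUTORY)"

# Default behaviour (regex=True): the parentheses form a match group, so
# pandas emits the UserWarning, and the literal "(" and ")" are not matched.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_regex = contests.contest_name.str.contains(pattern)

# Fix 1: treat the pattern as a plain substring.
literal = contests.contest_name.str.contains(pattern, regex=False)

# Fix 2: keep regex matching but escape the parentheses.
escaped = contests.contest_name.str.contains(r"Proposition 120 \(STATUTORY\)")

print(as_regex.tolist())  # [False, True, False]
print(literal.tolist())   # [True, False, False]
print(escaped.tolist())   # [True, False, False]
```

Either fix selects only the row containing the literal parentheses and avoids the warning.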

But I'd like others to figure out the issue more quickly than I did.

Describe the solution you'd like

I suggest e.g. changing the message to:

UserWarning: This pattern is interpreted as a regular expression and has match groups. To actually get the groups, use str.extract.

API breaking implications

This suggestion has no impact on the Pandas API.

Additional context

Other error messages related to regular expressions may also benefit from more context and clarity.
