ENH: Allow regex matching in fullmatch mode #32806

Closed
@frreiss

Problem description

Series.str provides methods for every regular expression matching mode in the re package except re.fullmatch(). fullmatch succeeds only when the pattern matches the entire input string, whereas match also succeeds when the pattern matches only a prefix of the string.
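To make the difference concrete, here is a minimal sketch using only the standard re module. The variable names are illustrative and not part of any pandas API.

```python
import re

pattern = re.compile("foo")

# match() anchors only at the start, so a match on a prefix is enough.
prefix_hit = pattern.match("foobar") is not None

# fullmatch() requires the pattern to cover the whole string.
full_hit = pattern.fullmatch("foobar") is not None
exact_hit = pattern.fullmatch("foo") is not None

print(prefix_hit, full_hit, exact_hit)
```

The string "foobar" passes match but fails fullmatch, while "foo" passes both.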

One can work around the lack of fullmatch by round-tripping through NumPy arrays and using np.vectorize, e.g.

>>> import re
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["foo", "bar", "foobar"])
>>> my_regex = "foo"
>>> compiled_regex = re.compile(my_regex)
>>> # Apply re.fullmatch element-wise; True where the whole string matches.
>>> regex_f = np.vectorize(lambda x: compiled_regex.fullmatch(x) is not None)
>>> matches_array = regex_f(s.values)
>>> # Reuse the original index so the result stays aligned with s.
>>> matches_series = pd.Series(matches_array, index=s.index)
>>> matches_series
0     True
1    False
2    False
dtype: bool

but it would be more convenient for users if fullmatch were built in.
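Another possible workaround, sketched here with the standard library only, is to anchor the pattern at the end of the string so that prefix-style matching behaves like a full match. The helper name anchored is hypothetical. The sketch assumes the pattern carries no leading inline flags such as (?i), which cannot be wrapped in a group this way. The same anchored pattern could presumably be passed to the existing Series.str.match, though that is not verified here.

```python
import re


def anchored(pattern: str) -> str:
    # Hypothetical helper: wrap the pattern in a non-capturing group and
    # require the absolute end of the string. \Z is used instead of $
    # because $ would also match just before a trailing newline.
    return rf"(?:{pattern})\Z"


values = ["foo", "bar", "foobar", "foo\n"]

# re.match with the anchored pattern versus re.fullmatch with the raw pattern.
via_match = [re.match(anchored("foo"), v) is not None for v in values]
via_fullmatch = [re.fullmatch("foo", v) is not None for v in values]

print(via_match)
print(via_fullmatch)
```

Both lists agree on these inputs, including the string with a trailing newline, but a built-in fullmatch avoids having to rewrite the pattern at all.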

re.fullmatch was added to the re package in Python 3.4. I think this method was missing from previous versions of pandas because the older Python versions they supported don't have re.fullmatch. As of pandas 1.0, all supported versions of Python have fullmatch.

I have a pull request ready that adds this functionality. After my changes, the Series.str namespace gets a new method, fullmatch, that evaluates re.fullmatch on each element of the series. For example:

>>> s = pd.Series(["foo", "bar", "foobar"])
>>> s.str.fullmatch("foo")
0     True
1    False
2    False
dtype: bool

[Edit: Simplified the workaround]

Labels: API - Consistency, Strings