Problem description
Series.str contains methods for all the regular expression matching modes in the re package except for re.fullmatch(). fullmatch only returns matches that cover the entire input string, unlike match, which also returns matches that start at the beginning of the string but do not cover the complete string.
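To make the difference concrete, here is a minimal sketch using only the standard-library re module. The variable names are illustrative and not part of any proposed API.

```python
import re

pattern = re.compile("foo")

# match() only anchors at the start, so matching a prefix is enough.
prefix_hit = pattern.match("foobar") is not None

# fullmatch() requires the pattern to consume the entire string.
full_hit_on_longer = pattern.fullmatch("foobar") is not None
full_hit_on_exact = pattern.fullmatch("foo") is not None

print(prefix_hit, full_hit_on_longer, full_hit_on_exact)  # True False True
```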
One can work around the lack of fullmatch by round-tripping to/from numpy arrays and using np.vectorize, i.e.
>>> import re
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["foo", "bar", "foobar"])
>>> my_regex = "foo"
>>> compiled_regex = re.compile(my_regex)
>>> regex_f = np.vectorize(lambda text: compiled_regex.fullmatch(text) is not None)
>>> matches_array = regex_f(s.values)
>>> matches_series = pd.Series(matches_array)
>>> matches_series
0     True
1    False
2    False
dtype: bool
but it would be more convenient for users if fullmatch was built in.
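The np.vectorize workaround rebuilds the Series from a bare array, so a non-default index would be lost. As a hedged alternative sketch, Series.map can apply the compiled regex while staying inside pandas. The helper name fullmatch_series is hypothetical, and treating non-string values (such as NaN) as False is an assumption made here for simplicity, not necessarily what a built-in method would do.

```python
import re

import pandas as pd


def fullmatch_series(series, compiled):
    """Hypothetical helper: True where the whole string matches `compiled`.

    Assumption: non-string entries (e.g. NaN) are reported as False.
    """
    return series.map(
        lambda value: isinstance(value, str)
        and compiled.fullmatch(value) is not None
    )


s = pd.Series(["foo", "bar", "foobar"], index=["a", "b", "c"])
matches = fullmatch_series(s, re.compile("foo"))
print(matches)
```

Because map operates on the Series itself, the result keeps the labels a, b, c of the input.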
The fullmatch method was added to the re package in Python 3.4. I think the reason this method wasn't in previous versions of pandas is that older versions of Python don't have re.fullmatch. As of pandas 1.0, all the supported versions of Python have fullmatch.
I have a pull request ready that adds this functionality. After my changes, the Series.str namespace gets a new method fullmatch that evaluates re.fullmatch over the series. For example:
>>> s = pd.Series(["foo", "bar", "foobar"])
>>> s.str.fullmatch("foo")
0     True
1    False
2    False
dtype: bool
[Edit: Simplified the workaround]