DOC: Improved the docstring of Series.str.findall #19982
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -898,23 +898,92 @@ def str_join(arr, sep): | |
|
||
def str_findall(arr, pat, flags=0): | ||
""" | ||
Find all occurrences of pattern or regular expression in the | ||
Series/Index. Equivalent to :func:`re.findall`. | ||
Find all occurrences of pattern or regular expression in the Series/Index. | ||
|
||
Equivalent to applying :func:`re.findall` to all the elements in the | ||
Series/Index. | ||
|
||
Parameters | ||
---------- | ||
pat : string | ||
Pattern or regular expression | ||
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
Pattern or regular expression. | ||
flags : int | ||
re module flags, e.g. re.IGNORECASE (default is 0, which means | ||
Review comment: can you put
Review comment: Note that the docstring guide shows only single backtick quotes at the moment.
Review comment: Then we should update that. In principle (according to numpydoc format spec): single backtick quotes for referring to keyword arguments (e.g. if you would refer to
Review comment: Done! |
||
no flags). | ||
|
||
Returns | ||
------- | ||
matches : Series/Index of lists | ||
Series/Index of lists of strings | ||
All non-overlapping matches of pattern or regular expression in each | ||
string of this Series/Index. | ||
|
||
See Also | ||
-------- | ||
extractall : returns DataFrame with one column per capture group | ||
extractall : For each subject string in the Series, extract groups | ||
from all matches of regular expression pattern. | ||
Review comment: From this explanation, it is not clear to me what the difference is with this method. |
||
count : Count occurrences of pattern in each string of the Series/Index. | ||
re.findall: Return all non-overlapping matches of pattern in string, | ||
as a list of strings. | ||
|
||
Examples | ||
-------- | ||
|
||
>>> s = pd.Series(['Lion', 'Monkey', 'Rabbit']) | ||
|
||
The search for the pattern `Monkey` returns one match: | ||
|
||
>>> s.str.findall('Monkey') | ||
0 [] | ||
1 [Monkey] | ||
2 [] | ||
dtype: object | ||
|
||
On the other hand, the search for the pattern 'MONKEY' doesn't return any | ||
match: | ||
|
||
>>> s.str.findall('MONKEY') | ||
0 [] | ||
1 [] | ||
2 [] | ||
dtype: object | ||
|
||
Flags can be added to the regular expression. For instance, to find the | ||
pattern `MONKEY` ignoring the case: | ||
|
||
>>> import re | ||
>>> s.str.findall('MONKEY', flags=re.IGNORECASE) | ||
0 [] | ||
1 [Monkey] | ||
2 [] | ||
dtype: object | ||
|
||
When the pattern matches more than one string in the Series, all matches | ||
are returned: | ||
|
||
>>> s.str.findall('on') | ||
0 [on] | ||
1 [on] | ||
2 [] | ||
dtype: object | ||
|
||
Regular expressions are supported too. For instance, the search for all the | ||
strings ending with the word `on` is shown next: | ||
|
||
>>> s.str.findall('on$') | ||
Review comment: Can you put a small sentence before the example explaining what it is doing or why the result is as it is? (for the others as well) |
||
0 [on] | ||
1 [] | ||
2 [] | ||
dtype: object | ||
|
||
If the pattern is found more than once in the same string, then a list of | ||
strings is returned: | ||
|
||
>>> s.str.findall('b') | ||
0 [] | ||
1 [] | ||
2 [b, b] | ||
dtype: object | ||
|
||
""" | ||
regex = re.compile(pat, flags=flags) | ||
return _na_map(regex.findall, arr) | ||
|
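The last two lines of the diff compile the pattern once and map `regex.findall` over the array while skipping missing values. The sketch below illustrates that idea in plain Python. `findall_sketch` is a hypothetical stand-in written for this example, not the pandas `_na_map` helper, and representing a missing value as `float('nan')` is an assumption made here.

```python
import re


def findall_sketch(values, pat, flags=0):
    """Hypothetical illustration only; not the pandas implementation."""
    # Compile once so the same regex object is reused for every element.
    regex = re.compile(pat, flags=flags)
    # Apply findall to strings; pass anything else through as NaN (assumed
    # missing-value marker for this sketch).
    return [
        regex.findall(v) if isinstance(v, str) else float('nan')
        for v in values
    ]


result = findall_sketch(['Lion', 'Monkey', 'Rabbit', None], 'on')
ignore_case = findall_sketch(['Lion', 'Monkey'], 'MONKEY', flags=re.IGNORECASE)
```

Here `result` holds one list per string element, and the `None` entry is passed through as NaN rather than raising an error.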
Review comment: Can you make this `int, default 0`? (We changed our minds about this, the docstring guide has been updated, sorry about that.)
Review comment: Note that the docstring guide indicates `int (default 0)`.
Review comment: Yes, but as I said, we decided to change that (and I think it should now be reflected in the online guide, although it might take another rebuild to reflect it).
Review comment: Great, change pushed.
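On the reviewer's question about how `extractall` differs from `findall`, a small side-by-side comparison may help. This is a minimal sketch that reuses the Series from the docstring examples. The capture-group pattern `'(b)'` is chosen here for illustration and does not appear in the pull request.

```python
import pandas as pd

s = pd.Series(['Lion', 'Monkey', 'Rabbit'])

# findall keeps the shape of the Series: one list of matches per element,
# with an empty list where nothing matched.
found = s.str.findall('b')

# extractall requires at least one capture group and returns a DataFrame
# with one row per match, indexed by (original label, match number).
# Elements without a match produce no rows.
extracted = s.str.extractall('(b)')

print(found.tolist())
print(extracted.index.tolist())
print(extracted[0].tolist())
```

In short, `findall` returns lists aligned with the original index, while `extractall` returns a long-format table with one column per capture group.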