DOC: update the pandas.Index.drop_duplicates and pandas.Series.drop_duplicates docstring #20114
Changes from 10 commits
f7e098a
17e3846
dd9d3a4
13406a1
bcfceaf
2876b26
863b961
b41e1d1
6c8de22
9f8e438
0d604ac
d92c124
0453aea
c300ea6
8763f33
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4017,8 +4017,52 @@ def unique(self, level=None): | |
result = super(Index, self).unique() | ||
return self._shallow_copy(result) | ||
|
||
@Appender(base._shared_docs['drop_duplicates'] % _index_doc_kwargs) | ||
def drop_duplicates(self, keep='first'): | ||
""" | ||
Return Index with duplicate values removed. | ||
|
||
The drop_duplicates method can remove occurrences or whole sets | ||
of duplicated entries in a pandas.Index object. | ||
|
||
Parameters | ||
---------- | ||
keep : {'first', 'last', ``False``}, default 'first' | ||
- 'first' : Drop duplicates except for the first occurrence. | ||
- 'last' : Drop duplicates except for the last occurrence. | ||
- ``False`` : Drop all duplicates. | ||
|
||
Returns | ||
------- | ||
deduplicated : Index | ||
|
||
See Also | ||
-------- | ||
pandas.Series.drop_duplicates : equivalent method on pandas.Series | ||
Review comment: I think just
Reply: updated in d92c124 |
||
|
||
Examples | ||
-------- | ||
Generate a pandas.Index with duplicate values. | ||
|
||
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']) | ||
|
||
With the 'keep' parameter, the selection behaviour of duplicated values | ||
Review comment: Change first sentence to
Reply: sentence updated in 0d604ac |
||
can be changed. The value 'first' keeps the first occurrence for each | ||
set of duplicated entries. The default value of keep is 'first'. | ||
|
||
>>> idx.drop_duplicates(keep='first') | ||
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object') | ||
|
||
The value 'last' keeps the last occurrence for each set of duplicated | ||
entries. | ||
|
||
>>> idx.drop_duplicates(keep='last') | ||
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object') | ||
|
||
The value ``False`` discards all sets of duplicated entries. | ||
|
||
>>> idx.drop_duplicates(keep=False) | ||
Index(['cow', 'beetle', 'hippo'], dtype='object') | ||
""" | ||
return super(Index, self).drop_duplicates(keep=keep) | ||
|
||
@Appender(base._shared_docs['duplicated'] % _index_doc_kwargs) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1316,8 +1316,77 @@ def unique(self): | |
|
||
return result | ||
|
||
@Appender(base._shared_docs['drop_duplicates'] % _shared_doc_kwargs) | ||
def drop_duplicates(self, keep='first', inplace=False): | ||
""" | ||
Return Series with duplicate values removed. | ||
|
||
The drop_duplicates method can remove occurrences or whole sets | ||
of duplicated entries in a pandas.Series object. | ||
|
||
Parameters | ||
---------- | ||
keep : {'first', 'last', ``False``}, default 'first' | ||
- 'first' : Drop duplicates except for the first occurrence. | ||
- 'last' : Drop duplicates except for the last occurrence. | ||
- ``False`` : Drop all duplicates. | ||
inplace : bool, default ``False`` | ||
If ``True``, performs the operation in place and returns None. | ||
|
||
Returns | ||
------- | ||
deduplicated : Series | ||
|
||
See Also | ||
-------- | ||
pandas.Index.drop_duplicates : equivalent method on pandas.Index | ||
|
||
Review comment: can also DataFrame.drop_duplicates. and Series.duplicated
Reply: can you edit DataFrame.drop_duplicates if needed to ensure the back-link to here (unless other PR)
Reply: let's keep that for another PR. But good idea to also add DataFrame.drop_duplicates and Series.duplicated
Reply: adjusted in c300ea6 |
||
Examples | ||
-------- | ||
Generate a Series with duplicated entries. | ||
|
||
>>> s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'], | ||
... name='animal') | ||
>>> s | ||
0 lama | ||
1 cow | ||
2 lama | ||
3 beetle | ||
4 lama | ||
5 hippo | ||
Name: animal, dtype: object | ||
|
||
With the 'keep' parameter, the selection behaviour of duplicated values | ||
can be changed. The value 'first' keeps the first occurrence for each | ||
set of duplicated entries. The default value of keep is 'first'. | ||
|
||
>>> s.drop_duplicates() | ||
0 lama | ||
1 cow | ||
3 beetle | ||
5 hippo | ||
Name: animal, dtype: object | ||
|
||
The value 'last' for parameter 'keep' keeps the last occurrence for | ||
each set of duplicated entries. | ||
|
||
>>> s.drop_duplicates(keep='last') | ||
1 cow | ||
3 beetle | ||
4 lama | ||
5 hippo | ||
Name: animal, dtype: object | ||
|
||
The value ``False`` for parameter 'keep' discards all sets of | ||
duplicated entries. Setting the value of 'inplace' to ``True`` performs | ||
the operation inplace and returns ``None``. | ||
|
||
>>> s.drop_duplicates(keep=False, inplace=True) | ||
>>> s | ||
1 cow | ||
3 beetle | ||
5 hippo | ||
Name: animal, dtype: object | ||
""" | ||
return super(Series, self).drop_duplicates(keep=keep, inplace=inplace) | ||
|
||
@Appender(base._shared_docs['duplicated'] % _shared_doc_kwargs) | ||
|
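For reviewers who want to check the Series docstring examples by hand, here is a minimal sketch, assuming a recent pandas is installed. It relies only on `Series.drop_duplicates` and `Series.duplicated`; the variable names (`deduped`, `masked`, `result`) are illustrative and not part of the PR.

```python
import pandas as pd

# Same data as in the Series docstring example.
s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
              name='animal')

# For each value of `keep`, drop_duplicates should match boolean
# indexing with the inverted `duplicated` mask.
for keep in ('first', 'last', False):
    deduped = s.drop_duplicates(keep=keep)
    masked = s[~s.duplicated(keep=keep)]
    assert deduped.equals(masked)

# With inplace=True the Series itself is modified and None is returned.
result = s.drop_duplicates(keep=False, inplace=True)
print(result)
print(s)
```

The equivalence with `duplicated` is also why cross-linking `Series.duplicated` in the See Also section seems useful.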
Review comment: I'm not sure what "whole sets" means here. Personally I think this extended summary can be removed. The first line is descriptive enough.
Reply: Ok, thanks Tom, I'll adjust accordingly!
Reply: see 0453aea
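One possible reading of "whole sets" is that `keep=False` removes every member of each group of equal values, not just the extra occurrences. The pure-Python sketch below illustrates the three `keep` modes on a plain list under that reading. The helper `drop_duplicates_list` is hypothetical and is not how pandas implements the method.

```python
from collections import Counter


def drop_duplicates_list(values, keep='first'):
    """Hypothetical helper mimicking the documented `keep` semantics."""
    if keep is False:
        # Discard every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == 'first':
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == 'last':
        # Keep the first occurrence in reversed order, then restore order.
        return drop_duplicates_list(values[::-1], keep='first')[::-1]
    raise ValueError("keep must be 'first', 'last' or False")


animals = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']
print(drop_duplicates_list(animals, keep='first'))
print(drop_duplicates_list(animals, keep='last'))
print(drop_duplicates_list(animals, keep=False))
```

With `keep=False`, 'lama' disappears entirely, which matches the `Index(['cow', 'beetle', 'hippo'], dtype='object')` output in the docstring example.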