ENH: Add "select all" option for duplicates #10592

Closed
@nickeubank

How would people feel about me adding a "select_all" option to drop_duplicates / duplicated that returns all duplicated rows, rather than all but the first or all but the last?

I often check whether I have duplicates on some primary key, and if I do, I then want to look at ALL rows that share that primary key. Right now, drop_duplicates / duplicated can't do that directly, so I need to use the following hack:

df[df.duplicated('key',take_last=True) | df.duplicated('key')]

It seems I'm not the only person with this issue:
http://stackoverflow.com/questions/26244309/how-to-analyze-all-duplicate-entries-in-this-pandas-dataframe
http://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
http://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas

Happy to write PR if people support.
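
For reference, here is a minimal sketch of the workaround above on a small made-up frame. The sample data and variable names are hypothetical. It assumes a recent pandas release, where `take_last=True` is spelled `keep='last'` and `duplicated` also accepts `keep=False`, which marks every row of a duplicated group in a single call.

```python
import pandas as pd

# Hypothetical sample data; 'key' stands in for the primary key column.
df = pd.DataFrame({
    "key": [1, 2, 2, 3, 3, 3, 4],
    "value": ["a", "b", "c", "d", "e", "f", "g"],
})

# The workaround: OR together the "all but last" and "all but first" masks.
# In recent pandas, take_last=True is written keep="last".
mask_union = df.duplicated("key", keep="last") | df.duplicated("key")

# keep=False flags every member of a duplicated group at once.
mask_all = df.duplicated("key", keep=False)

# All rows whose key appears more than once.
all_dupes = df[mask_all]
print(all_dupes)
```

On this sample, both masks select the same rows, so `df[mask_union]` and `df[mask_all]` should be interchangeable.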
