ENH: nlargest for DataFrame #3960

Closed
@hayd

Description

I don't think there is a way to get the n largest rows of a DataFrame without sorting the whole frame.

In plain Python you'd use heapq's nlargest (and with a bit of hacking it can be made to work on a DataFrame):

In [10]: df
Out[10]:
                IP                                              Agent  Count
0    74.86.158.106  Mozilla/5.0+(compatible; UptimeRobot/2.0; http...    369
1   203.81.107.103  Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...    388
2  173.199.120.155  Mozilla/5.0 (compatible; AhrefsBot/4.0; +http:...    417
3    124.43.84.242  Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.3...    448
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467

In [11]: df.sort('Count', ascending=False).head(3)
Out[11]:
                IP                                              Agent  Count
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454

In [21]: from heapq import nlargest

In [22]: top_3 = nlargest(3, df.iterrows(), key=lambda x: x[1]['Count'])

In [23]: pd.DataFrame.from_items(top_3).T
Out[23]:
                IP                                              Agent Count
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...   467
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...   461
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...   454

This is much slower than sorting, presumably because of the overhead, but I thought I'd throw it out as a feature idea anyway.
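
For anyone trying this on a newer pandas, here is a minimal sketch of the same comparison. The session above uses `df.sort` and `DataFrame.from_items`, which later pandas releases removed, so this uses `sort_values` and a label lookup instead. The sample frame and the helper name `top_n_by_heap` are made up for illustration. The last line uses `DataFrame.nlargest`, which later pandas releases provide.

```python
from heapq import nlargest

import pandas as pd

# Hypothetical sample data standing in for the frame shown above
# (the Agent column is omitted because it is truncated in the output).
df = pd.DataFrame(
    {
        "IP": [
            "74.86.158.106", "203.81.107.103", "173.199.120.155",
            "124.43.84.242", "112.135.196.223", "124.43.155.138",
            "124.43.104.198",
        ],
        "Count": [369, 388, 417, 448, 454, 461, 467],
    }
)


def top_n_by_heap(frame, column, n):
    """Return the n rows with the largest values in `column` using heapq.

    Illustrative helper, not part of pandas. Assumes unique index labels.
    """
    # Heap-select over (value, label) pairs, comparing on the value only.
    pairs = nlargest(n, zip(frame[column], frame.index), key=lambda p: p[0])
    labels = [label for _, label in pairs]
    # Look the selected rows up by label, largest first.
    return frame.loc[labels]


by_heap = top_n_by_heap(df, "Count", 3)

# The full-sort approach from the session, spelled for newer pandas.
by_sort = df.sort_values("Count", ascending=False).head(3)

# Built-in selection available in later pandas releases.
by_builtin = df.nlargest(3, "Count")
```

All three select the rows labelled 6, 5 and 4 for this data. The heapq version still walks the column in Python, so it is not expected to beat the sort on small frames.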

See http://stackoverflow.com/a/17194717/1240268
