ENH: nlargest for DataFrame

I don't _think_ there is a way to get the n largest rows of a DataFrame (by some column) without sorting.

In ordinary Python you'd use `heapq.nlargest` (and with a bit of hacking it can be made to work on a DataFrame):

```
In [10]: df
Out[10]:
                IP                                              Agent  Count
0    74.86.158.106  Mozilla/5.0+(compatible; UptimeRobot/2.0; http...    369
1   203.81.107.103  Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...    388
2  173.199.120.155  Mozilla/5.0 (compatible; AhrefsBot/4.0; +http:...    417
3    124.43.84.242  Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.3...    448
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467

In [11]: df.sort_values('Count', ascending=False).head(3)
Out[11]:
                IP                                              Agent  Count
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454
```

```
In [21]: from heapq import nlargest

In [22]: top_3 = nlargest(3, df.iterrows(), key=lambda x: x[1]['Count'])

In [23]: pd.DataFrame([row for _, row in top_3])
Out[23]:
                IP                                              Agent Count
6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...   467
5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...   461
4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...   454
```
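A variant of the same heapq trick that may avoid much of the `iterrows` overhead is to run `nlargest` over the index labels, keyed on the column values, and only then pull out the winning rows with `.loc`. This is an unbenchmarked sketch: the small frame is made-up stand-in data that reuses only the `Count` values above, and it assumes the index labels are unique (duplicates would collapse in the dict).

```python
from heapq import nlargest

import pandas as pd

# Made-up stand-in for the frame above; only the Count values match.
df = pd.DataFrame(
    {"IP": list("abcdefg"), "Count": [369, 388, 417, 448, 454, 461, 467]}
)

# Map index label -> Count once, so no per-row Series objects are built.
# Assumes a unique index.
lookup = df["Count"].to_dict()

# Iterating a dict yields its keys, i.e. the index labels.
top_labels = nlargest(3, lookup, key=lookup.__getitem__)

# Select the winning rows in descending order; column dtypes are preserved.
top_3 = df.loc[top_labels]
print(top_3)
```

Unlike the `iterrows` version, the result keeps `Count` as an integer column instead of `object`.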

This is much slower than sorting, presumably because of the per-row overhead, but I thought I'd throw it out as a feature idea anyway.

See http://stackoverflow.com/a/17194717/1240268
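One way such a feature could avoid a full sort is a partial selection: `numpy.argpartition` finds the positions of the n largest values in linear time, and only those n positions are then sorted. The helper `nlargest_rows` below is hypothetical (not a pandas API) and is only a sketch; it assumes a numeric column without NaNs (argpartition would treat NaN as largest) and breaks ties arbitrarily.

```python
import numpy as np
import pandas as pd


def nlargest_rows(df: pd.DataFrame, n: int, column: str) -> pd.DataFrame:
    """Hypothetical helper: top-n rows by `column` without a full sort."""
    values = df[column].to_numpy()
    n = min(n, len(values))
    if n <= 0:
        return df.iloc[:0]  # empty frame with the same columns
    # After partitioning, the last n positions hold the n largest values,
    # in arbitrary order (O(len) on average).
    top = np.argpartition(values, len(values) - n)[len(values) - n:]
    # Sort only those n positions, largest first (O(n log n)).
    top = top[np.argsort(values[top])[::-1]]
    return df.iloc[top]


# Made-up stand-in data reusing the Count values from above.
df = pd.DataFrame(
    {"IP": list("abcdefg"), "Count": [369, 388, 417, 448, 454, 461, 467]}
)
result = nlargest_rows(df, 3, "Count")
print(result)
```

Because `iloc` returns the original rows, the index labels (6, 5, 4 here) and column dtypes carry through unchanged.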

ENH: nlargest for DataFrame #3960