Closed
Description
I don't think there is a way to get the n largest elements of a DataFrame without sorting.
In ordinary Python you'd use heapq's nlargest (and with a bit of hacking it can be used on a DataFrame):
In [10]: df
Out[10]:
IP Agent Count
0 74.86.158.106 Mozilla/5.0+(compatible; UptimeRobot/2.0; http... 369
1 203.81.107.103 Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20... 388
2 173.199.120.155 Mozilla/5.0 (compatible; AhrefsBot/4.0; +http:... 417
3 124.43.84.242 Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.3... 448
4 112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3... 454
5 124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G... 461
6 124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20... 467
In [11]: df.sort('Count', ascending=False).head(3)
Out[11]:
IP Agent Count
6 124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20... 467
5 124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G... 461
4 112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3... 454
In [21]: from heapq import nlargest
In [22]: top_3 = nlargest(3, df.iterrows(), key=lambda x: x[1]['Count'])
In [23]: pd.DataFrame.from_items(top_3).T
Out[23]:
IP Agent Count
6 124.43.104.198 Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20... 467
5 124.43.155.138 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G... 461
4 112.135.196.223 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3... 454
This is much slower than sorting, presumably because of the overhead, but I thought I'd throw it out as a feature idea anyway.
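
A possibly cheaper variant of the hack, as a minimal sketch: run nlargest over the Count values only and collect the row labels, rather than iterating over whole rows. The `counts` dict and the `top_n_labels` helper are hypothetical names introduced for illustration; the dict stands in for the Count column above, keyed by row label. I have not timed this against sorting.

```python
from heapq import nlargest

# Stand-in for the Count column of the example above, keyed by row label.
# (Hypothetical data structure; with pandas it could come from
# df['Count'].to_dict().)
counts = {0: 369, 1: 388, 2: 417, 3: 448, 4: 454, 5: 461, 6: 467}


def top_n_labels(values, n):
    """Return the labels of the n largest values, largest first."""
    # Iterating a dict yields its keys; key=values.get ranks each
    # label by its count, so only labels are kept on the heap.
    return nlargest(n, values, key=values.get)


top_3 = top_n_labels(counts, 3)
print(top_3)  # [6, 5, 4]
```

The resulting labels could then be used to select the full rows, e.g. with `df.loc[top_3]`, which should give the same three rows as Out[11].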