loc very slow on sorted, non-unique index with list of labels ar argument

```
In [1]: import pandas, numpy

In [2]: df = pandas.DataFrame(numpy.random.random((100000, 4)))

In [3]: %timeit df.loc[55555]
10000 loops, best of 3: 118 µs per loop

In [4]: %timeit df.loc[[55555]]
1000 loops, best of 3: 324 µs per loop
```

... makes sense to me.

```
In [5]: df.index = list(range(99999)) + [55555]

In [6]: %timeit df.loc[55555]
100 loops, best of 3: 4.04 ms per loop

In [7]: %timeit df.loc[[55555]]
100 loops, best of 3: 16.8 ms per loop
```

Non-unique index, slower (the second call probably has to scan all the index): still makes sense to me. Sorting should improve things...

```
In [8]: df.sort(inplace=True)

In [9]: %timeit df.loc[55555]
1000 loops, best of 3: 239 µs per loop

In [10]: %timeit df.loc[[55555]]
100 loops, best of 3: 17.2 ms per loop
```

... here I'm lost: why this huge difference? The difference is even larger (3 orders of magnitude) in a real database I am working on. Clearly,

```
In [12]: df.loc[[55555]] == df.loc[55555]
Out[12]: 
          0     1     2     3
55555  True  True  True  True
55555  True  True  True  True
```

(As a sidenote: the reason why I'm doing calls such as df.loc[[a_label]] is that df.loc[a_label] will return sometimes a Series, sometimes a DataFrame. I currently solve this by using df.loc[df.index == a_label], which is however ~3x slower than df.loc[a_label] - but much faster than the above df.loc[[a_label]].)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

loc very slow on sorted, non-unique index with list of labels ar argument #9466

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

loc very slow on sorted, non-unique index with list of labels ar argument #9466

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions