Discussion: Why is loc label-based slicing right-inclusive? #27059

I've searched the documentation and old issues and I can't find anything on the reason for this unusual choice. The original historic PR was #2922. 

One user raised the question before in #14900, and it was aggressively shut down by jeff without an answer.

It has caused bugs in my code in the past, and I've seen it do the same for other people. It is the cause of subtle bugs seen in the [wild](https://gist.github.com/betatim/c59039682d92fab89859358e8c585313) and in https://github.com/pandas-dev/pandas/issues/26959#issuecomment-504456480. Its inconsistency with Python slice conventions confuses newcomers, who ask about it on [SO](https://stackoverflow.com/questions/55187559/why-is-loc-slicing-in-pandas-inclusive-of-stop-contrary-to-typical-python-slic) but are given no explanation.

Every [pandas tutorial](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c) has to mention this special case:
```
.loc includes the last value with slice notation. In other data containers such as Python lists, the last value is excluded.
```
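To make the special case concrete, here is a minimal sketch comparing the three slicing styles on the same data. The series and its labels are made up for illustration.

```python
import pandas as pd

values = [10, 20, 30, 40]
s = pd.Series(values, index=["a", "b", "c", "d"])

# Plain Python list slicing is half-open: the stop position is excluded.
list_slice = values[0:2]

# Position-based .iloc follows the same half-open convention.
iloc_slice = s.iloc[0:2].tolist()

# Label-based .loc includes the stop label when it is present in the index.
loc_slice = s.loc["a":"c"].tolist()

print(list_slice)  # [10, 20]
print(iloc_slice)  # [10, 20]
print(loc_slice)   # [10, 20, 30]
```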

Users often need to slice something with `closed='left'` behavior, and try [to add it](https://github.com/pandas-dev/pandas/issues/12398) in similar situations where it isn't available.

This behavior is fully documented. That's not the problem. For example,
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html says

```
Note that contrary to usual python slices, both the start and the stop are 
included, when present in the index! See "Slicing with labels".).
```

The "Slicing with labels" section documents the behavior, but gives no reason why the choice to break with python conventions was taken.

Another user who asked this question was given a workaround which is much more cumbersome compared to the convenience of `.loc` in https://github.com/pandas-dev/pandas/issues/16571#issuecomment-305599270.

He points to [DataFrame.between_time](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html), which has keyword arguments for requesting this behavior but, infuriatingly, accepts only times and not datetimes.
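For illustration, one way to get half-open behavior on a datetime index today is a boolean mask. This is only a sketch with made-up data, and it is not necessarily the workaround suggested in the linked comment, but it shows how much more verbose the half-open case is compared with a `.loc` slice.

```python
import pandas as pd

idx = pd.date_range("2019-01-01", periods=5, freq="D")
df = pd.DataFrame({"x": range(5)}, index=idx)

start = pd.Timestamp("2019-01-02")
stop = pd.Timestamp("2019-01-04")

# .loc keeps the row labelled `stop`: 2019-01-02, 2019-01-03, 2019-01-04.
closed = df.loc[start:stop]

# A mask keeping start <= label < stop gives closed='left' behavior:
# 2019-01-02 and 2019-01-03 only.
mask = (df.index >= start) & (df.index < stop)
half_open = df.loc[mask]

print(len(closed), len(half_open))  # 3 2
```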

A few related cookbook entries were added at
https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#dataframes
which explain:
```
There are 2 explicit slicing methods, with a third general case

    Positional-oriented (Python slicing style : exclusive of end)
    Label-oriented (Non-Python slicing style : inclusive of end)
```
This again documents how `loc` works, but again offers no reason why pandas originally broke with Python conventions.

In every case I've seen someone ask "why is label-based slicing right-inclusive?" the answer has always been "because it's label based, not position based", which doesn't really explain anything.

The same issue exists with string-based partial time slicing such as `df.loc[:"2018"]`,
which includes rows whose timestamps fall in 2018, such as `2018-01-01`.
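A small sketch of the partial-string case, using a made-up series with a sorted `DatetimeIndex`. The explicit `pd.Timestamp` comparison is just one way to express the half-open alternative.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2017-12-31", "2018-01-01", "2018-06-15", "2018-12-31", "2019-01-01"]
)
s = pd.Series(range(5), index=idx)

# The partial string "2018" as a stop label covers the whole year,
# so every 2018 row is included in the result.
up_to_2018 = s.loc[:"2018"]

# A half-open reading ("everything before 2018") needs an explicit comparison.
before_2018 = s[s.index < pd.Timestamp("2018-01-01")]

print(up_to_2018.tolist())   # [0, 1, 2, 3]
print(before_2018.tolist())  # [0]
```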

So, over the years several people have found this undesirable and/or have tried to find out why, but after an hour's worth of googling and reading, I can't find that an explanation has ever been given.

I'm not saying there's no good reason. I understand that indexing can get complicated, that mixed cases are tricky, etc. But I want to understand why this was necessary or desirable, and why making `.loc` pythonic was unfeasible.

I'll be opening a very small POC PR for discussion in a few minutes, which adds a new indexer called `locs`. <del>It is far from complete, but it seems to do exactly what I want for the single-index case, in what I've tried so far.</del> It passes all the equivalent tests that `loc` does.

To summarize:
- It's been asked before, but not answered.
- The behavior is documented, but is surprising and never explained.
- It's not immediately obvious why it's impossible to implement the pythonic version.

So I ask, why is `.loc` right-inclusive?










