BUG: DataFrame/Series.loc improperly allows lookups of boolean labels/slices #20432

Closed
@jschendel

Code Sample, a copy-pastable example if possible

Basic example of the issue, specific to TimedeltaIndex (xref a comment on #20408):

In [2]: s = pd.Series(list('abcde'), pd.timedelta_range(0, 4, freq='ns'))

In [3]: s.loc[True]
Out[3]: 'b'

In [4]: s.loc[False:True]
Out[4]:
00:00:00           a
00:00:00.000000    b
Freq: N, dtype: object

Indexing with both boolean labels and slices was successful, which doesn't seem right.
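One plausible reading of the output above (not confirmed here) is that True and False end up treated as the integers 1 and 0, which for this TimedeltaIndex means 1 ns and 0 ns. Python makes that coercion easy because bool is a subclass of int. The plain-Python sketch below (no pandas involved; the `labels` dict is just an illustrative stand-in for an integer-keyed lookup) shows the same effect:

```python
# Plain-Python illustration (no pandas): bool is a subclass of int,
# so True/False compare equal to, and hash the same as, 1/0.
assert issubclass(bool, int)
assert True == 1 and False == 0
assert hash(True) == hash(1) and hash(False) == hash(0)

# Any lookup that hashes or compares keys as integers will therefore
# treat True as the label 1 and False as the label 0.
labels = {0: "a", 1: "b", 2: "c"}
print(labels[True])   # b
print(labels[False])  # a
```

This mirrors `s.loc[True]` returning 'b' above, but it only shows why such coercion is easy to hit; where pandas actually performs the conversion is not established here.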

I investigated this same behavior across various index types for both Series and DataFrame, and produced the summary below.

Summary

  • 'raises' column indicates whether the indexing operation raised an exception
  • 'exception' column gives the type of exception raised (NaN if none was raised)
                                  raises  exception
CategoricalIndex DataFrame label    True   KeyError
                           slice    True   KeyError
                 Series    label   False        NaN
                           slice    True   KeyError
DatetimeIndex    DataFrame label   False        NaN
                           slice   False        NaN
                 Series    label   False        NaN
                           slice   False        NaN
Float64Index     DataFrame label   False        NaN
                           slice   False        NaN
                 Series    label   False        NaN
                           slice   False        NaN
Index            DataFrame label   False        NaN
                           slice   False        NaN
                 Series    label   False        NaN
                           slice   False        NaN
Int64Index       DataFrame label    True   KeyError
                           slice   False        NaN
                 Series    label    True   KeyError
                           slice   False        NaN
IntervalIndex    DataFrame label    True  TypeError
                           slice    True  TypeError
                 Series    label    True  TypeError
                           slice    True  TypeError
MultiIndex       DataFrame label    True   KeyError
                           slice    True   KeyError
                 Series    label    True   KeyError
                           slice    True   KeyError
PeriodIndex      DataFrame label    True   KeyError
                           slice   False        NaN
                 Series    label    True   KeyError
                           slice   False        NaN
RangeIndex       DataFrame label    True   KeyError
                           slice   False        NaN
                 Series    label    True   KeyError
                           slice   False        NaN
TimedeltaIndex   DataFrame label   False        NaN
                           slice   False        NaN
                 Series    label   False        NaN
                           slice   False        NaN
UInt64Index      DataFrame label    True   KeyError
                           slice   False        NaN
                 Series    label    True   KeyError
                           slice   False        NaN
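The inconsistencies are easier to see once the outcomes are grouped. The stdlib-only sketch below hard-codes the 'raises' column transcribed from the summary table (the `SUMMARY` name and the three groupings are illustrative choices, not part of the original script) and lists where label and slice lookups disagree, where DataFrame and Series disagree, and which index types never raise:

```python
# Outcomes transcribed from the summary table:
# (index type, object type) -> (label lookup raised?, slice lookup raised?)
SUMMARY = {
    ("CategoricalIndex", "DataFrame"): (True, True),
    ("CategoricalIndex", "Series"): (False, True),
    ("DatetimeIndex", "DataFrame"): (False, False),
    ("DatetimeIndex", "Series"): (False, False),
    ("Float64Index", "DataFrame"): (False, False),
    ("Float64Index", "Series"): (False, False),
    ("Index", "DataFrame"): (False, False),
    ("Index", "Series"): (False, False),
    ("Int64Index", "DataFrame"): (True, False),
    ("Int64Index", "Series"): (True, False),
    ("IntervalIndex", "DataFrame"): (True, True),
    ("IntervalIndex", "Series"): (True, True),
    ("MultiIndex", "DataFrame"): (True, True),
    ("MultiIndex", "Series"): (True, True),
    ("PeriodIndex", "DataFrame"): (True, False),
    ("PeriodIndex", "Series"): (True, False),
    ("RangeIndex", "DataFrame"): (True, False),
    ("RangeIndex", "Series"): (True, False),
    ("TimedeltaIndex", "DataFrame"): (False, False),
    ("TimedeltaIndex", "Series"): (False, False),
    ("UInt64Index", "DataFrame"): (True, False),
    ("UInt64Index", "Series"): (True, False),
}

index_types = sorted({idx for idx, _ in SUMMARY})

# (index, object) pairs where the boolean label and boolean slice disagree.
label_vs_slice = sorted(key for key, (label, slc) in SUMMARY.items() if label != slc)

# Index types where DataFrame and Series behave differently.
frame_vs_series = [
    idx for idx in index_types
    if SUMMARY[(idx, "DataFrame")] != SUMMARY[(idx, "Series")]
]

# Index types where neither lookup raised for either object type.
never_raises = [
    idx for idx in index_types
    if all(SUMMARY[(idx, obj)] == (False, False) for obj in ("DataFrame", "Series"))
]

print("label vs slice:", label_vs_slice)
print("DataFrame vs Series:", frame_vs_series)
print("never raises:", never_raises)
```

With the transcribed data this flags nine index/object pairs where label and slice outcomes differ (CategoricalIndex for Series, plus Int64Index, PeriodIndex, RangeIndex and UInt64Index for both), CategoricalIndex as the only index type whose DataFrame and Series results differ, and DatetimeIndex, Float64Index, Index and TimedeltaIndex as never raising.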

Code to produce summary

import pandas as pd  # Int64Index, UInt64Index and Float64Index were removed in pandas 2.0

indexes = [
    pd.RangeIndex(4),
    pd.Int64Index(range(4)),
    pd.UInt64Index(range(4)),
    pd.Float64Index(range(4)),
    pd.CategoricalIndex(range(4)),
    pd.date_range(0, periods=4, freq='ns'),
    pd.timedelta_range(0, periods=4, freq='ns'),
    pd.interval_range(0, periods=4),
    pd.Index([0, 1, 2, 3], dtype=object),
    pd.MultiIndex.from_product([[0, 1], [0, 1]]),
    pd.period_range('2018Q1', freq='Q', periods=4),  # need better example here
]

result = {}
for index in indexes:
    index_name = type(index).__name__
    s = pd.Series(list('abcd'), index=index)
    for obj in (s, s.to_frame()):
        obj_name = type(obj).__name__

        # check single label
        key = (index_name, obj_name, 'label')
        try:
            obj.loc[True]
            result[key] = {'raises': False}
        except Exception as e:
            result[key] = {'raises': True, 'exception': type(e).__name__}

        # check slice
        key = (index_name, obj_name, 'slice')
        try:
            obj.loc[False:True]
            result[key] = {'raises': False}
        except Exception as e:
            result[key] = {'raises': True, 'exception': type(e).__name__}

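# Tuple keys become a MultiIndex; sort so index types are listed alphabetically
result = pd.DataFrame.from_dict(result, orient='index').sort_index()
print(result)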

Expected Output

I'd generally expect all of these operations to raise a KeyError, with a couple of potential exceptions:

  • I'd be open to an argument for numeric indexes casting booleans to their integer equivalents. This should at least be consistent between labels and slices, which it currently is not.
  • Maybe we should allow the conversion for an object dtype Index?
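The options in the list above can be sketched as a single key-normalization step. Everything here is hypothetical: `normalize_loc_key` and its `index_kind` values ("numeric", "object", anything else) are invented for illustration and are not pandas API, and numpy.bool_ keys are not handled. The point of the sketch is that scalar labels and slice endpoints go through the same conversion, so they cannot disagree the way Int64Index does in the summary table.

```python
def _is_bool(value):
    # bool is a subclass of int, so test for bool explicitly rather than int.
    return isinstance(value, bool)


def normalize_loc_key(key, index_kind="other"):
    """Hypothetical helper (not part of pandas) sketching the proposed rule.

    - "object":  booleans pass through unchanged (they may be real labels).
    - "numeric": booleans are cast to 0/1, for labels and slices alike.
    - anything else: a boolean label, or a slice with a boolean endpoint,
      raises KeyError.
    """
    if index_kind == "object":
        return key

    def convert(value):
        if not _is_bool(value):
            return value
        if index_kind == "numeric":
            return int(value)
        raise KeyError(key)

    # Labels and slice endpoints share one code path, keeping them consistent.
    if isinstance(key, slice):
        return slice(convert(key.start), convert(key.stop), key.step)
    return convert(key)
```

For example, `normalize_loc_key(slice(False, True), "numeric")` yields `slice(0, 1, None)`, while the same key with `index_kind="datetime"` raises KeyError. Whether numeric indexes should cast or raise is left as a parameter, since the issue leaves that question open.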

Labels: Indexing (related to indexing on series/frames, not to indexes themselves), Needs Tests (unit test(s) needed to prevent regressions), good first issue