BUG: partial string indexing with scalar #27712

Merged
merged 2 commits into from
Aug 4, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.1.rst
@@ -82,7 +82,7 @@ Interval
Indexing
^^^^^^^^

-
- Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`)
-
-

5 changes: 5 additions & 0 deletions pandas/core/indexing.py
@@ -1704,6 +1704,11 @@ def _is_scalar_access(self, key: Tuple):
            if isinstance(ax, MultiIndex):
                return False

            if isinstance(k, str) and ax.is_all_dates:
Contributor Author

Is is_all_dates the appropriate way to check whether partial string indexing is supported?

Contributor Author

My concern is that it's potentially expensive. Why not just isinstance(ax, DatetimeIndex)?

Member

could alias is_all_dates to _partial_str_indexing_supported (or something less verbose) to make this more obvious?

For non-object Index is_all_dates is pretty cheap

Contributor

this is not the right place for this, rather you just call:

self._get_partial_string_timestamp_match_key(key, labels)

Contributor Author

Where would you do that? The problem with the current structure is that in _LocIndexer.__getitem__, if self._is_scalar_access(key) returns True, we go straight to return self._getitem_scalar(key), which is incorrect for partial string indexing.

It doesn't look like _get_partial_string_timestamp_match_key will work, at least not directly. That doesn't convert the partial string key to a slice or array indexer.

(Pdb) pp key
('2000', 'A')
(Pdb) pp self._get_partial_string_timestamp_match_key(key[0], self.obj.axes[0])
'2000'

Stepping back, I think that _is_scalar_access should return False for partial string indexing, since it's not actually a scalar access.
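
To illustrate why this is not a scalar access, here is a minimal sketch of the lookup from the linked issue. It assumes a pandas version that includes this fix, and it uses freq="MS" (month start) rather than the "M" alias in the test below only because "MS" is accepted across more pandas versions; the variable names are illustrative.

```python
import pandas as pd

# 24 month-start timestamps covering the years 2000 and 2001.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
df = pd.DataFrame({"A": range(24)}, index=idx)

# "2000" is a partial date string: it matches every row in that year,
# so the lookup selects a slice of rows rather than a single label.
result = df.loc["2000", "A"]
print(type(result).__name__, len(result))
```

With the fix in place, the result should be a Series holding the 12 rows for the year 2000, rather than a NumPy array.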

Contributor Author

> could alias is_all_dates to _partial_str_indexing_supported (or something less verbose) to make this more obvious?
>
> For non-object Index is_all_dates is pretty cheap

It's the object case that I'm worried about, as I think this check is always hit.

Rather than inferring, why don't we define whether an index type supports partial string indexing as a class attribute? That skips the (potentially) expensive inference step, which IIUC would only be useful if (object-type) Index objects supported partial string indexing, which I don't believe they do.
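
A rough sketch of the class-attribute idea, using toy classes rather than the real pandas ones. The names ToyIndex, ToyDatetimeIndex, supports_partial_string_indexing, and is_scalar_access are all hypothetical and only stand in for the shape of the check; this is not the code in the PR.

```python
class ToyIndex:
    # Hypothetical flag: a plain index does not support partial string indexing.
    supports_partial_string_indexing = False

    def __init__(self, values):
        self.values = list(values)

    @property
    def is_unique(self):
        return len(set(self.values)) == len(self.values)


class ToyDatetimeIndex(ToyIndex):
    # Datetime-like indexes opt in by overriding the class attribute.
    supports_partial_string_indexing = True


def is_scalar_access(key, axes):
    """Return True only if every element of key can select exactly one label."""
    for k, ax in zip(key, axes):
        if isinstance(k, (list, tuple, slice)):
            return False
        # Constant-time attribute lookup; no inspection of the index values.
        if isinstance(k, str) and ax.supports_partial_string_indexing:
            return False
        if not ax.is_unique:
            return False
    return True


rows = ToyDatetimeIndex(["2000-01-31", "2000-02-29"])
cols = ToyIndex(["A", "B"])
print(is_scalar_access(("2000", "A"), (rows, cols)))  # False
print(is_scalar_access(("A", "B"), (cols, cols)))  # True
```

The point of the attribute is that the check costs the same for every index type, including object-dtype indexes, where inferring whether all values are dates could require scanning the values.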

Contributor Author

0fb9bbe has that, as a POC. All the indexing tests pass locally.

                # partial string indexing, df.loc['2000', 'A']
                # should not be considered scalar
                return False

            if not ax.is_unique:
                return False

11 changes: 11 additions & 0 deletions pandas/tests/indexes/datetimes/test_partial_slicing.py
@@ -468,3 +468,14 @@ def test_getitem_with_datestring_with_UTC_offset(self, start, end):
        with pytest.raises(ValueError, match="The index must be timezone"):
            df = df.tz_localize(None)
            df[start:end]

    def test_slice_reduce_to_series(self):
        # GH 27516
        df = pd.DataFrame(
            {"A": range(24)}, index=pd.date_range("2000", periods=24, freq="M")
        )
        expected = pd.Series(
            range(12), index=pd.date_range("2000", periods=12, freq="M"), name="A"
        )
        result = df.loc["2000", "A"]
        tm.assert_series_equal(result, expected)