API: require scalar result from the scalar .at indexer

The `.at` indexer is documented as the fast, restricted version of `loc` to "access a scalar value" (https://pandas.pydata.org/docs/user_guide/indexing.html#fast-scalar-value-getting-and-setting).

However, the result is currently not guaranteed to be a scalar. For example, with a Series that has duplicate index labels:

```
In [3]: s = pd.Series([1, 2, 3], index=['a', 'b', 'a'])  

In [4]: s.at['a'] 
Out[4]: 
a    1
a    3
dtype: int64
```
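To make the proposed contract concrete, here is a minimal sketch of a hypothetical `strict_at` helper (not a pandas API) that performs an `.at` lookup and rejects any non-scalar result. It assumes `pandas.api.types.is_scalar` as the definition of "scalar", which also accepts NumPy scalars such as `np.int64`.

```python
import pandas as pd
from pandas.api.types import is_scalar


def strict_at(obj, key):
    """Hypothetical helper (not part of pandas): an .at lookup that insists on a scalar.

    It models the stricter contract discussed in this issue by rejecting any
    lookup whose result is a Series, DataFrame or ndarray.
    """
    result = obj.at[key]
    if not is_scalar(result):
        raise ValueError(
            f"{key!r} does not identify a single element "
            f"(.at returned a {type(result).__name__})"
        )
    return result


s = pd.Series([1, 2, 3], index=["a", "b", "a"])
print(strict_at(s, "b"))  # unique label: returns the scalar 2

try:
    strict_at(s, "a")  # duplicated label: rejected instead of returning a Series
except ValueError as exc:
    print(exc)
```

For a DataFrame the key is the usual `(row, column)` tuple, e.g. `strict_at(df, ("a", "A"))`.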

There are some other possible cases that could give a non-scalar value as well:

- MultiIndex if not all levels are indexed (#26989)
- DataFrame with duplicate index labels in index and/or columns (#33041)
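Why a single key can select several elements: each axis label may match more than one position, and a `(row, column)` key on a DataFrame selects every combination of matches. The hypothetical `n_matches` helper below (an illustration, not pandas code) counts matches per axis from the return value of `Index.get_loc`, which is an integer for a unique match and a slice or boolean mask otherwise.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer


def n_matches(index: pd.Index, label) -> int:
    """Hypothetical helper: how many positions of `index` a label selects.

    Index.get_loc returns an integer for a unique match, and a slice or a
    boolean mask when the label matches several positions (duplicates, or a
    partial key on a MultiIndex).
    """
    loc = index.get_loc(label)
    if is_integer(loc):
        return 1
    if isinstance(loc, slice):
        return len(range(*loc.indices(len(index))))
    return int(np.asarray(loc).sum())  # boolean mask of matching positions


# Duplicated row and column labels: cells selected = row matches x column matches
df = pd.DataFrame(np.zeros((3, 2)), index=["a", "b", "a"], columns=["A", "A"])
print(n_matches(df.index, "a") * n_matches(df.columns, "A"))  # 2 * 2 = 4 cells

# MultiIndex: a partial key matches several rows, a full key matches one
mi = pd.MultiIndex.from_product([["A", "B"], [1, 2]])
print(n_matches(mi, "A"), n_matches(mi, ("A", 1)))
```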

However, these cases currently all fail: they either raise an error or produce a wrong result (see the examples below).
So the question is: **should we consider these cases bugs, or should we instead require a scalar result**, and thus regard the Series behaviour above as too liberal?

Since the linked issues have open PRs to "fix" them, I think we should first decide which guarantees we want the `.at` API to provide. Personally, I think following the documented intention of accessing a scalar would be a plus: it gives certainty about the type of the result (always a scalar, never sometimes a Series or DataFrame depending on the label passed to `at`).

If we prefer the stricter scalar-result requirement, we can deprecate the Series case (only when a duplicated label is accessed) and provide better error messages for the cases that currently fail.
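A minimal sketch of what such a deprecation could look like, written as a hypothetical wrapper `at_with_deprecation` rather than a change to pandas internals. It assumes duplication is detected with `Index.get_indexer_for`, which returns every matching position, and that during the deprecation period the current Series result is still returned alongside a `FutureWarning`.

```python
import warnings

import pandas as pd


def at_with_deprecation(ser: pd.Series, label):
    """Hypothetical sketch of a deprecation path for Series.at (not pandas code).

    A label that occurs more than once still returns today's Series result,
    but a FutureWarning announces that this will become an error.
    """
    positions = ser.index.get_indexer_for([label])
    if len(positions) > 1:
        warnings.warn(
            f"Series.at with the duplicated label {label!r} returns multiple "
            "values; in a future version this will raise. Use .loc instead.",
            FutureWarning,
            stacklevel=2,
        )
        return ser.loc[label]  # preserve the current (non-scalar) behaviour
    return ser.at[label]  # unique label: scalar result, no warning


s = pd.Series([1, 2, 3], index=["a", "b", "a"])
at_with_deprecation(s, "b")  # scalar, no warning
at_with_deprecation(s, "a")  # Series of two values, plus a FutureWarning
```

A real implementation would live inside the `.at` indexer itself; the wrapper only illustrates the intended user-facing behaviour.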

---

Examples of the failing behaviours (on pandas 1.0.1):

```python
>>> import numpy as np
>>> import pandas as pd
# duplicated row label -> you get a numpy array (which I would say is always wrong)
>>> df = pd.DataFrame(np.random.randn(3, 2), index=['a', 'b', 'a'], columns=['A', 'B'])  
>>> df.at['a', 'A'] 
array([-0.02914828,  0.2856617 ])

# duplicated column label -> error
>>> df = pd.DataFrame(np.random.randn(3, 2), index=['a', 'b', 'c'], columns=['A', 'A'])  
>>> df.at['a', 'A']  
...
AttributeError: 'BlockManager' object has no attribute 'T'

# Selecting one level of a MultiIndex -> error
>>> s = pd.Series(np.random.randn(4), index=pd.MultiIndex.from_product([['A', 'B'], [1, 2]]))  
>>> s.at['A']  
...
KeyError: 'A'
During handling of the above exception, another exception occurred:
...
AttributeError: 'numpy.ndarray' object has no attribute '_values'

# selecting all MultiIndex levels -> this should actually work (#26989)
>>> s.at['A', 1] 
...
TypeError: _get_value() got multiple values for argument 'takeable'

# DataFrame with all MultiIndex levels specified works OK
>>> s.to_frame('col').at[('A', 1), 'col']
1.341421652269149
```
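If `.at` is restricted to scalar results, the non-scalar selections above would be expressed with `.loc`, which already supports duplicate labels and partial MultiIndex keys. A sketch with deterministic values in place of the random data used above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Same shapes as the examples above, with deterministic values
s = pd.Series(np.arange(4.0), index=pd.MultiIndex.from_product([["A", "B"], [1, 2]]))
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), index=["a", "b", "a"], columns=["A", "B"])

partial = s.loc["A"]         # partial MultiIndex key: Series indexed by the remaining level
full = s.loc[("A", 1)]       # all levels given: a single value
dup_rows = df.loc["a", "A"]  # duplicated row label: Series with one entry per match

print(partial, full, dup_rows, sep="\n\n")
```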
API: require scalar result from the scalar .at indexer #33153