
API: require scalar result from the scalar .at indexer #33153

Open
@jorisvandenbossche


The .at indexer is documented as the fast, restricted version of loc to "access a scalar value" (https://pandas.pydata.org/docs/user_guide/indexing.html#fast-scalar-value-getting-and-setting).

However, there is currently no guarantee that the result is actually a scalar. For example, with a Series that has duplicate index values:

In [3]: s = pd.Series([1, 2, 3], index=['a', 'b', 'a'])  

In [4]: s.at['a'] 
Out[4]: 
a    1
a    3
dtype: int64
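Whether a given label can trigger this is visible from the index itself. The following is a minimal sketch, assuming only the public `Index.is_unique` and `Index.duplicated` attributes; the variable names are illustrative and not part of the original report:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "a"])

# The index as a whole is not unique, so .at cannot promise a scalar
# for every label.
index_is_unique = s.index.is_unique

# Labels that occur more than once (keep=False marks every occurrence).
duplicated_labels = set(s.index[s.index.duplicated(keep=False)])

# "b" occurs exactly once, so this lookup selects a single value.
value_b = s.at["b"]
```

Only lookups on a label in `duplicated_labels` are affected; lookups on unique labels in the same Series still select a single value.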

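There are some other cases that could give a non-scalar value as well: a DataFrame with duplicated row or column labels, and selecting only one level of a MultiIndex.

However, those cases currently all fail (they raise an error or produce a wrong result; see below for examples).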
So the question is: should we consider those cases bugs, or should we instead require a scalar result and thus treat the Series behaviour above as too liberal?

Since those linked issues have open PRs to "fix" them, I think we should first decide what guarantees we want the .at API to provide. Personally, I think following the documented intention of accessing a scalar would be a plus: it gives certainty about the type of the result (always a scalar, and not sometimes a Series or DataFrame depending on the value of the label passed to .at).

If we prefer the stricter scalar-result requirement, we can deprecate the Series case (but only when accessing a duplicated label) and provide better error messages for the error cases.
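To make the stricter option concrete, here is a rough user-level sketch of what "require a scalar result" could look like. `strict_at` is a hypothetical helper written for illustration, not a pandas API and not a proposal for the actual implementation; it relies only on `pandas.api.types.is_scalar`, and the error message is made up.

```python
import pandas as pd
from pandas.api.types import is_scalar


def strict_at(obj, key):
    """Hypothetical helper: like obj.at[key], but reject non-scalar results."""
    result = obj.at[key]
    if not is_scalar(result):
        raise ValueError(
            f"label {key!r} did not select a single scalar "
            f"(got {type(result).__name__})"
        )
    return result


s = pd.Series([1, 2, 3], index=["a", "b", "a"])

# Unique label: the scalar is passed through unchanged.
unique_value = strict_at(s, "b")

# Duplicated label: the helper refuses instead of returning a Series.
try:
    strict_at(s, "a")
    rejected = False
except ValueError:
    rejected = True
```

Checking the result rather than the index keeps lookups of unique labels working on an index that contains duplicates elsewhere, which matches the "only if accessing a duplicated label" idea above.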


Examples of the failing behaviours (on pandas 1.0.1):

>>> import numpy as np
>>> import pandas as pd

# duplicated row label -> you get a numpy array (which I would say is always wrong)
>>> df = pd.DataFrame(np.random.randn(3, 2), index=['a', 'b', 'a'], columns=['A', 'B'])
>>> df.at['a', 'A'] 
array([-0.02914828,  0.2856617 ])

# duplicated column label -> error
>>> df = pd.DataFrame(np.random.randn(3, 2), index=['a', 'b', 'c'], columns=['A', 'A'])  
>>> df.at['a', 'A']  
...
AttributeError: 'BlockManager' object has no attribute 'T'

# Selecting one level of a MultiIndex -> error
>>> s = pd.Series(np.random.randn(4), index=pd.MultiIndex.from_product([['A', 'B'], [1, 2]]))  
>>> s.at['A']  
...
KeyError: 'A'
During handling of the above exception, another exception occurred:
...
AttributeError: 'numpy.ndarray' object has no attribute '_values'

# selecting all MultiIndex levels -> this should actually work (#26989)
>>> s.at['A', 1] 
...
TypeError: _get_value() got multiple values for argument 'takeable'

# DataFrame with all MultiIndex levels specified works OK
>>> s.to_frame('col').at[('A', 1), 'col']
1.341421652269149

Labels: API Design; Indexing (related to indexing on series/frames, not to indexes themselves); Needs Discussion (requires discussion from core team before further action)
