API: should setitem-with-expansion _ever_ raise? #37774

Closed
@jbrockmendel

At the moment we are pretty inconsistent about when setitem-with-expansion raises. This boils down to the idiosyncratic casting rules for Index.insert that were mentioned in today's dev call.

Examples:

import pandas as pd

ri = pd.Index(range(6))
ci = pd.CategoricalIndex(["a", "a", "b", "b", "c", "a"])
dti = pd.date_range("2016-01-01", periods=6)
mi = pd.MultiIndex.from_arrays([ri, dti])

ser1 = pd.Series(range(6), index=ci)
ser2 = pd.Series(range(6), index=dti)
ser3 = pd.Series(range(6), index=mi)

>>> ser1.loc["d"] = 10
ValueError: 'fill_value=d' is not present in this Categorical's categories

>>> ser2.loc[4] = 10
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.

>>> ser2.loc["foo"] = 10  # <-- no error; casts the index to object

>>> ser3.loc[dti[0], dti[0]] = 10
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'

>>> ser3.loc[3, 4] = 10
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.

>>> ser3.loc[3, "a"] = 10  # <-- casts level to object

I see two options:

  1. use consistent casting rules for the new item being inserted, i.e. allow the insert if and only if the dtype can be retained.
  2. always cast, never raise
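To make the two options concrete, here is a minimal sketch on a toy index model. ToyIndex, can_hold, insert_option1 and insert_option2 are hypothetical names made up for illustration; this is not how pandas implements Index.insert, and only three dtypes are modeled.

```python
# Toy model (hypothetical, not pandas internals): an index is a list of
# labels plus a dtype name; only "int64", "datetime64" and "object" exist here.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ToyIndex:
    dtype: str
    labels: list = field(default_factory=list)


def can_hold(dtype: str, label) -> bool:
    """Return True if `label` fits `dtype` without changing the dtype."""
    if dtype == "object":
        return True
    if dtype == "int64":
        return isinstance(label, int) and not isinstance(label, bool)
    if dtype == "datetime64":
        return isinstance(label, datetime)
    return False


def insert_option1(index: ToyIndex, label) -> ToyIndex:
    """Option 1: allow the insert iff the dtype can be retained, else raise."""
    if not can_hold(index.dtype, label):
        raise TypeError(f"cannot insert {label!r} into a {index.dtype} index")
    return ToyIndex(index.dtype, index.labels + [label])


def insert_option2(index: ToyIndex, label) -> ToyIndex:
    """Option 2: always insert; fall back to object dtype instead of raising."""
    dtype = index.dtype if can_hold(index.dtype, label) else "object"
    return ToyIndex(dtype, index.labels + [label])
```

With a datetime-like ToyIndex, inserting the integer 4 raises a TypeError under option 1 but yields an object-dtype index under option 2. Inserting another datetime keeps the datetime64 dtype under both.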

Option 2 seems more user-friendly, and it is necessary for, e.g., some of our crosstab tests, which insert "All". AFAICT that is why DTI/TDI special-case strings (casting to object) and raise for everything else.
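The current DTI special case can be summarized with a tiny model, assuming a datetime-only index. The function dti_insert_current is hypothetical and only mirrors the behavior visible in the examples above (string labels cast to object, other non-datetime labels raise). It is not pandas code, and NaT handling is not modeled.

```python
from datetime import datetime


def dti_insert_current(labels: list, label):
    """Toy model of the DTI insert rule described above (hypothetical, not pandas code).

    `labels` is assumed to hold only datetimes; returns (new_labels, new_dtype).
    """
    if isinstance(label, datetime):
        return labels + [label], "datetime64"  # dtype can be retained
    if isinstance(label, str):
        return labels + [label], "object"  # special case: strings cast to object
    # everything else raises, echoing the message quoted in the examples
    raise TypeError(
        f"value should be a 'Timestamp' or 'NaT'. Got {type(label).__name__!r} instead."
    )
```

Under this model, inserting "All" (the crosstab case) succeeds with object dtype, while inserting the integer 4 raises. That asymmetry is exactly what option 2 would remove.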

Labels: API Design, Indexing, Needs Discussion
