At the moment we are pretty inconsistent about when setitem-with-expansion raises. This boils down to idiosyncratic casting rules in Index.insert, as mentioned in today's dev call.
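Index.insert can also be probed on its own, without going through `.loc`. The sketch below is not part of the original report. `insert_outcome` and the sample indexes are hypothetical, and because these casting rules have varied between pandas releases, the sketch reports what the installed version does instead of asserting a particular result.

```python
import pandas as pd


def insert_outcome(index: pd.Index, item) -> str:
    """Append `item` with Index.insert and describe what happened."""
    try:
        result = index.insert(len(index), item)
    except Exception as err:  # the exception type differs between index classes
        return f"raises {type(err).__name__}"
    return f"ok -> {type(result).__name__}[{result.dtype}]"


examples = {
    "RangeIndex + 'a'": (pd.Index(range(3)), "a"),
    "CategoricalIndex + 'd'": (pd.CategoricalIndex(["a", "b", "c"]), "d"),
    "DatetimeIndex + 4": (pd.date_range("2016-01-01", periods=3), 4),
    "DatetimeIndex + 'foo'": (pd.date_range("2016-01-01", periods=3), "foo"),
}

for name, (index, item) in examples.items():
    print(f"{name:<24} {insert_outcome(index, item)}")
```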
Examples:
```python
import pandas as pd

ri = pd.Index(range(6))
ci = pd.CategoricalIndex(["a", "a", "b", "b", "c", "a"])
dti = pd.date_range("2016-01-01", periods=6)
mi = pd.MultiIndex.from_arrays([ri, dti])
ser1 = pd.Series(range(6), index=ci)
ser2 = pd.Series(range(6), index=dti)
ser3 = pd.Series(range(6), index=mi)
```
```python
>>> ser1.loc["d"] = 10
ValueError: 'fill_value=d' is not present in this Categorical's categories
>>> ser2.loc[4] = 10
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.
>>> ser2.loc["foo"] = 10  # <-- casts to object
>>> ser3.loc[dti[0], dti[0]] = 10
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'
>>> ser3.loc[3, 4] = 10
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.
>>> ser3.loc[3, "a"] = 10  # <-- casts level to object
```
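I see two options:
- Use consistent casting rules for the new item being inserted: allow the insertion iff the index's dtype can be retained, and raise otherwise.
- Always cast to a dtype that can hold the new item (e.g. object), and never raise.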
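The two options can be contrasted in a deliberately simplified model. Everything in this sketch is hypothetical: dtypes are plain strings, and `infer_dtype`, `can_hold`, `common_dtype` and the `insert_option*` functions are made-up helpers, not pandas internals. The casting table only knows about int64, float64, datetime64 and object.

```python
from datetime import datetime


def infer_dtype(value) -> str:
    """Toy dtype inference for a single scalar (not pandas' real logic)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, datetime):
        return "datetime64[ns]"
    return "object"


def can_hold(dtype: str, value) -> bool:
    """True if `value` fits into an index of `dtype` without changing the dtype."""
    vdtype = infer_dtype(value)
    if dtype == "object" or vdtype == dtype:
        return True
    # In this toy model, ints fit into a float64 index.
    return dtype == "float64" and vdtype == "int64"


def common_dtype(dtype: str, value) -> str:
    """Toy common dtype: int64/float64 -> float64, anything else -> object."""
    vdtype = infer_dtype(value)
    if vdtype == dtype:
        return dtype
    if {dtype, vdtype} == {"int64", "float64"}:
        return "float64"
    return "object"


def insert_option1(dtype: str, value) -> str:
    """Option 1: allow the insert iff the dtype can be retained, else raise."""
    if can_hold(dtype, value):
        return dtype
    raise TypeError(f"cannot insert {value!r} into a {dtype} index")


def insert_option2(dtype: str, value) -> str:
    """Option 2: never raise; cast to a common dtype when needed."""
    return dtype if can_hold(dtype, value) else common_dtype(dtype, value)


for value in (datetime(2016, 1, 7), 4, "foo"):
    try:
        strict = insert_option1("datetime64[ns]", value)
    except TypeError:
        strict = "raises TypeError"
    print(f"{value!r}: option 1 -> {strict}, option 2 -> {insert_option2('datetime64[ns]', value)}")
```

Under this model, option 1 would make `ser2.loc["foo"] = 10` raise just like `ser2.loc[4] = 10`, whereas option 2 would make both of them cast the index to object.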
Option 2 seems more user-friendly, and it is necessary for, e.g., some of our crosstab tests, which insert "All" (the default margins label). AFAICT that is why DTI/TDI (DatetimeIndex/TimedeltaIndex) special-case strings by casting to object and raise for everything else.
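The crosstab case can be reproduced with public API alone. With `margins=True`, `pd.crosstab` appends a row and a column labelled "All" (the default `margins_name`); when the rows are dates, that label cannot live in a datetime64 index. The data below is made up for illustration, and the exact index dtype printed may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Made-up data: three observations on two dates, labelled "x" or "y".
dates = pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02"])
# Use an array rather than a list: crosstab reads a plain list as several grouping keys.
labels = np.array(["x", "y", "x"])

# margins=True adds an "All" row and column holding the totals,
# so "All" has to be inserted next to the Timestamps in the row index.
table = pd.crosstab(dates, labels, margins=True)
print(table)
print(table.index.dtype)
```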