Description
Consider the following code:
import pandas as pd

x = pd.DataFrame({"a": [1]}, index=pd.DatetimeIndex(["2020-10-22 01:21:00+00:00"]))
x["b"] = 2

class mystring(str):
    pass

x[mystring("b")]      # works: column "b" already exists
x[mystring("c")] = 3  # error: column "c" does not exist yet
We are specifically interested in this because we've been moving towards enum.Enum classes to remove column-name string constants from our code. That works well in many cases, but fails badly when there's a datetime index. Just as an example, we might do something like
import enum

@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"
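To illustrate why such an enum looks like it should work as a column key: members of a str-mixin enum compare and hash like their plain-string values, yet their type is not exactly str. A minimal stdlib-only sketch, where the `columns` dict is a hypothetical stand-in for a column mapping and not pandas code:

```python
import enum

@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"

# Hypothetical stand-in for a column mapping; not pandas internals.
columns = {"name": ["pump"], "power (watts)": [750]}

# Members compare equal to, and hash like, their plain-string values,
# so an ordinary hash-based lookup finds the right entry.
found = columns[ColNames.POWER_W]

# The members are str instances, but not exactly of type str.
is_str_instance = isinstance(ColNames.NAME, str)
is_exact_str = type(ColNames.NAME) is str
```

So any code path that checks for the exact str type, rather than using isinstance, may treat these keys differently from plain strings.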
As far as I can tell, what's going on is:
- When indexing, pandas (for reasons I'm not sure I understand) first tries to interpret the key as a row slice
- That function will happily accept str subclasses (and works as expected if the column already exists)
- But with a DatetimeIndex, it eventually gets here, where apparently str subclasses are no longer accepted and a TypeError is raised.
I see two easy solutions (i.e. ones that I could easily submit a PR for):
1. Convert the key to a normal string, either here or in the subclass implementations thereof
2. Watch for TypeErrors here in addition to the other known types there, allowing it to gracefully continue on and try the key as a column name (which will then succeed)
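The first option, converting the key to a normal string, could look roughly like the sketch below. `to_plain_str` is a hypothetical helper, not an existing pandas function, and where it would be called is left open. It uses `str.__str__` rather than `str(key)` because a subclass (an enum in particular) may override `__str__` and return something other than the underlying value.

```python
import enum

def to_plain_str(key):
    """Return an exact str for str subclasses; pass other keys through."""
    if isinstance(key, str) and type(key) is not str:
        # Bypass any __str__ override on the subclass (e.g. Enum's).
        return str.__str__(key)
    return key

class mystring(str):
    pass

@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"

plain_a = to_plain_str(mystring("c"))   # plain "c"
plain_b = to_plain_str(ColNames.NAME)   # plain "name"
untouched = to_plain_str(42)            # non-string keys are returned as-is
```

On the caller's side, passing `ColNames.NAME.value` instead of the member should have a similar effect for enum keys, at the cost of doing it at every call site.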
I can also envision fixing this by
3. extending the low-level parsing function to work better with this case, or
4. finishing the deprecation of row lookups using regular frame[indx] notation
However, solution 3 is outside my comfort zone, and 4 seems like it might be more involved anyway.
It feels to me like solution 3 is the "correct" resolution to this problem, and that 1 is almost as good (if applied in all the right places), but since it's not my project, I wanted to get feedback before jumping into any of these.