
Column lookups using str subclasses fail on DataFrames with DateTime indexes #37366

Closed
@bkurtz

Description


Consider the following code:

```python
import pandas as pd

x = pd.DataFrame({"a": [1]}, index=pd.DatetimeIndex(["2020-10-22 01:21:00+00:00"]))
x["b"] = 2

class mystring(str):
    pass

x[mystring("b")]  # works
x[mystring("c")] = 3  # error! (TypeError with a DatetimeIndex)
```
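Until the lookup path is fixed, one possible workaround is to coerce the key to an exact `str` before assigning. Calling `str()` on a `str` subclass that does not override `__str__` returns a plain `str` copy, so the key no longer trips the datetime parsing step. This is a minimal sketch that assumes the failure is triggered only by the key's type, as described in this report:

```python
import pandas as pd

class mystring(str):
    pass

x = pd.DataFrame({"a": [1]}, index=pd.DatetimeIndex(["2020-10-22 01:21:00+00:00"]))
x["b"] = 2

key = mystring("c")
plain = str(key)  # exact str copy, since mystring does not override __str__
assert type(plain) is str
x[plain] = 3  # assign with the plain-str key instead of the subclass instance
```

This does not carry over to `str`-mixin enums, whose `__str__` is overridden by `enum.Enum`.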

We are specifically interested in this because we've been trying to move towards using enum.Enum classes to remove column-name string constants from our code. That works well in many cases but fails badly when there's a datetime index. As an example, we might do something like:

```python
import enum

@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"
```
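Because `ColNames` mixes in `str`, each member is itself an instance of a `str` subclass, so using it as a column key runs into the same failure as `mystring`. Note that `str()` on such a member returns the qualified member name (e.g. `ColNames.NAME`) rather than the column name, so `.value` is the reliable way to get the underlying string. A minimal sketch of a key-normalizing helper; `plain_key` is a hypothetical name, not a pandas API:

```python
import enum

@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"

def plain_key(key):
    """Hypothetical helper: turn str subclasses into exact str column keys."""
    if isinstance(key, enum.Enum):
        key = key.value  # the column name, not str(member)
    if isinstance(key, str):
        return str.__str__(key)  # exact str copy, bypassing subclass overrides
    return key  # non-string keys (ints, tuples, ...) pass through unchanged

assert isinstance(ColNames.POWER_W, str)  # members are str instances...
assert type(ColNames.POWER_W) is not str  # ...but of a subclass
```

With this, `x[plain_key(ColNames.POWER_W)] = 3` would assign a column named `"power (watts)"` using an exact `str` key.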

As far as I can tell, what's going on is:

  1. When indexing, pandas (for reasons I'm not sure I understand) first tries to interpret the key as a row slice
  2. That function will happily accept str subclasses (and works as expected if the column already exists)
  3. But with a DatetimeIndex, it eventually gets here, where str subclasses are apparently no longer accepted, and a TypeError is raised.

I see two easy solutions (i.e., ones I could easily submit a PR for):

  1. Convert the key to a normal string either here or in the subclass implementations thereof
  2. Watch for TypeErrors here in addition to the other known types there, allowing it to gracefully continue on and try the key as a column name (which will then succeed)
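The two proposed fixes can be illustrated with a toy model of the lookup order described above. This is not pandas code: `strict_parse` and `set_column` are hypothetical stand-ins, and the only behavior they model is "try the key as a row label first, then fall back to a column on known errors", with a parser that rejects `str` subclasses.

```python
class mystring(str):
    """A trivial str subclass, as in the report."""

def strict_parse(key):
    # Hypothetical stand-in for the low-level datetime parser: it accepts
    # only exact str and rejects subclasses with TypeError.
    if type(key) is not str:
        raise TypeError(f"expected str, got {type(key).__name__}")
    # In this toy, no column name parses as a datetime.
    raise ValueError(f"could not parse {key!r} as a datetime")

def set_column(columns, key, value, *, coerce=False, catch_type_error=False):
    if coerce and isinstance(key, str):
        key = str.__str__(key)  # solution 1: convert to an exact str
    known = (KeyError, ValueError)
    if catch_type_error:
        known += (TypeError,)  # solution 2: also treat TypeError as "not a row key"
    try:
        strict_parse(key)  # step 1: try the key as a row label
    except known:
        pass  # not a row key, so fall through to column assignment
    columns[key] = value
    return columns
```

Without either option, `set_column({}, mystring("c"), 3)` raises `TypeError`; with `coerce=True` or `catch_type_error=True` it falls through and assigns the column.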

I can also envision fixing this by:

  3. extending the low-level parsing function to work better with this case, or
  4. finishing the deprecation of row lookups using regular frame[indx] notation.

However, solution 3 is outside my comfort zone, and 4 seems like it would be more involved anyway.
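Under option 4, row selection by date would go only through `.loc`, leaving `frame[key]` to mean column selection. A small sketch of the two explicit forms, using the frame from the report (it assumes `.loc` partial-string indexing on a DatetimeIndex, where a date string selects all rows on that day):

```python
import pandas as pd

x = pd.DataFrame({"a": [1]}, index=pd.DatetimeIndex(["2020-10-22 01:21:00+00:00"]))
x["b"] = 2

# Rows: explicit label-based selection; "2020-10-22" matches every row on that day.
rows = x.loc["2020-10-22"]

# Columns: plain [] lookup, which under option 4 would only ever mean a column.
col = x["a"]
```

Because `rows` comes from a coarser-than-index date string, it is a DataFrame rather than a single row.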

It feels to me like solution 3 is the "correct" resolution to this problem, and that 1 is almost as good (if applied in all the right places). But since it's not my project, I wanted to get feedback before trying to jump into any of these.
