Move year to .dt.year, and other namespace-specific functions #341

Closed
@MarcoGorelli

We'd originally decided not to bother with temporal (and other) namespaces, and to just have .year be a column method rather than .dt.year.

I'm suggesting we go back on this decision because of the risk of naming conflicts. I'll explain.

pandas is adding nested datatypes (e.g. pyarrow struct in pandas-dev/pandas#54977, and pyarrow list in pandas-dev/pandas#55777).

Some methods on those datatypes will clash with the column/dataframe ones. For example:

  • Column.get_value(2) means "get the value from the row at position 2"
  • Column.list.get_value(2) means "for each row, get the element at position 2 of that row's list"

Likewise for Column.mean vs Column.list.mean: the former is a reduction, the latter a transformation (it preserves the input shape).
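To make the clash concrete, here's a rough sketch in plain Python. The `Column` and `ListNamespace` classes below are toy stand-ins I made up for illustration, not the standard's actual classes, and the method bodies are only assumptions about how such methods might behave.

```python
from statistics import fmean


class ListNamespace:
    """Hypothetical per-row operations for a column whose rows are lists."""

    def __init__(self, column):
        self._column = column

    def get_value(self, index):
        # Transformation: pick one element out of every row's list.
        return Column([row[index] for row in self._column.values])

    def mean(self):
        # Transformation: one mean per row, so the output has as many rows as the input.
        return Column([fmean(row) for row in self._column.values])


class Column:
    """Toy column (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def get_value(self, row_number):
        # Row lookup: returns a single value from the column.
        return self.values[row_number]

    def mean(self):
        # Reduction: collapses the whole column to one scalar.
        return fmean(self.values)

    @property
    def list(self):
        # Namespace accessor: keeps the list-specific methods apart from the column ones.
        return ListNamespace(self)


numbers = Column([1.0, 2.0, 3.0])
nested = Column([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

scalar_value = numbers.get_value(2)          # a single value
scalar_mean = numbers.mean()                 # a single value
per_row_value = nested.list.get_value(2)     # a Column with one entry per row
per_row_mean = nested.list.mean()            # a Column with one entry per row
```

Without the `.list` namespace, both meanings of `get_value` and `mean` would have to live under the same name on `Column`, which is the conflict described above.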

Presumably, we'll eventually have nested datatypes in the standard too?

So, I'd suggest we mirror existing dataframe libraries and have namespaces for functionality which is limited to certain datatypes:

  • .dt for temporal functions
  • .str for string manipulation functions
  • later, if/when we have nested datatypes, .list / .struct, and .cat for categorical
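As a rough sketch of what dtype-limited namespaces could look like, here is a toy version in plain Python. Everything in it is hypothetical (the class names, `year()` and `upper()` being methods rather than properties, and the `TypeError` on a dtype mismatch); it only shows the shape of the idea, not a proposed API.

```python
from datetime import date


class DtNamespace:
    """Hypothetical namespace for temporal functions."""

    def __init__(self, values):
        # Assumption: using the namespace on a non-temporal column is an error.
        if not all(isinstance(v, date) for v in values):
            raise TypeError(".dt is only available for temporal columns")
        self._values = values

    def year(self):
        return Column([v.year for v in self._values])


class StrNamespace:
    """Hypothetical namespace for string manipulation functions."""

    def __init__(self, values):
        if not all(isinstance(v, str) for v in values):
            raise TypeError(".str is only available for string columns")
        self._values = values

    def upper(self):
        return Column([v.upper() for v in self._values])


class Column:
    """Toy column (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def dt(self):
        return DtNamespace(self.values)

    @property
    def str(self):
        return StrNamespace(self.values)


dates = Column([date(2020, 1, 1), date(2021, 5, 17)])
names = Column(["ada", "grace"])

years = dates.dt.year()      # temporal function lives under .dt
shouted = names.str.upper()  # string function lives under .str
```

With this layout, datatype-specific methods never compete for names with the general column methods, and a misuse such as `names.dt.year()` fails at the namespace boundary.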
