
Should Pandas adopt a naming convention for its protocol methods #26915

Closed
@ghost

Description

numpy has defined various protocol extensions over the years, and has consistently named them with an array prefix and "dunder" notation.

pandas hasn't been as diligent. The one example I've been dealing with is _reduce. Originally an internal Series method (not a protocol), it now dispatches to _reduce for subclasses of ExtensionArray.

pandas/pandas/core/series.py

Lines 3743 to 3745 in baa77c3

elif isinstance(delegate, ExtensionArray):
    # dispatch to ExtensionArray interface
    return delegate._reduce(name, skipna=skipna, **kwds)
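
To make the dispatch above easier to follow, here is a minimal, pandas-free sketch. ToyExtensionArray, ToySeries and MoneyArray are hypothetical stand-ins, not pandas classes; only the call shape `_reduce(name, skipna=skipna, **kwds)` is taken from the snippet, and the base-class behaviour of raising TypeError is an assumption made for illustration.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for the ExtensionArray base class."""

    def _reduce(self, name, skipna=True, **kwds):
        # Assumption for this sketch: the base class supports no reductions.
        raise TypeError(f"cannot perform {name} with type {type(self).__name__}")


class MoneyArray(ToyExtensionArray):
    """Hypothetical third-party array storing amounts in cents."""

    def __init__(self, cents):
        self.cents = list(cents)

    def _reduce(self, name, skipna=True, **kwds):
        # Reads like a private helper, but it overrides the base-class hook.
        # This sketch handles skipna only by dropping None entries.
        values = [c for c in self.cents if c is not None] if skipna else self.cents
        if name == "sum":
            return sum(values)
        return super()._reduce(name, skipna=skipna, **kwds)


class ToySeries:
    """Hypothetical stand-in for Series, reduced to the dispatch branch."""

    def __init__(self, values):
        self._values = values

    def _reduce(self, name, skipna=True, **kwds):
        delegate = self._values
        if isinstance(delegate, ToyExtensionArray):
            # dispatch to the array's own implementation
            return delegate._reduce(name, skipna=skipna, **kwds)
        raise NotImplementedError("non-extension paths are omitted in this sketch")

    def sum(self, skipna=True):
        return self._reduce("sum", skipna=skipna)


s = ToySeries(MoneyArray([100, None, 250]))
total = s.sum()  # routed through MoneyArray._reduce
```

In this toy version, a reduction the array does not handle (for example "min") falls through to the base class and raises TypeError.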

When reading the code for a new EA project, it's hard to pick out that _reduce is actually an override of a parent-class method, rather than just an internal function written by the EA author. Something like __pandas_reduce__ would have made this clearer.
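
As a rough illustration of the readability argument, the sketch below uses the suggested name. __pandas_reduce__ is only the name floated in this issue, not an existing pandas API, and ToyExtensionArray, TemperatureArray and reduce_series_values are hypothetical names invented for the example.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for the ExtensionArray base class."""

    def __pandas_reduce__(self, name, skipna=True, **kwds):
        # Hypothetical hook name from the suggestion above; not a pandas API.
        raise TypeError(f"cannot perform {name} with type {type(self).__name__}")


class TemperatureArray(ToyExtensionArray):
    """Hypothetical array written by an EA author."""

    def __init__(self, celsius):
        self._celsius = list(celsius)

    def _present(self):
        # Genuinely private helper: single underscore, no base-class contract.
        return [c for c in self._celsius if c is not None]

    def __pandas_reduce__(self, name, skipna=True, **kwds):
        # The dunder name signals an override of a hook the caller dispatches to.
        # This sketch always skips None entries, regardless of skipna.
        if name == "max":
            return max(self._present())
        return super().__pandas_reduce__(name, skipna=skipna, **kwds)


def reduce_series_values(values, name, skipna=True, **kwds):
    """Hypothetical dispatcher with the same isinstance check as the Series branch."""
    if isinstance(values, ToyExtensionArray):
        return values.__pandas_reduce__(name, skipna=skipna, **kwds)
    raise NotImplementedError("non-extension paths are omitted in this sketch")


hottest = reduce_series_values(TemperatureArray([21.5, None, 30.0]), "max")
```

With this naming, _present and __pandas_reduce__ are visibly different kinds of method, even though the dispatcher still requires an ExtensionArray subclass.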

Due to inexperience with EA, I lost a bit of time figuring out why s.sum() wasn't invoking the EA implementation I thought it would.

Granted, it's not exactly a protocol. For the foreseeable future, this will always be an ExtensionArray subclass, rather than a duck-typed object. But for readability, and while EA is experimental, this might be the right time to clean it up.

Metadata


Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)
