Feature request: Protocol for converting something to a pandas DataFrame #30218

Open
@amueller

This is inspired by the discussion in scikit-learn/enhancement_proposals#25.

NumPy defines an `__array__` protocol that lets developers implement classes that can be converted to an array by calling `np.asarray()`. That makes it easy to have a common interface between libraries, and it's heavily used by pandas and sklearn.
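For readers who have not used the NumPy hook, here is a minimal sketch of how it works. The `Temperatures` class is a made-up example, not something from NumPy or pandas.

```python
import numpy as np


class Temperatures:
    """Toy container (hypothetical) that is not an ndarray but converts to one."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy looks up this method when the object is passed to np.asarray().
        # The copy keyword is accepted for compatibility with newer NumPy
        # versions; this sketch always builds a fresh array.
        return np.array(self._values, dtype=dtype)


# A consumer only needs np.asarray(); it never has to know about Temperatures.
arr = np.asarray(Temperatures([20.5, 21.0, 19.8]))
```

The consuming library calls `np.asarray()` and stays agnostic about the input type, which is the property the request below wants for dataframes.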

It would be great to have a similar protocol for converting something to a pandas DataFrame. The goal would be to allow users to pass other data structures to libraries that expect a dataframe, say seaborn, as long as the data structures allow conversion to pd.DataFrame.

A workaround is for the developer of the new data structure to provide an `.asframe` method, but that creates friction and requires users to know what data type a particular library or function expects. If instead the developer of the data structure can declare that conversion to a dataframe is possible, the library author (say seaborn) can request conversion to a dataframe in a unified manner.

The implementation of this is probably pretty simple as it requires "only" a special case in pd.DataFrame.__init__. The main work is probably in adding it to developer documentation and publicizing it correctly.
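To make the idea concrete, here is a rough sketch of what such a protocol could look like. Everything named here is an assumption for illustration: the hook name `__pandas_dataframe__`, the helper `as_dataframe`, and the `Measurements` class do not exist in pandas. In an actual implementation the check would presumably live inside the `pd.DataFrame` constructor rather than in a separate helper.

```python
import pandas as pd


def as_dataframe(obj):
    """Hypothetical helper: convert obj to a DataFrame via an opt-in hook."""
    if isinstance(obj, pd.DataFrame):
        return obj
    # "__pandas_dataframe__" is an assumed name, not an existing pandas API.
    hook = getattr(type(obj), "__pandas_dataframe__", None)
    if hook is not None:
        return hook(obj)
    # Fall back to the regular constructor for dicts, arrays, and so on.
    return pd.DataFrame(obj)


class Measurements:
    """Third-party data structure (hypothetical) that opts in to conversion."""

    def __init__(self, names, values):
        self.names = list(names)
        self.values = list(values)

    def __pandas_dataframe__(self):
        return pd.DataFrame({"name": self.names, "value": self.values})


# A library that expects a dataframe would call the helper on its input.
df = as_dataframe(Measurements(["a", "b"], [1.0, 2.0]))
```

With something like this, the author of `Measurements` declares once that conversion is possible, and a library such as seaborn would not need to know about an `.asframe` method or any other type-specific API.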

cc @jorisvandenbossche
