Description
This is inspired by the discussion in scikit-learn/enhancement_proposals#25.
NumPy defines an __array__ protocol that allows developers to implement classes that can be converted to an array by calling np.asarray(). That makes it easy to have a common interface between libraries, and it's heavily used by pandas and sklearn.
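For readers unfamiliar with the existing NumPy hook, here is a minimal sketch of how it works. The Measurements class is a made-up example container, not part of any library; only __array__ and np.asarray() are real NumPy API.

```python
import numpy as np


class Measurements:
    """Toy container (hypothetical) that is not an ndarray but can become one."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when the object is passed to np.asarray().
        # The optional arguments keep the signature compatible with both
        # older and newer NumPy releases.
        return np.array(self._values, dtype=dtype)


# Any library that calls np.asarray() on its inputs can now accept Measurements.
arr = np.asarray(Measurements([1.0, 2.0, 3.0]))
```

The consuming library never needs to know about Measurements; it only calls np.asarray().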
It would be great to have a similar protocol for converting something to a pandas DataFrame. The goal would be to allow users to pass other data structures to libraries that expect a dataframe, say seaborn, as long as the data structures allow conversion to pd.DataFrame.
A workaround is for the developer of the new data structure to provide an .asframe method, but that creates friction and requires users to know what data type a particular library or function expects. If instead the developer of the data structure can declare that conversion to a dataframe is possible, the library author (say seaborn) can request conversion to a dataframe in a unified manner.
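To illustrate what "requesting conversion in a unified manner" could look like from the library author's side, here is a rough sketch. The method name __pandas_dataframe__ is purely a placeholder I made up for this example (pandas has no such hook today), and Table, coerce_to_frame and column_means are hypothetical names too.

```python
import pandas as pd


class Table:
    """Hypothetical third-party structure that declares it is convertible."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __pandas_dataframe__(self):
        # Placeholder protocol method (assumed name): return a DataFrame.
        return pd.DataFrame(self._columns)


def coerce_to_frame(obj):
    """Library-side helper: accept a DataFrame or anything declaring the hook."""
    if isinstance(obj, pd.DataFrame):
        return obj
    convert = getattr(obj, "__pandas_dataframe__", None)
    if callable(convert):
        return convert()
    # Fall back to the normal constructor (dicts, lists of records, ...).
    return pd.DataFrame(obj)


def column_means(data):
    """Stand-in for a library function (say, in seaborn) that expects a dataframe."""
    frame = coerce_to_frame(data)
    return frame.mean(numeric_only=True)


# The user passes a Table directly and never calls a conversion method.
means = column_means(Table({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}))
```

With a protocol built into pandas, the helper would reduce to a plain pd.DataFrame(data) call, and every library would not have to reimplement it.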
The implementation of this is probably pretty simple as it requires "only" a special case in pd.DataFrame.__init__. The main work is probably in adding it to developer documentation and publicizing it correctly.
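As a rough idea of what that special case might amount to, here is a sketch that emulates it in a subclass instead of patching pandas itself. Again, __pandas_dataframe__ is an assumed placeholder name, and ProtocolAwareFrame and Readings are hypothetical; the real change would live inside pd.DataFrame.__init__ and might well look different.

```python
import pandas as pd


class ProtocolAwareFrame(pd.DataFrame):
    """Emulates the proposed special case in a subclass (illustration only)."""

    def __init__(self, data=None, *args, **kwargs):
        # The "special case": if the input declares the (assumed) hook,
        # let it convert itself before the regular constructor logic runs.
        convert = getattr(data, "__pandas_dataframe__", None)
        if callable(convert):
            data = convert()
        super().__init__(data, *args, **kwargs)


class Readings:
    """Hypothetical structure implementing the placeholder protocol."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __pandas_dataframe__(self):
        return pd.DataFrame(self._rows, columns=["sensor", "value"])


frame = ProtocolAwareFrame(Readings([("a", 1.5), ("b", 2.5)]))
```

Inputs without the hook (dicts, arrays, existing DataFrames) take the normal constructor path unchanged, which is why the change to pandas itself should stay small.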