Implement DataFrame.__array_ufunc__

Applying a ufunc to a `DataFrame` with sparse columns does not retain the sparse dtype:

```
In [105]: df = pd.SparseDataFrame(np.array([[0, 1, 0], [1, 0, 1]]),
                                  columns=['a', 'b', 'c'], default_fill_value=0)
In [106]: df2 = pd.DataFrame(df)

In [107]: np.exp(df)['a']
Out[107]: 
0    1.000000
1    2.718282
Name: a, dtype: Sparse[float64, 0]
BlockIndex
Block locations: array([0], dtype=int32)
Block lengths: array([2], dtype=int32)

In [108]: np.exp(df2)['a']
Out[108]: 
0    1.000000
1    2.718282
Name: a, dtype: float64
```

Although `SparseDataFrame` returns the correct result here, I am not sure it actually *works* as desired: I don't know whether it avoids materializing the full dense data, which should in principle be avoidable.
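For reference, here is a minimal sketch of what "not materializing" could look like at the array level. It assumes a recent pandas where `pd.arrays.SparseArray` is available (the session above uses the older `SparseDataFrame` API), and it only inspects the result of the ufunc; it does not prove what happens internally.

```python
import numpy as np
import pandas as pd

# A sparse array with a single stored (non-fill) value.
arr = pd.arrays.SparseArray([0, 1, 0, 0], fill_value=0)

# Calling the ufunc on the array itself lets the array's own
# __array_ufunc__ decide how to compute the result.
result = np.exp(arr)

# The result is still sparse: one stored value, plus a transformed fill value.
print(type(result).__name__)
print(result.sp_values)
print(result.fill_value)
```

If the result keeps a single stored value and a fill value of `exp(0)`, the ufunc was applied to the stored values and the fill value rather than to a dense copy.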

--- 

*edit from Tom*

Implementing `DataFrame.__array_ufunc__` is probably the best way to do this.

The semantics will be similar to
[Series.__array_ufunc__](https://dev.pandas.io/development/extending.html#numpy-universal-functions),
but applied blockwise.

1. `Series` and `DataFrame` objects in `inputs` will first be aligned.
2. All arrays will be unboxed from their blocks.
3. The ufunc will be applied to each array. If the array defines `__array_ufunc__`, it will be called.
4. The results will be re-boxed in a `DataFrame` with the original labels.
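The four steps above can be sketched roughly as follows. This is an illustration only, not the pandas implementation: `apply_ufunc_columnwise` is a hypothetical helper, it works column by column rather than block by block, it handles only one or two `DataFrame` inputs, and it assumes unique column labels.

```python
import numpy as np
import pandas as pd


def apply_ufunc_columnwise(ufunc, *frames):
    """Hypothetical helper: apply `ufunc` column by column to one or two DataFrames."""
    if len(frames) == 2:
        # Step 1: align both inputs on index and columns (outer join by default).
        frames = frames[0].align(frames[1])
    first = frames[0]
    results = {}
    for pos in range(first.shape[1]):
        # Step 2: unbox each column's array (NumPy-backed or ExtensionArray).
        arrays = [frame.iloc[:, pos].array for frame in frames]
        # Step 3: NumPy dispatches to the array's __array_ufunc__ if it defines one.
        results[pos] = ufunc(*arrays)
    # Step 4: re-box the results with the (aligned) labels.
    out = pd.DataFrame(results, index=first.index)
    out.columns = first.columns
    return out


df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 1]), "b": [1.0, 2.0]})
out = apply_ufunc_columnwise(np.exp, df)
print(out.dtypes)
```

Because the ufunc is called on the unboxed array rather than on a dense copy of the whole frame, a sparse column can come back sparse while a dense column stays dense.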

There are some additional complications with dimensionality, shapes, and broadcasting, but the basic idea of applying `__array_ufunc__` blockwise, so that the underlying array's `__array_ufunc__` is called, makes sense.
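To make the dispatch idea concrete, here is a toy sketch of a container whose `__array_ufunc__` forwards the ufunc to each array it holds. `BlockContainer` is a hypothetical stand-in, not a pandas class, and it deliberately ignores alignment, `out=`, ufunc methods such as `reduce`, and multi-output ufuncs.

```python
import numpy as np


class BlockContainer:
    """Hypothetical toy container holding a list of arrays ("blocks")."""

    def __init__(self, arrays):
        self.arrays = list(arrays)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls like np.exp(x) are handled in this sketch.
        if method != "__call__" or kwargs.get("out") is not None:
            return NotImplemented
        results = []
        for i in range(len(self.arrays)):
            # Swap each container input for its i-th array; pass other inputs through.
            args = [x.arrays[i] if isinstance(x, BlockContainer) else x for x in inputs]
            # Calling the ufunc again lets each array's own __array_ufunc__ take over.
            results.append(ufunc(*args, **kwargs))
        return BlockContainer(results)


blocks = BlockContainer([np.array([0.0, 1.0]), np.array([2.0])])
exped = np.exp(blocks)
shifted = np.add(blocks, 1)
print([a.tolist() for a in shifted.arrays])
```

The container never computes anything itself; it only unpacks, delegates, and repacks, which is the part of the design that lets sparse or other extension arrays keep their own semantics.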

Implement DataFrame.__array_ufunc__ #23743