API: SparseArray.astype behaviour to always preserve sparseness #34457

Closed
@jorisvandenbossche

Description

Currently, SparseArray.astype always converts the specified target dtype to a sparse dtype if it is not already one. For example:

In [64]: arr = pd.arrays.SparseArray([1, 0, 0, 2])  

In [65]: arr   
Out[65]: 
[1, 0, 0, 2]
Fill: 0
IntIndex
Indices: array([0, 3], dtype=int32)

In [66]: arr.astype(float)  
Out[66]: 
[1.0, 0.0, 0.0, 2.0]
Fill: 0.0
IntIndex
Indices: array([0, 3], dtype=int32)

This ensures that a simple astype doesn't densify the sparse array (so you don't need to write astype(pd.SparseDtype(float, fill_value))).
Note that Series.astype(..) gets the same behaviour.

But this also introduces the inconsistency that arr.astype(target_dtype).dtype != target_dtype, so you cannot rely on getting back an array of the actual dtype that you specified.
See e.g. the workaround I need to add for this in #34338.
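
To make the inconsistency concrete, here is a minimal sketch comparing the two spellings. The variable names are only illustrative. The explicit pd.SparseDtype target should always round-trip to the requested dtype. For the plain `float` target, the output above shows a sparse result, but other pandas versions may behave differently, so the sketch reports what it sees rather than assuming an outcome.

```python
import numpy as np
import pandas as pd

arr = pd.arrays.SparseArray([1, 0, 0, 2])

# Explicit sparse target: the resulting dtype equals the requested one.
target = pd.SparseDtype(np.float64, fill_value=0.0)
explicit = arr.astype(target)
explicit_matches = explicit.dtype == target

# Plain target: check whether the result stayed sparse and whether its
# dtype equals what was asked for. No particular outcome is assumed here.
implicit = arr.astype(float)
implicit_is_sparse = isinstance(implicit.dtype, pd.SparseDtype)
implicit_matches = implicit.dtype == np.dtype(float)

print("explicit dtype matches target:", explicit_matches)
print("astype(float) stayed sparse:  ", implicit_is_sparse)
print("astype(float) dtype == float: ", implicit_matches)
```

Code that needs a guaranteed result dtype can pass the explicit pd.SparseDtype (to stay sparse) or call np.asarray(arr, dtype=float) (to get a dense array), instead of relying on what a bare astype(float) returns.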
