
Efficiency of SparseArray.__getitem__(SparseArray[bool]) #23122

Closed
@TomAugspurger

Description

`SparseArray.__getitem__` currently densifies a boolean `SparseArray` key:

```python
# Excerpt from SparseArray.__getitem__; `key` is the indexer argument.
# TODO: I think we can avoid densifying when masking a
# boolean SparseArray with another. Need to look at the
# key's fill_value for True / False, and then do an intersection
# on the indices of the sp_values.
if isinstance(key, SparseArray):
    if is_bool_dtype(key):
        # Densifies the whole boolean key, even if it is mostly fill_value.
        key = key.to_dense()
    else:
        key = np.asarray(key)
```
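The TODO hinges on the key's `fill_value`. A minimal sketch of that check in plain NumPy follows; it assumes a hypothetical representation of the boolean key as its stored values, their sorted positions, its `fill_value`, and its length (roughly mirroring `sp_values` and the sparse index). The name `true_positions` is illustrative, not pandas API.

```python
import numpy as np


def true_positions(sp_values, indices, fill_value, length):
    """Return the sorted positions where a sparse boolean key is True.

    Hypothetical helper: the key is described by its stored values
    (sp_values), the sorted positions of those values (indices),
    its fill_value, and its total length.
    """
    sp_values = np.asarray(sp_values, dtype=bool)
    indices = np.asarray(indices, dtype=np.intp)
    if not fill_value:
        # fill_value is False: only stored True values are selected,
        # so the work is proportional to the number of stored values.
        return indices[sp_values]
    # fill_value is True: every position is True except the stored False
    # ones. The result can have up to `length` entries, so this branch
    # is O(length) regardless.
    mask = np.ones(length, dtype=bool)
    mask[indices[~sp_values]] = False
    return np.flatnonzero(mask)
```

With `fill_value=False` the selected positions come straight from the stored entries, which is the case where skipping densification should pay off most.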

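I haven't investigated it, but we should be able to apply a boolean mask by intersecting the sparse indices of `self` and `key`. If `key` is `SparseDtype[bool, False]` (i.e. `False` is the `fill_value`), this should be a lot faster: only positions stored in `key` can be `True`, so the work scales with the number of stored values rather than with the length of the array.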
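A minimal sketch of the intersection idea for the `fill_value=False` key, in plain NumPy rather than pandas internals. `mask_sparse` and its argument layout are hypothetical: `values`/`indices` stand for `self`'s stored values and their sorted positions, `key_values`/`key_indices` for the key's. The result keeps `self`'s `fill_value`, so only the stored values, their new positions, and the new length are returned.

```python
import numpy as np


def mask_sparse(values, indices, key_values, key_indices):
    """Select elements of a sparse array where a sparse boolean key is True.

    Hypothetical helper assuming the key's fill_value is False, so the
    selected positions are exactly the key's stored positions holding True.
    Returns (new_values, new_indices, new_length); the fill_value of the
    input carries over unchanged.
    """
    values = np.asarray(values)
    indices = np.asarray(indices, dtype=np.intp)
    key_indices = np.asarray(key_indices, dtype=np.intp)
    # Sorted positions selected by the key.
    selected = key_indices[np.asarray(key_values, dtype=bool)]
    # Positions both selected by the key and stored in self.
    _, in_self, in_selected = np.intersect1d(
        indices, selected, assume_unique=True, return_indices=True
    )
    # A selected position's rank within `selected` is its output position;
    # selected positions not stored in self become fill_value in the result.
    return values[in_self], in_selected, len(selected)
```

For example, `self` with stored values `[10, 20, 30]` at positions `[1, 3, 6]` (fill `0`) masked by a key storing `[True, True, False, True]` at `[1, 2, 3, 6]` selects positions 1, 2, and 6, giving stored values `[10, 30]` at output positions `[0, 2]` in a result of length 3. `np.intersect1d` sorts its concatenated inputs; since both index arrays are already sorted, a linear merge would also work, but `intersect1d` keeps the sketch short.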

Labels: Performance (Memory or execution speed performance), Sparse (Sparse Data Type)
