groupby().sum() very slow when applied to boolean columns #2692

Closed
@lselector

While upgrading pandas from 0.7.2 to 0.9.1, we found that certain groupby().sum() operations became very slow. Here is a simple example:

from pandas import DataFrame

N = 10000
aa = DataFrame({'ii': range(N), 'bb': [True] * N})
%timeit aa.sum()                # fast
%timeit aa.groupby('bb').sum()  # fast (sums the integer column 'ii')
%timeit aa.groupby('ii').sum()  # very slow, ~1000 times slower (sums the boolean column 'bb')
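
To check outside IPython whether the boolean dtype is what makes the last call slow, the same groupby can be timed against a copy of the frame with `bb` cast to integers. This is only a sketch: the `astype(int)` comparison, the `aa_int` frame, and the `time_groupby_sum` helper are assumptions for illustration and not part of the original report, and the timings depend on the pandas version.

```python
import timeit

import pandas as pd

N = 10000
aa = pd.DataFrame({'ii': range(N), 'bb': [True] * N})

# Hypothetical comparison frame: same data, boolean column cast to int.
aa_int = aa.assign(bb=aa['bb'].astype(int))


def time_groupby_sum(df, key, repeats=3):
    """Return the best wall-clock time in seconds of df.groupby(key).sum()."""
    timings = timeit.repeat(lambda: df.groupby(key).sum(), number=1, repeat=repeats)
    return min(timings)


t_bool = time_groupby_sum(aa, 'ii')     # sums the boolean column 'bb'
t_int = time_groupby_sum(aa_int, 'ii')  # sums the integer copy of 'bb'
print(f"bool column: {t_bool:.4f}s, int column: {t_int:.4f}s")
```

If the integer version is much faster on an affected release, that points at the boolean aggregation path rather than at the number of groups.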

Labels: Bug, Performance (memory or execution speed performance)
