Don't create (or add option to not create) empty columns when unstacking #1181

Closed
@wesm

Description

from @rkern on the pydata mailing list:

Hello! I am using the pivot table functionality to good effect. One
issue I am running into, however, is that the columns end up being
roughly a full Cartesian product of the column pivots. That is, there
are blocks of all-NA columns. The row pivots appear to be pruned
already; all-NA rows don't show up. For pivot values that are roughly
tree-structured (i.e. the children of one node mostly don't show up as
children of a neighboring node), this can create pivot tables with
very many columns. Currently, I post-process the pivot tables using
.dropna(axis=1, how='all'), but I have just run into a case where the
intermediate table is too large for my 32-bit machine. Would there be
a good way to change the pivot table computation to get appropriately
sparse trees for both the rows and the columns? I am happy to look
into it myself, but I did want to check to see if it was on anyone's
radar or if anyone had suggestions.

and my response:

That's actually a good point and it's basically an oversight in the pivot_table implementation. If you look at the code, it's basically a convenience function that uses groupby and calls unstack on the aggregated result. I suspect the problem with unstack is that it's creating lots of empty columns; 
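To make the report concrete, here is a minimal sketch of the groupby-then-unstack pattern and the `.dropna(axis=1, how='all')` workaround mentioned in the email. The data and the column names (`row`, `parent`, `child`, `value`) are hypothetical, and the levels are unstacked one at a time purely to illustrate how a full Cartesian product of column labels can arise; this is not the actual `pivot_table` source, and behavior may differ between pandas versions.

```python
import pandas as pd

# Hypothetical tree-structured data: each child belongs to exactly one parent.
df = pd.DataFrame({
    "row": ["r1", "r1", "r2", "r2"],
    "parent": ["A", "A", "B", "B"],
    "child": ["a1", "a2", "b1", "b2"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Aggregate first, as a pivot built on groupby would.
agg = df.groupby(["row", "parent", "child"])["value"].sum()

# Unstacking one level at a time crosses every parent with every child,
# so combinations that never occur in the data (e.g. A/b1) become columns.
wide = agg.unstack("parent").unstack("child")

# Count the columns that hold no data at all.
n_empty = int(wide.isna().all().sum())

# The workaround from the email: drop the all-NA columns after the fact.
# Note that the full-width intermediate table has already been allocated.
pruned = wide.dropna(axis=1, how="all")

print(wide.shape, n_empty, pruned.shape)
```

The pruning step only helps once the wide table fits in memory, which is exactly the limitation described above.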
