Skip to content

crosstab is not respecting filtered scope when dtype=category #12298

Closed
@rumbin

Description

@rumbin

First of all, I have to admit that I have some trouble finding a catchy title for this issue. Better suggestions are welcome...

There is some issue with crosstab, when applied on a DataFrame which has been derived from a DataFrame consisting of categorized columns.

Let's consider the following example df:

import pandas as pd
df = pd.DataFrame({'col0':list('abbabcabcab'), 'col1':[1,1,1,1,2,3,2,2,3,3,1],
                  'crit':[1,1,1,1,1,0,0,0,0,0,0]})

We apply some filter according to some criterion which happens to eliminate all occurrences of 'c' in 'col0' (and '3' in 'col1'). Then we build a crosstab:

df_filtered = df[df.crit == 1]
pd.crosstab(df_filtered.col0, df_filtered.col1)

The result is as expected:

col1    1   2
col0        
a   2   0
b   2   1

Now we try the same on categorized columns:

for col in df.columns:
    df[col] = df[col].astype('category')
df_filtered = df[df.crit == 1]
pd.crosstab(df_filtered.col0, df_filtered.col1

In this case, value 'c' and '3' are again listed, although they shouldn't be:

col1    1   2   3
col0            
a   2   0   0
b   2   1   0
c   0   0   0

My guess is that the reason for this behavior can be found in the lookup-table of the category, which does not account for values that are not represented any longer:
df_filtered.col0.cat.categories
Output:
Index(['a', 'b', 'c'], dtype='object'

So, either this lookup-table has to be fixed upon filtering, or pd.crosstab has to respect this condition.

Metadata

Metadata

Assignees

No one assigned

    Labels

    API DesignCategoricalCategorical Data TypeReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions