crosstab is not respecting filtered scope when dtype=category

First of all, I have to admit that I have some trouble finding a catchy title for this issue. Better suggestions are welcome...

There is some issue with crosstab, when applied on a DataFrame which has been derived from a DataFrame consisting of categorized columns.

Let's consider the following example df:

```
import pandas as pd
df = pd.DataFrame({'col0':list('abbabcabcab'), 'col1':[1,1,1,1,2,3,2,2,3,3,1],
                  'crit':[1,1,1,1,1,0,0,0,0,0,0]})
```

We apply some filter according to some criterion which happens to eliminate  all occurrences of 'c' in 'col0' (and '3' in 'col1'). Then we build a crosstab:

```
df_filtered = df[df.crit == 1]
pd.crosstab(df_filtered.col0, df_filtered.col1)
```

The result is as expected:

```
col1    1   2
col0        
a   2   0
b   2   1
```

Now we try the same on categorized columns:

```
for col in df.columns:
    df[col] = df[col].astype('category')
df_filtered = df[df.crit == 1]
pd.crosstab(df_filtered.col0, df_filtered.col1
```

In this case, value 'c' and '3' are again listed, although they shouldn't be:

```
col1    1   2   3
col0            
a   2   0   0
b   2   1   0
c   0   0   0
```

My guess is that the reason for this behavior can be found in the lookup-table of the category, which does not account for values that are not represented any longer:
`df_filtered.col0.cat.categories`
Output:
`Index(['a', 'b', 'c'], dtype='object'`

So, either this lookup-table has to be fixed upon filtering, or pd.crosstab has to respect this condition.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

crosstab is not respecting filtered scope when dtype=category #12298

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

crosstab is not respecting filtered scope when dtype=category #12298

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions