
API: comparisons of categorical data and (scalar or list-like) #8995

Closed
@jankatins

Description


From #8946:

If cat > scalar is allowed, and cat == list is also allowed because it basically compares each row as if it were the scalar case, then by the same logic cat > list should also be allowed: each row of that comparison would treat the corresponding list element as a scalar.
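To make the row-by-row argument concrete, here is a minimal pure-Python sketch. It is not pandas code. It assumes a hypothetical representation of an ordered categorical as a list of categories in declared order plus one integer code per row (-1 for a missing value), and the function name compare_rowwise is made up for illustration. Each row of cat > list reduces to the scalar case with that row's list element as the scalar.

```python
import operator


def compare_rowwise(categories, codes, others, op):
    """Hypothetical row-wise comparison of an ordered categorical with a list.

    categories: categories in declared order, so a category's index is its rank.
    codes: one index into categories per row; -1 marks a missing value (NaN).
    others: one operand per row, each treated exactly like a scalar operand.
    """
    if len(others) != len(codes):
        raise ValueError("list-like operand must have the same length")
    result = []
    for code, value in zip(codes, others):
        # Row i is just the scalar case with others[i] as the scalar,
        # so the element itself has to be one of the categories.
        if value not in categories:
            raise TypeError(f"{value!r} is not one of the categories")
        if code == -1:
            result.append(False)  # a missing value never compares True
        else:
            # Compare ranks (positions in the category order), not raw values.
            result.append(op(code, categories.index(value)))
    return result


levels = ["low", "mid", "high"]
print(compare_rowwise(levels, [0, 2, 1, -1], ["mid", "mid", "high", "low"], operator.gt))
```

Because every row goes through the same check as the scalar case, allowing cat > list adds no new semantics beyond what cat > scalar already defines.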

On the other hand, a scalar comparison with a categorical only makes sense if the scalar can be treated as a category. For any other value it is effectively a comparison between different types, which would raise a TypeError on Python 3. So the scalar must be one of the categories, and the following should not work:
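A minimal sketch of the proposed scalar rule, again in plain Python rather than pandas, using the same hypothetical representation (ordered category list plus per-row codes, -1 for NaN); order_compare_scalar is an illustrative name, not an existing API. An ordering comparison with a value outside the categories raises, just as 'a' < 2 raises TypeError in Python 3. The sample data mirror df.b from the transcript that follows: categories [1, 3] and rows 1, 3, 3, 3, NaN.

```python
import operator


def order_compare_scalar(categories, codes, scalar, op):
    """Hypothetical `cat <op> scalar` for an ordered categorical.

    categories: categories in declared order, so a category's index is its rank.
    codes: one index into categories per row; -1 marks a missing value (NaN).
    """
    if scalar not in categories:
        # Mirror Python 3, where comparing unrelated types (e.g. 'a' < 2)
        # raises TypeError instead of returning an answer.
        raise TypeError(f"{scalar!r} is not a category; ordering is undefined")
    rank = categories.index(scalar)
    # Compare ranks, not raw values; missing rows always compare False.
    return [code != -1 and op(code, rank) for code in codes]


cats = [1.0, 3.0]        # categories of df.b
codes = [0, 1, 1, 1, -1]  # rows: 1, 3, 3, 3, NaN
print(order_compare_scalar(cats, codes, 3, operator.ge))
# Under this rule, the analogue of `df.b > 2` raises TypeError, since 2 is not a category.
```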

In[4]: df = pd.DataFrame({"a":[1,3,3,3,np.nan]})
In[6]: df["b"] = df.a.astype("category")
In[7]: df.b
Out[7]: 
0     1
1     3
2     3
3     3
4   NaN
Name: b, dtype: category
Categories (2, float64): [1 < 3]
In[8]: df.b > 2
Out[8]: 
0    False
1     True
2     True
3     True
4    False
Name: b, dtype: bool

One more thing: by the same reasoning, df.b == 2 (the equality case) should also NOT work, because 2 is not in the categories and is therefore a "different type".

Current code results in this:

In [5]: df.b==2
Out[5]: 
0    False
1    False
2    False
3    False
4    False
Name: b, dtype: bool

This is actually consistent (i.e. every row returns False). An equality comparison shouldn't raise, so this is a reasonable result. I think this is de facto the same as the following, which is useful:

In [7]: Series(['a','b','c'])==2
Out[7]: 
0    False
1    False
2    False
dtype: bool
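The lenient equality behaviour described above can be sketched in plain Python, once more assuming the hypothetical category-list-plus-codes representation (-1 for NaN); equals_scalar and its negate flag are illustrative names only. Equality with a value outside the categories never raises: no row can hold that value, so == is all False and != is all True, the same outcome as Series(['a','b','c'])==2.

```python
def equals_scalar(categories, codes, scalar, negate=False):
    """Hypothetical `cat == scalar` (or `cat != scalar` when negate=True).

    categories: list of categories; codes: one index per row, -1 for NaN.
    Never raises, even when scalar is not one of the categories.
    """
    if scalar not in categories:
        # No row can hold a value outside the categories, so nothing is equal.
        return [negate] * len(codes)
    target = categories.index(scalar)
    result = []
    for code in codes:
        if code == -1:
            result.append(negate)  # a missing value is never equal to anything
        else:
            result.append((code == target) != negate)
    return result


cats = [1.0, 3.0]        # categories of df.b
codes = [0, 1, 1, 1, -1]  # rows: 1, 3, 3, 3, NaN
print(equals_scalar(cats, codes, 2))  # analogue of `df.b == 2`
```

This keeps equality total (it always answers) while ordering comparisons with a non-category still raise, which matches the distinction drawn in this thread.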
