DOC: CategoricalDtype equality semantics aren't completely described #57259

Closed
@VladimirFokow

Description

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/user_guide/categorical.html#equality-semantics

https://github.com/pandas-dev/pandas/blob/main/pandas/core/dtypes/dtypes.py#L407

Documentation problem

Problematic statement 1:

Two instances of CategoricalDtype compare equal whenever they have the same categories and order.

Problematic statement 2:

  1. A CDT with ordered={False, None} is only equal to another CDT with
    ordered={False, None} and identical categories.

Counter-example:

>>> a = pd.Categorical(np.full(2, np.nan, dtype=object))
>>> b = pd.Categorical(np.full(2, np.nan))

>>> a, b
([NaN, NaN]
 Categories (0, object): [],
 [NaN, NaN]
 Categories (0, float64): [])

>>> a.dtype, b.dtype
(CategoricalDtype(categories=[], ordered=False, categories_dtype=object),
 CategoricalDtype(categories=[], ordered=False, categories_dtype=float64))

>>> a.dtype == b.dtype
False

As we can see, both dtypes have ordered=False, and their categories are the same (both are empty).
According to the documentation, they should therefore compare equal, but they do not.
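
To narrow down where the two dtypes differ, the attributes can be compared one by one. This is a minimal sketch that rebuilds `a` and `b` from the counter-example above. The names `same_ordered`, `n_cats` and `cat_dtypes` are introduced here only for illustration, and the exact results may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Same two categoricals as in the counter-example
a = pd.Categorical(np.full(2, np.nan, dtype=object))
b = pd.Categorical(np.full(2, np.nan))

# The attributes the documentation mentions: order and categories
same_ordered = a.dtype.ordered == b.dtype.ordered
n_cats = (len(a.dtype.categories), len(b.dtype.categories))

# An attribute the documentation does not mention: the dtype of the categories Index
cat_dtypes = (a.dtype.categories.dtype, b.dtype.categories.dtype)

print("same ordered flag:", same_ordered)
print("number of categories:", n_cats)
print("categories dtypes:", cat_dtypes)
print("dtypes compare equal:", a.dtype == b.dtype)
```

If the only difference reported is the dtype of the (empty) categories, this suggests that the dtype of the categories also takes part in the comparison, which neither of the two statements mentions.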

Suggested fix for documentation

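Make both descriptions accurate and exhaustive. The counter-example suggests that the dtype of the categories (categories_dtype) also affects equality, which neither statement mentions.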
