Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/user_guide/categorical.html#equality-semantics
https://github.com/pandas-dev/pandas/blob/main/pandas/core/dtypes/dtypes.py#L407
Documentation problem
Problematic statement 1:
Two instances of CategoricalDtype compare equal whenever they have the same categories and order.
Problematic statement 2:
- A CDT with ordered={False, None} is only equal to another CDT with
ordered={False, None} and identical categories.
Counter-example:
>>> import numpy as np
>>> import pandas as pd
>>> a = pd.Categorical(np.full(2, np.nan, dtype=object))
>>> b = pd.Categorical(np.full(2, np.nan))
>>> a, b
([NaN, NaN]
Categories (0, object): [],
[NaN, NaN]
Categories (0, float64): [])
>>> a.dtype, b.dtype
(CategoricalDtype(categories=[], ordered=False, categories_dtype=object),
CategoricalDtype(categories=[], ordered=False, categories_dtype=float64))
>>> a.dtype == b.dtype
False
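As we can see, both dtypes have ordered=False and their categories are the same (both are empty); the only visible difference is categories_dtype (object vs. float64).
Following the documentation, they should compare equal, but the comparison returns False.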
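The same comparison can be reproduced without going through pd.Categorical by building the two dtypes directly. This is only a minimal sketch: the variable names are made up for illustration, and the final result reflects the behavior reported above, which may differ on other pandas versions.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Two unordered dtypes with empty categories that differ only in the
# dtype of the (empty) categories index.
obj_cdt = CategoricalDtype(categories=pd.Index([], dtype=object), ordered=False)
flt_cdt = CategoricalDtype(categories=pd.Index([], dtype="float64"), ordered=False)

# The two properties the documentation names as deciding equality.
same_ordered = obj_cdt.ordered == flt_cdt.ordered
same_values = list(obj_cdt.categories) == list(flt_cdt.categories)

# The property the documentation does not mention.
same_categories_dtype = obj_cdt.categories.dtype == flt_cdt.categories.dtype

# The actual dtype comparison.
dtypes_equal = obj_cdt == flt_cdt

print("same ordered flag:    ", same_ordered)
print("same category values: ", same_values)
print("same categories dtype:", same_categories_dtype)
print("dtypes compare equal: ", dtypes_equal)
```

If the last line prints False while the first two print True, the dtype of the categories is taking part in the comparison, which neither documented statement covers.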
Suggested fix for documentation
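Update both statements so that they are accurate and exhaustive; for example, they could mention that the dtype of the categories (categories_dtype) also appears to take part in the comparison.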