Commit dd511f4

fixup! ENH: Parametrized CategoricalDtype
1 parent 842e7c4 commit dd511f4

9 files changed

+82
-69332
lines changed

doc/source/api.rst

Lines changed: 4 additions & 1 deletion
@@ -637,7 +637,10 @@ strings and apply several methods to it. These can be accessed like
 Categorical
 ~~~~~~~~~~~
 
-If the Series is of dtype ``category``, ``Series.cat`` can be used to change the the categorical
+.. autoclass:: api.types.CategoricalDtype
+    :members: categories, ordered
+
+If the Series is of dtype ``CategoricalDtype``, ``Series.cat`` can be used to change the categorical
 data. This accessor is similar to the ``Series.dt`` or ``Series.str`` and has the
 following usable methods and properties:
 

doc/source/categorical.rst

Lines changed: 30 additions & 22 deletions
@@ -99,9 +99,11 @@ of :class:`~pd.api.types.CategoricalDtype`.
 
 .. ipython:: python
 
+    from pandas.api.types import CategoricalDtype
+
     s = pd.Series(["a", "b", "c", "a"])
-    cat_type = pd.api.types.CategoricalDtype(categories=["b", "c", "d"],
-                                             ordered=False)
+    cat_type = CategoricalDtype(categories=["b", "c", "d"],
+                                ordered=False)
     s_cat = s.astype(cat_type)
     s_cat
 
@@ -141,33 +143,40 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+.. _categorical.categoricaldtype:
+
 CategoricalDtype
 ----------------
 
 .. versionchanged:: 0.21.0
 
-A categorical's type is fully described by 1.) its categories (an iterable with
-unique values and no missing values), and 2.) its orderedness (a boolean).
+A categorical's type is fully described by
+
+1. its categories: a sequence of unique values and no missing values
+2. its orderedness: a boolean
+
 This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
 The ``categories`` argument is optional, which implies that the actual categories
 should be inferred from whatever is present in the data when the
 :class:`pandas.Categorical` is created.
 
 .. ipython:: python
 
-    pd.api.types.CategoricalDtype(['a', 'b', 'c'])
-    pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=True)
-    pd.api.types.CategoricalDtype()
+    from pandas.api.types import CategoricalDtype
+
+    CategoricalDtype(['a', 'b', 'c'])
+    CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    CategoricalDtype()
 
 A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
 expects a `dtype`. For example :func:`pandas.read_csv`,
-:func:`pandas.DataFrame.astype`, or the Series constructor.
+:func:`pandas.DataFrame.astype`, or in the Series constructor.
 
-As a convenience, you can use the string `'category'` in place of a
+As a convenience, you can use the string ``'category'`` in place of a
 :class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
 the categories being unordered, and equal to the set values present in the
-array. On other words, ``dtype='category'`` is equivalent to
-``dtype=pd.api.types.CategoricalDtype()``.
+array. In other words, ``dtype='category'`` is equivalent to
+``dtype=CategoricalDtype()``.
 
 Equality Semantics
 ~~~~~~~~~~~~~~~~~~
@@ -178,19 +187,20 @@ order of the ``categories`` is not considered
 
 .. ipython:: python
 
-    c1 = pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=False)
+    c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False)
+
     # Equal, since order is not considered when ordered=False
-    c1 == pd.api.types.CategoricalDtype(['b', 'c', 'a'], ordered=False)
+    c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False)
+
     # Unequal, since the second CategoricalDtype is ordered
-    c1 == pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    c1 == CategoricalDtype(['a', 'b', 'c'], ordered=True)
 
 All instances of ``CategoricalDtype`` compare equal to the string ``'category'``
 
 .. ipython:: python
 
     c1 == 'category'
 
-
 .. warning::
 
    Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
@@ -246,9 +256,7 @@ It's also possible to pass in the categories in a specific order:
 
 .. ipython:: python
 
-    s = pd.Series(list('babc')).astype(
-        pd.api.types.CategoricalDtype(list('abcd'))
-    )
+    s = pd.Series(list('babc')).astype(CategoricalDtype(list('abcd')))
     s
 
     # categories
@@ -362,7 +370,7 @@ meaning and certain operations are possible. If the categorical is unordered, ``
     s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
     s.sort_values(inplace=True)
     s = pd.Series(["a","b","c","a"]).astype(
-        pd.api.types.CategoricalDtype(ordered=True)
+        CategoricalDtype(ordered=True)
     )
     s.sort_values(inplace=True)
     s
@@ -464,13 +472,13 @@ categories or a categorical with any list-like object, will raise a TypeError.
 .. ipython:: python
 
     cat = pd.Series([1,2,3]).astype(
-        pd.api.types.CategoricalDtype([3, 2, 1], ordered=True)
+        CategoricalDtype([3, 2, 1], ordered=True)
     )
     cat_base = pd.Series([2,2,2]).astype(
-        pd.api.types.CategoricalDtype([3, 2, 1], ordered=True)
+        CategoricalDtype([3, 2, 1], ordered=True)
     )
     cat_base2 = pd.Series([2,2,2]).astype(
-        pd.api.types.CategoricalDtype(ordered=True)
+        CategoricalDtype(ordered=True)
     )
 
     cat
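
The behaviour documented in the doc/source/categorical.rst changes above can be checked with a short standalone script. This is a minimal sketch, not part of the commit: it assumes a pandas release that exposes ``pandas.api.types.CategoricalDtype`` (0.21.0 or later, per the whatsnew entry), and the names ``n_missing``, ``same_unordered``, ``same_ordered`` and ``equals_string`` are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Converting with an explicit dtype: values that are not listed in the
# dtype's categories ("a" here) become missing values.
s = pd.Series(["a", "b", "c", "a"])
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=False)
s_cat = s.astype(cat_type)
n_missing = int(s_cat.isna().sum())
categories = list(s_cat.cat.categories)

# Two unordered dtypes compare equal regardless of category order.
c1 = CategoricalDtype(["a", "b", "c"], ordered=False)
same_unordered = c1 == CategoricalDtype(["b", "c", "a"], ordered=False)

# Orderedness is part of the type, so these two differ.
same_ordered = c1 == CategoricalDtype(["a", "b", "c"], ordered=True)

# Any CategoricalDtype compares equal to the string 'category'.
equals_string = c1 == "category"

print(n_missing, categories, same_unordered, same_ordered, equals_string)
```

The script uses the shorter ``from pandas.api.types import CategoricalDtype`` import style that this commit introduces in the documentation examples.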

doc/source/whatsnew/v0.21.0.txt

Lines changed: 2 additions & 2 deletions
@@ -10,6 +10,8 @@ users upgrade to this version.
 Highlights include:
 
 - Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>`.
+- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
+  categoricals independent of the data (:issue:`14711`, :issue:`15078`)
 
 Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.
 
@@ -22,8 +24,6 @@ Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations
 New features
 ~~~~~~~~~~~~
 
-- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
-  categoricals independent of the data (:issue:`14711`, :issue:`15078`)
 - Support for `PEP 519 -- Adding a file system path protocol
   <https://www.python.org/dev/peps/pep-0519/>`_ on most readers and writers (:issue:`13823`)
 - Added ``__fspath__`` method to :class:`~pandas.HDFStore`, :class:`~pandas.ExcelFile`,
