Commit 49172dd

API: Added squeeze keyword to MultiIndex ctors
Adds a new ``squeeze`` keyword to control whether the various MultiIndex constructors should squeeze down to an ``Index`` when all the values are length-1 tuples.

    In [3]: MultiIndex.from_tuples([('a',), ('b',), ('c',)])
    Out[3]: Index(['a', 'b', 'c'], dtype='object')

    In [4]: MultiIndex.from_tuples([('a',), ('b',), ('c',)], squeeze=False)
    Out[4]: MultiIndex(levels=[['a', 'b', 'c']], labels=[[0, 1, 2]])

This is helpful for routines that rely on the MultiIndex constructors always returning a MultiIndex, regardless of the data values (e.g. hash_tuples).
1 parent b32494d commit 49172dd

26 files changed (+686, -215 lines)

doc/source/categorical.rst

Lines changed: 52 additions & 2 deletions
@@ -96,12 +96,14 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df
 
-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+You can also specify differently ordered categories or make the resulting data
+ordered by passing a :class:`CategoricalDtype`:
 
 .. ipython:: python
 
     s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    cat_type = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
+    s_cat = s.astype(cat_type)
     s_cat
 
 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
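To see what the new ``s.astype(cat_type)`` call in this hunk does with the example data, here is a minimal sketch, assuming a pandas version where ``pd.CategoricalDtype`` is available at the top level (which the ``pandas/core/api.py`` change in this commit sets up). It replaces the older ``astype("category", categories=..., ordered=...)`` keyword form that this hunk removes from the docs.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])

# The dtype fixes both the allowed categories and whether they are ordered.
cat_type = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
s_cat = s.astype(cat_type)

# "a" is not one of the declared categories, so those rows become missing,
# while "d" stays a category even though no row uses it.
print(s_cat.cat.categories.tolist())  # ['b', 'c', 'd']
print(s_cat.isna().tolist())          # [True, False, False, True]
```

Because the categories come from the dtype rather than from the data, the same ``cat_type`` object can be reused to give several Series identical categories.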
@@ -140,6 +142,54 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+CategoricalDtype
+----------------
+
+.. versionadded:: 0.21.0
+
+A categorical's type is fully described by 1) its categories (an iterable with
+unique values and no missing values), and 2) its orderedness (a boolean).
+This information can be stored in a :class:`~pandas.CategoricalDtype`.
+The ``categories`` argument is optional, which implies that the actual categories
+should be inferred from whatever is present in the data when the
+:class:`pandas.Categorical` is created.
+
+.. ipython:: python
+
+    pd.CategoricalDtype(['a', 'b', 'c'])
+    pd.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    pd.CategoricalDtype()
+
+A :class:`~pandas.CategoricalDtype` can be used in any place pandas expects a
+``dtype``: for example, :func:`pandas.read_csv`, :func:`pandas.DataFrame.astype`,
+or the Series constructor.
+
+As a convenience, you can use the string ``'category'`` in place of a
+:class:`pandas.CategoricalDtype` when you want the default behavior of
+the categories being unordered and equal to the set of values present in the array.
+In other words, ``dtype='category'`` is equivalent to ``dtype=pd.CategoricalDtype()``.
+
+Equality Semantics
+~~~~~~~~~~~~~~~~~~
+
+Two instances of :class:`pandas.CategoricalDtype` compare equal whenever they have
+the same categories and orderedness. When comparing two unordered categoricals, the
+order of the ``categories`` is not considered.
+
+.. ipython:: python
+
+    c1 = pd.CategoricalDtype(['a', 'b', 'c'], ordered=False)
+    # Equal, since order is not considered when ordered=False
+    c1 == pd.CategoricalDtype(['b', 'c', 'a'], ordered=False)
+    # Unequal, since the second CategoricalDtype is ordered
+    c1 == pd.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+
+Finally, all instances of ``CategoricalDtype`` compare equal to the string ``'category'``:
+
+.. ipython:: python
+
+    c1 == 'category'
+
 Description
 -----------
 
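The new CategoricalDtype section makes three claims: omitted categories are inferred from the data, the string 'category' is shorthand for that default, and a CategoricalDtype is accepted wherever pandas takes a dtype. A short sketch of all three, assuming a recent pandas with top-level ``pd.CategoricalDtype``; the column name ``grade`` and the inline CSV text are made up for illustration.

```python
import io

import pandas as pd

values = ["b", "a", "c", "a"]

# No categories given: they are inferred (sorted) from the data, and unordered.
inferred = pd.Categorical(values, dtype=pd.CategoricalDtype())
print(inferred.categories.tolist(), inferred.ordered)  # ['a', 'b', 'c'] False

# The string 'category' requests the same default behavior.
by_name = pd.Series(values).astype("category")
print(by_name.cat.categories.tolist(), by_name.cat.ordered)  # ['a', 'b', 'c'] False

# A fully specified dtype can be passed per column to read_csv.
# The "grade" column and CSV content are hypothetical example data.
grade_type = pd.CategoricalDtype(["a", "b", "c"], ordered=True)
df = pd.read_csv(io.StringIO("grade\nb\na\nb\n"), dtype={"grade": grade_type})
print(df["grade"].cat.categories.tolist(), df["grade"].cat.ordered)  # ['a', 'b', 'c'] True
```

Passing the full dtype to the reader keeps the unused category "c" and the ordering, which inference from the file contents alone could not recover.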

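The equality rules in the "Equality Semantics" subsection of the categorical.rst diff can be checked directly. This sketch, again assuming a recent pandas, also covers the case the added docs imply but do not show: when both dtypes are ordered, the order of the categories matters. The name ``o1`` is just an illustrative variable.

```python
import pandas as pd

c1 = pd.CategoricalDtype(["a", "b", "c"], ordered=False)

# Unordered: the same set of categories in any order compares equal.
print(c1 == pd.CategoricalDtype(["b", "c", "a"], ordered=False))  # True
# Orderedness differs, so the dtypes are not equal.
print(c1 == pd.CategoricalDtype(["a", "b", "c"], ordered=True))   # False

# Ordered: the order of the categories is part of the type.
o1 = pd.CategoricalDtype(["a", "b", "c"], ordered=True)
print(o1 == pd.CategoricalDtype(["c", "b", "a"], ordered=True))   # False

# Every CategoricalDtype compares equal to the string 'category'.
print(c1 == "category", o1 == "category")  # True True
```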
doc/source/whatsnew/v0.21.0.txt

Lines changed: 2 additions & 0 deletions
@@ -121,6 +121,8 @@ Other Enhancements
 - :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`)
 - :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
 - :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
+- The various :class:`MultiIndex` constructors all take a ``squeeze`` keyword to control whether to squeeze down to a regular ``Index`` when
+  the values are all tuples of length one (the default is ``True``, as before) (:issue:`17178`)
 - :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
 - :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year (:issue:`9313`)
 - :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year (:issue:`9313`)

pandas/core/api.py

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 
 from pandas.core.algorithms import factorize, unique, value_counts
 from pandas.core.dtypes.missing import isna, isnull, notna, notnull
+from pandas.core.dtypes.dtypes import CategoricalDtype
 from pandas.core.categorical import Categorical
 from pandas.core.groupby import Grouper
 from pandas.io.formats.format import set_eng_float_format
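The one-line change to ``pandas/core/api.py`` is what makes ``pd.CategoricalDtype`` reachable from the top-level namespace. In current pandas the same class is also exported from ``pandas.api.types`` (that module is not part of this diff), so this minimal sketch, assuming a recent pandas, checks that both names refer to one class.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# The top-level name and the pandas.api.types name are the same class object.
print(pd.CategoricalDtype is CategoricalDtype)  # True

# An instance built through either name behaves identically; ordered defaults to False.
dtype = pd.CategoricalDtype(["x", "y"])
print(isinstance(dtype, CategoricalDtype), dtype.ordered)  # True False
```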
