Commit 78907e4

DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays (#51111)
* DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays
* fix doc build issues
* fix spelling
* fix comments and bugs
* fix doc build
* fix doc build II
1 parent f623a6e commit 78907e4

7 files changed, +117 -114 lines changed

doc/source/development/internals.rst

Lines changed: 0 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -19,9 +19,6 @@ containers for the axis labels:
1919
assuming nothing about its contents. The labels must be hashable (and
2020
likely immutable) and unique. Populates a dict of label to location in
2121
Cython to do ``O(1)`` lookups.
22-
* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
23-
data, such as time stamps
24-
* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
2522
* :class:`MultiIndex`: the standard hierarchical index object
2623
* :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
2724
* :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the int64 values)

doc/source/user_guide/advanced.rst

Lines changed: 17 additions & 107 deletions
@@ -848,125 +848,35 @@ values **not** in the categories, similarly to how you can reindex **any** panda
848848
849849
.. _advanced.rangeindex:
850850

851-
Int64Index and RangeIndex
852-
~~~~~~~~~~~~~~~~~~~~~~~~~
851+
RangeIndex
852+
~~~~~~~~~~
853853

854-
.. deprecated:: 1.4.0
855-
In pandas 2.0, :class:`Index` will become the default index type for numeric types
856-
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
857-
are therefore deprecated and will be removed in a future version.
858-
``RangeIndex`` will not be removed, as it represents an optimized version of an integer index.
859-
860-
:class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
861-
implementing an ordered, sliceable set.
862-
863-
:class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects.
864-
``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
865-
866-
.. _advanced.float64index:
867-
868-
Float64Index
869-
~~~~~~~~~~~~
870-
871-
.. deprecated:: 1.4.0
872-
:class:`Index` will become the default index type for numeric types in the future
873-
instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
874-
are therefore deprecated and will be removed in a future version of Pandas.
875-
``RangeIndex`` will not be removed as it represents an optimized version of an integer index.
876-
877-
By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
878-
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
879-
same.
880-
881-
.. ipython:: python
882-
883-
indexf = pd.Index([1.5, 2, 3, 4.5, 5])
884-
indexf
885-
sf = pd.Series(range(5), index=indexf)
886-
sf
887-
888-
Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
854+
:class:`RangeIndex` is a sub-class of :class:`Index` that provides the default index for all :class:`DataFrame` and :class:`Series` objects.
855+
``RangeIndex`` is an optimized version of ``Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
856+
A ``RangeIndex`` will always have an ``int64`` dtype.
889857

890858
.. ipython:: python
891859
892-
sf[3]
893-
sf[3.0]
894-
sf.loc[3]
895-
sf.loc[3.0]
860+
idx = pd.RangeIndex(5)
861+
idx
896862
897-
The only positional indexing is via ``iloc``.
863+
``RangeIndex`` is the default index for all :class:`DataFrame` and :class:`Series` objects:
898864

899865
.. ipython:: python
900866
901-
sf.iloc[3]
867+
ser = pd.Series([1, 2, 3])
868+
ser.index
869+
df = pd.DataFrame([[1, 2], [3, 4]])
870+
df.index
871+
df.columns
902872
903-
A scalar index that is not found will raise a ``KeyError``.
904-
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
905-
**always** positional when using ``iloc``. The exception is when the slice is
906-
boolean, in which case it will always be positional.
907-
908-
.. ipython:: python
909-
910-
sf[2:4]
911-
sf.loc[2:4]
912-
sf.iloc[2:4]
913-
914-
In float indexes, slicing using floats is allowed.
915-
916-
.. ipython:: python
917-
918-
sf[2.1:4.6]
919-
sf.loc[2.1:4.6]
920-
921-
In non-float indexes, slicing using floats will raise a ``TypeError``.
922-
923-
.. code-block:: ipython
924-
925-
In [1]: pd.Series(range(5))[3.5]
926-
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
927-
928-
In [1]: pd.Series(range(5))[3.5:4.5]
929-
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
930-
931-
Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
932-
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could, for
933-
example, be millisecond offsets.
934-
935-
.. ipython:: python
936-
937-
dfir = pd.concat(
938-
[
939-
pd.DataFrame(
940-
np.random.randn(5, 2), index=np.arange(5) * 250.0, columns=list("AB")
941-
),
942-
pd.DataFrame(
943-
np.random.randn(6, 2),
944-
index=np.arange(4, 10) * 250.1,
945-
columns=list("AB"),
946-
),
947-
]
948-
)
949-
dfir
950-
951-
Selection operations then will always work on a value basis, for all selection operators.
952-
953-
.. ipython:: python
954-
955-
dfir[0:1000.4]
956-
dfir.loc[0:1001, "A"]
957-
dfir.loc[1000.4]
958-
959-
You could retrieve the first 1 second (1000 ms) of data as such:
960-
961-
.. ipython:: python
962-
963-
dfir[0:1000]
964-
965-
If you need integer based selection, you should use ``iloc``:
873+
A ``RangeIndex`` behaves similarly to an :class:`Index` with an ``int64`` dtype. Operations on a ``RangeIndex``
874+
whose result cannot be represented by a ``RangeIndex``, but should have an integer dtype, return an ``Index`` with ``int64`` dtype.
875+
For example:
966876

967877
.. ipython:: python
968878
969-
dfir.iloc[0:5]
879+
idx[[0, 2]]
970880
971881
972882
.. _advanced.intervalindex:
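
The fallback described in the new ``RangeIndex`` text can be sketched as follows. This is a minimal illustration, assuming pandas 2.0 or later; the variable names are hypothetical and not part of the commit.

```python
import pandas as pd

# Illustrative names only; assumes pandas >= 2.0.
idx = pd.RangeIndex(5)

# Positions 0, 1 and 3 are not evenly spaced, so the selection
# cannot be represented by a RangeIndex.
subset = idx[[0, 1, 3]]

print(type(idx).__name__, idx.dtype)
print(type(subset).__name__, subset.dtype)
```

The selection keeps the ``int64`` dtype even though it is no longer a ``RangeIndex``.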

doc/source/user_guide/indexing.rst

Lines changed: 20 additions & 1 deletion
@@ -1582,8 +1582,27 @@ lookups, data alignment, and reindexing. The easiest way to create an
15821582
index
15831583
'd' in index
15841584
1585-
You can also pass a ``name`` to be stored in the index:
1585+
or using numbers:
1586+
1587+
.. ipython:: python
15861588
1589+
index = pd.Index([1, 5, 12])
1590+
index
1591+
5 in index
1592+
1593+
If no dtype is given, ``Index`` tries to infer the dtype from the data.
1594+
It is also possible to give an explicit dtype when instantiating an :class:`Index`:
1595+
1596+
.. ipython:: python
1597+
1598+
index = pd.Index(['e', 'd', 'a', 'b'], dtype="string")
1599+
index
1600+
index = pd.Index([1, 5, 12], dtype="int8")
1601+
index
1602+
index = pd.Index([1, 5, 12], dtype="float32")
1603+
index
1604+
1605+
You can also pass a ``name`` to be stored in the index:
15871606

15881607
.. ipython:: python
15891608
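
The dtype inference and explicit-dtype behaviour added to ``indexing.rst`` can be checked with a short sketch. It assumes pandas 2.0 or later, since earlier versions upcast numeric indexes to 64-bit; the variable names are illustrative.

```python
import pandas as pd

# No dtype given: pandas infers it from the data (int64 for a list of ints).
inferred = pd.Index([1, 5, 12])

# Explicit dtypes are kept as requested (assumes pandas >= 2.0).
small = pd.Index([1, 5, 12], dtype="int8")
as_float = pd.Index([1, 5, 12], dtype="float32")

print(inferred.dtype, small.dtype, as_float.dtype)
print(5 in small)  # membership tests work regardless of the integer width
```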

doc/source/user_guide/io.rst

Lines changed: 1 addition & 1 deletion
@@ -4756,7 +4756,7 @@ Selecting coordinates
47564756
^^^^^^^^^^^^^^^^^^^^^
47574757

47584758
Sometimes you want to get the coordinates (a.k.a. the index locations) of your query. This returns an
4759-
``Int64Index`` of the resulting locations. These coordinates can also be passed to subsequent
4759+
``Index`` of the resulting locations. These coordinates can also be passed to subsequent
47604760
``where`` operations.
47614761

47624762
.. ipython:: python

doc/source/user_guide/timedeltas.rst

Lines changed: 1 addition & 1 deletion
@@ -477,7 +477,7 @@ Scalars type ops work as well. These can potentially return a *different* type o
477477
# division can result in a Timedelta if the divisor is an integer
478478
tdi / 2
479479
480-
# or a Float64Index if the divisor is a Timedelta
480+
# or a float64 Index if the divisor is a Timedelta
481481
tdi / tdi[0]
482482
483483
.. _timedeltas.resampling:
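
The two division results mentioned in the reworded comment can be reproduced with a small sketch. The ``tdi`` built here is a stand-in for the one defined earlier in ``timedeltas.rst``, which is not part of this diff.

```python
import pandas as pd

# Stand-in for the tdi used in the guide (an assumption, not the original data).
tdi = pd.timedelta_range("1 day", periods=3)

# Dividing by an integer keeps timedelta values.
halved = tdi / 2

# Dividing by a Timedelta gives plain ratios in a float64 Index.
ratio = tdi / tdi[0]

print(type(halved).__name__)
print(ratio.dtype)
```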

doc/source/whatsnew/v0.13.0.rst

Lines changed: 1 addition & 1 deletion
@@ -310,7 +310,7 @@ Float64Index API change
310310

311311
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
312312
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
313-
same. See :ref:`the docs<advanced.float64index>`, (:issue:`263`)
313+
same. (:issue:`263`)
314314

315315
Construction is by default for floating type values.
316316

doc/source/whatsnew/v2.0.0.rst

Lines changed: 77 additions & 0 deletions
@@ -28,6 +28,82 @@ The available extras, found in the :ref:`installation guide<install.dependencies
2828
``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
2929
sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).
3030

31+
.. _whatsnew_200.enhancements.index_can_hold_numpy_numeric_dtypes:
32+
33+
:class:`Index` can now hold numpy numeric dtypes
34+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
35+
36+
It is now possible to use any numpy numeric dtype in an :class:`Index` (:issue:`42717`).
37+
38+
Previously it was only possible to use ``int64``, ``uint64`` & ``float64`` dtypes:
39+
40+
.. code-block:: ipython
41+
42+
In [1]: pd.Index([1, 2, 3], dtype=np.int8)
43+
Out[1]: Int64Index([1, 2, 3], dtype="int64")
44+
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
45+
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
46+
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
47+
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")
48+
49+
:class:`Int64Index`, :class:`UInt64Index` & :class:`Float64Index` were deprecated in pandas
50+
version 1.4 and have now been removed. Instead :class:`Index` should be used directly, and
51+
it can now take all numpy numeric dtypes, i.e.
52+
``int8``/``int16``/``int32``/``int64``/``uint8``/``uint16``/``uint32``/``uint64``/``float32``/``float64`` dtypes:
53+
54+
.. ipython:: python
55+
56+
pd.Index([1, 2, 3], dtype=np.int8)
57+
pd.Index([1, 2, 3], dtype=np.uint16)
58+
pd.Index([1, 2, 3], dtype=np.float32)
59+
60+
The ability for ``Index`` to hold the numpy numeric dtypes has meant some changes in pandas
61+
functionality. In particular, operations that previously were forced to create 64-bit indexes
62+
can now create indexes with lower bit sizes, e.g. 32-bit indexes.
63+
64+
Below is a possibly non-exhaustive list of changes:
65+
66+
1. Instantiating using a numpy numeric array now follows the dtype of the numpy array.
67+
Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now,
68+
the index dtype follows the dtype of the numpy array. For example, all signed integer
69+
arrays previously returned an index with ``int64`` dtype, but the index now
70+
reuses the dtype of the supplied numpy array. So ``Index(np.array([1, 2, 3]))`` will be ``int32`` on 32-bit systems.
71+
Instantiating :class:`Index` using a list of numbers will still return 64-bit dtypes,
72+
e.g. ``Index([1, 2, 3])`` will have an ``int64`` dtype, which is the same as previously.
73+
2. The various numeric datetime attributes of :class:`DatetimeIndex` (:attr:`~DatetimeIndex.day`,
74+
:attr:`~DatetimeIndex.month`, :attr:`~DatetimeIndex.year` etc.) were previously of
75+
dtype ``int64``, while they were ``int32`` for :class:`DatetimeArray`. They are now
76+
``int32`` on ``DatetimeIndex`` also:
77+
78+
.. ipython:: python
79+
80+
idx = pd.date_range(start='1/1/2018', periods=3, freq='M')
81+
idx.array.year
82+
idx.year
83+
84+
3. Level dtypes on Indexes from :meth:`Series.sparse.from_coo` are now of dtype ``int32``,
85+
the same as they are on the ``rows``/``cols`` on a scipy sparse matrix. Previously they
86+
were of dtype ``int64``.
87+
88+
.. ipython:: python
89+
90+
from scipy import sparse
91+
A = sparse.coo_matrix(
92+
([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
93+
)
94+
ser = pd.Series.sparse.from_coo(A)
95+
ser.index.dtype
96+
97+
4. :class:`Index` cannot be instantiated using a float16 dtype. Previously instantiating
98+
an :class:`Index` using dtype ``float16`` resulted in a :class:`Float64Index` with a
99+
``float64`` dtype. It now raises a ``NotImplementedError``:
100+
101+
.. ipython:: python
102+
:okexcept:
103+
104+
pd.Index([1, 2, 3], dtype=np.float16)
105+
106+
31107
.. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_dtype_backend:
32108

33109
Configuration option, ``mode.dtype_backend``, to return pyarrow-backed dtypes
@@ -684,6 +760,7 @@ Deprecations
684760

685761
Removal of prior version deprecations/changes
686762
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
763+
- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. See also :ref:`here <whatsnew_200.enhancements.index_can_hold_numpy_numeric_dtypes>` for more information (:issue:`42717`)
687764
- Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`)
688765
- Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`)
689766
- Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`)
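
Items 1 and 2 of the new v2.0.0 whatsnew section can be illustrated with a brief sketch. It assumes pandas 2.0 or later. The variable names and the daily frequency are illustrative choices, not taken from the commit.

```python
import numpy as np
import pandas as pd

# Item 1: an Index built from a numpy array follows the array's dtype,
# while a plain list of ints still gives int64.
from_array = pd.Index(np.array([1, 2, 3], dtype=np.int16))
from_list = pd.Index([1, 2, 3])

# Item 2: numeric datetime attributes of a DatetimeIndex are int32.
dti = pd.date_range("2018-01-01", periods=3, freq="D")
years = dti.year

print(from_array.dtype, from_list.dtype, years.dtype)
```

On pandas versions before 2.0 the same calls would be expected to report 64-bit dtypes throughout, which is the behaviour the whatsnew entry contrasts against.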
