DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays (#51111)

topper-123 · web-flow · commit 78907e49a8e9 · 2023-02-02T08:26:14.000-08:00
* DOC: Update docs to reflect that Index can hold int64, int32 etc. arrays

* fix doc build issues

* fix spelling

* fix comments and bugs

* fix doc build

* fix doc build II
diff --git a/doc/source/development/internals.rst b/doc/source/development/internals.rst
@@ -19,9 +19,6 @@ containers for the axis labels:
   assuming nothing about its contents. The labels must be hashable (and
   likely immutable) and unique. Populates a dict of label to location in
   Cython to do ``O(1)`` lookups.
-* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
-  data, such as time stamps
-* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
 * :class:`MultiIndex`: the standard hierarchical index object
 * :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
 * :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the in64 values)
diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst
@@ -848,125 +848,35 @@ values **not** in the categories, similarly to how you can reindex **any** panda
 
 .. _advanced.rangeindex:
 
-Int64Index and RangeIndex
-~~~~~~~~~~~~~~~~~~~~~~~~~
+RangeIndex
+~~~~~~~~~~
 
-.. deprecated:: 1.4.0
-    In pandas 2.0, :class:`Index` will become the default index type for numeric types
-    instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
-    are therefore deprecated and will be removed in a futire version.
-    ``RangeIndex`` will not be removed, as it represents an optimized version of an integer index.
-
-:class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
-implementing an ordered, sliceable set.
-
-:class:`RangeIndex` is a sub-class of ``Int64Index``  that provides the default index for all ``NDFrame`` objects.
-``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
-
-.. _advanced.float64index:
-
-Float64Index
-~~~~~~~~~~~~
-
-.. deprecated:: 1.4.0
-    :class:`Index` will become the default index type for numeric types in the future
-    instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
-    are therefore deprecated and will be removed in a future version of Pandas.
-    ``RangeIndex`` will not be removed as it represents an optimized version of an integer index.
-
-By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
-This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
-same.
-
-.. ipython:: python
-
-   indexf = pd.Index([1.5, 2, 3, 4.5, 5])
-   indexf
-   sf = pd.Series(range(5), index=indexf)
-   sf
-
-Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
+:class:`RangeIndex` is a sub-class of :class:`Index` that provides the default index for all :class:`DataFrame` and :class:`Series` objects.
+``RangeIndex`` is an optimized version of ``Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
+A ``RangeIndex`` will always have an ``int64`` dtype.
 
 .. ipython:: python
 
-   sf[3]
-   sf[3.0]
-   sf.loc[3]
-   sf.loc[3.0]
+   idx = pd.RangeIndex(5)
+   idx
 
-The only positional indexing is via ``iloc``.
+``RangeIndex`` is the default index for all :class:`DataFrame` and :class:`Series` objects:
 
 .. ipython:: python
 
-   sf.iloc[3]
+   ser = pd.Series([1, 2, 3])
+   ser.index
+   df = pd.DataFrame([[1, 2], [3, 4]])
+   df.index
+   df.columns
 
-A scalar index that is not found will raise a ``KeyError``.
-Slicing is primarily on the values of the index when using ``[],ix,loc``, and
-**always** positional when using ``iloc``. The exception is when the slice is
-boolean, in which case it will always be positional.
-
-.. ipython:: python
-
-   sf[2:4]
-   sf.loc[2:4]
-   sf.iloc[2:4]
-
-In float indexes, slicing using floats is allowed.
-
-.. ipython:: python
-
-   sf[2.1:4.6]
-   sf.loc[2.1:4.6]
-
-In non-float indexes, slicing using floats will raise a ``TypeError``.
-
-.. code-block:: ipython
-
-   In [1]: pd.Series(range(5))[3.5]
-   TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
-
-   In [1]: pd.Series(range(5))[3.5:4.5]
-   TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
-
-Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
-irregular timedelta-like indexing scheme, but the data is recorded as floats. This could, for
-example, be millisecond offsets.
-
-.. ipython:: python
-
-   dfir = pd.concat(
-       [
-           pd.DataFrame(
-               np.random.randn(5, 2), index=np.arange(5) * 250.0, columns=list("AB")
-           ),
-           pd.DataFrame(
-               np.random.randn(6, 2),
-               index=np.arange(4, 10) * 250.1,
-               columns=list("AB"),
-           ),
-       ]
-   )
-   dfir
-
-Selection operations then will always work on a value basis, for all selection operators.
-
-.. ipython:: python
-
-   dfir[0:1000.4]
-   dfir.loc[0:1001, "A"]
-   dfir.loc[1000.4]
-
-You could retrieve the first 1 second (1000 ms) of data as such:
-
-.. ipython:: python
-
-   dfir[0:1000]
-
-If you need integer based selection, you should use ``iloc``:
+A ``RangeIndex`` behaves similarly to an :class:`Index` with an ``int64`` dtype. Operations on a ``RangeIndex``
+whose result cannot be represented by a ``RangeIndex``, but should have an integer dtype, return an ``Index`` with ``int64`` dtype.
+For example:
 
 .. ipython:: python
 
-   dfir.iloc[0:5]
+   idx[[0, 2]]
 
 
 .. _advanced.intervalindex:
diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst
@@ -1582,8 +1582,27 @@ lookups, data alignment, and reindexing. The easiest way to create an
    index
    'd' in index
 
-You can also pass a ``name`` to be stored in the index:
+or using numbers:
+
+.. ipython:: python
 
+   index = pd.Index([1, 5, 12])
+   index
+   5 in index
+
+If no dtype is given, ``Index`` tries to infer the dtype from the data.
+It is also possible to give an explicit dtype when instantiating an :class:`Index`:
+
+.. ipython:: python
+
+   index = pd.Index(['e', 'd', 'a', 'b'], dtype="string")
+   index
+   index = pd.Index([1, 5, 12], dtype="int8")
+   index
+   index = pd.Index([1, 5, 12], dtype="float32")
+   index
+
+You can also pass a ``name`` to be stored in the index:
 
 .. ipython:: python
 
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
@@ -4756,7 +4756,7 @@ Selecting coordinates
 ^^^^^^^^^^^^^^^^^^^^^
 
 Sometimes you want to get the coordinates (a.k.a the index locations) of your query. This returns an
-``Int64Index`` of the resulting locations. These coordinates can also be passed to subsequent
+``Index`` of the resulting locations. These coordinates can also be passed to subsequent
 ``where`` operations.
 
 .. ipython:: python
diff --git a/doc/source/user_guide/timedeltas.rst b/doc/source/user_guide/timedeltas.rst
@@ -477,7 +477,7 @@ Scalars type ops work as well. These can potentially return a *different* type o
    # division can result in a Timedelta if the divisor is an integer
    tdi / 2
 
-   # or a Float64Index if the divisor is a Timedelta
+   # or a float64 Index if the divisor is a Timedelta
    tdi / tdi[0]
 
 .. _timedeltas.resampling:
diff --git a/doc/source/whatsnew/v0.13.0.rst b/doc/source/whatsnew/v0.13.0.rst
@@ -310,7 +310,7 @@ Float64Index API change
 
 - Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
   This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
-  same. See :ref:`the docs<advanced.float64index>`, (:issue:`263`)
+  same. (:issue:`263`)
 
   Construction is by default for floating type values.
 
diff --git a/doc/source/whatsnew/v2.0.0.rst b/doc/source/whatsnew/v2.0.0.rst
@@ -28,6 +28,82 @@ The available extras, found in the :ref:`installation guide<install.dependencies
 ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
 sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).
 
+.. _whatsnew_200.enhancements.index_can_hold_numpy_numeric_dtypes:
+
+:class:`Index` can now hold numpy numeric dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It is now possible to use any numpy numeric dtype in an :class:`Index` (:issue:`42717`).
+
+Previously it was only possible to use ``int64``, ``uint64`` & ``float64`` dtypes:
+
+.. code-block:: ipython
+
+    In [1]: pd.Index([1, 2, 3], dtype=np.int8)
+    Out[1]: Int64Index([1, 2, 3], dtype="int64")
+    In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
+    Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
+    In [3]: pd.Index([1, 2, 3], dtype=np.float32)
+    Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")
+
+:class:`Int64Index`, :class:`UInt64Index` & :class:`Float64Index` were deprecated in pandas
+version 1.4 and have now been removed. Instead :class:`Index` should be used directly, and
+it can now take all numpy numeric dtypes, i.e.
+``int8``/``int16``/``int32``/``int64``/``uint8``/``uint16``/``uint32``/``uint64``/``float32``/``float64`` dtypes:
+
+.. ipython:: python
+
+    pd.Index([1, 2, 3], dtype=np.int8)
+    pd.Index([1, 2, 3], dtype=np.uint16)
+    pd.Index([1, 2, 3], dtype=np.float32)
+
+The ability for ``Index`` to hold the numpy numeric dtypes has meant some changes in pandas
+functionality. In particular, operations that previously were forced to create 64-bit indexes
+can now create indexes with lower bit sizes, e.g. 32-bit indexes.
+
+Below is a possibly non-exhaustive list of changes:
+
+1. Instantiating using a numpy numeric array now follows the dtype of the numpy array.
+   Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now,
+   the index dtype follows the dtype of the numpy array. For example, all signed integer
+   arrays previously returned an index with ``int64`` dtype, but the index will now
+   reuse the dtype of the supplied numpy array. So ``Index(np.array([1, 2, 3]))`` will be ``int32`` on 32-bit systems.
+   Instantiating :class:`Index` using a list of numbers will still return 64-bit dtypes,
+   e.g. ``Index([1, 2, 3])`` will have an ``int64`` dtype, which is the same as previously.
+2. The various numeric datetime attributes of :class:`DatetimeIndex` (:attr:`~DatetimeIndex.day`,
+   :attr:`~DatetimeIndex.month`, :attr:`~DatetimeIndex.year` etc.) were previously of
+   dtype ``int64``, while they were ``int32`` for :class:`DatetimeArray`. They are now
+   ``int32`` on ``DatetimeIndex`` also:
+
+   .. ipython:: python
+
+       idx = pd.date_range(start='1/1/2018', periods=3, freq='M')
+       idx.array.year
+       idx.year
+
+3. Level dtypes on Indexes from :meth:`Series.sparse.from_coo` are now of dtype ``int32``,
+   the same as they are on the ``rows``/``cols`` on a scipy sparse matrix. Previously they
+   were of dtype ``int64``.
+
+   .. ipython:: python
+
+       from scipy import sparse
+       A = sparse.coo_matrix(
+           ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
+       )
+       ser = pd.Series.sparse.from_coo(A)
+       ser.index.dtype
+
+4. :class:`Index` cannot be instantiated using a float16 dtype. Previously instantiating
+   an :class:`Index` using dtype ``float16`` resulted in a :class:`Float64Index` with a
+   ``float64`` dtype. It now raises a ``NotImplementedError``:
+
+   .. ipython:: python
+       :okexcept:
+
+       pd.Index([1, 2, 3], dtype=np.float16)
+
+
 .. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_dtype_backend:
 
 Configuration option, ``mode.dtype_backend``, to return pyarrow-backed dtypes
@@ -684,6 +760,7 @@ Deprecations
 
 Removal of prior version deprecations/changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+- Removed :class:`Int64Index`, :class:`UInt64Index` and :class:`Float64Index`. See also :ref:`here <whatsnew_200.enhancements.index_can_hold_numpy_numeric_dtypes>` for more information (:issue:`42717`)
 - Removed deprecated :attr:`Timestamp.freq`, :attr:`Timestamp.freqstr` and argument ``freq`` from the :class:`Timestamp` constructor and :meth:`Timestamp.fromordinal` (:issue:`14146`)
 - Removed deprecated :class:`CategoricalBlock`, :meth:`Block.is_categorical`, require datetime64 and timedelta64 values to be wrapped in :class:`DatetimeArray` or :class:`TimedeltaArray` before passing to :meth:`Block.make_block_same_class`, require ``DatetimeTZBlock.values`` to have the correct ndim when passing to the :class:`BlockManager` constructor, and removed the "fastpath" keyword from the :class:`SingleBlockManager` constructor (:issue:`40226`, :issue:`40571`)
 - Removed deprecated global option ``use_inf_as_null`` in favor of ``use_inf_as_na`` (:issue:`17126`)