DOC: clean up internals.rst #51107

Merged
merged 2 commits into from
Feb 1, 2023
46 changes: 23 additions & 23 deletions doc/source/development/internals.rst
@@ -15,24 +15,24 @@ Indexing
In pandas there are a few objects implemented which can serve as valid
containers for the axis labels:

* ``Index``: the generic "ordered set" object, an ndarray of object dtype
* :class:`Index`: the generic "ordered set" object, an ndarray of object dtype
assuming nothing about its contents. The labels must be hashable (and
likely immutable) and unique. Populates a dict of label to location in
Cython to do ``O(1)`` lookups.
* ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
data, such as time stamps
* ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data
* ``MultiIndex``: the standard hierarchical index object
* ``DatetimeIndex``: An Index object with ``Timestamp`` boxed elements (impl are the int64 values)
* ``TimedeltaIndex``: An Index object with ``Timedelta`` boxed elements (impl are the in64 values)
* ``PeriodIndex``: An Index object with Period elements
* :class:`MultiIndex`: the standard hierarchical index object
* :class:`DatetimeIndex`: An Index object with :class:`Timestamp` boxed elements (impl are the int64 values)
* :class:`TimedeltaIndex`: An Index object with :class:`Timedelta` boxed elements (impl are the in64 values)
* :class:`PeriodIndex`: An Index object with Period elements
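
The "dict of label to location" idea behind ``Index`` lookups can be sketched in plain Python. This is only a toy model under stated assumptions: the names ``labels``, ``label_to_loc`` and the standalone ``get_loc`` function are hypothetical, and the real mapping is built by pandas in Cython.

```python
# Toy model of the label -> location mapping; NOT pandas' actual implementation.
labels = ["a", "b", "c", "d"]

# Labels must be hashable so they can be used as dict keys, and unique so
# that each label maps to exactly one location.
label_to_loc = {label: loc for loc, label in enumerate(labels)}


def get_loc(label):
    """Return the integer position of ``label`` via one dict lookup."""
    return label_to_loc[label]  # raises KeyError if the label is absent


print(get_loc("c"))  # 2
```

A dict lookup takes constant time on average, which is where the ``O(1)`` figure in the list above comes from.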

There are functions that make the creation of a regular index easy:

* ``date_range``: fixed frequency date range generated from a time rule or
* :func:`date_range`: fixed frequency date range generated from a time rule or
DateOffset. An ndarray of Python datetime objects
* ``period_range``: fixed frequency date range generated from a time rule or
DateOffset. An ndarray of ``Period`` objects, representing timespans
* :func:`period_range`: fixed frequency date range generated from a time rule or
DateOffset. An ndarray of :class:`Period` objects, representing timespans
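
As a rough illustration of the two helpers, the sketch below builds a short daily range and a short monthly range. The start dates, lengths, and frequencies are arbitrary choices for the example; in current pandas releases the results are ``DatetimeIndex`` and ``PeriodIndex`` objects.

```python
import pandas as pd

# Three consecutive days, starting from an arbitrary example date.
dates = pd.date_range("2023-01-01", periods=3, freq="D")

# Three consecutive monthly periods.
periods = pd.period_range("2023-01", periods=3, freq="M")

print(type(dates).__name__)    # DatetimeIndex
print(type(periods).__name__)  # PeriodIndex

# The datetimes are stored as int64 values under the hood.
print(dates.asi8.dtype)  # int64
```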

The motivation for having an ``Index`` class in the first place was to enable
different implementations of indexing. This means that it's possible for you,
@@ -43,28 +43,28 @@ From an internal implementation point of view, the relevant methods that an
``Index`` must define are one or more of the following (depending on how
incompatible the new object internals are with the ``Index`` functions):

* ``get_loc``: returns an "indexer" (an integer, or in some cases a
* :meth:`~Index.get_loc`: returns an "indexer" (an integer, or in some cases a
slice object) for a label
* ``slice_locs``: returns the "range" to slice between two labels
* ``get_indexer``: Computes the indexing vector for reindexing / data
* :meth:`~Index.slice_locs`: returns the "range" to slice between two labels
* :meth:`~Index.get_indexer`: Computes the indexing vector for reindexing / data
alignment purposes. See the source / docstrings for more on this
* ``get_indexer_non_unique``: Computes the indexing vector for reindexing / data
* :meth:`~Index.get_indexer_non_unique`: Computes the indexing vector for reindexing / data
alignment purposes when the index is non-unique. See the source / docstrings
for more on this
* ``reindex``: Does any pre-conversion of the input index then calls
* :meth:`~Index.reindex`: Does any pre-conversion of the input index then calls
``get_indexer``
* ``union``, ``intersection``: computes the union or intersection of two
* :meth:`~Index.union`, :meth:`~Index.intersection`: computes the union or intersection of two
Index objects
* ``insert``: Inserts a new label into an Index, yielding a new object
* ``delete``: Delete a label, yielding a new object
* ``drop``: Deletes a set of labels
* ``take``: Analogous to ndarray.take
* :meth:`~Index.insert`: Inserts a new label into an Index, yielding a new object
* :meth:`~Index.delete`: Delete a label, yielding a new object
* :meth:`~Index.drop`: Deletes a set of labels
* :meth:`~Index.take`: Analogous to ndarray.take
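
The methods listed above can be tried on a small ``Index``. This is a usage sketch on a plain, unique, sorted index with made-up labels, not a demonstration of how a custom subclass would implement them.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

# get_loc: label -> integer position
print(idx.get_loc("b"))  # 1

# slice_locs: positions to slice between two labels (end label is included)
print(idx.slice_locs("b", "c"))  # (1, 3)

# get_indexer: position of each requested label, -1 where it is missing
print(idx.get_indexer(["d", "a", "z"]))

# insert / delete / drop / take all return new objects; idx is unchanged
print(list(idx.insert(1, "x")))     # ['a', 'x', 'b', 'c', 'd']
print(list(idx.delete(0)))          # ['b', 'c', 'd']
print(list(idx.drop(["a", "d"])))   # ['b', 'c']
print(list(idx.take([2, 0])))       # ['c', 'a']
```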

MultiIndex
~~~~~~~~~~

Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
integer **codes** (until version 0.24 named *labels*), and the level **names**:
Internally, the :class:`MultiIndex` consists of a few things: the **levels**, the
integer **codes**, and the level **names**:

.. ipython:: python

@@ -80,13 +80,13 @@ You can probably guess that the codes determine which unique element is
identified with that location at each layer of the index. It's important to
note that sortedness is determined **solely** from the integer codes and does
not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
if you compute the levels and codes yourself, please be careful.
constructors :meth:`~MultiIndex.from_tuples` and :meth:`~MultiIndex.from_arrays` ensure
that this is true, but if you compute the levels and codes yourself, please be careful.
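
To make the relationship between levels and codes concrete, here is a small sketch with made-up tuples and level names. It builds a ``MultiIndex`` with ``from_tuples`` and then rebuilds each entry by looking the integer codes up in the levels.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["first", "second"]
)

print(mi.levels)  # unique values per level
print(mi.codes)   # integer positions into each level
print(mi.names)   # level names

# Rebuild every entry: at each position, take one code per level and
# look it up in the corresponding level.
rebuilt = [
    tuple(level[code] for level, code in zip(mi.levels, codes))
    for codes in zip(*mi.codes)
]
print(rebuilt)
```

Because ``from_tuples`` produces sorted levels here, the ordering of the codes matches the ordering of the labels, which is the situation the paragraph above describes.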

Values
~~~~~~

pandas extends NumPy's type system with custom types, like ``Categorical`` or
pandas extends NumPy's type system with custom types, like :class:`Categorical` or
datetimes with a timezone, so we have multiple notions of "values". For 1-D
containers (``Index`` classes and ``Series``) we have the following convention:
