Add MultiIndex._data and MultiIndex.array #27138

I propose adding a ``MultiIndex._data`` attribute of type ``List[Categorical]`` that stores all of the underlying data of a MultiIndex. A ``MultiIndex.array`` property that exposes ``_data`` would also be added.

This has the advantage of collecting the data underlying a MultiIndex into one human-readable data structure. It also makes zero-copy access to that data very easy: for example, ``mi.array[1]`` would return the data of the second level as a ``Categorical``, in an easy-to-read form.
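As a rough illustration of what ``mi.array[1]`` might look like, the sketch below rebuilds one ``Categorical`` per level from the existing public ``codes`` and ``levels`` attributes. The ``data`` list is only a stand-in for the proposed ``_data``/``array``, which do not exist yet, and this construction is not guaranteed to be zero-copy the way the proposal intends.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays(
    [["a", "a", "b"], [1, 2, 1]], names=["letter", "number"]
)

# Stand-in for the proposed ``mi._data`` / ``mi.array`` (hypothetical):
# one Categorical per level, built from that level's codes and categories.
data = [
    pd.Categorical.from_codes(codes, categories=level)
    for codes, level in zip(mi.codes, mi.levels)
]

# What ``mi.array[1]`` would roughly correspond to: the second level's data.
second = data[1]
print(second)
print(second.categories)
print(second.codes)
```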

With the above changes, a ``MultiIndex`` could be explained as just "a container over a list of Categoricals", which is easier to explain than the current model. The ``MultiIndex`` could also be related to ``CategoricalIndex``, which is "a container over a single Categorical".

This change means that ``MultiIndex.levels`` will become a property that returns ``FrozenList(cat.categories for cat in self._data)``, and ``MultiIndex.codes`` will become a property that returns ``FrozenList(cat.codes for cat in self._data)``.

``MultiIndex.array`` will be a new property that simply returns a ``FrozenList`` of ``self._data``.
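The three properties could be sketched with a toy class like the one below. ``MultiIndexSketch`` is a hypothetical name used only for illustration and is not the pandas implementation; a plain ``tuple`` stands in for ``FrozenList``, which lives in pandas internals.

```python
import pandas as pd


class MultiIndexSketch:
    """Toy model of the proposal: a container over a list of Categoricals."""

    def __init__(self, data):
        # ``_data`` holds one Categorical per level.
        self._data = list(data)

    @property
    def array(self):
        # Read-only view of the underlying Categoricals
        # (tuple used here in place of FrozenList).
        return tuple(self._data)

    @property
    def levels(self):
        # Derived from the data instead of being stored separately.
        return tuple(cat.categories for cat in self._data)

    @property
    def codes(self):
        # Likewise derived from the data.
        return tuple(cat.codes for cat in self._data)


cats = [pd.Categorical(["a", "a", "b"]), pd.Categorical([1, 2, 1])]
sketch = MultiIndexSketch(cats)
print(sketch.levels)
print(sketch.codes)
```

Since ``levels`` and ``codes`` are computed from ``_data`` in this sketch, there is a single source of truth, and ``sketch.array[1]`` hands back the very same ``Categorical`` object that was stored.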

Performance will not be affected, as most operations would still go through ``MultiIndex.codes`` and ``MultiIndex.levels``.

## Moving names from MultiIndex.levels to MultiIndex._names

Currently, each level's name is stored in that level's ``name`` attribute. This is not very compatible with extracting the categories from ``_data``: ``.categories`` is actually part of the dtype, which ideally should be immutable, so we shouldn't set or change its ``name`` attribute.

To make my suggestion practical, the level names should be stored in ``MultiIndex._names`` instead, and ``MultiIndex.names`` will become a property that reads from and writes to ``MultiIndex._names``. I think this change simplifies the MultiIndex a bit, as data and names are dealt with separately. This is a small backward-incompatible change, though.
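A minimal sketch of names kept apart from the level data might look like this. ``NamedLevelsSketch`` and its length check are assumptions made for illustration, not existing pandas code.

```python
class NamedLevelsSketch:
    """Toy model: level names live in ``_names``, not on the levels."""

    def __init__(self, n_levels, names=None):
        self._n_levels = n_levels
        self._names = [None] * n_levels
        if names is not None:
            self.names = names  # goes through the setter below

    @property
    def names(self):
        # Reads from ``_names``; a tuple keeps callers from mutating it in place.
        return tuple(self._names)

    @names.setter
    def names(self, values):
        # Writes to ``_names`` only; the level data is never touched.
        values = list(values)
        if len(values) != self._n_levels:
            raise ValueError(
                f"expected {self._n_levels} names, got {len(values)}"
            )
        self._names = values


idx = NamedLevelsSketch(2, names=["letter", "number"])
idx.names = ["x", "y"]
print(idx.names)
```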

So, I suggest making two PRs:

1. Separate the names from the levels (to be included in 0.25)
2. Add ``_data`` and ``array``, and change ``levels`` and ``codes`` into properties.


