Description
In #23752, the MultiIndex signature was changed. Compared to 0.23.4, the only change is that labels has been renamed to codes.
Now that the signature is being changed anyway, I've started to wonder whether this is even the right signature: I think codes should actually be an implementation detail, and an improved signature would use data as the first parameter, similar to how data is the first parameter in the signature for CategoricalIndex. So, a signature like this:
>>> inspect.signature(pd.MultiIndex) # proposed signature
<Signature (data=None, levels=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True, _set_identity=True)>
I think this would be better.
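For comparison, here is a small sketch of how CategoricalIndex already behaves: the values are passed first, and the integer codes are derived from them rather than supplied by the caller. The sample values are only illustrative.

```python
import pandas as pd

# CategoricalIndex takes the values themselves as its first argument (data);
# the integer codes are computed internally from data and categories.
ci = pd.CategoricalIndex(["b", "a", "b", "a"], categories=["a", "b"])

codes = ci.codes.tolist()            # position of each value within categories
categories = ci.categories.tolist()  # the unique labels
print(codes, categories)
```

Here the codes are available for inspection through an attribute, but the user never has to construct them by hand.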
data could then accept codes, but could also accept other types of data that could be used to construct a MultiIndex. For example:
>>> pd.MultiIndex(data=[[1,0, 1, 0], [0,1,0,1]], levels=[['a', 'b'], ['x', 'y']])
MultiIndex([('b', 'x'), # repr after #22511
('a', 'y'),
('b', 'x'),
('a', 'y')],
)
>>> pd.MultiIndex({'a': [1,2,3], 'v': ['a', 'd', 'q']})
MultiIndex([(1, 'a'),
(2, 'd'),
(3, 'q')],
names=['a', 'v']
)
In the first example, I use the current initialisation method, and in the second I show an initialisation with a dict, similar to how a DataFrame is initialised with a dict.
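To make the idea concrete, here is a rough sketch of how such a data-first constructor could dispatch, written on top of constructors that exist today. The helper multiindex_from_data and its dispatch rules are hypothetical and not part of pandas; it also assumes a pandas version where the constructor keyword is already codes.

```python
import pandas as pd


def multiindex_from_data(data, levels=None, names=None):
    """Hypothetical helper mimicking the proposed data-first signature."""
    if isinstance(data, dict):
        # Dict of name -> values, analogous to the DataFrame constructor.
        if names is None:
            names = list(data.keys())
        return pd.MultiIndex.from_arrays(list(data.values()), names=names)
    if levels is not None:
        # data holds integer codes pointing into levels (current behaviour).
        return pd.MultiIndex(levels=levels, codes=data, names=names)
    # Otherwise treat data as a list of arrays, one per level.
    return pd.MultiIndex.from_arrays(data, names=names)


from_codes = multiindex_from_data(
    [[1, 0, 1, 0], [0, 1, 0, 1]], levels=[["a", "b"], ["x", "y"]]
)
from_dict = multiindex_from_data({"a": [1, 2, 3], "v": ["a", "d", "q"]})
print(from_codes)
print(from_dict)
```

Whether passing levels should be what switches data into "codes" mode is just one possible rule; a real implementation would need to settle that.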
I think this could make the initialisation of MultiIndex more similar to that of the other pandas objects, and make MultiIndexes friendlier to use.