ENH: Add allow_duplicates to MultiIndex.to_frame #45318
Changes from 5 commits
@@ -1710,7 +1710,12 @@ def unique(self, level=None):
             level = self._get_level_number(level)
             return self._get_level_values(level=level, unique=True)
 
-    def to_frame(self, index: bool = True, name=lib.no_default) -> DataFrame:
+    def to_frame(
+        self,
+        index: bool = True,
+        name=lib.no_default,
+        allow_duplicates: bool = False,
+    ) -> DataFrame:
         """
         Create a DataFrame with the levels of the MultiIndex as columns.
 
@@ -1725,6 +1730,11 @@ def to_frame(self, index: bool = True, name=lib.no_default) -> DataFrame:
         name : list / sequence of str, optional
             The passed names should substitute index level names.
 
+        allow_duplicates : bool, optional default False
+            Allow duplicate column labels to be created.
+
+            .. versionadded:: 1.5.0
+
         Returns
         -------
         DataFrame : a DataFrame containing the original MultiIndex data.
@@ -1774,14 +1784,21 @@ def to_frame(self, index: bool = True, name=lib.no_default) -> DataFrame:
         else:
             idx_names = self.names
 
+        idx_names = [

umm why are you repeating L1785?

It is doing a transform: filling in None names with the level number. Whether that is the right thing to do is another issue. I am just preserving the existing behavior.

then this needs another argument similar to how this is done in

I am not changing anything here. The old code is https://github.com/johnzangwill/pandas/blob/6cc5584bba59ef8f06d4dc901dc39ddd08d1519f/pandas/core/indexes/multi.py#L1780:
and I have just moved that logic earlier, since I need unique dictionary indexes. In any case, insert and reset_index do this differently, replacing None level labels with level_n. As I say, that is a separate issue and I have raised it elsewhere (#45245), but it is not the subject of this PR. I don't think that this is conditional in reset_index or that there is an argument for it. Which argument are you referring to? This is the code in reset_index:
that puts in "level_n" for multi-index and "index" or "level_0" for simple index.

ok can you just make a method on Index then to do this, repeating this code is not great

I have searched Pandas and I cannot find any other instance of this. The nearest is pandas/pandas/core/indexes/base.py Line 1631 in 4e034ec

yeah i think a common method on index is worth it here (to share here & reset_index)

Ok, but I have already explained that
These

+            level if name is None else name for level, name in enumerate(idx_names)
+        ]
+
+        if not allow_duplicates and len(set(idx_names)) != len(idx_names):
+            raise ValueError(
+                "Cannot create duplicate column labels if allow_duplicates is False"
+            )
+
         # Guarantee resulting column order - PY36+ dict maintains insertion order
         result = DataFrame(
-            {
-                (level if lvlname is None else lvlname): self._get_level_values(level)
-                for lvlname, level in zip(idx_names, range(len(self.levels)))
-            },
+            {level: self._get_level_values(level) for level in range(len(self.levels))},
             copy=False,
         )
+        result.columns = idx_names
 
         if index:
             result.index = self
revert this

Done
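
For readers following the first review thread, the name handling in this diff can be sketched in plain Python, without pandas. The helper names below (fill_with_level_number, fill_with_level_string, check_labels) are hypothetical and exist only for illustration. The first mirrors the list comprehension added to to_frame, the second follows the "level_n" convention that the thread attributes to reset_index on a MultiIndex (the simple-index "index" / "level_0" case is not modeled), and the third reproduces the new duplicate check.

```python
from typing import Hashable, List, Optional, Sequence


def fill_with_level_number(names: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Hypothetical helper: a missing (None) name becomes the integer level number,
    as in the list comprehension this diff adds to to_frame."""
    return [level if name is None else name for level, name in enumerate(names)]


def fill_with_level_string(names: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Hypothetical helper: a missing name becomes the string 'level_n', the
    convention the thread describes for reset_index on a MultiIndex."""
    return [
        f"level_{level}" if name is None else name
        for level, name in enumerate(names)
    ]


def check_labels(
    labels: Sequence[Hashable], allow_duplicates: bool = False
) -> List[Hashable]:
    """Hypothetical helper: reject repeated labels unless the caller opts in,
    using the same condition and message as the check in the diff."""
    if not allow_duplicates and len(set(labels)) != len(labels):
        raise ValueError(
            "Cannot create duplicate column labels if allow_duplicates is False"
        )
    return list(labels)


names = ["a", None, "a"]
print(fill_with_level_number(names))  # ['a', 1, 'a']
print(fill_with_level_string(names))  # ['a', 'level_1', 'a']
print(check_labels(fill_with_level_number(names), allow_duplicates=True))
```

Because the check runs after the fill, it also catches duplicates that the fill itself creates, for example when a level is already named with the integer that an unnamed level would receive.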