-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Panel.to_frame() with MultiIndex major axis #5417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
What do you mean by flatten? You can get tuples by using the values
|
OK, I think this is just about there if you want to take a look. Essentially I check if the major and minor axes are a MultiIndex. If they are, I repeat labels the necessary number of times and rearrange them appropriately. I need to write a few more test cases to make sure I'm not getting any false positives. I didn't see a vbench covering to_frame. Think we need one? |
I believe we need to set up a way to have vbench conditional on version - |
@@ -1320,6 +1320,51 @@ def test_to_frame_mixed(self): | |||
# Previously, this was mutating the underlying index and changing its name | |||
assert_frame_equal(wp['bool'], panel['bool'], check_names=False) | |||
|
|||
def test_to_frame_multi_major(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standard notes on testing MI: make sure you have 1 example that is non-lex-sorted and 1 that has nan in some of the levels.
maybe put those conversions in a function (in index?) |
names=[maj_name, min_name], verify_integrity=False) | ||
if isinstance(self.major_axis, MultiIndex): | ||
major_levels = self.major_axis.levels | ||
major_labels = map(lambda x: np.repeat(x, K), self.major_axis.labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this a list comprehension instead?
@jtratner, that was re: test_perf.py, I had some scripts relying on it to auto. sample perf back in history I didn't see anything here that calls for changes in vbench, can you explain the issue? |
okay, that makes sense. I just wasn't sure if to_frame() has always been a method on Panel. thanks for clarifying. |
@jreback Which conversions? Just this line? major_labels = [np.repeat(x, K) for x in self.major_axis.labels] |
@jtratner, ok well In that case I think the existing |
wow, I should have noticed that. |
major_names = [major_names] | ||
if isinstance(self.minor_axis, MultiIndex): | ||
minor_levels = self.minor_axis.levels | ||
minor_labels = map(lambda x: np.repeat(x, N), self.minor_axis.labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I meant these if/else....they are basically the same for the minor/major axes so could do it with a function; even better to have this as a index method, something like (which does essentially what your if else does), except it works separately on Multi-index (the if part) and Index (the else part)
def conform_to_shape(self, N, K):
return levels, labels, names
makes a lot of sense as an index method because has method in index and multi index |
@@ -2373,6 +2373,42 @@ def format(self, space=2, sparsify=None, adjoin=True, names=False, | |||
else: | |||
return result_levels | |||
|
|||
def to_hierarchical(self, N, K=1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I rename this to _to_hierarchical
?
Ok those tests should be clearer now. Earlier @jtratner mentioned
which makes a lot of sense. I just need to figure out the best way filter the panel... It's not immediately obvious to me. |
Example: idx = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')])
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']}, index=idx)
wp = pd.Panel({'i1': df, 'i2': df})
wp.loc['i1', (1, 'one')]['A'] = np.nan
In [24]: mask = com.notnull(wp.values).all(axis=0)
In [25]: mask
Out[25]:
array([[False, True],
[ True, True],
[ True, True],
[ True, True]], dtype=bool)
In [28]: selector = mask.ravel()
In [29]: selector
Out[29]: array([False, True, True, True, True, True, True, True], dtype=bool) so that one false is coming from the |
EDIT: I think this is a bug. made an issue at #5773 Getting back to this, and I'm cleaning up some setting with copy errors in my tests. Should the following work? idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'),
(2, 'two')])
df = DataFrame([[1, 'a', 1], [2, 'b', 1], [3, 'c', 1], [4, 'd', 1]],
columns=['A', 'B', 'C'], index=idx)
wp = Panel({'i1': df, 'i2': df})
wp.loc['i1', (1, 'one'), 'A'] = np.nan Raises an
|
@jreback ready to go at your leisure. let me know if you want a note in |
else: | ||
minor_levels = [self.minor_axis] | ||
# Anyone think of a better way to do this? np.repeat does not | ||
# do what I want |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway you can do a simple function like construct_multi_parts
for the non-mi case? so don't have repeated code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not sure why I didn't before...
@TomAugspurger this is a bug fix, so unless you want to, nothing needed for whatsnew (fyi now targeting 0.13.1, though it still may go to 0.14, depends if we have critical bugs ..) |
@jtratner look ok to u? |
@@ -73,6 +73,8 @@ Bug Fixes | |||
~~~~~~~~~ | |||
- Bug in Series replace with timestamp dict (:issue:`5797`) | |||
- read_csv/read_table now respects the `prefix` kwarg (:issue:`5732`). | |||
- ``Panel.to_frame()`` no longer fails when ``major_axis`` is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls move to 0.13.1
@TomAugspurger this looks fine...pls move release notes to 0.13.1 and rebase... |
Fixes bug that caused failure on Panel.to_frame() if major_axis was a MultiIndex (pandas-dev#5402) refactor reindexing to index.py API/REF: refactor bits to index. rename
@jreback rebased and moved to 0.13.1 |
BUG: Panel.to_frame() with MultiIndex major axis
thanks @TomAugspurger ! |
Closes #5402 WIP for now.
So this is a hack to get things "working".
I mainly just wanted to ask if there was a better way to get the levels and labels to the MultiIndex constructor.
Don't spend long thinking about it (this is my PR after all), but if you know of a quick way off the top of your head, I'd be interested.
I essentially need to
Step 2 could be complicated by
minor_axis
being a MultiIndex also, I should be able to refactor this pretty easily to handle that. I think my was is making several copies of each index, and I'm not sure that they're all necessary.