Skip to content

BUG: Panel.to_frame() with MultiIndex major axis #5417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 15, 2014

Conversation

TomAugspurger
Copy link
Contributor

Closes #5402 WIP for now.

So this is a hack to get things "working".

I mainly just wanted to ask if there was a better way to get the levels and labels to the MultiIndex constructor.
Don't spend long thinking about it (this is my PR after all), but if you know of a quick way off the top of your head, I'd be interested.
I essentially need to

  1. flatten the MultiIndex from major axis (levels and labels)
  2. combine that flattened index with the minor axis (levels and labels)

Step 2 could be complicated by minor_axis being a MultiIndex also, I should be able to refactor this pretty easily to handle that. I think my was is making several copies of each index, and I'm not sure that they're all necessary.

@jtratner
Copy link
Contributor

jtratner commented Nov 2, 2013

What do you mean by flatten? You can get tuples by using the values
property. Check out _sparsify too.
On Nov 2, 2013 3:37 PM, "Tom Augspurger" notifications@github.com wrote:

Closes #5402 #5402 WIP for now.

So this is a hack to get things "working".

I mainly just wanted to ask if there was a better way to get the levels
and labels to the MultiIndex constructor.
Don't spend long thinking about it (this is my PR after all), but if you
know of a quick way off the top of your head, I'd be interested.
I essentially need to

  1. flatten the MultiIndex from major axis (levels and labels)
  2. combine that flattened index with the minor axis (levels and labels)

Step 2 could be complicated by minor_axis being a MultiIndex also, I
should be able to refactor this pretty easily to handle that. I think my
was is making several copies of each index, and I'm not sure that they're

all necessary.

You can merge this Pull Request by running

git pull https://github.com/TomAugspurger/pandas to-frame-multi

Or view, comment on, or merge it at:

#5417
Commit Summary

  • BUG: Panel.to_frame() with MultiIndex major axis

File Changes

Patch Links:

@TomAugspurger
Copy link
Contributor Author

OK, I think this is just about there if you want to take a look. Essentially I check if the major and minor axes are a MultiIndex. If they are, I repeat labels the necessary number of times and rearrange them appropriately. I need to write a few more test cases to make sure I'm not getting any false positives.

I didn't see a vbench covering to_frame. Think we need one?

@jtratner
Copy link
Contributor

jtratner commented Nov 4, 2013

I believe we need to set up a way to have vbench conditional on version -
@y-p I recall you saying that vbenches were supposed to run across many
versions of pandas, but maybe we could add a version check (decorator?
kwarg?) to the setup code?

@@ -1320,6 +1320,51 @@ def test_to_frame_mixed(self):
# Previously, this was mutating the underlying index and changing its name
assert_frame_equal(wp['bool'], panel['bool'], check_names=False)

def test_to_frame_multi_major(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standard notes on testing MI: make sure you have 1 example that is non-lex-sorted and 1 that has nan in some of the levels.

@jreback
Copy link
Contributor

jreback commented Nov 4, 2013

maybe put those conversions in a function (in index?)

names=[maj_name, min_name], verify_integrity=False)
if isinstance(self.major_axis, MultiIndex):
major_levels = self.major_axis.levels
major_labels = map(lambda x: np.repeat(x, K), self.major_axis.labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this a list comprehension instead?

@ghost
Copy link

ghost commented Nov 4, 2013

@jtratner, that was re: test_perf.py, I had some scripts relying on it to auto. sample perf back in history
and find regression points using bisection. When some contrib. changes to test_perf broke
it for older pandas, I added some comments to it to make it clear that back-compat is a requirement,
since test_perf itself relies on pandas for data processing.

I didn't see anything here that calls for changes in vbench, can you explain the issue?

@jtratner
Copy link
Contributor

jtratner commented Nov 4, 2013

okay, that makes sense. I just wasn't sure if to_frame() has always been a method on Panel. thanks for clarifying.

@TomAugspurger
Copy link
Contributor Author

@jreback Which conversions? Just this line?

major_labels = [np.repeat(x, K) for x in self.major_axis.labels]

@ghost
Copy link

ghost commented Nov 4, 2013

@jtratner, ok well In that case I think the existing start_date kwarg attached to each vbench should be all you need.

@jtratner
Copy link
Contributor

jtratner commented Nov 4, 2013

wow, I should have noticed that.

major_names = [major_names]
if isinstance(self.minor_axis, MultiIndex):
minor_levels = self.minor_axis.levels
minor_labels = map(lambda x: np.repeat(x, N), self.minor_axis.labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I meant these if/else....they are basically the same for the minor/major axes so could do it with a function; even better to have this as a index method, something like (which does essentially what your if else does), except it works separately on Multi-index (the if part) and Index (the else part)

def conform_to_shape(self, N, K):

   return levels, labels, names

@jreback
Copy link
Contributor

jreback commented Nov 4, 2013

makes a lot of sense as an index method because has method in index and multi index

@@ -2373,6 +2373,42 @@ def format(self, space=2, sparsify=None, adjoin=True, names=False,
else:
return result_levels

def to_hierarchical(self, N, K=1):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I rename this to _to_hierarchical?

@TomAugspurger
Copy link
Contributor Author

Ok those tests should be clearer now. Earlier @jtratner mentioned

I'm not totally sure, but I believe you could actually pay the cost of filtering the panel on selector first, and then you wouldn't have to filter on selector everywhere else. Slower, but cleaner maybe?

which makes a lot of sense. I just need to figure out the best way filter the panel... It's not immediately obvious to me.

@TomAugspurger
Copy link
Contributor Author

Can you post some examples of 'selectors' - - might be easier to reason
about the selector separate from the context of this method.

selector is a 1-d array of bools built by raveling a 2-d mask. The mask is the (major, minor) axis pairs that contain NaNs (or will contain NaNs).

Example:

idx = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), (2, 'two')])
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']}, index=idx)
wp = pd.Panel({'i1': df, 'i2': df})

wp.loc['i1', (1, 'one')]['A'] = np.nan
In [24]: mask = com.notnull(wp.values).all(axis=0)

In [25]: mask
Out[25]: 
array([[False,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True]], dtype=bool)
In [28]: selector = mask.ravel()

In [29]: selector
Out[29]: array([False,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

so that one false is coming from the nan. It means we want to filter the first (major, minor) pair from the output.

@TomAugspurger
Copy link
Contributor Author

EDIT: I think this is a bug. made an issue at #5773

Getting back to this, and I'm cleaning up some setting with copy errors in my tests.

Should the following work?

        idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'),
                                      (2, 'two')])
        df = DataFrame([[1, 'a', 1], [2, 'b', 1], [3, 'c', 1], [4, 'd', 1]],
                       columns=['A', 'B', 'C'], index=idx)
        wp = Panel({'i1': df, 'i2': df})

        wp.loc['i1', (1, 'one'), 'A'] = np.nan

Raises an

IndexError                                Traceback (most recent call last)
<ipython-input-2-3346cc0dd38b> in <module>()
      5 wp = Panel({'i1': df, 'i2': df})
      6 
----> 7 wp.loc['i1', (1, 'one'), 'A'] = np.nan

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_87_g9fefe8c-py2.7-macosx-10.8-x86_64.egg/pandas/core/indexing.pyc in __setitem__(self, key, value)
     94             indexer = self._convert_to_indexer(key, is_setter=True)
     95 
---> 96         self._setitem_with_indexer(indexer, value)
     97 
     98     def _has_valid_type(self, k, axis):

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_87_g9fefe8c-py2.7-macosx-10.8-x86_64.egg/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    297                 obj = self.obj[item]
    298                 index = obj.index
--> 299                 idx = indexer[:info_axis][0]
    300                 try:
    301                     if idx in index:

IndexError: tuple index out of range

@TomAugspurger
Copy link
Contributor Author

@jreback ready to go at your leisure. let me know if you want a note in v0.14.0.txt. I put one in release.rst.

else:
minor_levels = [self.minor_axis]
# Anyone think of a better way to do this? np.repeat does not
# do what I want
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway you can do a simple function like construct_multi_parts for the non-mi case? so don't have repeated code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not sure why I didn't before...

@jreback
Copy link
Contributor

jreback commented Jan 3, 2014

@TomAugspurger this is a bug fix, so unless you want to, nothing needed for whatsnew (fyi now targeting 0.13.1, though it still may go to 0.14, depends if we have critical bugs ..)

@jreback
Copy link
Contributor

jreback commented Jan 3, 2014

@jtratner look ok to u?

@@ -73,6 +73,8 @@ Bug Fixes
~~~~~~~~~
- Bug in Series replace with timestamp dict (:issue:`5797`)
- read_csv/read_table now respects the `prefix` kwarg (:issue:`5732`).
- ``Panel.to_frame()`` no longer fails when ``major_axis`` is a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move to 0.13.1

@jreback
Copy link
Contributor

jreback commented Jan 15, 2014

@TomAugspurger this looks fine...pls move release notes to 0.13.1 and rebase...

Fixes bug that caused failure on Panel.to_frame()
if major_axis was a MultiIndex (pandas-dev#5402)

refactor reindexing to index.py

API/REF: refactor bits to index. rename
@TomAugspurger
Copy link
Contributor Author

@jreback rebased and moved to 0.13.1

jreback added a commit that referenced this pull request Jan 15, 2014
BUG: Panel.to_frame() with MultiIndex major axis
@jreback jreback merged commit b15ba30 into pandas-dev:master Jan 15, 2014
@jreback
Copy link
Contributor

jreback commented Jan 15, 2014

thanks @TomAugspurger !

@TomAugspurger TomAugspurger deleted the to-frame-multi branch November 3, 2016 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/Not Implemented Panel.to_frame() with MultiIndex
3 participants