ENH: Add axis and level keywords to where, so that the other argument can now be an alignable pandas object. #4781

Merged 1 commit on Sep 10, 2013

jreback commented Sep 9, 2013

Traditionally, a fillna that fills each column with its mean is an apply operation.

In [1]: df = DataFrame(np.random.randn(10,3))

In [2]: df.iloc[3:5,0] = np.nan

In [3]: df.iloc[4:6,1] = np.nan

In [4]: df.iloc[5:8,2] = np.nan

In [5]: df
Out[5]: 
          0         1         2
0  0.096030  0.197451  1.645981
1 -0.443437  0.359204 -0.382563
2  0.613981  1.418754 -0.589935
3  0.000000  0.449953 -0.308414
4  0.000000  0.000000 -0.471054
5 -2.350309  0.000000  0.000000
6 -0.218522  0.498207  0.000000
7  0.478238  0.399154  0.000000
8  0.895854  0.230992  0.025799
9  0.085675  2.189373 -0.946990

The following currently fails in 0.12 because where is finicky about how it broadcasts other:

In [4]: df.where(df>0,df[0],axis='index')
ValueError: other must be the same shape as self when an ndarray

Adding axis and level arguments to where (which uses align under the hood) lets other be a Series or DataFrame (as well as a scalar) without a whole bunch of manual alignment and broadcasting. This should also be quite a bit faster.

In [6]: df.where(df>0,df[0],axis='index')
Out[6]: 
          0         1         2
0  0.096030  0.197451  1.645981
1 -0.443437  0.359204 -0.443437
2  0.613981  1.418754  0.613981
3  0.000000  0.449953  0.000000
4  0.000000  0.000000  0.000000
5 -2.350309 -2.350309 -2.350309
6 -0.218522  0.498207 -0.218522
7  0.478238  0.399154  0.478238
8  0.895854  0.230992  0.025799
9  0.085675  2.189373  0.085675

The equivalent apply already works in 0.12:

In [7]: df.apply(lambda x, y: x.where(x>0,y), y=df[0])
Out[7]: 
          0         1         2
0  0.096030  0.197451  1.645981
1 -0.443437  0.359204 -0.443437
2  0.613981  1.418754  0.613981
3  0.000000  0.449953  0.000000
4  0.000000  0.000000  0.000000
5 -2.350309 -2.350309 -2.350309
6 -0.218522  0.498207 -0.218522
7  0.478238  0.399154  0.478238
8  0.895854  0.230992  0.025799
9  0.085675  2.189373  0.085675
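As a rough sketch of the equivalence between the two calls above, the snippet below builds a small frame and checks that where with axis='index' gives the same result as the column-by-column apply. The seeded random data is an assumption made for reproducibility and does not reproduce the values shown in this PR.

```python
import numpy as np
import pandas as pd

# Seeded random data; illustrative only, not the values from the PR.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))

# Keep positive entries; elsewhere take the value of column 0 in the same row.
# axis='index' tells where to align the Series `other` on the row labels.
direct = df.where(df > 0, df[0], axis="index")

# Column-by-column spelling via apply, mirroring the 0.12 workaround.
via_apply = df.apply(lambda x, y: x.where(x > 0, y), y=df[0])

# Both spellings should produce the same frame.
pd.testing.assert_frame_equal(direct, via_apply)
```

The apply version calls where once per column, while the direct call aligns other once for the whole frame.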

bmu commented Sep 9, 2013

fillna works with a Series, see my answer on SO.
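A minimal sketch of what this comment presumably refers to (the linked answer is not reproduced here, so the exact code is an assumption): passing a Series of column means to fillna fills each column with its own mean, with no apply needed. The NaN positions follow the original example; the random values are seeded and illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))
df.iloc[3:5, 0] = np.nan
df.iloc[4:6, 1] = np.nan
df.iloc[5:8, 2] = np.nan

# df.mean() is a Series indexed by column label; fillna matches each
# entry to the column with the same label.
filled = df.fillna(df.mean())

# The older column-by-column spelling with apply, for comparison.
filled_apply = df.apply(lambda col: col.fillna(col.mean()))
```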

jreback commented Sep 9, 2013

@bmu right... I had enabled that for 0.12 (duh!)
(and I fixed the docstring)

I should have used a better example, one that is somewhat non-trivial.

I have updated the example at the top.

jreback added a commit that referenced this pull request Sep 10, 2013
ENH: Add axis and level keywords to where, so that the other argument can now be an alignable pandas object.
jreback merged commit 13210b7 into pandas-dev:master Sep 10, 2013