FIX: Bug whereby array_equivalent was not correctly comparing Float64Ind... #6597

Merged
merged 1 commit into from
Mar 11, 2014

@unutbu unutbu commented Mar 11, 2014

Currently,

>>> import numpy as np
>>> import pandas.core.common as com
>>> from pandas import Float64Index
>>> com.array_equivalent(Float64Index([0, np.nan]), Float64Index([0, np.nan]))
False

Although the current pandas code base does not use array_equivalent to compare Float64Indexes, leaving array_equivalent in its current state may be a bug waiting to happen.

This PR attempts to fix the problem by using pd.isnull for all arrays of dtype object. In a previous PR I tried this and got terrible perf results. Since then I've discovered that my machine does not have enough memory to run the full perf test suite without page faults. If I rerun test_perf.sh for just a few benchmarks, I can avoid the page faults and get consistent results.
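To illustrate the idea, here is a minimal sketch of a NaN-aware comparison built on pd.isnull. It is not the code in this PR: `nan_aware_equivalent` is a hypothetical helper name, and the sketch assumes both inputs convert to NumPy arrays with at least one dimension. It first checks that missing values sit in the same positions, then compares only the non-missing entries.

```python
import numpy as np
import pandas as pd


def nan_aware_equivalent(left, right):
    """Hypothetical helper (not the pandas implementation).

    Return True if both arrays have the same shape, have missing values
    (NaN/None) in the same positions, and are equal everywhere else.
    """
    left, right = np.asarray(left), np.asarray(right)
    if left.shape != right.shape:
        return False
    left_null, right_null = pd.isnull(left), pd.isnull(right)
    # Missing values must line up position by position.
    if not np.array_equal(left_null, right_null):
        return False
    # Compare only the entries that are not missing.
    return bool(np.all(left[~left_null] == right[~right_null]))


a = np.array([0, np.nan])
b = np.array([0, np.nan])

# A plain elementwise comparison fails because NaN != NaN.
naive_result = bool(np.all(a == b))
aware_result = nan_aware_equivalent(a, b)
print(naive_result, aware_result)
```

The same helper also handles arrays of dtype object, since pd.isnull recognises both NaN and None there.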

Running /usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent yielded two tests with ratio > 1.1.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
reindex_fillna_pad                           |   0.5784 |   0.5034 |   1.1490 |
packers_write_pack                           |  15.2360 |   7.1851 |   2.1205 |
-------------------------------------------------------------------------------

which I believe were due to page faults. When I reran perf on just these tests using
/usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent -r "reindex_fillna_pad|packers_write_pack"

I got

Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
reindex_fillna_pad_float32                   |   0.4633 |   0.4590 |   1.0093 |
packers_write_pack                           |   7.9544 |   7.8390 |   1.0147 |
reindex_fillna_pad                           |   0.7290 |   0.7180 |   1.0154 |
-------------------------------------------------------------------------------
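For reference, the ratio column in the tables above is consistent with head time divided by base time. The sketch below recomputes it from the rounded timings of the first run and flags anything above the 1.1 cutoff used in this comment. The names `timings`, `THRESHOLD`, and `slowdown_ratios` are illustrative only and are not part of test_perf.sh.

```python
# Timings in ms copied from the first table above: (head, base).
timings = {
    "reindex_fillna_pad": (0.5784, 0.5034),
    "packers_write_pack": (15.2360, 7.1851),
}

# Cutoff quoted in the comment; a ratio above it counts as a possible regression.
THRESHOLD = 1.1


def slowdown_ratios(timings):
    """Return head/base for every benchmark (illustrative helper)."""
    return {name: head / base for name, (head, base) in timings.items()}


ratios = slowdown_ratios(timings)
flagged = sorted(name for name, ratio in ratios.items() if ratio > THRESHOLD)

for name in flagged:
    print(f"{name}: {ratios[name]:.4f}")
```

Because the table values are already rounded, recomputed ratios can differ from the printed ones in the last digit.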

jreback added a commit that referenced this pull request Mar 11, 2014
FIX: Bug whereby array_equivalent was not correctly comparing Float64Ind...
@jreback jreback merged commit 45009f0 into pandas-dev:master Mar 11, 2014
jreback commented Mar 11, 2014

thank you sir!

@jreback jreback added this to the 0.14.0 milestone Mar 11, 2014
@unutbu unutbu deleted the fix-equivalent branch March 11, 2014 14:48
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Performance (memory or execution speed performance)