FIX: Bug whereby array_equivalent was not correctly comparing Float64Ind... #6597
Currently,
Although the current pandas code base does not use `array_equivalent` to compare Float64Indexes, leaving `array_equivalent` in its current state may be a bug waiting to happen.

This PR attempts to fix the problem by using `pd.isnull` for all arrays of dtype `object`. In a previous PR I tried this and got terrible perf results. Since then I've discovered that my machine does not have enough memory to run the full perf test suite without page faults. If I rerun `test_perf.sh` for just a few benchmarks, I can avoid the page faults and get consistent results.

Running
/usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent
yielded two tests with ratio > 1.1, which I believe were due to page faults. When I reran perf on just these tests using
/usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent -r "reindex_fillna_pad|packers_write_pack"
I got
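
For readers unfamiliar with the approach described earlier (using `pd.isnull` for arrays of dtype `object`), here is a minimal sketch of a NaN-aware equality check. It is not the code in this PR and not pandas' actual `array_equivalent`; the helper name `nan_aware_equal` is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd


def nan_aware_equal(left, right):
    """Hypothetical helper: True if both arrays have the same shape,
    missing values in the same positions, and equal non-missing values."""
    left, right = np.asarray(left), np.asarray(right)
    if left.shape != right.shape:
        return False

    # pd.isnull works element-wise on object arrays, so NaN and None
    # are detected even when mixed with strings or other objects.
    left_null = pd.isnull(left)
    right_null = pd.isnull(right)
    if not np.array_equal(left_null, right_null):
        return False

    # Compare only the positions that are not missing in either array.
    return bool(np.all(left[~left_null] == right[~right_null]))


a = np.array([0.0, np.nan, "x"], dtype=object)
b = np.array([0.0, np.nan, "x"], dtype=object)
print(nan_aware_equal(a, b))
```

The point of the sketch is that a plain element-wise `==` cannot treat two missing values as matching, so the null masks are compared first and the remaining values second.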