ENH: Faster merge_asof() through a single-pass algo #13902

Closed
@chrisaycock

Out of curiosity, I took a crack at a single-pass merge_asof(). My sample passes the existing regression tests but is "wrong" in that it only works when "by" is a single object-dtype column. While scanning through the right DataFrame, I use a PyObjectHashTable to cache the most recently seen row for each "by" value.
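To make the idea concrete, here is a rough pure-Python sketch of the single-pass scan. It is not the Cython code from my branch: a plain dict stands in for PyObjectHashTable, and the function name, the tuple layout, and the -1 "no match" marker are just assumptions for illustration. Both inputs are assumed to be sorted by time already.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[int, Hashable]  # (time, by_key); hypothetical layout for this sketch


def asof_positions_single_pass(left: Sequence[Row], right: Sequence[Row]) -> List[int]:
    """For each left row, return the position of the latest right row that has
    the same by_key and a time <= the left row's time, or -1 if there is none.

    Both inputs must already be sorted by time.
    """
    last_seen: Dict[Hashable, int] = {}  # by_key -> most recent right position
    positions: List[int] = []
    r = 0  # cursor into right; only ever moves forward, hence "single pass"

    for l_time, l_key in left:
        # Consume every right row that is not in the future of this left row,
        # remembering the latest position seen for each key.
        while r < len(right) and right[r][0] <= l_time:
            last_seen[right[r][1]] = r
            r += 1
        positions.append(last_seen.get(l_key, -1))

    return positions


left_rows = [(1, "a"), (5, "b"), (10, "a"), (12, "c")]
right_rows = [(1, "a"), (2, "a"), (3, "b"), (7, "a"), (11, "b")]
result = asof_positions_single_pass(left_rows, right_rows)
print(result)
```

Each right row is visited at most once no matter how many distinct "by" values there are, which is where the speedup over a per-group approach would come from.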

I could add some type differentiation if there is interest; I see that Tempita is getting some use in pandas. The main open question is whether I can support multiple columns in the "by" parameter, which would be useful for matching on things like ['ticker', 'exchange']. Still investigating.
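One possible direction for multiple "by" columns, sketched in pure Python: zip the columns into tuples and use each tuple as the hash key. This is only an assumption about how it could work, not something I have implemented in Cython; the function and argument names are hypothetical, and inputs are assumed to be sorted by time.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def asof_positions_multi_by(
    left_times: Sequence[int],
    left_by: Sequence[Sequence[Hashable]],
    right_times: Sequence[int],
    right_by: Sequence[Sequence[Hashable]],
) -> List[int]:
    """Single-pass as-of match where the key is a tuple built from several
    "by" columns. Returns right-side positions, or -1 where nothing matches.
    """
    # One composite key per row, e.g. ("AAPL", "NASDAQ").
    left_keys = list(zip(*left_by))
    right_keys = list(zip(*right_by))

    last_seen: Dict[Tuple[Hashable, ...], int] = {}
    positions: List[int] = []
    r = 0  # forward-only cursor into the right side

    for l_time, l_key in zip(left_times, left_keys):
        while r < len(right_times) and right_times[r] <= l_time:
            last_seen[right_keys[r]] = r
            r += 1
        positions.append(last_seen.get(l_key, -1))

    return positions


matches = asof_positions_multi_by(
    left_times=[1, 5, 10],
    left_by=[["AAPL", "AAPL", "MSFT"], ["NASDAQ", "ARCA", "NASDAQ"]],
    right_times=[0, 2, 4, 9],
    right_by=[["AAPL", "AAPL", "AAPL", "MSFT"], ["NASDAQ", "ARCA", "NASDAQ", "NASDAQ"]],
)
print(matches)
```

Building a Python tuple per row is probably too slow for the real thing, so a typed or combined-hash key would likely be needed in Cython.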

$ asv continuous master -b "join_merge.merge_asof_*"
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For pandas commit hash c4302949:
[  0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[  0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by                                               41.07ms
[ 50.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby                                           12.90ms
[ 50.00%] · For pandas commit hash 97de42ab:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by                                              608.08ms
[100.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby                                           81.03ms
   before     after       ratio
  [97de42ab] [c4302949]
-   81.03ms    12.90ms      0.16  join_merge.merge_asof_noby.time_merge_asof_noby
-  608.08ms    41.07ms      0.07  join_merge.merge_asof_by.time_merge_asof_by

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

Labels: Performance (Memory or execution speed performance); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
