PERF: Improve duplicated perf #13751

Closed
wants to merge 1 commit into from

sinhrks (Member) commented Jul 22, 2016

Index.duplicated and Series.duplicated now use dtype-specific logic. The algorithm is also skipped entirely when Index.is_unique is True, since is_unique is expected to be already cached in practical use.
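The short-circuit described above can be illustrated with a minimal pure-Python sketch. `TinyIndex` is a hypothetical stand-in written for this example, not the pandas implementation; it only assumes that uniqueness is computed once and then cached on the object.

```python
from functools import cached_property


class TinyIndex:
    """Hypothetical stand-in for an index; not the pandas implementation."""

    def __init__(self, values):
        self.values = list(values)

    @cached_property
    def is_unique(self):
        # Computed on first access, then served from the instance cache.
        return len(set(self.values)) == len(self.values)

    def duplicated(self):
        # Fast path: a unique index cannot contain duplicates, so the
        # element-by-element scan is skipped entirely.
        if self.is_unique:
            return [False] * len(self.values)
        # Slow path: flag every value that was already seen earlier.
        seen = set()
        flags = []
        for value in self.values:
            flags.append(value in seen)
            seen.add(value)
        return flags
```

Once `is_unique` has been evaluated, repeated `duplicated()` calls on a unique index only pay for building the all-False result, which is the kind of case the time_int_unique_duplicated benchmark exercises.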

asv:

   before     after       ratio
  [bb6b5e54] [16c6da4f]
-   34.41ms     4.69ms      0.14  algorithms.algorithm.time_int_duplicated
-   53.77ms     5.59ms      0.10  algorithms.algorithm.time_float_duplicated
-   60.22ms    40.20μs      0.00  algorithms.algorithm.time_int_unique_duplicated
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

NOTE: can create template for htable after #13716.

@sinhrks added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Performance (Memory or execution speed performance) labels Jul 22, 2016
@sinhrks added this to the 0.19.0 milestone Jul 22, 2016
        self.int.duplicated()

    def time_float_duplicated(self):
        self.int.duplicated()
Member

int -> float

Member Author

Ah, thanks. I'll re-submit the benchmark :)

codecov-io commented Jul 22, 2016

Current coverage is 84.58% (diff: 100%)

Merging #13751 into master will increase coverage by <.01%

@@             master     #13751   diff @@
==========================================
  Files           141        141          
  Lines         51233      51258    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43331      43356    +25   
  Misses         7902       7902          
  Partials          0          0          

Powered by Codecov. Last update 2c047d4...12fb5ac

sinhrks (Member, Author) commented Jul 22, 2016

Updated the benchmark. algorithms.algorithm.time_int_unique_duplicated now measures the case where is_unique has already been cached.

    cdef:
        Py_ssize_t i, n
        dict seen = dict()
        object row
Contributor

maybe we can use templates for these (make another issue)
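
For context, the `seen` dict declared in the snippet under review suggests a dict-based duplicate scan over object rows. A rough plain-Python sketch of that general technique follows. The function name `duplicated_rows` and the `keep` handling are assumptions made for illustration; this is not the Cython code from the PR.

```python
def duplicated_rows(values, keep="first"):
    """Flag duplicate rows with a dict of first-seen positions.

    Hypothetical helper for illustration only. `keep` chooses which
    occurrence stays unflagged: "first", "last", or False (flag all).
    """
    n = len(values)
    result = [False] * n
    seen = {}  # maps each hashable row to the position where it was first met

    if keep == "last":
        # Scan backwards so the last occurrence is met first and kept.
        order = range(n - 1, -1, -1)
    elif keep == "first" or keep is False:
        order = range(n)
    else:
        raise ValueError('keep must be "first", "last" or False')

    for i in order:
        row = values[i]
        if row in seen:
            result[i] = True
            if keep is False:
                # Also flag the occurrence that was recorded earlier.
                result[seen[row]] = True
        else:
            seen[row] = i
    return result
```

Rows must be hashable (for example tuples), because they are used as dict keys.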

@jreback closed this in 2166ac1 Jul 25, 2016
@sinhrks deleted the perf_duplicated branch July 25, 2016 22:47
Labels: Indexing, Performance
Successfully merging this pull request may close these issues.

PERF: core/base/IndexOpsMixin duplicated should be changed to use same impl as frame.duplicated
4 participants