PERF: Improve duplicated perf #13751

sinhrks · 2016-07-22T07:29:27Z

closes PERF: core/base/IndexOpsMixin duplicated should be changed to use same impl as frame.duplicated #10235
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Index/Series.duplicated now uses dtype-based logic. It also skips the algorithm entirely when Index.is_unique is True (which is expected to be cached in practical situations).
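To illustrate the shortcut, here is a minimal pure-Python sketch, not the actual pandas code. `TinyIndex` is a hypothetical stand-in for an Index, and the hash pass only mimics `keep='first'` behaviour; the real implementation dispatches to dtype-specific hashtable routines.

```python
from functools import cached_property


class TinyIndex:
    """Hypothetical stand-in for an Index (illustration only, not pandas)."""

    def __init__(self, values):
        self._values = list(values)

    @cached_property
    def is_unique(self):
        # Computed on first access, then cached on the instance.
        return len(set(self._values)) == len(self._values)

    def duplicated(self):
        n = len(self._values)
        # Shortcut: if the index is known to be unique, nothing is duplicated,
        # so the hash pass below can be skipped.
        if self.is_unique:
            return [False] * n
        # Fallback: one pass with a set, marking every repeat after the
        # first occurrence (keep='first' semantics).
        seen = set()
        result = []
        for value in self._values:
            result.append(value in seen)
            seen.add(value)
        return result


idx = TinyIndex([1, 2, 2, 3, 1])
print(idx.duplicated())
```

Once `is_unique` has been accessed, repeated `duplicated()` calls on a unique index only pay for building the all-False result, which is the situation the `time_int_unique_duplicated` benchmark is meant to capture.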

asv:

   before     after       ratio
  [bb6b5e54] [16c6da4f]
-   34.41ms     4.69ms      0.14  algorithms.algorithm.time_int_duplicated
-   53.77ms     5.59ms      0.10  algorithms.algorithm.time_float_duplicated
-   60.22ms    40.20μs      0.00  algorithms.algorithm.time_int_unique_duplicated
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

NOTE: can create template for htable after #13716.

jorisvandenbossche · 2016-07-22T08:51:04Z

asv_bench/benchmarks/algorithms.py

+        self.int.duplicated()
+
+    def time_float_duplicated(self):
+        self.int.duplicated()


int -> float

Ah, thanks. I'll re-submit the bench :)

codecov-io · 2016-07-22T08:55:44Z

Current coverage is 84.58% (diff: 100%)

Merging #13751 into master will increase coverage by <.01%

@@             master     #13751   diff @@
==========================================
  Files           141        141          
  Lines         51233      51258    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43331      43356    +25   
  Misses         7902       7902          
  Partials          0          0

Powered by Codecov. Last update 2c047d4...12fb5ac

sinhrks · 2016-07-22T09:50:18Z

Updated the bench. algorithms.algorithm.time_int_unique_duplicated now benchmarks the case where is_unique is already cached.

jreback · 2016-07-25T12:00:09Z

pandas/hashtable.pyx

+    cdef:
+        Py_ssize_t i, n
+        dict seen = dict()
+        object row


maybe we can use templates for these (make another issue)

sinhrks added the Indexing and Performance labels Jul 22, 2016

sinhrks added this to the 0.19.0 milestone Jul 22, 2016

sinhrks force-pushed the perf_duplicated branch from 5712fbf to b70c39a Compare July 22, 2016 08:18

jorisvandenbossche reviewed Jul 22, 2016

sinhrks force-pushed the perf_duplicated branch from b70c39a to 16c6da4 Compare July 22, 2016 08:53

sinhrks mentioned this pull request Jul 24, 2016

Add duplicated/drop_duplicates top-level array functions #2715

Closed

sinhrks force-pushed the perf_duplicated branch 2 times, most recently from 7646463 to 2f53f67 Compare July 24, 2016 15:03

PERF: Improve duplicated perf

12fb5ac

sinhrks force-pushed the perf_duplicated branch from 2f53f67 to 12fb5ac Compare July 24, 2016 16:07

jreback reviewed Jul 25, 2016

jreback closed this in 2166ac1 Jul 25, 2016

sinhrks deleted the perf_duplicated branch July 25, 2016 22:47
