Add blog article about the new collections performance #832

julienrf · 2018-02-06T09:50:35Z

No description provided.

olafurpg

Nice post, easy to follow!

olafurpg · 2018-02-06T10:02:21Z

blog/_posts/2018-02-07-collections-performance.md

+based operation implementations are about 25% slower than builder based implementations
+and I’ve explained how we restored builder based implementations on strict collections.
+
+I expect the new collections to be as fast or slightly faster than the previous collections.


s/fast or slightly faster/equally fast or slightly faster/

?

olafurpg · 2018-02-06T10:02:45Z

blog/_posts/2018-02-07-collections-performance.md

+[merged](https://github.com/scala/collection-strawman/pull/342)
+a completely new implementation of immutable `Set` and `Map` based on [compressed
+hash-array mapped prefix-trees](https://michael.steindorfer.name/publications/oopsla15.pdf).
+This data structure has a smaller memory footprint than the old `HashSet` and `HashMap`,


Can you include concrete numbers here? Sounds amazing!

olafurpg · 2018-02-06T10:04:47Z

blog/_posts/2018-02-07-collections-performance.md

+them to be slower since the execution path, when calling an operation, can be made
+exactly the same as in the old collections.
+
+## Conclusion


I would include "correct first" somewhere in the conclusion, it's an interesting point worth repeating.

MasseGuillaume · 2018-02-06T10:06:26Z

blog/_posts/2018-02-07-collections-performance.md

+
+In a [previous blog post](/blog/2017/11/28/view-based-collections.html), I explained
+how [Scala 2.13’s new collections](http://www.scala-lang.org/blog/2017/02/28/collections-rework.html)
+have been designed so that the default implementations of transformation operations work


is designed such

MasseGuillaume · 2018-02-06T10:06:41Z

blog/_posts/2018-02-07-collections-performance.md

+how [Scala 2.13’s new collections](http://www.scala-lang.org/blog/2017/02/28/collections-rework.html)
+have been designed so that the default implementations of transformation operations work
+with both strict and non-strict types of collections. In essence, we abstract over
+the evaluation mode (strict or non strict) of concrete collection types.


MasseGuillaume · 2018-02-06T10:07:28Z

blog/_posts/2018-02-07-collections-performance.md

+builder based versions. How much slower exactly varies with the type of collection
+(e.g. `List`, `Vector`, `Set`), the operation (e.g. `map`, `flatMap`, `filter`)
+and the number of elements in the collection. In my benchmark on `Vector`, on
+the `map`, `filter` and `flatMap` operations, with 1 element to 7 million of


with 1 to 7 million elements

MasseGuillaume · 2018-02-06T10:07:49Z

blog/_posts/2018-02-07-collections-performance.md

+For reference, the source code of the new collections is available in
+[this GitHub repository](https://github.com/scala/collection-strawman).
+
+## Overhead Of View Based Implementations


MasseGuillaume · 2018-02-06T10:07:59Z

blog/_posts/2018-02-07-collections-performance.md

+Here we use `StrictOptimizedSeqOps`, which is a specialization of `StrictOptimizedIterableOps`
+for `Seq` collections.
+
+## Is The View Based Design Worth It?


MasseGuillaume · 2018-02-06T10:08:16Z

blog/_posts/2018-02-07-collections-performance.md

+
+## Overhead Of View Based Implementations
+
+Let’s be clear: the view based implementations are in general slower than their


MasseGuillaume · 2018-02-06T10:08:32Z

blog/_posts/2018-02-07-collections-performance.md

+## How To Fix That Performance Regression?
+
+Our solution is simply to go back to builder based implementations for strict collections: we
+override the default view based implementations with more efficient builder based


MasseGuillaume · 2018-02-06T10:08:40Z

blog/_posts/2018-02-07-collections-performance.md

+These charts show the execution time (vertically) of the `filter`, `map` and `flatMap`
+operations, according to the number of elements (horizontally). Note that scales are
+logarithmic in both axis. The blue line shows the performance of the old `Vector`,
+the green line shows the performance of the new `Vector` if it used only view based


MasseGuillaume · 2018-02-06T10:10:13Z

blog/_posts/2018-02-07-collections-performance.md

+Again, the answer depends on the type of collection, the operations and the number of elements.
+My `Vector` benchmarks show a 20% speedup on average:
+
+![](/resources/img/new-collections-performance-filter.png)


unit for the y-axis? s or ms?

the title should be: Vector.filter (log-scaled)

MasseGuillaume · 2018-02-06T10:10:40Z

blog/_posts/2018-02-07-collections-performance.md

+## Is The View Based Design Worth It?
+
+In my previous article I explained that a drawback of the old builder based design was that,
+on non strict collections (e.g. `Stream` or `View`), we had to carefully override all the


MasseGuillaume · 2018-02-06T10:10:45Z

blog/_posts/2018-02-07-collections-performance.md

+
+In my previous article I explained that a drawback of the old builder based design was that,
+on non strict collections (e.g. `Stream` or `View`), we had to carefully override all the
+default implementations of transformation operations to make them non strict.


MasseGuillaume · 2018-02-06T10:10:50Z

blog/_posts/2018-02-07-collections-performance.md

+default implementations of transformation operations to make them non strict.
+
+Now it seems that the situation is just reversed: the default implementations work well
+with non strict collections, but we have to override them in strict collections.


MasseGuillaume · 2018-02-06T10:11:12Z

blog/_posts/2018-02-07-collections-performance.md

+So, is the new design worth it? To answer this question I will quote a comment posted
+by Stefan Zeiger [here](https://www.reddit.com/r/scala/comments/7g52cy/let_them_be_lazy/dqixt8d/):
+
+> The lazy-by-default approach is mostly beneficial when you're implementing lazy


most beneficial

MasseGuillaume · 2018-02-06T10:11:29Z

blog/_posts/2018-02-07-collections-performance.md

+
+These charts show the execution time (vertically) of the `filter`, `map` and `flatMap`
+operations, according to the number of elements (horizontally). Note that scales are
+logarithmic in both axis. The blue line shows the performance of the old `Vector`,


MasseGuillaume · 2018-02-06T10:11:55Z

blog/_posts/2018-02-07-collections-performance.md

+
+Since operation implementations end up being the same, why do we get better performance
+at all? Well, these numbers are specific to `Vector`, and are due to the fact that
+we more agressively inlined a few critical methods. I don’t expect the new collections


aggressively

MasseGuillaume · 2018-02-06T10:12:34Z

blog/_posts/2018-02-07-collections-performance.md

+we more agressively inlined a few critical methods. I don’t expect the new collections
+to be *always* 20% faster than the old collections. However, there is no reason for
+them to be slower since the execution path, when calling an operation, can be made
+exactly the same as in the old collections.


exactly the same => the same

MasseGuillaume · 2018-02-06T10:14:42Z

blog/_posts/2018-02-07-collections-performance.md

+title: On Performance of the New Collections
+---
+
+In a [previous blog post](/blog/2017/11/28/view-based-collections.html), I explained


I would group all those paragraphs, up to "Overhead Of View Based Implementations".

MasseGuillaume · 2018-02-06T10:15:28Z

blog/_posts/2018-02-07-collections-performance.md

+
+## Overhead Of View Based Implementations
+
+Let’s be clear: the view based implementations are in general slower than their


Let’s be clear,

MasseGuillaume · 2018-02-06T10:16:02Z

blog/_posts/2018-02-07-collections-performance.md

+Let’s be clear: the view based implementations are in general slower than their
+builder based versions. How much slower exactly varies with the type of collection
+(e.g. `List`, `Vector`, `Set`), the operation (e.g. `map`, `flatMap`, `filter`)
+and the number of elements in the collection. In my benchmark on `Vector`, on


MasseGuillaume · 2018-02-06T10:16:49Z

blog/_posts/2018-02-07-collections-performance.md

+
+## How To Fix That Performance Regression?
+
+Our solution is simply to go back to builder based implementations for strict collections: we


builder-based

MasseGuillaume · 2018-02-06T10:16:58Z

blog/_posts/2018-02-07-collections-performance.md

+## How To Fix That Performance Regression?
+
+Our solution is simply to go back to builder based implementations for strict collections: we
+override the default view based implementations with more efficient builder based


builder-based

MasseGuillaume · 2018-02-06T10:17:23Z

blog/_posts/2018-02-07-collections-performance.md

+
+Our solution is simply to go back to builder based implementations for strict collections: we
+override the default view based implementations with more efficient builder based
+ones. We actually and up with the same implementations as in the old collections.


We actually and up with ?
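
As a rough illustration of the two strategies this hunk contrasts, here is a minimal sketch assuming Scala 2.13's standard library. The names `MapStrategies`, `mapViaView` and `mapViaBuilder` are hypothetical and are not part of the collections API; the real implementations live in the collections source and may differ.

```scala
object MapStrategies {
  // View-based: describe the transformation lazily on a view,
  // then materialize the result into a strict Vector.
  def mapViaView[A, B](xs: Vector[A])(f: A => B): Vector[B] =
    xs.view.map(f).to(Vector)

  // Builder-based: apply f eagerly and push each result
  // straight into a builder for the target collection.
  def mapViaBuilder[A, B](xs: Vector[A])(f: A => B): Vector[B] = {
    val b = Vector.newBuilder[B]
    b.sizeHint(xs.size) // hint only; the builder may ignore it
    xs.foreach(x => b += f(x))
    b.result()
  }
}
```

Both functions return `Vector(2, 3, 4)` for `Vector(1, 2, 3)` and `_ + 1`; they differ only in how the result is produced.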

MasseGuillaume · 2018-02-06T10:19:05Z

blog/_posts/2018-02-07-collections-performance.md

+
+## Is The View Based Design Worth It?
+
+In my previous article I explained that a drawback of the old builder based design was that,


In my previous article,

MasseGuillaume · 2018-02-06T10:21:17Z

blog/_posts/2018-02-07-collections-performance.md

+
+## Is The View Based Design Worth It?
+
+In my previous article I explained that a drawback of the old builder based design was that,


This sentence could be split into two.

In my previous article, I explained a drawback of the old builder-based design. On non-strict collections (e.g. Stream or View), we had to carefully override all the default implementations of transformation operations to make them non-strict.

MasseGuillaume · 2018-02-06T10:22:26Z

blog/_posts/2018-02-07-collections-performance.md

+> implementation for a strict collection type you only suffer a small performance
+> impact but it's still correct.
+
+In short: implementations are **correct first** in the new design but you might want to


julienrf · 2018-02-07T09:16:12Z

Just a heads up that I set the release date for this article to today. Let me know if you need more time to review or if I should merge it.

lrytz · 2018-02-07T11:18:14Z

blog/_posts/2018-02-07-collections-performance.md

+the green line shows the performance of the new `Vector` if it used only view based
+implementations, and the red line shows the actual performance of the new `Vector`
+(with strict optimized implementations). Benchmark source code and numbers can be found
+[here](https://gist.github.com/julienrf/f1cb2b062cd9783a35e2f35778959c76).


On the logarithmic scale, the speedup factor isn't obvious. Could you say something about it?

I mention that the scale is logarithmic and I’ve given the speedup factors in the text (view-based implementations are 25% slower than builder-based ones, and the new collections are 35% faster than the old collections).

What should I add?

Ah, I missed the mention of 35 above the charts. I guess my concern is that the graphs are the thing that really jumps out at people. The current form with the logarithmic scale doesn't really visualize the nice improvements. Maybe you could do some bars on a linear scale, normalize the old collection to 1, and add time values to the bars? Something like https://docs.google.com/spreadsheets/d/1Jcw3iG5sx_Xo-3svjb_qmB4Fprz5sEX7nCFpCd4EcWQ/edit?usp=sharing

Yeah, I was considering doing something exactly like that! OK, let’s do it then.
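
The normalization suggested in this thread (old collection set to 1, others expressed relative to it) could be sketched as follows. This is only an assumption about how one might prepare the bar-chart data; `NormalizeTimings` and `normalize` are hypothetical names, and the numbers in the usage note are placeholders, not benchmark results.

```scala
object NormalizeTimings {
  // Divide every timing by the baseline timing, so the baseline
  // itself would map to 1.0 and faster variants fall below 1.0.
  def normalize(baseline: Double, timings: Map[String, Double]): Map[String, Double] = {
    require(baseline > 0, "baseline timing must be positive")
    timings.map { case (name, t) => name -> t / baseline }
  }
}
```

For example, with a placeholder baseline of `100.0`, `normalize(100.0, Map("new" -> 50.0))` yields `Map("new" -> 0.5)`.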

SethTisue

publishable as-is, but I've made a few suggestions.

SethTisue · 2018-02-07T18:28:54Z

blog/_posts/2018-02-07-collections-performance.md

+> impact but it's still correct.
+
+In short, implementations are **correct first** in the new design but you might want to
+override them for performance reasons on strict collections.


This sentence seems like the crux of the whole post. I suggest saying something like this right at the top. Many people will read partway in and then bail, so you should hit the takeaways early, then proceed with more detailed explanations.

SethTisue · 2018-02-07T18:31:21Z

blog/_posts/2018-02-07-collections-performance.md

+
+## Performance Comparison With 2.12’s Collections
+
+Talking about performance, how performant are the new collections compared to the old ones?


Again, I suggest including some brief answer to this very near the top of the whole blog post.

julienrf · 2018-02-08T09:48:46Z

Alright, thanks all for your reviews, I’ve improved the charts and mentioned the takeaways in the first sections. I can squash the commits but I think the “squash and merge” button should just work.

julienrf · 2018-02-09T10:42:03Z

I updated the publication date to today.

lrytz · 2018-02-09T10:49:50Z

Looks good!

Add blog article about the new collections performance

2d86bdc

julienrf requested review from szeiger, lrytz, SethTisue and heathermiller February 6, 2018 09:50

olafurpg approved these changes Feb 6, 2018


MasseGuillaume reviewed Feb 6, 2018


Address Guillaume and Olaf reviews. Also fix a wrong number.

b2a34a0

lrytz reviewed Feb 7, 2018


SethTisue approved these changes Feb 7, 2018


julienrf added 3 commits February 8, 2018 10:19

Improve charts

0c7aba2

Mention the takeways in the first section

c83aa6f

Change publication date

c3df352

Polish charts

6036b26

julienrf force-pushed the new-collections-performance branch from 964faed to 6036b26 Compare February 8, 2018 12:10

Changed publication date

d324021

julienrf merged commit 2d59211 into scala:master Feb 9, 2018

julienrf deleted the new-collections-performance branch February 9, 2018 12:54

