diff --git a/blog/_posts/2018-02-09-collections-performance.md b/blog/_posts/2018-02-09-collections-performance.md new file mode 100644 index 000000000..99cc26031 --- /dev/null +++ b/blog/_posts/2018-02-09-collections-performance.md @@ -0,0 +1,145 @@ +--- +layout: blog-detail +post-type: blog +by: Julien Richard-Foy +title: On Performance of the New Collections +--- + +In a [previous blog post](/blog/2017/11/28/view-based-collections.html), I explained +how [Scala 2.13’s new collections](http://www.scala-lang.org/blog/2017/02/28/collections-rework.html) +have been designed so that the default implementations of transformation operations work +with both strict and non-strict types of collections. In essence, we abstract over +the evaluation mode (strict or non strict) of concrete collection types. + +After we published that blog post, the community +[raised concerns](https://www.reddit.com/r/scala/comments/7g52cy/let_them_be_lazy/dqgol36/) +about possible performance implications of having more levels of abstraction than before. + +This blog article: + +- gives more information about the overhead of the collections’ + view-based design and our solution to remove that overhead, +- argues that for correctness reasons it is still better to have + view-based default implementations, +- shows that we should expect the new collections to be equally fast + or faster than the old collections, and reports an average speedup + of 35% in the case of `Vector`’s `filter`, `map` and `flatMap`. + +For reference, the source code of the new collections is available in +[this GitHub repository](https://github.com/scala/collection-strawman). + +## Overhead Of View Based Implementations + +Let’s be clear, the view based implementations are in general slower than their +builder based versions. How much slower exactly varies with the type of collection +(e.g. `List`, `Vector`, `Set`), the operation (e.g. `map`, `flatMap`, `filter`) +and the number of elements in the collection. In my benchmark on `Vector`, on +the `map`, `filter` and `flatMap` operations, with 1 to 7 million of +elements, I measured an average slowdown of 25%. + +## How To Fix That Performance Regression? + +Our solution is simply to go back to builder based implementations for strict collections: we +override the default view based implementations with more efficient builder based +ones. We actually end up with the same implementations as in the old collections. + +In practice these implementations are factored out in traits that can be mixed +into concrete collection types. Such trait names are always prefixed with +`StrictOptimized`. For instance, here is an excerpt of the `StrictOptimizedIterableOps` +trait: + +~~~ scala +trait StrictOptimizedIterableOps[+A, +CC[_], +C] extends IterableOps[A, CC, C] { + + override def map[B](f: A => B): CC[B] = { + val b = iterableFactory.newBuilder[B]() + val it = iterator() + while (it.hasNext) { + b += f(it.next()) + } + b.result() + } + +} +~~~ + +Then, to implement the `Vector` collection, we just mix such a “strict optimized” trait: + +~~~ scala +trait Vector[+A] extends IndexedSeq[A] + with IndexedSeqOps[A, Vector, Vector[A]] + with StrictOptimizedSeqOps[A, Vector, Vector[A]] +~~~ + +Here we use `StrictOptimizedSeqOps`, which is a specialization of `StrictOptimizedIterableOps` +for `Seq` collections. + +## Is The View Based Design Worth It? + +In my previous article, I explained a drawback of the old builder based design. +On non strict collections (e.g. `Stream` or `View`), we had to carefully override all the +default implementations of transformation operations to make them non strict. + +Now it seems that the situation is just reversed: the default implementations work well +with non strict collections, but we have to override them in strict collections. + +So, is the new design worth it? To answer this question I will quote a comment posted +by Stefan Zeiger [here](https://www.reddit.com/r/scala/comments/7g52cy/let_them_be_lazy/dqixt8d/): + +> The lazy-by-default approach is mostly beneficial when you're implementing lazy +> collections because you don't have to override pretty much everything or get +> incorrect semantics. The reverse risk is smaller: If you don't override a lazy +> implementation for a strict collection type you only suffer a small performance +> impact but it's still correct. + +In short, implementations are **correct first** in the new design but you might want to +override them for performance reasons on strict collections. + +## Performance Comparison With 2.12’s Collections + +Talking about performance, how performant are the new collections compared to the old ones? + +Again, the answer depends on the type of collection, the operations and the number of elements. +My `Vector` benchmarks show a 35% speedup on average: + +![](/resources/img/blog/new-collections-performance-filter.png) + +![](/resources/img/blog/new-collections-performance-map.png) + +![](/resources/img/blog/new-collections-performance-flatMap.png) + +These charts show the speedup factor (vertically) of the `filter`, `map` and `flatMap` +operations execution compared to the old `Vector`, for various number of elements (horizontally). +The blue line shows the old `Vector`, +the red line shows the new `Vector` if it used only view based +implementations, and the yellow line shows the actual new `Vector` +(with strict optimized implementations). Benchmark source code and numbers can be found +[here](https://gist.github.com/julienrf/f1cb2b062cd9783a35e2f35778959c76). + +Since operation implementations end up being the same, why do we get better performance +at all? Well, these numbers are specific to `Vector` and the tested operations, they +are due to the fact that +we more aggressively inlined a few critical methods. I don’t expect the new collections +to be *always* faster than the old collections. However, there is no reason for +them to be slower since the execution path, when calling an operation, can be made +exactly the same as in the old collections. + +## Conclusion + +This article studied the performance of the new collections. I’ve reported that view +based operation implementations are about 25% slower than builder based implementations, +and I’ve explained how we restored builder based implementations on strict collections. +Last but not least, I’ve shown that defaulting to view based implementations does +make sense for the sake of correctness. + +I expect the new collections to be equally fast or slightly faster than the previous collections. +Indeed, we took advantage of the rewrite to apply some more optimizations here and +again. + +More significant performance improvements can be achieved by using different +data structures. For instance, we recently +[merged](https://github.com/scala/collection-strawman/pull/342) +a completely new implementation of immutable `Set` and `Map` based on [compressed +hash-array mapped prefix-trees](https://michael.steindorfer.name/publications/oopsla15.pdf). +This data structure has a smaller memory footprint than the old `HashSet` and `HashMap`, +and some operations can be an order of magnitude faster (e.g. `==` is up to 7x faster). diff --git a/resources/img/blog/new-collections-performance-filter.png b/resources/img/blog/new-collections-performance-filter.png new file mode 100644 index 000000000..6581f25ab Binary files /dev/null and b/resources/img/blog/new-collections-performance-filter.png differ diff --git a/resources/img/blog/new-collections-performance-flatMap.png b/resources/img/blog/new-collections-performance-flatMap.png new file mode 100644 index 000000000..2c046f8fb Binary files /dev/null and b/resources/img/blog/new-collections-performance-flatMap.png differ diff --git a/resources/img/blog/new-collections-performance-map.png b/resources/img/blog/new-collections-performance-map.png new file mode 100644 index 000000000..6e4df0fb7 Binary files /dev/null and b/resources/img/blog/new-collections-performance-map.png differ