@@ -24,18 +24,18 @@ For reference, the source code of the new collections is available in

## Overhead Of View Based Implementations

- Let’s be clear: the view based implementations are in general slower than their
+ Let’s be clear, the view based implementations are in general slower than their
builder based versions. How much slower exactly varies with the type of collection
(e.g. `List`, `Vector`, `Set`), the operation (e.g. `map`, `flatMap`, `filter`)
and the number of elements in the collection. In my benchmark on `Vector`, on
- the `map`, `filter` and `flatMap` operations, with 1 element to 7 million of
+ the `map`, `filter` and `flatMap` operations, with 1 to 7 million of
elements, I measured an average slowdown of 25%.

## How To Fix That Performance Regression?

Our solution is simply to go back to builder based implementations for strict collections: we
override the default view based implementations with more efficient builder based
- ones. We actually and up with the same implementations as in the old collections.
+ ones. We actually end up with the same implementations as in the old collections.

In practice these implementations are factored out in traits that can be mixed
into concrete collection types. Such trait names are always prefixed with
@@ -70,8 +70,8 @@ for `Seq` collections.

## Is The View Based Design Worth It?

- In my previous article I explained that a drawback of the old builder based design was that,
- on non strict collections (e.g. `Stream` or `View`), we had to carefully override all the
+ In my previous article, I explained a drawback of the old builder based design.
+ On non strict collections (e.g. `Stream` or `View`), we had to carefully override all the
default implementations of transformation operations to make them non strict.

Now it seems that the situation is just reversed: the default implementations work well
@@ -86,15 +86,15 @@ by Stefan Zeiger [here](https://www.reddit.com/r/scala/comments/7g52cy/let_them_
> implementation for a strict collection type you only suffer a small performance
> impact but it's still correct.

- In short: implementations are **correct first** in the new design but you might want to
+ In short, implementations are **correct first** in the new design but you might want to
override them for performance reasons on strict collections.

## Performance Comparison With 2.12’s Collections

Talking about performance, how performant are the new collections compared to the old ones?

Again, the answer depends on the type of collection, the operations and the number of elements.
- My `Vector` benchmarks show a 20% speedup on average:
+ My `Vector` benchmarks show a 35% speedup on average:

![](/resources/img/new-collections-performance-filter.png)
@@ -104,26 +104,29 @@ My `Vector` benchmarks show a 20% speedup on average:

These charts show the execution time (vertically) of the `filter`, `map` and `flatMap`
operations, according to the number of elements (horizontally). Note that scales are
- logarithmic in both axis. The blue line shows the performance of the old `Vector`,
+ logarithmic in both axes. The blue line shows the performance of the old `Vector`,
the green line shows the performance of the new `Vector` if it used only view based
implementations, and the red line shows the actual performance of the new `Vector`
(with strict optimized implementations). Benchmark source code and numbers can be found
[here](https://gist.github.com/julienrf/f1cb2b062cd9783a35e2f35778959c76).

Since operation implementations end up being the same, why do we get better performance
- at all? Well, these numbers are specific to `Vector`, and are due to the fact that
- we more agressively inlined a few critical methods. I don’t expect the new collections
- to be *always* 20% faster than the old collections. However, there is no reason for
+ at all? Well, these numbers are specific to `Vector` and the tested operations, they
+ are due to the fact that
+ we more aggressively inlined a few critical methods. I don’t expect the new collections
+ to be *always* faster than the old collections. However, there is no reason for
them to be slower since the execution path, when calling an operation, can be made
exactly the same as in the old collections.

## Conclusion

This article studied the performance of the new collections. I’ve reported that view
- based operation implementations are about 25% slower than builder based implementations
+ based operation implementations are about 25% slower than builder based implementations,
and I’ve explained how we restored builder based implementations on strict collections.
+ Last but not least, I’ve shown that defaulting to the view based implementations does
+ make sense for the sake of correctness.

- I expect the new collections to be as fast or slightly faster than the previous collections.
+ I expect the new collections to be equally fast or slightly faster than the previous collections.
Indeed, we took advantage of the rewrite to apply some more optimizations here and
again.
@@ -133,4 +136,4 @@ data structures. For instance, we recently
a completely new implementation of immutable `Set` and `Map` based on [compressed
hash-array mapped prefix-trees](https://michael.steindorfer.name/publications/oopsla15.pdf).
This data structure has a smaller memory footprint than the old `HashSet` and `HashMap`,
- and some operations are an order of magnitude faster.
+ and some operations can be an order of magnitude faster (e.g. `==` is up to 7x faster).