Commit 7f3f7c9

Add blog article about the new collections
1 parent 210dd53 commit 7f3f7c9

7 files changed

+211
-0
lines changed

@@ -0,0 +1,211 @@
1+
---
2+
layout: blog-detail
3+
post-type: blog
4+
by: Julien Richard-Foy
5+
title: Scala 2.13’s Collections
6+
---
7+
8+
One more article about the standard collections, really? Indeed, during the last
9+
18 months a lot of work has been done on the collections and we’ve published
10+
several blog articles and given several talks to explain the various changes or
11+
challenges we were facing. This article attempts to summarize **what is going
12+
to change from an end-user perspective**.
13+
14+
In case you’ve thoroughly followed our previous blog posts and talks, you might
15+
not learn much from this article. Otherwise, this is the perfect opportunity
16+
to catch up on the topic in a few minutes!
17+
18+
The next section presents the internal changes in the collections implementation
19+
that might have some visible impact on the surface. Then, I will show why I think
20+
that the removal of `CanBuildFrom` made the API more beginner friendly. Next, I
21+
will introduce some new operations available in the collections. Finally, I
22+
will mention the main deprecations, the motivation behind them, and their
23+
recommended replacement.
24+
25+
## Under The Hood: A Cleaner Ground
26+
27+
![iceberg](/resources/img/blog/iceberg.jpeg)
28+
29+
The most important change in the new collections framework is that transformation
30+
operations (such as `map` or `filter`) are now implemented in a way that works with both
31+
strict collections (such as `List`) and non-strict collections (such as `Stream`).
32+
This was not the case before. Indeed, the previous
33+
implementations were strict (they eagerly evaluated the collection elements) and had
34+
to be overridden by non-strict collection types. You can find more details about that in
35+
[this blog post](/blog/2017/11/28/view-based-collections.html).
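
To make the difference concrete, here is a minimal sketch, assuming Scala 2.13. The `StrictVsLazy` object and its `tracked` helper are names made up for this illustration; they only count how many times the function passed to `map` is called.

~~~ scala
object StrictVsLazy {
  // Hypothetical helper: counts how many times the mapping function runs.
  var calls = 0
  def tracked(x: Int): Int = { calls += 1; x * 2 }

  // Returns the call counts observed after a strict `map`, after a
  // non-strict `map`, and after forcing the non-strict result.
  def run(): (Int, Int, Int) = {
    calls = 0
    val strict = List(1, 2, 3).map(tracked)      // strict: elements are transformed now
    val afterStrict = calls

    calls = 0
    val mapped = LazyList(1, 2, 3).map(tracked)  // non-strict: nothing is transformed yet
    val afterLazy = calls

    val forced = mapped.toList                   // forcing the result runs the function
    val afterForce = calls

    (afterStrict, afterLazy, afterForce)
  }
}
~~~

Under these assumptions, `StrictVsLazy.run()` should return `(3, 0, 3)`: the same `map` operation is eager on `List` and deferred on `LazyList`.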
36+
37+
The good news is that the new design is more **correct** in the sense that you can
38+
now implement custom non-strict collection types without having to worry about
39+
re-implementing a ton of operations. (Some operations, such as `groupBy`, still eagerly evaluate
the collection elements; this will be clearly documented.) Another benefit
41+
is that transformation operations defined outside of the collections (like in the
42+
[cvogt/scala-extensions](https://github.com/cvogt/scala-extensions) project)
43+
now work with non-strict collections (such as `View` or `Stream`).
44+
45+
Speaking of non-strict collections, the `View` type has been redesigned and
46+
views should behave in a more predictable way. Also, `Stream` has been
47+
deprecated in favor of `LazyList` (see the last section).
48+
49+
## Life Without `CanBuildFrom`
50+
51+
I think the most visible change for end-users is that transformation operations
52+
don’t use `CanBuildFrom` anymore. I believe this will be quite visible despite our previous
53+
efforts to *hide* `CanBuildFrom` from the API documentation in the current collections.
54+
Indeed, if you take a look at the
55+
[current `List` API](/api/2.12.6/scala/collection/immutable/List.html), the signature
56+
shown for the `map` operation does not mention `CanBuildFrom`:
57+
58+
![there is no CanBuildFrom](/resources/img/blog/scaladoc-list-map.png)
59+
60+
However, if you use this operation in your code, then your IDE reveals its actual signature:
61+
62+
![what is That](/resources/img/blog/ij-list-map.png)
63+
64+
As you can see, the type signature shown in the API documentation has been “simplified”
65+
to make it more approachable, but I believe that this probably confuses users even
more, especially when you look at the
67+
[`TreeMap[A, B]` API](/api/2.12.6/scala/collection/immutable/TreeMap.html):
68+
69+
![wtf](/resources/img/blog/scaladoc-treemap-map.png)
70+
71+
This type signature makes no sense: the result type cannot be `TreeMap[B]` since
72+
`TreeMap` takes *two* type parameters (the type of keys and the type
73+
of values). Also, the function `f` actually takes a *key-value pair* as parameter,
74+
not just a key (as incorrectly indicated by the type `A`).
75+
76+
`CanBuildFrom` was used for good reasons, in particular the type `That` shown
77+
in the above screenshot was *computed* according to the type of the source
78+
collection and the type of elements of the new collection. The case of `TreeMap`
79+
is compelling: in case you transform your key-value pairs into other key-value
80+
pairs for which the type of keys has an implicit `Ordering` instance, then `map`
81+
returns a `TreeMap`, but if there is no such `Ordering` instance then the best
82+
collection type that can be returned is `Map`. And if you transform the key-value
83+
pairs into something that is not even a pair, then the best collection type
84+
that can be returned is `Iterable`. These three cases were supported by
85+
a single operation implementation, and `CanBuildFrom` was used to abstract over
86+
the various possible return types.
87+
88+
In the new collections we wanted to have simpler type signatures so that we
89+
can show their actual form in the API documentation and so that the auto-completion
provided by IDEs is not scary. We achieve that by using overloading, as
explained in more detail in
92+
[this blog article](/blog/2017/05/30/tribulations-canbuildfrom.html).
93+
94+
In practice, this means that the new `TreeMap` has three overloads of the
95+
`map` operation:
96+
97+
![](/resources/img/blog/scaladoc-new-treemap-map.png)
98+
99+
These type signatures are the actual ones and they essentially translate
100+
“in types” what I’ve written above about the possible result types of `map`
101+
according to the type of elements returned by the transformation function `f`.
102+
I believe that the new API is simpler to understand.
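
As a rough sketch of how these overloads play out in user code, assuming Scala 2.13: the `TreeMapMapDemo` object and the `Key` class are hypothetical names introduced for the example. One caveat, to the best of my knowledge: when the new key type has no `Ordering`, the released library may not fall back to `Map` on its own, so the sketch upcasts with `unsorted` first to select the plain `Map` overload.

~~~ scala
import scala.collection.immutable.TreeMap

object TreeMapMapDemo {
  val source: TreeMap[Int, String] = TreeMap(1 -> "one", 2 -> "two")

  // New keys are Strings, which have an implicit Ordering: the result is a TreeMap.
  val sorted: TreeMap[String, Int] = source.map { case (k, v) => (v, k) }

  // Hypothetical key type with no Ordering instance.
  final case class Key(id: Int)

  // Assumption: upcasting with `unsorted` selects the overload that returns a Map.
  val unsorted: Map[Key, String] = source.unsorted.map { case (k, v) => (Key(k), v) }

  // The function does not return a pair: the best result type is Iterable.
  val labels: Iterable[String] = source.map { case (k, v) => s"$k=$v" }
}
~~~

The type annotations are only there to show which result type the compiler picks in each case.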
103+
104+
## New And Noteworthy
105+
106+
We have introduced a few new operations. The following sections
107+
present some of them.
108+
109+
### `groupMap`
110+
111+
A common pattern with the current collections is to use `groupBy`
112+
followed by `mapValues` to transform the groups. For instance,
113+
this is how we can index the names of a collection of users by
114+
their age:
115+
116+
~~~ scala
case class User(name: String, age: Int)

def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
  users.groupBy(_.age).mapValues(users => users.map(_.name))
~~~
122+
123+
There is a subtlety in this code. The static return type is `Map`
124+
but the `Map` implementation actually returned is lazy and evaluates
125+
its elements each time it is traversed (i.e. the `users => users.map(_.name)`
126+
function is evaluated each time the `Map` is traversed).
127+
128+
In the new collections the return type of `mapValues` is a `MapView` instead
129+
of a `Map`, to clearly indicate that its contents are evaluated each time it
130+
is traversed.
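
The repeated evaluation can be observed with a small counter. This is only a sketch, assuming Scala 2.13, where `view.mapValues` returns a `MapView`; the `MapValuesDemo` object and its `evaluations` counter are made up for the illustration.

~~~ scala
object MapValuesDemo {
  // Hypothetical counter: how many times the value function has been applied.
  var evaluations = 0

  val source = Map("a" -> 1, "b" -> 2)

  // Building the view does not apply the function to any value yet.
  val scaled = source.view.mapValues { v => evaluations += 1; v * 10 }

  // Traverses the view twice and reports the counter after each traversal.
  def run(): (Int, Int) = {
    evaluations = 0
    val firstPass = scaled.toList
    val afterFirst = evaluations
    val secondPass = scaled.toList
    val afterSecond = evaluations
    (afterFirst, afterSecond)
  }
}
~~~

With two entries in the source map, `MapValuesDemo.run()` should return `(2, 4)`: each traversal applies the function again to every value. Calling `toMap` once on the view is a way to get a strict `Map` instead.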
131+
132+
Furthermore, we have introduced an operation named `groupMap`
133+
that both groups elements and transforms the groups. The above code
134+
can be rewritten as follows to take advantage of `groupMap`:
135+
136+
~~~ scala
def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
  users.groupMap(_.age)(_.name)
~~~
140+
141+
The returned `Map` is strict: it eagerly evaluates its elements
142+
once. Also, the fact that it is implemented as a single operation
143+
makes it possible to apply some optimizations that make it
144+
~1.3x faster than the version that uses `mapValues`.
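
Here is a self-contained usage sketch of `groupMap`, assuming Scala 2.13. The sample users and the `GroupMapDemo` wrapper object are invented for the example.

~~~ scala
object GroupMapDemo {
  final case class User(name: String, age: Int)

  // Groups users by age and keeps only their names, in a single pass.
  def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
    users.groupMap(_.age)(_.name)

  // Made-up sample data.
  val sample = Seq(User("Alice", 30), User("Bob", 25), User("Carol", 30))

  val result: Map[Int, Seq[String]] = namesByAge(sample)
}
~~~

Here `result` should be equal to `Map(30 -> Seq("Alice", "Carol"), 25 -> Seq("Bob"))`: within each group, names keep the order they had in the source sequence.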
145+
146+
### `InPlace` Transformation Operations
147+
148+
Mutable collections have a couple of new operations for transforming
149+
their elements in place: instead of returning a new collection (like
150+
`map` and `filter` do) they mutate the source collection. These
151+
operations are suffixed with `InPlace`. For instance, to remove
users whose name starts with the letter `J` from a buffer and then
increment the age of the remaining users, one can now write:
154+
155+
~~~ scala
import scala.collection.mutable.ArrayBuffer

val users = ArrayBuffer(…)
users
  .filterInPlace(user => !user.name.startsWith("J")) // drop users whose name starts with "J"
  .mapInPlace(user => user.copy(age = user.age + 1)) // increment the age of the others
~~~
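
For a version that can be run as is, here is a sketch with concrete data, assuming Scala 2.13. The `InPlaceDemo` object, the `User` case class, and the sample users are invented for the example.

~~~ scala
import scala.collection.mutable.ArrayBuffer

object InPlaceDemo {
  final case class User(name: String, age: Int)

  // Made-up sample data.
  val users = ArrayBuffer(User("Julien", 30), User("Alice", 25), User("Bob", 41))

  // Both operations mutate `users` and return the same buffer, so they chain.
  users
    .filterInPlace(user => !user.name.startsWith("J"))
    .mapInPlace(user => user.copy(age = user.age + 1))
}
~~~

After the object is initialized, `InPlaceDemo.users` should contain `User("Alice", 26)` and `User("Bob", 42)`: no new buffer has been allocated, the original one has been modified.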
161+
162+
## Deprecations For Less Confusion
163+
164+
A consequence of cleaning and simplifying the collections framework
165+
is that several types or operations have been deprecated.
166+
167+
### `Iterable` Is The Top Collection Type
168+
169+
We felt that having a distinction between `Traversable` and `Iterable` was not
170+
worth it, so we removed `Traversable` as a separate type (it is now a deprecated alias to `Iterable[A]`).
171+
172+
`Iterable[A]` is now the collection type at the top of the hierarchy.
173+
Its only abstract member is `def iterator: Iterator[A]`.
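
As a small illustration of that single abstract member, here is a sketch of a custom collection, assuming Scala 2.13. The `Countdown` class and the `CountdownDemo` object are hypothetical names used only for this example.

~~~ scala
object CountdownDemo {
  // A custom collection only has to say how to iterate over its elements.
  final class Countdown(start: Int) extends Iterable[Int] {
    def iterator: Iterator[Int] = Iterator.range(start, 0, -1)
  }

  // Transformation operations such as `map` are inherited.
  val doubled: Iterable[Int] = new Countdown(3).map(_ * 2)
}
~~~

Here `CountdownDemo.doubled.toList` should be `List(6, 4, 2)`.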
174+
175+
### `LazyList` Is Preferred Over `Stream`
176+
177+
`Stream` is deprecated in favor of `LazyList`. As its name suggests,
178+
a `LazyList` is a linked list whose elements are lazily evaluated. An
179+
important semantic difference with `Stream` is that in `LazyList` both
180+
the head and the tail are lazy, whereas in `Stream` only the tail is lazy.
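
The following sketch shows the lazy head in action, assuming Scala 2.13. The `LazyHeadDemo` object and its `elem` helper are made up for the illustration; `elem` records a label each time an element is actually computed.

~~~ scala
object LazyHeadDemo {
  // Hypothetical log of the elements that have been computed so far.
  var evaluated = List.empty[String]
  def elem(label: String, value: Int): Int = { evaluated ::= label; value }

  // Returns the log right after construction, the head, and the log afterwards.
  def run(): (List[String], Int, List[String]) = {
    evaluated = Nil
    // `#::` takes both its head and its tail by name.
    val xs = elem("head", 1) #:: elem("second", 2) #:: LazyList.empty
    val afterConstruction = evaluated
    val h = xs.head // only now is the first element computed
    (afterConstruction, h, evaluated)
  }
}
~~~

Under these assumptions, `LazyHeadDemo.run()` should return `(Nil, 1, List("head"))`: building the list computes nothing, and asking for the head computes only the first element.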
181+
182+
### Insertion And Removal Operations Are Not Available On Generic Collections
183+
184+
In the current framework, the `scala.collection.Map` type has `+` and `-` operations
185+
to add and remove entries. The semantics of these operations is to return a new collection
186+
with the added or removed entries, without changing the source collection.
187+
188+
These operations are then inherited by the mutable branch of the collections. But the mutable
189+
collection types also introduce their own insertion and removal operations, namely `+=` and `-=`,
190+
which modify the source collection in place. This means that the `scala.collection.mutable.Map` type
191+
has `+` and `+=`, as well as `-` and `-=`.
192+
193+
Having all these operations can be handy in some cases but can also introduce confusion. If you want
194+
to use `+` or `-`, then you probably wanted to use an immutable collection type in the first place…
195+
Another example is the `updated` operation, which is available on mutable `Map` but returns a new
196+
collection.
197+
198+
We think that by deprecating these insertion and removal operations from generic collection
199+
types and by having distinct operations between the `mutable` and `immutable` branches we make
200+
the situation clearer.
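
The intended split between the two branches could be sketched as follows, assuming Scala 2.13. The `InsertionDemo` object and its values are invented for the example.

~~~ scala
import scala.collection.mutable

object InsertionDemo {
  // Immutable branch: `+` returns a new collection, the source is unchanged.
  val frozen = Map("a" -> 1)
  val extended = frozen + ("b" -> 2)

  // Mutable branch: `+=` and `-=` modify the collection in place.
  val counters = mutable.Map("a" -> 1)
  counters += ("b" -> 2)
  counters -= "a"
}
~~~

After initialization, `frozen` still holds only `"a" -> 1`, `extended` holds both entries, and `counters` holds only `"b" -> 2`.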
201+
202+
## Summary
203+
204+
In summary, the changes for end-users are the following:
205+
206+
- non-strict collections (such as views) are safer to use and simpler to implement,
207+
- type signatures of transformation operations (such as `map`) are simpler
208+
(no implicit `CanBuildFrom` parameter),
209+
- new cool operations have been added,
210+
- the type hierarchy is simpler (no `Traversable`),
211+
- mutable collection types do not inherit immutable insertion and removal operations.
