| 1 | +--- |
| 2 | +layout: blog-detail |
| 3 | +post-type: blog |
| 4 | +by: Julien Richard-Foy |
| 5 | +title: Scala 2.13’s Collections |
| 6 | +--- |
| 7 | + |
| 8 | +One more article about the standard collections, really? Indeed, during the last |
| 9 | +18 months a lot of work has been done on the collections and we’ve published |
| 10 | +several blog articles and given several talks to explain the various changes or |
| 11 | +challenges we were facing. This article attempts to summarize **what is going |
| 12 | +to change from an end-user perspective**. |
| 13 | + |
| 14 | +In case you’ve thoroughly followed our previous blog posts and talks, you might |
| 15 | +not learn much from this article. Otherwise, this is the perfect opportunity |
| 16 | +to catch up on the topic in a few minutes! |
| 17 | + |
| 18 | +The next section presents the internal changes in the collections implementation |
| 19 | +that might have some visible impact on the surface. Then, I will show why I think |
| 20 | +that the removal of `CanBuildFrom` made the API more beginner-friendly. Next, I |
| 21 | +will introduce some new operations available in the collections. Finally, I |
| 22 | +will mention the main deprecations, the motivation behind them, and their |
| 23 | +recommended replacement. |
| 24 | + |
| 25 | +## Under The Hood: A Cleaner Ground |
| 26 | + |
| 27 | + |
| 28 | + |
| 29 | +The most important change in the new collections framework is that transformation |
| 30 | +operations (such as `map` or `filter`) are now implemented in a way that works with both |
| 31 | +strict collections (such as `List`) and non-strict collections (such as `Stream`). |
| 32 | +This was not the case before: the previous |
| 33 | +implementations were strict and had to be overridden by non-strict collection types. |
| 34 | +You can find more details about that in |
| 35 | +[this blog post](/blog/2017/11/28/view-based-collections.html). |
| 36 | + |
| 37 | +The good news is that the new design is more **correct** in the sense that you can |
| 38 | +now implement custom non-strict collection types without having to worry about |
| 39 | +re-implementing a ton of operations. Another benefit is that transformation |
| 40 | +operations defined outside of the collections (like in the |
| 41 | +[cvogt/scala-extensions](https://github.com/cvogt/scala-extensions) project) |
| 42 | +now work with non-strict collections (such as `View` or `Stream`). |
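A minimal sketch of what "non-strict" means in practice, assuming Scala 2.13: the same `map` operation is applied to a strict `List` and to a `View`, and a counter (a hypothetical helper introduced only for this illustration) records when the function actually runs.

```scala
// Counts how many times the mapping function has been applied.
var evaluations = 0
def double(x: Int): Int = { evaluations += 1; x * 2 }

// Strict collection: `map` applies the function to every element immediately.
val strictResult = List(1, 2, 3).map(double)
val afterStrict = evaluations

// Non-strict collection: `map` on a view only describes the transformation.
val lazyResult = List(1, 2, 3).view.map(double)
val afterViewCreation = evaluations

// The function runs only when the view is traversed, here by `toList`.
val forced = lazyResult.toList
val afterTraversal = evaluations
```

Both results contain the same elements; the only difference is when the function is evaluated.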
| 43 | + |
| 44 | +Speaking of non-strict collections, the `View` type has been redesigned and |
| 45 | +views should behave in a more predictable way. Also, `Stream` has been |
| 46 | +deprecated in favor of `LazyList` (see the last section). |
| 47 | + |
| 48 | +## Life Without `CanBuildFrom` |
| 49 | + |
| 50 | +I think the most visible change for end-users is that transformation operations |
| 51 | +don’t use `CanBuildFrom` anymore. I believe this will be quite visible despite our previous |
| 52 | +efforts to *hide* `CanBuildFrom` from the API documentation in the current collections. |
| 53 | +Indeed, if you take a look at the |
| 54 | +[current `List` API](/api/2.12.6/scala/collection/immutable/List.html), the signature |
| 55 | +shown for the `map` operation does not mention `CanBuildFrom`: |
| 56 | + |
| 57 | + |
| 58 | + |
| 59 | +However, if you use this operation in your code, then your IDE reveals its actual signature: |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | +As you can see, the type signature shown in the API documentation has been “simplified” |
| 64 | +to make it more approachable, but I believe that this probably confuses users |
| 65 | +even more. This is especially true when you look at the |
| 66 | +[`TreeMap[A, B]` API](/api/2.12.6/scala/collection/immutable/TreeMap.html): |
| 67 | + |
| 68 | + |
| 69 | + |
| 70 | +This type signature makes no sense: the result type cannot be `TreeMap[B]` since |
| 71 | +`TreeMap` takes *two* type parameters (the type of keys and the type |
| 72 | +of values). Also, the function `f` actually takes a *key-value pair* as parameter, |
| 73 | +not just a key (as incorrectly indicated by the type `A`). |
| 74 | + |
| 75 | +`CanBuildFrom` was used for good reasons. In particular, the type `That` shown |
| 76 | +in the above screenshot was *computed* according to the type of the source |
| 77 | +collection and the type of elements of the new collection. The case of `TreeMap` |
| 78 | +is compelling: in case you transform your key-value pairs into other key-value |
| 79 | +pairs for which the type of keys has an implicit `Ordering` instance, then `map` |
| 80 | +returns a `TreeMap`, but if there is no such `Ordering` instance then the best |
| 81 | +collection type that can be returned is `Map`. And if you transform the key-value |
| 82 | +pairs into something that is not even a pair, then the best collection type |
| 83 | +that can be returned is `Iterable`. These three cases were supported by |
| 84 | +a single operation implementation, and `CanBuildFrom` was used to abstract over |
| 85 | +the various possible return types. |
| 86 | + |
| 87 | +In the new collections we wanted to have simpler type signatures so that we |
| 88 | +can show their actual form in the API documentation and so that the auto-completion |
| 89 | +provided by IDEs is not scary. We achieve that by using overloading, as |
| 90 | +explained in more detail in |
| 91 | +[this blog article](/blog/2017/05/30/tribulations-canbuildfrom.html). |
| 92 | + |
| 93 | +In practice, this means that the new `TreeMap` has three overloads of the |
| 94 | +`map` operation: |
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | +These type signatures are the actual ones and they essentially translate |
| 99 | +“in types” what I’ve written above about the possible result types of `map` |
| 100 | +according to the type of elements returned by the transformation function `f`. |
| 101 | +I believe that the new API is simpler to understand. |
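As a rough sketch of how these overloads look from the caller's side, assuming Scala 2.13, the snippet below exercises two of the three cases on a small `TreeMap` (the sample data and value names are made up for illustration). The middle case, a key type without an `Ordering`, is left out because its exact behavior may depend on the compiler and library version.

```scala
import scala.collection.immutable.TreeMap

val ages = TreeMap("alice" -> 30, "bob" -> 25)

// Pairs whose key type (Int) has an implicit Ordering: the result is a TreeMap.
val byAge = ages.map { case (name, age) => (age, name) }
// This ascription compiles only because the static type of `byAge` is a TreeMap.
val byAgeTyped: TreeMap[Int, String] = byAge

// Not pairs at all: the most precise static result type is Iterable.
val names = ages.map { case (name, _) => name.toUpperCase }
val namesTyped: Iterable[String] = names
```

No implicit builder shows up anywhere: the result type follows directly from which overload applies.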
| 102 | + |
| 103 | +## New And Noteworthy |
| 104 | + |
| 105 | +We have introduced a few new operations. The following sections |
| 106 | +present some of them. |
| 107 | + |
| 108 | +### `groupMap` |
| 109 | + |
| 110 | +A common pattern with the current collections is to use `groupBy` |
| 111 | +followed by `mapValues` to transform the groups. For instance, |
| 112 | +this is how we can index the names of a collection of users by |
| 113 | +their age: |
| 114 | + |
| 115 | +~~~ scala |
| 116 | +case class User(name: String, age: Int) |
| 117 | + |
| 118 | +def namesByAge(users: Seq[User]): Map[Int, Seq[String]] = |
| 119 | + users.groupBy(_.age).mapValues(users => users.map(_.name)) |
| 120 | +~~~ |
| 121 | + |
| 122 | +There is a subtlety in this code. The static return type is `Map` |
| 123 | +but the `Map` implementation actually returned is lazy and evaluates |
| 124 | +its elements each time it is traversed (i.e., the `users => users.map(_.name)` |
| 125 | +function is evaluated each time the `Map` is traversed). |
| 126 | + |
| 127 | +In the new collections the return type of `mapValues` is a `MapView` instead |
| 128 | +of a `Map`, to clearly indicate that its contents are evaluated each time it |
| 129 | +is traversed. |
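The re-evaluation can be observed with a small experiment. This is only a sketch, assuming Scala 2.13, where the non-deprecated way to call `mapValues` is through `.view`; the sample users and the `calls` counter are hypothetical and exist only to make the effect visible.

```scala
import scala.collection.MapView

final case class User(name: String, age: Int)

val users = Seq(User("Alice", 30), User("Bob", 25), User("Carol", 30))

// Counts how many times the value-transforming function runs.
var calls = 0

val namesByAge: MapView[Int, Seq[String]] =
  users.groupBy(_.age).view.mapValues { group =>
    calls += 1
    group.map(_.name)
  }

// Each traversal of the view applies the function again.
val first = namesByAge.toMap
val callsAfterFirst = calls
val second = namesByAge.toMap
val callsAfterSecond = calls
```

Calling `toMap` once and keeping the result is one way to get a strict `Map` that is evaluated a single time.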
| 130 | + |
| 131 | +Furthermore, we have introduced an operation named `groupMap` |
| 132 | +that both groups elements and transforms the groups. The above code |
| 133 | +can be rewritten as follows to take advantage of `groupMap`: |
| 134 | + |
| 135 | +~~~ scala |
| 136 | +def namesByAge(users: Seq[User]): Map[Int, Seq[String]] = |
| 137 | + users.groupMap(_.age)(_.name) |
| 138 | +~~~ |
| 139 | + |
| 140 | +The returned `Map` is strict: it eagerly evaluates its elements |
| 141 | +once. Also, the fact that it is implemented as a single operation |
| 142 | +makes it possible to apply some optimizations that make it |
| 143 | +~1.3x faster than the version that uses `mapValues`. |
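For a concrete feel of the result, here is a small self-contained sketch, assuming Scala 2.13 and some made-up sample users:

```scala
final case class User(name: String, age: Int)

val users = Seq(User("Alice", 30), User("Bob", 25), User("Carol", 30))

// First function: the grouping key. Second function: what to keep from each element.
val namesByAge: Map[Int, Seq[String]] = users.groupMap(_.age)(_.name)
```

Within each group the names keep the order in which the users appeared in the source sequence.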
| 144 | + |
| 145 | +### `InPlace` Transformation Operations |
| 146 | + |
| 147 | +Mutable collections have a couple of new operations for transforming |
| 148 | +their elements in place: instead of returning a new collection (like |
| 149 | +`map` and `filter` do) they mutate the source collection. These |
| 150 | +operations are suffixed with `InPlace`. For instance, to remove |
| 151 | +users whose name starts with the letter `J` from a buffer and then |
| 152 | +increment the age of the remaining users, one can now write: |
| 153 | + |
| 154 | +~~~ scala |
| 155 | +val users = ArrayBuffer(…) |
| 156 | +users |
| 157 | + .filterInPlace(user => !user.name.startsWith("J")) |
| 158 | + .mapInPlace(user => user.copy(age = user.age + 1)) |
| 159 | +~~~ |
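A runnable variant of the snippet above, assuming Scala 2.13 and filling the buffer with hypothetical sample users, shows that the buffer itself is modified and that both operations return the same buffer:

```scala
import scala.collection.mutable.ArrayBuffer

final case class User(name: String, age: Int)

val users = ArrayBuffer(User("Julien", 30), User("Alice", 25), User("Bob", 41))

// Both operations mutate `users` and return it, which is why they can be chained.
val result = users
  .filterInPlace(user => !user.name.startsWith("J"))
  .mapInPlace(user => user.copy(age = user.age + 1))
```

After this runs, `users` contains only the two remaining users, each one year older, and `result` is the very same object as `users`.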
| 160 | + |
| 161 | +## Deprecations For Less Confusion |
| 162 | + |
| 163 | +A consequence of cleaning and simplifying the collections framework |
| 164 | +is that several types or operations have been deprecated. |
| 165 | + |
| 166 | +### `Iterable` Is The Top Collection Type |
| 167 | + |
| 168 | +TBD |
| 169 | + |
| 170 | +### `LazyList` Is Preferred Over `Stream` |
| 171 | + |
| 172 | +`Stream` is deprecated in favor of `LazyList`. As its name suggests, |
| 173 | +a `LazyList` is a linked list whose elements are lazily evaluated. An |
| 174 | +important semantic difference with `Stream` is that in `LazyList` both |
| 175 | +the head and the tail are lazy, whereas in `Stream` only the tail is lazy. |
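The lazy head can be illustrated with a small sketch, assuming Scala 2.13. The `expensiveHead` function and its flag are hypothetical helpers used only to detect when evaluation happens:

```scala
// Records whether the head expression has been evaluated yet.
var headEvaluated = false
def expensiveHead(): Int = { headEvaluated = true; 42 }

// Building the LazyList does not evaluate the head expression.
val lazyList = expensiveHead() #:: LazyList.empty[Int]
val evaluatedAfterConstruction = headEvaluated

// The head is computed only when it is first accessed.
val head = lazyList.head
val evaluatedAfterAccess = headEvaluated
```

With `Stream`, the equivalent construction would evaluate the head expression right away.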
| 176 | + |
| 177 | +### Insertion And Removal Operations Are Not Available On Generic Collections |
| 178 | + |
| 179 | +In the current framework, the `scala.collection.Map` type has `+` and `-` operations |
| 180 | +to add and remove entries. The semantics of these operations is to return a new collection |
| 181 | +with the added or removed entries, without changing the source collection. |
| 182 | + |
| 183 | +These operations are then inherited by the mutable branch of the collections. But the mutable |
| 184 | +collection types also introduce their own insertion and removal operations, namely `+=` and `-=`, |
| 185 | +which modify the source collection in place. This means that the `scala.collection.mutable.Map` type |
| 186 | +has `+` and `+=`, as well as `-` and `-=`. |
| 187 | + |
| 188 | +Having all these operations can be handy in some cases but can also introduce confusion. If you want |
| 189 | +to use `+` or `-`, then you probably want an immutable collection type in the first place… |
| 190 | +Another example is the `updated` operation, which is available on mutable `Map` but returns a new |
| 191 | +collection. |
| 192 | + |
| 193 | +We think that by deprecating these insertion and removal operations from generic collection |
| 194 | +types and by having distinct operations between the `mutable` and `immutable` branches we make |
| 195 | +the situation clearer. |
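The intended split can be sketched as follows, assuming Scala 2.13 (the map contents are arbitrary sample data): `+` and `-` on the immutable side return new collections, while `+=` and `-=` on the mutable side change the collection in place.

```scala
import scala.collection.{immutable, mutable}

// Immutable branch: `+` and `-` return new maps and leave the source untouched.
val source = immutable.Map(1 -> "a")
val extended = source + (2 -> "b")
val reduced = extended - 1

// Mutable branch: `+=` and `-=` modify the map itself.
val buffer = mutable.Map(1 -> "a")
buffer += (2 -> "b")
buffer -= 1
```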
| 196 | + |
| 197 | +## Summary |
| 198 | + |
| 199 | +TBD |