| 1 | +--- |
| 2 | +layout: blog-detail |
| 3 | +post-type: blog |
| 4 | +by: Vincenzo Bazzucchi, Scala Center |
| 5 | +title: Tuples bring generic programming to Scala 3 |
| 6 | +--- |
| 7 | + |
| 8 | +Tuples allow developers to create new types by associating existing types. In |
| 9 | +doing so, they are very similar to case classes but unlike them they retain |
| 10 | +only the structure of the types (e.g., which type is in which order) rather |
| 11 | +than giving each element a name. A tuple can also be seen as a *sequence* and |
| 12 | +therefore a collection of objects. However, whereas *homogeneous* collections |
| 13 | +such as `List[A]` or `Set[A]` accumulate elements retaining only one type |
| 14 | +(`A`), tuples are capable of storing data of different types while preserving |
| 15 | +the type of each entry. |
| 16 | + |
| 17 | +In Scala 3, tuples gain power thanks to new operations, additional type safety |
| 18 | +and fewer restrictions, pointing in the direction of a construct called |
| 19 | +**Heterogeneous Lists** (HLists), one of the core data structures in generic |
| 20 | +programming. |
| 21 | + |
| 22 | +In this post I will take you on a tour of the new Tuple API before looking at |
| 23 | +how a new language feature, dependent match types, makes it possible to implement |
| 24 | +such an API. I hope that through the two proposed examples, you will develop an |
| 25 | +intuition about the usage and power of a few new exciting features of Scala 3. |
| 26 | + |
| 27 | +# Why generic programming? |
| 28 | + |
| 29 | +HLists and case classes can both be used to define products of types. However, |
| 30 | +HLists do not require the developer to declare class or field names. This |
| 31 | +makes them more convenient in some scenarios, for example in return types. If |
| 32 | +we consider `List`, you can see that `def splitAt(n: Int)` produces a |
| 33 | +`(List[A], List[A])` and not a `case class SplitResult(left: List[A], right: |
| 34 | +List[A])` because of the cognitive cost of introducing new names |
| 35 | +(`SplitResult`, `left` and `right`). |
| 36 | + |
| 37 | +Moreover, there are infinitely many case classes which share a common |
| 38 | +structure, which means that they have the same number and type of fields. We |
| 39 | +might want to apply the same transformations to them, so that such |
| 40 | +transformations can be defined only once. [The Type Astronaut's Guide to |
| 41 | +Shapeless](https://underscore.io/books/shapeless-guide/) proposes the following |
| 42 | +simple example: |
| 43 | + |
| 44 | +```scala |
| 45 | +case class Employee(name: String, number: Int, manager: Boolean) |
| 46 | +case class IceCream(name: String, numCherries: Int, inCone: Boolean) |
| 47 | +``` |
| 48 | + |
| 49 | +If you are implementing an operation such as serializing instances of these |
| 50 | +types to CSV or JSON, you will realize that the logic is exactly the same and |
| 51 | +you will want to implement it only once. This is equivalent to defining the |
| 52 | +serialization algorithm for the `(String, Int, Boolean)` HList, assuming that |
| 53 | +you can map both case classes to it. |
| 54 | + |
| 55 | +# A simple CSV encoder |
| 56 | + |
| 57 | +Let's consider a simple CSV encoder for our `Employee` and `IceCream` case classes. |
| 58 | +Each record, or line, of a CSV file is a sequence of values separated by a |
| 59 | +delimiter, usually a comma or a semicolon. In Scala we can represent each value |
| 60 | +as text, using the `String` type, and thus each record can be a list of values, |
| 61 | +with type `List[String]`. Therefore, in order to encode case classes to CSV, we |
| 62 | +need to extract each field of the case class and to turn it into a `String`, |
| 63 | +and then collect all the fields in a list. In this setting, `Employee` and |
| 64 | +`IceCream` could be treated in the same way, because they can simply be seen |
| 65 | +as a `(String, Int, Boolean)` which needs to be transformed into a |
| 66 | +`List[String]`. We will first see how to handle this simple scenario before |
| 67 | +briefly looking at how to obtain a tuple from a case class. |
| 68 | + |
| 69 | +Assuming that we know how to transform each element of a tuple into a |
| 70 | +`String`, can we transform any tuple into a `List[String]`? |
| 71 | + |
| 72 | +The answer is yes, and this is possible because Scala 3 introduces types `*:`, |
| 73 | +`EmptyTuple` and `NonEmptyTuple` but also methods `head` and `tail` which allow |
| 74 | +us to define recursive operations on tuples. |
| 75 | + |
| 76 | +## Set up |
| 77 | + |
| 78 | +Let's define the `RowEncoder[A]` type-class, which describes the capability of |
| 79 | +values of type `A` to be converted into a `Row`. To encode a type to a `Row`, we |
| 80 | +first need to convert each field of the type into a `String`: this capability |
| 81 | +is defined by the `FieldEncoder` type-class. |
| 82 | + |
| 83 | +```scala |
| 84 | +trait FieldEncoder[A]: |
| 85 | + def encodeField(a: A): String |
| 86 | + |
| 87 | +type Row = List[String] |
| 88 | + |
| 89 | +trait RowEncoder[A]: |
| 90 | + def encodeRow(a: A): Row |
| 91 | +``` |
| 92 | + |
| 93 | +We can then add some instances for our base types: |
| 94 | + |
| 95 | +```scala |
| 96 | +object BaseEncoders: |
| 97 | +  given FieldEncoder[Int] with |
| 98 | +    def encodeField(x: Int) = x.toString |
| 99 | + |
| 100 | +  given FieldEncoder[Boolean] with |
| 101 | +    def encodeField(x: Boolean) = if x then "true" else "false" |
| 102 | + |
| 103 | +  given FieldEncoder[String] with |
| 104 | +    def encodeField(x: String) = x // Ideally, we should also escape commas and double quotes |
| 105 | +end BaseEncoders |
| 106 | +``` |
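
The comment on the `String` instance notes that a real encoder should escape commas and double quotes. A minimal sketch of one common convention (wrap the field in double quotes and double any embedded quote, in the spirit of RFC 4180) might look like the following. The names `escapeCsv` and `EscapingEncoders` are hypothetical and are not part of the post's code; the `FieldEncoder` trait is repeated only to keep the snippet self-contained.

```scala
trait FieldEncoder[A]:
  def encodeField(a: A): String

// Hypothetical helper: quote a field only when it contains a character
// that would otherwise break the CSV structure.
def escapeCsv(raw: String): String =
  val needsQuoting = raw.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r')
  if needsQuoting then "\"" + raw.replace("\"", "\"\"") + "\""
  else raw

// Hypothetical alternative to the String instance in BaseEncoders.
object EscapingEncoders:
  given FieldEncoder[String] with
    def encodeField(x: String) = escapeCsv(x)
end EscapingEncoders
```

Under this sketch, a plain value such as `Bob` is left untouched, while `a,b` becomes `"a,b"`.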
| 107 | + |
| 108 | +## Recursion! |
| 109 | + |
| 110 | +Now that all these tools are in place, let's focus on the hard part: |
| 111 | +implementing the transformation of a tuple with an arbitrary number of elements |
| 112 | +into a `Row`. Similarly to how you may be used to recurse on lists, on |
| 113 | +tuples we need to manage two scenarios: the base case (`EmptyTuple`) and the |
| 114 | +inductive case (`NonEmptyTuple`). |
| 115 | + |
| 116 | +In the following snippet, I prefer to use the [context bound |
| 117 | +syntax](https://dotty.epfl.ch/docs/reference/contextual/context-bounds.html) |
| 118 | +even if I need a handle for the instances because it concentrates all the |
| 119 | +constraints in the type parameter list (and I do not need to come up with any |
| 120 | +name). After this personal preference disclaimer, let's see the two cases: |
| 121 | + |
| 122 | +```scala |
| 123 | +object TupleEncoders: |
| 124 | +  // Base case |
| 125 | +  given RowEncoder[EmptyTuple] with |
| 126 | +    def encodeRow(empty: EmptyTuple) = |
| 127 | +      List.empty |
| 128 | + |
| 129 | +  // Inductive case |
| 130 | +  given [H: FieldEncoder, T <: Tuple: RowEncoder]: RowEncoder[H *: T] with |
| 131 | +    def encodeRow(tuple: H *: T) = |
| 132 | +      summon[FieldEncoder[H]].encodeField(tuple.head) :: summon[RowEncoder[T]].encodeRow(tuple.tail) |
| 133 | +end TupleEncoders |
| 134 | +``` |
| 135 | + |
| 136 | +If the tuple is empty, we produce an empty list. To encode a non-empty tuple we |
| 137 | +invoke the encoder for the first element and we prepend the result to the `Row` |
| 138 | +created by the encoder of the tail of the tuple. |
| 139 | + |
| 140 | +We can create an entrypoint function and test this implementation (with the givens from `BaseEncoders` and `TupleEncoders` imported): |
| 141 | +```scala |
| 142 | +def tupleToCsv[X <: Tuple : RowEncoder](tuple: X): List[String] = |
| 143 | + summon[RowEncoder[X]].encodeRow(tuple) |
| 144 | + |
| 145 | +tupleToCsv(("Bob", 42, false)) // List("Bob", "42", "false") |
| 146 | +``` |
| 147 | + |
| 148 | +## How to obtain a tuple from a case class? |
| 149 | + |
| 150 | +Scala 3 introduces the |
| 151 | +[`Mirror`](https://dotty.epfl.ch/docs/reference/contextual/derivation.html) |
| 152 | +type-class which provides type-level information about the components and |
| 153 | +labels of types. [A paragraph from that |
| 154 | +documentation](https://dotty.epfl.ch/docs/reference/contextual/derivation.html#types-supporting-derives-clauses) |
| 155 | +is particularly interesting for our use case: |
| 156 | + |
| 157 | +> The compiler automatically generates instances of `Mirror` for `enum`s and |
| 158 | +> their cases, **case classes** and case objects, sealed classes or traits |
| 159 | +> having only case classes and case objects as children. |
| 160 | + |
| 161 | +That's why we can obtain a tuple from a case class using: |
| 162 | +```scala |
| 163 | +val bob: Employee = Employee("Bob", 42, false) |
| 164 | +val bobTuple: (String, Int, Boolean) = Tuple.fromProductTyped(bob) |
| 165 | +``` |
| 166 | +But that is also why we can revert the operation (after `import scala.deriving.Mirror`): |
| 167 | +```scala |
| 168 | +val bobAgain: Employee = summon[Mirror.Of[Employee]].fromProduct(bobTuple) |
| 169 | +``` |
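
Putting the two halves together, the tuple encoder from the previous section can be reused for any case class whose fields have a `FieldEncoder`. The following is a sketch under stated assumptions, not a definitive implementation: `caseClassToCsv` and `csvDemo` are hypothetical names, and the type classes and instances are repeated at the top level (rather than inside `BaseEncoders` and `TupleEncoders`) only so that the snippet is self-contained.

```scala
import scala.deriving.Mirror

trait FieldEncoder[A]:
  def encodeField(a: A): String

type Row = List[String]

trait RowEncoder[A]:
  def encodeRow(a: A): Row

given FieldEncoder[Int] with
  def encodeField(x: Int) = x.toString

given FieldEncoder[Boolean] with
  def encodeField(x: Boolean) = if x then "true" else "false"

given FieldEncoder[String] with
  def encodeField(x: String) = x

// Base case: the empty tuple encodes to an empty row.
given RowEncoder[EmptyTuple] with
  def encodeRow(empty: EmptyTuple) = List.empty

// Inductive case: encode the head, then recurse on the tail.
given [H: FieldEncoder, T <: Tuple: RowEncoder]: RowEncoder[H *: T] with
  def encodeRow(tuple: H *: T) =
    summon[FieldEncoder[H]].encodeField(tuple.head) :: summon[RowEncoder[T]].encodeRow(tuple.tail)

case class Employee(name: String, number: Int, manager: Boolean)
case class IceCream(name: String, numCherries: Int, inCone: Boolean)

// Hypothetical helper: convert the case class to its tuple of fields
// through its Mirror, then delegate to the tuple encoder.
def caseClassToCsv[A <: Product](a: A)(using m: Mirror.ProductOf[A])(
    using enc: RowEncoder[m.MirroredElemTypes]
): Row =
  enc.encodeRow(Tuple.fromProductTyped(a))

@main def csvDemo(): Unit =
  println(caseClassToCsv(Employee("Bob", 42, false)))
  println(caseClassToCsv(IceCream("Sundae", 1, false)))
```

Both `Employee` and `IceCream` go through the same `RowEncoder[(String, Int, Boolean)]`, so the encoding logic is written only once; the first call is expected to yield `List("Bob", "42", "false")`.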
| 170 | + |
| 171 | +# New tuple operations |
| 172 | +In the previous example, we saw that we can use `.head` and `.tail` on tuples, |
| 173 | +but Scala 3 introduces many other operations. Here is a quick overview: |
| 174 | + |
| 175 | +| Operation | Example | Result | |
| 176 | +|------------|-------------------------------------------------------------|------------------------------------------------------| |
| 177 | +| `size` | `(1, 2, 3).size` | `3` | |
| 178 | +| `head` | `(3 *: 4 *: 5 *: EmptyTuple).head` | `3` | |
| 179 | +| `tail` | `(3 *: 4 *: 5 *: EmptyTuple).tail` | `(4, 5)` | |
| 180 | +| `*:` | `3 *: 4 *: 5 *: 6 *: EmptyTuple` | `(3, 4, 5, 6)` | |
| 181 | +| `++` | `(1, 2, 3) ++ (4, 5, 6)` | `(1, 2, 3, 4, 5, 6)` | |
| 182 | +| `drop` | `(1, 2, 3).drop(2)` | `Tuple1(3)` | |
| 183 | +| `take` | `(1, 2, 3).take(2)` | `(1, 2)` | |
| 184 | +| `apply` | `(1, 2, 3)(2)` | `3` | |
| 185 | +| `splitAt` | `(1, 2, 3, 4, 5).splitAt(2)` | `((1, 2), (3, 4, 5))` | |
| 186 | +| `zip` | `(1, 2, 3).zip(('a', 'b'))` | `((1, 'a'), (2, 'b'))` | |
| 187 | +| `toList` | `(1, 'a', 2).toList` | `List(1, 'a', 2) : List[Int | Char]` | |
| 188 | +| `toArray` | `(1, 'a', 2).toArray` | `Array(1, 'a', 2) : Array[AnyRef]` | |
| 189 | +| `toIArray` | `(1, 'a', 2).toIArray` | `IArray(1, 'a', 2) : IArray[AnyRef]` | |
| 190 | +| `map` | `(1, 'a').map[[X] =>> Option[X]]([T] => (t: T) => Some(t))` | `(Some(1), Some('a')) : (Option[Int], Option[Char])` | |
| 191 | + |
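
To get a feel for the table above, here is a small sketch that exercises a few of these operations. The explicit type ascriptions are only there to show that the compiler tracks the type of every element; the name `tupleOpsDemo` and the sample values are illustrative.

```scala
@main def tupleOpsDemo(): Unit =
  val t = (1, "two", 3.0)

  // Each ascription compiles only because the element types are preserved.
  val first: Int = t.head
  val rest: (String, Double) = t.tail
  val second: String = t(1)
  val joined: (Int, String, Double, Boolean, Char) = t ++ (true, 'x')

  assert(first == 1)
  assert(rest == ("two", 3.0))
  assert(second == "two")
  assert(joined == (1, "two", 3.0, true, 'x'))
  assert(t.size == 3)
  assert(t.toList == List(1, "two", 3.0))
```

Changing one of the ascriptions, for example declaring `second` as an `Int`, should be rejected at compile time rather than fail at run time.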
| 192 | + |
| 193 | +# Under the hood: Scala 3 introduces match types |
| 194 | + |
| 195 | +All the operations in the above table use very precise types. For example, the |
| 196 | +compiler ensures that `3 *: (4, 5, 6)` is a `(Int, Int, Int, Int)` or that the |
| 197 | +index provided to `apply` is strictly less than the size of the tuple. |
| 198 | + |
| 199 | +How is this possible? |
| 200 | + |
| 201 | +The core new feature that allows such a flexible implementation of tuples is |
| 202 | +**match types**. I invite you to read more about them |
| 203 | +[here](http://dotty.epfl.ch/docs/reference/new-types/match-types.html). |
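
As a first taste, here is a minimal match type adapted from the example in that reference documentation. It is unrelated to tuples and is only meant to show the syntax: a match type inspects a type and reduces to another type, and lowercase names such as `t` are type variables bound by the pattern. The value names `c`, `n` and `d` are illustrative.

```scala
// Computes the element type of a few container-like types.
type Elem[X] = X match
  case String      => Char
  case Array[t]    => t
  case Iterable[t] => t

// Each ascription type-checks because the match type reduces as expected.
val c: Elem[String] = 'a'        // Elem[String] reduces to Char
val n: Elem[Array[Int]] = 42     // Elem[Array[Int]] reduces to Int
val d: Elem[List[Double]] = 1.5  // Elem[List[Double]] reduces to Double
```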
| 204 | + |
| 205 | +Let's see how we can implement the `++` operator using this powerful construct. |
| 206 | +We will call our naive version `concat`. |
| 207 | + |
| 208 | +## Defining tuples |
| 209 | + |
| 210 | +First let's define our own tuple: |
| 211 | + |
| 212 | +```scala |
| 213 | +enum Tup: |
| 214 | + case EmpT |
| 215 | + case TCons[H, T <: Tup](head: H, tail: T) |
| 216 | +``` |
| 217 | + |
| 218 | +That is, a tuple is either empty or an element `head` followed by another |
| 219 | +tuple. Using this recursive definition we can create a tuple in the following |
| 220 | +way: |
| 221 | + |
| 222 | +```scala |
| 223 | +import Tup._ |
| 224 | + |
| 225 | +val myTup = TCons(1, TCons(2, EmpT)) |
| 226 | +``` |
| 227 | +It is not very pretty, but it can be easily adapted to provide the same ease of |
| 228 | +use as the previous examples. To do so we can use another Scala 3 feature: |
| 229 | +[extension |
| 230 | +methods](http://dotty.epfl.ch/docs/reference/contextual/extension-methods.html): |
| 231 | + |
| 232 | +```scala |
| 233 | +import Tup._ |
| 234 | + |
| 235 | +extension [A, T <: Tup] (a: A) def *: (t: T): TCons[A, T] = |
| 236 | + TCons(a, t) |
| 237 | +``` |
| 238 | +So that we can write: |
| 239 | + |
| 240 | +```scala |
| 241 | +1 *: "2" *: EmpT |
| 242 | +``` |
| 243 | + |
| 244 | +## Concatenating tuples |
| 245 | + |
| 246 | +Now let's focus on `concat`, which could look like this: |
| 247 | +```scala |
| 248 | +import Tup._ |
| 249 | + |
| 250 | +def concat[L <: Tup, R <: Tup](left: L, right: R): Tup = |
| 251 | + left match |
| 252 | + case EmpT => right |
| 253 | + case TCons(head, tail) => TCons(head, concat(tail, right)) |
| 254 | +``` |
| 255 | + |
| 256 | +Let's analyze the algorithm line by line: `L` and `R` are the types of the left |
| 257 | +and right tuples. We require them to be subtypes of `Tup` because we want to |
| 258 | +concatenate tuples. Then we proceed recursively by case: if the left tuple is |
| 259 | +empty, the result of the concatenation is just the right tuple. Otherwise the |
| 260 | +result is the current head followed by the result of concatenating the tail |
| 261 | +with the other tuple. |
| 262 | + |
| 263 | +If we test the function, it seems to work: |
| 264 | +```scala |
| 265 | +val left = 1 *: 2 *: EmpT |
| 266 | +val right = 3 *: 4 *: EmpT |
| 267 | + |
| 268 | +concat(left, right) // TCons(1,TCons(2,TCons(3, TCons(4,EmpT)))) |
| 269 | +``` |
| 270 | + |
| 271 | +So everything seems good. However, the compiler cannot verify that the |
| 272 | +function behaves as expected. For instance, the following incorrect version also type-checks: |
| 273 | + |
| 274 | +```scala |
| 275 | +def concat[L <: Tup, R <: Tup](left: L, right: R): Tup = left |
| 276 | +``` |
| 277 | + |
| 278 | +More problematic is the fact that this signature prevents us from using a more |
| 279 | +specific type for our variables or methods: |
| 280 | +```scala |
| 281 | +// This does not compile |
| 282 | +val res: TCons[Int, TCons[Int, TCons[Int, TCons[Int, EmpT.type]]]] = concat(left, right) |
| 283 | +``` |
| 284 | + |
| 285 | +Because the declared return type is just `Tup`, nothing more is checked. This |
| 286 | +means that the function can return an arbitrary tuple: the compiler cannot |
| 287 | +check that the returned value consists of the concatenation of the two tuples. In |
| 288 | +other words, we need a return type stating that the result contains all the |
| 289 | +element types of `left` followed by all the element types of `right`. |
| 290 | + |
| 291 | +Can we make it so that the compiler verifies that we are indeed returning a |
| 292 | +tuple consisting of the correct elements? |
| 293 | + |
| 294 | +In Scala 3 it is now possible, without requiring external libraries! |
| 295 | + |
| 296 | +## A new type for the result of `concat` |
| 297 | + |
| 298 | +We know that we need to focus on the return type. We can define it exactly as |
| 299 | +we have just described it. Let's call this type `Concat` to mirror the name of |
| 300 | +the function. |
| 301 | + |
| 302 | +```scala |
| 303 | +type Concat[L <: Tup, R <: Tup] <: Tup = L match |
| 304 | + case EmpT.type => R |
| 305 | + case TCons[headType, tailType] => TCons[headType, Concat[tailType, R]] |
| 306 | +``` |
| 307 | + |
| 308 | +You can see that the implementation closely follows the one above for the |
| 309 | +method. The syntax can be read in the following way: the `Concat` type is a |
| 310 | +subtype of `Tup` and is obtained by combining types `L` and `R` which are both |
| 311 | +subtypes of `Tup`. To use it we need to adjust the method implementation |
| 312 | +slightly and change its return type: |
| 313 | + |
| 314 | +```scala |
| 315 | +def concat[L <: Tup, R <: Tup](left: L, right: R): Concat[L, R] = |
| 316 | + left match |
| 317 | + case _: EmpT.type => right |
| 318 | + case cons: TCons[_, _] => TCons(cons.head, concat(cons.tail, right)) |
| 319 | +``` |
| 320 | + |
| 321 | +We use here a combination of match types and a form of dependent types called |
| 322 | +*dependent match types* (docs |
| 323 | +[here](http://dotty.epfl.ch/docs/reference/new-types/match-types.html) and |
| 324 | +[here](http://dotty.epfl.ch/docs/reference/new-types/dependent-function-types.html)). |
| 325 | +There are some quirks to it, as you might have noticed: lowercase type names in |
| 326 | +patterns are type variables, and we must match on types rather than use extractor patterns. I |
| 327 | +think however that this implementation is extremely concise and readable. |
| 328 | + |
| 329 | +Now the compiler will prevent us from making the above mistake: |
| 330 | + |
| 331 | +```scala |
| 332 | +def wrong[L <: Tup, R <: Tup](left: L, right: R): Concat[L, R] = left |
| 333 | +// This does not compile! |
| 334 | +``` |
| 335 | + |
| 336 | +We can use an extension method to allow users to write `(1, 2) ++ (3, 4)` |
| 337 | +instead of `concat((1, 2), (3, 4))`, similarly to how we implemented `*:`. |
| 338 | + |
| 339 | +We can use the same approach for other functions on tuples. I invite you to |
| 340 | +have a look at the [source code of the standard |
| 341 | +library](https://github.com/lampepfl/dotty/blob/87102a0b182849c71f61a6febe631f767bcc72c3/library/src-bootstrapped/scala/Tuple.scala) |
| 342 | +to see how the other operators are implemented. |