---
layout: blog-detail
post-type: blog
by: Vincenzo Bazzucchi, Scala Center
title: Tuples bring generic programming to Scala 3
---

Tuples allow developers to create new types by associating existing types. In
doing so, they are very similar to case classes, but unlike them they retain
only the structure of the types (e.g., which type is in which order) rather
than giving each element a name. A tuple can also be seen as a *sequence*, and
therefore as a collection of objects. However, whereas *homogeneous* collections
such as `List[A]` or `Set[A]` accumulate elements retaining only one type
(`A`), tuples are capable of storing data of different types while preserving
the type of each entry.

In Scala 3, tuples gain power thanks to new operations, additional type safety
and fewer restrictions, pointing in the direction of a construct called
**Heterogeneous Lists** (HLists), one of the core data structures in generic
programming.

In this post I will take you on a tour of the new Tuple API before looking at
how a new language feature, dependent match types, makes it possible to
implement such an API. I hope that through the two proposed examples, you will
develop an intuition about the usage and power of a few new exciting features
of Scala 3.

# Why generic programming?

HLists and case classes can both be used to define products of types. However,
HLists do not require the developer to declare class or field names. This
makes them more convenient in some scenarios, for example in return types. If
we consider `List`, you can see that `def splitAt(n: Int)` produces a
`(List[A], List[A])` and not a
`case class SplitResult(left: List[A], right: List[A])` because of the
cognitive cost of introducing new names (`SplitResult`, `left` and `right`).

Moreover, there are infinitely many case classes which share a common
structure, meaning that they have the same number and types of fields. We
might want to apply the same transformations to them, so that such
transformations can be defined only once. [The Type Astronaut's Guide to
Shapeless](https://underscore.io/books/shapeless-guide/) proposes the following
simple example:

```scala
case class Employee(name: String, number: Int, manager: Boolean)
case class IceCream(name: String, numCherries: Int, inCone: Boolean)
```

If you are implementing an operation such as serializing instances of these
types to CSV or JSON, you will realize that the logic is exactly the same and
you will want to implement it only once. This is equivalent to defining the
serialization algorithm for the `(String, Int, Boolean)` HList, assuming that
you can map both case classes to it.

# A simple CSV encoder

Let's consider a simple CSV encoder for our `Employee` and `IceCream` case classes.
Each record, or line, of a CSV file is a sequence of values separated by a
delimiter, usually a comma or a semicolon. In Scala we can represent each value
as text, using the `String` type, and thus each record can be a list of values,
with type `List[String]`. Therefore, in order to encode case classes to CSV, we
need to extract each field of the case class, turn it into a `String`,
and then collect all the fields in a list. In this setting, `Employee` and
`IceCream` can be treated in the same way, because they can simply be seen
as a `(String, Int, Boolean)` which needs to be transformed into a
`List[String]`. We will first see how to handle this simple scenario before
briefly looking at how to obtain a tuple from a case class.

Assuming that we know how to transform each element of a tuple into a
`String`, can we transform any tuple into a `List[String]`?

The answer is yes, and this is possible because Scala 3 introduces the types
`*:`, `EmptyTuple` and `NonEmptyTuple`, as well as the methods `head` and
`tail`, which allow us to define recursive operations on tuples.
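
To make this concrete, here is a small sketch. The value names are only illustrative. A tuple literal such as `("Bob", 42, false)` has the same type as its `*:` spelling ending in `EmptyTuple`, and `head` and `tail` keep track of the element types:

```scala
// A tuple literal and its `*:` spelling denote the same type.
val bob: String *: Int *: Boolean *: EmptyTuple = ("Bob", 42, false)

// `head` is statically known to be a String...
val name: String = bob.head

// ...and `tail` is statically known to be the remaining (Int, Boolean).
val rest: (Int, Boolean) = bob.tail

// Taking the tail three times reaches the empty tuple.
val empty: EmptyTuple = bob.tail.tail.tail
```

This is the shape the encoder below relies on: handle `EmptyTuple`, and handle a `head` followed by a `tail` that is itself a tuple.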

## Set up

Let's define the `RowEncoder[A]` type-class, which describes the capability of
values of type `A` to be converted into a `Row`. To encode a type to a `Row`, we
first need to convert each field of the type into a `String`: this capability
is defined by the `FieldEncoder` type-class.

```scala
trait FieldEncoder[A]:
  def encodeField(a: A): String

type Row = List[String]

trait RowEncoder[A]:
  def encodeRow(a: A): Row
```

We can then add some instances for our base types:

```scala
object BaseEncoders:
  given FieldEncoder[Int] with
    def encodeField(x: Int) = x.toString

  given FieldEncoder[Boolean] with
    def encodeField(x: Boolean) = if x then "true" else "false"

  given FieldEncoder[String] with
    def encodeField(x: String) = x // Ideally, we should also escape commas and double quotes
end BaseEncoders
```
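
The comment in the `String` instance points at a real gap. The sketch below shows what escaping could look like. The `EscapingEncoders` object is hypothetical and is not part of the post's code. It follows the quoting convention of RFC 4180: a field containing a comma, a double quote or a line break is wrapped in double quotes, and every embedded double quote is doubled. It assumes a comma delimiter. With a semicolon delimiter you would test for `;` instead.

```scala
trait FieldEncoder[A]:
  def encodeField(a: A): String

// Hypothetical replacement for the String instance in BaseEncoders,
// assuming RFC 4180-style quoting and a comma delimiter.
object EscapingEncoders:
  private def needsQuoting(s: String): Boolean =
    s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r')

  given FieldEncoder[String] with
    def encodeField(x: String): String =
      if needsQuoting(x) then
        // Double every embedded quote, then wrap the whole field in quotes.
        "\"" + x.replace("\"", "\"\"") + "\""
      else x
end EscapingEncoders
```

Importing `EscapingEncoders.given` instead of the `String` instance from `BaseEncoders` would leave the rest of the encoder unchanged. Only the leaves of the recursion change.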

## Recursion!

Now that all these tools are in place, let's focus on the hard part:
implementing the transformation of a tuple with an arbitrary number of elements
into a `Row`. Just as you may be used to recursing on lists, on
tuples we need to manage two scenarios: the base case (`EmptyTuple`) and the
inductive case (`NonEmptyTuple`).

In the following snippet, I prefer to use the [context bound
syntax](https://dotty.epfl.ch/docs/reference/contextual/context-bounds.html)
even though I need a handle for the instances, because it concentrates all the
constraints in the type parameter list (and I do not need to come up with any
name). After this personal preference disclaimer, let's see the two cases:

```scala
object TupleEncoders:
  // Base case: the empty tuple is encoded as an empty row
  given RowEncoder[EmptyTuple] with
    def encodeRow(empty: EmptyTuple): Row =
      List.empty

  // Inductive case: encode the head with its FieldEncoder,
  // then the tail (itself a tuple) with its RowEncoder
  given [H: FieldEncoder, T <: Tuple: RowEncoder]: RowEncoder[H *: T] with
    def encodeRow(tuple: H *: T): Row =
      summon[FieldEncoder[H]].encodeField(tuple.head) :: summon[RowEncoder[T]].encodeRow(tuple.tail)
end TupleEncoders
```

If the tuple is empty, we produce an empty list. To encode a non-empty tuple we
invoke the encoder for the first element and we prepend the result to the `Row`
created by the encoder of the tail of the tuple.
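
To see the recursion at work, it can help to write out by hand the instance that given resolution assembles. The sketch below restates the definitions so that it compiles on its own. It gives every instance a name, and it writes the inductive case with explicit `using` parameters. The names `stringEncoder`, `emptyEncoder`, `consEncoder` and the others are illustrative and do not come from the post. The hand-built chain for `(String, Int, Boolean)` nests one `consEncoder` per element around the base case. `summon` builds the same chain on its own.

```scala
trait FieldEncoder[A]:
  def encodeField(a: A): String

type Row = List[String]

trait RowEncoder[A]:
  def encodeRow(a: A): Row

// Named versions of the post's instances; the names are illustrative only.
given intEncoder: FieldEncoder[Int] with
  def encodeField(x: Int) = x.toString

given boolEncoder: FieldEncoder[Boolean] with
  def encodeField(x: Boolean) = if x then "true" else "false"

given stringEncoder: FieldEncoder[String] with
  def encodeField(x: String) = x

// Base case
given emptyEncoder: RowEncoder[EmptyTuple] with
  def encodeRow(empty: EmptyTuple): Row = List.empty

// Inductive case, with explicit using parameters instead of context bounds
given consEncoder[H, T <: Tuple](using h: FieldEncoder[H], t: RowEncoder[T]): RowEncoder[H *: T] =
  new RowEncoder[H *: T] {
    def encodeRow(tuple: H *: T): Row =
      h.encodeField(tuple.head) :: t.encodeRow(tuple.tail)
  }

// The instance for (String, Int, Boolean), spelled out by hand:
// one consEncoder per element, ending with the base case.
val handWritten: RowEncoder[String *: Int *: Boolean *: EmptyTuple] =
  consEncoder[String, Int *: Boolean *: EmptyTuple](using stringEncoder,
    consEncoder[Int, Boolean *: EmptyTuple](using intEncoder,
      consEncoder[Boolean, EmptyTuple](using boolEncoder, emptyEncoder)))

// The same kind of instance, found by given resolution.
val resolved: RowEncoder[(String, Int, Boolean)] = summon[RowEncoder[(String, Int, Boolean)]]
```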

We can create an entrypoint function and test this implementation:

```scala
import BaseEncoders.given
import TupleEncoders.given

def tupleToCsv[X <: Tuple : RowEncoder](tuple: X): List[String] =
  summon[RowEncoder[X]].encodeRow(tuple)

tupleToCsv(("Bob", 42, false)) // List("Bob", "42", "false")
```

## How to obtain a tuple from a case class?

Scala 3 introduces the
[`Mirror`](https://dotty.epfl.ch/docs/reference/contextual/derivation.html)
type-class, which provides type-level information about the components and
labels of types. [A paragraph from that
documentation](https://dotty.epfl.ch/docs/reference/contextual/derivation.html#types-supporting-derives-clauses)
is particularly interesting for our use case:

> The compiler automatically generates instances of `Mirror` for `enum`s and
> their cases, **case classes** and case objects, sealed classes or traits
> having only case classes and case objects as children.

That's why we can obtain a tuple from a case class using:

```scala
val bob: Employee = Employee("Bob", 42, false)
val bobTuple: (String, Int, Boolean) = Tuple.fromProductTyped(bob)
```

This is also why we can revert the operation:

```scala
import scala.deriving.Mirror

val bobAgain: Employee = summon[Mirror.ProductOf[Employee]].fromProduct(bobTuple)
```
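
The two halves can be put together. The sketch below shows one possible way to encode any case class whose fields all have a `FieldEncoder`: convert the value to its tuple with `Tuple.fromProductTyped`, then reuse the tuple `RowEncoder`. This is a sketch and is not defined in the post. The helper name `caseClassToCsv` is hypothetical. The encoders are repeated as top-level givens so that the block compiles on its own. The second `using` clause depends on the first: it asks for a `RowEncoder` of exactly the tuple type that the mirror reports for `A`.

```scala
import scala.deriving.Mirror

trait FieldEncoder[A]:
  def encodeField(a: A): String

type Row = List[String]

trait RowEncoder[A]:
  def encodeRow(a: A): Row

given FieldEncoder[Int] with
  def encodeField(x: Int) = x.toString

given FieldEncoder[Boolean] with
  def encodeField(x: Boolean) = if x then "true" else "false"

given FieldEncoder[String] with
  def encodeField(x: String) = x

given RowEncoder[EmptyTuple] with
  def encodeRow(empty: EmptyTuple): Row = List.empty

given [H: FieldEncoder, T <: Tuple: RowEncoder]: RowEncoder[H *: T] with
  def encodeRow(tuple: H *: T): Row =
    summon[FieldEncoder[H]].encodeField(tuple.head) :: summon[RowEncoder[T]].encodeRow(tuple.tail)

case class Employee(name: String, number: Int, manager: Boolean)
case class IceCream(name: String, numCherries: Int, inCone: Boolean)

// Hypothetical helper: go through the tuple representation given by the Mirror,
// then delegate to the RowEncoder of that tuple type.
def caseClassToCsv[A <: Product](a: A)(using m: Mirror.ProductOf[A])(using enc: RowEncoder[m.MirroredElemTypes]): Row =
  enc.encodeRow(Tuple.fromProductTyped(a)(using m))
```

Both `Employee` and `IceCream` go through the same `(String, Int, Boolean)` instance, which is the "implement it only once" goal from the introduction.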

# New tuple operations

In the previous example, we saw that we can use `.head` and `.tail` on tuples,
but Scala 3 introduces many other operations. Here is a quick overview:

| Operation  | Example                                                     | Result                                               |
|------------|-------------------------------------------------------------|------------------------------------------------------|
| `size`     | `(1, 2, 3).size`                                            | `3`                                                  |
| `head`     | `(3 *: 4 *: 5 *: EmptyTuple).head`                          | `3`                                                  |
| `tail`     | `(3 *: 4 *: 5 *: EmptyTuple).tail`                          | `(4, 5)`                                             |
| `*:`       | `3 *: 4 *: 5 *: 6 *: EmptyTuple`                            | `(3, 4, 5, 6)`                                       |
| `++`       | `(1, 2, 3) ++ (4, 5, 6)`                                    | `(1, 2, 3, 4, 5, 6)`                                 |
| `drop`     | `(1, 2, 3).drop(2)`                                         | `Tuple1(3)`                                          |
| `take`     | `(1, 2, 3).take(2)`                                         | `(1, 2)`                                             |
| `apply`    | `(1, 2, 3)(2)`                                              | `3`                                                  |
| `splitAt`  | `(1, 2, 3, 4, 5).splitAt(2)`                                | `((1, 2), (3, 4, 5))`                                |
| `zip`      | `(1, 2, 3).zip(('a', 'b'))`                                 | `((1, 'a'), (2, 'b'))`                               |
| `toList`   | `(1, 'a', 2).toList`                                        | `List(1, 'a', 2) : List[Int \| Char]`                |
| `toArray`  | `(1, 'a', 2).toArray`                                       | `Array(1, 'a', 2) : Array[AnyRef]`                   |
| `toIArray` | `(1, 'a', 2).toIArray`                                      | `IArray(1, 'a', 2) : IArray[AnyRef]`                 |
| `map`      | `(1, 'a').map[[X] =>> Option[X]]([T] => (t: T) => Some(t))` | `(Some(1), Some('a')) : (Option[Int], Option[Char])` |
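
Most rows of the table can be checked mechanically. In this sketch, each result is ascribed the type the table implies, so the file only compiles if the compiler computes compatible types for each operation. The value names are arbitrary.

```scala
// Each ascription must agree with the type the compiler computes for the operation.
val consed: (Int, Int, Int, Int) = 3 *: 4 *: 5 *: 6 *: EmptyTuple
val concatenated: (Int, Int, Int, Int, Int, Int) = (1, 2, 3) ++ (4, 5, 6)
val dropped: Int *: EmptyTuple = (1, 2, 3).drop(2)
val taken: (Int, Int) = (1, 2, 3).take(2)
val third: Int = (1, 2, 3)(2)
val split: ((Int, Int), (Int, Int, Int)) = (1, 2, 3, 4, 5).splitAt(2)
val zipped: ((Int, Char), (Int, Char)) = (1, 2, 3).zip(('a', 'b'))
val asList: List[Int | Char] = (1, 'a', 2).toList
```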

# Under the hood: Scala 3 introduces match types

All the operations in the above table use very precise types. For example, the
compiler ensures that `3 *: (4, 5, 6)` is a `(Int, Int, Int, Int)` or that the
index provided to `apply` is strictly less than the size of the tuple.

How is this possible?

The core new feature that allows such a flexible implementation of tuples is
**match types**. I invite you to read more about them
[here](http://dotty.epfl.ch/docs/reference/new-types/match-types.html).
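
As a first taste, here is the introductory example from that documentation page. A match type computes a type by pattern matching on another type; here it computes the element type of a few container types. The `Elem` name comes from the documentation. The values only serve to show the reductions.

```scala
// A match type: the resulting type depends on the shape of X.
type Elem[X] = X match
  case String      => Char
  case Array[t]    => t
  case Iterable[t] => t

// Each ascription type-checks only because Elem reduces as commented.
val c: Elem[String] = 'a'        // Elem[String] reduces to Char
val d: Elem[Array[Double]] = 1.5 // Elem[Array[Double]] reduces to Double
val n: Elem[List[Int]] = 42      // Elem[List[Int]] reduces to Int
```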

Let's see how we can implement the `++` operator using this powerful construct.
We will call our naive version `concat`.

## Defining tuples

First let's define our own tuple:

```scala
enum Tup:
  case EmpT
  case TCons[H, T <: Tup](head: H, tail: T)
```

That is, a tuple is either empty, or an element `head` followed by another
tuple (`tail`). Using this recursive definition we can create a tuple in the
following way:

```scala
import Tup._

val myTup = TCons(1, TCons(2, EmpT))
```

It is not very pretty, but it can easily be adapted to provide the same ease of
use as the previous examples. To do so we can use another Scala 3 feature:
[extension
methods](http://dotty.epfl.ch/docs/reference/contextual/extension-methods.html).

```scala
import Tup._

extension [A, T <: Tup] (a: A) def *: (t: T): TCons[A, T] =
  TCons(a, t)
```

We can then write:

```scala
1 *: "2" *: EmpT
```

## Concatenating tuples

Now let's focus on `concat`, which could look like this:

```scala
import Tup._

def concat[L <: Tup, R <: Tup](left: L, right: R): Tup =
  left match
    case EmpT => right
    case TCons(head, tail) => TCons(head, concat(tail, right))
```

Let's analyze the algorithm line by line: `L` and `R` are the types of the left
and right tuples. We require them to be subtypes of `Tup` because we want to
concatenate tuples. Then we proceed recursively by case: if the left tuple is
empty, the result of the concatenation is just the right tuple. Otherwise the
result is the current head followed by the result of concatenating the tail
with the other tuple.

If we test the function, it seems to work:

```scala
val left = 1 *: 2 *: EmpT
val right = 3 *: 4 *: EmpT

concat(left, right) // TCons(1,TCons(2,TCons(3,TCons(4,EmpT))))
```

So everything seems good. However, we would like the compiler to verify that
the function behaves as expected, and with this signature it cannot: for
instance, the following incorrect implementation type-checks too:

```scala
def concat[L <: Tup, R <: Tup](left: L, right: R): Tup = left
```

More problematic is the fact that this signature prevents us from using a more
specific type for our variables or methods:

```scala
// This does not compile
val res: TCons[Int, TCons[Int, TCons[Int, TCons[Int, EmpT.type]]]] = concat(left, right)
```

Because the return type is just `Tup`, nothing else is checked: the function
can return an arbitrary tuple, and the compiler cannot check that the returned
value is the concatenation of the two tuples. In other words, we need a type
indicating that the result of this function contains all the element types of
`left` followed by all the element types of `right`.

Can we make the compiler verify that we are indeed returning a tuple
consisting of the correct elements?

In Scala 3 it is now possible, without requiring external libraries!

## A new type for the result of `concat`

We know that we need to focus on the return type. We can define it exactly as
we have just described it. Let's call this type `Concat` to mirror the name of
the function.

```scala
type Concat[L <: Tup, R <: Tup] <: Tup = L match
  case EmpT.type => R
  case TCons[headType, tailType] => TCons[headType, Concat[tailType, R]]
```
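
For comparison, the standard library's `++` on built-in tuples is typed with its own match type, `Tuple.Concat`. That type is defined over `EmptyTuple` and `*:` with the same recursive structure. The check below is a sketch that uses only standard-library names. It asks the compiler for evidence that the type reduces to the expected tuple. If the reduction failed, the `summon` would not compile.

```scala
// Compile-time evidence that Tuple.Concat reduces as expected.
val evidence = summon[Tuple.Concat[(Int, String), (Boolean, Char)] =:= (Int, String, Boolean, Char)]

// ++ is typed with that match type, so this ascription type-checks.
val joined: (Int, String, Boolean, Char) = (1, "a") ++ (true, 'c')
```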

You can see that the implementation closely follows the one above for the
method. The syntax can be read in the following way: the `Concat` type is a
subtype of `Tup` and is obtained by combining types `L` and `R`, which are both
subtypes of `Tup`. To use it we need to massage the method implementation a bit
and change its return type:

```scala
def concat[L <: Tup, R <: Tup](left: L, right: R): Concat[L, R] =
  left match
    case _: EmpT.type => right
    case cons: TCons[_, _] => TCons(cons.head, concat(cons.tail, right))
```

Here we use a combination of match types and a form of dependent types called
*dependent match types* (docs
[here](http://dotty.epfl.ch/docs/reference/new-types/match-types.html) and
[here](http://dotty.epfl.ch/docs/reference/new-types/dependent-function-types.html)).
There are some quirks to it, as you might have noticed. Lower-case names in
type patterns, such as `headType` and `tailType`, denote type variables. In the
method, we cannot use an extractor pattern such as `TCons(head, tail)` on the
object, so we match on its type instead. I think, however, that this
implementation is extremely concise and readable.

Now the compiler will prevent us from making the above mistake:

```scala
def wrong[L <: Tup, R <: Tup](left: L, right: R): Concat[L, R] = left
// This does not compile!
```

We can use an extension method to allow users to write `left ++ right`
instead of `concat(left, right)`, similarly to how we implemented `*:`.
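
A minimal sketch of such an extension follows. It is self-contained, so it repeats the enum, the `*:` extension and the first, untyped `concat`, which returns `Tup`. The `++` operator here is our own hypothetical operator on `Tup` and is unrelated to the standard library's. Pairing it with the typed `concat` would mean declaring `Concat[L, R]` as its result type instead of `Tup`.

```scala
enum Tup:
  case EmpT
  case TCons[H, T <: Tup](head: H, tail: T)

import Tup._

// Prepend an element: 1 *: EmpT builds TCons(1, EmpT).
extension [A, T <: Tup](a: A) def *: (t: T): TCons[A, T] =
  TCons(a, t)

// The untyped version of concat: correct at runtime, but typed only as Tup.
def concat[L <: Tup, R <: Tup](left: L, right: R): Tup =
  left match
    case EmpT => right
    case TCons(head, tail) => TCons(head, concat(tail, right))

// Hypothetical operator syntax: left ++ right delegates to concat(left, right).
extension [L <: Tup](left: L) def ++ [R <: Tup](right: R): Tup =
  concat(left, right)

val joined = (1 *: 2 *: EmpT) ++ (3 *: 4 *: EmpT)
```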

We can use the same approach for other functions on tuples. I invite you to
have a look at the [source code of the standard
library](https://github.com/lampepfl/dotty/blob/87102a0b182849c71f61a6febe631f767bcc72c3/library/src-bootstrapped/scala/Tuple.scala)
to see how the other operators are implemented.
