diff --git a/_config.yml b/_config.yml index d8037e11d1..32288ccccc 100644 --- a/_config.yml +++ b/_config.yml @@ -16,6 +16,7 @@ keywords: - Guide scala-version: 2.12.8 +scala-213-version: 2.13.0-M5 collections: style: diff --git a/_data/overviews.yml b/_data/overviews.yml index b39a8702fc..8318194013 100644 --- a/_data/overviews.yml +++ b/_data/overviews.yml @@ -49,6 +49,39 @@ url: "core/architecture-of-scala-collections.html" by: Martin Odersky and Lex Spoon description: "These pages describe the architecture of the Scala collections framework in detail. Compared to the Collections API you will find out more about the internal workings of the framework. You will also learn how this architecture helps you define your own collections in a few lines of code, while reusing the overwhelming part of collection functionality from the framework." + - title: Scala 2.13’s Collections + by: Martin Odersky and Julien Richard-Foy + icon: sitemap + url: "collections-2.13/introduction.html" + description: "Scala's Collection Library." 
+ subdocs: + - title: Introduction + url: "collections-2.13/introduction.html" + - title: Mutable and Immutable Collections + url: "collections-2.13/overview.html" + - title: Trait Iterable + url: "collections-2.13/trait-iterable.html" + - title: The sequence traits Seq, IndexedSeq, and LinearSeq + - title: Concrete Immutable Collection Classes + url: "collections-2.13/concrete-immutable-collection-classes.html" + - title: Concrete Mutable Collection Classes + url: "collections-2.13/concrete-mutable-collection-classes.html" + - title: Arrays + url: "collections-2.13/arrays.html" + - title: Strings + url: "collections-2.13/strings.html" + - title: Performance Characteristics + url: "collections-2.13/performance-characteristics.html" + - title: Equality + url: "collections-2.13/equality.html" + - title: Views + url: "collections-2.13/views.html" + - title: Iterators + url: "collections-2.13/iterators.html" + - title: Creating Collections From Scratch + url: "collections-2.13/creating-collections-from-scratch.html" + - title: Conversions Between Java and Scala Collections + url: "collections-2.13/conversions-between-java-and-scala-collections.html" - title: The Architecture of Scala 2.13’s Collections icon: sitemap url: "core/architecture-of-scala-213-collections.html" diff --git a/_overviews/collections-2.13/arrays.md b/_overviews/collections-2.13/arrays.md new file mode 100644 index 0000000000..461d1edc1d --- /dev/null +++ b/_overviews/collections-2.13/arrays.md @@ -0,0 +1,120 @@ +--- +layout: multipage-overview +title: Arrays + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 10 + +permalink: /overviews/collections-2.13/:title.html +--- + +[Array](http://www.scala-lang.org/api/{{ site.scala-version }}/scala/Array.html) is a special kind of collection in Scala. On the one hand, Scala arrays correspond one-to-one to Java arrays. 
That is, a Scala array `Array[Int]` is represented as a Java `int[]`, an `Array[Double]` is represented as a Java `double[]` and an `Array[String]` is represented as a Java `String[]`. But at the same time, Scala arrays offer much more than their Java analogues. First, Scala arrays can be _generic_. That is, you can have an `Array[T]`, where `T` is a type parameter or abstract type. Second, Scala arrays are compatible with Scala sequences - you can pass an `Array[T]` where a `Seq[T]` is required. Finally, Scala arrays also support all sequence operations. Here's an example of this in action: + + scala> val a1 = Array(1, 2, 3) + a1: Array[Int] = Array(1, 2, 3) + scala> val a2 = a1 map (_ * 3) + a2: Array[Int] = Array(3, 6, 9) + scala> val a3 = a2 filter (_ % 2 != 0) + a3: Array[Int] = Array(3, 9) + scala> a3.reverse + res0: Array[Int] = Array(9, 3) + +Given that Scala arrays are represented just like Java arrays, how can these additional features be supported in Scala? The Scala array implementation makes systematic use of implicit conversions. In Scala, an array does not pretend to _be_ a sequence. It can't really be that because the data type representation of a native array is not a subtype of `Seq`. Instead there is an implicit "wrapping" conversion between arrays and instances of class `scala.collection.mutable.ArraySeq`, which is a subclass of `Seq`. Here you see it in action: + + scala> val seq: collection.Seq[Int] = a1 + seq: scala.collection.Seq[Int] = ArraySeq(1, 2, 3) + scala> val a4: Array[Int] = seq.toArray + a4: Array[Int] = Array(1, 2, 3) + scala> a1 eq a4 + res1: Boolean = false + +The interaction above demonstrates that arrays are compatible with sequences, because there's an implicit conversion from arrays to `ArraySeq`s. To go the other way, from an `ArraySeq` to an `Array`, you can use the `toArray` method defined in `Iterable`. The last REPL line above shows that wrapping and then unwrapping with `toArray` produces a copy of the original array.
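The wrapping conversion can also be exercised outside the REPL. The following is a minimal sketch, assuming Scala 2.13; the object name `ArrayWrapDemo` and its field names are hypothetical, chosen only for illustration. It suggests that the `ArraySeq` wrapper shares storage with the array it wraps, whereas `toArray` produces an independent copy:

```scala
object ArrayWrapDemo {
  val a1: Array[Int] = Array(1, 2, 3)

  // The implicit wrapping conversion yields an ArraySeq backed by a1.
  val seq: collection.Seq[Int] = a1

  // toArray builds a fresh array, so a4 is a distinct object from a1.
  val a4: Array[Int] = seq.toArray

  // Write through the original array after wrapping and copying.
  a1(0) = 42

  // The wrapper sees the write because it shares a1's storage.
  val seenThroughWrapper: Int = seq(0)

  // The copy made by toArray is unaffected.
  val seenInCopy: Int = a4(0)
}
```

Under these assumptions, `ArrayWrapDemo.seenThroughWrapper` reads the updated element while `ArrayWrapDemo.seenInCopy` still reads the original one.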
+ +There is yet another implicit conversion that gets applied to arrays. This conversion simply "adds" all sequence methods to arrays but does not turn the array itself into a sequence. "Adding" means that the array is wrapped in another object of type `ArrayOps` which supports all sequence methods. Typically, this `ArrayOps` object is short-lived; it will usually be inaccessible after the call to the sequence method and its storage can be recycled. Modern VMs often avoid creating this object entirely. + +The difference between the two implicit conversions on arrays is shown in the next REPL dialogue: + + scala> val seq: collection.Seq[Int] = a1 + seq: scala.collection.Seq[Int] = ArraySeq(1, 2, 3) + scala> seq.reverse + res2: scala.collection.Seq[Int] = ArraySeq(3, 2, 1) + scala> val ops: collection.ArrayOps[Int] = a1 + ops: scala.collection.ArrayOps[Int] = scala.collection.ArrayOps@2d7df55 + scala> ops.reverse + res3: Array[Int] = Array(3, 2, 1) + +You see that calling `reverse` on `seq`, which is an `ArraySeq`, again gives an `ArraySeq`. That's logical, because `ArraySeq`s are `Seq`s, and calling `reverse` on any `Seq` again gives a `Seq`. On the other hand, calling `reverse` on the `ops` value of class `ArrayOps` gives an `Array`, not a `Seq`. + +The `ArrayOps` example above was quite artificial, intended only to show the difference from `ArraySeq`. Normally, you'd never define a value of class `ArrayOps`. You'd just call a `Seq` method on an array: + + scala> a1.reverse + res4: Array[Int] = Array(3, 2, 1) + +The `ArrayOps` object gets inserted automatically by the implicit conversion. So the line above is equivalent to + + scala> intArrayOps(a1).reverse + res5: Array[Int] = Array(3, 2, 1) + +where `intArrayOps` is the implicit conversion that was inserted previously. This raises the question of how the compiler picked `intArrayOps` over the other implicit conversion to `ArraySeq` in the line above.
After all, both conversions map an array to a type that supports a reverse method, which is what the input specified. The answer to that question is that the two implicit conversions are prioritized. The `ArrayOps` conversion has a higher priority than the `ArraySeq` conversion. The first is defined in the `Predef` object whereas the second is defined in a class `scala.LowPriorityImplicits`, which is inherited by `Predef`. Implicits in subclasses and subobjects take precedence over implicits in base classes. So if both conversions are applicable, the one in `Predef` is chosen. A very similar scheme works for strings. + +So now you know how arrays can be compatible with sequences and how they can support all sequence operations. What about genericity? In Java you cannot write a `T[]` where `T` is a type parameter. How then is Scala's `Array[T]` represented? In fact a generic array like `Array[T]` could be at run-time any of Java's eight primitive array types `byte[]`, `short[]`, `char[]`, `int[]`, `long[]`, `float[]`, `double[]`, `boolean[]`, or it could be an array of objects. The only common run-time type encompassing all of these types is `AnyRef` (or, equivalently `java.lang.Object`), so that's the type to which the Scala compiler maps `Array[T]`. At run-time, when an element of an array of type `Array[T]` is accessed or updated there is a sequence of type tests that determine the actual array type, followed by the correct array operation on the Java array. These type tests slow down array operations somewhat. You can expect accesses to generic arrays to be three to four times slower than accesses to primitive or object arrays. This means that if you need maximal performance, you should prefer concrete over generic arrays. Representing the generic array type is not enough, however, there must also be a way to create generic arrays. This is an even harder problem, which requires a little bit of help from you. 
To illustrate the problem, consider the following attempt to write a generic method that creates an array. + + // this is wrong! + def evenElems[T](xs: Vector[T]): Array[T] = { + val arr = new Array[T]((xs.length + 1) / 2) + for (i <- 0 until xs.length by 2) + arr(i / 2) = xs(i) + arr + } + +The `evenElems` method returns a new array that consists of all elements of the argument vector `xs` which are at even positions in the vector. The first line of the body of `evenElems` creates the result array, which has the same element type as the argument. So depending on the actual type parameter for `T`, this could be an `Array[Int]`, or an `Array[Boolean]`, or an array of some of the other primitive types in Java, or an array of some reference type. But these types all have different runtime representations, so how is the Scala runtime going to pick the correct one? In fact, it can't do that based on the information it is given, because the actual type that corresponds to the type parameter `T` is erased at runtime. That's why you will get the following error message if you compile the code above: + + error: No ClassTag available for T + val arr = new Array[T]((xs.length + 1) / 2) + ^ + +What's required here is that you help the compiler out by providing a runtime hint about what the actual type parameter of `evenElems` is. This runtime hint takes the form of a class manifest of type `scala.reflect.ClassTag`. A class manifest is a type descriptor object which describes what the top-level class of a type is. Alternatively to class manifests there are also full manifests of type `scala.reflect.Manifest`, which describe all aspects of a type. But for array creation, only class manifests are needed. + +The Scala compiler will construct class manifests automatically if you instruct it to do so. "Instructing" means that you demand a class manifest as an implicit parameter, like this: + + def evenElems[T](xs: Vector[T])(implicit m: ClassTag[T]): Array[T] = ...
+ +Using an alternative and shorter syntax, you can also demand that the type comes with a class manifest by using a context bound. This means following the type with a colon and the class name `ClassTag`, like this: + + import scala.reflect.ClassTag + // this works + def evenElems[T: ClassTag](xs: Vector[T]): Array[T] = { + val arr = new Array[T]((xs.length + 1) / 2) + for (i <- 0 until xs.length by 2) + arr(i / 2) = xs(i) + arr + } + +The two revised versions of `evenElems` mean exactly the same thing. What happens in either case is that when the `Array[T]` is constructed, the compiler will look for a class manifest for the type parameter `T`, that is, it will look for an implicit value of type `ClassTag[T]`. If such a value is found, the manifest is used to construct the right kind of array. Otherwise, you'll see an error message like the one above. + +Here is some REPL interaction that uses the `evenElems` method. + + scala> evenElems(Vector(1, 2, 3, 4, 5)) + res6: Array[Int] = Array(1, 3, 5) + scala> evenElems(Vector("this", "is", "a", "test", "run")) + res7: Array[java.lang.String] = Array(this, a, run) + +In both cases, the Scala compiler automatically constructed a class manifest for the element type (first, `Int`, then `String`) and passed it to the implicit parameter of the `evenElems` method. The compiler can do that for all concrete types, but not if the argument is itself another type parameter without its class manifest. For instance, the following fails: + + scala> def wrap[U](xs: Vector[U]) = evenElems(xs) + <console>:6: error: No ClassTag available for U. + def wrap[U](xs: Vector[U]) = evenElems(xs) + ^ + +What happened here is that `evenElems` demands a class manifest for the type parameter `U`, but none was found. The solution in this case is, of course, to demand another implicit class manifest for `U`.
So the following works: + + scala> def wrap[U: ClassTag](xs: Vector[U]) = evenElems(xs) + wrap: [U](xs: Vector[U])(implicit evidence$1: scala.reflect.ClassTag[U])Array[U] + +This example also shows that the context bound in the definition of `U` is just a shorthand for an implicit parameter named here `evidence$1` of type `ClassTag[U]`. + +In summary, generic array creation demands class manifests. So whenever creating an array of a type parameter `T`, you also need to provide an implicit class manifest for `T`. The easiest way to do this is to declare the type parameter with a `ClassTag` context bound, as in `[T: ClassTag]`. diff --git a/_overviews/collections-2.13/concrete-immutable-collection-classes.md b/_overviews/collections-2.13/concrete-immutable-collection-classes.md new file mode 100644 index 0000000000..e3c8f68556 --- /dev/null +++ b/_overviews/collections-2.13/concrete-immutable-collection-classes.md @@ -0,0 +1,226 @@ +--- +layout: multipage-overview +title: Concrete Immutable Collection Classes + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 8 + +permalink: /overviews/collections-2.13/:title.html +--- + +Scala provides many concrete immutable collection classes for you to choose from. They differ in the traits they implement (maps, sets, sequences), whether they can be infinite, and the speed of various operations. Here are some of the most common immutable collection types used in Scala. + +## Lists + +A [List](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/List.html) is a finite immutable sequence. They provide constant-time access to their first element as well as the rest of the list, and they have a constant-time cons operation for adding a new element to the front of the list. Many other operations take linear time. 
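The cost model described for lists can be sketched in a few lines. This is a minimal illustration, assuming Scala 2.13; the object name `ListBasics` and its field names are hypothetical. It shows that prepending with `::` reuses the existing list as the tail instead of copying it:

```scala
object ListBasics {
  val rest: List[Int] = List(2, 3)

  // `::` (cons) allocates one new cell in front of `rest`; nothing is copied.
  val xs: List[Int] = 1 :: rest

  // Reading the first element and the rest of the list takes constant time.
  val first: Int = xs.head
  val remaining: List[Int] = xs.tail

  // The tail is the very same object as `rest`, not a copy.
  val sharesStructure: Boolean = remaining eq rest

  // Indexed access walks the list from the head, so it takes linear time.
  val third: Int = xs(2)
}
```

Because the tail is shared rather than copied, prepending stays cheap no matter how long the list already is.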
+ +## LazyLists + +A [LazyList](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/LazyList.html) is like a list except that its elements are computed lazily. Because of this, a lazy list can be infinitely long. Only those elements requested are computed. Otherwise, lazy lists have the same performance characteristics as lists. + +Whereas lists are constructed with the `::` operator, lazy lists are constructed with the similar-looking `#::`. Here is a simple example of a lazy list containing the integers 1, 2, and 3: + + scala> val lazyList = 1 #:: 2 #:: 3 #:: LazyList.empty + lazyList: scala.collection.immutable.LazyList[Int] = LazyList(?) + +The head of this lazy list is 1, and the tail of it has 2 and 3. None of the elements are printed here, though, because the list +hasn’t been computed yet! Lazy lists are specified to compute lazily, and the `toString` method of a lazy list is careful not to force any extra evaluation. + +Below is a more complex example. It computes a lazy list that contains a Fibonacci sequence starting with the given two numbers. A Fibonacci sequence is one where each element is the sum of the previous two elements in the series. + + + scala> def fibFrom(a: Int, b: Int): LazyList[Int] = a #:: fibFrom(b, a + b) + fibFrom: (a: Int,b: Int)LazyList[Int] + +This function is deceptively simple. The first element of the sequence is clearly `a`, and the rest of the sequence is the Fibonacci sequence starting with `b` followed by `a + b`. The tricky part is computing this sequence without causing an infinite recursion. If the function used `::` instead of `#::`, then every call to the function would result in another call, thus causing an infinite recursion. Since it uses `#::`, though, the right-hand side is not evaluated until it is requested. 
+Here are the first few elements of the Fibonacci sequence starting with two ones: + + scala> val fibs = fibFrom(1, 1).take(7) + fibs: scala.collection.immutable.LazyList[Int] = LazyList(?) + scala> fibs.toList + res9: List[Int] = List(1, 1, 2, 3, 5, 8, 13) + +## Immutable ArraySeqs + +Lists are very efficient when the algorithm processing them is careful to only process their heads. Accessing, adding, and removing the head of a list takes only constant time, whereas accessing or modifying elements later in the list takes time linear in the depth into the list. + +[ArraySeq](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/ArraySeq.html) is a +collection type (introduced in Scala 2.13) that addresses the inefficiency for random access on lists. ArraySeqs +allow accessing any element of the collection in constant time. As a result, algorithms using ArraySeqs do not +have to be careful about accessing just the head of the collection. They can access elements at arbitrary locations, +and thus they can be much more convenient to write. + +ArraySeqs are built and updated just like any other sequence. + +~~~ +scala> val arr = scala.collection.immutable.ArraySeq(1, 2, 3) +arr: scala.collection.immutable.ArraySeq[Int] = ArraySeq(1, 2, 3) +scala> val arr2 = arr :+ 4 +arr2: scala.collection.immutable.ArraySeq[Int] = ArraySeq(1, 2, 3, 4) +scala> arr2(0) +res22: Int = 1 +~~~ + +ArraySeqs are immutable, so you cannot change an element in place. However, the `updated`, `appended` and `prepended` +operations create new ArraySeqs that differ from a given ArraySeq only in a single element: + +~~~ +scala> arr.updated(2, 4) +res26: scala.collection.immutable.ArraySeq[Int] = ArraySeq(1, 2, 4) +scala> arr +res27: scala.collection.immutable.ArraySeq[Int] = ArraySeq(1, 2, 3) +~~~ + +As the last line above shows, a call to `updated` has no effect on the original ArraySeq `arr`. + +ArraySeqs store their elements in a private [Array](arrays.html). 
This is a compact representation that supports fast +indexed access, but updating or adding one element is linear since it requires creating another array and copying all +the original array’s elements. + +## Vectors + +We have seen in the previous sections that `List` and `ArraySeq` are efficient data structures in some specific +use cases but they are also inefficient in other use cases: for instance, prepending an element is constant for `List`, +but linear for `ArraySeq`, and, conversely, indexed access is constant for `ArraySeq` but linear for `List`. + +[Vector](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Vector.html) is a collection type that provides good performance for all its operations. Vectors allow accessing any element of the sequence in "effectively" constant time. It's a larger constant than for access to the head of a List or for reading an element of an ArraySeq, but it's a constant nonetheless. As a result, algorithms using vectors do not have to be careful about accessing just the head of the sequence. They can access and modify elements at arbitrary locations, and thus they can be much more convenient to write. + +Vectors are built and modified just like any other sequence. + + scala> val vec = scala.collection.immutable.Vector.empty + vec: scala.collection.immutable.Vector[Nothing] = Vector() + scala> val vec2 = vec :+ 1 :+ 2 + vec2: scala.collection.immutable.Vector[Int] = Vector(1, 2) + scala> val vec3 = 100 +: vec2 + vec3: scala.collection.immutable.Vector[Int] = Vector(100, 1, 2) + scala> vec3(0) + res1: Int = 100 + +Vectors are represented as trees with a high branching factor (The branching factor of a tree or a graph is the number of children at each node). Every tree node contains up to 32 elements of the vector or contains up to 32 other tree nodes. Vectors with up to 32 elements can be represented in a single node. 
Vectors with up to `32 * 32 = 1024` elements can be represented with a single indirection. Two hops from the root of the tree to the final element node are sufficient for vectors with up to 2<sup>15</sup> elements, three hops for vectors with up to 2<sup>20</sup> elements, four hops for vectors with up to 2<sup>25</sup> elements and five hops for vectors with up to 2<sup>30</sup> elements. So for all vectors of reasonable size, an element selection involves up to 5 primitive array selections. This is what we meant when we wrote that element access is "effectively constant time". + +Like selection, functional vector updates are also "effectively constant time". Updating an element in the middle of a vector can be done by copying the node that contains the element, and every node that points to it, starting from the root of the tree. This means that a functional update creates between one and five nodes that each contain up to 32 elements or subtrees. This is certainly more expensive than an in-place update in a mutable array, but still a lot cheaper than copying the whole vector. + +Because vectors strike a good balance between fast random selections and fast random functional updates, they are currently the default implementation of immutable indexed sequences: + + + scala> collection.immutable.IndexedSeq(1, 2, 3) + res2: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3) + +## Immutable Queues + +A [Queue](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Queue.html) is a first-in-first-out sequence. You enqueue an element onto a queue with `enqueue`, and dequeue an element with `dequeue`. These operations are constant time.
+ +Here's how you can create an empty immutable queue: + + scala> val empty = scala.collection.immutable.Queue[Int]() + empty: scala.collection.immutable.Queue[Int] = Queue() + +You can append an element to an immutable queue with `enqueue`: + + scala> val has1 = empty.enqueue(1) + has1: scala.collection.immutable.Queue[Int] = Queue(1) + +To append multiple elements to a queue, call `enqueueAll` with a collection as its argument: + + scala> val has123 = has1.enqueueAll(List(2, 3)) + has123: scala.collection.immutable.Queue[Int] + = Queue(1, 2, 3) + +To remove an element from the head of the queue, you use `dequeue`: + + scala> val (element, has23) = has123.dequeue + element: Int = 1 + has23: scala.collection.immutable.Queue[Int] = Queue(2, 3) + +Note that `dequeue` returns a pair consisting of the element removed and the rest of the queue. + +## Ranges + +A [Range](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Range.html) is an ordered sequence of integers that are equally spaced apart. For example, "1, 2, 3," is a range, as is "5, 8, 11, 14." To create a range in Scala, use the predefined methods `to` and `by`. + + scala> 1 to 3 + res2: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3) + scala> 5 to 14 by 3 + res3: scala.collection.immutable.Range = Range(5, 8, 11, 14) + +If you want to create a range that is exclusive of its upper limit, then use the convenience method `until` instead of `to`: + + scala> 1 until 3 + res2: scala.collection.immutable.Range = Range(1, 2) + +Ranges are represented in constant space, because they can be defined by just three numbers: their start, their end, and the stepping value. Because of this representation, most operations on ranges are extremely fast. + +## Compressed Hash-Array Mapped Prefix-trees + +Hash tries are a standard way to implement immutable sets and maps efficiently. 
[Compressed Hash-Array Mapped Prefix-trees](https://github.com/msteindorfer/oopsla15-artifact/) are a design for hash tries on the JVM which improves locality and makes sure the trees remain in a canonical and compact representation. They are supported by class [immutable.HashMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/HashMap.html). Their representation is similar to vectors in that they are also trees where every node has 32 elements or 32 subtrees. But the selection of these keys is now done based on hash code. For instance, to find a given key in a map, one first takes the hash code of the key. Then, the lowest 5 bits of the hash code are used to select the first subtree, followed by the next 5 bits and so on. The selection stops once all elements stored in a node have hash codes that differ from each other in the bits that are selected up to this level. + +Hash tries strike a nice balance between reasonably fast lookups and reasonably efficient functional insertions (`+`) and deletions (`-`). That's why they underlie Scala's default implementations of immutable maps and sets. In fact, Scala has a further optimization for immutable sets and maps that contain fewer than five elements. Sets and maps with one to four elements are stored as single objects that just contain the elements (or key/value pairs in the case of a map) as fields. The empty immutable set and the empty immutable map are in each case a single object - there's no need to duplicate storage for those because an empty immutable set or map will always stay empty. + +## Red-Black Trees + +Red-black trees are a form of balanced binary tree where some nodes are designated "red" and others designated "black." Like any balanced binary tree, operations on them reliably complete in time logarithmic in the size of the tree. + +Scala provides implementations of immutable sets and maps that use a red-black tree internally.
Access them under the names [TreeSet](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/TreeSet.html) and [TreeMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/TreeMap.html). + + + scala> scala.collection.immutable.TreeSet.empty[Int] + res11: scala.collection.immutable.TreeSet[Int] = TreeSet() + scala> res11 + 1 + 3 + 3 + res12: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 3) + +Red-black trees are the standard implementation of `SortedSet` in Scala, because they provide an efficient iterator that returns all elements in sorted order. + +## Immutable BitSets + +A [BitSet](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/BitSet.html) represents a collection of small integers as the bits of a larger integer. For example, the bit set containing 3, 2, and 0 would be represented as the integer 1101 in binary, which is 13 in decimal. + +Internally, bit sets use an array of 64-bit `Long`s. The first `Long` in the array is for integers 0 through 63, the second is for 64 through 127, and so on. Thus, bit sets are very compact so long as the largest integer in the set is less than a few hundred or so. + +Operations on bit sets are very fast. Testing for inclusion takes constant time. Adding an item to the set takes time proportional to the number of `Long`s in the bit set's array, which is typically a small number. 
Here are some simple examples of the use of a bit set: + + scala> val bits = scala.collection.immutable.BitSet.empty + bits: scala.collection.immutable.BitSet = BitSet() + scala> val moreBits = bits + 3 + 4 + 4 + moreBits: scala.collection.immutable.BitSet = BitSet(3, 4) + scala> moreBits(3) + res26: Boolean = true + scala> moreBits(0) + res27: Boolean = false + +## VectorMaps + +A [VectorMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/VectorMap.html) represents +a map using both a `Vector` of keys and a `HashMap`. It provides an iterator that returns all the entries in their +insertion order. + +~~~ +scala> val vm = scala.collection.immutable.VectorMap.empty[Int, String] +vm: scala.collection.immutable.VectorMap[Int,String] = + VectorMap() +scala> val vm1 = vm + (1 -> "one") +vm1: scala.collection.immutable.VectorMap[Int,String] = + VectorMap(1 -> one) +scala> val vm2 = vm1 + (2 -> "two") +vm2: scala.collection.immutable.VectorMap[Int,String] = + VectorMap(1 -> one, 2 -> two) +scala> vm2 == Map(2 -> "two", 1 -> "one") +res29: Boolean = true +~~~ + +The first lines show that the content of the `VectorMap` keeps the insertion order, and the last line +shows that `VectorMap`s are comparable with other `Map`s and that this comparison does not take the +order of elements into account. + +## ListMaps + +A [ListMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/ListMap.html) represents a map as a linked list of key-value pairs. In general, operations on a list map might have to iterate through the entire list. Thus, operations on a list map take time linear in the size of the map. In fact there is little usage for list maps in Scala because standard immutable maps are almost always faster. The only possible exception to this is if the map is for some reason constructed in such a way that the first elements in the list are selected much more often than the other elements. 
+ + scala> val map = scala.collection.immutable.ListMap(1->"one", 2->"two") + map: scala.collection.immutable.ListMap[Int,java.lang.String] = + Map(1 -> one, 2 -> two) + scala> map(2) + res30: String = "two" diff --git a/_overviews/collections-2.13/concrete-mutable-collection-classes.md b/_overviews/collections-2.13/concrete-mutable-collection-classes.md new file mode 100644 index 0000000000..bcf855ddc6 --- /dev/null +++ b/_overviews/collections-2.13/concrete-mutable-collection-classes.md @@ -0,0 +1,160 @@ +--- +layout: multipage-overview +title: Concrete Mutable Collection Classes + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 9 + +permalink: /overviews/collections-2.13/:title.html +--- + +You've now seen the most commonly used immutable collection classes that Scala provides in its standard library. Take a look now at the mutable collection classes. + +## Array Buffers + +An [ArrayBuffer](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/ArrayBuffer.html) buffer holds an array and a size. Most operations on an array buffer have the same speed as for an array, because the operations simply access and modify the underlying array. Additionally, array buffers can have data efficiently added to the end. Appending an item to an array buffer takes amortized constant time. Thus, array buffers are useful for efficiently building up a large collection whenever the new items are always added to the end. 
+ + scala> val buf = scala.collection.mutable.ArrayBuffer.empty[Int] + buf: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer() + scala> buf += 1 + res32: buf.type = ArrayBuffer(1) + scala> buf += 10 + res33: buf.type = ArrayBuffer(1, 10) + scala> buf.toArray + res34: Array[Int] = Array(1, 10) + +## List Buffers + +A [ListBuffer](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/ListBuffer.html) is like an array buffer except that it uses a linked list internally instead of an array. If you plan to convert the buffer to a list once it is built up, use a list buffer instead of an array buffer. + + scala> val buf = scala.collection.mutable.ListBuffer.empty[Int] + buf: scala.collection.mutable.ListBuffer[Int] = ListBuffer() + scala> buf += 1 + res35: buf.type = ListBuffer(1) + scala> buf += 10 + res36: buf.type = ListBuffer(1, 10) + scala> buf.toList + res37: List[Int] = List(1, 10) + +## StringBuilders + +Just like an array buffer is useful for building arrays, and a list buffer is useful for building lists, a [StringBuilder](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/StringBuilder.html) is useful for building strings. String builders are so commonly used that they are already imported into the default namespace. Create them with a simple `new StringBuilder`, like this: + + scala> val buf = new StringBuilder + buf: StringBuilder = + scala> buf += 'a' + res38: buf.type = a + scala> buf ++= "bcdef" + res39: buf.type = abcdef + scala> buf.toString + res41: String = abcdef + +## ArrayDeque + +An [ArrayDeque](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/ArrayDeque.html) +is a sequence that supports efficient addition of elements in the front and in the end. +It internally uses a resizable array. + +If you need to append and prepend elements to a buffer, use an `ArrayDeque` instead of +an `ArrayBuffer`. 
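The ArrayDeque section has no example of its own, so here is a minimal sketch, assuming Scala 2.13; the object name `ArrayDequeDemo` and its field names are hypothetical. It uses the `prepend`, `append`, `removeHead` and `removeLast` methods of `mutable.ArrayDeque` to add and remove elements at both ends:

```scala
import scala.collection.mutable.ArrayDeque

object ArrayDequeDemo {
  val deque: ArrayDeque[Int] = ArrayDeque(2, 3)

  // Add one element at the front and one at the end, in place.
  deque.prepend(1)
  deque.append(4)

  // Snapshot of the contents after both additions.
  val afterAdds: List[Int] = deque.toList

  // Remove from both ends; each call returns the removed element.
  val front: Int = deque.removeHead()
  val back: Int = deque.removeLast()

  // Snapshot of the contents after both removals.
  val afterRemovals: List[Int] = deque.toList
}
```

An `ArrayBuffer` supports the same `prepend` call, but it is optimized for appending at the end, which is why the text recommends `ArrayDeque` when both ends are modified.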
+ +## Queues + +Scala provides mutable queues in addition to immutable ones. You use a `mutable.Queue` similarly to how you use an immutable one, but instead of `enqueue`, you use the `+=` and `++=` operators to append. Also, on a mutable queue, the `dequeue` method will just remove the head element from the queue and return it. Here's an example: + + scala> val queue = new scala.collection.mutable.Queue[String] + queue: scala.collection.mutable.Queue[String] = Queue() + scala> queue += "a" + res10: queue.type = Queue(a) + scala> queue ++= List("b", "c") + res11: queue.type = Queue(a, b, c) + scala> queue + res12: scala.collection.mutable.Queue[String] = Queue(a, b, c) + scala> queue.dequeue + res13: String = a + scala> queue + res14: scala.collection.mutable.Queue[String] = Queue(b, c) + +## Stacks + +You saw immutable stacks earlier. There is also a mutable version, supported by class [mutable.Stack](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/Stack.html). It works exactly the same as the immutable version except that modifications happen in place. + + scala> val stack = new scala.collection.mutable.Stack[Int] + stack: scala.collection.mutable.Stack[Int] = Stack() + scala> stack.push(1) + res0: stack.type = Stack(1) + scala> stack + res1: scala.collection.mutable.Stack[Int] = Stack(1) + scala> stack.push(2) + res2: stack.type = Stack(1, 2) + scala> stack + res3: scala.collection.mutable.Stack[Int] = Stack(1, 2) + scala> stack.top + res8: Int = 2 + scala> stack + res9: scala.collection.mutable.Stack[Int] = Stack(1, 2) + scala> stack.pop + res10: Int = 2 + scala> stack + res11: scala.collection.mutable.Stack[Int] = Stack(1) + +## Mutable ArraySeqs + +Array sequences are mutable sequences of fixed size which store their elements internally in an `Array[Object]`. They are implemented in Scala by class [ArraySeq](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/ArraySeq.html).
+ +You would typically use an `ArraySeq` if you want an array for its performance characteristics, but you also want to create generic instances of the sequence where you do not know the type of the elements and you do not have a `ClassTag` to provide it at run-time. These issues are explained in the section on [arrays]({{ site.baseurl }}/overviews/collections/arrays.html). + +## Hash Tables + +A hash table stores its elements in an underlying array, placing each item at a position in the array determined by the hash code of that item. Adding an element to a hash table takes only constant time, so long as there isn't already another element in the array that has the same hash code. Hash tables are thus very fast so long as the objects placed in them have a good distribution of hash codes. As a result, the default mutable map and set types in Scala are based on hash tables. You can access them also directly under the names [mutable.HashSet](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/HashSet.html) and [mutable.HashMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/HashMap.html). + +Hash sets and maps are used just like any other set or map. Here are some simple examples: + + scala> val map = scala.collection.mutable.HashMap.empty[Int,String] + map: scala.collection.mutable.HashMap[Int,String] = Map() + scala> map += (1 -> "make a web site") + res42: map.type = Map(1 -> make a web site) + scala> map += (3 -> "profit!") + res43: map.type = Map(1 -> make a web site, 3 -> profit!) + scala> map(1) + res44: String = make a web site + scala> map contains 2 + res46: Boolean = false + +Iteration over a hash table is not guaranteed to occur in any particular order. Iteration simply proceeds through the underlying array in whichever order it happens to be in. To get a guaranteed iteration order, use a _linked_ hash map or set instead of a regular one. 
A linked hash map or set is just like a regular hash map or set except that it also includes a linked list of the elements in the order they were added. Iteration over such a collection is always in the same order that the elements were initially added. + +## Weak Hash Maps + +A weak hash map is a special kind of hash map where the garbage collector does not follow links from the map to the keys stored in it. This means that a key and its associated value will disappear from the map if there is no other reference to that key. Weak hash maps are useful for tasks such as caching, where you want to re-use an expensive function's result if the function is called again on the same key. If keys and function results are stored in a regular hash map, the map could grow without bounds, and no key would ever become garbage. Using a weak hash map avoids this problem. As soon as a key object becomes unreachable, its entry is removed from the weak hash map. Weak hash maps in Scala are implemented by class [WeakHashMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/WeakHashMap.html) as a wrapper of an underlying Java implementation `java.util.WeakHashMap`. + +## Concurrent Maps + +A concurrent map can be accessed by several threads at once. In addition to the usual [Map](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Map.html) operations, it provides the following atomic operations: + +### Operations in Class concurrent.Map + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| `m.putIfAbsent(k, v)` |Adds key/value binding `k -> v` unless `k` is already defined in `m`. | +| `m.remove(k, v)` |Removes entry for `k` if it is currently mapped to `v`. | +| `m.replace(k, old, new)` |Replaces value associated with key `k` with `new`, if it was previously bound to `old`.
| +| `m.replace(k, v)` |Replaces value associated with key `k` with `v`, if it was previously bound to some value.| + +`concurrent.Map` is a trait in the Scala collections library. Currently, it has two implementations. The first one is Java's `java.util.concurrent.ConcurrentMap`, which can be converted automatically into a Scala map using the [standard Java/Scala collection conversions]({{ site.baseurl }}/overviews/collections/conversions-between-java-and-scala-collections.html). The second implementation is [TrieMap](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/concurrent/TrieMap.html), which is a lock-free implementation of a hash array mapped trie. + +## Mutable Bitsets + +A mutable bit set of type [mutable.BitSet](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/BitSet.html) is just like an immutable one, except that it is modified in place. Mutable bit sets are slightly more efficient at updating than immutable ones, because they don't have to copy around `Long`s that haven't changed. + + scala> val bits = scala.collection.mutable.BitSet.empty + bits: scala.collection.mutable.BitSet = BitSet() + scala> bits += 1 + res49: bits.type = BitSet(1) + scala> bits += 3 + res50: bits.type = BitSet(1, 3) + scala> bits + res51: scala.collection.mutable.BitSet = BitSet(1, 3) diff --git a/_overviews/collections-2.13/conversions-between-java-and-scala-collections.md b/_overviews/collections-2.13/conversions-between-java-and-scala-collections.md new file mode 100644 index 0000000000..a1bf75ff12 --- /dev/null +++ b/_overviews/collections-2.13/conversions-between-java-and-scala-collections.md @@ -0,0 +1,64 @@ +--- +layout: multipage-overview +title: Conversions Between Java and Scala Collections + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 17 + +permalink: /overviews/collections-2.13/:title.html +--- + +Like Scala, Java also has a rich collections library.
There are many similarities between the two. For instance, both libraries know iterators, iterables, sets, maps, and sequences. But there are also important differences. In particular, the Scala libraries put much more emphasis on immutable collections, and provide many more operations that transform a collection into a new one. + +Sometimes you might need to pass from one collection framework to the other. For instance, you might want to access an existing Java collection as if it were a Scala collection. Or you might want to pass one of Scala's collections to a Java method that expects its Java counterpart. It is quite easy to do this, because Scala offers implicit conversions between all the major collection types in the [JavaConverters](http://www.scala-lang.org/api/{{ site.scala-version }}/scala/collection/JavaConverters$.html) object. In particular, you will find bidirectional conversions between the following types. + + + Iterator <=> java.util.Iterator + Iterator <=> java.util.Enumeration + Iterable <=> java.lang.Iterable + Iterable <=> java.util.Collection + mutable.Buffer <=> java.util.List + mutable.Set <=> java.util.Set + mutable.Map <=> java.util.Map + mutable.ConcurrentMap <=> java.util.concurrent.ConcurrentMap + +To enable these conversions, simply import them from the [JavaConverters](http://www.scala-lang.org/api/{{ site.scala-version }}/scala/collection/JavaConverters$.html) object: + + scala> import collection.JavaConverters._ + import collection.JavaConverters._ + +This enables conversions between Scala collections and their corresponding Java collections by way of extension methods called `asScala` and `asJava`: + + scala> import collection.mutable._ + import collection.mutable._ + + scala> val jul: java.util.List[Int] = ArrayBuffer(1, 2, 3).asJava + jul: java.util.List[Int] = [1, 2, 3] + + scala> val buf: Seq[Int] = jul.asScala + buf: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3) + + scala> val m: java.util.Map[String, Int] = 
HashMap("abc" -> 1, "hello" -> 2).asJava + m: java.util.Map[String,Int] = {abc=1, hello=2} + +Internally, these conversions work by setting up a "wrapper" object that forwards all operations to the underlying collection object. So collections are never copied when converting between Java and Scala. An interesting property is that if you do a round-trip conversion from, say, a Java type to its corresponding Scala type, and back to the same Java type, you end up with the identical collection object you started with. + +Certain other Scala collections can also be converted to Java, but do not have a conversion back to the original Scala type: + + Seq => java.util.List + mutable.Seq => java.util.List + Set => java.util.Set + Map => java.util.Map + +Because Java does not distinguish between mutable and immutable collections in their type, a conversion from, say, `scala.collection.immutable.List` will yield a `java.util.List`, where all mutation operations throw an `UnsupportedOperationException`. Here's an example: + + scala> val jul = List(1, 2, 3).asJava + jul: java.util.List[Int] = [1, 2, 3] + + scala> jul.add(7) + java.lang.UnsupportedOperationException + at java.util.AbstractList.add(AbstractList.java:148) diff --git a/_overviews/collections-2.13/creating-collections-from-scratch.md b/_overviews/collections-2.13/creating-collections-from-scratch.md new file mode 100644 index 0000000000..c86dd32db2 --- /dev/null +++ b/_overviews/collections-2.13/creating-collections-from-scratch.md @@ -0,0 +1,62 @@ +--- +layout: multipage-overview +title: Creating Collections From Scratch + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 16 + +permalink: /overviews/collections-2.13/:title.html +--- + +You have syntax `List(1, 2, 3)` to create a list of three integers and `Map('A' -> 1, 'C' -> 2)` to create a map with two bindings. This is actually a universal feature of Scala collections.
You can take any collection name and follow it by a list of elements in parentheses. The result will be a new collection with the given elements. Here are some more examples: + + Iterable() // An empty collection + List() // The empty list + List(1.0, 2.0) // A list with elements 1.0, 2.0 + Vector(1.0, 2.0) // A vector with elements 1.0, 2.0 + Iterator(1, 2, 3) // An iterator returning three integers. + Set(dog, cat, bird) // A set of three animals + HashSet(dog, cat, bird) // A hash set of the same animals + Map('a' -> 7, 'b' -> 0) // A map from characters to integers + +"Under the covers" each of the above lines is a call to the `apply` method of some object. For instance, the third line above expands to + + List.apply(1.0, 2.0) + +So this is a call to the `apply` method of the companion object of the `List` class. That method takes an arbitrary number of arguments and constructs a list from them. Every collection class in the Scala library has a companion object with such an `apply` method. It does not matter whether the collection class represents a concrete implementation, like `List`, `LazyList`, or `Vector`, or whether it is an abstract base class such as `Seq`, `Set` or `Iterable`. In the latter case, calling `apply` will produce some default implementation of the abstract base class. Examples: + + scala> List(1, 2, 3) + res17: List[Int] = List(1, 2, 3) + scala> Iterable(1, 2, 3) + res18: Iterable[Int] = List(1, 2, 3) + scala> mutable.Iterable(1, 2, 3) + res19: scala.collection.mutable.Iterable[Int] = ArrayBuffer(1, 2, 3) + +Besides `apply`, every collection companion object also defines a member `empty`, which returns an empty collection. So instead of `List()` you could write `List.empty`, instead of `Map()`, `Map.empty`, and so on. + +The operations provided by collection companion objects are summarized in the following table.
In short, there's + +* `concat`, which concatenates an arbitrary number of collections together, +* `fill` and `tabulate`, which generate single or multi-dimensional collections of given dimensions initialized by some expression or tabulating function, +* `range`, which generates integer collections with some constant step length, and +* `iterate` and `unfold`, which generate the collection resulting from repeated application of a function to a start element or state. + +### Factory Methods for Sequences + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| `C.empty` | The empty collection. | +| `C(x, y, z)` | A collection consisting of elements `x, y, z`. | +| `C.concat(xs, ys, zs)` | The collection obtained by concatenating the elements of `xs, ys, zs`. | +| `C.fill(n){e}` | A collection of length `n` where each element is computed by expression `e`. | +| `C.fill(m, n){e}` | A collection of collections of dimension `m×n` where each element is computed by expression `e`. (exists also in higher dimensions). | +| `C.tabulate(n){f}` | A collection of length `n` where the element at each index `i` is computed by `f(i)`. | +| `C.tabulate(m, n){f}` | A collection of collections of dimension `m×n` where the element at each index `(i, j)` is computed by `f(i, j)`. (exists also in higher dimensions). | +| `C.range(start, end)` | The collection of integers `start` ... `end-1`. | +| `C.range(start, end, step)`| The collection of integers starting with `start` and progressing by `step` increments up to, and excluding, the `end` value. | +| `C.iterate(x, n)(f)` | The collection of length `n` with elements `x`, `f(x)`, `f(f(x))`, ...
| +| `C.unfold(init)(f)` | A collection that uses a function `f` to compute its next element and state, starting from the `init` state.| \ No newline at end of file diff --git a/_overviews/collections-2.13/equality.md b/_overviews/collections-2.13/equality.md new file mode 100644 index 0000000000..42137a43ca --- /dev/null +++ b/_overviews/collections-2.13/equality.md @@ -0,0 +1,34 @@ +--- +layout: multipage-overview +title: Equality + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 13 + +permalink: /overviews/collections-2.13/:title.html +--- + +The collection libraries have a uniform approach to equality and hashing. The idea is, first, to divide collections into sets, maps, and sequences. Collections in different categories are always unequal. For instance, `Set(1, 2, 3)` is unequal to `List(1, 2, 3)` even though they contain the same elements. On the other hand, within the same category, collections are equal if and only if they have the same elements (for sequences: the same elements in the same order). For example, `List(1, 2, 3) == Vector(1, 2, 3)`, and `HashSet(1, 2) == TreeSet(2, 1)`. + +It does not matter for the equality check whether a collection is mutable or immutable. For a mutable collection one simply considers its current elements at the time the equality test is performed. This means that a mutable collection might be equal to different collections at different times, depending on which elements are added or removed. This is a potential trap when using a mutable collection as a key in a hash map. Example: + + scala> import collection.mutable.{HashMap, ArrayBuffer} + import collection.mutable.{HashMap, ArrayBuffer} + scala> val buf = ArrayBuffer(1, 2, 3) + buf: scala.collection.mutable.ArrayBuffer[Int] = + ArrayBuffer(1, 2, 3) + scala> val map = HashMap(buf -> 3) + map: scala.collection.mutable.HashMap[scala.collection.
+ mutable.ArrayBuffer[Int],Int] = Map((ArrayBuffer(1, 2, 3),3)) + scala> map(buf) + res13: Int = 3 + scala> buf(0) += 1 + scala> map(buf) + java.util.NoSuchElementException: key not found: + ArrayBuffer(2, 2, 3) + +In this example, the selection in the last line will most likely fail because the hash-code of the array buffer `buf` has changed in the second-to-last line. Therefore, the hash-code-based lookup will look at a different place than the one where `buf` was stored. diff --git a/_overviews/collections-2.13/introduction.md b/_overviews/collections-2.13/introduction.md new file mode 100644 index 0000000000..2f077f3383 --- /dev/null +++ b/_overviews/collections-2.13/introduction.md @@ -0,0 +1,99 @@ +--- +layout: multipage-overview +title: Introduction + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 1 + +permalink: /overviews/collections-2.13/:title.html +--- + +**Martin Odersky and Lex Spoon** + +In the eyes of many, the new collections framework is the most significant +change in the Scala 2.8 release. Scala had collections before (and in fact the new +framework is largely compatible with them). But it's only 2.8 that +provides a common, uniform, and all-encompassing framework for +collection types. + +Even though the additions to collections are subtle at first glance, +the changes they can provoke in your programming style can be +profound. In fact, quite often it's as if you work on a higher level, +with the basic building blocks of a program being whole collections +instead of their elements. This new style of programming requires some +adaptation. Fortunately, the adaptation is helped by several nice +properties of the new Scala collections. They are easy to use, +concise, safe, fast, universal. + +**Easy to use:** A small vocabulary of 20-50 methods is +enough to solve most collection problems in a couple of operations. No +need to wrap your head around complicated looping structures or +recursions.
Persistent collections and side-effect-free operations mean +that you need not worry about accidentally corrupting existing +collections with new data. Interference between iterators and +collection updates is eliminated. + +**Concise:** You can achieve with a single word what used to +take one or several loops. You can express functional operations with +lightweight syntax and combine operations effortlessly, so that the result +feels like a custom algebra. + +**Safe:** This one has to be experienced to sink in. The +statically typed and functional nature of Scala's collections means +that the overwhelming majority of errors you might make are caught at +compile-time. The reason is that (1) the collection operations +themselves are heavily used and therefore well +tested. (2) the usages of the collection operation make inputs and +output explicit as function parameters and results. (3) These explicit +inputs and outputs are subject to static type checking. The bottom line +is that the large majority of misuses will manifest themselves as type +errors. It's not at all uncommon to have programs of several hundred +lines run at first try. + +**Fast:** Collection operations are tuned and optimized in the +libraries. As a result, using collections is typically quite +efficient. You might be able to do a little bit better with carefully +hand-tuned data structures and operations, but you might also do a lot +worse by making some suboptimal implementation decisions along the +way. What's more, collections have been recently adapted to parallel +execution on multi-cores. Parallel collections support the same +operations as sequential ones, so no new operations need to be learned +and no code needs to be rewritten. You can turn a sequential collection into a +parallel one simply by invoking the `par` method. + +**Universal:** Collections provide the same operations on +any type where it makes sense to do so. 
So you can achieve a lot with +a fairly small vocabulary of operations. For instance, a string is +conceptually a sequence of characters. Consequently, in Scala +collections, strings support all sequence operations. The same holds +for arrays. + +**Example:** Here's one line of code that demonstrates many of the +advantages of Scala's collections. + + val (minors, adults) = people partition (_.age < 18) + +It's immediately clear what this operation does: It partitions a +collection of `people` into `minors` and `adults` depending on +their age. Because the `partition` method is defined in the root +collection type `IterableOps`, this code works for any kind of +collection, including arrays. The resulting `minors` and `adults` +collections will be of the same type as the `people` collection. + +This code is much more concise than the one to three loops required for +traditional collection processing (three loops for an array, because +the intermediate results need to be buffered somewhere else). Once +you have learned the basic collection vocabulary you will also find +writing this code is much easier and safer than writing explicit +loops. Furthermore, the `partition` operation is quite fast, and can +be even faster on parallel collections on multi-cores. (Parallel +collections are available as a +[separate library](https://index.scala-lang.org/scala/scala-parallel-collections/scala-parallel-collections).) + +This document provides an in-depth discussion of the APIs of the +Scala collections classes from a user perspective. It takes you on +a tour of all the fundamental classes and the methods they define.
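To make the `partition` one-liner from the example above concrete, here is a small self-contained sketch. The `Person` case class, the sample data, and the enclosing `PartitionSketch` object are assumptions introduced for illustration; the original example does not define them.

```scala
// Hypothetical names throughout; only `partition` comes from the text above.
object PartitionSketch {
  final case class Person(name: String, age: Int)

  val people: List[Person] =
    List(Person("Ann", 34), Person("Bob", 12), Person("Cy", 17))

  // One pass over `people` yields both halves, each again a List[Person].
  val (minors, adults) = people.partition(_.age < 18)
}
```

Because `people` is a `List`, both results are lists as well; starting from a `Vector` or an array would give back vectors or arrays instead.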
diff --git a/_overviews/collections-2.13/iterators.md b/_overviews/collections-2.13/iterators.md new file mode 100644 index 0000000000..55de58cb7e --- /dev/null +++ b/_overviews/collections-2.13/iterators.md @@ -0,0 +1,238 @@ +--- +layout: multipage-overview +title: Iterators + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 15 + +permalink: /overviews/collections-2.13/:title.html +--- + +An iterator is not a collection, but rather a way to access the elements of a collection one by one. The two basic operations on an iterator `it` are `next` and `hasNext`. A call to `it.next()` will return the next element of the iterator and advance the state of the iterator. Calling `next` again on the same iterator will then yield the element one beyond the one returned previously. If there are no more elements to return, a call to `next` will throw a `NoSuchElementException`. You can find out whether there are more elements to return using [Iterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterator.html)'s `hasNext` method. + +The most straightforward way to "step through" all the elements returned by an iterator `it` uses a while-loop: + + while (it.hasNext) + println(it.next()) + +Iterators in Scala also provide analogues of most of the methods that you find in the `Traversable`, `Iterable` and `Seq` classes. For instance, they provide a `foreach` method which executes a given procedure on each element returned by an iterator. 
Using `foreach`, the loop above could be abbreviated to: + + it foreach println + +As always, for-expressions can be used as an alternate syntax for expressions involving `foreach`, `map`, `withFilter`, and `flatMap`, so yet another way to print all elements returned by an iterator would be: + + for (elem <- it) println(elem) + +There's an important difference between the `foreach` method on iterators and the same method on traversable collections: When called on an iterator, `foreach` will leave the iterator at its end when it is done. So calling `next` again on the same iterator will fail with a `NoSuchElementException`. By contrast, when called on a collection, `foreach` leaves the number of elements in the collection unchanged (unless the passed function adds or removes elements, but this is discouraged, because it may lead to surprising results). + +The other operations that `Iterator` has in common with `Iterable` have the same property. For instance, iterators provide a `map` method, which returns a new iterator: + + scala> val it = Iterator("a", "number", "of", "words") + it: Iterator[java.lang.String] = + scala> it.map(_.length) + res1: Iterator[Int] = + scala> it.hasNext + res2: Boolean = true + scala> res1 foreach println + 1 + 6 + 2 + 5 + scala> it.hasNext + res4: Boolean = false + +As you can see, after the call to `it.map`, the `it` iterator hasn’t advanced to its end, but traversing the iterator +resulting from the call to `it.map` also traverses `it` and advances it to its end. + +Another example is the `dropWhile` method, which can be used to find the first element of an iterator that has a certain property.
For instance, to find the first word in the iterator above that has at least two characters you could write: + + scala> val it = Iterator("a", "number", "of", "words") + it: Iterator[java.lang.String] = + scala> it dropWhile (_.length < 2) + res4: Iterator[java.lang.String] = + scala> res4.next() + res5: java.lang.String = number + +Note again that `it` was changed by the call to `dropWhile`: it now points to the second word "number" in the list. +In fact, `it` and the result `res4` returned by `dropWhile` will return exactly the same sequence of elements. + +One way to circumvent this behavior is to `duplicate` the underlying iterator instead of calling methods on it directly. +The _two_ iterators that result will each return exactly the same elements as the underlying iterator `it`: + + scala> val (words, ns) = Iterator("a", "number", "of", "words").duplicate + words: Iterator[String] = + ns: Iterator[String] = + + scala> val shorts = words.filter(_.length < 3).toList + shorts: List[String] = List(a, of) + + scala> val count = ns.map(_.length).sum + count: Int = 14 + +The two iterators work independently: advancing one does not affect the other, so that each can be +destructively modified by invoking arbitrary methods. This creates the illusion of iterating over +the elements twice, but the effect is achieved through internal buffering. +As usual, the underlying iterator `it` cannot be used directly and must be discarded. + +In summary, iterators behave like collections _if one never accesses an iterator again after invoking a method on it_. The Scala collection libraries make this explicit with an abstraction [IterableOnce](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/IterableOnce.html), which is a common superclass of [Iterable](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterable.html) and [Iterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterator.html). 
`IterableOnce[A]` only has two methods: `iterator: Iterator[A]` and `knownSize: Int`. If an `IterableOnce` object is in fact an `Iterator`, its `iterator` operation always returns itself, in its current state, but if it is an `Iterable`, its `iterator` operation always returns a new `Iterator`. A common use case of `IterableOnce` is as an argument type for methods that can take either an iterator or a collection as argument. An example is the appending method `concat` in class `Iterable`. It takes an `IterableOnce` parameter, so you can append elements coming from either an iterator or a collection. + +All operations on iterators are summarized below. + +### Operations in class Iterator + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Abstract Methods:** | | +| `it.next()` | Returns next element on iterator and advances past it. | +| `it.hasNext` | Returns `true` if `it` can return another element. | +| **Variations:** | | +| `it.buffered` | A buffered iterator returning all elements of `it`. | +| `it grouped size` | An iterator that yields the elements returned by `it` in fixed-sized sequence "chunks". | +| `it sliding size` | An iterator that yields the elements returned by `it` in sequences representing a sliding fixed-sized window. | +| **Duplication:** | | +| `it.duplicate` | A pair of iterators that each independently return all elements of `it`. | +| **Additions:** | | +| `it concat jt`
or `it ++ jt` | An iterator returning all elements returned by iterator `it`, followed by all elements returned by iterator `jt`. | +| `it.padTo(len, x)` | The iterator that first returns all elements of `it` and then follows that by copies of `x` until length `len` elements are returned overall. | +| **Maps:** | | +| `it map f` | The iterator obtained from applying the function `f` to every element returned from `it`. | +| `it flatMap f` | The iterator obtained from applying the iterator-valued function `f` to every element in `it` and appending the results. | +| `it collect f` | The iterator obtained from applying the partial function `f` to every element in `it` for which it is defined and collecting the results. | +| **Conversions:** | | +| `it.toArray` | Collects the elements returned by `it` in an array. | +| `it.toList` | Collects the elements returned by `it` in a list. | +| `it.toIterable` | Collects the elements returned by `it` in an iterable. | +| `it.toSeq` | Collects the elements returned by `it` in a sequence. | +| `it.toIndexedSeq` | Collects the elements returned by `it` in an indexed sequence. | +| `it.toLazyList` | Collects the elements returned by `it` in a lazy list. | +| `it.toSet` | Collects the elements returned by `it` in a set. | +| `it.toMap` | Collects the key/value pairs returned by `it` in a map. | +| **Copying:** | | +| `it.copyToArray(arr, s, n)`| Copies at most `n` elements returned by `it` to array `arr` starting at index `s`. The last two arguments are optional. | +| **Size Info:** | | +| `it.isEmpty` | Test whether the iterator is empty (opposite of `hasNext`). | +| `it.nonEmpty` | Test whether the iterator contains elements (alias of `hasNext`). | +| `it.size` | The number of elements returned by `it`. Note: `it` will be at its end after this operation! | +| `it.length` | Same as `it.size`. | +| `it.knownSize` |The number of elements, if this one is known without modifying the iterator’s state, otherwise `-1`.
| +| **Element Retrieval Index Search:**| | +| `it find p` | An option containing the first element returned by `it` that satisfies `p`, or `None` if no element qualifies. Note: The iterator advances to after the element, or, if none is found, to the end. | +| `it indexOf x` | The index of the first element returned by `it` that equals `x`. Note: The iterator advances past the position of this element. | +| `it indexWhere p` | The index of the first element returned by `it` that satisfies `p`. Note: The iterator advances past the position of this element. | +| **Subiterators:** | | +| `it take n` | An iterator returning the first `n` elements of `it`. Note: `it` will advance to the position after the `n`'th element, or to its end, if it contains fewer than `n` elements. | +| `it drop n` | The iterator that starts with the `(n+1)`'th element of `it`. Note: `it` will advance to the same position. | +| `it.slice(m,n)` | The iterator that returns a slice of the elements returned from `it`, starting with the `m`'th element and ending before the `n`'th element. | +| `it takeWhile p` | An iterator returning elements from `it` as long as condition `p` is true. | +| `it dropWhile p` | An iterator skipping elements from `it` as long as condition `p` is `true`, and returning the remainder. | +| `it filter p` | An iterator returning all elements from `it` that satisfy the condition `p`. | +| `it withFilter p` | Same as `it filter p`. Needed so that iterators can be used in for-expressions. | +| `it filterNot p` | An iterator returning all elements from `it` that do not satisfy the condition `p`. | +| `it.distinct` | An iterator returning the elements from `it` without duplicates. | +| **Subdivisions:** | | +| `it partition p` | Splits `it` into a pair of two iterators: one returning all elements from `it` that satisfy the predicate `p`, the other returning all elements from `it` that do not.
| +| `it span p` | Splits `it` into a pair of two iterators: one returning all elements of the prefix of `it` that satisfy the predicate `p`, the other returning all remaining elements of `it`. | +| **Element Conditions:** | | +| `it forall p` | A boolean indicating whether the predicate p holds for all elements returned by `it`. | +| `it exists p` | A boolean indicating whether the predicate p holds for some element in `it`. | +| `it count p` | The number of elements in `it` that satisfy the predicate `p`. | +| **Folds:** | | +| `it.foldLeft(z)(op)` | Apply binary operation `op` between successive elements returned by `it`, going left to right and starting with `z`. | +| `it.foldRight(z)(op)` | Apply binary operation `op` between successive elements returned by `it`, going right to left and starting with `z`. | +| `it reduceLeft op` | Apply binary operation `op` between successive elements returned by non-empty iterator `it`, going left to right. | +| `it reduceRight op` | Apply binary operation `op` between successive elements returned by non-empty iterator `it`, going right to left. | +| **Specific Folds:** | | +| `it.sum` | The sum of the numeric element values returned by iterator `it`. | +| `it.product` | The product of the numeric element values returned by iterator `it`. | +| `it.min` | The minimum of the ordered element values returned by iterator `it`. | +| `it.max` | The maximum of the ordered element values returned by iterator `it`. | +| **Zippers:** | | +| `it zip jt` | An iterator of pairs of corresponding elements returned from iterators `it` and `jt`. | +| `it.zipAll(jt, x, y)` | An iterator of pairs of corresponding elements returned from iterators `it` and `jt`, where the shorter iterator is extended to match the longer one by appending elements `x` or `y`. | +| `it.zipWithIndex` | An iterator of pairs of elements returned from `it` with their indices. 
| +| **Update:** | | +| `it.patch(i, jt, r)` | The iterator resulting from `it` by replacing `r` elements starting with `i` by the patch iterator `jt`. | +| **Comparison:** | | +| `it sameElements jt` | A test whether iterators `it` and `jt` return the same elements in the same order. Note: Using the iterators after this operation is undefined and subject to change. | +| **Strings:** | | +| `it.addString(b, start, sep, end)`| Adds a string to `StringBuilder` `b` which shows all elements returned by `it` between separators `sep` enclosed in strings `start` and `end`. `start`, `sep`, `end` are all optional. | +| `it.mkString(start, sep, end)` | Converts the iterator to a string which shows all elements returned by `it` between separators `sep` enclosed in strings `start` and `end`. `start`, `sep`, `end` are all optional. | + +### Laziness + +Unlike operations directly on a concrete collection like `List`, operations on `Iterator` are lazy. + +A lazy operation does not immediately compute all of its results. Instead, it computes the results as they are individually requested. + +So the expression `(1 to 10).iterator.map(println)` would not print anything to the screen. The `map` method in this case doesn't apply its argument function to the values in the range; instead, it returns a new `Iterator` that will do this as each one is requested. Adding `.toList` to the end of that expression will actually print the elements. + +A consequence of this is that a method like `map` or `filter` won't necessarily apply its argument function to all of the input elements. The expression `(1 to 10).iterator.map(println).take(5).toList` would only print the values `1` to `5`, for instance, since those are the only ones that will be requested from the `Iterator` returned by `map`. + +This is one of the reasons why it's important to only use pure functions as arguments to `map`, `filter`, `fold` and similar methods.
Remember, a pure function has no side-effects, so one would not normally use `println` in a `map`. `println` is used to demonstrate laziness as it's not normally visible with pure functions. + +Laziness is still valuable, despite often not being visible, as it can prevent unneeded computations from happening, and can allow for working with infinite sequences, like so: + + def zipWithIndex[A](i: Iterator[A]): Iterator[(Int, A)] = + Iterator.from(0).zip(i) + +### Buffered iterators + +Sometimes you want an iterator that can "look ahead", so that you can inspect the next element to be returned without advancing past that element. Consider for instance, the task to skip leading empty strings from an iterator that returns a sequence of strings. You might be tempted to write the following + + + def skipEmptyWordsNOT(it: Iterator[String]) = + while (it.next().isEmpty) {} + +But looking at this code more closely, it's clear that this is wrong: The code will indeed skip leading empty strings, but it will also advance `it` past the first non-empty string! + +The solution to this problem is to use a buffered iterator. Class [BufferedIterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/BufferedIterator.html) is a subclass of [Iterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterator.html), which provides one extra method, `head`. Calling `head` on a buffered iterator will return its first element but will not advance the iterator. Using a buffered iterator, skipping empty words can be written as follows. + + def skipEmptyWords(it: BufferedIterator[String]) = + while (it.head.isEmpty) { it.next() } + +Every iterator can be converted to a buffered iterator by calling its `buffered` method. 
Here's an example: + + scala> val it = Iterator(1, 2, 3, 4) + it: Iterator[Int] = <iterator> + scala> val bit = it.buffered + bit: scala.collection.BufferedIterator[Int] = <iterator> + scala> bit.head + res10: Int = 1 + scala> bit.next() + res11: Int = 1 + scala> bit.next() + res12: Int = 2 + scala> bit.headOption + res13: Option[Int] = Some(3) + +Note that calling `head` on the buffered iterator `bit` does not advance it. Therefore, the subsequent call `bit.next()` returns the same value as `bit.head`. + +As usual, the underlying iterator must not be used directly and must be discarded. + +The buffered iterator only buffers the next element when `head` is invoked. Other derived iterators, +such as those produced by `duplicate` and `partition`, may buffer arbitrary subsequences of the +underlying iterator. But iterators can be efficiently joined by adding them together with `++`: + + scala> def collapse(it: Iterator[Int]) = if (!it.hasNext) Iterator.empty else { + | val head = it.next() + | val rest = if (head == 0) it.dropWhile(_ == 0) else it + | Iterator.single(head) ++ rest + | } + collapse: (it: Iterator[Int])Iterator[Int] + + scala> def collapse(it: Iterator[Int]) = { + | val (zeros, rest) = it.span(_ == 0) + | zeros.take(1) ++ rest + | } + collapse: (it: Iterator[Int])Iterator[Int] + + scala> collapse(Iterator(0, 0, 0, 1, 2, 3, 4)).toList + res14: List[Int] = List(0, 1, 2, 3, 4) + +In the second version of `collapse`, the unconsumed zeros are buffered internally. +In the first version, any leading zeros are dropped and the desired result constructed +as a concatenated iterator, which simply calls its two constituent iterators in turn.
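The laziness and look-ahead behaviour described in this page can be checked with a small, self-contained sketch. The object name `IteratorSketch` and the helper `countApplications` are made up for illustration, and `skipEmptyWords` here adds a `hasNext` guard (an assumption beyond the version shown earlier) so that it also terminates on an iterator that contains only empty strings.

```scala
import scala.collection.BufferedIterator

object IteratorSketch {
  // Counts how many times the function given to `map` is actually applied
  // when only `requested` elements are pulled from the mapped iterator.
  def countApplications(requested: Int): Int = {
    var applied = 0
    val mapped = (1 to 10).iterator.map { x => applied += 1; x * 2 }
    mapped.take(requested).foreach(_ => ()) // force only the requested prefix
    applied
  }

  // Skips leading empty strings without consuming the first non-empty one.
  // The `hasNext` check is an addition for safety on exhausted iterators.
  def skipEmptyWords(it: BufferedIterator[String]): Unit =
    while (it.hasNext && it.head.isEmpty) it.next()

  def main(args: Array[String]): Unit = {
    println(countApplications(0)) // the mapping function is never applied
    println(countApplications(5)) // applied once per requested element

    val words = Iterator("", "", "a", "", "b").buffered
    skipEmptyWords(words)
    println(words.toList) // the first non-empty string is still available
  }
}
```

Because `head` only peeks, the buffered iterator still yields `"a"` after the leading empty strings have been skipped.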
diff --git a/_overviews/collections-2.13/maps.md b/_overviews/collections-2.13/maps.md new file mode 100644 index 0000000000..9d89489c6f --- /dev/null +++ b/_overviews/collections-2.13/maps.md @@ -0,0 +1,113 @@ +--- +layout: multipage-overview +title: Maps + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 7 + +permalink: /overviews/collections-2.13/:title.html +--- + +A [Map](http://www.scala-lang.org/api/current/scala/collection/Map.html) is an [Iterable](http://www.scala-lang.org/api/current/scala/collection/Iterable.html) consisting of pairs of keys and values (also named _mappings_ or _associations_). Scala's [Predef](http://www.scala-lang.org/api/current/scala/Predef$.html) object offers an implicit conversion that lets you write `key -> value` as an alternate syntax for the pair `(key, value)`. For instance `Map("x" -> 24, "y" -> 25, "z" -> 26)` means exactly the same as `Map(("x", 24), ("y", 25), ("z", 26))`, but reads better. + +The fundamental operations on maps are similar to those on sets. They are summarized in the following table and fall into the following categories: + +* **Lookup** operations `apply`, `get`, `getOrElse`, `contains`, and `isDefinedAt`. These turn maps into partial functions from keys to values. The fundamental lookup method for a map is: `def get(key): Option[Value]`. The operation "`m get key`" tests whether the map contains an association for the given `key`. If so, it returns the associated value in a `Some`. If the key is not defined in the map, `get` returns `None`. Maps also define an `apply` method that returns the value associated with a given key directly, without wrapping it in an `Option`. If the key is not defined in the map, an exception is raised. +* **Additions and updates** `+`, `++`, `updated`, which let you add new bindings to a map or change existing bindings. +* **Removals** `-`, `--`, which remove bindings from a map.
+* **Subcollection producers** `keys`, `keySet`, `keysIterator`, `values`, `valuesIterator`, which return a map's keys and values separately in various forms. +* **Transformations** `filterKeys` and `mapValues`, which produce a new map by filtering and transforming bindings of an existing map. + +### Operations in Class Map ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Lookups:** | | +| `ms get k` |The value associated with key `k` in map `ms` as an option, `None` if not found.| +| `ms(k)` |(or, written out, `ms apply k`) The value associated with key `k` in map `ms`, or exception if not found.| +| `ms getOrElse (k, d)` |The value associated with key `k` in map `ms`, or the default value `d` if not found.| +| `ms contains k` |Tests whether `ms` contains a mapping for key `k`.| +| `ms isDefinedAt k` |Same as `contains`. | +| **Subcollections:** | | +| `ms.keys` |An iterable containing each key in `ms`. | +| `ms.keySet` |A set containing each key in `ms`. | +| `ms.keysIterator` |An iterator yielding each key in `ms`. | +| `ms.values` |An iterable containing each value associated with a key in `ms`.| +| `ms.valuesIterator` |An iterator yielding each value associated with a key in `ms`.| +| **Transformation:** | | +| `ms.view filterKeys p` |A map view containing only those mappings in `ms` where the key satisfies predicate `p`.| +| `ms.view mapValues f` |A map view resulting from applying function `f` to each value associated with a key in `ms`.| + +Immutable maps support in addition operations to add and remove mappings by returning new `Map`s, as summarized in the following table. + +### Operations in Class immutable.Map ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Additions and Updates:**| | +| `ms.updated(k, v)`
or `ms + (k -> v)` |The map containing all mappings of `ms` as well as the mapping `k -> v` from key `k` to value `v`.| +| **Removals:** | | +| `ms remove k`
or `ms - k` |The map containing all mappings of `ms` except for any mapping of key `k`.| +| `ms removeAll ks`
or `ms -- ks` |The map containing all mappings of `ms` except for any mapping with a key in `ks`.| + +In addition, mutable maps support the operations summarized in the following table. + + +### Operations in Class mutable.Map ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Additions and Updates:**| | +| `ms(k) = v` |(Or, written out, `ms.update(k, v)`). Adds mapping from key `k` to value `v` to map `ms` as a side effect, overwriting any previous mapping of `k`.| +| `ms.addOne(k -> v)`
or `ms += (k -> v)` |Adds mapping from key `k` to value `v` to map `ms` as a side effect and returns `ms` itself.| +| `ms addAll kvs`
or `ms ++= kvs` |Adds all mappings in `kvs` to `ms` as a side effect and returns `ms` itself.| +| `ms.put(k, v)` |Adds mapping from key `k` to value `v` to `ms` and returns any value previously associated with `k` as an option.| +| `ms getOrElseUpdate (k, d)`|If key `k` is defined in map `ms`, return its associated value. Otherwise, update `ms` with the mapping `k -> d` and return `d`.| +| **Removals:**| | +| `ms subtractOne k`
or `ms -= k` |Removes mapping with key `k` from `ms` as a side effect and returns `ms` itself.| +| `ms subtractAll ks`
or `ms --= ks` |Removes all keys in `ks` from `ms` as a side effect and returns `ms` itself.| +| `ms remove k` |Removes any mapping with key `k` from `ms` and returns any value previously associated with `k` as an option.| +| `ms filterInPlace p` |Keeps only those mappings in `ms` that have a key satisfying predicate `p`.| +| `ms.clear()` |Removes all mappings from `ms`. | +| **Transformation:** | | +| `ms mapValuesInPlace f` |Transforms all associated values in map `ms` with function `f`.| +| **Cloning:** | | +| `ms.clone` |Returns a new mutable map with the same mappings as `ms`.| + +The addition and removal operations for maps mirror those for sets. A mutable map `m` is usually updated "in place", using the two variants `m(key) = value` or `m += (key -> value)`. There is also the variant `m.put(key, value)`, which returns an `Option` value that contains the value previously associated with `key`, or `None` if the `key` did not exist in the map before. + +The `getOrElseUpdate` method is useful for accessing maps that act as caches. Say you have an expensive computation triggered by invoking a function `f`: + + scala> def f(x: String) = { + println("taking my time."); Thread.sleep(100) + x.reverse } + f: (x: String)String + +Assume further that `f` has no side-effects, so invoking it again with the same argument will always yield the same result. In that case you could save time by storing previously computed bindings of argument and results of `f` in a map and only computing the result of `f` if a result for an argument was not found there. One could say the map is a _cache_ for the computations of the function `f`. + + scala> val cache = collection.mutable.Map[String, String]() + cache: scala.collection.mutable.Map[String,String] = Map() + +You can now create a more efficient caching version of the `f` function: + + scala> def cachedF(s: String) = cache.getOrElseUpdate(s, f(s)) + cachedF: (s: String)String + scala> cachedF("abc") + taking my time.
+ res3: String = cba + scala> cachedF("abc") + res4: String = cba + +Note that the second argument to `getOrElseUpdate` is "by-name", so the computation of `f("abc")` above is only performed if `getOrElseUpdate` requires the value of its second argument, which is precisely when its first argument is not found in the `cache` map. You could also have implemented `cachedF` directly, using just basic map operations, but it would take more code to do so: + + def cachedF(arg: String) = cache get arg match { + case Some(result) => result + case None => + val result = f(arg) + cache(arg) = result + result + } diff --git a/_overviews/collections-2.13/overview.md b/_overviews/collections-2.13/overview.md new file mode 100644 index 0000000000..79cb856cc3 --- /dev/null +++ b/_overviews/collections-2.13/overview.md @@ -0,0 +1,151 @@ +--- +layout: multipage-overview +title: Mutable and Immutable Collections + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 2 + +permalink: /overviews/collections-2.13/:title.html +--- + +Scala collections systematically distinguish between mutable and +immutable collections. A _mutable_ collection can be updated or +extended in place. This means you can change, add, or remove elements +of a collection as a side effect. _Immutable_ collections, by +contrast, never change. You still have operations that simulate +additions, removals, or updates, but those operations will in each +case return a new collection and leave the old collection unchanged. + +All collection classes are found in the package `scala.collection` or +one of its sub-packages `mutable` and `immutable`. Most +collection classes needed by client code exist in three variants, +which are located in packages `scala.collection`, +`scala.collection.immutable`, and `scala.collection.mutable`, +respectively. Each variant has different characteristics with respect +to mutability.
+ +A collection in package `scala.collection.immutable` is guaranteed to +be immutable for everyone. Such a collection will never change after +it is created. Therefore, you can rely on the fact that accessing the +same collection value repeatedly at different points in time will +always yield a collection with the same elements. + +A collection in package `scala.collection.mutable` is known to have +some operations that change the collection in place. So dealing with +a mutable collection means you need to understand which code changes +which collection when. + +A collection in package `scala.collection` can be either mutable or +immutable. For instance, [collection.IndexedSeq\[T\]](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/IndexedSeq.html) +is a superclass of both [collection.immutable.IndexedSeq\[T\]](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/IndexedSeq.html) +and +[collection.mutable.IndexedSeq\[T\]](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/IndexedSeq.html). +Generally, the root collections in +package `scala.collection` support transformation operations +affecting the whole collection, the immutable +collections in package `scala.collection.immutable` typically add +operations for adding or removing single +values, and the mutable collections in package +`scala.collection.mutable` typically add some side-effecting +modification operations to the root interface. + +Another difference between root collections and immutable collections is +that clients of an immutable collection have a guarantee that nobody +can mutate the collection, whereas clients of a root collection only +promise not to change the collection themselves. Even though the +static type of such a collection provides no operations for modifying +the collection, it might still be possible that the run-time type is a +mutable collection which can be changed by other clients.
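The difference between the two guarantees can be made concrete with a short sketch. The object name `RootVsImmutable` and the method `observe` are hypothetical names chosen for this illustration; the collection classes themselves are the standard ones.

```scala
import scala.collection.{immutable, mutable}

object RootVsImmutable {
  // Returns what a root-typed reference and an immutable copy contain
  // after the underlying buffer has been modified by "another client".
  def observe(): (List[Int], List[Int]) = {
    val buffer = mutable.ArrayBuffer(1, 2, 3)

    // Static type is the root `collection.Seq`: no mutating operations are
    // offered here, but the run-time object is still the mutable buffer.
    val root: scala.collection.Seq[Int] = buffer

    // An immutable copy: nobody can change it after creation.
    val frozen: immutable.Seq[Int] = buffer.toList

    buffer += 4 // another client mutates the buffer

    (root.toList, frozen.toList)
  }

  def main(args: Array[String]): Unit = {
    val (viaRoot, viaImmutable) = observe()
    println(viaRoot)      // the appended element is visible through `root`
    println(viaImmutable) // the immutable copy is unaffected
  }
}
```

The reference typed as `scala.collection.Seq` sees the appended element, whereas the `immutable.Seq` does not, which is exactly the guarantee the paragraph above describes.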
+ +By default, Scala always picks immutable collections. For instance, if +you just write `Set` without any prefix or without having imported +`Set` from somewhere, you get an immutable set, and if you write +`Seq` you get an immutable sequence, because these +are the default bindings imported from the `scala` package. To get +the mutable versions, you need to write explicitly +`collection.mutable.Set`, or `collection.mutable.Seq`. + +A useful convention if you want to use both mutable and immutable +versions of collections is to import just the package +`collection.mutable`. + + import scala.collection.mutable + +Then a word like `Set` without a prefix still refers to an immutable collection, +whereas `mutable.Set` refers to the mutable counterpart. + +The last package in the collection hierarchy is `scala.collection.generic`. This +package contains building blocks for abstracting over concrete collections. + +For convenience and backwards compatibility some important types have +aliases in the `scala` package, so you can use them by their simple +names without needing an import.
An example is the `List` type, which +can be accessed alternatively as + + scala.collection.immutable.List // that's where it is defined + scala.List // via the alias in the scala package + List // because scala._ + // is always automatically imported + +Other types aliased are +[Iterable](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterable.html), [Seq](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Seq.html), [IndexedSeq](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/IndexedSeq.html), [Iterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterator.html), [LazyList](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/LazyList.html), [Vector](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Vector.html), [StringBuilder](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/mutable/StringBuilder.html), and [Range](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/immutable/Range.html). + +The following figure shows all collections in package +`scala.collection`. These are all high-level abstract classes or traits, which +generally have mutable as well as immutable implementations. + +[![General collection hierarchy][1]][1] + +The following figure shows all collections in package `scala.collection.immutable`. + +[![Immutable collection hierarchy][2]][2] + +And the following figure shows all collections in package `scala.collection.mutable`. + +[![Mutable collection hierarchy][3]][3] + +Legend: + +[![Graph legend][4]][4] + +## An Overview of the Collections API ## + +The most important collection classes are shown in the figures above. There is quite a bit of commonality shared by all these classes. 
For instance, every kind of collection can be created by the same uniform syntax, writing the collection class name followed by its elements: + + Iterable("x", "y", "z") + Map("x" -> 24, "y" -> 25, "z" -> 26) + Set(Color.red, Color.green, Color.blue) + SortedSet("hello", "world") + Buffer(x, y, z) + IndexedSeq(1.0, 2.0) + LinearSeq(a, b, c) + +The same principle also applies for specific collection implementations, such as: + + List(1, 2, 3) + HashMap("x" -> 24, "y" -> 25, "z" -> 26) + +All these collections get displayed with `toString` in the same way they are written above. + +All collections support the API provided by `Iterable`, but specialize types wherever this makes sense. For instance the `map` method in class `Iterable` returns another `Iterable` as its result. But this result type is overridden in subclasses. For instance, calling `map` on a `List` yields again a `List`, calling it on a `Set` yields again a `Set` and so on. + + scala> List(1, 2, 3) map (_ + 1) + res0: List[Int] = List(2, 3, 4) + scala> Set(1, 2, 3) map (_ * 2) + res1: Set[Int] = Set(2, 4, 6) + +This behavior, which is implemented everywhere in the collections libraries, is called the _uniform return type principle_. + +Most of the classes in the collections hierarchy exist in three variants: root, mutable, and immutable. The only exception is the `Buffer` trait which only exists as a mutable collection. + +In the following, we will review these classes one by one.
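As a small illustration of the uniform return type principle, the sketch below annotates each result with the same collection type as its input; it only compiles because `map` preserves that type. The object name `UniformReturnType` and its field names are invented for this example.

```scala
object UniformReturnType {
  // Each annotation would be a compile error if `map` returned a plain Iterable.
  val incremented: List[Int] = List(1, 2, 3).map(_ + 1)
  val doubled: Set[Int] = Set(1, 2, 3).map(_ * 2)
  val rendered: Vector[String] = Vector(1, 2).map(_.toString)

  // The result keeps the semantics of its collection type as well:
  // mapping a Set yields a Set, so equal results collapse into one element.
  val halved: Set[Int] = Set(1, 2, 3).map(_ / 2)

  def main(args: Array[String]): Unit = {
    println(incremented)
    println(doubled)
    println(rendered)
    println(halved.size) // 2, because 2 / 2 and 3 / 2 are both 1
  }
}
```

Note that preserving the collection type also preserves its behaviour: the mapped `Set` has only two elements because integer division sends both `2` and `3` to `1`.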
+ + + [1]: /resources/images/tour/collections-diagram-213.svg + [2]: /resources/images/tour/collections-immutable-diagram-213.svg + [3]: /resources/images/tour/collections-mutable-diagram-213.svg + [4]: /resources/images/tour/collections-legend-diagram.svg diff --git a/_overviews/collections-2.13/performance-characteristics.md b/_overviews/collections-2.13/performance-characteristics.md new file mode 100644 index 0000000000..9c46cfc8d2 --- /dev/null +++ b/_overviews/collections-2.13/performance-characteristics.md @@ -0,0 +1,87 @@ +--- +layout: multipage-overview +title: Performance Characteristics + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 12 + +permalink: /overviews/collections-2.13/:title.html +--- + +The previous explanations have made it clear that different collection types have different performance characteristics. That's often the primary reason for picking one collection type over another. You can see the performance characteristics of some common operations on collections summarized in the following two tables. 
+ +Performance characteristics of sequence types: + +| | head | tail | apply | update| prepend | append | insert | +| -------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | +| **immutable** | | | | | | | | +| `List` | C | C | L | L | C | L | - | +| `LazyList` | C | C | L | L | C | L | - | +| `ArraySeq` | C | L | C | L | L | L | - | +| `Vector` | eC | eC | eC | eC | eC | eC | - | +| `Queue` | aC | aC | L | L | C | C | - | +| `Range` | C | C | C | - | - | - | - | +| `String` | C | L | C | L | L | L | - | +| **mutable** | | | | | | | | +| `ArrayBuffer` | C | L | C | C | L | aC | L | +| `ListBuffer` | C | L | L | L | C | C | L | +|`StringBuilder`| C | L | C | C | L | aC | L | +| `Queue` | C | L | L | L | C | C | L | +| `ArraySeq` | C | L | C | C | - | - | - | +| `Stack` | C | L | L | L | C | L | L | +| `Array` | C | L | C | C | - | - | - | +| `ArrayDeque` | C | L | C | C | aC | aC | L | + +Performance characteristics of set and map types: + +| | lookup | add | remove | min | +| -------- | ---- | ---- | ---- | ---- | +| **immutable** | | | | | +| `HashSet`/`HashMap`| eC | eC | eC | L | +| `TreeSet`/`TreeMap`| Log | Log | Log | Log | +| `BitSet` | C | L | L | eC1| +| `VectorMap` | eC | eC | aC | L | +| `ListMap` | L | L | L | L | +| **mutable** | | | | | +| `HashSet`/`HashMap`| eC | eC | eC | L | +| `WeakHashMap` | eC | eC | eC | L | +| `BitSet` | C | aC | C | eC1| +| `TreeSet` | Log | Log | Log | Log | + +Footnote: 1 Assuming bits are densely packed. + +The entries in these two tables are explained as follows: + +| | | +| --- | ---- | +| **C** | The operation takes (fast) constant time. | +| **eC** | The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.| +| **aC** | The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed on average only constant time per operation is taken. 
| +| **Log** | The operation takes time proportional to the logarithm of the collection size. | +| **L** | The operation is linear, that is, it takes time proportional to the collection size. | +| **-** | The operation is not supported. | + +The first table treats sequence types--both immutable and mutable--with the following operations: + +| | | +| --- | ---- | +| **head** | Selecting the first element of the sequence. | +| **tail** | Producing a new sequence that consists of all elements except the first one. | +| **apply** | Indexing. | +| **update** | Functional update (with `updated`) for immutable sequences, side-effecting update (with `update`) for mutable sequences. | +| **prepend**| Adding an element to the front of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences it modifies the existing sequence. | +| **append** | Adding an element at the end of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences it modifies the existing sequence. | +| **insert** | Inserting an element at an arbitrary position in the sequence. This is only supported directly for mutable sequences. | + +The second table treats mutable and immutable sets and maps with the following operations: + +| | | +| --- | ---- | +| **lookup** | Testing whether an element is contained in a set, or selecting a value associated with a key. | +| **add** | Adding a new element to a set or key/value pair to a map. | +| **remove** | Removing an element from a set or a key from a map. | +| **min** | The smallest element of the set, or the smallest key of a map.
| diff --git a/_overviews/collections-2.13/seqs.md b/_overviews/collections-2.13/seqs.md new file mode 100644 index 0000000000..d61f7e4f86 --- /dev/null +++ b/_overviews/collections-2.13/seqs.md @@ -0,0 +1,123 @@ +--- +layout: multipage-overview +title: The sequence traits Seq, IndexedSeq, and LinearSeq + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 5 + +permalink: /overviews/collections-2.13/:title.html +--- + +The [Seq](http://www.scala-lang.org/api/current/scala/collection/Seq.html) trait represents sequences. A sequence is a kind of iterable that has a `length` and whose elements have fixed index positions, starting from `0`. + +The operations on sequences, summarized in the table below, fall into the following categories: + +* **Indexing and length** operations `apply`, `isDefinedAt`, `length`, `indices`, and `lengthCompare`. For a `Seq`, the `apply` operation means indexing; hence a sequence of type `Seq[T]` is a partial function that takes an `Int` argument (an index) and which yields a sequence element of type `T`. In other words `Seq[T]` extends `PartialFunction[Int, T]`. The elements of a sequence are indexed from zero up to the `length` of the sequence minus one. The `length` method on sequences is an alias of the `size` method of general collections. The `lengthCompare` method allows you to compare the length of a sequence with an `Int` even if the sequence has infinite length. +* **Index search operations** `indexOf`, `lastIndexOf`, `indexOfSlice`, `lastIndexOfSlice`, `indexWhere`, `lastIndexWhere`, `segmentLength`, which return the index of an element equal to a given value or matching some predicate. +* **Addition operations** `prepended`, `prependedAll`, `appended`, `appendedAll`, `padTo`, which return new sequences obtained by adding elements at the front or the end of a sequence. +* **Update operations** `updated`, `patch`, which return a new sequence obtained by replacing some elements of the original sequence.
+* **Sorting operations** `sorted`, `sortWith`, `sortBy`, which sort sequence elements according to various criteria. +* **Reversal operations** `reverse`, `reverseIterator`, which yield or process sequence elements in reverse order. +* **Comparisons** `startsWith`, `endsWith`, `contains`, `containsSlice`, `corresponds`, `search`, which relate two sequences or search an element in a sequence. +* **Multiset** operations `intersect`, `diff`, `distinct`, `distinctBy`, which perform set-like operations on the elements of two sequences or remove duplicates. + +If a sequence is mutable, it offers in addition a side-effecting `update` method, which lets sequence elements be updated. As always in Scala, syntax like `seq(idx) = elem` is just a shorthand for `seq.update(idx, elem)`, so `update` gives convenient assignment syntax for free. Note the difference between `update` and `updated`. `update` changes a sequence element in place, and is only available for mutable sequences. `updated` is available for all sequences and always returns a new sequence instead of modifying the original. + +### Operations in Class Seq ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Indexing and Length:** | | +| `xs(i)` |(or, written out, `xs apply i`). The element of `xs` at index `i`.| +| `xs isDefinedAt i` |Tests whether `i` is contained in `xs.indices`.| +| `xs.length` |The length of the sequence (same as `size`).| +| `xs lengthCompare n` |Returns `-1` if `xs` is shorter than `n`, `+1` if it is longer, and `0` if it is of length `n`. 
Works even if the sequence is infinite, for example `LazyList.from(1) lengthCompare 42` returns a positive value.| +| `xs.indices` |The index range of `xs`, extending from `0` to `xs.length - 1`.| +| **Index Search:** | | +| `xs indexOf x` |The index of the first element in `xs` equal to `x` (several variants exist).| +| `xs lastIndexOf x` |The index of the last element in `xs` equal to `x` (several variants exist).| +| `xs indexOfSlice ys` |The first index of `xs` such that successive elements starting from that index form the sequence `ys`.| +| `xs lastIndexOfSlice ys` |The last index of `xs` such that successive elements starting from that index form the sequence `ys`.| +| `xs indexWhere p` |The index of the first element in `xs` that satisfies `p` (several variants exist).| +| `xs.segmentLength(p, i)`|The length of the longest uninterrupted segment of elements in `xs`, starting with `xs(i)`, that all satisfy the predicate `p`.| +| **Additions:** | | +| `xs.prepended(x)`
or `x +: xs` |A new sequence that consists of `x` prepended to `xs`.| +| `xs.prependedAll(ys)`
or `ys ++: xs` |A new sequence that consists of all the elements of `ys` prepended to `xs`.| +| `xs.appended(x)`
or `xs :+ x` |A new sequence that consists of `x` appended to `xs`.| +| `xs.appendedAll(ys)`
or `xs :++ ys` |A new sequence that consists of all the elements of `ys` appended to `xs`.| +| `xs.padTo(len, x)` |The sequence resulting from appending the value `x` to `xs` until length `len` is reached.| +| **Updates:** | | +| `xs.patch(i, ys, r)` |The sequence resulting from replacing `r` elements of `xs` starting with `i` by the patch `ys`.| +| `xs.updated(i, x)` |A copy of `xs` with the element at index `i` replaced by `x`.| +| `xs(i) = x` |(or, written out, `xs.update(i, x)`, only available for `mutable.Seq`s). Changes the element of `xs` at index `i` to `x`.| +| **Sorting:** | | +| `xs.sorted` |A new sequence obtained by sorting the elements of `xs` using the standard ordering of the element type of `xs`.| +| `xs sortWith lt` |A new sequence obtained by sorting the elements of `xs` using `lt` as comparison operation.| +| `xs sortBy f` |A new sequence obtained by sorting the elements of `xs`. Comparison between two elements proceeds by mapping the function `f` over both and comparing the results.| +| **Reversals:** | | +| `xs.reverse` |A sequence with the elements of `xs` in reverse order.| +| `xs.reverseIterator` |An iterator yielding all the elements of `xs` in reverse order.| +| **Comparisons:** | | +| `xs sameElements ys` |A test whether `xs` and `ys` contain the same elements in the same order| +| `xs startsWith ys` |Tests whether `xs` starts with sequence `ys` (several variants exist).| +| `xs endsWith ys` |Tests whether `xs` ends with sequence `ys` (several variants exist).| +| `xs contains x` |Tests whether `xs` has an element equal to `x`.| +| `xs search x` |Tests whether a sorted sequence `xs` has an element equal to `x`, possibly in a more efficient way than `xs contains x`.| +| `xs containsSlice ys` |Tests whether `xs` has a contiguous subsequence equal to `ys`.| +| `(xs corresponds ys)(p)` |Tests whether corresponding elements of `xs` and `ys` satisfy the binary predicate `p`.| +| **Multiset Operations:** | | +| `xs intersect ys` |The multi-set 
intersection of sequences `xs` and `ys` that preserves the order of elements in `xs`.| +| `xs diff ys` |The multi-set difference of sequences `xs` and `ys` that preserves the order of elements in `xs`.| +| `xs.distinct` |A subsequence of `xs` that contains no duplicated element.| +| `xs distinctBy f` |A subsequence of `xs` that contains no duplicated element after applying the transforming function `f`. For instance, `List("foo", "bar", "quux").distinctBy(_.length) == List("foo", "quux")`| + +Trait [Seq](http://www.scala-lang.org/api/current/scala/collection/Seq.html) has two subtraits, [LinearSeq](http://www.scala-lang.org/api/current/scala/collection/LinearSeq.html) and [IndexedSeq](http://www.scala-lang.org/api/current/scala/collection/IndexedSeq.html). These do not add any new operations to the immutable branch, but each offers different performance characteristics: A linear sequence has efficient `head` and `tail` operations, whereas an indexed sequence has efficient `apply`, `length`, and (if mutable) `update` operations. Frequently used linear sequences are `scala.collection.immutable.List` and `scala.collection.immutable.LazyList`. Frequently used indexed sequences are `scala.Array` and `scala.collection.mutable.ArrayBuffer`. The `Vector` class provides an interesting compromise between indexed and linear access. It has both effectively constant time indexing overhead and constant time linear access overhead. Because of this, vectors are a good foundation for mixed access patterns where both indexed and linear accesses are used. You'll learn more about vectors [later](concrete-immutable-collection-classes.html). + +On the mutable branch, `IndexedSeq` adds operations for transforming its elements in place (by contrast with +transformation operations such as `map` and `sorted`, available on the root `Seq`, which return a new collection +instance).
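
The contrast between in-place and copying transformations can be sketched as follows. This is a minimal illustration, assuming Scala 2.13 and an `ArrayBuffer` as the mutable indexed sequence; the object name `InPlaceDemo` and the method `demo` are made up for this example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical demo object, not part of the collections library.
object InPlaceDemo {
  def demo(): (ArrayBuffer[Int], ArrayBuffer[Int], ArrayBuffer[Int]) = {
    val xs = ArrayBuffer(3, 1, 2)

    // `map` and `sorted` return new collections and leave `xs` untouched.
    val doubled = xs.map(_ * 2)   // ArrayBuffer(6, 2, 4)
    val ordered = xs.sorted       // ArrayBuffer(1, 2, 3)

    // The in-place variants mutate `xs` itself.
    xs.mapInPlace(_ * 10)         // xs is now ArrayBuffer(30, 10, 20)
    xs.sortInPlace()              // xs is now ArrayBuffer(10, 20, 30)
    xs.sortInPlaceBy(x => -x)     // xs is now ArrayBuffer(30, 20, 10)

    (doubled, ordered, xs)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Because the in-place methods return the receiver, they can also be chained, at the cost of losing the original contents of `xs`.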
+ +#### Operations in Class mutable.IndexedSeq #### + +| WHAT IT IS | WHAT IT DOES| +| ------ | ------ | +| **Transformations:** | | +| `xs.mapInPlace(f)` |Transforms all the elements of `xs` by applying the `f` function to each of them.| +| `xs.sortInPlace()` |Sorts the collection `xs`.| +| `xs.sortInPlaceWith(c)` |Sorts the collection `xs` according to the given comparison function `c`.| +| `xs.sortInPlaceBy(f)` |Sorts the collection `xs` according to an ordering defined on the result of the application of the function `f` to each element.| + +### Buffers ### + +An important sub-category of mutable sequences is `Buffer`s. They allow not only updates of existing elements but also element additions, insertions and removals. The principal new methods supported by a buffer are `append` and `appendAll` for element addition at the end, `prepend` and `prependAll` for addition at the front, `insert` and `insertAll` for element insertions, as well as `remove`, `subtractOne` and `subtractAll` for element removal. These operations are summarized in the following table. + +Two often used implementations of buffers are `ListBuffer` and `ArrayBuffer`. As the name implies, a `ListBuffer` is backed by a `List`, and supports efficient conversion of its elements to a `List`, whereas an `ArrayBuffer` is backed by an array, and can be quickly converted into one. + +#### Operations in Class Buffer #### + +| WHAT IT IS | WHAT IT DOES| +| ------ | ------ | +| **Additions:** | | +| `buf append x`
or `buf += x` |Appends element `x` to buffer, and returns `buf` itself as result.| +| `buf appendAll xs`
or `buf ++= xs` |Appends all elements in `xs` to buffer.| +| `buf prepend x`
or `x +=: buf` |Prepends element `x` to buffer.| +| `buf prependAll xs`
or `xs ++=: buf` |Prepends all elements in `xs` to buffer.| +| `buf.insert(i, x)` |Inserts element `x` at index `i` in buffer.| +| `buf.insertAll(i, xs)` |Inserts all elements in `xs` at index `i` in buffer.| +| `buf.padToInPlace(n, x)` |Appends element `x` to buffer until it has `n` elements in total.| +| **Removals:** | | +| `buf subtractOne x`
or `buf -= x` |Removes element `x` from buffer.| +| `buf subtractAll xs`
or `buf --= xs` |Removes elements in `xs` from buffer.| +| `buf remove i` |Removes element at index `i` from buffer.| +| `buf.remove(i, n)` |Removes `n` elements starting at index `i` from buffer.| +| `buf trimStart n` |Removes first `n` elements from buffer.| +| `buf trimEnd n` |Removes last `n` elements from buffer.| +| `buf.clear()` |Removes all elements from buffer.| +| **Replacement:** | | +| `buf.patchInPlace(i, xs, n)` |Replaces (at most) `n` elements of buffer by elements in `xs`, starting from index `i` in buffer.| +| **Cloning:** | | +| `buf.clone()` |A new buffer with the same elements as `buf`.| diff --git a/_overviews/collections-2.13/sets.md b/_overviews/collections-2.13/sets.md new file mode 100644 index 0000000000..9189d179e1 --- /dev/null +++ b/_overviews/collections-2.13/sets.md @@ -0,0 +1,153 @@ +--- +layout: multipage-overview +title: Sets + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 6 + +permalink: /overviews/collections-2.13/:title.html +--- + +`Set`s are `Iterable`s that contain no duplicate elements. The operations on sets are summarized in the following tables for general sets, immutable sets and mutable sets. They fall into the following categories: + +* **Tests** `contains`, `apply`, `subsetOf`. The `contains` method asks whether a set contains a given element. The `apply` method for a set is the same as `contains`, so `set(elem)` is the same as `set contains elem`. That means sets can also be used as test functions that return true for the elements they contain. + +For example: + + + scala> val fruit = Set("apple", "orange", "peach", "banana") + fruit: scala.collection.immutable.Set[java.lang.String] = Set(apple, orange, peach, banana) + scala> fruit("peach") + res0: Boolean = true + scala> fruit("potato") + res1: Boolean = false + + +* **Additions** `incl` and `concat` (or `+` and `++`, respectively), which add one or more elements to a set, yielding a new set. 
+* **Removals** `excl` and `removedAll` (or `-` and `--`, respectively), which remove one or more elements from a set, yielding a new set. +* **Set operations** for union, intersection, and set difference. Each of these operations exists in two forms: alphabetic and symbolic. The alphabetic versions are `intersect`, `union`, and `diff`, whereas the symbolic versions are `&`, `|`, and `&~`. In fact, the `++` that `Set` inherits from `Iterable` can be seen as yet another alias of `union` or `|`, except that `++` takes an `IterableOnce` argument whereas `union` and `|` take sets. + +### Operations in Class Set ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Tests:** | | +| `xs contains x` |Tests whether `x` is an element of `xs`. | +| `xs(x)` |Same as `xs contains x`. | +| `xs subsetOf ys` |Tests whether `xs` is a subset of `ys`. | +| **Addition:** | | +| `xs concat ys`
or `xs ++ ys` |The set containing all elements of `xs` as well as all elements of `ys`.| +| **Removal:** | | +| `xs.empty` |An empty set of the same class as `xs`. | +| **Binary Operations:** | | +| `xs intersect ys`
or `xs & ys` |The set intersection of `xs` and `ys`. | +| `xs union ys`
or `xs \| ys` |The set union of `xs` and `ys`. | +| `xs diff ys`
or `xs &~ ys` |The set difference of `xs` and `ys`. | + +Immutable sets offer methods to add or remove elements by returning new `Set`s, as summarized below. + +### Operations in Class immutable.Set ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Additions:** | | +| `xs incl x`
or `xs + x` |The set containing all elements of `xs` as well as `x`.| +| **Removals:** | | +| `xs excl x`
or `xs - x` |The set containing all elements of `xs` except `x`.| +| `xs removedAll ys`
or `xs -- ys` |The set containing all elements of `xs` except the elements of `ys`.| + +Mutable sets additionally offer methods to add, remove, or update elements, which are summarized below. + +### Operations in Class mutable.Set ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Additions:** | | +| `xs addOne x`
or `xs += x` |Adds element `x` to set `xs` as a side effect and returns `xs` itself.| +| `xs addAll ys`
or `xs ++= ys` |Adds all elements in `ys` to set `xs` as a side effect and returns `xs` itself.| +| `xs add x` |Adds element `x` to `xs` and returns `true` if `x` was not previously contained in the set, `false` if it was.| +| **Removals:** | | +| `xs subtractOne x`
or `xs -= x` |Removes element `x` from set `xs` as a side effect and returns `xs` itself.| +| `xs subtractAll ys`
or `xs --= ys` |Removes all elements in `ys` from set `xs` as a side effect and returns `xs` itself.| +| `xs remove x` |Removes element `x` from `xs` and returns `true` if `x` was previously contained in the set, `false` if it was not.| +| `xs filterInPlace p` |Keeps only those elements in `xs` that satisfy predicate `p`.| +| `xs.clear()` |Removes all elements from `xs`.| +| **Update:** | | +| `xs(x) = b` |(or, written out, `xs.update(x, b)`). If boolean argument `b` is `true`, adds `x` to `xs`, otherwise removes `x` from `xs`.| +| **Cloning:** | | +| `xs.clone` |A new mutable set with the same elements as `xs`.| + +The operation `s += elem` adds `elem` to the set `s` as a side effect, and returns the mutated set as a result. Likewise, `s -= elem` removes `elem` from the set, and returns the mutated set as a result. Besides `+=` and `-=` there are also the bulk operations `++=` and `--=` which add or remove all elements of an iterable or an iterator. + +The choice of the method names `+=` and `-=` means that very similar code can work with either mutable or immutable sets. Consider first the following REPL dialogue which uses an immutable set `s`: + + scala> var s = Set(1, 2, 3) + s: scala.collection.immutable.Set[Int] = Set(1, 2, 3) + scala> s += 4 + scala> s -= 2 + scala> s + res2: scala.collection.immutable.Set[Int] = Set(1, 3, 4) + +We used `+=` and `-=` on a `var` of type `immutable.Set`. A statement such as `s += 4` is an abbreviation for `s = s + 4`. So this invokes the addition method `+` on the set `s` and then assigns the result back to the `s` variable. Consider now an analogous interaction with a mutable set. + + + scala> val s = collection.mutable.Set(1, 2, 3) + s: scala.collection.mutable.Set[Int] = Set(1, 2, 3) + scala> s += 4 + res3: s.type = Set(1, 4, 2, 3) + scala> s -= 2 + res4: s.type = Set(1, 4, 3) + +The end effect is very similar to the previous interaction; we start with a `Set(1, 2, 3)` and end up with a `Set(1, 3, 4)`. 
However, even though the statements look the same as before, they do something different. `s += 4` now invokes the `+=` method on the mutable set value `s`, changing the set in place. Likewise, `s -= 2` now invokes the `-=` method on the same set. + +Comparing the two interactions shows an important principle. You often can replace a mutable collection stored in a `val` by an immutable collection stored in a `var`, and _vice versa_. This works at least as long as there are no alias references to the collection through which one can observe whether it was updated in place or whether a new collection was created. + +Mutable sets also provide `add` and `remove` as variants of `+=` and `-=`. The difference is that `add` and `remove` return a Boolean result indicating whether the operation had an effect on the set. + +The current default implementation of a mutable set uses a hashtable to store the set's elements. The default implementation of an immutable set uses a representation that adapts to the number of elements of the set. An empty set is represented by just a singleton object. Sets of sizes up to four are represented by a single object that stores all elements as fields. Beyond that size, immutable sets are implemented as [Compressed Hash-Array Mapped Prefix-trees](concrete-immutable-collection-classes.html). + +A consequence of these representation choices is that, for sets of small sizes (say up to 4), immutable sets are usually more compact and also more efficient than mutable sets. So, if you expect the size of a set to be small, try making it immutable. + +Two subtraits of sets are `SortedSet` and `BitSet`. + +### Sorted Sets ### + +A [SortedSet](http://www.scala-lang.org/api/current/scala/collection/SortedSet.html) is a set that produces its elements (using `iterator` or `foreach`) in a given ordering (which can be freely chosen at the time the set is created).
The default representation of a [SortedSet](http://www.scala-lang.org/api/current/scala/collection/SortedSet.html) is an ordered binary tree which maintains the invariant that all elements in the left subtree of a node are smaller than all elements in the right subtree. That way, a simple in-order traversal can return all tree elements in increasing order. Scala's class [immutable.TreeSet](http://www.scala-lang.org/api/current/scala/collection/immutable/TreeSet.html) uses a _red-black_ tree implementation to maintain this ordering invariant and at the same time keep the tree _balanced_-- meaning that all paths from the root of the tree to a leaf have lengths that differ only by at most one element. + +To create an empty [TreeSet](http://www.scala-lang.org/api/current/scala/collection/immutable/TreeSet.html), you could first specify the desired ordering: + + scala> val myOrdering = Ordering.fromLessThan[String](_ > _) + myOrdering: scala.math.Ordering[String] = ... + +Then, to create an empty tree set with that ordering, use: + + scala> TreeSet.empty(myOrdering) + res1: scala.collection.immutable.TreeSet[String] = TreeSet() + +Or you can leave out the ordering argument but give an element type for the empty set. In that case, the default ordering on the element type will be used. + + scala> TreeSet.empty[String] + res2: scala.collection.immutable.TreeSet[String] = TreeSet() + +If you create new sets from a tree-set (for instance by concatenation or filtering) they will keep the same ordering as the original set. For instance, + + scala> res2 + "one" + "two" + "three" + "four" + res3: scala.collection.immutable.TreeSet[String] = TreeSet(four, one, three, two) + +Sorted sets also support ranges of elements. For instance, the `range` method returns all elements from a starting element up to, but excluding, an end element. Or, the `rangeFrom` method returns all elements greater than or equal to a starting element in the set's ordering.
The result of calls to both methods is again a sorted set. Examples: + + scala> res3.range("one", "two") + res4: scala.collection.immutable.TreeSet[String] = TreeSet(one, three) + scala> res3 rangeFrom "three" + res5: scala.collection.immutable.TreeSet[String] = TreeSet(three, two) + + +### Bitsets ### + +Bitsets are sets of non-negative integer elements that are implemented in one or more words of packed bits. The internal representation of a [BitSet](http://www.scala-lang.org/api/current/scala/collection/BitSet.html) uses an array of `Long`s. The first `Long` covers elements from 0 to 63, the second from 64 to 127, and so on. (Immutable bitsets of elements in the range of 0 to 127 optimize the array away and store the bits directly in one or two `Long` fields.) For every `Long`, each of its 64 bits is set to 1 if the corresponding element is contained in the set, and is unset otherwise. It follows that the size of a bitset depends on the largest integer that's stored in it. If `N` is that largest integer, then the size of the set is `N/64` `Long` words, or `N/8` bytes, plus a small number of extra bytes for status information. + +Bitsets are hence more compact than other sets if they contain many small elements. Another advantage of bitsets is that operations such as membership test with `contains`, or element addition and removal with `+=` and `-=` are all extremely efficient. diff --git a/_overviews/collections-2.13/strings.md b/_overviews/collections-2.13/strings.md new file mode 100644 index 0000000000..1b84877b51 --- /dev/null +++ b/_overviews/collections-2.13/strings.md @@ -0,0 +1,30 @@ +--- +layout: multipage-overview +title: Strings + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 11 + +permalink: /overviews/collections-2.13/:title.html +--- + +Like arrays, strings are not directly sequences, but they can be converted to them, and they also support all sequence operations.
Here are some examples of operations you can invoke on strings. + + scala> val str = "hello" + str: java.lang.String = hello + scala> str.reverse + res6: String = olleh + scala> str.map(_.toUpper) + res7: String = HELLO + scala> str drop 3 + res8: String = lo + scala> str.slice(1, 4) + res9: String = ell + scala> val s: Seq[Char] = str + s: Seq[Char] = hello + +These operations are supported by two implicit conversions. The first, low-priority conversion maps a `String` to a `WrappedString`, which is a subclass of `immutable.IndexedSeq`. This conversion got applied in the last line above where a string got converted into a `Seq`. The other, high-priority conversion maps a string to a `StringOps` object, which adds all methods on immutable sequences to strings. This conversion was implicitly inserted in the method calls of `reverse`, `map`, `drop`, and `slice` in the example above. diff --git a/_overviews/collections-2.13/trait-iterable.md b/_overviews/collections-2.13/trait-iterable.md new file mode 100644 index 0000000000..3ae2ecfb99 --- /dev/null +++ b/_overviews/collections-2.13/trait-iterable.md @@ -0,0 +1,148 @@ +--- +layout: multipage-overview +title: Trait Iterable + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 4 + +permalink: /overviews/collections-2.13/:title.html +--- + +At the top of the collection hierarchy is trait `Iterable`. All methods in this trait are defined in terms of an abstract method, `iterator`, which yields the collection's elements one by one. + + def iterator: Iterator[A] + +Collection classes that implement `Iterable` just need to define this method; all other methods can be inherited from `Iterable`. + +`Iterable` also defines many concrete methods, which are all listed in the following table. These methods fall into the following categories: + +* **Addition**, `concat`, which appends two collections together, or appends all elements of an iterator to a collection.
+* **Map** operations `map`, `flatMap`, and `collect`, which produce a new collection by applying some function to collection elements. +* **Conversions** `toArray`, `toList`, `toIterable`, `toSeq`, `toIndexedSeq`, `toStream`, `toSet`, `toMap`, which turn an `Iterable` collection into something more specific. All these conversions return their receiver argument unchanged if the run-time type of the collection already matches the demanded collection type. For instance, applying `toList` to a list will yield the list itself. +* **Copying operations** `copyToArray`. As its name implies, this copies collection elements to an array. +* **Size info** operations `isEmpty`, `nonEmpty`, `size`, `knownSize`, `sizeIs`. Computing the number of elements of a collection can require a traversal in some cases (e.g. `List`). In other cases the collection can have an infinite number of elements (e.g. `LazyList.from(1)`). +* **Element retrieval** operations `head`, `last`, `headOption`, `lastOption`, and `find`. These select the first or last element of a collection, or else the first element matching a condition. Note, however, that not all collections have a well-defined meaning of what "first" and "last" mean. For instance, a hash set might store elements according to their hash keys, which might change from run to run. In that case, the "first" element of a hash set could also be different for every run of a program. A collection is _ordered_ if it always yields its elements in the same order. Most collections are ordered, but some (_e.g._ hash sets) are not-- dropping the ordering gives a little bit of extra efficiency. Ordering is often essential to give reproducible tests and to help in debugging. That's why Scala collections give ordered alternatives for all collection types. For instance, the ordered alternative for `HashSet` is `LinkedHashSet`.
+* **Sub-collection retrieval operations** `tail`, `init`, `slice`, `take`, `drop`, `takeWhile`, `dropWhile`, `filter`, `filterNot`, `withFilter`. These all return some sub-collection identified by an index range or some predicate. +* **Subdivision operations** `splitAt`, `span`, `partition`, `partitionMap`, `groupBy`, `groupMap`, `groupMapReduce`, which split the elements of this collection into several sub-collections. +* **Element tests** `exists`, `forall`, `count` which test collection elements with a given predicate. +* **Folds** `foldLeft`, `foldRight`, `reduceLeft`, `reduceRight` which apply a binary operation to successive elements. +* **Specific folds** `sum`, `product`, `min`, `max`, which work on collections of specific types (numeric or comparable). +* **String** operations `mkString`, `addString`, `className`, which give alternative ways of converting a collection to a string. +* **View** operation: A view is a collection that's evaluated lazily. You'll learn more about views [later](views.html). + +Two more methods exist in `Iterable` that return iterators: `grouped` and `sliding`. These iterators, however, do not return single elements but whole subsequences of elements of the original collection. The maximal size of these subsequences is given as an argument to these methods. The `grouped` method returns its elements in "chunked" increments, whereas `sliding` yields a sliding "window" over the elements.
The difference between the two should become clear by looking at the following REPL interaction: + + scala> val xs = List(1, 2, 3, 4, 5) + xs: List[Int] = List(1, 2, 3, 4, 5) + scala> val git = xs grouped 3 + git: Iterator[List[Int]] = non-empty iterator + scala> git.next() + res3: List[Int] = List(1, 2, 3) + scala> git.next() + res4: List[Int] = List(4, 5) + scala> val sit = xs sliding 3 + sit: Iterator[List[Int]] = non-empty iterator + scala> sit.next() + res5: List[Int] = List(1, 2, 3) + scala> sit.next() + res6: List[Int] = List(2, 3, 4) + scala> sit.next() + res7: List[Int] = List(3, 4, 5) + +### Operations in Class Iterable ### + +| WHAT IT IS | WHAT IT DOES | +| ------ | ------ | +| **Abstract Method:** | | +| `xs.iterator` |An `iterator` that yields every element in `xs`.| +| **Other Iterators:** | | +| `xs foreach f` |Executes function `f` for every element of `xs`.| +| `xs grouped size` |An iterator that yields fixed-sized "chunks" of this collection.| +| `xs sliding size` |An iterator that yields a sliding fixed-sized window of elements in this collection.| +| **Addition:** | | +| `xs concat ys`
(or `xs ++ ys`) |A collection consisting of the elements of both `xs` and `ys`. `ys` is a [IterableOnce](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/IterableOnce.html) collection, i.e., either an [Iterable](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterable.html) or an [Iterator](http://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Iterator.html).| +| **Maps:** | | +| `xs map f` |The collection obtained from applying the function f to every element in `xs`.| +| `xs flatMap f` |The collection obtained from applying the collection-valued function `f` to every element in `xs` and concatenating the results.| +| `xs collect f` |The collection obtained from applying the partial function `f` to every element in `xs` for which it is defined and collecting the results.| +| **Conversions:** | | +| `xs.toArray` |Converts the collection to an array. | +| `xs.toList` |Converts the collection to a list. | +| `xs.toIterable` |Converts the collection to an iterable. | +| `xs.toSeq` |Converts the collection to a sequence. | +| `xs.toIndexedSeq` |Converts the collection to an indexed sequence. | +| `xs.toSet` |Converts the collection to a set. | +| `xs.toMap` |Converts the collection of key/value pairs to a map. If the collection does not have pairs as elements, calling this operation results in a static type error.| +| `xs.to(SortedSet)` | Generic conversion operation that takes a collection factory as parameter. | +| **Copying:** | | +| `xs copyToArray(arr, s, n)`|Copies at most `n` elements of the collection to array `arr` starting at index `s`. The last two arguments are optional.| +| **Size info:** | | +| `xs.isEmpty` |Tests whether the collection is empty. | +| `xs.nonEmpty` |Tests whether the collection contains elements. | +| `xs.size` |The number of elements in the collection. | +| `xs.knownSize` |The number of elements, if this one takes constant time to compute, otherwise `-1`. 
| +| `xs.sizeCompare(ys)` |Returns a negative value if `xs` is shorter than the `ys` collection, a positive value if it is longer, and `0` if they have the same size. Works even if the collection is infinite, for example `LazyList.from(1) sizeCompare List(1, 2)` returns a positive value. | +| `xs.sizeCompare(n)` |Returns a negative value if `xs` is shorter than `n`, a positive value if it is longer, and `0` if it is of size `n`. Works even if the collection is infinite, for example `LazyList.from(1) sizeCompare 42` returns a positive value. | +| `xs.sizeIs < 42`, `xs.sizeIs != 42`, etc. |Provides a more convenient syntax for `xs.sizeCompare(42) < 0`, `xs.sizeCompare(42) != 0`, etc., respectively.| +| **Element Retrieval:** | | +| `xs.head` |The first element of the collection (or, some element, if no order is defined).| +| `xs.headOption` |The first element of `xs` in an option value, or None if `xs` is empty.| +| `xs.last` |The last element of the collection (or, some element, if no order is defined).| +| `xs.lastOption` |The last element of `xs` in an option value, or None if `xs` is empty.| +| `xs find p` |An option containing the first element in `xs` that satisfies `p`, or `None` if no element qualifies.| +| **Subcollections:** | | +| `xs.tail` |The rest of the collection except `xs.head`. | +| `xs.init` |The rest of the collection except `xs.last`. 
| +| `xs.slice(from, to)` |A collection consisting of elements in some index range of `xs` (from `from` up to, and excluding `to`).| +| `xs take n` |A collection consisting of the first `n` elements of `xs` (or, some arbitrary `n` elements, if no order is defined).| +| `xs drop n` |The rest of the collection except `xs take n`.| +| `xs takeWhile p` |The longest prefix of elements in the collection that all satisfy `p`.| +| `xs dropWhile p` |The collection without the longest prefix of elements that all satisfy `p`.| +| `xs takeRight n` |A collection consisting of the last `n` elements of `xs` (or, some arbitrary `n` elements, if no order is defined).| +| `xs dropRight n` |The rest of the collection except `xs takeRight n`.| +| `xs filter p` |The collection consisting of those elements of xs that satisfy the predicate `p`.| +| `xs withFilter p` |A non-strict filter of this collection. Subsequent calls to `map`, `flatMap`, `foreach`, and `withFilter` will only apply to those elements of `xs` for which the condition `p` is true.| +| `xs filterNot p` |The collection consisting of those elements of `xs` that do not satisfy the predicate `p`.| +| **Subdivisions:** | | +| `xs splitAt n` |Split `xs` at a position, giving the pair of collections `(xs take n, xs drop n)`.| +| `xs span p` |Split `xs` according to a predicate, giving the pair of collections `(xs takeWhile p, xs.dropWhile p)`.| +| `xs partition p` |Split `xs` into a pair of collections; one with elements that satisfy the predicate `p`, the other with elements that do not, giving the pair of collections `(xs filter p, xs.filterNot p)`| +| `xs groupBy f` |Partition `xs` into a map of collections according to a discriminator function `f`.| +| `xs.groupMap(f)(g)`|Partition `xs` into a map of collections according to a discriminator function `f`, and applies the transformation function `g` to each element in a group.| +| `xs.groupMapReduce(f)(g)(h)`|Partition `xs` according to a discriminator function `f`, and then 
combine the results of applying the function `g` to each element in a group using the `h` function.| +| **Element Conditions:** | | +| `xs forall p` |A boolean indicating whether the predicate `p` holds for all elements of `xs`.| +| `xs exists p` |A boolean indicating whether the predicate `p` holds for some element in `xs`.| +| `xs count p` |The number of elements in `xs` that satisfy the predicate `p`.| +| **Folds:** | | +| `xs.foldLeft(z)(op)` |Apply binary operation `op` between successive elements of `xs`, going left to right and starting with `z`.| +| `xs.foldRight(z)(op)` |Apply binary operation `op` between successive elements of `xs`, going right to left and ending with `z`.| +| `xs reduceLeft op` |Apply binary operation `op` between successive elements of non-empty collection `xs`, going left to right.| +| `xs reduceRight op` |Apply binary operation `op` between successive elements of non-empty collection `xs`, going right to left.| +| **Specific Folds:** | | +| `xs.sum` |The sum of the numeric element values of collection `xs`.| +| `xs.product` |The product of the numeric element values of collection `xs`.| +| `xs.min` |The minimum of the ordered element values of collection `xs`.| +| `xs.max` |The maximum of the ordered element values of collection `xs`.| +| `xs.minOption` |Like `min` but returns `None` if `xs` is empty.| +| `xs.maxOption` |Like `max` but returns `None` if `xs` is empty.| +| **Strings:** | | +| `xs.addString(b, start, sep, end)`|Adds a string to `StringBuilder` `b` that shows all elements of `xs` between separators `sep` enclosed in strings `start` and `end`. `start`, `sep`, `end` are all optional.| +| `xs.mkString(start, sep, end)`|Converts the collection to a string that shows all elements of `xs` between separators `sep` enclosed in strings `start` and `end`. 
`start`, `sep`, `end` are all optional.| +| `xs.stringPrefix` |The collection name at the beginning of the string returned from `xs.toString`.| +| **Zippers:** | | +| `xs zip ys` |A collection of pairs of corresponding elements from `xs` and `ys`.| +| `xs.zipAll(ys, x, y)` |A collection of pairs of corresponding elements from `xs` and `ys`, where the shorter sequence is extended to match the longer one by appending elements `x` or `y`.| +| `xs.zipWithIndex` |A collection of pairs of elements from `xs` with their indices.| +| **Views:** | | +| `xs.view` |Produces a view over `xs`.| + +In the inheritance hierarchy below `Iterable` you find three traits: [Seq](https://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Seq.html), [Set](https://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Set.html), and [Map](https://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/Map.html). `Seq` and `Map` implement the [PartialFunction](https://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/PartialFunction.html) trait with its `apply` and `isDefinedAt` methods, each implemented differently. `Set` gets its `apply` method from [SetOps](https://www.scala-lang.org/api/{{ site.scala-213-version }}/scala/collection/SetOps.html). + +For sequences, `apply` is positional indexing, where elements are always numbered from `0`. That is, `Seq(1, 2, 3)(1)` gives `2`. For sets, `apply` is a membership test. For instance, `Set('a', 'b', 'c')('b')` gives `true` whereas `Set()('a')` gives `false`. Finally, for maps, `apply` is a selection. For instance, `Map('a' -> 1, 'b' -> 10, 'c' -> 100)('b')` gives `10`. + +In the following, we will explain each of the three kinds of collections in more detail.
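The three meanings of `apply` described above can be checked with a small sketch. The object name `ApplyDemo` and its field names are hypothetical and exist only for illustration; the collection calls themselves are the ones named in the text, plus `isDefinedAt` from `PartialFunction`.

```scala
// Illustrative only: `ApplyDemo` and its member names are made up for this sketch.
object ApplyDemo {
  // Seq: apply is positional indexing, counted from 0.
  val second: Int = Seq(1, 2, 3)(1)

  // Set: apply is a membership test.
  val hasB: Boolean = Set('a', 'b', 'c')('b')
  val hasZ: Boolean = Set('a', 'b', 'c')('z')

  // Map: apply selects the value stored under a key.
  val selected: Int = Map('a' -> 1, 'b' -> 10, 'c' -> 100)('b')

  // Seq and Map are partial functions, so isDefinedAt reports
  // whether apply would succeed for a given index or key.
  val seqDefinedAt5: Boolean = Seq(1, 2, 3).isDefinedAt(5)
  val mapDefinedAtB: Boolean = Map('a' -> 1, 'b' -> 10).isDefinedAt('b')

  def main(args: Array[String]): Unit = {
    println(s"second=$second hasB=$hasB hasZ=$hasZ selected=$selected")
    println(s"seqDefinedAt5=$seqDefinedAt5 mapDefinedAtB=$mapDefinedAtB")
  }
}
```

Calling `isDefinedAt` before `apply` is one way to avoid the exception that `apply` throws for an out-of-range index or a missing key.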
diff --git a/_overviews/collections-2.13/views.md b/_overviews/collections-2.13/views.md new file mode 100644 index 0000000000..a9ec4c3d25 --- /dev/null +++ b/_overviews/collections-2.13/views.md @@ -0,0 +1,116 @@ +--- +layout: multipage-overview +title: Views + +discourse: true + +partof: collections-213 +overview-name: Collections + +num: 14 + +permalink: /overviews/collections-2.13/:title.html +--- + +Collections have quite a few methods that construct new collections. Examples are `map`, `filter` or `++`. We call such methods *transformers* because they take at least one collection as their receiver object and produce another collection as their result. + +There are two principal ways to implement transformers. One is _strict_, that is a new collection with all its elements is constructed as a result of the transformer. The other is non-strict or _lazy_, that is one constructs only a proxy for the result collection, and its elements get constructed only as one demands them. + +As an example of a non-strict transformer consider the following implementation of a lazy map operation: + + def lazyMap[T, U](coll: Iterable[T], f: T => U) = new Iterable[U] { + def iterator = coll.iterator map f + } + +Note that `lazyMap` constructs a new `Iterable` without stepping through all elements of the given collection `coll`. The given function `f` is instead applied to the elements of the new collection's `iterator` as they are demanded. + +Scala collections are by default strict in all their transformers, except for `LazyList`, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and _vice versa_, which is based on collection views. A _view_ is a special kind of collection that represents some base collection, but implements all transformers lazily. + +To go from a collection to its view, you can use the `view` method on the collection. 
If `xs` is some collection, then `xs.view` is the same collection, but with all transformers implemented lazily. To get back from a view to a strict collection, you can use the `to` conversion operation with a strict collection factory as parameter (e.g. `xs.view.to(List)`). + +Let's see an example. Say you have a vector of Ints over which you want to map two functions in succession: + + scala> val v = Vector(1 to 10: _*) + v: scala.collection.immutable.Vector[Int] = + Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) + scala> v map (_ + 1) map (_ * 2) + res5: scala.collection.immutable.Vector[Int] = + Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) + +In the last statement, the expression `v map (_ + 1)` constructs a new vector which is then transformed into a third vector by the second call to `map (_ * 2)`. In many situations, constructing the intermediate result from the first call to map is a bit wasteful. In the example above, it would be faster to do a single map with the composition of the two functions `(_ + 1)` and `(_ * 2)`. If you have the two functions available in the same place you can do this by hand. But quite often, successive transformations of a data structure are done in different program modules. Fusing those transformations would then undermine modularity. A more general way to avoid the intermediate results is by turning the vector first into a view, then applying all transformations to the view, and finally forcing the view to a vector: + + scala> (v.view map (_ + 1) map (_ * 2)).to(Vector) + res12: scala.collection.immutable.Vector[Int] = + Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) + +Let's do this sequence of operations again, one by one: + + scala> val vv = v.view + vv: scala.collection.IndexedSeqView[Int] = View(?) + +The application `v.view` gives you an `IndexedSeqView[Int]`, i.e. a lazily evaluated `IndexedSeq[Int]`. 
As with `LazyList`, +the `toString` operation of views does not force the view elements, which is why the content of `vv` is shown as `View(?)`. + +Applying the first `map` to the view gives: + + scala> vv map (_ + 1) + res13: scala.collection.IndexedSeqView[Int] = View(?) + +The result of the `map` is another `IndexedSeqView[Int]` value. This is in essence a wrapper that *records* the fact that a `map` with function `(_ + 1)` needs to be applied on the vector `v`. It does not apply that map until the view is forced, however. Let's now apply the second `map` to the last result. + + scala> res13 map (_ * 2) + res14: scala.collection.IndexedSeqView[Int] = View(?) + +Finally, forcing the last result gives: + + scala> res14.to(Vector) + res15: scala.collection.immutable.Vector[Int] = + Vector(4, 6, 8, 10, 12, 14, 16, 18, 20, 22) + +Both stored functions get applied as part of the execution of the `to` operation and a new vector is constructed. That way, no intermediate data structure is needed. + +In general, transformation operations applied to views never build a new data structure, and accessing the elements of a view +effectively traverses as few elements as possible of the underlying data structure. Therefore, views have the following +properties: (1) transformers have `O(1)` complexity, and (2) element access operations have the same +complexity as the underlying data structure (for instance, indexed access on an `IndexedSeqView` is constant, otherwise +it is linear). + +There are a few exceptions to these rules, though. For instance, the `sorted` operation cannot satisfy both +properties. Indeed, the whole underlying collection has to be traversed in order to find its minimum element.
On the one +hand, if that traversal happened at the time `sorted` was called, then the first property would be violated (`sorted` +would not be lazy on views). On the other hand, if that traversal happened at the time the resulting view elements were +accessed, then the second property would be violated. For such operations, we decided to violate the first property. +These operations are documented as “always forcing the collection elements”. + +The main reason for using views is performance. You have seen that by switching a collection to a view the construction of intermediate results can be avoided. These savings can be quite important. As another example, consider the problem of finding the first palindrome in a list of words. A palindrome is a word which reads backwards the same as forwards. Here are the necessary definitions: + + def isPalindrome(x: String) = x == x.reverse + def findPalindrome(s: Seq[String]) = s find isPalindrome + +Now, assume you have a very long sequence `words` and you want to find a palindrome in the first million words of that sequence. Can you re-use the definition of `findPalindrome`? Of course, you could write: + + findPalindrome(words take 1000000) + +This nicely separates the two aspects of taking the first million words of a sequence and finding a palindrome in it. But the downside is that it always constructs an intermediary sequence consisting of one million words, even if the first word of that sequence is already a palindrome. So potentially, 999'999 words are copied into the intermediary result without being inspected at all afterwards. Many programmers would give up here and write their own specialized version of finding palindromes in some given prefix of an argument sequence. But with views, you don't have to. Simply write: + + findPalindrome(words.view take 1000000) + +This has the same nice separation of concerns, but instead of a sequence of a million elements it will only construct a single lightweight view object.
This way, you do not need to choose between performance and modularity. + +After having seen all these nifty uses of views, you might wonder why we have strict collections at all. One reason is that performance comparisons do not always favor lazy over strict collections. For smaller collection sizes the added overhead of forming and applying closures in views is often greater than the gain from avoiding the intermediary data structures. Probably a more important reason is that evaluation in views can be very confusing if the delayed operations have side effects. + +Here's an example which bit a few users of versions of Scala before 2.8. In these versions the `Range` type was lazy, so it behaved in effect like a view. People were trying to create a number of actors like this: + + val actors = for (i <- 1 to 10) yield actor { ... } + +They were surprised that none of the actors was executing afterwards, even though the `actor` method should create and start an actor from the code that's enclosed in the braces following it. To explain why nothing happened, remember that the `for` expression above is equivalent to an application of `map`: + + val actors = (1 to 10) map (i => actor { ... }) + +Since previously the range produced by `(1 to 10)` behaved like a view, the result of the map was again a view. That is, no element was computed, and, consequently, no actor was created! Actors would have been created by forcing the range of the whole expression, but it's far from obvious that this is what was required to make the actors do their work. + +To avoid surprises like this, the current Scala collections library has more regular rules. All collections except lazy lists and views are strict. The only way to go from a strict to a lazy collection is via the `view` method. The only way to go back is via `to`. So the `actors` definition above would now behave as expected in that it would create and start 10 actors.
To get back the surprising previous behavior, you'd have to add an explicit `view` method call: + + val actors = for (i <- (1 to 10).view) yield actor { ... } + +In summary, views are a powerful tool to reconcile concerns of efficiency with concerns of modularity. But in order not to be entangled in aspects of delayed evaluation, you should restrict views to purely functional code where collection transformations do not have side effects. What's best avoided is a mixture of views and operations that create new collections while also having side effects. diff --git a/resources/images/tour/Makefile b/resources/images/tour/Makefile index ac6f9a458e..6782bb92d9 100644 --- a/resources/images/tour/Makefile +++ b/resources/images/tour/Makefile @@ -10,9 +10,12 @@ type-casting: dot -Tsvg $@-diagram.dot -o $@-diagram.svg collections: dot -Tsvg $@-diagram.dot -o $@-diagram.svg + dot -Tsvg $@-diagram-213.dot -o $@-diagram-213.svg collections-immutable: dot -Tsvg $@-diagram.dot -o $@-diagram.svg + dot -Tsvg $@-diagram-213.dot -o $@-diagram-213.svg collections-mutable: dot -Tsvg $@-diagram.dot -o $@-diagram.svg + dot -Tsvg $@-diagram-213.dot -o $@-diagram-213.svg collections-legend: dot -Tsvg $@-diagram.dot -o $@-diagram.svg diff --git a/resources/images/tour/collections-diagram-213.dot b/resources/images/tour/collections-diagram-213.dot new file mode 100644 index 0000000000..e26137b51f --- /dev/null +++ b/resources/images/tour/collections-diagram-213.dot @@ -0,0 +1,21 @@ +digraph Collections { + edge [ + color="#7F7F7F" + ]; + node [ + shape="box", + style="rounded, filled", + fontcolor="#FFFFFF", + color="#6DC6E6" + ]; + rankdir="TB"; + + Iterable -> Seq; + Iterable -> Set; + Iterable -> Map; + Seq -> IndexedSeq; + Seq -> LinearSeq; + Set -> SortedSet; + SortedSet -> BitSet; + Map -> SortedMap; +} diff --git a/resources/images/tour/collections-diagram-213.svg b/resources/images/tour/collections-diagram-213.svg new file mode 100644 index 0000000000..b12b436482 --- /dev/null +++ 
b/resources/images/tour/collections-diagram-213.svg @@ -0,0 +1,98 @@ [SVG markup omitted; the surviving labels are the nodes Iterable, Seq, Set, Map, IndexedSeq, LinearSeq, SortedSet, SortedMap and BitSet, with the same edges as collections-diagram-213.dot] diff --git a/resources/images/tour/collections-diagram.svg b/resources/images/tour/collections-diagram.svg index aa2036aeb4..4ecb4da875 100644 --- a/resources/images/tour/collections-diagram.svg +++ b/resources/images/tour/collections-diagram.svg @@ -4,105 +4,105 @@ [SVG markup omitted; the surviving labels are the nodes Traversable, Iterable, Seq, Set, Map, IndexedSeq, LinearSeq, SortedSet, SortedMap and BitSet, each present on both the removed and the added side of the hunk] diff --git a/resources/images/tour/collections-immutable-diagram-213.dot b/resources/images/tour/collections-immutable-diagram-213.dot new file mode 100644 index 0000000000..59258f2a9b --- /dev/null +++ b/resources/images/tour/collections-immutable-diagram-213.dot @@ -0,0 +1,55 @@ +digraph ImmutableCollections { + edge [ + color="#7F7F7F" + ]; + node [ + shape="box", + style="rounded, filled", + fontcolor="#FFFFFF", + color="#6DC6E6" + ]; + rankdir="TB"; + + HashSet [color="#4A5659"]; + TreeSet [color="#4A5659"]; + ListSet [color="#4A5659"]; + HashMap [color="#4A5659"]; + TreeMap [color="#4A5659"]; + ListMap [color="#4A5659"]; + VectorMap [color="#4A5659"]; + Vector
[color="#4A5659"]; + ArraySeq [color="#4A5659"]; + NumericRange [color="#4A5659"]; + String [color="#4A5659"]; + Range [color="#4A5659"]; + List [color="#4A5659"]; + LazyList [color="#4A5659"]; + Queue [color="#4A5659"]; + + Iterable -> Set; + Iterable -> Seq [penwidth="4"]; + Iterable -> Map; + Set -> SortedSet; + Set -> HashSet [penwidth="4"]; + Set -> ListSet; + SortedSet -> BitSet; + SortedSet -> TreeSet; + Seq -> IndexedSeq; + Seq -> LinearSeq [penwidth="4"]; + IndexedSeq -> Vector [penwidth="4"]; + IndexedSeq -> ArraySeq; + IndexedSeq -> NumericRange; + IndexedSeq -> Range; + IndexedSeq -> String [style="dashed"]; + LinearSeq -> List [penwidth="4"]; + LinearSeq -> LazyList; + LinearSeq -> Queue; + Map -> HashMap [penwidth="4"]; + Map -> SortedMap; + Map -> SeqMap; + SortedMap -> TreeMap; + SeqMap -> ListMap; + SeqMap -> VectorMap; + + {rank=same; Seq; TreeMap} +} diff --git a/resources/images/tour/collections-immutable-diagram-213.svg b/resources/images/tour/collections-immutable-diagram-213.svg new file mode 100644 index 0000000000..4902e1a656 --- /dev/null +++ b/resources/images/tour/collections-immutable-diagram-213.svg @@ -0,0 +1,258 @@ [SVG markup omitted; the surviving node and edge labels match collections-immutable-diagram-213.dot] diff --git a/resources/images/tour/collections-immutable-diagram.svg b/resources/images/tour/collections-immutable-diagram.svg index 30e732690e..bb14323280 100644 --- a/resources/images/tour/collections-immutable-diagram.svg +++ b/resources/images/tour/collections-immutable-diagram.svg @@ -4,245 +4,245 @@ [SVG markup omitted; the surviving labels are the nodes HashSet, TreeSet, ListSet, HashMap, TreeMap, ListMap, Vector, NumericRange, String, Range, List, Stack, Stream, Queue, Traversable, Iterable, Set, Seq, Map, SortedSet, IndexedSeq, LinearSeq, SortedMap and BitSet, each present on both the removed and the added side of the hunk]
diff --git a/resources/images/tour/collections-mutable-diagram-213.dot b/resources/images/tour/collections-mutable-diagram-213.dot new file mode 100644 index 0000000000..e47d4ab1bf --- /dev/null +++ b/resources/images/tour/collections-mutable-diagram-213.dot @@ -0,0 +1,84 @@ +digraph MutableCollections { + edge [ + color="#7F7F7F" + ]; + node [ + shape="box", + style="rounded, filled", + fontcolor="#FFFFFF", + color="#6DC6E6" + ]; + rankdir="TB"; + + HashSet [color="#4A5659"]; + LinkedHashSet [color="#4A5659"]; + HashMap [color="#4A5659"]; + WeakHashMap [color="#4A5659"]; + LinkedHashMap [color="#4A5659"]; + ListMap [color="#4A5659"]; + TreeMap [color="#4A5659"]; + ArraySeq [color="#4A5659"]; + ArrayBuffer [color="#4A5659"]; + ArrayDeque [color="#4A5659"]; + StringBuilder [color="#4A5659"]; + ListBuffer [color="#4A5659"]; + Stack [color="#4A5659"]; + Queue [color="#4A5659"]; + PriorityQueue [color="#4A5659"]; + + Iterable -> Map; + Iterable -> Seq [penwidth="4"]; + Iterable -> Set; + Iterable -> PriorityQueue; + Map -> HashMap [penwidth="4"]; + Map -> WeakHashMap; + Map -> TreeMap; + Map -> ListMap; + Map -> MultiMap; + Map -> SeqMap; + SeqMap -> LinkedHashMap; + Set -> HashSet [penwidth="4"]; + Set -> LinkedHashSet; + Set -> SortedSet; + SortedSet -> BitSet; + Seq -> IndexedSeq [penwidth="4"]; + Seq -> Buffer; + ArrayDeque -> Stack; + IndexedSeq -> ArraySeq; + IndexedSeq -> StringBuilder; + IndexedSeq -> ArrayBuffer [penwidth="4"]; + IndexedSeq -> ArrayDeque; + Buffer -> ArrayBuffer [penwidth="4"]; + Buffer -> ArrayDeque; + Buffer -> ListBuffer; + ArrayDeque -> Queue; + + {rank=same; + Iterable; + PriorityQueue} + {rank=same; + Map; + Set} + {rank=same; + WeakHashMap; + LinkedHashMap; + ListMap; + BitSet; + LinkedHashSet; + Seq} + {rank=same; + HashMap; + MultiMap; + HashSet} + {rank=same; + IndexedSeq; + Buffer} + {rank=same; + ArraySeq; + ArrayBuffer} + {rank=same; + StringBuilder; + ListBuffer; + Queue; + Stack} +} +diff --git
a/resources/images/tour/collections-mutable-diagram-213.svg b/resources/images/tour/collections-mutable-diagram-213.svg new file mode 100644 index 0000000000..760646161e --- /dev/null +++ b/resources/images/tour/collections-mutable-diagram-213.svg @@ -0,0 +1,268 @@ [SVG markup omitted; the surviving node and edge labels match collections-mutable-diagram-213.dot] diff --git a/resources/images/tour/collections-mutable-diagram.svg b/resources/images/tour/collections-mutable-diagram.svg index a4e7ff4630..dee3618e45 100644 --- a/resources/images/tour/collections-mutable-diagram.svg +++ b/resources/images/tour/collections-mutable-diagram.svg @@ -4,420 +4,420 @@ [SVG markup omitted; the surviving labels are the nodes HashSet, ImmutableSetAdaptor, LinkedHashSet, HashMap, OpenHashMap, WeakHashMap, LinkedHashMap, ListMap, TreeMap, ImmutableMapAdaptor, ArraySeq, ArrayBuffer, StringBuilder, ListBuffer, Stack, SynchronizedStack, ArrayStack, PriorityQueue, SynchronizedPriorityQueue, SynchronizedQueue, MutableList, Queue, LinkedList, DoubleLinkedList, Traversable, Iterable, Map, Seq, Set, ObservableMap, SynchronizedMap, MultiMap, LinearSeq, IndexedSeq, Buffer, ObservableSet, SynchronizedSet, SortedSet, BitSet, ObservableBuffer and SynchronizedBuffer, each present on both the removed and the added side of the hunk]