Skip to content

Wanted: Strawman proposals for new collections architecture #818

Closed
@odersky

Description

@odersky

A redesign of the standard library is on the roadmap for one of the next Scala versions (could be as early as 2.13). Some requirements for the redesign should be:

  1. Normal usage of collections should stay the same. That is, the standard operations
    on collections should still be available under the same names. Some more exotic and advanced
    scenarios such as breakOut can be dropped if alternative ways to achieve the same functionality
    exist.
  2. User-defined implementations of collections should port to the new library as far as is reasonable. We should allow some breakage here, if it is necessary to achieve other goals.
  3. We should strive for simplicity in APIs and implementations.
  4. We should strive to better separate interfaces from implementation and avoid the fragile base
    class problem, where too much gets inherited automatically.
  5. We should try to simplify the inheritance graphs, in particular those of often-used collections
    such as lists.
  6. We should improve the integration of strict and lazy collections by means of a better architecture for views. Views should avoid accidental forcing, and should omit transforms from collections that require such forcing. However, forcing is still needed to support aggregations.
  7. We should try to integrate Java8 streams and/or have our own high-performance implementations for
    parallel collections.
  8. We should generally be at least on par with current collections in what concerns efficiency. In particular, we should still allow specializations of collection operations for particular implementations. These optimizations should still work if the static collection type is abstract. E.g. an optimized implementation of ++ on ArrayBuffers should be called even if the static types of the operands of ++ are Iterables.
  9. The design should be friendly to optimizations such as inlining and, possibly, more advanced
    whole program optimizations.

To gain experience it would be good to experiment with some prototypes that illustrate possible designs and can point to strengths and weaknesses. A full collections design is too heavy to fill this role. It takes too long to write and understand. As an alternative, I propose to implement just a subset of the required functionality. The subset should implementable in about 500 LOC or less and at the same time should be as representative as possible of the whole collections. The API should be the same for all strawman proposals.

After some experimentation, I came up with the following proposal for the API to be implemented by strawman proposals. There's still some modest room to add things if people feel it is important.

Base Traits

Iterator
Iterable
Seq

For now we leave out maps and sets (which does not say they are not important). We gloss over the mutabile/immutable distinction. All collections are in the same package (this is only for the strawman, to keep things simple, not for the full collections, of course).

Iterable and Seq are considered collection types, but Iterator is not.

Collection Implementations

List                    
ListBuffer   
ArrayBuffer          
java.lang.String  
View                 

List is a simplified version of Scala lists. It's an immutable collection favoring linear access patterns.
List operations should be tail-recursive, hence the addition of ListBuffer to hold intermediate results. ArrayBuffer is a mutable collection that supports random access. String should demonstrate how the architecture integrates externally defined types. String should be seen by Scala as an immutable random access sequence of Char. Finally, views should be constructible over all other collections. They are immutable and lazy.

List, ListBuffer and ArrayBuffer should have companion objects that let one construct collections given their elements in the usual way:

List(1, 2, 3)
ListBuffer("a", "bc")
ArrayBuffer()

Collection operations

The following operations should be supported by all collections.

foldLeft
foldRight
isEmpty
head
iterator
view
to

foldLeft and foldRight are the basis of all sorts of aggregations. isEmpty and head exemplify
tests and element projections. iterator and view construct iterators and views over a collection. collect is not part of the current Scala collections, but is useful to support views and Java 8 streams. It should construct a new collection of a given type from the elements of the receiver. An invocation pattern of to could be either of the following:

xs.to[List]
xs.to(List)

Strawman collections also should support the following transforms:

filter
map
flatMap
++
zip
partition
drop

map and flatMap together support all monadic operations. ++ and zip are examples of operations with multiple inputs. drop was chosen as a typical projection that yields a sub-collection. partition was chosen in addition filter because it exemplifies transforms with multiple outputs.

Strawman Seqs (but not necessarily Iterables or Views) should also support the following operations

length
apply
indexWhere
reverse

Sequences in a way are defined by their length and their indexing method apply. indexWhere is an example of a method that combines indices and sequences in an interesting way. reverse is an example of a transform that suggests an access pattern different from left-to-right.

ArrayBuffer and ListBuffer should in addition support the following mutation operations

+=
++=
trimStart

Required Optimizations

There are a number of possible specializations that the strawman architecture should support. Here are
two examples:

  1. xs.drop(n) on a list xs should take time proportional to n, the retained part of the list should not be copied.
  2. xs ++ ys on array buffers xs and ys should be implemented by efficient array copy operations. s1 ++ s2 on strings should use native string concatenation.
  3. partition should require only a single collection traversal if the underlying collection is strict (i.e., not a view).

These specializations should be performed independently of the static types of their arguments.

Why No Arrays?

Collections definitely should support arrays as they do now. In particular, arrays should have the same representation as in Java and all collection operations should be applicable to them. Arrays are left out of the strawman requirements because of the bulk their implementation would add. Even though no explicit implementation is demanded, we should still check all designs for how they would support arrays.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions