-
Notifications
You must be signed in to change notification settings - Fork 102
Compatibility for Java 8 Streams through Steppers, a Spliterator/Iterator hybrid #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…erator and Spliterator and primitive versions thereof. (Will still need four implementations of some things, but this should reduce it a lot.) Also added Steppers for arrays to check performance. (Very good so far.) Have some terminal operations to Stepper. Trying to avoid the Stepper/StepperLike pattern, but I doubt I'll manage.
…ich isn't written.
(Should overrride everywhere important.)
This consists of (1) toScala[Coll] extension methods for Java8 Streams (2) seqStream and parStream extension methods for Scala collections (3) A manually specialized set of Accumulators that let you quickly save a stream for multiple re-use. (a) accumulate and accumulatePrimitive methods on Java8 Streams (b) to[Coll] on Accumulators (c) .iterator, .toArray, .toList also There is a lot of redundant code in the Accumulators since it is difficult to pull out common functionality without endangering performance of the manually specialized cases. Everything goes through the new Stepper trait to partially tame the Spliterator hierarchy. At this point, Scala collections all go through Accumulator. Need to write something custom for the more important collections. Tests written. Scaladocs written for the new classes.
Includes everything through StreamConverters and tests.
it's 10% faster than Iterator.
(Right now we don't do so well at covering them, though.)
Note that almost all of IndexedSeqLike is Optimized (or should be) or I already have a workaround (e.g. Vector).
like WithTail does. Leave that for speed tests later.
…access for Stepper in tests.
I'll see who's best placed to spend some quality time with this. We're going to push out M4 a bit more because of some unforeseen complications with the trait encoding work, so we'll have plenty of time to review this PR in time for M4 (should be out before end of March). |
- [`Spliterator`](https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html)s for Scala collections | ||
## Converters from Scala collections to Java 8 Streams | ||
|
||
Scala collections gain `seqStream` and `parStream` as extension methods that produce a Java 8 Stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if seq
and par
in Scala collections should be renamed to sequential
and parallel
. AFAICT they see little use in practice, seq
is confusing (because we also have a Seq
type) and Java now uses sequential
and parallel
for streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder the same thing, but I don't think it matters much to this API. We can have the names be as above or sequentialStream
and parallelStream
or whatever regardless.
@szeiger - I've commented on everything you said, but haven't changed any source files yet. I'll do that this weekend. Some of the larger issues I may not have time to get to (e.g. typedPrecisely vs. trait self-type), but I'll let you know. |
I think the focus for now should be on API-related issues that would easily break source compatibility if we were to change them later, e.g. availability of accumulator methods or the codepoint vs char Stepper on String. Anything else like removing typedPrecisely, adding doc comments, or moving specialized methods up into a parent class for simplification could also be done later. We don't really need any of that for the first release or for this PR, which is already rather unwieldy. |
Here are the changes to remove I looked at the implementation of
Without:
So there's an additional method with a |
* which is not part of the standard collections hierarchy, or into a named | ||
* Scala collection, respectively. | ||
* | ||
* Generic streams also gain an `unboxed` method that will convert to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we allow such streams in the first place? AFAIK the Java APIs provide a nice (albeit manual) separation of primitive and generic types, so any such stream would have to come from Scala. How about we restrict the seqStream
and parStream
methods to element types <: AnyRef
and remove the unboxed
and boxed
methods? If people really want to use boxed streams for primitive values, they'd have to cast the collection to an AnyRef element type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. The boxed
method actually comes from Java, so it makes sense to have a corresponding unbox
. I'd still restrict the types for StreamConverters though. Implementation: szeiger@0d14e71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all the Java benchmarks showing Java 8 Streams wiping the floor with Scala are using primitive streams. I think they should be as deeply supported as possible. In particular, giving the relative difficulty of optimizing tryAdvance-based code, just throwing in an extra unboxing step is not just like throwing in an extra method call to do it. It actually adds quite a bit of overhead in my tests. (I forget exactly how much.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, which is why I would prevent the creation of boxed streams for primitive types in the first place. If you still want to do it, you have to cast to the boxed type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if you want a stream of Char
? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)
I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if you want a stream of Char? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)
The supported primitive stream types can support all other primitive types through widening conversions, so I'd say use IntStream
for Char
. To make this feasible, we'd also need a way to perform a narrowing conversion when collecting into an unboxed primitive collection (like Array[Char]
).
I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)
The loophole is abstracting over types. The current design allows you to get a Stream for a collection of T <: Any
and since it's implemented through an extension method with static dispatch, all primitive types are boxed. Restricting the supported types to T <: AnyRef
would make that illegal, so you're forced to special-case primitive types (or cast if you're willing to accept the boxing overhead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case in point: CharSequence.chars()
returns an IntStream
Some optimizations for the extension methods: szeiger@e6edfa8 |
Since this looks mostly good, with only a few details left to investigate, I'm going to merge it and then create a cleaned-up PR with some changes that I expect to be uncontroversial. This is an experimental module and the upcoming release is a milestone, so we can still do change to the API. |
This provides basic support for getting Java 8 Streams from Scala collections by defining a new iterator-like type, Stepper, that is simultaneously a Spliterator (for Stream compatibility) and an Iterator (since Scala collections can all be iterated anyway). It also provides a large Array-like accumulator type, Accumulator (and specialized variants thereof) that can be used to very rapidly collect the results of a Java 8 Stream (comparable to Array, but with unbounded size) and then operate on them as a Stream again.
Documentation has been extended to cover the use cases on both of these.
A partial set of benchmarks test a variety of common use cases to make sure performance is acceptable. (It is; in general Steppers used as Iterators are as fast or faster than Iterator, with a notable exception for both kinds of Queue at the moment.)