Compatibility for Java 8 Streams through Steppers, a Spliterator/Iterator hybrid #61

Ichoran · 2016-01-08T21:17:48Z

This provides basic support for getting Java 8 Streams from Scala collections by defining a new iterator-like type, Stepper, that is simultaneously a Spliterator (for Stream compatibility) and an Iterator (since Scala collections can all be iterated anyway). It also provides a large Array-like accumulator type, Accumulator (and specialized variants thereof) that can be used to very rapidly collect the results of a Java 8 Stream (comparable to Array, but with unbounded size) and then operate on them as a Stream again.
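To make the "simultaneously a Spliterator and an Iterator" idea concrete, here is a minimal Java sketch. It is not the `Stepper` trait from this PR: the class name `IntArrayStepper` and its constructor are hypothetical, and it only covers an `int` array. It shows one object implementing both `Spliterator.OfInt` and `PrimitiveIterator.OfInt`, which requires overriding `forEachRemaining` because both interfaces supply a default for it.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

// Hypothetical name; a sketch of the idea, not the Stepper trait from this PR.
final class IntArrayStepper implements Spliterator.OfInt, PrimitiveIterator.OfInt {
    private final int[] data;
    private int i;          // next index to hand out
    private final int end;  // exclusive upper bound

    IntArrayStepper(int[] data, int from, int until) {
        this.data = data;
        this.i = from;
        this.end = until;
    }

    // Iterator view
    @Override public boolean hasNext() { return i < end; }
    @Override public int nextInt() {
        if (i >= end) throw new NoSuchElementException();
        return data[i++];
    }

    // Spliterator view
    @Override public boolean tryAdvance(IntConsumer action) {
        if (i >= end) return false;
        action.accept(data[i++]);
        return true;
    }
    @Override public IntArrayStepper trySplit() {
        int mid = (i + end) >>> 1;
        if (mid <= i) return null;            // fewer than two elements left
        IntArrayStepper prefix = new IntArrayStepper(data, i, mid);
        i = mid;                              // this object keeps the second half
        return prefix;
    }
    @Override public long estimateSize() { return end - i; }
    @Override public int characteristics() { return ORDERED | SIZED | SUBSIZED; }

    // Both interfaces declare default forEachRemaining methods, so the class must pick one.
    @Override public void forEachRemaining(IntConsumer action) {
        while (i < end) action.accept(data[i++]);
    }
    @Override public void forEachRemaining(Consumer<? super Integer> action) {
        forEachRemaining((IntConsumer) action::accept);
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4, 5};
        IntArrayStepper s = new IntArrayStepper(xs, 0, xs.length);
        int first = s.nextInt();                             // consume one element as an Iterator
        int rest = StreamSupport.intStream(s, false).sum();  // stream the remainder as a Spliterator
        System.out.println(first + " then " + rest);
    }
}
```

The same object can be advanced by hand and then passed to `StreamSupport.intStream`, which is the property the description relies on.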

Documentation has been extended to cover the use cases for both of these.

A partial set of benchmarks tests a variety of common use cases to make sure performance is acceptable. (It is; in general Steppers used as Iterators are as fast as or faster than Iterator, with a notable exception for both kinds of Queue at the moment.)

…erator and Spliterator and primitive versions thereof. (Will still need four implementations of some things, but this should reduce it a lot.) Also added Steppers for arrays to check performance. (Very good so far.) Have some terminal operations to Stepper. Trying to avoid the Stepper/StepperLike pattern, but I doubt I'll manage.

This consists of
(1) toScala[Coll] extension methods for Java8 Streams
(2) seqStream and parStream extension methods for Scala collections
(3) A manually specialized set of Accumulators that let you quickly save a stream for multiple re-use.
    (a) accumulate and accumulatePrimitive methods on Java8 Streams
    (b) to[Coll] on Accumulators
    (c) .iterator, .toArray, .toList also

There is a lot of redundant code in the Accumulators since it is difficult to pull out common functionality without endangering performance of the manually specialized cases.

Everything goes through the new Stepper trait to partially tame the Spliterator hierarchy.

At this point, Scala collections all go through Accumulator. Need to write something custom for the more important collections.

Tests written. Scaladocs written for the new classes.
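The "save a stream for multiple re-use" idea can be sketched in plain Java. This is an illustration only, not the `Accumulator` class from this PR: `ChunkedDoubleBuffer`, its chunk size, and its method names are all assumptions. It collects a `DoubleStream` into fixed-size primitive chunks (so it never copies on growth the way a resizing array would) and can hand the contents back as a fresh `DoubleStream` any number of times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.DoubleStream;

// Hypothetical sketch; names and chunk size are assumptions, not this PR's Accumulator.
final class ChunkedDoubleBuffer {
    private static final int CHUNK = 1024;
    private final List<double[]> full = new ArrayList<>(); // completely filled chunks
    private double[] current = new double[CHUNK];          // chunk currently being filled
    private int used = 0;                                   // slots used in `current`

    void add(double d) {
        if (used == current.length) {      // current chunk is full: retire it, start a new one
            full.add(current);
            current = new double[CHUNK];
            used = 0;
        }
        current[used++] = d;
    }

    // Combiner for parallel collection; appends the other buffer's elements in order.
    void addAll(ChunkedDoubleBuffer other) {
        other.stream().forEach(this::add);
    }

    long size() { return (long) full.size() * CHUNK + used; }

    // A fresh stream over the stored elements; can be called repeatedly.
    DoubleStream stream() {
        return DoubleStream.concat(
            full.stream().flatMapToDouble(chunk -> Arrays.stream(chunk)),
            Arrays.stream(current, 0, used));
    }

    static ChunkedDoubleBuffer from(DoubleStream s) {
        return s.collect(ChunkedDoubleBuffer::new, ChunkedDoubleBuffer::add, ChunkedDoubleBuffer::addAll);
    }

    public static void main(String[] args) {
        ChunkedDoubleBuffer buf = from(DoubleStream.of(1.0, 2.0, 3.0));
        System.out.println(buf.stream().sum());   // first pass
        System.out.println(buf.stream().max());   // second pass over the same saved data
    }
}
```

The combiner here re-adds elements one by one for simplicity; a tuned version would splice chunks instead.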

adriaanm · 2016-02-23T01:19:44Z

I'll see who's best placed to spend some quality time with this. We're going to push out M4 a bit more because of some unforeseen complications with the trait encoding work, so we'll have plenty of time to review this PR in time for M4 (should be out before end of March).

szeiger · 2016-03-10T16:29:28Z

README.md

-  - [`Spliterator`](https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html)s for Scala collections
+## Converters from Scala collections to Java 8 Streams
+
+Scala collections gain `seqStream` and `parStream` as extension methods that produce a Java 8 Stream


I'm wondering if seq and par in Scala collections should be renamed to sequential and parallel. AFAICT they see little use in practice, seq is confusing (because we also have a Seq type) and Java now uses sequential and parallel for streams.

I wonder the same thing, but I don't think it matters much to this API. We can have the names be as above or sequentialStream and parallelStream or whatever regardless.

Ichoran · 2016-03-19T03:52:31Z

@szeiger - I've commented on everything you said, but haven't changed any source files yet. I'll do that this weekend. Some of the larger issues I may not have time to get to (e.g. typedPrecisely vs. trait self-type), but I'll let you know.

szeiger · 2016-03-23T17:56:27Z

I think the focus for now should be on API-related issues that would easily break source compatibility if we were to change them later, e.g. availability of accumulator methods or the codepoint vs char Stepper on String. Anything else like removing typedPrecisely, adding doc comments, or moving specialized methods up into a parent class for simplification could also be done later. We don't really need any of that for the first release or for this PR, which is already rather unwieldy.

szeiger · 2016-03-29T13:16:59Z

Here are the changes to remove typedPrecisely: szeiger@8ca53f0

I looked at the implementation of trySplit in abstract class DoubleStepperTest extends DoubleStepper. With typedPrecisely:

  public java.util.Spliterator trySplit();
    Code:
       0: aload_0
       1: invokevirtual #356                // Method trySplit:()Ljava/util/Spliterator$OfDouble;
       4: areturn

  public java.util.Spliterator$OfPrimitive trySplit();
    Code:
       0: aload_0
       1: invokevirtual #356                // Method trySplit:()Ljava/util/Spliterator$OfDouble;
       4: areturn

  public java.util.Spliterator$OfDouble trySplit();
    Code:
       0: aload_0
       1: invokestatic  #62                 // Method scala/compat/java8/collectionImpl/DoubleStepper$class.trySplit:(Lscala/compat/java8/collectionImpl/DoubleStepper;)Ljava/util/Spliterator$OfDouble;
       4: areturn

Without:

  public java.util.Spliterator trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public java.util.Spliterator$OfPrimitive trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public java.util.Spliterator$OfDouble trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public scala.compat.java8.collectionImpl.DoubleStepper trySplit();
    Code:
       0: aload_0
       1: invokestatic  #62                 // Method scala/compat/java8/collectionImpl/DoubleStepper$class.trySplit:(Lscala/compat/java8/collectionImpl/DoubleStepper;)Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

So there's an additional method with a DoubleStepper return type but both versions have the same number of indirections and should work equally well with specialization.
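The extra `trySplit` variants in the listings are bridge methods for the narrowed return type. The effect can be reproduced with javac, as a rough analogue of what scalac emits above. The class below is hypothetical (`NarrowingDoubleSpliterator` is not part of this PR): it declares a single `trySplit` with a narrowed return type and uses reflection to list what the compiler actually generated.

```java
import java.lang.reflect.Method;
import java.util.Spliterator;
import java.util.function.DoubleConsumer;

// Hypothetical class: an empty primitive spliterator whose trySplit narrows its return type.
final class NarrowingDoubleSpliterator implements Spliterator.OfDouble {
    @Override public NarrowingDoubleSpliterator trySplit() { return null; } // nothing to split
    @Override public boolean tryAdvance(DoubleConsumer action) { return false; }
    @Override public long estimateSize() { return 0L; }
    @Override public int characteristics() { return SIZED; }

    // Counts the declared trySplit methods that are (or are not) compiler-generated bridges.
    static long countTrySplit(boolean bridge) {
        long n = 0;
        for (Method m : NarrowingDoubleSpliterator.class.getDeclaredMethods()) {
            if (m.getName().equals("trySplit") && m.isBridge() == bridge) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Prints one line per trySplit variant, with its erased return type.
        for (Method m : NarrowingDoubleSpliterator.class.getDeclaredMethods()) {
            if (m.getName().equals("trySplit")) {
                System.out.println(m.getReturnType().getName() + " bridge=" + m.isBridge());
            }
        }
    }
}
```

Each bridge simply forwards to the one hand-written method, which matches the single `invokevirtual` hop visible in the bytecode above.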

szeiger · 2016-03-29T14:04:01Z

src/main/scala/scala/compat/java8/StreamConverters.scala

+  * which is not part of the standard collections hierarchy, or into a named
+  * Scala collection, respectively.
+  *
+  * Generic streams also gain an `unboxed` method that will convert to the


Should we allow such streams in the first place? AFAIK the Java APIs provide a nice (albeit manual) separation of primitive and generic types, so any such stream would have to come from Scala. How about we restrict the seqStream and parStream methods to element types <: AnyRef and remove the unboxed and boxed methods? If people really want to use boxed streams for primitive values, they'd have to cast the collection to an AnyRef element type.

Ah, I see. The boxed method actually comes from Java, so it makes sense to have a corresponding unbox. I'd still restrict the types for StreamConverters though. Implementation: szeiger@0d14e71

Almost all the Java benchmarks showing Java 8 Streams wiping the floor with Scala are using primitive streams. I think they should be as deeply supported as possible. In particular, given the relative difficulty of optimizing tryAdvance-based code, throwing in an extra unboxing step is not as cheap as throwing in an extra method call. It actually adds quite a bit of overhead in my tests. (I forget exactly how much.)

Right, which is why I would prevent the creation of boxed streams for primitive types in the first place. If you still want to do it, you have to cast to the boxed type.
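For readers less familiar with the Java side of this distinction, the following sketch shows the two stream shapes under discussion using only standard JDK calls. The class and method names are made up for illustration. `Stream<Double>` carries boxed values and needs an explicit `mapToDouble` step to reach `DoubleStream`, while `DoubleStream.boxed()` goes the other way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper class; only the JDK calls inside it are real API.
final class BoxedVersusPrimitive {
    // Boxed route: every element is a java.lang.Double that must be unboxed before summing.
    static double sumBoxed(List<Double> xs) {
        Stream<Double> boxed = xs.stream();
        return boxed.mapToDouble(Double::doubleValue).sum();
    }

    // Primitive route: Arrays.stream(double[]) yields a DoubleStream directly, with no boxing.
    static double sumPrimitive(double[] xs) {
        return Arrays.stream(xs).sum();
    }

    // Going the other way: DoubleStream.boxed() is the JDK method the thread refers to.
    static Stream<Double> rebox(double[] xs) {
        return Arrays.stream(xs).boxed();
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(Arrays.asList(1.0, 2.0, 3.5)));
        System.out.println(sumPrimitive(new double[] {1.0, 2.0, 3.5}));
        System.out.println(rebox(new double[] {1.0, 2.0, 3.5}).count());
    }
}
```

The sketch says nothing about how large the unboxing cost is; that would need the benchmarks mentioned in the thread.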

But what if you want a stream of Char? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)

I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)

> But what if you want a stream of Char? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)

The supported primitive stream types can support all other primitive types through widening conversions, so I'd say use IntStream for Char. To make this feasible, we'd also need a way to perform a narrowing conversion when collecting into an unboxed primitive collection (like Array[Char]).

> I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)

The loophole is abstracting over types. The current design allows you to get a Stream for a collection of T <: Any and since it's implemented through an extension method with static dispatch, all primitive types are boxed. Restricting the supported types to T <: AnyRef would make that illegal, so you're forced to special-case primitive types (or cast if you're willing to accept the boxing overhead).

Case in point: CharSequence.chars() returns an IntStream
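A small Java sketch of that widening-then-narrowing round trip, assuming nothing beyond the JDK: `CharSequence.chars()` widens each `char` to `int`, the pipeline runs as an `IntStream`, and the result is narrowed back by an explicit cast when it is collected into a `char[]`. The class and method names are hypothetical.

```java
// Hypothetical helper; illustrates widening on the way in and narrowing on the way out.
final class CharsViaIntStream {
    static char[] upperCase(CharSequence s) {
        // chars() yields an IntStream of the char values, each widened to int.
        int[] widened = s.chars().map(c -> Character.toUpperCase(c)).toArray();
        // The JDK has no CharStream, so narrowing back to char is a manual step.
        char[] out = new char[widened.length];
        for (int k = 0; k < widened.length; k++) {
            out[k] = (char) widened[k];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(upperCase("stepper")));
    }
}
```

The manual loop at the end is the narrowing conversion that a collection-side helper would have to provide for something like Array[Char].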

szeiger · 2016-03-30T11:40:42Z

Some optimizations for the extension methods: szeiger@e6edfa8

szeiger · 2016-03-31T16:14:13Z

Since this looks mostly good, with only a few details left to investigate, I'm going to merge it and then create a cleaned-up PR with some changes that I expect to be uncontroversial. This is an experimental module and the upcoming release is a milestone, so we can still make changes to the API.

Ichoran added 30 commits January 8, 2016 13:12

Further work on core stepper API.

f476c42

Basic conversion of Spliterator to Stepper works.

1cf9ea2

Work in progress on Filter etc.

ec8ebd1

Refactored names a bit, tried to make interface work a little better.

bc909f4

Added outline of a test for Stepper.

7b6e118

Converted to covariant 2nd arg in Stepper, introduced StepperLike.

5f7d2d7

Moved classes outside of StepperTest to see if this helps exception.

649dec4

Fixed TryStepper impl to avoid specialization bug. Added tests.

e6c8ffa

Finished testing terminal operations for Stepper, save spliterator which isn't written.

2550aff

Added proxy-to-spliterator to default Stepper trait.

6ee09ef

(Should override everywhere important.)

Tests on all the various permutations of Stepper-Spliterator converters.

37ea39c

Fixed a bug in tryStep of TryStepper by adding a protected tryUncached method.

04bc02a

Finished tests of basic steppers and fixed bugs in TryStepper.

e7033e3

Pulled out abstractable part of Array steppers for use by any Indexed.

302fd15

Abstracted indexed stepping over Double, Int, Long specialized versions.

ecf85a3

IndexedSeqOptimized added to Stream compat via Stepper.

e562451

Includes everything through StreamConverters and tests.

FlatHashTable-based collections have steppers and can be converted to streams.

390bbab

Vector works with Stepper, and if you use the raw Stepper interface

a482301

it's 10% faster than Iterator.

Added a bunch of comprehensiveness tests to StepConvertersTests.

4c72bd9

(Right now we don't do so well at covering them, though.)

Ranges step quickly. Nearly complete coverage testing for generics.

40c0117

Filled in all the stepper tests and simplified IndexedSeqOptimized dispatching.

4013be6

Simplified StepsLikeIndexed implementation.

e4c18a3

Added missing file! Also covered all of IndexedSeqLike.

27e72aa

Note that almost all of IndexedSeqLike is Optimized (or should be) or I already have a workaround (e.g. Vector).

LinearSeq conversions in.

ffeb369

Added iterator interface. Probably need to have it store initial segment

a221edc

like WithTail does. Leave that for speed tests later.

Added key and value steppers to maps.

f9d4e35

Distinguish between good and sorta acceptable (sequential traversal) access for Stepper in tests.

d5bb589

Linked and regular HashMaps working.

f13b0f3

szeiger reviewed Mar 10, 2016

szeiger reviewed Mar 29, 2016

szeiger merged commit 6caeefa into scala:master Mar 31, 2016

szeiger mentioned this pull request Mar 31, 2016

Improvements to Java 8 Stream support #65

Merged
