Skip to content

Compatibility for Java 8 Streams through Steppers, a Spliterator/Iterator hybrid #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 56 commits into from
Mar 31, 2016

Conversation

Ichoran
Copy link
Contributor

@Ichoran Ichoran commented Jan 8, 2016

This provides basic support for getting Java 8 Streams from Scala collections by defining a new iterator-like type, Stepper, that is simultaneously a Spliterator (for Stream compatibility) and an Iterator (since Scala collections can all be iterated anyway). It also provides a large Array-like accumulator type, Accumulator (and specialized variants thereof) that can be used to very rapidly collect the results of a Java 8 Stream (comparable to Array, but with unbounded size) and then operate on them as a Stream again.

Documentation has been extended to cover the use cases on both of these.

A partial set of benchmarks test a variety of common use cases to make sure performance is acceptable. (It is; in general Steppers used as Iterators are as fast or faster than Iterator, with a notable exception for both kinds of Queue at the moment.)

Ichoran added 30 commits January 8, 2016 13:12
…erator and Spliterator and primitive versions thereof. (Will still need four implementations of some things, but this should reduce it a lot.)

Also added Steppers for arrays to check performance.  (Very good so far.)

Have some terminal operations to Stepper.  Trying to avoid the Stepper/StepperLike pattern, but I doubt I'll manage.
(Should overrride everywhere important.)
This consists of
  (1) toScala[Coll] extension methods for Java8 Streams
  (2) seqStream and parStream extension methods for Scala collections
  (3) A manually specialized set of Accumulators that let you quickly
      save a stream for multiple re-use.
    (a) accumulate and accumulatePrimitive methods on Java8 Streams
    (b) to[Coll] on Accumulators
    (c) .iterator, .toArray, .toList also

There is a lot of redundant code in the Accumulators since it is difficult
to pull out common functionality without endangering performance of the
manually specialized cases.

Everything goes through the new Stepper trait to partially tame the
Spliterator hierarchy.

At this point, Scala collections all go through Accumulator.  Need to write
something custom for the more important collections.

Tests written.

Scaladocs written for the new classes.
Includes everything through StreamConverters and tests.
(Right now we don't do so well at covering them, though.)
Note that almost all of IndexedSeqLike is Optimized (or should be) or I already have a workaround (e.g. Vector).
like WithTail does.  Leave that for speed tests later.
@adriaanm
Copy link
Contributor

I'll see who's best placed to spend some quality time with this. We're going to push out M4 a bit more because of some unforeseen complications with the trait encoding work, so we'll have plenty of time to review this PR in time for M4 (should be out before end of March).

- [`Spliterator`](https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html)s for Scala collections
## Converters from Scala collections to Java 8 Streams

Scala collections gain `seqStream` and `parStream` as extension methods that produce a Java 8 Stream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if seq and par in Scala collections should be renamed to sequential and parallel. AFAICT they see little use in practice, seq is confusing (because we also have a Seq type) and Java now uses sequential and parallel for streams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder the same thing, but I don't think it matters much to this API. We can have the names be as above or sequentialStream and parallelStream or whatever regardless.

@Ichoran
Copy link
Contributor Author

Ichoran commented Mar 19, 2016

@szeiger - I've commented on everything you said, but haven't changed any source files yet. I'll do that this weekend. Some of the larger issues I may not have time to get to (e.g. typedPrecisely vs. trait self-type), but I'll let you know.

@szeiger
Copy link
Contributor

szeiger commented Mar 23, 2016

I think the focus for now should be on API-related issues that would easily break source compatibility if we were to change them later, e.g. availability of accumulator methods or the codepoint vs char Stepper on String. Anything else like removing typedPrecisely, adding doc comments, or moving specialized methods up into a parent class for simplification could also be done later. We don't really need any of that for the first release or for this PR, which is already rather unwieldy.

@szeiger
Copy link
Contributor

szeiger commented Mar 29, 2016

Here are the changes to remove typedPrecisely: szeiger@8ca53f0

I looked at the implementation of trySplit in abstract class DoubleStepperTest extends DoubleStepper. With typedPrecisely:

   public java.util.Spliterator trySplit();
    Code:
       0: aload_0
       1: invokevirtual #356                // Method trySplit:()Ljava/util/Spliterator$OfDouble;
       4: areturn

  public java.util.Spliterator$OfPrimitive trySplit();
    Code:
       0: aload_0
       1: invokevirtual #356                // Method trySplit:()Ljava/util/Spliterator$OfDouble;
       4: areturn

   public java.util.Spliterator$OfDouble trySplit();
    Code:
       0: aload_0
       1: invokestatic  #62                 // Method scala/compat/java8/collectionImpl/DoubleStepper$class.trySplit:(Lscala/compat/java8/collectionImpl/DoubleStepper;)Ljava/util/Spliterator$OfDouble;
       4: areturn

Without:

  public java.util.Spliterator trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public java.util.Spliterator$OfPrimitive trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public java.util.Spliterator$OfDouble trySplit();
    Code:
       0: aload_0
       1: invokevirtual #354                // Method trySplit:()Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

  public scala.compat.java8.collectionImpl.DoubleStepper trySplit();
    Code:
       0: aload_0
       1: invokestatic  #62                 // Method scala/compat/java8/collectionImpl/DoubleStepper$class.trySplit:(Lscala/compat/java8/collectionImpl/DoubleStepper;)Lscala/compat/java8/collectionImpl/DoubleStepper;
       4: areturn

So there's an additional method with a DoubleStepper return type but both versions have the same number of indirections and should work equally well with specialization.

* which is not part of the standard collections hierarchy, or into a named
* Scala collection, respectively.
*
* Generic streams also gain an `unboxed` method that will convert to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow such streams in the first place? AFAIK the Java APIs provide a nice (albeit manual) separation of primitive and generic types, so any such stream would have to come from Scala. How about we restrict the seqStream and parStream methods to element types <: AnyRef and remove the unboxed and boxed methods? If people really want to use boxed streams for primitive values, they'd have to cast the collection to an AnyRef element type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The boxed method actually comes from Java, so it makes sense to have a corresponding unbox. I'd still restrict the types for StreamConverters though. Implementation: szeiger@0d14e71

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all the Java benchmarks showing Java 8 Streams wiping the floor with Scala are using primitive streams. I think they should be as deeply supported as possible. In particular, giving the relative difficulty of optimizing tryAdvance-based code, just throwing in an extra unboxing step is not just like throwing in an extra method call to do it. It actually adds quite a bit of overhead in my tests. (I forget exactly how much.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, which is why I would prevent the creation of boxed streams for primitive types in the first place. If you still want to do it, you have to cast to the boxed type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if you want a stream of Char? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)

I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if you want a stream of Char? There is no primitive type that corresponds. Do we want to make this intentionally awkward? (What do we even do?)

The supported primitive stream types can support all other primitive types through widening conversions, so I'd say use IntStream for Char. To make this feasible, we'd also need a way to perform a narrowing conversion when collecting into an unboxed primitive collection (like Array[Char]).

I already have designed the priority hierarchy such that if you have a collection of Double you can't conveniently get a Stream instead of a DoubleStream out of it. (I hope. That was my goal.)

The loophole is abstracting over types. The current design allows you to get a Stream for a collection of T <: Any and since it's implemented through an extension method with static dispatch, all primitive types are boxed. Restricting the supported types to T <: AnyRef would make that illegal, so you're forced to special-case primitive types (or cast if you're willing to accept the boxing overhead).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case in point: CharSequence.chars() returns an IntStream

@szeiger
Copy link
Contributor

szeiger commented Mar 30, 2016

Some optimizations for the extension methods: szeiger@e6edfa8

@szeiger
Copy link
Contributor

szeiger commented Mar 31, 2016

Since this looks mostly good, with only a few details left to investigate, I'm going to merge it and then create a cleaned-up PR with some changes that I expect to be uncontroversial. This is an experimental module and the upcoming release is a milestone, so we can still do change to the API.

@szeiger szeiger merged commit 6caeefa into scala:master Mar 31, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants