Skip to content

A new approach to Stepper and Stream materialization #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 30, 2016

Conversation

szeiger
Copy link
Contributor

@szeiger szeiger commented Apr 18, 2016

Getting the most efficient Stepper implementation requires knowledge
of the static type of a collection. This problem could be solved in the
most elegant way by integrating Steppers into the collections framework
as a replacement for Iterators (i.e. every collection gets a stepper
method and iterator delegates to stepper by default).

But this is at odds with the handling of specialized primitive Steppers
in the current implementation. If we want stepper to be an instance
methods instead of an extension method, there needs to be a single such
method for all specialized Steppers. The fundamental change in this
new implementation is to encode the translation from element type to
Stepper type (including widening conversions) as a functional dependency
via the new StepperShape trait.

This greatly reduces the number of implicit methods and classes and
keeps all specialized versions of MakesStepper together. The default
base classes support all unboxing and widening conversions so that a
simple MakesStepper for a collection of boxed elements only needs to
handle the AnyStepper case.

keyStepper and valueStepper are handled in the same way.

StreamConverters use a separate StreamShape for the translation from
element type to BaseStream subtype which is compatible with
StepConverters and supports the same primitive types and widening
conversions.

szeiger added 3 commits April 13, 2016 20:15
- `WrappedArray` now produces the same primitive, non-boxing `Stepper`
  that the underlying `Array` would.

- Plus some improvements to the unit tests and benchmarks.

An interesting observation from the benchmarks: `IntStream.sum` is
horribly slow when used on a seqStream built from an `IntStepper`. It
is still slow when used on a seqStream built directly from an `Array`
with Java’s own stream support. Using a `while` loop on an `IntIterator`
produced by the stream beats both hands down and has the same
performance independent of the stream source.
This was already done for `Array` in the same way. Streams produced by
`Arrays.stream` can be faster than going through a `Stepper`.
Getting the most efficient Stepper implementation requires knowledge
of the static type of a collection. This problem could be solved in the
most elegant way by integrating Steppers into the collections framework
as a replacement for Iterators (i.e. every collection gets a `stepper`
method and `iterator` delegates to `stepper` by default).

But this is at odds with the handling of specialized primitive Steppers
in the current implementation. If we want `stepper` to be an instance
methods instead of an extension method, there needs to be a single such
method for all specialized Steppers. The fundamental change in this
new implementation is to encode the translation from element type to
Stepper type (including widening conversions) as a functional dependency
via the new `StepperShape` trait.

This greatly reduces the number of implicit methods and classes and
keeps all specialized versions of `MakesStepper` together. The default
base classes support all unboxing and widening conversions so that a
simple `MakesStepper` for a collection of boxed elements only needs to
handle the `AnyStepper` case.

`keyStepper` and `valueStepper` are handled in the same way.

`StreamConverters` use a separate `StreamShape` for the translation from
element type to `BaseStream` subtype which is compatible with
`StepConverters` and supports the same primitive types and widening
conversions.
@szeiger
Copy link
Contributor Author

szeiger commented Apr 19, 2016

@Ichoran, do you have time to review this?

@Ichoran
Copy link
Contributor

Ichoran commented Apr 19, 2016

I can on Wednesday morning.

implicit val CharValue = intStreamShape[Char]
implicit val FloatValue = doubleStreamShape[Float]
}
trait StreamShapeLowPrio {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we spell out "Priority" here?

@Ichoran
Copy link
Contributor

Ichoran commented Apr 20, 2016

Partway through the review, but I'm out of time for right now. I'll try to finish later tonight or, failing that, (my) Wednesday morning.

This does away with the default implementations of `MakesStepper` et.
al. and moves the unboxing logic into `StepperShape` for simpler
implementations with faster dispatch.
@szeiger
Copy link
Contributor Author

szeiger commented Apr 20, 2016

Updated based on review comments.

}

trait PrimitiveStreamUnboxer[A, S] {
def apply(boxed: Stream[A]): S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure apply is the best choice for a method name, since it makes it impossible (with that signature) to have something that is both a function and a PrimitiveStreamUnboxer. If it's for internal use only, just seal it and it's fine. Otherwise I would pick something like streamUnbox to mirror streamAccumulate above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from your original PR, it just got moved around. I haven't really looked into this area of the codebase yet after the initial review.

@Ichoran
Copy link
Contributor

Ichoran commented Apr 20, 2016

Aside for the few extra comments I left, this looks great! Definitely a more compact design.

Have you run the benchmarks again to see if performance stays comparable (especially for small collections)? There's always a concern that even though the extra abstraction is logically entirely removable that in practice it will cause a performance hit.

@szeiger
Copy link
Contributor Author

szeiger commented Apr 21, 2016

I ran a subset of the benchmark suite (fast summation on ArrayBuffer, Array and Vector). My original PR lost a few percent of performance in some cases but I couldn't pinpoint the reason. I saw bigger losses for the large test cases where the time spent constructing the Stepper should have less of an impact.

Measurements on the updated PR aren't fully conclusive, either. The Vector test with 10 elements is still a bit slower, but the one with ArrayBuffer is faster. I'd have to run more comprehensive tests on an idle machine but so far it looks like this doesn't have a negative impact on performance compared the old version (before this PR).

As the next step I want to try removing the specialized Steppers for boxed collections. I saw a few percent of performance degradation when I tried this with the old version. We could remove quite a bit of code by always going through AnyStepper in these cases but if the manually specialized versions turn out to still be measurably faster, they may be worth keeping.

@szeiger szeiger merged commit fd8a54a into scala:master Jun 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants