Description
I'm in the process of integrating Scoverage into Apache Spark's build and am currently blocked by performance issues in Invoker.invoked(): with coverage instrumentation enabled, some of my suites take up to 10x longer to run.
I profiled one suite in YourKit and found that Invoker.invoked() spends a huge amount of time computing hash codes to index into a TrieMap. For example, see the following profiler screenshot:
This particular bottleneck is caused by the fact that Invoker has a map

    private val ids = ThreadSafeMap.empty[(String, Int), Any]

where the keys are (coverageDirectory, id) pairs. Replacing this single-level map with nested maps removes the need to construct and hash a tuple on every invocation, which speeds things up massively.
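A minimal sketch of the nested-map idea, assuming ThreadSafeMap behaves like scala.collection.concurrent.TrieMap. The object and method names (NestedIdsSketch, firstSeenFlat, firstSeenNested) are hypothetical and this is not Scoverage's actual Invoker code. The flat version builds a Tuple2 and hashes it on every lookup; the nested version looks up the String and the Int separately, so no tuple is constructed.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical sketch; not Scoverage's actual Invoker.
object NestedIdsSketch {
  // Before: one map keyed by (dataDir, id). Each lookup allocates a Tuple2
  // and computes its hash code.
  private val flatIds = TrieMap.empty[(String, Int), Any]

  // After: outer map keyed by data directory, inner map keyed by statement id.
  private val nestedIds = TrieMap.empty[String, TrieMap[Int, Any]]

  /** True the first time (dataDir, id) is seen, using the flat map. */
  def firstSeenFlat(dataDir: String, id: Int): Boolean =
    flatIds.putIfAbsent((dataDir, id), ()).isEmpty

  /** True the first time (dataDir, id) is seen, using nested maps. */
  def firstSeenNested(dataDir: String, id: Int): Boolean = {
    val inner = nestedIds.get(dataDir) match {
      case Some(existing) => existing
      case None =>
        // Another thread may race us here; putIfAbsent keeps whichever map won.
        val fresh = TrieMap.empty[Int, Any]
        nestedIds.putIfAbsent(dataDir, fresh).getOrElse(fresh)
    }
    inner.putIfAbsent(id, ()).isEmpty
  }
}
```

The explicit putIfAbsent on the outer map means the sketch does not rely on getOrElseUpdate being atomic for concurrent maps.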
After that optimization, this method becomes bottlenecked on OutputStreamWriter.flush() calls. This bottleneck wasn't apparent before because the hashCode issue masked it. Here's profiling output showing this:
I see two possible rationales for the current aggressive flushing behavior:
1. We may not always have the opportunity to flush in a JVM exit hook (e.g. if SBT is running tests in non-fork mode, the JVM does not exit when the tests finish).
2. You may want to collect coverage data if the JVM exits uncleanly (e.g. kill -9).
I think that (2) is less of a concern, but (1) is a problem in some environments.
For Spark, we always run tests in forked JVMs, so flushing on JVM exit would be perfectly acceptable to us. Therefore, I would like to introduce an option to enable this behavior. I can't spot a clear mechanism for plumbing configuration options from SBT to the Invoker, so I propose using a system property to control this.
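A hypothetical sketch of what such an option could look like. The property name scoverage.flushOnExit, the object MeasurementWriterSketch, and its methods are illustrative assumptions, not existing Scoverage settings or APIs. When the property is unset, every write is flushed immediately (the current behavior); when it is "true", writes stay buffered and a JVM shutdown hook flushes them.

```scala
import java.io.{File, FileWriter, Writer}
import scala.collection.mutable

// Hypothetical sketch; property and names are assumptions, not Scoverage APIs.
object MeasurementWriterSketch {
  /** Hypothetical property: when "true", defer flushing to a JVM shutdown hook. */
  val FlushOnExitProperty = "scoverage.flushOnExit"

  // Boolean.getBoolean is true only if the property exists and equals "true"
  // (ignoring case), so the default stays eager flushing.
  val flushOnExit: Boolean = java.lang.Boolean.getBoolean(FlushOnExitProperty)

  // One appending writer per measurement file, guarded by this object's lock.
  private val writers = mutable.HashMap.empty[String, Writer]

  if (flushOnExit) {
    // Best effort only: shutdown hooks do not run on kill -9.
    sys.addShutdownHook(flushAll())
  }

  /** Appends a statement id to the given measurement file. */
  def record(file: File, id: Int): Unit = synchronized {
    val w = writers.getOrElseUpdate(file.getPath, new FileWriter(file, true))
    w.write(id.toString + "\n")
    if (!flushOnExit) w.flush() // eager mode: flush on every write
  }

  /** Flushes every open writer; called by the shutdown hook in deferred mode. */
  def flushAll(): Unit = synchronized {
    writers.values.foreach(_.flush())
  }
}
```

With forked tests, the property could be passed to the test JVMs through SBT's javaOptions setting (e.g. -Dscoverage.flushOnExit=true). The trade-off matches rationale (2) above: anything still buffered is lost if the JVM is killed uncleanly, and the option does not help in non-fork mode.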
I plan to submit pull requests for both issues (the hashCode() optimization and the shutdown hook option).