Description
As of Java 1.7.0_06, String.substring() (and String.subSequence()) no longer shares the internal char[] of the original string; it copies the requested range instead. Since RegexParsers.scala:109 calls subSequence() for every character parsed, each parse step now effectively re-allocates the whole remaining input.
The result is very poor parse performance and heavy GC activity when parsing a 3MB file with https://github.com/ngocdaothanh/scaposer , which parsed almost instantly on Java 6.
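To illustrate why the cost grows so quickly, here is a rough sketch that counts how many characters are handed out if every position in the input triggers a subSequence() of the remaining content. The class and method names are hypothetical, and the one-call-per-character pattern is a simplification of what the parser does.

```java
// Hypothetical helper, not part of the Scala library.
final class CopyCost {
    // Sums the lengths of the "remaining input" slices taken at each position.
    // On Java 1.7.0_06 and later, each slice of a String is a fresh copy,
    // so this sum approximates the number of characters copied.
    static long charsCopied(CharSequence input) {
        long total = 0;
        for (int pos = 0; pos < input.length(); pos++) {
            total += input.subSequence(pos, input.length()).length();
        }
        return total;
    }
}
```

For an input of length n this adds up to n(n+1)/2 characters, so the work is quadratic in the input size rather than linear.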
Details on the changes to java.lang.String are mentioned here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259
http://java-performance.info/changes-to-string-java-1-7-0_06/
http://grokbase.com/t/gg/scala-user/132v5z1678/performance-of-javatokenparsers-with-java7
One possible workaround would be to wrap the CharSequence in a simple view that re-uses the underlying CharSequence instead of copying it, with offset/length fields that track the current position.
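A minimal sketch of such a wrapper is shown below. The class name CharSequenceView and its fields are assumptions made for illustration; this is not existing library code, and how it would be wired into RegexParsers is left open.

```java
// Hypothetical wrapper: a window onto an existing CharSequence.
final class CharSequenceView implements CharSequence {
    private final CharSequence source; // shared, never copied
    private final int offset;          // start of the window within source
    private final int length;          // number of characters in the window

    CharSequenceView(CharSequence source) {
        this(source, 0, source.length());
    }

    private CharSequenceView(CharSequence source, int offset, int length) {
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return source.charAt(offset + index);
    }

    // Returns a new view over the same source; only two ints change.
    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        return new CharSequenceView(source, offset + start, end - start);
    }

    // Copies only when a String is explicitly requested.
    @Override
    public String toString() {
        return source.subSequence(offset, offset + length).toString();
    }
}
```

With a wrapper along these lines, subSequence() allocates one small object per call regardless of how much input remains. Since java.util.regex.Pattern.matcher() accepts any CharSequence, such a view could in principle be passed to the regex machinery directly.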