RegexParsers.scala has O(inputlength) memory performance on java >= 7u6

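Starting with Java 1.7.0_06, String.substring() (and String.subSequence()) no longer shares the original string's internal char[] array; it copies the requested characters into a new array instead. Since RegexParsers.scala:109 calls subSequence() on every parse step (in the worst case, once per character parsed), each step now copies the entire remaining input. That makes each step cost O(input length) in memory, and the total allocation over a whole parse grows quadratically with the input size.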
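To put a rough number on the cost, the following Scala sketch models the worst case described above. It assumes every parse step calls substring(start) on the remaining input and advances by exactly one character. It counts copied characters rather than measuring time. The object name CopyCostModel is hypothetical, and real grammars that consume whole tokens per step will copy less.

```scala
object CopyCostModel {
  // Closed form: if each step copies the remaining input (n - start chars)
  // for start = 0 .. n-1, the total is n + (n-1) + ... + 1 = n * (n + 1) / 2.
  def charsCopied(n: Long): Long = n * (n + 1) / 2

  // Same count, obtained by simulating the steps on a real String.
  def simulate(input: String): Long = {
    var copied = 0L
    var start = 0
    while (start < input.length) {
      val rest = input.substring(start) // allocates a fresh copy on Java 7u6+
      copied += rest.length
      start += 1
    }
    copied
  }

  def main(args: Array[String]): Unit = {
    println(simulate("abcdef"))    // 21
    println(charsCopied(3000000L)) // 4500001500000
  }
}
```

For a file of about 3 million characters, this worst case copies roughly 4.5 × 10^12 characters, which illustrates why a change that looks local to substring() can dominate parse time and GC.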

This shows up as very poor parse performance and heavy GC activity when parsing a 3 MB file with https://github.com/ngocdaothanh/scaposer, a file that parses almost instantly on Java 6.

Details on the changes to java.lang.String are mentioned here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259
http://java-performance.info/changes-to-string-java-1-7-0_06/ 
http://grokbase.com/t/gg/scala-user/132v5z1678/performance-of-javatokenparsers-with-java7

One way around this would be to wrap the CharSequence in a simple view class that reuses the underlying CharSequence instead of copying it, with skip/count (start/length) fields that track the current position.
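A minimal sketch of that idea follows. The class name SubSequence and the demo object are hypothetical, not the actual fix in RegexParsers. The view stores only a reference to the original CharSequence plus an offset and a length, so creating a view (or a view of a view) is O(1) and copies nothing. It can be passed straight to scala.util.matching.Regex.findPrefixMatchOf, which accepts any CharSequence. Characters are copied only if toString is called.

```scala
// Hypothetical non-copying view over s, covering s[start, start + len).
final class SubSequence(s: CharSequence, start: Int, len: Int) extends CharSequence {
  // Convenience constructor: view from `start` to the end of `s`.
  def this(s: CharSequence, start: Int) = this(s, start, s.length() - start)

  require(start >= 0 && len >= 0 && start <= s.length() - len, "range out of bounds")

  def length(): Int = len

  // Translate the index into the underlying sequence; no copying.
  def charAt(index: Int): Char = {
    if (index < 0 || index >= len)
      throw new IndexOutOfBoundsException(s"index: $index, length: $len")
    s.charAt(start + index)
  }

  // A view of a view still points at the original sequence.
  def subSequence(from: Int, until: Int): CharSequence = {
    if (from < 0 || until > len || from > until)
      throw new IndexOutOfBoundsException(s"from: $from, until: $until, length: $len")
    new SubSequence(s, start + from, until - from)
  }

  // Only materializes a String when explicitly asked for.
  override def toString: String =
    new java.lang.StringBuilder(len).append(s, start, start + len).toString
}

object SubSequenceDemo {
  def main(args: Array[String]): Unit = {
    val source = "let x = 42"
    val number = "[0-9]+".r
    // Match at offset 8 without copying the remaining input first.
    println(number.findPrefixMatchOf(new SubSequence(source, 8)).map(_.matched)) // Some(42)
  }
}
```

The trade-off is that a view keeps the whole original input reachable for as long as it lives, which is exactly the sharing behavior String gave up in 7u6; for a parser that holds the full input in memory anyway, that costs nothing extra.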

RegexParsers.scala has O(inputlength) memory performance on java >= 7u6 #7710

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RegexParsers.scala has O(inputlength) memory performance on java >= 7u6 #7710

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions