Closed
Description
Parsing with StreamReader is very slow on input files that contain long lines (my input has lines over 100 000 characters long). I suspect quadratic behavior in the drop/nextEol code. Affects scala-parser-combinators 1.0.2.
I use PagedSeqReader as a workaround, but it seems poorly documented. Does either of these readers load the whole file into memory? Is there any benefit in wrapping the input in a java.io.BufferedReader before passing it to StreamReader/PagedSeqReader?
Test code follows. The attached example data file takes over 80 s to parse on my high-end computer (OpenJDK 7 JVM); with PagedSeqReader the time drops to under one second.
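import java.io.{InputStream, InputStreamReader}
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.StreamReader

object Parser extends RegexParsers {
  // Parses the whole stream through StreamReader (the slow path on long lines).
  def parse(is: InputStream): Any = {
    val reader = new InputStreamReader(is)
    val parseResult = easy(StreamReader(reader))
    parseResult match {
      case Success(r, _) => r
      case n: NoSuccess => sys.error("Parse error: " + n)
    }
  }
  // A bracketed, comma-separated list of integers, e.g. [1,-2,3].
  def easy: Parser[Seq[Int]] = "[" ~> repsep(int_const, ",") <~ "]"
  def int_const: Parser[Int] = """[+-]?[0-9][0-9]*""".r ^^ (i => i.toInt)
}

object Test extends App {
  val data = Test.getClass.getResource("data2")
  var time = -System.nanoTime()
  val r = Parser.parse(data.openStream)
  time += System.nanoTime()
  println(r)
  println(time / 1e9) // elapsed time in seconds
}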
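For comparison, here is a minimal sketch of the PagedSeqReader workaround mentioned above, using the same grammar. The object name PagedParser and the java.io.Reader-based signature are my own choices for illustration. The import of PagedSeq from scala.collection.immutable is an assumption that matches the Scala 2.11 / scala-parser-combinators 1.0.x setup; other versions may keep the class in a different package.

```scala
import java.io.{Reader => JReader, StringReader}
import scala.collection.immutable.PagedSeq // assumed location for Scala 2.11 / combinators 1.0.x
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.PagedSeqReader

// Hypothetical variant of the Parser object above; only the reader differs.
object PagedParser extends RegexParsers {
  def easy: Parser[Seq[Int]] = "[" ~> repsep(int_const, ",") <~ "]"
  def int_const: Parser[Int] = """[+-]?[0-9][0-9]*""".r ^^ (i => i.toInt)

  // Wrap the java.io.Reader in a PagedSeq and parse through PagedSeqReader
  // instead of StreamReader.
  def parse(in: JReader): Seq[Int] =
    easy(new PagedSeqReader(PagedSeq.fromReader(in))) match {
      case Success(r, _) => r
      case n: NoSuccess  => sys.error("Parse error: " + n)
    }
}

object PagedTest extends App {
  // Small in-memory input standing in for the attached data file.
  println(PagedParser.parse(new StringReader("[1, -2, 3]")))
}
```

To time it against the real data, replace the StringReader with new InputStreamReader(data.openStream) as in the Test object above.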