scala.util.parsing.input.StreamReader does not scale with long lines #8879

Closed
@scabug

Description

Parsing with StreamReader is very slow on input files that contain long lines (my input has lines over 100 000 characters long). I suspect quadratic behavior in the drop/nextEol code. This affects scala-parser-combinators 1.0.2.
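For reproduction without the attached file, an input of comparable shape can be generated. The following is a minimal sketch: it assumes the data is a bracketed, comma-separated list of integers on a single line (the shape the grammar in the test code accepts). The object name `LongLineData`, the element count, and the optional output path argument are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object LongLineData {
  // Builds "[0,1,2,...]" as one line, with no newline characters anywhere.
  def singleLine(count: Int): String =
    (0 until count).mkString("[", ",", "]")

  def main(args: Array[String]): Unit = {
    // 50000 integers give a line well over 100 000 characters long.
    val line = singleLine(50000)
    println(s"generated ${line.length} characters")
    // Write the data only if an output path was given (hypothetical usage).
    args.headOption.foreach { path =>
      Files.write(Paths.get(path), line.getBytes(StandardCharsets.UTF_8))
    }
  }
}
```

Running it with a path such as `data2` writes a file that can be placed on the classpath for the test program.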

I use PagedSeqReader as a workaround, but it seems poorly documented. Does either of these readers load the whole file into memory? Is there any benefit to wrapping the input in a java.io.BufferedReader before passing it to StreamReader/PagedSeqReader?

Test code follows. The attached example data file takes over 80 s to parse on my high-end computer (OpenJDK 7 JVM); with PagedSeqReader the time drops to under one second.

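import java.io.{InputStream, InputStreamReader}
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.StreamReader

object Parser extends RegexParsers {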
  def parse(is: InputStream): Any = {
    val reader = new InputStreamReader(is)

    val parseResult = easy(StreamReader(reader))

    parseResult match {
      case Success(r, _) => r
      case n: NoSuccess => sys.error("Parse error: " + n)
    }

  }

  def easy: Parser[Seq[Int]] = "[" ~> repsep(int_const, ",") <~ "]"

  def int_const: Parser[Int] = """[+-]?[0-9][0-9]*""".r ^^ (i => i.toInt)

}

object Test extends App {
  val data = Test.getClass.getResource("data2")
  var time = -System.nanoTime()
  val r = Parser.parse(data.openStream)
  time += System.nanoTime()
  println(r)
  println(time / 1e9)
}
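The PagedSeqReader workaround mentioned above could look roughly like the sketch below. It is not taken from the attached code: the object name `PagedParser` is hypothetical, and the import of `PagedSeq` from `scala.collection.immutable` is an assumption that holds for the Scala standard library used with scala-parser-combinators 1.0.x; other releases may provide the class in a different package.

```scala
import java.io.{InputStream, InputStreamReader}
// Assumption: PagedSeq lives in the standard library for this Scala version.
import scala.collection.immutable.PagedSeq
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.PagedSeqReader

object PagedParser extends RegexParsers {
  def parse(is: InputStream): Seq[Int] = {
    val reader = new InputStreamReader(is)
    // Wrap the java.io.Reader in a PagedSeq and read from that directly,
    // instead of going through StreamReader.
    val input = new PagedSeqReader(PagedSeq.fromReader(reader))

    easy(input) match {
      case Success(r, _) => r
      case n: NoSuccess  => sys.error("Parse error: " + n)
    }
  }

  // Same grammar as in the test code: a bracketed, comma-separated integer list.
  def easy: Parser[Seq[Int]] = "[" ~> repsep(intConst, ",") <~ "]"

  def intConst: Parser[Int] = """[+-]?[0-9]+""".r ^^ (_.toInt)
}
```

Only the construction of the input reader differs from the StreamReader version, so timings of the two variants can be compared on the same data.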
