Skip to content

Set safe defaults for parser settings #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 4, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 12 additions & 3 deletions shared/src/main/scala/scala/xml/factory/XMLLoader.scala
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,18 @@ trait XMLLoader[T <: Node] {

/* Override this to use a different SAXParser. */
def parser: SAXParser = {
val f = SAXParserFactory.newInstance()
f.setNamespaceAware(false)
f.newSAXParser()
val parser = SAXParserFactory.newInstance()

parser.setFeature("http://javax.xml.XMLConstants/feature/secure-processing", true)
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many people use this library to parse HTML, but I'm sure there are some. Either way having a DOCTYPE in your XML will fail to load:

    val html =
      """<!DOCTYPE html>
        |<html lang="en">
        |</html>
        |""".stripMargin
    XML.loadString(html)

The result is: org.xml.sax.SAXParseException: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that a lowercase doctype, which is very common, currently fails already before changing these settings:

    val html =
      """<!doctype html>
        |<html lang="en">
        |</html>
        |""".stripMargin
    XML.loadString(html)

That's because it's malformed, org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.

Copy link
Member

@SethTisue SethTisue Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many people use this library to parse HTML

I'm comfortable with (not in a 1.1.1 release, but in a 1.2 or a 2.0) requiring those users to explicitly override the default. We might double check if there is suitable documentation in appropriate places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... that is unfortunate, but it's insecure:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#General_Guidance

If we leave that enabled we leave ourselves open to DOS attacks. The cleanest solution would probably be to document this and have people that need it provide their own parser instance. You just have to do XML.withSAXParser(myParser).load(html) instead, so it shouldn't be a huge burden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment (though it/s not a Scaladoc comment) says /* Override this to use a different SAXParser. */, is that not the right recommendation to make...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still unsafe by default

we definitely want to make things safe by default, absolutely.

the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we definitely want to make things safe by default, absolutely.

Ah, sorry. I misunderstood you.

So the idea is some variation of:

def parser(f: SAXParserFactory => SAXParserFactory = identity)

... where the factory has safe defaults already applied to allow for user customization?
That makes a lot of sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that would be awesome.

Copy link
Member

@eed3si9n eed3si9n Feb 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a tradeoff of breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin. So this is more nuanced case-by-case tradeoff.

Given that this issue has been around, and if informed users have been disabling external DTD loading features manually, then the rest are casual/not-so-informed users by process of elimination. I am not too sure if they will read the release notes whether it comes out in version 1.1.x or 1.2.x.

A potential way of dealing with this is to use static typing. As in we would deprecate any methods that are current unsafe during 1.1.x and provide safer one as the alternative. In 1.2.x you can remove the current unsafe methods.

See also #17 where this was discussed.

Copy link
Member

@ashawley ashawley Feb 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for weighing in on this, Eugene.

I put a general comment on this discussion, yesterday, at the top-level (outside of the patch review), see below #177 (comment)

parser.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
parser.setFeature("http://xml.org/sax/features/external-general-entities", false)
parser.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false)
parser.setXIncludeAware(false)
parser.setNamespaceAware(false)

parser.newSAXParser()
}

/**
Expand Down