Tidy wrongly outputs an XML declaration when producing HTML #658

@dechamps

On current HEAD (f0438bd):

```shell
$ tidy --output-html yes <<EOF
<?xml version="1.0" ?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title></title>
<meta charset="utf-8" />
</head><body>
<img src="foo.jpg" alt="Foo" />
<br />
</body></html>
EOF
```

Returns:

```
Info: Document content looks like HTML5
No warnings or errors were found.

<?xml version="1.0"?>
<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.7.0">
<title></title>
<meta charset="utf-8">
</head>
<body>
<img src="foo.jpg" alt="Foo"><br>
</body>
</html>
```

First of all, the "Info: Document content looks like HTML5" message is a bit confusing, because it's the output that is HTML5, not the input (which is XHTML5). But that's neither here nor there.

What's more problematic is that Tidy outputs an XML declaration as if it were producing an XML file, even though the output is not well-formed XML at all (which makes sense, since I asked for HTML: for example, <meta charset="utf-8"> and <img ...> are no longer self-closed). The resulting document is self-contradictory: it carries an XML declaration for something that is definitely not XML.

Tidy should never generate an XML declaration when the output is HTML (as opposed to XHTML).

I tried --add-xml-decl no, but it has no effect, as the documentation explains:

Note that if the input already includes an <?xml ... ?> declaration then this option will be ignored.
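Until this is addressed, one possible workaround is to strip the declaration from the input before Tidy sees it, so that the --add-xml-decl setting (off by default) decides whether a declaration is emitted. This is only a sketch and not something Tidy provides: the helper name strip_xml_decl is hypothetical, and it assumes the declaration sits alone on the first line of the input, as in the example above.

```shell
# Hypothetical helper, not part of Tidy: drop a leading XML declaration
# (e.g. <?xml version="1.0" ?>) from the document read on stdin.
# Assumption: the declaration occupies the whole first line by itself;
# any other first line is passed through unchanged.
strip_xml_decl() {
  # "1{...}" restricts the deletion to line 1. In a POSIX basic regex
  # "?" is a literal character, so "<?xml" and "?>" match as written.
  sed '1{/^[[:space:]]*<?xml[^>]*?>[[:space:]]*$/d;}'
}

# Example usage (requires tidy on PATH):
#   strip_xml_decl < input.xhtml | tidy --output-html yes
```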
