I'm forwarding some longstanding downstream issues here, one of which is about -ashtml. Previous reports:
A test case can be found at https://bugs.debian.org/562004 in the email attachment tidy.crashtest.zip (downloadable on that page), but I'm attaching a copy here anyway:
tidy.crashtest.zip
The error information
Content of test0.html:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.6: http://docutils.sourceforge.net/" />
<title>Test</title>
<link rel="stylesheet" href="/usr/lib/pymodules/python2.5/docutils/writers/html4css1/html4css1.css" type="text/css" />
</head>
<body>
<div class="document" id="test">
<h1 class="title">Test</h1>
<p>Some text</p>
</div>
</body>
</html>
Tidy output:
$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
Info: Document content looks like XHTML 1.0 Strict
No warnings or errors were found.
$ tidy -ashtml -m test0.html
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like HTML 4.01 Strict
No warnings or errors were found.
$ tidy -ashtml -m test0.html
line 2 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 2 warnings and 0 errors!
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
$ tidy -ashtml -m test0.html'
> ^C
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
$ tidy -ashtml -m test0.html
line 1 column 1 - Warning: An XML declaration was detected. Did you mean to use input-xml?
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!
The problem is that when converting an XHTML document with an <?xml ...?> header to HTML, the <?xml ...?> line is never stripped. Besides, on the third invocation the DOCTYPE was missing. Is that expected, or is it a bug?
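
To make the two symptoms easier to check after each run, here is a minimal sketch of a checker script. It is not part of tidy or of the Debian report; the helper names (has_xml_declaration, has_doctype, check_runs) are hypothetical, and the only tidy flags used are the -ashtml and -m already shown in the log above.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Matches an XML declaration at the very start of the document.
XML_DECL = re.compile(r"^\s*<\?xml\s[^>]*\?>", re.IGNORECASE)
# Matches a DOCTYPE declaration anywhere in the document.
DOCTYPE = re.compile(r"<!DOCTYPE\b", re.IGNORECASE)


def has_xml_declaration(text: str) -> bool:
    """Return True if the text still starts with an <?xml ...?> line."""
    return XML_DECL.search(text) is not None


def has_doctype(text: str) -> bool:
    """Return True if the text contains a <!DOCTYPE ...> declaration."""
    return DOCTYPE.search(text) is not None


def check_runs(path: str, runs: int = 4) -> list[tuple[bool, bool]]:
    """Run `tidy -ashtml -m` repeatedly on `path` (hypothetical helper).

    The file is modified in place, as in the report. Returns one
    (has_xml_declaration, has_doctype) pair per run.
    """
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy not found on PATH")
    results = []
    for _ in range(runs):
        # tidy exits non-zero when it reports warnings, so do not use check=True.
        subprocess.run(["tidy", "-ashtml", "-m", path],
                       capture_output=True, text=True, check=False)
        text = Path(path).read_text(encoding="utf-8")
        results.append((has_xml_declaration(text), has_doctype(text)))
    return results
```

Calling check_runs("test0.html") on a fresh copy of the test file should show, per invocation, whether the XML declaration survived and whether a DOCTYPE is still present. Because -m rewrites the file, restore the original test0.html before repeating the experiment.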