Does REXML support UTF-32? #212

Closed
@naitoh

Description

I don't think REXML has a test case for UTF-32.

Also, I get the following error when parsing a UTF-32 document:

  • test_utf.rb
require 'rexml/document'

xml = File.open("./utf-32.xml")
REXML::Document.new(xml)
  • utf-32.xml
<?xml version="1.0" encoding="UTF-32"?>
<message>Hello world!</message>
$ hexdump utf-32.xml 
0000000 feff 0000 003c 0000 003f 0000 0078 0000
0000010 006d 0000 006c 0000 0020 0000 0076 0000
0000020 0065 0000 0072 0000 0073 0000 0069 0000
0000030 006f 0000 006e 0000 003d 0000 0022 0000
0000040 0031 0000 002e 0000 0030 0000 0022 0000
0000050 0020 0000 0065 0000 006e 0000 0063 0000
0000060 006f 0000 0064 0000 0069 0000 006e 0000
0000070 0067 0000 003d 0000 0022 0000 0055 0000
0000080 0054 0000 0046 0000 002d 0000 0033 0000
0000090 0032 0000 0022 0000 003f 0000 003e 0000
00000a0 000a 0000 003c 0000 006d 0000 0065 0000
00000b0 0073 0000 0073 0000 0061 0000 0067 0000
00000c0 0065 0000 003e 0000 0048 0000 0065 0000
00000d0 006c 0000 006c 0000 006f 0000 0020 0000
00000e0 0077 0000 006f 0000 0072 0000 006c 0000
00000f0 0064 0000 0021 0000 003c 0000 002f 0000
0000100 006d 0000 0065 0000 0073 0000 0073 0000
0000110 0061 0000 0067 0000 0065 0000 003e 0000
0000120 000a 0000                              
0000124
  • rexml (3.3.8)
$ ruby test_utf.rb 
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.3.8/lib/rexml/parsers/baseparser.rb:516:in `pull_event': Malformed XML: Content at the start of the document (got '') (REXML::ParseException)
Line: 2
Position: 292
Last 80 unconsumed characters:
<?xml version="1.0" encoding="UTF-32"?> <
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.3.8/lib/rexml/parsers/baseparser.rb:242:in `pull'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.3.8/lib/rexml/parsers/treeparser.rb:21:in `parse'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.3.8/lib/rexml/document.rb:452:in `build'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.3.8/lib/rexml/document.rb:103:in `initialize'
	from test_utf.rb:4:in `new'
	from test_utf.rb:4:in `<main>'
  • rexml (3.2.6)
$ ruby test_utf.rb 
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:96:in `rescue in parse': #<RuntimeError: Illegal character "\u0000" in raw string "\u0000"> (REXML::ParseException)
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:122:in `initialize'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:47:in `new'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:47:in `parse'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
test_utf.rb:4:in `new'
test_utf.rb:4:in `<main>'
...
Illegal character "\u0000" in raw string "\u0000"
Line: 2
Position: 292
Last 80 unconsumed characters:
<?xml version="1.0" encoding="UTF-32"?> <
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:21:in `parse'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
	from test_utf.rb:4:in `new'
	from test_utf.rb:4:in `<main>'
/Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check': Illegal character "\u0000" in raw string "\u0000" (RuntimeError)
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/text.rb:122:in `initialize'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:47:in `new'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:47:in `parse'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
	from /Users/naitoh/.rbenv/versions/3.3.4/lib/ruby/gems/3.3.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
	from test_utf.rb:4:in `new'
	from test_utf.rb:4:in `<main>'
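
To check that the input file itself is well-formed UTF-32 independently of REXML, the bytes can be transcoded with Ruby core alone. The hexdump above (16-bit words in little-endian order) appears to start with the bytes `FF FE 00 00`, which is the UTF-32LE byte order mark. The following is a minimal sketch under that assumption; `utf32_to_utf8` is a hypothetical helper name, not part of REXML, and the sample bytes are rebuilt in memory instead of being read from `utf-32.xml`.

```ruby
# Hypothetical helper (not a REXML API): decode BOM-prefixed UTF-32 bytes to UTF-8.
def utf32_to_utf8(bytes)
  bytes = bytes.dup.force_encoding(Encoding::BINARY)
  encoding =
    if bytes.start_with?("\xFF\xFE\x00\x00".b)
      Encoding::UTF_32LE
    elsif bytes.start_with?("\x00\x00\xFE\xFF".b)
      Encoding::UTF_32BE
    else
      raise ArgumentError, "no UTF-32 BOM found"
    end
  # Drop the 4-byte BOM, then transcode the remaining bytes.
  bytes.byteslice(4..).force_encoding(encoding).encode(Encoding::UTF_8)
end

# Rebuild the same content as utf-32.xml: BOM followed by UTF-32LE code units.
xml = '<?xml version="1.0" encoding="UTF-32"?>' + "\n" +
      "<message>Hello world!</message>\n"
sample = "\xFF\xFE\x00\x00".b + xml.encode(Encoding::UTF_32LE).b

decoded = utf32_to_utf8(sample)
puts sample.bytesize
puts decoded
```

The rebuilt sample is 292 bytes long, which matches both the final hexdump offset (`0000124`) and the `Position: 292` in the error output. Note that the decoded string still declares `encoding="UTF-32"`; whether REXML accepts such a pre-transcoded string is not verified here.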
