Decouple encodings from JSON parsing #6

Open
@cdunn2001

Description

A. For reading, we should first parse JSON into nested tables, arrays, and strings. We should then interpret the strings only as needed.
B. For writing, the caller should generate the string, and we should simply store it into the nested data structures. Utility libraries could help the caller convert numbers to strings.
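A minimal sketch of idea B, assuming a hypothetical `RawValue`/`writeObject` design (not the actual jsoncpp API): the writer stores and emits caller-supplied token strings verbatim and never performs number-to-string conversion itself.

```cpp
#include <map>
#include <string>

// Hypothetical: each value is an already-encoded token supplied by the caller.
struct RawValue {
    bool isString;     // true: emit quoted; false: emit verbatim (number/bool/null)
    std::string text;  // the token text, pre-generated by the caller
};

// Serialize a flat object of raw tokens. The writer just concatenates;
// converting numbers to strings is the caller's (or a utility library's) job.
std::string writeObject(const std::map<std::string, RawValue>& obj) {
    std::string out = "{";
    bool first = true;
    for (const auto& kv : obj) {
        if (!first) out += ",";
        first = false;
        out += "\"" + kv.first + "\":";
        if (kv.second.isString)
            out += "\"" + kv.second.text + "\"";
        else
            out += kv.second.text;
    }
    out += "}";
    return out;
}
```

Because the writer never reinterprets the token, a number of any length round-trips exactly as the caller produced it.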

As an experiment, I intend to separate these layers. The jsoncpp API would remain, for convenience, but under the covers there can be an extremely fast, efficient reader (maybe based on gason), which can be used directly by anyone who wants unlimited length numbers. One thing which people rarely notice in the JSON standard is that it says nothing about how long numbers can be. The whole issue of converting between ints/floats and strings is implementation-specific.
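To illustrate the reading side, here is a sketch of lazy interpretation, assuming a hypothetical `LazyNumber` type: the parser keeps the number's raw source text, and conversion to a machine type happens only when (and if) the caller asks, so arbitrary-length numbers are never truncated during parsing.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical: a parsed number holds its exact source characters.
struct LazyNumber {
    std::string raw;  // the number token, verbatim from the JSON document

    // Interpret on demand; precision loss happens here, not in the parser.
    double asDouble() const { return std::strtod(raw.c_str(), nullptr); }
    long long asInt() const { return std::strtoll(raw.c_str(), nullptr, 10); }
};
```

A caller who needs unlimited precision simply reads `raw` and hands it to a bignum library; everyone else pays the conversion cost only for the numbers they actually use.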

The other thing is that, in my opinion, we can simplify matters by having two versions: one which reads/writes ASCII, and one which reads/writes UTF-32. rapidjson is an example of a library which goes overboard on Unicode support: the encoding is threaded through the entire library as a template parameter. Way too complicated!

The problem is in parsing a JSON String. We have to support both standard Unicode characters and special JSON escapes. With UTF-8, we would have to skip variable numbers of bytes while looking for the closing quotation mark, unless we restrict ourselves to ASCII. With UTF-16, we have to worry about "surrogate pairs". But why bother? If someone needs real unicode strings, let them pre- and post-process in UTF-32, which is the easiest to deal with. If they want efficiency, let them be restricted to ASCII.
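As a concrete example of the surrogate-pair bookkeeping that UTF-32 avoids: when a JSON string contains a character outside the Basic Multilingual Plane as two `\uXXXX` escapes, the parser must recombine them. A sketch of that combination step (the function name is mine, not from any library):

```cpp
#include <cstdint>

// Combine a UTF-16 surrogate pair (from two adjacent \uXXXX escapes)
// into a single UTF-32 code point.
uint32_t combineSurrogates(uint16_t high, uint16_t low) {
    // high is in [0xD800, 0xDBFF], low is in [0xDC00, 0xDFFF]
    return 0x10000u
         + ((uint32_t(high) - 0xD800u) << 10)
         + (uint32_t(low) - 0xDC00u);
}
```

In UTF-32 the result is just one array element; in UTF-16 the parser must detect the pair, validate the ranges, and handle an unpaired surrogate as an error.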

Those are the basic ideas. An example will make things clearer.
