Writing multibyte strings #3

Closed
@middlebrain

The current implementation of writeString, and possibly also writeProperty and writeKey, writes the length (s:<length>) of multibyte strings incorrectly. On the PHP side this results in: unserialize() [function.unserialize]: Error at offset ...

The cause is that the length of a serialized string must be given in bytes, not characters, and the byte count depends on the character encoding.
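To illustrate the difference between character count and byte count, here is a minimal sketch. The class and method names (ByteLengthDemo, charLength, byteLength) are made up for this example and are not part of the serializer. The test string is "Grüße", written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the serializer.
class ByteLengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int charLength(String s) {
        return s.length();
    }

    // Number of bytes the string occupies once encoded with the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String value = "Gr\u00fc\u00dfe"; // "Grüße"
        System.out.println("chars:      " + charLength(value));
        System.out.println("ISO-8859-1: " + byteLength(value, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + byteLength(value, StandardCharsets.UTF_8));
    }
}
```

In ISO-8859-1 the byte count equals the character count (5), while in UTF-8 the two umlaut characters take two bytes each, giving 7. PHP expects the second number when the output is sent as UTF-8.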

When testing the serializer with a servlet, I didn't notice this immediately because the default encoding of the output is ISO-8859-1, where one character is always one byte. The problem only came up after switching to UTF-8 (response.setCharacterEncoding("UTF-8")) and using German umlauts.

My quick fix for UTF-8 looks like this:

Writer.java:

...
import static java.nio.charset.StandardCharsets.UTF_8;
...
    public void writeString(String value)
    {
        setState(state.value());

        buffer.append("s:");
        // PHP expects the length in bytes, not characters, so measure the UTF-8 encoding.
        buffer.append(value.getBytes(UTF_8).length);
        buffer.append(":\"");
        buffer.append(value);
        buffer.append("\";");
    }
...

It would certainly be better if the desired character encoding could be passed to the writer.
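One way this could look is sketched below. This is only an assumption about a possible design, not the project's actual API: CharsetAwareWriter, its constructor, and getResult are hypothetical names, and the state handling of the real Writer (setState) is left out because its details are not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the project's Writer; only the string-length logic is shown.
class CharsetAwareWriter {
    private final StringBuilder buffer = new StringBuilder();
    private final Charset charset;

    // The caller passes the encoding that will be used when the result is sent to PHP.
    CharsetAwareWriter(Charset charset) {
        this.charset = charset;
    }

    void writeString(String value) {
        buffer.append("s:");
        // Length in bytes for the configured encoding, not in characters.
        buffer.append(value.getBytes(charset).length);
        buffer.append(":\"");
        buffer.append(value);
        buffer.append("\";");
    }

    String getResult() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        CharsetAwareWriter writer = new CharsetAwareWriter(StandardCharsets.UTF_8);
        writer.writeString("\u00e4"); // "ä"
        System.out.println(writer.getResult());
    }
}
```

The charset given to the writer has to match the encoding actually used for the output (for example the one set with response.setCharacterEncoding), otherwise the lengths will be wrong again.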

Translated with www.DeepL.com/Translator.
