Open
Description
In utf8_unicode_inplace_ex(), we have the following code:
c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
/* single byte unicode (7 bit ASCII equivilent) has no validation */
count++;
if (count <= len) {
if (c == 0) *data = x2c(&c);
else *data++ = c;
}
}
The code if (c == 0) *data = x2c(&c);
is wrong.
utf the input string.
If input is "x", we copy the character => correct.
If input is "\x00\xA0", we call the x2c function which expects an hexadecimal string as input (like "20" for a space).
Furthermore, the output buffer (data) is not increased, so the character is overwritten on the next loop.
The correct code is
if (c == 0) {
sprintf(data, "%%u%04x", utf[1]);
count += 4;
data += 6;
i++;
*changed = 1;
}
Remarks:
- Using "%04x" replaces a lot of useless code used in other parts of the code
- I didn't add a check
if (count <= len)
aftercount += 4;
because it cannot overflow if we fix the code:len = input_len * 6 + 1;
instead oflen = input_len * 4 + 1;
(4 bytes -> %u1234). The check (and the variable count) could thus be removed everywhere - After filling data, we have several checks (/* invalid UTF-8 character number range (RFC 3629) /, / check for overlong */, ...) with this code
count++; *data++ = c;
What's the point? This copies the character after the ones we already copied. This looks wrong to me. Any reason to keep this code?