
Commit 97f8495

UCS-4 conversion does not pass BOM through to output
This is to match the way that we handle UCS-2. When a BOM is found at the beginning of a 'UCS-2' string (NOT 'UCS-2BE' or 'UCS-2LE'), we take note of the intended byte order and handle the string accordingly, but do NOT emit a BOM to the output. Rather, we just use the default byte order for the requested output encoding.

Some might argue that if the input string used a BOM, and we are emitting output in a text encoding where both big-endian and little-endian byte orders are possible, we should include a BOM in the output string. To such hypothetical debaters of minutiae, I can only offer a shoulder shrug. No reasonable program which handles UCS-2 and UCS-4 text should require a BOM. Really, the concept of the BOM is a poor idea and should not have been included in Unicode. Standardizing on a single byte order, similar to 'network byte order' for the Internet Protocol, would have been much better. But this is not the place to speak at length of such things.
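As a rough illustration of the behaviour described above, here is a stand-alone sketch (not libmbfl code) that decodes a whole 'UCS-4' buffer. It assumes big-endian by default, lets a leading BOM select the byte order, and consumes the BOM instead of copying it to the output. `decode_ucs4` and `read_u32` are hypothetical names, and for simplicity only the first code unit is checked for a BOM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one 32-bit code unit in the given byte order. */
static uint32_t read_u32(const unsigned char *p, int little_endian)
{
    if (little_endian) {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical decoder for 'UCS-4' with no explicit byte order.
 * Defaults to big-endian; a leading BOM selects the byte order and is
 * consumed rather than written to out. Trailing bytes that do not form
 * a full code unit are ignored. Returns the number of code points written. */
size_t decode_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    int little_endian = 0;

    if (len >= 4) {
        uint32_t first = read_u32(in, 0);
        if (first == 0x0000FEFFu) {          /* 00 00 FE FF: big-endian BOM */
            i = 4;
        } else if (first == 0xFFFE0000u) {   /* FF FE 00 00: little-endian BOM */
            little_endian = 1;
            i = 4;
        }
    }
    for (; i + 4 <= len; i += 4) {
        out[n++] = read_u32(in + i, little_endian);
    }
    return n;
}
```

For example, the bytes `FF FE 00 00 41 00 00 00` decode to the single code point U+0041, with no U+FEFF in the output.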
1 parent e6f1a72 commit 97f8495

1 file changed: 2 additions, 3 deletions

ext/mbstring/libmbfl/filters/mbfilter_ucs4.c

Lines changed: 2 additions & 3 deletions
@@ -185,11 +185,10 @@ int mbfl_filt_conv_ucs4_wchar(int c, mbfl_convert_filter *filter)
 			} else {
 				filter->status = 0x100; /* little-endian */
 			}
-			CK((*filter->output_function)(0xfeff, filter->data));
-		} else {
-			filter->status &= ~0xff;
+		} else if (n != 0xfeff) {
 			CK((*filter->output_function)(n, filter->data));
 		}
+		filter->status &= ~0xff;
 		break;
 	}
 
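The patched branch can be modelled with a small byte-at-a-time decoder. This is a hypothetical sketch, not libmbfl's API: `struct ucs4_state`, `ucs4_feed`, `struct cp_buffer` and `append_cp` are illustrative names, and the byte order lives in its own field rather than being packed into `status` as in `mbfl_convert_filter`. As in the diff, the check runs on every assembled code unit: a byte-swapped BOM (read as 0xFFFE0000) flips the byte order, a BOM in the current order (0xFEFF) is dropped, and the byte counter is reset on every path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; not libmbfl's mbfl_convert_filter. */
struct ucs4_state {
    int count;          /* bytes accumulated in the current code unit (0..3) */
    int little_endian;  /* current byte order; starts as big-endian (0) */
    uint32_t cache;     /* partially assembled code unit */
};

/* Hypothetical callback type, standing in for filter->output_function. */
typedef void (*emit_fn)(uint32_t cp, void *data);

/* Feed one byte. Every fourth byte completes a code unit, which is either
 * consumed as a BOM or passed to emit. */
void ucs4_feed(struct ucs4_state *s, unsigned char byte, emit_fn emit, void *data)
{
    int shift = s->little_endian ? 8 * s->count : 8 * (3 - s->count);
    s->cache |= (uint32_t)byte << shift;
    if (++s->count < 4)
        return;

    uint32_t n = s->cache;
    if (n == 0xFFFE0000u) {
        /* Byte-swapped BOM: the input actually uses the other byte order. */
        s->little_endian = !s->little_endian;
    } else if (n != 0xFEFFu) {
        emit(n, data);          /* ordinary code point */
    }                           /* n == 0xFEFF: BOM in current order, dropped */
    s->count = 0;               /* reset on every path, like status &= ~0xff */
    s->cache = 0;
}

/* Example callback: append code points to a fixed-size array. */
struct cp_buffer { uint32_t cps[16]; size_t len; };

static void append_cp(uint32_t cp, void *data)
{
    struct cp_buffer *b = data;
    if (b->len < sizeof b->cps / sizeof b->cps[0])
        b->cps[b->len++] = cp;
}
```

Feeding `FF FE 00 00 41 00 00 00 FF FE 00 00 42 00 00 00` into a fresh state yields only U+0041 and U+0042: the first BOM switches to little-endian, and the second, now read as 0xFEFF, is discarded.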
