Commit e4ee979

0x5C is not a Yen sign in CP932 (or CP51932)
When Microsoft created CP932 (their version of Shift-JIS), they explicitly used bytes 0x00-0x7F to represent ASCII characters rather than JIS X 0201 characters. Byte 0x5C is therefore a backslash (REVERSE SOLIDUS) in CP932, not the Yen sign it is in JIS X 0201, so when converting Unicode to CP932, it is not correct to convert U+00A5 to CP932 0x5C. Fortunately, CP932 does have a multi-byte FULLWIDTH YEN SIGN character which we can use instead. CP51932 uses the same extended character set as CP932; while CP932 is Microsoft's extended version of Shift-JIS, CP51932 is their extended version of EUC-JP. So the same reasoning applies to CP51932.
1 parent 315d48b commit e4ee979
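
The reasoning in the commit message can be illustrated with a minimal sketch. The helper names below are hypothetical stand-ins and are not part of libmbfl; they only model the fact that CP932 bytes 0x00-0x7F decode as ASCII, so a Yen sign written as 0x5C would come back as a backslash.

```c
#include <assert.h>

/* Hypothetical stand-in for decoding one CP932 byte in the ASCII range.
 * CP932 defines 0x00-0x7F as ASCII, so each byte maps to the same code point. */
static unsigned int cp932_decode_ascii_byte(unsigned char b)
{
    assert(b <= 0x7F);
    return b;
}

/* Hypothetical stand-in for the mapping removed by this commit:
 * U+00A5 YEN SIGN was written as the single byte 0x5C. */
static unsigned char old_encode_yen_sign(void)
{
    return 0x5C;
}

/* Returns 1 if encoding U+00A5 the old way and decoding the result
 * yields U+00A5 again, 0 otherwise. */
static int old_mapping_round_trips(void)
{
    unsigned int decoded = cp932_decode_ascii_byte(old_encode_yen_sign());
    return decoded == 0xA5;
}
```

Under these assumptions `old_mapping_round_trips()` returns 0, because the byte decodes to U+005C REVERSE SOLIDUS rather than to a Yen sign.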

4 files changed, +9 -4 lines changed
ext/mbstring/libmbfl/filters/mbfilter_cp51932.c

Lines changed: 2 additions & 2 deletions
@@ -214,8 +214,8 @@ mbfl_filt_conv_wchar_cp51932(int c, mbfl_convert_filter *filter)
         }
         if (s1 >= 0x8080) s1 = -1; /* we don't support JIS X0213 */
         if (s1 <= 0) {
-            if (c == 0xa5) { /* YEN SIGN */
-                s1 = 0x005c; /* YEN SIGN */
+            if (c == 0xa5) { /* YEN SIGN */
+                s1 = 0x216F; /* FULLWIDTH YEN SIGN */
             } else if (c == 0x203e) { /* OVER LINE */
                 s1 = 0x007e; /* FULLWIDTH MACRON */
             } else if (c == 0xff3c) { /* FULLWIDTH REVERSE SOLIDUS */

ext/mbstring/libmbfl/filters/mbfilter_cp932.c

Lines changed: 2 additions & 2 deletions
@@ -251,8 +251,8 @@ mbfl_filt_conv_wchar_cp932(int c, mbfl_convert_filter *filter)
             s2 = 1;
         }
         if (s1 <= 0) {
-            if (c == 0xa5) { /* YEN SIGN */
-                s1 = 0x005c; /* YEN SIGN */
+            if (c == 0xa5) { /* YEN SIGN */
+                s1 = 0x216F; /* FULLWIDTH YEN SIGN */
             } else if (c == 0x203e) { /* OVER LINE */
                 s1 = 0x007e; /* FULLWIDTH MACRON */
             } else if (c == 0xff3c) { /* FULLWIDTH REVERSE SOLIDUS */

ext/mbstring/tests/cp51932_encoding.phpt

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ unset($fromUnicode["\x30\x94"]); // Don't map hiragana vu to katakana vu
 for ($i = 0; $i <= 0x7F; $i++)
   $validChars[chr($i)] = "\x00" . chr($i);
 
+/* U+00A5 is YEN SIGN; convert to FULLWIDTH YEN SIGN */
+$fromUnicode["\x00\xA5"] = "\xA1\xEF";
+
 testAllValidChars($validChars, 'CP51932', 'UTF-16BE', false);
 testAllValidChars($fromUnicode, 'UTF-16BE', 'CP51932', false);
 echo "CP51932 verification and conversion works on all valid characters\n";

ext/mbstring/tests/cp932_encoding.phpt

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ for ($i = 0xF0; $i <= 0xF9; $i++) {
 $fromUnicode["\x00\xA2"] = "\x81\x91";
 /* U+00A3 is POUND SIGN; convert to FULLWIDTH POUND SIGN */
 $fromUnicode["\x00\xA3"] = "\x81\x92";
+/* U+00A5 is YEN SIGN; convert to FULLWIDTH YEN SIGN */
+$fromUnicode["\x00\xA5"] = "\x81\x8F";
 
 /* We map the JIS X 0208 FULLWIDTH TILDE to U+FF5E (FULLWIDTH TILDE)
  * But when converting Unicode to CP932, we also accept U+301C (WAVE DASH) */
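
The byte sequences expected by the two tests ("\x81\x8F" for CP932 and "\xA1\xEF" for CP51932) appear to be the same JIS X 0208 code, 0x216F, in its Shift-JIS and EUC-JP forms. The sketch below uses the textbook JIS X 0208 conversion formulas to show this; the function names are hypothetical, and this is not the code libmbfl itself uses to emit bytes.

```c
#include <assert.h>

/* Split a two-byte JIS X 0208 code (e.g. 0x216F) and check that both
 * bytes lie in the valid 0x21-0x7E range. */
static void split_jis(unsigned int jis, unsigned int *j1, unsigned int *j2)
{
    *j1 = (jis >> 8) & 0xFF;
    *j2 = jis & 0xFF;
    assert(*j1 >= 0x21 && *j1 <= 0x7E && *j2 >= 0x21 && *j2 <= 0x7E);
}

/* JIS X 0208 -> EUC-JP: set the high bit of each byte. */
static void jis_to_eucjp(unsigned int jis, unsigned char out[2])
{
    unsigned int j1, j2;
    split_jis(jis, &j1, &j2);
    out[0] = (unsigned char)(j1 | 0x80);
    out[1] = (unsigned char)(j2 | 0x80);
}

/* JIS X 0208 -> Shift-JIS: two JIS rows share one lead byte. */
static void jis_to_sjis(unsigned int jis, unsigned char out[2])
{
    unsigned int j1, j2, s1, s2;
    split_jis(jis, &j1, &j2);

    s1 = ((j1 - 0x21) >> 1) + 0x81;
    if (s1 > 0x9F) {
        s1 += 0x40;          /* skip the 0xA0-0xDF single-byte range */
    }

    if (j1 & 1) {            /* odd row: trail bytes 0x40-0x9E, skipping 0x7F */
        s2 = j2 + 0x1F;
        if (s2 >= 0x7F) {
            s2++;
        }
    } else {                 /* even row: trail bytes 0x9F-0xFC */
        s2 = j2 + 0x7E;
    }

    out[0] = (unsigned char)s1;
    out[1] = (unsigned char)s2;
}
```

Calling `jis_to_sjis(0x216F, buf)` fills `buf` with 0x81 0x8F, and `jis_to_eucjp(0x216F, buf)` fills it with 0xA1 0xEF, matching the values added to the test files.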
