FULLWIDTH WON SIGN vs WON SIGN #114428

Open
@malaterre

Description

Bug report

Bug description:

>>> "￦".encode("euc_kr")
b'\xa3\xdc'

while:

>>> "₩".encode("euc_kr")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'euc_kr' codec can't encode character '\u20a9' in position 0: illegal multibyte sequence
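Because the two characters look nearly identical in many fonts, it may help to spell them out by code point. The following is a minimal sketch for reproducing the difference; `try_encode` is a hypothetical helper introduced only for illustration, and the result for U+20A9 reflects the CPython 3.10 behavior reported here.

```python
import unicodedata

def try_encode(ch, codec="euc_kr"):
    """Hypothetical helper: return the encoded bytes, or None if the codec rejects ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

# U+FFE6 is FULLWIDTH WON SIGN, U+20A9 is WON SIGN.
for ch in ("\uffe6", "\u20a9"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), try_encode(ch))
```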

I would have expected behavior similar to "¥" in shift-jis, where Python executes:

>>> "¥".encode("shift-jis").decode("shift-jis")
'\\'
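To make the comparison explicit, here is a small sketch of the same round trip with the intermediate bytes shown. It suggests that the shift-jis codec accepts YEN SIGN (U+00A5) by folding it onto the single byte 0x5C, so the round trip is lossy rather than an error.

```python
yen = "\u00a5"  # YEN SIGN

encoded = yen.encode("shift-jis")      # YEN SIGN is folded onto byte 0x5C
decoded = encoded.decode("shift-jis")  # 0x5C decodes back as a backslash

print(encoded, repr(decoded), decoded == yen)
```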

I reported this as a bug, but since I am not a native Korean speaker I may have misunderstood the well-known backslash-is-won-sign issue in Python.

As a side note, here is the iconv behavior on my Ubuntu setup (WON SIGN becomes FULLWIDTH WON SIGN):

$ echo -n "₩" | hexdump
0000000 82e2 00a9
0000003
$ echo -n "₩" | iconv -f utf-8 -t euc-kr | hexdump
0000000 dca3
0000002
$ echo -n "₩" | iconv -f utf-8 -t euc-kr | iconv -f euc-kr -t utf-8 | hexdump
0000000 bfef 00a6
0000003
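For anyone who needs the iconv result from Python in the meantime, one possible workaround is to map WON SIGN to FULLWIDTH WON SIGN before encoding. This is only a sketch: `WON_TO_FULLWIDTH` and `encode_euc_kr_like_iconv` are hypothetical names, not part of the standard library, and the mapping is an assumption based on the iconv output shown above.

```python
# Hypothetical translation table: WON SIGN (U+20A9) -> FULLWIDTH WON SIGN (U+FFE6).
WON_TO_FULLWIDTH = {0x20A9: 0xFFE6}

def encode_euc_kr_like_iconv(text):
    """Encode to EUC-KR after substituting FULLWIDTH WON SIGN for WON SIGN."""
    return text.translate(WON_TO_FULLWIDTH).encode("euc_kr")

data = encode_euc_kr_like_iconv("\u20a9")
print(data, repr(data.decode("euc_kr")))
```

As with iconv, the substitution is one-way: decoding the result yields U+FFE6, not the original U+20A9.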

CPython versions tested on:

3.10

Operating systems tested on:

Linux, Windows
