Bug report
Bug description:
>>> "￦".encode("euc_kr")
b'\xa3\xdc'
while the plain WON SIGN fails:
>>> "₩".encode("euc_kr")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'euc_kr' codec can't encode character '\u20a9' in position 0: illegal multibyte sequence
I would have expected behavior similar to that of the ¥ sign in Shift-JIS, where Python executes:
>>> "¥".encode("shift-jis").decode("shift-jis")
'\\'
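For reference, the cases above can be reproduced outside the REPL with a short script. This is a minimal sketch: the helper `try_encode` is hypothetical, and the first case assumes that the character which encodes successfully is U+FFE6 FULLWIDTH WON SIGN, the same character the iconv round trip below ends up with.

```python
import unicodedata

# Characters involved in this report.
FULLWIDTH_WON = "\uffe6"  # U+FFE6 FULLWIDTH WON SIGN
WON = "\u20a9"            # U+20A9 WON SIGN
YEN = "\u00a5"            # U+00A5 YEN SIGN


def try_encode(ch, codec):
    """Encode a single character; return None if the codec rejects it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    cases = [(FULLWIDTH_WON, "euc_kr"), (WON, "euc_kr"), (YEN, "shift_jis")]
    for ch, codec in cases:
        encoded = try_encode(ch, codec)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {codec}: {encoded!r}")
    # Shift-JIS maps YEN SIGN to byte 0x5C, which decodes back as a backslash.
    print(repr(YEN.encode("shift_jis").decode("shift_jis")))
```

The asymmetry is that `shift_jis` has an explicit YEN SIGN to 0x5C mapping on the encode side, while `euc_kr` has no corresponding entry for WON SIGN.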
I am reporting this as a bug, but as a non-native Korean speaker I may have misunderstood how the well-known "backslash is the won sign" issue applies to Python.
As a side note, here is iconv's behavior on my Ubuntu setup (WON SIGN becomes FULLWIDTH WON SIGN):
$ echo -n "₩" | hexdump
0000000 82e2 00a9
0000003
$ echo -n "₩" | iconv -f utf-8 -t euc-kr | hexdump
0000000 dca3
0000002
$ echo -n "₩" | iconv -f utf-8 -t euc-kr | iconv -f euc-kr -t utf-8 | hexdump
0000000 bfef 00a6
0000003
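If the iconv-style substitution is what a caller actually wants, it can be approximated today without changing the codec, by registering a custom encode-error handler with `codecs.register_error`. This is a minimal sketch, assuming that mapping U+20A9 to U+FFE6 (the same choice iconv makes above) is acceptable; the handler name `won_fallback` is hypothetical and not part of CPython.

```python
import codecs

# Hypothetical substitution table mirroring the iconv output above:
# WON SIGN (U+20A9) -> FULLWIDTH WON SIGN (U+FFE6), which EUC-KR can encode.
_SUBSTITUTIONS = {"\u20a9": "\uffe6"}


def won_fallback(exc):
    """Encode-error handler: substitute known characters, re-raise otherwise."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    failed = exc.object[exc.start:exc.end]
    if not all(ch in _SUBSTITUTIONS for ch in failed):
        raise exc
    # Returning a str lets the codec encode the replacement itself;
    # exc.end tells it where to resume in the input.
    return "".join(_SUBSTITUTIONS[ch] for ch in failed), exc.end


codecs.register_error("won_fallback", won_fallback)

if __name__ == "__main__":
    data = "\u20a9100".encode("euc_kr", errors="won_fallback")
    print(data)
    print(data.decode("euc_kr"))
```

This keeps the default `strict` behavior unchanged and makes the lossy substitution explicit at the call site; any other unencodable character still raises `UnicodeEncodeError`.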
CPython versions tested on:
3.10
Operating systems tested on:
Linux, Windows