Closed
Description
MWE:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_length", y="sepal_width", color="species",
                 title="Automatic Labels Based on Data Frame Column Names ≥ 2")
fig.show()
fig.write_html('utf-bug.html')
Observations:
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_47908\2411983485.py in <module>
5 title="Automatic Labels Based on Data Frame Column Names ≥ 2")
6 fig.show()
----> 7 fig.write_html('utf-bug.html')
D:\miniconda3\envs\my-env\lib\site-packages\plotly\basedatatypes.py in write_html(self, *args, **kwargs)
3706 import plotly.io as pio
3707
-> 3708 return pio.write_html(self, *args, **kwargs)
3709
3710 def to_image(self, *args, **kwargs):
D:\miniconda3\envs\my-env\lib\site-packages\plotly\io\_html.py in write_html(fig, file, config, auto_play, include_plotlyjs, include_mathjax, post_script, full_html, animation_opts, validate, default_width, default_height, auto_open, div_id)
534 # Write HTML string
535 if path is not None:
--> 536 path.write_text(html_str)
537 else:
538 file.write(html_str)
D:\miniconda3\envs\my-env\lib\pathlib.py in write_text(self, data, encoding, errors)
1239 data.__class__.__name__)
1240 with self.open(mode='w', encoding=encoding, errors=errors) as f:
-> 1241 return f.write(data)
1242
1243 def touch(self, mode=0o666, exist_ok=True):
D:\miniconda3\envs\my-env\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\u2265' in position 3692519: character maps to <undefined>
It seems that the call to path.write_text(html_str) does not specify an encoding, so Python falls back to the locale's default encoding, which is cp1252 on this Windows machine. The ≥ character (U+2265) has no mapping in cp1252, hence the UnicodeEncodeError. Writing the file as UTF-8 would make saving possible.
Note that the figure shows correctly before saving, so it's only a problem with HTML writing.
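A possible workaround, sketched here with the standard library only, is to write the HTML string with an explicit UTF-8 encoding instead of relying on the locale default. The helper name save_html_utf8 and the short stand-in HTML string are assumptions made for illustration; they are not part of plotly.

```python
import tempfile
from pathlib import Path


def save_html_utf8(html_str: str, path: Path) -> None:
    """Hypothetical helper: write HTML text with an explicit UTF-8 encoding."""
    path.write_text(html_str, encoding="utf-8")


# Stand-in for the real HTML string, containing U+2265
# (the greater-than-or-equal sign from the figure title).
html_str = "<title>Data Frame Column Names \u2265 2</title>"

# cp1252 has no mapping for U+2265, which matches the error in the traceback.
try:
    html_str.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Writing with an explicit encoding does not depend on the locale default.
out_path = Path(tempfile.mkdtemp()) / "utf-bug.html"
save_html_utf8(html_str, out_path)
round_trip = out_path.read_text(encoding="utf-8")
```

With the figure from the MWE, the same idea would be save_html_utf8(fig.to_html(), Path('utf-bug.html')). Alternatively, since the traceback shows a file.write(html_str) branch in write_html, passing a file object opened with encoding="utf-8" should avoid the cp1252 path as well.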
Also, this issue seems similar, though not the same: #1289