HTML writing fails on Windows if plot title contains specific UTF characters (≥) #3898

Closed
@Alexander-Serov

Description

MWE:

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x="sepal_length", y="sepal_width", color="species",
                title="Automatic Labels Based on Data Frame Column Names ≥ 2")
fig.show()
fig.write_html('utf-bug.html')

Observations:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_47908\2411983485.py in <module>
      5                 title="Automatic Labels Based on Data Frame Column Names ≥ 2")
      6 fig.show()
----> 7 fig.write_html('utf-bug.html')

D:\miniconda3\envs\my-env\lib\site-packages\plotly\basedatatypes.py in write_html(self, *args, **kwargs)
   3706         import plotly.io as pio
   3707 
-> 3708         return pio.write_html(self, *args, **kwargs)
   3709 
   3710     def to_image(self, *args, **kwargs):

D:\miniconda3\envs\my-env\lib\site-packages\plotly\io\_html.py in write_html(fig, file, config, auto_play, include_plotlyjs, include_mathjax, post_script, full_html, animation_opts, validate, default_width, default_height, auto_open, div_id)
    534     # Write HTML string
    535     if path is not None:
--> 536         path.write_text(html_str)
    537     else:
    538         file.write(html_str)

D:\miniconda3\envs\my-env\lib\pathlib.py in write_text(self, data, encoding, errors)
   1239                             data.__class__.__name__)
   1240         with self.open(mode='w', encoding=encoding, errors=errors) as f:
-> 1241             return f.write(data)
   1242 
   1243     def touch(self, mode=0o666, exist_ok=True):

D:\miniconda3\envs\my-env\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u2265' in position 3692519: character maps to <undefined>

It seems that the call to path.write_text(html_str) does not specify an encoding, so Python falls back to the locale's preferred encoding, which is cp1252 on this Windows machine. The character that breaks it, ≥ (U+2265), has no mapping in the cp1252 table, hence the 'charmap' error. Passing encoding="utf-8" to write_text should make saving possible.

Note that the figure shows correctly before saving, so it's only a problem with HTML writing.
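
A possible workaround until the encoding is set inside plotly, sketched with the standard library only: build the HTML string yourself and write it with an explicit UTF-8 encoding. The `html_str` value and the output location below are stand-ins chosen for illustration; with a real figure, `fig.to_html()` should give the string to write.

```python
import locale
import tempfile
from pathlib import Path

# Stand-in for the HTML produced by plotly (assumption: with a real
# figure this would come from fig.to_html()).
html_str = "<html><head><title>Column Names ≥ 2</title></head></html>"

# The encoding that write_text() falls back to when none is given.
# On many Western-locale Windows setups this reports cp1252.
default_encoding = locale.getpreferredencoding(False)

# Reproduce the failure independently of the platform: cp1252 has no
# mapping for U+2265, so encoding raises UnicodeEncodeError.
try:
    html_str.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Workaround: state the encoding explicitly when writing the file.
out_path = Path(tempfile.mkdtemp()) / "utf-bug.html"
out_path.write_text(html_str, encoding="utf-8")

# Reading the file back as UTF-8 recovers the original string.
round_trip = out_path.read_text(encoding="utf-8")
```

Writing through an explicit `encoding="utf-8"` avoids depending on the locale default, which is the part that differs between Windows and most Linux or macOS setups.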

Also, this issue seems similar, though not the same:
#1289
