Description
Many thanks for the vast range of I/O possibilities that pandas offers! I just ran into an edge case (or maybe not such an edge case) where an extra parameter could save some work, especially for new users.
Code Sample, a copy-pastable example if possible
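import pandas as pd

# Raises UnicodeEncodeError when the platform default encoding cannot represent '☿':
pd.DataFrame({"a": ['☿']}).to_html("a.html")
# Would be nice to have:
# pd.DataFrame({"a": ['☿']}).to_html("a.html", encoding="utf-8")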
Problem description
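With the current signature of DataFrame.to_html, it is not possible to easily write non-ASCII / non-Latin-1 characters to an HTML file directly or, more generally, to specify the output encoding. When a file name is passed, the file is opened with the platform default encoding (cp1252 in my case). The workaround is to pass an open file: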
with open("a.html", "w", encoding="utf-8") as out:
    pd.DataFrame({"a": ['☿']}).to_html(out)
It would be nice to have a parameter (admittedly, a 24th one) to allow this, consistent with the encoding parameter of to_csv. I see that there is some discussion on parameter consistency in #15008 and #28377 (hopefully I searched well and this is not a duplicate issue), so it might be against the design principles. Do you think this would be a viable idea? If yes, I am ready to implement it.
Note: It is then an open question whether an explicit encoding should also result in a correct <meta charset...> tag being added to the file.
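To make the note above concrete, here is a minimal sketch of what pairing the encoding with a charset declaration could look like on the user side. The helper name write_html_document and the page skeleton are my own assumptions for illustration; nothing like this exists in pandas, and to_html itself only produces the table fragment.

```python
import html
from pathlib import Path


def write_html_document(table_html: str, path: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: wrap an HTML fragment in a page that declares its charset."""
    document = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f'<meta charset="{html.escape(encoding)}">\n'
        "</head>\n"
        "<body>\n"
        f"{table_html}\n"
        "</body>\n"
        "</html>\n"
    )
    # Encode explicitly so the bytes on disk match the declared charset,
    # regardless of the platform's default encoding.
    Path(path).write_bytes(document.encode(encoding))
```

With a data frame df, the call would be write_html_document(df.to_html(), "a.html"), relying on to_html returning the markup as a string when no buffer is given.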
My motivation: I am currently writing lesson materials for an EDA course and wanted to show how easy it is to export data frames (which happen to contain planet symbols, but it could be any non-Western character) to any format ;-)
Thanks,
Jan
Expected Output
None
Unexpected output
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-67-2972ac7a12d7> in <module>
----> 1 pd.DataFrame({"a": ['☿']}).to_html("a.html")
~\Miniconda3\lib\site-packages\pandas\core\frame.py in to_html(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, bold_rows, classes, escape, notebook, border, table_id, render_links)
2315 )
2316 # TODO: a generic formatter wld b in DataFrameFormatter
-> 2317 formatter.to_html(classes=classes, notebook=notebook, border=border)
2318
2319 if buf is None:
~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in to_html(self, classes, notebook, border)
843 elif isinstance(self.buf, str):
844 with open(self.buf, "w") as f:
--> 845 buffer_put_lines(f, html)
846 else:
847 raise TypeError("buf is not a file name and it has no write " " method")
~\Miniconda3\lib\site-packages\pandas\io\formats\format.py in buffer_put_lines(buf, lines)
1808 if any(isinstance(x, str) for x in lines):
1809 lines = [str(x) for x in lines]
-> 1810 buf.write("\n".join(lines))
~\Miniconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\u263f' in position 193: character maps to <undefined>
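As far as I can tell from the traceback, the failure is independent of pandas: the file is opened in text mode without an encoding, so Python uses the locale default (cp1252 on my machine), which has no mapping for U+263F. A small standard-library sketch that reproduces just that part (the variable names are mine):

```python
import locale

symbol = "\u263f"  # MERCURY, the character from the example

# open(path, "w") without an encoding argument falls back to the locale's
# preferred encoding; the value printed here depends on the platform.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

try:
    symbol.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print("cp1252 failed:", exc)

# UTF-8 can represent the character without trouble.
utf8_bytes = symbol.encode("utf-8")
print("utf-8 bytes:", utf8_bytes)
```

On a system whose default is already UTF-8, the original to_html call would presumably succeed, which is why this mostly bites Windows users.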
Output of pd.show_versions()
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.7
tables : None
xarray : 0.14.0
xlrd : None
xlwt : 1.3.0
xlsxwriter : None