Description
When writing a Stata dataset whose strings contain upper Latin-1 characters (code points above 127, which the Stata format allows), I get an encoding error. To reproduce:
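import pandas.io.stata as sta
# Read the test file that ships with pandas; it contains Latin-1 strings.
sr = sta.StataReader('pandas/pandas/io/tests/data/stata1_encoding.dta')
df = sr.data()
# Writing the same data back out raises the error.
sw = sta.StataWriter('stata1_encoding_dup.dta', df)
sw.write_file()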
I get the following output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1242, in write_file
    self._write_data_nodates()
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1326, in _write_data_nodates
    self._write(var)
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1104, in _write
    self._file.write(to_write)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
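The error message suggests that the string is being implicitly encoded with the ASCII codec at write time. Here is a minimal sketch of that codec behaviour, independent of pandas. The names `text` and `buf` are only for illustration, and it is my assumption (not something I have confirmed in `stata.py`) that encoding to Latin-1 before writing is what the writer would need to do:

```python
# Illustrative only: shows the codec behaviour named in the traceback.
# This does not call pandas and is not the library's actual code path.
import io

text = u"\xfc"  # a Latin-1 character above code point 127, as in the error

# Encoding with ASCII fails with the same exception type as in the traceback.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with Latin-1 gives a single byte that a binary file accepts.
encoded = text.encode("latin-1")

buf = io.BytesIO()  # hypothetical stand-in for the binary .dta file handle
buf.write(encoded)
written = buf.getvalue()
```

If that assumption holds, the fix may be as small as encoding string values explicitly before they reach `self._file.write`.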
Machine info:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: None
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None