Error when writing non-ascii allowed characters to Stata dta #7286

Closed
@bquistorff

When I try to write a Stata dataset whose strings contain upper Latin-1 characters (code points 128-255, which the Stata format allows), I get an encoding error.

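import pandas.io.stata as sta

# Read the encoding test file from the pandas source tree.
sr = sta.StataReader('pandas/pandas/io/tests/data/stata1_encoding.dta')
df = sr.data()

# Write the same data back out; this call raises the error below.
sw = sta.StataWriter('stata1_encoding_dup.dta', df)
sw.write_file()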

I get the following output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1242, in write_file
    self._write_data_nodates()
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1326, in _write_data_nodates
    self._write(var)
  File "C:\Python27\lib\site-packages\pandas\io\stata.py", line 1104, in _write
    self._file.write(to_write)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
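
The message is consistent with a unicode string being handed to a file write and implicitly encoded with the ASCII codec, which is what Python 2 does by default. The following is only a minimal sketch of that failure mode, not the pandas code path or its fix. The sample string `für` is an assumption chosen so that `u'\xfc'` sits at position 1, as in the traceback, and the in-memory buffer stands in for the output file.

```python
import io

# Hypothetical sample value: 'ü' (U+00FC) at position 1, matching the traceback.
text = u"f\xfcr"

# The ASCII codec rejects any code point above 127.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Latin-1 maps each code point in U+0000..U+00FF to exactly one byte,
# so the same string encodes cleanly.
encoded = text.encode("latin-1")

# Writing the already-encoded bytes to a binary stream needs no implicit encoding.
buf = io.BytesIO()
buf.write(encoded)
written = buf.getvalue()
```

Encoding explicitly before the write avoids relying on the interpreter's default codec; whether Latin-1 is the right target for a given dta version is a separate question this sketch does not settle.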

Machine info:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: None
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
