Unable to import Stata 13 database files with read_stata() #7360

Closed
@refp16

pandas v0.14.0 (May 31, 2014) seems unable to import Stata 13 datasets, although according to the release notes at http://pandas.pydata.org/pandas-docs/stable/whatsnew.html it should. Stata 12 files can be imported without problems.

The output of running this

import pandas
pandas.show_versions()
dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')

follows:

%run D:/Datos/RFERRER/Desktop/import_stata13.py

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.0
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.2.0
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: None
tables: 2.4.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.2.3
bs4: None
html5lib: 0.95-dev
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\openpyxl\__init__.py:31: UserWarning: The installed version of lxml is too old to be used with openpyxl
  warnings.warn("The installed version of lxml is too old to be used with openpyxl")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Users\rferrer\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
    195             else:
    196                 filename = fname
--> 197             exec compile(scripttext, filename, 'exec') in glob, loc
    198     else:
    199         def execfile(fname, *where):

D:\Datos\RFERRER\Desktop\import_stata13.py in <module>()
      3 pandas.show_versions()
      4 
----> 5 dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, encoding, index)
     45         identifier of column that should be used as index of the DataFrame
     46     """
---> 47     reader = StataReader(filepath_or_buffer, encoding)
     48 
     49     return reader.data(convert_dates, convert_categoricals, index)

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in __init__(self, path_or_buf, encoding)
    455             self.path_or_buf = path_or_buf
    456 
--> 457         self._read_header()
    458 
    459     def _read_header(self):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _read_header(self)
    657 
    658         """Calculate size of a data record."""
--> 659         self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660 
    661     def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in <lambda>(x)
    657 
    658         """Calculate size of a data record."""
--> 659         self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
    660 
    661     def _calcsize(self, fmt):

C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _calcsize(self, fmt)
    661     def _calcsize(self, fmt):
    662         return (type(fmt) is int and fmt
--> 663                 or struct.calcsize(self.byteorder + fmt))
    664 
    665     def _col_size(self, k=None):

TypeError: cannot concatenate 'str' and 'NoneType' objects
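One reading of the traceback, offered as a hedged sketch rather than a confirmed diagnosis: in Python 2 the message "cannot concatenate 'str' and 'NoneType' objects" is raised when the left operand is a string and the right operand is None. In the expression `self.byteorder + fmt` that would mean `fmt`, an entry of `self.typlist`, was None, while `self.byteorder` was a string. The function `calcsize_sketch` below is a hypothetical stand-in that copies the expression shown in the traceback; it is not the pandas source.

```python
import struct


def calcsize_sketch(fmt, byteorder):
    """Hypothetical stand-in mirroring the expression in the traceback."""
    return (type(fmt) is int and fmt
            or struct.calcsize(byteorder + fmt))


# An integer entry (a fixed string width) is returned unchanged.
print(calcsize_sketch(18, "<"))

# A struct format code is sized using the given byte order.
print(calcsize_sketch("d", "<"))

# A None entry falls through to the string concatenation and fails there.
try:
    calcsize_sketch(None, "<")
except TypeError as exc:
    print("TypeError: %s" % exc)
```

The exact wording of the TypeError differs between Python 2 and Python 3, but in both cases the failure comes from the concatenation, not from `struct.calcsize` itself.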

The dataset myauto.dta is just the auto dataset, made available by running sysuse auto within Stata.
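To confirm which .dta format a given file uses, a small check of the file's first bytes can help. This is a minimal sketch under two assumptions about the file layout: files in format 117 (the one Stata 13 writes) begin with an XML-like `<stata_dta><header><release>` tag, and older formats such as 115 (Stata 12) store the format number in the first byte. The helper name `dta_format_version` is hypothetical and is not part of pandas.

```python
import re


def dta_format_version(path):
    """Return the .dta format number found at the start of the file, or None."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if not head:
        return None
    # Newer files open with an XML-like header that names the release.
    match = re.match(br"<stata_dta><header><release>(\d+)</release>", head)
    if match:
        return int(match.group(1))
    # Older files store the format number in the first byte.
    return bytearray(head)[0]
```

Calling `dta_format_version` on the path passed to `read_stata` should report 117 for a file saved by Stata 13 with default settings, which would distinguish it from the Stata 12 files that import correctly.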

The problem was originally reported here: http://stackoverflow.com/questions/24053652/pandas-and-stata-13-files.

My Python is set up with Enthought Canopy 1.4.0 (64 bit).
