Description
pandas v0.14.0 (May 31, 2014) seems unable to import Stata 13 datasets, although according to the release notes at http://pandas.pydata.org/pandas-docs/stable/whatsnew.html it should. Stata 12 files can be imported without problems.
The output of running this
import pandas
pandas.show_versions()
dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')
follows:
%run D:/Datos/RFERRER/Desktop/import_stata13.py
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.0
nose: 1.3.0
Cython: 0.19.2
numpy: 1.8.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 1.2.1
sphinx: 1.2.2
patsy: 0.2.0
scikits.timeseries: 0.91.3
dateutil: 2.2
pytz: 2013.8
bottleneck: None
tables: 2.4.0
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.2.3
bs4: None
html5lib: 0.95-dev
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.3
pymysql: None
psycopg2: None
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\openpyxl\__init__.py:31: UserWarning: The installed version of lxml is too old to be used with openpyxl
warnings.warn("The installed version of lxml is too old to be used with openpyxl")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\rferrer\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\site-packages\IPython\utils\py3compat.pyc in execfile(fname, glob, loc)
195 else:
196 filename = fname
--> 197 exec compile(scripttext, filename, 'exec') in glob, loc
198 else:
199 def execfile(fname, *where):
D:\Datos\RFERRER\Desktop\import_stata13.py in <module>()
3 pandas.show_versions()
4
----> 5 dta = pandas.io.stata.read_stata('D:\\Datos\\rferrer\\Desktop\\myauto.dta')
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, encoding, index)
45 identifier of column that should be used as index of the DataFrame
46 """
---> 47 reader = StataReader(filepath_or_buffer, encoding)
48
49 return reader.data(convert_dates, convert_categoricals, index)
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in __init__(self, path_or_buf, encoding)
455 self.path_or_buf = path_or_buf
456
--> 457 self._read_header()
458
459 def _read_header(self):
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _read_header(self)
657
658 """Calculate size of a data record."""
--> 659 self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
660
661 def _calcsize(self, fmt):
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in <lambda>(x)
657
658 """Calculate size of a data record."""
--> 659 self.col_sizes = lmap(lambda x: self._calcsize(x), self.typlist)
660
661 def _calcsize(self, fmt):
C:\Users\rferrer\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\io\stata.pyc in _calcsize(self, fmt)
661 def _calcsize(self, fmt):
662 return (type(fmt) is int and fmt
--> 663 or struct.calcsize(self.byteorder + fmt))
664
665 def _col_size(self, k=None):
TypeError: cannot concatenate 'str' and 'NoneType' objects
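The last frame can be reproduced in isolation. In Python 2, "cannot concatenate 'str' and 'NoneType' objects" means the left operand is a string and the right one is None, so `self.byteorder` was set but at least one entry of `self.typlist` was None. Below is a minimal sketch of that expression outside pandas; the function names `calcsize` and `try_calcsize` are mine, not part of pandas, and the exact error text differs on Python 3.

```python
import struct


def calcsize(byteorder, fmt):
    # Same expression as StataReader._calcsize in the traceback:
    # ints are returned as-is, anything else is treated as a struct format.
    return (type(fmt) is int and fmt) or struct.calcsize(byteorder + fmt)


def try_calcsize(byteorder, fmt):
    # Hypothetical helper: report the TypeError instead of raising it.
    try:
        return calcsize(byteorder, fmt)
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_calcsize("<", 8))     # an int entry is passed through
print(try_calcsize("<", "d"))   # a struct format character is measured
print(try_calcsize("<", None))  # a None entry fails, as in the traceback
```

This suggests the reader did not recognise one or more variable type codes in the file header and left None in `typlist`, but I have not confirmed that against the pandas source.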
The dataset myauto.dta is just the auto dataset made available by running sysuse auto within Stata.
The problem was originally reported here: http://stackoverflow.com/questions/24053652/pandas-and-stata-13-files.
My Python is set up with Enthought Canopy 1.4.0 (64 bit).