Description
Code Sample
import numpy as np
import pandas as pd

df = pd.DataFrame({'Int': [1, 2, 3], 'Period': pd.period_range(start="2019-01", end="2019-03", freq="M")})
df.to_csv("PeriodDtype.csv")
# On pandas 0.24.2 this raises NotImplementedError (see below)
pd.read_csv("PeriodDtype.csv", dtype={"Int": np.int64, "Period": pd.PeriodDtype("M")})
Problem description
Using pandas 0.24.2, I wrote a simple DataFrame with the following dtypes to a CSV file:
Int int64
Period period[M]
dtype: object
When I tried to read it back in, I found that read_csv() could not parse PeriodDtype("M"). I got the following error message:
NotImplementedError: Extension Array: <class 'pandas.core.arrays.period.PeriodArray'> must implement _from_sequence_of_strings in order to be used in parser methods
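Until the parser can build a PeriodArray directly, one possible workaround (a sketch, not part of the original report) is to read the column as plain strings and convert it afterwards. Unlike the code sample above, it writes to an in-memory buffer with index=False instead of PeriodDtype.csv, so no extra index column comes back.

```python
import io

import pandas as pd

# Same frame as in the report, round-tripped through an in-memory CSV buffer.
df = pd.DataFrame(
    {"Int": [1, 2, 3],
     "Period": pd.period_range(start="2019-01", end="2019-03", freq="M")}
)
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# Read the Period column as plain strings instead of asking for PeriodDtype ...
out = pd.read_csv(buf, dtype={"Int": "int64", "Period": str})
# ... then convert the strings to monthly periods after parsing.
out["Period"] = pd.PeriodIndex(out["Period"], freq="M")
print(out.dtypes)
```

This avoids the parser's extension-array path entirely, at the cost of a second pass over the column.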
I saw a similar issue, #24542, raised for the datetime dtype. It seems that _from_sequence_of_strings() is also not defined for PeriodArray, which prevents parsing columns with PeriodDtype.
I think adding _from_sequence_of_strings() to PeriodArray would be a good enhancement. If so, I would be interested in making that change.
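As a rough illustration of what such a hook would have to do, the standalone function below turns a sequence of CSV text fields into a period-dtype array. The name parse_period_strings is hypothetical; it is not the pandas method itself, only a sketch assuming each non-empty field is a string that pd.Period can parse at the requested frequency, with empty or missing fields mapped to NaT.

```python
import pandas as pd


def parse_period_strings(strings, freq):
    """Hypothetical helper: convert text fields into a PeriodArray.

    Each non-missing string becomes a pd.Period with the given frequency;
    None, NaN and empty strings become pd.NaT.
    """
    dtype = pd.PeriodDtype(freq)
    periods = [
        pd.NaT if (pd.isna(s) or s == "") else pd.Period(s, freq=freq)
        for s in strings
    ]
    # pd.array builds a PeriodArray when given Period scalars and a PeriodDtype.
    return pd.array(periods, dtype=dtype)


arr = parse_period_strings(["2019-01", "2019-02", ""], "M")
print(arr)
```

A real implementation inside PeriodArray would likely parse in bulk rather than building Period objects one at a time; this version favors clarity over speed.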
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.16.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None