Commit b8942c9

Merge pull request #3693 from jreback/io_api
CLN: added io.api for i/o importing functions
2 parents ed7af5c + 03adc86 commit b8942c9

22 files changed: +713 -509 lines changed

RELEASE.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ pandas 0.11.1
3535
GH3606_)
3636
- Support for reading Amazon S3 files. (GH3504_)
3737
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
38+
includes ``to_stata`` DataFrame method, and a ``read_stata`` top-level reader
3839
- Added support for writing in ``to_csv`` and reading in ``read_csv``,
3940
multi-index columns. The ``header`` option in ``read_csv`` now accepts a
4041
list of the rows from which to read the index. Added the option,
@@ -104,6 +105,11 @@ pandas 0.11.1
104105
does not control triggering of summary, similar to < 0.11.0.
105106
- Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
106107
to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)
108+
- io API changes
109+
110+
- added ``pandas.io.api`` for i/o imports
111+
- moved ``Excel`` support to ``pandas.io.excel``
112+
- added top-level ``pd.read_sql`` and ``to_sql`` DataFrame methods
107113

108114
**Bug Fixes**
109115

doc/source/10min.rst

Lines changed: 1 addition & 2 deletions
@@ -699,8 +699,7 @@ Reading from an excel file
699699

700700
.. ipython:: python
701701
702-
xls = ExcelFile('foo.xlsx')
703-
xls.parse('sheet1', index_col=None, na_values=['NA'])
702+
read_excel('foo.xlsx', 'sheet1', index_col=None, na_values=['NA'])
704703
705704
.. ipython:: python
706705
:suppress:

doc/source/api.rst

Lines changed: 30 additions & 1 deletion
@@ -48,7 +48,20 @@ File IO
4848

4949
read_table
5050
read_csv
51-
ExcelFile.parse
51+
52+
.. currentmodule:: pandas.io.excel
53+
54+
.. autosummary::
55+
:toctree: generated/
56+
57+
read_excel
58+
59+
.. currentmodule:: pandas.io.stata
60+
61+
.. autosummary::
62+
:toctree: generated/
63+
64+
read_stata
5265

5366
.. currentmodule:: pandas.io.html
5467

@@ -57,15 +70,29 @@ File IO
5770

5871
read_html
5972

73+
SQL
74+
~~~
75+
76+
.. currentmodule:: pandas.io.sql
77+
78+
.. autosummary::
79+
:toctree: generated/
80+
81+
read_sql
82+
6083
HDFStore: PyTables (HDF5)
6184
~~~~~~~~~~~~~~~~~~~~~~~~~
85+
6286
.. currentmodule:: pandas.io.pytables
6387

6488
.. autosummary::
6589
:toctree: generated/
6690

91+
read_hdf
6792
HDFStore.put
93+
HDFStore.append
6894
HDFStore.get
95+
HDFStore.select
6996

7097
Standard moving window functions
7198
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -532,9 +559,11 @@ Serialization / IO / Conversion
532559
DataFrame.load
533560
DataFrame.save
534561
DataFrame.to_csv
562+
DataFrame.to_hdf
535563
DataFrame.to_dict
536564
DataFrame.to_excel
537565
DataFrame.to_html
566+
DataFrame.to_stata
538567
DataFrame.to_records
539568
DataFrame.to_sparse
540569
DataFrame.to_string

doc/source/cookbook.rst

Lines changed: 10 additions & 10 deletions
@@ -32,25 +32,25 @@ Selection
3232

3333
The :ref:`indexing <indexing>` docs.
3434

35-
`Boolean Rows Indexing
35+
Indexing using both row labels and conditionals, see
36+
`here
3637
<http://stackoverflow.com/questions/14725068/pandas-using-row-labels-in-boolean-indexing>`__
37-
Indexing using both row labels and conditionals
3838

39-
`Using loc and iloc in selections
39+
Use loc for label-oriented slicing and iloc for positional slicing, see
40+
`here
4041
<https://github.com/pydata/pandas/issues/2904>`__
41-
Use loc for label-oriented slicing and iloc positional slicing
4242

43-
`Extending a panel along the minor axis
43+
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see
44+
`here
4445
<http://stackoverflow.com/questions/15364050/extending-a-pandas-panel-frame-along-the-minor-axis>`__
45-
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions
4646

47-
`Boolean masking in a panel
47+
Mask a panel by using ``np.where`` and then reconstructing the panel with the new masked values, see
48+
`here
4849
<http://stackoverflow.com/questions/14650341/boolean-mask-in-pandas-panel>`__
49-
Mask a panel by using ``np.where`` and then reconstructing the panel with the new masked values
5050

51-
`Selecting via the complement
51+
Using ``~`` to take the complement of a boolean array, see
52+
`here
5253
<http://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas>`__
53-
``~`` can be used to take the complement of a boolean array
5454

5555
`Efficiently creating columns using applymap
5656
<http://stackoverflow.com/questions/16575868/efficiently-creating-additional-columns-in-a-pandas-dataframe-using-map>`__

doc/source/io.rst

Lines changed: 40 additions & 29 deletions
@@ -9,6 +9,7 @@
99
import csv
1010
from StringIO import StringIO
1111
import pandas as pd
12+
ExcelWriter = pd.ExcelWriter
1213
1314
import numpy as np
1415
np.random.seed(123456)
@@ -27,6 +28,18 @@
2728
IO Tools (Text, CSV, HDF5, ...)
2829
*******************************
2930

31+
The Pandas I/O api is a set of top level ``reader`` functions accessed like ``pd.read_csv()`` that generally return a ``pandas``
32+
object. The corresponding ``writer`` functions are object methods that are accessed like ``df.to_csv()``
33+
34+
.. csv-table::
35+
:widths: 12, 15, 15, 15, 15
36+
:delim: ;
37+
38+
Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
39+
Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
40+
Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
41+
Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;
42+
3043
.. _io.read_csv_table:
3144

3245
CSV & Text files
@@ -971,44 +984,48 @@ And then import the data directly to a DataFrame by calling:
971984
Excel files
972985
-----------
973986

974-
The ``ExcelFile`` class can read an Excel 2003 file using the ``xlrd`` Python
987+
The ``read_excel`` method can read an Excel 2003 file using the ``xlrd`` Python
975988
module and use the same parsing code as the above to convert tabular data into
976989
a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
977990
advanced strategies
978991

979-
To use it, create the ``ExcelFile`` object:
992+
.. note::
980993

981-
.. code-block:: python
994+
The prior method of accessing Excel is deprecated as of 0.11.1;
995+
it still works but will be removed in a future version.
982996

983-
xls = ExcelFile('path_to_file.xls')
997+
.. code-block:: python
984998
985-
Then use the ``parse`` instance method with a sheetname, then use the same
986-
additional arguments as the parsers above:
999+
from pandas.io.parsers import ExcelFile
1000+
xls = ExcelFile('path_to_file.xls')
1001+
xls.parse('Sheet1', index_col=None, na_values=['NA'])
9871002
988-
.. code-block:: python
1003+
Replaced by
1004+
1005+
.. code-block:: python
9891006
990-
xls.parse('Sheet1', index_col=None, na_values=['NA'])
1007+
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
9911008
9921009
To read sheets from an Excel 2007 file, you can pass a filename with a ``.xlsx``
9931010
extension, in which case the ``openpyxl`` module will be used to read the file.
9941011

9951012
It is often the case that users will insert columns to do temporary computations
996-
in Excel and you may not want to read in those columns. `ExcelFile.parse` takes
1013+
in Excel and you may not want to read in those columns. `read_excel` takes
9971014
a `parse_cols` keyword to allow you to specify a subset of columns to parse.
9981015

9991016
If `parse_cols` is an integer, then it is assumed to indicate the last column
10001017
to be parsed.
10011018

10021019
.. code-block:: python
10031020
1004-
xls.parse('Sheet1', parse_cols=2, index_col=None, na_values=['NA'])
1021+
read_excel('path_to_file.xls', 'Sheet1', parse_cols=2, index_col=None, na_values=['NA'])
10051022
10061023
If `parse_cols` is a list of integers, then it is assumed to be the file column
10071024
indices to be parsed.
10081025

10091026
.. code-block:: python
10101027
1011-
xls.parse('Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
1028+
read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
10121029
10131030
To write a DataFrame object to a sheet of an Excel file, you can use the
10141031
``to_excel`` instance method. The arguments are largely the same as ``to_csv``
@@ -1883,16 +1900,13 @@ Writing to STATA format
18831900
18841901
.. _io.StataWriter:
18851902
1886-
The function :func:'~pandas.io.StataWriter.write_file' will write a DataFrame
1887-
into a .dta file. The format version of this file is always the latest one,
1888-
115.
1903+
The method ``to_stata`` will write a DataFrame into a .dta file.
1904+
The format version of this file is always the latest one, 115.
18891905
18901906
.. ipython:: python
18911907
1892-
from pandas.io.stata import StataWriter
18931908
df = DataFrame(randn(10,2),columns=list('AB'))
1894-
writer = StataWriter('stata.dta',df)
1895-
writer.write_file()
1909+
df.to_stata('stata.dta')
18961910
18971911
Reading from STATA format
18981912
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1901,24 +1915,21 @@ Reading from STATA format
19011915
19021916
.. versionadded:: 0.11.1
19031917
1904-
The class StataReader will read the header of the given dta file at
1905-
initialization. Its function :func:'~pandas.io.StataReader.data' will
1906-
read the observations, converting them to a DataFrame which is returned:
1918+
The top-level function ``read_stata`` will read a dta format file
1919+
and return a DataFrame:
19071920
19081921
.. ipython:: python
19091922
1910-
from pandas.io.stata import StataReader
1911-
reader = StataReader('stata.dta')
1912-
reader.data()
1923+
pd.read_stata('stata.dta')
19131924
1914-
The parameter convert_categoricals indicates whether value labels should be
1915-
read and used to create a Categorical variable from them. Value labels can
1916-
also be retrieved by the function variable_labels, which requires data to be
1917-
called before.
1925+
Currently the ``index`` is retrieved as a column on read back.
19181926
1919-
The StataReader supports .dta Formats 104, 105, 108, 113-115.
1927+
The parameter ``convert_categoricals`` indicates whether value labels should be
1928+
read and used to create a ``Categorical`` variable from them. Value labels can
1929+
also be retrieved by the function ``variable_labels``, which requires data to be
1930+
called before (see ``pandas.io.stata.StataReader``).
19201931
1921-
Alternatively, the function :func:'~pandas.io.read_stata' can be used
1932+
The StataReader supports .dta Formats 104, 105, 108, 113-115.
19221933
19231934
.. ipython:: python
19241935
:suppress:
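
Reviewer note on the ``parse_cols`` wording in the io.rst hunk above: the two documented forms (an integer meaning "the last column to be parsed", a list meaning "the file column indices to be parsed") can be illustrated with a small pure-Python sketch. The helper ``select_columns`` is hypothetical and is not part of pandas; it also assumes zero-based indices and that the integer bound is inclusive, which the documentation text does not state explicitly.

```python
def select_columns(row, parse_cols=None):
    """Hypothetical helper: keep the cells of ``row`` selected by ``parse_cols``.

    Mirrors only the rules described in the docs; it is not pandas code.
    """
    if parse_cols is None:
        # No restriction: every column is kept.
        return list(row)
    if isinstance(parse_cols, int):
        # Integer form: treated as the last column to parse
        # (assumed zero-based and inclusive).
        return list(row[:parse_cols + 1])
    # List form: explicit column indices to keep, in the given order.
    return [row[i] for i in parse_cols]


row = ['a', 'b', 'c', 'd', 'e']
first_three = select_columns(row, 2)
picked = select_columns(row, [0, 2, 3])
```

Under these assumptions, ``parse_cols=2`` keeps the first three cells and ``parse_cols=[0, 2, 3]`` keeps exactly those positions.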

doc/source/v0.10.0.txt

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
11
.. _whatsnew_0100:
22

3+
.. ipython:: python
4+
:suppress:
5+
6+
from StringIO import StringIO
7+
38
v0.10.0 (December 17, 2012)
49
---------------------------
510

doc/source/v0.11.1.txt

Lines changed: 42 additions & 2 deletions
@@ -6,6 +6,19 @@ v0.11.1 (??)
66
This is a minor release from 0.11.0 and includes several new features and
77
enhancements along with a large number of bug fixes.
88

9+
The I/O api is now much more consistent with the following top-level reading
10+
functions available, e.g. ``pd.read_csv``, and the counterpart writers are
11+
available as object methods, e.g. ``df.to_csv``
12+
13+
.. csv-table::
14+
:widths: 12, 15, 15, 15, 15
15+
:delim: ;
16+
17+
Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
18+
Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
19+
Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
20+
Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;
21+
922
API changes
1023
~~~~~~~~~~~
1124

@@ -74,6 +87,31 @@ API changes
7487
- Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
7588
to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)
7689

90+
- IO api
91+
92+
- added top-level function ``read_excel`` to replace the following,
93+
The original API is deprecated and will be removed in a future version
94+
95+
.. code-block:: python
96+
97+
from pandas.io.parsers import ExcelFile
98+
xls = ExcelFile('path_to_file.xls')
99+
xls.parse('Sheet1', index_col=None, na_values=['NA'])
100+
101+
With
102+
103+
.. code-block:: python
104+
105+
import pandas as pd
106+
pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
107+
108+
- added top-level function ``read_sql`` that is equivalent to the following
109+
110+
.. code-block:: python
111+
112+
from pandas.io.sql import read_frame
113+
read_frame(....)
114+
77115
Enhancements
78116
~~~~~~~~~~~~
79117

@@ -109,6 +147,8 @@ Enhancements
109147
a list or tuple.
110148

111149
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
150+
accessible via ``read_stata`` top-level function for reading,
151+
and ``to_stata`` DataFrame method for writing
112152

113153
- ``DataFrame.replace()`` now allows regular expressions on contained
114154
``Series`` with object dtype. See the examples section in the regular docs
@@ -218,7 +258,7 @@ Bug Fixes
218258
.. ipython :: python
219259

220260
df = DataFrame({'a': list('ab..'), 'b': [1, 2, 3, 4]})
221-
df.replace(regex=r'\s*\.\s*', value=nan)
261+
df.replace(regex=r'\s*\.\s*', value=np.nan)
222262

223263
to replace all occurrences of the string ``'.'`` with zero or more
224264
instances of surrounding whitespace with ``NaN``.
@@ -227,7 +267,7 @@ Bug Fixes
227267

228268
.. ipython :: python
229269

230-
df.replace('.', nan)
270+
df.replace('.', np.nan)
231271

232272
to replace all occurrences of the string ``'.'`` with ``NaN``.
233273
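
Reviewer note on the reader/writer table added to these release notes: the pairing of a top-level ``pd.read_*`` function with a ``df.to_*`` method can be shown with a minimal CSV round trip. This is a sketch, not part of the commit. It assumes a Python 3 environment (hence ``io.StringIO`` rather than the ``from StringIO import StringIO`` used in the docs of this era), and the column names and values are made up for illustration.

```python
from io import StringIO  # assumption: Python 3; the docs in this commit use the Python 2 StringIO module

import pandas as pd

# Illustrative data only; numeric columns keep the round trip exact.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [0.5, 1.5, 2.5]})

# Writer side: an object method on the DataFrame.
buf = StringIO()
df.to_csv(buf, index=False)

# Reader side: a top-level function that returns a pandas object.
buf.seek(0)
roundtrip = pd.read_csv(buf)
```

Writing with ``index=False`` keeps the index out of the file, so the frame read back has the same columns as the original.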

pandas/__init__.py

Lines changed: 1 addition & 5 deletions
@@ -28,12 +28,8 @@
2828
from pandas.sparse.api import *
2929
from pandas.stats.api import *
3030
from pandas.tseries.api import *
31+
from pandas.io.api import *
3132

32-
from pandas.io.parsers import (read_csv, read_table, read_clipboard,
33-
read_fwf, to_clipboard, ExcelFile,
34-
ExcelWriter)
35-
from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
36-
from pandas.io.html import read_html
3733
from pandas.util.testing import debug
3834

3935
from pandas.tools.describe import value_range
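
Reviewer note on the ``pandas/__init__.py`` change above: replacing several explicit imports with ``from pandas.io.api import *`` relies on one aggregator module re-exporting the public I/O names. The sketch below imitates that pattern with in-memory modules so it runs without pandas. All module names (``demo_pkg`` and friends) and the toy ``read_csv`` are hypothetical stand-ins, and the copy loop only approximates what a star import does when no ``__all__`` is defined.

```python
import types

# Hypothetical stand-in for a reader module such as pandas.io.parsers.
parsers = types.ModuleType('demo_pkg.io.parsers')
parsers.read_csv = lambda path: 'csv:' + path

# Hypothetical stand-in for the aggregator (the role pandas.io.api plays):
# it simply re-exports the public names of the individual I/O modules.
api = types.ModuleType('demo_pkg.io.api')
api.read_csv = parsers.read_csv

# Approximation of ``from demo_pkg.io.api import *`` in the package's
# __init__: copy every name that does not start with an underscore.
top = types.ModuleType('demo_pkg')
top.__dict__.update(
    {name: obj for name, obj in vars(api).items() if not name.startswith('_')}
)

result = top.read_csv('foo.csv')
```

With this layout, adding a new reader means touching only the aggregator module, while the package's top-level ``__init__`` stays unchanged.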
