CLN: added io.api for i/o importing functions #3693

Merged
merged 2 commits into from
May 30, 2013
6 changes: 6 additions & 0 deletions RELEASE.rst
@@ -35,6 +35,7 @@ pandas 0.11.1
GH3606_)
- Support for reading Amazon S3 files. (GH3504_)
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
includes ``to_stata`` DataFrame method, and a ``read_stata`` top-level reader
- Added support for writing in ``to_csv`` and reading in ``read_csv``,
multi-index columns. The ``header`` option in ``read_csv`` now accepts a
list of the rows from which to read the index. Added the option,
@@ -104,6 +105,11 @@ pandas 0.11.1
does not control triggering of summary, similar to < 0.11.0.
- Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)
- io API changes

- added ``pandas.io.api`` for i/o imports
- moved ``Excel`` support to ``pandas.io.excel``
- added top-level ``pd.read_sql`` function and ``to_sql`` DataFrame method
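A minimal sketch of the ``allow_duplicates`` keyword for ``DataFrame.insert`` listed above. It assumes a current pandas release rather than 0.11.1, and the frame and column names are illustrative only: the default refuses to insert an existing label, while ``allow_duplicates=True`` lets a second column with the same label through.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default behaviour (allow_duplicates=False): inserting an existing label raises.
try:
    df.insert(1, "a", [5, 6])
    raised = False
except ValueError:
    raised = True

# Opting in places a second column labelled "a" at position 1.
df.insert(1, "a", [5, 6], allow_duplicates=True)
print(list(df.columns))  # ['a', 'a', 'b']
```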

**Bug Fixes**

3 changes: 1 addition & 2 deletions doc/source/10min.rst
@@ -699,8 +699,7 @@ Reading from an excel file

.. ipython:: python

xls = ExcelFile('foo.xlsx')
xls.parse('sheet1', index_col=None, na_values=['NA'])
read_excel('foo.xlsx', 'sheet1', index_col=None, na_values=['NA'])

.. ipython:: python
:suppress:
31 changes: 30 additions & 1 deletion doc/source/api.rst
@@ -48,7 +48,20 @@ File IO

read_table
read_csv
ExcelFile.parse

.. currentmodule:: pandas.io.excel

.. autosummary::
:toctree: generated/

read_excel

.. currentmodule:: pandas.io.stata

.. autosummary::
:toctree: generated/

read_stata

.. currentmodule:: pandas.io.html

@@ -57,15 +70,29 @@ File IO

read_html

SQL
~~~

.. currentmodule:: pandas.io.sql

.. autosummary::
:toctree: generated/

read_sql

HDFStore: PyTables (HDF5)
~~~~~~~~~~~~~~~~~~~~~~~~~

.. currentmodule:: pandas.io.pytables

.. autosummary::
:toctree: generated/

read_hdf
HDFStore.put
HDFStore.append
HDFStore.get
HDFStore.select

Standard moving window functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -532,9 +559,11 @@ Serialization / IO / Conversion
DataFrame.load
DataFrame.save
DataFrame.to_csv
DataFrame.to_hdf
DataFrame.to_dict
DataFrame.to_excel
DataFrame.to_html
DataFrame.to_stata
DataFrame.to_records
DataFrame.to_sparse
DataFrame.to_string
20 changes: 10 additions & 10 deletions doc/source/cookbook.rst
@@ -32,25 +32,25 @@ Selection

The :ref:`indexing <indexing>` docs.

`Boolean Rows Indexing
Indexing using both row labels and conditionals, see
`here
<http://stackoverflow.com/questions/14725068/pandas-using-row-labels-in-boolean-indexing>`__
Indexing using both row labels and conditionals

`Using loc and iloc in selections
Use loc for label-oriented slicing and iloc positional slicing, see
`here
<https://github.com/pydata/pandas/issues/2904>`__
Use ``loc`` for label-oriented slicing and ``iloc`` for positional slicing

`Extending a panel along the minor axis
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see
`here
<http://stackoverflow.com/questions/15364050/extending-a-pandas-panel-frame-along-the-minor-axis>`__
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions

`Boolean masking in a panel
Mask a panel by using ``np.where`` and then reconstructing the panel with the new masked values
`here
<http://stackoverflow.com/questions/14650341/boolean-mask-in-pandas-panel>`__
Mask a panel by using ``np.where`` and then reconstructing the panel with the new masked values

`Selecting via the complement
Using ``~`` to take the complement of a boolean array, see
`here
<http://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas>`__
``~`` can be used to take the complement of a boolean array

`Efficiently creating columns using applymap
<http://stackoverflow.com/questions/16575868/efficiently-creating-additional-columns-in-a-pandas-dataframe-using-map>`__
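The selection idioms linked above can be sketched on a small frame. This assumes a current pandas release; the labels and values are made up, and the ``Panel`` recipes are left out because ``Panel`` has since been removed from pandas.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3, 8]}, index=["a", "b", "c", "d"])

# loc slices by label and includes both endpoints.
by_label = df.loc["b":"c"]

# iloc slices by position and excludes the stop position.
by_position = df.iloc[1:3]

# Combine a condition on the values with a condition on the row labels.
mask = (df["x"] > 2) & df.index.isin(["a", "b", "c"])
selected = df[mask]

# ~ inverts the boolean mask, selecting every row the mask rejected.
complement = df[~mask]
```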
69 changes: 40 additions & 29 deletions doc/source/io.rst
@@ -9,6 +9,7 @@
import csv
from StringIO import StringIO
import pandas as pd
ExcelWriter = pd.ExcelWriter

import numpy as np
np.random.seed(123456)
@@ -27,6 +28,18 @@
IO Tools (Text, CSV, HDF5, ...)
*******************************

The pandas I/O API is a set of top-level ``reader`` functions, accessed like ``pd.read_csv()``, that generally return a ``pandas``
object. The corresponding ``writer`` functions are object methods, accessed like ``df.to_csv()``.

.. csv-table::
:widths: 12, 15, 15, 15, 15
:delim: ;

Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;
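As a sketch of the reader/writer split in the table above, the following round-trips a small frame through CSV in memory. It assumes a current pandas release on Python 3 (hence ``io.StringIO`` rather than the ``StringIO`` module imported at the top of this page); the data is illustrative.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# Writer: an object method on the DataFrame.
buffer = StringIO()
df.to_csv(buffer, index=False)

# Reader: a top-level function that returns a new DataFrame.
buffer.seek(0)
round_tripped = pd.read_csv(buffer)
```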

.. _io.read_csv_table:

CSV & Text files
@@ -971,44 +984,48 @@ And then import the data directly to a DataFrame by calling:
Excel files
-----------

The ``ExcelFile`` class can read an Excel 2003 file using the ``xlrd`` Python
The ``read_excel`` method can read an Excel 2003 file using the ``xlrd`` Python
module and use the same parsing code as the above to convert tabular data into
a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
advanced strategies

To use it, create the ``ExcelFile`` object:
.. note::

.. code-block:: python
The prior method of accessing Excel is deprecated as of 0.11.1; it still
works but will be removed in a future version.

xls = ExcelFile('path_to_file.xls')
.. code-block:: python

Then use the ``parse`` instance method with a sheetname, then use the same
additional arguments as the parsers above:
from pandas.io.parsers import ExcelFile
xls = ExcelFile('path_to_file.xls')
xls.parse('Sheet1', index_col=None, na_values=['NA'])

.. code-block:: python
Replaced by

.. code-block:: python

xls.parse('Sheet1', index_col=None, na_values=['NA'])
read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

To read sheets from an Excel 2007 file, you can pass a filename with a ``.xlsx``
extension, in which case the ``openpyxl`` module will be used to read the file.

It is often the case that users will insert columns to do temporary computations
in Excel and you may not want to read in those columns. `ExcelFile.parse` takes
in Excel and you may not want to read in those columns. ``read_excel`` takes
a ``parse_cols`` keyword to allow you to specify a subset of columns to parse.

If ``parse_cols`` is an integer, then it is assumed to indicate the last column
to be parsed, so columns ``0`` through ``parse_cols`` (inclusive) are read.

.. code-block:: python

xls.parse('Sheet1', parse_cols=2, index_col=None, na_values=['NA'])
read_excel('path_to_file.xls', 'Sheet1', parse_cols=2, index_col=None, na_values=['NA'])

If ``parse_cols`` is a list of integers, then it is assumed to be the indices
of the file columns to be parsed.

.. code-block:: python

xls.parse('Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
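The two ``parse_cols`` rules above can be made concrete with a small, pure-Python sketch. ``resolve_parse_cols`` is a hypothetical helper, not part of pandas; it only mirrors the documented semantics (an integer names the last column to keep, a list names the exact column indices). Later pandas releases replaced ``parse_cols`` with ``usecols``.

```python
def resolve_parse_cols(parse_cols, n_columns):
    """Return the column indices selected by a parse_cols-style argument.

    Hypothetical helper mirroring the documented rules:
    - None -> every column
    - int  -> columns 0 through parse_cols, inclusive (the "last column")
    - list -> exactly those column indices
    """
    if parse_cols is None:
        return list(range(n_columns))
    if isinstance(parse_cols, int):
        return list(range(min(parse_cols, n_columns - 1) + 1))
    return [i for i in parse_cols if 0 <= i < n_columns]


row = ["id", "name", "score", "note", "tmp"]
print([row[i] for i in resolve_parse_cols(2, len(row))])          # ['id', 'name', 'score']
print([row[i] for i in resolve_parse_cols([0, 2, 3], len(row))])  # ['id', 'score', 'note']
```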

To write a DataFrame object to a sheet of an Excel file, you can use the
``to_excel`` instance method. The arguments are largely the same as ``to_csv``
@@ -1883,16 +1900,13 @@ Writing to STATA format

.. _io.StataWriter:

The function :func:'~pandas.io.StataWriter.write_file' will write a DataFrame
into a .dta file. The format version of this file is always the latest one,
115.
The method ``to_stata`` will write a DataFrame into a .dta file.
The format version of this file is always the latest one, 115.

.. ipython:: python

from pandas.io.stata import StataWriter
df = DataFrame(randn(10,2),columns=list('AB'))
writer = StataWriter('stata.dta',df)
writer.write_file()
df.to_stata('stata.dta')
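A self-contained variant of the example above, writing to a temporary directory and reading the file back with ``read_stata``. It assumes a current pandas release, whose ``to_stata`` defaults to ``write_index=True`` and stores the index as an ordinary column named ``index``; this matches the note in the reading section that the index is retrieved as a column. The random data is illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 2)), columns=list("AB"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stata.dta")
    # Writer: a DataFrame method.
    df.to_stata(path)
    # Reader: a top-level function that returns a new DataFrame.
    back = pd.read_stata(path)

# The index comes back as a regular column; restore it explicitly.
restored = back.set_index("index")
```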

Reading from STATA format
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1901,24 +1915,21 @@ Reading from STATA format

.. versionadded:: 0.11.1

The class StataReader will read the header of the given dta file at
initialization. Its function :func:'~pandas.io.StataReader.data' will
read the observations, converting them to a DataFrame which is returned:
The top-level function ``read_stata`` will read a dta format file
and return a DataFrame:

.. ipython:: python

from pandas.io.stata import StataReader
reader = StataReader('stata.dta')
reader.data()
pd.read_stata('stata.dta')

The parameter convert_categoricals indicates wheter value labels should be
read and used to create a Categorical variable from them. Value labels can
also be retrieved by the function variable_labels, which requires data to be
called before.
Currently the ``index`` is retrieved as a column on read back.

The StataReader supports .dta Formats 104, 105, 108, 113-115.
The parameter ``convert_categoricals`` indicates whether value labels should be
read and used to create a ``Categorical`` variable from them. Value labels can
also be retrieved by the function ``variable_labels``, which requires ``data`` to be
called first (see ``pandas.io.stata.StataReader``).

Alternatively, the function :func:'~pandas.io.read_stata' can be used
``StataReader`` supports .dta formats 104, 105, 108, and 113-115.
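A sketch of the ``convert_categoricals`` parameter described above, assuming a current pandas release (which can write a ``Categorical`` column with value labels via ``to_stata``). With the default, the value labels become categories; with ``convert_categoricals=False`` the stored integer codes come back instead. The column name and labels are illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"grade": pd.Categorical(["low", "high", "low"])})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grades.dta")
    # The categories are written to the file as Stata value labels.
    df.to_stata(path, write_index=False)
    # Default convert_categoricals=True: value labels become categories.
    labelled = pd.read_stata(path)
    # convert_categoricals=False: the raw integer codes are returned.
    raw = pd.read_stata(path, convert_categoricals=False)
```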

.. ipython:: python
:suppress:
5 changes: 5 additions & 0 deletions doc/source/v0.10.0.txt
@@ -1,5 +1,10 @@
.. _whatsnew_0100:

.. ipython:: python
:suppress:

from StringIO import StringIO

v0.10.0 (December 17, 2012)
---------------------------

44 changes: 42 additions & 2 deletions doc/source/v0.11.1.txt
@@ -6,6 +6,19 @@ v0.11.1 (??)
This is a minor release from 0.11.0 and includes several new features and
enhancements along with a large number of bug fixes.

The I/O API is now much more consistent, with the following top-level reader
functions available, e.g. ``pd.read_csv``, and the counterpart writers
available as object methods, e.g. ``df.to_csv``:

.. csv-table::
:widths: 12, 15, 15, 15, 15
:delim: ;

Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;

API changes
~~~~~~~~~~~

@@ -74,6 +87,31 @@ API changes
- Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)

- IO API

- added top-level function ``read_excel`` to replace the following. The
original API is deprecated and will be removed in a future version:

.. code-block:: python

from pandas.io.parsers import ExcelFile
xls = ExcelFile('path_to_file.xls')
xls.parse('Sheet1', index_col=None, na_values=['NA'])

With

.. code-block:: python

import pandas as pd
pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

- added top-level function ``read_sql`` that is equivalent to the following

.. code-block:: python

from pandas.io.sql import read_frame
read_frame(....)
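A sketch of the top-level ``read_sql`` reader paired with the ``to_sql`` writer, using an in-memory SQLite database from the standard library. It is written against a current pandas release, whose signatures may differ from 0.11.1; the table and column names are hypothetical.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2], "name": ["alpha", "beta"]})

# Writer: a DataFrame method; a plain sqlite3 connection is accepted directly.
df.to_sql("items", con, index=False)

# Reader: a top-level function taking a query and a connection.
result = pd.read_sql("SELECT id, name FROM items ORDER BY id", con)
con.close()
```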

Enhancements
~~~~~~~~~~~~

@@ -109,6 +147,8 @@ Enhancements
a list or tuple.

- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
accessible via the ``read_stata`` top-level function for reading,
and the ``to_stata`` DataFrame method for writing

- ``DataFrame.replace()`` now allows regular expressions on contained
``Series`` with object dtype. See the examples section in the regular docs
@@ -218,7 +258,7 @@ Bug Fixes
.. ipython :: python

df = DataFrame({'a': list('ab..'), 'b': [1, 2, 3, 4]})
df.replace(regex=r'\s*\.\s*', value=nan)
df.replace(regex=r'\s*\.\s*', value=np.nan)

to replace all occurrences of the string ``'.'`` with zero or more
instances of surrounding whitespace with ``NaN``.
@@ -227,7 +267,7 @@ Bug Fixes

.. ipython :: python

df.replace('.', nan)
df.replace('.', np.nan)

to replace all occurrences of the string ``'.'`` with ``NaN``.

6 changes: 1 addition & 5 deletions pandas/__init__.py
@@ -28,12 +28,8 @@
from pandas.sparse.api import *
from pandas.stats.api import *
from pandas.tseries.api import *
from pandas.io.api import *

from pandas.io.parsers import (read_csv, read_table, read_clipboard,
read_fwf, to_clipboard, ExcelFile,
ExcelWriter)
from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
from pandas.io.html import read_html
from pandas.util.testing import debug

from pandas.tools.describe import value_range
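The new ``from pandas.io.api import *`` line pulls the public names of one aggregation module into the top-level namespace. The stdlib-only sketch that follows imitates that mechanism with a hypothetical in-memory module ``demo_io_api`` (not part of pandas). In the sketch the module defines no ``__all__``, so a star-import copies every name that does not start with an underscore and the private helper stays hidden.

```python
import sys
import types

# Hypothetical stand-in for an aggregation module such as pandas.io.api.
io_api = types.ModuleType("demo_io_api")


def read_csv(path):
    return "csv:" + path


def read_stata(path):
    return "stata:" + path


def _check_path(path):  # private helper, not meant for re-export
    return bool(path)


io_api.read_csv = read_csv
io_api.read_stata = read_stata
io_api._check_path = _check_path
sys.modules["demo_io_api"] = io_api  # make it importable by name

# What a package __init__ receives from a single star-import line.
package_namespace = {}
exec("from demo_io_api import *", package_namespace)
exported = sorted(k for k in package_namespace if k != "__builtins__")
print(exported)  # ['read_csv', 'read_stata']
```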