From fc1449f38d9f59040241c8a8266daf22e0ecb0cd Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 11 May 2015 10:37:51 +0200 Subject: [PATCH 1/4] DOC: update docs regarding to return_type/expand (due to GH10085) --- doc/source/text.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/text.rst b/doc/source/text.rst index 810e3e0146f9f..d40445d8490f7 100644 --- a/doc/source/text.rst +++ b/doc/source/text.rst @@ -82,11 +82,11 @@ Elements in the split lists can be accessed using ``get`` or ``[]`` notation: s2.str.split('_').str.get(1) s2.str.split('_').str[1] -Easy to expand this to return a DataFrame using ``return_type``. +Easy to expand this to return a DataFrame using ``expand``. .. ipython:: python - s2.str.split('_', return_type='frame') + s2.str.split('_', expand=True) Methods like ``replace`` and ``findall`` take `regular expressions `__, too: From 90bbe357e1e0e554507d618bcb7d337b9dc9c990 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 11 May 2015 10:48:00 +0200 Subject: [PATCH 2/4] DOC: fix some doc build errors - wrap docstring -> raw indication needed for newlines characters - using anonymous hyperlinks (two underscores instead of one) for preventing warning on same names - some missing imports and whitespace --- doc/source/basics.rst | 2 +- doc/source/contributing.rst | 10 +++++----- doc/source/install.rst | 4 ++-- doc/source/visualization.rst | 4 ++-- pandas/core/strings.py | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/source/basics.rst b/doc/source/basics.rst index 76efdc0553c7d..6c743352a34ae 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -236,7 +236,7 @@ see :ref:`here` Boolean Reductions ~~~~~~~~~~~~~~~~~~ - You can apply the reductions: :attr:`~DataFrame.empty`, :meth:`~DataFrame.any`, +You can apply the reductions: :attr:`~DataFrame.empty`, :meth:`~DataFrame.any`, :meth:`~DataFrame.all`, and :meth:`~DataFrame.bool` to provide a way to summarize 
a boolean result. diff --git a/doc/source/contributing.rst b/doc/source/contributing.rst index 1ece60bf704d6..1f58992dba017 100644 --- a/doc/source/contributing.rst +++ b/doc/source/contributing.rst @@ -113,10 +113,10 @@ This creates the directory `pandas-yourname` and connects your repository to the upstream (main project) *pandas* repository. The testing suite will run automatically on Travis-CI once your Pull Request is -submitted. However, if you wish to run the test suite on a branch prior to +submitted. However, if you wish to run the test suite on a branch prior to submitting the Pull Request, then Travis-CI needs to be hooked up to your GitHub repository. Instructions are for doing so are `here -`_. +`__. Creating a Branch ----------------- @@ -219,7 +219,7 @@ To return to you home root environment: deactivate See the full ``conda`` docs `here -`_. +`__. At this point you can easily do an *in-place* install, as detailed in the next section. @@ -372,7 +372,7 @@ If you want to do a full clean build, do:: Starting with 0.13.1 you can tell ``make.py`` to compile only a single section of the docs, greatly reducing the turn-around time for checking your changes. You will be prompted to delete `.rst` files that aren't required. This is okay -since the prior version can be checked out from git, but make sure to +since the prior version can be checked out from git, but make sure to not commit the file deletions. :: @@ -401,7 +401,7 @@ Built Master Branch Documentation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When pull-requests are merged into the pandas *master* branch, the main parts of the documentation are -also built by Travis-CI. These docs are then hosted `here `_. +also built by Travis-CI. These docs are then hosted `here `__. 
Contributing to the code base ============================= diff --git a/doc/source/install.rst b/doc/source/install.rst index 79adab0463588..3aa6b338e3397 100644 --- a/doc/source/install.rst +++ b/doc/source/install.rst @@ -35,7 +35,7 @@ pandas at all. Simply create an account, and have access to pandas from within your brower via an `IPython Notebook `__ in a few minutes. -.. _install.anaconda +.. _install.anaconda: Installing pandas with Anaconda ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -68,7 +68,7 @@ admin rights to install it, it will install in the user's home directory, and this also makes it trivial to delete Anaconda at a later date (just delete that folder). -.. _install.miniconda +.. _install.miniconda: Installing pandas with Miniconda ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst index 6dfeeadeb0167..51912b5d6b106 100644 --- a/doc/source/visualization.rst +++ b/doc/source/visualization.rst @@ -220,8 +220,8 @@ Histogram can be drawn specifying ``kind='hist'``. .. ipython:: python - df4 = pd.DataFrame({'a': randn(1000) + 1, 'b': randn(1000), - 'c': randn(1000) - 1}, columns=['a', 'b', 'c']) + df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000), + 'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c']) plt.figure(); diff --git a/pandas/core/strings.py b/pandas/core/strings.py index 8da43c18b989f..f4ac0166cf44b 100644 --- a/pandas/core/strings.py +++ b/pandas/core/strings.py @@ -813,7 +813,7 @@ def str_strip(arr, to_strip=None, side='both'): def str_wrap(arr, width, **kwargs): - """ + r""" Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width. 
From c91f305375461c74cfb39643b62e2639d1a844ab Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 11 May 2015 12:14:11 +0200 Subject: [PATCH 3/4] DOC: update section on CategoricalIndex in categorical docs --- doc/source/categorical.rst | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst index 11e7fb0fd4117..c05d4045e6fcc 100644 --- a/doc/source/categorical.rst +++ b/doc/source/categorical.rst @@ -813,12 +813,16 @@ basic type) and applying along columns will also convert to object. df.apply(lambda row: type(row["cats"]), axis=1) df.apply(lambda col: col.dtype, axis=0) -No Categorical Index -~~~~~~~~~~~~~~~~~~~~ +Categorical Index +~~~~~~~~~~~~~~~~~ + +.. versionadded:: 0.16.1 + +A new ``CategoricalIndex`` index type is introduced in version 0.16.1. See the +:ref:`advanced indexing docs ` for a more detailed +explanation. -There is currently no index of type ``category``, so setting the index to categorical column will -convert the categorical data to a "normal" dtype first and therefore remove any custom -ordering of the categories: +Setting the index will create a ``CategoricalIndex``: .. ipython:: python @@ -827,13 +831,12 @@ ordering of the categories: values = [4,2,3,1] df = DataFrame({"strings":strings, "values":values}, index=cats) df.index - # This should sort by categories but does not as there is no CategoricalIndex! + # This now sorts by the order of the categories df.sort_index() -.. note:: - This could change if a `CategoricalIndex` is implemented (see - https://github.com/pydata/pandas/issues/7629) - +In previous versions (<0.16.1) there was no index of type ``category``, so +setting the index to a categorical column converted the categorical data to a +"normal" dtype first and therefore removed any custom ordering of the categories. 
Side Effects ~~~~~~~~~~~~ From fad60791d6e8b7a90d46d8b3ea9c9bb7e82dcb06 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 11 May 2015 11:39:54 +0200 Subject: [PATCH 4/4] DOC: last clean-up of whatsnew file 0.16.1 --- doc/source/whatsnew/v0.16.1.txt | 169 ++++++++++++++------------------ 1 file changed, 74 insertions(+), 95 deletions(-) diff --git a/doc/source/whatsnew/v0.16.1.txt b/doc/source/whatsnew/v0.16.1.txt index 5e893f3c4fd73..79a0c48238be7 100755 --- a/doc/source/whatsnew/v0.16.1.txt +++ b/doc/source/whatsnew/v0.16.1.txt @@ -31,44 +31,6 @@ Highlights include: Enhancements ~~~~~~~~~~~~ -- ``BusinessHour`` offset is now supported, which represents business hours starting from 09:00 - 17:00 on ``BusinessDay`` by default. See :ref:`Here ` for details. (:issue:`7905`) - - .. ipython:: python - - Timestamp('2014-08-01 09:00') + BusinessHour() - Timestamp('2014-08-01 07:00') + BusinessHour() - Timestamp('2014-08-01 16:30') + BusinessHour() - -- ``DataFrame.diff`` now takes an ``axis`` parameter that determines the direction of differencing (:issue:`9727`) - -- Allow ``clip``, ``clip_lower``, and ``clip_upper`` to accept array-like arguments as thresholds (This is a regression from 0.11.0). These methods now have an ``axis`` parameter which determines how the Series or DataFrame will be aligned with the threshold(s). (:issue:`6966`) - -- ``DataFrame.mask()`` and ``Series.mask()`` now support same keywords as ``where`` (:issue:`8801`) - -- ``drop`` function can now accept ``errors`` keyword to suppress ``ValueError`` raised when any of label does not exist in the target data. (:issue:`6736`) - - .. ipython:: python - - df = DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C']) - df.drop(['A', 'X'], axis=1, errors='ignore') - -- Allow conversion of values with dtype ``datetime64`` or ``timedelta64`` to strings using ``astype(str)`` (:issue:`9757`) -- ``get_dummies`` function now accepts ``sparse`` keyword. 
If set to ``True``, the return ``DataFrame`` is sparse, e.g. ``SparseDataFrame``. (:issue:`8823`) -- ``Period`` now accepts ``datetime64`` as value input. (:issue:`9054`) - -- Allow timedelta string conversion when leading zero is missing from time definition, ie `0:00:00` vs `00:00:00`. (:issue:`9570`) -- Allow ``Panel.shift`` with ``axis='items'`` (:issue:`9890`) - -- Trying to write an excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`) -- Allow ``Categorical.add_categories`` to accept ``Series`` or ``np.array``. (:issue:`9927`) - -- Add/delete ``str/dt/cat`` accessors dynamically from ``__dir__``. (:issue:`9910`) -- Add ``normalize`` as a ``dt`` accessor method. (:issue:`10047`) - -- ``DataFrame`` and ``Series`` now have ``_constructor_expanddim`` property as overridable constructor for one higher dimensionality data. This should be used only when it is really needed, see :ref:`here ` - -- ``pd.lib.infer_dtype`` now returns ``'bytes'`` in Python 3 where appropriate. (:issue:`10032`) - .. _whatsnew_0161.enhancements.categoricalindex: CategoricalIndex @@ -188,16 +150,6 @@ String Methods Enhancements :ref:`Continuing from v0.16.0 `, the following enhancements make string operations easier and more consistent with standard python string operations. -- The following new methods are accesible via ``.str`` accessor to apply the function to each values. (:issue:`9766`, :issue:`9773`, :issue:`10031`, :issue:`10045`, :issue:`10052`) - - ================ =============== =============== =============== ================ - .. .. Methods .. .. 
- ================ =============== =============== =============== ================ - ``capitalize()`` ``swapcase()`` ``normalize()`` ``partition()`` ``rpartition()`` - ``index()`` ``rindex()`` ``translate()`` - ================ =============== =============== =============== ================ - - - Added ``StringMethods`` (``.str`` accessor) to ``Index`` (:issue:`9068`) @@ -220,6 +172,14 @@ enhancements make string operations easier and more consistent with standard pyt idx.str.startswith('a') s[s.index.str.startswith('a')] +- The following new methods are accessible via the ``.str`` accessor to apply the function to each value. (:issue:`9766`, :issue:`9773`, :issue:`10031`, :issue:`10045`, :issue:`10052`) + + ================ =============== =============== =============== ================ + .. .. Methods .. .. + ================ =============== =============== =============== ================ + ``capitalize()`` ``swapcase()`` ``normalize()`` ``partition()`` ``rpartition()`` + ``index()`` ``rindex()`` ``translate()`` + ================ =============== =============== =============== ================ - ``split`` now takes ``expand`` keyword to specify whether to expand dimensionality. ``return_type`` is deprecated. (:issue:`9847`) @@ -244,14 +204,59 @@ enhancements make string operations easier and more consistent with standard pyt - Improved ``extract`` and ``get_dummies`` methods for ``Index.str`` (:issue:`9980`) -.. _whatsnew_0161.api: -API changes -~~~~~~~~~~~ +.. _whatsnew_0161.enhancements.other: + +Other Enhancements +^^^^^^^^^^^^^^^^^^ + +- ``BusinessHour`` offset is now supported, which represents business hours starting from 09:00 - 17:00 on ``BusinessDay`` by default. See :ref:`Here ` for details. (:issue:`7905`) + + .. 
ipython:: python + from pandas.tseries.offsets import BusinessHour + Timestamp('2014-08-01 09:00') + BusinessHour() + Timestamp('2014-08-01 07:00') + BusinessHour() + Timestamp('2014-08-01 16:30') + BusinessHour() +- ``DataFrame.diff`` now takes an ``axis`` parameter that determines the direction of differencing (:issue:`9727`) +- Allow ``clip``, ``clip_lower``, and ``clip_upper`` to accept array-like arguments as thresholds (This is a regression from 0.11.0). These methods now have an ``axis`` parameter which determines how the Series or DataFrame will be aligned with the threshold(s). (:issue:`6966`) + +- ``DataFrame.mask()`` and ``Series.mask()`` now support same keywords as ``where`` (:issue:`8801`) +- ``drop`` function can now accept ``errors`` keyword to suppress ``ValueError`` raised when any of label does not exist in the target data. (:issue:`6736`) + + .. ipython:: python + + df = DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C']) + df.drop(['A', 'X'], axis=1, errors='ignore') + +- Add support for separating years and quarters using dashes, for + example 2014-Q1. (:issue:`9688`) + +- Allow conversion of values with dtype ``datetime64`` or ``timedelta64`` to strings using ``astype(str)`` (:issue:`9757`) +- ``get_dummies`` function now accepts ``sparse`` keyword. If set to ``True``, the return ``DataFrame`` is sparse, e.g. ``SparseDataFrame``. (:issue:`8823`) +- ``Period`` now accepts ``datetime64`` as value input. (:issue:`9054`) + +- Allow timedelta string conversion when leading zero is missing from time definition, ie `0:00:00` vs `00:00:00`. (:issue:`9570`) +- Allow ``Panel.shift`` with ``axis='items'`` (:issue:`9890`) + +- Trying to write an excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`) +- Allow ``Categorical.add_categories`` to accept ``Series`` or ``np.array``. (:issue:`9927`) + +- Add/delete ``str/dt/cat`` accessors dynamically from ``__dir__``. 
(:issue:`9910`) +- Add ``normalize`` as a ``dt`` accessor method. (:issue:`10047`) + +- ``DataFrame`` and ``Series`` now have ``_constructor_expanddim`` property as overridable constructor for one higher dimensionality data. This should be used only when it is really needed, see :ref:`here ` + +- ``pd.lib.infer_dtype`` now returns ``'bytes'`` in Python 3 where appropriate. (:issue:`10032`) + + +.. _whatsnew_0161.api: + +API changes +~~~~~~~~~~~ - When passing in an ax to ``df.plot( ..., ax=ax)``, the `sharex` kwarg will now default to `False`. The result is that the visibility of xlabels and xticklabels will not anymore be changed. You @@ -260,16 +265,19 @@ API changes If pandas creates the subplots itself (e.g. no passed in `ax` kwarg), then the default is still ``sharex=True`` and the visibility changes are applied. - - -- Add support for separating years and quarters using dashes, for - example 2014-Q1. (:issue:`9688`) - - :meth:`~pandas.DataFrame.assign` now inserts new columns in alphabetical order. Previously the order was arbitrary. (:issue:`9777`) - By default, ``read_csv`` and ``read_table`` will now try to infer the compression type based on the file extension. Set ``compression=None`` to restore the previous behavior (no decompression). (:issue:`9770`) +.. _whatsnew_0161.deprecations: + +Deprecations +^^^^^^^^^^^^ + +- ``Series.str.split``'s ``return_type`` keyword was deprecated in favor of ``expand`` (:issue:`9847`) + + .. _whatsnew_0161.index_repr: Index Representation @@ -303,25 +311,17 @@ New Behavior .. 
ipython:: python - pd.set_option('display.width',100) - pd.Index(range(4),name='foo') - pd.Index(range(25),name='foo') - pd.Index(range(104),name='foo') - pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref', 'a_bit_a_longer_one']*2) - pd.CategoricalIndex(['a','bb','ccc','dddd'],ordered=True,name='foobar') - pd.CategoricalIndex(['a','bb','ccc','dddd']*10,ordered=True,name='foobar') - pd.CategoricalIndex(['a','bb','ccc','dddd']*100,ordered=True,name='foobar') - pd.CategoricalIndex(np.arange(1000),ordered=True,name='foobar') - pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern') - pd.date_range('20130101',periods=25,name='foo',tz='US/Eastern') - pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern') - -.. _whatsnew_0161.deprecations: + pd.set_option('display.width', 80) + pd.Index(range(4), name='foo') + pd.Index(range(30), name='foo') + pd.Index(range(104), name='foo') + pd.CategoricalIndex(['a','bb','ccc','dddd'], ordered=True, name='foobar') + pd.CategoricalIndex(['a','bb','ccc','dddd']*10, ordered=True, name='foobar') + pd.CategoricalIndex(['a','bb','ccc','dddd']*100, ordered=True, name='foobar') + pd.date_range('20130101',periods=4, name='foo', tz='US/Eastern') + pd.date_range('20130101',periods=25, freq='D') + pd.date_range('20130101',periods=104, name='foo', tz='US/Eastern') -Deprecations -^^^^^^^^^^^^ - -- ``Series.str.split``'s ``return_type`` keyword was removed in favor of ``expand`` (:issue:`9847`) .. _whatsnew_0161.performance: @@ -333,7 +333,6 @@ Performance Improvements - Improved the performance of ``pd.lib.max_len_string_array`` by 5-7x (:issue:`10024`) - .. 
_whatsnew_0161.bug_fixes: Bug Fixes @@ -361,7 +360,6 @@ Bug Fixes - Bug where repeated plotting of ``DataFrame`` with a ``DatetimeIndex`` may raise ``TypeError`` (:issue:`9852`) - Bug in ``setup.py`` that would allow an incompat cython version to build (:issue:`9827`) - Bug in plotting ``secondary_y`` incorrectly attaches ``right_ax`` property to secondary axes specifying itself recursively. (:issue:`9861`) - - Bug in ``Series.quantile`` on empty Series of type ``Datetime`` or ``Timedelta`` (:issue:`9675`) - Bug in ``where`` causing incorrect results when upcasting was required (:issue:`9731`) - Bug in ``FloatArrayFormatter`` where decision boundary for displaying "small" floats in decimal format is off by one order of magnitude for a given display.precision (:issue:`9764`) @@ -372,20 +370,13 @@ Bug Fixes - Bug in index equality comparisons using ``==`` failing on Index/MultiIndex type incompatibility (:issue:`9785`) - Bug in which ``SparseDataFrame`` could not take `nan` as a column name (:issue:`8822`) - Bug in ``to_msgpack`` and ``read_msgpack`` zlib and blosc compression support (:issue:`9783`) - - Bug ``GroupBy.size`` doesn't attach index name properly if grouped by ``TimeGrouper`` (:issue:`9925`) - Bug causing an exception in slice assignments because ``length_of_indexer`` returns wrong results (:issue:`9995`) - Bug in csv parser causing lines with initial whitespace plus one non-space character to be skipped. (:issue:`9710`) - Bug in C csv parser causing spurious NaNs when data started with newline followed by whitespace. 
(:issue:`10022`) - - Bug causing elements with a null group to spill into the final group when grouping by a ``Categorical`` (:issue:`9603`) - Bug where .iloc and .loc behavior is not consistent on empty dataframes (:issue:`9964`) - - Bug in invalid attribute access on a ``TimedeltaIndex`` incorrectly raised ``ValueError`` instead of ``AttributeError`` (:issue:`9680`) - - - - - Bug in unequal comparisons between categorical data and a scalar, which was not in the categories (e.g. ``Series(Categorical(list("abc"), ordered=True)) > "d"``. This returned ``False`` for all elements, but now raises a ``TypeError``. Equality comparisons also now return ``False`` for ``==`` and ``True`` for ``!=``. (:issue:`9848`) - Bug in DataFrame ``__setitem__`` when right hand side is a dictionary (:issue:`9874`) - Bug in ``where`` when dtype is ``datetime64/timedelta64``, but dtype of other is not (:issue:`9804`) @@ -394,25 +385,13 @@ Bug Fixes - Bug in ``DataFrame`` constructor when ``columns`` parameter is set, and ``data`` is an empty list (:issue:`9939`) - Bug in bar plot with ``log=True`` raises ``TypeError`` if all values are less than 1 (:issue:`9905`) - Bug in horizontal bar plot ignores ``log=True`` (:issue:`9905`) - - - - Bug in PyTables queries that did not return proper results using the index (:issue:`8265`, :issue:`9676`) - - - - - Bug where dividing a dataframe containing values of type ``Decimal`` by another ``Decimal`` would raise. (:issue:`9787`) - Bug where using DataFrames asfreq would remove the name of the index. (:issue:`9885`) - Bug causing extra index point when resample BM/BQ (:issue:`9756`) - Changed caching in ``AbstractHolidayCalendar`` to be at the instance level rather than at the class level as the latter can result in unexpected behaviour. 
(:issue:`9552`) - - Fixed latex output for multi-indexed dataframes (:issue:`9778`) - Bug causing an exception when setting an empty range using ``DataFrame.loc`` (:issue:`9596`) - - - - - Bug in hiding ticklabels with subplots and shared axes when adding a new plot to an existing grid of axes (:issue:`9158`) - Bug in ``transform`` and ``filter`` when grouping on a categorical variable (:issue:`9921`) - Bug in ``transform`` when groups are equal in number and dtype to the input index (:issue:`9700`)
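
Review note on patches 1/4 and 3/4: the two behaviours the updated docs describe (``split`` with ``expand=True`` returning a DataFrame, and ``sort_index`` following the category order once the index is a ``CategoricalIndex``) can be checked with a sketch like the one below. It is a minimal sketch, assuming pandas 0.16.1 or later is installed; the sample data and the variable names are illustrative and are not taken from the docs build.

```python
import pandas as pd

# Patch 1/4: ``expand=True`` is the documented replacement for
# ``return_type='frame'``. The sample strings here are illustrative only.
s2 = pd.Series(['a_b_c', 'c_d_e', 'f_g_h'])
parts = s2.str.split('_')                # Series of lists
frame = s2.str.split('_', expand=True)   # DataFrame, one column per piece

# Patch 3/4: passing a Categorical as the index yields a CategoricalIndex,
# so sort_index() follows the category order ('b' before 'a' here) rather
# than the lexical order of the labels.
cats = pd.Categorical(['a', 'b', 'b', 'a'], categories=['b', 'a'], ordered=True)
df = pd.DataFrame({'values': [4, 2, 3, 1]}, index=cats)
sorted_df = df.sort_index()

print(frame)
print(type(df.index).__name__)
print(list(sorted_df.index))
```

If the last line printed the labels in lexical order instead, the index would have been converted to a plain object index, which is the pre-0.16.1 behaviour the rewritten paragraph in categorical.rst describes.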