Commit cf97bfd

Merge branch 'master' into enh-33196
2 parents e3a4571 + 991f784 commit cf97bfd

63 files changed (+1427, -636 lines)

README.md

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ Most development discussion is taking place on github in this repo. Further, the

 All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

-A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/docs/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
+A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas.pydata.org/docs/dev/development/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.

 If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.

asv_bench/asv.conf.json

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
     // followed by the pip installed packages).
     "matrix": {
         "numpy": [],
-        "Cython": [],
+        "Cython": ["0.29.16"],
         "matplotlib": [],
         "sqlalchemy": [],
         "scipy": [],

asv_bench/benchmarks/multiindex_object.py

Lines changed: 28 additions & 0 deletions
@@ -74,10 +74,38 @@ def setup(self):
             ],
             dtype=object,
         )
+        self.other_mi_many_mismatches = MultiIndex.from_tuples(
+            [
+                (-7, 41),
+                (-2, 3),
+                (-0.7, 5),
+                (0, 0),
+                (0, 1.5),
+                (0, 340),
+                (0, 1001),
+                (1, -4),
+                (1, 20),
+                (1, 1040),
+                (432, -5),
+                (432, 17),
+                (439, 165.5),
+                (998, -4),
+                (998, 24065),
+                (999, 865.2),
+                (999, 1000),
+                (1045, -843),
+            ]
+        )

     def time_get_indexer(self):
         self.mi_int.get_indexer(self.obj_index)

+    def time_get_indexer_and_backfill(self):
+        self.mi_int.get_indexer(self.other_mi_many_mismatches, method="backfill")
+
+    def time_get_indexer_and_pad(self):
+        self.mi_int.get_indexer(self.other_mi_many_mismatches, method="pad")
+
     def time_is_monotonic(self):
         self.mi_int.is_monotonic

asv_bench/benchmarks/rolling.py

Lines changed: 22 additions & 0 deletions
@@ -165,4 +165,26 @@ def peakmem_fixed(self):
         self.roll.max()


+class ForwardWindowMethods:
+    params = (
+        ["DataFrame", "Series"],
+        [10, 1000],
+        ["int", "float"],
+        ["median", "mean", "max", "min", "kurt", "sum"],
+    )
+    param_names = ["constructor", "window_size", "dtype", "method"]
+
+    def setup(self, constructor, window_size, dtype, method):
+        N = 10 ** 5
+        arr = np.random.random(N).astype(dtype)
+        indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=window_size)
+        self.roll = getattr(pd, constructor)(arr).rolling(window=indexer)
+
+    def time_rolling(self, constructor, window_size, dtype, method):
+        getattr(self.roll, method)()
+
+    def peakmem_rolling(self, constructor, window_size, dtype, method):
+        getattr(self.roll, method)()
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
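
For reviewers unfamiliar with asv parameterization: when `params` is a sequence of lists, asv runs each benchmark method once per combination of one value from each list. The sketch below only enumerates that grid for the `ForwardWindowMethods` parameters shown in the diff; the `cases` name is illustrative and not part of the benchmark suite.

```python
from itertools import product

# Parameter lists copied from the ForwardWindowMethods benchmark in the diff.
params = (
    ["DataFrame", "Series"],
    [10, 1000],
    ["int", "float"],
    ["median", "mean", "max", "min", "kurt", "sum"],
)
param_names = ["constructor", "window_size", "dtype", "method"]

# One benchmark case per element of the Cartesian product of the lists.
cases = [dict(zip(param_names, combo)) for combo in product(*params)]

print(len(cases))  # 48 combinations: 2 * 2 * 2 * 6
print(cases[0])
```

Each of `time_rolling` and `peakmem_rolling` is therefore exercised over the same 48 cases, which is worth keeping in mind when estimating benchmark run time.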

doc/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -416,6 +416,7 @@
     "python": ("https://docs.python.org/3/", None),
     "scipy": ("https://docs.scipy.org/doc/scipy/reference/", None),
     "statsmodels": ("https://www.statsmodels.org/devel/", None),
+    "pyarrow": ("https://arrow.apache.org/docs/", None),
 }

 # extlinks alias

doc/source/getting_started/index.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Installation
                 <div class="card-body">
                     <p class="card-text">

-pandas is part of the `Anaconda <http://docs.continuum.io/anaconda/>`__ distribution and can be
+pandas is part of the `Anaconda <https://docs.continuum.io/anaconda/>`__ distribution and can be
 installed with Anaconda or Miniconda:

 .. raw:: html

doc/source/reference/window.rst

Lines changed: 1 addition & 0 deletions
@@ -85,3 +85,4 @@ Base class for defining custom window boundaries.
    :toctree: api/

    api.indexers.BaseIndexer
+   api.indexers.FixedForwardWindowIndexer

doc/source/user_guide/computation.rst

Lines changed: 14 additions & 0 deletions
@@ -571,6 +571,20 @@ and we want to use an expanding window where ``use_expanding`` is ``True`` other
    3     3.0
    4    10.0

+.. versionadded:: 1.1
+
+For some problems knowledge of the future is available for analysis. For example, this occurs when
+each data point is a full time series read from an experiment, and the task is to extract underlying
+conditions. In these cases it can be useful to perform forward-looking rolling window computations.
+:func:`FixedForwardWindowIndexer <pandas.api.indexers.FixedForwardWindowIndexer>` class is available for this purpose.
+This :func:`BaseIndexer <pandas.api.indexers.BaseIndexer>` subclass implements a closed fixed-width
+forward-looking rolling window, and we can use it as follows:
+
+.. ipython:: ipython
+
+   from pandas.api.indexers import FixedForwardWindowIndexer
+   indexer = FixedForwardWindowIndexer(window_size=2)
+   df.rolling(indexer, min_periods=1).sum()

 .. _stats.rolling_window.endpoints:
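
To make the "forward-looking window" wording in the added documentation concrete, here is a plain-Python sketch of the semantics as described: row `i` aggregates rows `i` through `i + window_size - 1`, truncated at the end of the data. It is not the pandas implementation; `forward_window_sum` is a hypothetical helper, the sample data is illustrative, and treating NaN as missing for `min_periods` is an assumption modelled on the usual rolling behaviour.

```python
import math


def forward_window_sum(values, window_size, min_periods=1):
    """Sum values[i : i + window_size] for each i, skipping NaN entries.

    A position yields NaN when fewer than `min_periods` non-NaN values
    fall inside its (possibly truncated) window.
    """
    n = len(values)
    out = []
    for i in range(n):
        window = values[i:min(i + window_size, n)]
        valid = [v for v in window if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out


data = [0.0, 1.0, 2.0, math.nan, 4.0]
result = forward_window_sum(data, window_size=2)
print(result)  # [1.0, 3.0, 2.0, 4.0, 4.0]
```

Note that the last row still produces a value with `min_periods=1`, because its window is cut short rather than padded.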

doc/source/user_guide/io.rst

Lines changed: 3 additions & 5 deletions
@@ -4602,17 +4602,15 @@ frames efficient, and to make sharing data across data analysis languages easy.
 Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
 dtypes, including extension dtypes such as categorical and datetime with tz.

-Several caveats.
+Several caveats:

-* This is a newer library, and the format, though stable, is not guaranteed to be backward compatible
-  to the earlier versions.
 * The format will NOT write an ``Index``, or ``MultiIndex`` for the
   ``DataFrame`` and will raise an error if a non-default one is provided. You
   can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
   ignore it.
 * Duplicate column names and non-string columns names are not supported
-* Non supported types include ``Period`` and actual Python object types. These will raise a helpful error message
-  on an attempt at serialization.
+* Actual Python objects in object dtype columns are not supported. These will
+  raise a helpful error message on an attempt at serialization.

 See the `Full Documentation <https://github.com/wesm/feather>`__.

doc/source/whatsnew/v1.1.0.rst

Lines changed: 72 additions & 4 deletions
@@ -88,8 +88,13 @@ Other enhancements
 - :class:`Series.str` now has a `fullmatch` method that matches a regular expression against the entire string in each row of the series, similar to `re.fullmatch` (:issue:`32806`).
 - :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`)
 - :meth:`MultiIndex.union` will now raise `RuntimeWarning` if the object inside are unsortable, pass `sort=False` to suppress this warning (:issue:`33015`)
-- :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_json` now support passing a dict of compression arguments when using the ``gzip`` and ``bz2`` protocols. This can be used to set a custom compression level, e.g., ``df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}`` (:issue:`33196`)
-
+- The :meth:`DataFrame.to_feather` method now supports additional keyword
+  arguments (e.g. to set the compression) that are added in pyarrow 0.17
+  (:issue:`33422`).
+- :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_json` now support
+  passing a dict of compression arguments when using the ``gzip`` and ``bz2`` protocols.
+  This can be used to set a custom compression level, e.g.,
+  ``df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}`` (:issue:`33196`)
 .. ---------------------------------------------------------------------------

 Development Changes
@@ -109,6 +114,7 @@ Other API changes
 - ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
 - Using a :func:`pandas.api.indexers.BaseIndexer` with ``std``, ``var``, ``count``, ``skew``, ``cov``, ``corr`` will now raise a ``NotImplementedError`` (:issue:`32865`)
 - Using a :func:`pandas.api.indexers.BaseIndexer` with ``min``, ``max`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
+- Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
 -

 Backwards incompatible API changes
@@ -120,6 +126,67 @@ Backwards incompatible API changes
   Previously a ``UnsupportedFunctionCall`` was raised (``AssertionError`` if ``min_count`` passed into :meth:`~DataFrameGroupby.median`) (:issue:`31485`)
 - :meth:`DataFrame.at` and :meth:`Series.at` will raise a ``TypeError`` instead of a ``ValueError`` if an incompatible key is passed, and ``KeyError`` if a missing key is passed, matching the behavior of ``.loc[]`` (:issue:`31722`)
 - Passing an integer dtype other than ``int64`` to ``np.array(period_index, dtype=...)`` will now raise ``TypeError`` instead of incorrectly using ``int64`` (:issue:`32255`)
+
+``MultiIndex.get_indexer`` interprets `method` argument differently
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This restores the behavior of :meth:`MultiIndex.get_indexer` with ``method='backfill'`` or ``method='pad'`` to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples and padding or backfilling is done with respect to the ordering of these lists of tuples (:issue:`29896`).
+
+As an example of this, given:
+
+.. ipython:: python
+
+   df = pd.DataFrame({
+       'a': [0, 0, 0, 0],
+       'b': [0, 2, 3, 4],
+       'c': ['A', 'B', 'C', 'D'],
+   }).set_index(['a', 'b'])
+   mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])
+
+The differences in reindexing ``df`` with ``mi_2`` and using ``method='backfill'`` can be seen here:
+
+*pandas >= 0.23, < 1.1.0*:
+
+.. code-block:: ipython
+
+   In [1]: df.reindex(mi_2, method='backfill')
+   Out[1]:
+         c
+   0 -1  A
+      0  A
+      1  D
+      3  A
+      4  A
+      5  C
+
+*pandas <0.23, >= 1.1.0*
+
+.. ipython:: python
+
+   df.reindex(mi_2, method='backfill')
+
+And the differences in reindexing ``df`` with ``mi_2`` and using ``method='pad'`` can be seen here:
+
+*pandas >= 0.23, < 1.1.0*
+
+.. code-block:: ipython
+
+   In [1]: df.reindex(mi_2, method='pad')
+   Out[1]:
+           c
+   0 -1  NaN
+      0  NaN
+      1    D
+      3  NaN
+      4    A
+      5    C
+
+*pandas < 0.23, >= 1.1.0*
+
+.. ipython:: python
+
+   df.reindex(mi_2, method='pad')
+
 -

 .. _whatsnew_110.api_breaking.indexing_raises_key_errors:
@@ -274,7 +341,7 @@ Deprecations
   version 1.1. All other arguments should be given as keyword
   arguments (:issue:`27573`).

--
+- :func:`pandas.api.types.is_categorical` is deprecated and will be removed in a future version; use `:func:pandas.api.types.is_categorical_dtype` instead (:issue:`33385`)

 .. ---------------------------------------------------------------------------
@@ -390,7 +457,7 @@ Missing
 ^^^^^^^

 - Calling :meth:`fillna` on an empty Series now correctly returns a shallow copied object. The behaviour is now consistent with :class:`Index`, :class:`DataFrame` and a non-empty :class:`Series` (:issue:`32543`).
-
+- Bug in :meth:`~Series.any` and :meth:`~Series.all` incorrectly returning ``<NA>`` for all ``False`` or all ``True`` values using the nulllable boolean dtype and with ``skipna=False`` (:issue:`33253`)

 MultiIndex
 ^^^^^^^^^^
@@ -434,6 +501,7 @@ I/O
 - Bug in :meth:`read_sas` was raising an ``AttributeError`` when reading files from Google Cloud Storage (issue:`33069`)
 - Bug in :meth:`DataFrame.to_sql` where an ``AttributeError`` was raised when saving an out of bounds date (:issue:`26761`)
 - Bug in :meth:`read_excel` did not correctly handle multiple embedded spaces in OpenDocument text cells. (:issue:`32207`)
+- Bug in :meth:`read_json` was raising ``TypeError`` when reading a list of booleans into a Series. (:issue:`31464`)

 Plotting
 ^^^^^^^^
environment.yml

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ channels:
 dependencies:
   # required
   - numpy>=1.15
-  - python=3.7
+  - python=3
   - python-dateutil>=2.6.1
   - pytz

@@ -86,7 +86,7 @@ dependencies:
   - lxml

   # pd.read_excel, DataFrame.to_excel, pd.ExcelWriter, pd.ExcelFile
-  - openpyxl<=3.0.1
+  - openpyxl
   - xlrd
   - xlsxwriter
   - xlwt
