Commit 27f4261

Merge pull request #110 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 6bce1ea + dd84044 commit 27f4261

252 files changed: +5822 -3851 lines changed

asv_bench/benchmarks/io/parsers.py

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@

 try:
     from pandas._libs.tslibs.parsing import (
-        _concat_date_cols,
+        concat_date_cols,
         _does_string_look_like_datetime,
     )
 except ImportError:
@@ -39,4 +39,4 @@ def setup(self, value, dim):
         )

     def time_check_concat(self, value, dim):
-        _concat_date_cols(self.object)
+        concat_date_cols(self.object)

asv_bench/benchmarks/rolling.py

Lines changed: 12 additions & 13 deletions
@@ -150,19 +150,18 @@ def time_quantile(self, constructor, window, dtype, percentile, interpolation):
         self.roll.quantile(percentile, interpolation=interpolation)


-class PeakMemFixed:
-    def setup(self):
-        N = 10
-        arr = 100 * np.random.random(N)
-        self.roll = pd.Series(arr).rolling(10)
-
-    def peakmem_fixed(self):
-        # GH 25926
-        # This is to detect memory leaks in rolling operations.
-        # To save time this is only ran on one method.
-        # 6000 iterations is enough for most types of leaks to be detected
-        for x in range(6000):
-            self.roll.max()
+class PeakMemFixedWindowMinMax:
+
+    params = ["min", "max"]
+
+    def setup(self, operation):
+        N = int(1e6)
+        arr = np.random.random(N)
+        self.roll = pd.Series(arr).rolling(2)
+
+    def peakmem_fixed(self, operation):
+        for x in range(5):
+            getattr(self.roll, operation)()


 class ForwardWindowMethods:

ci/deps/azure-36-minimum_versions.yaml

Lines changed: 2 additions & 3 deletions
@@ -1,6 +1,5 @@
 name: pandas-dev
 channels:
-  - defaults
   - conda-forge
 dependencies:
   - python=3.6.1
@@ -19,12 +18,12 @@ dependencies:
   - jinja2=2.8
   - numba=0.46.0
   - numexpr=2.6.2
-  - numpy=1.13.3
+  - numpy=1.15.4
   - openpyxl=2.5.7
   - pytables=3.4.3
   - python-dateutil=2.7.3
   - pytz=2017.2
-  - scipy=0.19.0
+  - scipy=1.2
   - xlrd=1.1.0
   - xlsxwriter=0.9.8
   - xlwt=1.2.0

ci/deps/azure-37-numpydev.yaml

Lines changed: 1 addition & 2 deletions
@@ -14,8 +14,7 @@ dependencies:
   - pytz
   - pip
   - pip:
-    - cython==0.29.16
-    # GH#33507 cython 3.0a1 is causing TypeErrors 2020-04-13
+    - cython>=0.29.16
     - "git+git://github.com/dateutil/dateutil.git"
     - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
     - "--pre"

ci/deps/azure-macos-36.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ dependencies:
   - matplotlib=2.2.3
   - nomkl
   - numexpr
-  - numpy=1.14
+  - numpy=1.15.4
   - openpyxl
   - pyarrow>=0.13.0
   - pytables

ci/setup_env.sh

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ conda list pandas
 echo "[Build extensions]"
 python setup.py build_ext -q -i -j2

-# XXX: Some of our environments end up with old versions of pip (10.x)
+# TODO: Some of our environments end up with old versions of pip (10.x)
 # Adding a new enough version of pip to the requirements explodes the
 # solve time. Just using pip to update itself.
 # - py35_macos

conda.recipe/meta.yaml

Lines changed: 2 additions & 2 deletions
@@ -20,12 +20,12 @@ requirements:
     - cython
     - numpy
     - setuptools >=3.3
-    - python-dateutil >=2.5.0
+    - python-dateutil >=2.7.3
     - pytz
   run:
     - python {{ python }}
     - {{ pin_compatible('numpy') }}
-    - python-dateutil >=2.5.0
+    - python-dateutil >=2.7.3
     - pytz

 test:

doc/source/development/contributing.rst

Lines changed: 1 addition & 1 deletion
@@ -581,7 +581,7 @@ do not make sudden changes to the code that could have the potential to break
 a lot of user code as a result, that is, we need it to be as *backwards compatible*
 as possible to avoid mass breakages.

-Additional standards are outlined on the `pandas code style guide <code_style>`_
+Additional standards are outlined on the :ref:`pandas code style guide <code_style>`

 Optional dependencies
 ---------------------

doc/source/getting_started/install.rst

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ Dependencies
 Package                                                          Minimum supported version
 ================================================================ ==========================
 `setuptools <https://setuptools.readthedocs.io/en/latest/>`__    24.2.0
-`NumPy <https://www.numpy.org>`__                                1.13.3
+`NumPy <https://www.numpy.org>`__                                1.15.4
 `python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.7.3
 `pytz <https://pypi.org/project/pytz/>`__                        2017.2
 ================================================================ ==========================

doc/source/reference/general_utility_functions.rst

Lines changed: 3 additions & 0 deletions
@@ -35,9 +35,12 @@ Exceptions and warnings
 .. autosummary::
    :toctree: api/

+   errors.AccessorRegistrationWarning
    errors.DtypeWarning
    errors.EmptyDataError
    errors.OutOfBoundsDatetime
+   errors.MergeError
+   errors.NumbaUtilError
    errors.ParserError
    errors.ParserWarning
    errors.PerformanceWarning

doc/source/reference/groupby.rst

Lines changed: 4 additions & 2 deletions
@@ -36,8 +36,10 @@ Function application

    GroupBy.apply
    GroupBy.agg
-   GroupBy.aggregate
-   GroupBy.transform
+   SeriesGroupBy.aggregate
+   DataFrameGroupBy.aggregate
+   SeriesGroupBy.transform
+   DataFrameGroupBy.transform
    GroupBy.pipe

 Computations / descriptive stats

doc/source/user_guide/basics.rst

Lines changed: 58 additions & 0 deletions
@@ -1781,6 +1781,31 @@ used to sort a pandas object by its index levels.
    # Series
    unsorted_df['three'].sort_index()

+.. _basics.sort_index_key:
+
+.. versionadded:: 1.1.0
+
+Sorting by index also supports a ``key`` parameter that takes a callable
+function to apply to the index being sorted. For `MultiIndex` objects,
+the key is applied per-level to the levels specified by `level`.
+
+.. ipython:: python
+
+   s1 = pd.DataFrame({
+       "a": ['B', 'a', 'C'],
+       "b": [1, 2, 3],
+       "c": [2, 3, 4]
+   }).set_index(list("ab"))
+   s1
+
+.. ipython:: python
+
+   s1.sort_index(level="a")
+   s1.sort_index(level="a", key=lambda idx: idx.str.lower())
+
+For information on key sorting by value, see :ref:`value sorting
+<basics.sort_value_key>`.
+
 .. _basics.sort_values:

 By values
@@ -1813,6 +1838,39 @@ argument:
    s.sort_values()
    s.sort_values(na_position='first')

+.. _basics.sort_value_key:
+
+.. versionadded:: 1.1.0
+
+Sorting also supports a ``key`` parameter that takes a callable function
+to apply to the values being sorted.
+
+.. ipython:: python
+
+   s1 = pd.Series(['B', 'a', 'C'])
+
+.. ipython:: python
+
+   s1.sort_values()
+   s1.sort_values(key=lambda x: x.str.lower())
+
+`key` will be given the :class:`Series` of values and should return a ``Series``
+or array of the same shape with the transformed values. For `DataFrame` objects,
+the key is applied per column, so the key should still expect a Series and return
+a Series, e.g.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"a": ['B', 'a', 'C'], "b": [1, 2, 3]})
+
+.. ipython:: python
+
+   df.sort_values(by='a')
+   df.sort_values(by='a', key=lambda col: col.str.lower())
+
+The name or type of each column can be used to apply different functions to
+different columns.
+
 .. _basics.sort_indexes_and_values:

 By indexes and values
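
The ``key`` behaviour documented in the two hunks above can be sketched as follows, assuming pandas 1.1.0 or later. The data mirrors the doc examples; the variable names (``plain``, ``by_value``, ``by_column``, ``by_index``) are illustrative and not part of the commit.

```python
import pandas as pd

s = pd.Series(["B", "a", "C"])

# Default ordering compares code points, so uppercase sorts before lowercase.
plain = s.sort_values().tolist()

# key receives the whole Series and must return a same-shaped Series or array.
by_value = s.sort_values(key=lambda x: x.str.lower()).tolist()

df = pd.DataFrame({"a": ["B", "a", "C"], "b": [1, 2, 3], "c": [2, 3, 4]})

# For a DataFrame, key is applied to each column named in `by`.
by_column = df.sort_values(by="a", key=lambda col: col.str.lower())["a"].tolist()

# For a MultiIndex, key is applied to the level(s) named in `level`.
indexed = df.set_index(["a", "b"])
by_index = (
    indexed.sort_index(level="a", key=lambda idx: idx.str.lower())
    .index.get_level_values("a")
    .tolist()
)

print(plain, by_value, by_column, by_index)
```

The key function only changes the values used for comparison; the returned object still contains the original, untransformed labels.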

doc/source/user_guide/enhancingperf.rst

Lines changed: 2 additions & 14 deletions
@@ -396,7 +396,7 @@ Consider the following toy example of doubling each observation:
     1000 loops, best of 3: 233 us per loop

     # Custom function with numba
-    In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy())
+    In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy())
     1000 loops, best of 3: 145 us per loop

 Caveats
@@ -599,13 +599,6 @@ identifier.
 The ``inplace`` keyword determines whether this assignment will performed
 on the original ``DataFrame`` or return a copy with the new column.

-.. warning::
-
-   For backwards compatibility, ``inplace`` defaults to ``True`` if not
-   specified. This will change in a future version of pandas - if your
-   code depends on an inplace assignment you should update to explicitly
-   set ``inplace=True``.
-
 .. ipython:: python

    df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
@@ -614,7 +607,7 @@ on the original ``DataFrame`` or return a copy with the new column.
    df.eval('a = 1', inplace=True)
    df

-When ``inplace`` is set to ``False``, a copy of the ``DataFrame`` with the
+When ``inplace`` is set to ``False``, the default, a copy of the ``DataFrame`` with the
 new or modified columns is returned and the original frame is unchanged.

 .. ipython:: python
@@ -653,11 +646,6 @@ whether the query modifies the original frame.
    df.query('a > 2', inplace=True)
    df

-.. warning::
-
-   Unlike with ``eval``, the default value for ``inplace`` for ``query``
-   is ``False``. This is consistent with prior versions of pandas.
-
 Local variables
 ~~~~~~~~~~~~~~~
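
The revised wording states that ``inplace=False`` is the default for ``DataFrame.eval``, matching ``DataFrame.query``. A minimal sketch of that behaviour, assuming a pandas version in which the default is already ``False``; the column names ``c`` and ``d`` and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# inplace defaults to False: eval returns a new frame and leaves df untouched.
with_c = df.eval("c = a + b")
original_columns = list(df.columns)

# An explicit inplace=True mutates df and returns None.
returned = df.eval("d = a * b", inplace=True)

# query has the same default: a filtered copy is returned.
filtered = df.query("a > 2")

print(original_columns, list(with_c.columns), list(df.columns), len(filtered), len(df))
```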

doc/source/user_guide/indexing.rst

Lines changed: 1 addition & 1 deletion
@@ -881,7 +881,7 @@ The operators are: ``|`` for ``or``, ``&`` for ``and``, and ``~`` for ``not``.
 These **must** be grouped by using parentheses, since by default Python will
 evaluate an expression such as ``df['A'] > 2 & df['B'] < 3`` as
 ``df['A'] > (2 & df['B']) < 3``, while the desired evaluation order is
-``(df['A > 2) & (df['B'] < 3)``.
+``(df['A'] > 2) & (df['B'] < 3)``.

 Using a boolean vector to index a Series works exactly as in a NumPy ndarray:
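
The precedence rule described in this hunk can be checked directly. The sketch below uses a small made-up frame (its values are not from the commit) to compare the parenthesised expression with the unparenthesised one.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [1, 2, 4]})

# Parenthesised: each comparison is evaluated first, then combined with &.
mask = (df["A"] > 2) & (df["B"] < 3)

# Unparenthesised: & binds tighter than > and <, so Python builds the chained
# comparison df["A"] > (2 & df["B"]) < 3, which needs the truth value of a
# Series and therefore raises ValueError.
try:
    df["A"] > 2 & df["B"] < 3
    raised = False
except ValueError:
    raised = True

print(mask.tolist(), raised)
```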

doc/source/user_guide/reshaping.rst

Lines changed: 11 additions & 3 deletions
@@ -471,16 +471,24 @@ If ``crosstab`` receives only two Series, it will provide a frequency table.

    pd.crosstab(df['A'], df['B'])

-Any input passed containing ``Categorical`` data will have **all** of its
-categories included in the cross-tabulation, even if the actual data does
-not contain any instances of a particular category.
+``crosstab`` can also be implemented
+to ``Categorical`` data.

 .. ipython:: python

    foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
    bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
    pd.crosstab(foo, bar)

+If you want to include **all** of data categories even if the actual data does
+not contain any instances of a particular category, you should set ``dropna=False``.
+
+For example:
+
+.. ipython:: python
+
+   pd.crosstab(foo, bar, dropna=False)
+
 Normalization
 ~~~~~~~~~~~~~
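
The difference the new text describes shows up in the shape of the result. This sketch reuses ``foo`` and ``bar`` from the hunk and assumes a pandas version with the behaviour documented there; ``observed_only`` and ``all_categories`` are illustrative names.

```python
import pandas as pd

foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])

# Default (dropna=True): only categories that actually occur are tabulated.
observed_only = pd.crosstab(foo, bar)

# dropna=False keeps every declared category, including the unused "c" and "f".
all_categories = pd.crosstab(foo, bar, dropna=False)

print(observed_only.shape, all_categories.shape)
```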
