Commit 6aa6979

Merge remote-tracking branch 'upstream/master' into fix-20432

2 parents 0df3a10 + 7d2f5ce

File tree: 107 files changed, +2209 −1027 lines


codecov.yml

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ coverage:
   status:
     project:
       default:
-        target: '82'
+        target: '72'
     patch:
       default:
         target: '50'

doc/cheatsheet/Pandas_Cheat_Sheet.pdf

Binary file changed (9.56 KB → 9.14 KB); not shown.

doc/source/ecosystem.rst

Lines changed: 10 additions & 1 deletion

@@ -98,7 +98,8 @@ which can be used for a wide variety of time series data mining tasks.
 Visualization
 -------------
 
-While :ref:`pandas has built-in support for data visualization with matplotlib <visualization>`,
+`Pandas has its own Styler class for table visualization <user_guide/style.ipynb>`_, and while
+:ref:`pandas also has built-in support for data visualization through charts with matplotlib <visualization>`,
 there are a number of other pandas-compatible libraries.
 
 `Altair <https://altair-viz.github.io/>`__

@@ -368,6 +369,14 @@ far exceeding the performance of the native ``df.to_sql`` method. Internally, it
 Microsoft's BCP utility, but the complexity is fully abstracted away from the end user.
 Rigorously tested, it is a complete replacement for ``df.to_sql``.
 
+`Deltalake <https://pypi.org/project/deltalake>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Deltalake python package lets you access tables stored in
+`Delta Lake <https://delta.io/>`__ natively in Python without the need to use Spark or
+JVM. It provides the ``delta_table.to_pyarrow_table().to_pandas()`` method to convert
+any Delta table into Pandas dataframe.
 
 .. _ecosystem.out-of-core:

doc/source/user_guide/index.rst

Lines changed: 1 addition & 1 deletion

@@ -38,12 +38,12 @@ Further information on any specific method can be obtained in the
     integer_na
     boolean
     visualization
+    style
     computation
     groupby
     window
     timeseries
     timedeltas
-    style
     options
     enhancingperf
     scale

doc/source/user_guide/style.ipynb

Lines changed: 794 additions & 404 deletions (large diff; not rendered)

doc/source/user_guide/visualization.rst

Lines changed: 6 additions & 3 deletions

@@ -2,9 +2,12 @@
 
 {{ header }}
 
-*************
-Visualization
-*************
+*******************
+Chart Visualization
+*******************
+
+This section demonstrates visualization through charting. For information on
+visualization of tabular data please see the section on `Table Visualization <style.ipynb>`_.
 
 We use the standard convention for referencing the matplotlib API:

doc/source/user_guide/window.rst

Lines changed: 1 addition & 1 deletion

@@ -101,7 +101,7 @@ be calculated with :meth:`~Rolling.apply` by specifying a separate column of weights.
 
 All windowing operations support a ``min_periods`` argument that dictates the minimum amount of
 non-``np.nan`` values a window must have; otherwise, the resulting value is ``np.nan``.
-``min_peridos`` defaults to 1 for time-based windows and ``window`` for fixed windows
+``min_periods`` defaults to 1 for time-based windows and ``window`` for fixed windows
 
 .. ipython:: python
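The corrected ``min_periods`` behavior can be illustrated with a small sketch using only the documented rolling API:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, None, 4.0, 5.0])

# For a fixed window, min_periods defaults to the window size, so any
# window with fewer than 3 non-NaN observations yields NaN. Here every
# 3-value window is either partial or contains the NaN, so all results
# are NaN.
strict = s.rolling(window=3).sum()

# Lowering min_periods lets partial windows produce a value:
# [1.0, 3.0, 3.0, 6.0, 9.0]
relaxed = s.rolling(window=3, min_periods=1).sum()
```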

doc/source/whatsnew/v1.3.0.rst

Lines changed: 59 additions & 0 deletions

@@ -110,6 +110,30 @@ both XPath 1.0 and XSLT 1.0 is available. (:issue:`27554`)
 
 For more, see :ref:`io.xml` in the user guide on IO tools.
 
+.. _whatsnew_130.dataframe_honors_copy_with_dict:
+
+DataFrame constructor honors ``copy=False`` with dict
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When passing a dictionary to :class:`DataFrame` with ``copy=False``,
+a copy will no longer be made (:issue:`32960`)
+
+.. ipython:: python
+
+    arr = np.array([1, 2, 3])
+    df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
+    df
+
+``df["A"]`` remains a view on ``arr``:
+
+.. ipython:: python
+
+    arr[0] = 0
+    assert df.iloc[0, 0] == 0
+
+The default behavior when not passing ``copy`` will remain unchanged, i.e.
+a copy will be made.
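Outside the ipython directive, the same behavior can be checked directly. A minimal sketch; the view semantics shown are those documented for pandas 1.3, before copy-on-write changed the defaults:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])
# copy=False asks the constructor to reuse the input arrays where possible,
# so "A" can stay backed by arr while "B" is built from an explicit copy.
df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

# Mutating arr in place is then visible through df["A"] (in pandas 1.3),
# while df["B"] is unaffected because it was constructed from a copy.
arr[0] = 0
```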
 .. _whatsnew_130.enhancements.other:
 
 Other enhancements
@@ -302,6 +326,38 @@ cast to ``dtype=object`` (:issue:`38709`)
     ser2
 
+.. _whatsnew_130.notable_bug_fixes.rolling_groupby_column:
+
+GroupBy.rolling no longer returns grouped-by column in values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The group-by column will now be dropped from the result of a
+``groupby.rolling`` operation (:issue:`32262`)
+
+.. ipython:: python
+
+    df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
+    df
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [1]: df.groupby("A").rolling(2).sum()
+    Out[1]:
+           A    B
+    A
+    1 0  NaN  NaN
+      1  2.0  1.0
+    2 2  NaN  NaN
+    3 3  NaN  NaN
+
+*New behavior*:
+
+.. ipython:: python
+
+    df.groupby("A").rolling(2).sum()
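The new behavior can be verified directly with pandas 1.3 or later:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

# As of pandas 1.3, the grouped-by column "A" is dropped from the values;
# only "B" remains as a result column, with the group keys moved into the
# index.
result = df.groupby("A").rolling(2).sum()
```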
 .. _whatsnew_130.notable_bug_fixes.rolling_var_precision:
 
 Removed artificial truncation in rolling variance and standard deviation
@@ -501,6 +557,7 @@ Numeric
 - Bug in :meth:`DataFrame.mode` and :meth:`Series.mode` not keeping consistent integer :class:`Index` for empty input (:issue:`33321`)
 - Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
 - Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError`` (:issue:`38932`)
+- Bug in ``rank`` method for :class:`Series`, :class:`DataFrame`, :class:`DataFrameGroupBy`, and :class:`SeriesGroupBy` treating the most negative ``int64`` value as missing (:issue:`32859`)
 - Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
 - Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
 - Bug in :meth:`DataFrame.transform` would raise ``SpecificationError`` when passed a dictionary and columns were missing; will now raise a ``KeyError`` instead (:issue:`40004`)
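The :issue:`32859` fix above can be illustrated with the smallest ``int64`` value, on a pandas version that includes the fix:

```python
import numpy as np
import pandas as pd

most_negative = np.iinfo(np.int64).min

# Previously rank() treated this sentinel-like value as missing for plain
# integer data; with the fix it simply ranks as the smallest value.
s = pd.Series([3, most_negative, 1], dtype="int64")
ranks = s.rank()  # [3.0, 1.0, 2.0]
```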
@@ -513,6 +570,8 @@ Conversion
 - Bug in creating a :class:`DataFrame` from an empty ``np.recarray`` not retaining the original dtypes (:issue:`40121`)
 - Bug in :class:`DataFrame` failing to raise ``TypeError`` when constructing from a ``frozenset`` (:issue:`40163`)
 - Bug in :class:`Index` construction silently ignoring a passed ``dtype`` when the data cannot be cast to that dtype (:issue:`21311`)
+- Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`)
+-
 
 Strings
 ^^^^^^^

pandas/_libs/algos.pyx

Lines changed: 23 additions & 65 deletions

@@ -794,68 +794,14 @@ def backfill(ndarray[algos_t] old, ndarray[algos_t] new, limit=None) -> ndarray:
     return indexer
 
 
-@cython.boundscheck(False)
-@cython.wraparound(False)
 def backfill_inplace(algos_t[:] values, uint8_t[:] mask, limit=None):
-    cdef:
-        Py_ssize_t i, N
-        algos_t val
-        uint8_t prev_mask
-        int lim, fill_count = 0
-
-    N = len(values)
-
-    # GH#2778
-    if N == 0:
-        return
-
-    lim = validate_limit(N, limit)
-
-    val = values[N - 1]
-    prev_mask = mask[N - 1]
-    for i in range(N - 1, -1, -1):
-        if mask[i]:
-            if fill_count >= lim:
-                continue
-            fill_count += 1
-            values[i] = val
-            mask[i] = prev_mask
-        else:
-            fill_count = 0
-            val = values[i]
-            prev_mask = mask[i]
+    pad_inplace(values[::-1], mask[::-1], limit=limit)
 
 
-@cython.boundscheck(False)
-@cython.wraparound(False)
 def backfill_2d_inplace(algos_t[:, :] values,
                         const uint8_t[:, :] mask,
                         limit=None):
-    cdef:
-        Py_ssize_t i, j, N, K
-        algos_t val
-        int lim, fill_count = 0
-
-    K, N = (<object>values).shape
-
-    # GH#2778
-    if N == 0:
-        return
-
-    lim = validate_limit(N, limit)
-
-    for j in range(K):
-        fill_count = 0
-        val = values[j, N - 1]
-        for i in range(N - 1, -1, -1):
-            if mask[j, i]:
-                if fill_count >= lim:
-                    continue
-                fill_count += 1
-                values[j, i] = val
-            else:
-                fill_count = 0
-                val = values[j, i]
+    pad_2d_inplace(values[:, ::-1], mask[:, ::-1], limit)
 
 
 @cython.boundscheck(False)
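The refactor above rests on the identity backfill(x) == reverse(pad(reverse(x))): because slicing with ``[::-1]`` yields a view, forward-filling the reversed view writes the backward fill into the original buffer. A pure-Python sketch of the same idea (``pad_inplace`` here is a simplified stand-in, not pandas' actual Cython implementation):

```python
import numpy as np

def pad_inplace(values, mask, limit=None):
    # Forward-fill in place: carry the last valid value into masked slots,
    # allowing at most `limit` consecutive fills.
    n = len(values)
    if n == 0:
        return
    lim = n if limit is None else limit
    val, prev_mask, fill_count = values[0], mask[0], 0
    for i in range(n):
        if mask[i]:
            if fill_count >= lim:
                continue
            fill_count += 1
            values[i] = val
            mask[i] = prev_mask
        else:
            fill_count = 0
            val, prev_mask = values[i], mask[i]

def backfill_inplace(values, mask, limit=None):
    # Backward fill is forward fill on reversed views; values[::-1] is a
    # NumPy view, so the writes land in the original arrays.
    pad_inplace(values[::-1], mask[::-1], limit=limit)

vals = np.array([1.0, 0.0, 0.0, 4.0])
mask = np.array([False, True, True, False])
backfill_inplace(vals, mask)  # vals becomes [1.0, 4.0, 4.0, 4.0]
```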
@@ -962,6 +908,7 @@ ctypedef fused rank_t:
 def rank_1d(
     ndarray[rank_t, ndim=1] values,
     const intp_t[:] labels,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     bint pct=False,

@@ -977,17 +924,19 @@ def rank_1d(
         Array containing unique label for each group, with its ordering
         matching up to the corresponding record in `values`. If not called
         from a groupby operation, will be an array of 0's
+    is_datetimelike : bool, default False
+        True if `values` contains datetime-like entries.
     ties_method : {'average', 'min', 'max', 'first', 'dense'}, default
         'average'
         * average: average rank of group
         * min: lowest rank in group
         * max: highest rank in group
         * first: ranks assigned in order they appear in the array
         * dense: like 'min', but rank always increases by 1 between groups
-    ascending : boolean, default True
+    ascending : bool, default True
         False for ranks by high (1) to low (N)
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
-    pct : boolean, default False
+    pct : bool, default False
         Compute percentage rank of data within each group
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
         * keep: leave NA values where they are
@@ -1032,7 +981,7 @@ def rank_1d(
 
     if rank_t is object:
         mask = missing.isnaobj(masked_vals)
-    elif rank_t is int64_t:
+    elif rank_t is int64_t and is_datetimelike:
         mask = (masked_vals == NPY_NAT).astype(np.uint8)
     elif rank_t is float64_t:
         mask = np.isnan(masked_vals).astype(np.uint8)

@@ -1059,7 +1008,7 @@ def rank_1d(
         if rank_t is object:
             nan_fill_val = NegInfinity()
         elif rank_t is int64_t:
-            nan_fill_val = np.iinfo(np.int64).min
+            nan_fill_val = NPY_NAT
         elif rank_t is uint64_t:
             nan_fill_val = 0
         else:
@@ -1275,6 +1224,7 @@ def rank_1d(
 def rank_2d(
     ndarray[rank_t, ndim=2] in_arr,
     int axis=0,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     na_option="keep",

@@ -1299,7 +1249,9 @@ def rank_2d(
     tiebreak = tiebreakers[ties_method]
 
     keep_na = na_option == 'keep'
-    check_mask = rank_t is not uint64_t
+
+    # For cases where a mask is not possible, we can avoid mask checks
+    check_mask = not (rank_t is uint64_t or (rank_t is int64_t and not is_datetimelike))
 
     if axis == 0:
         values = np.asarray(in_arr).T.copy()

@@ -1310,28 +1262,34 @@ def rank_2d(
     if values.dtype != np.object_:
         values = values.astype('O')
 
-    if rank_t is not uint64_t:
+    if check_mask:
         if ascending ^ (na_option == 'top'):
             if rank_t is object:
                 nan_value = Infinity()
             elif rank_t is float64_t:
                 nan_value = np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = np.iinfo(np.int64).max
 
         else:
             if rank_t is object:
                 nan_value = NegInfinity()
             elif rank_t is float64_t:
                 nan_value = -np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = NPY_NAT
 
         if rank_t is object:
             mask = missing.isnaobj2d(values)
         elif rank_t is float64_t:
             mask = np.isnan(values)
-        elif rank_t is int64_t:
+
+        # int64 and datetimelike
+        else:
             mask = values == NPY_NAT
 
         np.putmask(values, mask, nan_value)