Commit 252197e

Merge remote-tracking branch 'upstream/main' into pandas-asan
2 parents 3e295c5 + daa9cdb commit 252197e

85 files changed: +897 -431 lines changed

.gitignore

Lines changed: 4 additions & 2 deletions
@@ -39,6 +39,7 @@
 .mesonpy-native-file.ini
 MANIFEST
 compile_commands.json
+debug
 .debug
 
 # Python files #
@@ -104,10 +105,11 @@ scikits
 # Generated Sources #
 #####################
 !skts.c
-!np_datetime.c
-!np_datetime_strings.c
 *.c
 *.cpp
+!pandas/_libs/src/**/*.c
+!pandas/_libs/src/**/*.h
+!pandas/_libs/include/**/*.h
 
 # Unit / Performance Testing #
 ##############################

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -20,11 +20,11 @@ ci:
 repos:
 - repo: https://github.com/hauntsaninja/black-pre-commit-mirror
   # black compiled with mypyc
-  rev: 23.10.1
+  rev: 23.11.0
   hooks:
   - id: black
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.1.4
+  rev: v0.1.6
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -47,7 +47,7 @@ repos:
     types_or: [python, rst, markdown, cython, c]
     additional_dependencies: [tomli]
 - repo: https://github.com/MarcoGorelli/cython-lint
-  rev: v0.15.0
+  rev: v0.16.0
   hooks:
   - id: cython-lint
   - id: double-quote-cython-strings
@@ -111,11 +111,11 @@ repos:
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-  rev: v0.8.1
+  rev: v0.9.0
   hooks:
   - id: sphinx-lint
 - repo: https://github.com/pre-commit/mirrors-clang-format
-  rev: v17.0.4
+  rev: v17.0.6
   hooks:
   - id: clang-format
     files: ^pandas/_libs/src|^pandas/_libs/include

doc/source/user_guide/categorical.rst

Lines changed: 1 addition & 1 deletion
@@ -647,7 +647,7 @@ Pivot tables:
 
     raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
     df = pd.DataFrame({"A": raw_cat, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})
-    pd.pivot_table(df, values="values", index=["A", "B"])
+    pd.pivot_table(df, values="values", index=["A", "B"], observed=False)
 
 Data munging
 ------------

doc/source/whatsnew/v0.23.0.rst

Lines changed: 26 additions & 5 deletions
@@ -286,12 +286,33 @@ For pivoting operations, this behavior is *already* controlled by the ``dropna``
     df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
     df
 
-.. ipython:: python
 
-    pd.pivot_table(df, values='values', index=['A', 'B'],
-                   dropna=True)
-    pd.pivot_table(df, values='values', index=['A', 'B'],
-                   dropna=False)
+.. code-block:: ipython
+
+    In [1]: pd.pivot_table(df, values='values', index=['A', 'B'], dropna=True)
+
+    Out[1]:
+         values
+    A B
+    a c     1.0
+      d     2.0
+    b c     3.0
+      d     4.0
+
+    In [2]: pd.pivot_table(df, values='values', index=['A', 'B'], dropna=False)
+
+    Out[2]:
+         values
+    A B
+    a c     1.0
+      d     2.0
+      y     NaN
+    b c     3.0
+      d     4.0
+      y     NaN
+    z c     NaN
+      d     NaN
+      y     NaN
 
 
 .. _whatsnew_0230.enhancements.window_raw:

doc/source/whatsnew/v2.1.2.rst

Lines changed: 0 additions & 1 deletion
@@ -38,7 +38,6 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - Fixed bug in :class:`.DataFrameGroupBy` reductions not preserving object dtype when ``infer_string`` is set (:issue:`55620`)
-- Fixed bug in :meth:`.DataFrameGroupBy.min()` and :meth:`.DataFrameGroupBy.max()` not preserving extension dtype for empty object (:issue:`55619`)
 - Fixed bug in :meth:`.SeriesGroupBy.value_counts` returning incorrect dtype for string columns (:issue:`55627`)
 - Fixed bug in :meth:`Categorical.equals` if other has arrow backed string dtype (:issue:`55364`)
 - Fixed bug in :meth:`DataFrame.__setitem__` not inferring string dtype for zero-dimensional array with ``infer_string=True`` (:issue:`55366`)

doc/source/whatsnew/v2.1.4.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Bug fixes
 - Bug in :class:`Series` constructor raising DeprecationWarning when ``index`` is a list of :class:`Series` (:issue:`55228`)
 - Bug in :meth:`Index.__getitem__` returning wrong result for Arrow dtypes and negative stepsize (:issue:`55832`)
 - Fixed bug in :func:`to_numeric` converting to extension dtype for ``string[pyarrow_numpy]`` dtype (:issue:`56179`)
+- Fixed bug in :meth:`.DataFrameGroupBy.min()` and :meth:`.DataFrameGroupBy.max()` not preserving extension dtype for empty object (:issue:`55619`)
 - Fixed bug in :meth:`DataFrame.__setitem__` casting :class:`Index` with object-dtype to PyArrow backed strings when ``infer_string`` option is set (:issue:`55638`)
 - Fixed bug in :meth:`DataFrame.to_hdf` raising when columns have ``StringDtype`` (:issue:`55088`)
 - Fixed bug in :meth:`Index.insert` casting object-dtype to PyArrow backed strings when ``infer_string`` option is set (:issue:`55638`)

doc/source/whatsnew/v2.2.0.rst

Lines changed: 2 additions & 0 deletions
@@ -226,6 +226,7 @@ Other enhancements
 - Allow passing ``read_only``, ``data_only`` and ``keep_links`` arguments to openpyxl using ``engine_kwargs`` of :func:`read_excel` (:issue:`55027`)
 - DataFrame.apply now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
 - Implement masked algorithms for :meth:`Series.value_counts` (:issue:`54984`)
+- Improved error message that appears in :meth:`DatetimeIndex.to_period` with frequencies which are not supported as period frequencies, such as "BMS" (:issue:`56243`)
 - Improved error message when constructing :class:`Period` with invalid offsets such as "QS" (:issue:`55785`)
 
 .. ---------------------------------------------------------------------------
@@ -434,6 +435,7 @@ Other Deprecations
 - Deprecated the ``ordinal`` keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
 - Deprecated the ``unit`` keyword in :class:`TimedeltaIndex` construction, use :func:`to_timedelta` instead (:issue:`55499`)
 - Deprecated the behavior of :meth:`Series.value_counts` and :meth:`Index.value_counts` with object dtype; in a future version these will not perform dtype inference on the resulting :class:`Index`, do ``result.index = result.index.infer_objects()`` to retain the old behavior (:issue:`56161`)
+- Deprecated the default of ``observed=False`` in :meth:`DataFrame.pivot_table`; will be ``True`` in a future version (:issue:`56236`)
 - Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
 - Deprecated the option ``mode.data_manager`` and the ``ArrayManager``; only the ``BlockManager`` will be available in future versions (:issue:`55043`)
 - Deprecated the previous implementation of :class:`DataFrame.stack`; specify ``future_stack=True`` to adopt the future version (:issue:`53515`)
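
Note on the ``observed`` deprecation listed in this file: code that calls ``pivot_table`` with a categorical grouper can pass the keyword explicitly instead of relying on the default. A minimal sketch, reusing the frame from the categorical.rst change in this commit. The variable names ``only_observed``, ``all_categories`` and ``observed_warnings`` are illustrative and not part of the commit:

```python
import warnings

import pandas as pd

raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
df = pd.DataFrame({"A": raw_cat, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Passing observed explicitly avoids depending on the deprecated default.
    only_observed = pd.pivot_table(df, values="values", index=["A", "B"], observed=True)
    all_categories = pd.pivot_table(df, values="values", index=["A", "B"], observed=False)

# Collect any warning that mentions the observed keyword (none are expected here).
observed_warnings = [w for w in caught if "observed" in str(w.message)]
```

With ``observed=True`` only the category combinations that occur in the data are grouped. How unobserved categories appear with ``observed=False`` also depends on ``dropna``, as the v0.23.0 example in this commit shows.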

pandas/_libs/include/pandas/parser/tokenizer.h

Lines changed: 0 additions & 3 deletions
@@ -18,11 +18,8 @@ See LICENSE for the license
 #define ERROR_OVERFLOW 2
 #define ERROR_INVALID_CHARS 3
 
-#include "pandas/portable.h"
 #include <stdint.h>
 
-#include "pandas/vendored/klib/khash.h"
-
 #define STREAM_INIT_SIZE 32
 
 #define REACHED_EOF 1

pandas/_libs/include/pandas/vendored/ujson/python/version.h

Lines changed: 0 additions & 41 deletions
This file was deleted.

pandas/_libs/src/parser/pd_parser.c

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Distributed under the terms of the BSD Simplified License.
 
 #include "pandas/parser/pd_parser.h"
 #include "pandas/parser/io.h"
+#include "pandas/portable.h"
 
 static int to_double(char *item, double *p_value, char sci, char decimal,
                      int *maybe_int) {

pandas/_libs/src/parser/tokenizer.c

Lines changed: 2 additions & 1 deletion
@@ -16,15 +16,16 @@ Python's built-in csv module and Warren Weckesser's textreader project on
 GitHub. See Python Software Foundation License and BSD licenses for these.
 
 */
-
 #include "pandas/parser/tokenizer.h"
+#include "pandas/portable.h"
 
 #include <ctype.h>
 #include <float.h>
 #include <math.h>
 #include <stdbool.h>
 
 #include "pandas/portable.h"
+#include "pandas/vendored/klib/khash.h" // for kh_int64_t, kh_destroy_int64
 
 void coliter_setup(coliter_t *self, parser_t *parser, int64_t i,
                    int64_t start) {

pandas/_libs/src/vendored/numpy/datetime/np_datetime.c

Lines changed: 1 addition & 2 deletions
@@ -25,9 +25,8 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 #include <Python.h>
 
 #include "pandas/vendored/numpy/datetime/np_datetime.h"
-#include <numpy/arrayobject.h>
-#include <numpy/arrayscalars.h>
 #include <numpy/ndarraytypes.h>
+#include <numpy/npy_common.h>
 
 #if defined(_WIN32)
 #ifndef ENABLE_INTSAFE_SIGNED_FUNCTIONS

pandas/_libs/src/vendored/numpy/datetime/np_datetime_strings.c

Lines changed: 1 addition & 2 deletions
@@ -32,9 +32,8 @@ This file implements string parsing and creation for NumPy datetime.
 
 #include <time.h>
 
-#include <numpy/arrayobject.h>
-#include <numpy/arrayscalars.h>
 #include <numpy/ndarraytypes.h>
+#include <numpy/npy_common.h>
 
 #include "pandas/vendored/numpy/datetime/np_datetime.h"
 #include "pandas/vendored/numpy/datetime/np_datetime_strings.h"

pandas/_libs/src/vendored/ujson/lib/ultrajsondec.c

Lines changed: 0 additions & 1 deletion
@@ -41,7 +41,6 @@ Numeric decoder derived from TCL library
 // Licence at LICENSES/ULTRAJSON_LICENSE
 
 #include "pandas/vendored/ujson/lib/ultrajson.h"
-#include <assert.h>
 #include <errno.h>
 #include <limits.h>
 #include <locale.h>

pandas/_libs/src/vendored/ujson/lib/ultrajsonenc.c

Lines changed: 0 additions & 2 deletions
@@ -41,8 +41,6 @@ Numeric decoder derived from TCL library
 // Licence at LICENSES/ULTRAJSON_LICENSE
 
 #include "pandas/vendored/ujson/lib/ultrajson.h"
-#include <assert.h>
-#include <float.h>
 #include <locale.h>
 #include <math.h>
 #include <stdint.h>

pandas/_libs/src/vendored/ujson/python/objToJSON.c

Lines changed: 0 additions & 1 deletion
@@ -40,7 +40,6 @@ Numeric decoder derived from TCL library
 
 #define PY_SSIZE_T_CLEAN
 #include <Python.h>
-#include <math.h>
 
 #define NO_IMPORT_ARRAY
 #define PY_ARRAY_UNIQUE_SYMBOL UJSON_NUMPY

pandas/_libs/src/vendored/ujson/python/ujson.c

Lines changed: 0 additions & 1 deletion
@@ -38,7 +38,6 @@ Numeric decoder derived from TCL library
 
 // Licence at LICENSES/ULTRAJSON_LICENSE
 
-#include "pandas/vendored/ujson/python/version.h"
 #define PY_SSIZE_T_CLEAN
 #include <Python.h>
 #define PY_ARRAY_UNIQUE_SYMBOL UJSON_NUMPY

pandas/_testing/__init__.py

Lines changed: 0 additions & 14 deletions
@@ -101,14 +101,11 @@
 if TYPE_CHECKING:
     from pandas._typing import (
         Dtype,
-        Frequency,
         NpDtype,
     )
 
     from pandas.core.arrays import ArrowExtensionArray
 
-_N = 30
-
 UNSIGNED_INT_NUMPY_DTYPES: list[NpDtype] = ["uint8", "uint16", "uint32", "uint64"]
 UNSIGNED_INT_EA_DTYPES: list[Dtype] = ["UInt8", "UInt16", "UInt32", "UInt64"]
 SIGNED_INT_NUMPY_DTYPES: list[NpDtype] = [int, "int8", "int16", "int32", "int64"]
@@ -339,16 +336,6 @@ def to_array(obj):
 # Others
 
 
-def makeTimeSeries(nper=None, freq: Frequency = "B", name=None) -> Series:
-    if nper is None:
-        nper = _N
-    return Series(
-        np.random.default_rng(2).standard_normal(nper),
-        index=date_range("2000-01-01", periods=nper, freq=freq),
-        name=name,
-    )
-
-
 def makeCustomIndex(
     nentries,
     nlevels,
@@ -883,7 +870,6 @@ def shares_memory(left, right) -> bool:
     "loc",
     "makeCustomDataframe",
     "makeCustomIndex",
-    "makeTimeSeries",
     "maybe_produces_warning",
     "NARROW_NP_DTYPES",
     "NP_NAT_OBJECTS",

pandas/conftest.py

Lines changed: 5 additions & 3 deletions
@@ -766,9 +766,11 @@ def datetime_series() -> Series:
     """
     Fixture for Series of floats with DatetimeIndex
     """
-    s = tm.makeTimeSeries()
-    s.name = "ts"
-    return s
+    return Series(
+        np.random.default_rng(2).standard_normal(30),
+        index=date_range("2000-01-01", periods=30, freq="B"),
+        name="ts",
+    )
 
 
 def _create_series(index):
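
Note on the inlined fixture body: outside the pandas test suite, the removed ``tm.makeTimeSeries()`` call can be replaced by the same construction. A minimal sketch follows. The helper name ``make_ts`` is hypothetical and only wraps the expression from the diff:

```python
import numpy as np
import pandas as pd


def make_ts() -> pd.Series:
    # Mirrors the inlined fixture body: 30 business-day observations,
    # seeded so that every call yields identical values.
    return pd.Series(
        np.random.default_rng(2).standard_normal(30),
        index=pd.date_range("2000-01-01", periods=30, freq="B"),
        name="ts",
    )


first, second = make_ts(), make_ts()
```

Because the generator is seeded with a fixed value on every call, repeated calls return equal Series, which keeps tests that use the fixture deterministic.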

pandas/core/apply.py

Lines changed: 11 additions & 0 deletions
@@ -20,6 +20,7 @@
 from pandas._config import option_context
 
 from pandas._libs import lib
+from pandas._libs.internals import BlockValuesRefs
 from pandas._typing import (
     AggFuncType,
     AggFuncTypeBase,
@@ -1254,6 +1255,8 @@ def series_generator(self) -> Generator[Series, None, None]:
         ser = self.obj._ixs(0, axis=0)
         mgr = ser._mgr
 
+        is_view = mgr.blocks[0].refs.has_reference()  # type: ignore[union-attr]
+
         if isinstance(ser.dtype, ExtensionDtype):
             # values will be incorrect for this block
             # TODO(EA2D): special case would be unnecessary with 2D EAs
@@ -1267,6 +1270,14 @@ def series_generator(self) -> Generator[Series, None, None]:
                 ser._mgr = mgr
                 mgr.set_values(arr)
                 object.__setattr__(ser, "_name", name)
+                if not is_view:
+                    # In apply_series_generator we store the a shallow copy of the
+                    # result, which potentially increases the ref count of this reused
+                    # `ser` object (depending on the result of the applied function)
+                    # -> if that happened and `ser` is already a copy, then we reset
+                    # the refs here to avoid triggering a unnecessary CoW inside the
+                    # applied function (https://github.com/pandas-dev/pandas/pull/56212)
+                    mgr.blocks[0].refs = BlockValuesRefs(mgr.blocks[0])  # type: ignore[union-attr]
                 yield ser
 
     @staticmethod

pandas/core/arrays/period.py

Lines changed: 6 additions & 1 deletion
@@ -1174,7 +1174,12 @@ def dt64arr_to_periodarr(
 
     reso = get_unit_from_dtype(data.dtype)
     freq = Period._maybe_convert_freq(freq)
-    base = freq._period_dtype_code
+    try:
+        base = freq._period_dtype_code
+    except (AttributeError, TypeError):
+        # AttributeError: _period_dtype_code might not exist
+        # TypeError: _period_dtype_code might intentionally raise
+        raise TypeError(f"{freq.name} is not supported as period frequency")
     return c_dt64arr_to_periodarr(data.view("i8"), base, tz, reso=reso), freq
 
 

pandas/core/base.py

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ class PandasObject(DirNamesMixin):
     @property
     def _constructor(self):
         """
-        Class constructor (for this class it's just `__class__`.
+        Class constructor (for this class it's just `__class__`).
         """
         return type(self)
 
