Commit 91f8436

Merge branch 'main' into sas/decompress3
2 parents 17c72f8 + b6d5e97 commit 91f8436

58 files changed, +925 -371 lines changed

doc/source/reference/testing.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ Exceptions and warnings
 
    errors.AbstractMethodError
    errors.AccessorRegistrationWarning
+   errors.CSSWarning
    errors.DataError
    errors.DtypeWarning
    errors.DuplicateLabelError

doc/source/whatsnew/v1.4.4.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - The :class:`errors.FutureWarning` raised when passing arguments (other than ``filepath_or_buffer``) as positional in :func:`read_csv` is now raised at the correct stacklevel (:issue:`47385`)
--
+- Bug in :meth:`DataFrame.to_sql` when ``method`` was a ``callable`` that did not return an ``int`` and would raise a ``TypeError`` (:issue:`46891`)
 
 .. ---------------------------------------------------------------------------
 

doc/source/whatsnew/v1.5.0.rst

Lines changed: 8 additions & 1 deletion
@@ -275,6 +275,7 @@ Other enhancements
 - :class:`.DataError`, :class:`.SpecificationError`, :class:`.SettingWithCopyError`, :class:`.SettingWithCopyWarning`, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, and :class:`.IndexingError` are now exposed in ``pandas.errors`` (:issue:`27656`)
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
 - Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
+- :class:`Series` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) will now successfully operate when the dtype is numeric and ``numeric_only=True`` is provided; previously this would raise a ``NotImplementedError`` (:issue:`47500`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:
@@ -766,7 +767,8 @@ Other Deprecations
 - Deprecated the argument ``na_sentinel`` in :func:`factorize`, :meth:`Index.factorize`, and :meth:`.ExtensionArray.factorize`; pass ``use_na_sentinel=True`` instead to use the sentinel ``-1`` for NaN values and ``use_na_sentinel=False`` instead of ``na_sentinel=None`` to encode NaN values (:issue:`46910`)
 - Deprecated :meth:`DataFrameGroupBy.transform` not aligning the result when the UDF returned DataFrame (:issue:`45648`)
 - Clarified warning from :func:`to_datetime` when delimited dates can't be parsed in accordance to specified ``dayfirst`` argument (:issue:`46210`)
-
+- Deprecated :class:`Series` and :class:`Resampler` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) raising a ``NotImplementedError`` when the dtype is non-numeric and ``numeric_only=True`` is provided; this will raise a ``TypeError`` in a future version (:issue:`47500`)
+- Deprecated :meth:`Series.rank` returning an empty result when the dtype is non-numeric and ``numeric_only=True`` is provided; this will raise a ``TypeError`` in a future version (:issue:`47500`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.performance:
@@ -852,6 +854,7 @@ Conversion
 - Bug in metaclass of generic abstract dtypes causing :meth:`DataFrame.apply` and :meth:`Series.apply` to raise for the built-in function ``type`` (:issue:`46684`)
 - Bug in :meth:`DataFrame.to_records` returning inconsistent numpy types if the index was a :class:`MultiIndex` (:issue:`47263`)
 - Bug in :meth:`DataFrame.to_dict` for ``orient="list"`` or ``orient="index"`` was not returning native types (:issue:`46751`)
+- Bug in :meth:`DataFrame.apply` that returns a :class:`DataFrame` instead of a :class:`Series` when applied to an empty :class:`DataFrame` and ``axis=1`` (:issue:`39111`)
 
 Strings
 ^^^^^^^
@@ -882,6 +885,7 @@ Indexing
 - Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`)
 - Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtype :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`)
 - Bug in :meth:`DataFrame.loc` when setting values to a column and right hand side is a dictionary (:issue:`47216`)
+- Bug in :meth:`DataFrame.loc` when setting a :class:`DataFrame` not aligning index in some cases (:issue:`47578`)
 - Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`)
 - Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`)
 - Bug in :meth:`Series.__setitem__` losing precision when enlarging :class:`Series` with scalar (:issue:`32346`)
@@ -934,12 +938,14 @@ I/O
 - Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`)
 - :meth:`to_html` now excludes the ``border`` attribute from ``<table>`` elements when ``border`` keyword is set to ``False``.
 - Bug in :func:`read_sas` with certain types of compressed SAS7BDAT files (:issue:`35545`)
+- Bug in :func:`read_excel` not forward filling :class:`MultiIndex` when no names were given (:issue:`47487`)
 - Bug in :func:`read_sas` returned ``None`` rather than an empty DataFrame for SAS7BDAT files with zero rows (:issue:`18198`)
 - Bug in :class:`StataWriter` where value labels were always written with default encoding (:issue:`46750`)
 - Bug in :class:`StataWriterUTF8` where some valid characters were removed from variable names (:issue:`47276`)
 - Bug in :meth:`DataFrame.to_excel` when writing an empty dataframe with :class:`MultiIndex` (:issue:`19543`)
 - Bug in :func:`read_sas` with RLE-compressed SAS7BDAT files that contain 0x40 control bytes (:issue:`31243`)
 - Bug in :func:`read_sas` that scrambled column names (:issue:`31243`)
+- Bug in :func:`read_sas` with RLE-compressed SAS7BDAT files that contain 0x00 control bytes (:issue:`47099`)
 -
 
 Period
@@ -995,6 +1001,7 @@ Reshaping
 - Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
 - Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`)
 - Bug in concatenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`)
+- Bug in :func:`concat` losing dtype of columns when ``join="outer"`` and ``sort=True`` (:issue:`47329`)
 - Bug in :func:`concat` not sorting the column names when ``None`` is included (:issue:`47331`)
 - Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`)
 - Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)

pandas/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 # flake8: noqa
+from __future__ import annotations
 
 __docformat__ = "restructuredtext"
 

@@ -185,7 +186,7 @@
 __deprecated_num_index_names = ["Float64Index", "Int64Index", "UInt64Index"]
 
 
-def __dir__():
+def __dir__() -> list[str]:
     # GH43028
     # Int64Index etc. are deprecated, but we still want them to be available in the dir.
     # Remove in Pandas 2.0, when we remove Int64Index etc. from the code base.
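
The hunk above only shows the signature of the module-level `__dir__` hook, not its body. As a rough illustration of how such a hook (PEP 562) can keep extra names visible in `dir()`, here is a minimal sketch. The helper `make_module` and the attribute `visible` are hypothetical names for this example, and the body of `__dir__` is an assumption, not the pandas implementation.

```python
from __future__ import annotations

import types


def make_module(name: str, extra_names: list[str]) -> types.ModuleType:
    """Build a throwaway module whose dir() also reports extra_names."""
    mod = types.ModuleType(name)
    mod.visible = 1  # an ordinary attribute, listed by default

    def __dir__() -> list[str]:
        # Assumption for this sketch: report the regular module attributes
        # plus the extra names that should stay discoverable.
        return list(vars(mod)) + extra_names

    # dir(module) looks for a callable named __dir__ in the module namespace.
    mod.__dir__ = __dir__
    return mod


demo = make_module("demo", ["Float64Index", "Int64Index", "UInt64Index"])
print("Int64Index" in dir(demo))
```

The sketch covers only the listing side: the extra names appear in `dir(demo)` but are not defined as attributes of the module.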

pandas/_config/config.py

Lines changed: 4 additions & 3 deletions
@@ -60,6 +60,7 @@
     Callable,
     Generic,
     Iterable,
+    Iterator,
     NamedTuple,
     cast,
 )
@@ -435,13 +436,13 @@ def __init__(self, *args) -> None:
 
         self.ops = list(zip(args[::2], args[1::2]))
 
-    def __enter__(self):
+    def __enter__(self) -> None:
         self.undo = [(pat, _get_option(pat, silent=True)) for pat, val in self.ops]
 
         for pat, val in self.ops:
             _set_option(pat, val, silent=True)
 
-    def __exit__(self, *args):
+    def __exit__(self, *args) -> None:
         if self.undo:
             for pat, val in self.undo:
                 _set_option(pat, val, silent=True)
@@ -733,7 +734,7 @@ def pp(name: str, ks: Iterable[str]) -> list[str]:
 
 
 @contextmanager
-def config_prefix(prefix):
+def config_prefix(prefix) -> Iterator[None]:
     """
     contextmanager for multiple invocations of API with a common prefix
 

pandas/_libs/algos.pyx

Lines changed: 3 additions & 2 deletions
@@ -324,6 +324,7 @@ def kth_smallest(numeric_t[::1] arr, Py_ssize_t k) -> numeric_t:
 
 @cython.boundscheck(False)
 @cython.wraparound(False)
+@cython.cdivision(True)
 def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
     cdef:
         Py_ssize_t i, j, xi, yi, N, K
@@ -356,8 +357,8 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
 nobs += 1
 dx = vx - meanx
 dy = vy - meany
-meanx += 1 / nobs * dx
-meany += 1 / nobs * dy
+meanx += 1. / nobs * dx
+meany += 1. / nobs * dy
 ssqdmx += (vx - meanx) * dx
 ssqdmy += (vy - meany) * dy
 covxy += (vx - meanx) * dy
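
The lines in this hunk are the update step of a one-pass (Welford-style) running mean and co-moment. The diff does not state why the literal changed from `1` to `1.`; one plausible reading is that it makes the floating-point division explicit once C division semantics are enabled. The pure-Python sketch below reproduces the update and, assuming `nobs` is an integer counter, shows what a truncating `1 // nobs` would do to the result. The function name `online_cov` and its `truncating` flag are illustrative only.

```python
def online_cov(xs, ys, truncating=False):
    """One-pass sample covariance using the running-mean update from the hunk."""
    nobs = 0
    meanx = meany = 0.0
    covxy = 0.0
    for vx, vy in zip(xs, ys):
        nobs += 1
        dx = vx - meanx
        dy = vy - meany
        # With integer operands, a truncating division yields 0 for nobs >= 2,
        # so the running means would stop updating after the first value.
        weight = (1 // nobs) if truncating else (1.0 / nobs)
        meanx += weight * dx
        meany += weight * dy
        # Uses the updated mean of x and the pre-update deviation of y.
        covxy += (vx - meanx) * dy
    return covxy / (nobs - 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(online_cov(xs, ys))                   # float weights
print(online_cov(xs, ys, truncating=True))  # integer weights, for comparison
```

With floating-point weights the result agrees with the usual two-pass formula; with truncating weights it does not, which is the failure mode an explicit `1.` rules out.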

pandas/_testing/__init__.py

Lines changed: 36 additions & 30 deletions
@@ -238,15 +238,15 @@
 _testing_mode_warnings = (DeprecationWarning, ResourceWarning)
 
 
-def set_testing_mode():
+def set_testing_mode() -> None:
     # set the testing mode filters
     testing_mode = os.environ.get("PANDAS_TESTING_MODE", "None")
     if "deprecate" in testing_mode:
         for category in _testing_mode_warnings:
             warnings.simplefilter("always", category)
 
 
-def reset_testing_mode():
+def reset_testing_mode() -> None:
     # reset the testing mode filters
     testing_mode = os.environ.get("PANDAS_TESTING_MODE", "None")
     if "deprecate" in testing_mode:
@@ -257,7 +257,7 @@ def reset_testing_mode():
 set_testing_mode()
 
 
-def reset_display_options():
+def reset_display_options() -> None:
     """
     Reset the display options for printing and representing objects.
     """
@@ -333,38 +333,38 @@ def to_array(obj):
 # Others
 
 
-def getCols(k):
+def getCols(k) -> str:
     return string.ascii_uppercase[:k]
 
 
 # make index
-def makeStringIndex(k=10, name=None):
+def makeStringIndex(k=10, name=None) -> Index:
     return Index(rands_array(nchars=10, size=k), name=name)
 
 
-def makeCategoricalIndex(k=10, n=3, name=None, **kwargs):
+def makeCategoricalIndex(k=10, n=3, name=None, **kwargs) -> CategoricalIndex:
     """make a length k index or n categories"""
     x = rands_array(nchars=4, size=n, replace=False)
     return CategoricalIndex(
         Categorical.from_codes(np.arange(k) % n, categories=x), name=name, **kwargs
     )
 
 
-def makeIntervalIndex(k=10, name=None, **kwargs):
+def makeIntervalIndex(k=10, name=None, **kwargs) -> IntervalIndex:
     """make a length k IntervalIndex"""
     x = np.linspace(0, 100, num=(k + 1))
     return IntervalIndex.from_breaks(x, name=name, **kwargs)
 
 
-def makeBoolIndex(k=10, name=None):
+def makeBoolIndex(k=10, name=None) -> Index:
     if k == 1:
         return Index([True], name=name)
     elif k == 2:
         return Index([False, True], name=name)
     return Index([False, True] + [False] * (k - 2), name=name)
 
 
-def makeNumericIndex(k=10, name=None, *, dtype):
+def makeNumericIndex(k=10, name=None, *, dtype) -> NumericIndex:
     dtype = pandas_dtype(dtype)
     assert isinstance(dtype, np.dtype)
 

@@ -382,21 +382,21 @@ def makeNumericIndex(k=10, name=None, *, dtype):
     return NumericIndex(values, dtype=dtype, name=name)
 
 
-def makeIntIndex(k=10, name=None):
+def makeIntIndex(k=10, name=None) -> Int64Index:
     base_idx = makeNumericIndex(k, name=name, dtype="int64")
     return Int64Index(base_idx)
 
 
-def makeUIntIndex(k=10, name=None):
+def makeUIntIndex(k=10, name=None) -> UInt64Index:
     base_idx = makeNumericIndex(k, name=name, dtype="uint64")
     return UInt64Index(base_idx)
 
 
-def makeRangeIndex(k=10, name=None, **kwargs):
+def makeRangeIndex(k=10, name=None, **kwargs) -> RangeIndex:
     return RangeIndex(0, k, 1, name=name, **kwargs)
 
 
-def makeFloatIndex(k=10, name=None):
+def makeFloatIndex(k=10, name=None) -> Float64Index:
     base_idx = makeNumericIndex(k, name=name, dtype="float64")
     return Float64Index(base_idx)
 

@@ -456,57 +456,57 @@ def all_timeseries_index_generator(k: int = 10) -> Iterable[Index]:
 
 
 # make series
-def make_rand_series(name=None, dtype=np.float64):
+def make_rand_series(name=None, dtype=np.float64) -> Series:
     index = makeStringIndex(_N)
     data = np.random.randn(_N)
     data = data.astype(dtype, copy=False)
     return Series(data, index=index, name=name)
 
 
-def makeFloatSeries(name=None):
+def makeFloatSeries(name=None) -> Series:
     return make_rand_series(name=name)
 
 
-def makeStringSeries(name=None):
+def makeStringSeries(name=None) -> Series:
     return make_rand_series(name=name)
 
 
-def makeObjectSeries(name=None):
+def makeObjectSeries(name=None) -> Series:
     data = makeStringIndex(_N)
     data = Index(data, dtype=object)
     index = makeStringIndex(_N)
     return Series(data, index=index, name=name)
 
 
-def getSeriesData():
+def getSeriesData() -> dict[str, Series]:
     index = makeStringIndex(_N)
     return {c: Series(np.random.randn(_N), index=index) for c in getCols(_K)}
 
 
-def makeTimeSeries(nper=None, freq="B", name=None):
+def makeTimeSeries(nper=None, freq="B", name=None) -> Series:
     if nper is None:
         nper = _N
     return Series(
         np.random.randn(nper), index=makeDateIndex(nper, freq=freq), name=name
     )
 
 
-def makePeriodSeries(nper=None, name=None):
+def makePeriodSeries(nper=None, name=None) -> Series:
     if nper is None:
         nper = _N
     return Series(np.random.randn(nper), index=makePeriodIndex(nper), name=name)
 
 
-def getTimeSeriesData(nper=None, freq="B"):
+def getTimeSeriesData(nper=None, freq="B") -> dict[str, Series]:
     return {c: makeTimeSeries(nper, freq) for c in getCols(_K)}
 
 
-def getPeriodData(nper=None):
+def getPeriodData(nper=None) -> dict[str, Series]:
     return {c: makePeriodSeries(nper) for c in getCols(_K)}
 
 
 # make frame
-def makeTimeDataFrame(nper=None, freq="B"):
+def makeTimeDataFrame(nper=None, freq="B") -> DataFrame:
     data = getTimeSeriesData(nper, freq)
     return DataFrame(data)
 

@@ -533,14 +533,19 @@ def makeMixedDataFrame():
     return DataFrame(getMixedTypeDict()[1])
 
 
-def makePeriodFrame(nper=None):
+def makePeriodFrame(nper=None) -> DataFrame:
     data = getPeriodData(nper)
     return DataFrame(data)
 
 
 def makeCustomIndex(
-    nentries, nlevels, prefix="#", names=False, ndupe_l=None, idx_type=None
-):
+    nentries,
+    nlevels,
+    prefix="#",
+    names: bool | str | list[str] | None = False,
+    ndupe_l=None,
+    idx_type=None,
+) -> Index:
     """
     Create an index/multindex with given dimensions, levels, names, etc'
 
@@ -637,7 +642,8 @@ def keyfunc(x):
     # convert tuples to index
     if nentries == 1:
         # we have a single level of tuples, i.e. a regular Index
-        index = Index(tuples[0], name=names[0])
+        name = None if names is None else names[0]
+        index = Index(tuples[0], name=name)
     elif nlevels == 1:
         name = None if names is None else names[0]
         index = Index((x[0] for x in tuples), name=name)
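
The change in this hunk guards the `names[0]` lookup so that the `nentries == 1` branch tolerates `names` being `None`, matching the branch below it. A minimal sketch of the difference, using the hypothetical helpers `first_name` and `unguarded_first_name`:

```python
from __future__ import annotations


def unguarded_first_name(names: list[str] | None) -> str | None:
    # Mirrors the removed line: subscripting fails when names is None.
    return names[0]


def first_name(names: list[str] | None) -> str | None:
    # Mirrors the added line: fall back to None instead of subscripting None.
    return None if names is None else names[0]


print(first_name(["a", "b"]))
print(first_name(None))
try:
    unguarded_first_name(None)
except TypeError as exc:
    print(f"unguarded lookup failed: {exc}")
```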
@@ -659,7 +665,7 @@ def makeCustomDataframe(
     dtype=None,
     c_idx_type=None,
     r_idx_type=None,
-):
+) -> DataFrame:
     """
     Create a DataFrame using supplied parameters.
 
665671
@@ -780,7 +786,7 @@ def _gen_unique_rand(rng, _extra_size):
780786
return i.tolist(), j.tolist()
781787

782788

783-
def makeMissingDataframe(density=0.9, random_state=None):
789+
def makeMissingDataframe(density=0.9, random_state=None) -> DataFrame:
784790
df = makeDataFrame()
785791
i, j = _create_missing_idx(*df.shape, density=density, random_state=random_state)
786792
df.values[i, j] = np.nan
@@ -854,7 +860,7 @@ def skipna_wrapper(x):
     return skipna_wrapper
 
 
-def convert_rows_list_to_csv_str(rows_list: list[str]):
+def convert_rows_list_to_csv_str(rows_list: list[str]) -> str:
     """
     Convert list of CSV rows to single CSV-formatted string for current OS.
 

pandas/_testing/_io.py

Lines changed: 2 additions & 2 deletions
@@ -250,7 +250,7 @@ def wrapper(*args, **kwargs):
     return wrapper
 
 
-def can_connect(url, error_classes=None):
+def can_connect(url, error_classes=None) -> bool:
     """
     Try to connect to the given url. True if succeeds, False if OSError
     raised
@@ -424,7 +424,7 @@ def write_to_compressed(compression, path, data, dest="test"):
 # Plotting
 
 
-def close(fignum=None):
+def close(fignum=None) -> None:
     from matplotlib.pyplot import (
         close as _close,
         get_fignums,
