
Commit a67913d

Merge branch 'main' into sql_nullable
2 parents 7d24b2a + 4846169

28 files changed, +280 -150 lines changed

.github/workflows/scorecards.yml

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ jobs:
   analysis:
     name: Scorecards analysis
     runs-on: ubuntu-22.04
+    continue-on-error: true
     permissions:
       # Needed to upload the results to code-scanning dashboard.
       security-events: write

doc/source/whatsnew/v1.5.3.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Fixed regressions
 - Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
 - Fixed regression in :meth:`SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
 - Fixed performance regression in setting with the :meth:`~DataFrame.at` indexer (:issue:`49771`)
+- Fixed regression in :func:`to_datetime` raising ``ValueError`` when parsing array of ``float`` containing ``np.nan`` (:issue:`50237`)
 -
 
 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v2.0.0.rst

Lines changed: 15 additions & 7 deletions
@@ -30,8 +30,8 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (
 
 .. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_nullable_backend:
 
-Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Configuration option, ``mode.nullable_backend``, to return pyarrow-backed dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The ``use_nullable_dtypes`` keyword argument has been expanded to the following functions to enable automatic conversion to nullable dtypes (:issue:`36712`)
 
@@ -41,16 +41,22 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 * :func:`read_sql_query`
 * :func:`read_sql_table`
 
-Additionally a new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
+Additionally a new global configuration, ``mode.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.
 
 * :func:`read_csv` (with ``engine="pyarrow"``)
 * :func:`read_excel`
 * :func:`read_parquet`
 * :func:`read_orc`
 
-By default, ``io.nullable_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
-be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`).
+
+And the following methods will also utilize the ``mode.nullable_backend`` option.
+
+* :meth:`DataFrame.convert_dtypes`
+* :meth:`Series.convert_dtypes`
+
+By default, ``mode.nullable_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
+be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`, :issue:`49997`).
 
 .. ipython:: python
 
@@ -59,12 +65,12 @@ be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (
     1,2.5,True,a,,,,,
     3,4.5,False,b,6,7.5,True,a,
     """)
-    with pd.option_context("io.nullable_backend", "pandas"):
+    with pd.option_context("mode.nullable_backend", "pandas"):
         df = pd.read_csv(data, use_nullable_dtypes=True)
     df.dtypes
 
     data.seek(0)
-    with pd.option_context("io.nullable_backend", "pyarrow"):
+    with pd.option_context("mode.nullable_backend", "pyarrow"):
         df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
     df_pyarrow.dtypes
 
@@ -472,6 +478,7 @@ Other API changes
 - :func:`read_stata` with parameter ``index_col`` set to ``None`` (the default) will now set the index on the returned :class:`DataFrame` to a :class:`RangeIndex` instead of a :class:`Int64Index` (:issue:`49745`)
 - Changed behavior of :class:`Index`, :class:`Series`, and :class:`DataFrame` arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operations, use ``result.infer_objects()`` to do type inference on the result (:issue:`49999`)
 - Changed behavior of :class:`Index` constructor with an object-dtype ``numpy.ndarray`` containing all-``bool`` values or all-complex values, this will now retain object dtype, consistent with the :class:`Series` behavior (:issue:`49594`)
+- Changed behavior of :class:`Series` and :class:`DataFrame` constructors when given an integer dtype and floating-point data that is not round numbers, this now raises ``ValueError`` instead of silently retaining the float dtype; do ``Series(data)`` or ``DataFrame(data)`` to get the old behavior, and ``Series(data).astype(dtype)`` or ``DataFrame(data).astype(dtype)`` to get the specified dtype (:issue:`49599`)
 - Changed behavior of :meth:`DataFrame.shift` with ``axis=1``, an integer ``fill_value``, and homogeneous datetime-like dtype, this now fills new columns with integer dtypes instead of casting to datetimelike (:issue:`49842`)
 - Files are now closed when encountering an exception in :func:`read_json` (:issue:`49921`)
 - Changed behavior of :func:`read_csv`, :func:`read_json` & :func:`read_fwf`, where the index will now always be a :class:`RangeIndex`, when no index is specified. Previously the index would be a :class:`Index` with dtype ``object`` if the new DataFrame/Series has length 0 (:issue:`49572`)
@@ -777,6 +784,7 @@ Datetimelike
 - Bug in ``pandas.tseries.holiday.Holiday`` where a half-open date interval causes inconsistent return types from :meth:`USFederalHolidayCalendar.holidays` (:issue:`49075`)
 - Bug in rendering :class:`DatetimeIndex` and :class:`Series` and :class:`DataFrame` with timezone-aware dtypes with ``dateutil`` or ``zoneinfo`` timezones near daylight-savings transitions (:issue:`49684`)
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing :class:`Timestamp`, ``datetime.datetime``, ``datetime.date``, or ``np.datetime64`` objects when non-ISO8601 ``format`` was passed (:issue:`49298`, :issue:`50036`)
+- Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
 -
 
 Timedelta
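
The constructor change noted under Other API changes (GH 49599) can be illustrated with a minimal sketch, assuming pandas 2.0 or later where this behavior ships:

```python
import pandas as pd

# Non-round floats combined with an integer dtype now raise ValueError
# instead of silently keeping the float dtype (GH 49599).
try:
    pd.Series([1.5, 2.5], dtype="int64")
    lossy_raised = False
except ValueError:
    lossy_raised = True

# Round floats still cast losslessly to the requested integer dtype.
s = pd.Series([1.0, 2.0], dtype="int64")
print(lossy_raised, s.dtype)
```

To recover the old lossy behavior, construct without a dtype and call `.astype("int64")` explicitly, as the note above describes.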

pandas/_libs/tslibs/parsing.pyx

Lines changed: 0 additions & 24 deletions
@@ -85,12 +85,6 @@ class DateParseError(ValueError):
 _DEFAULT_DATETIME = datetime(1, 1, 1).replace(hour=0, minute=0,
                                               second=0, microsecond=0)
 
-PARSING_WARNING_MSG = (
-    "Parsing dates in {format} format when dayfirst={dayfirst} was specified. "
-    "This may lead to inconsistently parsed dates! Specify a format "
-    "to ensure consistent parsing."
-)
-
 cdef:
     set _not_datelike_strings = {"a", "A", "m", "M", "p", "P", "t", "T"}
 
@@ -203,28 +197,10 @@ cdef object _parse_delimited_date(str date_string, bint dayfirst):
         # date_string can't be converted to date, above format
         return None, None
 
-    swapped_day_and_month = False
     if 1 <= month <= MAX_DAYS_IN_MONTH and 1 <= day <= MAX_DAYS_IN_MONTH \
             and (month <= MAX_MONTH or day <= MAX_MONTH):
         if (month > MAX_MONTH or (day <= MAX_MONTH and dayfirst)) and can_swap:
             day, month = month, day
-            swapped_day_and_month = True
-        if dayfirst and not swapped_day_and_month:
-            warnings.warn(
-                PARSING_WARNING_MSG.format(
-                    format="MM/DD/YYYY",
-                    dayfirst="True",
-                ),
-                stacklevel=find_stack_level(),
-            )
-        elif not dayfirst and swapped_day_and_month:
-            warnings.warn(
-                PARSING_WARNING_MSG.format(
-                    format="DD/MM/YYYY",
-                    dayfirst="False (the default)",
-                ),
-                stacklevel=find_stack_level(),
-            )
         # In Python <= 3.6.0 there is no range checking for invalid dates
         # in C api, thus we call faster C version for 3.6.1 or newer
         return datetime_new(year, month, day, 0, 0, 0, 0, None), reso
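
The day/month swap itself is unchanged by this deletion; only the non-actionable `UserWarning` (GH 50232) is removed. A small sketch of the swap behavior as seen through `to_datetime` (values per pandas 2.0; whether a warning accompanies them varies by version):

```python
import pandas as pd

# There is no month 13, so "13/01/2000" is swapped to day=13, month=1
# even without dayfirst=True.
swapped = pd.to_datetime("13/01/2000")

# With dayfirst=True, "10/11/12" is read as DD/MM/YY.
explicit = pd.to_datetime("10/11/12", dayfirst=True)
print(swapped, explicit)
```

Passing an explicit `format=` remains the reliable way to get consistent parsing for delimited dates.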

pandas/_libs/tslibs/strptime.pyx

Lines changed: 11 additions & 1 deletion
@@ -42,7 +42,11 @@ from pandas._libs.tslibs.np_datetime cimport (
     pydatetime_to_dt64,
 )
 from pandas._libs.tslibs.timestamps cimport _Timestamp
-from pandas._libs.util cimport is_datetime64_object
+from pandas._libs.util cimport (
+    is_datetime64_object,
+    is_float_object,
+    is_integer_object,
+)
 
 cnp.import_array()
 
@@ -185,6 +189,12 @@ def array_strptime(
         elif is_datetime64_object(val):
            iresult[i] = get_datetime64_nanos(val, NPY_FR_ns)
            continue
+        elif (
+            (is_integer_object(val) or is_float_object(val))
+            and (val != val or val == NPY_NAT)
+        ):
+            iresult[i] = NPY_NAT
+            continue
         else:
             val = str(val)
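
With this branch in place, float `NaN` entries (for which `val != val`) fall through to `NaT` instead of being stringified and failing the strict format match (GH 50237). A minimal reproduction against the public API, assuming a pandas build containing this fix:

```python
import numpy as np
import pandas as pd

# np.nan entries no longer raise under a strict format; they become NaT.
result = pd.to_datetime(["20230101", np.nan], format="%Y%m%d")
print(result)
```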

pandas/core/config_init.py

Lines changed: 12 additions & 14 deletions
@@ -539,13 +539,25 @@ def use_inf_as_na_cb(key) -> None:
     The default storage for StringDtype.
 """
 
+nullable_backend_doc = """
+: string
+    The nullable dtype implementation to return.
+    Available options: 'pandas', 'pyarrow', the default is 'pandas'.
+"""
+
 with cf.config_prefix("mode"):
     cf.register_option(
         "string_storage",
         "python",
         string_storage_doc,
         validator=is_one_of_factory(["python", "pyarrow"]),
     )
+    cf.register_option(
+        "nullable_backend",
+        "pandas",
+        nullable_backend_doc,
+        validator=is_one_of_factory(["pandas", "pyarrow"]),
+    )
 
 # Set up the io.excel specific reader configuration.
 reader_engine_doc = """
@@ -673,20 +685,6 @@ def use_inf_as_na_cb(key) -> None:
         validator=is_one_of_factory(["auto", "sqlalchemy"]),
     )
 
-io_nullable_backend_doc = """
-: string
-    The nullable dtype implementation to return when ``use_nullable_dtypes=True``.
-    Available options: 'pandas', 'pyarrow', the default is 'pandas'.
-"""
-
-with cf.config_prefix("io.nullable_backend"):
-    cf.register_option(
-        "io_nullable_backend",
-        "pandas",
-        io_nullable_backend_doc,
-        validator=is_one_of_factory(["pandas", "pyarrow"]),
-    )
-
 # --------
 # Plotting
 # ---------
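
The registration above uses pandas' internal config machinery. A hedged sketch of the same pattern under a hypothetical `demo` prefix (note `pandas._config.config` is private API and may change between versions; the option name and doc here are invented for illustration):

```python
import pandas as pd
from pandas._config import config as cf
from pandas._config.config import is_one_of_factory

demo_backend_doc = """
: string
    Demo backend option (hypothetical, for illustration only).
    Available options: 'pandas', 'pyarrow', the default is 'pandas'.
"""

# Mirrors the mode.nullable_backend registration above: config_prefix
# scopes the key, register_option attaches a default and a validator.
with cf.config_prefix("demo"):
    cf.register_option(
        "backend",
        "pandas",
        demo_backend_doc,
        validator=is_one_of_factory(["pandas", "pyarrow"]),
    )

print(pd.get_option("demo.backend"))
```

The validator rejects any value outside the allowed set, so `pd.set_option("demo.backend", "bogus")` would raise.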

pandas/core/construction.py

Lines changed: 3 additions & 52 deletions
@@ -27,7 +27,6 @@
     DtypeObj,
     T,
 )
-from pandas.errors import IntCastingNaNError
 
 from pandas.core.dtypes.base import (
     ExtensionDtype,
@@ -46,7 +45,6 @@
     is_datetime64_ns_dtype,
     is_dtype_equal,
     is_extension_array_dtype,
-    is_float_dtype,
     is_integer_dtype,
     is_list_like,
     is_object_dtype,
@@ -503,7 +501,6 @@ def sanitize_array(
     copy: bool = False,
     *,
     allow_2d: bool = False,
-    strict_ints: bool = False,
 ) -> ArrayLike:
     """
     Sanitize input data to an ndarray or ExtensionArray, copy if specified,
@@ -517,8 +514,6 @@ def sanitize_array(
     copy : bool, default False
     allow_2d : bool, default False
         If False, raise if we have a 2D Arraylike.
-    strict_ints : bool, default False
-        If False, silently ignore failures to cast float data to int dtype.
 
     Returns
     -------
@@ -571,32 +566,7 @@ def sanitize_array(
     if isinstance(data, np.matrix):
         data = data.A
 
-    if dtype is not None and is_float_dtype(data.dtype) and is_integer_dtype(dtype):
-        # possibility of nan -> garbage
-        try:
-            # GH 47391 numpy > 1.24 will raise a RuntimeError for nan -> int
-            # casting aligning with IntCastingNaNError below
-            with np.errstate(invalid="ignore"):
-                # GH#15832: Check if we are requesting a numeric dtype and
-                # that we can convert the data to the requested dtype.
-                subarr = maybe_cast_to_integer_array(data, dtype)
-
-        except IntCastingNaNError:
-            raise
-        except ValueError:
-            # Pre-2.0, we would have different behavior for Series vs DataFrame.
-            # DataFrame would call np.array(data, dtype=dtype, copy=copy),
-            # which would cast to the integer dtype even if the cast is lossy.
-            # See GH#40110.
-            if strict_ints:
-                raise
-
-            # We ignore the dtype arg and return floating values,
-            # e.g. test_constructor_floating_data_int_dtype
-            # TODO: where is the discussion that documents the reason for this?
-            subarr = np.array(data, copy=copy)
-
-    elif dtype is None:
+    if dtype is None:
         subarr = data
         if data.dtype == object:
             subarr = maybe_infer_to_datetimelike(data)
@@ -629,27 +599,8 @@ def sanitize_array(
         subarr = np.array([], dtype=np.float64)
 
     elif dtype is not None:
-        try:
-            subarr = _try_cast(data, dtype, copy)
-        except ValueError:
-            if is_integer_dtype(dtype):
-                if strict_ints:
-                    raise
-                casted = np.array(data, copy=False)
-                if casted.dtype.kind == "f":
-                    # GH#40110 match the behavior we have if we passed
-                    # a ndarray[float] to begin with
-                    return sanitize_array(
-                        casted,
-                        index,
-                        dtype,
-                        copy=False,
-                        allow_2d=allow_2d,
-                    )
-                else:
-                    raise
-            else:
-                raise
+        subarr = _try_cast(data, dtype, copy)
+
     else:
         subarr = maybe_convert_platform(data)
         if subarr.dtype == object:

pandas/core/dtypes/cast.py

Lines changed: 37 additions & 5 deletions
@@ -9,6 +9,7 @@
 from typing import (
     TYPE_CHECKING,
     Any,
+    Literal,
     Sized,
     TypeVar,
     cast,
@@ -70,10 +71,12 @@
     pandas_dtype as pandas_dtype_func,
 )
 from pandas.core.dtypes.dtypes import (
+    BaseMaskedDtype,
     CategoricalDtype,
     DatetimeTZDtype,
     ExtensionDtype,
     IntervalDtype,
+    PandasExtensionDtype,
     PeriodDtype,
 )
 from pandas.core.dtypes.generic import (
@@ -958,6 +961,7 @@ def convert_dtypes(
     convert_boolean: bool = True,
     convert_floating: bool = True,
     infer_objects: bool = False,
+    nullable_backend: Literal["pandas", "pyarrow"] = "pandas",
 ) -> DtypeObj:
     """
     Convert objects to best possible type, and optionally,
@@ -979,6 +983,11 @@ def convert_dtypes(
     infer_objects : bool, defaults False
         Whether to also infer objects to float/int if possible. Is only hit if the
         object array contains pd.NA.
+    nullable_backend : str, default "pandas"
+        Nullable dtype implementation to use.
+
+        * "pandas" returns numpy-backed nullable types
+        * "pyarrow" returns pyarrow-backed nullable types using ``ArrowDtype``
 
     Returns
     -------
@@ -997,9 +1006,9 @@ def convert_dtypes(
 
     if is_string_dtype(inferred_dtype):
         if not convert_string or inferred_dtype == "bytes":
-            return input_array.dtype
+            inferred_dtype = input_array.dtype
         else:
-            return pandas_dtype_func("string")
+            inferred_dtype = pandas_dtype_func("string")
 
     if convert_integer:
         target_int_dtype = pandas_dtype_func("Int64")
@@ -1020,7 +1029,7 @@ def convert_dtypes(
         elif (
             infer_objects
             and is_object_dtype(input_array.dtype)
-            and inferred_dtype == "integer"
+            and (isinstance(inferred_dtype, str) and inferred_dtype == "integer")
         ):
             inferred_dtype = target_int_dtype
 
@@ -1047,7 +1056,10 @@ def convert_dtypes(
         elif (
             infer_objects
             and is_object_dtype(input_array.dtype)
-            and inferred_dtype == "mixed-integer-float"
+            and (
+                isinstance(inferred_dtype, str)
+                and inferred_dtype == "mixed-integer-float"
+            )
         ):
             inferred_dtype = pandas_dtype_func("Float64")
 
@@ -1062,7 +1074,27 @@ def convert_dtypes(
             inferred_dtype = input_array.dtype
 
     else:
-        return input_array.dtype
+        inferred_dtype = input_array.dtype
+
+    if nullable_backend == "pyarrow":
+        from pandas.core.arrays.arrow.array import to_pyarrow_type
+        from pandas.core.arrays.arrow.dtype import ArrowDtype
+        from pandas.core.arrays.string_ import StringDtype
+
+        if isinstance(inferred_dtype, PandasExtensionDtype):
+            base_dtype = inferred_dtype.base
+        elif isinstance(inferred_dtype, (BaseMaskedDtype, ArrowDtype)):
+            base_dtype = inferred_dtype.numpy_dtype
+        elif isinstance(inferred_dtype, StringDtype):
+            base_dtype = np.dtype(str)
+        else:
+            # error: Incompatible types in assignment (expression has type
+            # "Union[str, Any, dtype[Any], ExtensionDtype]",
+            # variable has type "Union[dtype[Any], ExtensionDtype, None]")
+            base_dtype = inferred_dtype  # type: ignore[assignment]
+        pa_type = to_pyarrow_type(base_dtype)
+        if pa_type is not None:
+            inferred_dtype = ArrowDtype(pa_type)
 
     # error: Incompatible return value type (got "Union[str, Union[dtype[Any],
     # ExtensionDtype]]", expected "Union[dtype[Any], ExtensionDtype]")
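
The pyarrow branch above is selected via the `mode.nullable_backend` option in this development snapshot; in released pandas 2.0 the same choice is exposed as the `dtype_backend` keyword of `convert_dtypes`. A sketch against the released keyword, where `"numpy_nullable"` corresponds to the `"pandas"` backend here (the `"pyarrow"` value additionally requires pyarrow to be installed):

```python
import pandas as pd

# float64 with a missing value converts to the nullable Int64 dtype.
s = pd.Series([1, 2, None])
converted = s.convert_dtypes(dtype_backend="numpy_nullable")
print(s.dtype, converted.dtype)
```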
