Commit e161cf6

Fetch latest whatsnew edit from upstream master
Author: Marco Gorelli
2 parents 8865cad + 0a7bb2a commit e161cf6

24 files changed, with 250 additions and 95 deletions

doc/source/user_guide/io.rst

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     :delim: ;

     text;`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__;:ref:`read_csv<io.read_csv_table>`;:ref:`to_csv<io.store_in_csv>`
+    text;`TXT <https://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/introduction/getting_started/configuring_fixed_width_text_file_formats.htm>`__;:ref:`read_fwf<io.fwf_reader>`
     text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
     text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
     text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
@@ -1372,6 +1373,7 @@ should pass the ``escapechar`` option:
    print(data)
    pd.read_csv(StringIO(data), escapechar='\\')

+.. _io.fwf_reader:
 .. _io.fwf:

 Files with fixed width columns

doc/source/whatsnew/v0.25.1.rst

Lines changed: 11 additions & 3 deletions
@@ -31,7 +31,7 @@ Categorical
 Datetimelike
 ^^^^^^^^^^^^
 - Bug in :func:`to_datetime` where passing a timezone-naive :class:`DatetimeArray` or :class:`DatetimeIndex` and ``utc=True`` would incorrectly return a timezone-naive result (:issue:`27733`)
--
+- Bug in :meth:`Period.to_timestamp` where a :class:`Period` outside the :class:`Timestamp` implementation bounds (roughly 1677-09-21 to 2262-04-11) would return an incorrect :class:`Timestamp` instead of raising ``OutOfBoundsDatetime`` (:issue:`19643`)
 -
 -

@@ -53,7 +53,7 @@ Numeric
 ^^^^^^^
 - Bug in :meth:`Series.interpolate` when using a timezone aware :class:`DatetimeIndex` (:issue:`27548`)
 - Bug when printing negative floating point complex numbers would raise an ``IndexError`` (:issue:`27484`)
--
+- Bug where :class:`DataFrame` arithmetic operators such as :meth:`DataFrame.mul` with a :class:`Series` with axis=1 would raise an ``AttributeError`` on :class:`DataFrame` larger than the minimum threshold to invoke numexpr (:issue:`27636`)
 -

 Conversion
@@ -84,6 +84,7 @@ Indexing
 - Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`)
 - Break reference cycle involving :class:`Index` and other index classes to allow garbage collection of index objects without running the GC. (:issue:`27585`, :issue:`27840`)
 - Fix regression in assigning values to a single column of a DataFrame with a ``MultiIndex`` columns (:issue:`27841`).
+- Fix regression in ``.ix`` fallback with an ``IntervalIndex`` (:issue:`27865`).
 - When using :meth:`DataFrame.explode`, don't duplicate entire exploded column when joining back with original frame (:issue:`28005`).

 Missing
@@ -102,7 +103,6 @@ MultiIndex

 I/O
 ^^^
-
 - Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
 - Better error message when a negative header is passed in :func:`pandas.read_csv` (:issue:`27779`)
 -
@@ -159,6 +159,14 @@ Other
 -
 -

+I/O and LZMA
+~~~~~~~~~~~~
+
+Some users may unknowingly have an incomplete Python installation, which lacks the `lzma` module from the standard library. In this case, `import pandas` failed due to an `ImportError` (:issue: `27575`).
+Pandas will now warn, rather than raising an `ImportError` if the `lzma` module is not present. Any subsequent attempt to use `lzma` methods will raise a `RuntimeError`.
+A possible fix for the lack of the `lzma` module is to ensure you have the necessary libraries and then re-install Python.
+For example, on MacOS installing Python with `pyenv` may lead to an incomplete Python installation due to unmet system dependencies at compilation time (like `xz`). Compilation will succeed, but Python might fail at run time. The issue can be solved by installing the necessary dependencies and then re-installing Python.
+
 .. _whatsnew_0.251.contributors:

 Contributors

doc/source/whatsnew/v1.0.0.rst

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ MultiIndex
 I/O
 ^^^

--
+- :meth:`read_csv` now accepts binary mode file buffers when using the Python csv engine (:issue:`23779`)
 -

 Plotting

pandas/_libs/parsers.pyx

Lines changed: 5 additions & 3 deletions
@@ -2,7 +2,6 @@
 # See LICENSE for the license
 import bz2
 import gzip
-import lzma
 import os
 import sys
 import time
@@ -59,9 +58,12 @@ from pandas.core.arrays import Categorical
 from pandas.core.dtypes.concat import union_categoricals
 import pandas.io.common as icom

+from pandas.compat import _import_lzma, _get_lzma_file
 from pandas.errors import (ParserError, DtypeWarning,
                            EmptyDataError, ParserWarning)

+lzma = _import_lzma()
+
 # Import CParserError as alias of ParserError for backwards compatibility.
 # Ultimately, we want to remove this import. See gh-12665 and gh-14479.
 CParserError = ParserError
@@ -645,9 +647,9 @@ cdef class TextReader:
                                      'zip file %s', str(zip_names))
             elif self.compression == 'xz':
                 if isinstance(source, str):
-                    source = lzma.LZMAFile(source, 'rb')
+                    source = _get_lzma_file(lzma)(source, 'rb')
                 else:
-                    source = lzma.LZMAFile(filename=source)
+                    source = _get_lzma_file(lzma)(filename=source)
             else:
                 raise ValueError('Unrecognized compression type: %s' %
                                  self.compression)

pandas/_libs/tslibs/period.pyx

Lines changed: 7 additions & 6 deletions
@@ -21,7 +21,8 @@ PyDateTime_IMPORT

 from pandas._libs.tslibs.np_datetime cimport (
     npy_datetimestruct, dtstruct_to_dt64, dt64_to_dtstruct,
-    pandas_datetime_to_datetimestruct, NPY_DATETIMEUNIT, NPY_FR_D)
+    pandas_datetime_to_datetimestruct, check_dts_bounds,
+    NPY_DATETIMEUNIT, NPY_FR_D)

 cdef extern from "src/datetime/np_datetime.h":
     int64_t npy_datetimestruct_to_datetime(NPY_DATETIMEUNIT fr,
@@ -1011,7 +1012,7 @@ def dt64arr_to_periodarr(int64_t[:] dtarr, int freq, tz=None):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def periodarr_to_dt64arr(int64_t[:] periodarr, int freq):
+def periodarr_to_dt64arr(const int64_t[:] periodarr, int freq):
     """
     Convert array to datetime64 values from a set of ordinals corresponding to
     periods per period convention.
@@ -1024,9 +1025,8 @@ def periodarr_to_dt64arr(int64_t[:] periodarr, int freq):

     out = np.empty(l, dtype='i8')

-    with nogil:
-        for i in range(l):
-            out[i] = period_ordinal_to_dt64(periodarr[i], freq)
+    for i in range(l):
+        out[i] = period_ordinal_to_dt64(periodarr[i], freq)

     return out.base  # .base to access underlying np.ndarray

@@ -1179,14 +1179,15 @@ cpdef int64_t period_ordinal(int y, int m, int d, int h, int min,
     return get_period_ordinal(&dts, freq)


-cpdef int64_t period_ordinal_to_dt64(int64_t ordinal, int freq) nogil:
+cdef int64_t period_ordinal_to_dt64(int64_t ordinal, int freq) except? -1:
     cdef:
         npy_datetimestruct dts

     if ordinal == NPY_NAT:
         return NPY_NAT

     get_date_info(ordinal, freq, &dts)
+    check_dts_bounds(&dts)
     return dtstruct_to_dt64(&dts)
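
Note on the bounds that `check_dts_bounds` enforces: the whatsnew entry earlier in this commit gives them as roughly 1677-09-21 to 2262-04-11. A minimal sketch, assuming those limits come from storing nanoseconds since the Unix epoch in a signed 64-bit integer (with the minimum value assumed to be reserved for NaT), reproduces the same dates using only the standard library. The names `LOWER`, `UPPER` and `in_timestamp_bounds` are hypothetical and are not part of pandas.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # assumption: int64 minimum is reserved for NaT

# datetime only has microsecond resolution, so truncate nanoseconds inward.
SPAN = timedelta(microseconds=INT64_MAX // 1000)
LOWER = EPOCH - SPAN
UPPER = EPOCH + SPAN


def in_timestamp_bounds(dt):
    """Hypothetical helper: True when dt fits in the int64-nanosecond range."""
    return LOWER <= dt <= UPPER


print(LOWER.date(), UPPER.date())
print(in_timestamp_bounds(datetime(2019, 8, 1)))  # inside the range
print(in_timestamp_bounds(datetime(1500, 1, 1)))  # outside the range
```

A `Period` has no such limit, which is why converting one to a `Timestamp` needs an explicit range check before the nanosecond value is computed.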

pandas/compat/__init__.py

Lines changed: 30 additions & 0 deletions
@@ -10,6 +10,7 @@
 import platform
 import struct
 import sys
+import warnings

 PY35 = sys.version_info[:2] == (3, 5)
 PY36 = sys.version_info >= (3, 6)
@@ -65,3 +66,32 @@ def is_platform_mac():

 def is_platform_32bit():
     return struct.calcsize("P") * 8 < 64
+
+
+def _import_lzma():
+    """Attempts to import lzma, warning the user when lzma is not available.
+    """
+    try:
+        import lzma
+
+        return lzma
+    except ImportError:
+        msg = (
+            "Could not import the lzma module. "
+            "Your installed Python is incomplete. "
+            "Attempting to use lzma compression will result in a RuntimeError."
+        )
+        warnings.warn(msg)
+
+
+def _get_lzma_file(lzma):
+    """Returns the lzma method LZMAFile when the module was correctly imported.
+    Otherwise, raises a RuntimeError.
+    """
+    if lzma is None:
+        raise RuntimeError(
+            "lzma module not available. "
+            "A Python re-install with the proper "
+            "dependencies might be required to solve this issue."
+        )
+    return lzma.LZMAFile
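
The two helpers above defer the failure: a missing module produces a warning at import time and an error only when the feature is actually used. A standalone sketch of the same pattern follows, generalised to any module name. `import_optional`, `get_lzma_file` and the deliberately nonexistent module name are hypothetical illustrations, not pandas APIs.

```python
import importlib
import warnings


def import_optional(name):
    """Return the named module, or warn and return None if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            "Could not import {}; features that need it will fail later.".format(name)
        )
        return None


def get_lzma_file(lzma_module):
    """Return LZMAFile from the module, or raise if the import failed earlier."""
    if lzma_module is None:
        raise RuntimeError("lzma module not available.")
    return lzma_module.LZMAFile


# Simulate an incomplete installation with a module that does not exist.
missing = import_optional("module_that_does_not_exist_123")
try:
    get_lzma_file(missing)
except RuntimeError as exc:
    print("deferred failure:", exc)
```

Because the lookup happens at the call site, as in `_get_lzma_file(lzma)(source, 'rb')`, code paths that never touch xz compression are unaffected by the missing module.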

pandas/core/arrays/sparse.py

Lines changed: 6 additions & 3 deletions
@@ -39,6 +39,7 @@
 )
 from pandas.core.dtypes.dtypes import register_extension_dtype
 from pandas.core.dtypes.generic import (
+    ABCDataFrame,
     ABCIndexClass,
     ABCSeries,
     ABCSparseArray,
@@ -1735,13 +1736,15 @@ def sparse_unary_method(self):

     @classmethod
     def _create_arithmetic_method(cls, op):
-        def sparse_arithmetic_method(self, other):
-            op_name = op.__name__
+        op_name = op.__name__

-            if isinstance(other, (ABCSeries, ABCIndexClass)):
+        def sparse_arithmetic_method(self, other):
+            if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
                 # Rely on pandas to dispatch to us.
                 return NotImplemented

+            other = lib.item_from_zerodim(other)
+
             if isinstance(other, SparseArray):
                 return _sparse_array_op(self, other, op, op_name)

pandas/core/computation/expressions.py

Lines changed: 4 additions & 3 deletions
@@ -76,16 +76,17 @@ def _can_use_numexpr(op, op_str, a, b, dtype_check):

         # required min elements (otherwise we are adding overhead)
         if np.prod(a.shape) > _MIN_ELEMENTS:
-
             # check for dtype compatibility
             dtypes = set()
             for o in [a, b]:
-                if hasattr(o, "dtypes"):
+                # Series implements dtypes, check for dimension count as well
+                if hasattr(o, "dtypes") and o.ndim > 1:
                     s = o.dtypes.value_counts()
                     if len(s) > 1:
                         return False
                     dtypes |= set(s.index.astype(str))
-                elif isinstance(o, np.ndarray):
+                # ndarray and Series Case
+                elif hasattr(o, "dtype"):
                     dtypes |= {o.dtype.name}

             # allowed are a superset
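
The hunk above changes how each operand is classified: an object with a `dtypes` attribute is treated as frame-like only when it also has more than one dimension, and anything else with a `dtype` takes the single-dtype branch. A rough sketch of that dispatch, using `SimpleNamespace` stand-ins rather than real pandas or NumPy objects, is shown below. `collect_dtype_names` is a hypothetical name, and the sketch leaves out the original's early return for frames with mixed dtypes.

```python
from types import SimpleNamespace


def collect_dtype_names(operands):
    """Gather dtype names, treating only 2-D objects as frame-like."""
    names = set()
    for o in operands:
        # A Series-like object also exposes `dtypes`, so check ndim as well.
        if hasattr(o, "dtypes") and o.ndim > 1:
            names |= {str(d) for d in o.dtypes}
        # Array-like and Series-like objects carry a single dtype.
        elif hasattr(o, "dtype"):
            names.add(str(o.dtype))
    return names


# Stand-ins: only the attributes the dispatch inspects are modelled.
frame_like = SimpleNamespace(dtypes=["float64", "float64"], ndim=2)
series_like = SimpleNamespace(dtype="float64", dtypes="float64", ndim=1)
array_like = SimpleNamespace(dtype="int64", ndim=1)

print(collect_dtype_names([frame_like, series_like]))
print(collect_dtype_names([frame_like, array_like]))
```

Without the `ndim` check, the Series-like stand-in would take the frame branch even though its `dtypes` is a single value rather than a collection. This is presumably the situation behind the `AttributeError` described in the whatsnew entry for issue 27636.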

pandas/core/indexes/base.py

Lines changed: 4 additions & 1 deletion
@@ -2325,7 +2325,10 @@ def __sub__(self, other):
         return Index(np.array(self) - other)

     def __rsub__(self, other):
-        return Index(other - np.array(self))
+        # wrap Series to ensure we pin name correctly
+        from pandas import Series
+
+        return Index(other - Series(self))

     def __and__(self, other):
         return self.intersection(other)

pandas/core/indexing.py

Lines changed: 4 additions & 1 deletion
@@ -124,14 +124,17 @@ def __getitem__(self, key):
             key = tuple(com.apply_if_callable(x, self.obj) for x in key)
             try:
                 values = self.obj._get_value(*key)
-            except (KeyError, TypeError, InvalidIndexError):
+            except (KeyError, TypeError, InvalidIndexError, AttributeError):
                 # TypeError occurs here if the key has non-hashable entries,
                 # generally slice or list.
                 # TODO(ix): most/all of the TypeError cases here are for ix,
                 # so this check can be removed once ix is removed.
                 # The InvalidIndexError is only catched for compatibility
                 # with geopandas, see
                 # https://github.com/pandas-dev/pandas/issues/27258
+                # TODO: The AttributeError is for IntervalIndex which
+                # incorrectly implements get_value, see
+                # https://github.com/pandas-dev/pandas/issues/27865
                 pass
             else:
                 if is_scalar(values):

pandas/core/ops/__init__.py

Lines changed: 11 additions & 34 deletions
@@ -17,9 +17,7 @@
 from pandas.core.dtypes.common import (
     ensure_object,
     is_bool_dtype,
-    is_categorical_dtype,
     is_datetime64_dtype,
-    is_datetime64tz_dtype,
     is_datetimelike_v_numeric,
     is_extension_array_dtype,
     is_integer_dtype,
@@ -32,6 +30,7 @@
     ABCDataFrame,
     ABCDatetimeArray,
     ABCDatetimeIndex,
+    ABCExtensionArray,
     ABCIndexClass,
     ABCSeries,
     ABCSparseSeries,
@@ -699,42 +698,17 @@ def wrapper(self, other, axis=None):

         if isinstance(other, ABCSeries) and not self._indexed_same(other):
             raise ValueError("Can only compare identically-labeled Series objects")
-        elif (
-            is_list_like(other)
-            and len(other) != len(self)
-            and not isinstance(other, (set, frozenset))
-        ):
-            raise ValueError("Lengths must match")

-        elif isinstance(other, (np.ndarray, ABCIndexClass, ABCSeries)):
+        elif isinstance(
+            other, (np.ndarray, ABCExtensionArray, ABCIndexClass, ABCSeries)
+        ):
             # TODO: make this treatment consistent across ops and classes.
             # We are not catching all listlikes here (e.g. frozenset, tuple)
             # The ambiguous case is object-dtype. See GH#27803
             if len(self) != len(other):
                 raise ValueError("Lengths must match to compare")

-        if is_categorical_dtype(self):
-            # Dispatch to Categorical implementation; CategoricalIndex
-            # behavior is non-canonical GH#19513
-            res_values = dispatch_to_extension_op(op, self, other)
-
-        elif is_datetime64_dtype(self) or is_datetime64tz_dtype(self):
-            # Dispatch to DatetimeIndex to ensure identical
-            # Series/Index behavior
-            from pandas.core.arrays import DatetimeArray
-
-            res_values = dispatch_to_extension_op(op, DatetimeArray(self), other)
-
-        elif is_timedelta64_dtype(self):
-            from pandas.core.arrays import TimedeltaArray
-
-            res_values = dispatch_to_extension_op(op, TimedeltaArray(self), other)
-
-        elif is_extension_array_dtype(self) or (
-            is_extension_array_dtype(other) and not is_scalar(other)
-        ):
-            # Note: the `not is_scalar(other)` condition rules out
-            # e.g. other == "category"
+        if should_extension_dispatch(self, other):
             res_values = dispatch_to_extension_op(op, self, other)

         elif is_scalar(other) and isna(other):
@@ -756,9 +730,12 @@ def wrapper(self, other, axis=None):
                 )

         result = self._constructor(res_values, index=self.index)
-        # rename is needed in case res_name is None and result.name
-        # is not.
-        return finalizer(result).rename(res_name)
+        result = finalizer(result)
+
+        # Set the result's name after finalizer is called because finalizer
+        # would set it back to self.name
+        result.name = res_name
+        return result

     wrapper.__name__ = op_name
     return wrapper

pandas/core/ops/array_ops.py

Lines changed: 3 additions & 2 deletions
@@ -74,8 +74,9 @@ def masked_arith_op(x, y, op):
                 result[mask] = op(xrav[mask], yrav[mask])

     else:
-        assert is_scalar(y), type(y)
-        assert isinstance(x, np.ndarray), type(x)
+        if not is_scalar(y):
+            raise TypeError(type(y))
+
         # mask is only meaningful for x
         result = np.empty(x.size, dtype=x.dtype)
         mask = notna(xrav)
