Commit 14af646

Merge branch 'master' of github.com:pandas-dev/pandas into document-mask-indexing
2 parents 1bd5d9d + 0dfe989 commit 14af646

56 files changed: 799 additions and 592 deletions

asv_bench/benchmarks/indexing.py

Lines changed: 2 additions & 2 deletions
@@ -158,9 +158,9 @@ def time_boolean_rows_boolean(self):
 class DataFrameNumericIndexing:
     def setup(self):
         self.idx_dupe = np.array(range(30)) * 99
-        self.df = DataFrame(np.random.randn(10000, 5))
+        self.df = DataFrame(np.random.randn(100000, 5))
         self.df_dup = concat([self.df, 2 * self.df, 3 * self.df])
-        self.bool_indexer = [True] * 5000 + [False] * 5000
+        self.bool_indexer = [True] * 50000 + [False] * 50000

     def time_iloc_dups(self):
         self.df_dup.iloc[self.idx_dupe]

doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Version 1.0
 .. toctree::
    :maxdepth: 2

+   v1.0.5
    v1.0.4
    v1.0.3
    v1.0.2

doc/source/whatsnew/v1.0.4.rst

Lines changed: 1 addition & 1 deletion
@@ -45,4 +45,4 @@ Bug fixes
 Contributors
 ~~~~~~~~~~~~

-.. contributors:: v1.0.3..v1.0.4|HEAD
+.. contributors:: v1.0.3..v1.0.4

doc/source/whatsnew/v1.0.5.rst

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+
+.. _whatsnew_105:
+
+What's new in 1.0.5 (June XX, 2020)
+-----------------------------------
+
+These are the changes in pandas 1.0.5. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+
+.. _whatsnew_105.regressions:
+
+Fixed regressions
+~~~~~~~~~~~~~~~~~
+-
+-
+
+.. _whatsnew_105.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+-
+-
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v1.0.4..v1.0.5|HEAD

doc/source/whatsnew/v1.1.0.rst

Lines changed: 6 additions & 0 deletions
@@ -394,6 +394,8 @@ Backwards incompatible API changes
 - :meth:`Series.to_timestamp` now raises a ``TypeError`` if the axis is not a :class:`PeriodIndex`. Previously an ``AttributeError`` was raised (:issue:`33327`)
 - :meth:`Series.to_period` now raises a ``TypeError`` if the axis is not a :class:`DatetimeIndex`. Previously an ``AttributeError`` was raised (:issue:`33327`)
 - :func: `pandas.api.dtypes.is_string_dtype` no longer incorrectly identifies categorical series as string.
+- :func:`read_excel` no longer takes ``**kwds`` arguments. This means that passing in keyword ``chunksize`` now raises a ``TypeError``
+  (previously raised a ``NotImplementedError``), while passing in keyword ``encoding`` now raises a ``TypeError`` (:issue:`34464`)

 ``MultiIndex.get_indexer`` interprets `method` argument differently
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -727,6 +729,7 @@ Performance improvements
 - Performance improvement in arithmetic operations between two :class:`DataFrame` objects (:issue:`32779`)
 - Performance improvement in :class:`pandas.core.groupby.RollingGroupby` (:issue:`34052`)
 - Performance improvement in arithmetic operations (sub, add, mul, div) for MultiIndex (:issue:`34297`)
+- Performance improvement in `DataFrame[bool_indexer]` when `bool_indexer` is a list (:issue:`33924`)

 .. ---------------------------------------------------------------------------

@@ -759,6 +762,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.to_period` not infering the frequency when called with no arguments (:issue:`33358`)
 - Bug in :meth:`DatetimeIndex.tz_localize` incorrectly retaining ``freq`` in some cases where the original freq is no longer valid (:issue:`30511`)
 - Bug in :meth:`DatetimeIndex.intersection` losing ``freq`` and timezone in some cases (:issue:`33604`)
+- Bug in :meth:`DatetimeIndex.get_indexer` where incorrect output would be returned for mixed datetime-like targets (:issue:`33741`)
 - Bug in :class:`DatetimeIndex` addition and subtraction with some types of :class:`DateOffset` objects incorrectly retaining an invalid ``freq`` attribute (:issue:`33779`)
 - Bug in :class:`DatetimeIndex` where setting the ``freq`` attribute on an index could silently change the ``freq`` attribute on another index viewing the same data (:issue:`33552`)
 - :meth:`DataFrame.min`/:meth:`DataFrame.max` not returning consistent result with :meth:`Series.min`/:meth:`Series.max` when called on objects initialized with empty :func:`pd.to_datetime`
@@ -963,6 +967,7 @@ Sparse
 - Creating a :class:`SparseArray` from timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (:issue:`32501`)
 - Bug in :meth:`arrays.SparseArray.from_spmatrix` wrongly read scipy sparse matrix (:issue:`31991`)
 - Bug in :meth:`Series.sum` with ``SparseArray`` raises ``TypeError`` (:issue:`25777`)
+- Bug where :class:`DataFrame` containing :class:`SparseArray` filled with ``NaN`` when indexed by a list-like (:issue:`27781`, :issue:`29563`)
 - The repr of :class:`SparseDtype` now includes the repr of its ``fill_value`` attribute. Previously it used ``fill_value``'s string representation (:issue:`34352`)

 ExtensionArray
@@ -994,6 +999,7 @@ Other
 - Bug in :meth:`DataFrame.plot.scatter` caused an error when plotting variable marker sizes (:issue:`32904`)
 - :class:`IntegerArray` now implements the ``sum`` operation (:issue:`33172`)
 - Bug in :class:`Tick` comparisons raising ``TypeError`` when comparing against timedelta-like objects (:issue:`34088`)
+- Bug in :class:`Tick` multiplication raising ``TypeError`` when multiplying by a float (:issue:`34486`)

 .. ---------------------------------------------------------------------------

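
Note on the `read_excel` entry in this file: the change from `NotImplementedError` to `TypeError` is what plain Python call semantics would produce once a function stops accepting `**kwds`, because the interpreter rejects undeclared keywords before the function body runs. A minimal sketch of that mechanism follows. `reader_with_kwds`, `reader_without_kwds`, and `error_name` are hypothetical stand-ins written for illustration; they are not pandas functions and do not reproduce the real `read_excel` signature.

```python
def reader_with_kwds(path, sheet_name=0, **kwds):
    # Hypothetical stand-in: unknown keywords land in ``kwds`` and have to be
    # rejected by hand, here with NotImplementedError for ``chunksize``.
    if "chunksize" in kwds:
        raise NotImplementedError("chunksize keyword is not supported")
    return path, sheet_name


def reader_without_kwds(path, sheet_name=0):
    # Hypothetical stand-in: with no ``**kwds``, Python itself raises
    # TypeError for any keyword that is not declared in the signature.
    return path, sheet_name


def error_name(func, **kwargs):
    """Return the name of the exception ``func`` raises, or None if it succeeds."""
    try:
        func("book.xlsx", **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(error_name(reader_with_kwds, chunksize=10))
    print(error_name(reader_without_kwds, chunksize=10))
    print(error_name(reader_without_kwds, encoding="utf-8"))
```

Under this reading, callers that used to catch `NotImplementedError` for `chunksize` would need to catch `TypeError` instead.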

pandas/_libs/hashtable_func_helper.pxi.in

Lines changed: 8 additions & 4 deletions
@@ -84,7 +84,8 @@ cpdef value_count_{{dtype}}({{c_type}}[:] values, bint dropna):
         int64_t[:] result_counts
         {{endif}}

-        Py_ssize_t k
+        # Don't use Py_ssize_t, since table.n_buckets is unsigned
+        khiter_t k

     table = kh_init_{{ttype}}()
     {{if dtype == 'object'}}
@@ -132,7 +133,8 @@ def duplicated_{{dtype}}(const {{c_type}}[:] values, object keep='first'):
         {{if dtype != 'object'}}
         {{dtype}}_t value
         {{endif}}
-        Py_ssize_t k, i, n = len(values)
+        Py_ssize_t i, n = len(values)
+        khiter_t k
         kh_{{ttype}}_t *table = kh_init_{{ttype}}()
         ndarray[uint8_t, ndim=1, cast=True] out = np.empty(n, dtype='bool')

@@ -222,7 +224,8 @@ def ismember_{{dtype}}(const {{c_type}}[:] arr, {{c_type}}[:] values):
     boolean ndarry len of (arr)
     """
     cdef:
-        Py_ssize_t i, n, k
+        Py_ssize_t i, n
+        khiter_t k
         int ret = 0
         ndarray[uint8_t] result
         {{c_type}} val
@@ -295,7 +298,8 @@ def mode_{{dtype}}({{ctype}}[:] values, bint dropna):
     cdef:
         int count, max_count = 1
         int j = -1 # so you can do +=
-        Py_ssize_t k
+        # Don't use Py_ssize_t, since table.n_buckets is unsigned
+        khiter_t k
         kh_{{table_type}}_t *table
         ndarray[{{ctype}}] modes

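
Note on the `khiter_t` change: the added comment only says that `table.n_buckets` is unsigned. One general reason to avoid mixing signed and unsigned loop variables in C is that, when both operands have the same width, the signed value is converted to unsigned before the comparison. The sketch below models that conversion with `ctypes`. It assumes 32-bit operands on both sides, which may not match the actual widths of `Py_ssize_t` and `khiter_t` on a given platform, and `as_uint32` and `c_style_less_than` are illustrative helper names.

```python
import ctypes


def as_uint32(value):
    """Reinterpret a Python int as C would when converting to a 32-bit unsigned."""
    return ctypes.c_uint32(value).value


def c_style_less_than(signed_index, unsigned_count):
    # Assumption: both operands are 32 bits wide, so C converts the signed
    # operand to unsigned before comparing.
    return as_uint32(signed_index) < unsigned_count


if __name__ == "__main__":
    print(as_uint32(-1))              # wraps around to the largest 32-bit value
    print(c_style_less_than(-1, 8))   # a negative index no longer compares as "small"
    print(c_style_less_than(3, 8))
```

Declaring the iterator with the same unsigned type as the bucket count sidesteps the conversion entirely.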

pandas/_libs/lib.pyx

Lines changed: 5 additions & 3 deletions
@@ -1380,8 +1380,10 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
         return "mixed-integer"

     elif PyDateTime_Check(val):
-        if is_datetime_array(values):
+        if is_datetime_array(values, skipna=skipna):
             return "datetime"
+        elif is_date_array(values, skipna=skipna):
+            return "date"

     elif PyDate_Check(val):
         if is_date_array(values, skipna=skipna):
@@ -1752,10 +1754,10 @@ cdef class DatetimeValidator(TemporalValidator):
         return is_null_datetime64(value)


-cpdef bint is_datetime_array(ndarray values):
+cpdef bint is_datetime_array(ndarray values, bint skipna=True):
     cdef:
         DatetimeValidator validator = DatetimeValidator(len(values),
-                                                        skipna=True)
+                                                        skipna=skipna)
     return validator.validate(values)


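
Note on the `infer_dtype` hunk: the new `elif` branch appears to let an array whose first value is a `datetime` still be classified as `"date"` when the remaining values are only dates, which works because `datetime` is a subclass of `date`. The pure-Python sketch below mirrors that branch order. `infer_temporal_kind` is a hypothetical helper, not the pandas implementation, and its null handling (dropping `None` only) is a simplification of what `skipna` does in the real validators.

```python
from datetime import date, datetime


def infer_temporal_kind(values):
    """Classify date-like objects in ``values``, ignoring ``None`` entries."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "empty"

    first = non_null[0]
    if isinstance(first, datetime):
        if all(isinstance(v, datetime) for v in non_null):
            return "datetime"
        elif all(isinstance(v, date) for v in non_null):
            # Mirrors the branch added in the diff: every value is at least
            # a ``date`` because ``datetime`` subclasses ``date``.
            return "date"
    elif isinstance(first, date):
        if all(isinstance(v, date) for v in non_null):
            return "date"
    return "mixed"


if __name__ == "__main__":
    print(infer_temporal_kind([datetime(2020, 1, 1), date(2020, 1, 2)]))
    print(infer_temporal_kind([datetime(2020, 1, 1), datetime(2020, 1, 2)]))
```

Without the added branch, the first call in this sketch would fall through to `"mixed"` even though the same values in the opposite order would be reported as `"date"`.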

pandas/_libs/src/parser/tokenizer.c

Lines changed: 2 additions & 2 deletions
@@ -709,7 +709,7 @@ int skip_this_line(parser_t *self, int64_t rownum) {
 }

 int tokenize_bytes(parser_t *self,
-                   size_t line_limit, int64_t start_lines) {
+                   size_t line_limit, uint64_t start_lines) {
     int64_t i;
     uint64_t slen;
     int should_skip;
@@ -1348,7 +1348,7 @@ int parser_trim_buffers(parser_t *self) {

 int _tokenize_helper(parser_t *self, size_t nrows, int all) {
     int status = 0;
-    int64_t start_lines = self->lines;
+    uint64_t start_lines = self->lines;

     if (self->state == FINISHED) {
         return 0;

pandas/_libs/tslibs/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@
     "OutOfBoundsDatetime",
     "IncompatibleFrequency",
     "Period",
+    "Resolution",
     "Timedelta",
     "delta_to_nanoseconds",
     "ints_to_pytimedelta",
@@ -20,6 +21,7 @@
 from .nattype import NaT, NaTType, iNaT, is_null_datetimelike, nat_strings
 from .np_datetime import OutOfBoundsDatetime
 from .period import IncompatibleFrequency, Period
+from .resolution import Resolution
 from .timedeltas import Timedelta, delta_to_nanoseconds, ints_to_pytimedelta
 from .timestamps import Timestamp
 from .tzconversion import tz_convert_single

pandas/_libs/tslibs/conversion.pxd

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
-from cpython.datetime cimport datetime
+from cpython.datetime cimport datetime, tzinfo

-from numpy cimport int64_t, int32_t
+from numpy cimport int64_t, int32_t, ndarray

 from pandas._libs.tslibs.np_datetime cimport npy_datetimestruct

@@ -24,3 +24,5 @@ cdef int64_t get_datetime64_nanos(object val) except? -1

 cpdef datetime localize_pydatetime(datetime dt, object tz)
 cdef int64_t cast_from_unit(object ts, str unit) except? -1
+
+cpdef ndarray[int64_t] normalize_i8_timestamps(const int64_t[:] stamps, tzinfo tz)

pandas/_libs/tslibs/conversion.pyx

Lines changed: 12 additions & 28 deletions
@@ -763,7 +763,7 @@ cpdef inline datetime localize_pydatetime(datetime dt, object tz):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def normalize_i8_timestamps(int64_t[:] stamps, object tz):
+cpdef ndarray[int64_t] normalize_i8_timestamps(const int64_t[:] stamps, tzinfo tz):
     """
     Normalize each of the (nanosecond) timezone aware timestamps in the given
     array by rounding down to the beginning of the day (i.e. midnight).
@@ -774,31 +774,6 @@ def normalize_i8_timestamps(int64_t[:] stamps, object tz):
     stamps : int64 ndarray
     tz : tzinfo or None

-    Returns
-    -------
-    result : int64 ndarray of converted of normalized nanosecond timestamps
-    """
-    cdef:
-        int64_t[:] result
-
-    result = _normalize_local(stamps, tz)
-
-    return result.base # .base to access underlying np.ndarray
-
-
-@cython.wraparound(False)
-@cython.boundscheck(False)
-cdef int64_t[:] _normalize_local(const int64_t[:] stamps, tzinfo tz):
-    """
-    Normalize each of the (nanosecond) timestamps in the given array by
-    rounding down to the beginning of the day (i.e. midnight) for the
-    given timezone `tz`.
-
-    Parameters
-    ----------
-    stamps : int64 ndarray
-    tz : tzinfo
-
     Returns
     -------
     result : int64 ndarray of converted of normalized nanosecond timestamps
@@ -813,7 +788,16 @@ cdef int64_t[:] _normalize_local(const int64_t[:] stamps, tzinfo tz):
         npy_datetimestruct dts
         int64_t delta, local_val

-    if is_tzlocal(tz):
+    if tz is None or is_utc(tz):
+        with nogil:
+            for i in range(n):
+                if stamps[i] == NPY_NAT:
+                    result[i] = NPY_NAT
+                    continue
+                local_val = stamps[i]
+                dt64_to_dtstruct(local_val, &dts)
+                result[i] = _normalized_stamp(&dts)
+    elif is_tzlocal(tz):
         for i in range(n):
             if stamps[i] == NPY_NAT:
                 result[i] = NPY_NAT
@@ -843,7 +827,7 @@ cdef int64_t[:] _normalize_local(const int64_t[:] stamps, tzinfo tz):
                 dt64_to_dtstruct(stamps[i] + deltas[pos[i]], &dts)
                 result[i] = _normalized_stamp(&dts)

-    return result
+    return result.base # `.base` to access underlying ndarray


 cdef inline int64_t _normalized_stamp(npy_datetimestruct *dts) nogil:
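
Note on the new `tz is None or is_utc(tz)` branch: for UTC or naive data no offset lookup is needed, so each nanosecond timestamp can be rounded down to midnight directly while `NPY_NAT` values pass through unchanged. The pure-Python sketch below shows the same idea with integer arithmetic instead of the `dt64_to_dtstruct` / `_normalized_stamp` round trip. `normalize_utc` and `NS_PER_DAY` are illustrative names, and the assumption is that flooring to a multiple of one day agrees with zeroing the time-of-day fields for in-range values.

```python
NPY_NAT = -(2 ** 63)                 # int64 sentinel NumPy uses for NaT
NS_PER_DAY = 24 * 60 * 60 * 10 ** 9  # nanoseconds in one day


def normalize_utc(stamps):
    """Round each UTC nanosecond timestamp down to the start of its day."""
    result = []
    for stamp in stamps:
        if stamp == NPY_NAT:
            # Missing values are propagated unchanged, as in the diff.
            result.append(NPY_NAT)
            continue
        # Python's % always returns a non-negative remainder for a positive
        # divisor, so this floors correctly for pre-1970 timestamps too.
        result.append(stamp - stamp % NS_PER_DAY)
    return result


if __name__ == "__main__":
    print(normalize_utc([0, 1, NS_PER_DAY + 5, NPY_NAT, -1]))
```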
