Commit 7477fd1

Merge branch 'master' of https://github.com/pandas-dev/pandas
2 parents 8db09d0 + 13dc13f commit 7477fd1

52 files changed (+922, -321 lines changed)

ci/code_checks.sh

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -292,10 +292,6 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
292292
pytest -q --doctest-modules pandas/core/generic.py
293293
RET=$(($RET + $?)) ; echo $MSG "DONE"
294294

295-
MSG='Doctests groupby.py' ; echo $MSG
296-
pytest -q --doctest-modules pandas/core/groupby/groupby.py -k"-cumcount -describe -pipe"
297-
RET=$(($RET + $?)) ; echo $MSG "DONE"
298-
299295
MSG='Doctests series.py' ; echo $MSG
300296
pytest -q --doctest-modules pandas/core/series.py
301297
RET=$(($RET + $?)) ; echo $MSG "DONE"
@@ -318,6 +314,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
318314
pytest -q --doctest-modules pandas/core/dtypes/
319315
RET=$(($RET + $?)) ; echo $MSG "DONE"
320316

317+
MSG='Doctests groupby' ; echo $MSG
318+
pytest -q --doctest-modules pandas/core/groupby/
319+
RET=$(($RET + $?)) ; echo $MSG "DONE"
320+
321321
MSG='Doctests indexes' ; echo $MSG
322322
pytest -q --doctest-modules pandas/core/indexes/
323323
RET=$(($RET + $?)) ; echo $MSG "DONE"

doc/source/user_guide/io.rst

Lines changed: 21 additions & 2 deletions
@@ -285,14 +285,18 @@ chunksize : int, default ``None``
285285
Quoting, compression, and file format
286286
+++++++++++++++++++++++++++++++++++++
287287

288-
compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``
288+
compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``, ``dict``}, default ``'infer'``
289289
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
290290
bz2, zip, or xz if filepath_or_buffer is a string ending in '.gz', '.bz2',
291291
'.zip', or '.xz', respectively, and no decompression otherwise. If using 'zip',
292292
the ZIP file must contain only one data file to be read in.
293-
Set to ``None`` for no decompression.
293+
Set to ``None`` for no decompression. Can also be a dict with key ``'method'``
294+
set to one of {``'zip'``, ``'gzip'``, ``'bz2'``}, and other keys set to
295+
compression settings. As an example, the following could be passed for
296+
faster compression: ``compression={'method': 'gzip', 'compresslevel': 1}``.
294297

295298
.. versionchanged:: 0.24.0 'infer' option added and set to default.
299+
.. versionchanged:: 1.1.0 dict option extended to support ``gzip`` and ``bz2``.
296300
thousands : str, default ``None``
297301
Thousands separator.
298302
decimal : str, default ``'.'``
@@ -3347,6 +3351,12 @@ The compression type can be an explicit parameter or be inferred from the file e
33473351
If 'infer', then use ``gzip``, ``bz2``, ``zip``, or ``xz`` if filename ends in ``'.gz'``, ``'.bz2'``, ``'.zip'``, or
33483352
``'.xz'``, respectively.
33493353

3354+
The compression parameter can also be a ``dict`` in order to pass options to the
3355+
compression protocol. It must have a ``'method'`` key set to the name
3356+
of the compression protocol, which must be one of
3357+
{``'zip'``, ``'gzip'``, ``'bz2'``}. All other key-value pairs are passed to
3358+
the underlying compression library.
3359+
33503360
.. ipython:: python
33513361
33523362
df = pd.DataFrame({
@@ -3383,6 +3393,15 @@ The default is to 'infer':
33833393
rt = pd.read_pickle("s1.pkl.bz2")
33843394
rt
33853395
3396+
Passing options to the compression protocol in order to speed up compression:
3397+
3398+
.. ipython:: python
3399+
3400+
df.to_pickle(
3401+
"data.pkl.gz",
3402+
compression={"method": "gzip", "compresslevel": 1}
3403+
)
3404+
33863405
.. ipython:: python
33873406
:suppress:
33883407
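The dict form of ``compression`` documented in this file separates a ``'method'`` key from the options that are forwarded to the compression library. The following is a minimal standard-library sketch of that split, not the pandas implementation. The helper name `compress_bytes` is hypothetical; only `gzip.compress` and `bz2.compress` are real library calls.

```python
import bz2
import gzip


def compress_bytes(data, compression):
    """Hypothetical helper: split a compression spec into method and options."""
    if isinstance(compression, dict):
        options = dict(compression)  # copy so the caller's dict is not mutated
        method = options.pop("method")
    else:
        method, options = compression, {}

    if method is None:
        return data
    if method == "gzip":
        # Remaining keys (e.g. compresslevel) go straight to the library.
        return gzip.compress(data, **options)
    if method == "bz2":
        return bz2.compress(data, **options)
    raise ValueError(f"unsupported compression method: {method!r}")


payload = b"a,b,c\n1,2,3\n" * 1000
fast_gzip = compress_bytes(payload, {"method": "gzip", "compresslevel": 1})
fast_bz2 = compress_bytes(payload, {"method": "bz2", "compresslevel": 1})
print(len(payload), len(fast_gzip), len(fast_bz2))
```

A lower ``compresslevel`` generally trades a larger output for less CPU time, which is the use case the documentation change mentions.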

doc/source/user_guide/timeseries.rst

Lines changed: 10 additions & 0 deletions
@@ -772,6 +772,7 @@ There are several time/date properties that one can access from ``Timestamp`` or
772772
week,"The week ordinal of the year"
773773
dayofweek,"The number of the day of the week with Monday=0, Sunday=6"
774774
weekday,"The number of the day of the week with Monday=0, Sunday=6"
775+
isocalendar,"The ISO 8601 year, week and day of the date"
775776
quarter,"Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc."
776777
days_in_month,"The number of days in the month of the datetime"
777778
is_month_start,"Logical indicating if first day of month (defined by frequency)"
@@ -786,6 +787,15 @@ Furthermore, if you have a ``Series`` with datetimelike values, then you can
786787
access these properties via the ``.dt`` accessor, as detailed in the section
787788
on :ref:`.dt accessors<basics.dt_accessors>`.
788789

790+
.. versionadded:: 1.1.0
791+
792+
You may obtain the year, week and day components of a date according to the ISO 8601 calendar:
793+
794+
.. ipython:: python
795+
796+
idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
797+
idx.to_series().dt.isocalendar
798+
789799
.. _timeseries.offsets:
790800

791801
DateOffset objects
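For reference, the standard library's `datetime.date.isocalendar` yields ISO 8601 components for the same four dates used in the documentation example in this hunk. This sketch uses only the standard library and is meant as a cross-check, not as a description of how the pandas accessor is implemented.

```python
from datetime import date, timedelta

# Same range as the docs example: four days starting 2019-12-29.
start = date(2019, 12, 29)
days = [start + timedelta(days=i) for i in range(4)]

# tuple() keeps the result a plain tuple on every Python 3 version.
iso_rows = [tuple(d.isocalendar()) for d in days]

for d, row in zip(days, iso_rows):
    print(d.isoformat(), row)
# 2019-12-29 (2019, 52, 7)
# 2019-12-30 (2020, 1, 1)
# 2019-12-31 (2020, 1, 2)
# 2020-01-01 (2020, 1, 3)
```

The ISO year changes on Monday 2019-12-30, two days before the calendar year does, which is why the ISO year is reported separately from the calendar year.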

doc/source/whatsnew/v0.4.x.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
.. _whatsnew_04x:
22

3-
v.0.4.1 through v0.4.3 (September 25 - October 9, 2011)
4-
-------------------------------------------------------
3+
Versions 0.4.1 through 0.4.3 (September 25 - October 9, 2011)
4+
-------------------------------------------------------------
55

66
{{ header }}
77

doc/source/whatsnew/v0.5.0.rst

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
11

22
.. _whatsnew_050:
33

4-
v.0.5.0 (October 24, 2011)
5-
--------------------------
4+
Version 0.5.0 (October 24, 2011)
5+
--------------------------------
66

77
{{ header }}
88

doc/source/whatsnew/v0.6.0.rst

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
11
.. _whatsnew_060:
22

3-
v.0.6.0 (November 25, 2011)
4-
---------------------------
3+
Version 0.6.0 (November 25, 2011)
4+
---------------------------------
55

66
{{ header }}
77

doc/source/whatsnew/v1.1.0.rst

Lines changed: 12 additions & 1 deletion
@@ -126,9 +126,16 @@ Other enhancements
126126
- :class:`Series.str` now has a `fullmatch` method that matches a regular expression against the entire string in each row of the series, similar to `re.fullmatch` (:issue:`32806`).
127127
- :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`)
128128
- :meth:`MultiIndex.union` will now raise `RuntimeWarning` if the objects inside are unsortable. Pass `sort=False` to suppress this warning (:issue:`33015`)
129+
- :class:`Series.dt` and :class:`DatetimeIndex` now have an `isocalendar` accessor that returns a :class:`DataFrame` with year, week, and day calculated according to the ISO 8601 calendar (:issue:`33206`).
129130
- The :meth:`DataFrame.to_feather` method now supports additional keyword
130131
arguments (e.g. to set the compression) that are added in pyarrow 0.17
131132
(:issue:`33422`).
133+
- :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`,
134+
and :meth:`DataFrame.to_json` now support passing a dict of
135+
compression arguments when using the ``gzip`` and ``bz2`` protocols.
136+
This can be used to set a custom compression level, e.g.,
137+
``df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1})``
138+
(:issue:`33196`)
132139

133140
.. ---------------------------------------------------------------------------
134141
@@ -409,7 +416,7 @@ Performance improvements
409416
sparse values from ``scipy.sparse`` matrices using the
410417
:meth:`DataFrame.sparse.from_spmatrix` constructor (:issue:`32821`,
411418
:issue:`32825`, :issue:`32826`, :issue:`32856`, :issue:`32858`).
412-
- Performance improvement in reductions (sum, min, max) for nullable (integer and boolean) dtypes (:issue:`30982`, :issue:`33261`).
419+
- Performance improvement in reductions (sum, prod, min, max) for nullable (integer and boolean) dtypes (:issue:`30982`, :issue:`33261`, :issue:`33442`).
413420

414421

415422
.. ---------------------------------------------------------------------------
@@ -441,6 +448,8 @@ Datetimelike
441448
- Bug in :meth:`DatetimeIndex.searchsorted` not accepting a ``list`` or :class:`Series` as its argument (:issue:`32762`)
442449
- Bug where :meth:`PeriodIndex` raised when passed a :class:`Series` of strings (:issue:`26109`)
443450
- Bug in :class:`Timestamp` arithmetic when adding or subtracting a ``np.ndarray`` with ``timedelta64`` dtype (:issue:`33296`)
451+
- Bug in :meth:`DatetimeIndex.to_period` not inferring the frequency when called with no arguments (:issue:`33358`)
452+
444453

445454
Timedelta
446455
^^^^^^^^^
@@ -505,6 +514,7 @@ Indexing
505514
- Bug in :meth:`DataFrame.iloc.__setitem__` creating a new array instead of overwriting ``Categorical`` values in-place (:issue:`32831`)
506515
- Bug in :meth:`DataFrame.copy` where ``_item_cache`` was not invalidated after the copy, causing post-copy value updates to not be reflected (:issue:`31784`)
507516
- Bug in `Series.__getitem__` with an integer key and a :class:`MultiIndex` with leading integer level failing to raise ``KeyError`` if the key is not present in the first level (:issue:`33355`)
517+
- Bug in :meth:`DataFrame.iloc` when slicing a single-column :class:`DataFrame` with ``ExtensionDtype`` (e.g. ``df.iloc[:, :1]``) returning an invalid result (:issue:`32957`)
508518

509519
Missing
510520
^^^^^^^
@@ -623,6 +633,7 @@ Other
623633
- Bug in :meth:`DataFrame.to_records` incorrectly losing timezone information in timezone-aware ``datetime64`` columns (:issue:`32535`)
624634
- Fixed :func:`pandas.testing.assert_series_equal` to correctly raise if left object is a different subclass with ``check_series_type=True`` (:issue:`32670`).
625635
- :meth:`IntegerArray.astype` now supports ``datetime64`` dtype (:issue:`32538`)
636+
- Getting a missing attribute in a query/eval string raises the correct ``AttributeError`` (:issue:`32408`)
626637
- Fixed bug in :func:`pandas.testing.assert_series_equal` where dtypes were checked for ``Interval`` and ``ExtensionArray`` operands when ``check_dtype`` was ``False`` (:issue:`32747`)
627638
- Bug in :meth:`Series.map` not raising on invalid ``na_action`` (:issue:`32815`)
628639
- Bug in :meth:`DataFrame.__dir__` caused a segfault when using unicode surrogates in a column name (:issue:`25509`)

pandas/_libs/reshape.pyx

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ ctypedef fused reshape_t:
3636

3737
@cython.wraparound(False)
3838
@cython.boundscheck(False)
39-
def unstack(reshape_t[:, :] values, uint8_t[:] mask,
39+
def unstack(reshape_t[:, :] values, const uint8_t[:] mask,
4040
Py_ssize_t stride, Py_ssize_t length, Py_ssize_t width,
4141
reshape_t[:, :] new_values, uint8_t[:, :] new_mask):
4242
"""

pandas/_libs/tslibs/ccalendar.pxd

Lines changed: 2 additions & 0 deletions
@@ -2,9 +2,11 @@ from cython cimport Py_ssize_t
22

33
from numpy cimport int64_t, int32_t
44

5+
ctypedef (int32_t, int32_t, int32_t) iso_calendar_t
56

67
cdef int dayofweek(int y, int m, int d) nogil
78
cdef bint is_leapyear(int64_t year) nogil
89
cpdef int32_t get_days_in_month(int year, Py_ssize_t month) nogil
910
cpdef int32_t get_week_of_year(int year, int month, int day) nogil
11+
cpdef iso_calendar_t get_iso_calendar(int year, int month, int day) nogil
1012
cpdef int32_t get_day_of_year(int year, int month, int day) nogil

pandas/_libs/tslibs/ccalendar.pyx

Lines changed: 43 additions & 11 deletions
@@ -150,33 +150,65 @@ cpdef int32_t get_week_of_year(int year, int month, int day) nogil:
150150
-------
151151
week_of_year : int32_t
152152
153+
Notes
154+
-----
155+
Assumes the inputs describe a valid date.
156+
"""
157+
return get_iso_calendar(year, month, day)[1]
158+
159+
160+
@cython.wraparound(False)
161+
@cython.boundscheck(False)
162+
cpdef iso_calendar_t get_iso_calendar(int year, int month, int day) nogil:
163+
"""
164+
Return the ISO 8601 year, week, and day of week for the given date.
165+
166+
Parameters
167+
----------
168+
year : int
169+
month : int
170+
day : int
171+
172+
Returns
173+
-------
174+
year : int32_t
175+
week : int32_t
176+
day : int32_t
177+
153178
Notes
154179
-----
155180
Assumes the inputs describe a valid date.
156181
"""
157182
cdef:
158183
int32_t doy, dow
159-
int woy
184+
int32_t iso_year, iso_week
160185

161186
doy = get_day_of_year(year, month, day)
162187
dow = dayofweek(year, month, day)
163188

164189
# estimate
165-
woy = (doy - 1) - dow + 3
166-
if woy >= 0:
167-
woy = woy // 7 + 1
190+
iso_week = (doy - 1) - dow + 3
191+
if iso_week >= 0:
192+
iso_week = iso_week // 7 + 1
168193

169194
# verify
170-
if woy < 0:
171-
if (woy > -2) or (woy == -2 and is_leapyear(year - 1)):
172-
woy = 53
195+
if iso_week < 0:
196+
if (iso_week > -2) or (iso_week == -2 and is_leapyear(year - 1)):
197+
iso_week = 53
173198
else:
174-
woy = 52
175-
elif woy == 53:
199+
iso_week = 52
200+
elif iso_week == 53:
176201
if 31 - day + dow < 3:
177-
woy = 1
202+
iso_week = 1
203+
204+
iso_year = year
205+
if iso_week == 1 and doy > 7:
206+
iso_year += 1
207+
208+
elif iso_week >= 52 and doy < 7:
209+
iso_year -= 1
178210

179-
return woy
211+
return iso_year, iso_week, dow + 1
180212

181213

182214
@cython.wraparound(False)

pandas/_libs/tslibs/fields.pyx

Lines changed: 41 additions & 2 deletions
@@ -8,14 +8,14 @@ from cython import Py_ssize_t
88

99
import numpy as np
1010
cimport numpy as cnp
11-
from numpy cimport ndarray, int64_t, int32_t, int8_t
11+
from numpy cimport ndarray, int64_t, int32_t, int8_t, uint32_t
1212
cnp.import_array()
1313

1414
from pandas._libs.tslibs.ccalendar import (
1515
get_locale_names, MONTHS_FULL, DAYS_FULL, DAY_SECONDS)
1616
from pandas._libs.tslibs.ccalendar cimport (
1717
get_days_in_month, is_leapyear, dayofweek, get_week_of_year,
18-
get_day_of_year)
18+
get_day_of_year, get_iso_calendar, iso_calendar_t)
1919
from pandas._libs.tslibs.np_datetime cimport (
2020
npy_datetimestruct, pandas_timedeltastruct, dt64_to_dtstruct,
2121
td64_to_tdstruct)
@@ -670,3 +670,42 @@ cpdef isleapyear_arr(ndarray years):
670670
np.logical_and(years % 4 == 0,
671671
years % 100 > 0))] = 1
672672
return out.view(bool)
673+
674+
675+
@cython.wraparound(False)
676+
@cython.boundscheck(False)
677+
def build_isocalendar_sarray(const int64_t[:] dtindex):
678+
"""
679+
Given an int64-based datetime array, return the ISO 8601 year, week, and day
680+
as a structured array.
681+
"""
682+
cdef:
683+
Py_ssize_t i, count = len(dtindex)
684+
npy_datetimestruct dts
685+
ndarray[uint32_t] iso_years, iso_weeks, days
686+
iso_calendar_t ret_val
687+
688+
sa_dtype = [
689+
("year", "u4"),
690+
("week", "u4"),
691+
("day", "u4"),
692+
]
693+
694+
out = np.empty(count, dtype=sa_dtype)
695+
696+
iso_years = out["year"]
697+
iso_weeks = out["week"]
698+
days = out["day"]
699+
700+
with nogil:
701+
for i in range(count):
702+
if dtindex[i] == NPY_NAT:
703+
ret_val = 0, 0, 0
704+
else:
705+
dt64_to_dtstruct(dtindex[i], &dts)
706+
ret_val = get_iso_calendar(dts.year, dts.month, dts.day)
707+
708+
iso_years[i] = ret_val[0]
709+
iso_weeks[i] = ret_val[1]
710+
days[i] = ret_val[2]
711+
return out
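The per-element behavior of `build_isocalendar_sarray` in this hunk (missing values become `(0, 0, 0)`, everything else is converted to a date and passed to the ISO calendar routine) can be sketched with the standard library alone. The names `NAT`, `EPOCH`, and `isocalendar_rows` are hypothetical, the sentinel is assumed to be the minimum int64 value, and the inputs are assumed to be nanoseconds since the Unix epoch. The real function fills a NumPy structured array rather than a list of tuples.

```python
from datetime import datetime, timedelta

NAT = -(2**63)                # assumption: missing-value sentinel (int64 minimum)
EPOCH = datetime(1970, 1, 1)  # assumption: values count nanoseconds from here


def isocalendar_rows(ns_values):
    """Return (year, week, day) tuples, using (0, 0, 0) for missing values."""
    rows = []
    for value in ns_values:
        if value == NAT:
            rows.append((0, 0, 0))
        else:
            # Floor to microseconds, the finest unit timedelta supports.
            moment = EPOCH + timedelta(microseconds=value // 1000)
            rows.append(tuple(moment.date().isocalendar()))
    return rows


# 1_577_836_800 seconds after the epoch is 2020-01-01T00:00:00.
rows = isocalendar_rows([1_577_836_800 * 10**9, NAT])
print(rows)  # [(2020, 1, 3), (0, 0, 0)]
```

Using zeros for missing entries keeps the output in an unsigned integer layout, at the cost of requiring callers to treat `(0, 0, 0)` as a marker rather than a real date.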
