Commit 8edc0d7

Merge branch 'main' into bug-agg-nonunique-col
2 parents 2700ad4 + 1c2ad16
33 files changed: +656 -143 lines changed

doc/source/development/maintaining.rst

Lines changed: 15 additions & 10 deletions

@@ -174,6 +174,8 @@ conversation is over. It's typically best to give the reporter some time to
 respond or self-close their issue if it's determined that the behavior is not a bug,
 or the feature is out of scope. Sometimes reporters just go away though, and
 we'll close the issue after the conversation has died.
+If you think an issue should be closed but are not completely sure, please apply
+the "closing candidate" label and wait for other maintainers to take a look.
 
 .. _maintaining.reviewing:
 
@@ -252,14 +254,16 @@ Cleaning up old pull requests
 Occasionally, contributors are unable to finish off a pull request.
 If some time has passed (two weeks, say) since the last review requesting changes,
 gently ask if they're still interested in working on this. If another two weeks or
-so passes with no response, thank them for their work and close the pull request.
-Comment on the original issue that "There's a stalled PR at #1234 that may be
-helpful.", and perhaps label the issue as "Good first issue" if the PR was relatively
-close to being accepted.
+so passes with no response, thank them for their work and then either:
 
-Additionally, core-team members can push to contributors branches. This can be
-helpful for pushing an important PR across the line, or for fixing a small
-merge conflict.
+- close the pull request;
+- push to the contributor's branch to push their work over the finish line (if
+  you're part of ``pandas-core``). This can be helpful for pushing an important PR
+  across the line, or for fixing a small merge conflict.
+
+If closing the pull request, then please comment on the original issue that
+"There's a stalled PR at #1234 that may be helpful.", and perhaps label the issue
+as "Good first issue" if the PR was relatively close to being accepted.
 
 Becoming a pandas maintainer
 ----------------------------
@@ -276,12 +280,13 @@ The required steps for adding a maintainer are:
 * ``pandas-core`` is for core team members
 * ``pandas-triage`` is for pandas triage members
 
+If adding to ``pandas-core``, there are two additional steps:
+
 3. Add the contributor to the pandas Google group.
 4. Create a pull request to add the contributor's GitHub handle to ``pandas-dev/pandas/web/pandas/config.yml``.
-5. Create a pull request to add the contributor's name/GitHub handle to the `governance document <https://github.com/pandas-dev/pandas-governance/blob/master/people.md>`_.
 
 The current list of core-team members is at
-https://github.com/pandas-dev/pandas-governance/blob/master/people.md
+https://github.com/pandas-dev/pandas/blob/main/web/pandas/config.yml
 
 
 .. _maintaining.merging:
@@ -496,5 +501,5 @@ Post-Release
 - Twitter, Mastodon and Telegram
 
 
-.. _governance documents: https://github.com/pandas-dev/pandas-governance
+.. _governance documents: https://github.com/pandas-dev/pandas/blob/main/web/pandas/about/governance.md
 .. _list of permissions: https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization

doc/source/getting_started/overview.rst

Lines changed: 2 additions & 2 deletions

@@ -154,15 +154,15 @@ project and makes it possible to `donate <https://pandas.pydata.org/donate.html>
 Project governance
 ------------------
 
-The governance process that pandas project has used informally since its inception in 2008 is formalized in `Project Governance documents <https://github.com/pandas-dev/pandas-governance>`__.
+The governance process that pandas project has used informally since its inception in 2008 is formalized in `Project Governance documents <https://github.com/pandas-dev/pandas/blob/main/web/pandas/about/governance.md>`__.
 The documents clarify how decisions are made and how the various elements of our community interact, including the relationship between open source collaborative development and work that may be funded by for-profit or non-profit entities.
 
 Wes McKinney is the Benevolent Dictator for Life (BDFL).
 
 Development team
 -----------------
 
-The list of the Core Team members and more detailed information can be found on the `people’s page <https://github.com/pandas-dev/pandas-governance/blob/master/people.md>`__ of the governance repo.
+The list of the Core Team members and more detailed information can be found on the `pandas website <https://pandas.pydata.org/about/team.html>`__.
 
 
 Institutional partners

doc/source/whatsnew/v2.1.0.rst

Lines changed: 3 additions & 0 deletions

@@ -38,6 +38,7 @@ Other enhancements
 - Let :meth:`DataFrame.to_feather` accept a non-default :class:`Index` and non-string column names (:issue:`51787`)
 - :class:`api.extensions.ExtensionArray` now has a :meth:`~api.extensions.ExtensionArray.map` method (:issue:`51809`)
 - Improve error message when having incompatible columns using :meth:`DataFrame.merge` (:issue:`51861`)
+- Added the escape mode "latex-math" to the formatter, which preserves (does not escape) all characters between "\(" and "\)" (:issue:`51903`)
 - Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`)
 - :meth:`DataFrame.applymap` now uses the :meth:`~api.extensions.ExtensionArray.map` method of underlying :class:`api.extensions.ExtensionArray` instances (:issue:`52219`)
 - :meth:`arrays.SparseArray.map` now supports ``na_action`` (:issue:`52096`).
@@ -122,6 +123,7 @@ Deprecations
 - Deprecated 'method', 'limit', and 'fill_axis' keywords in :meth:`DataFrame.align` and :meth:`Series.align`, explicitly call ``fillna`` on the alignment results instead (:issue:`51856`)
 - Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
 - Deprecated the 'axis' keyword in :meth:`.GroupBy.idxmax`, :meth:`.GroupBy.idxmin`, :meth:`.GroupBy.fillna`, :meth:`.GroupBy.take`, :meth:`.GroupBy.skew`, :meth:`.GroupBy.rank`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cummax`, :meth:`.GroupBy.cummin`, :meth:`.GroupBy.pct_change`, :meth:`GroupBy.diff`, :meth:`.GroupBy.shift`, and :meth:`DataFrameGroupBy.corrwith`; for ``axis=1`` operate on the underlying :class:`DataFrame` instead (:issue:`50405`, :issue:`51046`)
+- Deprecated passing a dictionary to :meth:`.SeriesGroupBy.agg`; pass a list of aggregations instead (:issue:`50684`)
 - Deprecated logical operations (``|``, ``&``, ``^``) between pandas objects and dtype-less sequences (e.g. ``list``, ``tuple``), wrap a sequence in a :class:`Series` or numpy array before operating instead (:issue:`51521`)
 - Deprecated :meth:`DataFrame.swapaxes` and :meth:`Series.swapaxes`, use :meth:`DataFrame.transpose` or :meth:`Series.transpose` instead (:issue:`51946`)
 - Deprecated parameter ``convert_type`` in :meth:`Series.apply` (:issue:`52140`)
@@ -181,6 +183,7 @@ Timezones
 Numeric
 ^^^^^^^
 - Bug in :meth:`Series.corr` and :meth:`Series.cov` raising ``AttributeError`` for masked dtypes (:issue:`51422`)
+- Bug in :meth:`DataFrame.corrwith` raising ``NotImplementedError`` for pyarrow-backed dtypes (:issue:`52314`)
 -
 
 Conversion
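
The :meth:`.SeriesGroupBy.agg` deprecation above swaps dict-based renaming for list-based aggregation. A minimal migration sketch (the example frame and the "lo"/"hi" names are invented for illustration, not part of the commit):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Deprecated pattern: a dict mapping output names to aggregations, e.g.
#     df.groupby("key")["val"].agg({"lo": "min", "hi": "max"})
# Preferred: pass a list of aggregations, then rename the columns.
result = df.groupby("key")["val"].agg(["min", "max"])
result = result.rename(columns={"min": "lo", "max": "hi"})
```

The list form returns a DataFrame with one column per aggregation, so the rename step recovers the custom names the dict used to provide.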

pandas/_libs/groupby.pyi

Lines changed: 9 additions & 0 deletions

@@ -88,6 +88,15 @@ def group_var(
     is_datetimelike: bool = ...,
     name: str = ...,
 ) -> None: ...
+def group_skew(
+    out: np.ndarray,  # float64_t[:, ::1]
+    counts: np.ndarray,  # int64_t[::1]
+    values: np.ndarray,  # ndarray[float64_t, ndim=2]
+    labels: np.ndarray,  # const intp_t[::1]
+    mask: np.ndarray | None = ...,
+    result_mask: np.ndarray | None = ...,
+    skipna: bool = ...,
+) -> None: ...
 def group_mean(
     out: np.ndarray,  # floating[:, ::1]
     counts: np.ndarray,  # int64_t[::1]

pandas/_libs/groupby.pyx

Lines changed: 88 additions & 0 deletions

@@ -891,6 +891,94 @@ def group_var(
                 out[i, j] /= (ct - ddof)
 
 
+@cython.wraparound(False)
+@cython.boundscheck(False)
+@cython.cdivision(True)
+@cython.cpow
+def group_skew(
+    float64_t[:, ::1] out,
+    int64_t[::1] counts,
+    ndarray[float64_t, ndim=2] values,
+    const intp_t[::1] labels,
+    const uint8_t[:, ::1] mask=None,
+    uint8_t[:, ::1] result_mask=None,
+    bint skipna=True,
+) -> None:
+    cdef:
+        Py_ssize_t i, j, N, K, lab, ngroups = len(counts)
+        int64_t[:, ::1] nobs
+        Py_ssize_t len_values = len(values), len_labels = len(labels)
+        bint isna_entry, uses_mask = mask is not None
+        float64_t[:, ::1] M1, M2, M3
+        float64_t delta, delta_n, term1, val
+        int64_t n1, n
+        float64_t ct
+
+    if len_values != len_labels:
+        raise ValueError("len(index) != len(labels)")
+
+    nobs = np.zeros((<object>out).shape, dtype=np.int64)
+
+    # M1, M2, and M3 correspond to the 1st, 2nd, and 3rd moments
+    M1 = np.zeros((<object>out).shape, dtype=np.float64)
+    M2 = np.zeros((<object>out).shape, dtype=np.float64)
+    M3 = np.zeros((<object>out).shape, dtype=np.float64)
+
+    N, K = (<object>values).shape
+
+    out[:, :] = 0.0
+
+    with nogil:
+        for i in range(N):
+            lab = labels[i]
+            if lab < 0:
+                continue
+
+            counts[lab] += 1
+
+            for j in range(K):
+                val = values[i, j]
+
+                if uses_mask:
+                    isna_entry = mask[i, j]
+                else:
+                    isna_entry = _treat_as_na(val, False)
+
+                if not isna_entry:
+                    # Based on RunningStats::Push from
+                    # https://www.johndcook.com/blog/skewness_kurtosis/
+                    n1 = nobs[lab, j]
+                    n = n1 + 1
+
+                    nobs[lab, j] = n
+                    delta = val - M1[lab, j]
+                    delta_n = delta / n
+                    term1 = delta * delta_n * n1
+
+                    M1[lab, j] += delta_n
+                    M3[lab, j] += term1 * delta_n * (n - 2) - 3 * delta_n * M2[lab, j]
+                    M2[lab, j] += term1
+                elif not skipna:
+                    M1[lab, j] = NaN
+                    M2[lab, j] = NaN
+                    M3[lab, j] = NaN
+
+        for i in range(ngroups):
+            for j in range(K):
+                ct = <float64_t>nobs[i, j]
+                if ct < 3:
+                    if result_mask is not None:
+                        result_mask[i, j] = 1
+                    out[i, j] = NaN
+                elif M2[i, j] == 0:
+                    out[i, j] = 0
+                else:
+                    out[i, j] = (
+                        (ct * (ct - 1) ** 0.5 / (ct - 2))
+                        * (M3[i, j] / M2[i, j] ** 1.5)
+                    )
+
+
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def group_mean(
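
The new ``group_skew`` computes skewness in a single pass by maintaining running first, second, and third central moments per group and column. A plain-Python sketch of the same recurrence for one group and one column (``running_skew`` is a hypothetical name for illustration):

```python
import math


def running_skew(values):
    # Single-pass update of the running moments M1 (mean), M2, M3,
    # following the same recurrence as the Cython hunk above
    # (RunningStats::Push, johndcook.com/blog/skewness_kurtosis/).
    n = 0
    m1 = m2 = m3 = 0.0
    for val in values:
        n1 = n
        n += 1
        delta = val - m1
        delta_n = delta / n
        term1 = delta * delta_n * n1
        m1 += delta_n
        m3 += term1 * delta_n * (n - 2) - 3 * delta_n * m2
        m2 += term1
    if n < 3:
        # Bias-corrected skewness is undefined for fewer than 3 observations.
        return float("nan")
    if m2 == 0:
        # Constant input: zero variance, report zero skew as the hunk does.
        return 0.0
    # Adjusted Fisher-Pearson coefficient, matching the final loop:
    # ct * (ct - 1) ** 0.5 / (ct - 2) * (M3 / M2 ** 1.5)
    return (n * math.sqrt(n - 1) / (n - 2)) * (m3 / m2 ** 1.5)
```

The ``ct < 3`` and ``M2 == 0`` guards at the end of the Cython loop reappear here as the NaN and zero early returns; for ``[1, 2, 3, 4, 10]`` the result is roughly 1.70, the bias-corrected sample skewness.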

pandas/_libs/lib.pyx

Lines changed: 3 additions & 0 deletions

@@ -3060,6 +3060,9 @@ def dtypes_all_equal(list types not None) -> bool:
     """
     first = types[0]
     for t in types[1:]:
+        if t is first:
+            # Fastpath can provide a nice boost for EADtypes
+            continue
         try:
             if not t == first:
                 return False
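
The new guard skips the (potentially expensive) ``__eq__`` when two entries are literally the same object. A standalone sketch of why the identity fastpath helps, with a hypothetical ``SlowDtype`` standing in for an ExtensionDtype whose equality check is costly:

```python
class SlowDtype:
    # Stand-in for a dtype whose __eq__ is comparatively expensive;
    # a class-level counter records how often it actually runs.
    eq_calls = 0

    def __eq__(self, other):
        SlowDtype.eq_calls += 1
        return isinstance(other, SlowDtype)

    def __hash__(self):
        return 0


def dtypes_all_equal(types):
    # Mirrors the patched loop: identity check first, equality second.
    first = types[0]
    for t in types[1:]:
        if t is first:
            # Fastpath: same object, no need to run __eq__ at all.
            continue
        try:
            if not t == first:
                return False
        except TypeError:
            return False
    return True
```

When every element is the same object (the common case for a column repeated across blocks), ``__eq__`` never runs; distinct-but-equal instances still fall through to the equality check.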

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 1 addition & 1 deletion

@@ -1015,7 +1015,7 @@ cdef class _Timestamp(ABCTimestamp):
         base_ts = "microseconds" if timespec == "nanoseconds" else timespec
         base = super(_Timestamp, self).isoformat(sep=sep, timespec=base_ts)
         # We need to replace the fake year 1970 with our real year
-        base = f"{self.year}-" + base.split("-", 1)[1]
+        base = f"{self.year:04d}-" + base.split("-", 1)[1]
 
         if self.nanosecond == 0 and timespec != "nanoseconds":
             return base
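
The one-character change matters for years below 1000: ``f"{5}-"`` yields ``"5-"``, while ISO 8601 expects a four-digit year. A sketch of the substitution the patched line performs (``fix_isoformat_year`` is a hypothetical helper for illustration):

```python
def fix_isoformat_year(base: str, year: int) -> str:
    # Mirrors the patched line: swap the fake leading year in an ISO
    # string for the real one, zero-padded to four digits via {:04d}.
    return f"{year:04d}-" + base.split("-", 1)[1]


print(fix_isoformat_year("1970-01-01T00:00:00", 5))
# 0005-01-01T00:00:00  (the old f"{year}-" would give "5-01-01T00:00:00")
print(fix_isoformat_year("1970-01-01T00:00:00", 2023))
# 2023-01-01T00:00:00
```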

pandas/core/algorithms.py

Lines changed: 5 additions & 1 deletion

@@ -1666,7 +1666,11 @@ def union_with_duplicates(
         lvals = lvals._values
     if isinstance(rvals, ABCIndex):
         rvals = rvals._values
-    unique_vals = unique(concat_compat([lvals, rvals]))
+    # error: List item 0 has incompatible type "Union[ExtensionArray,
+    # ndarray[Any, Any], Index]"; expected "Union[ExtensionArray,
+    # ndarray[Any, Any]]"
+    combined = concat_compat([lvals, rvals])  # type: ignore[list-item]
+    unique_vals = unique(combined)
     unique_vals = ensure_wrapped_if_datetimelike(unique_vals)
     repeats = final_count.reindex(unique_vals).values
     return np.repeat(unique_vals, repeats)

pandas/core/dtypes/cast.py

Lines changed: 2 additions & 1 deletion

@@ -10,6 +10,7 @@
     TYPE_CHECKING,
     Any,
     Literal,
+    Sequence,
     Sized,
     TypeVar,
     cast,
@@ -1317,7 +1318,7 @@ def find_result_type(left: ArrayLike, right: Any) -> DtypeObj:
 
 
 def common_dtype_categorical_compat(
-    objs: list[Index | ArrayLike], dtype: DtypeObj
+    objs: Sequence[Index | ArrayLike], dtype: DtypeObj
 ) -> DtypeObj:
     """
     Update the result of find_common_type to account for NAs in a Categorical.
