
CLN: Remove MultiIndex._get_grouper_for_level #49597


Closed

Changes from all commits (39 commits)

b08af20  CLN: deleted function (codamuse, Nov 9, 2022)
57b212b  raise notImplementedError (codamuse, Nov 10, 2022)
42c897d  check for _is_multi in base class and @final _get_grouper_by_level (codamuse, Nov 10, 2022)
1d9f33e  tweak NotImplementedError message (codamuse, Nov 10, 2022)
8677352  STYLE: fix pylint useless-else-on-loop warnings (#49595) (Moisan, Nov 9, 2022)
5cbe002  STYLE enable pylint: chained-comparison (#49586) (natmokval, Nov 9, 2022)
6c0d3fb  PERF: df.groupby(categorical) (#49596) (lukemanley, Nov 9, 2022)
bbeef69  REF: tighter typing in constructor functions (#49591) (jbrockmendel, Nov 9, 2022)
9236857  CLN: assorted (#49590) (jbrockmendel, Nov 9, 2022)
63a9350  API: Index(object_dtype_bool_ndarray) retain object dtype (#49594) (jbrockmendel, Nov 9, 2022)
9adaf8c  BUG: Series(index=[]) should have dtype=object (#49574) (topper-123, Nov 9, 2022)
8e0aa38  CLN: collect fastpath in Series.__init__ (#49575) (topper-123, Nov 9, 2022)
3d92ac3  BUG/PERF: MultiIndex.value_counts returning flat index (#49558) (lukemanley, Nov 9, 2022)
a366e83  CI: Change flaky to_excel test to compare DataFrames (#49509) (mroeschke, Nov 9, 2022)
5a0fa14  STYLE Enable Pylint statement import-self (#49601) (uzzell, Nov 9, 2022)
f9dc98b  TST: MultiIndex.get_indexer with na/missing (#48877) (lukemanley, Nov 9, 2022)
f7f0617  DEPR: Enforce DataFrame(list_with_categorical) deprecation (#49592) (jbrockmendel, Nov 9, 2022)
a93ec96  DEPR: Enforce deprecation of numeric_only=None in DataFrame aggregati… (rhshadrach, Nov 9, 2022)
8514362  add/timedeltas-seconds-documentation (#49584) (joaopmjm, Nov 10, 2022)
2088f0e  REF: remove infer_datetimelike_array (#49608) (jbrockmendel, Nov 10, 2022)
22497ef  DEPR: Enforce deprecations in indexes/datetimes.py (#49607) (mroeschke, Nov 10, 2022)
d05207a  TST/CI: Follow up fix test_write_fspath_all (#49621) (mroeschke, Nov 10, 2022)
74cd050  CLN: test_nanops.py (#49423) (mroeschke, Nov 10, 2022)
0533e09  REGR: Better warning in pivot_table when dropping nuisance columns (#… (rhshadrach, Nov 10, 2022)
9d15690  REGR: MultiIndex.join does not work for ea dtypes (#49284) (phofl, Nov 10, 2022)
a835fba  BUG: groupby with sort=False still sorts an ordered categorical (#49613) (rhshadrach, Nov 10, 2022)
07dba4f  BUG: date_range with freq="C" (business days) return value changed on… (douglaslohmann, Nov 10, 2022)
0daeb6a  API: make Timestamp/Timedelta _as_unit public as_unit (#48819) (jbrockmendel, Nov 10, 2022)
678b46a  DEPR: Enforce Series(float_with_nan, dtype=inty) (#49605) (jbrockmendel, Nov 10, 2022)
56cef58  DEPR: Disallow missing nested label when indexing MultiIndex level (#… (mroeschke, Nov 11, 2022)
36936a3  BUG: groupby.nth should be a filter (#49262) (rhshadrach, Nov 11, 2022)
0c55e18  CI: Updating website sync to new server (#49614) (datapythonista, Nov 11, 2022)
4cd9b6a  for #49638 updated the doc (#49639) (ramvikrams, Nov 11, 2022)
72b92d3  for gh-49508 changing Doc for DataFrame.astype (#49556) (ramvikrams, Nov 11, 2022)
f52331f  DEPR: Remove df.reduction(level) (#49611) (mroeschke, Nov 11, 2022)
a38a34f  DEPR: Enforce default of numeric_only=False in DataFrame methods (#49… (rhshadrach, Nov 11, 2022)
0541b89  STYLE: fix pylint reimported warnings (#49645) (Moisan, Nov 11, 2022)
133c5f0  remove vestiges of MultiIndex grouper (codamuse, Nov 12, 2022)
338b8f6  revert broader changes on CI fails (codamuse, Nov 12, 2022)

8 changes: 4 additions & 4 deletions .github/workflows/docbuild-and-upload.yml
@@ -64,22 +64,22 @@ jobs:
mkdir -m 700 -p ~/.ssh
echo "${{ secrets.server_ssh_key }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBE1Kkopomm7FHG5enATf7SgnpICZ4W2bw+Ho+afqin+w7sMcrsa0je7sbztFAV8YchDkiBKnWTG4cRT+KZgZCaY=" > ~/.ssh/known_hosts
echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFjYkJBk7sos+r7yATODogQc3jUdW1aascGpyOD4bohj8dWjzwLJv/OJ/fyOQ5lmj81WKDk67tGtqNJYGL9acII=" > ~/.ssh/known_hosts
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))

- name: Copy cheatsheets into site directory
run: cp doc/cheatsheet/Pandas_Cheat_Sheet* web/build/

- name: Upload web
run: rsync -az --delete --exclude='pandas-docs' --exclude='docs' web/build/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas
run: rsync -az --delete --exclude='pandas-docs' --exclude='docs' web/build/ web@${{ secrets.server_ip }}:/var/www/html
if: github.event_name == 'push' && github.ref == 'refs/heads/main'

- name: Upload dev docs
run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/dev
run: rsync -az --delete doc/build/html/ web@${{ secrets.server_ip }}:/var/www/html/pandas-docs/dev
if: github.event_name == 'push' && github.ref == 'refs/heads/main'

- name: Upload prod docs
run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/version/${GITHUB_REF_NAME:1}
run: rsync -az --delete doc/build/html/ web@${{ secrets.server_ip }}:/var/www/html/pandas-docs/version/${GITHUB_REF_NAME:1}
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')

- name: Move docs into site directory

4 changes: 2 additions & 2 deletions asv_bench/benchmarks/frame_methods.py
@@ -454,10 +454,10 @@ def setup(self, axis):
)

def time_count_level_multi(self, axis):
self.df.count(axis=axis, level=1)
self.df.count(axis=axis)

def time_count_level_mixed_dtypes_multi(self, axis):
self.df_mixed.count(axis=axis, level=1)
self.df_mixed.count(axis=axis)


class Apply:

36 changes: 20 additions & 16 deletions asv_bench/benchmarks/groupby.py
@@ -600,31 +600,35 @@ def time_frame_agg(self, dtype, method):


class Cumulative:
param_names = ["dtype", "method"]
param_names = ["dtype", "method", "with_nans"]
params = [
["float64", "int64", "Float64", "Int64"],
["cummin", "cummax", "cumsum"],
[True, False],
]

def setup(self, dtype, method):
def setup(self, dtype, method, with_nans):
if with_nans and dtype == "int64":
raise NotImplementedError("Construction of df would raise")

N = 500_000
vals = np.random.randint(-10, 10, (N, 5))
null_vals = vals.astype(float, copy=True)
null_vals[::2, :] = np.nan
null_vals[::3, :] = np.nan
df = DataFrame(vals, columns=list("abcde"), dtype=dtype)
null_df = DataFrame(null_vals, columns=list("abcde"), dtype=dtype)
keys = np.random.randint(0, 100, size=N)
df["key"] = keys
null_df["key"] = keys
self.df = df
self.null_df = null_df
vals = np.random.randint(-10, 10, (N, 5))

def time_frame_transform(self, dtype, method):
self.df.groupby("key").transform(method)
if with_nans:
null_vals = vals.astype(float, copy=True)
null_vals[::2, :] = np.nan
null_vals[::3, :] = np.nan
df = DataFrame(null_vals, columns=list("abcde"), dtype=dtype)
df["key"] = keys
self.df = df
else:
df = DataFrame(vals, columns=list("abcde")).astype(dtype, copy=False)
df["key"] = keys
self.df = df

def time_frame_transform_many_nulls(self, dtype, method):
self.null_df.groupby("key").transform(method)
def time_frame_transform(self, dtype, method, with_nans):
self.df.groupby("key").transform(method)


class RankWithTies:

20 changes: 10 additions & 10 deletions asv_bench/benchmarks/stat_ops.py
@@ -23,10 +23,10 @@ def time_op(self, op, dtype, axis):

class FrameMultiIndexOps:

params = ([0, 1, [0, 1]], ops)
param_names = ["level", "op"]
params = [ops]
param_names = ["op"]

def setup(self, level, op):
def setup(self, op):
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
np.arange(10).repeat(10000),
@@ -37,8 +37,8 @@ setup(self, level, op):
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
self.df_func = getattr(df, op)

def time_op(self, level, op):
self.df_func(level=level)
def time_op(self, op):
self.df_func()


class SeriesOps:
@@ -56,10 +56,10 @@ def time_op(self, op, dtype):

class SeriesMultiIndexOps:

params = ([0, 1, [0, 1]], ops)
param_names = ["level", "op"]
params = [ops]
param_names = ["op"]

def setup(self, level, op):
def setup(self, op):
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
np.arange(10).repeat(10000),
@@ -70,8 +70,8 @@ setup(self, level, op):
s = pd.Series(np.random.randn(len(index)), index=index)
self.s_func = getattr(s, op)

def time_op(self, level, op):
self.s_func(level=level)
def time_op(self, op):
self.s_func()


class Rank:

2 changes: 1 addition & 1 deletion asv_bench/benchmarks/tslibs/tslib.py
@@ -51,7 +51,7 @@ class TimeIntsToPydatetime:
_tzs,
)
param_names = ["box", "size", "tz"]
# TODO: fold? freq?
# TODO: fold?

def setup(self, box, size, tz):
if box == "date" and tz is not None:

1 change: 1 addition & 0 deletions doc/source/development/contributing.rst
@@ -137,6 +137,7 @@ want to clone your fork to your machine::
git clone https://github.com/your-user-name/pandas.git pandas-yourname
cd pandas-yourname
git remote add upstream https://github.com/pandas-dev/pandas.git
git fetch upstream

This creates the directory ``pandas-yourname`` and connects your repository to
the upstream (main project) *pandas* repository.

4 changes: 4 additions & 0 deletions doc/source/reference/arrays.rst
@@ -139,6 +139,7 @@ Properties
Timestamp.second
Timestamp.tz
Timestamp.tzinfo
Timestamp.unit
Timestamp.value
Timestamp.week
Timestamp.weekofyear
@@ -149,6 +150,7 @@ Methods
.. autosummary::
:toctree: api/

Timestamp.as_unit
Timestamp.astimezone
Timestamp.ceil
Timestamp.combine
@@ -242,6 +244,7 @@ Properties
Timedelta.nanoseconds
Timedelta.resolution
Timedelta.seconds
Timedelta.unit
Timedelta.value
Timedelta.view

@@ -250,6 +253,7 @@ Methods
.. autosummary::
:toctree: api/

Timedelta.as_unit
Timedelta.ceil
Timedelta.floor
Timedelta.isoformat
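
The ``Timestamp.unit``/``Timestamp.as_unit`` and ``Timedelta.unit``/``Timedelta.as_unit`` entries added above come from commit 0daeb6a (#48819), which makes the non-nanosecond resolution API public. A minimal sketch of how the new attributes are used; the values below are illustrative and not taken from this diff:

    import pandas as pd

    ts = pd.Timestamp("2022-11-10 12:00:00")
    ts_ms = ts.as_unit("ms")   # convert the stored resolution to milliseconds
    print(ts_ms.unit)          # "ms"

    td = pd.Timedelta(days=1).as_unit("s")
    print(td.unit)             # "s"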

38 changes: 24 additions & 14 deletions doc/source/user_guide/groupby.rst
@@ -1354,9 +1354,14 @@ This shows the first or last n rows from each group.
Taking the nth row of each group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To select from a DataFrame or Series the nth item, use
:meth:`~pd.core.groupby.DataFrameGroupBy.nth`. This is a reduction method, and
will return a single row (or no row) per group if you pass an int for n:
To select the nth item from each group, use :meth:`.DataFrameGroupBy.nth` or
:meth:`.SeriesGroupBy.nth`. The argument supplied can be an integer, a list of integers,
a slice, or a list of slices; see below for examples. When the nth element of a group
does not exist, an error is *not* raised; instead, no corresponding rows are returned.

In general this operation acts as a filtration. In certain cases it also returns
one row per group, which would make it a reduction as well; however, because it can
return zero or multiple rows per group, pandas treats it as a filtration in all cases.

.. ipython:: python

@@ -1367,6 +1372,14 @@ will return a single row (or no row) per group if you pass an int for n:
g.nth(-1)
g.nth(1)

If the nth element of a group does not exist, then no corresponding row is included
in the result. In particular, if the specified ``n`` is larger than any group, the
result will be an empty DataFrame.

.. ipython:: python

g.nth(5)
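
A minimal sketch of the filtration semantics described above and enforced by commit 36936a3 (#49262); the frame below is illustrative and not taken from this diff. ``nth`` keeps the original row index, whereas a true reduction such as ``first`` re-indexes by the group keys:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"A": [1, 1, 2], "B": [np.nan, 4, 5]})
    g = df.groupby("A")

    print(g.nth(0))    # rows at position 0 within each group; original index (0, 2) is kept
    print(g.first())   # first non-null value per group, indexed by the group key "A"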

If you want to select the nth not-null item, use the ``dropna`` kwarg. For a DataFrame this should be either ``'any'`` or ``'all'`` just like you would pass to dropna:

.. ipython:: python
@@ -1376,21 +1389,11 @@ If you want to select the nth not-null item, use the ``dropna`` kwarg. For a Dat
g.first()

# nth(-1) is the same as g.last()
g.nth(-1, dropna="any") # NaNs denote group exhausted when using dropna
g.nth(-1, dropna="any")
g.last()

g.B.nth(0, dropna="all")

As with other methods, passing ``as_index=False``, will achieve a filtration, which returns the grouped row.

.. ipython:: python

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A", as_index=False)

g.nth(0)
g.nth(-1)

You can also select multiple rows from each group by specifying multiple nth values as a list of ints.

.. ipython:: python
@@ -1400,6 +1403,13 @@ You can also select multiple rows from each group by specifying multiple nth val
# get the first, 4th, and last date index for each month
df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])

You may also use slices or lists of slices.

.. ipython:: python

df.groupby([df.index.year, df.index.month]).nth[1:]
df.groupby([df.index.year, df.index.month]).nth[1:, :-1]

Enumerate group items
~~~~~~~~~~~~~~~~~~~~~


10 changes: 6 additions & 4 deletions doc/source/whatsnew/v0.15.2.rst
@@ -154,11 +154,13 @@ Other enhancements:

- ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters (:issue:`8302`):

.. ipython:: python
:okwarning:
.. code-block:: python

s = pd.Series([False, True, False], index=[0, 0, 1])
s.any(level=0)
>>> s = pd.Series([False, True, False], index=[0, 0, 1])
>>> s.any(level=0)
0 True
1 False
dtype: bool
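
The ``level`` argument shown in this historical example has since been removed from ``Series``/``DataFrame`` reductions (enforced in this commit range by f52331f, #49611), which is presumably why the example above is converted to a static code block. A sketch of the equivalent under the current API, grouping on the index level explicitly (not part of the diff):

    import pandas as pd

    s = pd.Series([False, True, False], index=[0, 0, 1])
    print(s.groupby(level=0).any())   # replaces the removed s.any(level=0)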

- ``Panel`` now supports the ``all`` and ``any`` aggregation functions. (:issue:`8302`):


19 changes: 12 additions & 7 deletions doc/source/whatsnew/v1.2.0.rst
@@ -383,12 +383,17 @@ this pathological behavior (:issue:`37827`):

*New behavior*:

.. ipython:: python
:okwarning:
.. code-block:: ipython

df.mean()
In [3]: df.mean()
Out[3]:
A 1.0
dtype: float64

df[["A"]].mean()
In [4]: df[["A"]].mean()
Out[4]:
A 1.0
dtype: float64

Moreover, DataFrame reductions with ``numeric_only=None`` will now be
consistent with their Series counterparts. In particular, for
@@ -415,10 +420,10 @@ instead of casting to a NumPy array which may have different semantics (:issue:`

*New behavior*:

.. ipython:: python
:okwarning:
.. code-block:: ipython

df.any()
In [5]: df.any()
Out[5]: Series([], dtype: bool)


.. _whatsnew_120.api_breaking.python:

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.5.2.rst
@@ -13,9 +13,11 @@ including other versions of pandas.

Fixed regressions
~~~~~~~~~~~~~~~~~
- Fixed regression in :meth:`MultiIndex.join` for extension array dtypes (:issue:`49277`)
- Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
- Fixed regression in :meth:`DataFrame.plot` preventing :class:`~matplotlib.colors.Colormap` instance
from being passed using the ``colormap`` argument if Matplotlib 3.6+ is used (:issue:`49374`)
- Fixed regression in :func:`date_range` returning an invalid set of periods for ``CustomBusinessDay`` frequency and ``start`` date with timezone (:issue:`49441`)
-

.. ---------------------------------------------------------------------------