ENH: Support plugin DataFrame accessor via entry points (#29076) #61499

Open
wants to merge 4 commits into
base: main

PedroM4rques
Contributor

@PedroM4rques PedroM4rques commented May 26, 2025

TLDR: Allows external libraries to register DataFrame accessors using the 'pandas_dataframe_accessor' entry point group. This enables plugins to be automatically used without explicit import.

I'm working on this PR collaboratively with @afonso-antunes .

Proposal

We propose implementing an entry point system similar to the one in Vaex (#29076), so that the functionality of any installed plugin is accessible without an explicit import. The idea is to make every installed plugin available seamlessly, "importing" it only when the program actually needs it.
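
To illustrate the "only imported when needed" idea, here is a minimal sketch. It is not the implementation in this PR. `LazyAccessor`, `Frame`, `RowsAccessor` and `load_rows` are hypothetical names: in a real setup the loader would be something like an entry point's `load` method, and the host class would be `pandas.DataFrame`.

```python
class LazyAccessor:
    """Descriptor that defers loading the accessor class until first use."""

    def __init__(self, loader):
        self._loader = loader          # callable returning the accessor class
        self._accessor_cls = None      # filled in on first attribute access

    def __get__(self, obj, owner=None):
        if self._accessor_cls is None:
            # The (possibly expensive) plugin import would happen here.
            self._accessor_cls = self._loader()
        if obj is None:
            return self._accessor_cls  # accessed on the class itself
        return self._accessor_cls(obj)


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


class RowsAccessor:
    """Hypothetical plugin accessor; receives the frame it is attached to."""

    def __init__(self, frame):
        self._frame = frame

    def count(self):
        return len(self._frame.rows)


load_calls = []  # records when the plugin is actually loaded


def load_rows():
    load_calls.append("rows")
    return RowsAccessor


# Attaching the accessor does not load the plugin yet.
Frame.stats = LazyAccessor(load_rows)
```

In this sketch `load_rows` only runs the first time `.stats` is accessed. That keeps the cost of attaching accessors low, but import errors then surface at first use rather than at start-up.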

Current Behavior

Currently, each plugin must be explicitly imported:

import pandas as pd
import vaex.graphql  # required to enable .graphql (.graphql is compatible with pd.DataFrames)

df = pd.DataFrame(...)
df.graphql.query(...)  # only works after the import

Proposed Behavior

With our feature implemented, the code would be simplified to:

import pandas as pd

df = pd.DataFrame(...)
df.graphql.query(...)  # works directly if the plugin is installed via pip
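
How the registration could work on the pandas side can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `discover` and `register_from_entry_points` are hypothetical helpers, and `register` is expected to behave like `pandas.api.extensions.register_dataframe_accessor` (called with a name, it returns a class decorator). Only the group name `pandas_dataframe_accessor` comes from the PR description.

```python
from importlib.metadata import entry_points

GROUP = "pandas_dataframe_accessor"  # group name taken from the PR description


def discover(group=GROUP):
    """Return the entry points declared under `group` by installed packages."""
    eps = entry_points()
    if hasattr(eps, "select"):           # Python 3.10+ selection interface
        return list(eps.select(group=group))
    return list(eps.get(group, []))      # older dict-style return value


def register_from_entry_points(register, eps=None):
    """Load each entry point and register it under the entry point's name.

    `register(name)` must return a decorator that accepts the accessor class.
    """
    registered = []
    for ep in (discover() if eps is None else eps):
        accessor_cls = ep.load()         # imports the plugin module
        register(ep.name)(accessor_cls)
        registered.append(ep.name)
    return registered
```

With pandas installed, the call would look like `register_from_entry_points(pd.api.extensions.register_dataframe_accessor)`. A plugin would opt in through its packaging metadata, for example a `[project.entry-points."pandas_dataframe_accessor"]` table in `pyproject.toml` that maps an accessor name to a `module:Class` target.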

PedroM4rques and others added 2 commits May 26, 2025 20:07
…9076)

Allows external libraries to register DataFrame accessors
using the 'pandas_dataframe_accessor' entry point group.
This enables plugins to be automatically used without
explicit import.

Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>
@afonso-antunes

afonso-antunes commented May 26, 2025

Most of the errors are from:
from importlib_metadata import entry_points
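
One plausible cause, assuming the CI environments do not install the third-party `importlib_metadata` backport, is that this import fails outright. A common pattern is to prefer the standard-library module and fall back to the backport only if it is missing:

```python
try:
    # Standard library since Python 3.8
    from importlib.metadata import entry_points
except ImportError:
    # Third-party backport, only needed on older interpreters
    from importlib_metadata import entry_points
```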

@datapythonista added the API Design and Needs Discussion (Requires discussion from core team before further action) labels May 27, 2025
@datapythonista
Member

I'm personally happy to add this, but it's kind of a big change in terms of the code users can write. @pandas-dev/pandas-core, thoughts here?

If this moves forward, you'll want to fix the CI and add documentation for this.

@Dr-Irv
Contributor

Dr-Irv commented May 27, 2025

I'm personally happy to add this, but it's kind of a big change in terms of the code users can write. @pandas-dev/pandas-core, thoughts here?

If this moves forward, you'll want to fix the CI and add documentation for this.

Hard to understand without documentation that illustrates a use case.

@datapythonista
Member

Hard to understand without documentation that illustrates a use case.

We allow creating pandas accessors with register_dataframe_accessor. For cyberpandas for example, when you do import cyberpandas they'd call that, and you'll be able to use the accessor:

df.ip.is_ipv6

If we allow registration via Python entry points, the import cyberpandas won't be needed anymore. On import pandas we will check the packages the user has installed in their environment that provide an accessor, and we will register them automatically.

Same idea as what PDEP-9 proposed for the read_* and to_* functions/methods if you remember that.

@Dr-Irv
Contributor

Dr-Irv commented May 27, 2025

On import pandas we will check the packages the user has installed in their environment that provide an accessor, and we will register them automatically.

Couldn't that be really expensive if lots of packages were installed in the environment? What if there were conflicts in naming?

@TomAugspurger
Contributor

Agreed this would need docs, but I'm generally +1 on using entry points rather than import-time side effects.

Couldn't that be really expensive if lots of packages were installed in the environment?

It's scoped to projects that declare an entrypoint, so it doesn't scale with every package installed. But it would be good to measure the performance impact on import pandas here, both for an environment with several and without any entrypoints installed.
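
A rough way to measure the discovery step on its own is sketched below. `time_discovery` is a hypothetical helper, and it times only the entry point lookup, not the loading of any plugin or the rest of `import pandas`.

```python
import time
from importlib.metadata import entry_points


def time_discovery(group="pandas_dataframe_accessor", repeats=5):
    """Time entry point discovery; return (first, best, n_found) in seconds."""
    timings = []
    found = []
    for _ in range(repeats):
        start = time.perf_counter()
        eps = entry_points()
        if hasattr(eps, "select"):           # Python 3.10+
            found = list(eps.select(group=group))
        else:                                # older dict-style return value
            found = list(eps.get(group, []))
        timings.append(time.perf_counter() - start)
    # The first call is the closest to what a fresh interpreter pays;
    # later calls may benefit from caches.
    return timings[0], min(timings), len(found)


first, best, n_found = time_discovery()
print(f"first={first:.6f}s best={best:.6f}s entry_points={n_found}")
```

For the end-to-end number, `python -X importtime -c "import pandas"` run in environments with and without plugins installed would give a per-module breakdown of import time.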

What if there were conflicts in naming?

That's probably defined somewhere in importlib, but the same issue would affect plugins using import-time registration today.

PedroM4rques and others added 2 commits May 27, 2025 17:53
Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>
@afonso-antunes

Since the implementation now seems stable, would it make sense to start working on the documentation already, or would you prefer we wait for further maintainer input?

@datapythonista
Member

What if there were conflicts in naming?

I think by default, the last package found for the entry point with that name will overwrite the previous one. As Tom says, this is the same as with imports now: the second import overwrites the accessor of the first. But we have control over it when registering the entry points. We could keep the behaviour but show a warning, raise an exception and ask the user to remove one of the packages (probably not a great option), or let the user decide in the config which package has higher priority. Since this should be very rare, I would go for the simplest solution that doesn't "fail" silently, which would be to show a warning saying something like: Both packageA and packageB provide the accessor foo. packageA is being used, please uninstall the package you don't want to use to remove this warning.
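
The warning option could be sketched like this. `resolve_conflicts` is a hypothetical helper, and keeping the first provider found is just one possible policy, chosen here to match the wording of the suggested message. The providers are identified by the entry point's `module:attr` value, which is an assumption made for simplicity.

```python
import warnings
from importlib.metadata import EntryPoint


def resolve_conflicts(eps):
    """Pick one entry point per accessor name, warning on duplicates.

    Policy (an assumption, not decided in this thread): the first entry
    point found for a name wins and later ones are ignored with a warning.
    """
    chosen = {}
    for ep in eps:
        first = chosen.get(ep.name)
        if first is None:
            chosen[ep.name] = ep
            continue
        warnings.warn(
            f"Both {first.value} and {ep.value} provide the accessor "
            f"{ep.name!r}. {first.value} is being used, please uninstall "
            "the package you don't want to use to remove this warning.",
            UserWarning,
            stacklevel=2,
        )
    return chosen


# Two hypothetical packages declaring the same accessor name.
package_a = EntryPoint(name="foo", value="package_a:Foo",
                       group="pandas_dataframe_accessor")
package_b = EntryPoint(name="foo", value="package_b:Foo",
                       group="pandas_dataframe_accessor")
```

Calling `resolve_conflicts([package_a, package_b])` returns a mapping with a single `foo` entry pointing at `package_a:Foo` and emits one `UserWarning` naming both providers.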

would it make sense to start working on the documentation already, or would you prefer we wait for further maintainer input?

Up to you @PedroM4rques. The more complete this PR is, the easier it is for everybody to understand what is being proposed. But if in the end there is no agreement to add this, you'll have spent time on a PR that won't get merged.

Development

Successfully merging this pull request may close these issues.

GraphQL support / accessor plugin system
5 participants