ENH: Support plugin DataFrame accessor via entry points (#29076) #61499
…9076) Allows external libraries to register DataFrame accessors using the 'pandas_dataframe_accessor' entry point group. This enables plugins to be automatically used without explicit import. Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>
Most of the errors are from:
I'm personally happy to add this, but it's kind of a big change in terms of the code users can write. @pandas-dev/pandas-core, thoughts here? If this moves forward, you'll want to fix the CI and add documentation for this.
Hard to understand without documentation that illustrates a use case.
We allow creating pandas accessors with df.ip.is_ipv6 If we allow registration via Python entry points, the Same idea as what PDEP-9 proposed for the |
Couldn't that be really expensive if lots of packages were installed in the environment? What if there were conflicts in naming?
Agreed this would need docs, but I'm generally +1 on using entry points rather than import-time side effects.
It's scoped to projects that declare an entrypoint, so it doesn't scale with every package installed. But it would be good to measure the performance impact on
That's probably defined somewhere in |
Since the implementation now seems stable, would it make sense to start working on the documentation already, or would you prefer we wait for further maintainer input?
I think by default, the last package found for the entry point with that name will overwrite the previous one. As Tom says, this is the same as with imports now: the second import will overwrite the accessor of the first. But we have control over it when registering the entry points. We could keep the behaviour but show a warning, raise an exception and ask the user to remove one of the packages (probably not a great option), let the user decide which package has higher priority in the config... Since this should be very rare, I would go for the simplest solution that doesn't "fail" silently, which would be to show a warning saying something like
Up to you @PedroM4rques. The more complete this PR is, the easier it is for everybody to understand what is proposed. But if in the end there is no agreement to add this, you'll be spending time on a PR that won't get merged.
TLDR: Allows external libraries to register DataFrame accessors using the 'pandas_dataframe_accessor' entry point group. This enables plugins to be automatically used without explicit import.
I'm working on this PR collaboratively with @afonso-antunes.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Proposal
We propose implementing an entrypoint system similar to Vaex (#29076) to allow easy access to the functionalities of any installed plugin without requiring explicit imports. The idea is to make all installed packages available for use, only being "imported" when they are needed in the program, in a seamless manner.
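To make the discovery step concrete, here is a minimal sketch of how entry points in the `pandas_dataframe_accessor` group could be collected with the standard library. It is not the code from this PR: `collect_accessors` and `discover_installed_accessors` are hypothetical names, and the "last one wins, with a warning" policy is only one of the options raised in the discussion. It also loads everything eagerly, whereas the proposal aims to import plugins only when needed.

```python
import warnings
from importlib.metadata import entry_points

GROUP = "pandas_dataframe_accessor"  # group name from the PR description


def collect_accessors(eps):
    """Map accessor name -> loaded object; warn when a name is claimed twice.

    Hypothetical helper: `eps` is any iterable of objects exposing `name`,
    `value` and `load()`, as importlib.metadata.EntryPoint does.
    """
    found = {}
    for ep in eps:
        if ep.name in found:
            # Assumed policy: keep the last one found, but never silently.
            warnings.warn(
                f"Accessor {ep.name!r} is provided by more than one package; "
                f"{ep.value!r} overrides the earlier one.",
                UserWarning,
                stacklevel=2,
            )
        found[ep.name] = ep.load()
    return found


def discover_installed_accessors():
    # Selecting by group with a keyword requires Python 3.10 or newer.
    return collect_accessors(entry_points(group=GROUP))
```

Only distributions that declare an entry point in this group are visited, which is why the cost does not grow with every package installed in the environment.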
Current Behavior
Currently, each plugin must be explicitly imported:
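The original snippet is not preserved here, so the following is only an illustrative sketch of the current situation. The decorator `pd.api.extensions.register_dataframe_accessor` is pandas' existing registration API; the `IPAccessor` class, the `address` column, and the plugin module name `pandas_ip` are hypothetical. In a real plugin the class would live in the plugin's module and registration would happen as a side effect of importing it, which is why the user needs the explicit import.

```python
import pandas as pd

# In user code today this would be an explicit import of the plugin, e.g.
#   import pandas_ip   # hypothetical package; importing it registers "ip"
# The class below stands in for what that module would run at import time.


@pd.api.extensions.register_dataframe_accessor("ip")
class IPAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def is_ipv6(self):
        # Naive check on a hypothetical "address" column: IPv6 text contains ":".
        return self._obj["address"].str.contains(":", regex=False)


df = pd.DataFrame({"address": ["192.168.0.1", "::1"]})
print(df.ip.is_ipv6.tolist())  # [False, True]
```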
Proposed Behavior
With our feature implemented, the code would be simplified to:
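The original snippet is not preserved here either. As a rough illustration under stated assumptions, the sketch below simulates what pandas might do on the user's behalf: `register_from_entry_points`, `FakeEntryPoint`, and `IPAccessor` are hypothetical stand-ins, and where and when pandas would actually run this step is not something this sketch claims to know. The point is that the user code at the bottom contains no plugin import.

```python
import pandas as pd


class IPAccessor:
    """Would be shipped by a hypothetical plugin package."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def is_ipv6(self):
        # Naive check on a hypothetical "address" column.
        return self._obj["address"].str.contains(":", regex=False)


class FakeEntryPoint:
    """Stand-in for an installed 'pandas_dataframe_accessor' entry point."""

    name = "ip"

    def load(self):
        return IPAccessor


def register_from_entry_points(eps):
    # Hypothetical hook: register each loaded class under its entry point name
    # using pandas' existing accessor registration decorator.
    for ep in eps:
        pd.api.extensions.register_dataframe_accessor(ep.name)(ep.load())


register_from_entry_points([FakeEntryPoint()])  # done by pandas, not the user

# User code: only pandas is imported, yet the accessor is available.
df = pd.DataFrame({"address": ["192.168.0.1", "::1"]})
print(df.ip.is_ipv6.tolist())  # [False, True]
```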