Skip to content

PERF: Improve efficiency of BlockValuesRefs._clear_dead_references(...) #59643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 28, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 11 additions & 2 deletions pandas/_libs/internals.pyx
Original file line number Diff line number Diff line change
@@ -1,9 +1,13 @@
from collections import defaultdict

cimport cython
from cpython.object cimport PyObject
from cpython.pyport cimport PY_SSIZE_T_MAX
from cpython.slice cimport PySlice_GetIndicesEx
from cpython.weakref cimport PyWeakref_NewRef
from cpython.weakref cimport (
PyWeakref_GetObject,
PyWeakref_NewRef,
)
from cython cimport Py_ssize_t

import numpy as np
Expand All @@ -26,6 +30,10 @@ from pandas._libs.util cimport (
)


cdef extern from "Python.h":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised that this also makes a difference - Cython doesn't translate that is None call for a cdef object into the C-equivalent expression already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does! But PyWeakref_GetObject returns a PyObject* which is not comparable to a PyObject (not-ponter) and None is the latter type. This is the reason for this quite ugly hack. If you know a better way I would really like to hear about it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did Cython give you a compilation error? I would expect None to be a pointer to a PyObject as well and for Cython to handle this within its native syntax.

Of course I am wrong a lot! So I might have the wrong mental model for what Cython is doing, but worth double checking this can't be expressed more naturally first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't want to go down a rabbit hole on this - just double check real quick. If you spend anything more than 2 minutes trying to find an answer, let's just roll with what we've got

Copy link
Contributor Author

@Tolker-KU Tolker-KU Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all. Cython gives this error when using None

Error compiling Cython file:
------------------------------------------------------------
...
        # all these objects stay alive, e.g. df.items() for wide DataFrames
        # see GH#55245 and GH#55008
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks
                if PyWeakref_GetObject(ref) is not None
                                            ^
------------------------------------------------------------

pandas/_libs/internals.pyx:914:44: Invalid types for 'is_not' (PyObject *, Python object)

PyObject* Py_None


@cython.final
@cython.freelist(32)
cdef class BlockPlacement:
Expand Down Expand Up @@ -902,7 +910,8 @@ cdef class BlockValuesRefs:
# see GH#55245 and GH#55008
if force or len(self.referenced_blocks) > self.clear_counter:
self.referenced_blocks = [
ref for ref in self.referenced_blocks if ref() is not None
ref for ref in self.referenced_blocks
if PyWeakref_GetObject(ref) != Py_None
]
nr_of_refs = len(self.referenced_blocks)
if nr_of_refs < self.clear_counter // 2:
Expand Down