Performance degradation in 2.1.3 #605

Closed
@jeblair

Hi,

The recent commit f1a82e4 has caused a significant performance
degradation in GitPython. As an example, one of our unit tests,
which performs operations on about 42 repositories, normally runs
in 6.7 seconds; with the two new gc.collect() calls in f1a82e4 it
now takes 24.2 seconds.

The gitdb.util.mman.collect() call does not seem to be expensive,
only gc.collect().
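
To make the cost concrete, here is a rough sketch (not taken from GitPython or from our test suite) that times gc.collect() with and without a large number of live container objects. The helper names `time_collect` and `compare` are hypothetical, and the numbers will vary by machine and interpreter. The point is only that a full collection has to traverse every tracked container object, so its cost grows with the size of the live heap rather than with the amount of garbage.

```python
import gc
import time


def time_collect(repeats=5):
    """Return the average wall-clock time of one full gc.collect() call."""
    start = time.perf_counter()
    for _ in range(repeats):
        gc.collect()
    return (time.perf_counter() - start) / repeats


def compare(n_objects=200_000):
    """Time gc.collect() before and after allocating many live lists."""
    baseline = time_collect()
    # Lists are container objects tracked by the collector, so each
    # full collection has to visit them even though none are garbage.
    live = [[i] for i in range(n_objects)]
    loaded = time_collect()
    del live
    return baseline, loaded


if __name__ == "__main__":
    baseline, loaded = compare()
    print(f"few live objects:  {baseline * 1000:.3f} ms per collect")
    print(f"many live objects: {loaded * 1000:.3f} ms per collect")
```

If something like this holds in a real application, an unconditional gc.collect() in a frequently called code path gets paid for on every call, whether or not there is any cyclic garbage to reclaim.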

It's worth noting that most objects in Python are freed by
reference counting, while gc.collect() invokes the garbage
collector, which is only needed to free objects with reference
cycles. Normally, the garbage collector runs periodically and
does not need to be explicitly invoked.
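
A small illustration of the distinction described above, assuming CPython (other implementations may not use reference counting). The `Node` class and both function names are hypothetical and exist only for this sketch. An object with no cycles is freed as soon as its last reference goes away; two objects that refer to each other stay alive until the cycle collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder class used only for this illustration."""


def freed_by_refcount():
    """An object with no cycles is freed as soon as its last reference goes."""
    obj = Node()
    ref = weakref.ref(obj)
    del obj  # reference count drops to zero; CPython frees it immediately
    return ref() is None


def freed_only_by_gc():
    """Objects in a reference cycle survive until the cycle collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from interfering
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a -> b -> a forms a cycle
        ref = weakref.ref(a)
        del a, b  # reference counts stay above zero because of the cycle
        alive_before = ref() is not None
        gc.collect()  # the cycle collector finds and frees the pair
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_by_refcount())
    print(freed_only_by_gc())
```

In the second function, gc.collect() is called explicitly only because automatic collection was switched off for the demonstration; with the collector enabled, the cycle would be reclaimed during a later periodic run.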

Since the garbage collector should only have an effect on objects
that no longer have references from active frames, it's not clear
why these two calls to gc.collect() are required. It would be nice
to know whether there are other ways to address what they are
attempting to fix. If they truly are required in some
circumstances, it would be good to know what those are and whether
there are cases where we can avoid incurring the expense. And if
they are not required, we should remove them.

Thanks!
