PERF: improve find_common_type perf for special case of all-equal dtypes #44594

Merged: @jreback merged 5 commits into pandas-dev:master from perf-find_common_type on Nov 25, 2021

Conversation

jorisvandenbossche
Member

@jorisvandenbossche commented Nov 23, 2021

In several benchmarks, find_common_type spends a considerable amount of time checking whether all dtypes are equal. This can be improved by 1) assuming all inputs are already dtype objects (is_dtype_equal is generic and first converts its inputs to dtypes, which adds overhead) and 2) moving the generator-based for loop into a small Cython helper:

types = [np.dtype("float64") for _ in range(10000)]

In [2]: %timeit pd.core.dtypes.cast.find_common_type(types)
7.18 ms ± 627 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master
368 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- PR

(based on some testing, both aspects above contribute significantly to the improvement)
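To make the first point concrete, here is a minimal pure-Python sketch of an all-equal check that compares dtype objects directly instead of going through a generic conversion step. The name `dtypes_all_equal_sketch` is hypothetical, and this is not the Cython helper merged in this PR; it only illustrates the idea under the assumption that every element is already an `np.dtype` (or an object that supports `==`).

```python
import numpy as np


def dtypes_all_equal_sketch(types):
    """Return True if every element of a non-empty list equals the first one.

    Assumes the elements are already dtype objects, so no conversion
    (and none of its overhead) is needed before comparing.
    """
    first = types[0]
    for t in types[1:]:
        try:
            if not (t == first):
                return False
        except (TypeError, AttributeError):
            # Treat objects that cannot be compared as "not equal".
            return False
    return True


same = [np.dtype("float64") for _ in range(5)]
mixed = same + [np.dtype("int64")]

print(dtypes_all_equal_sketch(same))   # True
print(dtypes_all_equal_sketch(mixed))  # False
```

The loop exits on the first mismatch, so the all-equal case is the expensive one, which is the case the benchmark above measures.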

@jorisvandenbossche added the Performance (memory or execution speed performance) label Nov 23, 2021
@@ -3038,3 +3038,23 @@ def is_bool_list(obj: list) -> bool:

    # Note: we return True for empty list
    return True


def dtypes_all_equal(list types):
Member

can use python-style annotation with bool return

Member Author

It seems you can't combine that with `not None`, so I left it as is, but I added it to the .pyi file alongside all the other functions in lib.pyx.
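For readers unfamiliar with the Cython `not None` clause: an argument declared as `list` in Cython accepts `None` by default, and `not None` makes the call fail with a TypeError instead. A rough pure-Python analogue, carrying the kind of annotated signature that a stub file could declare, might look like the sketch below. The name `dtypes_all_equal_checked` and the plain `list` annotation are illustrative assumptions, not the contents of the pandas stub file.

```python
def dtypes_all_equal_checked(types: list) -> bool:
    # Approximates Cython's `list types not None`: reject None
    # (and any other non-list argument) before touching the data.
    if not isinstance(types, list):
        raise TypeError(
            f"Argument 'types' must be a list, got {type(types).__name__}"
        )
    # Like the sketch's real counterpart, this expects a non-empty list.
    first = types[0]
    return all(t == first for t in types[1:])


try:
    dtypes_all_equal_checked(None)
except TypeError as exc:
    print("rejected:", exc)
```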

@jbrockmendel
Member

nice speedup!

@jreback added this to the 1.4 milestone Nov 23, 2021
@@ -3038,3 +3038,25 @@ def is_bool_list(obj: list) -> bool:

    # Note: we return True for empty list
    return True


def dtypes_all_equal(list types not None):
Contributor

is this a type?

Contributor

add a return type

@jreback merged commit d621e8e into pandas-dev:master Nov 25, 2021
@jorisvandenbossche deleted the perf-find_common_type branch November 25, 2021 19:16