-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-42825: Enable /OPT:REF #24098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
bpo-42825: Enable /OPT:REF #24098
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…causes test failures as Python310.dll depends on some identical things not folding together (such as wrap_binary_func and wrap_binary_func_l).
@zooba - anything I should do to help this get reviewed? Thanks! |
This PR is stale because it has been open for 30 days with no activity. |
adorilson
pushed a commit
to adorilson/cpython
that referenced
this pull request
Mar 13, 2021
We explicitly disable /OPT:ICF as some manual optimisations depend on some functions still having distinct pointers (such as wrap_binary_func and wrap_binary_func_l).
geofft
added a commit
to geofft/cpython
that referenced
this pull request
May 25, 2025
"Identical code folding" (ICF) is the feature of an optimizer to find that two functions have the same code and that they can therefore be deduplicated in the binary. While this is usually safe, it can cause observable behavior differences if the program relies on the fact that the two functions have different addresses. CPython relies on this in (at least) Objects/typeobject.c, which defines two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same implementation, and stores their addresses in the slotdefs array. If these two functions have the same address, update_one_slot() in that file will fill in slots it shouldn't, causing, for instances, classes defined in Python that inherit from some built-in types to misbehave. As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode, where it looks to see if there are any uses of a function symbol outside function calls (e.g., relocations in data sections) and skips ICF on such functions. The intent is that this avoids observable behavior differences but still saves storage as much as possible. This version is about two months old at the time of writing. To support older LLVM versions, we have to turn off ICF entirely. This problem was previously noticed for Windows/MSVC in python#53093 (and again in python#24098), where the default behavior of PGO is to enable ICF (which they expand to "identical COMDAT folding") and we had to turn it off.
geofft
added a commit
to geofft/cpython
that referenced
this pull request
May 25, 2025
"Identical code folding" (ICF) is the feature of an optimizer to find that two functions have the same code and that they can therefore be deduplicated in the binary. While this is usually safe, it can cause observable behavior differences if the program relies on the fact that the two functions have different addresses. CPython relies on this in (at least) Objects/typeobject.c, which defines two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same implementation, and stores their addresses in the slotdefs array. If these two functions have the same address, update_one_slot() in that file will fill in slots it shouldn't, causing, for instances, classes defined in Python that inherit from some built-in types to misbehave. As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode, where it looks to see if there are any uses of a function symbol outside function calls (e.g., relocations in data sections) and skips ICF on such functions. The intent is that this avoids observable behavior differences but still saves storage as much as possible. This version is about two months old at the time of writing. To support older LLVM versions, we have to turn off ICF entirely. This problem was previously noticed for Windows/MSVC in python#53093 (and again in python#24098), where the default behavior of PGO is to enable ICF (which they expand to "identical COMDAT folding") and we had to turn it off.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR enables the /OPT:REF linker optimization on Windows, allowing the linker to discard uncalled code. This is analogous to --gc-sections for ld on linux, if you're familiar with that. Basically the linker can determine that a function is never called and not include it in the final binary.
Note that I had to explicitly keep /OPT:ICF disabled because enabling Identical COMDAT Folding (ICF) causes test failures. This is because Python310.dll depends on some identical things not folding together (such as wrap_binary_func and wrap_binary_func_l) as their addresses are compared to determine equality in some important places. There are even helpful tests that verify COMDAT folding is disabled which helped me catch this 😊.
This could be backported to previous versions if desired, it should be simple and safe to do to whichever versions are deployed at scale in the wild, but I'll leave that to the community and maintainers to decide, I didn't want to go through the backport process.
Note: I am not sure how to build for PGO locally, but I think it's good to have this on for Configuration == PGInstrument and/or PGUpdate, so I used a Condition of "!= Debug". This should be verified in the real build pipeline that does PGO, or let me know how I can do it locally.
In terms of the size savings, here's a few example binaries and then the total across all of them in the build output folder - all measurements were done for amd64 Release binaries. It's over a 10% savings in the size of everything, and some binaries are considerably more than that.
https://bugs.python.org/issue42825