Skip to content

bpo-42825: Enable /OPT:REF #24098

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 19, 2021
Merged

Conversation

Austin-Lamb
Copy link
Contributor

@Austin-Lamb Austin-Lamb commented Jan 4, 2021

This PR enables the /OPT:REF linker optimization on Windows, allowing the linker to discard uncalled code. This is analogous to --gc-sections for ld on linux, if you're familiar with that. Basically the linker can determine that a function is never called and not include it in the final binary.

Note that I had to explicitly keep /OPT:ICF disabled because enabling Identical COMDAT Folding (ICF) causes test failures. This is because Python310.dll depends on some identical things not folding together (such as wrap_binary_func and wrap_binary_func_l) as their addresses are compared to determine equality in some important places. There are even helpful tests that verify COMDAT folding is disabled which helped me catch this 😊.

This could be backported to previous versions if desired, it should be simple and safe to do to whichever versions are deployed at scale in the wild, but I'll leave that to the community and maintainers to decide, I didn't want to go through the backport process.

Note: I am not sure how to build for PGO locally, but I think it's good to have this on for Configuration == PGInstrument and/or PGUpdate, so I used a Condition of "!= Debug". This should be verified in the real build pipeline that does PGO, or let me know how I can do it locally.

In terms of the size savings, here's a few example binaries and then the total across all of them in the build output folder - all measurements were done for amd64 Release binaries. It's over a 10% savings in the size of everything, and some binaries are considerably more than that.

Binary Size before Size after Delta (%)
py.exe 1,024,000 735,744 -288,256 (-28.2%)
python.exe 93,696 91,136 -2,560 (-2.7%)
python_uwp.exe 225,792 140,288 -85,504 (-37.9%)
python310.dll 5,268,992 4,753,408 -515,584 (-9.8%)
sqlite3.dll 1,505,280 1,376,256 -129,024 (-8.6%)
Total of .exe/.dll/.pyd 21,632,408 19,278,744 -2,353,664 (-10.9%)

https://bugs.python.org/issue42825

…causes test failures as Python310.dll depends on some identical things not folding together (such as wrap_binary_func and wrap_binary_func_l).
@Austin-Lamb
Copy link
Contributor Author

@zooba - anything I should do to help this get reviewed?

Thanks!

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Feb 19, 2021
@zooba zooba merged commit b4af629 into python:master Feb 19, 2021
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
We explicitly disable /OPT:ICF as some manual optimisations depend on some functions still having distinct pointers (such as wrap_binary_func and wrap_binary_func_l).
geofft added a commit to geofft/cpython that referenced this pull request May 25, 2025
"Identical code folding" (ICF) is the feature of an optimizer to find that two
functions have the same code and that they can therefore be deduplicated
in the binary. While this is usually safe, it can cause observable
behavior differences if the program relies on the fact that the two
functions have different addresses.

CPython relies on this in (at least) Objects/typeobject.c, which defines
two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same
implementation, and stores their addresses in the slotdefs array. If
these two functions have the same address, update_one_slot() in that
file will fill in slots it shouldn't, causing, for instances,
classes defined in Python that inherit from some built-in types to
misbehave.

As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode,
where it looks to see if there are any uses of a function symbol outside
function calls (e.g., relocations in data sections) and skips ICF on
such functions. The intent is that this avoids observable behavior
differences but still saves storage as much as possible.

This version is about two months old at the time of writing. To support
older LLVM versions, we have to turn off ICF entirely.

This problem was previously noticed for Windows/MSVC in python#53093 (and
again in python#24098), where the default behavior of PGO is to enable ICF
(which they expand to "identical COMDAT folding") and we had to turn it
off.
geofft added a commit to geofft/cpython that referenced this pull request May 25, 2025
"Identical code folding" (ICF) is the feature of an optimizer to find that two
functions have the same code and that they can therefore be deduplicated
in the binary. While this is usually safe, it can cause observable
behavior differences if the program relies on the fact that the two
functions have different addresses.

CPython relies on this in (at least) Objects/typeobject.c, which defines
two functions wrap_binaryfunc() and wrap_binaryfunc_l() with the same
implementation, and stores their addresses in the slotdefs array. If
these two functions have the same address, update_one_slot() in that
file will fill in slots it shouldn't, causing, for instances,
classes defined in Python that inherit from some built-in types to
misbehave.

As of LLVM 20 (llvm/llvm-project#116275), BOLT has a "safe ICF" mode,
where it looks to see if there are any uses of a function symbol outside
function calls (e.g., relocations in data sections) and skips ICF on
such functions. The intent is that this avoids observable behavior
differences but still saves storage as much as possible.

This version is about two months old at the time of writing. To support
older LLVM versions, we have to turn off ICF entirely.

This problem was previously noticed for Windows/MSVC in python#53093 (and
again in python#24098), where the default behavior of PGO is to enable ICF
(which they expand to "identical COMDAT folding") and we had to turn it
off.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants