-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[Modules] Don't fail when an unused textual header is missing. #138227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
According to the documentation > A header declaration that does not contain `exclude` nor `textual` specifies a header that contributes to the enclosing module. Which means that `exclude` and `textual` header don't contribute to the enclosing module and their presence isn't required to build such a module. The keywords tell clang how a header should be treated in a context of the module but they don't add headers to the module. When a textual header *is* used, clang still emits "file not found" error pointing to the location where the missing file is included.
@llvm/pr-subscribers-clang Author: Volodymyr Sapsai (vsapsai) ChangesAccording to the documentation Which means that When a textual header is used, clang still emits "file not found" error pointing to the location where the missing file is included. Full diff: https://github.com/llvm/llvm-project/pull/138227.diff 3 Files Affected:
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index c2f13fa48d464..1ba02b919a22a 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -310,8 +310,10 @@ void ModuleMap::resolveHeader(Module *Mod,
} else if (Header.HasBuiltinHeader && !Header.Size && !Header.ModTime) {
// There's a builtin header but no corresponding on-disk header. Assume
// this was supposed to modularize the builtin header alone.
- } else if (Header.Kind == Module::HK_Excluded) {
- // Ignore missing excluded header files. They're optional anyway.
+ } else if ((Header.Kind == Module::HK_Excluded) ||
+ (Header.Kind == Module::HK_Textual)) {
+ // Ignore excluded and textual header files as a module can be built with
+ // such headers missing.
} else {
// If we find a module that has a missing header, we mark this module as
// unavailable and store the header directive for displaying diagnostics.
diff --git a/clang/test/Modules/Inputs/submodules/module.modulemap b/clang/test/Modules/Inputs/submodules/module.modulemap
index 1c1b76a08969e..9e8143b8101de 100644
--- a/clang/test/Modules/Inputs/submodules/module.modulemap
+++ b/clang/test/Modules/Inputs/submodules/module.modulemap
@@ -30,3 +30,7 @@ module missing_umbrella_with_inferred_submodules {
module * { export * }
export *
}
+
+module missing_textual_header {
+ textual header "missing_textual.h"
+}
diff --git a/clang/test/Modules/missing-header.m b/clang/test/Modules/missing-header.m
index c162e1b5f08b3..84d82e5ceda32 100644
--- a/clang/test/Modules/missing-header.m
+++ b/clang/test/Modules/missing-header.m
@@ -8,6 +8,9 @@
@import missing_unavailable_headers.not_missing; // OK
// CHECK-NOT: missing_unavailable_headers
+@import missing_textual_header; // OK
+// CHECK-NOT: missing_textual_header
+
@import missing_headers;
// CHECK: module.modulemap:15:27: error: header 'missing.h' not found
// CHECK: could not build module 'missing_headers'
|
@llvm/pr-subscribers-clang-modules Author: Volodymyr Sapsai (vsapsai) ChangesAccording to the documentation Which means that When a textual header is used, clang still emits "file not found" error pointing to the location where the missing file is included. Full diff: https://github.com/llvm/llvm-project/pull/138227.diff 3 Files Affected:
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index c2f13fa48d464..1ba02b919a22a 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -310,8 +310,10 @@ void ModuleMap::resolveHeader(Module *Mod,
} else if (Header.HasBuiltinHeader && !Header.Size && !Header.ModTime) {
// There's a builtin header but no corresponding on-disk header. Assume
// this was supposed to modularize the builtin header alone.
- } else if (Header.Kind == Module::HK_Excluded) {
- // Ignore missing excluded header files. They're optional anyway.
+ } else if ((Header.Kind == Module::HK_Excluded) ||
+ (Header.Kind == Module::HK_Textual)) {
+ // Ignore excluded and textual header files as a module can be built with
+ // such headers missing.
} else {
// If we find a module that has a missing header, we mark this module as
// unavailable and store the header directive for displaying diagnostics.
diff --git a/clang/test/Modules/Inputs/submodules/module.modulemap b/clang/test/Modules/Inputs/submodules/module.modulemap
index 1c1b76a08969e..9e8143b8101de 100644
--- a/clang/test/Modules/Inputs/submodules/module.modulemap
+++ b/clang/test/Modules/Inputs/submodules/module.modulemap
@@ -30,3 +30,7 @@ module missing_umbrella_with_inferred_submodules {
module * { export * }
export *
}
+
+module missing_textual_header {
+ textual header "missing_textual.h"
+}
diff --git a/clang/test/Modules/missing-header.m b/clang/test/Modules/missing-header.m
index c162e1b5f08b3..84d82e5ceda32 100644
--- a/clang/test/Modules/missing-header.m
+++ b/clang/test/Modules/missing-header.m
@@ -8,6 +8,9 @@
@import missing_unavailable_headers.not_missing; // OK
// CHECK-NOT: missing_unavailable_headers
+@import missing_textual_header; // OK
+// CHECK-NOT: missing_textual_header
+
@import missing_headers;
// CHECK: module.modulemap:15:27: error: header 'missing.h' not found
// CHECK: could not build module 'missing_headers'
|
The change for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking at the missing_textual_header
module alone this is a bit odd, but that's just the semantics of textual headers. Really textual headers just exist to control the non-modular include warning, so this is fine.
Thanks for the review! |
We have bisected build breakages internally at google at this change. Ironically the compiler complains about a missing header while before it didn't. We do not have a reproducer yet. |
Here's a reproducer for our breakage: // RUN: rm -rf %t
// RUN: split-file %s %t
//
// First, build a module with a header.
//
// RUN: %clang_cc1 -fmodules -fmodule-map-file=%t/modules1.map -fmodule-name=a -emit-module -xc++ -fmodules-embed-all-files -o %t/a.pcm %t/modules1.map
// RUN: %clang_cc1 -fmodules -fmodule-map-file=%t/modules1.map -fmodule-map-file=%t/modules2.map -fmodule-name=b -emit-module \
// RUN: -fmodule-file=%t/a.pcm -xc++ -fmodules-embed-all-files -o %t/b.pcm %t/modules2.map
//
// Remove the header and check we can still build the code that finds it in a PCM.
//
// RUN: rm %t/foo.h
// RUN: %clang_cc1 -fmodules -fmodule-map-file=%t/modules2.map -fmodule-file=%t/b.pcm -fsyntax-only %t/use.cpp
//--- modules1.map
module a {
module foo {
header "foo.h"
export *
}
export *
}
//--- modules2.map
module all_textual {
module foo {
textual header "foo.h"
export *
}
module wrap_foo {
textual header "wrap_foo.h"
export *
}
export *
}
module b {
module wrap_foo {
private header "wrap_foo.h"
export *
}
export *
}
//--- foo.h
#ifndef FOO_H
#define FOO_H
void foo();
#endif
//--- wrap_foo.h
#include "foo.h"
//--- use.cpp
#include "wrap_foo.h"
void test() {
foo();
} It previously built fine because the llvm-project/clang/lib/Lex/ModuleMap.cpp Line 326 in 09fd8f0
and therefore it made us pick the module llvm-project/clang/lib/Lex/ModuleMap.cpp Line 618 in 09fd8f0
After the change, we no longer mark I know what we do is cheesy and we might need to reconsider how to approach it, but this is a breaking change and it would be great to find a resolution for it before committing to the new behavior. Could we revert this while we discuss the potential ways out of it? |
@vsapsai Please revert if there's no obvious fix. |
What is your timeframe for stopping putting the same header ("wrap_foo.h") into multiple modules? If you do that it was never guaranteed which module would be used for a header. I'm willing to unblock you but I'd like to know if you are committed to fixing the issue in the module maps. |
I don't think we can ever get rid of that pattern and we will keep relying on Clang supporting this. I am sure there are ways to meet both your needs and our needs. To reiterate, Clang supports having the same header in multiple module maps and we don't view our current setup as having at a high level. I just want to explicit that our stance is that there are no issues that need "fixing in the module maps". How we achieve this today is shaky (relying on availability doesn't look right), but we absolutely need:
|
It's weird that clang picked |
Interesting, changing
But seems we are closer to this behavior than I expected. |
@vsapsai I guess, it's a good sign? :) Do you see how our use case can be supported by a trivial and low-risk forward fix? If not, I'd insist on a revert before we can figure out the way forward. We can run this sort of a change through our testing before it relands, and ensure it doesn't break our code. We did this on multiple occasions before (especially for changes that touch Clang header modules). |
I still don't understand what is your use case. My [very ungenerous] interpretation is "to access a file you were told not to access without triggering any tripwires". And I'm not sure such a use case is worth supporting. But I understand that is only my own interpretation which can be incorrect. And I want to believe you have a better use case that doesn't rely on accessing private headers. |
The code I shared compiles with no error before this change and has errors now. It is okay to discuss if our use-case is "worth supporting", but we heavily rely on this behavior and this has already broken us and this puts us in a corner. Could we revert this and have a discussion on how to satisfy both our use cases on equal grounds? |
Ok, I'm going to revert the change to help you out. But I'm going to re-land it in a week or when you are ready, whichever comes first. There was no indication there is anything wrong with the change or if the issue is wide-spread. And if a single company relies on an existing side effect, there is no obligation for everyone else to maintain such a side effect forever. I'm willing to help figure out how to achieve the desired result in a different way. But for that need to know what is the desired result. |
I would like a second opinion on your proposed course of action from @llvm/clang-area-team. It is concerning to me that we as a company are given a week to adjust to the change and there is no discussion either about alternative approaches or the scale of the changes needed to support this. We do rely on all kinds of weird behaviors and side effects in Clang, and are happy to adjust to make the best for the community. I am pretty sure we can find a way to make this work too, but not under undue pressure. I agree that maintaining anything for a single company is not something that the community must or should do, but we are also not asking for maintaining this behavior indefinitely. We ask to give us reasonable time to understand the problem and adjust and engage with us to help fix this. I also do not know how to react to the xkcd, it seems rather snarky and dismissive. I hope it was not intended that way, but I still wanted to call it out that it came over like this to me. With all that being said, thank you for reverting the commit, this really helps relieve some pressure from the discussion. |
Back to the original issue.
We would need to example I shared above to keep working. We rely on an optimization that picks a module with a modular header and imports it rather than a textual one whenever it's available. The reason we cannot change that easily is that:
Any change on our side is either in compile time or will take long to implement. We already prefer modules where header is modular over textual, but we the non-private vs private header takes higher priority.
If that does not work by default, we would probably need a ( |
Can you clarify, are you saying this pattern of having a header in two different modules has to keep working indefinitely, or are you willing to migrate off of it? I don't understand what the reason is for the header to be in two different modules in the first place. It sounds like it is somehow related to a build optimization, but what exactly is that? |
What makes you think that no discussion is possible? For the record, "we need this to work, you must make it work" is not a discussion.
|
Do you control the module map that specifies a header as private? Does it need to stay private? Personally, to me it looks like a bug that a private header is getting picked up [somewhat accidentally]. I'm not planning to fix this bug, just that relying on outright buggy behavior doesn't look promising long-term. |
(This is mostly written wearing my "Googler" hat, not the clang area team lead hat) There's been a lot of discussion of what the exact use case is. I think it's helpful to share the context that Google essentially uses Clang header modules to wrap generated proto headers, and this is critical for enabling our builds to scale. The details here about private modules, module wrappers, and textual inclusions are Blazel-generated, low-level implementation details that don't matter in the grand scheme of things. The intention is to provide a IWYU-clean, header module programming model that is efficient as possible, meaning sometimes the large generated proto header needs to be included textually, and sometimes handled as a module import. I believe the arrangement of the module maps is changeable, but it is a low-level detail at the bottom of a performance critical system, so the lead time for changes is probably measured in months, not days, and probably requires significant effort. Focusing back on the commit at hand, it would have forced our internal build system to provide thousands more header files as inputs to compilations across the system, and we can't afford to do that. We backed it out of the last few internal release candidates, but carrying it over requires manual effort. The purpose of the commit seems to have been to emit fewer errors, not more. I'm sure Google's usage of header modules relies on internal implementation details that it shouldn't, but we appreciate having more time to debug the issue, so thank you for the accommodation. And, in the end, the folks debugging this (Ilya & co) are busy working on clang header module reproduction and reduction tools: https://github.com/emaxx-google/cvise/branches @emaxx-google Effective open source collaboration is all about balancing the needs of contributors, and while this work on cvise and reblaze isn't happening in LLVM, I hope the value of these contributions outweighs the inconvenience of dealing with Google's reliance on header module implementation details. |
Interestingly enough, @vsapsai is also working on reproducers for explicitly built clang modules, and that is how he ran into the compiler bug about a not-included textual header. When discovering which files to capture for the reproduction, we technically shouldn't need to also include this header input since it's not needed to build the module, it seems like that approach would also help for reducing reproducers. |
Do you include proto headers directly or through the wrapper headers? Are there any important macros in proto headers? Do you need to abuse macros to get something out of proto headers? Kinda like My initial reaction is to offer a solution by not using private headers (especially that it is so easily subverted by textual headers) and to rely on |
Our whole build infrastructure relies on producing those module maps.
Things are a little more complicated on top of that, we actually have a submodule per header in each module matching a target. And we also rely on That way we enforce the structure from our BUILD rules: headers are only allowed to include headers from their direct dependencies and only the non-private ones. I believe the way access checking of headers is done today, Clang allows us to have a dependency on a header that you could get through a module that has it publicly available, even if it ends up picking a module where the header was actually private. In turn, this allows us to add optimization by limiting the number of header and modules we need to ship to distributed build for each of the compile actions. I know there are many details missing from this explanation, but I hope this gives a start. Changing how our build is done within all of Google is obviously hard. I understand that having multiple modules for a header may not be something others care about, but we need it and are willing to put the work to make it work too. Does that explanation help to clarify our use case or do folks still have questions?
Directly.
I don't think there are. I am not sure why it's important.
As mentioned above, we rely on private headers to check we do not accidentally include a header in
I find exchanges like this non-respectful and non-considerate. Please give us some benefit of a doubt, we are trying to explain our use-case and we are happy to put in the work to make both our cases work.
Could you directly quote something in my writing that you interpreted like this? |
Firstly, thank you for your responsiveness @ilya-biryukov, and for helping us understand how this bug fix impacts Google's build. For all individuals involved in this discussion, I would like to remind that our code of conduct does ask us to understand each other when we disagree. In general, though, I share the same concern as @vsapsai about leaving this patch reverted for too long. But in fear of speaking too much on behalf of others, perhaps it would be helpful for us to have a video call, coordinated by the area team, or otherwise, to understand how to move forward if we cannot come to a resolution soon. |
I have a concrete proposal to move forward, #141792. I have checked that the failure we've seen in our infrastructure goes away with both changes applied. Given that multiple modules owning the same header seem to be a rare case outside of Google, I hope this also won't cause any breakages and the change acceptable (it's only changing the logic that kicks in when we do have multiple modules for the same header). I really appreciate the steps forward here and would be happy to hop on a video call if necessary. I would still suggest to discuss #141792 first and if that stalls, we can schedule a call. Does that work? Unfortunately, I will be OOO until next Monday and won't be able to respond here. |
Many thanks for engaging in this discussion and sorry for the delays in my responses too. |
I am sorry, after re-reading the words I realize it can come across as condescending, I didn't mean it that way. I'm sorry and I understand you have your own requirements and constraints. You've mentioned that you'll be out, so I'm not going to re-land the patch while the conversation is still ongoing.
Giving the benefit of the doubt I interpret your request as "what I can do in the future to make things better". I would say that
and the lack of the commitment to any work on your side made it sound as a demand. I'm sorry that my replies came across as hostile and snarky, I didn't mean it that way. Though I'll admit I'm not thrilled about your current approach, that's what I was communicating. But that's what you have right now and there is no easy way to change it fast. |
Thanks for the detailed explanation. I'm relieved that you aren't a heavy user of textual headers to make macros work (this is one of the use cases for textual headers). After reading about Bazel system I think a big issue is the mismatch between the way Bazel rules work and the way modules work. Specifically, in Bazel you know through which dependency a header is included. So you can prohibit it for some dependencies while allow it for others. But with modules you cannot specify through which module a header is accessed. Though depending on the module map granularity and the usage of module maps you can replicate some kind of those checks. I need to think about it more. But I had an idea if |
We reviewed this thread today at the Clang Area Team meeting and it seems like y'all have perhaps found a solution everyone is happy with. However, if you'd like a meeting or if there's some other way we can support you, we're definitely happy to help!
The usual path is: relevant maintainer(s), then lead maintainer, then relevant area team(s), then project council. But there's really no "wrong" group to escalate to, so this was a perfectly reasonable escalation path. |
I think I understand the logic behind your approach.
And the same header in multiple modules just happened to be the case, it wasn't a deliberate design choice. |
What about the approach to provide all_textual.pcm to use.cpp and for clang to find foo.h in all_textual.pcm? It doesn't work right now but I'm not sure that it shouldn't (there can be good reasons why it shouldn't work, tbh). |
According to the documentation
Which means that
exclude
andtextual
header don't contribute to the enclosing module and their presence isn't required to build such a module. The keywords tell clang how a header should be treated in a context of the module but they don't add headers to the module.When a textual header is used, clang still emits "file not found" error pointing to the location where the missing file is included.