Skip to content

modularize the config module bootstrap #141272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

Shourya742
Copy link
Contributor

@Shourya742 Shourya742 commented May 20, 2025

Currently, our config module is quite large over 3,000 lines, and handles a wide range of responsibilities. This PR aims to break it down into smaller, more focused submodules to improve readability and maintainability:

  • toml: Introduces a dedicated toml submodule within the config module. Its sole purpose is to define configuration-related structs along with their corresponding deserialization logic. It also contains the parse_inner method, which serves as the central function for extracting relevant information from the TOML structs and constructing the final configuration.

  • rust, dist, install, llvm, build, gcc, and others: Each of these modules contains TOML subsections specific to their domain, along with the logic necessary to convert them into parts of the final configuration struct.

  • common: Contains shared types and enums used across multiple TOML subsections.

  • parsing: Houses the logic that integrates all the TOML subsections into the complete configuration struct.

r? @Kobzol

@rustbot
Copy link
Collaborator

rustbot commented May 20, 2025

This PR modifies src/bootstrap/src/core/config.

If appropriate, please update CONFIG_CHANGE_HISTORY in src/bootstrap/src/utils/change_tracker.rs.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels May 20, 2025
@rustbot

This comment has been minimized.

@Shourya742 Shourya742 force-pushed the 2025-05-18-modularize-config-module branch from 6a1e868 to b8374d6 Compare May 20, 2025 03:17
@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general direction of this. Could you perhaps split the changes into multiple commits though? It would be easier to review. Thank you :)

@Shourya742 Shourya742 force-pushed the 2025-05-18-modularize-config-module branch from b8374d6 to ad21211 Compare May 21, 2025 17:19
@Kobzol
Copy link
Contributor

Kobzol commented May 22, 2025

Kind of crazy just how much code is there in config definition and parsing. The changes LGTM, but since this is a large-ish change, I'd like to hear also from other members of t-bootstrap if they are OK with it. CC @onur-ozkan @jieyouxu.

@jieyouxu jieyouxu self-assigned this May 24, 2025
Comment on lines 302 to 307
/// Since we use `#[serde(deny_unknown_fields)]` on `TomlConfig`, we need a wrapper type
/// for the "change-id" field to parse it even if other fields are invalid. This ensures
/// that if deserialization fails due to other fields, we can still provide the changelogs
/// to allow developers to potentially find the reason for the failure in the logs..
#[derive(Deserialize, Default)]
pub(crate) struct ChangeIdWrapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have types module but this module also includes bunch of types. Right now it seems like there is no clear distinction between these modules and people will start putting things all around without having a clear clue. If we want to have module for per task, we have to make it super explicit/clear and have it well documented.

Comment on lines 195 to 228
// NOTE: can't derive(Deserialize) because the intermediate trip through toml::Value only
// deserializes i64, and derive() only generates visit_u64
impl<'de> Deserialize<'de> for DebuginfoLevel {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
use serde::de::Error;

Ok(match Deserialize::deserialize(deserializer)? {
StringOrInt::String(s) if s == "none" => DebuginfoLevel::None,
StringOrInt::Int(0) => DebuginfoLevel::None,
StringOrInt::String(s) if s == "line-directives-only" => {
DebuginfoLevel::LineDirectivesOnly
}
StringOrInt::String(s) if s == "line-tables-only" => DebuginfoLevel::LineTablesOnly,
StringOrInt::String(s) if s == "limited" => DebuginfoLevel::Limited,
StringOrInt::Int(1) => DebuginfoLevel::Limited,
StringOrInt::String(s) if s == "full" => DebuginfoLevel::Full,
StringOrInt::Int(2) => DebuginfoLevel::Full,
StringOrInt::Int(n) => {
let other = serde::de::Unexpected::Signed(n);
return Err(D::Error::invalid_value(other, &"expected 0, 1, or 2"));
}
StringOrInt::String(s) => {
let other = serde::de::Unexpected::Str(&s);
return Err(D::Error::invalid_value(
other,
&"expected none, line-tables-only, limited, or full",
));
}
})
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a parsing logic.

@onur-ozkan
Copy link
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 24, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 24, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@jieyouxu
Copy link
Member

cc @clubby789

@Shourya742
Copy link
Contributor Author

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 24, 2025
@jieyouxu jieyouxu removed their assignment May 26, 2025
@jieyouxu
Copy link
Member

Vibes-wise, organization seems fine to me.

@onur-ozkan
Copy link
Member

onur-ozkan commented May 28, 2025

I still have this concern (also see this as an additional ref). With the current approach, adding new things will always be unclear for some people. Even now, for example, we have parsing.rs and types.rs modules but some parsing logic has leaked into other modules and some types have leaked to some other modules as well.

What I would suggest instead is organizing the modules based on the configuration field. For example, mod.rs could represent the root level of the configuration and the rest could be split into the sub fields like rust.rs for [rust], llvm.rs for [llvm], target_specific.rs for things like [target.x86_64-unknown-linux-gnu], etc.

With this structure it would make it very explicit where things belong and anything that doesn't fit this rule can be easily caught during PR reviews.

Shourya742 added 10 commits May 29, 2025 10:28
…uctures

This commit introduces a new `toml` submodule within the `src/bootstrap/config`
directory, fundamentally restructuring how `bootstrap.toml`'s on-disk
representation is handled.

Key changes introduced in this commit:

1.  Dedicated `toml` Module: A new `src/bootstrap/config/toml` directory is
    created. This module is now solely responsible for defining the Rust types
    that directly mirror the structure of the `bootstrap.toml` file. These
    types serve as the initial layer for `serde` deserialization before
    further processing and merging into the final `Config`.

2.  Field-Based Segregation within `toml`: To further enhance clarity and
    ownership, the raw TOML configuration types are segregated into individual
    files corresponding to the top-level sections of `bootstrap.toml`:
    * `build.rs`: For `[build]` related TOML structures.
    * `change_id.rs`: For the `change-id` field.
    * `dist.rs`: For `[dist]` options.
    * `gcc.rs`: For `[gcc]` options.
    * `install.rs`: For `[install]` options.
    * `llvm.rs`: For `[llvm]` options.
    * `rust.rs`: For `[rust]` options.
    * `target.rs`: For `[target.<triple>]` options.

3.  Encapsulated Core Utilities: The `Merge` trait (now defined in
    `toml/merge.rs`) and the `define_config!` macro (in `toml/macros.rs`) have
    been moved into dedicated modules within the `toml` subdirectory. As these
    utilities are integral to how the raw TOML structures are processed
    (deserialized, merged, and constructed), centralizing them here enhances the
    self-contained nature of the `toml` module. A `common.rs` is also introduced
    within `toml` for types shared among its sub-modules.
This commit moves the `Config` struct's methods responsible for reading and
deserializing `bootstrap.toml` files into `TomlConfig` instances to
`src/bootstrap/config/toml/mod.rs`.

Specifically, `Config::get_builder_toml`, `Config::get_toml`, and
`Config::get_toml_inner` are now located within the `toml` submodule.
Moves `check_incompatible_options_for_ci_llvm` to `src/bootstrap/config/toml/llvm.rs` and
`check_incompatible_options_for_ci_rustc` to `src/bootstrap/config/toml/rust.rs`.

The generic `set` and threads_from_config  helper function is moved to `common.rs`.
- With this all parsing or deserialization logic has now moved to toml.
@Shourya742 Shourya742 requested review from Kobzol and onur-ozkan May 29, 2025 12:42
@rust-log-analyzer

This comment has been minimized.

@Shourya742 Shourya742 force-pushed the 2025-05-18-modularize-config-module branch from 2dc25e6 to 877aa18 Compare May 29, 2025 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants