# Parallel Compilation

- As of <!-- date-check --> May 2022, The only stage of the compiler
- that is already parallel is codegen. The nightly compiler implements query evaluation,
- but there is still a lot of work to be done. The lack of parallelism at other stages
- also represents an opportunity for improving compiler performance. One can try out the current
- parallel compiler work by enabling it in the `config.toml`.
+ As of <!-- date-check --> August 2022, the only stage of the compiler that
+ is already parallel is codegen. Some other parts of the nightly compiler
+ have parallel implementations, such as query evaluation, type checking and
+ monomorphization, but there is still a lot of work to be done. The lack of
+ parallelism at other stages (for example, macro expansion) also represents
+ an opportunity for improving compiler performance.
+
+ **To try out the current parallel compiler**, one can build and install rustc
+ from source with `parallel-compiler = true` set in the `[rust]` section of `config.toml`.

These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.

- The underlying thread-safe data-structures used in the parallel compiler
- can be found in the `rustc_data_structures::sync` module. Some of these data structures
- use the `parking_lot` crate as well.
-
## Codegen

- There are two underlying thread safe data structures used in code generation:
-
- - `Lrc`
- - Which is an [`Arc`][Arc] if `parallel_compiler` is true, and a [`Rc`][Rc]
- if it is not.
- - `MetadataRef` -> [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef]
- - This data structure is specific to `rustc`.
-
During [monomorphization][monomorphization] the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the `rustc_codegen_ssa::base` module.

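The split/generate/link pipeline just described can be modelled, very loosely, with scoped threads from the standard library. This is a toy sketch, not rustc's implementation. The `CodegenUnit` struct and the `partition`, `compile_cgu` and `build` functions are hypothetical names. Items are plain strings and partitioning is round-robin. "Generating" a unit merely formats a string, where rustc would hand the unit to LLVM.

```rust
use std::thread;

/// Hypothetical stand-in for a codegen unit: a named bundle of items.
struct CodegenUnit {
    name: String,
    items: Vec<String>,
}

/// Deal items round-robin into at most `n` units, dropping empty ones.
fn partition(items: &[&str], n: usize) -> Vec<CodegenUnit> {
    let n = n.max(1);
    let mut units: Vec<CodegenUnit> = (0..n)
        .map(|i| CodegenUnit { name: format!("cgu{i}"), items: Vec::new() })
        .collect();
    for (i, item) in items.iter().enumerate() {
        units[i % n].items.push(item.to_string());
    }
    units.retain(|u| !u.items.is_empty());
    units
}

/// Stand-in for running LLVM on one unit: returns a fake object-file name.
fn compile_cgu(cgu: &CodegenUnit) -> String {
    format!("{}.o[{}]", cgu.name, cgu.items.join(","))
}

/// Generate every unit on its own scoped thread, then collect the results
/// in unit order, ready for the final "link" step.
fn build(items: &[&str], n: usize) -> Vec<String> {
    let units = partition(items, n);
    thread::scope(|s| {
        let handles: Vec<_> = units
            .iter()
            .map(|cgu| s.spawn(move || compile_cgu(cgu)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let objects = build(&["main", "foo", "bar"], 2);
    // "Linking" here just concatenates the fake object names.
    println!("linked: {}", objects.join(" + "));
}
```

Joining the handles in spawn order keeps the final "link" input deterministic, even though the units are processed concurrently.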
+ ## Data Structures
+
+ The underlying thread-safe data-structures used in the parallel compiler
+ can be found in the `rustc_data_structures::sync` module. These data structures
+ are implemented differently depending on whether `parallel-compiler` is true.
+
+ | data structure | parallel | non-parallel |
+ | -------------------------------- | --------------------------------------------------- | ------------ |
+ | Lrc | std::sync::Arc | std::rc::Rc |
+ | Weak | std::sync::Weak | std::rc::Weak |
+ | Atomic{Bool}/{Usize}/{U32}/{U64} | std::sync::atomic::Atomic{Bool}/{Usize}/{U32}/{U64} | (Cell\<bool/usize/u32/u64>) |
+ | OnceCell | std::sync::OnceLock | std::cell::OnceCell |
+ | Lock\<T> | (parking_lot::Mutex\<T>) | (std::cell::RefCell\<T>) |
+ | RwLock\<T> | (parking_lot::RwLock\<T>) | (std::cell::RefCell\<T>) |
+ | MTRef<'a, T> | &'a T | &'a mut T |
+ | MTLock\<T> | (Lock\<T>) | (T) |
+ | ReadGuard | parking_lot::RwLockReadGuard | std::cell::Ref |
+ | MappedReadGuard | parking_lot::MappedRwLockReadGuard | std::cell::Ref |
+ | WriteGuard | parking_lot::RwLockWriteGuard | std::cell::RefMut |
+ | MappedWriteGuard | parking_lot::MappedRwLockWriteGuard | std::cell::RefMut |
+ | LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
+ | MappedLockGuard | parking_lot::MappedMutexGuard | std::cell::RefMut |
+ | MetadataRef | [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef] | [`OwningRef<Box<dyn Erased>, [u8]>`][OwningRef] |
+
+ - There are currently a lot of global data structures that need to be made
+ thread-safe. A key strategy here has been converting interior-mutable
+ data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
+
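The table above can be made concrete with a minimal sketch of how such a module switches implementations at compile time. It assumes a `parallel_compiler` cfg flag, enabled by passing `--cfg parallel_compiler` to `rustc`. It uses `std::sync::Mutex` where rustc uses `parking_lot`, and only `Lrc` and `Lock` are mirrored, far more simply than in `rustc_data_structures::sync`. The `record` function is a hypothetical caller.

```rust
// Thread-safe variants, selected with `--cfg parallel_compiler`.
#[cfg(parallel_compiler)]
mod sync {
    pub use std::sync::Arc as Lrc;
    use std::sync::{Mutex, MutexGuard};

    /// Thread-safe lock (std `Mutex` here; rustc uses `parking_lot`).
    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }
        pub fn lock(&self) -> MutexGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

// Cheaper single-threaded variants, used by default.
#[cfg(not(parallel_compiler))]
mod sync {
    pub use std::rc::Rc as Lrc;
    use std::cell::{RefCell, RefMut};

    /// Single-threaded "lock": a `RefCell` that panics on overlapping borrows.
    pub struct Lock<T>(RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }
        pub fn lock(&self) -> RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}

use sync::{Lock, Lrc};

/// Written once against the shared API; compiles in both configurations.
fn record(log: &Lrc<Lock<Vec<&'static str>>>, event: &'static str) -> usize {
    let mut guard = log.lock();
    guard.push(event);
    guard.len()
}

fn main() {
    let log = Lrc::new(Lock::new(Vec::new()));
    let shared = Lrc::clone(&log);
    record(&log, "parse");
    record(&shared, "typeck");
    println!("{:?}", *log.lock());
}
```

Both variants expose the same `new`/`lock` surface, so `record` compiles unchanged in either configuration. The non-parallel build avoids the cost of atomic reference counting and real locking.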
+ ### WorkerLocal
+
+ `WorkerLocal` is a special data structure implemented for the parallel compiler.
+ It holds worker-local values for each thread in a thread pool. The worker-local
+ value can only be accessed (through the `Deref` impl) from a thread of the
+ thread pool it was constructed in; accessing it from any other thread panics.
+
+ `WorkerLocal` is used to implement the `Arena` allocator in the parallel
+ environment, which is critical in parallel queries. Its implementation
+ is located in the `rustc-rayon-core::worker_local` module. However, in the
+ non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T`
+ can be accessed directly through `Deref::deref`.
+
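The worker-local idea can be illustrated with a simplified `WorkerLocal<T>`. It keeps one slot per worker, chosen through a thread-local worker index, and its `Deref` panics on threads that are not workers. Everything here (`WorkerLocal`, `WORKER_INDEX`, `run_pool`, `into_slots`) is a hypothetical stand-in, not the rustc-rayon API. Unlike the real type, it does not check which pool a thread belongs to. Each slot also sits behind an uncontended `Mutex`, so the sketch needs no `unsafe`.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::Mutex;
use std::thread;

thread_local! {
    // Index of the current thread within its pool; `None` outside a pool.
    static WORKER_INDEX: Cell<Option<usize>> = Cell::new(None);
}

/// Hypothetical, simplified worker-local container: one value per worker.
struct WorkerLocal<T> {
    slots: Vec<T>,
}

impl<T> WorkerLocal<T> {
    fn new(workers: usize, mut init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal { slots: (0..workers).map(|i| init(i)).collect() }
    }

    fn into_slots(self) -> Vec<T> {
        self.slots
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    /// Returns the current worker's value; panics on non-worker threads.
    fn deref(&self) -> &T {
        let idx = WORKER_INDEX
            .with(|i| i.get())
            .expect("WorkerLocal accessed outside of its thread pool");
        &self.slots[idx]
    }
}

/// Hypothetical pool: runs `job` once on each of `workers` scoped threads,
/// after tagging each thread with its worker index.
fn run_pool(workers: usize, job: impl Fn(usize) + Sync) {
    let job = &job;
    thread::scope(|s| {
        for idx in 0..workers {
            s.spawn(move || {
                WORKER_INDEX.with(|i| i.set(Some(idx)));
                job(idx);
            });
        }
    });
}

fn main() {
    let workers = 4;
    // Each worker gets its own "arena"; the Mutex is never contended.
    let arenas = WorkerLocal::new(workers, |_| Mutex::new(Vec::new()));
    run_pool(workers, |idx| {
        for n in 0..3 {
            arenas.lock().unwrap().push(idx * 10 + n);
        }
    });
    for (i, arena) in arenas.into_slots().into_iter().enumerate() {
        println!("worker {i}: {:?}", arena.into_inner().unwrap());
    }
}
```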
+ ## Parallel Iterators
+
+ The parallel iterators provided by the [`rayon`] crate are efficient
+ ways to achieve parallelization. The current nightly rustc uses (a custom
+ fork of) [`rayon`] to run tasks in parallel. The custom fork allows the
+ execution of DAGs of tasks, not just trees.
+
+ Some iterator functions are implemented in the current nightly compiler to
+ run loops in parallel when `parallel-compiler` is true.
+
+ | Function (`Send` and `Sync` bounds omitted) | Description | Owning module |
+ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- |
+ | **par_iter**<T: IntoParallelIterator>(t: T) -> T::Iter | generate a parallel iterator | rustc_data_structures::sync |
+ | **par_for_each_in**<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) | generate a parallel iterator and run `for_each` on each element | rustc_data_structures::sync |
+ | **Map::par_body_owners**(self, f: impl Fn(LocalDefId)) | run `f` on all body owners in the crate | rustc_middle::hir::map |
+ | **Map::par_for_each_module**(self, f: impl Fn(LocalDefId)) | run `f` on all modules and submodules in the crate | rustc_middle::hir::map |
+ | **ModuleItems::par_items**(&self, f: impl Fn(ItemId)) | run `f` on all items in the module | rustc_middle::hir |
+ | **ModuleItems::par_trait_items**(&self, f: impl Fn(TraitItemId)) | run `f` on all trait items in the module | rustc_middle::hir |
+ | **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir |
+ | **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir |
+
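The sketch below shows roughly what a cfg-switched `par_for_each_in` from the table above boils down to, under simplifying assumptions. It takes a `Vec` rather than an `IntoParallelIterator` and uses `std::thread::scope` instead of rayon. It picks a variant by checking a `parallel_compiler` cfg flag with `cfg!`. The helpers `for_each_sequential` and `for_each_parallel` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Sequential fallback, as in a non-parallel build.
fn for_each_sequential<T>(items: Vec<T>, for_each: impl Fn(T)) {
    items.into_iter().for_each(for_each);
}

/// Parallel variant: deal the items round-robin into one bucket per
/// available core and drain each bucket on its own scoped thread.
fn for_each_parallel<T: Send>(items: Vec<T>, for_each: impl Fn(T) + Sync) {
    let threads = thread::available_parallelism().map_or(1, |n| n.get());
    let mut buckets: Vec<Vec<T>> = (0..threads).map(|_| Vec::new()).collect();
    for (i, item) in items.into_iter().enumerate() {
        buckets[i % threads].push(item);
    }
    let for_each = &for_each;
    // All scoped threads are joined before `scope` returns.
    thread::scope(|s| {
        for bucket in buckets {
            s.spawn(move || bucket.into_iter().for_each(for_each));
        }
    });
}

/// Picks an implementation based on the `parallel_compiler` cfg flag.
fn par_for_each_in<T: Send>(items: Vec<T>, for_each: impl Fn(T) + Sync) {
    if cfg!(parallel_compiler) {
        for_each_parallel(items, for_each)
    } else {
        for_each_sequential(items, for_each)
    }
}

fn main() {
    // Hypothetical "item bodies" to check. The closure is `Fn + Sync`,
    // so shared state goes through an atomic instead of a `Cell`.
    let checked = AtomicUsize::new(0);
    par_for_each_in(vec!["body_a", "body_b", "body_c"], |body| {
        assert!(body.starts_with("body_"));
        checked.fetch_add(1, Ordering::Relaxed);
    });
    println!("checked {} bodies", checked.load(Ordering::Relaxed));
}
```

In this sketch the closure must be `Fn + Sync` in both builds. That constraint is why interior-mutable state has to move to atomics or locks before a loop can be parallelized.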
+ There are a lot of loops in the compiler that could potentially be
+ parallelized using these functions. As of <!-- date-check --> August
+ 2022, the parallel iterator functions are used in the following places:
+
+ | caller | scenario | callee |
+ | ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ |
+ | rustc_metadata::rmeta::encoder::prefetch_mir | Prefetch queries which will be needed later by metadata encoding | par_iter |
+ | rustc_monomorphize::collector::collect_crate_mono_items | Collect monomorphized items reachable from non-generic items | par_for_each_in |
+ | rustc_interface::passes::analysis | Check the validity of match expressions | Map::par_body_owners |
+ | rustc_interface::passes::analysis | MIR borrow check | Map::par_body_owners |
+ | rustc_typeck::check::typeck_item_bodies | Type check | Map::par_body_owners |
+ | rustc_interface::passes::hir_id_validator::check_crate | Check the validity of HIR | Map::par_for_each_module |
+ | rustc_interface::passes::analysis | Check the validity of loop bodies, attributes, naked functions, unstable API usage, const bodies | Map::par_for_each_module |
+ | rustc_interface::passes::analysis | Liveness and intrinsic checking of MIR | Map::par_for_each_module |
+ | rustc_interface::passes::analysis | Dead-code ("deathness") checking | Map::par_for_each_module |
+ | rustc_interface::passes::analysis | Privacy checking | Map::par_for_each_module |
+ | rustc_lint::late::check_crate | Run per-module lints | Map::par_for_each_module |
+ | rustc_typeck::check_crate | Well-formedness checking | Map::par_for_each_module |
+
+ There are still many loops that have the potential to use
+ parallel iterators.
+
## Query System

The query model has some properties that make it actually feasible to evaluate
@@ -48,9 +125,12 @@ When a query `foo` is evaluated, the cache table for `foo` is locked.
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
- computed the result we are waiting for. This cannot deadlock because, as
- mentioned before, query invocations form a DAG. Some threads will always make
- progress.
+ computed the result we are waiting for. **Deadlocks are possible**, in which
+ case `rustc_query_system::query::job::deadlock()` is called to detect and
+ break the deadlock, and a cycle error is returned as the query result.
+
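The general idea of detecting and breaking such a deadlock can be sketched as cycle detection in a wait-for graph. This is a simplified illustration, not how `rustc_query_system::query::job::deadlock()` is implemented. It assumes each blocked query job waits on exactly one other job, recorded in a hypothetical `waits_on` map. It finds a cycle by following those edges, then resumes one job on the cycle with a cycle error (here, arbitrarily, the job with the smallest id).

```rust
use std::collections::HashMap;

/// Hypothetical identifier for an active query job.
type JobId = u32;

/// Finds a cycle in the wait-for graph, if there is one.
/// `waits_on[&a] == b` means job `a` is blocked until job `b` finishes.
fn find_cycle(waits_on: &HashMap<JobId, JobId>) -> Option<Vec<JobId>> {
    let mut starts: Vec<JobId> = waits_on.keys().copied().collect();
    starts.sort(); // HashMap order is random; sort for deterministic output
    for start in starts {
        // Each job waits on at most one other job, so follow the edges until
        // we reach a job that is not waiting or revisit one on this path.
        let mut path = vec![start];
        let mut current = start;
        while let Some(&next) = waits_on.get(&current) {
            if let Some(pos) = path.iter().position(|&job| job == next) {
                return Some(path[pos..].to_vec());
            }
            path.push(next);
            current = next;
        }
    }
    None
}

/// Breaks one deadlock: picks a victim on the cycle (the smallest id),
/// stops it from waiting, and returns it together with the cycle it should
/// report as its query result instead of a value.
fn break_deadlock(waits_on: &mut HashMap<JobId, JobId>) -> Option<(JobId, Vec<JobId>)> {
    let cycle = find_cycle(waits_on)?;
    let victim = *cycle.iter().min()?;
    waits_on.remove(&victim);
    Some((victim, cycle))
}

fn main() {
    // Jobs 1 -> 2 -> 3 -> 1 are deadlocked; job 4 is merely waiting on job 1.
    let mut waits_on = HashMap::from([(1, 2), (2, 3), (3, 1), (4, 1)]);
    if let Some((victim, cycle)) = break_deadlock(&mut waits_on) {
        println!("resume job {victim} with a cycle error: {cycle:?}");
    }
    assert_eq!(find_cycle(&waits_on), None);
}
```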
+ Much work on parallel queries remains to be done, most of it related to the
+ data structures and parallel iterators described above. See [this tracking issue][tracking].

## Rustdoc

@@ -64,18 +144,6 @@ As of <!-- date-check --> May 2022, work on explicitly parallelizing the
compiler has stalled. There is a lot of design and correctness work that needs
to be done.

- These are the basic ideas in the effort to make `rustc` parallel:
-
- - There are a lot of loops in the compiler that just iterate over all items in
- a crate. These can possibly be parallelized.
- - We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
- fork allows the execution of DAGs of tasks, not just trees.
- - There are currently a lot of global data structures that need to be made
- thread-safe. A key strategy here has been converting interior-mutable
- data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
-
- [`rayon`]: https://crates.io/crates/rayon
-
As of <!-- date-check --> May 2022, much of this effort is on hold due
to lack of manpower. We have a working prototype with promising performance
gains in many cases. However, there are two blockers:
@@ -93,8 +161,8 @@ are a bit out of date):
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
- [This IRLO thread by alexcrichton about performance][irlo1]
- - [This tracking issue][tracking]

+ [`rayon`]: https://crates.io/crates/rayon
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503