@@ -20,6 +20,7 @@ observed by the parent of a task::try task that itself spawns child tasks
(such as any #[test] function). In both cases the data structures live in
KillHandle.

+
I. Task killing.

The model for killing involves two atomic flags, the "kill flag" and the
@@ -60,9 +61,92 @@ killer does perform both writes, it means it saw a KILL_RUNNING in the
unkillable flag, which means an unkillable task will see KILL_KILLED and fail
immediately (rendering the subsequent write to the kill flag unnecessary).

+
II. Exit code propagation.

- FIXME(#7544): Decide on the ultimate model for this and document it.
+ The basic model for exit code propagation, which is used with the "watched"
+ spawn mode (on by default for linked spawns, off for supervised and unlinked
+ spawns), is that a parent will wait for all its watched children to exit
+ before reporting whether it succeeded or failed. A watching parent will only
+ report success if it succeeded and all its children also reported success;
+ otherwise, it will report failure. This is most useful for writing test cases:
+
+ ~~~
+ #[test]
+ fn test_something_in_another_task() {
+     do spawn {
+         assert!(collatz_conjecture_is_false());
+     }
+ }
+ ~~~
+
+ Here, because the child task will certainly outlive the parent task, we might
+ miss the failure of the child when deciding whether or not the test case
+ passed. The watched spawn mode avoids this problem.
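A rough modern-Rust analog of the watched mode, offered only as a sketch: std threads stand in for tasks, and run_watched, child_ok, and child_fails are hypothetical names that do not appear in the runtime. The parent joins every child and reports success only if it and all of them succeeded.

```rust
use std::thread;

// Hypothetical child that exits successfully.
fn child_ok() {}

// Hypothetical child that fails, like the failing assert! in the test above.
fn child_fails() {
    panic!("child failed");
}

// Hypothetical helper: runs every child on its own thread, waits for all of
// them, and reports success only if the parent and every child succeeded.
fn run_watched(parent_ok: bool, children: Vec<fn()>) -> bool {
    // Spawn all children first so that they run concurrently.
    let handles: Vec<thread::JoinHandle<()>> = children
        .into_iter()
        .map(|child| thread::spawn(child))
        .collect();

    // join() returns Err if the child panicked. Every child is joined, even
    // after a failure has already been seen.
    let mut all_children_ok = true;
    for handle in handles {
        if handle.join().is_err() {
            all_children_ok = false;
        }
    }
    parent_ok && all_children_ok
}
```

Calling run_watched(true, vec![child_ok as fn(), child_fails]) yields false, which is the outcome a test harness would need in order to notice the failing child.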
+
+ To propagate exit codes from children to their parents, any "watching"
+ parent must wait for all of its children to exit before it can report its
+ final exit status. We achieve this with an UnsafeArc: its reference count
+ tracks how many children are still alive, and the unwrap() operation in the
+ parent's exit path waits for all children to exit. The UnsafeArc referred to
+ here is actually the KillHandle itself.
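Modern Rust's Arc has no blocking unwrap(), so the following sketch approximates the idea by retrying Arc::try_unwrap until the parent holds the last reference. The names Handle, children_exited, blocking_unwrap, and wait_for_children are assumptions made for illustration, and the yield loop is a simplification rather than a description of how UnsafeArc worked.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Stand-in for the shared state behind a KillHandle (hypothetical field).
struct Handle {
    children_exited: AtomicUsize,
}

// Waits until `handle` is the only remaining reference, then returns the
// inner value. Each child's reference keeps try_unwrap failing until the
// child drops it.
fn blocking_unwrap<T>(mut handle: Arc<T>) -> T {
    loop {
        match Arc::try_unwrap(handle) {
            Ok(inner) => return inner,
            Err(still_shared) => {
                handle = still_shared;
                thread::yield_now();
            }
        }
    }
}

// Spawns `n` children that each hold a reference to the parent's handle, then
// waits for all of them by unwrapping it. Returns how many children had
// recorded their exit by the time the unwrap succeeded.
fn wait_for_children(n: usize) -> usize {
    let handle = Arc::new(Handle {
        children_exited: AtomicUsize::new(0),
    });
    for _ in 0..n {
        let child_ref = Arc::clone(&handle);
        thread::spawn(move || {
            child_ref.children_exited.fetch_add(1, Ordering::SeqCst);
            // `child_ref` is dropped when this closure finishes, which
            // releases the child's reference.
        });
    }
    let inner = blocking_unwrap(handle);
    inner.children_exited.load(Ordering::SeqCst)
}
```

Because a child increments the counter before dropping its reference, the count observed after the unwrap always equals the number of children spawned.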
+
+ This also works transitively: if a "middle" watched child task is itself
+ watching a grandchild task, the "middle" task will call unwrap() on its own
+ KillHandle (thereby waiting for the grandchild to exit) before dropping its
+ reference to its watching parent (which will alert the parent).
+
+ While UnsafeArc::unwrap() accomplishes the synchronization, there remains the
+ matter of reporting the exit codes themselves. This is easiest when an exiting
+ watched task has no watched children of its own:
+
+ - If the task with no watched children exits successfully, it need do nothing.
+ - If the task with no watched children has failed, it sets a flag in the
+   parent's KillHandle ("any_child_failed") to true. It then stays true forever.
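The two cases can be sketched as follows, reading the flag's name literally: it is raised when any child has failed and is never lowered again. This is an illustration in modern Rust, not the runtime's code; ParentHandle, report_exit, and parent_result are hypothetical names, and an AtomicBool stands in for the field inside KillHandle.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the part of the parent's KillHandle that records failures.
struct ParentHandle {
    any_child_failed: AtomicBool,
}

// Called by an exiting watched child that has no watched children of its own.
fn report_exit(parent: &ParentHandle, succeeded: bool) {
    // A successful child does nothing. A failed child raises the flag, and
    // nothing ever lowers it again.
    if !succeeded {
        parent.any_child_failed.store(true, Ordering::SeqCst);
    }
}

// The parent succeeds only if it succeeded itself and no child failed.
fn parent_result(parent: &ParentHandle, parent_succeeded: bool) -> bool {
    parent_succeeded && !parent.any_child_failed.load(Ordering::SeqCst)
}
```

A later successful sibling cannot mask an earlier failure, because success never writes to the flag.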
+
+ However, if a "middle" watched task with watched children of its own exits
+ before its child exits, we need to ensure that the grandparent task may still
+ see a failure from the grandchild task. While we could achieve this by having
+ each intermediate task block on its handle, that would keep alive the other
+ resources the task was using. For efficiency, we instead use "tombstones".
+
+ A tombstone is a closure, ~fn() -> bool, which will perform any waiting
+ necessary to collect the exit code of descendant tasks. Its environment
+ captures the KillHandle of whichever task created the tombstone, possibly any
+ tombstones that task itself held, and finally another tombstone, effectively
+ creating a lazy list of heap closures.
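As a loose modern-Rust analog, a tombstone can be modelled as a boxed FnOnce closure that is forced only when called. In this sketch a std JoinHandle<bool> stands in for the captured KillHandle (joining it plays the role of unwrap()), and Tombstone and make_tombstone are hypothetical names that do not come from the runtime.

```rust
use std::thread::JoinHandle;

// Modern stand-in for the old `~fn() -> bool`: a heap closure that reports
// whether everything it covers succeeded.
type Tombstone = Box<dyn FnOnce() -> bool + Send>;

// Hypothetical constructor. `descendant` stands in for the creating task's
// KillHandle, `own` for tombstones that task had itself collected, and `next`
// for the tombstone already present in the list being extended.
fn make_tombstone(
    descendant: JoinHandle<bool>,
    own: Option<Tombstone>,
    next: Option<Tombstone>,
) -> Tombstone {
    Box::new(move || {
        // Nothing above runs until the tombstone is called. All three parts
        // are evaluated without short-circuiting so that every wait happens.
        let descendant_ok = descendant.join().unwrap_or(false);
        let own_ok = own.map_or(true, |tombstone| tombstone());
        let next_ok = next.map_or(true, |tombstone| tombstone());
        descendant_ok && own_ok && next_ok
    })
}
```

Each tombstone calls the ones it captured, so the call depth grows with the nesting of the chain.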
+
+ When a child wishes to exit early and leave tombstones behind for its parent,
+ it must use a LittleLock (pthread mutex) to synchronize with any possible
+ sibling tasks which are trying to do the same thing with the same parent.
+ However, on the other side, when the parent is ready to pull on the tombstones,
+ it need not use this lock, because the unwrap() serves as a barrier that ensures
+ no children will remain with references to the handle.
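The asymmetry between the two sides can be sketched in modern Rust. Here a std Mutex plays the role of the LittleLock, and the end of a std::thread::scope block is swapped in as the barrier in place of unwrap(); both substitutions, and the names Tombstone and collect_from_children, are assumptions made for illustration.

```rust
use std::sync::Mutex;
use std::thread;

type Tombstone = Box<dyn FnOnce() -> bool + Send>;

// Each entry of `child_results` becomes one child that leaves behind a
// tombstone reporting that result.
fn collect_from_children(child_results: Vec<bool>) -> bool {
    let tombstones: Mutex<Vec<Tombstone>> = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for ok in child_results {
            let shared = &tombstones;
            scope.spawn(move || {
                // Siblings may be pushing at the same time, so each child
                // must take the lock.
                shared.lock().unwrap().push(Box::new(move || ok));
            });
        }
    });

    // The scope has ended, so no child still refers to the list. The parent
    // can consume the mutex directly instead of locking it.
    let collected = tombstones.into_inner().unwrap();
    let mut all_ok = true;
    for tombstone in collected {
        if !tombstone() {
            all_ok = false;
        }
    }
    all_ok
}
```

Mutex::into_inner takes the mutex by value, which the compiler permits only once no other reference to it exists; that mirrors the guarantee the barrier provides.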
+
+ The main logic for creating and assigning tombstones can be found in the
+ function reparent_children_to() in the impl for KillHandle.
+
+
+ IIA. Issues with exit code propagation.
+
+ There are two known issues with the current scheme for exit code propagation.
+
+ - As documented in issue #8136, the structure makes stack overflow possible
+   when collecting tombstones that are very deeply nested. This cannot be
+   avoided with the closure representation, as tombstones end up structured in
+   a sort of tree. Notably, however, the tombstones do not need to be collected
+   in any particular order, so a doubly-linked list could be used instead. We
+   do not do this yet because DList is in libextra.
+
+ - A discussion with Graydon made me realize that decoupling exit code
+   propagation from the parents-waiting action could yield a simpler
+   implementation: the exit codes would not have to be propagated explicitly,
+   and could instead be propagated implicitly through the taskgroup mechanism
+   that we already have. The tombstoning scheme would still be required. I have
+   not implemented this because we currently can't receive a linked failure
+   kill signal during task cleanup, which is "unkillable" and occurs outside
+   the task's unwinder's "try" block, so it would require some restructuring.
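Returning to the first issue above, the doubly-linked-list alternative might look roughly like this in modern Rust, where std::collections::LinkedList is a doubly-linked list. Tombstone, tombstone, reparent, and collect_all are hypothetical names; the point of the sketch is that a flat list is drained by a loop, so stack use does not grow with the number of tombstones.

```rust
use std::collections::LinkedList;

type Tombstone = Box<dyn FnOnce() -> bool + Send>;

// Hypothetical helper that builds a tombstone reporting a fixed result.
fn tombstone(ok: bool) -> Tombstone {
    Box::new(move || ok)
}

// Moves an exiting child's tombstones onto its parent's list. No tombstone
// ends up nested inside another, and `child` is left empty.
fn reparent(parent: &mut LinkedList<Tombstone>, child: &mut LinkedList<Tombstone>) {
    parent.append(child);
}

// Collects every tombstone with a loop rather than recursion. The order of
// collection does not affect the result.
fn collect_all(mut tombstones: LinkedList<Tombstone>) -> bool {
    let mut all_ok = true;
    while let Some(next) = tombstones.pop_front() {
        if !next() {
            all_ok = false;
        }
    }
    all_ok
}
```

LinkedList::append splices one list onto another without visiting the elements, so reparenting stays cheap however many tombstones a child has accumulated.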

*/