Commit d468b7c

Document the task watching / exit code propagation implementation.
1 parent c678b22 commit d468b7c

1 file changed: +85 -1 lines changed

src/libstd/rt/kill.rs

Lines changed: 85 additions & 1 deletion
@@ -20,6 +20,7 @@ observed by the parent of a task::try task that itself spawns child tasks
 (such as any #[test] function). In both cases the data structures live in
 KillHandle.

+
 I. Task killing.

 The model for killing involves two atomic flags, the "kill flag" and the
@@ -60,9 +61,92 @@ killer does perform both writes, it means it saw a KILL_RUNNING in the
 unkillable flag, which means an unkillable task will see KILL_KILLED and fail
 immediately (rendering the subsequent write to the kill flag unnecessary).

+
 II. Exit code propagation.

-FIXME(#7544): Decide on the ultimate model for this and document it.
+The basic model for exit code propagation, which is used with the "watched"
+spawn mode (on by default for linked spawns, off for supervised and unlinked
+spawns), is that a parent will wait for all its watched children to exit
+before reporting whether it succeeded or failed. A watching parent will only
+report success if it succeeded and all its children also reported success;
+otherwise, it will report failure. This is most useful for writing test cases:
+
+~~~
+#[test]
+fn test_something_in_another_task() {
+    do spawn {
+        assert!(collatz_conjecture_is_false());
+    }
+}
+~~~
+
+Here, as the child task will certainly outlive the parent task, we might miss
+the failure of the child when deciding whether or not the test case passed.
+The watched spawn mode avoids this problem.
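
The watched model can be approximated with today's standard library. This is only a sketch in modern Rust, not the runtime's implementation: `run_watched` is a hypothetical helper, threads stand in for tasks, and a panic stands in for task failure.

```rust
use std::thread;

/// Hypothetical helper: runs `parent_body`, then waits for every child to
/// exit, and reports success only if the parent and all children succeeded.
fn run_watched(
    children: Vec<Box<dyn FnOnce() + Send + 'static>>,
    parent_body: impl FnOnce() -> bool,
) -> bool {
    // Spawn every child first so they run concurrently with the parent.
    let handles: Vec<thread::JoinHandle<()>> = children
        .into_iter()
        .map(|child| thread::spawn(child))
        .collect();

    let parent_ok = parent_body();

    // Wait for all children, even after one has failed. `join` returns Err
    // if the child panicked, which is our stand-in for task failure.
    let mut children_ok = true;
    for handle in handles {
        if handle.join().is_err() {
            children_ok = false;
        }
    }
    parent_ok && children_ok
}
```

Because the parent joins every child before computing its result, a child that outlives the parent's own body can no longer fail unnoticed.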
+
+In order to propagate exit codes from children to their parents, any
+'watching' parent must wait for all of its children to exit before it can
+report its final exit status. We achieve this by using an UnsafeArc, using the
+reference counting to track how many children are still alive, and using the
+unwrap() operation in the parent's exit path to wait for all children to exit.
+The UnsafeArc referred to here is actually the KillHandle itself.
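
Modern `Arc` has no blocking unwrap, so the idea of waiting on a reference count can only be imitated. The sketch below assumes an explicit counter guarded by a `Mutex` and a `Condvar`; `WatchHandle`, `ChildRef`, `add_child` and `wait_for_children` are hypothetical names, not part of KillHandle.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical stand-in for the reference count inside a KillHandle.
struct WatchHandle {
    live_children: Mutex<usize>,
    all_exited: Condvar,
}

/// Held by each child; dropping it plays the role of dropping a reference.
struct ChildRef(Arc<WatchHandle>);

impl WatchHandle {
    fn new() -> Arc<WatchHandle> {
        Arc::new(WatchHandle {
            live_children: Mutex::new(0),
            all_exited: Condvar::new(),
        })
    }
}

/// Registers one more live child and hands back its reference.
fn add_child(handle: &Arc<WatchHandle>) -> ChildRef {
    *handle.live_children.lock().unwrap() += 1;
    ChildRef(Arc::clone(handle))
}

impl Drop for ChildRef {
    fn drop(&mut self) {
        // Runs on normal exit and during unwinding, so a failing child is
        // still counted as having exited.
        let mut live = self.0.live_children.lock().unwrap();
        *live -= 1;
        if *live == 0 {
            self.0.all_exited.notify_all();
        }
    }
}

/// Blocks until every ChildRef has been dropped (the role of unwrap()).
fn wait_for_children(handle: &Arc<WatchHandle>) {
    let mut live = handle.live_children.lock().unwrap();
    while *live > 0 {
        live = handle.all_exited.wait(live).unwrap();
    }
}
```

A "middle" task would call `wait_for_children` on its own handle before dropping the `ChildRef` it holds for its parent, which gives the transitive behaviour.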
+
+This also works transitively: if a "middle" watched child task is itself
+watching a grandchild task, the "middle" task will do unwrap() on its own
+KillHandle (thereby waiting for the grandchild to exit) before dropping its
+reference to its watching parent (which will alert the parent).
+
+While UnsafeArc::unwrap() accomplishes the synchronization, there remains the
+matter of reporting the exit codes themselves. This is easiest when an exiting
+watched task has no watched children of its own:
+
+- If the task with no watched children exits successfully, it need do nothing.
+- If the task with no watched children has failed, it sets a flag in the
+  parent's KillHandle ("any_child_failed") to true. It then stays true forever.
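
The two cases can be sketched as a sticky flag, assuming the flag becomes true on failure as its name suggests. `ParentHandle`, `report_exit` and `parent_status` are hypothetical names in modern Rust; only the `any_child_failed` field name comes from the text.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical slice of a parent's handle: just the sticky failure flag.
struct ParentHandle {
    any_child_failed: AtomicBool,
}

/// Called by a watched child with no watched children of its own as it exits.
fn report_exit(parent: &ParentHandle, child_succeeded: bool) {
    if !child_succeeded {
        // Failure is recorded once and never cleared.
        parent.any_child_failed.store(true, Ordering::SeqCst);
    }
    // A successful child does nothing, so it cannot undo a sibling's failure.
}

/// The parent's final status once every child has exited.
fn parent_status(parent: &ParentHandle, parent_succeeded: bool) -> bool {
    parent_succeeded && !parent.any_child_failed.load(Ordering::SeqCst)
}
```

Since no code path ever writes the flag back, the order in which siblings exit does not affect the parent's result.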
+
+However, if a "middle" watched task with watched children of its own exits
+before its child exits, we need to ensure that the grandparent task may still
+see a failure from the grandchild task. While we could achieve this by having
+each intermediate task block on its handle, this keeps around the other resources
+the task was using. To be more efficient, this is accomplished via "tombstones".
+
+A tombstone is a closure, ~fn() -> bool, which will perform any waiting necessary
+to collect the exit code of descendant tasks. In its environment is captured
+the KillHandle of whichever task created the tombstone, and perhaps also any
+tombstones that that task itself had, and finally also another tombstone,
+effectively creating a lazy-list of heap closures.
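
The lazy list of closures might look roughly like this in modern Rust. It is a sketch under assumptions: `Box<dyn FnOnce() -> bool + Send>` stands in for `~fn() -> bool`, `true` is taken to mean "all descendants succeeded", and `make_tombstone` with its `wait_for_descendants` argument is a hypothetical stand-in for waiting on the captured KillHandle.

```rust
/// Modern spelling of the closure type described above (an assumption).
type Tombstone = Box<dyn FnOnce() -> bool + Send>;

/// Hypothetical constructor: `wait_for_descendants` stands in for waiting on
/// the creating task's handle; `rest` is the next tombstone it captures.
fn make_tombstone(
    wait_for_descendants: impl FnOnce() -> bool + Send + 'static,
    rest: Option<Tombstone>,
) -> Tombstone {
    Box::new(move || {
        // Nothing runs until the tombstone is called: the list is lazy.
        let own_ok = wait_for_descendants();
        // Pull on the captured tombstone, if any, and combine the results.
        let rest_ok = match rest {
            Some(next) => next(),
            None => true,
        };
        own_ok && rest_ok
    })
}
```

Each call recurses into the captured tombstone, so the call depth grows with the length of the chain.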
+
+When a child wishes to exit early and leave tombstones behind for its parent,
+it must use a LittleLock (pthread mutex) to synchronize with any possible
+sibling tasks which are trying to do the same thing with the same parent.
+However, on the other side, when the parent is ready to pull on the tombstones,
+it need not use this lock, because the unwrap() serves as a barrier that ensures
+no children will remain with references to the handle.
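
The asymmetry between the two sides can be illustrated with modern types. In this sketch, which is an analogue rather than the runtime's code, `Mutex` plays the role of the LittleLock, `join` followed by `Arc::try_unwrap` plays the role of the blocking unwrap(), and `Handle` and `collect_results` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared handle: children push exit results under a lock.
struct Handle {
    child_results: Mutex<Vec<bool>>,
}

fn collect_results(child_outcomes: Vec<bool>) -> Vec<bool> {
    let handle = Arc::new(Handle { child_results: Mutex::new(Vec::new()) });

    let children: Vec<_> = child_outcomes
        .into_iter()
        .map(|outcome| {
            let handle = Arc::clone(&handle);
            thread::spawn(move || {
                // Siblings race with each other, so they must take the lock.
                handle.child_results.lock().unwrap().push(outcome);
                // `handle` is dropped here, releasing this child's reference.
            })
        })
        .collect();

    for child in children {
        child.join().unwrap();
    }

    // Every child has exited and dropped its reference, so the parent is the
    // sole owner and can take the data out without locking.
    let handle = match Arc::try_unwrap(handle) {
        Ok(handle) => handle,
        Err(_) => panic!("a child still holds a reference"),
    };
    handle.child_results.into_inner().unwrap()
}
```

`Mutex::into_inner` consumes the mutex instead of locking it, which is only possible because sole ownership has already been established.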
+
+The main logic for creating and assigning tombstones can be found in the
+function reparent_children_to() in the impl for KillHandle.
+
+
+IIA. Issues with exit code propagation.
+
+There are two known issues with the current scheme for exit code propagation.
+
+- As documented in issue #8136, the structure makes stack overflow possible
+  when collecting tombstones that are very deeply nested. This cannot be
+  avoided with the closure representation, as tombstones end up structured in
+  a sort of tree. Notably, though, the tombstones do not actually need to be
+  collected in any particular order, so a doubly-linked list could be used
+  instead. We do not do this yet because DList is in libextra.
+
+- A discussion with Graydon made me realize that if we decoupled the exit code
+  propagation from the parents-waiting action, this could result in a simpler
+  implementation, as the exit codes themselves would not have to be propagated
+  explicitly and could instead be propagated implicitly through the taskgroup
+  mechanism that we already have. The tombstoning scheme would still be
+  required. I have not implemented this because we currently can't receive a
+  linked failure kill signal during the task cleanup activity: that code is
+  "unkillable" and occurs outside the task's unwinder's "try" block, so
+  supporting it would require some restructuring.
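
For the first issue, a flat list would let tombstones be collected in a loop rather than by nested calls. The following is a sketch of that alternative in modern Rust, using `std::collections::LinkedList` as a stand-in for DList; `FlatTombstone` and `collect_all` are hypothetical names, and `true` is assumed to mean success.

```rust
use std::collections::LinkedList;

/// Hypothetical flat representation: each tombstone waits on one handle only
/// and captures no further tombstones.
type FlatTombstone = Box<dyn FnOnce() -> bool>;

/// Collects every tombstone iteratively, so the stack depth stays constant
/// no matter how many tombstones there are.
fn collect_all(mut tombstones: LinkedList<FlatTombstone>) -> bool {
    let mut all_ok = true;
    while let Some(tombstone) = tombstones.pop_front() {
        // Run every tombstone even after a failure, so all waiting happens.
        let ok = tombstone();
        all_ok = all_ok && ok;
    }
    all_ok
}
```

Since collection order does not matter, the list could equally be drained from the back or merged with another list before collection.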

 */
