|
1 |
| -# The SLG solver |
| 1 | +# The On-Demand SLG solver |
2 | 2 |
|
3 |
| -TODO: <https://github.com/rust-lang-nursery/chalk/blob/master/chalk-engine/src/README.md> |
| 3 | +## Description of how it works |
| 4 | + |
| 5 | +The basis of the solver is the `Forest` type. A *forest* stores a |
| 6 | +collection of *tables* as well as a *stack*. Each *table* represents |
| 7 | +the stored results of a particular query that is being performed, as |
| 8 | +well as the various *strands*, which are basically suspended |
| 9 | +computations that may be used to find more answers. Tables are |
| 10 | +interdependent: solving one query may require solving others. |
| 11 | + |
| 12 | +### Walkthrough |
| 13 | + |
| 14 | +Perhaps the easiest way to explain how the solver works is to walk |
| 15 | +through an example. Let's imagine that we have the following program: |
| 16 | + |
| 17 | +```rust |
| 18 | +trait Debug { } |
| 19 | + |
| 20 | +struct u32 { } |
| 21 | +impl Debug for u32 { } |
| 22 | + |
| 23 | +struct Rc<T> { } |
| 24 | +impl<T: Debug> Debug for Rc<T> { } |
| 25 | + |
| 26 | +struct Vec<T> { } |
| 27 | +impl<T: Debug> Debug for Vec<T> { } |
| 28 | +``` |
| 29 | + |
| 30 | +Now imagine that we want to find answers for the query `exists<T> { |
| 31 | +Rc<T>: Debug }`. The first step would be to u-canonicalize this query; this |
| 32 | +is the act of giving canonical names to all the unbound inference variables based on the |
| 33 | +order of their left-most appearance, as well as canonicalizing the universes of any |
| 34 | +universally bound names (e.g., the `T` in `forall<T> { ... }`). In this case, there are no |
| 35 | +universally bound names, but the canonical form Q of the query might look something like: |
| 36 | + |
| 37 | + Rc<?0>: Debug |
| 38 | + |
| 39 | +where `?0` is a variable in the root universe U0. We would then go and |
| 40 | +look for a table with this as the key: since the forest is empty, this |
| 41 | +lookup will fail, and we will create a new table T0, corresponding to |
| 42 | +the u-canonical goal Q. |
| 43 | + |
| 44 | +**Ignoring negative reasoning and regions.** To start, we'll ignore |
| 45 | +the possibility of negative goals like `not { Foo }`. We'll phase them |
| 46 | +in later, as they bring several complications. |
| 47 | + |
| 48 | +**Creating a table.** When we first create a table, we also initialize |
| 49 | +it with a set of *initial strands*. A "strand" is kind of like a |
| 50 | +"thread" for the solver: it contains a particular way to produce an |
| 51 | +answer. The initial set of strands for a goal like `Rc<?0>: Debug` |
| 52 | +(i.e., a "domain goal") is determined by looking for *clauses* in the |
| 53 | +environment. In Rust, these clauses derive from impls, but also from |
| 54 | +where-clauses that are in scope. In the case of our example, there |
| 55 | +would be three clauses, each coming from the program. Using a |
| 56 | +Prolog-like notation, these look like: |
| 57 | + |
| 58 | +``` |
| 59 | +(u32: Debug). |
| 60 | +(Rc<T>: Debug) :- (T: Debug). |
| 61 | +(Vec<T>: Debug) :- (T: Debug). |
| 62 | +``` |
| 63 | + |
| 64 | +To create our initial strands, then, we will try to apply each of |
| 65 | +these clauses to our goal of `Rc<?0>: Debug`. The first and third |
| 66 | +clauses are inapplicable because `u32` and `Vec<?0>` cannot be unified |
| 67 | +with `Rc<?0>`. The second clause, however, will work. |
| 68 | + |
| 69 | +**What is a strand?** Let's talk a bit more about what a strand *is*. In the code, a strand |
| 70 | +is the combination of an inference table, an X-clause, and (possibly) |
| 71 | +a selected subgoal from that X-clause. But what is an X-clause |
| 72 | +(`ExClause`, in the code)? An X-clause pulls together a few things: |
| 73 | + |
| 74 | +- The current state of the goal we are trying to prove; |
| 75 | +- A set of subgoals that have yet to be proven; |
| 76 | +- There are also a few things we're ignoring for now: |
| 77 | + - delayed literals, region constraints |
| 78 | + |
| 79 | +The general form of an X-clause is written much like a Prolog clause, |
| 80 | +but with somewhat different semantics. Since we're ignoring delayed |
| 81 | +literals and region constraints, an X-clause just looks like this: |
| 82 | + |
| 83 | + G :- L |
| 84 | + |
| 85 | +where G is a goal and L is a set of subgoals that must be proven. |
| 86 | +(The L stands for *literal* -- when we address negative reasoning, a |
| 87 | +literal will be either a positive or negative subgoal.) The idea is |
| 88 | +that if we are able to prove L then the goal G can be considered true. |
| 89 | + |
| 90 | +In the case of our example, we would wind up creating one strand, with |
| 91 | +an X-clause like so: |
| 92 | + |
| 93 | + (Rc<?T>: Debug) :- (?T: Debug) |
| 94 | + |
| 95 | +Here, the `?T` refers to one of the inference variables created in the |
| 96 | +inference table that accompanies the strand. (I'll use named variables |
| 97 | +to refer to inference variables, and numbered variables like `?0` to |
| 98 | +refer to variables in a canonicalized goal; in the code, however, they |
| 99 | +are both represented with an index.) |
| 100 | + |
| 101 | +For each strand, we also optionally store a *selected subgoal*. This |
| 102 | +is the subgoal after the turnstile (`:-`) that we are currently trying |
| 103 | +to prove in this strand. Initally, when a strand is first created, |
| 104 | +there is no selected subgoal. |
| 105 | + |
| 106 | +**Activating a strand.** Now that we have created the table T0 and |
| 107 | +initialized it with strands, we have to actually try and produce an |
| 108 | +answer. We do this by invoking the `ensure_answer` operation on the |
| 109 | +table: specifically, we say `ensure_answer(T0, A0)`, meaning "ensure |
| 110 | +that there is a 0th answer". |
| 111 | + |
| 112 | +Remember that tables store not only strands, but also a vector of |
| 113 | +cached answers. The first thing that `ensure_answer` does is to check |
| 114 | +whether answer 0 is in this vector. If so, we can just return |
| 115 | +immediately. In this case, the vector will be empty, and hence that |
| 116 | +does not apply (this becomes important for cyclic checks later on). |
| 117 | + |
| 118 | +When there is no cached answer, `ensure_answer` will try to produce |
| 119 | +one. It does this by selecting a strand from the set of active |
| 120 | +strands -- the strands are stored in a `VecDeque` and hence processed |
| 121 | +in a round-robin fashion. Right now, we have only one strand, storing |
| 122 | +the following X-clause with no selected subgoal: |
| 123 | + |
| 124 | + (Rc<?T>: Debug) :- (?T: Debug) |
| 125 | + |
| 126 | +When we activate the strand, we see that we have no selected subgoal, |
| 127 | +and so we first pick one of the subgoals to process. Here, there is only |
| 128 | +one (`?T: Debug`), so that becomes the selected subgoal, changing |
| 129 | +the state of the strand to: |
| 130 | + |
| 131 | + (Rc<?T>: Debug) :- selected(?T: Debug, A0) |
| 132 | + |
| 133 | +Here, we write `selected(L, An)` to indicate that (a) the literal `L` |
| 134 | +is the selected subgoal and (b) which answer `An` we are looking for. We |
| 135 | +start out looking for `A0`. |
| 136 | + |
| 137 | +**Processing the selected subgoal.** Next, we have to try and find an |
| 138 | +answer to this selected goal. To do that, we will u-canonicalize it |
| 139 | +and try to find an associated table. In this case, the u-canonical |
| 140 | +form of the subgoal is `?0: Debug`: we don't have a table yet for |
| 141 | +that, so we can create a new one, T1. As before, we'll initialize T1 |
| 142 | +with strands. In this case, there will be three strands, because all |
| 143 | +the program clauses are potentially applicable. Those three strands |
| 144 | +will be: |
| 145 | + |
| 146 | +- `(u32: Debug) :-`, derived from the program clause `(u32: Debug).`. |
| 147 | + - Note: This strand has no subgoals. |
| 148 | +- `(Vec<?U>: Debug) :- (?U: Debug)`, derived from the `Vec` impl. |
| 149 | +- `(Rc<?U>: Debug) :- (?U: Debug)`, derived from the `Rc` impl. |
| 150 | + |
| 151 | +We can thus summarize the state of the whole forest at this point as |
| 152 | +follows: |
| 153 | + |
| 154 | +``` |
| 155 | +Table T0 [Rc<?0>: Debug] |
| 156 | + Strands: |
| 157 | + (Rc<?T>: Debug) :- selected(?T: Debug, A0) |
| 158 | + |
| 159 | +Table T1 [?0: Debug] |
| 160 | + Strands: |
| 161 | + (u32: Debug) :- |
| 162 | + (Vec<?U>: Debug) :- (?U: Debug) |
| 163 | + (Rc<?V>: Debug) :- (?V: Debug) |
| 164 | +``` |
| 165 | + |
| 166 | +**Delegation between tables.** Now that the active strand from T0 has |
| 167 | +created the table T1, it can try to extract an answer. It does this |
| 168 | +via that same `ensure_answer` operation we saw before. In this case, |
| 169 | +the strand would invoke `ensure_answer(T1, A0)`, since we will start |
| 170 | +with the first answer. This will cause T1 to activate its first |
| 171 | +strand, `u32: Debug :-`. |
| 172 | + |
| 173 | +This strand is somewhat special: it has no subgoals at all. This means |
| 174 | +that the goal is proven. We can therefore add `u32: Debug` to the set |
| 175 | +of *answers* for our table, calling it answer A0 (it is the first |
| 176 | +answer). The strand is then removed from the list of strands. |
| 177 | + |
| 178 | +The state of table T1 is therefore: |
| 179 | + |
| 180 | +``` |
| 181 | +Table T1 [?0: Debug] |
| 182 | + Answers: |
| 183 | + A0 = [?0 = u32] |
| 184 | + Strand: |
| 185 | + (Vec<?U>: Debug) :- (?U: Debug) |
| 186 | + (Rc<?V>: Debug) :- (?V: Debug) |
| 187 | +``` |
| 188 | + |
| 189 | +Note that I am writing out the answer A0 as a substitution that can be |
| 190 | +applied to the table goal; actually, in the code, the goals for each |
| 191 | +X-clause are also represented as substitutions, but in this exposition |
| 192 | +I've chosen to write them as full goals, following NFTD. |
| 193 | + |
| 194 | +Since we now have an answer, `ensure_answer(T1, A0)` will return `Ok` |
| 195 | +to the table T0, indicating that answer A0 is available. T0 now has |
| 196 | +the job of incorporating that result into its active strand. It does |
| 197 | +this in two ways. First, it creates a new strand that is looking for |
| 198 | +the next possible answer of T1. Next, it incorpoates the answer from |
| 199 | +A0 and removes the subgoal. The resulting state of table T0 is: |
| 200 | + |
| 201 | +``` |
| 202 | +Table T0 [Rc<?0>: Debug] |
| 203 | + Strands: |
| 204 | + (Rc<?T>: Debug) :- selected(?T: Debug, A1) |
| 205 | + (Rc<u32>: Debug) :- |
| 206 | +``` |
| 207 | + |
| 208 | +We then immediately activate the strand that incorporated the answer |
| 209 | +(the `Rc<u32>: Debug` one). In this case, that strand has no further |
| 210 | +subgoals, so it becomes an answer to the table T0. This answer can |
| 211 | +then be returned up to our caller, and the whole forest goes quiescent |
| 212 | +at this point (remember, we only do enough work to generate *one* |
| 213 | +answer). The ending state of the forest at this point will be: |
| 214 | + |
| 215 | +``` |
| 216 | +Table T0 [Rc<?0>: Debug] |
| 217 | + Answer: |
| 218 | + A0 = [?0 = u32] |
| 219 | + Strands: |
| 220 | + (Rc<?T>: Debug) :- selected(?T: Debug, A1) |
| 221 | +
|
| 222 | +Table T1 [?0: Debug] |
| 223 | + Answers: |
| 224 | + A0 = [?0 = u32] |
| 225 | + Strand: |
| 226 | + (Vec<?U>: Debug) :- (?U: Debug) |
| 227 | + (Rc<?V>: Debug) :- (?V: Debug) |
| 228 | +``` |
| 229 | + |
| 230 | +Here you can see how the forest captures both the answers we have |
| 231 | +created thus far *and* the strands that will let us try to produce |
| 232 | +more answers later on. |
| 233 | + |
| 234 | +## Heritage and acroynms |
| 235 | + |
| 236 | +This solver implements the SLG solving technique, though extended to |
| 237 | +accommodate hereditary harrop (HH) predicates, as well as the needs of |
| 238 | +lazy normalization. |
| 239 | + |
| 240 | +Its design is kind of a fusion of [MiniKanren] and the following |
| 241 | +papers, which I will refer to as EWFS and NTFD respectively: |
| 242 | + |
| 243 | +> Efficient Top-Down Computation of Queries Under the Well-formed Semantics |
| 244 | +> (Chen, Swift, and Warren; Journal of Logic Programming '95) |
| 245 | +
|
| 246 | +> A New Formulation of Tabled resolution With Delay |
| 247 | +> (Swift; EPIA '99) |
| 248 | +
|
| 249 | +[MiniKanren]: http://minikanren.org/ |
| 250 | + |
| 251 | +In addition, I incorporated extensions from the following papers, |
| 252 | +which I will refer to as SA and RR respectively, that describes how to |
| 253 | +do introduce approximation when processing subgoals and so forth: |
| 254 | + |
| 255 | +> Terminating Evaluation of Logic Programs with Finite Three-Valued Models |
| 256 | +> Riguzzi and Swift; ACM Transactions on Computational Logic 2013 |
| 257 | +> (Introduces "subgoal abstraction", hence the name SA) |
| 258 | +> |
| 259 | +> Radial Restraint |
| 260 | +> Grosof and Swift; 2013 |
| 261 | +
|
| 262 | +Another useful paper that gives a kind of high-level overview of |
| 263 | +concepts at play is the following: |
| 264 | + |
| 265 | +> XSB: Extending Prolog with Tabled Logic Programming |
| 266 | +> (Swift and Warren; Theory and Practice of Logic Programming '10) |
| 267 | +
|
| 268 | +There are a places where I intentionally diverged from the semantics |
| 269 | +as described in the papers -- e.g. by more aggressively approximating |
| 270 | +-- which I marked them with a comment DIVERGENCE. Those places may |
| 271 | +want to be evaluated in the future. |
| 272 | + |
| 273 | +A few other acronyms that I use: |
| 274 | + |
| 275 | +- WAM: Warren abstract machine, an efficient way to evaluate Prolog programs. |
| 276 | + See <http://wambook.sourceforge.net/>. |
| 277 | +- HH: Hereditary harrop predicates. What Chalk deals in. |
| 278 | + Popularized by Lambda Prolog. |
0 commit comments