Description
The NLL RFC goes to some lengths to discuss how to phrase NLL error messages. In particular, we want to adopt the so-called "three point" style:
error[E0506]: cannot write to `i` while borrowed
--> <anon>:4:5
|
3 | let x = &i;
| - (shared) borrow of `i` occurs here
4 | i += 1;
| ^^^^^^ write to `i` occurs here, while borrow is still active
5 | println!("{}", x);
| - borrow is later used here
This is going to require a bit of work. Right now, we easily have two of those points: the borrow checker has identified a borrow, and it knows where it occurred (that's the "borrow occurs here" point). It has also identified an access, and it knows where that occurs (that's the "write occurs here" point). What it does not know is the "borrow is later used here" point. That's a bit harder for us to find.
Right now, what we have readily available is just the region of the borrow. This indicates all the points in the control-flow graph where the borrow is still "live" (potentially in use), but it does not indicate why the borrow is potentially in use. It's the why (i.e., what use caused us to consider this point as part of the region) that we are interested in here.
Still, we have all the information we theoretically need. We have the region inference context, which contains the full set of constraints that we used to find the regions, and we have the MIR itself. (This is intentional: in the lexical checker, we only kept the final results of inference, and not the constraints that led to those results, which sometimes hobbled our ability to issue errors we wanted.)
However, it's still a bit of an open question how best to make use of those constraints to find the point in question. We could probably do some kind of graph-search heuristics that would lead us to the right point most of the time. I've also been considering another idea: we could re-run inference, but this time use different data structures. Rather than treating regions as simple sets of points, we would consider a region to be a set of (P, U)
tuples. Here P
is the liveness point, but U
is some use of a variable (i.e., probably itself a pair (Pu, X)
of a point of use and a variable name). The idea is that the region R
contains the point P
because of the use U
. We'd have to extend liveness to track not only which variables are live, but what use U
makes them live at that point.
The rest of inference is effectively unchanged, except that we track these pairs, so that whenever we add a point to a region, we also track the variable that made that point live (ultimately, every point in any region stems from some live use). This would be more expensive -- more data! -- but we're only doing it in the case of error. We could also do a more targeted form of inference, limiting ourselves to the regions that are in some way connected to the borrow region.
If we did that, then ultimately the borrow region would be inferred to contain some number of (P, U)
tuples where P
is the point of access and the use U
is the third point we want to highlight.
What is appealing about this is that it is a very precise, very general analysis. What is unappealing is that it requires reproducing a lot of code. But we might be able to factor out most of it with generics so that it's not so much re-use.
If we don't do this, I'm not sure quite sure what else we would do. I can imagine some heuristics search -- basically making a graph that shows which regions are connected to what and then searching for a use of some variable X that contains a region P that is connected in some way to the borrow region -- but I haven't thought of anything else that yields the right results without essentially reproducing the analysis in some way. That heuristic search might or might not be good enough; I'd really prefer not to give wrong results to the user.