Skip to content

Add Karrq's salsa chapter #529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion book.toml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,6 @@ level = 1

[output.linkcheck]
follow-web-links = true
exclude = [ "crates\\.io", "gcc\\.godbolt\\.org", "youtube\\.com", "dl\\.acm\\.org" ]
exclude = [ "crates\\.io", "gcc\\.godbolt\\.org", "youtube\\.com", "youtu\\.be", "dl\\.acm\\.org" ]
cache-timeout = 172800
warning-policy = "error"
1 change: 1 addition & 0 deletions src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@
- [Incremental compilation](./queries/incremental-compilation.md)
- [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
- [Debugging and Testing](./incrcomp-debugging.md)
- [Salsa](./salsa.md)
- [Lexing and Parsing](./the-parser.md)
- [`#[test]` Implementation](./test-implementation.md)
- [Macro expansion](./macro-expansion.md)
Expand Down
214 changes: 214 additions & 0 deletions src/salsa.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
# How Salsa works

This chapter is based on the explanation given by Niko Matsakis in this
[video](https://www.youtube.com/watch?v=_muY4HjSqVw) about
[Salsa](https://github.com/salsa-rs/salsa).

> Salsa is not used directly in rustc, but it is used extensively for
> rust-analyzer and may be integrated into the compiler in the future.

## What is Salsa?

Salsa is a library for incremental recomputation, this means reusing
computation that has already been done in the past to increase the efficiency
of future computations.

The objectives of Salsa are:
* Provide that functionality in an automatic way, so reusing old computations
is done automatically by the library
* Doing so in a "sound", or "correct", way, therefore leading to the same
results as if it had been done from scratch

Salsa's actual model is much richer, allowing many kinds of inputs and many
different outputs.
For example, integrating Salsa with an IDE could mean that the inputs could be
the manifest (`Cargo.toml`), entire source files (`foo.rs`), snippets and so
on; the outputs of such an integration could range from a binary executable, to
lints, types (for example, if a user selects a certain variable and wishes to
see its type), completions, etc.

## How does it work?

The first thing that Salsa has to do is identify the "base inputs" [^EN1].

Then Salsa has to also identify intermediate, "derived" values, which are
something that the library produces, but, for each derived value there's a
"pure" function that computes the derived value.

For example, there might be a function `ast(x: Path) -> AST`. The produced
`AST` isn't a final value, it's an intermidiate value that the library would
use for the computation.

This means that when you try to compute with the library, Salsa is going to
compute various derived values, and eventually read the input and produce the
result for the asked computation.

In the course of computing, Salsa tracks which inputs were accessed and which
values are derived. This information is used to determine what's going to
happen when the inputs change: are the derived values still valid?

This doesn't necessarily mean that each computation downstream from the input
is going to be checked, which could be costly. Salsa only needs to check each
downstream computation until it finds one that isn't changed. At that point, it
won't check other derived computations since they wouldn't need to change.

It's is helpful to think about this as a graph with nodes. Each derived value
has a dependency on other values, which could themselves be either base or
derived. Base values don't have a dependency.

```ignore
I <- A <- C ...
|
J <- B <--+
```

When an input `I` changes, the derived value `A` could change. The derived
value `B` , which does not depend on `I`, `A`, or any value derived from `A` or
`I`, is not subject to change. Therefore, Salsa can reuse the computation done
for `B` in the past, without having to compute it again.

The computation could also terminate early. Keeping the same graph as before,
say that input `I` has changed in some way (and input `J` hasn't) but, when
computing `A` again, it's found that `A` hasn't changed from the previous
computation. This leads to an "early termination", because there's no need to
check if `C` needs to change, since both `C` direct inputs, `A` and `B`,
haven't changed.

## Key Salsa concepts

### Query

A query is some value that Salsa can access in the course of computation. Each
query can have a number of keys (from 0 to many), and all queries have a
result, akin to functions. 0-key queries are called "input" queries.

### Database

The database is basically the context for the entire computation, it's meant to
store Salsa's internal state, all intermediate values for each query, and
anything else that the computation might need. The database must know all the
queries that the library is going to do before it can be built, but they don't
need to be specified in the same place.

After the database is formed, it can be accessed with queries that are very
similar to functions. Since each query's result is stored in the database,
when a query is invoked N times, it will return N **cloned** results, without
having to recompute the query (unless the input has changed in such a way that
it warrants recomputation).

For each input query (0-key), a "set" method is generated, allowing the user to
change the output of such query, and trigger previous memoized values to be
potentially invalidated.

### Query Groups

A query group is a set of queries which have been defined together as a unit.
The database is formed by combining query groups. Query groups are akin to
"Salsa modules" [^EN2].

A set of queries in a query group are just a set of methods in a trait.

To create a query group a trait annotated with a specific attribute
(`#[salsa::query_group(...)]`) has to be created.

An argument must also be provided to said attribute as it will be used by Salsa
to create a struct to be used later when the database is created.

Example input query group:

```rust,ignore
/// This attribute will process this tree, produce this tree as output, and produce
/// a bunch of intermidiate stuff that Salsa also uses. One of these things is a
/// "StorageStruct", whose name we have specified in the attribute.
///
/// This query group is a bunch of **input** queries, that do not rely on any
/// derived input.
#[salsa::query_group(InputsStorage)]
pub trait Inputs {
/// This attribute (`#[salsa::input]`) indicates that this query is a base
/// input, therefore `set_manifest` is going to be auto-generated
#[salsa::input]
fn manifest(&self) -> Manifest;

#[salsa::input]
fn source_text(&self, name: String) -> String;
}
```

To create a **derived** query group, one must specify which other query groups
this one depends on by specifying them as supertraits, as seen in the following
example:

```rust,ignore
/// This query group is going to contain queries that depend on derived values a
/// query group can access another query group's queries by specifying the
/// dependency as a super trait query groups can be stacked as much as needed using
/// that pattern.
#[salsa::query_group(ParserStorage)]
pub trait Parser: Inputs {
/// This query `ast` is not an input query, it's a derived query this means
/// that a definition is necessary.
fn ast(&self, name: String) -> String;
}
```

When creating a derived query the implementation of said query must be defined
outside the trait. The definition must take a database parameter as an `impl
Trait` (or `dyn Trait`), where `Trait` is the query group that the definition
belongs to, in addition to the other keys.

```rust,ignore
///This is going to be the definition of the `ast` query in the `Parser` trait.
///So, when the query `ast` is invoked, and it needs to be recomputed, Salsa is going to call this function
///and it's is going to give it the database as `impl Parser`.
///The function doesn't need to be aware of all the queries of all the query groups
fn ast(db: &impl Parser, name: String) -> String {
//! Note, `impl Parser` is used here but `dyn Parser` works just as well
/* code */
///By passing an `impl Parser`, this is allowed
let source_text = db.input_file(name);
/* do the actual parsing */
return ast;
}
```

Eventually, after all the query groups have been defined, the database can be
created by declaring a struct.

To specify which query groups are going to be part of the database an attribute
(`#[salsa::database(...)]`) must be added. The argument of said attribute is a
list of identifiers, specifying the query groups **storages**.

```rust,ignore
///This attribute specifies which query groups are going to be in the database
#[salsa::database(InputsStorage, ParserStorage)]
#[derive(Default)] //optional!
struct MyDatabase {
///You also need this one field
runtime : salsa::Runtime<MyDatabase>,
}
///And this trait has to be implemented
impl salsa::Databse for MyDatabase {
fn salsa_runtime(&self) -> &salsa::Runtime<MyDatabase> {
&self.runtime
}
}
```

Example usage:

```rust,ignore
fn main() {
let db = MyDatabase::default();
db.set_manifest(...);
db.set_source_text(...);
loop {
db.ast(...); //will reuse results
db.set_source_text(...);
}
}
```

[^EN1]: "They are not something that you **inaubible** but something that you kinda get **inaudible** from the outside [3:23](https://youtu.be/_muY4HjSqVw?t=203).

[^EN2]: What is a Salsa module?