Description
Some background context:
- For a while now, I've been investigating rust-analyzer performance issues.
- @Wilfred, @darichey, and I all work together, and our employer is paying us to improve rust-analyzer's performance over the next 6 months on a full-time basis. We've determined that rust-analyzer's current performance is the biggest issue faced by Rust programmers at our employer.
- By listing these issues out, we're not attempting to "lick the cookie" or claim them—it's a lot of work for 3 people, and if anyone gets to an issue before we do, please do it! Every rust-analyzer user would benefit from these things happening more quickly.
Anyways, here's a non-exhaustive (but decently comprehensive!) list of things that we'd like to fix.
Non-Parallel, Single-Threaded Improvements
- Rowan is slower than we'd like.
- A few weeks ago, I benchmarked tree construction in Rowan against syntree (a library with a similar API to Rowan, but one that doesn’t implement structural sharing) and observed that syntree's tree construction is twice as fast as Rowan's.
- This isn’t the fairest comparison, as Rowan is optimized for editing trees—hence the structural sharing—but a syntax tree library with a more contiguous representation of the trees themselves would mean that the initial workspace loading/indexing could be substantially faster; a small construction-timing sketch follows this list.
- @CAD97’s work and analysis here are pertinent (Use sorbus internally as the green tree implementation rust-analyzer/rowan#58; Consider using succinct data structures for read-only trees rust-analyzer/rowan#107) and would likely be a building block for future work.
- `macro_rules!` expansion is quadratic. This is most visible with tt-munchers (discussion); see the tt-muncher sketch after this list.
- Switch from an LRU cache to a scan-resistant cache (LRU-K)
- This may or may not be impactful, but rust-analyzer implements a limited form of garbage collection in the form of an LRU cache. rust-analyzer also does a lot of scan-like operations, which is a classic pathological case for a naive LRU cache. Replacing the LRU cache with LRU-K (specifically, a scan-resistant cache) might help with how often the syntax tree cache is rebuilt (discussion on Zulip); a minimal LRU-K sketch follows this list.
- Switch the crate graph to a Salsa database.
- Migrating the crate graph from an arena to a Salsa database would make the workspace reindexing that occurs after a user adds/removes a dependency to their project substantially faster and more incremental.
- Crate IDs are not currently stable, as they are derived from `cargo-metadata` or a `rust-project.json`. Deriving the crate ID from the crate data itself (via Salsa’s interning infrastructure) will provide a stable ID (discussion on Zulip); a hand-rolled interning sketch follows this list.
- This is also necessary to properly support Cargo scripts.
- During project loading, rust-analyzer repeatedly queries the project layout from rustc, as there isn’t a clear end-state for project loading. Deferring this until the very end would likely help with loading large Rust workspaces.
- I did some rough, back-of-napkin benchmarks: rust-analyzer spends 5 seconds waiting on rustc for every 100 crates it loads (established by printing `Instant::elapsed::as_millis`). Given that some Rust projects can easily have over 1000 dependencies, it’s possible that people are waiting for nearly a minute.
- Moving to Salsa 3.0 might reduce how many allocations/`Arc` increments regularly occur, because Salsa 3.0 hands out references instead of cloned values.
- I don’t know how much of an impact this might have on rust-analyzer's overall performance, but if it does have one, it’d be because allocations and `Arc` increments/decrements are smeared across the entire codebase; a short sketch of the difference follows this list.
- From a best practices perspective, we should move to Salsa 3.0 regardless, especially if it doesn't regress rust-analyzer's performance.
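To make the Rowan item above concrete, here is a minimal sketch of timing bulk tree construction with Rowan's `GreenNodeBuilder` (assuming `rowan` as a dependency). The node/token kinds and the tree shape are invented for illustration, and a real comparison would build the same shape with syntree and compare wall-clock times; this is not the benchmark I actually ran.

```rust
use std::time::Instant;

use rowan::{GreenNodeBuilder, SyntaxKind};

// Invented kinds for illustration; rust-analyzer has its own `SyntaxKind` enum.
const ROOT: SyntaxKind = SyntaxKind(0);
const ITEM: SyntaxKind = SyntaxKind(1);
const IDENT: SyntaxKind = SyntaxKind(2);

fn build_tree(items: usize) -> rowan::GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(ROOT);
    for _ in 0..items {
        builder.start_node(ITEM);
        builder.token(IDENT, "foo");
        builder.finish_node();
    }
    builder.finish_node();
    builder.finish()
}

fn main() {
    let start = Instant::now();
    let tree = build_tree(100_000);
    println!(
        "built {} top-level children in {:?}",
        tree.children().count(),
        start.elapsed()
    );
}
```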
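For the `macro_rules!` item: a tt-muncher peels off one token tree per recursion step, and each step re-processes all of the remaining tokens, which is where the quadratic behavior shows up. A toy example (not taken from any real crate):

```rust
// A classic tt-muncher: counts token trees by peeling one off per recursion.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($rest:tt)*) => { 1usize + count_tts!($($rest)*) };
}

fn main() {
    // Each expansion step re-processes all remaining tokens, so the total work
    // for n tokens is roughly n + (n - 1) + ... + 1, i.e. O(n^2).
    let n = count_tts!(a b c d e);
    assert_eq!(n, 5);
}
```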
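For the cache item: LRU-K evicts based on the K-th most recent access rather than the most recent one, so a single scan that touches every entry exactly once cannot flush out the genuinely hot entries. A minimal, illustrative LRU-2 eviction policy (not rust-analyzer's actual cache; it ignores capacity and value storage entirely):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Tracks the last K access times per key and evicts the key whose K-th most
/// recent access is oldest; keys with fewer than K recorded accesses go first.
struct LruK<K> {
    k: usize,
    clock: u64,
    history: HashMap<K, Vec<u64>>, // most recent access time is last
}

impl<K: Hash + Eq> LruK<K> {
    fn new(k: usize) -> Self {
        Self { k, clock: 0, history: HashMap::new() }
    }

    fn touch(&mut self, key: K) {
        self.clock += 1;
        let times = self.history.entry(key).or_default();
        times.push(self.clock);
        if times.len() > self.k {
            times.remove(0);
        }
    }

    /// Pick a victim: the key with the oldest K-th most recent access. Plain
    /// LRU looks only at the most recent access, which is why a single scan
    /// over the workspace evicts everything that was actually hot.
    fn victim(&self) -> Option<&K> {
        self.history
            .iter()
            .min_by_key(|(_, times)| if times.len() < self.k { 0 } else { times[0] })
            .map(|(key, _)| key)
    }
}

fn main() {
    let mut cache = LruK::new(2);
    // "hot" is accessed twice before a scan touches a bunch of one-off keys.
    cache.touch("hot");
    cache.touch("hot");
    for key in ["a", "b", "c", "d"] {
        cache.touch(key);
    }
    // The scan keys have only one recorded access each, so one of them is the
    // eviction victim; plain LRU would have evicted "hot" here instead.
    assert_ne!(cache.victim(), Some(&"hot"));
}
```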
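For the crate-ID item: the property we want is that identical crate data always maps to the same ID, independent of the order in which the project layout happened to list it. Below is a hand-rolled stand-in for that idea; the real change would use Salsa's interning infrastructure rather than this code, and the `CrateData` fields are invented.

```rust
use std::collections::HashMap;

/// The data a crate ID would be derived from; the fields here are invented.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct CrateData {
    name: String,
    version: String,
    root_file: String,
}

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct CrateId(u32);

/// Identical crate data always interns to the same `CrateId`, so the ID stays
/// stable when unrelated crates are added to or removed from the graph.
#[derive(Default)]
struct CrateInterner {
    map: HashMap<CrateData, CrateId>,
    data: Vec<CrateData>,
}

impl CrateInterner {
    fn intern(&mut self, krate: CrateData) -> CrateId {
        if let Some(&id) = self.map.get(&krate) {
            return id;
        }
        let id = CrateId(self.data.len() as u32);
        self.data.push(krate.clone());
        self.map.insert(krate, id);
        id
    }
}

fn main() {
    let mut interner = CrateInterner::default();
    let serde = CrateData {
        name: "serde".into(),
        version: "1.0.0".into(),
        root_file: "src/lib.rs".into(),
    };
    let first = interner.intern(serde.clone());
    // Interning other crates in between does not disturb the existing ID,
    // unlike an index assigned purely by the order the metadata listed crates.
    interner.intern(CrateData {
        name: "anyhow".into(),
        version: "1.0.0".into(),
        root_file: "src/lib.rs".into(),
    });
    assert_eq!(interner.intern(serde), first);
}
```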
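For the Salsa 3.0 item: the allocation/refcount difference it refers to is roughly the difference between the two accessors below. This is a generic illustration, not Salsa's actual API.

```rust
use std::sync::Arc;

struct Database {
    // Stand-in for a memoized query result.
    item_names: Arc<Vec<String>>,
}

impl Database {
    /// Old style: every call clones the `Arc`, i.e. an atomic reference-count
    /// increment now and a matching decrement when the clone is dropped.
    fn item_names_cloned(&self) -> Arc<Vec<String>> {
        Arc::clone(&self.item_names)
    }

    /// New style: hand out a reference tied to the database borrow, with no
    /// refcount traffic and no per-call cloning.
    fn item_names_ref(&self) -> &[String] {
        &self.item_names
    }
}

fn main() {
    let db = Database {
        item_names: Arc::new(vec!["foo".to_owned(), "bar".to_owned()]),
    };
    let cloned = db.item_names_cloned(); // refcount bump
    let borrowed = db.item_names_ref(); // plain borrow
    assert_eq!(cloned.len(), borrowed.len());
}
```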
Parallel Improvements
From a sequencing standpoint, most parallelism improvements in rust-analyzer should happen after single-threaded improvements occur, as introducing parallelism without making single-core performance improvements simply papers over existing, resolvable performance issues—I believe it is better to have those improvements compound. Besides, a large number of the changes below are waiting on Salsa 3.0, as it will enable trivial parallel computation for all databases (salsa-rs/salsa#495).¹
- Parallelize autocomplete
- rust-analyzer spends a lot of time in Chalk doing trait solving. While performance: Speed up Method Completions By Taking Advantage of Orphan Rules #16555 helps reduce how many traits need to be checked, trait solving should be relatively easy to parallelize (assuming an interface similar to the one outlined here); see the rayon sketch after this list. Achieving sub-100ms autocomplete latencies, even on larger projects, should be possible on multi-core machines.
- Parallelize VFS loading
- At the moment, the VFS loads all sources serially (Load files into the VFS in parallel on startup #17373). rust-analyzer should attempt to saturate disk IO, as the disk throughput I've observed from rust-analyzer is substantially lower than what I'd expect on an SSD. This work does not depend on Salsa 3.0; a sketch of the parallel-loading shape follows this list.
- As part of this work, we should redesign the VFS (Rethink our VFS abstraction #18948), with the most likely outcome being moving most or all of the VFS into the new Salsa. The reason the VFS is the way it is, if I understand correctly, is that the VFS needs to assign file IDs—interning, effectively—but at the time, Salsa was not capable of doing so.
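For the autocomplete item above, here is a sketch of the shape of the parallelism, assuming candidate traits can be solved independently and assuming rayon. `TraitCandidate`, `SolveResult`, and `solve_candidate` are invented stand-ins; the real interface would go through the trait solver.

```rust
use rayon::prelude::*;

struct TraitCandidate {
    name: &'static str,
}

#[derive(Debug)]
struct SolveResult {
    name: &'static str,
    applies: bool,
}

// Stand-in for a per-candidate trait-solving query; in rust-analyzer this work
// happens in the trait solver, behind Salsa queries.
fn solve_candidate(candidate: &TraitCandidate) -> SolveResult {
    SolveResult {
        name: candidate.name,
        applies: candidate.name.len() % 2 == 0,
    }
}

fn main() {
    let candidates = vec![
        TraitCandidate { name: "Clone" },
        TraitCandidate { name: "Iterator" },
        TraitCandidate { name: "Display" },
    ];

    // The candidates are independent, so fan them out across rayon's pool.
    let results: Vec<SolveResult> = candidates.par_iter().map(solve_candidate).collect();

    for result in &results {
        println!("{result:?}");
    }
}
```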
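And for the VFS item, the loading side is embarrassingly parallel, roughly as below. This sketch uses rayon and blocking `std::fs` reads; the real VFS work also involves file IDs, file watching, and the Salsa questions mentioned above.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

use rayon::prelude::*;

/// Read every file in parallel instead of one at a time, so the SSD's
/// throughput is actually saturated during initial workspace loading.
fn load_files(paths: Vec<PathBuf>) -> Vec<(PathBuf, io::Result<Vec<u8>>)> {
    paths
        .into_par_iter()
        .map(|path| {
            let contents = fs::read(&path);
            (path, contents)
        })
        .collect()
}

fn main() {
    let paths = vec![PathBuf::from("src/main.rs"), PathBuf::from("Cargo.toml")];
    for (path, result) in load_files(paths) {
        match result {
            Ok(bytes) => println!("{}: {} bytes", path.display(), bytes.len()),
            Err(err) => println!("{}: {err}", path.display()),
        }
    }
}
```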
I'll add more items as they come up, but I figured I should combine the different discussion topics into a single location.
Footnotes

1. The version of Salsa used by rust-analyzer today requires getting a snapshot from the `ParallelDatabase` trait, and `ParallelDatabase` is only implemented on the `RootDatabase`. Similarly, parallel query dependencies are not legible to Salsa, which complicates the invalidation story.