Commit a0c3e29 -- Blog post: measuring memory usage in Rust

= Measuring Memory Usage in Rust
@matklad
:sectanchors:
:experimental:
:page-layout: post

****
rust-analyzer is a new "IDE backend" for the https://www.rust-lang.org/[Rust] programming language.
Support rust-analyzer on https://opencollective.com/rust-analyzer/[Open Collective] or https://github.com/sponsors/rust-analyzer[GitHub Sponsors].
****

This post documents a couple of fun tricks we use in rust-analyzer for measuring memory consumption.

In general, there are two broad approaches to profiling the memory usage of a program.

_The first approach_ is based on "`heap parsing`".
At a particular point in time, the profiler looks at all the memory currently occupied by the program (the heap).
In its raw form, the memory is just a bag of bytes, `Vec<u8>`.
However the profiler, using some help from the language's runtime, is able to re-interpret these bytes as collections of objects ("`parse the heap`").
It then traverses the graph of objects and computes how many instances of each object there are and how much memory they occupy.
The profiler also tracks the ownership relations, to ferret out facts like "`90% of strings in this program are owned by the ``Config`` struct`".
This is the approach I am familiar with from the JVM ecosystem.
Java's garbage collector needs to understand the heap to search for unreachable objects, and the same information is used to analyze heap snapshots.

_The second approach_ is based on instrumenting the calls to allocation and deallocation routines.
The profiler captures backtraces when the program calls `malloc` and `free` and constructs a flamegraph displaying "`hot`" functions which allocate a lot.
This is how, for example, https://github.com/KDE/heaptrack[heaptrack] works (see also https://github.com/cuviper/alloc_geiger[alloc geiger]).

The two approaches are complementary.
If the problem is that the application makes too many short-lived allocations (instead of re-using buffers), it would be invisible to the first approach, but very clear in the second one.
If the problem is that, in a steady state, the application uses too much memory, the first approach works better for pointing out which data structures need the most attention.

In rust-analyzer, we are generally interested in keeping the overall memory usage small, and can make better use of the heap-parsing approach.
Specifically, most of rust-analyzer's data is stored in the incremental computation tables, and we want to know which table is the heaviest.

Unfortunately, Rust doesn't use garbage collection, so just parsing the heap bytes at runtime is impossible.
The best available alternative is instrumenting data structures for the purposes of measuring memory size.
That is, writing a proc-macro which adds a `fn total_size(&self) -> usize` method to annotated types, and calling that manually from the root of the data.
There is Servo's https://github.com/servo/servo/tree/2d3811c21bf1c02911d5002f9670349c5cf4f500/components/malloc_size_of[`malloc_size_of`] crate for doing that, but it is not published to crates.io.

Another alternative is running the program under valgrind to gain runtime introspectability.
https://www.valgrind.org/docs/manual/ms-manual.html[Massif] and https://www.valgrind.org/docs/manual/dh-manual.html[DHAT] work that way.
Running with valgrind is pretty slow, and still doesn't give Java-level fidelity.

Instead, rust-analyzer mainly relies on a much simpler approach for figuring out which things are heavy.
This is the first trick of this article:

== Archimedes' Method

It's relatively easy to find out the total memory allocated at any given point in time.
For glibc, there's the https://man7.org/linux/man-pages/man3/mallinfo.3.html[mallinfo] function; a https://docs.rs/jemalloc-ctl/0.3.3/jemalloc_ctl/stats/struct.allocated.html[similar API] exists for jemalloc.
It's even possible to implement a https://doc.rust-lang.org/stable/std/alloc/trait.GlobalAlloc.html[`GlobalAlloc`] which tracks this number.

And, if you can measure total memory usage, you can measure the memory usage of any specific data structure by:

. noting the current memory usage
. dropping the data structure
. noting the current memory usage again

The difference between the two measurements is the size of the data structure.
And this is exactly what rust-analyzer does to find the largest caches: https://github.com/rust-analyzer/rust-analyzer/blob/b988c6f84e06bdc5562c70f28586b9eeaae3a39c/crates/ide_db/src/apply_change.rs#L104-L238[source].
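
The `GlobalAlloc`-based variant of this can be sketched in a few lines.
Note that this is an illustrative sketch, not rust-analyzer's actual code: a wrapper around the system allocator keeps a running total of live bytes, and the three steps above become two reads of that counter around a `drop`:

[source,rust]
----
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator, keeping a running total of live bytes.
struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn memory_usage() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

fn main() {
    let before = memory_usage();
    let data: Vec<u64> = (0..1_000).collect();
    let with_data = memory_usage();
    drop(data);

    // The difference between the two measurements is the Vec's size.
    println!("Vec<u64> of 1000 elements: {} bytes", with_data - before);
    assert!(with_data - before >= 8_000);
    assert_eq!(memory_usage(), before);
}
----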

Two small notes about this method:

* It's important to ask the allocator about the allocated memory, and not the operating system.
The OS can only tell how many pages the program consumes.
Only the allocator knows which of those pages are free and which hold allocated objects.
* When measuring relative sizes, it's important to note the unaccounted-for amount at the end, such that the total adds up to 100%.
It might be the case that the bottleneck lies in the dark matter outside of the explicit measurements!

== Amdahl's Estimator

The second trick is related to https://en.wikipedia.org/wiki/Amdahl's_law[Amdahl's law].
When optimizing a specific component, it's important to note not only how much more efficient it becomes, but also the overall contribution of the component to the system.
Making an algorithm twice as fast can improve the overall performance only by 5%, if the algorithm is only 10% of the whole task.
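
In formula form: if a fraction `p` of the work becomes `s` times faster, the overall speedup is `1 / ((1 - p) + p / s)`.
Plugging in the numbers above confirms the roughly 5% figure:

[source,rust]
----
fn main() {
    // Amdahl's law: overall speedup when a fraction `p` of the work
    // becomes `s` times faster.
    let amdahl = |p: f64, s: f64| 1.0 / ((1.0 - p) + p / s);

    // Doubling the speed of a component that is 10% of the task:
    let speedup = amdahl(0.10, 2.0);
    assert!((speedup - 1.0526).abs() < 0.001);
    println!("overall improvement: {:.1}%", (speedup - 1.0) * 100.0);
}
----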

In rust-analyzer's case, the optimization we are considering is adding interning to `Name`.
At the moment, a ``Name`` is represented with a small-size-optimized string (24 bytes inline + maybe some heap storage):

[source,rust]
----
struct Name {
    text: SmolStr,
}
----

Instead, we can use an interned index (4 bytes):

[source,rust]
----
struct Name {
    idx: u32,
}
----
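
For illustration, such an index could be backed by a global interner along these lines (a hypothetical sketch, not rust-analyzer's real interner):

[source,rust]
----
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

// The "thorny piece of global state": every `Name` everywhere
// must agree on this one table.
static INTERNER: Mutex<Option<Interner>> = Mutex::new(None);

fn intern(text: &str) -> u32 {
    let mut guard = INTERNER.lock().unwrap();
    let interner = guard.get_or_insert_with(Interner::default);
    if let Some(&idx) = interner.map.get(text) {
        return idx;
    }
    let idx = interner.strings.len() as u32;
    interner.strings.push(text.to_string());
    interner.map.insert(text.to_string(), idx);
    idx
}

fn lookup(idx: u32) -> String {
    let guard = INTERNER.lock().unwrap();
    guard.as_ref().unwrap().strings[idx as usize].clone()
}

fn main() {
    let a = intern("foo");
    assert_eq!(a, intern("foo")); // same string, same index
    assert_eq!(lookup(a), "foo");
}
----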

However, just trying out this optimization is not easy, as an interner is a thorny piece of global state.
Is it worth it?

If we look at the `Name` itself, it's pretty clear that the optimization is valuable: it reduces memory usage by 6x!
But how important is it in the grand scheme of things?
How to measure the impact of ``Name``s on overall memory usage?
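
The per-struct 6x figure, at least, is easy to verify with `std::mem::size_of`.
Since `SmolStr` lives in an external crate, the sketch below uses `String` as a stand-in; on a 64-bit target it is also 24 bytes (pointer + length + capacity):

[source,rust]
----
use std::mem::size_of;

// `String` stands in for `SmolStr`: both are 24 bytes on 64-bit targets.
struct Name {
    text: String,
}

struct InternedName {
    idx: u32,
}

fn main() {
    assert_eq!(size_of::<Name>(), 24);
    assert_eq!(size_of::<InternedName>(), 4); // 24 / 4 = the 6x reduction
}
----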

One approach is to just apply the optimization and measure the improvement after the fact.
But there's a lazier way: instead of making the `Name` smaller and measuring the improvement, we make it *bigger* and measure the worsening.
Specifically, it's easy to change the `Name` to this:

[source,rust]
----
struct Name {
    text: SmolStr,
    // Copy of `text`
    _ballast: SmolStr,
}
----

Now, if the new `Name` increases the overall memory consumption by `N`, we can estimate the total size of the old ``Name``s as `N` as well, since the new `Name` is exactly twice as big as the old one.
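
The "`twice as big`" premise itself can be checked with `size_of`, using `String` as a 24-byte stand-in for `SmolStr`:

[source,rust]
----
use std::mem::size_of;

struct Name {
    text: String,
}

// The ballast doubles the struct: 2 x 24 = 48 bytes per `Name`.
struct BallastName {
    text: String,
    _ballast: String,
}

fn main() {
    assert_eq!(size_of::<BallastName>(), 2 * size_of::<Name>());
}
----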

Sometimes, quick and simple hacks work better than the finest instruments :)