Commit a540bdd

Add a document to point towards resources that we use for performance investigation
1 parent 47a1000 commit a540bdd


1 file changed, 55 additions and 0 deletions

# Investigating GraalPy Performance

First, make sure to build GraalPy with debug symbols.
Running `export CFLAGS=-g` before a fresh `mx build` adds the debug-symbol flag to all our C extension libraries.
When you build a native image, use `find` to locate the `.debug` file somewhere in the `mxbuild` directory tree; it is called something like `libpythonvm.so.debug`.
Copy that file next to `libpythonvm.so` in the Python standalone so that tools can pick it up.
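If you do this often, you can script the `find`-and-copy step. The following is a minimal Python sketch and is not part of our tooling. It assumes the debug file is named exactly `libpythonvm.so.debug`, that the newest match under `mxbuild` is the one you want, and that the first `libpythonvm.so` found in the standalone is the right target. The script name `copy_debug.py` and both directory arguments are placeholders.

```python
import shutil
import sys
from pathlib import Path


def find_debug_file(mxbuild: Path, name: str = "libpythonvm.so.debug") -> Path:
    """Return the newest file called `name` anywhere under `mxbuild`."""
    matches = [p for p in mxbuild.rglob(name) if p.is_file()]
    if not matches:
        raise FileNotFoundError(f"no {name} found under {mxbuild}")
    # Several builds may have left copies behind; assume the most recent one is wanted.
    return max(matches, key=lambda p: p.stat().st_mtime)


def install_debug_file(mxbuild: Path, standalone: Path) -> Path:
    """Copy the debug file into the directory that contains libpythonvm.so."""
    libs = sorted(p for p in standalone.rglob("libpythonvm.so") if p.is_file())
    if not libs:
        raise FileNotFoundError(f"no libpythonvm.so found under {standalone}")
    debug_file = find_debug_file(mxbuild)
    target = libs[0].parent / debug_file.name
    # Skip the copy if the newest match already is the installed file.
    if debug_file.resolve() != target.resolve():
        shutil.copy2(debug_file, target)
    return target


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder paths): python copy_debug.py <mxbuild-dir> <standalone-dir>
    print(install_debug_file(Path(sys.argv[1]), Path(sys.argv[2])))
```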

## Peak Performance

The [Truffle docs](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Optimizing/) (also at `graal/truffle/docs/Optimizing.md`) are a good starting point.
They describe how to get started with the profiler; the [flamegraph](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Profiling/#creating-a-flame-graph-from-cpu-sampler) is especially useful.
It gives you a high-level idea of where time is spent.
Note that profiles of executions that use native extensions may currently be less accurate (GR-58204).
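For reference, the flamegraph section of the Truffle profiling docs uses the CPU sampler options `--cpusampler`, `--cpusampler.Output=flamegraph`, and `--cpusampler.OutputFile`. The hypothetical helper below builds that command for a GraalPy script. It assumes `graalpy` is on your `PATH`. Double-check the option names against the docs for your GraalVM version.

```python
import subprocess
from typing import List


def cpusampler_command(script: str, output: str = "flamegraph.svg",
                       launcher: str = "graalpy") -> List[str]:
    """Build a GraalPy command that writes a CPU-sampler flamegraph to `output`."""
    return [
        launcher,
        "--cpusampler",                       # enable the Truffle CPU sampler
        "--cpusampler.Output=flamegraph",     # emit an SVG flamegraph instead of text
        f"--cpusampler.OutputFile={output}",  # where to write the SVG
        script,
    ]


def run_cpusampler(script: str, output: str = "flamegraph.svg") -> int:
    """Run `script` under the sampler; needs a graalpy launcher on PATH."""
    return subprocess.run(cpusampler_command(script, output), check=False).returncode
```

For example, `run_cpusampler("foo.py")` writes `flamegraph.svg` into the current directory.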

For GraalPy, the flamegraph is also useful for comparing performance with CPython.
[Py-spy](https://pypi.org/project/py-spy/) works well for that, since it generates a flamegraph that is sufficiently comparable.
Note that `py-spy` is a sampling profiler that reads CPython internals, so it often does not support the latest CPython release; use a slightly older one.

```
py-spy record -n -r 100 -o pyspy.svg -- python foo.py
```
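To keep the CPython side on a version that py-spy supports, you can point the command at a specific interpreter. This sketch uses hypothetical helper names and reproduces only the flags shown above: `-n` for native frames, `-r` for the sampling rate, and `-o` for the output file. It assumes `py-spy` is on your `PATH`.

```python
import subprocess
from typing import List


def pyspy_command(script: str, python: str = "python", rate: int = 100,
                  output: str = "pyspy.svg") -> List[str]:
    """Build the py-spy invocation above for a chosen CPython interpreter."""
    return [
        "py-spy", "record",
        "-n",             # --native: also sample native (C extension) frames
        "-r", str(rate),  # --rate: samples per second
        "-o", output,     # flamegraph SVG to write
        "--", python, script,
    ]


def run_pyspy(script: str, python: str = "python") -> int:
    """Run py-spy; needs py-spy on PATH and an interpreter version it supports."""
    return subprocess.run(pyspy_command(script, python), check=False).returncode
```

For example, `run_pyspy("foo.py", python="/opt/older-cpython/bin/python3")` profiles the script under that interpreter. The path is a placeholder.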

Once you have identified something that takes much longer on GraalPy than on CPython, follow the Truffle guide.

When debugging deoptimizations with [IGV](https://www.graalvm.org/tools/igv/), trace deopts as described in the Truffle document linked above and search the output for "JVMCI: installed code name=".
If the name ends with "#2", it is a second-tier compilation.
You might notice a `debugId` or `debug_id` in the output of these options.
You can search for that id with `id=NUMBER`, `idx=NUMBER`, or `debugId=NUMBER` in IGV's `Search in Nodes` search box, then select `Open Search for node NUMBER in Node Searches window`, and then click the `Search in following phases` button.
It is also useful to know that the `compile_id` matches the `compilationId` in IGV's "properties" view of the dumped graph.
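When the deopt trace output gets long, a small filter can list the installed compilations and flag the second-tier ones. This sketch assumes that each relevant line contains `JVMCI: installed code name=` followed by a name with no whitespace. That line shape is an assumption, and so are the sample lines in the test. Adjust the regular expression to match your actual output.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape: "... JVMCI: installed code name=<name> ...", where <name>
# contains no whitespace. Adapt the pattern if your output looks different.
INSTALLED = re.compile(r"JVMCI: installed code name=(\S+)")


def installed_compilations(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (name, is_second_tier) for every installed-code line."""
    result = []
    for line in lines:
        match = INSTALLED.search(line)
        if match:
            name = match.group(1)
            # A name ending in "#2" marks a second-tier compilation.
            result.append((name, name.endswith("#2")))
    return result
```

You can pass any iterable of lines, such as an open file holding the captured output.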

[Proftool](https://github.com/graalvm/mx/blob/master/README-proftool.md) can also be helpful.
Note that it is not really designed for language launchers; if it does not work, get the full command line and build the arguments manually.

## Interpreter Performance

For interpreter performance, async-profiler works well and also offers some visualizations; the backtrace and flat views are particularly useful.
It only works for JVM executions (not native images).
Download async-profiler and make sure your C extensions also have debug symbols.
Use these options:

```
--vm.agentpath:/path/to/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=profile.html --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints
```
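If you script your profiling runs, you can assemble the same options programmatically. This is a minimal sketch with hypothetical helper names. It assumes that `graalpy` on your `PATH` is a JVM standalone and that you pass in the async-profiler library path yourself.

```python
import subprocess
from typing import List


def async_profiler_options(lib: str, output: str = "profile.html",
                           event: str = "cpu") -> List[str]:
    """GraalPy launcher options that attach async-profiler (JVM mode only)."""
    return [
        # --vm.<opt> is passed to the JVM as -<opt>, here -agentpath:...
        f"--vm.agentpath:{lib}=start,event={event},file={output}",
        # DebugNonSafepoints is a diagnostic flag, so it must be unlocked first;
        # it records debug info at non-safepoint locations for more precise samples.
        "--vm.XX:+UnlockDiagnosticVMOptions",
        "--vm.XX:+DebugNonSafepoints",
    ]


def profile(script: str, lib: str, launcher: str = "graalpy") -> int:
    """Run `script` under async-profiler; `launcher` must be a JVM standalone."""
    cmd = [launcher, *async_profiler_options(lib), script]
    return subprocess.run(cmd, check=False).returncode
```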

Another very useful tool is [gprofng](https://blogs.oracle.com/linux/post/gprofng-the-next-generation-gnu-profiling-tool), which is now part of binutils.
If you have debug symbols, it works quite well with JVM launchers, since it understands HotSpot frames, and it also works fine with native images.
You might run into a bug with our language launchers: https://sourceware.org/bugzilla/show_bug.cgi?id=32110.
The bug only manifests when viewing a recorded profile; my (Tim's) patch in that bug report is not entirely correct and does not pass their test suite, but it lets you review recorded profiles.
What's nice about gprofng is that it can attribute time spent to Java bytecodes, so you can profile even huge methods, such as the bytecode loops generated by the DSL.
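A typical gprofng session records an experiment with `gprofng collect app` and then prints a report with `gprofng display text`. The sketch below builds and runs those two commands. The helper names and the default experiment name `graalpy.er` are hypothetical, and the only report option it uses is `-functions`, the per-function view. Check `gprofng --help` for the views your binutils version provides.

```python
import subprocess
from typing import List, Tuple


def gprofng_commands(program: List[str],
                     experiment: str = "graalpy.er") -> Tuple[List[str], List[str]]:
    """Return (record, report) commands for profiling `program` with gprofng."""
    # gprofng expects experiment directory names to end in ".er".
    if not experiment.endswith(".er"):
        raise ValueError("experiment name must end in .er")
    record = ["gprofng", "collect", "app", "-o", experiment, *program]
    report = ["gprofng", "display", "text", "-functions", experiment]
    return record, report


def profile_with_gprofng(program: List[str]) -> str:
    """Record an experiment, then return the per-function text report."""
    record, report = gprofng_commands(program)
    subprocess.run(record, check=True)
    return subprocess.run(report, check=True, capture_output=True, text=True).stdout
```

For example, `profile_with_gprofng(["graalpy", "foo.py"])` profiles a script through the launcher.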

For SVM (native image) builds, it is very useful to read Truffle's [HostInlining](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/HostOptimization/) docs, especially the debugging section.
They help you verify that code is (or is not) inlined as expected.
For example, when gprofng shows that something takes long, I find it useful to check whether that code is inlined as expected on SVM during the `HostInliningPhase`.

Intel VTune and Oracle Developer Studio reportedly work well too, but I haven't tried them.
