# Investigating GraalPy Performance

First, make sure to build GraalPy with debug symbols.
Running `export CFLAGS=-g` before a fresh `mx build` adds the debug symbol flags to all our C extension libraries.
When you build a native image, use `find` to locate the `.debug` file somewhere in the `mxbuild` directory tree; it is called something like `libpythonvm.so.debug`.
Put that file next to `libpythonvm.so` in the Python standalone so that tools can pick it up.
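A build session might look like the following sketch; the paths are placeholders and the exact location of the `.debug` file depends on your build configuration:

```
export CFLAGS=-g
mx build
# after building the native image, locate the separate debug-info file
find mxbuild -name 'libpythonvm.so.debug'
# copy it next to libpythonvm.so in the standalone (placeholder paths)
cp /path/to/libpythonvm.so.debug /path/to/python-standalone/lib/
```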

## Peak Performance

The [Truffle docs](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Optimizing/) under graal/truffle/docs/Optimizing.md are a good starting point.
They describe how to start with the profiler; especially useful is the [flamegraph](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Profiling/#creating-a-flame-graph-from-cpu-sampler).
It gives you a high-level idea of where time is spent.
Note that currently (GR-58204) profiles of executions with native extensions may be less accurate.
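A minimal sketch of producing such a flamegraph with the Truffle CPU sampler, using the option names from the linked profiling guide:

```
graalpy --cpusampler --cpusampler.OutputFormat=flamegraph --cpusampler.OutputFile=flamegraph.svg foo.py
```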

In GraalPy's case the flamegraph is also useful for comparing performance to CPython.
[Py-spy](https://pypi.org/project/py-spy/) is pretty good for that, since it generates a sufficiently comparable flamegraph.
Note that `py-spy` is a sampling profiler that accesses CPython internals, so it often does not work on the latest CPython release; use a slightly older version in that case.

```
py-spy record -n -r 100 -o pyspy.svg -- python foo.py
```

Once you have identified something that takes much longer on GraalPy than on CPython, follow the Truffle guide.

When debugging deoptimizations with [IGV](https://www.graalvm.org/tools/igv/), trace deopts as described in the Truffle document linked above and search the output for "JVMCI: installed code name=".
If the name ends with "#2", it is a second tier compilation.
You might notice a `debugId` or `debug_id` in the output of these options.
That id can be searched via `id=NUMBER`, `idx=NUMBER`, or `debugId=NUMBER` in IGV's `Search in Nodes` search box, then selecting `Open Search for node NUMBER in Node Searches window`, and then clicking the `Search in following phases` button.
Also useful to know: the `compile_id` matches the `compilationId` in IGV's "properties" view of the dumped graph.
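To get dumped graphs into IGV in the first place, the Optimizing guide uses the Graal dump options; a sketch (on recent Graal versions the property prefix is `jdk.graal.` rather than `graal.`):

```
graalpy --vm.Djdk.graal.Dump=Truffle:1 --vm.Djdk.graal.PrintGraph=Network foo.py
```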

[Proftool](https://github.com/graalvm/mx/blob/master/README-proftool.md) can also be helpful.
Note that it is not really prepared for language launchers; if it does not work, just get the command line and build the arguments manually.
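For example, assuming you run GraalPy through `mx`, the `-v` flag makes `mx` print the underlying `java` command line, which you can then adapt by hand:

```
mx -v python foo.py
```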

## Interpreter Performance

For interpreter performance, async-profiler is good and also allows for some visualizations.
The backtrace and flat views are particularly useful.
It only works for JVM executions (not native images).
Download async-profiler and make sure you also have debug symbols in your C extensions.
Use these options:

```
--vm.agentpath:/path/to/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=profile.html --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints
```
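Put together, an invocation might look like this sketch (assuming a JVM execution via the launcher's `--jvm` flag; the async-profiler path is a placeholder):

```
graalpy --jvm --vm.agentpath:/path/to/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=profile.html --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints foo.py
```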

Another very useful tool is [gprofng](https://blogs.oracle.com/linux/post/gprofng-the-next-generation-gnu-profiling-tool); it is part of binutils these days.
If you have debug symbols, it works quite well with JVM launchers since it understands HotSpot frames, but it also works fine with native images.
You might run into a bug with our language launchers: https://sourceware.org/bugzilla/show_bug.cgi?id=32110. The patch in that bug report from me (Tim), while not entirely correct and not passing their test suite, lets you review recorded profiles (the bug only manifests when viewing a recorded profile).
What is nice about gprofng is that it can attribute time spent to Java bytecodes, so you can even profile huge methods such as the bytecode loops that the DSL generates.
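A minimal sketch of recording and then inspecting a profile with gprofng; `test.1.er` is gprofng's default experiment directory name:

```
gprofng collect app graalpy foo.py
gprofng display text -functions test.1.er
```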

For SVM builds it is very useful to look at Truffle's [HostInlining](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/HostOptimization/) docs and check the debugging section there.
This helps ensure that the expected code is inlined (or not).
When I identify something that takes long using gprofng, for example, I find it useful to check whether that code is inlined as expected on SVM during the HostInliningPhase.
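As a sketch, the debugging section there describes logging host inlining decisions during the image build with flags along these lines (the method filter is a placeholder; check the linked docs for the exact spelling on your version):

```
native-image -H:Log=HostInliningPhase,~CanonicalizerPhase,~GraphBuilderPhase '-H:MethodFilter=PyObjectCallNode.*'
```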

Supposedly Intel VTune and Oracle Developer Studio work well, but I haven't tried them.