# Investigating GraalPy Performance

First, make sure to build GraalPy with debug symbols.
Running `export CFLAGS=-g` before a fresh `mx build` adds the debug symbol flag to all of our C extension libraries.
When you build a native image, use `find` to locate the `.debug` file somewhere in the `mxbuild` directory tree; it is called something like `libpythonvm.so.debug`.
Put that file next to the `libpythonvm.so` in the Python standalone so that tools can pick it up.
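
A minimal sketch of that workflow, assuming a Linux build (the paths are illustrative and depend on your build configuration):

```bash
# build with debug symbols in the C extension libraries
export CFLAGS=-g
mx build

# after a native-image build, locate the separate debug info file
find . -name 'libpythonvm.so.debug'

# put it next to libpythonvm.so in the Python standalone (illustrative paths)
cp path/to/libpythonvm.so.debug path/to/python-standalone/lib/
```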

## Peak Performance

The [Truffle docs](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Optimizing/) under graal/truffle/docs/Optimizing.md are a good starting point.
They describe how to get started with the profiler; especially useful is the [flamegraph](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/Profiling/#creating-a-flame-graph-from-cpu-sampler).
This gives you a high-level idea of where time is spent.
Note that currently (GR-58204) profiles of executions that use native extensions may be less accurate.
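
For example, a flamegraph can be produced directly with the Truffle CPU sampler, assuming the `graalpy` launcher and a workload script `foo.py`:

```bash
graalpy --cpusampler --cpusampler.OutputFormat=flamegraph --cpusampler.OutputFile=flamegraph.svg foo.py
```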

In GraalPy's case, the flamegraph is also useful for comparing performance to CPython.
[Py-spy](https://pypi.org/project/py-spy/) is pretty good for that, since it generates a flamegraph that is sufficiently comparable.
Note that `py-spy` is a sampling profiler that accesses CPython internals, so it often does not work on the latest CPython; use a slightly older release.

```
py-spy record -n -r 100 -o pyspy.svg -- python foo.py
```

Once you have identified something that takes way too long on GraalPy compared to CPython, follow the Truffle guide.

When debugging deoptimizations with [IGV](https://www.graalvm.org/tools/igv/), trace deopts as described in the Truffle document linked above and search the output for "JVMCI: installed code name=".
If the name ends with "#2", it is a second-tier compilation.
You might notice a `debugId` or `debug_id` in that output.
That id can be searched for via `id=NUMBER`, `idx=NUMBER`, or `debugId=NUMBER` in IGV's `Search in Nodes` search box, then selecting `Open Search for node NUMBER in Node Searches window`, and then clicking the `Search in following phases` button.
Another useful thing to know is that the `compile_id` matches the `compilationId` in IGV's "Properties" view of the dumped graph.
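
As a rough sketch, compilation graphs can be dumped to a running IGV instance like this (option spellings vary between Graal versions; newer releases use the `jdk.graal.` prefix instead of `graal.`, and `foo.py` is a placeholder):

```bash
# dump Truffle compilation graphs to an IGV instance listening on the default port
graalpy --jvm --vm.Dgraal.Dump=Truffle:1 --vm.Dgraal.PrintGraph=Network foo.py
```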

[Proftool](https://github.com/graalvm/mx/blob/master/README-proftool.md) can also be helpful.
Note that it is not really prepared for language launchers; if it does not work, get the command line from the launcher and build the arguments manually.

## Interpreter Performance

For interpreter performance, async-profiler is a good tool and also offers some visualizations.
The backtrace and flat views are particularly useful.
It only works for JVM executions (not native images).
Download async-profiler and make sure you also have debug symbols in your C extensions.
Use these options:

```
--vm.agentpath:/path/to/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=profile.html --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints
```
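
Putting it together, a JVM-mode run with async-profiler attached might look like this (the async-profiler path and `foo.py` are placeholders):

```bash
graalpy --jvm \
  --vm.agentpath:/path/to/async-profiler/lib/libasyncProfiler.so=start,event=cpu,file=profile.html \
  --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints \
  foo.py
```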

Another very useful tool is [gprofng](https://blogs.oracle.com/linux/post/gprofng-the-next-generation-gnu-profiling-tool), which is part of binutils these days.
If you have debug symbols, it works quite well with JVM launchers, since it understands HotSpot frames, but it also works fine with native images.
You might run into a bug with our language launchers: https://sourceware.org/bugzilla/show_bug.cgi?id=32110. The patch attached to that bug report by me (Tim) -- while not entirely correct and not passing their test suite -- lets you view recorded profiles (the bug only manifests when viewing a recorded profile).
What's nice about gprofng is that it can attribute time spent to Java bytecodes, so you can even profile huge methods like the bytecode loops that, for example, the DSL has generated.
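
A sketch of a typical gprofng session; the experiment directory name (here `test.1.er`) is chosen by gprofng, and `foo.py` is a placeholder:

```bash
# record an experiment (creates e.g. test.1.er)
gprofng collect app graalpy foo.py

# show the hottest functions from the recorded experiment
gprofng display text -functions test.1.er
```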

For SVM builds it is very useful to look at Truffle's [HostInlining](https://www.graalvm.org/graalvm-as-a-platform/language-implementation-framework/HostOptimization/) docs and check the debugging section there.
This helps to verify that code is (or is not) inlined as expected.
When I identify something that takes a long time using gprofng, for example, I find it useful to check whether that code is inlined as expected on SVM during the HostInliningPhase.
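
As described in those docs, host inlining decisions can be logged during the native image build, roughly like this; the flag spelling is as I recall it from the HostInlining docs, and the method filter is a hypothetical example:

```bash
# log host inlining decisions for a specific method during the image build
native-image ... \
  -H:Log=HostInliningPhase,~CanonicalizerPhase,~GraphBuilderPhase \
  -H:MethodFilter=SomeBuiltinNode.execute
```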

Supposedly Intel VTune and Oracle Developer Studio work well, but I haven't tried them.

## Memory Usage

Memory usage on the Java heap is best tracked with VisualVM.
For best performance we keep references to long-lived user objects (mostly functions, classes, and modules) directly in the AST nodes when using the default configuration of a single Python context (as is used when running the launcher).
To share warm-up across contexts, and where absolute peak performance is not needed, contexts can be configured with a shared engine; the ASTs are then shared across contexts.
However, that implies we *must* not store any user objects strongly in the ASTs.
Our JUnit tests include checks that no PythonObjects remain alive after a Context is closed.
These checks can be run by themselves, for example, like so:

```bash
mx python-leak-test --lang python \
    --shared-engine \
    --code 'import site, json' \
    --forbidden-class com.oracle.graal.python.builtins.objects.object.PythonObject \
    --keep-dump
```

The `--keep-dump` option will print the heap dump location and leave the file there rather than deleting it.
The dump can then be opened, for example with VisualVM, to check the paths to any leaked objects, if there are any.
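
Outside of the leak test, a heap dump of a running JVM-mode process can also be taken manually and then opened in VisualVM; `<pid>` and the file name are placeholders:

```bash
# take a heap dump of the running GraalPy JVM process
jmap -dump:live,format=b,file=graalpy-heap.hprof <pid>
```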

For native code, use native memory profiling tools.
I have used [`massif`](https://valgrind.org/docs/manual/ms-manual.html) in the past to find allocations and memory issues in native extensions, but be aware of its large overhead.
Once you do find something interesting with `massif`, [`rr`](https://rr-project.org/) is a good option to dive deeper: you can break around the places where massif found allocations and use memory breakpoints together with reverse and forward execution to find where the memory is allocated and released.
This can be useful to identify memory leaks in our C API emulation.
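
A sketch of that workflow; the launcher and script names are placeholders, and both tools add significant overhead:

```bash
# profile native allocations with massif, then inspect the snapshots
valgrind --tool=massif graalpy foo.py
ms_print massif.out.<pid>

# record the same run with rr, then replay it and set (memory) breakpoints
# around the allocation sites that massif pointed at
rr record graalpy foo.py
rr replay
```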