Description
Ever since adding BLAS/LAPACK to stdlib the build times have exploded.
On my local machine (Windows + gfortran + CMake + Ninja with 16 threads), the build takes 199 seconds. As suggested by @ivan-pi, loading the .ninja_log file into https://ui.perfetto.dev/ shows where the bottlenecks are:
In the figure, the build chokes on the LAPACK per-kind files.
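For a quick text-only view of the same information, the log can also be ranked by step duration. The following is a minimal sketch, assuming the tab-separated layout of version 5 logs (start ms, end ms, mtime, output path, command hash); the function name `slowest_steps` and the file names in the sample are hypothetical.

```python
def slowest_steps(log_text, top=5):
    """Return the `top` longest build steps as (seconds, output) pairs."""
    durations = {}
    for line in log_text.splitlines():
        # Skip blank lines and the "# ninja log vN" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a well-formed entry under the assumed layout
        start_ms, end_ms, output = int(fields[0]), int(fields[1]), fields[3]
        # A later entry for the same output (a rebuild) replaces the earlier one.
        durations[output] = (end_ms - start_ms) / 1000.0
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [(seconds, output) for output, seconds in ranked[:top]]


# Hypothetical log contents, for illustration only.
sample_log = (
    "# ninja log v5\n"
    "0\t1500\t0\tsrc/small_module.f90.obj\taaaa\n"
    "100\t90100\t0\tsrc/big_lapack_file.f90.obj\tbbbb\n"
)

for seconds, output in slowest_steps(sample_log):
    print(f"{seconds:8.1f} s  {output}")
```

For a real build, `log_text` would be the contents of the `.ninja_log` file in the build directory.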
With help from @ivan-pi and @perazz, I started a major refactoring of stdlib's BLAS/LAPACK in https://github.com/jalvesz/stdlib/tree/split_lapack, using a Python script that rewrites the BLAS/LAPACK per-kind files into submodule files organized by group.
With that, I brought the build time down to 38 seconds (about 5x faster):
There are still some spots where the build stalls, but it is quite evident that splitting the sources into several submodule files enables a better parallel build.
Before opening a PR, I would like to gather some broad opinions on this refactoring.
Some observations/open questions:
- In order to achieve this splitting, the `refactor_blaslapack_subm.py` script contains two dictionaries, `blas_groups` and `lapack_groups`, which define the procedures to be included in each submodule file. The splitting started by following the LAPACK reference manual, and then I had to split further arbitrarily. I'm sure that a proper dependency graph would make a better grouping possible. If anyone feels like proposing a mechanism to find such a grouping, that would be very nice. Here are a couple of references I found interesting on the topic: https://paulcavallaro.com/blog/optimizing-function-placement-with-call-chain-clustering/ and https://medium.com/@avidaneran/using-a-graph-representation-to-analyze-python-dependencies-a57cd681fa09
- In the draft branch, the per-kind files were moved to a `legacy` folder. When opening the PR, I think these should not be there, as there is no point in having so many duplicates of the sources. Nonetheless, the splitting script can be improved, and it is based on those files. Should these files be saved in a separate repo for reference?
- The submodule approach is proving very neat for organizing the code and improving build performance. However, it implies a very large header module file that holds the interfaces for all LAPACK functions. Better work on the grouping in the first point could probably reveal clearer macro dependencies, which would also allow splitting this header module.
- While performing this refactoring, I noticed some procedures that I don't understand the reason for keeping, such as `lamc3`, which simply does `c = a + b`. Is there a reason to actually keep such a function?
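To make the dependency-graph question from the first observation more concrete, here is a minimal sketch of one possible grouping mechanism. It is not what the refactoring script does: it assumes a call graph is already available as a dictionary mapping each procedure to the procedures it calls, and it greedily merges caller and callee into the same group as long as a size cap is respected. The function name `group_procedures`, the `max_size` parameter, and the toy call graph are all hypothetical.

```python
from collections import defaultdict


def group_procedures(calls, max_size):
    """Greedily merge caller/callee pairs into groups of at most max_size."""
    procs = set(calls)
    for callees in calls.values():
        procs.update(callees)

    # Union-find bookkeeping: every procedure starts in its own group.
    parent = {p: p for p in procs}
    size = {p: 1 for p in procs}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # Visit edges in a deterministic order so the result is reproducible.
    for caller in sorted(calls):
        for callee in sorted(calls[caller]):
            a, b = find(caller), find(callee)
            if a != b and size[a] + size[b] <= max_size:
                if size[a] < size[b]:
                    a, b = b, a
                parent[b] = a
                size[a] += size[b]

    groups = defaultdict(list)
    for p in sorted(procs):
        groups[find(p)].append(p)
    return sorted(groups.values())


# Toy call graph for illustration only, not extracted from the real sources.
toy_calls = {
    "gesv": {"getrf", "getrs"},
    "getrf": {"getrf2", "laswp"},
    "getrs": {"laswp", "trsm"},
    "posv": {"potrf", "potrs"},
}

for group in group_procedures(toy_calls, max_size=3):
    print(group)
```

Each resulting list could then serve as one entry of a dictionary like `lapack_groups`. A real solution would probably also weigh compile time per procedure and the order in which edges are merged, which this sketch ignores.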