Reduce default parallelism of tox test runs #1352

wshanks · 2023-12-21T21:26:05Z

Measurements of the total test time for different parallelism settings showed that total test time increased when QISKIT_PARALLEL was set to TRUE or OMP_NUM_THREADS was set higher than 1 when the stestr concurrency was higher than the number of physical cores. Additionally, no improvement was seen when increasing the stestr concurrency beyond the number of physical cores (for example up to two times that number on a system with 2x multithreading), but also no degradation in performance was observed, so the stestr concurrency was left at the default value (the number of logical CPUs rather than the number of physical CPUs) for simplicity.

Measurements of the total test time for different parallelism settings showed that total test time increased when `QISKIT_PARALLEL` was set to `TRUE` or `OMP_NUM_THREADS` was set higher than 1 when the `stestr` concurrency was higher than the number of physical cores. Additionally, no improvement was seen when increasing the `stestr` concurrency beyond the number of physical cores (for example up to two times that number on a system with 2x multithreading), but also no degradation in performance was observed, so the `stestr` concurrency was left at the default value (the number of logical CPUs rather than the number of physical CPUs) for simplicity.

wshanks · 2023-12-22T15:38:24Z

Here are the results from some tests that I did locally. These tests were done on a laptop with an Intel i5-1145G7 processor that has 4 physical cores with 2x hyperthreading running Linux. The O label marks the setting for OMP_NUM_THREADS, the QP marks QISKIT_PARALLEL status, and # marks the concurrency level set for stestr. We see that concurrency cuts down the total test time as we go from 1->2->4 workers, but beyond 4 it levels off. I guess the hyperthreading is pretty good at switching between additional tasks so there is no performance hit but there is also no gain. Allowing parallelism either through OMP_NUM_THREADS(used by numpy) orQISKIT_PARALLELresults in a net loss of performance --stestr` is already using the compute resources efficiently and adding more parallelism just hurts performance.

The default on Linux is for QISKIT_PARALLEL to be true (it is false on Windows and Mac) and the default for OMP_NUM_THREADS is to be the number of processors on the system. I think we generally want these turned off so I made that the default in the tox configuration. It looks like this change has reduced the Linux test time from 6 minutes to 5 minutes. Also, the Windows time might have dropped from 7 minutes to 6 minutes but it has more variability so it is hard to say. Since we were already running with OMP_NUM_THREADS=1 on Mac (QISKIT_PARALLEL defaults to false there), we don't expect Mac to change. That is unfortunate since the Mac times are usually the slowest.

I didn't try to reduce the stestr concurrency to the physical core number. Maybe that would be worth doing. I am not sure what the CPUs in the GitHub VMs are actually like or if they actually do hyperthreading. It might be worth playing around with the number though, especially for Mac. While the Mac test times can be 7 minutes, sometimes they are 12 minutes, and I don't know if that variability is something we can influence or is just part of how the Mac systems are hosted/shared between jobs.

mtreinish · 2023-12-22T18:29:32Z

The other environment variable to consider is RAYON_NUM_THREADS which controls the threads in the rayon threadpool that rustwork and qiskit's rust code uses internally. By default it will use all available CPUs, although for qiskit's usage by default threading and multiprocessing are mutually exclusive by default, you can use QISKIT_FORCE_THREADS to force rust multithreading with python multiprocessing. (These options are documented here https://docs.quantum.ibm.com/start/configure-qiskit-local#environment-variables )

wshanks · 2023-12-26T15:33:54Z

Thanks, @mtreinish. Actually in this PR, I did also set RAYON_NUM_THREADS=1 in tox.ini. I knew about it because it was already there in passenv. I just assumed that setting it to 1 would be best for similar reasons to setting OMP_NUM_THREADS, but I didn't actually test it (so it is not in the plot above). I haven't tried to test this, but I suspect that turning off QISKIT_PARALLEL speeds up the experiments tests mostly because it avoids the overhead of spawning a process pool and serializing and deserializing the circuits rather than because it avoids resource contention. I could be wrong about that though, but a lot of the circuits in the experiments tests are very simple -- just a few gates on a single qubit or sometimes two qubits -- so I don't think there is too much processing happening in the transpile step (so I don't think the Rayon threading should help much any way). There are a lot of tests that turn circuit results into a numpy array and run a curve fit on it (that's how most experiments work). I think limiting OMP_NUM_THREADS does help avoid resource contention in that part.

coruscating

Looks good to me, thanks for testing these parameters! I'm good with merging this and leaving tweaking parameters for Macs as a follow up.

Measurements of the total test time for different parallelism settings showed that total test time increased when `QISKIT_PARALLEL` was set to `TRUE` or `OMP_NUM_THREADS` was set higher than 1 when the `stestr` concurrency was higher than the number of physical cores. Additionally, no improvement was seen when increasing the `stestr` concurrency beyond the number of physical cores (for example up to two times that number on a system with 2x multithreading), but also no degradation in performance was observed, so the `stestr` concurrency was left at the default value (the number of logical CPUs rather than the number of physical CPUs) for simplicity.

wshanks force-pushed the test-parallelism branch from c4098e1 to 4346ef6 Compare December 22, 2023 03:42

coruscating approved these changes Jan 3, 2024

View reviewed changes

wshanks added this pull request to the merge queue Jan 3, 2024

Merged via the queue into qiskit-community:main with commit 441e549 Jan 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce default parallelism of tox test runs #1352

Reduce default parallelism of tox test runs #1352

Uh oh!

wshanks commented Dec 21, 2023

Uh oh!

wshanks commented Dec 22, 2023

Uh oh!

mtreinish commented Dec 22, 2023

Uh oh!

wshanks commented Dec 26, 2023 •

edited

Loading

Uh oh!

coruscating left a comment

Uh oh!

Uh oh!

Reduce default parallelism of tox test runs #1352

Reduce default parallelism of tox test runs #1352

Uh oh!

Conversation

wshanks commented Dec 21, 2023

Uh oh!

wshanks commented Dec 22, 2023

Uh oh!

mtreinish commented Dec 22, 2023

Uh oh!

wshanks commented Dec 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coruscating left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wshanks commented Dec 26, 2023 •

edited

Loading