From 00f29e76b28a6ac9af77f616da66836552a64065 Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 28 Oct 2020 20:56:19 +0000
Subject: [PATCH] Add content to the Parallelism section.

This is a short, neutral summary mostly saying that the whole topic
is out of scope for this version of the standard, and referring to
the relevant GitHub issue for more details for those who are
interested.
---
 spec/design_topics/parallelism.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/spec/design_topics/parallelism.md b/spec/design_topics/parallelism.md
index 5c3fb8484..38ccb9343 100644
--- a/spec/design_topics/parallelism.md
+++ b/spec/design_topics/parallelism.md
@@ -1 +1,23 @@
 # Parallelism
+
+Parallelism is mostly, but not completely, an execution or runtime concern
+rather than an API concern. Execution semantics are out of scope for this API
+standard, and hence won't be discussed further here. The API related part
+involves how libraries allow users to exercise control over the parallelism
+they offer, such as:
+
+- Via environment variables. This is the method of choice for BLAS libraries and libraries using OpenMP.
+- Via a keyword to individual functions or methods. Examples include the `n_jobs` keyword used in scikit-learn and the `workers` keyword used in SciPy.
+- Build-time settings to enable a parallel or distributed backend.
+- Via letting the user set chunk sizes. Dask uses this approach.
+
+When combining multiple libraries, one has to deal with auto-parallelization
+semantics and nested parallelism. Two things that could help improve the
+coordination of parallelization behavior in a stack of Python libraries are:
+
+1. A common API pattern for enabling parallelism
+2. A common library providing a parallelization layer
+
+Option (1) may possibly fit in a future version of this array API standard.
+[array-api issue 4](https://github.com/data-apis/array-api/issues/4) contains
+more detailed discussion on the topic of parallelism.
\ No newline at end of file