From c647cff466d500350b59932a35fd883116ccf6ed Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 9 Sep 2020 16:59:04 +0100
Subject: [PATCH 1/6] Use cases section: add SciPy use case

---
 spec/design_topics/C_API.md |   2 +
 spec/use_cases.md           | 104 ++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+)

diff --git a/spec/design_topics/C_API.md b/spec/design_topics/C_API.md
index 89586d4d8..e236fc4e9 100644
--- a/spec/design_topics/C_API.md
+++ b/spec/design_topics/C_API.md
@@ -1 +1,3 @@
+.. _C-api:
+
 # C API

diff --git a/spec/use_cases.md b/spec/use_cases.md
index 648f17c8a..80b16293f 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -1,7 +1,111 @@
 # Use cases
 
+Use cases inform the requirements for, and design choices made in, this array
+API standard. This section first discusses what types of use cases are
+considered, and then works out a few concrete use cases in more detail.
+
 ## Types of use cases
 
+- Packages that currently depend on a specific array library and would like
+  to support multiple of them (e.g. for GPU or distributed array support, for
+  improved performance, or for reaching a wider user base).
+- Writing new libraries/tools that wrap multiple array libraries.
+- Projects that implement new types of arrays with, e.g., hardware-specific
+  optimizations or auto-parallelization behavior, and need an API to put on
+  top that is familiar to end users.
+- End users that want to switch from one library to another without learning
+  about all the small differences between those libraries.
 
 ## Concrete use cases
 
+
+### Use case 1: add GPU and distributed support to SciPy
+
+When surveying a representative set of advanced users and research software
+engineers in 2019 (for [this NSF proposal](https://figshare.com/articles/Mid-Scale_Research_Infrastructure_-_The_Scientific_Python_Ecosystem/8009441)),
+the single most common pain point brought up about SciPy was performance.
+
+SciPy heavily relies on NumPy (its only non-optional runtime dependency).
+NumPy provides an array implementation that's in-memory, CPU-only and
+single-threaded. Common performance-related wishes users have are:
+
+- parallel algorithms (can be multi-threaded or multiprocessing based)
+- support for distributed arrays (with Dask in particular)
+- support for GPUs
+
+Some parallelism can be supported in SciPy; it has a `workers` keyword
+(similar to scikit-learn's `n_jobs` keyword) that allows enabling
+parallelism in some algorithms. However, SciPy itself will not directly start
+depending on a GPU or distributed array implementation, or contain (e.g.)
+CUDA code - that's not maintainable given the resources for development.
+_However_, there is a way to provide distributed or GPU support. Part of the
+solution is provided by NumPy's "array protocols" (see gh-1), which allow
+dispatching to other array implementations. The main problem then becomes how
+to know whether this will work with a particular distributed or GPU array
+implementation - given that there are zero other array implementations that
+are even close to providing full NumPy compatibility - without adding that
+array implementation as a dependency.
+
+It's clear that SciPy functionality that relies on compiled extensions (C,
+C++, Cython, Fortran) directly can't easily be run on an array library other
+than NumPy (see :ref:`C-api` for more details about this topic). Pure Python
+code can work though. There are two main possibilities:
+
+1. Test with another package, manually or in CI, and simply provide a list
+   of functionality that is found to work. Then make ad-hoc fixes to expand
+   the set that works.
+2. Start relying on a well-defined subset of the NumPy API (or a new
+   NumPy-like API), for which compatibility is guaranteed.
+
+Option (2) seems strongly preferable, and that "well-defined subset" is _what
+an API standard should provide_. Testing will still be needed to ensure there
+are no critical corner cases or bugs between array implementations; however,
+that is then a very tractable task.
+
+As a concrete example, consider the spectral analysis functions in `scipy.signal`.
+All of those functions (e.g., `periodogram`, `spectrogram`, `csd`, `welch`, `stft`,
+`istft`) are pure Python - with the exception of `lombscargle`, which is ~40
+lines of Cython - and use NumPy function calls, array attributes and
+indexing. The beginning of each function could be changed to retrieve the
+module that implements the array API standard for the given input array type,
+and then functions from that module could be used instead of NumPy functions.
+
+If the user has another array type, say a CuPy or PyTorch array `x` on their
+GPU, doing:
+```
+from scipy import signal
+
+signal.welch(x)
+```
+will result in:
+```
+# For CuPy
+ValueError: object __array__ method not producing an array
+
+# For PyTorch
+TypeError: can't convert cuda:0 device type tensor to numpy.
+```
+and therefore the user will have to explicitly convert to and from a
+`numpy.ndarray` (which is quite inefficient):
+```
+# For CuPy
+x_np = cupy.asnumpy(x)
+freq, Pxx = (cupy.asarray(res) for res in signal.welch(x_np))
+
+# For PyTorch
+x_np = x.cpu().numpy()
+# Note: ends up with tensors on CPU, may still have to move them back
+freq, Pxx = (torch.tensor(res) for res in signal.welch(x_np))
+```
+This code will look a little different for each array library. The end goal
+here is to be able to write this instead as:
+```
+freq, Pxx = signal.welch(x)
+```
+and have `freq`, `Pxx` be arrays of the same type and on the same device as `x`.
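+
+A rough sketch of how the beginning of such a function could retrieve the
+right namespace, assuming some agreed-upon way for an array to expose it
+(spelled `__array_namespace__` here purely for illustration; `get_namespace`
+and `demean` are hypothetical helpers, not existing SciPy code):
+```
+import numpy as np
+
+def get_namespace(x):
+    # Ask the array which array API namespace it belongs to; fall back to
+    # NumPy so that plain numpy.ndarray inputs keep working.
+    if hasattr(x, "__array_namespace__"):
+        return x.__array_namespace__()
+    return np
+
+def demean(x):
+    # All array operations go through `xp`, so the result is the same
+    # array type (and lives on the same device) as the input.
+    xp = get_namespace(x)
+    x = xp.asarray(x)
+    return x - xp.mean(x)
+```
+The real `welch` would use many more operations, but the pattern - obtain the
+namespace once, then call functions from it instead of NumPy's - stays the same.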
+
+.. note::
+
+    This type of use case applies to many other libraries, from scikit-learn
+    and scikit-image to domain-specific libraries like AstroPy and
+    scikit-bio, to code written for a single purpose or user.

From f6dc91160be9ae8ee712134f1a0e775ac3e27d7b Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 9 Sep 2020 17:27:55 +0100
Subject: [PATCH 2/6] Use cases section: add einops use case

---
 spec/use_cases.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/spec/use_cases.md b/spec/use_cases.md
index 80b16293f..16a9976f2 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -109,3 +109,37 @@ and have `freq`, `Pxx` be arrays of the same type and on the same device as `x`.
     This type of use case applies to many other libraries, from scikit-learn
     and scikit-image to domain-specific libraries like AstroPy and
     scikit-bio, to code written for a single purpose or user.
+
+
+### Use case 2: simplify einops by removing the backend system
+
+[einops](https://github.com/arogozhnikov/einops) is a library that provides flexible tensor operations and supports many array libraries (NumPy, TensorFlow, PyTorch, CuPy, MXNet, JAX).
+Most of the code in `einops` is in:
+
+- [einops.py](https://github.com/arogozhnikov/einops/blob/master/einops/einops.py),
+  which contains the functions it offers as public API (`rearrange`, `reduce`, `repeat`).
+- [_backends.py](https://github.com/arogozhnikov/einops/blob/master/einops/_backends.py),
+  which contains the glue code needed to support that many array libraries.
+
+The amount of code in each of those two files is almost the same (~550 LoC each).
+The typical pattern in `einops.py` is:
+```
+def some_func(x):
+    ...
+    backend = get_backend(x)
+    shape = backend.shape(x)
+    result = backend.reduce(x)
+    ...
+```
+With a standard array API, the `_backends.py` glue layer could almost completely disappear,
+because the purpose it serves (providing a unified interface to array operations from each
+of the supported backends) is already addressed by the array API standard.
+Hence the complete `einops` code base could be close to 50% smaller, and easier to maintain or add to.
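+
+As a sketch of what that could mean in practice - assuming each supported
+backend's arrays expose the standard namespace via an illustrative
+`__array_namespace__` method, and using `xp.sum` as a stand-in for whatever
+reduction a given einops function actually performs - the same function could
+become:
+```
+def some_func(x):
+    # The array itself reports which array API namespace it belongs to,
+    # so no per-library backend class is needed.
+    xp = x.__array_namespace__()
+    shape = x.shape              # array attributes are standardized too
+    result = xp.sum(x, axis=-1)  # stand-in for the real reduction logic
+    return result
+```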
+
+.. note::
+
+    Other libraries that have a similar backend system to support many array libraries
+    include [TensorLy](https://github.com/tensorly/tensorly),
+    [Unumpy](https://github.com/Quansight-Labs/unumpy) and
+    [EagerPy](https://github.com/jonasrauber/eagerpy). Many end users and organizations will also have such glue code - it tends to be needed whenever one tries to support multiple
+    array types in a single API.

From bb772d598284d354cdc26a6e1f60c77c37236bfb Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 9 Sep 2020 17:51:35 +0100
Subject: [PATCH 3/6] Use cases section: add "Python API for xtensor" use case

---
 spec/use_cases.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/spec/use_cases.md b/spec/use_cases.md
index 16a9976f2..5ad116b52 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -143,3 +143,32 @@ Hence the complete `einops` code base could be close to 50% smaller, and easier
     [Unumpy](https://github.com/Quansight-Labs/unumpy) and
     [EagerPy](https://github.com/jonasrauber/eagerpy). Many end users and organizations will also have such glue code - it tends to be needed whenever one tries to support multiple
     array types in a single API.
+
+
+### Use case 3: adding a Python API to xtensor
+
+[xtensor](https://github.com/xtensor-stack/xtensor) is a C++ array library
+that is NumPy-inspired and provides lazy arrays. It has Python (and Julia and R)
+bindings; however, it does not have a Python array API.
+
+Xtensor aims to follow NumPy closely; however, it only implements a subset of functionality
+and documents some API differences in
+[Notable differences with NumPy](https://xtensor.readthedocs.io/en/latest/numpy-differences.html).
+
+Note that other libraries document similar differences; see for example
+[this page for JAX](https://jax.readthedocs.io/en/latest/jax.numpy.html) and
+[this page for TensorFlow](https://www.tensorflow.org/guide/tf_numpy).
+
+Each time an array library author designs a new API, they have to choose (a)
+what subset of NumPy makes sense to implement, and (b) where to deviate
+because NumPy's API for a particular function is suboptimal or the semantics
+don't fit their execution model.
+
+This array API standard aims to provide an API that can be readily adopted,
+without having to make the above-mentioned choices.
+
+.. note::
+
+    XND is another array library, written in C, that still needs a Python API.
+    Array implementations in other languages are often in a similar situation,
+    and could translate this array API standard 1:1 to their language.

From 723344a61f2266f11b64e78df11e74a1e723638a Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 9 Sep 2020 18:54:18 +0100
Subject: [PATCH 4/6] Use cases section: add a Numba/JIT use case

---
 spec/use_cases.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/spec/use_cases.md b/spec/use_cases.md
index 5ad116b52..eb57c2d08 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -172,3 +172,43 @@ without having to make the above-mentioned choices.
     XND is another array library, written in C, that still needs a Python API.
     Array implementations in other languages are often in a similar situation,
     and could translate this array API standard 1:1 to their language.
+
+
+### Use case 4: make JIT compilation of array computations easier and more robust
+
+[Numba](https://github.com/numba/numba) is a Just-In-Time (JIT) compiler for
+numerical functions in Python; it is NumPy-aware. [PyPy](https://pypy.org)
+is an implementation of Python with a JIT at its core; its NumPy support relies
+on running NumPy itself through a compatibility layer (`cpyext`), while a
+previous attempt to implement NumPy support directly was unsuccessful.
+
+Other array libraries may have an internal JIT (e.g., TensorFlow, PyTorch,
+JAX, MXNet) or work with an external JIT like
+[XLA](https://www.tensorflow.org/xla) or [VTA](https://tvm.apache.org/docs/vta/index.html).
+
+Numba currently has to jump through some hoops to accommodate NumPy's casting rules
+and may not attain full compatibility with NumPy in some cases - see, e.g.,
+[this](https://github.com/numba/numba/issues/4749) or
+[this](https://github.com/numba/numba/issues/5907) example issue regarding (array) scalar
+return values.
+
+An [explicit suggestion from a Numba developer](https://twitter.com/esc___/status/1295389487485333505)
+for this array API standard was:
+
+> for JIT compilers (e.g. Numba) it will be important, that the type of the
+  returned value(s) depends only on the *types* of the input but not on the
+  *values*.
+
+A concrete goal for this use case is to have better matching between
+JIT-compiled and non-JIT execution. Here is an example from the Numba code
+base, the need for which should be avoided in the future:
+
+```
+def check(x, y):
+    got = cfunc(x, y)
+    np.testing.assert_array_almost_equal(got, pyfunc(x, y))
+    # Check the power operation conserved the input's dtype
+    # (this is different from Numpy, whose behaviour depends on
+    # the *values* of the arguments -- see PyArray_CanCastArrayTo).
+    self.assertEqual(got.dtype, x.dtype)
+```
\ No newline at end of file
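+
+The comment in `check` above refers to NumPy's value-based casting. As a
+concrete illustration of that value dependence (behavior of NumPy's
+long-standing casting rules; later NumPy versions have revised them, so the
+exact output may differ):
+```
+import numpy as np
+
+a = np.arange(3, dtype=np.int8)
+
+# The result dtype depends on the *value* of the Python scalar,
+# not only on the operand types:
+print((a + 1).dtype)     # int8  - the value 1 fits in int8
+print((a + 1000).dtype)  # int16 - the value 1000 does not fit in int8
+```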

From 58edfc8dc18825b45130bb781638fa146fe4da0c Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Wed, 9 Sep 2020 18:56:59 +0100
Subject: [PATCH 5/6] Add a list of all use cases at the top of the section

---
 spec/use_cases.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/spec/use_cases.md b/spec/use_cases.md
index eb57c2d08..7b32572c0 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -19,6 +19,14 @@ considered, and then works out a few concrete use cases in more detail.
 ## Concrete use cases
 
+- :ref:`use-case-scipy`
+- :ref:`use-case-einops`
+- :ref:`use-case-xtensor`
+- :ref:`use-case-numba`
+
+
+.. _use-case-scipy:
+
 ### Use case 1: add GPU and distributed support to SciPy
 
 When surveying a representative set of advanced users and research software
 engineers in 2019 (for [this NSF proposal](https://figshare.com/articles/Mid-Scale_Research_Infrastructure_-_The_Scientific_Python_Ecosystem/8009441)),
 the single most common pain point brought up about SciPy was performance.
@@ -111,6 +119,8 @@ and have `freq`, `Pxx` be arrays of the same type and on the same device as `x`.
     scikit-bio, to code written for a single purpose or user.
 
 
+.. _use-case-einops:
+
 ### Use case 2: simplify einops by removing the backend system
 
 [einops](https://github.com/arogozhnikov/einops) is a library that provides flexible tensor operations and supports many array libraries (NumPy, TensorFlow, PyTorch, CuPy, MXNet, JAX).
@@ -145,6 +155,8 @@ Hence the complete `einops` code base could be close to 50% smaller, and easier
     [Unumpy](https://github.com/Quansight-Labs/unumpy) and
     [EagerPy](https://github.com/jonasrauber/eagerpy). Many end users and organizations will also have such glue code - it tends to be needed whenever one tries to support multiple
     array types in a single API.
 
 
+.. _use-case-xtensor:
+
 ### Use case 3: adding a Python API to xtensor
 
 [xtensor](https://github.com/xtensor-stack/xtensor) is a C++ array library
 that is NumPy-inspired and provides lazy arrays. It has Python (and Julia and R)
 bindings; however, it does not have a Python array API.
@@ -174,3 +186,8 @@ without having to make the above-mentioned choices.
     XND is another array library, written in C, that still needs a Python API.
     Array implementations in other languages are often in a similar situation,
     and could translate this array API standard 1:1 to their language.
+
+
+.. _use-case-numba:
+
 ### Use case 4: make JIT compilation of array computations easier and more robust

From 1db2b526cba4a24f272770836ced27527a698861 Mon Sep 17 00:00:00 2001
From: Ralf Gommers
Date: Thu, 10 Sep 2020 17:12:54 +0100
Subject: [PATCH 6/6] Update use cases section for review comments

---
 spec/use_cases.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/spec/use_cases.md b/spec/use_cases.md
index 7b32572c0..7a696ef17 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -27,7 +27,7 @@ considered, and then works out a few concrete use cases in more detail.
 .. _use-case-scipy:
 
-### Use case 1: add GPU and distributed support to SciPy
+### Use case 1: add hardware accelerator and distributed support to SciPy
 
 When surveying a representative set of advanced users and research software
 engineers in 2019 (for [this NSF proposal](https://figshare.com/articles/Mid-Scale_Research_Infrastructure_-_The_Scientific_Python_Ecosystem/8009441)),
 the single most common pain point brought up about SciPy was performance.
@@ -39,7 +39,8 @@ single-threaded. Common performance-related wishes users have are:
 - parallel algorithms (can be multi-threaded or multiprocessing based)
 - support for distributed arrays (with Dask in particular)
-- support for GPUs
+- support for GPUs and other hardware accelerators (shortened to just "GPU"
+  in the rest of this use case)
 
 Some parallelism can be supported in SciPy; it has a `workers` keyword
 (similar to scikit-learn's `n_jobs` keyword) that allows enabling
 parallelism in some algorithms. However, SciPy itself will not directly start
@@ -149,7 +150,8 @@ Hence the complete `einops` code base could be close to 50% smaller, and easier
 .. note::
 
     Other libraries that have a similar backend system to support many array libraries
-    include [TensorLy](https://github.com/tensorly/tensorly),
+    include [TensorLy](https://github.com/tensorly/tensorly), the (now discontinued)
+    multi-backend version of [Keras](https://github.com/keras-team/keras),
     [Unumpy](https://github.com/Quansight-Labs/unumpy) and
     [EagerPy](https://github.com/jonasrauber/eagerpy). Many end users and organizations will also have such glue code - it tends to be needed whenever one tries to support multiple
     array types in a single API.