# Use cases

Use cases inform the requirements for, and design choices made in, this array
API standard. This section first discusses what types of use cases are
considered, and then works out a few concrete use cases in more detail.

## Types of use cases

- Packages that currently depend on a specific array library and would like
  to support multiple array libraries (e.g., for GPU or distributed array
  support, for improved performance, or for reaching a wider user base).
- Writing new libraries/tools that wrap multiple array libraries.
- Projects that implement new types of arrays with, e.g., hardware-specific
  optimizations or auto-parallelization behavior, and need an API to put on
  top that is familiar to end users.
- End users who want to switch from one library to another without learning
  about all the small differences between those libraries.


## Concrete use cases

### Use case 1: add GPU and distributed support to SciPy

In a 2019 survey of a representative set of advanced users and research
software engineers (for [this NSF proposal](https://figshare.com/articles/Mid-Scale_Research_Infrastructure_-_The_Scientific_Python_Ecosystem/8009441)),
the single most common pain point brought up about SciPy was performance.

SciPy heavily relies on NumPy (its only non-optional runtime dependency).
NumPy provides an array implementation that's in-memory, CPU-only and
single-threaded. Common performance-related wishes users have are:

- parallel algorithms (can be multi-threaded or multiprocessing based)
- support for distributed arrays (with Dask in particular)
- support for GPUs

Some parallelism can already be supported in SciPy: it has a `workers`
keyword (similar to scikit-learn's `n_jobs` keyword) that allows enabling
parallelism in some algorithms, as illustrated below. However, SciPy itself
will not directly start depending on a GPU or distributed array
implementation, or contain (e.g.) CUDA code - that's not maintainable given
the resources for development.
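
For illustration, here is what the existing `workers` keyword looks like in
use, with `scipy.fft` (one of the submodules that accepts it):
```
import numpy as np
from scipy import fft

rng = np.random.default_rng(0)
x = rng.standard_normal(2**20)

# Compute the FFT, allowing up to 4 worker threads for this transform.
X = fft.fft(x, workers=4)
```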

_However_, there is a way to provide distributed or GPU support. Part of the
solution is provided by NumPy's "array protocols" (see gh-1), which allow
dispatching to other array implementations. The main problem then becomes how
to know whether this will work with a particular distributed or GPU array
implementation - given that there are zero other array implementations that
are even close to providing full NumPy compatibility - without adding that
array implementation as a dependency.
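
As a sketch of how that dispatching works: CuPy, for example, implements
NumPy's `__array_function__` protocol, so a NumPy function called on a CuPy
array is handed off to CuPy (this assumes a machine with CuPy installed and
a GPU available):
```
import numpy as np
import cupy

x = cupy.arange(10.0)

# NumPy's __array_function__ protocol dispatches this call to CuPy, so
# the result is a cupy.ndarray computed on the GPU - NumPy's own `mean`
# implementation never touches the data.
y = np.mean(x)
print(type(y))  # <class 'cupy.ndarray'>
```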

It's clear that SciPy functionality that directly relies on compiled
extensions (C, C++, Cython, Fortran) can't easily be run on another array
library than NumPy (see :ref:`C-api` for more details about this topic).
Pure Python code can work though. There are two main possibilities:

1. Testing with another package, manually or in CI, and simply providing a
   list of functionality that is found to work, then making ad-hoc fixes to
   expand that set.
2. Relying on a well-defined subset of the NumPy API (or a new NumPy-like
   API), for which compatibility is guaranteed.

Option (2) seems strongly preferable, and that "well-defined subset" is _what
an API standard should provide_. Testing will still be needed, to ensure there
are no critical corner cases or bugs between array implementations; however,
that is then a very tractable task.

As a concrete example, consider the spectral analysis functions in `scipy.signal`.
All of those functions (e.g., `periodogram`, `spectrogram`, `csd`, `welch`, `stft`,
`istft`) are pure Python - with the exception of `lombscargle`, which is ~40
lines of Cython - and use NumPy function calls, array attributes and
indexing. The beginning of each function could be changed to retrieve the
module that implements the array API standard for the given input array type,
and then functions from that module could be used instead of NumPy functions,
as sketched below.
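
Here is a minimal sketch of what that change could look like, assuming the
standard lets an array advertise its namespace through a method like
`__array_namespace__` (the `get_namespace` helper and the simplified `welch`
signature are illustrative, not actual SciPy code):
```
import numpy as np

def get_namespace(x):
    # Hypothetical helper: return the module implementing the array API
    # standard for x's array type; fall back to NumPy for plain ndarrays
    # and other objects that don't participate in the standard.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def welch(x, fs=1.0):
    xp = get_namespace(x)
    x = xp.asarray(x)
    # ... the rest of the implementation then uses `xp` functions
    # (e.g. xp.mean, xp.abs) and standard array attributes/indexing
    # instead of NumPy-specific calls.
```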

If the user has another array type, say a CuPy or PyTorch array `x` on their
GPU, doing:
```
from scipy import signal

signal.welch(x)
```
will result in:
```
# For CuPy
ValueError: object __array__ method not producing an array

# For PyTorch
TypeError: can't convert cuda:0 device type tensor to numpy.
```
and therefore the user will have to explicitly convert to and from a
`numpy.ndarray` (which is quite inefficient):
```
# For CuPy
x_np = cupy.asnumpy(x)
freq, Pxx = (cupy.asarray(res) for res in signal.welch(x_np))

# For PyTorch
x_np = x.cpu().numpy()
# Note: ends up with tensors on CPU, may still have to move them back
freq, Pxx = (torch.tensor(res) for res in signal.welch(x_np))
```
This code will look a little different for each array library. The end goal
here is to be able to write this instead as:
```
freq, Pxx = signal.welch(x)
```
and have `freq`, `Pxx` be arrays of the same type and on the same device as `x`.

.. note::

    This type of use case applies to many other libraries, from scikit-learn
    and scikit-image to domain-specific libraries like AstroPy and
    scikit-bio, to code written for a single purpose or user.