| 1 | ++++ |
| 2 | +date = "2023-03-01T08:00:00+00:00" |
| 3 | +author = "Athan Reines" |
| 4 | +title = "2022 release of the Array API Standard" |
| 5 | +tags = ["APIs", "standard", "consortium", "arrays", "community"] |
| 6 | +categories = ["Consortium", "Standardization"] |
| 7 | +description = "The 2022 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries." |
| 8 | +draft = false |
| 9 | +weight = 30 |
| 10 | ++++ |
| 11 | + |
| 12 | +Today marks another significant milestone for the Consortium for Python Data |
| 13 | +API Standards. We're excited to announce the release of the 2022 revision of |
| 14 | +the Array API Standard. This release is the culmination of extensive discussion |
| 15 | +and coordination among array libraries to build on the [initial 2021 |
| 16 | +release](https://data-apis.org/blog/array_api_standard_release/) of the Array |
| 17 | +API Standard and to continue reaching consensus on unified API design and |
| 18 | +behavior among array libraries within the PyData ecosystem. |
| 19 | + |
| 20 | +Multi-dimensional arrays (a.k.a. tensors) are the fundamental data structure |
| 21 | +for many scientific and numerical computing applications, and the PyData |
| 22 | +ecosystem has a rich set of libraries for working with arrays, including NumPy, |
| 23 | +CuPy, Dask, PyTorch, MXNet, JAX, TensorFlow, and beyond. Historically, |
| 24 | +interoperation among array libraries has been challenging due to divergent API |
| 25 | +designs and subtle variation in behavior such that code written for one array |
| 26 | +library cannot be readily ported to another array library. To address these |
| 27 | +challenges, the Consortium for Python Data API Standards was established to |
| 28 | +facilitate coordination among array and dataframe library maintainers, |
| 29 | +sponsoring organizations, and key stakeholders and to provide a transparent and |
| 30 | +inclusive process--with input from the broader Python community--for |
| 31 | +standardizing array API design. |
| 32 | + |
| 33 | +## Brief Timeline |
| 34 | + |
| 35 | +The Consortium was established in May 2020, and work immediately began to |
| 36 | +identify key pain points among array libraries and to research usage patterns |
| 37 | +to help inform future API design. In the fall of 2020, we released an initial |
| 38 | +draft of the array API specification and sought input from the broader PyData |
| 39 | +ecosystem during an extended community review period. |
| 40 | + |
| 41 | +During the community review period, we incorporated community feedback and |
| 42 | +continued iterating on existing API design. To facilitate community adoption of |
| 43 | +the array API standard, we worked with the NumPy community to implement a |
| 44 | +conforming reference implementation. The CuPy, PyTorch, and MXNet communities |
| 45 | +built upon this work and soon began efforts to adopt the array API in their own |
| 46 | +array libraries. |
| 47 | + |
| 48 | +Throughout 2021, we engaged in a tight feedback loop with array API adopters to |
| 49 | +refine and improve the initial draft specification. With each tweak to the |
| 50 | +specification, we continued our efforts to provide a portable [test |
| 51 | +suite](https://github.com/data-apis/array-api-tests) for testing compliance |
| 52 | +with the array API standard. During this time, we also introduced a data |
| 53 | +interchange protocol based on [DLPack](https://github.com/dmlc/dlpack) to |
| 54 | +facilitate zero-copy memory exchange between array libraries. |
| 55 | + |
| 56 | +In addition to a core set of API designs for array creation, mutation, and |
| 57 | +element-wise computation, we introduced "extensions". Extensions are defined as |
| 58 | +coherent sets of functionality that are commonly implemented across array |
| 59 | +libraries. In contrast to the set of "core" specification-defined APIs, |
| 60 | +conforming array libraries are not required to implement extensions, as some |
| 61 | +extension APIs may pose an undue development burden due to device constraints, |
| 62 | +algorithmic complexity, or other library-specific considerations. The first |
| 63 | +extension included in the specification was the `linalg` extension, which |
| 64 | +defines a set of linear algebra APIs for computing eigenvalues, performing |
| 65 | +singular value decomposition, solving a system of linear equations, and other |
| 66 | +linear algebra operations. |
| 67 | + |
| 68 | +By the end of 2021, we neared completion of the first official release of the |
| 69 | +Array API Standard. And after some last-minute (and rather thorny) concerns |
| 70 | +delayed finalization (looking at you, copy-view mutability!), we were finally |
| 71 | +able to tag the 2021 revision in April 2022. Phew! And hurray! |
| 72 | + |
| 73 | +## 2022 Revision |
| 74 | + |
| 75 | +After finalizing the 2021 revision of the Array API Standard, we began in |
| 76 | +earnest on the 2022 revision with the ambitious goal to finalize its release by |
| 77 | +year's end. We had two key objectives: 1) standardize complex number support |
| 78 | +and 2) standardize an extension for Fast Fourier Transforms (FFTs). |
| 79 | + |
| 80 | +Complex numbers have a wide range of applications, including signal processing, |
| 81 | +control theory, quantum mechanics, fluid dynamics, linear algebra, cartography, |
| 82 | +and various physics domains. Up until recently, complex number support |
| 83 | +among array libraries was spotty at best due to additional algorithmic |
| 84 | +complexity and lack of device support, something which especially limited |
| 85 | +GPU-based accelerator libraries. However, the tide began to turn in recent |
| 86 | +years as array libraries sought to replicate additional APIs found in NumPy in |
| 87 | +their own libraries and device support steadily increased. |
| 88 | + |
| 89 | +During our work on the 2021 revision, standardizing complex number behavior was |
| 90 | +one of the top requests from the community; however, array libraries, such as |
| 91 | +CuPy and PyTorch, were still in the process of adding full complex number |
| 92 | +support across their APIs. Given the still evolving landscape across the |
| 93 | +ecosystem, we wanted to avoid prematurely constraining API design before full |
| 94 | +consideration of the real-world experience gained while attempting to support |
| 95 | +complex numbers across heterogeneous platforms and device types, and we wanted |
| 96 | +to allow array libraries the flexibility to continue experimenting with API |
| 97 | +design choices. |
| 98 | + |
| 99 | +By the time we put the finishing touches on the 2021 revision, we had enough |
| 100 | +data, cross-library experience, and insight to chart a path forward. Helping |
| 101 | +motivate this initiative were two desires. First, several linear algebra APIs |
| 102 | +specified in the `linalg` extension, such as those for eigenvalue |
| 103 | +decomposition, singular value decomposition, and Cholesky decomposition, |
| 104 | +required complex number support in order to be full-featured. And second, if we |
| 105 | +wanted to standardize APIs for computing Fast Fourier Transforms (FFTs), we |
| 106 | +needed complex numbers. |
| 107 | + |
| 108 | +FFTs are a class of algorithms for computing the discrete Fourier transform |
| 109 | +(DFT) of a sequence, or its inverse (IDFT), and are widely used in signal |
| 110 | +processing applications in engineering, music, science, and mathematics. As |
| 111 | +array libraries added complex number support, FFT APIs followed close behind. |
| 112 | +Luckily for us, FFT API design was fairly consistent across the ecosystem, |
| 113 | +making these APIs good candidates for standardization. |
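
To see why FFT APIs depend on complex number support, here is a minimal sketch that evaluates the DFT definition directly using only the Python standard library. The helper `naive_dft` is hypothetical and written for illustration only; it is not part of the Array API Standard, and it is a direct O(n^2) evaluation rather than an FFT, which would return the same values faster.

```python
import cmath


def naive_dft(samples):
    """Evaluate the DFT definition directly (O(n^2)); not an FFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(samples))
        for k in range(n)
    ]


# One period of a sine wave sampled at four points: purely real input.
signal = [0.0, 1.0, 0.0, -1.0]
spectrum = naive_dft(signal)

for k, value in enumerate(spectrum):
    print(k, value)
```

Even though the input is purely real, the nonzero entries of `spectrum` are (up to rounding) purely imaginary, so an array library cannot represent the result without a complex data type.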
| 114 | + |
| 115 | +With our priorities set, the six months following the 2021 revision |
| 116 | +consisted of gathering requirements, iterating on API design, and engaging |
| 117 | +community stakeholders. One of the significant challenges in specifying complex |
| 118 | +number behavior for element-wise algebraic and transcendental functions was the |
| 119 | +absence of a widely followed specification equivalent to the IEEE 754 |
| 120 | +specification for real-valued floating-point numbers. In particular, how and |
| 121 | +where to choose branch cuts and how to handle complex floating-point infinity |
| 122 | +remain matters of choice, with equally valid arguments to be made for following |
| 123 | +different conventions. In the end, we made the decision to adhere to C99 |
| 124 | +semantics, as this was the dominant convention among array libraries, with |
| 125 | +allowance for divergent behavior in a small number of special cases. |
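
As a small illustration of what a branch cut convention means in practice, the sketch below uses Python's built-in `cmath` module. It is only a stand-in here: `cmath` is documented as being continuous on both sides of a branch cut on platforms with signed zeros, which is in the same spirit as C99, but it is not the reference for the standard's behavior.

```python
import cmath

# Two inputs that compare equal but sit on opposite sides of the branch cut
# of the square root (the negative real axis). Only the sign of the zero
# imaginary part differs.
above = complex(-1.0, 0.0)
below = complex(-1.0, -0.0)

print(above == below)      # the two inputs compare equal
print(cmath.sqrt(above))   # result when approaching the cut from above
print(cmath.sqrt(below))   # result when approaching the cut from below
```

The two square roots have imaginary parts of opposite sign, which is exactly the kind of detail that libraries have to agree on before code can be ported between them.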
| 126 | + |
| 127 | +In addition to complex number support and FFTs, the 2022 revision specifies |
| 128 | +`take` for returning an arbitrary list of elements along a specified axis. |
| 129 | +Standardizing this API was a high-priority request among downstream array API |
| 130 | +consumers, such as scikit-learn, which commonly use `take` for sampling |
| 131 | +multi-dimensional arrays. And one other notable addition was the inclusion of |
| 132 | +`isdtype`, which provides a consistent API across array libraries for testing |
| 133 | +whether a provided data type is of a specified data type kind--something that, |
| 134 | +prior to this specification, was widely divergent across array libraries, thus |
| 135 | +making `isdtype` a definite ergonomic and portability win. |
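
The sampling use case for `take` can be sketched as follows. The example assumes NumPy is installed and uses `numpy.take` as a stand-in for the standardized function; the array contents and index values are made up for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # three rows, four columns
rows = np.asarray([2, 0])         # arbitrary row indices, in any order

# Select the listed rows along axis 0, in the order given.
sample = np.take(x, rows, axis=0)
print(sample)
```

The result has one row per requested index, in the requested order, which is what makes `take` convenient for drawing samples from a multi-dimensional array.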
| 136 | + |
| 137 | +The full list of API additions, updates, and errata can be found in the |
| 138 | +specification |
| 139 | +[changelog](https://github.com/data-apis/array-api/blob/main/CHANGELOG.md). |
| 140 | + |
| 141 | +## Facilitating Array API Adoption |
| 142 | + |
| 143 | +Array API adoption requires buy-in from both array libraries and the downstream |
| 144 | +consumers of those libraries. As such, adoption faces two key challenges. |
| 145 | +First, to facilitate development, array libraries need a robust mechanism for |
| 146 | +determining whether they are specification compliant. Second, while array |
| 147 | +libraries work to become fully specification compliant, downstream libraries |
| 148 | +need to be able to target a stable compatibility layer in order to smooth over |
| 149 | +subtle differences in array library behavior. |
| 150 | + |
| 151 | +To address the first challenge, we've released a comprehensive portable [test |
| 152 | +suite](https://github.com/data-apis/array-api-tests) built on Pytest and |
| 153 | +Hypothesis for testing Array API Standard compliance. The test suite supports |
| 154 | +custom configurations in order to accommodate library-specific specification |
| 155 | +deviations and supports vendoring, thus allowing array libraries to easily |
| 156 | +include the test suite alongside their existing tests. When run, the test |
| 157 | +suite provides a detailed overview of specification compliance, |
| 158 | +providing a handy benchmark as array libraries work to iteratively improve |
| 159 | +their compliance score. |
| 160 | + |
| 161 | +To address the second challenge, we've released an [array compatibility |
| 162 | +layer](https://github.com/data-apis/array-api-compat) which provides a small |
| 163 | +wrapper around existing array libraries to ensure Array API Standard compliant |
| 164 | +behavior. Using the compatibility layer is as simple as updating your imports. |
| 165 | +For example, instead of |
| 166 | + |
| 167 | +```python |
| 168 | +import numpy as np |
| 169 | +``` |
| 170 | + |
| 171 | +do |
| 172 | + |
| 173 | +```python |
| 174 | +import array_api_compat.numpy as np |
| 175 | +``` |
| 176 | + |
| 177 | +And instead of |
| 178 | + |
| 179 | +```python |
| 180 | +import cupy as cp |
| 181 | +``` |
| 182 | + |
| 183 | +do |
| 184 | + |
| 185 | +```python |
| 186 | +import array_api_compat.cupy as cp |
| 187 | +``` |
| 188 | + |
| 189 | +Each import includes all the functions from the normal NumPy or CuPy namespace, |
| 190 | +with the exception that functions having counterparts in the Array API Standard |
| 191 | +are wrapped to ensure specification-compliant behavior. |
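
One way a downstream library might take advantage of this is to write functions against a namespace object rather than a specific array library. The sketch below is an assumption-laden illustration: the function `standardize` and the convention of passing the namespace explicitly as `xp` are hypothetical, and plain NumPy stands in for a wrapped namespace so that the example runs without the compatibility layer installed.

```python
import numpy as np


def standardize(x, xp):
    """Shift to zero mean and scale to unit standard deviation.

    `xp` is any namespace providing `mean` and `std`, both of which are
    part of the Array API Standard.
    """
    return (x - xp.mean(x)) / xp.std(x)


data = np.asarray([1.0, 2.0, 3.0, 4.0])
z = standardize(data, np)   # plain NumPy used as the namespace here
print(z)
```

With the compatibility layer installed, the same function should be usable by passing `array_api_compat.numpy` (or another wrapped namespace) as `xp` without changing the function body.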
| 192 | + |
| 193 | +Currently, the compatibility layer supports NumPy, CuPy, and PyTorch, but we're |
| 194 | +hoping to extend support to additional array libraries in the year ahead. In |
| 195 | +the meantime, if you're an array library consumer, we'd love to get your |
| 196 | +feedback. To get started, install from |
| 197 | +[PyPI](https://pypi.org/project/array-api-compat/) |
| 198 | + |
| 199 | +```bash |
| 200 | +pip install array-api-compat |
| 201 | +``` |
| 202 | + |
| 203 | +and take it for a spin! If you encounter any issues, please be sure to let us |
| 204 | +know over on the library issue |
| 205 | +[tracker](https://github.com/data-apis/array-api-compat/issues). |
| 206 | + |
| 207 | +## The Road Ahead |
| 208 | + |
| 209 | +So what's in store for 2023?! The primary theme for 2023 is adoption, adoption, |
| 210 | +and more adoption. We're deeply committed to ensuring the success of this |
| 211 | +Consortium and to improving the landscape of array computing within the PyData |
| 212 | +ecosystem. While achieving buy-in from array libraries across the ecosystem has |
| 213 | +been a significant achievement, what is critical for the long-term success of |
| 214 | +this collective effort is driving adoption among downstream libraries, such as |
| 215 | +SciPy, scikit-learn, and others, in order to achieve our stated goal of |
| 216 | +facilitating interoperability among array libraries. In short, we want to |
| 217 | +unshackle downstream libraries from any one particular array library and |
| 218 | +provide users of SciPy et al. the freedom to use not just NumPy but whichever |
| 219 | +array library makes the most sense for them and their use cases. |
| 220 | + |
| 221 | +To drive this effort, we'll be |
| 222 | + |
| 223 | +1. working closely with downstream libraries to identify existing pain points |
| 224 | + and blockers preventing adoption. |
| 225 | +2. developing a robust set of tools for specification compliance monitoring and |
| 226 | + reporting. |
| 227 | +3. extending the [array compatibility |
| 228 | + layer](https://github.com/data-apis/array-api-compat) to support additional |
| 229 | + array libraries and thus further smooth the transition to a shackle-free |
| 230 | + future. |
| 231 | + |
| 232 | +We're excited for the year ahead, and we'd love to get your feedback! To |
| 233 | +provide feedback on the Array API Standard, please open issues or pull requests |
| 234 | +on <https://github.com/data-apis/array-api>. |
| 235 | + |
| 236 | +Cheers! |