
Commit fa7e6de

Add blog post announcing v2022 Array API Standard release (#16)
1 parent 5acbd60 commit fa7e6de


1 file changed: 236 additions, 0 deletions

+++
date = "2023-03-01T08:00:00+00:00"
author = "Athan Reines"
title = "2022 release of the Array API Standard"
tags = ["APIs", "standard", "consortium", "arrays", "community"]
categories = ["Consortium", "Standardization"]
description = "The 2022 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries."
draft = false
weight = 30
+++

Today marks another significant milestone for the Consortium for Python Data
API Standards. We're excited to announce the release of the 2022 revision of
the Array API Standard. This release is the culmination of extensive discussion
and coordination among array libraries to build on the [initial 2021
release](https://data-apis.org/blog/array_api_standard_release/) of the Array
API Standard and to continue reaching consensus on unified API design and
behavior among array libraries within the PyData ecosystem.

Multi-dimensional arrays (a.k.a. tensors) are the fundamental data structure
for many scientific and numerical computing applications, and the PyData
ecosystem has a rich set of libraries for working with arrays, including NumPy,
CuPy, Dask, PyTorch, MXNet, JAX, TensorFlow, and beyond. Historically,
interoperation among array libraries has been challenging due to divergent API
designs and subtle variations in behavior, such that code written for one array
library cannot be readily ported to another. To address these challenges, the
Consortium for Python Data API Standards was established to facilitate
coordination among array and dataframe library maintainers, sponsoring
organizations, and key stakeholders, and to provide a transparent and inclusive
process, with input from the broader Python community, for standardizing array
API design.

## Brief Timeline

The Consortium was established in May 2020, and work immediately began to
identify key pain points among array libraries and to research usage patterns
to help inform future API design. In the fall of 2020, we released an initial
draft of the array API specification and sought input from the broader PyData
ecosystem during an extended community review period.

During the community review period, we incorporated community feedback and
continued iterating on the existing API design. To facilitate community adoption
of the array API standard, we worked with the NumPy community to develop a
conforming reference implementation. The CuPy, PyTorch, and MXNet communities
built upon this work and soon began efforts to adopt the array API in their own
array libraries.

Throughout 2021, we engaged in a tight feedback loop with array API adopters to
refine and improve the initial draft specification. With each tweak to the
specification, we continued our efforts to provide a portable [test
suite](https://github.com/data-apis/array-api-tests) for testing compliance
with the array API standard. During this time, we also introduced a data
interchange protocol based on [DLPack](https://github.com/dmlc/dlpack) to
facilitate zero-copy memory exchange between array libraries.
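
As a small sketch of what zero-copy exchange looks like in practice: the standard specifies a `from_dlpack` creation function that consumes any array exposing the `__dlpack__` and `__dlpack_device__` methods. The example assumes a recent NumPy release that provides `np.from_dlpack`. The producer here is itself a NumPy array purely for illustration; in real use it could come from any DLPack-capable library on the same device.

```python
import numpy as np

# Producer: any array implementing __dlpack__ / __dlpack_device__.
x = np.arange(6, dtype=np.float32)

# Consumer: import the buffer through the DLPack protocol (no data copy on CPU).
y = np.from_dlpack(x)

print(np.shares_memory(x, y))  # True: both arrays view the same memory
print(y.dtype, y.shape)        # float32 (6,)
```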

In addition to a core set of API designs for array creation, mutation, and
element-wise computation, we introduced "extensions". Extensions are defined as
coherent sets of functionality that are commonly implemented across array
libraries. Unlike the "core" specification-defined APIs, extensions are not
required to be implemented by conforming array libraries, as some extension APIs
may pose an undue development burden due to device constraints, algorithmic
complexity, or other library-specific considerations. The first extension
included in the specification was the `linalg` extension, which defines a set of
linear algebra APIs for computing eigenvalues, performing singular value
decomposition, solving a system of linear equations, and other linear algebra
operations.
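
Because extensions are optional, portable code may want to check for one before using it. Extension functions live in a namespace on the array module (for example, `xp.linalg.solve`), so a minimal sketch, assuming NumPy as the array namespace, can simply probe for the `linalg` attribute. The helper name `solve_if_available` is hypothetical, and returning `None` is just one possible fallback policy.

```python
import types

import numpy as np


def solve_if_available(xp, a, b):
    """Solve a @ y = b via the optional linalg extension, if `xp` provides it.

    Hypothetical helper: returns None when the namespace lacks `linalg`.
    """
    linalg = getattr(xp, "linalg", None)
    if linalg is None:
        return None  # extension not implemented; the caller must fall back
    return linalg.solve(a, b)


a = np.asarray([[3.0, 1.0], [1.0, 2.0]])
b = np.asarray([9.0, 8.0])
print(solve_if_available(np, a, b))  # [2. 3.]

# A stand-in namespace without the extension (for illustration only).
bare_namespace = types.SimpleNamespace(asarray=np.asarray)
print(solve_if_available(bare_namespace, a, b))  # None
```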

By the end of 2021, we neared completion of the first official release of the
Array API Standard. And after some last-minute (and rather thorny) concerns
delayed finalization (looking at you, copy-view mutability!), we were finally
able to tag the 2021 revision in April 2022. Phew! And hurray!

## 2022 Revision

After finalizing the 2021 revision of the Array API Standard, we began work in
earnest on the 2022 revision, with the ambitious goal of finalizing its release
by year's end. We had two key objectives: 1) standardize complex number support
and 2) standardize an extension for Fast Fourier Transforms (FFTs).

Complex numbers have a wide range of applications, including signal processing,
control theory, quantum mechanics, fluid dynamics, linear algebra, cartography,
and various other areas of physics. Until recently, complex number support
among array libraries was spotty at best, owing to the additional algorithmic
complexity and a lack of device support, the latter being an especially severe
limitation for GPU-based accelerator libraries. In recent years, however, the
tide began to turn as array libraries sought to replicate more of NumPy's APIs
and device support steadily improved.

During our work on the 2021 revision, standardizing complex number behavior was
one of the top requests from the community; however, array libraries such as
CuPy and PyTorch were still in the process of adding full complex number support
across their APIs. Given the still-evolving landscape across the ecosystem, we
wanted to avoid prematurely constraining API design before fully considering the
real-world experience gained while attempting to support complex numbers across
heterogeneous platforms and device types, and we wanted to allow array libraries
the flexibility to continue experimenting with API design choices.

By the time we put the finishing touches on the 2021 revision, we had enough
data, cross-library experience, and insight to chart a path forward. Two
considerations motivated this initiative. First, several linear algebra APIs
specified in the `linalg` extension, such as those for eigenvalue
decomposition, singular value decomposition, and Cholesky decomposition,
required complex number support in order to be full-featured. Second, if we
wanted to standardize APIs for computing FFTs, we needed complex numbers.
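
To see why these routines need complex dtypes, consider a Hermitian matrix (one equal to its own conjugate transpose). Its eigenvalues are real, but the input itself is complex. The sketch below assumes NumPy, whose `linalg.eigvalsh` and `linalg.cholesky` accept complex input; the matrix values are arbitrary.

```python
import numpy as np

# A complex Hermitian matrix: h equals its conjugate transpose.
h = np.asarray([[2.0, 1.0j], [-1.0j, 2.0]])
assert np.allclose(h, h.conj().T)

# Eigenvalues of a Hermitian matrix are real: (2 - w)**2 - 1 = 0 gives w = 1, 3.
w = np.linalg.eigvalsh(h)
print(w)  # [1. 3.]

# h is also positive definite, so it has a complex Cholesky factor with h = L @ L^H.
low = np.linalg.cholesky(h)
print(np.allclose(low @ low.conj().T, h))  # True
```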

FFTs are a class of algorithms for efficiently computing the discrete Fourier
transform (DFT) of a sequence, or its inverse (IDFT), and are widely used in
signal processing applications in engineering, music, science, and mathematics.
As array libraries added complex number support, FFT APIs followed close behind.
Luckily for us, FFT API design was fairly consistent across the ecosystem,
making these APIs good candidates for standardization.
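
For readers who want the relationship spelled out: a naive DFT of a length-N sequence costs O(N²) operations, while the classic radix-2 Cooley-Tukey FFT recursively splits the even- and odd-indexed samples to reach O(N log N). The pure-Python sketch below (standard library only, power-of-two lengths only) checks the FFT against the naive DFT and recovers the input with an inverse transform. It uses the unnormalized forward convention that NumPy's `numpy.fft.fft` uses by default, and it is meant for illustration rather than performance.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_m x[m] * exp(-2j*pi*k*m/N)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


signal = [1.0, 2.0, 0.0, -1.0, 1.5, 0.0, 2.5, -2.0]
fast = fft(signal)
slow = dft(signal)
print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-9)          # True
print(max(abs(a - b) for a, b in zip(ifft(fast), signal)) < 1e-9)  # True
```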

With our priorities set, we spent the six months following the 2021 revision
gathering requirements, iterating on API design, and engaging community
stakeholders. One of the significant challenges in specifying complex number
behavior for element-wise algebraic and transcendental functions was the
absence of a widely followed specification equivalent to the IEEE 754
specification for real-valued floating-point numbers. In particular, where to
place branch cuts and how to handle complex floating-point infinity remain
matters of choice, with equally valid arguments to be made for following
different conventions. In the end, we decided to adhere to C99 semantics, as
this was the dominant convention among array libraries, with allowance for
divergent behavior in a small number of special cases.
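
Python's built-in `cmath` module is a convenient place to see C99-style conventions in action, since CPython uses signed zeros to distinguish the two sides of a branch cut and treats infinities in the C99 manner. This sketch only illustrates the conventions; it is not the standard's normative text. It shows that the sign of a zero imaginary part selects the side of the cut, and that a complex value with an infinite component counts as infinite even if its other component is NaN.

```python
import cmath
import math

# sqrt has a branch cut along the negative real axis; the sign of the zero
# imaginary part chooses which side of the cut the input sits on.
above = cmath.sqrt(complex(-4.0, 0.0))   # approached from the upper half-plane
below = cmath.sqrt(complex(-4.0, -0.0))  # approached from the lower half-plane
print(above, below)  # 2j -2j

# log has the same branch cut, so the imaginary part flips between +pi and -pi.
print(cmath.log(complex(-1.0, 0.0)).imag)   # 3.141592653589793
print(cmath.log(complex(-1.0, -0.0)).imag)  # -3.141592653589793

# A complex value with an infinite component is treated as infinite,
# even when the other component is NaN.
print(abs(complex(math.inf, math.nan)))  # inf
```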

In addition to complex number support and FFTs, the 2022 revision specifies
`take` for selecting elements at arbitrary indices along a specified axis.
Standardizing this API was a high-priority request among downstream array API
consumers, such as scikit-learn, which commonly use `take` for sampling
multi-dimensional arrays. Another notable addition is `isdtype`, which provides
a consistent API across array libraries for testing whether a provided data type
is of a specified data type kind. Prior to this specification, such checks were
widely divergent across array libraries, making `isdtype` a definite ergonomic
and portability win.
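
The sketch below shows both additions in a NumPy setting (NumPy assumed installed). First, a bootstrap-style resample draws row indices with replacement and gathers them with `take` along axis 0, the kind of sampling described above. Second, `isdtype_sketch` is a hypothetical helper, not any library's actual implementation. It maps the standard's dtype kind names onto NumPy's abstract scalar-type hierarchy and, like `isdtype`, also accepts a tuple of kinds or a concrete dtype.

```python
import numpy as np

# take: bootstrap-style resampling of rows.
x = np.arange(12.0).reshape(4, 3)
rng = np.random.default_rng(seed=0)
idx = rng.integers(0, x.shape[0], size=x.shape[0])  # row indices, with replacement
sample = np.take(x, idx, axis=0)                    # gather whole rows
print(sample.shape)  # (4, 3)

# isdtype: hypothetical sketch built on NumPy's scalar-type hierarchy.
_KINDS = {
    "bool": np.bool_,
    "signed integer": np.signedinteger,
    "unsigned integer": np.unsignedinteger,
    "integral": np.integer,
    "real floating": np.floating,
    "complex floating": np.complexfloating,
    "numeric": np.number,  # excludes bool, as the standard's "numeric" kind does
}


def isdtype_sketch(dtype, kind):
    """Return True if `dtype` matches `kind` (a kind name, a dtype, or a tuple)."""
    if isinstance(kind, tuple):
        return any(isdtype_sketch(dtype, k) for k in kind)
    if isinstance(kind, str):
        return np.issubdtype(dtype, _KINDS[kind])
    return np.dtype(dtype) == np.dtype(kind)


print(isdtype_sketch(np.int32, "integral"))                                 # True
print(isdtype_sketch(np.bool_, "numeric"))                                  # False
print(isdtype_sketch(np.complex64, "real floating"))                        # False
print(isdtype_sketch(np.complex64, ("real floating", "complex floating")))  # True
```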

The full list of API additions, updates, and errata can be found in the
specification
[changelog](https://github.com/data-apis/array-api/blob/main/CHANGELOG.md).

## Facilitating Array API Adoption

Array API adoption requires buy-in from both array libraries and the downstream
consumers of those libraries. As such, adoption faces two key challenges.
First, to facilitate development, array libraries need a robust mechanism for
determining whether they are specification compliant. Second, while array
libraries work to become fully specification compliant, downstream libraries
need to be able to target a stable compatibility layer in order to smooth over
subtle differences in array library behavior.

To address the first challenge, we've released a comprehensive, portable [test
suite](https://github.com/data-apis/array-api-tests) built on pytest and
Hypothesis for testing Array API Standard compliance. The test suite supports
custom configurations in order to accommodate library-specific specification
deviations, and it supports vendoring, allowing array libraries to easily
include the test suite alongside their existing tests. When run, the test suite
provides a detailed overview of specification compliance, a handy benchmark as
array libraries work to iteratively improve their compliance score.

To address the second challenge, we've released an [array compatibility
layer](https://github.com/data-apis/array-api-compat) which provides a small
wrapper around existing array libraries to ensure Array API Standard compliant
behavior. Using the compatibility layer is as simple as updating your imports.
For example, instead of

```python
import numpy as np
```

do

```python
import array_api_compat.numpy as np
```

And instead of

```python
import cupy as cp
```

do

```python
import array_api_compat.cupy as cp
```

Each import includes all the functions from the normal NumPy or CuPy namespace,
with the exception that functions having counterparts in the Array API Standard
are wrapped to ensure specification-compliant behavior.
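
Downstream libraries typically pair such imports with namespace dispatch: look up the input array's namespace and call every function through it, so the same code works with any conforming library. The array-api-compat package provides an `array_namespace` helper for this purpose. The sketch below instead uses a hypothetical stand-in, `get_namespace`, built directly on the standard's `__array_namespace__` method so that it runs with NumPy alone. Whether a given NumPy release exposes `__array_namespace__` on `ndarray` depends on the version, hence the explicit fallback.

```python
import numpy as np


def get_namespace(x):
    """Hypothetical helper: return the array API namespace for array `x`."""
    if hasattr(x, "__array_namespace__"):
        # Conforming arrays report their namespace via this method.
        return x.__array_namespace__()
    if isinstance(x, np.ndarray):
        # Simplified fallback for NumPy releases without __array_namespace__.
        return np
    raise TypeError(f"unsupported array type: {type(x).__name__}")


def standardize(x):
    """Library-agnostic z-score: only calls functions through the namespace."""
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)


z = standardize(np.asarray([1.0, 2.0, 3.0, 4.0]))
print(z)
```

Because `standardize` never names NumPy directly, swapping in an array from another conforming library would, under these assumptions, route the `mean` and `std` calls to that library instead.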
192+
193+
Currently, the compatibility layer supports NumPy, CuPy, and PyTorch, but we're
194+
hoping to extend support to additional array libraries in the year ahead. In
195+
the meantime, if you're an array library consumer, we'd love to get your
196+
feedback. To get started, install from
197+
[PyPI](https://pypi.org/project/array-api-compat/)
198+
199+
```bash
200+
pip install array-api-compat
201+
```
202+
203+
and take it for a spin! If you encounter any issues, please be sure to let us
204+
know over on the library issue
205+
[tracker](https://github.com/data-apis/array-api-compat/issues).
206+
207+
## The Road Ahead
208+
209+
So what's in store for 2023?! The primary theme for 2023 is adoption, adoption,
210+
and more adoption. We're deeply committed to ensuring the success of this
211+
Consortium and to improving the landscape of array computing within the PyData
212+
ecosystem. While achieving buy-in from array libraries across the ecosystem has
213+
been a significant achievement, what is critical for the long-term success of
214+
this collective effort is driving adoption among downstream libraries, such as
215+
SciPy, scikit-learn, and others, in order to achieve our stated goal of
216+
facilitating interoperability among array libraries. In short, we want to
217+
unshackle downstream libraries from any one particular array library and
218+
provide users of SciPy et al the freedom to use, not just NumPy, but the array
219+
library which best makes sense for them and their use cases.
220+
221+
To drive this effort, we'll be
222+
223+
1. working closely with downstream libraries to identify existing pain points
224+
and blockers preventing adoption.
225+
2. developing a robust set of tools for specification compliance monitoring and
226+
reporting.
227+
3. extending the [array compatibility
228+
layer](https://github.com/data-apis/array-api-compat) to support additional
229+
array libraries and thus further smooth the transition to a shackle-free
230+
future.
231+
232+
We're excited for the year ahead, and we'd love to get your feedback! To
233+
provide feedback on the Array API Standard, please open issues or pull requests
234+
on <https://github.com/data-apis/array-api>.
235+
236+
Cheers!
