Skip to content

Add blog post announcing v2022 Array API Standard release #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 2, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
236 changes: 236 additions & 0 deletions content/blog/array_api_v2022_release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,236 @@
+++
date = "2023-03-01T08:00:00+00:00"
author = "Athan Reines"
title = "2022 release of the Array API Standard"
tags = ["APIs", "standard", "consortium", "arrays", "community"]
categories = ["Consortium", "Standardization"]
description = "The 2022 revision of the array API standard has been finalized and is ready for adoption by conforming array libraries."
draft = false
weight = 30
+++

Today marks another significant milestone for the Consortium for Python Data
API Standards. We're excited to announce the release of the 2022 revision of
the Array API Standard. This release is a culmination of extensive discussion
and coordination among array libraries to build on the [initial 2021
release](https://data-apis.org/blog/array_api_standard_release/) of the Array
API Standard and to continue reaching consensus on unified API design and
behavior among array libraries within the PyData ecosystem.

Multi-dimensional arrays (a.k.a. tensors) are the fundamental data structure
for many scientific and numerical computing applications, and the PyData
ecosystem has a rich set of libraries for working with arrays, including NumPy,
CuPy, Dask, PyTorch, MXNet, JAX, TensorFlow, and beyond. Historically,
interoperation among array libraries has been challenging due to divergent API
designs and subtle variation in behavior such that code written for one array
library cannot be readily ported to another array library. To address these
challenges, the Consortium for Python Data API Standards was established to
facilitate coordination among array and dataframe library maintainers,
sponsoring organizations, and key stakeholders and to provide a transparent and
inclusive process--with input from the broader Python community--for
standardizing array API design.

## Brief Timeline

The Consortium was established in May, 2020, and work immediately began to
identify key pain points among array libraries and to research usage patterns
to help inform future API design. In the fall of 2020, we released an initial
draft of the array API specification and sought input from the broader PyData
ecosystem during an extended community review period.

During the community review period, we incorporated community feedback and
continued iterating on existing API design. To facilitate community adoption of
the array API standard, we worked with the NumPy community to implement a
conforming reference implementation. The CuPy, PyTorch, and MXNet communities
built upon this work and soon began efforts to adopt the array API in their own
array libraries.

Throughout 2021, we engaged in a tight feedback loop with array API adopters to
refine and improve the initial draft specification. With each tweak to the
specification, we continued our efforts to provide a portable [test
suite](https://github.com/data-apis/array-api-tests) for testing compliance
with the array API standard. During this time, we also introduced a data
interchange protocol based on [DLPack](https://github.com/dmlc/dlpack) to
facilitate zero-copy memory exchange between array libraries.

In addition to a core set of API designs for array creation, mutation, and
element-wise computation, we introduced "extensions". Extensions are defined as
coherent sets of functionality that are commonly implemented across array
libraries. In contrast to the set of "core" specification-defined APIs,
conforming array libraries are not required to implement extensions, as some
extension APIs may pose an undue development burden due to device constraints,
algorithmic complexity, or other library-specific considerations. The first
extension included in the specification was the `linalg` extension, which
defines a set of linear algebra APIs for computing eigenvalues, performing
singular value decomposition, solving a system of linear equations, and other
linear algebra operations.

By the end of 2021, we neared completion of the first official release of the
Array API Standard. And after some last minute (and rather thorny) concerns
delayed finalization (looking at you copy-view mutability!), we were finally
able to tag the 2021 revision in April, 2022. Phew! And hurray!

## 2022 Revision

After finalizing the 2021 revision of the Array API Standard, we began in
earnest on the 2022 revision with the ambitious goal to finalize its release by
year's end. We had two key objectives: 1) standardize complex number support
and 2) standardize an extension for Fast Fourier Transforms (FFTs).

Complex numbers have a wide range of applications, including signal processing,
control theory, quantum mechanics, fluid dynamics, linear algebra, cartography,
and in various other physics domains. Up until recently, complex number support
among array libraries was spotty, at best, due to additional algorithmic
complexity and lack of device support, something which especially limited
GPU-based accelerator libraries. However, the tide began to change in recent
years as array libraries sought to replicate additional APIs found in NumPy in
their own libraries and device support steadily increased.

During our work on the 2021 revision, standardizing complex number behavior was
one of the top requests from the community; however, array libraries, such as
CuPy and PyTorch, were still in the process of adding full complex number
support across their APIs. Given the still evolving landscape across the
ecosystem, we wanted to avoid prematurely constraining API design before full
consideration of the real-world experience gained while attempting to support
complex numbers across heterogeneous platforms and device types, and we wanted
to allow array libraries the flexibility to continue experimenting with API
design choices.

By the time we put the finishing touches on the 2021 revision, we had enough
data, cross-library experience, and insight to chart a path forward. Helping
motivate this initiative were two desires. First, several linear algebra APIs
specified in the `linalg` extension, such as those for eigenvalue
decomposition, singular value decomposition, and Cholesky decomposition,
required complex number support in order to be full-featured. And second, if we
wanted to standardize APIs for computing Fast Fourier Transforms (FFTs), we
needed complex numbers.

FFTs are a class of algorithms for computing the discrete Fourier transform
(DFT) of a sequence, or its inverse (IDFT), and are widely used in signal
processing applications in engineering, music, science, and mathematics. As
array libraries added complex number support, FFT APIs followed close behind.
Luckily for us, FFT API design was fairly consistent across the ecosystem,
making these APIs good candidates for standardization.

With our priorities set, the 6 months following the 2021 revision were
comprised of requirements gathering, API design iteration, and engaging
community stakeholders. One of the significant challenges in specifying complex
number behavior for element-wise algebraic and transcendental functions was the
absence of a widely followed specification equivalent to the IEEE 754
specification for real-valued floating-point numbers. In particular, how and
where to choose branch cuts and how to handle complex floating-point infinity
remain matters of choice, with equally valid arguments to be made for following
different conventions. In the end, we made the decision to adhere to C99
semantics, as this was the dominant convention among array libraries, with
allowance for divergent behavior in a small number of special cases.

In addition to complex number support and FFTs, the 2022 revision specifies
`take` for returning an arbitrary list of elements along a specified axis.
Standardizing this API was a high priority request among downstream array API
consumers, such as scikit-learn, which commonly use `take` for sampling
multi-dimensional arrays. And one other notable addition was the inclusion of
`isdtype`, which provides a consistent API across array libraries for testing
whether a provided data type is of a specified data type kind--something that,
prior to this specification, was widely divergent across array libraries, thus
making `isdtype` a definite ergonomic and portability win.

The full list of API additions, updates, and errata can be found in the
specification
[changelog](https://github.com/data-apis/array-api/blob/main/CHANGELOG.md).

## Facilitating Array API Adoption

Array API adoption requires buy-in from both array libraries and the downstream
consumers of those libraries. As such, adoption faces two key challenges.
First, to facilitate development, array libraries need a robust mechanism for
determining whether they are specification compliant. Second, while array
libraries work to become fully specification compliant, downstream libraries
need to be able to target a stable compatibility layer in order to smooth over
subtle differences in array library behavior.

To address the first challenge, we've released a comprehensive portable [test
suite](https://github.com/data-apis/array-api-tests) built on Pytest and
Hypothesis for testing Array API Standard compliance. The test suite supports
custom configurations in order to accommodate library-specific specification
deviations and supports vendoring, thus allowing array libraries to easily
include the test suite alongside their existing tests. Upon running the test
suite, the test suite provides a detailed overview of specification compliance,
providing a handy benchmark as array libraries work to iteratively improve
their compliance score.

To address the second challenge, we've released an [array compatibility
layer](https://github.com/data-apis/array-api-compat) which provides a small
wrapper around existing array libraries to ensure Array API Standard compliant
behavior. Using the compatibility layer is as simple as updating your imports.
For example, instead of

```python
import numpy as np
```

do

```python
import array_api_compat.numpy as np
```

And instead of

```python
import cupy as cp
```

do

```python
import array_api_compat.cupy as cp
```

Each import includes all the functions from the normal NumPy or CuPy namespace,
with the exception that functions having counterparts in the Array API Standard
are wrapped to ensure specification-compliant behavior.

Currently, the compatibility layer supports NumPy, CuPy, and PyTorch, but we're
hoping to extend support to additional array libraries in the year ahead. In
the meantime, if you're an array library consumer, we'd love to get your
feedback. To get started, install from
[PyPI](https://pypi.org/project/array-api-compat/)

```bash
pip install array-api-compat
```

and take it for a spin! If you encounter any issues, please be sure to let us
know over on the library issue
[tracker](https://github.com/data-apis/array-api-compat/issues).

## The Road Ahead

So what's in store for 2023?! The primary theme for 2023 is adoption, adoption,
and more adoption. We're deeply committed to ensuring the success of this
Consortium and to improving the landscape of array computing within the PyData
ecosystem. While achieving buy-in from array libraries across the ecosystem has
been a significant achievement, what is critical for the long-term success of
this collective effort is driving adoption among downstream libraries, such as
SciPy, scikit-learn, and others, in order to achieve our stated goal of
facilitating interoperability among array libraries. In short, we want to
unshackle downstream libraries from any one particular array library and
provide users of SciPy et al the freedom to use, not just NumPy, but the array
library which best makes sense for them and their use cases.

To drive this effort, we'll be

1. working closely with downstream libraries to identify existing pain points
and blockers preventing adoption.
2. developing a robust set of tools for specification compliance monitoring and
reporting.
3. extending the [array compatibility
layer](https://github.com/data-apis/array-api-compat) to support additional
array libraries and thus further smooth the transition to a shackle-free
future.

We're excited for the year ahead, and we'd love to get your feedback! To
provide feedback on the Array API Standard, please open issues or pull requests
on <https://github.com/data-apis/array-api>.

Cheers!