Related topic: NumPy array protocols #1

Closed
@rgommers

This issue is meant to summarize the current status and likely future direction of the NumPy array protocols, and their relevance to the array API standard.

What are these array protocols?

In summary, they are dispatching mechanisms that allow calling the public NumPy API with other numpy.ndarray-like arrays (e.g. CuPy or Dask arrays, or any other array that implements the protocols) and have the function call dispatch to that library. There are two protocols, __array_ufunc__ and __array_function__, which are very similar. The difference is that with __array_ufunc__ the library being dispatched to knows it is getting a ufunc, so it can make use of the properties that all ufuncs share. The dispatching itself works the same way for both protocols.
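To make the dispatch concrete, here is a minimal sketch of a duck array that implements both protocols. `LoggedArray` and `dispatch_log` are hypothetical names invented for illustration, and the class supports only plain ufunc calls and `np.sum`; a real library would cover far more of the API.

```python
import numpy as np

dispatch_log = []  # records which protocol handled each call


class LoggedArray:
    """Hypothetical duck array that wraps a numpy.ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def _unwrap(self, value):
        return value.data if isinstance(value, LoggedArray) else value

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(x, y); reductions etc. are declined.
        if method != "__call__":
            return NotImplemented
        dispatch_log.append(("ufunc", ufunc.__name__))
        result = ufunc(*[self._unwrap(x) for x in inputs], **kwargs)
        return LoggedArray(result)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.sum is supported here; NumPy raises TypeError for the rest.
        if func is not np.sum:
            return NotImplemented
        dispatch_log.append(("function", func.__name__))
        return func(*[self._unwrap(a) for a in args], **kwargs)


x = LoggedArray([1, 2, 3])
shifted = np.add(x, 1)   # handled by __array_ufunc__
total = np.sum(x)        # handled by __array_function__
```

In both cases the user calls the public NumPy function, and NumPy hands control to the method defined on the array type.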

Why were they created?

__array_ufunc__ was created first; the original driver was to be able to call NumPy ufuncs on scipy.sparse matrices. __array_function__ was created later, to cover most of the NumPy API (every function that takes an array as input) and to make it possible to use the NumPy API with other array/tensor implementations:

image

What is the current status?

The protocols have been adopted by:

  • CuPy
  • Dask
  • Xarray
  • MXNet
  • PyData Sparse
  • Pint

They have not (or not yet) been adopted by:

  • TensorFlow (no compatible API to dispatch to; interest of maintainers unclear)
  • PyTorch (no compatible API to dispatch to; maintainers do have interest)
  • JAX (concerns about added value and backwards compatibility - see NEP 37 introduction)
  • scipy.sparse (semantics not compatible)

The RAPIDS ecosystem, which builds on Dask and CuPy, has been particularly happy with these protocols and uses them heavily. Its developers have also run into some of the limitations, the most painful one being that array creation functions cannot be dispatched on.
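The creation-function limitation can be illustrated with a small sketch. `DuckArray` is a hypothetical class, not part of any library mentioned above. The point is that `np.zeros_like` receives an array argument to dispatch on, while `np.zeros` receives only a shape.

```python
import numpy as np


class DuckArray:
    """Hypothetical duck array that wraps a numpy.ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.zeros_like:
            return DuckArray(np.zeros_like(args[0].data))
        return NotImplemented


duck = DuckArray([1.0, 2.0])
same_kind = np.zeros_like(duck)  # has an array argument, so it is dispatched
fresh = np.zeros(2)              # no array argument, always a numpy.ndarray
```

Code that needs a new array of the same kind as its input therefore has nothing to dispatch on when it calls a creation function like `np.zeros`.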

What is likely to change in the near future?

New ideas and design alternatives to (or additions to) the array protocols are still being actively explored. There are three main "contenders":

  1. extend the protocols to cover the most painful shortcomings: NEP 30 (__duckarray__) + NEP 35 (like=).
  2. use a separate module namespace: NEP 37 (__array_module__)
  3. use a multiple dispatch library: NEP 31 (unumpy)

At the moment, the most likely outcome is doing both (1) and (2). It needs prototyping and testing though - any solution should only be accepted when it's clear that it not only solves the immediate pain points RAPIDS ran into, but also that libraries like scikit-learn and SciPy can then adopt it.
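As an illustration of option (1), the sketch below shows how the `like=` keyword from NEP 35 lets a creation function dispatch on a reference array. It assumes a NumPy version that ships the `like=` keyword (1.20 or later); `DuckArray` is again a hypothetical class, and only `np.zeros` is handled.

```python
import numpy as np


class DuckArray:
    """Hypothetical duck array that wraps a numpy.ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is not np.zeros:
            return NotImplemented
        # Defensive: make sure `like` is not forwarded to the NumPy call.
        kwargs = {k: v for k, v in kwargs.items() if k != "like"}
        return DuckArray(np.zeros(*args, **kwargs))


reference = DuckArray([1.0, 2.0])
created = np.zeros(3, like=reference)  # dispatched through `like=`
plain = np.zeros(3)                    # no reference array, plain ndarray
```

The reference array is used only to select the implementation; the shape and other arguments are passed through unchanged.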

What is the relationship of the array protocols with an API standard?

There are several connections:

  • The original idea of __array_function__ (figure above) doesn't require an API that's the same as the NumPy one, but in practice the protocols can only be adopted when there's an API with matching signatures and semantics.
  • The lack of an API standard has meant that it's hard to predict what NumPy functions will work for another array library that implements the protocols.
  • The separate namespaces (__array_module__, unumpy) provide a good opportunity to introduce a new API standard once that's agreed on.
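The separate-namespace idea can be sketched roughly as follows. Everything here is hypothetical: `get_array_module` is a stand-in written for this example (NumPy does not provide it), `duck_namespace` is a toy namespace with a single function, and the `__array_module__` method only loosely follows the NEP 37 proposal.

```python
from types import SimpleNamespace

import numpy as np


class DuckArray:
    """Hypothetical array type that opts in to the proposed protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_module__(self, types):
        # Return a NumPy-like namespace that knows how to handle DuckArray.
        return duck_namespace


# Toy namespace; a real one would mirror an agreed-upon API.
duck_namespace = SimpleNamespace(sum=lambda x: np.sum(x.data))


def get_array_module(*arrays, default=np):
    """Hypothetical helper: pick the namespace for the given arrays."""
    types = {type(a) for a in arrays}
    for a in arrays:
        hook = getattr(type(a), "__array_module__", None)
        if hook is not None:
            module = hook(a, types)
            if module is not NotImplemented:
                return module
    return default


def total(x):
    # Library code asks for a namespace instead of calling np.* directly.
    xp = get_array_module(x)
    return xp.sum(x)
```

Because library code looks up the namespace explicitly, the functions exposed by that namespace could follow a new API standard without changing the behavior of the existing `numpy` namespace.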
