diff --git a/spec/future_API_evolution.md b/spec/future_API_evolution.md
index a173d1d11..bb8323f14 100644
--- a/spec/future_API_evolution.md
+++ b/spec/future_API_evolution.md
@@ -1,3 +1,5 @@
+.. _future-API-evolution:
+
 # Future API standard evolution
 
 ## Scope extensions
diff --git a/spec/purpose_and_scope.md b/spec/purpose_and_scope.md
index 7466e9ac7..a2401f8d2 100644
--- a/spec/purpose_and_scope.md
+++ b/spec/purpose_and_scope.md
@@ -2,12 +2,86 @@
 
 ## Introduction
 
+Python users have a wealth of choice for libraries and frameworks for
+numerical computing, data science, machine learning, and deep learning. New
+frameworks pushing forward the state of the art in these fields are appearing
+every year. One unintended consequence of all this activity and creativity
+has been fragmentation in multidimensional array (a.k.a. tensor) libraries -
+which are the fundamental data structure for these fields. Choices include
+NumPy, TensorFlow, PyTorch, Dask, JAX, CuPy, MXNet, Xarray, and others.
+
+The APIs of each of these libraries are largely similar, but with enough
+differences that it's quite difficult to write code that works with multiple
+(or all) of these libraries. This array API standard aims to address that
+issue, by specifying an API for the most common ways arrays are constructed
+and used.
+
+Why not simply pick an existing API and bless that as the standard? In short,
+because there are often good reasons for the current inconsistencies between
+libraries. The most obvious candidate for that existing API is NumPy. However,
+NumPy was not designed with non-CPU devices, graph-based libraries, or JIT
+compilers in mind. Other libraries often deviate from NumPy for good
+(necessary) reasons. Choices made in this API standard are often the same
+ones NumPy makes, or close to it, but are different where necessary to make
+sure all existing array libraries can adopt this API.
+
 ### This API standard
 
+This document aims to standardize functionality that exists in most/all array
+libraries and either is commonly used or is needed for
+consistency/completeness. Usage is determined via analysis of downstream
+libraries, see :ref:`usage-data`. An example of consistency is: there are
+functional equivalents for all Python operators (including the rarely used
+ones).
+
+Beyond usage and consistency, there's a set of use cases that inform the API
+design to ensure it's fit for a wide range of users and situations - see
+:ref:`use-cases`.
+
+A question that may arise when reading this document is: _"what about
+functionality that's not present in this document?"_ This:
+
+- means that there is no guarantee the functionality is present in libraries
+  adhering to the standard
+- does _not_ mean that that functionality is unimportant
+- may indicate that that functionality, if present in a particular array
+  library, is unlikely to be present in all other libraries
 
 
-## History
+.. note::
+    This document is ready for wider community review, but still contains a
+    number of TODOs, and is expected to change and evolve before a first
+    official release. See :ref:`future-API-evolution` for proposed
+    versioning.
+
+
+### History
+
+The first library for numerical and scientific computing in Python was
+Numeric, developed in the mid-1990s. In the early 2000s a second, similar
+library, Numarray, was created. In 2005 NumPy was written, superseding both
+Numeric and Numarray and resolving the fragmentation at that time. For
+roughly a decade, NumPy was the only widely used array library. Over the past
+~5 years, mainly due to the emergence of new hardware and the rise of deep
+learning, many other libraries have appeared, leading to more severe
+fragmentation. Concepts and APIs in newer libraries were often inspired by
+(or copied from) those in older ones - and then changed or improved upon to
+fit new needs and use cases. Individual library authors discussed ideas;
+however, there was never (before this array API standard) a serious attempt
+to coordinate between all libraries to avoid fragmentation and arrive at a
+common API standard.
+
+The idea for this array API standard grew gradually out of many conversations
+between maintainers during 2019-2020. It quickly became clear that any
+attempt to write a new "reference library" to fix the current fragmentation
+was infeasible - unlike in 2005, there are now too many different use cases
+and too many stakeholders, and the speed of innovation is too high. In May
+2020 an initial group of maintainers was assembled in the [Consortium for
+Python Data API Standards](https://data-apis.org/) to start drafting a
+specification for an array API that could be adopted by each of the existing
+array and tensor libraries. That resulted in this document, describing that
+API.
 
 
 ## Scope (includes out-of-scope / non-goals)
 
@@ -306,44 +380,34 @@ For the purposes of this specification, the following terms and definitions appl
 
 
 
-### array
-
+**array**:
 a (usually fixed-size) multidimensional container of items of the same type and size.
 
-### axis
-
+**axis**:
 an array dimension.
 
-### broadcast
-
+**broadcast**:
 automatic (implicit) expansion of array dimensions to be of equal sizes without copying array data for the purpose of making arrays with different shapes have compatible shapes for element-wise operations.
 
-### compatible
-
+**compatible**:
 two arrays whose dimensions are compatible (i.e., where the size of each dimension in one array is either equal to one or to the size of the corresponding dimension in a second array).
 
-### element-wise
-
+**element-wise**:
 an operation performed element-by-element, in which individual array elements are considered in isolation and independently of other elements within the same array.
 
-### matrix
-
+**matrix**:
 a two-dimensional array.
 
-### rank
-
+**rank**:
 number of array dimensions (not to be confused with the number of linearly independent columns of a matrix).
 
-### shape
-
+**shape**:
 a tuple of `N` non-negative integers that specify the sizes of each dimension and where `N` corresponds to the number of dimensions.
 
-### singleton dimension
-
+**singleton dimension**:
 a dimension whose size is one.
 
-### vector
-
+**vector**:
 a one-dimensional array.
 
 * * *
diff --git a/spec/usage_data.md b/spec/usage_data.md
index 8678e5968..5b7482929 100644
--- a/spec/usage_data.md
+++ b/spec/usage_data.md
@@ -1,3 +1,5 @@
+.. _usage-data:
+
 # Usage Data
 
 > Summary of existing array API design and usage.
diff --git a/spec/use_cases.md b/spec/use_cases.md
index 7a696ef17..9562d9935 100644
--- a/spec/use_cases.md
+++ b/spec/use_cases.md
@@ -1,3 +1,5 @@
+.. _use-cases:
+
 # Use cases
 
 Use cases inform the requirements for, and design choices made in, this array
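The **broadcast**, **compatible**, and **singleton dimension** glossary entries in the `purpose_and_scope.md` hunk above can be illustrated with a minimal pure-Python sketch that computes the shape two arrays broadcast to. It assumes NumPy-style alignment: shapes are compared starting from the trailing (rightmost) dimension, and a missing leading dimension counts as size one. The glossary wording does not spell this alignment out. The function name `broadcast_shape` is hypothetical and not part of the standard.

```python
from itertools import zip_longest


def broadcast_shape(shape_a: tuple[int, ...], shape_b: tuple[int, ...]) -> tuple[int, ...]:
    """Return the shape that two compatible shapes broadcast to.

    Hypothetical helper: assumes NumPy-style alignment, i.e. shapes are
    compared from the trailing dimension, and a missing leading dimension
    is treated as a singleton (size-one) dimension.
    Raises ValueError if the shapes are not compatible.
    """
    result = []
    # Walk both shapes right-to-left; fillvalue=1 pads the shorter shape
    # with singleton dimensions.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            # Sizes match, or b is a singleton that expands to a.
            result.append(a)
        elif a == 1:
            # a is a singleton that expands to b.
            result.append(b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    # Dimensions were collected right-to-left, so restore the original order.
    return tuple(reversed(result))
```

For example, `broadcast_shape((3, 1), (1, 4))` returns `(3, 4)` because each singleton dimension is expanded to match the other array, while `broadcast_shape((2, 3), (2,))` raises `ValueError` since the trailing sizes 3 and 2 are neither equal nor one. NumPy offers comparable logic as `numpy.broadcast_shapes` (NumPy 1.20 and later).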