From b5637944a0fda796f315b6d4046fae11141151a5 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Fri, 5 Nov 2021 11:26:03 -0400
Subject: [PATCH 1/5] add unique_counts and a few fixes

---
 spec/API_specification/set_functions.md | 53 ++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/spec/API_specification/set_functions.md b/spec/API_specification/set_functions.md
index 74408b0b1..38d948796 100644
--- a/spec/API_specification/set_functions.md
+++ b/spec/API_specification/set_functions.md
@@ -21,7 +21,7 @@ A conforming implementation of the array API standard must provide and support t
 The shapes of two of the output arrays for this function depend on the data values in the input array; hence, array libraries which build computation graphs (e.g., JAX, Dask, etc.) may find this function difficult to implement without knowing array values. Accordingly, such libraries may choose to omit this function. See {ref}`data-dependent-output-shapes` section for more details.
 :::
 
-Returns the unique elements of an input array `x`.
+Returns the unique elements of an input array `x`, the first occurring indices for each unique element in `x`, the indices from the set of unique elements that reconstruct `x`, and the corresponding counts for each unique element in `x`.
 
 ```{note}
 Uniqueness should be determined based on value equality (i.e., `x_i == x_j`). For input arrays having floating-point data types, value-based equality implies the following behavior.
@@ -47,9 +47,50 @@ Each `nan` value should have a count of one, while the counts for signed zeros s
     - a namedtuple `(values, indices, inverse_indices, counts)` whose
 
         - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
-        - second element must have the field name `indices` and must be an array containing the indices (first occurrences) of `x` that result in `values`.
The array must have the same shape as `values` and must have the default integer data type.
-        - third element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and must have the default integer data type.
-        - fourth element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type.
+        - second element must have the field name `indices` and must be an array containing the indices (first occurrences) of `x` that result in `values`. The array must have the same shape as `values` and must have the default integer data type (or promoted if needed).
+        - third element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and must have the default integer data type (or promoted if needed).
+        - fourth element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type (or promoted if needed).
+
+    ```{note}
+    The order of unique elements is not specified and may vary between implementations.
+    ```
+
+(function-unique-count)=
+### unique_count(x, /)
+
+:::{admonition} Data-dependent output shape
+:class: important
+
+The shapes of two of the output arrays for this function depend on the data values in the input array; hence, array libraries which build computation graphs (e.g., JAX, Dask, etc.) may find this function difficult to implement without knowing array values. Accordingly, such libraries may choose to omit this function. See {ref}`data-dependent-output-shapes` section for more details.
+:::
+
+Returns the unique elements of an input array `x` and the corresponding counts for each unique element in `x`.
+
+```{note}
+Uniqueness should be determined based on value equality (i.e., `x_i == x_j`). For input arrays having floating-point data types, value-based equality implies the following behavior.
+
+- As `nan` values compare as `False`, `nan` values should be considered distinct.
+- As `-0` and `+0` compare as `True`, signed zeros should not be considered distinct, and the corresponding unique element will be implementation-dependent (e.g., an implementation could choose to return `-0` if `-0` occurs before `+0`).
+
+As signed zeros are not distinct, using `inverse_indices` to reconstruct the input array is not guaranteed to return an array having the exact same values.
+
+Each `nan` value should have a count of one, while the counts for signed zeros should be aggregated as a single count.
+```
+
+#### Parameters
+
+- **x**: _<array>_
+
+    - input array. If `x` has more than one dimension, the function must flatten `x` and return the unique elements of the flattened array.
+
+#### Returns
+
+- **out**: _Tuple\[ <array>, <array>, <array>, <array> ]_
+
+    - a namedtuple `(values, counts)` whose
+
+        - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
+        - second element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type (or promoted if needed).
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.
@@ -88,7 +129,7 @@ As signed zeros are not distinct, using `inverse_indices` to reconstruct the inp
     - a namedtuple `(values, inverse_indices)` whose
 
         - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
-        - second element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and have the default integer data type.
+        - second element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and have the default integer data type (or promoted if needed).
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.
@@ -126,4 +167,4 @@ Uniqueness should be determined based on value equality (i.e., `x_i == x_j`). Fo
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.
-    ```
\ No newline at end of file
+    ```

From 83f3c2bae28ea8079d99115c30220cf2d46444f2 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Fri, 5 Nov 2021 11:45:25 -0400
Subject: [PATCH 2/5] fix typo

---
 spec/API_specification/set_functions.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spec/API_specification/set_functions.md b/spec/API_specification/set_functions.md
index 38d948796..d6915d308 100644
--- a/spec/API_specification/set_functions.md
+++ b/spec/API_specification/set_functions.md
@@ -55,8 +55,8 @@ Each `nan` value should have a count of one, while the counts for signed zeros s
     The order of unique elements is not specified and may vary between implementations.
     ```
 
-(function-unique-count)=
-### unique_count(x, /)
+(function-unique-counts)=
+### unique_counts(x, /)
 
 :::{admonition} Data-dependent output shape
 :class: important

From f1557a1800b8d5b8c81e5f042c450a5983576627 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Sun, 7 Nov 2021 23:48:02 -0500
Subject: [PATCH 3/5] defer the discussion on index type promotion to another PR

---
 spec/API_specification/set_functions.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/spec/API_specification/set_functions.md b/spec/API_specification/set_functions.md
index d6915d308..d36222664 100644
--- a/spec/API_specification/set_functions.md
+++ b/spec/API_specification/set_functions.md
@@ -47,9 +47,9 @@ Each `nan` value should have a count of one, while the counts for signed zeros s
     - a namedtuple `(values, indices, inverse_indices, counts)` whose
 
         - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
-        - second element must have the field name `indices` and must be an array containing the indices (first occurrences) of `x` that result in `values`. The array must have the same shape as `values` and must have the default integer data type (or promoted if needed).
-        - third element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and must have the default integer data type (or promoted if needed).
-        - fourth element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type (or promoted if needed).
+        - second element must have the field name `indices` and must be an array containing the indices (first occurrences) of `x` that result in `values`.
The array must have the same shape as `values` and must have the default integer data type.
+        - third element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and must have the default integer data type.
+        - fourth element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type.
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.
@@ -90,7 +90,7 @@ Each `nan` value should have a count of one, while the counts for signed zeros s
     - a namedtuple `(values, counts)` whose
 
         - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
-        - second element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type (or promoted if needed).
+        - second element must have the field name `counts` and must be an array containing the number of times each unique element occurs in `x`. The returned array must have same shape as `values` and must have the default integer data type.
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.
@@ -129,7 +129,7 @@ As signed zeros are not distinct, using `inverse_indices` to reconstruct the inp
     - a namedtuple `(values, inverse_indices)` whose
 
         - first element must have the field name `values` and must be an array containing the unique elements of `x`. The array must have the same data type as `x`.
-        - second element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`.
The array must have the same shape as `x` and have the default integer data type (or promoted if needed).
+        - second element must have the field name `inverse_indices` and must be an array containing the indices of `values` that reconstruct `x`. The array must have the same shape as `x` and have the default integer data type.
 
     ```{note}
     The order of unique elements is not specified and may vary between implementations.

From 7a87583b8ab5d491d35e2c5a3fb17bebcdbdfa06 Mon Sep 17 00:00:00 2001
From: Athan
Date: Sun, 7 Nov 2021 22:45:53 -0800
Subject: [PATCH 4/5] Update note

---
 spec/API_specification/set_functions.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/spec/API_specification/set_functions.md b/spec/API_specification/set_functions.md
index d36222664..7c7fc583a 100644
--- a/spec/API_specification/set_functions.md
+++ b/spec/API_specification/set_functions.md
@@ -72,8 +72,6 @@ Uniqueness should be determined based on value equality (i.e., `x_i == x_j`). Fo
 - As `nan` values compare as `False`, `nan` values should be considered distinct.
 - As `-0` and `+0` compare as `True`, signed zeros should not be considered distinct, and the corresponding unique element will be implementation-dependent (e.g., an implementation could choose to return `-0` if `-0` occurs before `+0`).
 
-As signed zeros are not distinct, using `inverse_indices` to reconstruct the input array is not guaranteed to return an array having the exact same values.
-
 Each `nan` value should have a count of one, while the counts for signed zeros should be aggregated as a single count.
 ```
 

From 0b293bd352fdbc85bad47931125fddf236134b86 Mon Sep 17 00:00:00 2001
From: Athan
Date: Sun, 7 Nov 2021 22:46:43 -0800
Subject: [PATCH 5/5] Update type annotation

---
 spec/API_specification/set_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec/API_specification/set_functions.md b/spec/API_specification/set_functions.md
index 7c7fc583a..c45f5bd54 100644
--- a/spec/API_specification/set_functions.md
+++ b/spec/API_specification/set_functions.md
@@ -83,7 +83,7 @@ Each `nan` value should have a count of one, while the counts for signed zeros s
 
 #### Returns
 
-- **out**: _Tuple\[ <array>, <array>, <array>, <array> ]_
+- **out**: _Tuple\[ <array>, <array> ]_
 
     - a namedtuple `(values, counts)` whose
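
Review note: the `nan` and signed-zero rules that this series attaches to `unique_counts` can be checked with a small sketch. It is plain Python over a flat list, not an array-API implementation. The names `UniqueCountsResult` and `unique_counts_sketch` are hypothetical, and the first-occurrence ordering is only one choice, since the spec text leaves the order of unique elements unspecified.

```python
from collections import namedtuple

# Hypothetical result type: the field names follow the patch, the class name is illustrative.
UniqueCountsResult = namedtuple("UniqueCountsResult", ["values", "counts"])


def unique_counts_sketch(xs):
    """Sketch of the value-equality rules over a flat Python list of numbers."""
    values = []
    counts = []
    for x in xs:
        for i, v in enumerate(values):
            # nan == nan is False, so every nan falls through to the else branch.
            # -0.0 == 0.0 is True, so signed zeros share a single entry.
            if v == x:
                counts[i] += 1
                break
        else:
            # No equal value seen so far: record a new unique element.
            values.append(x)
            counts.append(1)
    return UniqueCountsResult(values, counts)


result = unique_counts_sketch([1.0, -0.0, 0.0, float("nan"), 1.0, float("nan")])
print(result.counts)
```

In this sketch each `nan` gets its own entry with a count of one, and the two zeros are aggregated under whichever sign occurred first, which matches the implementation-dependent choice described in the note.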