From 3ea6134ff56d9a44c64dd23d4a2828ba01f75493 Mon Sep 17 00:00:00 2001 From: Athan Reines Date: Thu, 10 Dec 2020 02:19:09 -0700 Subject: [PATCH 1/5] Add statistical methods --- .../statistical_methods.md | 181 ++++++++++++++++++ 1 file changed, 181 insertions(+) create mode 100644 spec/06_API_specification/statistical_methods.md diff --git a/spec/06_API_specification/statistical_methods.md b/spec/06_API_specification/statistical_methods.md new file mode 100644 index 00000000..22a173da --- /dev/null +++ b/spec/06_API_specification/statistical_methods.md @@ -0,0 +1,181 @@ +# Statistical Methods + +> Dataframe API specification for statistical methods. + +A conforming implementation of the dataframe API standard must provide and support the following methods adhering to the following conventions. + +- Positional parameters must be [positional-only](https://www.python.org/dev/peps/pep-0570/) parameters. Positional-only parameters have no externally-usable name. When a method accepting positional-only parameters is called, positional arguments are mapped to these parameters based solely on their order. +- Optional parameters must be [keyword-only](https://www.python.org/dev/peps/pep-3102/) arguments. + +## Methods + + + +(method-max)= +### dataframe.max(/, *, axis=None) + +Calculates the maximum value. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which maximum values must be computed. If equal to `0`, the maximum values must be computed over the index. If equal to `1`, the maximum values must be computed over the columns. By default, maximum values must be computed over the index. Default: `None`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the maximum values. + +(method-mean)= +### dataframe.mean(/, *, axis=None) + +Calculates the arithmetic mean. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which arithmetic means must be computed. 
If equal to `0`, arithmetic means must be computed over the index. If equal to `1`, arithmetic means must be computed over the columns. By default, arithmetic means must be computed over the index. Default: `None`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the arithmetic means. + +(method-min)= +### dataframe.min(/, *, axis=None) + +Calculates the minimum value. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which minimum values must be computed. If equal to `0`, the minimum values must be computed over the index. If equal to `1`, the minimum values must be computed over the columns. By default, minimum values must be computed over the index. Default: `None`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the minimum values. + +(method-nlargest)= +### dataframe.nlargest(n, columns, /) + +Returns the first `n` rows having the largest values in `columns` and sorted in descending order. + +#### Parameters + +- **n**: _int_ + + - Number of rows. + +- **columns**: _Any_ + + - Column label(s) to order by. If provided a list of column labels, the first list element (label) determines the first `n` rows, the second label orders the `n` rows, the third label orders ties for the first two labels, and so on and so forth. In other words, ordering by label is applied sequentially. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the first `n` rows. + +(method-nsmallest)= +### dataframe.nsmallest(n, columns, /) + +Returns the first `n` rows having the smallest values in `columns` and sorted in ascending order. + +#### Parameters + +- **n**: _int_ + + - Number of rows. + +- **columns**: _Any_ + + - Column label(s) to order by. If provided a list of column labels, the first list element (label) determines the first `n` rows, the second label orders the `n` rows, the third label orders ties for the first two labels, and so on and so forth. 
In other words, ordering by label is applied sequentially. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the first `n` rows. + +(method-prod)= +### dataframe.prod(/, *, axis=None) + +Calculates the product. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which products must be computed. If equal to `0`, the products must be computed over the index. If equal to `1`, the products must be computed over the columns. By default, products must be computed over the index. Default: `None`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the products. + +(method-std)= +### dataframe.std(/, *, axis=None, correction=1.0) + +Calculates the standard deviation. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which standard deviations must be computed. If equal to `0`, standard deviations must be computed over the index. If equal to `1`, standard deviations must be computed over the columns. By default, standard deviations must be computed over the index. Default: `None`. + +- **correction**: _Union\[ int, float ]_ + + - degrees of freedom adjustment. Setting this parameter to a value other than `0` has the effect of adjusting the divisor during the calculation of the standard deviation according to `N-c` where `N` corresponds to the total number of elements over which the standard deviation is computed and `c` corresponds to the provided degrees of freedom adjustment. When computing the standard deviation of a population, setting this parameter to `0` is the standard choice (i.e., the provided array contains data constituting an entire population). When computing the corrected sample standard deviation, setting this parameter to `1` is the standard choice (i.e., the provided array contains data sampled from a larger population; this is commonly referred to as Bessel's correction). Default: `1.0`. 
+ +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the standard deviations. + +(method-sum)= +### dataframe.sum(/, *, axis=None) + +Calculates the sum. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which sums must be computed. If equal to `0`, the sums must be computed over the index. If equal to `1`, the sums must be computed over the columns. By default, sums must be computed over the index. Default: `None`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the sums. + +(method-var)= +### dataframe.var(/, *, axis=None, correction=1.0) + +Calculates the variance. + +#### Parameters + +- **axis**: _Optional\[ int ]_ + + - axis along which variances must be computed. If equal to `0`, variances must be computed over the index. If equal to `1`, variances must be computed over the columns. By default, variances must be computed over the index. Default: `None`. + +- **correction**: _Union\[ int, float ]_ + + - degrees of freedom adjustment. Setting this parameter to a value other than `0` has the effect of adjusting the divisor during the calculation of the variance according to `N-c` where `N` corresponds to the total number of elements over which the variance is computed and `c` corresponds to the provided degrees of freedom adjustment. When computing the variance of a population, setting this parameter to `0` is the standard choice (i.e., the provided array contains data constituting an entire population). When computing the unbiased sample variance, setting this parameter to `1` is the standard choice (i.e., the provided array contains data sampled from a larger population; this is commonly referred to as Bessel's correction). Default: `1.0`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing the variances. 
\ No newline at end of file From cca8c19896a239017b3f73199740ea5f89670703 Mon Sep 17 00:00:00 2001 From: Athan Reines Date: Thu, 10 Dec 2020 02:32:38 -0700 Subject: [PATCH 2/5] Add cumulative methods --- .../statistical_methods.md | 138 ++++++++++++++++-- 1 file changed, 129 insertions(+), 9 deletions(-) diff --git a/spec/06_API_specification/statistical_methods.md b/spec/06_API_specification/statistical_methods.md index 22a173da..c1c55917 100644 --- a/spec/06_API_specification/statistical_methods.md +++ b/spec/06_API_specification/statistical_methods.md @@ -11,13 +11,101 @@ A conforming implementation of the dataframe API standard must provide and suppo +(method-cummax)= +### cummax(x, /, *, skipna=True) + +Calculates the cumulative maximum value. + +#### Parameters + +- **x**: _<dataframe>_ + + - dataframe instance. + +- **skipna**: _Optional\[ bool ]_ + + - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing cumulative maximum values. The returned dataframe must have the same size as `x`. + +(method-cummin)= +### cummin(x, /, *, skipna=True) + +Calculates the cumulative minimum value. + +#### Parameters + +- **x**: _<dataframe>_ + + - dataframe instance. + +- **skipna**: _Optional\[ bool ]_ + + - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing cumulative minimum values. The returned dataframe must have the same size as `x`. + +(method-cumsum)= +### cummax(x, /, *, skipna=True) + +Calculates the cumulative sum. + +#### Parameters + +- **x**: _<dataframe>_ + + - dataframe instance. + +- **skipna**: _Optional\[ bool ]_ + + - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing cumulative sums. 
The returned dataframe must have the same size as `x`. + +(method-cumprod)= +### cumprod(x, /, *, skipna=True) + +Calculates the cumulative product. + +#### Parameters + +- **x**: _<dataframe>_ + + - dataframe instance. + +- **skipna**: _Optional\[ bool ]_ + + - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. + +#### Returns + +- **out**: _<dataframe>_ + + - dataframe containing cumulative products. The returned dataframe must have the same size as `x`. + (method-max)= -### dataframe.max(/, *, axis=None) +### max(x, /, *, axis=None) Calculates the maximum value. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which maximum values must be computed. If equal to `0`, the maximum values must be computed over the index. If equal to `1`, the maximum values must be computed over the columns. By default, maximum values must be computed over the index. Default: `None`. @@ -29,12 +117,16 @@ Calculates the maximum value. - dataframe containing the maximum values. (method-mean)= -### dataframe.mean(/, *, axis=None) +### mean(x, /, *, axis=None) Calculates the arithmetic mean. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which arithmetic means must be computed. If equal to `0`, arithmetic means must be computed over the index. If equal to `1`, arithmetic means must be computed over the columns. By default, arithmetic means must be computed over the index. Default: `None`. @@ -46,12 +138,16 @@ Calculates the arithmetic mean. - dataframe containing the arithmetic means. (method-min)= -### dataframe.min(/, *, axis=None) +### min(x, /, *, axis=None) Calculates the minimum value. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which minimum values must be computed. If equal to `0`, the minimum values must be computed over the index. 
If equal to `1`, the minimum values must be computed over the columns. By default, minimum values must be computed over the index. Default: `None`. @@ -63,12 +159,16 @@ Calculates the minimum value. - dataframe containing the minimum values. (method-nlargest)= -### dataframe.nlargest(n, columns, /) +### nlargest(x, n, columns, /) Returns the first `n` rows having the largest values in `columns` and sorted in descending order. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **n**: _int_ - Number of rows. @@ -84,12 +184,16 @@ Returns the first `n` rows having the largest values in `columns` and sorted in - dataframe containing the first `n` rows. (method-nsmallest)= -### dataframe.nsmallest(n, columns, /) +### nsmallest(x, n, columns, /) Returns the first `n` rows having the smallest values in `columns` and sorted in ascending order. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **n**: _int_ - Number of rows. @@ -105,12 +209,16 @@ Returns the first `n` rows having the smallest values in `columns` and sorted in - dataframe containing the first `n` rows. (method-prod)= -### dataframe.prod(/, *, axis=None) +### prod(x, /, *, axis=None) Calculates the product. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which products must be computed. If equal to `0`, the products must be computed over the index. If equal to `1`, the products must be computed over the columns. By default, products must be computed over the index. Default: `None`. @@ -122,12 +230,16 @@ Calculates the product. - dataframe containing the products. (method-std)= -### dataframe.std(/, *, axis=None, correction=1.0) +### std(x, /, *, axis=None, correction=1.0) Calculates the standard deviation. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which standard deviations must be computed. 
If equal to `0`, standard deviations must be computed over the index. If equal to `1`, standard deviations must be computed over the columns. By default, standard deviations must be computed over the index. Default: `None`. @@ -143,12 +255,16 @@ Calculates the standard deviation. - dataframe containing the standard deviations. (method-sum)= -### dataframe.sum(/, *, axis=None) +### sum(x, /, *, axis=None) Calculates the sum. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which sums must be computed. If equal to `0`, the sums must be computed over the index. If equal to `1`, the sums must be computed over the columns. By default, sums must be computed over the index. Default: `None`. @@ -160,12 +276,16 @@ Calculates the sum. - dataframe containing the sums. (method-var)= -### dataframe.var(/, *, axis=None, correction=1.0) +### var(x, /, *, axis=None, correction=1.0) Calculates the variance. #### Parameters +- **x**: _<dataframe>_ + + - dataframe instance. + - **axis**: _Optional\[ int ]_ - axis along which variances must be computed. If equal to `0`, variances must be computed over the index. If equal to `1`, variances must be computed over the columns. By default, variances must be computed over the index. Default: `None`. From 70e6a8a6de4e91f6bbd95519797d73f932ae4d1d Mon Sep 17 00:00:00 2001 From: Athan Reines Date: Thu, 10 Dec 2020 03:02:25 -0700 Subject: [PATCH 3/5] Update spec --- spec/06_API_specification/statistical_methods.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/spec/06_API_specification/statistical_methods.md b/spec/06_API_specification/statistical_methods.md index c1c55917..65e7e9fa 100644 --- a/spec/06_API_specification/statistical_methods.md +++ b/spec/06_API_specification/statistical_methods.md @@ -22,7 +22,7 @@ Calculates the cumulative maximum value. - dataframe instance. 
-- **skipna**: _Optional\[ bool ]_ +- **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -43,7 +43,7 @@ Calculates the cumulative minimum value. - dataframe instance. -- **skipna**: _Optional\[ bool ]_ +- **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -64,7 +64,7 @@ Calculates the cumulative sum. - dataframe instance. -- **skipna**: _Optional\[ bool ]_ +- **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -85,7 +85,7 @@ Calculates the cumulative product. - dataframe instance. -- **skipna**: _Optional\[ bool ]_ +- **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. From 2cc42362f996951b4fbbfef4fa1ee6a8f8dbffe2 Mon Sep 17 00:00:00 2001 From: Athan Reines Date: Thu, 10 Dec 2020 03:03:14 -0700 Subject: [PATCH 4/5] Fix method name --- spec/06_API_specification/statistical_methods.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/06_API_specification/statistical_methods.md b/spec/06_API_specification/statistical_methods.md index 65e7e9fa..8695f54c 100644 --- a/spec/06_API_specification/statistical_methods.md +++ b/spec/06_API_specification/statistical_methods.md @@ -54,7 +54,7 @@ Calculates the cumulative minimum value. - dataframe containing cumulative minimum values. The returned dataframe must have the same size as `x`. (method-cumsum)= -### cummax(x, /, *, skipna=True) +### cumsum(x, /, *, skipna=True) Calculates the cumulative sum. 
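Note on the cumulative methods as specified up to this patch: the text says only that `skipna` excludes `NA` and `null` values, that an all-`NA` column yields `NA`, and that the result has the same size as the input. As a rough sketch of one possible reading, the following works on a single column held in a plain Python list, with `None` standing in for `NA`. The helper name `cumsum_column` is hypothetical and not part of the specification. The per-position behaviour (an `NA` input stays `NA` in the output, and with `skipna=False` every later position becomes `NA`) is an assumption that follows the convention used by pandas.

```python
def cumsum_column(values, skipna=True):
    """Cumulative sum of one column; None stands in for NA (illustrative only)."""
    out = []
    total = 0
    seen_na = False
    for v in values:
        if v is None:
            # An NA input position stays NA in the output (assumed convention).
            seen_na = True
            out.append(None)
        elif seen_na and not skipna:
            # Without skipna, an earlier NA propagates to all later positions.
            out.append(None)
        else:
            total += v
            out.append(total)
    return out


column = [1, None, 2, 3]
skipped = cumsum_column(column)                    # NA values are skipped
propagated = cumsum_column(column, skipna=False)   # NA values propagate
all_na = cumsum_column([None, None])               # entire column is NA
```

Under this reading the output always has the same length as the input, which matches the "same size" requirement in the Returns sections.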
From 60242c1b3b6d147732f626efbf4a4ccdba569f81 Mon Sep 17 00:00:00 2001 From: Athan Reines Date: Thu, 7 Jan 2021 01:39:37 -0800 Subject: [PATCH 5/5] Update signatures --- spec/API_specification/statistical_methods.md | 86 ++++--------------- 1 file changed, 17 insertions(+), 69 deletions(-) diff --git a/spec/API_specification/statistical_methods.md b/spec/API_specification/statistical_methods.md index 8695f54c..348cd29b 100644 --- a/spec/API_specification/statistical_methods.md +++ b/spec/API_specification/statistical_methods.md @@ -12,16 +12,12 @@ A conforming implementation of the dataframe API standard must provide and suppo (method-cummax)= -### cummax(x, /, *, skipna=True) +### cummax(/, *, skipna=True) Calculates the cumulative maximum value. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -30,19 +26,15 @@ Calculates the cumulative maximum value. - **out**: _<dataframe>_ - - dataframe containing cumulative maximum values. The returned dataframe must have the same size as `x`. + - dataframe containing cumulative maximum values. The returned dataframe must have the same size as the original dataframe instance. (method-cummin)= -### cummin(x, /, *, skipna=True) +### cummin(/, *, skipna=True) Calculates the cumulative minimum value. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -51,19 +43,15 @@ Calculates the cumulative minimum value. - **out**: _<dataframe>_ - - dataframe containing cumulative minimum values. The returned dataframe must have the same size as `x`. + - dataframe containing cumulative minimum values. The returned dataframe must have the same size as the original dataframe instance. 
(method-cumsum)= -### cumsum(x, /, *, skipna=True) +### cumsum(/, *, skipna=True) Calculates the cumulative sum. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -72,19 +60,15 @@ Calculates the cumulative sum. - **out**: _<dataframe>_ - - dataframe containing cumulative sums. The returned dataframe must have the same size as `x`. + - dataframe containing cumulative sums. The returned dataframe must have the same size as the original dataframe instance. (method-cumprod)= -### cumprod(x, /, *, skipna=True) +### cumprod(/, *, skipna=True) Calculates the cumulative product. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **skipna**: _bool_ - Exclude `NA` and `null` values. If an entire column is `NA`, the result must be `NA`. Default: `True`. @@ -93,19 +77,15 @@ Calculates the cumulative product. - **out**: _<dataframe>_ - - dataframe containing cumulative products. The returned dataframe must have the same size as `x`. + - dataframe containing cumulative products. The returned dataframe must have the same size as the original dataframe instance. (method-max)= -### max(x, /, *, axis=None) +### max(/, *, axis=None) Calculates the maximum value. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **axis**: _Optional\[ int ]_ - axis along which maximum values must be computed. If equal to `0`, the maximum values must be computed over the index. If equal to `1`, the maximum values must be computed over the columns. By default, maximum values must be computed over the index. Default: `None`. @@ -117,16 +97,12 @@ Calculates the maximum value. - dataframe containing the maximum values. (method-mean)= -### mean(x, /, *, axis=None) +### mean(/, *, axis=None) Calculates the arithmetic mean. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. 
- - **axis**: _Optional\[ int ]_ - axis along which arithmetic means must be computed. If equal to `0`, arithmetic means must be computed over the index. If equal to `1`, arithmetic means must be computed over the columns. By default, arithmetic means must be computed over the index. Default: `None`. @@ -138,16 +114,12 @@ Calculates the arithmetic mean. - dataframe containing the arithmetic means. (method-min)= -### min(x, /, *, axis=None) +### min(/, *, axis=None) Calculates the minimum value. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **axis**: _Optional\[ int ]_ - axis along which minimum values must be computed. If equal to `0`, the minimum values must be computed over the index. If equal to `1`, the minimum values must be computed over the columns. By default, minimum values must be computed over the index. Default: `None`. @@ -159,16 +131,12 @@ Calculates the minimum value. - dataframe containing the minimum values. (method-nlargest)= -### nlargest(x, n, columns, /) +### nlargest(n, columns, /) Returns the first `n` rows having the largest values in `columns` and sorted in descending order. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **n**: _int_ - Number of rows. @@ -184,16 +152,12 @@ Returns the first `n` rows having the largest values in `columns` and sorted in - dataframe containing the first `n` rows. (method-nsmallest)= -### nsmallest(x, n, columns, /) +### nsmallest(n, columns, /) Returns the first `n` rows having the smallest values in `columns` and sorted in ascending order. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **n**: _int_ - Number of rows. @@ -209,16 +173,12 @@ Returns the first `n` rows having the smallest values in `columns` and sorted in - dataframe containing the first `n` rows. (method-prod)= -### prod(x, /, *, axis=None) +### prod(/, *, axis=None) Calculates the product. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. 
- - **axis**: _Optional\[ int ]_ - axis along which products must be computed. If equal to `0`, the products must be computed over the index. If equal to `1`, the products must be computed over the columns. By default, products must be computed over the index. Default: `None`. @@ -230,16 +190,12 @@ Calculates the product. - dataframe containing the products. (method-std)= -### std(x, /, *, axis=None, correction=1.0) +### std(/, *, axis=None, correction=1.0) Calculates the standard deviation. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **axis**: _Optional\[ int ]_ - axis along which standard deviations must be computed. If equal to `0`, standard deviations must be computed over the index. If equal to `1`, standard deviations must be computed over the columns. By default, standard deviations must be computed over the index. Default: `None`. @@ -255,16 +211,12 @@ Calculates the standard deviation. - dataframe containing the standard deviations. (method-sum)= -### sum(x, /, *, axis=None) +### sum(/, *, axis=None) Calculates the sum. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **axis**: _Optional\[ int ]_ - axis along which sums must be computed. If equal to `0`, the sums must be computed over the index. If equal to `1`, the sums must be computed over the columns. By default, sums must be computed over the index. Default: `None`. @@ -276,16 +228,12 @@ Calculates the sum. - dataframe containing the sums. (method-var)= -### var(x, /, *, axis=None, correction=1.0) +### var(/, *, axis=None, correction=1.0) Calculates the variance. #### Parameters -- **x**: _<dataframe>_ - - - dataframe instance. - - **axis**: _Optional\[ int ]_ - axis along which variances must be computed. If equal to `0`, variances must be computed over the index. If equal to `1`, variances must be computed over the columns. By default, variances must be computed over the index. Default: `None`.
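
Note on `correction` for `std` and `var`: the specification describes the divisor as `N-c`, where `c` is the degrees of freedom adjustment. The arithmetic can be sketched for a single column as below. This is a minimal illustration using plain Python lists rather than a dataframe. The helper names `var_with_correction` and `std_with_correction` are hypothetical and not part of the specification, and raising an error when `N-c` is not positive is an assumption, since the specification does not say what should happen in that case.

```python
import math


def var_with_correction(values, correction=1.0):
    """Variance with divisor N - c (hypothetical helper, not part of the spec)."""
    n = len(values)
    if n - correction <= 0:
        # Assumed behaviour; the specification does not define this case.
        raise ValueError("N - correction must be positive")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - correction)


def std_with_correction(values, correction=1.0):
    """Standard deviation as the square root of the corrected variance."""
    return math.sqrt(var_with_correction(values, correction))


column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_var = var_with_correction(column, correction=0)  # divisor N
sample_var = var_with_correction(column, correction=1)      # divisor N - 1
population_std = std_with_correction(column, correction=0)
```

With `correction=0` the sum of squared deviations (32 for this column) is divided by `N = 8`. With `correction=1` it is divided by `N - 1 = 7`, which is Bessel's correction.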