- [Intel(R) Extension for Scikit-learn* support](#intelr-extension-for-scikit-learn-support)
- [Algorithms parameters](#algorithms-parameters)

## How to create conda environment for benchmarking

Create a suitable conda environment for each framework to test. Each item in the list below links to instructions to create an appropriate conda environment for the framework.

Run `python runner.py --configs configs/config_example.json [--output-file result.json --verbose INFO --report]` to launch benchmarks.

Options:

- ``--configs``: specify the path to a configuration file.
- ``--no-intel-optimized``: use Scikit-learn without [Intel(R) Extension for Scikit-learn*](#intelr-extension-for-scikit-learn-support). Now available for [scikit-learn benchmarks](https://github.com/IntelPython/scikit-learn_bench/tree/master/sklearn_bench). By default, the runner uses Intel(R) Extension for Scikit-learn.
- ``--output-file``: specify the name of the output file for the benchmark results. The default name is `result.json`.
- ``--report``: create an Excel report based on benchmark results. The `openpyxl` library is required.
- ``--dummy-run``: run the configuration parser and dataset generation without running the benchmarks.
- ``--verbose``: set the logging level: *WARNING*, *INFO*, or *DEBUG*. The default is *INFO*.

| Level | Description |
|-----------|---------------|
|*DEBUG*| Detailed information, typically of interest only when diagnosing problems. |
|*INFO*| Confirmation that things are working as expected. |
|*WARNING*| An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. |

Benchmarks currently support the following frameworks:

- **scikit-learn**
- **daal4py**
- **cuml**
- **xgboost**

The configuration of benchmarks allows you to select the frameworks to run, select datasets for measurements and configure the parameters of the algorithms.
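
For illustration only, a minimal configuration might look like the sketch below. The field names follow the configuration schema described later in this document; the `n-clusters` entry is a placeholder for whatever parameters the chosen benchmark accepts, so check the bundled `configs/config_example.json` for the exact format.

```json
{
  "common": {
    "data-format": ["numpy"],
    "data-order": ["C"],
    "dtype": ["float64"]
  },
  "cases": [
    {
      "lib": "sklearn",
      "algorithm": "kmeans",
      "n-clusters": [10],
      "dataset": [
        {
          "source": "synthetic",
          "type": "blobs",
          "n_clusters": 10,
          "n_features": 50,
          "training": { "n_samples": 100000 }
        }
      ]
    }
  ]
}
```
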
When you run scikit-learn benchmarks on CPU, [Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex) is used by default. Use the ``--no-intel-optimized`` option to run the benchmarks without the extension.

The following benchmarks have GPU support:

- dbscan
- kmeans
- linear
- log_reg

You may use the [configuration file for these benchmarks](https://github.com/IntelPython/scikit-learn_bench/blob/master/configs/skl_xpu_config.json) to run them on both CPU and GPU.
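
For example, to request both CPU and GPU runs in your own configuration, you could set the `device` field (documented in the configuration schema below) in the common section, roughly as in this fragment:

```json
{
  "common": {
    "device": ["cpu", "gpu"]
  }
}
```
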

## Algorithms parameters

You can launch benchmarks for each algorithm separately.
To do this, go to the directory with the benchmark:

```bash
cd <framework>
```

Run the following command:

```bash
python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters>
```

You can find the list of supported parameters for each algorithm here:

Configure benchmarks by editing the `config.json` file.
You can configure some algorithm parameters, datasets, a list of frameworks to use, and the usage of some environment variables.
Refer to the tables below for descriptions of all fields in the configuration file.

- [Training Object](#training-object)
- [Testing Object](#testing-object)

## Root Config Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
|common|[Common Object](#common-object)|**REQUIRED** Common benchmark settings: frameworks and input data settings. |
|cases|List[[Case Object](#case-object)]|**REQUIRED** List of algorithms, their parameters, and training data. |
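
Putting the two required fields together, a root config object is simply a `common` block plus a list of `cases`. A minimal skeleton (with the dataset list left empty here; the nested objects are described below) might look like this:

```json
{
  "common": { "data-format": "numpy", "data-order": "C", "dtype": "float64" },
  "cases": [
    { "lib": "sklearn", "algorithm": "dbscan", "dataset": [] }
  ]
}
```
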

## Common Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
|data-format| Union[str, List[str]]|**REQUIRED** Input data format: *numpy*, *pandas*, or *cudf*. |
|data-order| Union[str, List[str]]|**REQUIRED** Input data order: *C* (row-major, default) or *F* (column-major). |
|dtype| Union[str, List[str]]|**REQUIRED** Input data type: *float64* (default) or *float32*. |
|check-finitness| List[]| Check finiteness during the scikit-learn input check (disabled by default). |
|device| array[string]| For scikit-learn only. The list of devices to run the benchmarks on.<br/>It can be *None* (default, run on CPU without sycl context) or one of the types of sycl devices: *cpu*, *gpu*, *host*.<br/>Refer to the [SYCL specification](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf) for details.|
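
As a sketch, a common object using these fields might look as follows; fields typed `Union[str, List[str]]` accept either a single value or a list of values:

```json
{
  "common": {
    "data-format": ["numpy", "pandas"],
    "data-order": "F",
    "dtype": ["float64", "float32"],
    "device": ["cpu", "gpu"]
  }
}
```
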

## Case Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
|lib| Union[str, List[str]]|**REQUIRED** A test framework or a list of frameworks. Must be from [*sklearn*, *daal4py*, *cuml*, *xgboost*]. |
|algorithm| string |**REQUIRED** Benchmark name. |
|dataset| List[[Dataset Object](#dataset-object)]|**REQUIRED** Input data specifications. |
|benchmark parameters| List[Any]|**REQUIRED** Algorithm parameters. A list of supported parameters can be found here. |

**Important:** You can move any parameter from **"cases"** to **"common"** if this parameter is common to all cases.
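
For instance, a case that benchmarks logistic regression on two frameworks might be sketched as follows; the `maxiter` entry stands in for whatever algorithm parameters the chosen benchmark accepts (see the per-framework parameter lists):

```json
{
  "lib": ["sklearn", "cuml"],
  "algorithm": "log_reg",
  "maxiter": [100],
  "dataset": [
    {
      "source": "synthetic",
      "type": "classification",
      "n_classes": 2,
      "n_features": 100,
      "training": { "n_samples": 100000 }
    }
  ]
}
```
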

## Dataset Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
|source| string |**REQUIRED** Data source: *synthetic*, *csv*, or *npy*. |
|type| string |**REQUIRED for *synthetic* data**. The type of task for which the dataset is generated: *classification*, *blobs*, or *regression*. |
|n_classes| int | For *synthetic* data and the *classification* type only. The number of classes (or labels) of the classification problem. |
|n_clusters| int | For *synthetic* data and the *blobs* type only. The number of centers to generate. |
|n_features| int |**REQUIRED for *synthetic* data**. The number of features to generate. |
|name| string | Name of the dataset. |
|training|[Training Object](#training-object)|**REQUIRED** An object with the paths to the training datasets. |
|testing|[Testing Object](#testing-object)| An object with the paths to the testing datasets. If not provided, the training datasets are used. |
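
A dataset object loaded from files might be sketched like this (the dataset name and paths are placeholders; the training and testing objects are described below):

```json
{
  "source": "csv",
  "name": "my_dataset",
  "training": {
    "n_samples": 100000,
    "x": "data/my_dataset_x_train.csv",
    "y": "data/my_dataset_y_train.csv"
  }
}
```
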

## Training Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
| n_samples | int |**REQUIRED** The total number of training samples. |
| x | str |**REQUIRED** The path to the training samples. |
| y | str |**REQUIRED** The path to the training labels. |

## Testing Object

| Field Name | Type | Description |
| ----- | ---- |------------ |
| n_samples | int |**REQUIRED** The total number of testing samples. |
| x | str |**REQUIRED** The path to the testing samples. |
| y | str |**REQUIRED** The path to the testing labels. |
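
For file-based data, the training and testing objects mainly carry the file paths. A sketch with placeholder *npy* paths:

```json
{
  "training": {
    "n_samples": 100000,
    "x": "data/dataset_x_train.npy",
    "y": "data/dataset_y_train.npy"
  },
  "testing": {
    "n_samples": 10000,
    "x": "data/dataset_x_test.npy",
    "y": "data/dataset_y_test.npy"
  }
}
```
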