diff --git a/README.md b/README.md index 8099a0511..97bee26e1 100755 --- a/README.md +++ b/README.md @@ -27,42 +27,42 @@ We publish blogs on Medium, so [follow us](https://medium.com/intel-analytics-so ## Table of content -* [How to create conda environment for benchmarking](#how-to-create-conda-environment-for-benchmarking) -* [Running Python benchmarks with runner script](#running-python-benchmarks-with-runner-script) -* [Benchmark supported algorithms](#benchmark-supported-algorithms) -* [Intel(R) Extension for Scikit-learn* support](#intelr-extension-for-scikit-learn-support) -* [Algorithms parameters](#algorithms-parameters) +- [How to create conda environment for benchmarking](#how-to-create-conda-environment-for-benchmarking) +- [Running Python benchmarks with runner script](#running-python-benchmarks-with-runner-script) +- [Benchmark supported algorithms](#benchmark-supported-algorithms) +- [Intel(R) Extension for Scikit-learn* support](#intelr-extension-for-scikit-learn-support) +- [Algorithms parameters](#algorithms-parameters) ## How to create conda environment for benchmarking Create a suitable conda environment for each framework to test. Each item in the list below links to instructions to create an appropriate conda environment for the framework. -* [**scikit-learn**](sklearn_bench#how-to-create-conda-environment-for-benchmarking) +- [**scikit-learn**](sklearn_bench#how-to-create-conda-environment-for-benchmarking) ```bash pip install -r sklearn_bench/requirements.txt # or -conda install -c intel scikit-learn scikit-learn-intelex pandas +conda install -c intel scikit-learn scikit-learn-intelex pandas tqdm ``` -* [**daal4py**](daal4py_bench#how-to-create-conda-environment-for-benchmarking) +- [**daal4py**](daal4py_bench#how-to-create-conda-environment-for-benchmarking) ```bash -conda install -c conda-forge scikit-learn daal4py pandas +conda install -c conda-forge scikit-learn daal4py pandas tqdm ``` -* [**cuml**](cuml_bench#how-to-create-conda-environment-for-benchmarking) +- [**cuml**](cuml_bench#how-to-create-conda-environment-for-benchmarking) ```bash -conda install -c rapidsai -c conda-forge cuml pandas cudf +conda install -c rapidsai -c conda-forge cuml pandas cudf tqdm ``` -* [**xgboost**](xgboost_bench#how-to-create-conda-environment-for-benchmarking) +- [**xgboost**](xgboost_bench#how-to-create-conda-environment-for-benchmarking) ```bash pip install -r xgboost_bench/requirements.txt # or -conda install -c conda-forge xgboost pandas +conda install -c conda-forge xgboost scikit-learn pandas tqdm ``` ## Running Python benchmarks with runner script @@ -70,12 +70,13 @@ conda install -c conda-forge xgboost pandas Run `python runner.py --configs configs/config_example.json [--output-file result.json --verbose INFO --report]` to launch benchmarks. Options: -* ``--configs``: specify the path to a configuration file. -* ``--no-intel-optimized``: use Scikit-learn without [Intel(R) Extension for Scikit-learn*](#intelr-extension-for-scikit-learn-support). Now available for [scikit-learn benchmarks](https://github.com/IntelPython/scikit-learn_bench/tree/master/sklearn_bench). By default, the runner uses Intel(R) Extension for Scikit-learn. -* ``--output-file``: output file name for the benchmark result. The default name is `result.json` -* ``--report``: create an Excel report based on benchmark results. The `openpyxl` library is required. -* ``--dummy-run``: run configuration parser and dataset generation without benchmarks running. -* ``--verbose``: *WARNING*, *INFO*, *DEBUG*. 
print additional information during benchmarks running. Default is *INFO*. + +- ``--configs``: specify the path to a configuration file. +- ``--no-intel-optimized``: use Scikit-learn without [Intel(R) Extension for Scikit-learn*](#intelr-extension-for-scikit-learn-support). Now available for [scikit-learn benchmarks](https://github.com/IntelPython/scikit-learn_bench/tree/master/sklearn_bench). By default, the runner uses Intel(R) Extension for Scikit-learn. +- ``--output-file``: specify the name of the output file for the benchmark result. The default name is `result.json` +- ``--report``: create an Excel report based on benchmark results. The `openpyxl` library is required. +- ``--dummy-run``: run configuration parser and dataset generation without benchmarks running. +- ``--verbose``: *WARNING*, *INFO*, *DEBUG*. Print out additional information when the benchmarks are running. The default is *INFO*. | Level | Description | |-----------|---------------| @@ -84,10 +85,11 @@ Options: | *WARNING* | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. | Benchmarks currently support the following frameworks: -* **scikit-learn** -* **daal4py** -* **cuml** -* **xgboost** + +- **scikit-learn** +- **daal4py** +- **cuml** +- **xgboost** The configuration of benchmarks allows you to select the frameworks to run, select datasets for measurements and configure the parameters of the algorithms. @@ -117,27 +119,32 @@ The configuration of benchmarks allows you to select the frameworks to run, sele When you run scikit-learn benchmarks on CPU, [Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex) is used by default. Use the ``--no-intel-optimized`` option to run the benchmarks without the extension. The following benchmarks have a GPU support: -* dbscan -* kmeans -* linear -* log_reg + +- dbscan +- kmeans +- linear +- log_reg You may use the [configuration file for these benchmarks](https://github.com/IntelPython/scikit-learn_bench/blob/master/configs/skl_xpu_config.json) to run them on both CPU and GPU. -## Algorithms parameters +## Algorithms parameters You can launch benchmarks for each algorithm separately. To do this, go to the directory with the benchmark: - cd +```bash +cd +``` Run the following command: - python --dataset-name +```bash +python --dataset-name +``` The list of supported parameters for each algorithm you can find here: -* [**scikit-learn**](sklearn_bench#algorithms-parameters) -* [**daal4py**](daal4py_bench#algorithms-parameters) -* [**cuml**](cuml_bench#algorithms-parameters) -* [**xgboost**](xgboost_bench#algorithms-parameters) +- [**scikit-learn**](sklearn_bench#algorithms-parameters) +- [**daal4py**](daal4py_bench#algorithms-parameters) +- [**cuml**](cuml_bench#algorithms-parameters) +- [**xgboost**](xgboost_bench#algorithms-parameters) diff --git a/azure-pipelines.yml b/azure-pipelines.yml index 86db13ef6..34b1efec5 100755 --- a/azure-pipelines.yml +++ b/azure-pipelines.yml @@ -33,7 +33,7 @@ jobs: steps: - script: | conda update -y -q conda - conda create -n bench -q -y -c conda-forge python=3.7 pandas scikit-learn daal4py + conda create -n bench -q -y -c conda-forge python=3.7 pandas scikit-learn daal4py tqdm displayName: Create Anaconda environment - script: | . 
/usr/share/miniconda/etc/profile.d/conda.sh @@ -46,7 +46,7 @@ jobs: steps: - script: | conda update -y -q conda - conda create -n bench -q -y -c conda-forge python=3.7 pandas xgboost scikit-learn daal4py + conda create -n bench -q -y -c conda-forge python=3.7 pandas xgboost scikit-learn daal4py tqdm displayName: Create Anaconda environment - script: | . /usr/share/miniconda/etc/profile.d/conda.sh diff --git a/bench.py b/bench.py index 11b9e3d7c..cd26c166e 100644 --- a/bench.py +++ b/bench.py @@ -16,6 +16,7 @@ import argparse import json +import logging import sys import timeit @@ -200,15 +201,16 @@ def parse_args(parser, size=None, loop_types=(), from sklearnex import patch_sklearn patch_sklearn() except ImportError: - print('Failed to import sklearnex.patch_sklearn.' - 'Use stock version scikit-learn', file=sys.stderr) + logging.info('Failed to import sklearnex.patch_sklearn.' + 'Use stock version scikit-learn', file=sys.stderr) params.device = 'None' else: if params.device != 'None': - print('Device context is not supported for stock scikit-learn.' - 'Please use --no-intel-optimized=False with' - f'--device={params.device} parameter. Fallback to --device=None.', - file=sys.stderr) + logging.info( + 'Device context is not supported for stock scikit-learn.' + 'Please use --no-intel-optimized=False with' + f'--device={params.device} parameter. Fallback to --device=None.', + file=sys.stderr) params.device = 'None' # disable finiteness check (default) @@ -218,7 +220,7 @@ def parse_args(parser, size=None, loop_types=(), # Ask DAAL what it thinks about this number of threads num_threads = prepare_daal_threads(num_threads=params.threads) if params.verbose: - print(f'@ DAAL gave us {num_threads} threads') + logging.info(f'@ DAAL gave us {num_threads} threads') n_jobs = None if n_jobs_supported: @@ -234,7 +236,7 @@ def parse_args(parser, size=None, loop_types=(), # Very verbose output if params.verbose: - print(f'@ params = {params.__dict__}') + logging.info(f'@ params = {params.__dict__}') return params @@ -249,8 +251,8 @@ def set_daal_num_threads(num_threads): if num_threads: daal4py.daalinit(nthreads=num_threads) except ImportError: - print('@ Package "daal4py" was not found. Number of threads ' - 'is being ignored') + logging.info('@ Package "daal4py" was not found. Number of threads ' + 'is being ignored') def prepare_daal_threads(num_threads=-1): @@ -417,7 +419,7 @@ def load_data(params, generated_data=[], add_dtype=False, label_2d=False, # load and convert data from npy/csv file if path is specified if param_vars[file_arg] is not None: if param_vars[file_arg].name.endswith('.npy'): - data = np.load(param_vars[file_arg].name) + data = np.load(param_vars[file_arg].name, allow_pickle=True) else: data = read_csv(param_vars[file_arg].name, params) full_data[element] = convert_data( diff --git a/configs/README.md b/configs/README.md index 44ce2ae21..02dee119b 100644 --- a/configs/README.md +++ b/configs/README.md @@ -1,4 +1,4 @@ -## Config JSON Schema +# Config JSON Schema Configure benchmarks by editing the `config.json` file. You can configure some algorithm parameters, datasets, a list of frameworks to use, and the usage of some environment variables. @@ -11,58 +11,59 @@ Refer to the tables below for descriptions of all fields in the configuration fi - [Training Object](#training-object) - [Testing Object](#testing-object) -### Root Config Object +## Root Config Object + | Field Name | Type | Description | | ----- | ---- |------------ | -|omp_env| array[string] | For xgboost only. 
Specify an environment variable to set the number of omp threads | |common| [Common Object](#common-object)| **REQUIRED** common benchmarks setting: frameworks and input data settings | -|cases| array[[Case Object](#case-object)] | **REQUIRED** list of algorithms, their parameters and training data | +|cases| List[[Case Object](#case-object)] | **REQUIRED** list of algorithms, their parameters and training data | -### Common Object +## Common Object | Field Name | Type | Description | | ----- | ---- |------------ | -|lib| array[string] | **REQUIRED** list of test frameworks. It can be *sklearn*, *daal4py*, *cuml* or *xgboost* | -|data-format| array[string] | **REQUIRED** input data format. Data formats: *numpy*, *pandas* or *cudf* | -|data-order| array[string] | **REQUIRED** input data order. Data order: *C* (row-major, default) or *F* (column-major) | -|dtype| array[string] | **REQUIRED** input data type. Data type: *float64* (default) or *float32* | -|check-finitness| array[] | Check finiteness in sklearn input check(disabled by default) | -|device| array[string] | For scikit-learn only. The list of devices to run the benchmarks on. It can be *None* (default, run on CPU without sycl context) or one of the types of sycl devices: *cpu*, *gpu*, *host*. Refer to [SYCL specification](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf) for details| +|data-format| Union[str, List[str]] | **REQUIRED** Input data format: *numpy*, *pandas*, or *cudf*. | +|data-order| Union[str, List[str]] | **REQUIRED** Input data order: *C* (row-major, default) or *F* (column-major). | +|dtype| Union[str, List[str]] | **REQUIRED** Input data type: *float64* (default) or *float32*. | +|check-finitness| List[] | Check finiteness during scikit-learn input check (disabled by default). | +|device| array[string] | For scikit-learn only. The list of devices to run the benchmarks on.
It can be *None* (default: run on CPU without a SYCL context) or one of the SYCL device types: *cpu*, *gpu*, or *host*.
Refer to [SYCL specification](https://www.khronos.org/files/sycl/sycl-2020-reference-guide.pdf) for details.| -### Case Object +## Case Object | Field Name | Type | Description | | ----- | ---- |------------ | -|lib| array[string] | **REQUIRED** list of test frameworks. It can be *sklearn*, *daal4py*, *cuml* or *xgboost*| -|algorithm| string | **REQUIRED** benchmark name | -|dataset| array[[Dataset Object](#dataset-object)] | **REQUIRED** input data specifications. | -|benchmark parameters| array[Any] | **REQUIRED** algorithm parameters. a list of supported parameters can be found here | +|lib| Union[str, List[str]] | **REQUIRED** A test framework or a list of frameworks. Must be from [*sklearn*, *daal4py*, *cuml*, *xgboost*]. | +|algorithm| string | **REQUIRED** Benchmark file name. | +|dataset| List[[Dataset Object](#dataset-object)] | **REQUIRED** Input data specifications. | +|**specific algorithm parameters**| Union[int, float, str, List[int], List[float], List[str]] | Other algorithm-specific parameters | + +**Important:** You can move any parameter from **"cases"** to **"common"** if this parameter is common to all cases -### Dataset Object +## Dataset Object | Field Name | Type | Description | | ----- | ---- |------------ | -|source| string | **REQUIRED** data source. It can be *synthetic* or *csv* | -|type| string | **REQUIRED** for synthetic data only. The type of task for which the dataset is generated. It can be *classification*, *blobs* or *regression* | +|source| string | **REQUIRED** Data source: *synthetic*, *csv*, or *npy*. | +|type| string | **REQUIRED for synthetic data**. The type of task for which the dataset is generated: *classification*, *blobs*, or *regression*. | |n_classes| int | For *synthetic* data and for *classification* type only. The number of classes (or labels) of the classification problem | |n_clusters| int | For *synthetic* data and for *blobs* type only. The number of centers to generate | -|n_features| int | **REQUIRED** For *synthetic* data only. The number of features to generate | -|name| string | Name of dataset | -|training| [Training Object](#training-object) | **REQUIRED** algorithm parameters. a list of supported parameters can be found here | -|testing| [Testing Object](#testing-object) | **REQUIRED** algorithm parameters. a list of supported parameters can be found here | +|n_features| int | **REQUIRED for *synthetic* data**. The number of features to generate. | +|name| string | Name of the dataset. | +|training| [Training Object](#training-object) | **REQUIRED** An object with the paths to the training datasets. | +|testing| [Testing Object](#testing-object) | An object with the paths to the testing datasets. If not provided, the training datasets are used. 
| -### Training Object +## Training Object | Field Name | Type | Description | | ----- | ---- |------------ | -| n_samples | int | The total number of the training points | -| x | str | The path to the training samples | -| y | str | The path to the training labels | +| n_samples | int | **REQUIRED** The total number of the training samples | +| x | str | **REQUIRED** The path to the training samples | +| y | str | **REQUIRED** The path to the training labels | -### Testing Object +## Testing Object | Field Name | Type | Description | | ----- | ---- |------------ | -| n_samples | int | The total number of the testing points | -| x | str | The path to the testing samples | -| y | str | The path to the testing labels | +| n_samples | int | **REQUIRED** The total number of the testing samples | +| x | str | **REQUIRED** The path to the testing samples | +| y | str | **REQUIRED** The path to the testing labels | diff --git a/configs/cuml_config.json b/configs/cuml_config.json index 01ec8333b..6217b6e96 100755 --- a/configs/cuml_config.json +++ b/configs/cuml_config.json @@ -1,5 +1,4 @@ { - "omp_env": ["OMP_NUM_THREADS"], "common": { "lib": ["cuml"], "data-format": ["cudf"], @@ -104,31 +103,31 @@ "dtype": ["float32"], "dataset": [ { - "source": "csv", + "source": "npy", "name": "higgs1m", "training": { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" }, "testing": { - "x": "data/higgs1m_x_test.csv", - "y": "data/higgs1m_y_test.csv" + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" } }, { - "source": "csv", + "source": "npy", "name": "airline-ohe", "training": { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" }, "testing": { - "x": "data/airline-ohe_x_test.csv", - "y": "data/airline-ohe_y_test.csv" + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" } } ], @@ -227,17 +226,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "ijcnn", "training": { - "x": "data/ijcnn_x_train.csv", - "y": "data/ijcnn_y_train.csv" + "x": "data/ijcnn_x_train.npy", + "y": "data/ijcnn_y_train.npy" }, "testing": { - "x": "data/ijcnn_x_test.csv", - "y": "data/ijcnn_y_test.csv" + "x": "data/ijcnn_x_test.npy", + "y": "data/ijcnn_y_test.npy" } } ], @@ -248,17 +247,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "a9a", "training": { - "x": "data/a9a_x_train.csv", - "y": "data/a9a_y_train.csv" + "x": "data/a9a_x_train.npy", + "y": "data/a9a_y_train.npy" }, "testing": { - "x": "data/a9a_x_test.csv", - "y": "data/a9a_y_test.csv" + "x": "data/a9a_x_test.npy", + "y": "data/a9a_y_test.npy" } } ], @@ -269,17 +268,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "gisette", "training": { - "x": "data/gisette_x_train.csv", - "y": "data/gisette_y_train.csv" + "x": "data/gisette_x_train.npy", + "y": "data/gisette_y_train.npy" }, "testing": { - "x": "data/gisette_x_test.csv", - "y": "data/gisette_y_test.csv" + "x": "data/gisette_x_test.npy", + "y": "data/gisette_y_test.npy" } } ], @@ -290,17 +289,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "klaverjas", "training": { - "x": "data/klaverjas_x_train.csv", - "y": "data/klaverjas_y_train.csv" + "x": "data/klaverjas_x_train.npy", + "y": "data/klaverjas_y_train.npy" }, "testing": { - "x": 
"data/klaverjas_x_test.csv", - "y": "data/klaverjas_y_test.csv" + "x": "data/klaverjas_x_test.npy", + "y": "data/klaverjas_y_test.npy" } } ], @@ -311,17 +310,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "skin_segmentation", "training": { - "x": "data/skin_segmentation_x_train.csv", - "y": "data/skin_segmentation_y_train.csv" + "x": "data/skin_segmentation_x_train.npy", + "y": "data/skin_segmentation_y_train.npy" }, "testing": { - "x": "data/skin_segmentation_x_test.csv", - "y": "data/skin_segmentation_y_test.csv" + "x": "data/skin_segmentation_x_test.npy", + "y": "data/skin_segmentation_y_test.npy" } } ], @@ -453,12 +452,12 @@ "algorithm": "train_test_split", "dataset": [ { - "source": "csv", + "source": "npy", "name": "census", "training": { - "x": "data/census_x.csv", - "y": "data/census_y.csv" + "x": "data/census_x_train.npy", + "y": "data/census_y_train.npy" } } ], @@ -469,12 +468,12 @@ "algorithm": "lasso", "dataset": [ { - "source": "csv", - "name": "mortgage", + "source": "npy", + "name": "mortgage1Q", "training": { - "x": "data/mortgage_x.csv", - "y": "data/mortgage_y.csv" + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" } } ], @@ -485,17 +484,17 @@ "algorithm": "elasticnet", "dataset": [ { - "source": "csv", + "source": "npy", "name": "year_prediction_msd", "training": { - "x": "data/year_prediction_msd_x_train.csv", - "y": "data/year_prediction_msd_y_train.csv" + "x": "data/year_prediction_msd_x_train.npy", + "y": "data/year_prediction_msd_y_train.npy" }, "testing": { - "x": "data/year_prediction_msd_x_test.csv", - "y": "data/year_prediction_msd_y_test.csv" + "x": "data/year_prediction_msd_x_test.npy", + "y": "data/year_prediction_msd_y_test.npy" } } ], diff --git a/configs/lgbm_mb_cpu_config.json b/configs/lgbm_mb_cpu_config.json deleted file mode 100755 index e8a2111da..000000000 --- a/configs/lgbm_mb_cpu_config.json +++ /dev/null @@ -1,109 +0,0 @@ -{ - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], - "common": { - "lib": ["modelbuilders"], - "data-format": ["pandas"], - "data-order": ["F"], - "dtype": ["float32"] - }, - "cases": [ - { - "algorithm": "lgbm_mb", - "dataset": [ - { - "source": "csv", - "name": "mortgage1Q", - "training": - { - "x": "data/mortgage_x.csv", - "y": "data/mortgage_y.csv" - } - } - ], - "n-estimators": [100], - "objective": ["regression"], - "max-depth": [8], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-alpha": [0.9], - "reg-lambda": [1], - "min-child-weight": [0], - "max-leaves": [256] - }, - { - "algorithm": "lgbm_mb", - "dataset": [ - { - "source": "csv", - "name": "airline-ohe", - "training": - { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary"] - }, - { - "algorithm": "lgbm_mb", - "dataset": [ - { - "source": "csv", - "name": "higgs1m", - "training": - { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary"] - }, - { - "algorithm": "lgbm_mb", - "dataset": [ - { - "source": 
"csv", - "name": "msrank", - "training": - { - "x": "data/mlsr_x_train.csv", - "y": "data/mlsr_y_train.csv" - } - } - ], - "max-bin": [256], - "learning-rate": [0.3], - "subsample": [1], - "reg-lambda": [2], - "min-child-weight": [1], - "min-split-gain": [0.1], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [200], - "objective": ["multiclass"] - } - ] -} diff --git a/configs/modelbuilders/lgbm_mb_cpu_config.json b/configs/modelbuilders/lgbm_mb_cpu_config.json new file mode 100755 index 000000000..a0dabdffa --- /dev/null +++ b/configs/modelbuilders/lgbm_mb_cpu_config.json @@ -0,0 +1,115 @@ +{ + "common": { + "lib": "modelbuilders", + "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "lgbm_mb" + }, + "cases": [ + { + "dataset": [ + { + "source": "npy", + "name": "airline-ohe", + "training": + { + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" + }, + "testing": + { + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary" + }, + { + "dataset": [ + { + "source": "npy", + "name": "higgs1m", + "training": + { + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" + }, + "testing": + { + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mortgage1Q", + "training": + { + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" + } + } + ], + "n-estimators": 100, + "objective": "regression", + "max-depth": 8, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-alpha": 0.9, + "reg-lambda": 1, + "min-child-weight": 0, + "max-leaves": 256 + }, + { + "dataset": [ + { + "source": "npy", + "name": "mlsr", + "training": + { + "x": "data/mlsr_x_train.npy", + "y": "data/mlsr_y_train.npy" + } + } + ], + "max-bin": 256, + "learning-rate": 0.3, + "subsample": 1, + "reg-lambda": 2, + "min-child-weight": 1, + "min-split-loss": 0.1, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 200, + "objective": "multiclass" + } + ] +} diff --git a/configs/modelbuilders/xgb_mb_cpu_config.json b/configs/modelbuilders/xgb_mb_cpu_config.json new file mode 100755 index 000000000..483f3c158 --- /dev/null +++ b/configs/modelbuilders/xgb_mb_cpu_config.json @@ -0,0 +1,118 @@ +{ + "common": { + "lib": "modelbuilders", + "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "xgb_mb", + "tree-method": "hist", + "count-dmatrix":"" + }, + "cases": [ + { + "dataset": [ + { + "source": "npy", + "name": "airline-ohe", + "training": + { + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" + }, + "testing": + { + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic" + }, + { + "dataset": [ + { + "source": 
"npy", + "name": "higgs1m", + "training": + { + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" + }, + "testing": + { + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic", + "enable-experimental-json-serialization": "False", + "inplace-predict": "" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mortgage1Q", + "training": + { + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" + } + } + ], + "n-estimators": 100, + "objective": "reg:squarederror", + "max-depth": 8, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-alpha": 0.9, + "reg-lambda": 1, + "min-child-weight": 0, + "max-leaves": 256 + }, + { + "dataset": [ + { + "source": "npy", + "name": "mlsr", + "training": + { + "x": "data/mlsr_x_train.npy", + "y": "data/mlsr_y_train.npy" + } + } + ], + "max-bin": 256, + "learning-rate": 0.3, + "subsample": 1, + "reg-lambda": 2, + "min-child-weight": 1, + "min-split-loss": 0.1, + "max-depth": 8, + "n-estimators": 200, + "objective": "multi:softprob" + } + ] +} diff --git a/configs/skl_config.json b/configs/skl_config.json index 93c23e068..a385e50be 100755 --- a/configs/skl_config.json +++ b/configs/skl_config.json @@ -115,31 +115,31 @@ "dtype": ["float32"], "dataset": [ { - "source": "csv", + "source": "npy", "name": "higgs1m", "training": { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" }, "testing": { - "x": "data/higgs1m_x_test.csv", - "y": "data/higgs1m_y_test.csv" + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" } }, { - "source": "csv", + "source": "npy", "name": "airline-ohe", "training": { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" }, "testing": { - "x": "data/airline-ohe_x_test.csv", - "y": "data/airline-ohe_y_test.csv" + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" } } ], @@ -238,17 +238,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "ijcnn", "training": { - "x": "data/ijcnn_x_train.csv", - "y": "data/ijcnn_y_train.csv" + "x": "data/ijcnn_x_train.npy", + "y": "data/ijcnn_y_train.npy" }, "testing": { - "x": "data/ijcnn_x_test.csv", - "y": "data/ijcnn_y_test.csv" + "x": "data/ijcnn_x_test.npy", + "y": "data/ijcnn_y_test.npy" } } ], @@ -259,17 +259,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "a9a", "training": { - "x": "data/a9a_x_train.csv", - "y": "data/a9a_y_train.csv" + "x": "data/a9a_x_train.npy", + "y": "data/a9a_y_train.npy" }, "testing": { - "x": "data/a9a_x_test.csv", - "y": "data/a9a_y_test.csv" + "x": "data/a9a_x_test.npy", + "y": "data/a9a_y_test.npy" } } ], @@ -280,17 +280,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "gisette", "training": { - "x": "data/gisette_x_train.csv", - "y": "data/gisette_y_train.csv" + "x": "data/gisette_x_train.npy", + "y": "data/gisette_y_train.npy" }, "testing": { - "x": "data/gisette_x_test.csv", - "y": "data/gisette_y_test.csv" + "x": "data/gisette_x_test.npy", + "y": "data/gisette_y_test.npy" } } ], @@ -301,17 +301,17 @@ "algorithm": 
"svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "klaverjas", "training": { - "x": "data/klaverjas_x_train.csv", - "y": "data/klaverjas_y_train.csv" + "x": "data/klaverjas_x_train.npy", + "y": "data/klaverjas_y_train.npy" }, "testing": { - "x": "data/klaverjas_x_test.csv", - "y": "data/klaverjas_y_test.csv" + "x": "data/klaverjas_x_test.npy", + "y": "data/klaverjas_y_test.npy" } } ], @@ -322,17 +322,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", - "name": "connect4", + "source": "npy", + "name": "connect", "training": { - "x": "data/connect_x_train.csv", - "y": "data/connect_y_train.csv" + "x": "data/connect_x_train.npy", + "y": "data/connect_y_train.npy" }, "testing": { - "x": "data/connect_x_test.csv", - "y": "data/connect_y_test.csv" + "x": "data/connect_x_test.npy", + "y": "data/connect_y_test.npy" } } ], @@ -343,17 +343,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "mnist", "training": { - "x": "data/mnist_x_train.csv", - "y": "data/mnist_y_train.csv" + "x": "data/mnist_x_train.npy", + "y": "data/mnist_y_train.npy" }, "testing": { - "x": "data/mnist_x_test.csv", - "y": "data/mnist_y_test.csv" + "x": "data/mnist_x_test.npy", + "y": "data/mnist_y_test.npy" } } ], @@ -364,17 +364,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "sensit", "training": { - "x": "data/sensit_x_train.csv", - "y": "data/sensit_y_train.csv" + "x": "data/sensit_x_train.npy", + "y": "data/sensit_y_train.npy" }, "testing": { - "x": "data/sensit_x_test.csv", - "y": "data/sensit_y_test.csv" + "x": "data/sensit_x_test.npy", + "y": "data/sensit_y_test.npy" } } ], @@ -385,17 +385,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "skin_segmentation", "training": { - "x": "data/skin_segmentation_x_train.csv", - "y": "data/skin_segmentation_y_train.csv" + "x": "data/skin_segmentation_x_train.npy", + "y": "data/skin_segmentation_y_train.npy" }, "testing": { - "x": "data/skin_segmentation_x_test.csv", - "y": "data/skin_segmentation_y_test.csv" + "x": "data/skin_segmentation_x_test.npy", + "y": "data/skin_segmentation_y_test.npy" } } ], @@ -406,17 +406,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "covertype", "training": { - "x": "data/covertype_x_train.csv", - "y": "data/covertype_y_train.csv" + "x": "data/covertype_x_train.npy", + "y": "data/covertype_y_train.npy" }, "testing": { - "x": "data/covertype_x_test.csv", - "y": "data/covertype_y_test.csv" + "x": "data/covertype_x_test.npy", + "y": "data/covertype_y_test.npy" } } ], @@ -427,17 +427,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "codrnanorm", "training": { - "x": "data/codrnanorm_x_train.csv", - "y": "data/codrnanorm_y_train.csv" + "x": "data/codrnanorm_x_train.npy", + "y": "data/codrnanorm_y_train.npy" }, "testing": { - "x": "data/codrnanorm_x_test.csv", - "y": "data/codrnanorm_y_test.csv" + "x": "data/codrnanorm_x_test.npy", + "y": "data/codrnanorm_y_test.npy" } } ], @@ -570,12 +570,12 @@ "algorithm": "train_test_split", "dataset": [ { - "source": "csv", + "source": "npy", "name": "census", "training": { - "x": "data/census_x.csv", - "y": "data/census_y.csv" + "x": "data/census_x_train.npy", + "y": "data/census_y_train.npy" } } ], @@ -589,12 +589,12 @@ "algorithm": "lasso", "dataset": [ { - "source": "csv", - "name": "mortgage", + "source": "npy", + "name": "mortgage1Q", "training": { - "x": "data/mortgage_x.csv", - "y": 
"data/mortgage_y.csv" + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" } } ], @@ -605,17 +605,17 @@ "algorithm": "elasticnet", "dataset": [ { - "source": "csv", + "source": "npy", "name": "year_prediction_msd", "training": { - "x": "data/year_prediction_msd_x_train.csv", - "y": "data/year_prediction_msd_y_train.csv" + "x": "data/year_prediction_msd_x_train.npy", + "y": "data/year_prediction_msd_y_train.npy" }, "testing": { - "x": "data/year_prediction_msd_x_test.csv", - "y": "data/year_prediction_msd_y_test.csv" + "x": "data/year_prediction_msd_x_test.npy", + "y": "data/year_prediction_msd_y_test.npy" } } ], diff --git a/configs/svm/svc_proba_cuml.json b/configs/svm/svc_proba_cuml.json index 85fe1f0df..c765a2164 100755 --- a/configs/svm/svc_proba_cuml.json +++ b/configs/svm/svc_proba_cuml.json @@ -12,17 +12,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "ijcnn", "training": { - "x": "data/ijcnn_x_train.csv", - "y": "data/ijcnn_y_train.csv" + "x": "data/ijcnn_x_train.npy", + "y": "data/ijcnn_y_train.npy" }, "testing": { - "x": "data/ijcnn_x_test.csv", - "y": "data/ijcnn_y_test.csv" + "x": "data/ijcnn_x_test.npy", + "y": "data/ijcnn_y_test.npy" } } ], @@ -33,17 +33,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "a9a", "training": { - "x": "data/a9a_x_train.csv", - "y": "data/a9a_y_train.csv" + "x": "data/a9a_x_train.npy", + "y": "data/a9a_y_train.npy" }, "testing": { - "x": "data/a9a_x_test.csv", - "y": "data/a9a_y_test.csv" + "x": "data/a9a_x_test.npy", + "y": "data/a9a_y_test.npy" } } ], @@ -54,17 +54,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "gisette", "training": { - "x": "data/gisette_x_train.csv", - "y": "data/gisette_y_train.csv" + "x": "data/gisette_x_train.npy", + "y": "data/gisette_y_train.npy" }, "testing": { - "x": "data/gisette_x_test.csv", - "y": "data/gisette_y_test.csv" + "x": "data/gisette_x_test.npy", + "y": "data/gisette_y_test.npy" } } ], @@ -75,17 +75,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "klaverjas", "training": { - "x": "data/klaverjas_x_train.csv", - "y": "data/klaverjas_y_train.csv" + "x": "data/klaverjas_x_train.npy", + "y": "data/klaverjas_y_train.npy" }, "testing": { - "x": "data/klaverjas_x_test.csv", - "y": "data/klaverjas_y_test.csv" + "x": "data/klaverjas_x_test.npy", + "y": "data/klaverjas_y_test.npy" } } ], @@ -96,17 +96,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "connect", "training": { - "x": "data/connect_x_train.csv", - "y": "data/connect_y_train.csv" + "x": "data/connect_x_train.npy", + "y": "data/connect_y_train.npy" }, "testing": { - "x": "data/connect_x_test.csv", - "y": "data/connect_y_test.csv" + "x": "data/connect_x_test.npy", + "y": "data/connect_y_test.npy" } } ], @@ -117,17 +117,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "mnist", "training": { - "x": "data/mnist_x_train.csv", - "y": "data/mnist_y_train.csv" + "x": "data/mnist_x_train.npy", + "y": "data/mnist_y_train.npy" }, "testing": { - "x": "data/mnist_x_test.csv", - "y": "data/mnist_y_test.csv" + "x": "data/mnist_x_test.npy", + "y": "data/mnist_y_test.npy" } } ], @@ -138,17 +138,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "sensit", "training": { - "x": "data/sensit_x_train.csv", - "y": "data/sensit_y_train.csv" + "x": "data/sensit_x_train.npy", + "y": 
"data/sensit_y_train.npy" }, "testing": { - "x": "data/sensit_x_test.csv", - "y": "data/sensit_y_test.csv" + "x": "data/sensit_x_test.npy", + "y": "data/sensit_y_test.npy" } } ], @@ -159,17 +159,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "skin_segmentation", "training": { - "x": "data/skin_segmentation_x_train.csv", - "y": "data/skin_segmentation_y_train.csv" + "x": "data/skin_segmentation_x_train.npy", + "y": "data/skin_segmentation_y_train.npy" }, "testing": { - "x": "data/skin_segmentation_x_test.csv", - "y": "data/skin_segmentation_y_test.csv" + "x": "data/skin_segmentation_x_test.npy", + "y": "data/skin_segmentation_y_test.npy" } } ], @@ -180,17 +180,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "covertype", "training": { - "x": "data/covertype_x_train.csv", - "y": "data/covertype_y_train.csv" + "x": "data/covertype_x_train.npy", + "y": "data/covertype_y_train.npy" }, "testing": { - "x": "data/covertype_x_test.csv", - "y": "data/covertype_y_test.csv" + "x": "data/covertype_x_test.npy", + "y": "data/covertype_y_test.npy" } } ], @@ -201,17 +201,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "codrnanorm", "training": { - "x": "data/codrnanorm_x_train.csv", - "y": "data/codrnanorm_y_train.csv" + "x": "data/codrnanorm_x_train.npy", + "y": "data/codrnanorm_y_train.npy" }, "testing": { - "x": "data/codrnanorm_x_test.csv", - "y": "data/codrnanorm_y_test.csv" + "x": "data/codrnanorm_x_test.npy", + "y": "data/codrnanorm_y_test.npy" } } ], diff --git a/configs/svm/svc_proba_sklearn.json b/configs/svm/svc_proba_sklearn.json index 53c1676cf..3ded70b29 100755 --- a/configs/svm/svc_proba_sklearn.json +++ b/configs/svm/svc_proba_sklearn.json @@ -12,17 +12,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "ijcnn", "training": { - "x": "data/ijcnn_x_train.csv", - "y": "data/ijcnn_y_train.csv" + "x": "data/ijcnn_x_train.npy", + "y": "data/ijcnn_y_train.npy" }, "testing": { - "x": "data/ijcnn_x_test.csv", - "y": "data/ijcnn_y_test.csv" + "x": "data/ijcnn_x_test.npy", + "y": "data/ijcnn_y_test.npy" } } ], @@ -33,17 +33,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "a9a", "training": { - "x": "data/a9a_x_train.csv", - "y": "data/a9a_y_train.csv" + "x": "data/a9a_x_train.npy", + "y": "data/a9a_y_train.npy" }, "testing": { - "x": "data/a9a_x_test.csv", - "y": "data/a9a_y_test.csv" + "x": "data/a9a_x_test.npy", + "y": "data/a9a_y_test.npy" } } ], @@ -54,17 +54,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "gisette", "training": { - "x": "data/gisette_x_train.csv", - "y": "data/gisette_y_train.csv" + "x": "data/gisette_x_train.npy", + "y": "data/gisette_y_train.npy" }, "testing": { - "x": "data/gisette_x_test.csv", - "y": "data/gisette_y_test.csv" + "x": "data/gisette_x_test.npy", + "y": "data/gisette_y_test.npy" } } ], @@ -75,17 +75,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "klaverjas", "training": { - "x": "data/klaverjas_x_train.csv", - "y": "data/klaverjas_y_train.csv" + "x": "data/klaverjas_x_train.npy", + "y": "data/klaverjas_y_train.npy" }, "testing": { - "x": "data/klaverjas_x_test.csv", - "y": "data/klaverjas_y_test.csv" + "x": "data/klaverjas_x_test.npy", + "y": "data/klaverjas_y_test.npy" } } ], @@ -96,17 +96,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "connect", 
"training": { - "x": "data/connect_x_train.csv", - "y": "data/connect_y_train.csv" + "x": "data/connect_x_train.npy", + "y": "data/connect_y_train.npy" }, "testing": { - "x": "data/connect_x_test.csv", - "y": "data/connect_y_test.csv" + "x": "data/connect_x_test.npy", + "y": "data/connect_y_test.npy" } } ], @@ -117,17 +117,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "mnist", "training": { - "x": "data/mnist_x_train.csv", - "y": "data/mnist_y_train.csv" + "x": "data/mnist_x_train.npy", + "y": "data/mnist_y_train.npy" }, "testing": { - "x": "data/mnist_x_test.csv", - "y": "data/mnist_y_test.csv" + "x": "data/mnist_x_test.npy", + "y": "data/mnist_y_test.npy" } } ], @@ -138,17 +138,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "sensit", "training": { - "x": "data/sensit_x_train.csv", - "y": "data/sensit_y_train.csv" + "x": "data/sensit_x_train.npy", + "y": "data/sensit_y_train.npy" }, "testing": { - "x": "data/sensit_x_test.csv", - "y": "data/sensit_y_test.csv" + "x": "data/sensit_x_test.npy", + "y": "data/sensit_y_test.npy" } } ], @@ -159,17 +159,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "skin_segmentation", "training": { - "x": "data/skin_segmentation_x_train.csv", - "y": "data/skin_segmentation_y_train.csv" + "x": "data/skin_segmentation_x_train.npy", + "y": "data/skin_segmentation_y_train.npy" }, "testing": { - "x": "data/skin_segmentation_x_test.csv", - "y": "data/skin_segmentation_y_test.csv" + "x": "data/skin_segmentation_x_test.npy", + "y": "data/skin_segmentation_y_test.npy" } } ], @@ -180,17 +180,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "covertype", "training": { - "x": "data/covertype_x_train.csv", - "y": "data/covertype_y_train.csv" + "x": "data/covertype_x_train.npy", + "y": "data/covertype_y_train.npy" }, "testing": { - "x": "data/covertype_x_test.csv", - "y": "data/covertype_y_test.csv" + "x": "data/covertype_x_test.npy", + "y": "data/covertype_y_test.npy" } } ], @@ -201,17 +201,17 @@ "algorithm": "svm", "dataset": [ { - "source": "csv", + "source": "npy", "name": "codrnanorm", "training": { - "x": "data/codrnanorm_x_train.csv", - "y": "data/codrnanorm_y_train.csv" + "x": "data/codrnanorm_x_train.npy", + "y": "data/codrnanorm_y_train.npy" }, "testing": { - "x": "data/codrnanorm_x_test.csv", - "y": "data/codrnanorm_y_test.csv" + "x": "data/codrnanorm_x_test.npy", + "y": "data/codrnanorm_y_test.npy" } } ], diff --git a/configs/testing/daal4py_xgboost.json b/configs/testing/daal4py_xgboost.json index 56accdce3..548ec82bf 100755 --- a/configs/testing/daal4py_xgboost.json +++ b/configs/testing/daal4py_xgboost.json @@ -1,20 +1,21 @@ { - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], "common": { - "lib": ["modelbuilders"], - "data-format": ["pandas"], - "data-order": ["F"], - "dtype": ["float32"] + "lib": "modelbuilders", + "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "xgb_mb", + "tree-method": "hist", + "count-dmatrix":"" }, "cases": [ { - "algorithm": "xgb_mb", "dataset": [ { - "source": "synthetic", - "type": "classification", - "n_classes": 5, - "n_features": 10, + "source": "synthetic", + "type": "classification", + "n_classes": 5, + "n_features": 10, "training": { "n_samples": 100 }, @@ -23,10 +24,9 @@ } } ], - "n-estimators": [10], - "tree-method": ["hist"], - "objective": ["multi:softprob"], - "max-depth": [8] + "n-estimators": 10, + "max-depth": 8, + "objective": 
"multi:softprob" } ] } diff --git a/configs/testing/xgboost.json b/configs/testing/xgboost.json index 5107ee793..33242a630 100755 --- a/configs/testing/xgboost.json +++ b/configs/testing/xgboost.json @@ -1,21 +1,21 @@ { - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], "common": { - "lib": ["xgboost"], - "data-format": ["pandas"], - "data-order": ["F"], - "dtype": ["float64"] + "lib": "xgboost", + "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "gbt", + "tree-method": "hist", + "count-dmatrix":"" }, "cases": [ - { - "algorithm": "gbt", "dataset": [ { - "source": "synthetic", - "type": "classification", - "n_classes": 5, - "n_features": 10, + "source": "synthetic", + "type": "classification", + "n_classes": 5, + "n_features": 10, "training": { "n_samples": 1000 }, @@ -24,21 +24,19 @@ } } ], - "n-estimators": [50], - "objective": ["multi:softprob"], - "tree-method": ["hist"], - "max-depth": [7], - "subsample": [0.7], - "colsample-bytree": [0.7] + "n-estimators": 50, + "max-depth": 7, + "subsample": 0.7, + "colsample-bytree": 0.7, + "objective": "multi:softprob" }, { - "algorithm": "gbt", "dataset": [ { - "source": "synthetic", - "type": "regression", - "n_classes": 5, - "n_features": 10, + "source": "synthetic", + "type": "regression", + "n_classes": 5, + "n_features": 10, "training": { "n_samples": 100 }, @@ -47,12 +45,11 @@ } } ], - "n-estimators": [50], - "objective": ["reg:squarederror"], - "tree-method": ["hist"], - "max-depth": [8], - "learning-rate": [0.1], - "reg-alpha": [0.9] + "n-estimators": 50, + "max-depth": 8, + "learning-rate": 0.1, + "reg-alpha": 0.9, + "objective": "reg:squarederror" } ] } diff --git a/configs/xgb_cpu_config.json b/configs/xgb_cpu_config.json deleted file mode 100644 index ecc0da15b..000000000 --- a/configs/xgb_cpu_config.json +++ /dev/null @@ -1,163 +0,0 @@ -{ - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], - "common": { - "lib": ["xgboost"], - "data-format": ["pandas"], - "data-order": ["F"], - "dtype": ["float32"], - "count-dmatrix": [""] - }, - "cases": [ - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "plasticc", - "training": - { - "x": "data/plasticc_x_train.csv", - "y": "data/plasticc_y_train.csv" - }, - "testing": - { - "x": "data/plasticc_x_test.csv", - "y": "data/plasticc_y_test.csv" - } - } - ], - "n-estimators": [60], - "objective": ["multi:softprob"], - "tree-method": ["hist"], - "max-depth": [7], - "subsample": [0.7], - "colsample-bytree": [0.7] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "santander", - "training": - { - "x": "data/santander_x_train.csv", - "y": "data/santander_y_train.csv" - } - } - ], - "n-estimators": [10000], - "objective": ["binary:logistic"], - "tree-method": ["hist"], - "max-depth": [1], - "subsample": [0.5], - "eta": [0.1], - "colsample-bytree": [0.05], - "single-precision-histogram": [""] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "mortgage1Q", - "training": - { - "x": "data/mortgage_x.csv", - "y": "data/mortgage_y.csv" - } - } - ], - "n-estimators": [100], - "objective": ["reg:squarederror"], - "tree-method": ["hist"], - "max-depth": [8], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-alpha": [0.9], - "reg-lambda": [1], - "min-child-weight": [0], - "max-leaves": [256] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "airline-ohe", - "training": - { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" - } 
- } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["hist"] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "higgs1m", - "training": - { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["hist"], - "enable-experimental-json-serialization": ["False"], - "inplace-predict": [""] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "msrank", - "training": - { - "x": "data/mlsr_x_train.csv", - "y": "data/mlsr_y_train.csv" - } - } - ], - "max-bin": [256], - "learning-rate": [0.3], - "subsample": [1], - "reg-lambda": [2], - "min-child-weight": [1], - "min-split-loss": [0.1], - "max-depth": [8], - "n-estimators": [200], - "objective": ["multi:softprob"], - "tree-method": ["hist"], - "single-precision-histogram": [""] - } - ] -} diff --git a/configs/xgb_gpu_config.json b/configs/xgb_gpu_config.json deleted file mode 100644 index 44d9aec45..000000000 --- a/configs/xgb_gpu_config.json +++ /dev/null @@ -1,160 +0,0 @@ -{ - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], - "common": { - "lib": ["xgboost"], - "data-format": ["cudf"], - "data-order": ["F"], - "dtype": ["float32"], - "count-dmatrix": [""] - }, - "cases": [ - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "plasticc", - "training": - { - "x": "data/plasticc_x_train.csv", - "y": "data/plasticc_y_train.csv" - }, - "testing": - { - "x": "data/plasticc_x_test.csv", - "y": "data/plasticc_y_test.csv" - } - } - ], - "n-estimators": [60], - "objective": ["multi:softprob"], - "tree-method": ["gpu_hist"], - "max-depth": [7], - "subsample": [0.7], - "colsample-bytree": [0.7] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "santander", - "training": - { - "x": "data/santander_x_train.csv", - "y": "data/santander_y_train.csv" - } - } - ], - "n-estimators": [10000], - "objective": ["binary:logistic"], - "tree-method": ["gpu_hist"], - "max-depth": [1], - "subsample": [0.5], - "eta": [0.1], - "colsample-bytree": [0.05] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "mortgage1Q", - "training": - { - "x": "data/mortgage_x.csv", - "y": "data/mortgage_y.csv" - } - } - ], - "n-estimators": [100], - "objective": ["reg:squarederror"], - "tree-method": ["gpu_hist"], - "max-depth": [8], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-alpha": [0.9], - "reg-lambda": [1], - "min-child-weight": [0], - "max-leaves": [256] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "airline-ohe", - "training": - { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["gpu_hist"] - }, - { - "algorithm": "gbt", - 
"dataset": [ - { - "source": "csv", - "name": "higgs1m", - "training": - { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["gpu_hist"], - "inplace-predict": [""] - }, - { - "algorithm": "gbt", - "dataset": [ - { - "source": "csv", - "name": "msrank", - "training": - { - "x": "data/mlsr_x_train.csv", - "y": "data/mlsr_y_train.csv" - } - } - ], - "max-bin": [256], - "learning-rate": [0.3], - "subsample": [1], - "reg-lambda": [2], - "min-child-weight": [1], - "min-split-loss": [0.1], - "max-depth": [8], - "n-estimators": [200], - "objective": ["multi:softprob"], - "tree-method": ["gpu_hist"] - } - ] -} diff --git a/configs/xgb_mb_cpu_config.json b/configs/xgb_mb_cpu_config.json deleted file mode 100755 index 0c8128aef..000000000 --- a/configs/xgb_mb_cpu_config.json +++ /dev/null @@ -1,114 +0,0 @@ -{ - "omp_env": ["OMP_NUM_THREADS", "OMP_PLACES"], - "common": { - "lib": ["modelbuilders"], - "data-format": ["pandas"], - "data-order": ["F"], - "dtype": ["float32"], - "count-dmatrix": [""] - }, - "cases": [ - { - "algorithm": "xgb_mb", - "dataset": [ - { - "source": "csv", - "name": "mortgage1Q", - "training": - { - "x": "data/mortgage_x.csv", - "y": "data/mortgage_y.csv" - } - } - ], - "n-estimators": [100], - "objective": ["reg:squarederror"], - "tree-method": ["hist"], - "max-depth": [8], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-alpha": [0.9], - "reg-lambda": [1], - "min-child-weight": [0], - "max-leaves": [256] - }, - { - "algorithm": "xgb_mb", - "dataset": [ - { - "source": "csv", - "name": "airline-ohe", - "training": - { - "x": "data/airline-ohe_x_train.csv", - "y": "data/airline-ohe_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["hist"] - }, - { - "algorithm": "xgb_mb", - "dataset": [ - { - "source": "csv", - "name": "higgs1m", - "training": - { - "x": "data/higgs1m_x_train.csv", - "y": "data/higgs1m_y_train.csv" - } - } - ], - "reg-alpha": [0.9], - "max-bin": [256], - "scale-pos-weight": [2], - "learning-rate": [0.1], - "subsample": [1], - "reg-lambda": [1], - "min-child-weight": [0], - "max-depth": [8], - "max-leaves": [256], - "n-estimators": [1000], - "objective": ["binary:logistic"], - "tree-method": ["hist"], - "enable-experimental-json-serialization": ["False"] - }, - { - "algorithm": "xgb_mb", - "dataset": [ - { - "source": "csv", - "name": "msrank", - "training": - { - "x": "data/mlsr_x_train.csv", - "y": "data/mlsr_y_train.csv" - } - } - ], - "max-bin": [256], - "learning-rate": [0.3], - "subsample": [1], - "reg-lambda": [2], - "min-child-weight": [1], - "min-split-loss": [0.1], - "max-depth": [8], - "n-estimators": [200], - "objective": ["multi:softprob"], - "tree-method": ["hist"] - } - ] -} diff --git a/configs/xgboost/xgb_cpu_additional_config.json b/configs/xgboost/xgb_cpu_additional_config.json new file mode 100644 index 000000000..a3f738c00 --- /dev/null +++ b/configs/xgboost/xgb_cpu_additional_config.json @@ -0,0 +1,155 @@ +{ + "common": { + "lib": "xgboost", 
+ "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "gbt", + "tree-method": "hist", + "count-dmatrix":"", + "max-depth": 8, + "learning-rate":0.1, + "reg-lambda": 1, + "max-leaves": 256 + }, + "cases": [ + { + "objective": "binary:logistic", + "scale-pos-weight": 2.1067817411664587, + "dataset": [ + { + "source": "npy", + "name": "airline", + "training": + { + "x": "data/airline_x_train.npy", + "y": "data/airline_y_train.npy" + }, + "testing": + { + "x": "data/airline_x_test.npy", + "y": "data/airline_y_test.npy" + } + } + ] + }, + { + "objective": "binary:logistic", + "scale-pos-weight": 173.63348001466812, + "dataset": [ + { + "source": "npy", + "name": "bosch", + "training": + { + "x": "data/bosch_x_train.npy", + "y": "data/bosch_y_train.npy" + }, + "testing": + { + "x": "data/bosch_x_test.npy", + "y": "data/bosch_y_test.npy" + } + } + ] + }, + { + "objective": "multi:softmax", + "dataset": [ + { + "source": "npy", + "name": "covtype", + "training": + { + "x": "data/covtype_x_train.npy", + "y": "data/covtype_y_train.npy" + }, + "testing": + { + "x": "data/covtype_x_test.npy", + "y": "data/covtype_y_test.npy" + } + } + ] + }, + { + "objective": "binary:logistic", + "scale-pos-weight": 2.0017715678375363, + "dataset": [ + { + "source": "npy", + "name": "epsilon", + "training": + { + "x": "data/epsilon_x_train.npy", + "y": "data/epsilon_y_train.npy" + }, + "testing": + { + "x": "data/epsilon_x_test.npy", + "y": "data/epsilon_y_test.npy" + } + } + ] + }, + { + "objective": "binary:logistic", + "scale-pos-weight": 578.2868020304569, + "dataset": [ + { + "source": "npy", + "name": "fraud", + "training": + { + "x": "data/fraud_x_train.npy", + "y": "data/fraud_y_train.npy" + }, + "testing": + { + "x": "data/fraud_x_test.npy", + "y": "data/fraud_y_test.npy" + } + } + ] + }, + { + "objective": "binary:logistic", + "scale-pos-weight": 1.8872389605086624, + "dataset": [ + { + "source": "npy", + "name": "higgs", + "training": + { + "x": "data/higgs_x_train.npy", + "y": "data/higgs_y_train.npy" + }, + "testing": + { + "x": "data/higgs_x_test.npy", + "y": "data/higgs_y_test.npy" + } + } + ] + }, + { + "objective": "reg:squarederror", + "dataset": [ + { + "source": "npy", + "name": "year_prediction_msd", + "training": + { + "x": "data/year_prediction_msd_x_train.npy", + "y": "data/year_prediction_msd_y_train.npy" + }, + "testing": + { + "x": "data/year_prediction_msd_x_test.npy", + "y": "data/year_prediction_msd_y_test.npy" + } + } + ] + } + ] +} diff --git a/configs/xgboost/xgb_cpu_main_config.json b/configs/xgboost/xgb_cpu_main_config.json new file mode 100644 index 000000000..f5a2c4b67 --- /dev/null +++ b/configs/xgboost/xgb_cpu_main_config.json @@ -0,0 +1,211 @@ +{ + "common": { + "lib": "xgboost", + "data-format": "pandas", + "data-order": "F", + "dtype": "float32", + "algorithm": "gbt", + "tree-method": "hist", + "count-dmatrix":"" + }, + "cases": [ + { + "dataset": [ + { + "source": "npy", + "name": "abalone", + "training": + { + "x": "data/abalone_x_train.npy", + "y": "data/abalone_y_train.npy" + }, + "testing": + { + "x": "data/abalone_x_test.npy", + "y": "data/abalone_y_test.npy" + } + } + ], + "learning-rate": 0.03, + "max-depth": 6, + "n-estimators": 1000, + "objective": "reg:squarederror" + }, + { + "dataset": [ + { + "source": "npy", + "name": "airline-ohe", + "training": + { + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" + }, + "testing": + { + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" + } + } + 
], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic" + }, + { + "dataset": [ + { + "source": "npy", + "name": "higgs1m", + "training": + { + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" + }, + "testing": + { + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic", + "enable-experimental-json-serialization": "False", + "inplace-predict": "" + }, + { + "dataset": [ + { + "source": "npy", + "name": "letters", + "training": + { + "x": "data/letters_x_train.npy", + "y": "data/letters_y_train.npy" + }, + "testing": + { + "x": "data/letters_x_test.npy", + "y": "data/letters_y_test.npy" + } + } + ], + "learning-rate": 0.03, + "max-depth": 6, + "n-estimators": 1000, + "objective": "multi:softprob" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mlsr", + "training": + { + "x": "data/mlsr_x_train.npy", + "y": "data/mlsr_y_train.npy" + } + } + ], + "max-bin": 256, + "learning-rate": 0.3, + "subsample": 1, + "reg-lambda": 2, + "min-child-weight": 1, + "min-split-loss": 0.1, + "max-depth": 8, + "n-estimators": 200, + "objective": "multi:softprob", + "single-precision-histogram": "" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mortgage1Q", + "training": + { + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" + } + } + ], + "n-estimators": 100, + "objective": "reg:squarederror", + "max-depth": 8, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-alpha": 0.9, + "reg-lambda": 1, + "min-child-weight": 0, + "max-leaves": 256 + }, + { + "dataset": [ + { + "source": "npy", + "name": "plasticc", + "training": + { + "x": "data/plasticc_x_train.npy", + "y": "data/plasticc_y_train.npy" + }, + "testing": + { + "x": "data/plasticc_x_test.npy", + "y": "data/plasticc_y_test.npy" + } + } + ], + "n-estimators": 60, + "objective": "multi:softprob", + "max-depth": 7, + "subsample": 0.7, + "colsample-bytree": 0.7 + }, + { + "dataset": [ + { + "source": "npy", + "name": "santander", + "training": + { + "x": "data/santander_x_train.npy", + "y": "data/santander_y_train.npy" + }, + "testing": + { + "x": "data/santander_x_test.npy", + "y": "data/santander_y_test.npy" + } + } + ], + "n-estimators": 10000, + "objective": "binary:logistic", + "max-depth": 1, + "subsample": 0.5, + "eta": 0.1, + "colsample-bytree": 0.05, + "single-precision-histogram": "" + } + ] +} diff --git a/configs/xgboost/xgb_gpu_config.json b/configs/xgboost/xgb_gpu_config.json new file mode 100644 index 000000000..506ac0cfd --- /dev/null +++ b/configs/xgboost/xgb_gpu_config.json @@ -0,0 +1,208 @@ +{ + "common": { + "lib": "xgboost", + "data-format": "cudf", + "data-order": "F", + "dtype": "float32", + "algorithm": "gbt", + "tree-method": "gpu_hist", + "count-dmatrix":"" + }, + "cases": [ + { + "dataset": [ + { + "source": "npy", + "name": "abalone", + "training": + { + "x": "data/abalone_x_train.npy", + "y": "data/abalone_y_train.npy" + }, + "testing": + { + "x": "data/abalone_x_test.npy", + "y": "data/abalone_y_test.npy" + } + } + ], + "learning-rate": 0.03, + "max-depth": 6, + "n-estimators": 1000, + 
"objective": "reg:squarederror" + }, + { + "dataset": [ + { + "source": "npy", + "name": "airline-ohe", + "training": + { + "x": "data/airline-ohe_x_train.npy", + "y": "data/airline-ohe_y_train.npy" + }, + "testing": + { + "x": "data/airline-ohe_x_test.npy", + "y": "data/airline-ohe_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic" + }, + { + "dataset": [ + { + "source": "npy", + "name": "higgs1m", + "training": + { + "x": "data/higgs1m_x_train.npy", + "y": "data/higgs1m_y_train.npy" + }, + "testing": + { + "x": "data/higgs1m_x_test.npy", + "y": "data/higgs1m_y_test.npy" + } + } + ], + "reg-alpha": 0.9, + "max-bin": 256, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-lambda": 1, + "min-child-weight": 0, + "max-depth": 8, + "max-leaves": 256, + "n-estimators": 1000, + "objective": "binary:logistic", + "inplace-predict": "" + }, + { + "dataset": [ + { + "source": "npy", + "name": "letters", + "training": + { + "x": "data/letters_x_train.npy", + "y": "data/letters_y_train.npy" + }, + "testing": + { + "x": "data/letters_x_test.npy", + "y": "data/letters_y_test.npy" + } + } + ], + "learning-rate":0.03, + "max-depth": 6, + "n-estimators": 1000, + "objective": "multi:softprob" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mlsr", + "training": + { + "x": "data/mlsr_x_train.npy", + "y": "data/mlsr_y_train.npy" + } + } + ], + "max-bin": 256, + "learning-rate": 0.3, + "subsample": 1, + "reg-lambda": 2, + "min-child-weight": 1, + "min-split-loss": 0.1, + "max-depth": 8, + "n-estimators": 200, + "objective": "multi:softprob" + }, + { + "dataset": [ + { + "source": "npy", + "name": "mortgage1Q", + "training": + { + "x": "data/mortgage1Q_x_train.npy", + "y": "data/mortgage1Q_y_train.npy" + } + } + ], + "n-estimators": 100, + "objective": "reg:squarederror", + "max-depth": 8, + "scale-pos-weight": 2, + "learning-rate": 0.1, + "subsample": 1, + "reg-alpha": 0.9, + "reg-lambda": 1, + "min-child-weight": 0, + "max-leaves": 256 + }, + { + "dataset": [ + { + "source": "npy", + "name": "plasticc", + "training": + { + "x": "data/plasticc_x_train.npy", + "y": "data/plasticc_y_train.npy" + }, + "testing": + { + "x": "data/plasticc_x_test.npy", + "y": "data/plasticc_y_test.npy" + } + } + ], + "n-estimators": 60, + "objective": "multi:softprob", + "max-depth": 7, + "subsample": 0.7, + "colsample-bytree": 0.7 + }, + { + "dataset": [ + { + "source": "npy", + "name": "santander", + "training": + { + "x": "data/santander_x_train.npy", + "y": "data/santander_y_train.npy" + }, + "testing": + { + "x": "data/santander_x_test.npy", + "y": "data/santander_y_test.npy" + } + } + ], + "n-estimators": 10000, + "objective": "binary:logistic", + "max-depth": 1, + "subsample": 0.5, + "eta": 0.1, + "colsample-bytree": 0.05 + } + ] +} diff --git a/cuml_bench/README.md b/cuml_bench/README.md index e65f11432..e36e77f3b 100644 --- a/cuml_bench/README.md +++ b/cuml_bench/README.md @@ -1,6 +1,6 @@ ## How to create conda environment for benchmarking -`conda create -n bench -c rapidsai -c conda-forge python=3.7 cuml pandas cudf` +`conda create -n bench -c rapidsai -c conda-forge python=3.7 scikit-learn cuml pandas cudf tqdm` ## Algorithms parameters diff --git a/daal4py_bench/README.md b/daal4py_bench/README.md index 85c7831df..c1c940ef0 100644 --- a/daal4py_bench/README.md +++ 
b/daal4py_bench/README.md @@ -1,7 +1,7 @@ ## How to create conda environment for benchmarking -`conda create -n bench -c intel python=3.7 daal4py pandas scikit-learn` +`conda create -n bench -c intel python=3.7 daal4py pandas scikit-learn tqdm` ## Algorithms parameters diff --git a/datasets/load_datasets.py b/datasets/load_datasets.py index e16c6c918..5fad3ac4b 100755 --- a/datasets/load_datasets.py +++ b/datasets/load_datasets.py @@ -18,26 +18,50 @@ import logging import os import sys +from pathlib import Path +from typing import Callable, Dict -from .loader import (a9a, codrnanorm, connect, covertype, gisette, ijcnn, - klaverjas, mnist, sensit, skin_segmentation) +from .loader_classification import (a_nine_a, airline, airline_ohe, bosch, + census, codrnanorm, epsilon, fraud, + gisette, higgs, higgs_one_m, ijcnn, + klaverjas, santander, skin_segmentation) +from .loader_multiclass import (connect, covertype, covtype, letters, mlsr, + mnist, msrank, plasticc, sensit) +from .loader_regression import abalone, mortgage_first_q, year_prediction_msd -dataset_loaders = { - "a9a": a9a, +dataset_loaders: Dict[str, Callable[[Path], bool]] = { + "a9a": a_nine_a, + "abalone": abalone, + "airline": airline, + "airline-ohe": airline_ohe, + "bosch": bosch, + "census": census, + "codrnanorm": codrnanorm, + "connect": connect, + "covertype": covertype, + "covtype": covtype, + "epsilon": epsilon, + "fraud": fraud, "gisette": gisette, + "higgs": higgs, + "higgs1m": higgs_one_m, "ijcnn": ijcnn, - "skin_segmentation": skin_segmentation, "klaverjas": klaverjas, - "connect": connect, + "letters": letters, + "mlsr": mlsr, "mnist": mnist, + "mortgage1Q": mortgage_first_q, + "msrank": msrank, + "plasticc": plasticc, + "santander": santander, "sensit": sensit, - "covertype": covertype, - "codrnanorm": codrnanorm, + "skin_segmentation": skin_segmentation, + "year_prediction_msd": year_prediction_msd, } -def try_load_dataset(dataset_name, output_directory): - if dataset_name in dataset_loaders.keys(): +def try_load_dataset(dataset_name: str, output_directory: Path) -> bool: + if dataset_name in dataset_loaders: try: return dataset_loaders[dataset_name](output_directory) except BaseException as ex: @@ -60,11 +84,11 @@ def try_load_dataset(dataset_name, output_directory): args = parser.parse_args() if args.list: - for key in dataset_loaders.keys(): + for key in dataset_loaders: print(key) sys.exit(0) - root_dir = os.environ['DATASETSROOT'] + root_dir = Path(os.environ['DATASETSROOT']) if args.datasets is not None: for val in dataset_loaders.values(): diff --git a/datasets/loader.py b/datasets/loader.py deleted file mode 100755 index 45690c8ae..000000000 --- a/datasets/loader.py +++ /dev/null @@ -1,423 +0,0 @@ -# =============================================================================== -# Copyright 2020-2021 Intel Corporation -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# =============================================================================== - -import logging -import os -from urllib.request import urlretrieve - -import numpy as np -import pandas as pd -from sklearn.datasets import fetch_openml -from sklearn.model_selection import train_test_split - - -def a9a(dataset_dir=None): - """ - Author: Ronny Kohavi","Barry Becker - libSVM","AAD group - Source: original - Date unknown - Cite: http://archive.ics.uci.edu/ml/datasets/Adult - - Classification task. n_classes = 2. - a9a X train dataset (39073, 123) - a9a y train dataset (39073, 1) - a9a X test dataset (9769, 123) - a9a y test dataset (9769, 1) - """ - dataset_name = 'a9a' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='a9a', return_X_y=True, - as_frame=False, data_home=dataset_dir) - X = pd.DataFrame(X.todense()) - y = pd.DataFrame(y) - - y[y == -1] = 0 - - logging.info('a9a dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=11) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def ijcnn(dataset_dir=None): - """ - Author: Danil Prokhorov. - libSVM,AAD group - Cite: Danil Prokhorov. IJCNN 2001 neural network competition. - Slide presentation in IJCNN'01, - Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf. - - Classification task. n_classes = 2. - ijcnn X train dataset (153344, 22) - ijcnn y train dataset (153344, 1) - ijcnn X test dataset (38337, 22) - ijcnn y test dataset (38337, 1) - """ - dataset_name = 'ijcnn' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='ijcnn', return_X_y=True, - as_frame=False, data_home=dataset_dir) - X = pd.DataFrame(X.todense()) - y = pd.DataFrame(y) - - y[y == -1] = 0 - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def skin_segmentation(dataset_dir=None): - """ - Abstract: - The Skin Segmentation dataset is constructed over B, G, R color space. - Skin and Nonskin dataset is generated using skin textures from - face images of diversity of age, gender, and race people. - Author: Rajen Bhatt, Abhinav Dhall, rajen.bhatt '@' gmail.com, IIT Delhi. - - Classification task. n_classes = 2. 
- skin_segmentation X train dataset (196045, 3) - skin_segmentation y train dataset (196045, 1) - skin_segmentation X test dataset (49012, 3) - skin_segmentation y test dataset (49012, 1) - """ - dataset_name = 'skin_segmentation' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='skin-segmentation', - return_X_y=True, as_frame=True, data_home=dataset_dir) - y = y.astype(int) - y[y == 2] = 0 - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def klaverjas(dataset_dir=None): - """ - Abstract: - Klaverjas is an example of the Jack-Nine card games, - which are characterized as trick-taking games where the the Jack - and nine of the trump suit are the highest-ranking trumps, and - the tens and aces of other suits are the most valuable cards - of these suits. It is played by four players in two teams. - - Task Information: - Classification task. n_classes = 2. - klaverjas X train dataset (196045, 3) - klaverjas y train dataset (196045, 1) - klaverjas X test dataset (49012, 3) - klaverjas y test dataset (49012, 1) - """ - dataset_name = 'klaverjas' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='Klaverjas2018', return_X_y=True, - as_frame=True, data_home=dataset_dir) - - y = y.cat.codes - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, train_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def connect(dataset_dir=None): - """ - Source: - UC Irvine Machine Learning Repository - http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.htm - - Classification task. n_classes = 3. - connect X train dataset (60801, 126) - connect y train dataset (60801, 1) - connect X test dataset (49012, 126) - connect y test dataset (49012, 1) - """ - dataset_name = 'connect' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='connect-4', version=1, return_X_y=True, - as_frame=False, data_home=dataset_dir) - X = pd.DataFrame(X.todense()) - y = pd.DataFrame(y) - y = y.astype(int) - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.1, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def mnist(dataset_dir=None): - """ - Abstract: - The MNIST database of handwritten digits with 784 features. - It can be split in a training set of the first 60,000 examples, - and a test set of 10,000 examples - Source: - Yann LeCun, Corinna Cortes, Christopher J.C. 
Burges - http://yann.lecun.com/exdb/mnist/ - - Classification task. n_classes = 10. - mnist X train dataset (60000, 784) - mnist y train dataset (60000, 1) - mnist X test dataset (10000, 784) - mnist y test dataset (10000, 1) - """ - dataset_name = 'mnist' - - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='mnist_784', return_X_y=True, - as_frame=True, data_home=dataset_dir) - y = y.astype(int) - X = X / 255 - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=10000, shuffle=False) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def sensit(dataset_dir=None): - """ - Abstract: Vehicle classification in distributed sensor networks. - Author: M. Duarte, Y. H. Hu - Source: [original](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) - - Classification task. n_classes = 2. - sensit X train dataset (196045, 3) - sensit y train dataset (196045, 1) - sensit X test dataset (49012, 3) - sensit y test dataset (49012, 1) - """ - dataset_name = 'sensit' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='SensIT-Vehicle-Combined', - return_X_y=True, as_frame=False, data_home=dataset_dir) - X = pd.DataFrame(X.todense()) - y = pd.DataFrame(y) - y = y.astype(int) - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def covertype(dataset_dir=None): - """ - Abstract: This is the original version of the famous - covertype dataset in ARFF format. - Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson - Source: [original](https://archive.ics.uci.edu/ml/datasets/covertype) - - Classification task. n_classes = 7. - covertype X train dataset (390852, 54) - covertype y train dataset (390852, 1) - covertype X test dataset (97713, 54) - covertype y test dataset (97713, 1) - """ - dataset_name = 'covertype' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='covertype', version=3, return_X_y=True, - as_frame=True, data_home=dataset_dir) - y = y.astype(int) - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def codrnanorm(dataset_dir=None): - """ - Abstract: Detection of non-coding RNAs on the basis of predicted secondary - structure formation free energy change. - Author: Andrew V Uzilov,Joshua M Keegan,David H Mathews. 
- Source: [original](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) - - Classification task. n_classes = 2. - codrnanorm X train dataset (390852, 8) - codrnanorm y train dataset (390852, 1) - codrnanorm X test dataset (97713, 8) - codrnanorm y test dataset (97713, 1) - """ - dataset_name = 'codrnanorm' - os.makedirs(dataset_dir, exist_ok=True) - - X, y = fetch_openml(name='codrnaNorm', return_X_y=True, - as_frame=False, data_home=dataset_dir) - X = pd.DataFrame(X.todense()) - y = pd.DataFrame(y) - - logging.info(f'{dataset_name} dataset is downloaded') - logging.info('reading CSV file...') - - x_train, x_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42) - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - logging.info(f'dataset {dataset_name} ready.') - return True - - -def gisette(dataset_dir=None): - """ - GISETTE is a handwritten digit recognition problem. - The problem is to separate the highly confusable digits '4' and '9'. - This dataset is one of five datasets of the NIPS 2003 feature selection challenge. - - Classification task. n_classes = 2. - gisette X train dataset (6000, 5000) - gisette y train dataset (6000, 1) - gisette X test dataset (1000, 5000) - gisette y test dataset (1000, 1) - """ - dataset_name = 'gisette' - os.makedirs(dataset_dir, exist_ok=True) - - cache_dir = os.path.join(dataset_dir, '_gisette') - os.makedirs(cache_dir, exist_ok=True) - - domen_hhtp = 'http://archive.ics.uci.edu/ml/machine-learning-databases/' - - gisette_train_data_url = domen_hhtp + '/gisette/GISETTE/gisette_train.data' - filename_train_data = os.path.join(cache_dir, 'gisette_train.data') - if not os.path.exists(filename_train_data): - urlretrieve(gisette_train_data_url, filename_train_data) - - gisette_train_labels_url = domen_hhtp + '/gisette/GISETTE/gisette_train.labels' - filename_train_labels = os.path.join(cache_dir, 'gisette_train.labels') - if not os.path.exists(filename_train_labels): - urlretrieve(gisette_train_labels_url, filename_train_labels) - - gisette_test_data_url = domen_hhtp + '/gisette/GISETTE/gisette_valid.data' - filename_test_data = os.path.join(cache_dir, 'gisette_valid.data') - if not os.path.exists(filename_test_data): - urlretrieve(gisette_test_data_url, filename_test_data) - - gisette_test_labels_url = domen_hhtp + '/gisette/gisette_valid.labels' - filename_test_labels = os.path.join(cache_dir, 'gisette_valid.labels') - if not os.path.exists(filename_test_labels): - urlretrieve(gisette_test_labels_url, filename_test_labels) - - logging.info('gisette dataset is downloaded') - logging.info('reading CSV file...') - - num_cols = 5000 - - df_train = pd.read_csv(filename_train_data, header=None) - df_labels = pd.read_csv(filename_train_labels, header=None) - num_train = 6000 - x_train = df_train.iloc[:num_train].values - x_train = pd.DataFrame(np.array([np.fromstring( - elem[0], dtype=int, count=num_cols, sep=' ') for elem in x_train])) - y_train = df_labels.iloc[:num_train].values - y_train = pd.DataFrame((y_train > 0).astype(int)) - - num_train = 1000 - df_test = pd.read_csv(filename_test_data, header=None) - df_labels = pd.read_csv(filename_test_labels, header=None) - x_test = df_test.iloc[:num_train].values - x_test = pd.DataFrame(np.array( - [np.fromstring(elem[0], dtype=int, count=num_cols, sep=' ') for elem in x_test])) - y_test = 
df_labels.iloc[:num_train].values - y_test = pd.DataFrame((y_test > 0).astype(int)) - - for data, name in zip((x_train, x_test, y_train, y_test), - ('x_train', 'x_test', 'y_train', 'y_test')): - filename = f'{dataset_name}_{name}.csv' - data.to_csv(os.path.join(dataset_dir, filename), - header=False, index=False) - - logging.info('dataset gisette ready.') - return True diff --git a/datasets/loader_classification.py b/datasets/loader_classification.py new file mode 100644 index 000000000..be981952e --- /dev/null +++ b/datasets/loader_classification.py @@ -0,0 +1,598 @@ +# =============================================================================== +# Copyright 2020-2021 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# =============================================================================== + +import logging +import os +import subprocess +from pathlib import Path +from typing import Any + +import numpy as np +import pandas as pd +from sklearn.datasets import fetch_openml, load_svmlight_file +from sklearn.model_selection import train_test_split + +from .loader_utils import retrieve + + +def a_nine_a(dataset_dir: Path) -> bool: + """ + Author: Ronny Kohavi","Barry Becker + libSVM","AAD group + Source: original - Date unknown + Site: http://archive.ics.uci.edu/ml/datasets/Adult + + Classification task. n_classes = 2. 
+ a9a X train dataset (39073, 123) + a9a y train dataset (39073, 1) + a9a X test dataset (9769, 123) + a9a y test dataset (9769, 1) + """ + dataset_name = 'a9a' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='a9a', return_X_y=True, + as_frame=False, data_home=dataset_dir) + X = pd.DataFrame(X.todense()) + y = pd.DataFrame(y) + + y[y == -1] = 0 + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.2, random_state=11) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def airline(dataset_dir: Path) -> bool: + """ + Airline dataset + http://kt.ijs.si/elena_ikonomovska/data.html + + TaskType:binclass + NumberOfFeatures:13 + NumberOfInstances:115M + """ + dataset_name = 'airline' + os.makedirs(dataset_dir, exist_ok=True) + + url = 'http://kt.ijs.si/elena_ikonomovska/datasets/airline/airline_14col.data.bz2' + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + cols = [ + "Year", "Month", "DayofMonth", "DayofWeek", "CRSDepTime", + "CRSArrTime", "UniqueCarrier", "FlightNum", "ActualElapsedTime", + "Origin", "Dest", "Distance", "Diverted", "ArrDelay" + ] + + # load the data as int16 + dtype = np.int16 + + dtype_columns = { + "Year": dtype, "Month": dtype, "DayofMonth": dtype, "DayofWeek": dtype, + "CRSDepTime": dtype, "CRSArrTime": dtype, "FlightNum": dtype, + "ActualElapsedTime": dtype, "Distance": + dtype, + "Diverted": dtype, "ArrDelay": dtype, + } + + df: Any = pd.read_csv(local_url, names=cols, dtype=dtype_columns) + + # Encode categoricals as numeric + for col in df.select_dtypes(['object']).columns: + df[col] = df[col].astype("category").cat.codes + + # Turn into binary classification problem + df["ArrDelayBinary"] = 1 * (df["ArrDelay"] > 0) + + X = df[df.columns.difference(["ArrDelay", "ArrDelayBinary"]) + ].to_numpy(dtype=np.float32) + y = df["ArrDelayBinary"].to_numpy(dtype=np.float32) + del df + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=77, + test_size=0.2, + ) + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def airline_ohe(dataset_dir: Path) -> bool: + """ + Dataset from szilard benchmarks: https://github.com/szilard/GBM-perf + TaskType:binclass + NumberOfFeatures:700 + NumberOfInstances:10100000 + """ + dataset_name = 'airline-ohe' + os.makedirs(dataset_dir, exist_ok=True) + + url_train = 'https://s3.amazonaws.com/benchm-ml--main/train-10m.csv' + url_test = 'https://s3.amazonaws.com/benchm-ml--main/test.csv' + local_url_train = os.path.join(dataset_dir, os.path.basename(url_train)) + local_url_test = os.path.join(dataset_dir, os.path.basename(url_test)) + if not os.path.isfile(local_url_train): + logging.info(f'Started loading {dataset_name} train') + retrieve(url_train, local_url_train) + if not os.path.isfile(local_url_test): + logging.info(f'Started loading {dataset_name} test') + retrieve(url_test, local_url_test) + 
logging.info(f'{dataset_name} is loaded, started parsing...') + + sets = [] + labels = [] + + categorical_names = ["Month", "DayofMonth", + "DayOfWeek", "UniqueCarrier", "Origin", "Dest"] + + for local_url in [local_url_train, local_url_test]: + df = pd.read_csv(local_url, nrows=1000000 + if local_url.endswith('train-10m.csv') else None) + X = df.drop('dep_delayed_15min', axis=1) + y: Any = df["dep_delayed_15min"] + + y_num = np.where(y == "Y", 1, 0) + + sets.append(X) + labels.append(y_num) + + n_samples_train = sets[0].shape[0] + + X_final: Any = pd.concat(sets) + X_final = pd.get_dummies(X_final, columns=categorical_names) + sets = [X_final[:n_samples_train], X_final[n_samples_train:]] + + for data, name in zip((sets[0], sets[1], labels[0], labels[1]), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) # type: ignore + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def bosch(dataset_dir: Path) -> bool: + """ + Bosch Production Line Performance data set + https://www.kaggle.com/c/bosch-production-line-performance + + Requires Kaggle API and API token (https://github.com/Kaggle/kaggle-api) + Contains missing values as NaN. + + TaskType:binclass + NumberOfFeatures:968 + NumberOfInstances:1.184M + """ + dataset_name = 'bosch' + os.makedirs(dataset_dir, exist_ok=True) + + filename = "train_numeric.csv.zip" + local_url = os.path.join(dataset_dir, filename) + + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + args = ["kaggle", "competitions", "download", "-c", + "bosch-production-line-performance", "-f", filename, "-p", str(dataset_dir)] + _ = subprocess.check_output(args) + logging.info(f'{dataset_name} is loaded, started parsing...') + X = pd.read_csv(local_url, index_col=0, compression='zip', dtype=np.float32) + y = X.iloc[:, -1].to_numpy(dtype=np.float32) + X.drop(X.columns[-1], axis=1, inplace=True) + X_np = X.to_numpy(dtype=np.float32) + X_train, X_test, y_train, y_test = train_test_split(X_np, y, random_state=77, + test_size=0.2, + ) + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def census(dataset_dir: Path) -> bool: + """ + # TODO: add a loading instruction + """ + return False + + +def codrnanorm(dataset_dir: Path) -> bool: + """ + Abstract: Detection of non-coding RNAs on the basis of predicted secondary + structure formation free energy change. + Author: Andrew V Uzilov,Joshua M Keegan,David H Mathews. + Source: [original](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) + + Classification task. n_classes = 2.
+ codrnanorm X train dataset (390852, 8) + codrnanorm y train dataset (390852, 1) + codrnanorm X test dataset (97713, 8) + codrnanorm y test dataset (97713, 1) + """ + dataset_name = 'codrnanorm' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='codrnaNorm', return_X_y=True, + as_frame=False, data_home=dataset_dir) + X = pd.DataFrame(X.todense()) + y = pd.DataFrame(y) + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.2, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def epsilon(dataset_dir: Path) -> bool: + """ + Epsilon dataset + https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html + + TaskType:binclass + NumberOfFeatures:2000 + NumberOfInstances:500K + """ + dataset_name = 'epsilon' + os.makedirs(dataset_dir, exist_ok=True) + + url_train = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary' \ + '/epsilon_normalized.bz2' + url_test = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary' \ + '/epsilon_normalized.t.bz2' + local_url_train = os.path.join(dataset_dir, os.path.basename(url_train)) + local_url_test = os.path.join(dataset_dir, os.path.basename(url_test)) + + if not os.path.isfile(local_url_train): + logging.info(f'Started loading {dataset_name}, train') + retrieve(url_train, local_url_train) + if not os.path.isfile(local_url_test): + logging.info(f'Started loading {dataset_name}, test') + retrieve(url_test, local_url_test) + logging.info(f'{dataset_name} is loaded, started parsing...') + X_train, y_train = load_svmlight_file(local_url_train, + dtype=np.float32) + X_test, y_test = load_svmlight_file(local_url_test, + dtype=np.float32) + X_train = X_train.toarray() + X_test = X_test.toarray() + y_train[y_train <= 0] = 0 + y_test[y_test <= 0] = 0 + + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def fraud(dataset_dir: Path) -> bool: + """ + Credit Card Fraud Detection contest + https://www.kaggle.com/mlg-ulb/creditcardfraud + + Requires Kaggle API and API token (https://github.com/Kaggle/kaggle-api) + Contains missing values as NaN. 
+ + TaskType:binclass + NumberOfFeatures:28 + NumberOfInstances:285K + """ + dataset_name = 'fraud' + os.makedirs(dataset_dir, exist_ok=True) + + filename = "creditcard.csv" + local_url = os.path.join(dataset_dir, filename) + + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + args = ["kaggle", "datasets", "download", "mlg-ulb/creditcardfraud", "-f", + filename, "-p", str(dataset_dir)] + _ = subprocess.check_output(args) + logging.info(f'{dataset_name} is loaded, started parsing...') + + df = pd.read_csv(local_url + ".zip", dtype=np.float32) + X = df[[col for col in df.columns if col.startswith('V')]].to_numpy(dtype=np.float32) + y = df['Class'].to_numpy(dtype=np.float32) + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=77, + test_size=0.2, + ) + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def gisette(dataset_dir: Path) -> bool: + """ + GISETTE is a handwritten digit recognition problem. + The problem is to separate the highly confusable digits '4' and '9'. + This dataset is one of five datasets of the NIPS 2003 feature selection challenge. + + Classification task. n_classes = 2. + gisette X train dataset (6000, 5000) + gisette y train dataset (6000, 1) + gisette X test dataset (1000, 5000) + gisette y test dataset (1000, 1) + """ + dataset_name = 'gisette' + os.makedirs(dataset_dir, exist_ok=True) + + cache_dir = os.path.join(dataset_dir, '_gisette') + os.makedirs(cache_dir, exist_ok=True) + + domen_hhtp = 'http://archive.ics.uci.edu/ml/machine-learning-databases/' + + gisette_train_data_url = domen_hhtp + '/gisette/GISETTE/gisette_train.data' + filename_train_data = os.path.join(cache_dir, 'gisette_train.data') + if not os.path.exists(filename_train_data): + retrieve(gisette_train_data_url, filename_train_data) + + gisette_train_labels_url = domen_hhtp + '/gisette/GISETTE/gisette_train.labels' + filename_train_labels = os.path.join(cache_dir, 'gisette_train.labels') + if not os.path.exists(filename_train_labels): + retrieve(gisette_train_labels_url, filename_train_labels) + + gisette_test_data_url = domen_hhtp + '/gisette/GISETTE/gisette_valid.data' + filename_test_data = os.path.join(cache_dir, 'gisette_valid.data') + if not os.path.exists(filename_test_data): + retrieve(gisette_test_data_url, filename_test_data) + + gisette_test_labels_url = domen_hhtp + '/gisette/gisette_valid.labels' + filename_test_labels = os.path.join(cache_dir, 'gisette_valid.labels') + if not os.path.exists(filename_test_labels): + retrieve(gisette_test_labels_url, filename_test_labels) + + logging.info(f'{dataset_name} is loaded, started parsing...') + + num_cols = 5000 + + df_train = pd.read_csv(filename_train_data, header=None) + df_labels = pd.read_csv(filename_train_labels, header=None) + num_train = 6000 + x_train_arr = df_train.iloc[:num_train].values + x_train = pd.DataFrame(np.array([np.fromstring( + elem[0], dtype=int, count=num_cols, sep=' ') for elem in x_train_arr])) # type: ignore + y_train_arr = df_labels.iloc[:num_train].values + y_train = pd.DataFrame((y_train_arr > 0).astype(int)) + + num_train = 1000 + df_test = pd.read_csv(filename_test_data, header=None) + df_labels = pd.read_csv(filename_test_labels, header=None) + x_test_arr = df_test.iloc[:num_train].values + x_test = pd.DataFrame(np.array( + 
[np.fromstring( + elem[0], + dtype=int, count=num_cols, sep=' ') # type: ignore + for elem in x_test_arr])) + y_test_arr = df_labels.iloc[:num_train].values + y_test = pd.DataFrame((y_test_arr > 0).astype(int)) + + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data.to_numpy()) + logging.info('dataset gisette is ready.') + return True + + +def higgs(dataset_dir: Path) -> bool: + """ + Higgs dataset from UCI machine learning repository + https://archive.ics.uci.edu/ml/datasets/HIGGS + + TaskType:binclass + NumberOfFeatures:28 + NumberOfInstances:11M + """ + dataset_name = 'higgs' + os.makedirs(dataset_dir, exist_ok=True) + + url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz' + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + higgs = pd.read_csv(local_url) + X = higgs.iloc[:, 1:].to_numpy(dtype=np.float32) + y = higgs.iloc[:, 0].to_numpy(dtype=np.float32) + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=77, + test_size=0.2, + ) + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def higgs_one_m(dataset_dir: Path) -> bool: + """ + Higgs dataset from UCI machine learning repository + https://archive.ics.uci.edu/ml/datasets/HIGGS + + Only the first 1.5M samples are taken + + TaskType:binclass + NumberOfFeatures:28 + NumberOfInstances:1.5M + """ + dataset_name = 'higgs1m' + os.makedirs(dataset_dir, exist_ok=True) + + url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz' + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + nrows_train, nrows_test, dtype = 1000000, 500000, np.float32 + data: Any = pd.read_csv(local_url, delimiter=",", header=None, + compression="gzip", dtype=dtype, nrows=nrows_train+nrows_test) + + data = data[list(data.columns[1:])+list(data.columns[0:1])] + n_features = data.shape[1]-1 + train_data = np.ascontiguousarray(data.values[:nrows_train, :n_features], dtype=dtype) + train_label = np.ascontiguousarray(data.values[:nrows_train, n_features], dtype=dtype) + test_data = np.ascontiguousarray( + data.values[nrows_train: nrows_train + nrows_test, : n_features], + dtype=dtype) + test_label = np.ascontiguousarray( + data.values[nrows_train: nrows_train + nrows_test, n_features], + dtype=dtype) + for data, name in zip((train_data, test_data, train_label, test_label), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def ijcnn(dataset_dir: Path) -> bool: + """ + Author: Danil Prokhorov. + libSVM,AAD group + Cite: Danil Prokhorov. IJCNN 2001 neural network competition. + Slide presentation in IJCNN'01, + Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf.
+ + Classification task. n_classes = 2. + ijcnn X train dataset (153344, 22) + ijcnn y train dataset (153344, 1) + ijcnn X test dataset (38337, 22) + ijcnn y test dataset (38337, 1) + """ + dataset_name = 'ijcnn' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='ijcnn', return_X_y=True, + as_frame=False, data_home=dataset_dir) + X = pd.DataFrame(X.todense()) + y = pd.DataFrame(y) + + y[y == -1] = 0 + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.2, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def klaverjas(dataset_dir: Path) -> bool: + """ + Abstract: + Klaverjas is an example of the Jack-Nine card games, + which are characterized as trick-taking games where the Jack + and nine of the trump suit are the highest-ranking trumps, and + the tens and aces of other suits are the most valuable cards + of these suits. It is played by four players in two teams. + + Task Information: + Classification task. n_classes = 2. + klaverjas X train dataset (196308, 32) + klaverjas y train dataset (196308, 1) + klaverjas X test dataset (785233, 32) + klaverjas y test dataset (785233, 1) + """ + dataset_name = 'klaverjas' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='Klaverjas2018', return_X_y=True, + as_frame=True, data_home=dataset_dir) + + y = y.cat.codes + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, train_size=0.2, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def santander(dataset_dir: Path) -> bool: + """ + # TODO: add a loading instruction + """ + return False + + +def skin_segmentation(dataset_dir: Path) -> bool: + """ + Abstract: + The Skin Segmentation dataset is constructed over B, G, R color space. + Skin and Nonskin dataset is generated using skin textures from + face images of diversity of age, gender, and race people. + Author: Rajen Bhatt, Abhinav Dhall, rajen.bhatt '@' gmail.com, IIT Delhi. + + Classification task. n_classes = 2.
+ skin_segmentation X train dataset (196045, 3) + skin_segmentation y train dataset (196045, 1) + skin_segmentation X test dataset (49012, 3) + skin_segmentation y test dataset (49012, 1) + """ + dataset_name = 'skin_segmentation' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='skin-segmentation', + return_X_y=True, as_frame=True, data_home=dataset_dir) + y = y.astype(int) + y[y == 2] = 0 + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.2, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True diff --git a/datasets/loader_multiclass.py b/datasets/loader_multiclass.py new file mode 100644 index 000000000..69b1da1e6 --- /dev/null +++ b/datasets/loader_multiclass.py @@ -0,0 +1,290 @@ +# =============================================================================== +# Copyright 2020-2021 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# =============================================================================== + +import logging +import os +import tarfile +from pathlib import Path +from typing import Any + +import numpy as np +import pandas as pd +from sklearn.datasets import fetch_covtype, fetch_openml +from sklearn.model_selection import train_test_split + +from .loader_utils import count_lines, read_libsvm_msrank, retrieve + + +def connect(dataset_dir: Path) -> bool: + """ + Source: + UC Irvine Machine Learning Repository + http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.htm + + Classification task. n_classes = 3. + connect X train dataset (60801, 126) + connect y train dataset (60801, 1) + connect X test dataset (6756, 126) + connect y test dataset (6756, 1) + """ + dataset_name = 'connect' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='connect-4', return_X_y=True, + as_frame=False, data_home=dataset_dir) + X = pd.DataFrame(X.todense()) + y = pd.DataFrame(y) + y = y.astype(int) + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.1, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def covertype(dataset_dir: Path) -> bool: + """ + Abstract: This is the original version of the famous + covertype dataset in ARFF format. + Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson + Source: [original](https://archive.ics.uci.edu/ml/datasets/covertype) + + Classification task. n_classes = 7. 
+ covertype X train dataset (390852, 54) + covertype y train dataset (390852, 1) + covertype X test dataset (97713, 54) + covertype y test dataset (97713, 1) + """ + dataset_name = 'covertype' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='covertype', version=3, return_X_y=True, + as_frame=True, data_home=dataset_dir) + y = y.astype(int) + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.4, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def covtype(dataset_dir: Path) -> bool: + """ + Cover type dataset from UCI machine learning repository + https://archive.ics.uci.edu/ml/datasets/covertype + + y contains 7 unique class labels from 1 to 7 inclusive. + TaskType:multiclass + NumberOfFeatures:54 + NumberOfInstances:581012 + """ + dataset_name = 'covtype' + os.makedirs(dataset_dir, exist_ok=True) + + logging.info(f'Started loading {dataset_name}') + X, y = fetch_covtype(return_X_y=True) # pylint: disable=unexpected-keyword-arg + logging.info(f'{dataset_name} is loaded, started parsing...') + + X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=77, + test_size=0.2, + ) + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def letters(dataset_dir: Path) -> bool: + """ + http://archive.ics.uci.edu/ml/datasets/Letter+Recognition + + TaskType:multiclass + NumberOfFeatures:16 + NumberOfInstances:20000 + """ + dataset_name = 'letters' + os.makedirs(dataset_dir, exist_ok=True) + + url = ('http://archive.ics.uci.edu/ml/machine-learning-databases/' + + 'letter-recognition/letter-recognition.data') + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + letters = pd.read_csv(local_url, header=None) + X = letters.iloc[:, 1:].values + y: Any = letters.iloc[:, 0] + y = y.astype('category').cat.codes.values + + X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) + + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def mlsr(dataset_dir: Path) -> bool: + """ + # TODO: add a loading instruction + """ + return False + + +def mnist(dataset_dir: Path) -> bool: + """ + Abstract: + The MNIST database of handwritten digits with 784 features. + It can be split in a training set of the first 60,000 examples, + and a test set of 10,000 examples + Source: + Yann LeCun, Corinna Cortes, Christopher J.C. Burges + http://yann.lecun.com/exdb/mnist/ + + Classification task. n_classes = 10.
+ mnist X train dataset (60000, 784) + mnist y train dataset (60000, 1) + mnist X test dataset (10000, 784) + mnist y test dataset (10000, 1) + """ + dataset_name = 'mnist' + + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='mnist_784', return_X_y=True, + as_frame=True, data_home=dataset_dir) + y = y.astype(int) + X = X / 255 + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=10000, shuffle=False) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def msrank(dataset_dir: Path) -> bool: + """ + Dataset from szilard benchmarks: https://github.com/szilard/GBM-perf + + TaskType:multiclass + NumberOfFeatures:137 + NumberOfInstances:1.2M + """ + dataset_name = 'msrank' + os.makedirs(dataset_dir, exist_ok=True) + url = "https://storage.mds.yandex.net/get-devtools-opensource/471749/msrank.tar.gz" + local_url = os.path.join(dataset_dir, os.path.basename(url)) + unzipped_url = os.path.join(dataset_dir, "MSRank") + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + if not os.path.isdir(unzipped_url): + logging.info(f'{dataset_name} is loaded, unzipping...') + tar = tarfile.open(local_url, "r:gz") + tar.extractall(dataset_dir) + tar.close() + logging.info(f'{dataset_name} is unzipped, started parsing...') + + sets = [] + labels = [] + n_features = 137 + + for set_name in ['train.txt', 'vali.txt', 'test.txt']: + file_name = os.path.join(unzipped_url, set_name) + + n_samples = count_lines(file_name) + with open(file_name, 'r') as file_obj: + X, y = read_libsvm_msrank(file_obj, n_samples, n_features, np.float32) + + sets.append(X) + labels.append(y) + + sets[0] = np.vstack((sets[0], sets[1])) + labels[0] = np.hstack((labels[0], labels[1])) + + sets = [np.ascontiguousarray(sets[i]) for i in [0, 2]] + labels = [np.ascontiguousarray(labels[i]) for i in [0, 2]] + + for data, name in zip((sets[0], sets[1], labels[0], labels[1]), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def plasticc(dataset_dir: Path) -> bool: + """ + # TODO: add a loading instruction + """ + return False + + +def sensit(dataset_dir: Path) -> bool: + """ + Abstract: Vehicle classification in distributed sensor networks. + Author: M. Duarte, Y. H.
Hu + Source: [original](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) + + Multiclass classification task + sensit X train dataset (78822, 100) + sensit y train dataset (78822, 1) + sensit X test dataset (19706, 100) + sensit y test dataset (19706, 1) + """ + dataset_name = 'sensit' + os.makedirs(dataset_dir, exist_ok=True) + + X, y = fetch_openml(name='SensIT-Vehicle-Combined', + return_X_y=True, as_frame=False, data_home=dataset_dir) + X = pd.DataFrame(X.todense()) + y = pd.DataFrame(y) + y = y.astype(int) + + logging.info(f'{dataset_name} is loaded, started parsing...') + + x_train, x_test, y_train, y_test = train_test_split( + X, y, test_size=0.2, random_state=42) + for data, name in zip((x_train, x_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True diff --git a/datasets/loader_regression.py b/datasets/loader_regression.py new file mode 100644 index 000000000..c19cdf55c --- /dev/null +++ b/datasets/loader_regression.py @@ -0,0 +1,102 @@ +# =============================================================================== +# Copyright 2020-2021 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# =============================================================================== + +import logging +import os +from pathlib import Path +from typing import Any + +import numpy as np +import pandas as pd +from sklearn.model_selection import train_test_split + +from .loader_utils import retrieve + + +def abalone(dataset_dir: Path) -> bool: + """ + https://archive.ics.uci.edu/ml/machine-learning-databases/abalone + + TaskType:regression + NumberOfFeatures:8 + NumberOfInstances:4177 + """ + dataset_name = 'abalone' + os.makedirs(dataset_dir, exist_ok=True) + + url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data' + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + abalone: Any = pd.read_csv(local_url, header=None) + abalone[0] = abalone[0].astype('category').cat.codes + X = abalone.iloc[:, :-1].values + y = abalone.iloc[:, -1].values + + X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) + + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True + + +def mortgage_first_q(dataset_dir: Path) -> bool: + """ + # TODO: add a loading instruction + """ + return False + + +def year_prediction_msd(dataset_dir: Path) -> bool: + """ + YearPredictionMSD dataset from UCI repository + https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd + + TaskType:regression + NumberOfFeatures:90 + NumberOfInstances:515345 + """ + dataset_name = 'year_prediction_msd' + os.makedirs(dataset_dir, exist_ok=True) + + url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt' \ + '.zip' + local_url = os.path.join(dataset_dir, os.path.basename(url)) + if not os.path.isfile(local_url): + logging.info(f'Started loading {dataset_name}') + retrieve(url, local_url) + logging.info(f'{dataset_name} is loaded, started parsing...') + + year = pd.read_csv(local_url, header=None) + X = year.iloc[:, 1:].to_numpy(dtype=np.float32) + y = year.iloc[:, 0].to_numpy(dtype=np.float32) + + X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, + train_size=463715, + test_size=51630) + + for data, name in zip((X_train, X_test, y_train, y_test), + ('x_train', 'x_test', 'y_train', 'y_test')): + filename = f'{dataset_name}_{name}.npy' + np.save(os.path.join(dataset_dir, filename), data) + logging.info(f'dataset {dataset_name} is ready.') + return True diff --git a/datasets/loader_utils.py b/datasets/loader_utils.py new file mode 100755 index 000000000..29366eccb --- /dev/null +++ b/datasets/loader_utils.py @@ -0,0 +1,76 @@ +# =============================================================================== +# Copyright 2020-2021 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# =============================================================================== + +import re +from urllib.request import urlretrieve + +import numpy as np +import tqdm + +pbar: tqdm.tqdm = None + + +def _show_progress(block_num: int, block_size: int, total_size: int) -> None: + global pbar + if pbar is None: + pbar = tqdm.tqdm(total=total_size / 1024, unit='kB') + + downloaded = block_num * block_size + if downloaded < total_size: + pbar.update(block_size / 1024) + else: + pbar.close() + pbar = None + + +def retrieve(url: str, filename: str) -> None: + urlretrieve(url, filename, reporthook=_show_progress) + + +def read_libsvm_msrank(file_obj, n_samples, n_features, dtype): + X = np.zeros((n_samples, n_features)) + y = np.zeros((n_samples,)) + + counter = 0 + + regexp = re.compile(r'[A-Za-z0-9]+:(-?\d*\.?\d+)') + + for line in file_obj: + line = str(line).replace("\\n'", "") + line = regexp.sub(r'\g<1>', line) + line = line.rstrip(" \n\r").split(' ') + + y[counter] = int(line[0]) + X[counter] = [float(i) for i in line[1:]] + + counter += 1 + if counter == n_samples: + break + + return np.array(X, dtype=dtype), np.array(y, dtype=dtype) + + +def _make_gen(reader): + b = reader(1024 * 1024) + while b: + yield b + b = reader(1024 * 1024) + + +def count_lines(filename): + with open(filename, 'rb') as f: + f_gen = _make_gen(f.read) + return sum(buf.count(b'\n') for buf in f_gen) diff --git a/runner.py b/runner.py index a77c29a51..c4cba2449 100755 --- a/runner.py +++ b/runner.py @@ -18,35 +18,12 @@ import json import logging import os -import pathlib import socket import sys +from typing import Any, Dict, List, Union import datasets.make_datasets as make_datasets import utils -from datasets.load_datasets import try_load_dataset - - -def generate_cases(params): - ''' - Generate cases for benchmarking by iterating of - parameters values - ''' - global cases - if len(params) == 0: - return cases - prev_length = len(cases) - param_name = list(params.keys())[0] - n_param_values = len(params[param_name]) - cases = cases * n_param_values - dashes = '-' if len(param_name) == 1 else '--' - for i in range(n_param_values): - for j in range(prev_length): - cases[prev_length * i + j] += f' {dashes}{param_name} ' \ - + f'{params[param_name][i]}' - del params[param_name] - generate_cases(params) - if __name__ == '__main__': parser = argparse.ArgumentParser() @@ -54,7 +31,7 @@ def generate_cases(params): default='configs/config_example.json', help='Path to configuration files') parser.add_argument('--dummy-run', default=False, action='store_true', - help='Run configuration parser and datasets generation' + help='Run configuration parser and datasets generation ' 'without benchmarks running') parser.add_argument('--no-intel-optimized', default=False, action='store_true', help='Use no intel optimized version. ' @@ -69,7 +46,6 @@ def generate_cases(params): help='Create an Excel report based on benchmarks results. 
' 'Need "openpyxl" library') args = parser.parse_args() - env = os.environ.copy() logging.basicConfig( stream=sys.stdout, format='%(levelname)s: %(message)s', level=args.verbose) @@ -78,7 +54,7 @@ def generate_cases(params): # make directory for data if it doesn't exist os.makedirs('data', exist_ok=True) - json_result = { + json_result: Dict[str, Union[Dict[str, Any], List[Any]]] = { 'hardware': utils.get_hw_parameters(), 'software': utils.get_sw_parameters(), 'results': [] @@ -90,51 +66,39 @@ def generate_cases(params): with open(config_name, 'r') as config_file: config = json.load(config_file) - if 'omp_env' not in config.keys(): - config['omp_env'] = [] # get parameters that are common for all cases common_params = config['common'] for params_set in config['cases']: - cases = [''] params = common_params.copy() params.update(params_set.copy()) algorithm = params['algorithm'] libs = params['lib'] + if not isinstance(libs, list): + libs = [libs] del params['dataset'], params['algorithm'], params['lib'] - generate_cases(params) + cases = utils.generate_cases(params) logging.info(f'{algorithm} algorithm: {len(libs) * len(cases)} case(s),' f' {len(params_set["dataset"])} dataset(s)\n') for dataset in params_set['dataset']: if dataset['source'] in ['csv', 'npy']: - train_data = dataset["training"] - file_train_data_x = train_data["x"] - paths = f'--file-X-train {file_train_data_x}' - if 'y' in dataset['training'].keys(): - file_train_data_y = train_data["y"] - paths += f' --file-y-train {file_train_data_y}' - if 'testing' in dataset.keys(): - test_data = dataset["testing"] - file_test_data_x = test_data["x"] - paths += f' --file-X-test {file_test_data_x}' - if 'y' in dataset['testing'].keys(): - file_test_data_y = test_data["y"] - paths += f' --file-y-test {file_test_data_y}' - if 'name' in dataset.keys(): - dataset_name = dataset['name'] - else: - dataset_name = 'unknown' - - if not utils.is_exists_files([file_train_data_x]): - directory_dataset = pathlib.Path(file_train_data_x).parent - if not try_load_dataset(dataset_name=dataset_name, - output_directory=directory_dataset): - logging.warning(f'Dataset {dataset_name} ' - 'could not be loaded. \n' - 'Check the correct name or expand ' - 'the download in the folder dataset.') - continue - + dataset_name = dataset['name'] if 'name' in dataset else 'unknown' + if 'training' not in dataset or \ + 'x' not in dataset['training'] or \ + not utils.find_the_dataset(dataset_name, + dataset['training']['x']): + logging.warning( + f'Dataset {dataset_name} could not be loaded. 
\n' + 'Check the correct name or expand the download in ' + 'the folder dataset.') + continue + paths = '--file-X-train ' + dataset['training']["x"] + if 'y' in dataset['training']: + paths += ' --file-y-train ' + dataset['training']["y"] + if 'testing' in dataset: + paths += ' --file-X-test ' + dataset["testing"]["x"] + if 'y' in dataset['testing']: + paths += ' --file-y-test ' + dataset["testing"]["y"] elif dataset['source'] == 'synthetic': class GenerationArgs: classes: int @@ -151,7 +115,7 @@ class GenerationArgs: gen_args = GenerationArgs() paths = '' - if 'seed' in params_set.keys(): + if 'seed' in params_set: gen_args.seed = params_set['seed'] else: gen_args.seed = 777 @@ -161,10 +125,10 @@ class GenerationArgs: gen_args.type = dataset['type'] gen_args.samples = dataset['training']['n_samples'] gen_args.features = dataset['n_features'] - if 'n_classes' in dataset.keys(): + if 'n_classes' in dataset: gen_args.classes = dataset['n_classes'] cls_num_for_file = f'-{dataset["n_classes"]}' - elif 'n_clusters' in dataset.keys(): + elif 'n_clusters' in dataset: gen_args.clusters = dataset['n_clusters'] cls_num_for_file = f'-{dataset["n_clusters"]}' else: @@ -179,7 +143,7 @@ class GenerationArgs: gen_args.filey = f'{file_prefix}y-train{file_postfix}' paths += f' --file-y-train {gen_args.filey}' - if 'testing' in dataset.keys(): + if 'testing' in dataset: gen_args.test_samples = dataset['testing']['n_samples'] gen_args.filextest = f'{file_prefix}X-test{file_postfix}' paths += f' --file-X-test {gen_args.filextest}' @@ -204,26 +168,20 @@ class GenerationArgs: logging.warning('Unknown dataset source. Only synthetics datasets ' 'and csv/npy files are supported now') - omp_env = utils.get_omp_env() no_intel_optimize = \ '--no-intel-optimized ' if args.no_intel_optimized else '' for lib in libs: - env = os.environ.copy() - if lib == 'xgboost': - for var in config['omp_env']: - env[var] = omp_env[var] for i, case in enumerate(cases): command = f'python {lib}_bench/{algorithm}.py ' \ + no_intel_optimize \ + f'--arch {hostname} {case} {paths} ' \ + f'--dataset-name {dataset_name}' - while ' ' in command: - command = command.replace(' ', ' ') + command = ' '.join(command.split()) logging.info(command) if not args.dummy_run: case = f'{lib},{algorithm} ' + case stdout, stderr = utils.read_output_from_command( - command, env=env) + command, env=os.environ.copy()) stdout, extra_stdout = utils.filter_stdout(stdout) stderr = utils.filter_stderr(stderr) @@ -233,8 +191,8 @@ class GenerationArgs: stderr += f'CASE {case} EXTRA OUTPUT:\n' \ + f'{extra_stdout}\n' try: - json_result['results'].extend( - json.loads(stdout)) + if isinstance(json_result['results'], list): + json_result['results'].extend(json.loads(stdout)) except json.JSONDecodeError as decoding_exception: stderr += f'CASE {case} JSON DECODING ERROR:\n' \ + f'{decoding_exception}\n{stdout}\n' diff --git a/sklearn_bench/README.md b/sklearn_bench/README.md index b21da94da..8cca0f81d 100644 --- a/sklearn_bench/README.md +++ b/sklearn_bench/README.md @@ -1,15 +1,14 @@ - -## How to create conda environment for benchmarking +# How to create conda environment for benchmarking If you want to test scikit-learn, then use ```bash pip install -r sklearn_bench/requirements.txt # or -conda install -c intel scikit-learn scikit-learn-intelex pandas +conda install -c intel scikit-learn scikit-learn-intelex pandas tqdm ``` -## Algorithms parameters +## Algorithms parameters You can launch benchmarks for each algorithm separately. 
The tables below list all supported parameters for each algorithm: @@ -27,7 +26,8 @@ You can launch benchmarks for each algorithm separately. The tables below list a - [SVC](#svc) - [train_test_split](#train_test_split) -#### General +### General + | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | |num-threads|int|-1| The number of threads to use| @@ -50,14 +50,14 @@ You can launch benchmarks for each algorithm separately. The tables below list a |seed|int|12345|Seed to pass as random_state| |dataset-name|str|None|Dataset name| +### DBSCAN -#### DBSCAN | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | | epsilon | float | 10 | Radius of neighborhood of a point| | min_samples | int | 5 | The minimum number of samples required in a 'neighborhood to consider a point a core point | -#### RandomForestClassifier +### RandomForestClassifier | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -70,7 +70,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a | min-impurity-decrease | float | 0 | Needed impurity decrease for node splitting | | no-bootstrap | store_false | True | Don't control bootstraping | -#### RandomForestRegressor +### RandomForestRegressor | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -84,13 +84,13 @@ You can launch benchmarks for each algorithm separately. The tables below list a | no-bootstrap | action | True | Don't control bootstraping | | use-sklearn-class | action | | Force use of sklearn.ensemble.RandomForestClassifier | -#### pairwise_distances +### pairwise_distances | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | | metric | str | cosine | *cosine* or *correlation* Metric to test for pairwise distances | -#### KMeans +### KMeans | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -99,7 +99,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a | maxiter | inte | 100 | Maximum number of iterations | | n-clusters | int | | The number of clusters | -#### KNeighborsClassifier +### KNeighborsClassifier | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -108,13 +108,13 @@ You can launch benchmarks for each algorithm separately. The tables below list a | method | str | brute | Algorithm used to compute the nearest neighbors | | metric | str | euclidean | Distance metric to use | -#### LinearRegression +### LinearRegression | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | | no-fit-intercept | action | True | Don't fit intercept (assume data already centered) | -#### LogisticRegression +### LogisticRegression | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -125,7 +125,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a | C | float | 1.0 | Regularization parameter | | tol | float | None | Tolerance for solver | -#### PCA +### PCA | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -133,7 +133,7 @@ You can launch benchmarks for each algorithm separately. 
The tables below list a | n-components | int | None | The number of components to find | | whiten | action | False | Perform whitening | -#### Ridge +### Ridge | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -141,7 +141,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a | solver | str | auto | Solver used for training | | alpha | float | 1.0 | Regularization strength | -#### SVC +### SVC | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | @@ -152,7 +152,7 @@ You can launch benchmarks for each algorithm separately. The tables below list a | tol | float | 1e-16 | Tolerance passed to sklearn.svm.SVC | | probability | action | True | Use probability for SVC | -#### train_test_split +### train_test_split | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | diff --git a/sklearn_bench/requirements.txt b/sklearn_bench/requirements.txt index d25373e5e..28c7de80d 100755 --- a/sklearn_bench/requirements.txt +++ b/sklearn_bench/requirements.txt @@ -2,3 +2,4 @@ scikit-learn pandas scikit-learn-intelex openpyxl +tqdm diff --git a/utils.py b/utils.py index 40eef714e..5593ef443 100755 --- a/utils.py +++ b/utils.py @@ -15,24 +15,25 @@ # =============================================================================== import json -import logging -import multiprocessing import os +import pathlib import platform import subprocess import sys +from typing import Any, Dict, List, Tuple, Union, cast +from datasets.load_datasets import try_load_dataset -def filter_stderr(text): + +def filter_stderr(text: str) -> str: # delete 'Intel(R) Extension for Scikit-learn usage in sklearn' messages - fake_error_message = 'Intel(R) Extension for Scikit-learn* enabled ' + \ - '(https://github.com/intel/scikit-learn-intelex)' - while fake_error_message in text: - text = text.replace(fake_error_message, '') - return text + fake_error_message = ('Intel(R) Extension for Scikit-learn* enabled ' + + '(https://github.com/intel/scikit-learn-intelex)') + + return ''.join(text.split(fake_error_message)) -def filter_stdout(text): +def filter_stdout(text: str) -> Tuple[str, str]: verbosity_letters = 'EWIDT' filtered, extra = '', '' for line in text.split('\n'): @@ -50,14 +51,13 @@ def filter_stdout(text): return filtered, extra -def is_exists_files(files): - for f in files: - if not os.path.isfile(f): - return False - return True +def find_the_dataset(name: str, fullpath: str) -> bool: + return os.path.isfile(fullpath) or try_load_dataset( + dataset_name=name, output_directory=pathlib.Path(fullpath).parent) -def read_output_from_command(command, env=os.environ.copy()): +def read_output_from_command(command: str, + env: Dict[str, str] = os.environ.copy()) -> Tuple[str, str]: if "PYTHONPATH" in env: env["PYTHONPATH"] += ":" + os.path.dirname(os.path.abspath(__file__)) else: @@ -67,104 +67,74 @@ def read_output_from_command(command, env=os.environ.copy()): return res.stdout[:-1], res.stderr[:-1] -def _is_ht_enabled(): +def parse_lscpu_lscl_info(command_output: str) -> Dict[str, str]: + res: Dict[str, str] = {} + for elem in command_output.strip().split('\n'): + splt = elem.split(':') + res[splt[0]] = splt[1] + return res + + +def get_hw_parameters() -> Dict[str, Union[Dict[str, Any], float]]: + if 'Linux' not in platform.platform(): + return {} + + hw_params: Dict[str, Union[Dict[str, str], float]] = {'CPU': {}} + # get CPU information + lscpu_info, _ = read_output_from_command('lscpu') + lscpu_info = ' 
'.join(lscpu_info.split()) + for line in lscpu_info.split('\n'): + k, v = line.split(": ")[:2] + if k == 'CPU MHz': + continue + cast(Dict[str, str], hw_params['CPU'])[k] = v + + # get RAM size + mem_info, _ = read_output_from_command('free -b') + mem_info = mem_info.split('\n')[1] + mem_info = ' '.join(mem_info.split()) + hw_params['RAM size[GB]'] = int(mem_info.split(' ')[1]) / 2 ** 30 + + # get Intel GPU information try: - cpu_info, _ = read_output_from_command('lscpu') - cpu_info = cpu_info.split('\n') - for el in cpu_info: - if 'Thread(s) per core' in el: - threads_per_core = int(el[-1]) - if threads_per_core > 1: - return True - else: - return False - return False - except FileNotFoundError: - logging.info('Impossible to check hyperthreading via lscpu') - return False - - -def get_omp_env(): - cpu_count = multiprocessing.cpu_count() - omp_num_threads = str(cpu_count // 2) if _is_ht_enabled() else str(cpu_count) - - omp_env = { - 'OMP_PLACES': f'{{0}}:{cpu_count}:1', - 'OMP_NUM_THREADS': omp_num_threads - } - return omp_env - - -def parse_lscpu_lscl_info(command_output): - command_output = command_output.strip().split('\n') - for i in range(len(command_output)): - command_output[i] = command_output[i].split(':') - return {line[0].strip(): line[1].strip() for line in command_output} - - -def get_hw_parameters(): - hw_params = {} - - if 'Linux' in platform.platform(): - # get CPU information - lscpu_info, _ = read_output_from_command('lscpu') - hw_params.update({'CPU': parse_lscpu_lscl_info(lscpu_info)}) - if 'CPU MHz' in hw_params['CPU'].keys(): - del hw_params['CPU']['CPU MHz'] - - # get RAM size - mem_info, _ = read_output_from_command('free -b') - mem_info = mem_info.split('\n')[1] - while ' ' in mem_info: - mem_info = mem_info.replace(' ', ' ') - mem_info = int(mem_info.split(' ')[1]) / 2 ** 30 - hw_params.update({'RAM size[GB]': mem_info}) - - # get Intel GPU information - try: - lsgpu_info, _ = read_output_from_command( - 'lscl --device-type=gpu --platform-vendor=Intel') - device_num = 0 - start_idx = lsgpu_info.find('Device ') - while start_idx >= 0: - start_idx = lsgpu_info.find(':', start_idx) + 1 - end_idx = lsgpu_info.find('Device ', start_idx) - platform_info = parse_lscpu_lscl_info(lsgpu_info[start_idx:end_idx]) - hw_params.update({f'GPU Intel #{device_num + 1}': platform_info}) - device_num += 1 - start_idx = end_idx - except (FileNotFoundError, json.JSONDecodeError): - pass - - # get Nvidia GPU information - try: - gpu_info, _ = read_output_from_command( - 'nvidia-smi --query-gpu=name,memory.total,driver_version,pstate ' - '--format=csv,noheader') - gpu_info = gpu_info.split(', ') - hw_params.update({ - 'GPU Nvidia': { - 'Name': gpu_info[0], - 'Memory size': gpu_info[1], - 'Performance mode': gpu_info[3] - } - }) - except (FileNotFoundError, json.JSONDecodeError): - pass + lsgpu_info, _ = read_output_from_command( + 'lscl --device-type=gpu --platform-vendor=Intel') + device_num = 0 + start_idx = lsgpu_info.find('Device ') + while start_idx >= 0: + start_idx = lsgpu_info.find(':', start_idx) + 1 + end_idx = lsgpu_info.find('Device ', start_idx) + hw_params[f'GPU Intel #{device_num + 1}'] = parse_lscpu_lscl_info( + lsgpu_info[start_idx: end_idx]) + device_num += 1 + start_idx = end_idx + except (FileNotFoundError, json.JSONDecodeError): + pass + # get Nvidia GPU information + try: + gpu_info, _ = read_output_from_command( + 'nvidia-smi --query-gpu=name,memory.total,driver_version,pstate ' + '--format=csv,noheader') + gpu_info_arr = gpu_info.split(', ') + hw_params['GPU 
Nvidia'] = {
+            'Name': gpu_info_arr[0],
+            'Memory size': gpu_info_arr[1],
+            'Performance mode': gpu_info_arr[3]
+        }
+    except (FileNotFoundError, json.JSONDecodeError):
+        pass
     return hw_params
 
 
-def get_sw_parameters():
+def get_sw_parameters() -> Dict[str, Dict[str, Any]]:
     sw_params = {}
     try:
         gpu_info, _ = read_output_from_command(
             'nvidia-smi --query-gpu=name,memory.total,driver_version,pstate '
             '--format=csv,noheader')
-        gpu_info = gpu_info.split(', ')
-
-        sw_params.update(
-            {'GPU_driver': {'version': gpu_info[2]}})
+        info_arr = gpu_info.split(', ')
+        sw_params['GPU_driver'] = {'version': info_arr[2]}
         # alert if GPU is already running any processes
         gpu_processes, _ = read_output_from_command(
             'nvidia-smi --query-compute-apps=name,pid,used_memory '
@@ -179,14 +149,35 @@ def get_sw_parameters():
     try:
         conda_list, _ = read_output_from_command('conda list --json')
         needed_columns = ['version', 'build_string', 'channel']
-        conda_list = json.loads(conda_list)
-        for pkg in conda_list:
+        conda_list_json: List[Dict[str, str]] = json.loads(conda_list)
+        for pkg in conda_list_json:
             pkg_info = {}
             for col in needed_columns:
-                if col in pkg.keys():
-                    pkg_info.update({col: pkg[col]})
-            sw_params.update({pkg['name']: pkg_info})
+                if col in pkg:
+                    pkg_info[col] = pkg[col]
+            sw_params[pkg['name']] = pkg_info
     except (FileNotFoundError, json.JSONDecodeError):
         pass
 
     return sw_params
+
+
+def generate_cases(params: Dict[str, Union[List[Any], Any]]) -> List[str]:
+    '''
+    Generate cases for benchmarking by iterating over the parameter values
+    '''
+    commands = ['']
+    for param, values in params.items():
+        if isinstance(values, list):
+            prev_len = len(commands)
+            commands *= len(values)
+            dashes = '-' if len(param) == 1 else '--'
+            for command_num in range(prev_len):
+                for value_num in range(len(values)):
+                    commands[prev_len * value_num + command_num] += ' ' + \
+                        dashes + param + ' ' + str(values[value_num])
+        else:
+            dashes = '-' if len(param) == 1 else '--'
+            for command_num in range(len(commands)):
+                commands[command_num] += ' ' + dashes + param + ' ' + str(values)
+    return commands
diff --git a/xgboost_bench/README.md b/xgboost_bench/README.md
index 2b4e93ec5..45f27be87 100644
--- a/xgboost_bench/README.md
+++ b/xgboost_bench/README.md
@@ -1,16 +1,17 @@
-## How to create conda environment for benchmarking
+# How to create conda environment for benchmarking
 
 ```bash
 pip install -r xgboost_bench/requirements.txt
 # or
-conda install -c conda-forge xgboost pandas
+conda install -c conda-forge xgboost scikit-learn pandas tqdm
 ```
 
-## Algorithms parameters 
+## Algorithms parameters
 
 You can launch benchmarks for each algorithm separately. The table below lists all supported parameters for each algorithm.
 
-#### General
+### General
+
 | parameter Name | Type | default value | description |
 | ----- | ---- |---- |---- |
 |num-threads|int|-1| The number of threads to use|
@@ -33,7 +34,7 @@ You can launch benchmarks for each algorithm separately.
The table below lists a |seed|int|12345|Seed to pass as random_state| |dataset-name|str|None|Dataset name| -#### GradientBoostingTrees +### GradientBoostingTrees | parameter Name | Type | default value | description | | ----- | ---- |---- |---- | diff --git a/xgboost_bench/gbt.py b/xgboost_bench/gbt.py index c903e6008..0c44acfaa 100644 --- a/xgboost_bench/gbt.py +++ b/xgboost_bench/gbt.py @@ -15,7 +15,6 @@ # =============================================================================== import argparse -import os import bench import numpy as np @@ -34,57 +33,60 @@ def convert_xgb_predictions(y_pred, objective): return y_pred -parser = argparse.ArgumentParser(description='xgboost gradient boosted trees ' - 'benchmark') +parser = argparse.ArgumentParser(description='xgboost gradient boosted trees benchmark') + -parser.add_argument('--n-estimators', type=int, default=100, - help='Number of gradient boosted trees') -parser.add_argument('--learning-rate', '--eta', type=float, default=0.3, - help='Step size shrinkage used in update ' - 'to prevents overfitting') -parser.add_argument('--min-split-loss', '--gamma', type=float, default=0, - help='Minimum loss reduction required to make' - ' partition on a leaf node') -parser.add_argument('--max-depth', type=int, default=6, - help='Maximum depth of a tree') -parser.add_argument('--min-child-weight', type=float, default=1, - help='Minimum sum of instance weight needed in a child') -parser.add_argument('--max-delta-step', type=float, default=0, - help='Maximum delta step we allow each leaf output to be') -parser.add_argument('--subsample', type=float, default=1, - help='Subsample ratio of the training instances') parser.add_argument('--colsample-bytree', type=float, default=1, help='Subsample ratio of columns ' 'when constructing each tree') -parser.add_argument('--reg-lambda', type=float, default=1, - help='L2 regularization term on weights') -parser.add_argument('--reg-alpha', type=float, default=0, - help='L1 regularization term on weights') -parser.add_argument('--tree-method', type=str, required=True, - help='The tree construction algorithm used in XGBoost') -parser.add_argument('--scale-pos-weight', type=float, default=1, - help='Controls a balance of positive and negative weights') +parser.add_argument('--count-dmatrix', default=False, action='store_true', + help='Count DMatrix creation in time measurements') +parser.add_argument('--enable-experimental-json-serialization', default=True, + choices=('True', 'False'), help='Use JSON to store memory snapshots') parser.add_argument('--grow-policy', type=str, default='depthwise', help='Controls a way new nodes are added to the tree') -parser.add_argument('--max-leaves', type=int, default=0, - help='Maximum number of nodes to be added') +parser.add_argument('--inplace-predict', default=False, action='store_true', + help='Perform inplace_predict instead of default') +parser.add_argument('--learning-rate', '--eta', type=float, default=0.3, + help='Step size shrinkage used in update ' + 'to prevents overfitting') parser.add_argument('--max-bin', type=int, default=256, help='Maximum number of discrete bins to ' 'bucket continuous features') +parser.add_argument('--max-delta-step', type=float, default=0, + help='Maximum delta step we allow each leaf output to be') +parser.add_argument('--max-depth', type=int, default=6, + help='Maximum depth of a tree') +parser.add_argument('--max-leaves', type=int, default=0, + help='Maximum number of nodes to be added') +parser.add_argument('--min-child-weight', 
type=float, default=1,
+                    help='Minimum sum of instance weight needed in a child')
+parser.add_argument('--min-split-loss', '--gamma', type=float, default=0,
+                    help='Minimum loss reduction required to make'
+                         ' partition on a leaf node')
+parser.add_argument('--n-estimators', type=int, default=100,
+                    help='The number of gradient boosted trees')
 parser.add_argument('--objective', type=str, required=True,
                     choices=('reg:squarederror', 'binary:logistic',
                              'multi:softmax', 'multi:softprob'),
-                    help='Control a balance of positive and negative weights')
-parser.add_argument('--count-dmatrix', default=False, action='store_true',
-                    help='Count DMatrix creation in time measurements')
-parser.add_argument('--inplace-predict', default=False, action='store_true',
-                    help='Perform inplace_predict instead of default')
+                    help='Specifies the learning task')
+parser.add_argument('--reg-alpha', type=float, default=0,
+                    help='L1 regularization term on weights')
+parser.add_argument('--reg-lambda', type=float, default=1,
+                    help='L2 regularization term on weights')
+parser.add_argument('--scale-pos-weight', type=float, default=1,
+                    help='Controls a balance of positive and negative weights')
 parser.add_argument('--single-precision-histogram', default=False,
                     action='store_true',
                     help='Build histograms instead of double precision')
-parser.add_argument('--enable-experimental-json-serialization', default=True,
-                    choices=('True', 'False'), help='Use JSON to store memory snapshots')
+parser.add_argument('--subsample', type=float, default=1,
+                    help='Subsample ratio of the training instances')
+parser.add_argument('--tree-method', type=str, required=True,
+                    help='The tree construction algorithm used in XGBoost')
 
 params = bench.parse_args(parser)
+# Replace the benchmark-wide default seed (12345) with XGBoost's default seed (0)
+if params.seed == 12345:
+    params.seed = 0
 
 # Load and convert data
 X_train, X_test, y_train, y_test = bench.load_data(params)
@@ -119,9 +121,6 @@ def convert_xgb_predictions(y_pred, objective):
 if params.threads != -1:
     xgb_params.update({'nthread': params.threads})
 
-if 'OMP_NUM_THREADS' in os.environ.keys():
-    xgb_params['nthread'] = int(os.environ['OMP_NUM_THREADS'])
-
 if params.objective.startswith('reg'):
     task = 'regression'
     metric_name, metric_func = 'rmse', bench.rmse_score
@@ -133,6 +132,11 @@ def convert_xgb_predictions(y_pred, objective):
         params.n_classes = y_train[y_train.columns[0]].nunique()
     else:
         params.n_classes = len(np.unique(y_train))
+
+    # The covtype dataset has one more class than appears in the training data
+    if params.dataset_name == 'covtype':
+        params.n_classes += 1
+
     if params.n_classes > 2:
         xgb_params['num_class'] = params.n_classes
 
@@ -170,4 +174,4 @@ def predict():
                         params=params, functions=['gbt.fit', 'gbt.predict'],
                         times=[fit_time, predict_time], accuracy_type=metric_name,
                         accuracies=[train_metric, test_metric], data=[X_train, X_test],
-                        alg_instance=booster)
+                        alg_instance=booster, alg_params=xgb_params)
diff --git a/xgboost_bench/requirements.txt b/xgboost_bench/requirements.txt
index 1540ec04f..79bc07cc5 100755
--- a/xgboost_bench/requirements.txt
+++ b/xgboost_bench/requirements.txt
@@ -2,3 +2,4 @@ scikit-learn
 pandas
 xgboost
 openpyxl
+tqdm
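
Note on the case-generation change above: the new `utils.generate_cases` replaces the old recursive `generate_cases` in `runner.py` and expands a config case's parameter dictionary into one CLI-argument suffix per parameter combination, which `runner.py` then prefixes with `python {lib}_bench/{algorithm}.py --arch {hostname} ... {paths}`. A minimal usage sketch follows; the parameter names and values are illustrative only, not taken from a shipped config.

```python
# Illustrative sketch of utils.generate_cases; the 'max-depth' and
# 'learning-rate' parameters here are hypothetical examples.
import utils

params = {'max-depth': [4, 8], 'learning-rate': 0.1}
cases = utils.generate_cases(params)

# A list value produces one case per element; a scalar value is appended to
# every case, so `cases` is:
#   [' --max-depth 4 --learning-rate 0.1',
#    ' --max-depth 8 --learning-rate 0.1']
# runner.py then builds one benchmark command per case, e.g.
#   'python xgboost_bench/gbt.py --arch <hostname> --max-depth 4 ...'
print(cases)
```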