Commit 813cca5

Refactor code (#97)
* refactoring
* remove global
* refactor
* fix typo
* refactor?
* refactor
* refactor
* refactor?
* refactor?
* Revert "refactor?" 3509b1d
* undo unecessary change
* refactor load_data?
* refactor load_data?
* undo mistake
* undo pbar
* rewrite urlretrieve w/o urllib
* Revert "rewrite urlretrieve w/o urllib" 9b04139
* Reapply "rewrite urlretrieve w/o urllib" 9b04139
* fix bug
* Revert "rewrite urlretrieve w/o urllib" 9b04139
* fix bug
* Reapply "rewrite urlretrieve w/o urllib" 9b04139
* rewrite urlretrieve w/o urllib
* undo refactoring
* add requests to requirements
* add requests as requirement
* fix line too long
* attempt to fix mypy error
* add mising params
* autopep8 fix
* fix wrong indentation lvl
* pep8 fixes?
* undo if return None change
* not use getattr for daal4py
* debugging for tsne
* undo logging for tsne
* ignore daal4py warning
* fix typo
* suppress FutureWarning
* ignore daal4py warning
* pep8 fix
1 parent 63defad commit 813cca5

10 files changed: +144 -122 lines changed

azure-pipelines.yml

Lines changed: 76 additions & 76 deletions
@@ -1,80 +1,80 @@
 variables:
   - name: python.version
-    value: '3.8'
+    value: "3.8"
 
 jobs:
-- job: Linux_Sklearn
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - task: UsePythonVersion@0
-      displayName: 'Use Python $(python.version)'
-      inputs:
-        versionSpec: '$(python.version)'
-    - script: |
-        pip install -r sklearn_bench/requirements.txt
-        python runner.py --configs configs/testing/sklearn.json
-      displayName: Run bench
-- job: Linux_XGBoost
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - task: UsePythonVersion@0
-      displayName: 'Use Python $(python.version)'
-      inputs:
-        versionSpec: '$(python.version)'
-    - script: |
-        pip install -r xgboost_bench/requirements.txt
-        python runner.py --configs configs/testing/xgboost.json --no-intel-optimized
-      displayName: Run bench
-- job: Linux_daal4py
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - task: UsePythonVersion@0
-      displayName: 'Use Python $(python.version)'
-      inputs:
-        versionSpec: '$(python.version)'
-    - script: |
-        pip install -r daal4py_bench/requirements.txt
-        python runner.py --configs configs/testing/daal4py.json --no-intel-optimized
-      displayName: Run bench
-- job: Linux_XGBoost_and_daal4py
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - script: |
-        conda update -y -q conda
-        conda create -n bench -q -y -c conda-forge python=3.7 pandas xgboost scikit-learn daal4py tqdm
-      displayName: Create Anaconda environment
-    - script: |
-        . /usr/share/miniconda/etc/profile.d/conda.sh
-        conda activate bench
-        python runner.py --configs configs/testing/daal4py_xgboost.json --no-intel-optimized
-      displayName: Run bench
-- job: Pep8
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - task: UsePythonVersion@0
-      inputs:
-        versionSpec: '$(python.version)'
-        addToPath: true
-    - script: |
-        python -m pip install --upgrade pip setuptools
-        pip install flake8
-        flake8 --max-line-length=100 --count
-      displayName: 'PEP 8 check'
-- job: Mypy
-  pool:
-    vmImage: 'ubuntu-20.04'
-  steps:
-    - task: UsePythonVersion@0
-      inputs:
-        versionSpec: '$(python.version)'
-        addToPath: true
-    - script: |
-        python -m pip install --upgrade pip setuptools
-        pip install mypy data-science-types
-        mypy . --ignore-missing-imports
-      displayName: 'mypy check'
+  - job: Linux_Sklearn
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - task: UsePythonVersion@0
+        displayName: "Use Python $(python.version)"
+        inputs:
+          versionSpec: "$(python.version)"
+      - script: |
+          pip install -r sklearn_bench/requirements.txt
+          python runner.py --configs configs/testing/sklearn.json
+        displayName: Run bench
+  - job: Linux_XGBoost
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - task: UsePythonVersion@0
+        displayName: "Use Python $(python.version)"
+        inputs:
+          versionSpec: "$(python.version)"
+      - script: |
+          pip install -r xgboost_bench/requirements.txt
+          python runner.py --configs configs/testing/xgboost.json --no-intel-optimized
+        displayName: Run bench
+  - job: Linux_daal4py
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - task: UsePythonVersion@0
+        displayName: "Use Python $(python.version)"
+        inputs:
+          versionSpec: "$(python.version)"
+      - script: |
+          pip install -r daal4py_bench/requirements.txt
+          python runner.py --configs configs/testing/daal4py.json --no-intel-optimized
+        displayName: Run bench
+  - job: Linux_XGBoost_and_daal4py
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - script: |
+          conda update -y -q conda
+          conda create -n bench -q -y -c conda-forge python=3.7 pandas xgboost scikit-learn daal4py tqdm requests
+        displayName: Create Anaconda environment
+      - script: |
+          . /usr/share/miniconda/etc/profile.d/conda.sh
+          conda activate bench
+          python runner.py --configs configs/testing/daal4py_xgboost.json --no-intel-optimized
+        displayName: Run bench
+  - job: Pep8
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - task: UsePythonVersion@0
+        inputs:
+          versionSpec: "$(python.version)"
+          addToPath: true
+      - script: |
+          python -m pip install --upgrade pip setuptools
+          pip install flake8 requests
+          flake8 --max-line-length=100 --count
+        displayName: "PEP 8 check"
+  - job: Mypy
+    pool:
+      vmImage: "ubuntu-20.04"
+    steps:
+      - task: UsePythonVersion@0
+        inputs:
+          versionSpec: "$(python.version)"
+          addToPath: true
+      - script: |
+          python -m pip install --upgrade pip setuptools
+          pip install mypy data-science-types requests types-requests
+          mypy . --ignore-missing-imports
+        displayName: "mypy check"

bench.py

Lines changed: 19 additions & 14 deletions
@@ -389,14 +389,13 @@ def convert_data(data, dtype, data_order, data_format):
     # Secondly, change format of data
     if data_format == 'numpy':
         return data
-    elif data_format == 'pandas':
+    if data_format == 'pandas':
         import pandas as pd
 
         if data.ndim == 1:
             return pd.Series(data)
-        else:
-            return pd.DataFrame(data)
-    elif data_format == 'cudf':
+        return pd.DataFrame(data)
+    if data_format == 'cudf':
         import cudf
         import pandas as pd
 
@@ -439,16 +438,24 @@ def load_data(params, generated_data=[], add_dtype=False, label_2d=False,
     for element in full_data:
         file_arg = f'file_{element}'
         # load and convert data from npy/csv file if path is specified
+        new_dtype = int_dtype if 'y' in element and int_label else params.dtype
         if param_vars[file_arg] is not None:
             if param_vars[file_arg].name.endswith('.npy'):
                 data = np.load(param_vars[file_arg].name, allow_pickle=True)
             else:
                 data = read_csv(param_vars[file_arg].name, params)
             full_data[element] = convert_data(
                 data,
-                int_dtype if 'y' in element and int_label else params.dtype,
+                new_dtype,
                 params.data_order, params.data_format
             )
+        if full_data[element] is None:
+            # generate and convert data if it's marked and path isn't specified
+            if element in generated_data:
+                full_data[element] = convert_data(
+                    np.random.rand(*params.shape),
+                    new_dtype,
+                    params.data_order, params.data_format)
         # generate and convert data if it's marked and path isn't specified
         if full_data[element] is None and element in generated_data:
             full_data[element] = convert_data(
@@ -522,13 +529,12 @@ def print_output(library, algorithm, stages, params, functions,
             result = gen_basic_dict(library, algorithm, stage, params,
                                     data[i], alg_instance, alg_params)
             result.update({'time[s]': times[i]})
-            if metric_type is not None:
-                if isinstance(metric_type, str):
-                    result.update({f'{metric_type}': metrics[i]})
-                elif isinstance(metric_type, list):
-                    for ind, val in enumerate(metric_type):
-                        if metrics[ind][i] is not None:
-                            result.update({f'{val}': metrics[ind][i]})
+            if isinstance(metric_type, str):
+                result.update({f'{metric_type}': metrics[i]})
+            elif isinstance(metric_type, list):
+                for ind, val in enumerate(metric_type):
+                    if metrics[ind][i] is not None:
+                        result.update({f'{val}': metrics[ind][i]})
             if hasattr(params, 'n_classes'):
                 result['input_data'].update({'classes': params.n_classes})
             if hasattr(params, 'n_clusters'):
@@ -542,8 +548,7 @@ def print_output(library, algorithm, stages, params, functions,
             if 'init' in result['algorithm_parameters'].keys():
                 if not isinstance(result['algorithm_parameters']['init'], str):
                     result['algorithm_parameters']['init'] = 'random'
-            if 'handle' in result['algorithm_parameters'].keys():
-                del result['algorithm_parameters']['handle']
+            result['algorithm_parameters'].pop('handle', None)
             output.append(result)
         print(json.dumps(output, indent=4))
 

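The last hunk in bench.py replaces a membership check followed by `del` with a single `dict.pop` call that takes a default. A minimal sketch of that idiom follows; `strip_handle` is a hypothetical helper name used only for illustration, and it works on a copy, whereas the commit mutates the dictionary in place.

```python
def strip_handle(algorithm_parameters: dict) -> dict:
    """Return a copy of the parameters without the 'handle' key, if present.

    Hypothetical helper: not part of bench.py.
    """
    cleaned = dict(algorithm_parameters)
    # pop() with a default never raises KeyError, so no membership test is needed.
    cleaned.pop('handle', None)
    return cleaned


with_handle = strip_handle({'handle': object(), 'n_clusters': 8})
without_handle = strip_handle({'n_clusters': 8})
```

Both calls return a dictionary containing only `n_clusters`, which is the behavior the old two-line form had.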
daal4py_bench/distances.py

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,7 @@
 import argparse
 
 import bench
-import daal4py
+from daal4py import cosine_distance, correlation_distance
 from daal4py.sklearn._utils import getFPType
 
 
@@ -34,9 +34,10 @@ def compute_distances(pairwise_distances, X):
 params = bench.parse_args(parser)
 
 # Load data
-X, _, _, _ = bench.load_data(params, generated_data=['X_train'], add_dtype=True)
+X, _, _, _ = bench.load_data(params, generated_data=[
+    'X_train'], add_dtype=True)
 
-pairwise_distances = getattr(daal4py, f'{params.metric}_distance')
+pairwise_distances = cosine_distance if params.metric == 'cosine' else correlation_distance
 
 time, _ = bench.measure_function_time(
     compute_distances, pairwise_distances, X, params=params)

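The conditional expression above selects `correlation_distance` for every metric other than `'cosine'`, including misspelled ones. One possible alternative, sketched here under assumptions, is an explicit lookup table that rejects unknown names. The two stub functions stand in for the daal4py callables so the example runs without daal4py installed; `select_distance` and `DISTANCES` are hypothetical names, not part of the benchmark.

```python
from typing import Callable, Dict


def cosine_stub(X):
    """Stand-in for daal4py's cosine distance algorithm (assumption)."""
    return 'cosine'


def correlation_stub(X):
    """Stand-in for daal4py's correlation distance algorithm (assumption)."""
    return 'correlation'


# Explicit table of supported metrics: no getattr, no silent fallback.
DISTANCES: Dict[str, Callable] = {
    'cosine': cosine_stub,
    'correlation': correlation_stub,
}


def select_distance(metric: str) -> Callable:
    """Return the callable registered for `metric` or raise ValueError."""
    try:
        return DISTANCES[metric]
    except KeyError:
        supported = ', '.join(sorted(DISTANCES))
        raise ValueError(
            f"Unknown metric {metric!r}; supported: {supported}") from None
```

If the argument parser already restricts the metric to these two choices, the conditional in the commit behaves the same way and the table is only a matter of taste.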
daal4py_bench/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ pandas < 1.3.0
 daal4py
 openpyxl
 tqdm
+requests

datasets/loader_utils.py

Lines changed: 25 additions & 19 deletions
@@ -15,29 +15,35 @@
 # ===============================================================================
 
 import re
-from urllib.request import urlretrieve
-
+import requests
+import os
+from shutil import copyfile
 import numpy as np
-import tqdm
-
-pbar: tqdm.tqdm = None
-
-
-def _show_progress(block_num: int, block_size: int, total_size: int) -> None:
-    global pbar
-    if pbar is None:
-        pbar = tqdm.tqdm(total=total_size / 1024, unit='kB')
-
-    downloaded = block_num * block_size
-    if downloaded < total_size:
-        pbar.update(block_size / 1024)
-    else:
-        pbar.close()
-        pbar = None
+from tqdm import tqdm
 
 
 def retrieve(url: str, filename: str) -> None:
-    urlretrieve(url, filename, reporthook=_show_progress)
+    # rewritting urlretrieve without using urllib library,
+    # otherwise it would fail codefactor test due to security issues.
+    if os.path.isfile(url):
+        # reporthook is ignored for local urls
+        copyfile(url, filename)
+    elif url.startswith('http'):
+        response = requests.get(url, stream=True)
+        if response.status_code != 200:
+            raise AssertionError(f"Failed to download from {url},\n" +
+                                 "Response returned status code {response.status_code}")
+        total_size = int(response.headers.get('content-length', 0))
+        block_size = 8192
+        pbar = tqdm(total=total_size/1024, unit='kB')
+        with open(filename, 'wb+') as file:
+            for data in response.iter_content(block_size):
+                pbar.update(len(data)/1024)
+                file.write(data)
+        pbar.close()
+        if total_size != 0 and pbar.n != total_size/1024:
+            raise AssertionError(
+                "Some content was present but not downloaded/written")
 
 
 def read_libsvm_msrank(file_obj, n_samples, n_features, dtype):

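Two details of the new `retrieve` are worth a second look: the second string in the status-code error lacks the `f` prefix, so `{response.status_code}` would be printed literally, and a URL that is neither an existing file nor an `http` address falls through without any error. The sketch below shows one way these could be handled. It is an illustration under assumptions, not the project's implementation: `retrieve_sketch` is a hypothetical name, the progress bar is left out, completeness is checked by counting bytes instead of reading the tqdm counter, and `requests` is imported lazily so the local-file path works without it.

```python
import os
from shutil import copyfile


def retrieve_sketch(url: str, filename: str, block_size: int = 8192) -> None:
    """Copy a local file or stream a remote one to `filename` (hypothetical)."""
    if os.path.isfile(url):
        # Local path: a plain copy, no network access.
        copyfile(url, filename)
        return
    if not url.startswith(('http://', 'https://')):
        # Assumption: fail loudly instead of silently doing nothing.
        raise ValueError(f"Unsupported URL or missing file: {url}")

    import requests  # third-party; only needed for remote downloads

    response = requests.get(url, stream=True)
    if response.status_code != 200:
        # Both fragments are f-strings, so the status code is interpolated.
        raise RuntimeError(f"Failed to download from {url}: "
                           f"status code {response.status_code}")
    expected = int(response.headers.get('content-length', 0))
    written = 0
    with open(filename, 'wb') as file:
        for chunk in response.iter_content(block_size):
            written += file.write(chunk)
    # Assumes the body is not content-encoded; with gzip or similar the
    # decoded size can legitimately differ from content-length.
    if expected and written != expected:
        raise RuntimeError(
            f"Expected {expected} bytes from {url}, wrote {written}")
```

Counting integer bytes avoids comparing floating-point kilobyte totals, which is what the `pbar.n != total_size/1024` check in the diff relies on.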
runner.py

Lines changed: 8 additions & 4 deletions
@@ -120,7 +120,8 @@ def get_configs(path: Path) -> List[str]:
                     if 'testing' in dataset:
                         paths += ' --file-X-test ' + dataset["testing"]["x"]
                         if 'y' in dataset['testing']:
-                            paths += ' --file-y-test ' + dataset["testing"]["y"]
+                            paths += ' --file-y-test ' + \
+                                dataset["testing"]["y"]
                 elif dataset['source'] == 'synthetic':
                     class GenerationArgs:
                         classes: int
@@ -214,14 +215,17 @@ class GenerationArgs:
                             + f'{extra_stdout}\n'
                     try:
                         if isinstance(json_result['results'], list):
-                            json_result['results'].extend(json.loads(stdout))
+                            json_result['results'].extend(
+                                json.loads(stdout))
                     except json.JSONDecodeError as decoding_exception:
                         stderr += f'CASE {case} JSON DECODING ERROR:\n' \
                             + f'{decoding_exception}\n{stdout}\n'
 
                     if stderr != '':
-                        is_successful = False
-                        logging.warning('Error in benchmark: \n' + stderr)
+                        if 'daal4py' not in stderr:
+                            is_successful = False
+                            logging.warning(
+                                'Error in benchmark: \n' + stderr)
 
     json.dump(json_result, args.output_file, indent=4)
     name_result_file = args.output_file.name

sklearn_bench/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ pandas
 scikit-learn-intelex
 openpyxl
 tqdm
+requests

sklearn_bench/tsne.py

Lines changed: 3 additions & 1 deletion
@@ -14,8 +14,10 @@
 # limitations under the License.
 # ===============================================================================
 
-import argparse
 import bench
+import argparse
+import warnings
+warnings.simplefilter(action='ignore', category=FutureWarning)
 
 
 def main():

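The `warnings.simplefilter` call added to tsne.py silences `FutureWarning` for the whole process from import time onward. As a rough illustration of how such a filter behaves, and of a narrower alternative, the sketch below applies the same filter inside `warnings.catch_warnings`, so it is undone when the block exits. `noisy` is a hypothetical stand-in for a library call that emits a `FutureWarning`.

```python
import warnings


def noisy() -> int:
    """Hypothetical function that emits a FutureWarning, then returns."""
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Record every warning by default...
    warnings.simplefilter("always")
    # ...except FutureWarning, using the same call as in the diff.
    warnings.simplefilter(action='ignore', category=FutureWarning)
    value = noisy()
    warnings.warn("still reported", UserWarning)

# Only the UserWarning was recorded; the FutureWarning was dropped.
categories = [w.category for w in caught]
```

Filters installed later take precedence over earlier ones, which is why the `ignore` rule wins for `FutureWarning` while other categories are still recorded.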
utils.py

Lines changed: 6 additions & 5 deletions
@@ -175,11 +175,12 @@ def generate_cases(params: Dict[str, Union[List[Any], Any]]) -> List[str]:
             commands *= len(values)
             dashes = '-' if len(param) == 1 else '--'
             for command_num in range(prev_len):
-                for value_num in range(len(values)):
-                    commands[prev_len * value_num + command_num] += ' ' + \
-                        dashes + param + ' ' + str(values[value_num])
+                for idx, val in enumerate(values):
+                    commands[prev_len * idx + command_num] += ' ' + \
+                        dashes + param + ' ' + str(val)
         else:
             dashes = '-' if len(param) == 1 else '--'
-            for command_num in range(len(commands)):
-                commands[command_num] += ' ' + dashes + param + ' ' + str(values)
+            for command_num, _ in enumerate(commands):
+                commands[command_num] += ' ' + \
+                    dashes + param + ' ' + str(values)
     return commands

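The hunk shows only the inner loops of `generate_cases`, so the indexing `prev_len * idx + command_num` can be hard to follow in isolation. The sketch below is a self-contained reconstruction of the idea, assuming the function starts from a single empty command and that `prev_len` is the number of commands before the list is replicated; `expand_cases` is a hypothetical name and the real function may differ in details not visible in the diff.

```python
from typing import Any, Dict, List, Union


def expand_cases(params: Dict[str, Union[List[Any], Any]]) -> List[str]:
    """Build one argument string per combination of list-valued parameters."""
    commands = ['']  # assumption: start from one empty command
    for param, values in params.items():
        dashes = '-' if len(param) == 1 else '--'
        if isinstance(values, list):
            prev_len = len(commands)
            # Replicate the existing commands once per value...
            commands *= len(values)
            for command_num in range(prev_len):
                for idx, val in enumerate(values):
                    # ...and give the idx-th replica the idx-th value.
                    commands[prev_len * idx + command_num] += \
                        f' {dashes}{param} {val}'
        else:
            for command_num, _ in enumerate(commands):
                commands[command_num] += f' {dashes}{param} {values}'
    return commands


cases = expand_cases({'a': [1, 2], 'bb': ['x', 'y'], 'metric': 'cosine'})
```

With two values for `a` and two for `bb`, the result contains four command strings, each ending in ` --metric cosine`; single-character names get one dash and longer names get two.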
xgboost_bench/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@ pandas
 xgboost
 openpyxl
 tqdm
+requests
