
Commit 39cc4f2

Added comment for sample_shift.
Parent: 2ae3c39

1 file changed, +4 / -0 lines

sklbench/datasets/transformer.py

Lines changed: 4 additions & 0 deletions
@@ -118,6 +118,10 @@ def split_and_transform_data(bench_case, data, data_description):
 from mpi4py import MPI

 rank = MPI.COMM_WORLD.Get_rank()
+# This approach was chosen to shift the distribution of synthetic data on each rank
+# for KMeans weak scaling tests. When testing with a large number of tiles, this method avoids duplication of data on each rank.
+# For example, if there are 24,576 tiles being used, each data point in the 24,576th tile would be multiplied by 1.47.
+# The factor 0.003 was chosen arbitrarily and can be fine-tuned for other datasets and algorithms if needed.
 adjust_number = (math.sqrt(rank) * 0.003) + 1
 x_test = x_test * adjust_number
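
The per-rank multiplier in the diff is 1 + 0.003 * sqrt(rank). Here is a minimal standalone sketch of that scaling. It makes a few assumptions. The rank is passed in as a plain integer instead of being read from `MPI.COMM_WORLD`. The hypothetical helper `make_rank_data` stands in for sklbench's synthetic data generation, using a fixed seed and plain lists instead of arrays. It shows that rank 0 is left unchanged and that the last of 24,576 ranks (rank 24,575) is scaled by about 1.47. It also shows that ranks which would otherwise generate identical data end up with different values.

```python
import math
import random


def rank_scale_factor(rank: int, coeff: float = 0.003) -> float:
    """Per-rank multiplier mirroring adjust_number in the diff: 1 + coeff * sqrt(rank)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return math.sqrt(rank) * coeff + 1


def make_rank_data(rank: int, n_rows: int = 4, n_cols: int = 2, seed: int = 42) -> list[list[float]]:
    """Hypothetical stand-in for synthetic data generation (not sklbench's API).

    Every rank uses the same seed, so before scaling all ranks hold identical rows.
    """
    rng = random.Random(seed)
    base = [[rng.uniform(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    factor = rank_scale_factor(rank)
    # Multiply every value by the rank's factor, as x_test * adjust_number does.
    return [[value * factor for value in row] for row in base]


if __name__ == "__main__":
    print(rank_scale_factor(0))                    # 1.0: rank 0 is unchanged
    print(round(rank_scale_factor(24_575), 2))     # 1.47: last of 24,576 ranks
    # Same seed, different ranks: the rows no longer coincide after scaling.
    print(make_rank_data(0) != make_rank_data(1))  # True
```

Because the adjustment is a multiplier rather than an additive offset, it moves the mean of each feature and also widens its spread. The square root keeps the factor growing slowly, so even very large rank counts stay within roughly 1.5x of the rank-0 data at the default coefficient.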
