Closed
Description
After merging #1446, dpctl.tensor.sum became significantly slower (observed when running the L2-norm benchmark for dpnp on PVC).
Before the PR:
import dpctl, dpctl.tensor as dpt, numpy
dpctl.__version__
# Out: '0.15.1dev0+62.g2eba93eac'
sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.67 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.64 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The new times:
import dpctl, dpctl.tensor as dpt, numpy
dpctl.__version__
# Out: '0.15.1dev0+63.g03fd73794'
sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
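To put the two measurements side by side, here is a rough back-of-the-envelope sketch in plain Python. It only uses the shape, dtype and timings reported above. The variable names are illustrative, and the throughput figure counts input reads only (it ignores the output write and any launch overhead), so treat it as an estimate rather than a measured bandwidth.

```python
# Figures copied from the timings reported in this issue.
rows, cols = 134217728, 3
itemsize = 4          # bytes per float32 element
t_before = 6.67e-3    # seconds per call before the PR
t_after = 2.35        # seconds per call after the PR

# Bytes that a sum over axis=1 has to read from the input array.
input_bytes = rows * cols * itemsize

# How many times slower the call became.
slowdown = t_after / t_before

# Approximate input read throughput in GB/s (input bytes only).
bw_before = input_bytes / t_before / 1e9
bw_after = input_bytes / t_after / 1e9

print(f"input size: {input_bytes / 2**30:.2f} GiB")
print(f"slowdown: {slowdown:.0f}x")
print(f"read throughput before: {bw_before:.0f} GB/s, after: {bw_after:.2f} GB/s")
```

By this estimate the regression is roughly 350x, with the effective read rate dropping from a few hundred GB/s to under 1 GB/s.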
Devices info:
$ python -m dpctl -f
Platform 0 ::
    Name           Intel(R) OpenCL
    Version        OpenCL 3.0 LINUX
    Vendor         Intel(R) Corporation
    Backend        opencl
    Num Devices    1
    # 0
        Name             Intel(R) Xeon(R) Platinum 8469 CPU @2.00GHz
        Version          2023.16.6.0.22_223734
        Filter string    opencl:cpu:0
Platform 1 ::
    Name           Intel(R) OpenCL Graphics
    Version        OpenCL 3.0
    Vendor         Intel(R) Corporation
    Backend        opencl
    Num Devices    1
    # 0
        Name             Intel(R) Data Center GPU Max 1100
        Version          23.35.27191.25
        Filter string    opencl:gpu:0
Platform 2 ::
    Name           Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version        OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor         Intel(R) Corporation
    Backend        opencl
    Num Devices    1
    # 0
        Name             Intel(R) FPGA Emulation Device
        Version          2023.16.6.0.22_223734
        Filter string    opencl:accelerator:0
Platform 3 ::
    Name           Intel(R) Level-Zero
    Version        1.3
    Vendor         Intel(R) Corporation
    Backend        ext_oneapi_level_zero
    Num Devices    1
    # 0
        Name             Intel(R) Data Center GPU Max 1100
        Version          1.3.27191
        Filter string    level_zero:gpu:0
Host info:
$ uname -a
Linux DUT7050PVC 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux