Noticeable performance degradation in dpctl.tensor.sum #1461

Closed
@antonwolfy

After merging #1446, dpt.sum (dpctl.tensor.sum) became significantly slower (observed when running the L2-norm benchmark for dpnp on PVC).
Before the PR:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+62.g2eba93eac'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.67 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 6.64 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The new times:

import dpctl, dpctl.tensor as dpt, numpy

dpctl.__version__
# Out: '0.15.1dev0+63.g03fd73794'

sh = (134217728, 3)
dt = numpy.float32
a = dpt.ones(sh, dtype=dt)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit _ = dpt.sum(a, axis=1, dtype=dt)
# 2.35 s ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
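
For anyone reproducing this outside IPython, here is a minimal sketch of the same measurement using only the standard `timeit` module. The helper names (`time_call`, `slowdown`, `reproduce`) are hypothetical and not part of dpctl. The explicit `sycl_queue.wait()` is an assumption added so that queued device work is included in the timing; the `%timeit` runs above did not call it. The `device` argument can be a filter string such as `"level_zero:gpu:0"`.

```python
import statistics
import timeit


def time_call(fn, repeat=7, number=1):
    """Return (mean, stdev) of per-call wall time in seconds, similar to %timeit."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


def slowdown(before_s, after_s):
    """Ratio of the new time to the old time."""
    return after_s / before_s


def reproduce(device=None, shape=(134217728, 3), repeat=7):
    """Time dpt.sum over axis=1 for a float32 array of ones (hypothetical helper)."""
    import dpctl.tensor as dpt  # imported lazily so the helpers above work without dpctl

    a = dpt.ones(shape, dtype="float32", device=device)

    def run():
        dpt.sum(a, axis=1, dtype="float32")
        # Assumption: wait on the array's queue so pending device work is timed.
        a.sycl_queue.wait()

    run()  # warm-up call, excluded from the measurement
    return time_call(run, repeat=repeat)


# Slowdown implied by the first pair of timings reported in this issue.
print(f"reported slowdown: ~{slowdown(6.67e-3, 2.35):.0f}x")
```

Calling `reproduce("level_zero:gpu:0")` on each of the two builds should give numbers comparable to the ones above, although absolute values may differ because of the added wait and the warm-up call.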

Devices info:

$ python -m dpctl -f
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 LINUX
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Xeon(R) Platinum 8469 CPU @2.00GHz
        Version             2023.16.6.0.22_223734
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        Intel(R) OpenCL Graphics
    Version     OpenCL 3.0
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             23.35.27191.25
        Filter string       opencl:gpu:0
Platform  2 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2023.16.6.0.22_223734
        Filter string       opencl:accelerator:0
Platform  3 ::
    Name        Intel(R) Level-Zero
    Version     1.3
    Vendor      Intel(R) Corporation
    Backend     ext_oneapi_level_zero
    Num Devices 1
      # 0
        Name                Intel(R) Data Center GPU Max 1100
        Version             1.3.27191
        Filter string       level_zero:gpu:0

Host info:

$ uname -a
Linux DUT7050PVC 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
