Speedup DenseOps compilation #25

pinzhenx · 2020-05-25T06:23:37Z

This PR follows the same approach PyTorch uses for VariableType.cpp: it splits the monolithic DenseOps.cpp into 8 shards, DenseOps_0.cpp ... DenseOps_7.cpp, so the shards can be compiled in parallel.

We've removed RegisterIpexDenseOPs() and all occurrences of _initialize_aten_bindings(), because ops can be registered by a static object at namespace scope while keeping internal linkage (via an anonymous namespace). This way, we no longer have to worry about RegisterIpexDenseOPs() being called more than once.

Before:

```cpp
// Ops are registered only when this function is called, so it has to be
// invoked explicitly at startup.
void RegisterIpexDenseOPs() {
  static auto dispatch = torch::RegisterOperators()
    .op(torch::RegisterOperators::options().schema("aten::__and__.Scalar(Tensor self, Scalar other) -> Tensor")
      .impl_unboxedOnlyKernel<at::Tensor(const at::Tensor &, at::Scalar), &AtenIpexCPUDefault::__and__>(at::DispatchKey::DPCPPTensorId)
      .aliasAnalysis(c10::AliasAnalysisKind::FROM_SCHEMA));
}
```

After:

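```cpp
// `dispatch` is initialized during static initialization, when the library is
// loaded; the anonymous namespace gives it internal linkage, so each
// DenseOps_<k>.cpp shard can define its own `dispatch`.
namespace {
  static auto dispatch = torch::RegisterOperators()
    .op(torch::RegisterOperators::options().schema("aten::__and__.Scalar(Tensor self, Scalar other) -> Tensor")
      .impl_unboxedOnlyKernel<at::Tensor(const at::Tensor &, at::Scalar), &AtenIpexCPUDefault::__and__>(at::DispatchKey::DPCPPTensorId)
      .aliasAnalysis(c10::AliasAnalysisKind::FROM_SCHEMA));
}
```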

After checking out this branch, run `python setup.py clean` first.

pinzhenx · 2020-05-25T06:29:10Z

@EikanWang

hongzhen1 · 2020-05-26T04:50:38Z

intel_pytorch_extension_py/__init__.py

```diff
-import _torch_ipex as core
-
-core._initialize_aten_bindings()
+import _torch_ipex as core
```

It would be better to integrate these two packages into one, so that end users only need to import a single package.


pinzhenx marked this pull request as draft May 25, 2020 07:26

pinzhenx marked this pull request as ready for review May 25, 2020 07:47

pinzhenx force-pushed the speedup branch 3 times, most recently from e560e6b to 7a37427 May 26, 2020 01:58

EikanWang merged commit 101fb32 into intel:master May 26, 2020

hongzhen1 reviewed May 26, 2020


pinzhenx added 5 commits May 26, 2020 09:41

- split denseops translation unit (1f14c71)
- rm usages of _initialize_aten_bindings (5a0db22)
- allow user to build with ninja (05290cd)
- skip writing unchanged file (ecafb3d)
- move gen_code out of cleaning procedure (7a37427)

EikanWang pushed a commit that referenced this pull request Oct 4, 2021

- nms kernel optimization (#25) (d3990d4): enable vectorized nms_kernel

NathanJHLee mentioned this pull request Jul 7, 2022

- required rank 4 tensor to use channels_last format #234 (Open)

Steve-Tech mentioned this pull request Aug 6, 2023

- RuntimeError: Number of dpcpp devices should be greater than zero! #287 (Open)
