Description
Hi, I am Nathan.
These days I have been trying to speed up libtorch (C++) inference by using the intel-extension.
I am using the wenet-e2e toolkit for my speech recognition system.
Following your intel-extension release for 1.11.200,
I applied the run file to the original libtorch_1.11.0_cpu as follows:
"bash libintel-ext-pt-shared-with-deps-1.11.200+cpu.run install workspace/libtorch_cpu/"
There are two things I did.
First (in Python 3.9 with torch 1.10, used for training):
I think libtorch (with the intel-extension) is ready to run in C++.
I also prepared the TorchScript model like this:
"script_model = torch.jit.script(model)"
"script_model.save(args.output_file)"
Second (in C++):
torch::Tensor feats = torch::zeros({1, num_frames, feature_dim}, torch::kFloat);
feats = feats.to(c10::MemoryFormat::ChannelsLast); // for intel-extension
Finally, I built the binary (C++) and checked the library linking:
$ ldd e2e-intel-ext | grep torch
libintel-ext-pt-cpu.so => ../e2edecoder/libtorch/lib/libintel-ext-pt-cpu.so (0x00007f638f59f000)
libtorch_cpu.so => ../e2edecoder/libtorch/lib/libtorch_cpu.so (0x00007f63789b6000)
libc10.so => ../e2edecoder/libtorch/lib/libc10.so (0x00007f6392bff000)
libgomp-a34b3233.so.1 => ../e2edecoder/libtorch/lib/libgomp-a34b3233.so.1 (0x00007f6374da8000)
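For context, the surrounding C++ code is roughly as follows. This is only a simplified sketch, not my real code: the model path, shapes, and the single forward() call are placeholders, and the actual call sites are in wenet's TorchAsrDecoder.

// Simplified sketch (placeholder path and shapes; the real code is wenet's TorchAsrDecoder).
#include <torch/script.h>
#include <vector>

int main() {
  // Load the TorchScript model exported from Python.
  torch::jit::Module module = torch::jit::load("final.zip");
  module.eval();

  // Acoustic features: {batch, num_frames, feature_dim}.
  torch::Tensor feats = torch::zeros({1, 100, 80}, torch::kFloat);
  feats = feats.to(c10::MemoryFormat::ChannelsLast);  // for intel-extension; the exception below is thrown here

  // Feed the features to the scripted model.
  std::vector<torch::jit::IValue> inputs{feats};
  auto output = module.forward(inputs);
  return 0;
}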
When I run the binary file, I get this error:
required rank 4 tensor to use channels_last format
Exception raised from empty_tensor_restride at ../c10/core/TensorImpl.h:2145 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa7f4abbf72 in ../e2edecoder/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x5f (0x7fa7f4ab86bf in ../e2edecoder/libtorch/lib/libc10.so)
frame #2: + 0x106709f (0x7fa7db89d09f in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #3: at::detail::empty_generic(c10::ArrayRef, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x80e (0x7fa7db899ffe in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x41 (0x7fa7db89aa61 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fa7db89aab4 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x1e (0x7fa7dbd6e01e in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #7: + 0x1c2577a (0x7fa7dc45b77a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x12c (0x7fa7dc21cfdc in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #9: + 0x1c0c2af (0x7fa7dc4422af in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x1e0 (0x7fa7dc257e30 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #11: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x582 (0x7fa7dbd6a632 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #12: + 0x1d2758a (0x7fa7dc55d58a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x139 (0x7fa7dc024229 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #14: + 0x1c0c6c7 (0x7fa7dc4426c7 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #15: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x139 (0x7fa7dc024229 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #16: + 0x28405a6 (0x7fa7dd0765a6 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #17: + 0x2840aed (0x7fa7dd076aed in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #18: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x1aa (0x7fa7dc091b7a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #19: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, bool, c10::optional<c10::MemoryFormat>) + 0x112 (0x7fa7dbd66702 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #20: + 0x1dbdf30 (0x7fa7dc5f3f30 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #21: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, bool, c10::optional<c10::MemoryFormat>) + 0x1c1 (0x7fa7dc184141 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #22: + 0x14d50a (0x7fa7f433850a in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #23: wenet::TorchAsrDecoder::AdvanceDecoding() + 0x44d (0x7fa7f433ae3d in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #24: E2EInference::E2EInfer(short const*, int, bool) + 0x31b (0x7fa7f434658b in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #25: ./e2e-intel-ext() [0x41980b]
frame #26: __libc_start_main + 0xf5 (0x7fa7d9a4f555 in /lib64/libc.so.6)
frame #27: ./e2e-intel-ext() [0x41beff]
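For reference, the check seems reproducible without the model at all. Below is a minimal sketch of my understanding (made-up shapes, not my real code): channels_last appears to be defined only for 4-D NCHW tensors, so the 3-D feature tensor hits the "required rank 4 tensor" check, while a 4-D tensor converts fine.

// Minimal sketch (made-up shapes): rank-3 feats vs. rank-4 image tensor.
#include <torch/torch.h>

int main() {
  // 3-D acoustic features {batch, num_frames, feature_dim}: converting this
  // to channels_last raises the c10::Error shown above.
  torch::Tensor feats = torch::zeros({1, 100, 80}, torch::kFloat);
  // feats = feats.to(c10::MemoryFormat::ChannelsLast);  // throws here

  // 4-D image-style tensor {N, C, H, W}: this converts without error.
  torch::Tensor img = torch::zeros({1, 3, 224, 224}, torch::kFloat);
  img = img.to(c10::MemoryFormat::ChannelsLast);

  // Guarding the conversion on rank would avoid the error for non-4-D inputs.
  if (feats.dim() == 4) {
    feats = feats.to(c10::MemoryFormat::ChannelsLast);
  }
  return 0;
}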
- Could you please look at my error message? What is the problem in my case?
- Do the torch version used for training and the libtorch version have to be the same?
- Before saving the TorchScript model, do I have to do the following?
model = model.to(memory_format=torch.channels_last) # in Python
model = ipex.optimize(model)
- Is this only for batch-type inference?
Thank you for reading.