Description
Hi, I am Nathan.
These days I have been trying to speed up libtorch (C++) inference by using the intel-extension.
I am using the wenet-e2e toolkit for my speech recognition system.
Following your intel-extension release for 1.11.200,
I applied the run file to the original libtorch_1.11.0_cpu as follows:
"bash libintel-ext-pt-shared-with-deps-1.11.200+cpu.run install workspace/libtorch_cpu/"
There are two things I did.
First (in Python 3.9 with torch 1.10, used for training):
I think libtorch (with the intel-extension) is ready to run in C++.
I also prepared the TorchScript model like this:
"script_model = torch.jit.script(model)"
"script_model.save(args.output_file)"
Second (in C++):
torch::Tensor feats = torch::zeros({1, num_frames, feature_dim}, torch::kFloat);
feats = feats.to(c10::MemoryFormat::ChannelsLast); // for intel-extension
Finally, I built the binary (C++) and checked the library linking:
$ ldd e2e-intel-ext | grep torch
libintel-ext-pt-cpu.so => ../e2edecoder/libtorch/lib/libintel-ext-pt-cpu.so (0x00007f638f59f000)
libtorch_cpu.so => ../e2edecoder/libtorch/lib/libtorch_cpu.so (0x00007f63789b6000)
libc10.so => ../e2edecoder/libtorch/lib/libc10.so (0x00007f6392bff000)
libgomp-a34b3233.so.1 => ../e2edecoder/libtorch/lib/libgomp-a34b3233.so.1 (0x00007f6374da8000)
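For context, the surrounding C++ code is roughly as follows. This is only a simplified sketch, not my real code: the model path, shapes, and the single forward() call are placeholders, and the actual call sites are in wenet's TorchAsrDecoder.

// Simplified sketch (placeholder path and shapes; the real code is wenet's TorchAsrDecoder).
#include <torch/script.h>
#include <vector>

int main() {
  // Load the TorchScript model exported from Python.
  torch::jit::Module module = torch::jit::load("final.zip");
  module.eval();

  // Acoustic features: {batch, num_frames, feature_dim}.
  torch::Tensor feats = torch::zeros({1, 100, 80}, torch::kFloat);
  feats = feats.to(c10::MemoryFormat::ChannelsLast);  // for intel-extension; the exception below is thrown here

  // Feed the features to the scripted model.
  std::vector<torch::jit::IValue> inputs{feats};
  auto output = module.forward(inputs);
  return 0;
}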
When I run the binary file, I get this error:
required rank 4 tensor to use channels_last format
Exception raised from empty_tensor_restride at ../c10/core/TensorImpl.h:2145 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa7f4abbf72 in ../e2edecoder/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x5f (0x7fa7f4ab86bf in ../e2edecoder/libtorch/lib/libc10.so)
frame #2: + 0x106709f (0x7fa7db89d09f in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #3: at::detail::empty_generic(c10::ArrayRef, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x80e (0x7fa7db899ffe in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #4: at::detail::empty_cpu(c10::ArrayRef, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x41 (0x7fa7db89aa61 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x34 (0x7fa7db89aab4 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #6: at::native::empty_cpu(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x1e (0x7fa7dbd6e01e in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #7: + 0x1c2577a (0x7fa7dc45b77a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x12c (0x7fa7dc21cfdc in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #9: + 0x1c0c2af (0x7fa7dc4422af in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #10: at::_ops::empty_memory_format::call(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x1e0 (0x7fa7dc257e30 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #11: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x582 (0x7fa7dbd6a632 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #12: + 0x1d2758a (0x7fa7dc55d58a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x139 (0x7fa7dc024229 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #14: + 0x1c0c6c7 (0x7fa7dc4426c7 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #15: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x139 (0x7fa7dc024229 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #16: + 0x28405a6 (0x7fa7dd0765a6 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #17: + 0x2840aed (0x7fa7dd076aed in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #18: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, c10::optional<c10::MemoryFormat>) + 0x1aa (0x7fa7dc091b7a in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #19: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, bool, c10::optional<c10::MemoryFormat>) + 0x112 (0x7fa7dbd66702 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #20: + 0x1dbdf30 (0x7fa7dc5f3f30 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #21: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, bool, bool, c10::optional<c10::MemoryFormat>) + 0x1c1 (0x7fa7dc184141 in ../e2edecoder/libtorch/lib/libtorch_cpu.so)
frame #22: + 0x14d50a (0x7fa7f433850a in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #23: wenet::TorchAsrDecoder::AdvanceDecoding() + 0x44d (0x7fa7f433ae3d in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #24: E2EInference::E2EInfer(short const*, int, bool) + 0x31b (0x7fa7f434658b in ../e2edecoderbin/../lib/libe2e-core-cpu.so)
frame #25: ./e2e-intel-ext() [0x41980b]
frame #26: __libc_start_main + 0xf5 (0x7fa7d9a4f555 in /lib64/libc.so.6)
frame #27: ./e2e-intel-ext() [0x41beff]
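For reference, the check seems reproducible without the model at all. Below is a minimal sketch of my understanding (made-up shapes, not my real code): channels_last appears to be defined only for 4-D NCHW tensors, so the 3-D feature tensor hits the "required rank 4 tensor" check, while a 4-D tensor converts fine.

// Minimal sketch (made-up shapes): rank-3 feats vs. rank-4 image tensor.
#include <torch/torch.h>

int main() {
  // 3-D acoustic features {batch, num_frames, feature_dim}: converting this
  // to channels_last raises the c10::Error shown above.
  torch::Tensor feats = torch::zeros({1, 100, 80}, torch::kFloat);
  // feats = feats.to(c10::MemoryFormat::ChannelsLast);  // throws here

  // 4-D image-style tensor {N, C, H, W}: this converts without error.
  torch::Tensor img = torch::zeros({1, 3, 224, 224}, torch::kFloat);
  img = img.to(c10::MemoryFormat::ChannelsLast);

  // Guarding the conversion on rank would avoid the error for non-4-D inputs.
  if (feats.dim() == 4) {
    feats = feats.to(c10::MemoryFormat::ChannelsLast);
  }
  return 0;
}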
- Could you please look at my error message? What is the problem in my case?
- Do the torch version used for training and the libtorch version have to be the same?
- Before saving the TorchScript model, do I have to do the following?
model = model.to(memory_format=torch.channels_last) # in Python
model = ipex.optimize(model)
- Is this only for batch-type inference?
Thank you for reading.