
Force to remap to CPU for TorchScript model if GPU doesn't exist #167


Merged: gigony merged 1 commit into main from force_cpu_torchscript on Nov 24, 2021

Conversation

@gigony (Collaborator) commented Oct 9, 2021

The TorchScript model file is loaded onto the device on which the model was saved (torch.jit.load).

For that reason, if the model is trained/saved on a GPU and then loaded on a system without a GPU, loading fails with an error:

self._predictor = torch.jit.load(self.path).eval()

  File "/Users/Erik/opt/anaconda3/lib/python3.8/site-packages/monai/deploy/core/models/model.py", line 220, in __call__
    if self.predictor:
  File "/Users/Erik/opt/anaconda3/lib/python3.8/site-packages/monai/deploy/core/models/torch_model.py", line 50, in predictor
    self._predictor = torch.jit.load(self.path).eval()
  File "/Users/Erik/opt/anaconda3/lib/python3.8/site-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 
This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). 
If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 
'aten::empty_strided' is only available for these backends: [CPU, Meta, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

This patch resolves the issue by remapping the storages to CPU when a GPU is not available.
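A minimal sketch of the fix pattern (here "model.pt" is a placeholder path for a TorchScript model that was saved on a GPU):

import torch

# Remap storages to CPU only when CUDA is unavailable; otherwise keep the
# device placement recorded in the saved model (map_location=None).
map_location = None if torch.cuda.is_available() else "cpu"
predictor = torch.jit.load("model.pt", map_location=map_location).eval()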

Resolves #166, which stems from #156.

A test package is available via:

pip install -i https://test.pypi.org/simple/ monai-deploy-app-sdk==0.0.167

Verified locally by disabling the GPU:

export CUDA_VISIBLE_DEVICES=""
python examples/apps/mednist_classifier_monaideploy/mednist_classifier_monaideploy.py -i input/AbdomenCT_007000.jpeg -o output -m classifier.zip

python examples/apps/ai_spleen_seg_app/app.py -i dcm/ -o output -m model.pt

@gigony gigony added the bug label Oct 9, 2021
@gigony gigony self-assigned this Oct 9, 2021
@gigony gigony requested a review from MMelQin October 9, 2021 06:21
@MMelQin (Collaborator) left a comment:

This PR addresses the issue that jit.load should target an available device (GPU preferred, but not forced when one is not present). It does not, however, or at least has not been proven to, address the issue of force-loading a model traced on a GPU onto the CPU.

@@ -47,7 +47,9 @@ def predictor(self) -> "torch.nn.Module":  # type: ignore
             torch.nn.Module: the model's predictor
         """
         if self._predictor is None:
-            self._predictor = torch.jit.load(self.path).eval()
+            # If GPU is not available, explicitly use "cpu" to dynamically remap storages to CPU.
+            map_location = None if torch.cuda.is_available() else "cpu"
@MMelQin (Collaborator) commented on the diff:
Wondering why not use a torch device here, to be explicit and for clarity (https://pytorch.org/docs/master/generated/torch.jit.load.html#torch.jit.load)?

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

@gigony (Collaborator, Author) replied:

Thanks for the suggestion!

I am afraid the behavior is different (map_location needs to be None, not torch.device("cuda")).

We wanted to remap to 'cpu' only if CUDA is not available. If CUDA is available, we wouldn't want to remap the location; the model should use the device specified within it.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self._predictor = torch.jit.load(self.path, map_location=device).eval()

vs

map_location = None if torch.cuda.is_available() else "cpu"
self._predictor = torch.jit.load(self.path, map_location=map_location).eval()

However, on second thought, it would be better to force the remap depending on CUDA availability (operator/application code usually uses the same condition to determine which device to use).

So, I will fix it as suggested.
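To illustrate the difference discussed above, a hedged sketch ("model.pt" is a placeholder for a TorchScript model saved on a GPU):

import torch

# map_location=None keeps the device placement recorded in the saved model;
# a GPU-saved model is restored onto CUDA (and raises on a CPU-only system,
# as in the traceback above).
kept = torch.jit.load("model.pt", map_location=None)

# An explicit device forces every storage onto that device, regardless of
# where the model was saved.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
forced = torch.jit.load("model.pt", map_location=device)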

@@ -165,7 +165,8 @@ def compute(self, op_input: InputContext, op_output: OutputContext, context: Exe
                 model = context.models.get()
             else:
                 print(f"Loading TorchScript model from: {MonaiSegInferenceOperator.MODEL_LOCAL_PATH}")
-                model = torch.jit.load(MonaiSegInferenceOperator.MODEL_LOCAL_PATH)
+                map_location = None if torch.cuda.is_available() else "cpu"
@MMelQin (Collaborator) commented on the diff:
I'd change this to use the device that is instantiated in this block of code. Also, this else block is a fallback for when the SDK model loader fails to load the model (the original line #172 below).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Also, forcing a model traced on a GPU onto the CPU will most likely fail, so this only shifts the failure point.

@gigony (Collaborator, Author) replied:

Thanks Ming!
Updated to use the device variable.
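A sketch of the updated fallback path, under the assumption that it mirrors the hunk above (the model path here is a placeholder standing in for MonaiSegInferenceOperator.MODEL_LOCAL_PATH):

import torch

model_path = "model.pt"  # placeholder for MonaiSegInferenceOperator.MODEL_LOCAL_PATH

# Instantiate one device up front and let torch.jit.load remap storages to it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.jit.load(model_path, map_location=device)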

@@ -432,12 +432,10 @@
     "        image_tensor = self.transform(img)  # (1, 64, 64), torch.float64\n",
     "        image_tensor = image_tensor[None].float()  # (1, 1, 64, 64), torch.float32\n",
     "\n",
-    "        # Comment below line if you want to do CPU inference\n",
-    "        image_tensor = image_tensor.cuda()\n",
+    "        if torch.cuda.is_available():\n",
@MMelQin (Collaborator) commented on the diff:
This conditional logic is repeated multiple times. Wondering why not instantiate a device and use tensor.to(device)?
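A minimal sketch of that suggestion (the tensor shape follows the tutorial's comments; the placeholder input is illustrative):

import torch

# Instantiate the device once, then move tensors with .to(device) instead of
# repeating the cuda-availability check at every call site.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
image_tensor = torch.zeros(1, 1, 64, 64, dtype=torch.float32)  # placeholder input
image_tensor = image_tensor.to(device)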

@gigony (Collaborator, Author) replied:

Updated the existing code and tutorial.
Thank you for the suggestion!

Signed-off-by: Gigon Bae <gbae@nvidia.com>
@gigony gigony force-pushed the force_cpu_torchscript branch from d81de7a to 14c1aae on November 23, 2021 19:50
@gigony gigony merged commit 2f83bc1 into main Nov 24, 2021
@gigony gigony mentioned this pull request Nov 24, 2021
Successfully merging this pull request may close these issues.

[BUG] ai_spleen_seg_app and mednist_classifier_monaideploy example doesn't work on CPU only system