Description
Describe the bug
The Spleen Seg example app fails when run in JupyterLab on a system with an NVIDIA RTX A6000 GPU (Ampere architecture). The MONAI App Package (MAP) Docker image built from the app fails on the same system.
Steps/Code to reproduce bug
- On a system with an Ampere GPU
- Run the 5th MONAI Deploy tutorial notebook in JupyterLab (with Python >= 3.7, which avoids another known issue that has since been fixed)
- See the error
app = AISpleenSegApp()
# app.run(input="dcm", output="output", model="model.ts")
app.run(input="/home/hju/monai-deploy-app-sdk/dcm", output="/home/hju/monai-deploy-app-sdk/output", model="/home/hju/monai-deploy-app-sdk/model.ts")
Going to initiate execution of operator DICOMDataLoaderOperator
Executing operator DICOMDataLoaderOperator (Process ID: 6802, Operator ID: cec9734d-e25e-4f14-b6f7-66bfd8f9b730)
[2022-01-21 20:34:04,558] [WARNING] (root) - No selection rules given; select all series.
[2022-01-21 20:34:04,558] [INFO] (root) - Working on study, instance UID: 1.2.826.0.1.3680043.2.1125.1.67295333199898911264201812221946213
[2022-01-21 20:34:04,559] [INFO] (root) - Working on series, instance UID: 1.2.826.0.1.3680043.2.1125.1.68102559796966796813942775094416763
Done performing execution of operator DICOMDataLoaderOperator
Going to initiate execution of operator DICOMSeriesSelectorOperator
Executing operator DICOMSeriesSelectorOperator (Process ID: 6802, Operator ID: 3445787f-7fed-4d96-84f7-7084edd57123)
Working on study, instance UID: 1.2.826.0.1.3680043.2.1125.1.67295333199898911264201812221946213
Working on series, instance UID: 1.2.826.0.1.3680043.2.1125.1.68102559796966796813942775094416763
Done performing execution of operator DICOMSeriesSelectorOperator
Going to initiate execution of operator DICOMSeriesToVolumeOperator
Executing operator DICOMSeriesToVolumeOperator (Process ID: 6802, Operator ID: 66f4d414-263c-4f79-9787-aff103886c7d)
Done performing execution of operator DICOMSeriesToVolumeOperator
Going to initiate execution of operator SpleenSegOperator
Executing operator SpleenSegOperator (Process ID: 6802, Operator ID: 5dc2958a-367d-4cc9-9d61-a20ce1c4f2d9)
Converted Image object metadata:
SeriesInstanceUID: 1.2.826.0.1.3680043.2.1125.1.68102559796966796813942775094416763, type <class 'str'>
Modality: CT, type <class 'str'>
SeriesDescription: No series description, type <class 'str'>
PatientPosition: HFS, type <class 'str'>
SeriesNumber: 1, type <class 'int'>
row_pixel_spacing: 1.0, type <class 'float'>
col_pixel_spacing: 1.0, type <class 'float'>
depth_pixel_spacing: 1.0, type <class 'float'>
row_direction_cosine: [-1.0, 0.0, 0.0], type <class 'list'>
col_direction_cosine: [0.0, -1.0, 0.0], type <class 'list'>
depth_direction_cosine: [0.0, 0.0, 1.0], type <class 'list'>
dicom_affine_transform: [[-1. 0. 0. 0.]
[ 0. -1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]], type <class 'numpy.ndarray'>
nifti_affine_transform: [[ 1. -0. -0. -0.]
[-0. 1. -0. -0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]], type <class 'numpy.ndarray'>
StudyInstanceUID: 1.2.826.0.1.3680043.2.1125.1.67295333199898911264201812221946213, type <class 'str'>
StudyID: SLICER10001, type <class 'str'>
StudyDate: 2019-09-16, type <class 'str'>
StudyTime: 010100.000000, type <class 'str'>
StudyDescription: spleen, type <class 'str'>
AccessionNumber: 1, type <class 'str'>
selection_name: 1.2.826.0.1.3680043.2.1125.1.68102559796966796813942775094416763, type <class 'str'>
/home/hju/anaconda3/envs/monai/lib/python3.7/site-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_6802/3567561955.py in <module>
2
3 # app.run(input="dcm", output="output", model="model.ts")
----> 4 app.run(input="/home/hju/monai-deploy-app-sdk/dcm", output="/home/hju/monai-deploy-app-sdk/output", model="/home/hju/monai-deploy-app-sdk/model.ts")
~/anaconda3/envs/monai/lib/python3.7/site-packages/monai/deploy/core/application.py in run(self, log_level, input, output, model, workdir, datastore, executor)
427 datastore_obj = DatastoreFactory.create(app_context.datastore)
428 executor_obj = ExecutorFactory.create(app_context.executor, {"app": self, "datastore": datastore_obj})
--> 429 executor_obj.run()
430
431 @abstractmethod
~/anaconda3/envs/monai/lib/python3.7/site-packages/monai/deploy/core/executors/single_process_executor.py in run(self)
123 + Fore.RESET
124 )
--> 125 op.compute(op_exec_context.input_context, op_exec_context.output_context, op_exec_context)
126
127 # Execute post_compute()
/tmp/ipykernel_6802/1686202219.py in compute(self, op_input, op_output, context)
42
43 # Now let the built-in operator handles the work with the I/O spec and execution context.
---> 44 infer_operator.compute(op_input, op_output, context)
45
46 def pre_process(self, img_reader) -> Compose:
~/anaconda3/envs/monai/lib/python3.7/site-packages/monai/deploy/operators/monai_seg_inference_operator.py in compute(self, op_input, op_output, context)
220 sw_batch_size=sw_batch_size,
221 overlap=self.overlap,
--> 222 predictor=model,
223 )
224 d = [post_transforms(i) for i in decollate_batch(d)]
~/anaconda3/envs/monai/lib/python3.7/site-packages/monai/inferers/utils.py in sliding_window_inference(inputs, roi_size, sw_batch_size, predictor, overlap, mode, sigma_scale, padding_mode, cval, sw_device, device, *args, **kwargs)
115 # Create window-level importance map
116 importance_map = compute_importance_map(
--> 117 get_valid_patch_size(image_size, roi_size), mode=mode, sigma_scale=sigma_scale, device=device
118 )
119
~/anaconda3/envs/monai/lib/python3.7/site-packages/monai/data/utils.py in compute_importance_map(patch_size, mode, sigma_scale, device)
761 device = torch.device(device) # type: ignore[arg-type]
762 if mode == BlendMode.CONSTANT:
--> 763 importance_map = torch.ones(patch_size, device=device).float()
764 elif mode == BlendMode.GAUSSIAN:
765 center_coords = [i // 2 for i in patch_size]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
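The warning in the output above lists the architectures the installed PyTorch binary was built for (sm_37 sm_50 sm_60 sm_70), and none of them can serve an sm_86 device. The following is a minimal sketch of that compatibility rule, not code from PyTorch or the App SDK. The helper names `parse_arch` and `can_run_on_device` are hypothetical, and the rule is a simplified reading of the CUDA compatibility model: an `sm_XY` binary runs only on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or higher capability.

```python
from typing import Iterable, Tuple


def parse_arch(arch: str) -> Tuple[str, int, int]:
    """Split an arch string such as 'sm_86' into (kind, major, minor)."""
    kind, _, digits = arch.partition("_")
    # The last digit is the minor version; the rest is the major version.
    return kind, int(digits[:-1]), int(digits[-1])


def can_run_on_device(arch_list: Iterable[str], capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if any built arch can serve the device."""
    major, minor = capability
    for arch in arch_list:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for a newer device
    return False


# Arch list copied from the warning in this report; the RTX A6000 is sm_86.
built_for = "sm_37 sm_50 sm_60 sm_70".split()
print(can_run_on_device(built_for, (8, 6)))
```

With the values from this report the check prints `False`, which matches the "no kernel image is available" error. On a live system the two inputs would come from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`.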
Expected behavior
The example should work on a system with an Ampere GPU.
Environment details (please complete the following information)
- OS/Platform: Ubuntu 20.04
- Python Version: 3.7
- Method of MONAI Deploy App SDK install: Jupyter Notebook in JupyterLab
- SDK Version: v0.2
Additional context
It is known that since v1.8, torch needs to be pip installed with a command targeting a specific CUDA version. The App SDK packager uses an NVIDIA PyTorch base image, which has torch pre-installed for the matching CUDA version. However, the packager may not process the app's torch dependency correctly, or may not ensure that torch is still properly installed after the MAP Docker image is built.
Note that the Spleen App does not pin the version of torch, e.g.
@md.env(pip_packages=["monai==0.6.0", "torch>=1.5", "numpy>=1.20", "nibabel"])
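To check whether torch is still a CUDA-matched build after the MAP image is built, a short report like the one below could be run both in the notebook environment and inside the container, and the two outputs compared. This is only a sketch and not part of the App SDK; `torch_cuda_report` is a hypothetical helper name. It relies on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_arch_list()`, `torch.cuda.get_device_name()`, and `torch.cuda.get_device_capability()`.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Hypothetical helper: summarize the installed torch build and the GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version torch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        report["device_name"] = torch.cuda.get_device_name(0)
        report["device_arch"] = f"sm_{major}{minor}"
        # Architectures compiled into this torch binary.
        report["built_arch_list"] = torch.cuda.get_arch_list()
    return report


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `device_arch` is not covered by `built_arch_list`, the pip-installed torch was most likely resolved to a build that does not target the GPU, which is the situation the warning in this report describes.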