Description
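The nightly run of `modules/torch_compile.ipynb` fails at cell `In [11]`: compiling the model with `torch.compile` (default Inductor backend) raises `InductorError: AssertionError: make_fallback(aten.upsample_trilinear3d.default): a decomposition exists, we should switch to it.` on PyTorch 2.7.0+cu126. Full log below.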
[2025-05-06T16:47:27.159Z] Running ./modules/torch_compile.ipynb
[2025-05-06T16:47:27.159Z] Checking PEP8 compliance...
[2025-05-06T16:47:27.719Z] Running notebook...
[2025-05-06T16:47:34.247Z] Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
[2025-05-06T16:47:37.507Z] MONAI version: 1.4.1rc1+46.gb58e883c
[2025-05-06T16:47:37.507Z] Numpy version: 1.26.4
[2025-05-06T16:47:37.507Z] Pytorch version: 2.7.0+cu126
[2025-05-06T16:47:37.507Z] MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
[2025-05-06T16:47:37.507Z] MONAI rev id: b58e883c887e0f99d382807550654c44d94f47bd
[2025-05-06T16:47:37.507Z] MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
[2025-05-06T16:47:37.507Z]
[2025-05-06T16:47:37.507Z] Optional dependencies:
[2025-05-06T16:47:38.067Z] Pytorch Ignite version: 0.4.11
[2025-05-06T16:47:38.067Z] ITK version: 5.4.3
[2025-05-06T16:47:38.067Z] Nibabel version: 5.3.2
[2025-05-06T16:47:38.067Z] scikit-image version: 0.19.3
[2025-05-06T16:47:38.067Z] scipy version: 1.14.0
[2025-05-06T16:47:38.067Z] Pillow version: 7.0.0
[2025-05-06T16:47:38.067Z] Tensorboard version: 2.16.2
[2025-05-06T16:47:38.067Z] gdown version: 5.2.0
[2025-05-06T16:47:38.067Z] TorchVision version: 0.22.0+cu126
[2025-05-06T16:47:38.067Z] tqdm version: 4.66.5
[2025-05-06T16:47:38.067Z] lmdb version: 1.6.2
[2025-05-06T16:47:38.067Z] psutil version: 6.0.0
[2025-05-06T16:47:38.067Z] pandas version: 2.2.2
[2025-05-06T16:47:38.067Z] einops version: 0.8.0
[2025-05-06T16:47:38.067Z] transformers version: 4.40.2
[2025-05-06T16:47:38.067Z] mlflow version: 2.22.0
[2025-05-06T16:47:38.067Z] pynrrd version: 1.1.3
[2025-05-06T16:47:38.067Z] clearml version: 2.0.0rc0
[2025-05-06T16:47:38.067Z]
[2025-05-06T16:47:38.067Z] For details about installing the optional dependencies, please visit:
[2025-05-06T16:47:38.067Z] https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
[2025-05-06T16:47:38.067Z]
[2025-05-06T16:47:41.332Z] papermill --progress-bar --log-output -k python3
[2025-05-06T16:47:41.332Z] /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:149: UserWarning: the file is not specified with any extension : -
[2025-05-06T16:47:41.332Z] warnings.warn(f"the file is not specified with any extension : {os.path.basename(path)}")
[2025-05-06T16:55:02.641Z]
Executing: 0%| | 0/32 [00:00<?, ?cell/s]
Executing: 3%|▎ | 1/32 [00:00<00:29, 1.04cell/s]
Executing: 12%|█▎ | 4/32 [00:12<01:33, 3.33s/cell]
Executing: 19%|█▉ | 6/32 [00:22<01:44, 4.03s/cell]
Executing: 31%|███▏ | 10/32 [00:26<00:52, 2.38s/cell]
Executing: 50%|█████ | 16/32 [00:28<00:19, 1.23s/cell]
Executing: 56%|█████▋ | 18/32 [00:35<00:23, 1.71s/cell]
Executing: 62%|██████▎ | 20/32 [07:07<09:10, 45.85s/cell]
Executing: 69%|██████▉ | 22/32 [07:08<05:46, 34.69s/cell]
Executing: 72%|███████▏ | 23/32 [07:18<04:38, 30.98s/cell]
Executing: 72%|███████▏ | 23/32 [07:20<02:52, 19.17s/cell]
[2025-05-06T16:55:02.641Z] /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:149: UserWarning: the file is not specified with any extension : -
[2025-05-06T16:55:02.641Z] warnings.warn(f"the file is not specified with any extension : {os.path.basename(path)}")
[2025-05-06T16:55:02.641Z] Traceback (most recent call last):
[2025-05-06T16:55:02.641Z] File "/usr/local/bin/papermill", line 8, in <module>
[2025-05-06T16:55:02.641Z] sys.exit(papermill())
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
[2025-05-06T16:55:02.641Z] return self.main(*args, **kwargs)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
[2025-05-06T16:55:02.641Z] rv = self.invoke(ctx)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
[2025-05-06T16:55:02.641Z] return ctx.invoke(self.callback, **ctx.params)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
[2025-05-06T16:55:02.641Z] return __callback(*args, **kwargs)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
[2025-05-06T16:55:02.641Z] return f(get_current_context(), *args, **kwargs)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/papermill/cli.py", line 235, in papermill
[2025-05-06T16:55:02.641Z] execute_notebook(
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 131, in execute_notebook
[2025-05-06T16:55:02.641Z] raise_for_execution_errors(nb, output_path)
[2025-05-06T16:55:02.641Z] File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 251, in raise_for_execution_errors
[2025-05-06T16:55:02.641Z] raise error
[2025-05-06T16:55:02.641Z] papermill.exceptions.PapermillExecutionError:
[2025-05-06T16:55:02.641Z] ---------------------------------------------------------------------------
[2025-05-06T16:55:02.641Z] Exception encountered at "In [11]":
[2025-05-06T16:55:02.641Z] ---------------------------------------------------------------------------
[2025-05-06T16:55:02.641Z] InductorError Traceback (most recent call last)
[2025-05-06T16:55:02.641Z] Cell In[11], line 14
[2025-05-06T16:55:02.641Z] 12 inputs, labels = batch_data["image"].to(device), batch_data["label"].to(device)
[2025-05-06T16:55:02.641Z] 13 optimizer.zero_grad()
[2025-05-06T16:55:02.641Z] ---> 14 loss, train_time = timed(lambda: train(model_opt, inputs, labels)) # noqa: B023
[2025-05-06T16:55:02.641Z] 15 optimizer.step()
[2025-05-06T16:55:02.641Z] 16 epoch_loss += loss.item()
[2025-05-06T16:55:02.641Z]
[2025-05-06T16:55:02.641Z] Cell In[6], line 5, in timed(fn)
[2025-05-06T16:55:02.641Z] 3 end = torch.cuda.Event(enable_timing=True)
[2025-05-06T16:55:02.641Z] 4 start.record()
[2025-05-06T16:55:02.641Z] ----> 5 result = fn()
[2025-05-06T16:55:02.641Z] 6 end.record()
[2025-05-06T16:55:02.641Z] 7 torch.cuda.synchronize()
[2025-05-06T16:55:02.641Z]
[2025-05-06T16:55:02.641Z] Cell In[11], line 14, in <lambda>()
[2025-05-06T16:55:02.641Z] 12 inputs, labels = batch_data["image"].to(device), batch_data["label"].to(device)
[2025-05-06T16:55:02.641Z] 13 optimizer.zero_grad()
[2025-05-06T16:55:02.641Z] ---> 14 loss, train_time = timed(lambda: train(model_opt, inputs, labels)) # noqa: B023
[2025-05-06T16:55:02.641Z] 15 optimizer.step()
[2025-05-06T16:55:02.641Z] 16 epoch_loss += loss.item()
[2025-05-06T16:55:02.641Z]
[2025-05-06T16:55:02.641Z] Cell In[6], line 12, in train(model, inputs, labels)
[2025-05-06T16:55:02.641Z] 11 def train(model, inputs, labels):
[2025-05-06T16:55:02.641Z] ---> 12 outputs = model(inputs)
[2025-05-06T16:55:02.642Z] 13 loss_function = monai.losses.DiceCELoss(to_onehot_y=True, softmax=True)
[2025-05-06T16:55:02.642Z] 14 loss = loss_function(outputs, labels)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
[2025-05-06T16:55:02.642Z] 1749 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
[2025-05-06T16:55:02.642Z] 1750 else:
[2025-05-06T16:55:02.642Z] -> 1751 return self._call_impl(*args, **kwargs)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
[2025-05-06T16:55:02.642Z] 1757 # If we don't have any hooks, we want to skip the rest of the logic in
[2025-05-06T16:55:02.642Z] 1758 # this function, and just call forward.
[2025-05-06T16:55:02.642Z] 1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
[2025-05-06T16:55:02.642Z] 1760 or _global_backward_pre_hooks or _global_backward_hooks
[2025-05-06T16:55:02.642Z] 1761 or _global_forward_hooks or _global_forward_pre_hooks):
[2025-05-06T16:55:02.642Z] -> 1762 return forward_call(*args, **kwargs)
[2025-05-06T16:55:02.642Z] 1764 result = None
[2025-05-06T16:55:02.642Z] 1765 called_always_called_hooks = set()
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:663, in _TorchDynamoContext.__call__.<locals>._fn(*args, **kwargs)
[2025-05-06T16:55:02.642Z] 659 raise e.with_traceback(None) from None
[2025-05-06T16:55:02.642Z] 660 except ShortenTraceback as e:
[2025-05-06T16:55:02.642Z] 661 # Failures in the backend likely don't have useful
[2025-05-06T16:55:02.642Z] 662 # data in the TorchDynamo frames, so we strip them out.
[2025-05-06T16:55:02.642Z] --> 663 raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
[2025-05-06T16:55:02.642Z] 664 finally:
[2025-05-06T16:55:02.642Z] 665 # Restore the dynamic layer stack depth if necessary.
[2025-05-06T16:55:02.642Z] 666 set_eval_frame(None)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:760, in _compile_fx_inner(gm, example_inputs, **graph_kwargs)
[2025-05-06T16:55:02.642Z] 758 raise
[2025-05-06T16:55:02.642Z] 759 except Exception as e:
[2025-05-06T16:55:02.642Z] --> 760 raise InductorError(e, currentframe()).with_traceback(
[2025-05-06T16:55:02.642Z] 761 e.__traceback__
[2025-05-06T16:55:02.642Z] 762 ) from None
[2025-05-06T16:55:02.642Z] 763 finally:
[2025-05-06T16:55:02.642Z] 764 TritonBundler.end_compile()
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:745, in _compile_fx_inner(gm, example_inputs, **graph_kwargs)
[2025-05-06T16:55:02.642Z] 743 TritonBundler.begin_compile()
[2025-05-06T16:55:02.642Z] 744 try:
[2025-05-06T16:55:02.642Z] --> 745 mb_compiled_graph = fx_codegen_and_compile(
[2025-05-06T16:55:02.642Z] 746 gm, example_inputs, inputs_to_check, **graph_kwargs
[2025-05-06T16:55:02.642Z] 747 )
[2025-05-06T16:55:02.642Z] 748 assert mb_compiled_graph is not None
[2025-05-06T16:55:02.642Z] 749 mb_compiled_graph._time_taken_ns = time.time_ns() - start_time
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:1295, in fx_codegen_and_compile(gm, example_inputs, inputs_to_check, **graph_kwargs)
[2025-05-06T16:55:02.642Z] 1291 from .compile_fx_subproc import _SubprocessFxCompile
[2025-05-06T16:55:02.642Z] 1293 scheme = _SubprocessFxCompile()
[2025-05-06T16:55:02.642Z] -> 1295 return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:1119, in _InProcessFxCompile.codegen_and_compile(self, gm, example_inputs, inputs_to_check, graph_kwargs)
[2025-05-06T16:55:02.642Z] 1117 metrics_helper = metrics.CachedMetricsHelper()
[2025-05-06T16:55:02.642Z] 1118 with V.set_graph_handler(graph):
[2025-05-06T16:55:02.642Z] -> 1119 graph.run(*example_inputs)
[2025-05-06T16:55:02.642Z] 1120 output_strides: list[Optional[tuple[_StrideExprStr, ...]]] = []
[2025-05-06T16:55:02.642Z] 1121 if graph.graph_outputs is not None:
[2025-05-06T16:55:02.642Z] 1122 # We'll put the output strides in the compiled graph so we
[2025-05-06T16:55:02.642Z] 1123 # can later return them to the caller via TracingContext
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:877, in GraphLowering.run(self, *args)
[2025-05-06T16:55:02.642Z] 875 def run(self, *args: Any) -> Any: # type: ignore[override]
[2025-05-06T16:55:02.642Z] 876 with dynamo_timed("GraphLowering.run"):
[2025-05-06T16:55:02.642Z] --> 877 return super().run(*args)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py:171, in Interpreter.run(self, initial_env, enable_io_processing, *args)
[2025-05-06T16:55:02.642Z] 168 continue
[2025-05-06T16:55:02.642Z] 170 try:
[2025-05-06T16:55:02.642Z] --> 171 self.env[node] = self.run_node(node)
[2025-05-06T16:55:02.642Z] 172 except Exception as e:
[2025-05-06T16:55:02.642Z] 173 if self.extra_traceback:
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:1527, in GraphLowering.run_node(self, n)
[2025-05-06T16:55:02.642Z] 1525 else:
[2025-05-06T16:55:02.642Z] 1526 debug("")
[2025-05-06T16:55:02.642Z] -> 1527 result = super().run_node(n)
[2025-05-06T16:55:02.642Z] 1529 # require the same stride order for dense outputs,
[2025-05-06T16:55:02.642Z] 1530 # 1. user-land view() will not throw because inductor
[2025-05-06T16:55:02.642Z] 1531 # output different strides than eager
[2025-05-06T16:55:02.642Z] (...)
[2025-05-06T16:55:02.642Z] 1534 # 2: as_strided ops, we need make sure its input has same size/stride with
[2025-05-06T16:55:02.642Z] 1535 # eager model to align with eager behavior.
[2025-05-06T16:55:02.642Z] 1536 as_strided_ops = [
[2025-05-06T16:55:02.642Z] 1537 torch.ops.aten.as_strided.default,
[2025-05-06T16:55:02.642Z] 1538 torch.ops.aten.as_strided_.default,
[2025-05-06T16:55:02.642Z] (...)
[2025-05-06T16:55:02.642Z] 1541 torch.ops.aten.resize_as.default,
[2025-05-06T16:55:02.642Z] 1542 ]
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py:240, in Interpreter.run_node(self, n)
[2025-05-06T16:55:02.642Z] 238 assert isinstance(args, tuple)
[2025-05-06T16:55:02.642Z] 239 assert isinstance(kwargs, dict)
[2025-05-06T16:55:02.642Z] --> 240 return getattr(self, n.op)(n.target, args, kwargs)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:1169, in GraphLowering.call_function(self, target, args, kwargs)
[2025-05-06T16:55:02.642Z] 1163 decided_constraint = None # type: ignore[assignment]
[2025-05-06T16:55:02.642Z] 1165 # for implicitly fallback ops, we conservatively requires
[2025-05-06T16:55:02.642Z] 1166 # contiguous input since some eager kernels does not
[2025-05-06T16:55:02.642Z] 1167 # support non-contiguous inputs. They may silently cause
[2025-05-06T16:55:02.642Z] 1168 # accuracy problems. Check https://github.com/pytorch/pytorch/issues/140452
[2025-05-06T16:55:02.642Z] -> 1169 make_fallback(target, layout_constraint=decided_constraint)
[2025-05-06T16:55:02.642Z] 1171 elif get_decompositions([target]):
[2025-05-06T16:55:02.642Z] 1172 # There isn't a good way to dynamically patch this in
[2025-05-06T16:55:02.642Z] 1173 # since AOT Autograd already ran. The error message tells
[2025-05-06T16:55:02.642Z] 1174 # the user how to fix it.
[2025-05-06T16:55:02.642Z] 1175 raise MissingOperatorWithDecomp(target, args, kwargs)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/lowering.py:2023, in make_fallback(op, layout_constraint, warn, override_decomp)
[2025-05-06T16:55:02.642Z] 2018 torch._dynamo.config.suppress_errors = False
[2025-05-06T16:55:02.642Z] 2019 log.warning(
[2025-05-06T16:55:02.642Z] 2020 "A make_fallback error occurred in suppress_errors config,"
[2025-05-06T16:55:02.642Z] 2021 " and suppress_errors is being disabled to surface it."
[2025-05-06T16:55:02.642Z] 2022 )
[2025-05-06T16:55:02.642Z] -> 2023 raise AssertionError(
[2025-05-06T16:55:02.642Z] 2024 f"make_fallback({op}): a decomposition exists, we should switch to it."
[2025-05-06T16:55:02.642Z] 2025 " To fix this error, either add a decomposition to core_aten_decompositions (preferred)"
[2025-05-06T16:55:02.642Z] 2026 " or inductor_decompositions, and delete the corresponding `make_fallback` line."
[2025-05-06T16:55:02.642Z] 2027 " Get help from the inductor team if unsure, don't pick arbitrarily to unblock yourself.",
[2025-05-06T16:55:02.642Z] 2028 )
[2025-05-06T16:55:02.642Z] 2030 def register_fallback(op_overload):
[2025-05-06T16:55:02.642Z] 2031 add_needs_realized_inputs(op_overload)
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] InductorError: AssertionError: make_fallback(aten.upsample_trilinear3d.default): a decomposition exists, we should switch to it. To fix this error, either add a decomposition to core_aten_decompositions (preferred) or inductor_decompositions, and delete the corresponding `make_fallback` line. Get help from the inductor team if unsure, don't pick arbitrarily to unblock yourself.
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z]
[2025-05-06T16:55:02.642Z] real 7m21.829s
[2025-05-06T16:55:02.642Z] user 8m18.975s
[2025-05-06T16:55:02.642Z] sys 5m22.797s
[2025-05-06T16:55:02.642Z] Check failed!
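A minimal standalone sketch of what the failing cell appears to exercise, based on the traceback above: the assertion fires while Inductor lowers `aten.upsample_trilinear3d.default`, which is what `F.interpolate(..., mode="trilinear")` on a 5D tensor produces, so any `torch.compile`'d module containing trilinear 3D upsampling should reproduce it on this environment (PyTorch 2.7.0+cu126). The `Upsample3D` module below is a hypothetical stand-in for the tutorial's model, not the notebook's actual code, and the repro is untested here.

```python
import torch
import torch.nn.functional as F


class Upsample3D(torch.nn.Module):
    """Hypothetical stand-in: any forward that hits aten.upsample_trilinear3d."""

    def forward(self, x):
        # mode="trilinear" on a 5D (N, C, D, H, W) tensor lowers to
        # aten.upsample_trilinear3d.default, the op named in the error.
        return F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = Upsample3D().to(device)
model_opt = torch.compile(model)  # default Inductor backend, as in the notebook

x = torch.randn(1, 1, 8, 8, 8, device=device)
out = model_opt(x)  # expected to raise InductorError on torch 2.7.0
```

Until the notebook or PyTorch is fixed, two possible workarounds (untested sketches, but both use documented torch APIs): compile with the eager backend, which skips Inductor codegen entirely, or let Dynamo fall back to eager execution on backend failures. Both give up the Inductor speedup the tutorial is demonstrating, so they are only suitable for unblocking CI.

```python
import torch

# Option 1: bypass Inductor; Dynamo still traces, but no Inductor lowering runs.
model_opt = torch.compile(model, backend="eager")

# Option 2: run frames whose compilation fails eagerly instead of raising.
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```

Pinning `torch<2.7` in the CI image would presumably also sidestep this, assuming the same notebook passed on earlier releases.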