testing errors from auto3dseg #952

Closed
@wyli

03:04:38  Running ./auto3dseg/notebooks/ensemble_byoc.ipynb
03:04:38  Checking PEP8 compliance...
03:04:39  Running notebook...
03:04:39  Before:
03:04:39      "max_epochs = 2\n",
03:04:39  After:
03:04:39      "max_epochs = 1\n",
03:04:43  MONAI version: 1.0.0+5.g84e271ec
03:04:43  Numpy version: 1.22.4
03:04:43  Pytorch version: 1.10.2+cu102
03:04:43  MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:04:43  MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:04:43  MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:04:43  
03:04:43  Optional dependencies:
03:04:43  Pytorch Ignite version: 0.4.8
03:04:43  Nibabel version: 4.0.2
03:04:43  scikit-image version: 0.19.3
03:04:43  Pillow version: 7.0.0
03:04:43  Tensorboard version: 2.10.0
03:04:43  gdown version: 4.5.1
03:04:43  TorchVision version: 0.11.3+cu102
03:04:43  tqdm version: 4.64.0
03:04:43  lmdb version: 1.3.0
03:04:43  psutil version: 5.9.1
03:04:43  pandas version: 1.1.5
03:04:43  einops version: 0.4.1
03:04:43  transformers version: 4.21.3
03:04:43  mlflow version: 1.29.0
03:04:43  pynrrd version: 0.4.3
03:04:43  
03:04:43  For details about installing the optional dependencies, please visit:
03:04:43      https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:04:43  
03:04:44  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:04:44    warnings.warn(
03:05:17  
Executing:   0%|          | 0/12 [00:00<?, ?cell/s]
Executing:   8%|▊         | 1/12 [00:01<00:16,  1.54s/cell]
Executing:  17%|█▋        | 2/12 [00:06<00:34,  3.49s/cell]
Executing:  33%|███▎      | 4/12 [00:10<00:19,  2.45s/cell]
Executing:  50%|█████     | 6/12 [00:10<00:08,  1.38s/cell]
Executing:  83%|████████▎ | 10/12 [00:31<00:07,  3.58s/cell]
Executing:  83%|████████▎ | 10/12 [00:32<00:06,  3.26s/cell]
03:05:17  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:05:17    warnings.warn(
03:05:17  Traceback (most recent call last):
03:05:17    File "/opt/conda/bin/papermill", line 8, in <module>
03:05:17      sys.exit(papermill())
03:05:17    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:05:17      return self.main(*args, **kwargs)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:05:17      rv = self.invoke(ctx)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:05:17      return ctx.invoke(self.callback, **ctx.params)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:05:17      return __callback(*args, **kwargs)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:05:17      return f(get_current_context(), *args, **kwargs)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:05:17      execute_notebook(
03:05:17    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:05:17      raise_for_execution_errors(nb, output_path)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:05:17      raise error
03:05:17  papermill.exceptions.PapermillExecutionError: 
03:05:17  ---------------------------------------------------------------------------
03:05:17  Exception encountered at "In [5]":
03:05:17  ---------------------------------------------------------------------------
03:05:17  CalledProcessError                        Traceback (most recent call last)
03:05:17  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:05:17      182     ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:05:17  --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:05:17      184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:05:17  
03:05:17  File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:05:17      515     if check and retcode:
03:05:17  --> 516         raise CalledProcessError(retcode, process.args,
03:05:17      517                                  output=stdout, stderr=stderr)
03:05:17      518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:05:17  
03:05:17  CalledProcessError: Command '['python', './workdir/segresnet2d_0/scripts/train.py', 'run', "--config_file='./workdir/segresnet2d_0/configs/transforms_train.yaml','./workdir/segresnet2d_0/configs/transforms_infer.yaml','./workdir/segresnet2d_0/configs/hyper_parameters.yaml','./workdir/segresnet2d_0/configs/network.yaml','./workdir/segresnet2d_0/configs/transforms_validate.yaml'", '--num_iterations=4', '--num_iterations_per_validation=2', '--num_images_per_batch=2', '--num_epochs=1', '--num_warmup_iterations=2']' returned non-zero exit status 1.
03:05:17  
03:05:17  The above exception was the direct cause of the following exception:
03:05:17  
03:05:17  RuntimeError                              Traceback (most recent call last)
03:05:17  Input In [5], in <cell line: 30>()
03:05:17       30 for h in history:
03:05:17       31     for _, algo in h.items():
03:05:17  ---> 32         algo.train(train_param)
03:05:17  
03:05:17  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:05:17      192 """
03:05:17      193 Load the run function in the training script of each model. Training parameter is predefined by the
03:05:17      194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:05:17     (...)
03:05:17      197     train_params:  to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:05:17      198 """
03:05:17      199 cmd, devices_info = self._create_cmd(train_params)
03:05:17  --> 200 return self._run_cmd(cmd, devices_info)
03:05:17  
03:05:17  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:05:17      186     output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:05:17      187     errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:05:17  --> 188     raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:05:17      189 return normal_out
03:05:17  
03:05:17  RuntimeError: subprocess call error 1: b'2022-09-23 02:05:11.053331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
03:05:17  To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:05:17  2022-09-23 02:05:11.201530: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:05:17  2022-09-23 02:05:11.234588: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:05:17  2022-09-23 02:05:11.855644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:05:17  2022-09-23 02:05:11.855721: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:05:17  2022-09-23 02:05:11.855728: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:05:17  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:05:17  Traceback (most recent call last):
03:05:17    File "./workdir/segresnet2d_0/scripts/train.py", line 439, in <module>
03:05:17      fire.Fire()
03:05:17    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:05:17      component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:05:17    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:05:17      component, remaining_args = _CallAndUpdateTrace(
03:05:17    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:05:17      component = fn(*varargs, **kwargs)
03:05:17    File "./workdir/segresnet2d_0/scripts/train.py", line 277, in run
03:05:17      lr_scheduler.step()
03:05:17    File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 152, in step
03:05:17      values = self.get_lr()
03:05:17    File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 372, in get_lr
03:05:17      if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
03:05:17  ZeroDivisionError: integer division or modulo by zero
03:05:17  ', b'[info] number of GPUs: 1
03:05:17  [info] world_size: 1
03:05:17  train_files: 8
03:05:17  val_files: 4
03:05:17  num_epochs 2
03:05:17  num_epochs_per_validation 1
03:05:17  [info] training from scratch
03:05:17  [info] amp enabled
03:05:17  ----------
03:05:17  epoch 1/2
03:05:17  learning rate is set to 0.2
03:05:17  [2022-09-23 02:05:13] 1/4, train_loss: 0.5047
03:05:17  '
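
Note on the `ensemble_byoc.ipynb` failure above: the `ZeroDivisionError` is raised by `self.last_epoch % self.step_size` in `StepLR.get_lr`, so the scheduler in `train.py` must have been created with `step_size == 0`. The log does not show how `train.py` computes that value. The sketch below is plain Python, not the actual script: `derive_step_size` and its `num_decays` argument are hypothetical, standing in for any integer division that rounds down to zero when the CI run shrinks the schedule (here `--num_iterations=4`). `keeps_lr` copies the condition shown in the traceback.

```python
def keeps_lr(last_epoch: int, step_size: int) -> bool:
    """Condition copied from the traceback (lr_scheduler.py, line 372).

    Returns True when the scheduler would leave the learning rate unchanged.
    Raises ZeroDivisionError if step_size is 0 and last_epoch is not 0.
    """
    return (last_epoch == 0) or (last_epoch % step_size != 0)


def derive_step_size(num_iterations: int, num_decays: int) -> int:
    """Hypothetical derivation: spread `num_decays` LR drops over the run.

    With a small `num_iterations` the floor division yields 0.
    """
    return num_iterations // num_decays


def derive_step_size_safe(num_iterations: int, num_decays: int) -> int:
    """Same hypothetical derivation, clamped so the scheduler never sees 0."""
    return max(1, num_iterations // num_decays)


if __name__ == "__main__":
    bad = derive_step_size(num_iterations=4, num_decays=10)
    print("unsafe step_size:", bad)

    # last_epoch == 0 short-circuits, so the very first check does not fail.
    print("keeps_lr(0, bad):", keeps_lr(0, bad))

    try:
        keeps_lr(1, bad)
    except ZeroDivisionError as exc:
        print("second check fails:", exc)

    good = derive_step_size_safe(num_iterations=4, num_decays=10)
    print("safe step_size:", good, "keeps_lr(1, good):", keeps_lr(1, good))
```

Because `last_epoch == 0` short-circuits the condition, a zero `step_size` only fails on a later `step()` call, which would be consistent with the log reaching `1/4, train_loss: 0.5047` before the crash. Clamping to at least 1 is one possible guard; whether it is the right fix for the template is for the algorithm authors to decide.
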
03:05:17  Running ./auto3dseg/notebooks/hpo_optuna.ipynb
03:05:17  Checking PEP8 compliance...
03:05:17  Running notebook...
03:05:17  Before:
03:05:17      "max_epochs = 2\n",
03:05:17  After:
03:05:17      "max_epochs = 1\n",
03:05:22  MONAI version: 1.0.0+5.g84e271ec
03:05:22  Numpy version: 1.22.4
03:05:22  Pytorch version: 1.10.2+cu102
03:05:22  MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:05:22  MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:05:22  MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:05:22  
03:05:22  Optional dependencies:
03:05:22  Pytorch Ignite version: 0.4.8
03:05:22  Nibabel version: 4.0.2
03:05:22  scikit-image version: 0.19.3
03:05:22  Pillow version: 7.0.0
03:05:22  Tensorboard version: 2.10.0
03:05:22  gdown version: 4.5.1
03:05:22  TorchVision version: 0.11.3+cu102
03:05:22  tqdm version: 4.64.0
03:05:22  lmdb version: 1.3.0
03:05:22  psutil version: 5.9.1
03:05:22  pandas version: 1.1.5
03:05:22  einops version: 0.4.1
03:05:22  transformers version: 4.21.3
03:05:22  mlflow version: 1.29.0
03:05:22  pynrrd version: 0.4.3
03:05:22  
03:05:22  For details about installing the optional dependencies, please visit:
03:05:22      https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:05:22  
03:05:23  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:05:23    warnings.warn(
03:06:04  
Executing:   0%|          | 0/18 [00:00<?, ?cell/s]
Executing:   6%|▌         | 1/18 [00:01<00:27,  1.60s/cell]
Executing:  17%|█▋        | 3/18 [00:05<00:26,  1.79s/cell]
Executing:  39%|███▉      | 7/18 [00:17<00:29,  2.70s/cell]
Executing:  61%|██████    | 11/18 [00:22<00:13,  1.96s/cell]
Executing:  89%|████████▉ | 16/18 [00:22<00:02,  1.06s/cell]
Executing: 100%|██████████| 18/18 [00:38<00:00,  2.48s/cell]
Executing: 100%|██████████| 18/18 [00:39<00:00,  2.19s/cell]
03:06:04  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:06:04    warnings.warn(
03:06:04  Traceback (most recent call last):
03:06:04    File "/opt/conda/bin/papermill", line 8, in <module>
03:06:04      sys.exit(papermill())
03:06:04    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:06:04      return self.main(*args, **kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:06:04      rv = self.invoke(ctx)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:06:04      return ctx.invoke(self.callback, **ctx.params)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:06:04      return __callback(*args, **kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:06:04      return f(get_current_context(), *args, **kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:06:04      execute_notebook(
03:06:04    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:06:04      raise_for_execution_errors(nb, output_path)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:06:04      raise error
03:06:04  papermill.exceptions.PapermillExecutionError: 
03:06:04  ---------------------------------------------------------------------------
03:06:04  Exception encountered at "In [9]":
03:06:04  ---------------------------------------------------------------------------
03:06:04  CalledProcessError                        Traceback (most recent call last)
03:06:04  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:06:04      182     ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:06:04  --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:06:04      184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:06:04      515     if check and retcode:
03:06:04  --> 516         raise CalledProcessError(retcode, process.args,
03:06:04      517                                  output=stdout, stderr=stderr)
03:06:04      518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:06:04  
03:06:04  CalledProcessError: Command '['python', './optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py', 'run', "--config_file='./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_train.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_infer.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/hyper_parameters.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/network.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_validate.yaml'", '--learning_rate=0.0001']' returned non-zero exit status 1.
03:06:04  
03:06:04  The above exception was the direct cause of the following exception:
03:06:04  
03:06:04  RuntimeError                              Traceback (most recent call last)
03:06:04  Input In [9], in <cell line: 3>()
03:06:04        1 search_space = {'learning_rate': [0.0001, 0.001, 0.01, 0.1]}
03:06:04        2 study = optuna.create_study(sampler=optuna.samplers.GridSampler(search_space), direction='maximize')
03:06:04  ----> 3 study.optimize(partial(optuna_gen, obj_filename=optuna_gen.get_obj_filename(), output_folder=optuna_dir), n_trials=2)
03:06:04        4 print("Best value: {} (params: {})\n".format(study.best_value, study.best_params))
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/site-packages/optuna/study/study.py:419, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
03:06:04      315 def optimize(
03:06:04      316     self,
03:06:04      317     func: ObjectiveFuncType,
03:06:04     (...)
03:06:04      324     show_progress_bar: bool = False,
03:06:04      325 ) -> None:
03:06:04      326     """Optimize an objective function.
03:06:04      327 
03:06:04      328     Optimization is done by choosing a suitable set of hyperparameter values from a given
03:06:04     (...)
03:06:04      416             If nested invocation of this method occurs.
03:06:04      417     """
03:06:04  --> 419     _optimize(
03:06:04      420         study=self,
03:06:04      421         func=func,
03:06:04      422         n_trials=n_trials,
03:06:04      423         timeout=timeout,
03:06:04      424         n_jobs=n_jobs,
03:06:04      425         catch=catch,
03:06:04      426         callbacks=callbacks,
03:06:04      427         gc_after_trial=gc_after_trial,
03:06:04      428         show_progress_bar=show_progress_bar,
03:06:04      429     )
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:66, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
03:06:04       64 try:
03:06:04       65     if n_jobs == 1:
03:06:04  ---> 66         _optimize_sequential(
03:06:04       67             study,
03:06:04       68             func,
03:06:04       69             n_trials,
03:06:04       70             timeout,
03:06:04       71             catch,
03:06:04       72             callbacks,
03:06:04       73             gc_after_trial,
03:06:04       74             reseed_sampler_rng=False,
03:06:04       75             time_start=None,
03:06:04       76             progress_bar=progress_bar,
03:06:04       77         )
03:06:04       78     else:
03:06:04       79         if n_jobs == -1:
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:160, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
03:06:04      157         break
03:06:04      159 try:
03:06:04  --> 160     frozen_trial = _run_trial(study, func, catch)
03:06:04      161 finally:
03:06:04      162     # The following line mitigates memory problems that can be occurred in some
03:06:04      163     # environments (e.g., services that use computing containers such as CircleCI).
03:06:04      164     # Please refer to the following PR for further details:
03:06:04      165     # https://github.com/optuna/optuna/pull/325.
03:06:04      166     if gc_after_trial:
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:234, in _run_trial(study, func, catch)
03:06:04      227         assert False, "Should not reach."
03:06:04      229 if (
03:06:04      230     frozen_trial.state == TrialState.FAIL
03:06:04      231     and func_err is not None
03:06:04      232     and not isinstance(func_err, catch)
03:06:04      233 ):
03:06:04  --> 234     raise func_err
03:06:04      235 return frozen_trial
03:06:04  
03:06:04  File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:196, in _run_trial(study, func, catch)
03:06:04      194 with get_heartbeat_thread(trial._trial_id, study._storage):
03:06:04      195     try:
03:06:04  --> 196         value_or_values = func(trial)
03:06:04      197     except exceptions.TrialPruned as e:
03:06:04      198         # TODO(mamu): Handle multi-objective cases.
03:06:04      199         state = TrialState.PRUNED
03:06:04  
03:06:04  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/hpo_gen.py:333, in OptunaGen.__call__(self, trial, obj_filename, output_folder, template_path)
03:06:04      323 """
03:06:04      324 Callabe that Optuna will use to optimize the hyper-parameters
03:06:04      325 
03:06:04     (...)
03:06:04      330         ``{algorithm_templates_dir}/{network}/scripts/algo.py``
03:06:04      331 """
03:06:04      332 self.set_trial(trial)
03:06:04  --> 333 self.run_algo(obj_filename, output_folder, template_path)
03:06:04      334 return self.acc
03:06:04  
03:06:04  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/hpo_gen.py:395, in OptunaGen.run_algo(self, obj_filename, output_folder, template_path)
03:06:04      393 # step 3 generate the folder to save checkpoints and train
03:06:04      394 self.generate(output_folder)
03:06:04  --> 395 self.algo.train(self.params)
03:06:04      396 # step 4 report validation acc to controller
03:06:04      397 acc = self.algo.get_score()
03:06:04  
03:06:04  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:06:04      192 """
03:06:04      193 Load the run function in the training script of each model. Training parameter is predefined by the
03:06:04      194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:06:04     (...)
03:06:04      197     train_params:  to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:06:04      198 """
03:06:04      199 cmd, devices_info = self._create_cmd(train_params)
03:06:04  --> 200 return self._run_cmd(cmd, devices_info)
03:06:04  
03:06:04  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:06:04      186     output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:06:04      187     errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:06:04  --> 188     raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:06:04      189 return normal_out
03:06:04  
03:06:04  RuntimeError: subprocess call error 1: b'2022-09-23 02:05:54.715096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
03:06:04  To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:06:04  2022-09-23 02:05:54.859510: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:06:04  2022-09-23 02:05:54.892624: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:06:04  2022-09-23 02:05:55.523776: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:06:04  2022-09-23 02:05:55.523854: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:06:04  2022-09-23 02:05:55.523862: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  202.45334114]
03:06:04  Modifying image pixdim from [0.6       0.5999997 3.999998  1.       ] to [  0.60000002   0.59999975   3.99999799 129.0477496 ]
03:06:04  Modifying image pixdim from [0.625   0.625   3.59999 1.     ] to [  0.625        0.625        3.59998989 208.35237358]
03:06:04  Modifying image pixdim from [0.6       0.6000003 4.0000024 1.       ] to [  0.60000002   0.60000034   4.00000227 149.31681629]
03:06:04  Modifying image pixdim from [0.6       0.5999997 3.999998  1.       ] to [  0.60000002   0.59999975   3.99999799 117.82332812]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  160.84918933]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  150.20609596]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  162.36758282]
03:06:04  Modifying image pixdim from [0.604167  0.6041667 3.999998  1.       ] to [  0.60416698   0.60416671   3.99999799 174.60244975]
03:06:04  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  154.70488398]
03:06:04  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04  Modifying image pixdim from [0.6249998 0.625     3.5999987 1.       ] to [  0.62499983   0.625        3.59999877 153.34152766]
03:06:04  no available indices of class 1 to crop, set the crop ratio of this class to zero.
03:06:04  Modifying image pixdim from [0.75       0.74999964 2.9999986  1.        ] to [  0.75         0.74999965   2.99999861 157.50724574]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  142.87013615]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  148.64025265]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  150.52861008]
03:06:04  Modifying image pixdim from [0.6249999 0.6249999 3.6       1.       ] to [  0.62499988   0.62499988   3.5999999  152.06272679]
03:06:04  Modifying image pixdim from [0.6        0.60000014 4.0000005  1.        ] to [  0.60000002   0.60000017   4.00000041 174.1516677 ]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625       0.625       3.5999999 152.3814254]
03:06:04  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04  Modifying image pixdim from [0.625      0.62500006 3.6000001  1.        ] to [  0.625        0.62500005   3.60000016 168.35859116]
03:06:04  no available indices of class 1 to crop, set the crop ratio of this class to zero.
03:06:04  Modifying image pixdim from [0.75      0.7500001 4.0000005 1.       ] to [  0.75         0.75000013   4.00000043 128.6789431 ]
03:06:04  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04  Modifying image pixdim from [0.62499976 0.62499976 3.6        1.        ] to [  0.62499976   0.62499976   3.5999999  164.74616031]
03:06:04  Modifying image pixdim from [0.625   0.625   3.60001 1.     ] to [  0.625        0.625        3.60000992 173.25173679]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  180.81811432]
03:06:04  Modifying image pixdim from [0.625   0.625   3.60001 1.     ] to [  0.625        0.625        3.60000992 151.21210251]
03:06:04  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625       0.625       3.5999999 149.7974827]
03:06:04  Traceback (most recent call last):
03:06:04    File "./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py", line 439, in <module>
03:06:04      fire.Fire()
03:06:04    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:06:04      component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:06:04      component, remaining_args = _CallAndUpdateTrace(
03:06:04    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:06:04      component = fn(*varargs, **kwargs)
03:06:04    File "./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py", line 251, in run
03:06:04      outputs = model(inputs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04      return forward_call(*input, **kwargs)
03:06:04    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/networks/nets/segresnet.py", line 178, in forward
03:06:04      x, down_x = self.encode(x)
03:06:04    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/networks/nets/segresnet.py", line 155, in encode
03:06:04      x = self.convInit(x)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04      return forward_call(*input, **kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
03:06:04      input = module(input)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04      return forward_call(*input, **kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
03:06:04      return self._conv_forward(input, self.weight, self.bias)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
03:06:04      return F.conv2d(input, weight, bias, self.stride,
03:06:04    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/meta_tensor.py", line 249, in __torch_function__
03:06:04      ret = super().__torch_function__(func, types, args, kwargs)
03:06:04    File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 1051, in __torch_function__
03:06:04      ret = func(*args, **kwargs)
03:06:04  RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[2, 6, 64, 64] to have 3 channels, but got 6 channels instead
03:06:04  ', b'[info] number of GPUs: 1
03:06:04  [info] world_size: 1
03:06:04  train_files: 24
03:06:04  val_files: 6
03:06:04  num_epochs 2
03:06:04  num_epochs_per_validation 1
03:06:04  [info] training from scratch
03:06:04  [info] amp enabled
03:06:04  ----------
03:06:04  epoch 1/2
03:06:04  learning rate is set to 0.0001
03:06:04  '
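
The `RuntimeError` that ends the traceback above is a channel mismatch. The first convolution's weight has shape `[32, 3, 3, 3]`, so it expects 3 input channels, while the batch has shape `[2, 6, 64, 64]`, which has 6 channels. Below is a minimal, dependency-free sketch of that shape check. `check_conv_input` is a hypothetical helper written for illustration and is not part of MONAI or PyTorch.

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Return (expected, actual) input channels for a convolution.

    weight_shape: (out_channels, in_channels // groups, *kernel_size)
    input_shape:  (batch, channels, *spatial)
    """
    expected = weight_shape[1] * groups
    actual = input_shape[1]
    return expected, actual


# Shapes copied from the error message in the log.
expected, actual = check_conv_input((32, 3, 3, 3), (2, 6, 64, 64))
if expected != actual:
    print(f"expected {expected} input channels, got {actual}")
```

The log alone does not show whether the network's input-channel setting or the number of channels produced by the data pipeline is the wrong side.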
03:06:20  Running ./auto3dseg/notebooks/auto3dseg_hello_world.ipynb
03:06:20  Checking PEP8 compliance...
03:06:21  Running notebook...
03:06:21  Before:
03:06:21      "max_epochs = 2\n",
03:06:21  After:
03:06:21      "max_epochs = 1\n",
03:06:25  MONAI version: 1.0.0+5.g84e271ec
03:06:25  Numpy version: 1.22.4
03:06:25  Pytorch version: 1.10.2+cu102
03:06:25  MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:06:25  MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:06:25  MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:06:25  
03:06:25  Optional dependencies:
03:06:25  Pytorch Ignite version: 0.4.8
03:06:25  Nibabel version: 4.0.2
03:06:25  scikit-image version: 0.19.3
03:06:25  Pillow version: 7.0.0
03:06:25  Tensorboard version: 2.10.0
03:06:25  gdown version: 4.5.1
03:06:25  TorchVision version: 0.11.3+cu102
03:06:25  tqdm version: 4.64.0
03:06:25  lmdb version: 1.3.0
03:06:25  psutil version: 5.9.1
03:06:25  pandas version: 1.1.5
03:06:25  einops version: 0.4.1
03:06:25  transformers version: 4.21.3
03:06:25  mlflow version: 1.29.0
03:06:25  pynrrd version: 0.4.3
03:06:25  
03:06:25  For details about installing the optional dependencies, please visit:
03:06:25      https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:06:25  
03:06:26  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:06:26    warnings.warn(
03:07:24  
Executing:   0%|          | 0/20 [00:00<?, ?cell/s]
Executing:   5%|▌         | 1/20 [00:01<00:30,  1.62s/cell]
Executing:  15%|█▌        | 3/20 [00:06<00:37,  2.21s/cell]
Executing:  25%|██▌       | 5/20 [00:10<00:29,  2.00s/cell]
Executing:  45%|████▌     | 9/20 [00:10<00:09,  1.15cell/s]
Executing:  55%|█████▌    | 11/20 [00:10<00:06,  1.49cell/s]
Executing:  85%|████████▌ | 17/20 [00:51<00:12,  4.10s/cell]
Executing:  85%|████████▌ | 17/20 [00:52<00:09,  3.12s/cell]
03:07:24  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:07:24    warnings.warn(
03:07:24  Traceback (most recent call last):
03:07:24    File "/opt/conda/bin/papermill", line 8, in <module>
03:07:24      sys.exit(papermill())
03:07:24    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:07:24      return self.main(*args, **kwargs)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:07:24      rv = self.invoke(ctx)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:07:24      return ctx.invoke(self.callback, **ctx.params)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:07:24      return __callback(*args, **kwargs)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:07:24      return f(get_current_context(), *args, **kwargs)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:07:24      execute_notebook(
03:07:24    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:07:24      raise_for_execution_errors(nb, output_path)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:07:24      raise error
03:07:24  papermill.exceptions.PapermillExecutionError: 
03:07:24  ---------------------------------------------------------------------------
03:07:24  Exception encountered at "In [8]":
03:07:24  ---------------------------------------------------------------------------
03:07:24  CalledProcessError                        Traceback (most recent call last)
03:07:24  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:07:24      182     ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:07:24  --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:07:24      184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:07:24  
03:07:24  File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:07:24      515     if check and retcode:
03:07:24  --> 516         raise CalledProcessError(retcode, process.args,
03:07:24      517                                  output=stdout, stderr=stderr)
03:07:24      518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:07:24  
03:07:24  CalledProcessError: Command '['python', '/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py', 'run', "--config_file='/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_train.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_infer.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/network_search.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/hyper_parameters.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/network.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_validate.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/hyper_parameters_search.yaml'", '--searching#num_iterations=4', '--searching#num_iterations_per_validation=2', '--searching#num_images_per_batch=2', '--searching#num_epochs=1', '--searching#num_warmup_iterations=2']' returned non-zero exit status 1.
03:07:24  
03:07:24  The above exception was the direct cause of the following exception:
03:07:24  
03:07:24  RuntimeError                              Traceback (most recent call last)
03:07:24  Input In [8], in <cell line: 1>()
03:07:24  ----> 1 runner.run()
03:07:24  
03:07:24  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/auto_runner.py:586, in AutoRunner.run(self)
03:07:24      584 history = import_bundle_algo_history(self.work_dir, only_trained=False)
03:07:24      585 if not self.hpo:
03:07:24  --> 586     self._train_algo_in_sequence(history)
03:07:24      587 else:
03:07:24      588     self._train_algo_in_nni(history)
03:07:24  
03:07:24  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/auto_runner.py:488, in AutoRunner._train_algo_in_sequence(self, history)
03:07:24      486 for task in history:
03:07:24      487     for _, algo in task.items():
03:07:24  --> 488         algo.train(self.train_params)
03:07:24      489         acc = algo.get_score()
03:07:24      490         algo_to_pickle(algo, template_path=algo.template_path, best_metrics=acc)
03:07:24  
03:07:24  File /home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/algorithm_templates/dints/scripts/algo.py:187, in DintsAlgo.train(self, train_params)
03:07:24      185 cmd, devices_info = self._create_cmd(dints_search_params)
03:07:24      186 cmd_search = cmd.replace('train.py', 'search.py')
03:07:24  --> 187 self._run_cmd(cmd_search, devices_info)
03:07:24      189 dints_train_params = {}
03:07:24      190 for k, v in params.items():
03:07:24  
03:07:24  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:07:24      186     output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:07:24      187     errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:07:24  --> 188     raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:07:24      189 return normal_out
03:07:24  
03:07:24  RuntimeError: subprocess call error 1: b'2022-09-23 02:06:52.336551: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
03:07:24  To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:07:24  2022-09-23 02:06:52.477179: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:07:24  2022-09-23 02:06:52.509211: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:07:24  2022-09-23 02:06:53.137491: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:07:24  2022-09-23 02:06:53.137592: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:07:24  2022-09-23 02:06:53.137599: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:07:24  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:07:24  The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
03:07:24  Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:07:24  Traceback (most recent call last):
03:07:24    File "/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py", line 550, in <module>
03:07:24      fire.Fire()
03:07:24    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:07:24      component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:07:24    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:07:24      component, remaining_args = _CallAndUpdateTrace(
03:07:24    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:07:24      component = fn(*varargs, **kwargs)
03:07:24    File "/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py", line 381, in run
03:07:24      lr_scheduler.step()
03:07:24    File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 152, in step
03:07:24      values = self.get_lr()
03:07:24    File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 372, in get_lr
03:07:24      if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
03:07:24  ZeroDivisionError: integer division or modulo by zero
03:07:24  ', b"[info] number of GPUs: 1
03:07:24  [info] world_size: 1
03:07:24  train_files_w: 4
03:07:24  train_files_a: 4
03:07:24  val_files: 4
03:07:24  num_epochs 2
03:07:24  num_epochs_warmup 1
03:07:24  num_epochs_per_validation 1
03:07:24  [info] amp enabled
03:07:24  ----------
03:07:24  epoch 1/2
03:07:24  learning rate is set to 0.025
03:07:24  [2022-09-23 02:06:56] 1/2, train_loss: 0.7307
03:07:24  [2022-09-23 02:06:57] 2/2, train_loss: 0.7303
03:07:24  epoch 1 average loss: 0.7305, best mean dice: -1.0000 at epoch -1
03:07:24  1 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24  2 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24  3 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24  4 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24  evaluation metric - class 1: 0.16984054446220398
03:07:24  avg_metric 0.16984054446220398
03:07:24  saved new best metric model
03:07:24  current epoch: 1 current mean dice: 0.1698 best mean dice: 0.1698 at epoch 1
03:07:24  ----------
03:07:24  epoch 2/2
03:07:24  learning rate is set to 0.025
03:07:24  [2022-09-23 02:07:13] 1/2, train_loss: 0.7415
03:07:24  [2022-09-23 02:07:15] 1/2, train_loss_arch: 0.7368
03:07:24  "
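
The `ZeroDivisionError` above is raised by the expression `self.last_epoch % self.step_size` in the scheduler's `get_lr`, which means the scheduler was built with `step_size == 0`. The log does not show how `search.py` computes that value. It is only an assumption here that it is derived from the shortened iteration and epoch settings the CI passes. The sketch below reproduces the condition from the traceback in plain Python. `needs_decay` and `safe_step_size` are hypothetical names, and the clamp is one possible guard, not the fix adopted by the project.

```python
def needs_decay(last_epoch, step_size):
    """Plain-Python mirror of the condition shown in the traceback."""
    return not ((last_epoch == 0) or (last_epoch % step_size != 0))


def safe_step_size(step_size):
    """Hypothetical guard: clamp a computed step size to at least 1."""
    return max(1, int(step_size))


# last_epoch == 0 short-circuits, so the first step does not fail;
# the modulo is only evaluated on later steps.
try:
    needs_decay(last_epoch=1, step_size=0)
except ZeroDivisionError as err:
    print(f"step_size=0 fails: {err}")

print(needs_decay(last_epoch=1, step_size=safe_step_size(0)))
```

The short-circuit fits the log, where the first epoch completes and the failure only appears in a later call to `lr_scheduler.step()`.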
03:08:23  Running ./auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb
03:08:23  Checking PEP8 compliance...
03:08:24  stdin:179:9: B007 Loop control variable 'name' not used within the loop body. If this is intended, start the name with an underscore.
03:08:24      for name, algo in task.items():
03:08:24          ^
03:08:24  Error: Try running with autofixes: --autofix.
03:08:24  
03:08:24  Check failed!
03:08:24  Running notebook...
03:08:24  Before:
03:08:24      "max_epochs = 2000\n",
03:08:24  After:
03:08:24      "max_epochs = 1\n",
03:08:28  MONAI version: 1.0.0+5.g84e271ec
03:08:28  Numpy version: 1.22.4
03:08:28  Pytorch version: 1.10.2+cu102
03:08:28  MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:08:28  MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:08:28  MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:08:28  
03:08:28  Optional dependencies:
03:08:28  Pytorch Ignite version: 0.4.8
03:08:28  Nibabel version: 4.0.2
03:08:28  scikit-image version: 0.19.3
03:08:28  Pillow version: 7.0.0
03:08:28  Tensorboard version: 2.10.0
03:08:28  gdown version: 4.5.1
03:08:28  TorchVision version: 0.11.3+cu102
03:08:28  tqdm version: 4.64.0
03:08:28  lmdb version: 1.3.0
03:08:28  psutil version: 5.9.1
03:08:28  pandas version: 1.1.5
03:08:28  einops version: 0.4.1
03:08:28  transformers version: 4.21.3
03:08:28  mlflow version: 1.29.0
03:08:28  pynrrd version: 0.4.3
03:08:28  
03:08:28  For details about installing the optional dependencies, please visit:
03:08:28      https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:08:28  
03:08:29  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:08:29    warnings.warn(
03:09:16  
Executing:   0%|          | 0/24 [00:00<?, ?cell/s]
Executing:   4%|▍         | 1/24 [00:01<00:36,  1.58s/cell]
Executing:  12%|█▎        | 3/24 [00:06<00:46,  2.23s/cell]
Executing:  21%|██        | 5/24 [00:10<00:38,  2.01s/cell]
Executing:  46%|████▌     | 11/24 [00:22<00:27,  2.10s/cell]
Executing:  58%|█████▊    | 14/24 [00:38<00:31,  3.12s/cell]
Executing:  88%|████████▊ | 21/24 [00:45<00:06,  2.00s/cell]
Executing:  88%|████████▊ | 21/24 [00:46<00:06,  2.23s/cell]
03:09:16  /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:09:16    warnings.warn(
03:09:16  Traceback (most recent call last):
03:09:16    File "/opt/conda/bin/papermill", line 8, in <module>
03:09:16      sys.exit(papermill())
03:09:16    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:09:16      return self.main(*args, **kwargs)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:09:16      rv = self.invoke(ctx)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:09:16      return ctx.invoke(self.callback, **ctx.params)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:09:16      return __callback(*args, **kwargs)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:09:16      return f(get_current_context(), *args, **kwargs)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:09:16      execute_notebook(
03:09:16    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:09:16      raise_for_execution_errors(nb, output_path)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:09:16      raise error
03:09:16  papermill.exceptions.PapermillExecutionError: 
03:09:16  ---------------------------------------------------------------------------
03:09:16  Exception encountered at "In [9]":
03:09:16  ---------------------------------------------------------------------------
03:09:16  CalledProcessError                        Traceback (most recent call last)
03:09:16  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:09:16      182     ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:09:16  --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:09:16      184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:09:16  
03:09:16  File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:09:16      515     if check and retcode:
03:09:16  --> 516         raise CalledProcessError(retcode, process.args,
03:09:16      517                                  output=stdout, stderr=stderr)
03:09:16      518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:09:16  
03:09:16  CalledProcessError: Command '['python', './auto3dseg_work_dir/swinunetr_4/scripts/train.py', 'run', "--config_file='./auto3dseg_work_dir/swinunetr_4/configs/transforms_train.yaml','./auto3dseg_work_dir/swinunetr_4/configs/transforms_infer.yaml','./auto3dseg_work_dir/swinunetr_4/configs/hyper_parameters.yaml','./auto3dseg_work_dir/swinunetr_4/configs/network.yaml','./auto3dseg_work_dir/swinunetr_4/configs/transforms_validate.yaml'", '--num_iterations=12', '--num_iterations_per_validation=6', '--num_images_per_batch=2', '--num_epochs=1', '--num_warmup_iterations=6']' died with <Signals.SIGABRT: 6>.
03:09:16  
03:09:16  The above exception was the direct cause of the following exception:
03:09:16  
03:09:16  RuntimeError                              Traceback (most recent call last)
03:09:16  Input In [9], in <cell line: 2>()
03:09:16        2 for task in history:
03:09:16        3     for name, algo in task.items():
03:09:16  ----> 4         algo.train(train_param)  # can use default params by `algo.train()`
03:09:16        5         acc = algo.get_score()
03:09:16        6         algo_to_pickle(algo, template_path=algo.template_path, best_metrics=acc)
03:09:16  
03:09:16  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:09:16      192 """
03:09:16      193 Load the run function in the training script of each model. Training parameter is predefined by the
03:09:16      194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:09:16     (...)
03:09:16      197     train_params:  to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:09:16      198 """
03:09:16      199 cmd, devices_info = self._create_cmd(train_params)
03:09:16  --> 200 return self._run_cmd(cmd, devices_info)
03:09:16  
03:09:16  File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:09:16      186     output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:09:16      187     errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:09:16  --> 188     raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:09:16      189 return normal_out
03:09:16  
03:09:16  RuntimeError: subprocess call error -6: b'Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  154.70488398]
03:09:16  Modifying image pixdim from [0.625   0.625   3.59999 1.     ] to [  0.625        0.625        3.59998989 208.35237358]
03:09:16  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  170.38896694]
03:09:16  Modifying image pixdim from [0.625   0.625   3.60001 1.     ] to [  0.625        0.625        3.60000992 173.25173679]
03:09:16  Modifying image pixdim from [0.6       0.5999997 3.999998  1.       ] to [  0.60000002   0.59999975   3.99999799 129.0477496 ]
03:09:16  Modifying image pixdim from [0.625 0.625 3.6   1.   ] to [  0.625        0.625        3.5999999  202.45334114]
03:09:16  Traceback (most recent call last):
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 184, in __call__
03:09:16      out = _pad(img_t, pad_width=to_pad_, mode=mode_, **kwargs_)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 138, in _pt_pad
03:09:16      return pad_pt(img.unsqueeze(0), pt_pad_width, mode=mode, **kwargs).squeeze(0)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 4170, in _pad
03:09:16      return handle_torch_function(_pad, (input,), input, pad, mode=mode, value=value)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/torch/overrides.py", line 1355, in handle_torch_function
03:09:16      result = torch_func_method(public_api, types, args, kwargs)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/meta_tensor.py", line 249, in __torch_function__
03:09:16      ret = super().__torch_function__(func, types, args, kwargs)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 1051, in __torch_function__
03:09:16      ret = func(*args, **kwargs)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 4199, in _pad
03:09:16      return torch._C._nn.reflection_pad3d(input, pad)
03:09:16  RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (22, 22) at dimension 4 of input [1, 2, 320, 320, 20]
03:09:16  
03:09:16  The above exception was the direct cause of the following exception:
03:09:16  
03:09:16  Traceback (most recent call last):
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 91, in apply_transform
03:09:16      return _apply_transform(transform, data, unpack_items)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 55, in _apply_transform
03:09:16      return transform(parameters)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/dictionary.py", line 147, in __call__
03:09:16      d[key] = self.padder(d[key], mode=m)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 189, in __call__
03:09:16      raise ValueError(f"{mode_}, {kwargs_}, {img_t.dtype}, {img_t.device}") from err
03:09:16  ValueError: reflect, {}, torch.float32, cpu
03:09:16  
03:09:16  The above exception was the direct cause of the following exception:
03:09:16  
03:09:16  Traceback (most recent call last):
03:09:16    File "./auto3dseg_work_dir/swinunetr_4/scripts/train.py", line 409, in <module>
03:09:16      fire.Fire()
03:09:16    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:09:16      component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:09:16    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:09:16      component, remaining_args = _CallAndUpdateTrace(
03:09:16    File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:09:16      component = fn(*varargs, **kwargs)
03:09:16    File "./auto3dseg_work_dir/swinunetr_4/scripts/train.py", line 141, in run
03:09:16      train_ds = monai.data.CacheDataset(
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 794, in __init__
03:09:16      self.set_data(data)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 819, in set_data
03:09:16      self._cache = _compute_cache()
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 808, in _compute_cache
03:09:16      return self._fill_cache()
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 835, in _fill_cache
03:09:16      return list(p.imap(self._load_cache_item, range(self.cache_num)))
03:09:16    File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
03:09:16      raise value
03:09:16    File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
03:09:16      result = (True, func(*args, **kwds))
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 848, in _load_cache_item
03:09:16      item = apply_transform(_xform, item)
03:09:16    File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 118, in apply_transform
03:09:16      raise RuntimeError(f"applying transform {transform}") from e
03:09:16  RuntimeError: applying transform <monai.transforms.croppad.dictionary.SpatialPadd object at 0x7f6ba32abd60>
03:09:16  terminate called without an active exception
03:09:16  ', b'[info] number of GPUs: 1
03:09:16  [info] world_size: 1
03:09:16  train_files: 24
03:09:16  val_files: 6
03:09:16  '
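
The root error in the last traceback is that reflection padding needs every pad width to be smaller than the dimension being padded. Here the input is `[1, 2, 320, 320, 20]` and the requested padding on the last dimension is `(22, 22)`. The sketch below works through that arithmetic in plain Python. The target size of 64 is an assumption inferred from 20 + 22 + 22, and `symmetric_pad` and `reflect_pad_allowed` are hypothetical helpers, not MONAI functions.

```python
def symmetric_pad(dim, target):
    """Split the padding needed to grow `dim` to `target` across both sides."""
    total = max(target - dim, 0)
    before = total // 2
    return before, total - before


def reflect_pad_allowed(dim, pad):
    """Reflection padding requires each pad width to be smaller than `dim`."""
    return all(p < dim for p in pad)


# The depth of 20 comes from the log; the target of 64 is an assumption.
pad = symmetric_pad(20, 64)
print(pad, reflect_pad_allowed(20, pad))
```

Under these assumptions, a padding mode without this restriction (for example constant padding) or a smaller target size along the short axis would avoid the error.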
