Description
03:04:38 Running ./auto3dseg/notebooks/ensemble_byoc.ipynb
03:04:38 Checking PEP8 compliance...
03:04:39 Running notebook...
03:04:39 Before:
03:04:39 "max_epochs = 2\n",
03:04:39 After:
03:04:39 "max_epochs = 1\n",
03:04:43 MONAI version: 1.0.0+5.g84e271ec
03:04:43 Numpy version: 1.22.4
03:04:43 Pytorch version: 1.10.2+cu102
03:04:43 MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:04:43 MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:04:43 MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:04:43
03:04:43 Optional dependencies:
03:04:43 Pytorch Ignite version: 0.4.8
03:04:43 Nibabel version: 4.0.2
03:04:43 scikit-image version: 0.19.3
03:04:43 Pillow version: 7.0.0
03:04:43 Tensorboard version: 2.10.0
03:04:43 gdown version: 4.5.1
03:04:43 TorchVision version: 0.11.3+cu102
03:04:43 tqdm version: 4.64.0
03:04:43 lmdb version: 1.3.0
03:04:43 psutil version: 5.9.1
03:04:43 pandas version: 1.1.5
03:04:43 einops version: 0.4.1
03:04:43 transformers version: 4.21.3
03:04:43 mlflow version: 1.29.0
03:04:43 pynrrd version: 0.4.3
03:04:43
03:04:43 For details about installing the optional dependencies, please visit:
03:04:43 https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:04:43
03:04:44 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:04:44 warnings.warn(
03:05:17
Executing: 0%| | 0/12 [00:00<?, ?cell/s]
Executing: 8%|▊ | 1/12 [00:01<00:16, 1.54s/cell]
Executing: 17%|█▋ | 2/12 [00:06<00:34, 3.49s/cell]
Executing: 33%|███▎ | 4/12 [00:10<00:19, 2.45s/cell]
Executing: 50%|█████ | 6/12 [00:10<00:08, 1.38s/cell]
Executing: 83%|████████▎ | 10/12 [00:31<00:07, 3.58s/cell]
Executing: 83%|████████▎ | 10/12 [00:32<00:06, 3.26s/cell]
03:05:17 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:05:17 warnings.warn(
03:05:17 Traceback (most recent call last):
03:05:17 File "/opt/conda/bin/papermill", line 8, in <module>
03:05:17 sys.exit(papermill())
03:05:17 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:05:17 return self.main(*args, **kwargs)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:05:17 rv = self.invoke(ctx)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:05:17 return ctx.invoke(self.callback, **ctx.params)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:05:17 return __callback(*args, **kwargs)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:05:17 return f(get_current_context(), *args, **kwargs)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:05:17 execute_notebook(
03:05:17 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:05:17 raise_for_execution_errors(nb, output_path)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:05:17 raise error
03:05:17 papermill.exceptions.PapermillExecutionError:
03:05:17 ---------------------------------------------------------------------------
03:05:17 Exception encountered at "In [5]":
03:05:17 ---------------------------------------------------------------------------
03:05:17 CalledProcessError Traceback (most recent call last)
03:05:17 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:05:17 182 ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:05:17 --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:05:17 184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:05:17
03:05:17 File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:05:17 515 if check and retcode:
03:05:17 --> 516 raise CalledProcessError(retcode, process.args,
03:05:17 517 output=stdout, stderr=stderr)
03:05:17 518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:05:17
03:05:17 CalledProcessError: Command '['python', './workdir/segresnet2d_0/scripts/train.py', 'run', "--config_file='./workdir/segresnet2d_0/configs/transforms_train.yaml','./workdir/segresnet2d_0/configs/transforms_infer.yaml','./workdir/segresnet2d_0/configs/hyper_parameters.yaml','./workdir/segresnet2d_0/configs/network.yaml','./workdir/segresnet2d_0/configs/transforms_validate.yaml'", '--num_iterations=4', '--num_iterations_per_validation=2', '--num_images_per_batch=2', '--num_epochs=1', '--num_warmup_iterations=2']' returned non-zero exit status 1.
03:05:17
03:05:17 The above exception was the direct cause of the following exception:
03:05:17
03:05:17 RuntimeError Traceback (most recent call last)
03:05:17 Input In [5], in <cell line: 30>()
03:05:17 30 for h in history:
03:05:17 31 for _, algo in h.items():
03:05:17 ---> 32 algo.train(train_param)
03:05:17
03:05:17 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:05:17 192 """
03:05:17 193 Load the run function in the training script of each model. Training parameter is predefined by the
03:05:17 194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:05:17 (...)
03:05:17 197 train_params: to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:05:17 198 """
03:05:17 199 cmd, devices_info = self._create_cmd(train_params)
03:05:17 --> 200 return self._run_cmd(cmd, devices_info)
03:05:17
03:05:17 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:05:17 186 output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:05:17 187 errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:05:17 --> 188 raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:05:17 189 return normal_out
03:05:17
03:05:17 RuntimeError: subprocess call error 1: b'2022-09-23 02:05:11.053331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
03:05:17 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:05:17 2022-09-23 02:05:11.201530: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:05:17 2022-09-23 02:05:11.234588: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:05:17 2022-09-23 02:05:11.855644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:05:17 2022-09-23 02:05:11.855721: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:05:17 2022-09-23 02:05:11.855728: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:05:17 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:05:17 Traceback (most recent call last):
03:05:17 File "./workdir/segresnet2d_0/scripts/train.py", line 439, in <module>
03:05:17 fire.Fire()
03:05:17 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:05:17 component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:05:17 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:05:17 component, remaining_args = _CallAndUpdateTrace(
03:05:17 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:05:17 component = fn(*varargs, **kwargs)
03:05:17 File "./workdir/segresnet2d_0/scripts/train.py", line 277, in run
03:05:17 lr_scheduler.step()
03:05:17 File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 152, in step
03:05:17 values = self.get_lr()
03:05:17 File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 372, in get_lr
03:05:17 if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
03:05:17 ZeroDivisionError: integer division or modulo by zero
03:05:17 ', b'[info] number of GPUs: 1
03:05:17 [info] world_size: 1
03:05:17 train_files: 8
03:05:17 val_files: 4
03:05:17 num_epochs 2
03:05:17 num_epochs_per_validation 1
03:05:17 [info] training from scratch
03:05:17 [info] amp enabled
03:05:17 ----------
03:05:17 epoch 1/2
03:05:17 learning rate is set to 0.2
03:05:17 [2022-09-23 02:05:13] 1/4, train_loss: 0.5047
03:05:17 '
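
Note on the failure above: the traceback ends in `lr_scheduler.py`, line 372, where `self.last_epoch % self.step_size` raises `ZeroDivisionError`, so the scheduler was evidently constructed with `step_size == 0`. The log does not show how `train.py` computes that value. The sketch below is a minimal pure-Python illustration, assuming the step size comes from an integer division of a small iteration count; `step_lr_keeps_lr` and `guarded_step_size` are hypothetical helpers, not MONAI or PyTorch functions.

```python
def step_lr_keeps_lr(last_epoch: int, step_size: int) -> bool:
    """Mirror the condition quoted in the traceback (lr_scheduler.py, line 372)."""
    # With step_size == 0 the modulo raises ZeroDivisionError as soon as
    # last_epoch is non-zero, i.e. on the first scheduler step.
    return (last_epoch == 0) or (last_epoch % step_size != 0)


def guarded_step_size(num_iterations: int, divisor: int) -> int:
    """Hypothetical guard: never let an integer division produce a step size of 0."""
    # ASSUMPTION: the real script derives step_size by integer division;
    # the log does not confirm this.
    return max(1, num_iterations // divisor)


if __name__ == "__main__":
    try:
        step_lr_keeps_lr(last_epoch=1, step_size=0)
    except ZeroDivisionError as err:
        print("reproduced:", err)

    # 4 // 10 would be 0; the guard clamps it to 1.
    print("guarded step size:", guarded_step_size(num_iterations=4, divisor=10))
```

Clamping to 1 is only one possible choice; whether it suits the shortened CI run (`--num_iterations=4`) depends on how the template is meant to schedule the learning rate.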
03:05:17 Running ./auto3dseg/notebooks/hpo_optuna.ipynb
03:05:17 Checking PEP8 compliance...
03:05:17 Running notebook...
03:05:17 Before:
03:05:17 "max_epochs = 2\n",
03:05:17 After:
03:05:17 "max_epochs = 1\n",
03:05:22 MONAI version: 1.0.0+5.g84e271ec
03:05:22 Numpy version: 1.22.4
03:05:22 Pytorch version: 1.10.2+cu102
03:05:22 MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:05:22 MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:05:22 MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:05:22
03:05:22 Optional dependencies:
03:05:22 Pytorch Ignite version: 0.4.8
03:05:22 Nibabel version: 4.0.2
03:05:22 scikit-image version: 0.19.3
03:05:22 Pillow version: 7.0.0
03:05:22 Tensorboard version: 2.10.0
03:05:22 gdown version: 4.5.1
03:05:22 TorchVision version: 0.11.3+cu102
03:05:22 tqdm version: 4.64.0
03:05:22 lmdb version: 1.3.0
03:05:22 psutil version: 5.9.1
03:05:22 pandas version: 1.1.5
03:05:22 einops version: 0.4.1
03:05:22 transformers version: 4.21.3
03:05:22 mlflow version: 1.29.0
03:05:22 pynrrd version: 0.4.3
03:05:22
03:05:22 For details about installing the optional dependencies, please visit:
03:05:22 https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:05:22
03:05:23 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:05:23 warnings.warn(
03:06:04
Executing: 0%| | 0/18 [00:00<?, ?cell/s]
Executing: 6%|▌ | 1/18 [00:01<00:27, 1.60s/cell]
Executing: 17%|█▋ | 3/18 [00:05<00:26, 1.79s/cell]
Executing: 39%|███▉ | 7/18 [00:17<00:29, 2.70s/cell]
Executing: 61%|██████ | 11/18 [00:22<00:13, 1.96s/cell]
Executing: 89%|████████▉ | 16/18 [00:22<00:02, 1.06s/cell]
Executing: 100%|██████████| 18/18 [00:38<00:00, 2.48s/cell]
Executing: 100%|██████████| 18/18 [00:39<00:00, 2.19s/cell]
03:06:04 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:06:04 warnings.warn(
03:06:04 Traceback (most recent call last):
03:06:04 File "/opt/conda/bin/papermill", line 8, in <module>
03:06:04 sys.exit(papermill())
03:06:04 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:06:04 return self.main(*args, **kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:06:04 rv = self.invoke(ctx)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:06:04 return ctx.invoke(self.callback, **ctx.params)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:06:04 return __callback(*args, **kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:06:04 return f(get_current_context(), *args, **kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:06:04 execute_notebook(
03:06:04 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:06:04 raise_for_execution_errors(nb, output_path)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:06:04 raise error
03:06:04 papermill.exceptions.PapermillExecutionError:
03:06:04 ---------------------------------------------------------------------------
03:06:04 Exception encountered at "In [9]":
03:06:04 ---------------------------------------------------------------------------
03:06:04 CalledProcessError Traceback (most recent call last)
03:06:04 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:06:04 182 ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:06:04 --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:06:04 184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:06:04
03:06:04 File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:06:04 515 if check and retcode:
03:06:04 --> 516 raise CalledProcessError(retcode, process.args,
03:06:04 517 output=stdout, stderr=stderr)
03:06:04 518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:06:04
03:06:04 CalledProcessError: Command '['python', './optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py', 'run', "--config_file='./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_train.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_infer.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/hyper_parameters.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/network.yaml','./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/configs/transforms_validate.yaml'", '--learning_rate=0.0001']' returned non-zero exit status 1.
03:06:04
03:06:04 The above exception was the direct cause of the following exception:
03:06:04
03:06:04 RuntimeError Traceback (most recent call last)
03:06:04 Input In [9], in <cell line: 3>()
03:06:04 1 search_space = {'learning_rate': [0.0001, 0.001, 0.01, 0.1]}
03:06:04 2 study = optuna.create_study(sampler=optuna.samplers.GridSampler(search_space), direction='maximize')
03:06:04 ----> 3 study.optimize(partial(optuna_gen, obj_filename=optuna_gen.get_obj_filename(), output_folder=optuna_dir), n_trials=2)
03:06:04 4 print("Best value: {} (params: {})\n".format(study.best_value, study.best_params))
03:06:04
03:06:04 File /opt/conda/lib/python3.8/site-packages/optuna/study/study.py:419, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
03:06:04 315 def optimize(
03:06:04 316 self,
03:06:04 317 func: ObjectiveFuncType,
03:06:04 (...)
03:06:04 324 show_progress_bar: bool = False,
03:06:04 325 ) -> None:
03:06:04 326 """Optimize an objective function.
03:06:04 327
03:06:04 328 Optimization is done by choosing a suitable set of hyperparameter values from a given
03:06:04 (...)
03:06:04 416 If nested invocation of this method occurs.
03:06:04 417 """
03:06:04 --> 419 _optimize(
03:06:04 420 study=self,
03:06:04 421 func=func,
03:06:04 422 n_trials=n_trials,
03:06:04 423 timeout=timeout,
03:06:04 424 n_jobs=n_jobs,
03:06:04 425 catch=catch,
03:06:04 426 callbacks=callbacks,
03:06:04 427 gc_after_trial=gc_after_trial,
03:06:04 428 show_progress_bar=show_progress_bar,
03:06:04 429 )
03:06:04
03:06:04 File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:66, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
03:06:04 64 try:
03:06:04 65 if n_jobs == 1:
03:06:04 ---> 66 _optimize_sequential(
03:06:04 67 study,
03:06:04 68 func,
03:06:04 69 n_trials,
03:06:04 70 timeout,
03:06:04 71 catch,
03:06:04 72 callbacks,
03:06:04 73 gc_after_trial,
03:06:04 74 reseed_sampler_rng=False,
03:06:04 75 time_start=None,
03:06:04 76 progress_bar=progress_bar,
03:06:04 77 )
03:06:04 78 else:
03:06:04 79 if n_jobs == -1:
03:06:04
03:06:04 File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:160, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
03:06:04 157 break
03:06:04 159 try:
03:06:04 --> 160 frozen_trial = _run_trial(study, func, catch)
03:06:04 161 finally:
03:06:04 162 # The following line mitigates memory problems that can be occurred in some
03:06:04 163 # environments (e.g., services that use computing containers such as CircleCI).
03:06:04 164 # Please refer to the following PR for further details:
03:06:04 165 # https://github.com/optuna/optuna/pull/325.
03:06:04 166 if gc_after_trial:
03:06:04
03:06:04 File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:234, in _run_trial(study, func, catch)
03:06:04 227 assert False, "Should not reach."
03:06:04 229 if (
03:06:04 230 frozen_trial.state == TrialState.FAIL
03:06:04 231 and func_err is not None
03:06:04 232 and not isinstance(func_err, catch)
03:06:04 233 ):
03:06:04 --> 234 raise func_err
03:06:04 235 return frozen_trial
03:06:04
03:06:04 File /opt/conda/lib/python3.8/site-packages/optuna/study/_optimize.py:196, in _run_trial(study, func, catch)
03:06:04 194 with get_heartbeat_thread(trial._trial_id, study._storage):
03:06:04 195 try:
03:06:04 --> 196 value_or_values = func(trial)
03:06:04 197 except exceptions.TrialPruned as e:
03:06:04 198 # TODO(mamu): Handle multi-objective cases.
03:06:04 199 state = TrialState.PRUNED
03:06:04
03:06:04 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/hpo_gen.py:333, in OptunaGen.__call__(self, trial, obj_filename, output_folder, template_path)
03:06:04 323 """
03:06:04 324 Callabe that Optuna will use to optimize the hyper-parameters
03:06:04 325
03:06:04 (...)
03:06:04 330 ``{algorithm_templates_dir}/{network}/scripts/algo.py``
03:06:04 331 """
03:06:04 332 self.set_trial(trial)
03:06:04 --> 333 self.run_algo(obj_filename, output_folder, template_path)
03:06:04 334 return self.acc
03:06:04
03:06:04 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/hpo_gen.py:395, in OptunaGen.run_algo(self, obj_filename, output_folder, template_path)
03:06:04 393 # step 3 generate the folder to save checkpoints and train
03:06:04 394 self.generate(output_folder)
03:06:04 --> 395 self.algo.train(self.params)
03:06:04 396 # step 4 report validation acc to controller
03:06:04 397 acc = self.algo.get_score()
03:06:04
03:06:04 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:06:04 192 """
03:06:04 193 Load the run function in the training script of each model. Training parameter is predefined by the
03:06:04 194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:06:04 (...)
03:06:04 197 train_params: to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:06:04 198 """
03:06:04 199 cmd, devices_info = self._create_cmd(train_params)
03:06:04 --> 200 return self._run_cmd(cmd, devices_info)
03:06:04
03:06:04 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:06:04 186 output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:06:04 187 errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:06:04 --> 188 raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:06:04 189 return normal_out
03:06:04
03:06:04 RuntimeError: subprocess call error 1: b'2022-09-23 02:05:54.715096: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
03:06:04 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:06:04 2022-09-23 02:05:54.859510: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:06:04 2022-09-23 02:05:54.892624: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:06:04 2022-09-23 02:05:55.523776: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:06:04 2022-09-23 02:05:55.523854: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:06:04 2022-09-23 02:05:55.523862: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 202.45334114]
03:06:04 Modifying image pixdim from [0.6 0.5999997 3.999998 1. ] to [ 0.60000002 0.59999975 3.99999799 129.0477496 ]
03:06:04 Modifying image pixdim from [0.625 0.625 3.59999 1. ] to [ 0.625 0.625 3.59998989 208.35237358]
03:06:04 Modifying image pixdim from [0.6 0.6000003 4.0000024 1. ] to [ 0.60000002 0.60000034 4.00000227 149.31681629]
03:06:04 Modifying image pixdim from [0.6 0.5999997 3.999998 1. ] to [ 0.60000002 0.59999975 3.99999799 117.82332812]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 160.84918933]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 150.20609596]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 162.36758282]
03:06:04 Modifying image pixdim from [0.604167 0.6041667 3.999998 1. ] to [ 0.60416698 0.60416671 3.99999799 174.60244975]
03:06:04 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 154.70488398]
03:06:04 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04 Modifying image pixdim from [0.6249998 0.625 3.5999987 1. ] to [ 0.62499983 0.625 3.59999877 153.34152766]
03:06:04 no available indices of class 1 to crop, set the crop ratio of this class to zero.
03:06:04 Modifying image pixdim from [0.75 0.74999964 2.9999986 1. ] to [ 0.75 0.74999965 2.99999861 157.50724574]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 142.87013615]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 148.64025265]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 150.52861008]
03:06:04 Modifying image pixdim from [0.6249999 0.6249999 3.6 1. ] to [ 0.62499988 0.62499988 3.5999999 152.06272679]
03:06:04 Modifying image pixdim from [0.6 0.60000014 4.0000005 1. ] to [ 0.60000002 0.60000017 4.00000041 174.1516677 ]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 152.3814254]
03:06:04 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04 Modifying image pixdim from [0.625 0.62500006 3.6000001 1. ] to [ 0.625 0.62500005 3.60000016 168.35859116]
03:06:04 no available indices of class 1 to crop, set the crop ratio of this class to zero.
03:06:04 Modifying image pixdim from [0.75 0.7500001 4.0000005 1. ] to [ 0.75 0.75000013 4.00000043 128.6789431 ]
03:06:04 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:06:04 Modifying image pixdim from [0.62499976 0.62499976 3.6 1. ] to [ 0.62499976 0.62499976 3.5999999 164.74616031]
03:06:04 Modifying image pixdim from [0.625 0.625 3.60001 1. ] to [ 0.625 0.625 3.60000992 173.25173679]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 180.81811432]
03:06:04 Modifying image pixdim from [0.625 0.625 3.60001 1. ] to [ 0.625 0.625 3.60000992 151.21210251]
03:06:04 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 149.7974827]
03:06:04 Traceback (most recent call last):
03:06:04 File "./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py", line 439, in <module>
03:06:04 fire.Fire()
03:06:04 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:06:04 component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:06:04 component, remaining_args = _CallAndUpdateTrace(
03:06:04 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:06:04 component = fn(*varargs, **kwargs)
03:06:04 File "./optuna_learningrate_grid/segresnet2d_0_override_learning_rate_0.0001/scripts/train.py", line 251, in run
03:06:04 outputs = model(inputs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04 return forward_call(*input, **kwargs)
03:06:04 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/networks/nets/segresnet.py", line 178, in forward
03:06:04 x, down_x = self.encode(x)
03:06:04 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/networks/nets/segresnet.py", line 155, in encode
03:06:04 x = self.convInit(x)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04 return forward_call(*input, **kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
03:06:04 input = module(input)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
03:06:04 return forward_call(*input, **kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
03:06:04 return self._conv_forward(input, self.weight, self.bias)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
03:06:04 return F.conv2d(input, weight, bias, self.stride,
03:06:04 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/meta_tensor.py", line 249, in __torch_function__
03:06:04 ret = super().__torch_function__(func, types, args, kwargs)
03:06:04 File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 1051, in __torch_function__
03:06:04 ret = func(*args, **kwargs)
03:06:04 RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[2, 6, 64, 64] to have 3 channels, but got 6 channels instead
03:06:04 ', b'[info] number of GPUs: 1
03:06:04 [info] world_size: 1
03:06:04 train_files: 24
03:06:04 val_files: 6
03:06:04 num_epochs 2
03:06:04 num_epochs_per_validation 1
03:06:04 [info] training from scratch
03:06:04 [info] amp enabled
03:06:04 ----------
03:06:04 epoch 1/2
03:06:04 learning rate is set to 0.0001
03:06:04 '
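
Note on the failure above: the final `RuntimeError` reports a convolution weight of size `[32, 3, 3, 3]` applied to an input of shape `[2, 6, 64, 64]`, meaning the first 2D convolution was built for 3 input channels but received 6. The log does not show why the batch has 6 channels. The sketch below only restates the shape rule behind the message in pure Python; `check_conv2d_channels` is a hypothetical helper, not part of MONAI or PyTorch.

```python
from typing import Sequence


def check_conv2d_channels(
    input_shape: Sequence[int],
    weight_shape: Sequence[int],
    groups: int = 1,
) -> None:
    """Raise ValueError if an NCHW input does not match a conv2d weight.

    A conv2d weight has shape (out_channels, in_channels // groups, kH, kW),
    so the input must carry weight_shape[1] * groups channels.
    """
    actual = input_shape[1]
    expected = weight_shape[1] * groups
    if actual != expected:
        raise ValueError(
            f"expected {expected} input channels, got {actual} "
            f"(input {list(input_shape)}, weight {list(weight_shape)})"
        )


if __name__ == "__main__":
    # Shapes copied from the RuntimeError in the log.
    try:
        check_conv2d_channels((2, 6, 64, 64), (32, 3, 3, 3))
    except ValueError as err:
        print("mismatch:", err)
```

A check like this, run on one batch before the training loop, would surface the mismatch without launching the subprocess.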
03:06:20 Running ./auto3dseg/notebooks/auto3dseg_hello_world.ipynb
03:06:20 Checking PEP8 compliance...
03:06:21 Running notebook...
03:06:21 Before:
03:06:21 "max_epochs = 2\n",
03:06:21 After:
03:06:21 "max_epochs = 1\n",
03:06:25 MONAI version: 1.0.0+5.g84e271ec
03:06:25 Numpy version: 1.22.4
03:06:25 Pytorch version: 1.10.2+cu102
03:06:25 MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:06:25 MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:06:25 MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:06:25
03:06:25 Optional dependencies:
03:06:25 Pytorch Ignite version: 0.4.8
03:06:25 Nibabel version: 4.0.2
03:06:25 scikit-image version: 0.19.3
03:06:25 Pillow version: 7.0.0
03:06:25 Tensorboard version: 2.10.0
03:06:25 gdown version: 4.5.1
03:06:25 TorchVision version: 0.11.3+cu102
03:06:25 tqdm version: 4.64.0
03:06:25 lmdb version: 1.3.0
03:06:25 psutil version: 5.9.1
03:06:25 pandas version: 1.1.5
03:06:25 einops version: 0.4.1
03:06:25 transformers version: 4.21.3
03:06:25 mlflow version: 1.29.0
03:06:25 pynrrd version: 0.4.3
03:06:25
03:06:25 For details about installing the optional dependencies, please visit:
03:06:25 https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:06:25
03:06:26 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:06:26 warnings.warn(
03:07:24
Executing: 0%| | 0/20 [00:00<?, ?cell/s]
Executing: 5%|▌ | 1/20 [00:01<00:30, 1.62s/cell]
Executing: 15%|█▌ | 3/20 [00:06<00:37, 2.21s/cell]
Executing: 25%|██▌ | 5/20 [00:10<00:29, 2.00s/cell]
Executing: 45%|████▌ | 9/20 [00:10<00:09, 1.15cell/s]
Executing: 55%|█████▌ | 11/20 [00:10<00:06, 1.49cell/s]
Executing: 85%|████████▌ | 17/20 [00:51<00:12, 4.10s/cell]
Executing: 85%|████████▌ | 17/20 [00:52<00:09, 3.12s/cell]
03:07:24 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:07:24 warnings.warn(
03:07:24 Traceback (most recent call last):
03:07:24 File "/opt/conda/bin/papermill", line 8, in <module>
03:07:24 sys.exit(papermill())
03:07:24 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:07:24 return self.main(*args, **kwargs)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:07:24 rv = self.invoke(ctx)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:07:24 return ctx.invoke(self.callback, **ctx.params)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:07:24 return __callback(*args, **kwargs)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:07:24 return f(get_current_context(), *args, **kwargs)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:07:24 execute_notebook(
03:07:24 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:07:24 raise_for_execution_errors(nb, output_path)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:07:24 raise error
03:07:24 papermill.exceptions.PapermillExecutionError:
03:07:24 ---------------------------------------------------------------------------
03:07:24 Exception encountered at "In [8]":
03:07:24 ---------------------------------------------------------------------------
03:07:24 CalledProcessError Traceback (most recent call last)
03:07:24 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:07:24 182 ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:07:24 --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:07:24 184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:07:24
03:07:24 File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:07:24 515 if check and retcode:
03:07:24 --> 516 raise CalledProcessError(retcode, process.args,
03:07:24 517 output=stdout, stderr=stderr)
03:07:24 518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:07:24
03:07:24 CalledProcessError: Command '['python', '/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py', 'run', "--config_file='/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_train.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_infer.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/network_search.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/hyper_parameters.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/network.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/transforms_validate.yaml','/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/configs/hyper_parameters_search.yaml'", '--searching#num_iterations=4', '--searching#num_iterations_per_validation=2', '--searching#num_images_per_batch=2', '--searching#num_epochs=1', '--searching#num_warmup_iterations=2']' returned non-zero exit status 1.
03:07:24
03:07:24 The above exception was the direct cause of the following exception:
03:07:24
03:07:24 RuntimeError Traceback (most recent call last)
03:07:24 Input In [8], in <cell line: 1>()
03:07:24 ----> 1 runner.run()
03:07:24
03:07:24 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/auto_runner.py:586, in AutoRunner.run(self)
03:07:24 584 history = import_bundle_algo_history(self.work_dir, only_trained=False)
03:07:24 585 if not self.hpo:
03:07:24 --> 586 self._train_algo_in_sequence(history)
03:07:24 587 else:
03:07:24 588 self._train_algo_in_nni(history)
03:07:24
03:07:24 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/auto_runner.py:488, in AutoRunner._train_algo_in_sequence(self, history)
03:07:24 486 for task in history:
03:07:24 487 for _, algo in task.items():
03:07:24 --> 488 algo.train(self.train_params)
03:07:24 489 acc = algo.get_score()
03:07:24 490 algo_to_pickle(algo, template_path=algo.template_path, best_metrics=acc)
03:07:24
03:07:24 File /home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/algorithm_templates/dints/scripts/algo.py:187, in DintsAlgo.train(self, train_params)
03:07:24 185 cmd, devices_info = self._create_cmd(dints_search_params)
03:07:24 186 cmd_search = cmd.replace('train.py', 'search.py')
03:07:24 --> 187 self._run_cmd(cmd_search, devices_info)
03:07:24 189 dints_train_params = {}
03:07:24 190 for k, v in params.items():
03:07:24
03:07:24 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:07:24 186 output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:07:24 187 errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:07:24 --> 188 raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:07:24 189 return normal_out
03:07:24
03:07:24 RuntimeError: subprocess call error 1: b'2022-09-23 02:06:52.336551: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
03:07:24 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
03:07:24 2022-09-23 02:06:52.477179: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
03:07:24 2022-09-23 02:06:52.509211: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
03:07:24 2022-09-23 02:06:53.137491: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer.so.7\'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:07:24 2022-09-23 02:06:53.137592: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library \'libnvinfer_plugin.so.7\'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/opt/conda/lib/python3.8/site-packages/torch/lib:/opt/conda/lib/python3.8/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
03:07:24 2022-09-23 02:06:53.137599: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
03:07:24 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:07:24 The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
03:07:24 Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
03:07:24 Traceback (most recent call last):
03:07:24 File "/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py", line 550, in <module>
03:07:24 fire.Fire()
03:07:24 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:07:24 component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:07:24 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:07:24 component, remaining_args = _CallAndUpdateTrace(
03:07:24 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:07:24 component = fn(*varargs, **kwargs)
03:07:24 File "/home/jenkins/agent/workspace/Monai-notebooks/tuts/tutorials/auto3dseg/notebooks/auto3dseg_work_dir/dints_0/scripts/search.py", line 381, in run
03:07:24 lr_scheduler.step()
03:07:24 File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 152, in step
03:07:24 values = self.get_lr()
03:07:24 File "/opt/conda/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 372, in get_lr
03:07:24 if (self.last_epoch == 0) or (self.last_epoch % self.step_size != 0):
03:07:24 ZeroDivisionError: integer division or modulo by zero
03:07:24 ', b"[info] number of GPUs: 1
03:07:24 [info] world_size: 1
03:07:24 train_files_w: 4
03:07:24 train_files_a: 4
03:07:24 val_files: 4
03:07:24 num_epochs 2
03:07:24 num_epochs_warmup 1
03:07:24 num_epochs_per_validation 1
03:07:24 [info] amp enabled
03:07:24 ----------
03:07:24 epoch 1/2
03:07:24 learning rate is set to 0.025
03:07:24 [2022-09-23 02:06:56] 1/2, train_loss: 0.7307
03:07:24 [2022-09-23 02:06:57] 2/2, train_loss: 0.7303
03:07:24 epoch 1 average loss: 0.7305, best mean dice: -1.0000 at epoch -1
03:07:24 1 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24 2 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24 3 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24 4 / 4 tensor([[0.1698]], device='cuda:0')
03:07:24 evaluation metric - class 1: 0.16984054446220398
03:07:24 avg_metric 0.16984054446220398
03:07:24 saved new best metric model
03:07:24 current epoch: 1 current mean dice: 0.1698 best mean dice: 0.1698 at epoch 1
03:07:24 ----------
03:07:24 epoch 2/2
03:07:24 learning rate is set to 0.025
03:07:24 [2022-09-23 02:07:13] 1/2, train_loss: 0.7415
03:07:24 [2022-09-23 02:07:15] 1/2, train_loss_arch: 0.7368
03:07:24 "
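
Note on the failure above: the `ZeroDivisionError` is raised by the `self.last_epoch % self.step_size` check in `StepLR.get_lr`, so the scheduler must have been built with `step_size == 0`. The log does not show how `search.py` derives that value. The sketch below is plain Python (no torch import) that mirrors the condition from the traceback and shows one possible guard. The helper `safe_step_size` and its `divisor` argument are hypothetical names for illustration, not code from the generated bundle.

```python
def step_lr_should_hold(last_epoch: int, step_size: int) -> bool:
    """Mirror of the condition printed in the traceback (lr_scheduler.py, get_lr)."""
    return (last_epoch == 0) or (last_epoch % step_size != 0)


def safe_step_size(num_epochs: int, divisor: int) -> int:
    """Hypothetical guard: keep an integer-divided step size from reaching zero."""
    return max(1, num_epochs // divisor)


# With last_epoch == 0 the `or` short-circuits, so a zero step size goes unnoticed.
print(step_lr_should_hold(0, 0))

# On any later step the modulo is evaluated and fails, as in the log.
try:
    step_lr_should_hold(1, 0)
except ZeroDivisionError as err:
    print(f"reproduced: {err}")

# A very small epoch count no longer yields a zero step size.
print(safe_step_size(num_epochs=2, divisor=5))
```

Whether clamping to 1 is the right fix depends on how the bundle intends the learning rate to decay in very short runs; the sketch only shows why a zero value fails.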
03:08:23 Running ./auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb
03:08:23 Checking PEP8 compliance...
03:08:24 stdin:179:9: B007 Loop control variable 'name' not used within the loop body. If this is intended, start the name with an underscore.
03:08:24 for name, algo in task.items():
03:08:24 ^
03:08:24 Error: Try running with autofixes: --autofix.
03:08:24
03:08:24 Check failed!
03:08:24 Running notebook...
03:08:24 Before:
03:08:24 "max_epochs = 2000\n",
03:08:24 After:
03:08:24 "max_epochs = 1\n",
03:08:28 MONAI version: 1.0.0+5.g84e271ec
03:08:28 Numpy version: 1.22.4
03:08:28 Pytorch version: 1.10.2+cu102
03:08:28 MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
03:08:28 MONAI rev id: 84e271ec939330e7cedf22b3871c4a2a62d3c2a2
03:08:28 MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
03:08:28
03:08:28 Optional dependencies:
03:08:28 Pytorch Ignite version: 0.4.8
03:08:28 Nibabel version: 4.0.2
03:08:28 scikit-image version: 0.19.3
03:08:28 Pillow version: 7.0.0
03:08:28 Tensorboard version: 2.10.0
03:08:28 gdown version: 4.5.1
03:08:28 TorchVision version: 0.11.3+cu102
03:08:28 tqdm version: 4.64.0
03:08:28 lmdb version: 1.3.0
03:08:28 psutil version: 5.9.1
03:08:28 pandas version: 1.1.5
03:08:28 einops version: 0.4.1
03:08:28 transformers version: 4.21.3
03:08:28 mlflow version: 1.29.0
03:08:28 pynrrd version: 0.4.3
03:08:28
03:08:28 For details about installing the optional dependencies, please visit:
03:08:28 https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
03:08:28
03:08:29 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:08:29 warnings.warn(
03:09:16
Executing: 0%| | 0/24 [00:00<?, ?cell/s]
Executing: 4%|▍ | 1/24 [00:01<00:36, 1.58s/cell]
Executing: 12%|█▎ | 3/24 [00:06<00:46, 2.23s/cell]
Executing: 21%|██ | 5/24 [00:10<00:38, 2.01s/cell]
Executing: 46%|████▌ | 11/24 [00:22<00:27, 2.10s/cell]
Executing: 58%|█████▊ | 14/24 [00:38<00:31, 3.12s/cell]
Executing: 88%|████████▊ | 21/24 [00:45<00:06, 2.00s/cell]
Executing: 88%|████████▊ | 21/24 [00:46<00:06, 2.23s/cell]
03:09:16 /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
03:09:16 warnings.warn(
03:09:16 Traceback (most recent call last):
03:09:16 File "/opt/conda/bin/papermill", line 8, in <module>
03:09:16 sys.exit(papermill())
03:09:16 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
03:09:16 return self.main(*args, **kwargs)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
03:09:16 rv = self.invoke(ctx)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
03:09:16 return ctx.invoke(self.callback, **ctx.params)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
03:09:16 return __callback(*args, **kwargs)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
03:09:16 return f(get_current_context(), *args, **kwargs)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
03:09:16 execute_notebook(
03:09:16 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
03:09:16 raise_for_execution_errors(nb, output_path)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
03:09:16 raise error
03:09:16 papermill.exceptions.PapermillExecutionError:
03:09:16 ---------------------------------------------------------------------------
03:09:16 Exception encountered at "In [9]":
03:09:16 ---------------------------------------------------------------------------
03:09:16 CalledProcessError Traceback (most recent call last)
03:09:16 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:183, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:09:16 182 ps_environ["CUDA_VISIBLE_DEVICES"] = devices_info
03:09:16 --> 183 normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
03:09:16 184 logger.info(repr(normal_out).replace("\\n", "\n").replace("\\t", "\t"))
03:09:16
03:09:16 File /opt/conda/lib/python3.8/subprocess.py:516, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
03:09:16 515 if check and retcode:
03:09:16 --> 516 raise CalledProcessError(retcode, process.args,
03:09:16 517 output=stdout, stderr=stderr)
03:09:16 518 return CompletedProcess(process.args, retcode, stdout, stderr)
03:09:16
03:09:16 CalledProcessError: Command '['python', './auto3dseg_work_dir/swinunetr_4/scripts/train.py', 'run', "--config_file='./auto3dseg_work_dir/swinunetr_4/configs/transforms_train.yaml','./auto3dseg_work_dir/swinunetr_4/configs/transforms_infer.yaml','./auto3dseg_work_dir/swinunetr_4/configs/hyper_parameters.yaml','./auto3dseg_work_dir/swinunetr_4/configs/network.yaml','./auto3dseg_work_dir/swinunetr_4/configs/transforms_validate.yaml'", '--num_iterations=12', '--num_iterations_per_validation=6', '--num_images_per_batch=2', '--num_epochs=1', '--num_warmup_iterations=6']' died with <Signals.SIGABRT: 6>.
03:09:16
03:09:16 The above exception was the direct cause of the following exception:
03:09:16
03:09:16 RuntimeError Traceback (most recent call last)
03:09:16 Input In [9], in <cell line: 2>()
03:09:16 2 for task in history:
03:09:16 3 for name, algo in task.items():
03:09:16 ----> 4 algo.train(train_param) # can use default params by `algo.train()`
03:09:16 5 acc = algo.get_score()
03:09:16 6 algo_to_pickle(algo, template_path=algo.template_path, best_metrics=acc)
03:09:16
03:09:16 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:200, in BundleAlgo.train(self, train_params)
03:09:16 192 """
03:09:16 193 Load the run function in the training script of each model. Training parameter is predefined by the
03:09:16 194 algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
03:09:16 (...)
03:09:16 197 train_params: to specify the devices using a list of integers: ``{"CUDA_VISIBLE_DEVICES": [1,2,3]}``.
03:09:16 198 """
03:09:16 199 cmd, devices_info = self._create_cmd(train_params)
03:09:16 --> 200 return self._run_cmd(cmd, devices_info)
03:09:16
03:09:16 File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/apps/auto3dseg/bundle_gen.py:188, in BundleAlgo._run_cmd(self, cmd, devices_info)
03:09:16 186 output = repr(e.stdout).replace("\\n", "\n").replace("\\t", "\t")
03:09:16 187 errors = repr(e.stderr).replace("\\n", "\n").replace("\\t", "\t")
03:09:16 --> 188 raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
03:09:16 189 return normal_out
03:09:16
03:09:16 RuntimeError: subprocess call error -6: b'Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 154.70488398]
03:09:16 Modifying image pixdim from [0.625 0.625 3.59999 1. ] to [ 0.625 0.625 3.59998989 208.35237358]
03:09:16 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 170.38896694]
03:09:16 Modifying image pixdim from [0.625 0.625 3.60001 1. ] to [ 0.625 0.625 3.60000992 173.25173679]
03:09:16 Modifying image pixdim from [0.6 0.5999997 3.999998 1. ] to [ 0.60000002 0.59999975 3.99999799 129.0477496 ]
03:09:16 Modifying image pixdim from [0.625 0.625 3.6 1. ] to [ 0.625 0.625 3.5999999 202.45334114]
03:09:16 Traceback (most recent call last):
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 184, in __call__
03:09:16 out = _pad(img_t, pad_width=to_pad_, mode=mode_, **kwargs_)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 138, in _pt_pad
03:09:16 return pad_pt(img.unsqueeze(0), pt_pad_width, mode=mode, **kwargs).squeeze(0)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 4170, in _pad
03:09:16 return handle_torch_function(_pad, (input,), input, pad, mode=mode, value=value)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/torch/overrides.py", line 1355, in handle_torch_function
03:09:16 result = torch_func_method(public_api, types, args, kwargs)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/meta_tensor.py", line 249, in __torch_function__
03:09:16 ret = super().__torch_function__(func, types, args, kwargs)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 1051, in __torch_function__
03:09:16 ret = func(*args, **kwargs)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 4199, in _pad
03:09:16 return torch._C._nn.reflection_pad3d(input, pad)
03:09:16 RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (22, 22) at dimension 4 of input [1, 2, 320, 320, 20]
03:09:16
03:09:16 The above exception was the direct cause of the following exception:
03:09:16
03:09:16 Traceback (most recent call last):
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 91, in apply_transform
03:09:16 return _apply_transform(transform, data, unpack_items)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 55, in _apply_transform
03:09:16 return transform(parameters)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/dictionary.py", line 147, in __call__
03:09:16 d[key] = self.padder(d[key], mode=m)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/croppad/array.py", line 189, in __call__
03:09:16 raise ValueError(f"{mode_}, {kwargs_}, {img_t.dtype}, {img_t.device}") from err
03:09:16 ValueError: reflect, {}, torch.float32, cpu
03:09:16
03:09:16 The above exception was the direct cause of the following exception:
03:09:16
03:09:16 Traceback (most recent call last):
03:09:16 File "./auto3dseg_work_dir/swinunetr_4/scripts/train.py", line 409, in <module>
03:09:16 fire.Fire()
03:09:16 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
03:09:16 component_trace = _Fire(component, args, parsed_flag_args, context, name)
03:09:16 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
03:09:16 component, remaining_args = _CallAndUpdateTrace(
03:09:16 File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
03:09:16 component = fn(*varargs, **kwargs)
03:09:16 File "./auto3dseg_work_dir/swinunetr_4/scripts/train.py", line 141, in run
03:09:16 train_ds = monai.data.CacheDataset(
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 794, in __init__
03:09:16 self.set_data(data)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 819, in set_data
03:09:16 self._cache = _compute_cache()
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 808, in _compute_cache
03:09:16 return self._fill_cache()
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 835, in _fill_cache
03:09:16 return list(p.imap(self._load_cache_item, range(self.cache_num)))
03:09:16 File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
03:09:16 raise value
03:09:16 File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
03:09:16 result = (True, func(*args, **kwds))
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/data/dataset.py", line 848, in _load_cache_item
03:09:16 item = apply_transform(_xform, item)
03:09:16 File "/home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/transforms/transform.py", line 118, in apply_transform
03:09:16 raise RuntimeError(f"applying transform {transform}") from e
03:09:16 RuntimeError: applying transform <monai.transforms.croppad.dictionary.SpatialPadd object at 0x7f6ba32abd60>
03:09:16 terminate called without an active exception
03:09:16 ', b'[info] number of GPUs: 1
03:09:16 [info] world_size: 1
03:09:16 train_files: 24
03:09:16 val_files: 6
03:09:16 '
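
Note on the failure above: `reflection_pad3d` rejects the call because the requested padding `(22, 22)` is not smaller than the padded dimension, which has size 20. The sketch below is plain Python (no torch or MONAI import) that computes symmetric pad widths and checks that constraint. The target size `(96, 96, 64)` is an assumption for illustration: only the last value follows from the logged padding if it is split evenly (20 + 22 + 22 = 64), and the first two values are hypothetical.

```python
from typing import List, Sequence, Tuple


def symmetric_pad_widths(shape: Sequence[int], target: Sequence[int]) -> List[Tuple[int, int]]:
    """Split the padding needed to reach `target` across both sides of each dimension."""
    widths = []
    for size, goal in zip(shape, target):
        total = max(goal - size, 0)  # no padding when the dimension is already large enough
        before = total // 2
        widths.append((before, total - before))
    return widths


def reflect_pad_is_valid(shape: Sequence[int], widths: Sequence[Tuple[int, int]]) -> bool:
    """Reflection padding needs every pad width to be smaller than the dimension it pads."""
    return all(max(before, after) < size for size, (before, after) in zip(shape, widths))


shape = (320, 320, 20)   # spatial shape taken from the error message
target = (96, 96, 64)    # assumed target size; see the note above

widths = symmetric_pad_widths(shape, target)
print(widths)
print(reflect_pad_is_valid(shape, widths))
```

Constant padding has no such size limit, so passing a constant mode to `SpatialPadd` for thin volumes may avoid the error; whether that is acceptable for this algorithm template is not something the log answers.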