@@ -400,6 +400,11 @@ expects during training and inference time on sample data.
predictions = model(x) # Returns predictions
print(predictions[0])
+ ::
+
+ {'loss_classifier': tensor(0.0820, grad_fn=<NllLossBackward0>), 'loss_box_reg': tensor(0.0278, grad_fn=<DivBackward0>), 'loss_objectness': tensor(0.0027, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>), 'loss_rpn_box_reg': tensor(0.0036, grad_fn=<DivBackward0>)}
+ {'boxes': tensor([], size=(0, 4), grad_fn=<StackBackward0>), 'labels': tensor([], dtype=torch.int64), 'scores': tensor([], grad_fn=<IndexBackward0>)}
+
Let’s now write the main function which performs the training and the
validation:
@@ -474,6 +479,102 @@ validation:
print("That's it!")
+ ::
+
+ Epoch: [0] [ 0/60] eta: 0:02:43 lr: 0.000090 loss: 2.8181 (2.8181) loss_classifier: 0.5218 (0.5218) loss_box_reg: 0.1272 (0.1272) loss_mask: 2.1324 (2.1324) loss_objectness: 0.0346 (0.0346) loss_rpn_box_reg: 0.0022 (0.0022) time: 2.7332 data: 0.4483 max mem: 1984
+ Epoch: [0] [10/60] eta: 0:00:24 lr: 0.000936 loss: 1.3190 (1.6752) loss_classifier: 0.4611 (0.4213) loss_box_reg: 0.2928 (0.3031) loss_mask: 0.6962 (0.9183) loss_objectness: 0.0238 (0.0253) loss_rpn_box_reg: 0.0074 (0.0072) time: 0.4944 data: 0.0439 max mem: 2762
+ Epoch: [0] [20/60] eta: 0:00:13 lr: 0.001783 loss: 0.9419 (1.2621) loss_classifier: 0.2171 (0.3037) loss_box_reg: 0.2906 (0.3064) loss_mask: 0.4174 (0.6243) loss_objectness: 0.0190 (0.0210) loss_rpn_box_reg: 0.0059 (0.0068) time: 0.2108 data: 0.0042 max mem: 2823
+ Epoch: [0] [30/60] eta: 0:00:08 lr: 0.002629 loss: 0.6349 (1.0344) loss_classifier: 0.1184 (0.2339) loss_box_reg: 0.2706 (0.2873) loss_mask: 0.2276 (0.4897) loss_objectness: 0.0065 (0.0168) loss_rpn_box_reg: 0.0059 (0.0067) time: 0.1650 data: 0.0051 max mem: 2823
+ Epoch: [0] [40/60] eta: 0:00:05 lr: 0.003476 loss: 0.4631 (0.8771) loss_classifier: 0.0650 (0.1884) loss_box_reg: 0.1924 (0.2604) loss_mask: 0.1734 (0.4084) loss_objectness: 0.0029 (0.0135) loss_rpn_box_reg: 0.0051 (0.0063) time: 0.1760 data: 0.0052 max mem: 2823
+ Epoch: [0] [50/60] eta: 0:00:02 lr: 0.004323 loss: 0.3261 (0.7754) loss_classifier: 0.0368 (0.1606) loss_box_reg: 0.1424 (0.2366) loss_mask: 0.1479 (0.3599) loss_objectness: 0.0022 (0.0116) loss_rpn_box_reg: 0.0051 (0.0067) time: 0.1775 data: 0.0052 max mem: 2823
+ Epoch: [0] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.3261 (0.7075) loss_classifier: 0.0415 (0.1433) loss_box_reg: 0.1114 (0.2157) loss_mask: 0.1573 (0.3316) loss_objectness: 0.0020 (0.0103) loss_rpn_box_reg: 0.0052 (0.0066) time: 0.2064 data: 0.0049 max mem: 2823
+ Epoch: [0] Total time: 0:00:14 (0.2412 s / it)
+ creating index...
+ index created!
+ Test: [ 0/50] eta: 0:00:25 model_time: 0.1576 (0.1576) evaluator_time: 0.0029 (0.0029) time: 0.5063 data: 0.3452 max mem: 2823
+ Test: [49/50] eta: 0:00:00 model_time: 0.0335 (0.0701) evaluator_time: 0.0025 (0.0038) time: 0.0594 data: 0.0025 max mem: 2823
+ Test: Total time: 0:00:04 (0.0862 s / it)
+ Averaged stats: model_time: 0.0335 (0.0701) evaluator_time: 0.0025 (0.0038)
+ Accumulating evaluation results...
+ DONE (t=0.01s).
+ Accumulating evaluation results...
+ DONE (t=0.01s).
+ IoU metric: bbox
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.722
+ Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.987
+ Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.938
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
+ Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.752
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.730
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.353
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.762
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.762
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
+ Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.775
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.769
+ IoU metric: segm
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.726
+ Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.993
+ Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.913
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.344
+ Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.593
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.743
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.360
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.760
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.760
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.633
+ Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.662
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
+
+ ...
+
+ Epoch: [4] [ 0/60] eta: 0:00:32 lr: 0.000500 loss: 0.1593 (0.1593) loss_classifier: 0.0194 (0.0194) loss_box_reg: 0.0272 (0.0272) loss_mask: 0.1046 (0.1046) loss_objectness: 0.0044 (0.0044) loss_rpn_box_reg: 0.0037 (0.0037) time: 0.5369 data: 0.3801 max mem: 3064
+ Epoch: [4] [10/60] eta: 0:00:10 lr: 0.000500 loss: 0.1609 (0.1870) loss_classifier: 0.0194 (0.0236) loss_box_reg: 0.0272 (0.0383) loss_mask: 0.1140 (0.1190) loss_objectness: 0.0005 (0.0023) loss_rpn_box_reg: 0.0029 (0.0037) time: 0.2016 data: 0.0378 max mem: 3064
+ Epoch: [4] [20/60] eta: 0:00:08 lr: 0.000500 loss: 0.1652 (0.1826) loss_classifier: 0.0224 (0.0242) loss_box_reg: 0.0286 (0.0374) loss_mask: 0.1075 (0.1165) loss_objectness: 0.0003 (0.0016) loss_rpn_box_reg: 0.0016 (0.0029) time: 0.1866 data: 0.0044 max mem: 3064
+ Epoch: [4] [30/60] eta: 0:00:06 lr: 0.000500 loss: 0.1676 (0.1884) loss_classifier: 0.0245 (0.0264) loss_box_reg: 0.0286 (0.0401) loss_mask: 0.1075 (0.1175) loss_objectness: 0.0003 (0.0013) loss_rpn_box_reg: 0.0018 (0.0030) time: 0.2106 data: 0.0055 max mem: 3064
+ Epoch: [4] [40/60] eta: 0:00:03 lr: 0.000500 loss: 0.1726 (0.1884) loss_classifier: 0.0245 (0.0265) loss_box_reg: 0.0283 (0.0394) loss_mask: 0.1187 (0.1184) loss_objectness: 0.0003 (0.0011) loss_rpn_box_reg: 0.0020 (0.0029) time: 0.1897 data: 0.0056 max mem: 3064
+ Epoch: [4] [50/60] eta: 0:00:01 lr: 0.000500 loss: 0.1910 (0.1938) loss_classifier: 0.0273 (0.0280) loss_box_reg: 0.0414 (0.0418) loss_mask: 0.1177 (0.1198) loss_objectness: 0.0003 (0.0010) loss_rpn_box_reg: 0.0022 (0.0031) time: 0.1623 data: 0.0056 max mem: 3064
+ Epoch: [4] [59/60] eta: 0:00:00 lr: 0.000500 loss: 0.1732 (0.1888) loss_classifier: 0.0273 (0.0278) loss_box_reg: 0.0327 (0.0405) loss_mask: 0.0993 (0.1165) loss_objectness: 0.0003 (0.0010) loss_rpn_box_reg: 0.0023 (0.0030) time: 0.1732 data: 0.0056 max mem: 3064
+ Epoch: [4] Total time: 0:00:11 (0.1920 s / it)
+ creating index...
+ index created!
+ Test: [ 0/50] eta: 0:00:21 model_time: 0.0589 (0.0589) evaluator_time: 0.0032 (0.0032) time: 0.4269 data: 0.3641 max mem: 3064
+ Test: [49/50] eta: 0:00:00 model_time: 0.0515 (0.0521) evaluator_time: 0.0020 (0.0031) time: 0.0579 data: 0.0024 max mem: 3064
+ Test: Total time: 0:00:03 (0.0679 s / it)
+ Averaged stats: model_time: 0.0515 (0.0521) evaluator_time: 0.0020 (0.0031)
+ Accumulating evaluation results...
+ DONE (t=0.01s).
+ Accumulating evaluation results...
+ DONE (t=0.01s).
+ IoU metric: bbox
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.846
+ Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.997
+ Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.978
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.412
+ Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.689
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.864
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.417
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.876
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.876
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.567
+ Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.750
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.896
+ IoU metric: segm
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.777
+ Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.997
+ Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.961
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.424
+ Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.631
+ Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.373
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.814
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.814
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.633
+ Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.713
+ Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.827
+
+ That's it!
+
So after one epoch of training, we obtain a COCO-style mAP > 50, and
a mask mAP of 65.
@@ -512,6 +613,9 @@ dataset and verify
plt.imshow(output_image.permute(1, 2, 0))
+ .. image:: ../../_static/img/tv_tutorial/tv_image06.png
+
+
The results look good!
Wrapping up