
Commit beec23c

review fixes
1 parent bf79878 commit beec23c

File tree: 1 file changed (+2, -8 lines)


beginner_source/knowledge_distillation_tutorial.py

Lines changed: 2 additions & 8 deletions
@@ -353,13 +353,7 @@ def train_knowledge_distillation(teacher, student, train_loader, epochs, learnin
 # ----------------------------
 # Feel free to play around with the temperature parameter that controls the softness of the softmax function and the loss coefficients.
 # In neural networks, it is easy to include additional loss functions to the main objectives to achieve goals like better generalization.
-# Let's try including an objective for the student, but now let's focus on their hidden states rather than their output layers. In the previous example,
-# the teacher's representation after the convolutional layers had the following shape:
-#
-# ``(batch_size, num_filters_for_last_conv_layer, 8, 8)``
-#
-# Same for the student, with the only exception being the number of filters, where here we have fewer filters.
-#
+# Let's try including an objective for the student, but now let's focus on their hidden states rather than their output layers.
 # Our goal is to convey information from the teacher's representation to the student by including a naive loss function,
 # whose minimization implies that the flattened vectors that are subsequently passed to the classifiers have become more *similar* as the loss decreases.
 # Of course, the teacher does not update its weights, so the minimization depends only on the student's weights.
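
The hunk above describes pulling the student's hidden representation toward the teacher's with a simple loss while the teacher stays frozen. A minimal sketch of that idea follows, assuming the flattened feature sizes already match (the names hidden_rep_loss, teacher_feats, and student_feats are illustrative, not the tutorial's own code):

import torch
import torch.nn as nn

mse = nn.MSELoss()

def hidden_rep_loss(teacher_feats, student_feats):
    # Flatten everything except the batch dimension, mirroring the vectors
    # that are subsequently passed to the classifiers.
    teacher_flat = torch.flatten(teacher_feats, start_dim=1).detach()  # teacher is frozen
    student_flat = torch.flatten(student_feats, start_dim=1)
    # Because the teacher tensor is detached, minimizing this term only
    # updates the student's weights.
    return mse(student_flat, teacher_flat)

In training, this term would be added, weighted by a coefficient, to the usual cross-entropy loss on the student's logits.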
@@ -734,7 +728,7 @@ def train_mse_loss(teacher, student, train_loader, epochs, learning_rate, featur
 # In ML applications, we mostly care about inference time because training happens before the model deployment.
 # If our lightweight model is still too heavy for deployment, we can apply different ideas, such as post-training quantization.
 # Additional losses can be applied in many tasks, not just classification, and you can experiment with quantities like coefficients,
-# temperature, or number of neurons. # Feel free to tune any numbers in the tutorial above,
+# temperature, or number of neurons. Feel free to tune any numbers in the tutorial above,
 # but keep in mind, if you change the number of neurons / filters chances are a shape mismatch might occur.
 #
 # For more information, see:
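
As a reference point for the temperature and loss coefficients discussed in the hunk above, here is a rough sketch of the usual soft-target distillation objective; the coefficient names and values are illustrative assumptions, not taken from the tutorial:

import torch.nn.functional as F

T = 2.0                    # softmax temperature (illustrative value)
soft_target_weight = 0.25  # weight of the distillation term (illustrative)
ce_weight = 0.75           # weight of the ordinary cross-entropy term (illustrative)

def distillation_loss(student_logits, teacher_logits, labels):
    # Soften both distributions with the temperature before comparing them;
    # the teacher's output is treated as a constant.
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return soft_target_weight * kd + ce_weight * ce

Raising T spreads the teacher's probability mass over more classes, which is what the "softness of the softmax" mentioned above refers to.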
