Transfer learning tutorial -- momentum handling

In the tutorial, a new SGD optimizer is created at the start of each epoch, which means that the momentum accumulated during the previous epoch is discarded. As far as I can tell, this is unusual in deep network training. The usual approach is to keep the same optimizer for the whole run and only modify its learning rate.
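
To make the difference concrete, here is a minimal pure-Python sketch (no PyTorch needed). The `MomentumSGD` class, the `train` helper, the toy loss and the decay schedule are all hypothetical and written only for illustration; they are not the tutorial's code. The update rule has the same form as PyTorch's SGD with momentum and zero dampening. In PyTorch itself, the equivalent of the "keep the optimizer" branch would be to update `param_group['lr']` for each entry in `optimizer.param_groups`, or to use a scheduler from `torch.optim.lr_scheduler`.

```python
class MomentumSGD:
    """Toy SGD with momentum for one scalar parameter (illustration only)."""

    def __init__(self, lr, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # momentum buffer; always zero in a fresh optimizer

    def step(self, param, grad):
        # v = momentum * v + grad;  param = param - lr * v
        self.velocity = self.momentum * self.velocity + grad
        return param - self.lr * self.velocity


def train(recreate_each_epoch, epochs=3, steps_per_epoch=5):
    """Minimize f(w) = w**2 / 2 and record the velocity at each epoch start."""
    w = 1.0
    opt = MomentumSGD(lr=0.1)
    velocity_at_epoch_start = []
    for epoch in range(epochs):
        lr = 0.1 * (0.5 ** epoch)  # hypothetical decay schedule (assumption)
        if recreate_each_epoch:
            opt = MomentumSGD(lr)  # new optimizer: momentum buffer is reset
        else:
            opt.lr = lr            # same optimizer: only the learning rate changes
        velocity_at_epoch_start.append(opt.velocity)
        for _ in range(steps_per_epoch):
            w = opt.step(w, grad=w)  # gradient of w**2 / 2 is w
    return velocity_at_epoch_start


if __name__ == "__main__":
    print("recreated:", train(recreate_each_epoch=True))
    print("kept:     ", train(recreate_each_epoch=False))
```

With the optimizer recreated, the recorded velocity is zero at the start of every epoch; with the optimizer kept, it is nonzero from the second epoch on, so the momentum carries across the epoch boundary.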