Closed
Description
The text in the tutorial indicates that the batch size is the outermost dimension of the data: "For instance, with the alphabet as the sequence (total length of 26) and a batch size of 4, we would divide the alphabet into 4 sequences of length 6:". But the code calls .t() on the result, so the sequence length ends up as the outermost dimension instead.
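
To make the mismatch concrete, here is a minimal plain-Python sketch of the alphabet example. It is not the tutorial's code: the helper names `batchify` and `transpose` are hypothetical stand-ins, and it assumes the tutorial first arranges the data as (batch size, sequence length) before the .t() call swaps the two dimensions.

```python
import string


def batchify(seq, batch_size):
    """Split seq into batch_size rows of equal length, dropping the remainder."""
    seq_len = len(seq) // batch_size        # 26 // 4 == 6
    trimmed = seq[: seq_len * batch_size]   # the trailing "yz" is dropped
    # Shape (batch_size, seq_len): batch is the outermost dimension here.
    return [list(trimmed[i * seq_len:(i + 1) * seq_len])
            for i in range(batch_size)]


def transpose(rows):
    """Swap the two dimensions, which is what a 2-D transpose such as .t() does."""
    return [list(col) for col in zip(*rows)]


batch_first = batchify(string.ascii_lowercase, 4)  # what the text describes
seq_first = transpose(batch_first)                 # what results after the transpose

print(len(batch_first), len(batch_first[0]))  # 4 6
print(len(seq_first), len(seq_first[0]))      # 6 4
print(seq_first[0])                           # ['a', 'g', 'm', 's']
```

After the transpose, indexing the outermost dimension yields one time step across all 4 sequences rather than one of the 4 sequences, which is the opposite of what the quoted sentence suggests.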