Ambiguity in transformer_tutorial.py #1037

Closed
@bjourne

Description

The tutorial text implies that the batch size is the outermost dimension of the data: "For instance, with the alphabet as the sequence (total length of 26) and a batch size of 4, we would divide the alphabet into 4 sequences of length 6:" However, the code calls .t() on the result, which makes the sequence length the outermost dimension instead. The text and the code therefore describe two different layouts.
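To make the two layouts concrete, here is a minimal plain-Python sketch of the alphabet example quoted above. It is not the tutorial's code: `batchify_sketch` is a hypothetical helper, nested lists stand in for tensors, and the sketch assumes the tutorial first reshapes to (batch size, sequence length) and then transposes, as the .t() call suggests.

```python
import string


def batchify_sketch(tokens, batch_size):
    """Hypothetical helper: split tokens into batch_size rows, then transpose."""
    seq_len = len(tokens) // batch_size        # 26 // 4 == 6
    trimmed = tokens[: seq_len * batch_size]   # leftover tokens (Y, Z) are dropped

    # Batch-major layout, shape (batch_size, seq_len):
    # the layout the tutorial text describes.
    batch_major = [
        trimmed[i * seq_len:(i + 1) * seq_len] for i in range(batch_size)
    ]

    # Sequence-major layout, shape (seq_len, batch_size):
    # the layout a transpose such as .t() would produce.
    seq_major = [list(column) for column in zip(*batch_major)]
    return batch_major, seq_major


batch_major, seq_major = batchify_sketch(list(string.ascii_uppercase), 4)

print(len(batch_major), len(batch_major[0]))  # 4 6
print(len(seq_major), len(seq_major[0]))      # 6 4
print("".join(batch_major[0]))                # ABCDEF
print("".join(seq_major[0]))                  # AGMS
```

In the batch-major layout each row is one contiguous chunk of the alphabet. After the transpose, each row holds one time step across all four chunks, which is why the outermost dimension becomes the sequence length.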

Labels

Text (issues relating to text tutorials), docathon-h1-2023 (a label for the docathon in H1 2023), easy
