Attention decoder in seq2seq model different from the one on paper #1642

Closed
@ssooffiiaannee

Description

Following is the attention decoder network used in this tutorial.
[image: attention decoder diagram from the tutorial]
But I find it different from the model in this paper, which introduces the attention-mechanism improvement.
[image: decoder architecture from the paper]
For example, in the first picture the embedding vectors go through the attention layer, but in the second one the embedding vectors (which I take to be the x's) first pass through the bidirectional RNN.
The same goes for the dropout applied to the embedding; I don't see any dropout in the paper.
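
For reference, here is a minimal sketch of the additive attention described in the paper, where the scores are computed from the decoder's previous hidden state and the encoder annotations (the h_j's produced by the bidirectional RNN), not from the target-side embedding. The class name, parameter names, and the single hidden_size are my own illustrative assumptions, not the tutorial's code:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (sketch; names are assumptions):
    e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), followed by a softmax over j."""
    def __init__(self, hidden_size):
        super().__init__()
        self.Wa = nn.Linear(hidden_size, hidden_size)  # projects previous decoder state s_{i-1}
        self.Ua = nn.Linear(hidden_size, hidden_size)  # projects encoder annotations h_j
        self.va = nn.Linear(hidden_size, 1)            # reduces each score to a scalar

    def forward(self, query, keys):
        # query: (batch, 1, hidden)        -- previous decoder hidden state
        # keys:  (batch, src_len, hidden)  -- encoder (bidirectional RNN) outputs
        scores = self.va(torch.tanh(self.Wa(query) + self.Ua(keys)))  # (batch, src_len, 1)
        weights = torch.softmax(scores.squeeze(-1), dim=-1)           # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys)               # (batch, 1, hidden)
        return context, weights
```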

cc @pytorch/team-text-core @Nayef211
