Description
Following is the attention decoder network used in this tutorial.
But I find it different from the model in this paper, whose main improvement is the attention mechanism.
For example, in the first picture the embedding vectors go through the attention layer, but in the second one the embedding vectors (which I take to be the Xs) first pass through the bidirectional RNN.
The same goes for the dropout applied before the embedding: I don't see any dropout in the paper.
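For reference, this is roughly what I understand the tutorial's decoder to be doing (a condensed sketch, with names and shapes simplified; `MAX_LENGTH` is an assumed cap on the source length): the attention weights come from the embedded decoder input (after dropout) concatenated with the previous hidden state, whereas in the paper the alignment model scores the annotations produced by a bidirectional encoder RNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10  # assumed maximum source sentence length


class AttnDecoderRNN(nn.Module):
    """Condensed sketch of the tutorial-style attention decoder."""

    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.dropout = nn.Dropout(dropout_p)                      # dropout on the embedding
        self.attn = nn.Linear(hidden_size * 2, max_length)        # attention from embedding + hidden
        self.attn_combine = nn.Linear(hidden_size * 2, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        # The embedded (and dropped-out) decoder input feeds the attention layer,
        # not the outputs of a bidirectional RNN as in the paper.
        embedded = self.dropout(self.embedding(input).view(1, 1, -1))
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = F.relu(self.attn_combine(output).unsqueeze(0))
        output, hidden = self.gru(output, hidden)
        return F.log_softmax(self.out(output[0]), dim=1), hidden, attn_weights
```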
cc @pytorch/team-text-core @Nayef211