Commit 07839ae

Merge branch 'master' into packed_accessor

2 parents 9789474 + ea34cb0

22 files changed: +656 −156 lines

.jenkins/build.sh

Lines changed: 5 additions & 1 deletion
@@ -19,7 +19,11 @@ export PATH=/opt/conda/bin:$PATH
 pip install sphinx==1.8.2 pandas
 
 # For Tensorboard. Until 1.14 moves to the release channel.
-pip install tb-nightly
+pip install tb-nightly
+
+# Install two language tokenizers for Translation with TorchText tutorial
+python -m spacy download en
+python -m spacy download de
 
 # PyTorch Theme
 rm -rf src
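Note: the downloaded models are what the translation tutorial's tokenizers load at runtime. A minimal sketch of how they are consumed, assuming the spaCy 2.x shortcut names installed by the CI step above:

    import spacy

    # 'en' and 'de' resolve to the models installed by
    # `python -m spacy download en` / `... de` (spaCy 2.x shortcut links).
    spacy_en = spacy.load('en')
    spacy_de = spacy.load('de')

    def tokenize_en(text):
        # Split English text into a list of token strings.
        return [tok.text for tok in spacy_en.tokenizer(text)]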

_static/img/thumbnails/torchtext.png

Binary image changed: 22.7 KB → 38.2 KB

advanced_source/cpp_export.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-3. Loading a TorchScript Model in C++
+Loading a TorchScript Model in C++
 =====================================
 
 **This tutorial was updated to work with PyTorch 1.2**

advanced_source/super_resolution_with_onnxruntime.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """
-4. (optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime
+(optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime
 ========================================================================
 
 In this tutorial, we describe how to convert a model defined

beginner_source/Intro_to_TorchScript_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 """
-2. Introduction to TorchScript
+Introduction to TorchScript
 ===========================
 
 *James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com)*, rev2
@@ -24,7 +24,7 @@
 - How to compose both approaches
 - Saving and loading TorchScript modules
 
-We hope that after you complete this tutorial, you proceed to go through
+We hope that after you complete this tutorial, you will proceed to go through
 `the follow-on tutorial <https://pytorch.org/tutorials/advanced/cpp_export.html>`_
 which will walk you through an example of actually calling a TorchScript
 model from C++.

beginner_source/README.txt

Lines changed: 4 additions & 0 deletions
@@ -20,3 +20,7 @@ Beginner Tutorials
 5. nlp/* and deep_learning_nlp_tutorial.rst
 	Deep Learning for NLP with Pytorch
 	https://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html
+
+6. transformer_translation.py
+	Language Translation with Transformers
+	https://pytorch.org/tutorials/beginner/transformer_translation.html

beginner_source/aws_distributed_training_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """
-4. (advanced) PyTorch 1.0 Distributed Trainer with Amazon AWS
+(advanced) PyTorch 1.0 Distributed Trainer with Amazon AWS
 =============================================================
 
 **Author**: `Nathan Inkawhich <https://github.com/inkawhich>`_

beginner_source/chatbot_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -537,7 +537,7 @@ def outputVar(l, voc):
     max_target_len = max([len(indexes) for indexes in indexes_batch])
     padList = zeroPadding(indexes_batch)
     mask = binaryMatrix(padList)
-    mask = torch.ByteTensor(mask)
+    mask = torch.BoolTensor(mask)
     padVar = torch.LongTensor(padList)
     return padVar, mask, max_target_len
 
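Note: this change tracks PyTorch 1.2, where ``torch.bool`` replaced ``torch.uint8`` as the preferred dtype for masks; masking with a ByteTensor now emits a deprecation warning. A minimal sketch of the difference, with illustrative values and assuming the mask is later consumed by ``masked_select`` as in the tutorial's loss function:

    import torch

    scores = torch.tensor([[0.1, 0.2], [0.3, 0.4]])
    # Hypothetical padding mask for a batch of two sequences.
    mask = torch.BoolTensor([[True, False], [True, True]])
    # Keeps only unmasked entries; a ByteTensor mask still works here on
    # PyTorch 1.2 but triggers a deprecation warning.
    print(scores.masked_select(mask))  # tensor([0.1000, 0.3000, 0.4000])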

beginner_source/text_sentiment_ngrams_tutorial.py

Lines changed: 60 additions & 60 deletions
@@ -1,23 +1,23 @@
 """
-Text Classification Tutorial
-============================
+Text Classification with TorchText
+==================================
 
-This tutorial shows how to use the text classification datasets,
-including
+This tutorial shows how to use the text classification datasets
+in ``torchtext``, including
 
 ::
 
    - AG_NEWS,
-   - SogouNews,
-   - DBpedia,
+   - SogouNews,
+   - DBpedia,
    - YelpReviewPolarity,
-   - YelpReviewFull,
-   - YahooAnswers,
+   - YelpReviewFull,
+   - YahooAnswers,
    - AmazonReviewPolarity,
    - AmazonReviewFull
 
-This example shows the application of ``TextClassification`` Dataset for
-supervised learning analysis.
+This example shows how to train a supervised learning algorithm for
+classification using one of these ``TextClassification`` datasets.
 
 Load data with ngrams
 ---------------------
@@ -54,20 +54,20 @@
 ######################################################################
 # Define the model
 # ----------------
-#
+#
 # The model is composed of the
 # `EmbeddingBag <https://pytorch.org/docs/stable/nn.html?highlight=embeddingbag#torch.nn.EmbeddingBag>`__
 # layer and the linear layer (see the figure below). ``nn.EmbeddingBag``
 # computes the mean value of a “bag” of embeddings. The text entries here
 # have different lengths. ``nn.EmbeddingBag`` requires no padding here
 # since the text lengths are saved in offsets.
-#
+#
 # Additionally, since ``nn.EmbeddingBag`` accumulates the average across
 # the embeddings on the fly, ``nn.EmbeddingBag`` can enhance the
 # performance and memory efficiency to process a sequence of tensors.
-#
+#
 # .. image:: ../_static/img/text_sentiment_ngrams_model.png
-#
+#
 
 import torch.nn as nn
 import torch.nn.functional as F
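To make the offsets mechanism described in this hunk concrete, here is a minimal, self-contained sketch (illustrative sizes only, not part of the diff):

    import torch
    import torch.nn as nn

    # Two variable-length entries packed into one flat tensor; offsets give
    # each entry's start index, so no padding is needed.
    bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4)  # mode='mean' by default
    text = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
    offsets = torch.tensor([0, 4])
    print(bag(text, offsets).shape)  # torch.Size([2, 4]): one mean vector per entry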
@@ -83,7 +83,7 @@ def init_weights(self):
         self.embedding.weight.data.uniform_(-initrange, initrange)
         self.fc.weight.data.uniform_(-initrange, initrange)
         self.fc.bias.data.zero_()
-
+
     def forward(self, text, offsets):
         embedded = self.embedding(text, offsets)
         return self.fc(embedded)
@@ -92,21 +92,21 @@ def forward(self, text, offsets):
 ######################################################################
 # Initiate an instance
 # --------------------
-#
+#
 # The AG_NEWS dataset has four labels and therefore the number of classes
 # is four.
-#
+#
 # ::
-#
+#
 #    1 : World
 #    2 : Sports
 #    3 : Business
 #    4 : Sci/Tec
-#
+#
 # The vocab size is equal to the length of vocab (including single word
 # and ngrams). The number of classes is equal to the number of labels,
 # which is four in AG_NEWS case.
-#
+#
 
 VOCAB_SIZE = len(train_dataset.get_vocab())
 EMBED_DIM = 32
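The constructor call a couple of lines below this hunk uses these constants; a hedged sketch of that step (``TextSentiment``, ``train_dataset``, and ``device`` are defined elsewhere in the tutorial, and ``get_labels()`` is assumed from torchtext's ``TextClassification`` datasets):

    # Assumed from the surrounding tutorial, not shown in this hunk:
    NUM_CLASS = len(train_dataset.get_labels())
    model = TextSentiment(VOCAB_SIZE, EMBED_DIM, NUM_CLASS).to(device)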
@@ -117,7 +117,7 @@ def forward(self, text, offsets):
 ######################################################################
 # Functions used to generate batch
 # --------------------------------
-#
+#
 
 
 ######################################################################
@@ -129,13 +129,13 @@ def forward(self, text, offsets):
 # mini-batch. Pay attention here and make sure that ``collate_fn`` is
 # declared as a top level def. This ensures that the function is available
 # in each worker.
-#
+#
 # The text entries in the original data batch input are packed into a list
 # and concatenated as a single tensor as the input of ``nn.EmbeddingBag``.
 # The offsets is a tensor of delimiters to represent the beginning index
 # of the individual sequence in the text tensor. Label is a tensor saving
 # the labels of individual text entries.
-#
+#
 
 def generate_batch(batch):
     label = torch.tensor([entry[0] for entry in batch])
@@ -144,7 +144,7 @@ def generate_batch(batch):
     # torch.Tensor.cumsum returns the cumulative sum
     # of elements in the dimension dim.
     # torch.Tensor([1.0, 2.0, 3.0]).cumsum(dim=0)
-
+
     offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
     text = torch.cat(text)
     return text, offsets, label
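The cumulative-sum trick in ``generate_batch`` can be checked in isolation; a tiny sketch with made-up lengths:

    import torch

    # Entry lengths [4, 2, 3] -> start offsets [0, 4, 6] in the flat text tensor.
    lengths = [4, 2, 3]
    offsets = torch.tensor([0] + lengths[:-1]).cumsum(dim=0)
    print(offsets)  # tensor([0, 4, 6])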
@@ -153,7 +153,7 @@ def generate_batch(batch):
 ######################################################################
 # Define functions to train the model and evaluate results.
 # ---------------------------------------------------------
-#
+#
 
 
 ######################################################################
@@ -163,7 +163,7 @@ def generate_batch(batch):
 # `here <https://pytorch.org/tutorials/beginner/data_loading_tutorial.html>`__).
 # We use ``DataLoader`` here to load AG_NEWS datasets and send it to the
 # model for training/validation.
-#
+#
 
 from torch.utils.data import DataLoader
 
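For reference, this import is used together with the ``generate_batch`` collate function above; a sketch of the call (``BATCH_SIZE`` and ``sub_train_`` are defined elsewhere in the tutorial):

    from torch.utils.data import DataLoader

    # collate_fn=generate_batch packs each mini-batch into (text, offsets, label).
    data = DataLoader(sub_train_, batch_size=BATCH_SIZE, shuffle=True,
                      collate_fn=generate_batch)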
@@ -186,7 +186,7 @@ def train_func(sub_train_):
 
     # Adjust the learning rate
     scheduler.step()
-
+
     return train_loss / len(sub_train_), train_acc / len(sub_train_)
 
 def test(data_):
@@ -207,13 +207,13 @@ def test(data_):
 ######################################################################
 # Split the dataset and run the model
 # -----------------------------------
-#
+#
 # Since the original AG_NEWS has no valid dataset, we split the training
 # dataset into train/valid sets with a split ratio of 0.95 (train) and
 # 0.05 (valid). Here we use
 # `torch.utils.data.dataset.random_split <https://pytorch.org/docs/stable/data.html?highlight=random_split#torch.utils.data.random_split>`__
 # function in PyTorch core library.
-#
+#
 # `CrossEntropyLoss <https://pytorch.org/docs/stable/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss>`__
 # criterion combines nn.LogSoftmax() and nn.NLLLoss() in a single class.
 # It is useful when training a classification problem with C classes.
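A sketch of the split this paragraph describes (``train_dataset`` is the AG_NEWS training set loaded earlier in the tutorial):

    from torch.utils.data.dataset import random_split

    train_len = int(len(train_dataset) * 0.95)
    sub_train_, sub_valid_ = random_split(
        train_dataset, [train_len, len(train_dataset) - train_len])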
@@ -222,7 +222,7 @@ def test(data_):
 # learning rate is set to 4.0.
 # `StepLR <https://pytorch.org/docs/master/_modules/torch/optim/lr_scheduler.html#StepLR>`__
 # is used here to adjust the learning rate through epochs.
-#
+#
 
 import time
 from torch.utils.data.dataset import random_split
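And a sketch of the criterion/optimizer/scheduler setup those paragraphs describe (``model`` and ``device`` come from earlier in the tutorial; the hyperparameters are the ones named above):

    import torch

    criterion = torch.nn.CrossEntropyLoss().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=4.0)
    # Decay the learning rate by gamma=0.9 once per epoch.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)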
@@ -250,56 +250,56 @@ def test(data_):
     print('Epoch: %d' %(epoch + 1), " | time in %d minutes, %d seconds" %(mins, secs))
     print(f'\tLoss: {train_loss:.4f}(train)\t|\tAcc: {train_acc * 100:.1f}%(train)')
     print(f'\tLoss: {valid_loss:.4f}(valid)\t|\tAcc: {valid_acc * 100:.1f}%(valid)')
-
+
 
 ######################################################################
 # Running the model on GPU with the following information:
-#
+#
 # Epoch: 1 \| time in 0 minutes, 11 seconds
-#
+#
 # ::
-#
+#
 #        Loss: 0.0263(train)  |  Acc: 84.5%(train)
 #        Loss: 0.0001(valid)  |  Acc: 89.0%(valid)
-#
-#
+#
+#
 # Epoch: 2 \| time in 0 minutes, 10 seconds
-#
+#
 # ::
-#
+#
 #        Loss: 0.0119(train)  |  Acc: 93.6%(train)
 #        Loss: 0.0000(valid)  |  Acc: 89.6%(valid)
-#
-#
+#
+#
 # Epoch: 3 \| time in 0 minutes, 9 seconds
-#
+#
 # ::
-#
+#
 #        Loss: 0.0069(train)  |  Acc: 96.4%(train)
 #        Loss: 0.0000(valid)  |  Acc: 90.5%(valid)
-#
-#
+#
+#
 # Epoch: 4 \| time in 0 minutes, 11 seconds
-#
+#
 # ::
-#
+#
 #        Loss: 0.0038(train)  |  Acc: 98.2%(train)
 #        Loss: 0.0000(valid)  |  Acc: 90.4%(valid)
-#
-#
+#
+#
 # Epoch: 5 \| time in 0 minutes, 11 seconds
-#
+#
 # ::
-#
+#
 #        Loss: 0.0022(train)  |  Acc: 99.0%(train)
-#        Loss: 0.0000(valid)  |  Acc: 91.0%(valid)
-#
+#        Loss: 0.0000(valid)  |  Acc: 91.0%(valid)
+#
 
 
 ######################################################################
 # Evaluate the model with test dataset
 # ------------------------------------
-#
+#
 
 print('Checking the results of test dataset...')
 test_loss, test_acc = test(test_dataset)
@@ -308,21 +308,21 @@ def test(data_):
 
 ######################################################################
 # Checking the results of test dataset…
-#
+#
 # ::
-#
+#
 #        Loss: 0.0237(test)  |  Acc: 90.5%(test)
-#
+#
 
 
 ######################################################################
 # Test on a random news
 # ---------------------
-#
+#
 # Use the best model so far and test a golf news. The label information is
 # available
 # `here <https://pytorch.org/text/datasets.html?highlight=ag_news#torchtext.datasets.AG_NEWS>`__.
-#
+#
 
 import re
 from torchtext.data.utils import ngrams_iterator
@@ -360,10 +360,10 @@ def predict(text, model, vocab, ngrams):
 
 ######################################################################
 # This is a Sports news
-#
+#
 
 
 ######################################################################
 # You can find the code examples displayed in this note
 # `here <https://github.com/pytorch/text/tree/master/examples/text_classification>`__.
-#
+#
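The hunk header shows the ``predict(text, model, vocab, ngrams)`` helper this section exercises. A hypothetical usage sketch, assuming ``predict`` returns a 1-based label index as in the tutorial, with a label map mirroring the four AG_NEWS classes listed earlier (the example text is made up):

    # model and vocab come from the tutorial; the rest is illustrative.
    ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tec"}
    ex_text_str = "The Masters golf tournament enters its final round on Sunday."
    print("This is a %s news" % ag_news_label[predict(ex_text_str, model, vocab, 2)])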
