Update pyspelling to include beginner tutorials written in Python #2279

Merged 7 commits on Apr 11, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -124,3 +124,6 @@ cleanup.sh

# VSCode
*.vscode

# pyspelling
dictionary.dic
25 changes: 21 additions & 4 deletions .pyspelling.yml
@@ -1,25 +1,42 @@
spellchecker: aspell
matrix:
- name: beginner
- name: python
sources:
- beginner_source/data_loading_tutorial.py
- beginner_source/*.py
dictionary:
wordlists:
- tutorials-wordlist.txt
- en-wordlist.txt
pipeline:
- pyspelling.filters.python:
group_comments: true
- pyspelling.filters.context:
context_visible_first: true
delimiters:
# Exclude figure rST tags
- open: '\.\.\s+(figure|literalinclude|)::'
- open: '\.\.\s+(figure|literalinclude|math|image|grid)::'
close: '\n'
# Exclude raw directive
- open: '\.\. (raw)::.*$\n*'
close: '\n'
# Exclude Python coding directives
- open: '-\*- coding:'
close: '\n'
# Exclude Authors:
- open: 'Author(|s):'
close: '\n'
# Exclude .rst directives:
- open: ':math:`.*`'
close: ' '
# Ignore multiline content in codeblock
- open: '(?s)^::\n\n '
close: '^\n'
Comment on lines +30 to +32 (Contributor):
This is bad: it means there are incorrectly offset code blocks in the tutorials.

# Ignore reStructuredText block directives
- open: '\.\. (code-block)::.*$\n*'
content: '(?P<first>(^(?P<indent>[ ]+).*$\n))(?P<other>(^([ \t]+.*|[ \t]*)$\n)*)'
close: '(^(?![ \t]+.*$))'
- pyspelling.filters.markdown:
- pyspelling.filters.html:
ignores:
- code
- pre
- pyspelling.filters.url:
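To see what the new `code-block` delimiter patterns actually match, here is a minimal sketch using Python's stdlib `re`. The sample rST text and variable names are invented for illustration, and pyspelling's own matching pipeline may differ in detail; the patterns themselves are the ones added to `.pyspelling.yml` above.

```python
import re

# Sample rST: prose should be spellchecked, the indented code body skipped.
rst = (
    "Some prose that should be spellchecked.\n"
    ".. code-block:: python\n"
    "\n"
    "    x = 1\n"
    "    y = x + 1\n"
    "\n"
    "More prose after the block.\n"
)

# `open` pattern: the directive line plus any trailing blank lines.
open_re = re.compile(r'\.\. (code-block)::.*$\n*', re.MULTILINE)
# `content` pattern: the first indented line, then further indented or
# blank lines, captured as the block body to exclude from spellchecking.
content_re = re.compile(
    r'(?P<first>(^(?P<indent>[ ]+).*$\n))(?P<other>(^([ \t]+.*|[ \t]*)$\n)*)',
    re.MULTILINE,
)

m = open_re.search(rst)
body = content_re.match(rst, m.end())

# Only the indented body is captured; the prose after it is not.
assert body.group(0) == "    x = 1\n    y = x + 1\n\n"
assert "More prose" not in body.group(0)
```

The `close` pattern (`(^(?![ \t]+.*$))`) then anchors the end of the excluded span at the first non-indented line, which is exactly where the assertion above stops.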
8 changes: 4 additions & 4 deletions beginner_source/Intro_to_TorchScript_tutorial.py
@@ -2,7 +2,7 @@
Introduction to TorchScript
===========================

*James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com)*, rev2
**Authors:** James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com), rev2

This tutorial is an introduction to TorchScript, an intermediate
representation of a PyTorch model (subclass of ``nn.Module``) that
@@ -147,7 +147,7 @@ def forward(self, x, h):


######################################################################
# We’ve once again redefined our MyCell class, but here we’ve defined
# We’ve once again redefined our ``MyCell`` class, but here we’ve defined
# ``MyDecisionGate``. This module utilizes **control flow**. Control flow
# consists of things like loops and ``if``-statements.
#
@@ -202,7 +202,7 @@ def forward(self, x, h):
# inputs* the network might see.
#
# What exactly has this done? It has invoked the ``Module``, recorded the
# operations that occured when the ``Module`` was run, and created an
# operations that occurred when the ``Module`` was run, and created an
# instance of ``torch.jit.ScriptModule`` (of which ``TracedModule`` is an
# instance)
#
@@ -283,7 +283,7 @@ def forward(self, x, h):
# Looking at the ``.code`` output, we can see that the ``if-else`` branch
# is nowhere to be found! Why? Tracing does exactly what we said it would:
# run the code, record the operations *that happen* and construct a
# ScriptModule that does exactly that. Unfortunately, things like control
# ``ScriptModule`` that does exactly that. Unfortunately, things like control
# flow are erased.
#
# How can we faithfully represent this module in TorchScript? We provide a
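The control-flow erasure the tutorial describes can be illustrated with a torch-free, pure-Python analogy. This is a sketch of the idea only, not how `torch.jit.trace` is implemented: a "tracer" value records the operations that actually execute for one example input, and replaying that tape ignores any branch the example never took.

```python
# Hypothetical tracer: records arithmetic ops on a tape as they run.
class Traced:
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        self.tape.append(('mul', other))
        return Traced(self.value * other, self.tape)

    def __gt__(self, other):
        # Comparisons return a plain bool, so `if` statements are decided
        # by the example input and leave no record on the tape.
        return self.value > other

def cell(x):
    if x > 0:
        return x * 2
    return x * -1

# "Trace" the function with example input 3: only the taken branch runs.
tape = []
out = cell(Traced(3, tape))
assert tape == [('mul', 2)]      # the else-branch was never recorded

def replay(x, tape):
    # The "traced module": replays recorded ops with no control flow.
    for op, arg in tape:
        if op == 'mul':
            x = x * arg
    return x

assert replay(-5, tape) == -10   # traced program ignores the if
assert cell(-5) == 5             # eager execution takes the other branch
```

This mirrors the tutorial's point: tracing faithfully reproduces the recorded path, but data-dependent branches are erased, which is why scripting is needed for modules like `MyDecisionGate`.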
91 changes: 48 additions & 43 deletions beginner_source/chatbot_tutorial.py
@@ -31,28 +31,28 @@
# :align: center
# :alt: bot
#
# .. code:: python
#
# > hello?
# Bot: hello .
# > where am I?
# Bot: you re in a hospital .
# > who are you?
# Bot: i m a lawyer .
# > how are you doing?
# Bot: i m fine .
# > are you my friend?
# Bot: no .
# > you're under arrest
# Bot: i m trying to help you !
# > i'm just kidding
# Bot: i m sorry .
# > where are you from?
# Bot: san francisco .
# > it's time for me to leave
# Bot: i know .
# > goodbye
# Bot: goodbye .
# .. code-block:: python
#
# > hello?
# Bot: hello .
# > where am I?
# Bot: you re in a hospital .
# > who are you?
# Bot: i m a lawyer .
# > how are you doing?
# Bot: i m fine .
# > are you my friend?
# Bot: no .
# > you're under arrest
# Bot: i m trying to help you !
# > i'm just kidding
# Bot: i m sorry .
# > where are you from?
# Bot: san francisco .
# > it's time for me to leave
# Bot: i know .
# > goodbye
# Bot: goodbye .
#
# **Tutorial Highlights**
#
@@ -65,7 +65,7 @@
# - Implement greedy-search decoding module
# - Interact with trained chatbot
#
# **Acknowledgements**
# **Acknowledgments**
#
# This tutorial borrows code from the following sources:
#
@@ -75,7 +75,7 @@
# 2) Sean Robertson’s practical-pytorch seq2seq-translation example:
# https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation
#
# 3) FloydHub’s Cornell Movie Corpus preprocessing code:
# 3) FloydHub Cornell Movie Corpus preprocessing code:
# https://github.com/floydhub/textutil-preprocess-cornell-movie-corpus
#

@@ -162,11 +162,11 @@ def printLines(file, n=10):
# contains a tab-separated *query sentence* and a *response sentence* pair.
#
# The following functions facilitate the parsing of the raw
# *utterances.jsonl* data file.
# ``utterances.jsonl`` data file.
#
# - ``loadLinesAndConversations`` splits each line of the file into a dictionary of
# lines with fields: lineID, characterID, and text and then groups them
# into conversations with fields: conversationID, movieID, and lines.
# lines with fields: ``lineID``, ``characterID``, and text and then groups them
# into conversations with fields: ``conversationID``, ``movieID``, and lines.
# - ``extractSentencePairs`` extracts pairs of sentences from
# conversations
#
@@ -215,7 +215,7 @@ def extractSentencePairs(conversations):

######################################################################
# Now we’ll call these functions and create the file. We’ll call it
# *formatted_movie_lines.txt*.
# ``formatted_movie_lines.txt``.
#

# Define path to new file
@@ -359,12 +359,12 @@ def readVocs(datafile, corpus_name):
voc = Voc(corpus_name)
return voc, pairs

# Returns True iff both sentences in a pair 'p' are under the MAX_LENGTH threshold
# Returns True if both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p):
# Input sequences need to preserve the last word for EOS token
return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH

# Filter pairs using filterPair condition
# Filter pairs using the ``filterPair`` condition
def filterPairs(pairs):
return [pair for pair in pairs if filterPair(pair)]
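A self-contained usage sketch of the filtering above. The `MAX_LENGTH` value and the sample pairs are assumptions for illustration; the functions are the same as in the diff.

```python
MAX_LENGTH = 10  # assumed threshold for this sketch

# Returns True if both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p):
    # Input sequences need to preserve the last word for EOS token
    return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH

# Filter pairs using the filterPair condition
def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

pairs = [
    ["hello ?", "hello ."],  # both short: kept
    ["this query sentence is far far far too long to keep around", "ok ."],
]
kept = filterPairs(pairs)
assert kept == [["hello ?", "hello ."]]  # the 12-word query is dropped
```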

@@ -659,7 +659,7 @@ def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):
self.hidden_size = hidden_size
self.embedding = embedding

# Initialize GRU; the input_size and hidden_size params are both set to 'hidden_size'
# Initialize GRU; the input_size and hidden_size parameters are both set to 'hidden_size'
# because our input size is a word embedding with number of features == hidden_size
self.gru = nn.GRU(hidden_size, hidden_size, n_layers,
dropout=(0 if n_layers == 1 else dropout), bidirectional=True)
@@ -958,7 +958,7 @@ def train(input_variable, lengths, target_variable, mask, max_target_len, encode
input_variable = input_variable.to(device)
target_variable = target_variable.to(device)
mask = mask.to(device)
# Lengths for rnn packing should always be on the cpu
# Lengths for RNN packing should always be on the CPU
lengths = lengths.to("cpu")

# Initialize variables
@@ -1007,7 +1007,7 @@ def train(input_variable, lengths, target_variable, mask, max_target_len, encode
print_losses.append(mask_loss.item() * nTotal)
n_totals += nTotal

# Perform backpropatation
# Perform backpropagation
loss.backward()

# Clip gradients: gradients are modified in place
@@ -1032,8 +1032,8 @@ def train(input_variable, lengths, target_variable, mask, max_target_len, encode
# lifting with the ``train`` function.
#
# One thing to note is that when we save our model, we save a tarball
# containing the encoder and decoder state_dicts (parameters), the
# optimizers’ state_dicts, the loss, the iteration, etc. Saving the model
# containing the encoder and decoder ``state_dicts`` (parameters), the
# optimizers’ ``state_dicts``, the loss, the iteration, etc. Saving the model
# in this way will give us the ultimate flexibility with the checkpoint.
# After loading a checkpoint, we will be able to use the model parameters
# to run inference, or we can continue training right where we left off.
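The checkpoint pattern described above can be sketched without torch. Key names and values here are illustrative stand-ins, not the tutorial's exact keys, and `pickle` stands in for `torch.save`/`torch.load`, which the tutorial actually uses on real `state_dict` objects.

```python
import os
import pickle
import tempfile

# Bundle everything needed to resume training into one dictionary.
checkpoint = {
    'iteration': 4000,
    'encoder_state': {'gru.weight': [0.1, 0.2]},    # stand-in for a state_dict
    'decoder_state': {'gru.weight': [0.3, 0.4]},
    'encoder_opt_state': {'lr': 0.0001},            # optimizer state_dict stand-in
    'decoder_opt_state': {'lr': 0.0005},
    'loss': 2.71,
}

# Save the whole bundle to a single file (the tutorial writes a .tar
# via torch.save; pickle is the stdlib analogue for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.tar')
with open(path, 'wb') as f:
    pickle.dump(checkpoint, f)

# Later: load the bundle and restore every piece at once, so we can
# either run inference or continue training right where we left off.
with open(path, 'rb') as f:
    restored = pickle.load(f)

assert restored['iteration'] == 4000
assert restored['encoder_opt_state']['lr'] == 0.0001
```

Saving the optimizer state alongside the model state is what makes resuming training (not just inference) possible from a single checkpoint file.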
@@ -1240,8 +1240,8 @@ def evaluateInput(encoder, decoder, searcher, voc):
# Configure models
model_name = 'cb_model'
attn_model = 'dot'
#attn_model = 'general'
#attn_model = 'concat'
#``attn_model = 'general'``
#``attn_model = 'concat'``
hidden_size = 500
encoder_n_layers = 2
decoder_n_layers = 2
@@ -1251,12 +1251,17 @@ def evaluateInput(encoder, decoder, searcher, voc):
# Set checkpoint to load from; set to None if starting from scratch
loadFilename = None
checkpoint_iter = 4000
#loadFilename = os.path.join(save_dir, model_name, corpus_name,
# '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
# '{}_checkpoint.tar'.format(checkpoint_iter))

#############################################################
# Sample code to load from a checkpoint:
#
# .. code-block:: python
#
# loadFilename = os.path.join(save_dir, model_name, corpus_name,
# '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),
# '{}_checkpoint.tar'.format(checkpoint_iter))

# Load model if a loadFilename is provided
# Load model if a ``loadFilename`` is provided
if loadFilename:
# If loading on same machine the model was trained on
checkpoint = torch.load(loadFilename)
@@ -1319,7 +1324,7 @@ def evaluateInput(encoder, decoder, searcher, voc):
encoder_optimizer.load_state_dict(encoder_optimizer_sd)
decoder_optimizer.load_state_dict(decoder_optimizer_sd)

# If you have cuda, configure cuda to call
# If you have CUDA, configure CUDA to call
for state in encoder_optimizer.state.values():
for k, v in state.items():
if isinstance(v, torch.Tensor):
@@ -1344,7 +1349,7 @@ def evaluateInput(encoder, decoder, searcher, voc):
# To chat with your model, run the following block.
#

# Set dropout layers to eval mode
# Set dropout layers to ``eval`` mode
encoder.eval()
decoder.eval()
