From 873399972dcf259a4917712fef15113f12f787c3 Mon Sep 17 00:00:00 2001
From: "Christine P. Chai"
Date: Sat, 4 Jan 2025 16:44:28 -0800
Subject: [PATCH 1/4] Make some https URLs clickable

---
 content/tutorial-nlp-from-scratch.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/tutorial-nlp-from-scratch.md b/content/tutorial-nlp-from-scratch.md
index 2eb581da..ee24f588 100644
--- a/content/tutorial-nlp-from-scratch.md
+++ b/content/tutorial-nlp-from-scratch.md
@@ -107,8 +107,8 @@ We made sure to include different demographics in our data and included a range
 
 1. **Text Denoising** : Before converting your text into vectors, it is important to clean it and remove all unhelpful parts a.k.a the noise from your data by converting all characters to lowercase, removing html tags, brackets and stop words (words that don't add much meaning to a sentence). Without this step the dataset is often a cluster of words that the computer doesn't understand.
 
-2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from https://nlp.stanford.edu/projects/glove/. Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
- >The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf which one can use on embeddings of their choice to mitigate bias, if present.
+2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from <https://nlp.stanford.edu/projects/glove/>. Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
+ >The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in <https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf> which one can use on embeddings of their choice to mitigate bias, if present.
 
 You'll start with importing the necessary packages to build our Deep Learning network.
@@ -1049,11 +1049,11 @@ To further enhance and optimize your neural network model, you can consider one
 - Initialize weights using [Xavier Initialization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization) to prevent vanishing/exploding gradients instead of initializing them randomly.
 - Replace LSTM with a [Bidirectional LSTM](https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks) to use both left and right context for predicting sentiment.
 
-Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github.io/illustrated-transformer/)( which uses [Attention](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/) to tackle all the problems that plague an LSTM such as as lack of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning), lack of [parallel training](https://web.stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf) and a long gradient chain for lengthy sequences
+Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github.io/illustrated-transformer/) (which uses [Attention](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/) to tackle all the problems that plague an LSTM such as as lack of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning), lack of [parallel training](https://web.stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf) and a long gradient chain for lengthy sequences).
 
 Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as PyTorch, JAX or TensorFlow — that provide NumPy-like APIs, have built-in automatic differentiation and GPU support, and are designed for high-performance numerical computing and machine learning.
 
 Finally, to know more about how ethics come into play when developing a machine learning model, you can refer to the following resources :
-- Data ethics resources by the Turing Institute. https://www.turing.ac.uk/research/data-ethics
+- Data ethics resources by the Turing Institute. <https://www.turing.ac.uk/research/data-ethics>
 - Considering how artificial intelligence shifts power, an [article](https://www.nature.com/articles/d41586-020-02003-2) and [talk](https://slideslive.com/38923453/the-values-of-machine-learning) by Pratyusha Kalluri
 - More ethics resources on [this blog post](https://www.fast.ai/2018/09/24/ai-ethics-resources/) by Rachel Thomas and the [Radical AI podcast](https://www.radicalai.org/)
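The hunk above quotes the tutorial's suggestion to initialize weights with Xavier Initialization. A minimal NumPy sketch of that idea, assuming the uniform Glorot variant (the function name and RNG handling are illustrative, not from the tutorial):

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    # Uniform Xavier/Glorot initialization: draw weights from
    # U(-limit, limit) with limit = sqrt(6 / (n_in + n_out)), which
    # keeps activation and gradient variance roughly constant across
    # layers instead of initializing purely at random.
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))
```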
From b0a63e530aa6e12a2c4f5ae03a49d9addd504365 Mon Sep 17 00:00:00 2001
From: "Christine P. Chai"
Date: Tue, 7 Jan 2025 13:19:14 -0800
Subject: [PATCH 2/4] Update content/tutorial-nlp-from-scratch.md

Co-authored-by: Mukulika <60316606+Mukulikaa@users.noreply.github.com>
---
 content/tutorial-nlp-from-scratch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/tutorial-nlp-from-scratch.md b/content/tutorial-nlp-from-scratch.md
index ee24f588..0dd0b8b7 100644
--- a/content/tutorial-nlp-from-scratch.md
+++ b/content/tutorial-nlp-from-scratch.md
@@ -1049,7 +1049,7 @@
 - Initialize weights using [Xavier Initialization](https://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html#xavier-initialization) to prevent vanishing/exploding gradients instead of initializing them randomly.
 - Replace LSTM with a [Bidirectional LSTM](https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks) to use both left and right context for predicting sentiment.
 
-Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github.io/illustrated-transformer/) (which uses [Attention](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/) to tackle all the problems that plague an LSTM such as as lack of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning), lack of [parallel training](https://web.stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf) and a long gradient chain for lengthy sequences).
+Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github.io/illustrated-transformer/) which uses [Attention](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/) to tackle all the problems that plague an LSTM such as lack of [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning), lack of [parallel training](https://web.stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf), and a long gradient chain for lengthy sequences.
 
 Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as PyTorch, JAX or TensorFlow — that provide NumPy-like APIs, have built-in automatic differentiation and GPU support, and are designed for high-performance numerical computing and machine learning.
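The line corrected above credits the Transformer's Attention mechanism. As a rough illustration of what that mechanism computes, here is a minimal sketch of scaled dot-product attention in NumPy (single head, no masking; all names and shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays of queries, keys and values.
    # Similarity scores are scaled by sqrt(d), passed through a
    # row-wise softmax, and used to take weighted sums of the values.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```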
From f9b97ec1e9f4ddc6fc3c1d6c216207263507d24c Mon Sep 17 00:00:00 2001
From: "Christine P. Chai"
Date: Tue, 7 Jan 2025 13:19:21 -0800
Subject: [PATCH 3/4] Update content/tutorial-nlp-from-scratch.md

Co-authored-by: Mukulika <60316606+Mukulikaa@users.noreply.github.com>
---
 content/tutorial-nlp-from-scratch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/tutorial-nlp-from-scratch.md b/content/tutorial-nlp-from-scratch.md
index 0dd0b8b7..4defe73f 100644
--- a/content/tutorial-nlp-from-scratch.md
+++ b/content/tutorial-nlp-from-scratch.md
@@ -1054,6 +1054,6 @@ Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github.
 Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as PyTorch, JAX or TensorFlow — that provide NumPy-like APIs, have built-in automatic differentiation and GPU support, and are designed for high-performance numerical computing and machine learning.
 
 Finally, to know more about how ethics come into play when developing a machine learning model, you can refer to the following resources :
-- Data ethics resources by the Turing Institute. <https://www.turing.ac.uk/research/data-ethics>
+- [Data ethics resources](https://www.turing.ac.uk/research/data-ethics) by the Turing Institute
 - Considering how artificial intelligence shifts power, an [article](https://www.nature.com/articles/d41586-020-02003-2) and [talk](https://slideslive.com/38923453/the-values-of-machine-learning) by Pratyusha Kalluri
 - More ethics resources on [this blog post](https://www.fast.ai/2018/09/24/ai-ethics-resources/) by Rachel Thomas and the [Radical AI podcast](https://www.radicalai.org/)
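The final patch below links the GloVe download page. For readers who fetch one of those files, a minimal sketch of loading the vectors into a NumPy lookup table (the `glove.6B.50d.txt` filename is an assumption; the tutorial only says it picks the least memory-consuming option):

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    # Each line of an unzipped GloVe file is a word followed by its
    # vector components, all separated by single spaces.
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings
```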
From 62243a9f583b986b1efb44beb5ba008996820846 Mon Sep 17 00:00:00 2001
From: "Christine P. Chai"
Date: Tue, 7 Jan 2025 13:28:31 -0800
Subject: [PATCH 4/4] Updated two more urls due to reviewer suggestion

---
 content/tutorial-nlp-from-scratch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/tutorial-nlp-from-scratch.md b/content/tutorial-nlp-from-scratch.md
index 4defe73f..ae2eeb14 100644
--- a/content/tutorial-nlp-from-scratch.md
+++ b/content/tutorial-nlp-from-scratch.md
@@ -107,8 +107,8 @@ We made sure to include different demographics in our data and included a range
 
 1. **Text Denoising** : Before converting your text into vectors, it is important to clean it and remove all unhelpful parts a.k.a the noise from your data by converting all characters to lowercase, removing html tags, brackets and stop words (words that don't add much meaning to a sentence). Without this step the dataset is often a cluster of words that the computer doesn't understand.
 
-2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from <https://nlp.stanford.edu/projects/glove/>. Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
- >The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in <https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf> which one can use on embeddings of their choice to mitigate bias, if present.
+2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from [the GloVe official website](https://nlp.stanford.edu/projects/glove/). Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file.
+ >The GloVe word embeddings include sets that were trained on billions of tokens, some up to 840 billion tokens. These algorithms exhibit stereotypical biases, such as gender bias which can be traced back to the original training data. For example certain occupations seem to be more biased towards a particular gender, reinforcing problematic stereotypes. The nearest solution to this problem are some de-biasing algorithms as the one presented in [this research article](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6835575.pdf), which one can use on embeddings of their choice to mitigate bias, if present.
 
 You'll start with importing the necessary packages to build our Deep Learning network.
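Finally, the "Text Denoising" step quoted as context in the hunk above (lowercasing, removing html tags, brackets and stop words) could look like this minimal sketch; the tiny stop-word set is illustrative only, not the tutorial's actual list:

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "and", "or"}  # illustrative subset

def denoise(text):
    # Lowercase, strip html tags and bracketed text, then drop stop
    # words, mirroring the "Text Denoising" step described above.
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)     # html tags
    text = re.sub(r"\[[^\]]*\]", " ", text)  # bracketed text
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOPWORDS]
```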