Commit cd51ed1: ADDED RNN
1 parent d9576d0

6 files changed, +150 -0 lines changed

# Recurrent Neural Networks (RNNs) in Deep Learning

## Introduction

Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to work with sequential data. Unlike traditional feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs, making them particularly suited for tasks such as natural language processing, speech recognition, and time series analysis.

## Basic Structure

An RNN processes a sequence of inputs $(x_1, x_2, ..., x_T)$ and produces a sequence of outputs $(y_1, y_2, ..., y_T)$. At each time step $t$, the network updates its hidden state $h_t$ based on the current input $x_t$ and the previous hidden state $h_{t-1}$.

## Types of RNN

The different types of RNN, classified by the lengths of their input and output sequences ($T_x$ and $T_y$), are:

- **One to One RNN**
- **One to Many RNN**
- **Many to One RNN**
- **Many to Many RNN**

![Types of RNN architectures](<images/types of rnn.webp>)

### One to One RNN

A One to One RNN ($T_x = T_y = 1$) is the most basic and traditional type of neural network, giving a single output for a single input, as can be seen in the image above. It is also known as a Vanilla Neural Network and is used to solve regular machine learning problems.

### One to Many

A One to Many RNN ($T_x = 1, T_y > 1$) is an architecture applied in situations that produce multiple outputs for a single input. A basic example of its application is music generation, where an RNN generates a piece of music (multiple outputs) from a single musical note (single input).

### Many to One

The Many to One architecture ($T_x > 1, T_y = 1$) is used when multiple inputs are required to produce a single output; sentiment analysis is the most common example.

Take, for example, a Twitter sentiment analysis model: a text input (words as multiple inputs) yields a single sentiment label (single output). Another example is a movie rating model that takes review text as input and produces a rating from 1 to 5.

### Many to Many

As the name suggests, the Many to Many architecture ($T_x > 1, T_y > 1$) takes multiple inputs and gives multiple outputs, but Many to Many models come in two kinds, as represented above:

1. $T_x = T_y$:

   This refers to the case when the input and output sequences have the same length. It can also be understood as every input having an output; a common application is Named Entity Recognition.

2. $T_x \neq T_y$:

   The Many to Many architecture can also be used in models where the input and output sequences have different lengths; the most common application of this kind of RNN architecture is machine translation. For example, “I love you”, three words in English, translates to only two words in Spanish, “te amo”. Machine translation models can therefore return more or fewer words than the input sequence because a non-equal Many to Many RNN architecture works in the background.

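In code, the architectures above differ mainly in which time steps receive inputs and which outputs are kept. The following minimal NumPy sketch (array names and sizes are illustrative assumptions, not taken from the original text; the update rule is the one formalized in the Mathematical Formulation section below) contrasts many-to-one, which keeps only the final output, with many-to-many ($T_x = T_y$), which keeps the output of every step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4-dimensional inputs, 8 hidden units, 3 output values per step.
input_size, hidden_size, output_size = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output weights
b_h, b_y = np.zeros(hidden_size), np.zeros(output_size)

def rnn_step(h_prev, x_t):
    """One recurrence step: new hidden state and the output read from it."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

xs = [rng.normal(size=input_size) for _ in range(5)]  # an input sequence with Tx = 5
h = np.zeros(hidden_size)
outputs = []
for x in xs:
    h, y = rnn_step(h, x)
    outputs.append(y)

many_to_many = outputs      # keep every y_t (Ty = Tx), e.g. named entity recognition
many_to_one = outputs[-1]   # keep only the last y_t (Ty = 1), e.g. sentiment analysis
print(len(many_to_many), many_to_one.shape)   # 5 (3,)
```

A one-to-many model would instead start from a single input and typically feed each step's output back in as the next input, while still collecting every $y_t$.
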
## Mathematical Formulation

**Simplified architecture of an RNN:**

![Basic RNN architecture](images/basic_rnn_arch.webp)

The basic RNN can be described by the following equations:

1. Hidden state update:

   $$ h_t = f(W_{hh}h_{t-1} + W_{xh}x_t + b_h) $$

2. Output calculation:

   $$ y_t = g(W_{hy}h_t + b_y) $$

Where:

- $h_t$ is the hidden state at time $t$
- $x_t$ is the input at time $t$
- $y_t$ is the output at time $t$
- $W_{hh}$, $W_{xh}$, and $W_{hy}$ are weight matrices
- $b_h$ and $b_y$ are bias vectors
- $f$ and $g$ are activation functions (often tanh or ReLU for $f$, and softmax for $g$ in classification tasks)

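Unrolling this recurrence for a few steps makes the weight sharing explicit: the same matrices $W_{hh}$, $W_{xh}$, and bias $b_h$ are reused at every time step. As a worked expansion added here for illustration (with $h_0$ as the initial hidden state):

$$ h_3 = f\big(W_{hh}\, f\big(W_{hh}\, f(W_{hh}h_0 + W_{xh}x_1 + b_h) + W_{xh}x_2 + b_h\big) + W_{xh}x_3 + b_h\big) $$

This nesting is why the gradient of a late loss term with respect to the shared weights collects contributions from every earlier time step, which is exactly what Backpropagation Through Time computes.
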
## Backpropagation Through Time (BPTT)

RNNs are trained using Backpropagation Through Time (BPTT), an extension of the standard backpropagation algorithm. The loss is calculated at each time step and propagated backwards through the network:

$$ \frac{\partial L}{\partial W} = \sum_{t=1}^T \frac{\partial L_t}{\partial W} $$

Where $L$ is the total loss and $L_t$ is the loss at time step $t$.

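To see where the vanishing and exploding gradient problems come from, it helps to expand one term of this sum with the chain rule (a standard derivation added here for illustration), taking the recurrent weights $W_{hh}$ as an example:

$$ \frac{\partial L_t}{\partial W_{hh}} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial y_t}\, \frac{\partial y_t}{\partial h_t} \left( \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \right) \frac{\partial h_k}{\partial W_{hh}} $$

The repeated product of Jacobians $\partial h_i / \partial h_{i-1}$ tends to shrink towards zero (vanishing gradients) or grow without bound (exploding gradients) as the gap $t - k$ increases, which is the problem the LSTM variant below is designed to mitigate.
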
## Variants of RNNs

### Long Short-Term Memory (LSTM)

LSTMs address the vanishing gradient problem in standard RNNs by introducing a memory cell and gating mechanisms. The LSTM architecture contains three gates and a memory cell:

$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) $$
$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) $$
$$ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) $$
$$ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t $$
$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) $$
$$ h_t = o_t * \tanh(C_t) $$

Where:

- $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates respectively
- $C_t$ is the cell state
- $h_t$ is the hidden state
- $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input
- $W_f$, $W_i$, $W_C$, $W_o$ and $b_f$, $b_i$, $b_C$, $b_o$ are the corresponding weight matrices and bias vectors
- $\sigma$ is the sigmoid function
- $*$ denotes element-wise multiplication

**This is what an LSTM architecture looks like:**

![LSTM architecture](images/LSTM.webp)

#### Gate Descriptions

1. **Forget Gate** $(f_t)$:
   - Purpose: Decides what information to discard from the cell state.
   - Operation: Takes $h_{t-1}$ and $x_t$ as input and outputs a number between 0 and 1 for each number in the cell state $C_{t-1}$.
   - Interpretation: 1 means "keep this" while 0 means "forget this".
   - This is what the forget gate looks like:

   ![Forget gate](<images/forget gate.webp>)

2. **Input Gate** $(i_t)$:
   - Purpose: Decides which new information to store in the cell state.
   - Operation:
     - $i_t$: Decides which values we'll update.
     - $\tilde{C}_t$: Creates a vector of new candidate values that could be added to the state.
   - This is what the input gate looks like:

   ![Input gate](<images/input gate.webp>)

3. **Cell State Update**:
   - Purpose: Updates the old cell state, $C_{t-1}$, into the new cell state $C_t$.
   - Operation:
     - Multiply the old state by $f_t$, forgetting the things we decided to forget earlier.
     - Add $i_t * \tilde{C}_t$. These are the new candidate values, scaled by how much we decided to update each state value.

4. **Output Gate** $(o_t)$:
   - Purpose: Decides what parts of the cell state we're going to output.
   - Operation:
     - $o_t$: A sigmoid layer selects which parts of the cell state to output.
     - Multiply it by $\tanh(C_t)$ to push the output values to be between -1 and 1.

The power of LSTMs lies in their ability to selectively remember or forget information over long sequences, mitigating the vanishing gradient problem that plagues simple RNNs.

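The gate equations above translate fairly directly into code. Below is a minimal NumPy sketch of a single LSTM step (the weight shapes, the explicit concatenation of $[h_{t-1}, x_t]$, and all sizes are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the gate equations above."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    C_tilde = np.tanh(params["W_C"] @ z + params["b_C"])   # candidate values
    C_t = f_t * C_prev + i_t * C_tilde                     # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(C_t)                               # new hidden state
    return h_t, C_t

# Illustrative sizes: 4-dimensional inputs and 8 hidden units.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8
params = {}
for name in ("f", "i", "C", "o"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
    params[f"b_{name}"] = np.zeros(hidden_size)

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):   # run over a length-5 input sequence
    h, C = lstm_step(x, h, C, params)
print(h.shape, C.shape)   # (8,) (8,)
```

Because the forget gate can stay close to 1, the cell state $C_t$ can carry information across many steps with little attenuation, which is how the LSTM sidesteps the vanishing gradients of the simple recurrence.
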
## Applications

1. Natural Language Processing (NLP)
2. Speech Recognition
3. Machine Translation
4. Time Series Prediction
5. Sentiment Analysis
6. Music Generation

## Challenges and Considerations

1. Vanishing and Exploding Gradients
2. Long-term Dependencies
3. Computational Complexity
4. Choosing the Right Architecture (LSTM vs GRU vs Simple RNN)

## Conclusion

RNNs and their variants like LSTM are powerful tools for processing sequential data. They have revolutionized many areas of machine learning, particularly in tasks involving time-dependent or sequential information. Understanding their structure, mathematics, and applications is crucial for effectively applying them to real-world problems.