Commit 03cab56

Update Recurrent-Neural-Networks.md for equation rendering
1 parent cd51ed1 commit 03cab56

File tree

1 file changed: +16 -13 lines changed

docs/Deep Learning/Recurrent Neural Networks/Recurrent-Neural-Networks.md

Lines changed: 16 additions & 13 deletions
@@ -47,10 +47,12 @@ Many-to-Many architecture can also be represented in models where input and outp
 The basic RNN can be described by the following equations:
 
 1. Hidden state update:
-$$ h_t = f(W_{hh}h_{t-1} + W_{xh}x_t + b_h) $$
 
-2. Output calculation:
-$$ y_t = g(W_{hy}h_t + b_y) $$
+$$h_t = f(W_{hh}h_{t-1} + W_{xh}x_t + b_h)$$
+
+3. Output calculation:
+
+$$y_t = g(W_{hy}h_t + b_y)$$
 
 Where:
 - $h_t$ is the hidden state at time $t$
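
For readers following the equations, here is a minimal NumPy sketch of one RNN step as written in the hunk above. The choice of $\tanh$ for $f$, identity for $g$, the toy dimensions, and every function and variable name are illustrative assumptions, not something specified in the file.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One RNN time step: hidden state update, then output calculation."""
    # h_t = f(W_hh h_{t-1} + W_xh x_t + b_h), with f = tanh (assumed)
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    # y_t = g(W_hy h_t + b_y), with g = identity (assumed)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Toy usage: 3-dim input, 4-dim hidden state, 2-dim output, sequence of 5 steps
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
b_h, b_y = np.zeros(4), np.zeros(2)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)
```
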
@@ -66,7 +68,7 @@ Where:
 
 RNNs are trained using Backpropagation Through Time (BPTT), an extension of the standard backpropagation algorithm. The loss is calculated at each time step and propagated backwards through the network:
 
-$$ \frac{\partial L}{\partial W} = \sum_{t=1}^T \frac{\partial L_t}{\partial W} $$
+$$\frac{\partial L}{\partial W} = \sum_{t=1}^T \frac{\partial L_t}{\partial W}$$
 
 Where $L$ is the total loss and $L_t$ is the loss at time step $t$.
 
@@ -77,12 +79,15 @@ Where $L$ is the total loss and $L_t$ is the loss at time step $t$.
 
 LSTMs address the vanishing gradient problem in standard RNNs by introducing a memory cell and gating mechanisms. The LSTM architecture contains three gates and a memory cell:
 
-$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) $$
-$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) $$
-$$ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) $$
-$$ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t $$
-$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) $$
-$$ h_t = o_t * \tanh(C_t) $$
+$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
+
+$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
+
+$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
+
+$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
+
+$$h_t = o_t * \tanh(C_t)$$
 
 Where:
 - $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates respectively
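
As with the RNN step, a minimal NumPy sketch of one LSTM step built from the gate equations in this hunk may help; it keeps the candidate state $\tilde{C}_t$ that the surrounding text refers to. The concatenation layout and all names here are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step: gates, candidate state, cell state update, hidden state."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate
    i_t = sigmoid(W_i @ z + b_i)           # input gate
    C_tilde = np.tanh(W_C @ z + b_C)       # candidate values, \tilde{C}_t
    C_t = f_t * C_prev + i_t * C_tilde     # cell state update (element-wise)
    o_t = sigmoid(W_o @ z + b_o)           # output gate
    h_t = o_t * np.tanh(C_t)               # new hidden state
    return h_t, C_t
```
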
@@ -110,8 +115,6 @@ Where:
 - $i_t$: Decides which values we'll update.
 - $\tilde{C}_t$: Creates a vector of new candidate values that could be added to the state.
 - This is how as Input gate look like:
-
-
 ![alt text](<images/input gate.webp>)
 
 3. **Cell State Update**:
@@ -147,4 +150,4 @@ The power of LSTMs lies in their ability to selectively remember or forget infor
 
 ## Conclusion
 
-RNNs and their variants like LSTM are powerful tools for processing sequential data. They have revolutionized many areas of machine learning, particularly in tasks involving time-dependent or sequential information. Understanding their structure, mathematics, and applications is crucial for effectively applying them to real-world problems.
+RNNs and their variants like LSTM are powerful tools for processing sequential data. They have revolutionized many areas of machine learning, particularly in tasks involving time-dependent or sequential information. Understanding their structure, mathematics, and applications is crucial for effectively applying them to real-world problems.
