Commit 03fb6ef: Added optimizers in ANN
1 parent 3fffe0c · 1 file changed (+133, -0)
# Deep Learning Optimizers

This repository contains implementations and explanations of various optimization algorithms used in deep learning. Each optimizer is explained with its mathematical update equations and accompanied by a small code example using Keras.
## Table of Contents

- [Introduction](#introduction)
- [Optimizers](#optimizers)
  - [Gradient Descent](#gradient-descent)
  - [Stochastic Gradient Descent (SGD)](#stochastic-gradient-descent-sgd)
  - [Momentum](#momentum)
  - [AdaGrad](#adagrad)
  - [RMSprop](#rmsprop)
  - [Adam](#adam)
- [Usage](#usage)
## Introduction

Optimizers are algorithms that adjust the attributes of a neural network, such as its weights and learning rate, in order to reduce the loss. They minimize (or maximize) an objective function by iteratively updating the weights of the network.

## Optimizers
### Gradient Descent

Gradient Descent is the most basic yet most widely used optimization algorithm. It iteratively updates the parameters in the direction of the negative gradient of the loss in order to find a minimum of the function.

**Mathematical Equation:**

$$ \theta = \theta - \eta \nabla J(\theta) $$

**Keras Code:**

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')
```
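**NumPy Sketch (illustrative):** a minimal implementation of the update rule above, using an assumed toy quadratic loss; this sketch is for explanation only and is not part of the Keras usage.

```python
import numpy as np

# Toy quadratic loss J(theta) = ||theta - target||^2, whose gradient is known
# in closed form; used only to illustrate theta <- theta - eta * grad J(theta).
target = np.array([3.0, -2.0])

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
eta = 0.1  # learning rate

for step in range(100):
    theta = theta - eta * grad_J(theta)  # plain gradient descent update

print(theta)  # approaches [3.0, -2.0]
```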
### Stochastic Gradient Descent (SGD)

SGD updates the weights after each individual training example, rather than after computing the gradient over the entire dataset.

**Mathematical Equation:**

$$ \theta = \theta - \eta \nabla J(\theta; x^{(i)}; y^{(i)}) $$

**Keras Code:**

```python
from keras.optimizers import SGD

# Keras's SGD performs mini-batch updates; setting batch_size=1 in model.fit
# recovers per-example (purely stochastic) updates.
model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')
```
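**NumPy Sketch (illustrative):** the per-example nature of SGD shown on an assumed toy linear-regression problem; the data and variable names are illustrative, not part of this repository.

```python
import numpy as np

# Toy 1-D linear regression trained with per-example (stochastic) updates:
# the parameters change after every single training sample.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0
eta = 0.01  # learning rate

for epoch in range(20):
    for x_i, y_i in zip(X, y):
        err = (w * x_i + b) - y_i      # error on one example
        w -= eta * 2.0 * err * x_i     # gradient of err**2 w.r.t. w
        b -= eta * 2.0 * err           # gradient of err**2 w.r.t. b

print(w, b)  # w approaches 3.0, b stays near 0.0
```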
### Momentum

Momentum helps accelerate the gradient vectors in the relevant directions by accumulating a velocity term, which leads to faster convergence and damped oscillations.

**Mathematical Equation:**

$$ v_t = \gamma v_{t-1} + \eta \nabla J(\theta) $$
$$ \theta = \theta - v_t $$

**Keras Code:**

```python
from keras.optimizers import SGD

# momentum=0.9 plays the role of the gamma term in the velocity update above.
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='mse')
```
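**NumPy Sketch (illustrative):** the velocity-based update on the same assumed toy quadratic loss as before; for explanation only.

```python
import numpy as np

# Momentum update on the illustrative quadratic loss:
# v_t = gamma * v_{t-1} + eta * grad;  theta = theta - v_t
target = np.array([3.0, -2.0])

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
v = np.zeros(2)          # velocity, initialized to zero
eta, gamma = 0.1, 0.9

for step in range(200):
    v = gamma * v + eta * grad_J(theta)  # accumulate a decaying sum of gradients
    theta = theta - v                    # step along the accumulated velocity

print(theta)  # approaches [3.0, -2.0]
```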
### AdaGrad

AdaGrad adapts the learning rate to each parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones.

**Mathematical Equation:**

$$ \theta = \theta - \frac{\eta}{\sqrt{G_{ii} + \epsilon}} \nabla J(\theta) $$

where $G_{ii}$ is the sum of the squares of the past gradients with respect to parameter $i$.

**Keras Code:**

```python
from keras.optimizers import Adagrad

model.compile(optimizer=Adagrad(learning_rate=0.01), loss='mse')
```
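**NumPy Sketch (illustrative):** how the per-parameter accumulator $G_{ii}$ scales the step size, again on the assumed toy quadratic loss.

```python
import numpy as np

# AdaGrad on the illustrative quadratic loss: each parameter keeps its own
# running sum of squared gradients, which scales down its effective step size.
target = np.array([3.0, -2.0])

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
G = np.zeros(2)          # per-parameter sum of squared gradients (the G_ii terms)
eta, eps = 0.5, 1e-8

for step in range(500):
    g = grad_J(theta)
    G += g ** 2                          # accumulate squared gradients
    theta -= eta / np.sqrt(G + eps) * g  # per-parameter adaptive step

print(theta)  # moves toward [3.0, -2.0]
```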
### RMSprop

RMSprop modifies AdaGrad to perform better in non-convex settings by replacing the accumulated sum of squared gradients with an exponentially weighted moving average.

**Mathematical Equation:**

$$ E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2 $$
$$ \theta = \theta - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t $$

**Keras Code:**

```python
from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001), loss='mse')
```
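**NumPy Sketch (illustrative):** the exponentially weighted accumulator on the same assumed toy quadratic loss.

```python
import numpy as np

# RMSprop on the illustrative quadratic loss: an exponentially weighted moving
# average of squared gradients replaces AdaGrad's ever-growing sum, so the
# effective step size does not shrink toward zero.
target = np.array([3.0, -2.0])

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
Eg2 = np.zeros(2)        # E[g^2]_t, the decayed average of squared gradients
eta, gamma, eps = 0.01, 0.9, 1e-8

for step in range(2000):
    g = grad_J(theta)
    Eg2 = gamma * Eg2 + (1 - gamma) * g ** 2  # exponentially weighted average
    theta -= eta / np.sqrt(Eg2 + eps) * g     # adaptive step

print(theta)  # settles close to [3.0, -2.0]
```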
### Adam

Adam combines the advantages of two other extensions of SGD, AdaGrad and RMSprop: it maintains an exponentially decaying average of past gradients ($m_t$) in addition to an exponentially decaying average of past squared gradients ($v_t$), and applies bias correction to both because they are initialized at zero.

**Mathematical Equation:**

$$ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t $$
$$ v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 $$
$$ \hat{m}_t = \frac{m_t}{1 - \beta_1^t} $$
$$ \hat{v}_t = \frac{v_t}{1 - \beta_2^t} $$
$$ \theta = \theta - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} $$

**Keras Code:**

```python
from keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
```
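**NumPy Sketch (illustrative):** both moment estimates with bias correction, again on the assumed toy quadratic loss.

```python
import numpy as np

# Adam on the illustrative quadratic loss: m tracks the mean gradient
# (momentum), v tracks the mean squared gradient (RMSprop), and both are
# bias-corrected because they start at zero.
target = np.array([3.0, -2.0])

def grad_J(theta):
    return 2.0 * (theta - target)

theta = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
eta, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8

for t in range(1, 2001):                           # t starts at 1 for bias correction
    g = grad_J(theta)
    m = beta1 * m + (1 - beta1) * g                # first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2           # second moment estimate
    m_hat = m / (1 - beta1 ** t)                   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                   # bias-corrected second moment
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)  # Adam update

print(theta)  # ends up close to [3.0, -2.0]
```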
## Usage

To use these optimizers, include the relevant Keras code snippet in your model compilation step. For example:

```python
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))
```
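For a fully self-contained, runnable example, a small classifier might be defined and trained as below; the model architecture, the synthetic stand-ins for `X_train`/`y_train`, and the shapes are illustrative assumptions, not part of this repository.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam

# Synthetic stand-ins for X_train / y_train (1000 samples, 20 features, 10 one-hot classes).
X_train = np.random.rand(1000, 20).astype('float32')
y_train = np.eye(10, dtype='float32')[np.random.randint(0, 10, size=1000)]

# A small fully connected classifier.
model = Sequential([
    Input(shape=(20,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```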
