Commit 856e546 (merge of parents 12849d3 and 6dfae53)

Merge pull request #4046 from Shantnu-singh/main: Optimizers in deep learning

1 file changed: 132 additions, 0 deletions
# Deep Learning Optimizers

This repository contains implementations and explanations of various optimization algorithms used in deep learning. Each optimizer is explained with its mathematical equations and includes a small code example using Keras.

## Table of Contents
- [Introduction](#introduction)
- [Optimizers](#optimizers)
  - [Gradient Descent](#gradient-descent)
  - [Stochastic Gradient Descent (SGD)](#stochastic-gradient-descent-sgd)
  - [Momentum](#momentum)
  - [AdaGrad](#adagrad)
  - [RMSprop](#rmsprop)
  - [Adam](#adam)
- [Usage](#usage)

## Introduction

Optimizers are algorithms that adjust the attributes of a neural network, such as its weights and learning rate, in order to reduce the loss. They minimize (or maximize) an objective function by iteratively updating the network's weights.

## Optimizers

### Gradient Descent

Gradient Descent is the most basic yet most widely used optimization algorithm. It iteratively updates the parameters in the direction of the negative gradient of the loss, computed over the entire training set, in order to find a minimum.

**Mathematical Equation:**

$$ \theta = \theta - \eta \nabla J(\theta) $$

**Keras Code:**

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')
```
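
To make the update rule concrete, here is a minimal NumPy sketch (not part of the original repository) that applies $\theta \leftarrow \theta - \eta \nabla J(\theta)$ to a toy quadratic loss $J(\theta) = (\theta - 3)^2$; the function name and the toy loss are illustrative assumptions.

```python
import numpy as np

def gradient_descent(grad_fn, theta, learning_rate=0.1, steps=100):
    """Repeatedly apply theta <- theta - eta * grad J(theta)."""
    for _ in range(steps):
        theta = theta - learning_rate * grad_fn(theta)
    return theta

# Toy loss J(theta) = (theta - 3)^2, so grad J(theta) = 2 * (theta - 3).
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta=np.array(0.0))
print(theta_min)  # converges towards 3.0
```
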
### Stochastic Gradient Descent (SGD)

SGD updates the weights using one training example (or a small mini-batch) at a time, rather than computing the gradient over the entire training set before each update.

**Mathematical Equation:**

$$ \theta = \theta - \eta \nabla J(\theta; x^{(i)}; y^{(i)}) $$

**Keras Code:**

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')
```
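
In Keras, the update granularity is controlled by the `batch_size` argument of `model.fit` rather than by the optimizer itself; `batch_size=1` gives true per-example SGD. The following self-contained sketch (not from the original repository) illustrates this on a small synthetic regression problem; the data and layer sizes are illustrative assumptions.

```python
import numpy as np
from keras import Sequential, Input
from keras.layers import Dense
from keras.optimizers import SGD

# Tiny synthetic regression problem (illustrative only).
X_train = np.random.rand(200, 4)
y_train = X_train.sum(axis=1, keepdims=True)

model = Sequential([Input(shape=(4,)), Dense(8, activation='relu'), Dense(1)])
model.compile(optimizer=SGD(learning_rate=0.01), loss='mse')

# batch_size=1 -> one weight update per training example (per-example SGD);
# larger batch sizes give the usual mini-batch behaviour.
model.fit(X_train, y_train, epochs=2, batch_size=1, verbose=0)
```
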
### Momentum

Momentum accumulates a velocity vector from past gradients and keeps moving in that direction, which accelerates convergence and damps oscillations.

**Mathematical Equation:**

$$ v_t = \gamma v_{t-1} + \eta \nabla J(\theta) $$
$$ \theta = \theta - v_t $$

**Keras Code:**

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='mse')
```
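
As a minimal sketch of the two equations above (not from the original repository, toy loss assumed), one momentum step can be written as:

```python
import numpy as np

def momentum_update(theta, velocity, grad, learning_rate=0.01, gamma=0.9):
    """v_t = gamma * v_{t-1} + eta * grad;  theta = theta - v_t."""
    velocity = gamma * velocity + learning_rate * grad
    theta = theta - velocity
    return theta, velocity

# Toy loss J(theta) = (theta - 3)^2 (illustrative only).
theta, velocity = np.array(0.0), np.array(0.0)
for _ in range(100):
    theta, velocity = momentum_update(theta, velocity, grad=2 * (theta - 3))
print(theta)  # converges towards 3.0
```
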
### AdaGrad

AdaGrad adapts the learning rate to each parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones.

**Mathematical Equation:**

$$ \theta = \theta - \frac{\eta}{\sqrt{G_{ii} + \epsilon}} \nabla J(\theta) $$

where $G_{ii}$ is the sum of the squares of the past gradients for parameter $i$.

**Keras Code:**

```python
from keras.optimizers import Adagrad

model.compile(optimizer=Adagrad(learning_rate=0.01), loss='mse')
```
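
A minimal sketch of the accumulation of squared gradients (not from the original repository, toy loss assumed):

```python
import numpy as np

def adagrad_update(theta, accum, grad, learning_rate=0.01, eps=1e-8):
    """Accumulate squared gradients, then scale the step by 1 / sqrt(accum + eps)."""
    accum = accum + grad ** 2  # G_t: running sum of squared gradients
    theta = theta - learning_rate * grad / np.sqrt(accum + eps)
    return theta, accum

# Toy loss J(theta) = (theta - 3)^2 (illustrative only).
theta, accum = np.array(0.0), np.array(0.0)
for _ in range(2000):
    theta, accum = adagrad_update(theta, accum, grad=2 * (theta - 3), learning_rate=0.1)
print(theta)  # approaches 3.0; the effective step shrinks as squared gradients accumulate
```
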
### RMSprop

RMSprop modifies AdaGrad to perform better in non-convex settings by replacing the accumulated sum of squared gradients with an exponentially weighted moving average.

**Mathematical Equation:**

$$ E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2 $$
$$ \theta = \theta - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \nabla J(\theta) $$

**Keras Code:**

```python
from keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001), loss='mse')
```
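
A minimal sketch of the moving-average update (not from the original repository, toy loss assumed):

```python
import numpy as np

def rmsprop_update(theta, avg_sq, grad, learning_rate=0.001, rho=0.9, eps=1e-8):
    """Exponential moving average of squared gradients scales the step."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2  # E[g^2]_t
    theta = theta - learning_rate * grad / np.sqrt(avg_sq + eps)
    return theta, avg_sq

# Toy loss J(theta) = (theta - 3)^2 (illustrative only).
theta, avg_sq = np.array(0.0), np.array(0.0)
for _ in range(1000):
    theta, avg_sq = rmsprop_update(theta, avg_sq, grad=2 * (theta - 3), learning_rate=0.01)
print(theta)  # hovers near 3.0 (steps are roughly the learning rate in size)
```
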
### Adam

Adam combines the advantages of two other extensions of SGD, AdaGrad and RMSprop: it keeps exponentially decaying averages of both past gradients (first moment) and past squared gradients (second moment), and corrects their initial bias towards zero.

**Mathematical Equation:**

$$ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t $$
$$ v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 $$
$$ \hat{m}_t = \frac{m_t}{1 - \beta_1^t} $$
$$ \hat{v}_t = \frac{v_t}{1 - \beta_2^t} $$
$$ \theta = \theta - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} $$

**Keras Code:**

```python
from keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
```
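
A minimal sketch that follows the five equations above step by step (not from the original repository, toy loss assumed):

```python
import numpy as np

def adam_update(theta, m, v, grad, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad       # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy loss J(theta) = (theta - 3)^2 (illustrative only).
theta, m, v = np.array(0.0), np.array(0.0), np.array(0.0)
for t in range(1, 3001):
    theta, m, v = adam_update(theta, m, v, grad=2 * (theta - 3), t=t, learning_rate=0.01)
print(theta)  # approaches 3.0
```
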
## Usage

To use these optimizers, simply include the relevant Keras code snippet in your model compilation step. For example:

```python
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))
```
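
For context, here is a self-contained end-to-end sketch (not from the original repository) that defines the `model`, `X_train`, `X_test`, `y_train`, and `y_test` assumed by the snippet above; the synthetic data and layer sizes are illustrative assumptions.

```python
import numpy as np
from keras import Sequential, Input
from keras.layers import Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

# Synthetic 3-class classification data (illustrative only).
X = np.random.rand(1000, 20)
y = to_categorical(np.random.randint(0, 3, size=1000), num_classes=3)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = Sequential([
    Input(shape=(20,)),
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32,
          validation_data=(X_test, y_test))
```
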
