Gradient Descent (for Neural Networks)
Goal #
- minimize $L(y, f(x; W))$ w.r.t. $W$
- where:
  - $L$ is the Loss Function
  - $y$ are the ground truth Labels
  - $f(x; W)$ is the score function using the Model Parameters $W$, producing the predictions $\hat{y}$
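As a concrete illustration of these pieces, here is a minimal sketch assuming a linear score function $f(x; W) = xW$ and a mean-squared-error loss. Both are toy choices made only for this example; a real network has its own $f$ and $L$.

```python
import numpy as np

# Toy setup (illustration only): a linear score function f(x; W) = x @ W
# and a mean-squared-error loss. A real network would use its own f and L.

def f(x, W):
    """Score function: maps inputs x and Model Parameters W to predictions y_hat."""
    return x @ W

def loss(y, y_hat):
    """Loss Function L: mean squared error between labels y and predictions y_hat."""
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # 100 samples, 3 features
true_W = np.array([1.5, -2.0, 0.5])    # "unknown" parameters that generated the labels
y = f(x, true_W)                       # ground truth Labels
W = np.zeros(3)                        # current Model Parameters

print(loss(y, f(x, W)))                # the quantity we want to minimize w.r.t. W
```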
How it is done #
- Walking down the slope using the gradient $\nabla_W L$ (see the sketch after this list):
  - $W \leftarrow W - \alpha \, \nabla_W L$
  - where $\alpha$ is the distance we want to step: called the Learning Rate
- iterate this many times
- the algorithm may very well get stuck in a local minimum
- the best $W$ can be expressed as: $W^* = \arg\min_W L(y, f(x; W))$
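Putting the update rule and the iteration together, a minimal gradient descent loop might look like the sketch below. It reuses the toy linear model and MSE loss from above, with the gradient worked out analytically for that case; in a real neural network $\nabla_W L$ would come from backpropagation instead.

```python
import numpy as np

# Minimal gradient descent loop (illustration only), reusing the toy linear
# model f(x; W) = x @ W and MSE loss from above. For a real neural network,
# the gradient dL/dW would come from backpropagation instead of this formula.

def grad(x, y, W):
    """Analytic gradient of the MSE loss w.r.t. W for the linear model."""
    y_hat = x @ W
    return (2.0 / len(x)) * x.T @ (y_hat - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_W = np.array([1.5, -2.0, 0.5])
y = x @ true_W                        # ground truth Labels
W = np.zeros(3)                       # start from some initial parameters
alpha = 0.1                           # Learning Rate: how far we step each time

for step in range(500):               # iterate this many times
    W = W - alpha * grad(x, y, W)     # walk down the slope: W <- W - alpha * dL/dW

print(W)                              # approaches the best W (the arg min of the loss)
```

With a small enough Learning Rate the loop converges toward the minimizing $W$; too large a step makes it overshoot, and for non-convex losses it can still end up in a local minimum.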