Gradient Descent (for Neural Networks)
Goal #
- minimize $L(y, f(x; W))$ w.r.t. $W$
- where:
  - $L$ is the Loss Function
  - $y$ are the ground truth Labels
  - $f(x; W)$ is the score function using the Model Parameters $W$, producing the predictions $\hat{y}$
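As a concrete illustration of these pieces, here is a minimal sketch assuming a linear score function $f(x; W) = xW$ and a mean-squared-error loss. Both are toy choices made only for this example; a real network has its own $f$ and $L$.

```python
import numpy as np

# Toy setup (illustration only): a linear score function f(x; W) = x @ W
# and a mean-squared-error loss. A real network would use its own f and L.

def f(x, W):
    """Score function: maps inputs x and Model Parameters W to predictions y_hat."""
    return x @ W

def loss(y, y_hat):
    """Loss Function L: mean squared error between labels y and predictions y_hat."""
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # 100 samples, 3 features
true_W = np.array([1.5, -2.0, 0.5])    # "unknown" parameters that generated the labels
y = f(x, true_W)                       # ground truth Labels
W = np.zeros(3)                        # current Model Parameters

print(loss(y, f(x, W)))                # the quantity we want to minimize w.r.t. W
```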
How it is done #
- Walking down the slope using the gradient $\nabla_W L$ (see the sketch after this list):
  - $W \leftarrow W - \alpha \, \nabla_W L$
  - where $\alpha$ is the distance we want to step: called the Learning Rate
- iterate this many times
- the algorithm may very well get stuck in a local minimum
- the best $W$ can be expressed as: $W^* = \arg\min_W L(y, f(x; W))$
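Putting the update rule and the iteration together, a minimal gradient descent loop might look like the sketch below. It reuses the toy linear model and MSE loss from above, with the gradient worked out analytically for that case; in a real neural network $\nabla_W L$ would come from backpropagation instead.

```python
import numpy as np

# Minimal gradient descent loop (illustration only), reusing the toy linear
# model f(x; W) = x @ W and MSE loss from above. For a real neural network,
# the gradient dL/dW would come from backpropagation instead of this formula.

def grad(x, y, W):
    """Analytic gradient of the MSE loss w.r.t. W for the linear model."""
    y_hat = x @ W
    return (2.0 / len(x)) * x.T @ (y_hat - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_W = np.array([1.5, -2.0, 0.5])
y = x @ true_W                        # ground truth Labels
W = np.zeros(3)                       # start from some initial parameters
alpha = 0.1                           # Learning Rate: how far we step each time

for step in range(500):               # iterate this many times
    W = W - alpha * grad(x, y, W)     # walk down the slope: W <- W - alpha * dL/dW

print(W)                              # approaches the best W (the arg min of the loss)
```

With a small enough Learning Rate the loop converges toward the minimizing $W$; too large a step makes it overshoot, and for non-convex losses it can still end up in a local minimum.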