Computational Graph

A computational graph is a way to model the computation performed by a Neural Network.
- It is a directed graph
- Matrix operations are represented as compute nodes
- Variables or scalar operators are vertex nodes
- Directed edges show the flow of inputs to the vertices
- In a way, the computational graph is equivalent to the syntax tree of a math expression
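
A minimal sketch of such a graph as a data structure, assuming the small expression `(w * x) + b`. The `Node` class, its fields, and the `edges` helper are hypothetical names chosen for illustration, and scalars stand in for matrices to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                   # label, e.g. "w" or "mul"
    op: str = "var"                             # "var" for leaves, else an operation
    inputs: list = field(default_factory=list)  # nodes feeding into this one


# Leaves: the variables of the expression (w * x) + b
w, x, b = Node("w"), Node("x"), Node("b")
# Compute nodes: each one points at the nodes it consumes
mul = Node("mul", op="*", inputs=[w, x])
add = Node("add", op="+", inputs=[mul, b])


def edges(node):
    """List the directed edges (input -> consumer) reachable from `node`."""
    result = []
    for inp in node.inputs:
        result.extend(edges(inp))
        result.append((inp.name, node.name))
    return result


graph_edges = edges(add)
print(graph_edges)
# [('w', 'mul'), ('x', 'mul'), ('mul', 'add'), ('b', 'add')]
```

Read from the root `add` downwards, this is the syntax tree of the expression; read along the edges, it is the order in which values flow.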

Evaluation

To evaluate the neural network, compute the value of every node in the computational graph, starting from the input vector and all the weights.
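
The forward evaluation can be sketched as a post-order traversal in which every node stores its result. This is an illustrative sketch only: the `Node` class, the `OPS` table, and `evaluate` are assumed names, and scalar values replace matrices for brevity.

```python
import operator


class Node:
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # None for leaves (inputs and weights)
        self.inputs = inputs  # nodes this node is computed from
        self.value = value    # preset for leaves, filled in for compute nodes


OPS = {"+": operator.add, "*": operator.mul}


def evaluate(node):
    """Evaluate inputs first, then the node; store the result on the node."""
    if node.op is not None:
        args = [evaluate(inp) for inp in node.inputs]
        node.value = OPS[node.op](*args)
    return node.value


w, x, b = Node(value=2.0), Node(value=3.0), Node(value=1.0)
mul = Node("*", (w, x))
out = Node("+", (mul, b))

result = evaluate(out)
print(result)     # 7.0
print(mul.value)  # 6.0, the intermediate result kept on the node
```

This simple recursion re-evaluates a node that is shared by several consumers; a fuller implementation would typically evaluate in topological order instead.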

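For Gradient Descent we need to compute the gradient of the Loss Function, which is itself part of the computational graph. So we are interested in the partial derivative of the loss $L$ with respect to each weight $w_i$, that is $\partial L / \partial w_i$.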

To do this, first evaluate the loss function as described above and let each node remember its intermediate result.
Compute the partial derivatives for compute nodes. Here:
- ; ;
- ; ;
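
As a hedged illustration of what such local partial derivatives look like, assume two hypothetical node types, a multiplication node and an addition node with inputs $x$, $y$ and output $z$ (chosen for the example, not taken from the list above):

$$
z = x \cdot y: \quad \frac{\partial z}{\partial x} = y, \quad \frac{\partial z}{\partial y} = x
$$

$$
z = x + y: \quad \frac{\partial z}{\partial x} = 1, \quad \frac{\partial z}{\partial y} = 1
$$

Each local derivative depends only on the node's own inputs, which is why the intermediate results remembered during evaluation are enough to compute it.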
Walk the graph backwards
- We know the partial derivative of the last node (the derivative of the loss with respect to itself is 1)
- We also know the evaluated values of the nodes contributing to the final node
- So we can annotate each node with the value of its derivative under the current evaluation
- Recursively do this until all leaves are reached
- According to the chain rule, the resulting derivative value must be multiplied by the parent's derivative value
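
The backward walk can be sketched as follows. This is a minimal scalar example under stated assumptions, not a definitive implementation: `Node`, `mul`, `add`, and `backward` are hypothetical names, and the loss is assumed to be the squared output `(w * x + b)^2`.

```python
class Node:
    def __init__(self, value, inputs=(), local_grads=()):
        self.value = value              # result remembered from the forward pass
        self.inputs = inputs            # nodes this node was computed from
        self.local_grads = local_grads  # d(this)/d(input) at the current values
        self.grad = 0.0                 # d(loss)/d(this), filled in by backward()


def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def backward(loss):
    # Post-order DFS lists inputs before consumers; reversing it guarantees
    # a node is processed only after every node that consumes it.
    order, seen = [], set()

    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for inp in n.inputs:
                visit(inp)
            order.append(n)

    visit(loss)
    loss.grad = 1.0                     # d(loss)/d(loss)
    for n in reversed(order):
        for inp, local in zip(n.inputs, n.local_grads):
            inp.grad += local * n.grad  # chain rule: local times parent's value


w, x, b = Node(2.0), Node(3.0), Node(1.0)
out = add(mul(w, x), b)   # w * x + b = 7
loss = mul(out, out)      # (w * x + b)^2 = 49
backward(loss)
print(w.grad, x.grad, b.grad)  # 42.0 28.0 14.0
```

The results match the analytic derivatives: $\partial L / \partial w = 2 (wx + b) x = 42$, $\partial L / \partial x = 2 (wx + b) w = 28$, and $\partial L / \partial b = 2 (wx + b) = 14$. Contributions are accumulated with `+=` because a node that feeds several consumers receives one term per outgoing edge.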