Linear Regression
A linear model used in Machine Learning that belongs to Supervised Learning.
- Sometimes also called Linear Score Functions
- Given a labeled/classified data set $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$:
    - Each data point $x_i$ is associated with some label/classification $y_i$
    - $y_i$ is the desired quantity that we want to predict for new data
    - Each data point is a vector $x_i \in \mathbb{R}^d$
- Training a linear regression model creates a linear function $f(x) = w^T x$, with the weights $w$ as part of the Model Parameter
- Testing (validating) takes a new data point $x$ and the trained model to produce an estimate $\hat{y} = f(x)$
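As a rough sketch of this setup (the data and weights below are made up for illustration), the data set is a matrix with one point per row and one label per entry of $y$, and a trained model is just a weight vector applied to new points:

```python
import numpy as np

# Toy labeled data set: n = 4 points with d = 2 features each (made-up numbers)
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0]])          # each row is a data point x_i
y = np.array([2.1, 1.9, 3.2, 5.0])  # label y_i for each data point

# The trained model is a weight vector w; f(x) = w^T x
w = np.array([0.8, 0.4])            # placeholder weights, as if training had run

# Testing: a new data point produces an estimate y_hat = f(x_new)
x_new = np.array([2.5, 1.5])
y_hat = w @ x_new
print(y_hat)                        # 2.6
```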
Training the model #
- The Loss Function for the linear regression:
  $$L(w) = \sum_{i=1}^{n} (w^T x_i - y_i)^2$$
- It is divided by the number of samples $n$ to be independent of the number of samples:
  $$L(w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$$
- So the optimization problem becomes
  $$\hat{w} = \arg\min_w \frac{1}{n} \lVert Xw - y \rVert^2$$
  where the rows of $X$ are the data points $x_i$ and the entries of $y$ are the labels $y_i$
- This is linear least squares
- To find the minimum, set the gradient with respect to $w$ to zero:
  $$\nabla_w L(w) = \frac{2}{n} X^T (Xw - y) \overset{!}{=} 0$$
- Therefore:
  $$\hat{w} = (X^T X)^{-1} X^T y$$
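A minimal NumPy sketch of this training step (variable names are my own): the closed form is computed by solving the normal equations as a linear system, and `np.linalg.lstsq` solves the same problem in a numerically more stable way:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X w_true + noise
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Closed-form least squares: w_hat = (X^T X)^{-1} X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, but numerically more stable than forming the inverse:
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_hat)    # close to w_true
print(w_lstsq)  # same result
```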
Training in 2D #
Is the Least Squares Estimate the best estimate? #
- Yes, because the resulting model is the same as the one derived with the Maximum Likelihood Estimate (assuming independence of the samples and normally distributed noise).
Proof #
- We take the MLE model:
  $$\hat{w} = \arg\max_w \prod_{i=1}^{n} p(y_i \mid x_i, w) \tag{1}$$
- Since $\log$ is monotonically increasing, the below expression optimizes for the same $w$ as the previous one, but is easier to work with:
  $$\hat{w} = \arg\max_w \sum_{i=1}^{n} \log p(y_i \mid x_i, w) \tag{2}$$
- Next, assume $p(y_i \mid x_i, w)$ is a Normal Distribution with mean $w^T x_i$ and variance $\sigma^2$:
  $$p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - w^T x_i)^2}{2\sigma^2} \right) \tag{3}$$
- Therefore
  $$\log p(y_i \mid x_i, w) = \log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(y_i - w^T x_i)^2}{2\sigma^2} \tag{4}$$
- Substituting this back into (2):
  $$\hat{w} = \arg\max_w \sum_{i=1}^{n} \left( \log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(y_i - w^T x_i)^2}{2\sigma^2} \right) \tag{5}$$
- Simplifying (5): the first term and the factor $\frac{1}{2\sigma^2}$ do not depend on $w$, so they can be dropped, and maximizing the remaining negative sum is the same as minimizing the sum itself:
  $$\hat{w} = \arg\min_w \sum_{i=1}^{n} (y_i - w^T x_i)^2 \tag{6}$$
- Finally, we are interested in the $w$ that optimizes the expression, so we partially derive with respect to $w$ to find the extremum:
  $$\frac{\partial}{\partial w} \sum_{i=1}^{n} (y_i - w^T x_i)^2 = -2 \sum_{i=1}^{n} x_i \,(y_i - w^T x_i) \overset{!}{=} 0$$
- Therefore:
  $$\hat{w} = (X^T X)^{-1} X^T y$$
  which is exactly the Least Squares Estimate from above.
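As a numerical sanity check of this equivalence (my own construction, not part of the proof): gradient descent on the Gaussian negative log-likelihood lands on the same weights as the closed-form Least Squares Estimate, since the two objectives have proportional gradients:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)

# Closed-form Least Squares Estimate
w_lse = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the negative log-likelihood under Gaussian noise.
# Up to constants, NLL(w) = sum_i (y_i - w^T x_i)^2 / (2 sigma^2),
# so grad NLL(w) = -X^T (y - X w) / sigma^2.
sigma2 = 0.01
w = np.zeros(2)
lr = 1e-5
for _ in range(2000):
    grad = -X.T @ (y - X @ w) / sigma2
    w -= lr * grad

print(w_lse)  # the two estimates agree (up to optimization tolerance)
print(w)
```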
From Linear Regression to a Neural Network #
Linear regression will always result in a linear Decision Boundary, which is often not flexible enough.
Until now, the linear score function looks like
$$s = W x$$
- with $W$ being the weight matrix.
Even with additional weight matrices, the result would still be linear: since matrix multiplication is associative, the matrices could be collapsed into one, $W_2 (W_1 x) = (W_2 W_1)\, x$, as the sketch below shows.
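A quick numeric check of the collapse argument, with arbitrary example shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(5, 3))   # first layer: 3 -> 5
W2 = rng.normal(size=(2, 5))   # second layer: 5 -> 2
x = rng.normal(size=3)

# Two stacked linear layers...
s_stacked = W2 @ (W1 @ x)
# ...act exactly like a single linear layer, by associativity
s_single = (W2 @ W1) @ x

print(np.allclose(s_stacked, s_single))  # True
```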
The solution then is to make use of a non-linear part, similar to what the Logistic Regression does:
- 2-layer NN: $s = W_2\, \sigma(W_1 x)$
- 3-layer NN: $s = W_3\, \sigma(W_2\, \sigma(W_1 x))$
- and so on
where $\sigma$ is a non-linear function, like the Sigmoid Functions $\sigma(z) = \frac{1}{1 + e^{-z}}$, $\tanh(z)$, …
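A minimal sketch of the 2-layer score function $s = W_2\, \sigma(W_1 x)$ with the logistic sigmoid as $\sigma$ (shapes and initialization are arbitrary); the elementwise non-linearity between the layers is what prevents $W_2$ and $W_1$ from collapsing into one matrix:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid, applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
W1 = rng.normal(size=(5, 3))   # hidden layer: 3 -> 5
W2 = rng.normal(size=(2, 5))   # output layer: 5 -> 2
x = rng.normal(size=3)

# 2-layer NN score: the sigmoid between the layers makes the
# overall function non-linear in x
s = W2 @ sigmoid(W1 @ x)
print(s)
```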