Maximum Likelihood Estimate
or “MLE” for short
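- Data distributions:
  - $p_{data}$ is just the training samples
    - Real world observations
    - With finitely many samples this is an empirical distribution; only if we could observe every possible event would it become the true, possibly continuous, distribution
  - $p_{model}$ is parametrized by $\theta$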
Goal
Finding the parameters $\theta$ that maximize the likelihood of making the same observations with the model distribution $p_{model}$ as with the data distribution $p_{data}$.
In other words: The model should predict the data that we have.
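Independence Assumption
The observations $x_1, \dots, x_n$ are assumed to be drawn independently, so the probability of the whole dataset is the product of the per-sample probabilities.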
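Obtaining the model for an MLE
The maximum likelihood parameter $\theta_{ML}$ is the one where the product of the probabilities $p_{model}(x_i; \theta)$ that the model assigns to the observed samples is maximized:

$$\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} p_{model}(x_i; \theta)$$

Since probabilities go from 0 to 1, every factor is at most 1, so the product is largest when the model assigns a high probability to each sample that was actually observed.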
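
To make the idea concrete, here is a minimal sketch that is not part of the original notes. It assumes a Bernoulli model (a possibly biased coin) with a single parameter `theta`, a made-up list of ten coin flips, and a simple grid search in place of a real optimizer. The names `observations`, `likelihood`, `log_likelihood` and `candidates` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical data (an assumption): 1 = heads, 0 = tails, 7 heads in 10 flips.
observations = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]


def likelihood(theta, data):
    """Product of the per-sample Bernoulli probabilities under parameter theta."""
    return math.prod(theta if x == 1 else 1.0 - theta for x in data)


def log_likelihood(theta, data):
    """Sum of log-probabilities; it has the same maximizer as the product."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in data)


# Candidate parameters 0.01, 0.02, ..., 0.99 (endpoints left out to avoid log(0)).
candidates = [k / 100 for k in range(1, 100)]

# Grid search: keep the candidate with the largest (log-)likelihood.
theta_ml = max(candidates, key=lambda t: likelihood(t, observations))
theta_ml_log = max(candidates, key=lambda t: log_likelihood(t, observations))

print(theta_ml, theta_ml_log)
```

Both searches should select the same value, which for this model is the fraction of heads in the data. In practice the sum of log-probabilities is usually preferred, because a product of many numbers below 1 quickly underflows in floating point.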