machine learning - Is likelihood calculated over the whole training set or a single example?
Suppose I have a training set of (x, y) pairs, where x is the input example and y is the output tag, a value in (1...k) (k is the number of classes).
When calculating the likelihood of the training set, should it be calculated over the whole training set (all of the examples), that is:
L = p(y|x) = p(y_1|x_1) * p(y_2|x_2) * ...
or is the likelihood computed for a specific training example (x, y)?
I'm asking because I saw these lecture notes (page 2), where they seem to calculate L_i, i.e., the likelihood of every training example separately.
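To make the two quantities in the question concrete, here is a minimal sketch for a classifier. It assumes a hypothetical linear softmax model with a weight matrix `W` (one row per class); `W`, the toy data, and the helper names are illustrative only, not the model from the lecture notes. Classes are indexed 0..k-1 rather than 1..k for Python indexing.

```python
import math

def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def example_likelihood(W, x, y):
    """Per-example likelihood L_i = p(y_i | x_i; W) for one (x, y) pair.

    W is a hypothetical k-by-d weight matrix (one row per class),
    x is a length-d feature list, y is a class index in 0..k-1.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    return softmax(scores)[y]

def dataset_likelihood(W, data):
    """Whole-set likelihood L = p(y_1|x_1) * p(y_2|x_2) * ... (assumes iid pairs)."""
    L = 1.0
    for x, y in data:
        L *= example_likelihood(W, x, y)
    return L

def dataset_log_likelihood(W, data):
    """log L = sum of log L_i; avoids floating-point underflow on large sets."""
    return sum(math.log(example_likelihood(W, x, y)) for x, y in data)

W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical weights: k = 3, d = 2
data = [([1.0, 0.0], 0), ([0.0, 2.0], 1), ([1.0, 1.0], 2)]  # toy (x, y) pairs
per_example = [example_likelihood(W, x, y) for x, y in data]  # the L_i values
```

Under the iid assumption the two views are consistent: multiplying the per-example values `per_example` gives `dataset_likelihood(W, data)`.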
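The likelihood function describes the probability of generating a set of training data given the parameters, and it can be used to find the parameters that generate the training data with maximum probability. You can create a likelihood function for a subset of the training data, but it wouldn't represent the likelihood of the whole data. What you can do (and what is apparently done silently in the lecture notes) is assume that the data are independent and identically distributed (iid). Then you can split the joint probability function into smaller pieces, i.e.
p(x|theta) = p(x1|theta) * p(x2|theta) * ...
(based on the independence assumption), and you can use the same function with the same parameters (theta) for each of these pieces, e.g. a normal distribution (based on the identicality assumption). Each piece p(xi|theta) is the per-example likelihood L_i, so computing the L_i separately and multiplying them gives the likelihood of the whole training set. You can then use the logarithm to turn the product into a sum, i.e.
log p(x|theta) = log p(x1|theta) + log p(x2|theta) + ...
Because the logarithm is monotonically increasing, the theta that maximizes this sum also maximizes the original product. This function can be maximized by setting its derivative with respect to theta to zero. The resulting theta generates x with maximum probability, i.e. it is the maximum likelihood estimator.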
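As a sketch of the normal-distribution example mentioned above, the code below assumes iid data from N(mu, sigma2), with theta = (mu, sigma2). Setting the derivatives of the log-likelihood to zero gives the closed-form estimates: the sample mean, and the mean squared deviation (dividing by n, not n - 1). The function names and the toy data are illustrative only.

```python
import math

def normal_log_pdf(x, mu, sigma2):
    """log p(x | mu, sigma2) for a single observation under N(mu, sigma2)."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

def log_likelihood(data, mu, sigma2):
    """iid: log p(x|theta) = sum_i log p(x_i|theta), same theta for every piece."""
    return sum(normal_log_pdf(x, mu, sigma2) for x in data)

def normal_mle(data):
    """Solve d/dmu = 0 and d/dsigma2 = 0 of the log-likelihood in closed form.

    d/dmu:     sum(x_i - mu) / sigma2 = 0                        -> mu_hat = mean
    d/dsigma2: -n/(2 sigma2) + sum((x_i - mu)^2)/(2 sigma2^2) = 0 -> sigma2_hat
    """
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # divides by n (MLE)
    return mu_hat, sigma2_hat

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # toy sample
mu_hat, sigma2_hat = normal_mle(data)
best = log_likelihood(data, mu_hat, sigma2_hat)

# Nudging either parameter away from the estimate lowers the log-likelihood.
for d_mu, d_s2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert log_likelihood(data, mu_hat + d_mu, sigma2_hat + d_s2) < best
```

The perturbation loop is only a numerical sanity check that the stationary point is a maximum here; for models without a closed-form solution, the same log-likelihood would typically be maximized with a numerical optimizer instead.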