Alternative cost functions
In section 3.2.1 we used the squared difference between the network output and the desired output as the cost function (equation 3.2). We are not limited to that choice, however. Any differentiable function could be used, provided it attains its minimum when the network outputs $o_i$ equal the desired outputs $t_i$, i.e. when $o_i = t_i$ for all output units $i$.
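A commonly used alternative is based on entropy. In information theory, the entropy of a random variable $X$ with discrete states $i$ is defined as

$$S(X) = -\sum_i P_i \ln P_i ,$$

where $P_i$ is the probability that $X$ is in state $i$, and $P_i \ln P_i$ is defined as 0 if $P_i = 0$.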
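The definition can be sketched in a few lines of Python. This version assumes the natural logarithm (another base only rescales $S$) and a distribution given as a list of probabilities that sum to one. The function name `entropy` is hypothetical.

```python
import math

def entropy(p):
    """Entropy S = -sum_i P_i ln P_i of a discrete distribution p (list of floats)."""
    s = 0.0
    for pi in p:
        if pi > 0:  # convention: P_i ln P_i is taken as 0 when P_i = 0
            s -= pi * math.log(pi)
    return s

print(entropy([0.5, 0.5]))   # ln 2, about 0.693: two equally likely states
print(entropy([1.0, 0.0]))   # 0.0: a certain outcome carries no uncertainty
```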
The relative entropy of a probability function $P$ with respect to a probability function $Q$ is defined by:

$$D(P \,\|\, Q) = \sum_i P_i \ln \frac{P_i}{Q_i} .$$
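The following Python sketch of this definition uses the same convention that terms with $P_i = 0$ contribute nothing. It also shows that $D(P\,\|\,Q)$ vanishes for identical distributions, is positive otherwise, and is not symmetric in $P$ and $Q$. The name `relative_entropy` is hypothetical.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(P||Q) = sum_i P_i ln(P_i / Q_i) for two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # P_i ln(P_i/Q_i) is taken as 0 when P_i = 0
        if qi == 0:
            return math.inf   # P puts weight where Q has none: D diverges
        total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(relative_entropy(p, p))  # 0.0: identical distributions
print(relative_entropy(p, q))  # positive
print(relative_entropy(q, p))  # also positive, but a different value
```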
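The following cost function, based on the definition of relative entropy, was suggested by several authors ([And88], [Hop87], [SLF88]) in the late 1980s:

$$E = \sum_i \left[ \frac{1}{2}(1+t_i) \ln\frac{1+t_i}{1+o_i} + \frac{1}{2}(1-t_i) \ln\frac{1-t_i}{1-o_i} \right] \tag{3.18}$$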
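The term $\frac{1}{2}(1+o_i)$, where $o_i \in [-1, 1]$ is the output of unit $i$, is used as the probability that the hypothesis represented by unit $i$ is true: $o_i = -1$ means definitely false and $o_i = +1$ means definitely true. Similarly, $\frac{1}{2}(1+t_i)$, built from the desired output $t_i$, is interpreted as a target set of probabilities (see [Kul59] for further details). As for the entropy, $0 \ln 0$ is taken to be 0, so targets $t_i = \pm 1$ are allowed.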
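As a minimal sketch, the following Python code evaluates (3.18) for a single training pattern, assuming output and target values in $[-1, 1]$ (e.g. from a tanh activation). Any sum over training patterns is left out. The names `xlogx_ratio` and `entropy_cost` are hypothetical. The code makes explicit that each unit contributes the relative entropy between the two-state target distribution $(\frac{1}{2}(1+t_i), \frac{1}{2}(1-t_i))$ and the two-state output distribution $(\frac{1}{2}(1+o_i), \frac{1}{2}(1-o_i))$.

```python
import math

def xlogx_ratio(a, b):
    """Return a * ln(a / b), with the convention that the term is 0 when a = 0."""
    if a == 0:
        return 0.0
    if b == 0:
        return math.inf
    return a * math.log(a / b)

def entropy_cost(outputs, targets):
    """Entropy-based cost (3.18) for one pattern; outputs o_i and targets t_i in [-1, 1]."""
    cost = 0.0
    for o, t in zip(outputs, targets):
        # 1/2 (1+t) ln((1+t)/(1+o)): probability that the hypothesis is true
        cost += xlogx_ratio(0.5 * (1 + t), 0.5 * (1 + o))
        # 1/2 (1-t) ln((1-t)/(1-o)): probability that the hypothesis is false
        cost += xlogx_ratio(0.5 * (1 - t), 0.5 * (1 - o))
    return cost

targets = [1.0, -1.0, 0.0]
print(entropy_cost(targets, targets))            # 0.0 when every output equals its target
print(entropy_cost([0.8, -0.6, 0.1], targets))   # small positive cost
```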
Like the quadratic error function, this choice is positive semi-definite (never negative) and vanishes when every output equals its target, $o_i = t_i$ for all $i$.
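The advantage of the entropy-based cost function is that it diverges if the output of a unit saturates at the wrong extreme (for example an output approaching $-1$ while the target is $+1$), whereas the quadratic function merely approaches a finite constant. When (3.18) is used as the cost function, the same training rules (e.g. the backpropagation algorithm) apply. Only the derivative of the cost with respect to the network outputs changes, and this derivative enters the error signal of the output units.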
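Both properties can be illustrated with a short Python sketch for a single unit. It assumes a quadratic cost normalised as $\frac{1}{2}(o-t)^2$ and a tanh output $o = \tanh(h)$ with unit gain; both are assumptions, since equation 3.2 and the activation function are not restated here. The function names are hypothetical. For this choice the output error signal of (3.18) reduces to $\partial E/\partial h = o - t$: the factor $\partial E/\partial o = (o-t)/(1-o^2)$ cancels $\partial o/\partial h = 1-o^2$. With the quadratic cost, by contrast, $(o-t)(1-o^2)$ remains, and this tends to zero for a saturated unit.

```python
import math

def entropy_cost_unit(o, t):
    """Single-unit contribution to (3.18); requires -1 < o < 1 and -1 <= t <= 1."""
    cost = 0.0
    if t > -1:
        cost += 0.5 * (1 + t) * math.log((1 + t) / (1 + o))
    if t < 1:
        cost += 0.5 * (1 - t) * math.log((1 - t) / (1 - o))
    return cost

def quadratic_cost_unit(o, t):
    """Quadratic cost for one unit, assuming the normalisation 1/2 (o - t)^2."""
    return 0.5 * (o - t) ** 2

# Target "definitely true" while the output saturates towards "definitely false":
# the entropy cost grows without bound, the quadratic cost tends to 2.
t = 1.0
for o in (-0.9, -0.99, -0.999999):
    print(f"o = {o:<10} entropy: {entropy_cost_unit(o, t):7.3f}  "
          f"quadratic: {quadratic_cost_unit(o, t):.3f}")

def output_delta(h, t):
    """dE/dh of (3.18) for a tanh output o = tanh(h) (gain assumed to be 1): o - t."""
    return math.tanh(h) - t

# Central finite-difference check of the analytic error signal.
h, eps = -2.0, 1e-6
numeric = (entropy_cost_unit(math.tanh(h + eps), t)
           - entropy_cost_unit(math.tanh(h - eps), t)) / (2 * eps)
print(numeric, output_delta(h, t))
```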
Ulrich Kerzel
2002-08-27