Next: NeuroBayes Up: Network training Previous: Other minimisation schemes   Contents


Alternative cost functions

In section (3.2.1) we used the squared difference between the network output and the desired output as a cost function (equation 3.2). We are not limited to that choice, however: any differentiable function that approaches a minimum when the network output equals the desired output, $o_i = t_i$, could be used.

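A commonly used function is based on entropy. In mathematics, the entropy of a random variable $X$ is defined as

$$ H(X) = -\sum_i P(x_i) \log P(x_i) \tag{3.16} $$

where $P(x_i)$ is the probability that $X$ is in the state $x_i$, and $P(x_i) \log P(x_i)$ is defined as 0 if $P(x_i) = 0$. The relative entropy of a probability function $P$ with respect to a probability function $Q$ is defined by:

$$ D(P \,\|\, Q) = \sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)} \tag{3.17} $$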
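
As a small numerical illustration of definitions (3.16) and (3.17), the following Python sketch computes the entropy and the relative entropy of discrete distributions. The function names `entropy` and `relative_entropy`, the example distributions and the use of the natural logarithm are assumptions made here for illustration; the text does not fix the base of the logarithm.

```python
import math

def entropy(p):
    """Entropy of a discrete distribution p, with 0 * log(0) taken as 0."""
    # Summing p_i * log(1 / p_i) is the same as -sum p_i * log(p_i).
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

def relative_entropy(p, q):
    """Relative entropy of p with respect to q: sum_i p_i * log(p_i / q_i).

    States with p_i = 0 contribute nothing. If q_i = 0 while p_i > 0,
    the relative entropy is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example distributions over four states.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

print("H(uniform) =", entropy(uniform))   # largest value for four states
print("H(peaked)  =", entropy(peaked))
print("H(certain) =", entropy(certain))   # no uncertainty left
print("D(peaked || uniform) =", relative_entropy(peaked, uniform))
print("D(peaked || peaked)  =", relative_entropy(peaked, peaked))
```

The relative entropy vanishes when both distributions agree and is positive otherwise, which is the property that makes it usable as a cost function.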

The following cost function, based on the definition of relative entropy, was suggested by several authors ([And88], [Hop87], [SLF88]) in the late 1980s:

$$ E = \sum_i \left[ \frac{1}{2}(1 + t_i) \log \frac{1 + t_i}{1 + o_i} + \frac{1}{2}(1 - t_i) \log \frac{1 - t_i}{1 - o_i} \right] \tag{3.18} $$

Here $o_i$ denotes the output of unit $i$ and $t_i$ the corresponding desired output. The term $\frac{1}{2}(1 + o_i)$ is used for the probability that the hypothesis represented by unit $i$ is true: $o_i = -1$ means definitely false, $o_i = +1$ means definitely true. Similarly, $\frac{1}{2}(1 + t_i)$ is interpreted as a target set of probabilities (see [Kul59] for further details). Like the quadratic error function, this choice is positive semi-definite and approaches zero when $o_i = t_i$. The advantage of the entropy-based cost function is that it diverges if the output of one unit saturates at the wrong extreme, whereas the quadratic function merely approaches a constant. When using (3.18) as the cost function, the same rules (e.g. for the backpropagation algorithm) apply.
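
The different behaviour at saturation can be checked numerically. The sketch below is an illustration under stated assumptions, not the implementation used in this work: it considers a single output unit with output and target values in the range [-1, +1], takes the quadratic cost with a factor 1/2, and uses the hypothetical function names `quadratic_cost` and `entropy_cost`. The target is fixed at +1 and the output is moved towards the wrong extreme -1.

```python
import math

def quadratic_cost(o, t):
    """Squared difference between output o and target t (factor 1/2 assumed)."""
    return 0.5 * (o - t) ** 2

def _term(a, b):
    """a * log(a / b), with 0 * log(0) taken as 0 and infinity if only b is 0."""
    if a == 0.0:
        return 0.0
    if b == 0.0:
        return math.inf
    return a * math.log(a / b)

def entropy_cost(o, t):
    """Relative-entropy cost for one unit with o and t in [-1, +1] (assumed range).

    (1 + t) / 2 and (1 + o) / 2 are read as the target and predicted
    probabilities that the hypothesis is true, (1 - t) / 2 and (1 - o) / 2
    as the probabilities that it is false.
    """
    p_true, q_true = 0.5 * (1.0 + t), 0.5 * (1.0 + o)
    p_false, q_false = 0.5 * (1.0 - t), 0.5 * (1.0 - o)
    return _term(p_true, q_true) + _term(p_false, q_false)

target = 1.0
for output in (1.0, 0.0, -0.9, -0.999, -0.999999):
    print(f"o = {output:+.6f}  "
          f"quadratic = {quadratic_cost(output, target):.4f}  "
          f"entropy = {entropy_cost(output, target):.4f}")
```

In this setting the quadratic cost levels off just below 2, while the entropy-based cost keeps growing as the output approaches -1, so a unit that saturates at the wrong extreme is penalised without bound.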


Ulrich Kerzel 2002-08-27