We now describe a technique for computing the arg max of a mixture
model using the bounding techniques introduced earlier. Traditionally,
each Gaussian's mean is evaluated under the mixture density and the
best mean is returned as the arg max. This is not always the correct
solution, and we derive a technique with which the true arg max can be
computed.
Observe the Gaussian mixture model shown in
Figure~\ref{fig:marginal}, which is what we obtain \emph{after}
training the model and observing a new covariate input $\mathbf{x}$.
It is composed of four Gaussians, which are shown individually with
dotted lines. We would like to solve for the $\hat{\mathbf{y}}$ which
maximizes the resulting density. One may be tempted simply to set
$\hat{\mathbf{y}}$ to the prototype (mean) $\mu_m$ of each Gaussian
and check which of these is best. The four prototype points are shown
as circles on the pdf, and none of them is the maximum. This is due to
interaction between the individual models; the maximum could, for
instance, lie between two Gaussians. Therefore, a prototype is usually
not the arg max. In addition, we show the expectation (the average
$\hat{\mathbf{y}}$) on the curve with a `*' symbol and note that it,
too, is not maximal.
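This failure mode is easy to reproduce numerically. The following
minimal sketch (in Python with NumPy/SciPy; the two-component 1-D
mixture parameters are made up purely for illustration) brute-forces
the peak of the density and compares it against the densities at the
prototypes and at the expectation:
\begin{verbatim}
import numpy as np
from scipy.stats import norm

# A two-component 1-D mixture with unequal weights and variances.
alphas = np.array([0.6, 0.4])          # mixing weights
mus    = np.array([-0.5, 0.5])         # prototypes (component means)
sigmas = np.array([0.9, 1.1])          # standard deviations

def p(y):
    """Mixture density at y."""
    return np.sum(alphas * norm.pdf(y, mus, sigmas))

grid = np.linspace(-4.0, 4.0, 20001)   # brute-force search grid
peak = grid[np.argmax([p(y) for y in grid])]

print("density at prototypes:", p(mus[0]), p(mus[1]))
print("density at expectation E{y}:", p(alphas @ mus))
print("brute-force arg max:", peak, "density:", p(peak))
\end{verbatim}
Generically, the peak coincides with neither a prototype nor the
expectation unless the components happen to be symmetric.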
We therefore treat the prototypes as seeds and optimize the
probability starting from each one. Thus, we should converge to the
\emph{global} maximum of probability after investigating all possible
attractors. For $M$ Gaussians, there are $M$ attractor basins, and
these usually have non-zero mutual interaction. Thus, assume we are at
a current operating point $\hat{\mathbf{y}}^{t}$ (seeded at one of the
Gaussian means or prototypes). We wish to find a better
$\hat{\mathbf{y}}^{t+1}$ using an iterative approach. Equivalently, we
can maximize the difference in log-probabilities $\Delta l$ as in
Equation~\ref{eqn:deltaargmax}.
\begin{eqnarray}
\Delta l & = & \log p(\hat{\mathbf{y}}^{t+1}) - \log p(\hat{\mathbf{y}}^{t}) \nonumber \\
         & = & \log \sum_{m=1}^{M} \frac{\alpha_m \, {\cal N}(\hat{\mathbf{y}}^{t+1};\mu_m,\Sigma_m)}
                                        {p(\hat{\mathbf{y}}^{t})} \nonumber \\
         & \geq & \sum_{m=1}^{M} h_m \log \frac{\alpha_m \, {\cal N}(\hat{\mathbf{y}}^{t+1};\mu_m,\Sigma_m)}
                                              {h_m \, p(\hat{\mathbf{y}}^{t})},
\qquad h_m := \frac{\alpha_m \, {\cal N}(\hat{\mathbf{y}}^{t};\mu_m,\Sigma_m)}{p(\hat{\mathbf{y}}^{t})}
\label{eqn:deltaargmax}
\end{eqnarray}
Applying Jensen's inequality yields a bound which is very similar to
our previous Q-function in the EM algorithm. Effectively, we propose
solving the arg maximization with an EM framework using multiple seeds
(a total of $M$ seeds for $M$ Gaussians). In fact, the application of
Jensen's inequality above is the E-step. The M-step is obtained by
taking derivatives of the bound with respect to
$\hat{\mathbf{y}}^{t+1}$ and setting them to zero. We obtain the
update equation for $\hat{\mathbf{y}}^{t+1}$ as in
Equation~\ref{eqn:argmaxupdate}. These E and M steps are iterated, and
$\hat{\mathbf{y}}$ converges monotonically to a local maximum of the
probability. For each of the four seeds, we show the convergence on
the mixture model in Figure~\ref{fig:marginalARGMAX}(a) and the
monotonic increase in Figure~\ref{fig:marginalARGMAX}(b). Since only a
handful of iterations are needed and only $M$ Gaussians are involved,
our implementation of the complete arg maximization requires only
several milliseconds to find the best $\hat{\mathbf{y}}$, even in a
vector space of dimensionality greater than 100.
\begin{equation}
\hat{\mathbf{y}}^{t+1} = \left( \sum_{m=1}^{M} h_m \Sigma_m^{-1} \right)^{-1}
                         \left( \sum_{m=1}^{M} h_m \Sigma_m^{-1} \mu_m \right)
\label{eqn:argmaxupdate}
\end{equation}
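For concreteness, here is a minimal sketch of the full procedure in
Python (NumPy/SciPy assumed; the helper name \texttt{mixture\_argmax}
and its interface are ours for illustration, not the original
implementation). It seeds the fixed-point iteration at each prototype,
alternates the responsibility computation (E-step) with the update of
Equation~\ref{eqn:argmaxupdate} (M-step), and keeps the best of the
$M$ converged candidates:
\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal

def mixture_argmax(alphas, mus, covs, n_iters=100, tol=1e-9):
    """Arg max of a Gaussian mixture via the multi-seed fixed-point
    iteration: seed at every prototype, iterate E- and M-steps,
    return the best converged candidate."""
    M = len(alphas)
    inv_covs = [np.linalg.inv(S) for S in covs]

    def pdf(y):
        return sum(a * multivariate_normal.pdf(y, m, S)
                   for a, m, S in zip(alphas, mus, covs))

    best_y, best_p = None, -np.inf
    for seed in range(M):                  # one run per attractor basin
        y = np.array(mus[seed], dtype=float)
        for _ in range(n_iters):
            # E-step: responsibilities h_m = alpha_m N(y; mu_m, S_m) / p(y)
            joint = np.array([a * multivariate_normal.pdf(y, m, S)
                              for a, m, S in zip(alphas, mus, covs)])
            h = joint / joint.sum()
            # M-step (Equation 7.35): precision-weighted blend of the means
            A = sum(hm * iS for hm, iS in zip(h, inv_covs))
            b = sum(hm * iS @ m for hm, iS, m in zip(h, inv_covs, mus))
            y_new = np.linalg.solve(A, b)
            if np.linalg.norm(y_new - y) < tol:
                break
            y = y_new
        if pdf(y) > best_p:
            best_y, best_p = y, pdf(y)
    return best_y, best_p
\end{verbatim}
Note that each M-step solves a single $d \times d$ linear system
rather than explicitly inverting the blended precision; in the running
example one would call the routine with the four trained weights,
means, and covariances and compare the returned density against the
densities at the prototypes.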
\begin{figure}
\caption{Arg Maximization}
\label{fig:marginalARGMAX}
\end{figure}
If time is a critical factor, we can instead use standard techniques
to compute a mere expectation of the mixture model, given in
Equation~\ref{eqn:expectation}. This solution often has lower
probability than the arg max.
\begin{equation}
E\{\hat{\mathbf{y}}\} = \sum_{m=1}^{M} \alpha_m \mu_m
\label{eqn:expectation}
\end{equation}
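As a sketch of this cheaper fallback (again with made-up parameters
purely for illustration), the expectation is a single weighted sum
with no iteration at all:
\begin{verbatim}
import numpy as np

alphas = np.array([0.3, 0.2, 0.4, 0.1])   # mixing weights (sum to 1)
mus    = np.array([[ 0.0,  1.0],
                   [ 2.0, -1.0],
                   [ 1.5,  0.5],
                   [-1.0,  2.0]])          # component means, one per row

# Equation 7.36: E{y} is the mixing-weighted sum of the means.
expectation = alphas @ mus
\end{verbatim}
Evaluating the mixture density at this point and at the converged arg
max makes the gap in probability explicit.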