This paper presents a set of methods for analyzing a variety of finite mixture models. It covers not only traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, but also newer methods that reflect recent research on finite mixture models. Many of the algorithms are EM algorithms or are based on EM-like ideas, so the paper also includes an overview of EM algorithms for finite mixture models.
1. Introduction to finite mixture models
Individuals in a population can often be divided into subgroups. However, even when we observe characteristics of these individuals, we may not actually observe which subgroup each one belongs to. Discerning and describing the subgroups from the measurements alone is sometimes called “unsupervised clustering” in the literature. In fact, mixture models can generally be regarded as making up the subset of clustering methods known as “model-based clustering”.
Finite mixture models can also be used in situations where clustering individuals is not the main interest. For one thing, a finite mixture model describes entire subgroups rather than assigning individuals to them. Sometimes a finite mixture model simply provides a means of adequately describing a particular distribution, such as the distribution of residuals in a linear regression model when outliers are present.
Whatever the modeler's goal in using mixture models, much of the theory of these models rests on the assumption that the subgroups are distributed according to a particular parametric form, which is often univariate or multivariate normal.
More recent research aims to relax or modify the assumption of multivariate normality, and to develop computational techniques for finite mixture analyses in which the components are regressions, vectors obtained by discretizing multivariate data, or even completely unspecified distributions.
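Later sections refer to the mixture density as equation (1). Assuming the standard formulation, with m components, mixing proportions $\lambda_j$ and component densities $\phi_j$ (the symbols used in the EM section), it can be written as:

```latex
g_\theta(x) = \sum_{j=1}^{m} \lambda_j \, \phi_j(x),
\qquad \lambda_j > 0, \quad \sum_{j=1}^{m} \lambda_j = 1
```

Under this reading, $\theta = (\lambda, \phi)$ collects the mixing proportions and the component densities, and the parametric, cutpoint and semiparametric approaches differ only in what they assume about the $\phi_j$.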
2. EM algorithms for finite mixture models
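Rather than the observed log-likelihood $L_x(\theta)$, the EM algorithm iteratively maximizes the operator $Q(\theta \mid \theta^{(t)})$, the conditional expectation of the complete-data log-likelihood given the observed data $x$ and the current parameter value $\theta^{(t)}$. Each iteration $\theta^{(t)} \to \theta^{(t+1)}$ consists of two steps:
1. E-step: compute $Q(\theta \mid \theta^{(t)})$
2. M-step: set $\theta^{(t+1)} = \arg\max_{\theta \in \Phi} Q(\theta \mid \theta^{(t)})$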
For a finite mixture model, the E-step does not depend on the structure of F, because the missing data part involves only the unobserved component labels Z.
The Z are discrete, and their distribution is given by Bayes' theorem. The M-step itself splits into two parts: the maximization with respect to $\lambda$, which does not depend on F, and the maximization with respect to $\phi$, which must be handled specifically for each model (for example parametrically, semiparametrically, or nonparametrically). The EM algorithms for these models therefore share the following common features.
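1. E-step. Compute the “posterior” probabilities of component inclusion (conditional on the data and $\theta^{(t)}$),
$$p_{ij}^{(t)} = \frac{\lambda_j^{(t)} \phi_j^{(t)}(x_i)}{\sum_{j'=1}^{m} \lambda_{j'}^{(t)} \phi_{j'}^{(t)}(x_i)} \qquad (2)$$
for all $i = 1, \ldots, n$ and $j = 1, \ldots, m$. Numerically, it can be dangerous to implement equation (2) exactly as written: when $x_i$ is far from every component, all of the $\phi_{j'}^{(t)}(x_i)$ values underflow to zero, and the ratio takes the indeterminate form 0/0. Therefore, many routines actually use the equivalent expression
$$p_{ij}^{(t)} = \left[ 1 + \sum_{j' \neq j} \frac{\lambda_{j'}^{(t)} \phi_{j'}^{(t)}(x_i)}{\lambda_j^{(t)} \phi_j^{(t)}(x_i)} \right]^{-1}$$
or some variant of it.
2. M-step for $\lambda$. Set
$$\lambda_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} p_{ij}^{(t)}, \qquad j = 1, \ldots, m.$$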
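The underflow problem in the E-step can be illustrated with a minimal base R sketch. It assumes normal component densities, and the helper names `posterior_naive` and `posterior_stable` are hypothetical, not taken from any package. The stable version is one possible variant of the rewritten ratio: it works with log densities and subtracts the row maximum before exponentiating.

```r
# Hypothetical helper: the ratio for the posterior probabilities as written,
# for a univariate normal mixture (assumed component family).
posterior_naive <- function(x, lambda, mu, sigma) {
  w <- sapply(seq_along(lambda), function(j)
    lambda[j] * dnorm(x, mean = mu[j], sd = sigma[j]))
  w <- matrix(w, nrow = length(x))          # n x m matrix of lambda_j * phi_j(x_i)
  w / rowSums(w)
}

# Hypothetical helper: the same probabilities computed on the log scale.
posterior_stable <- function(x, lambda, mu, sigma) {
  logw <- sapply(seq_along(lambda), function(j)
    log(lambda[j]) + dnorm(x, mean = mu[j], sd = sigma[j], log = TRUE))
  logw <- matrix(logw, nrow = length(x))    # n x m matrix of log(lambda_j * phi_j(x_i))
  w <- exp(logw - apply(logw, 1, max))      # largest entry in each row becomes exp(0) = 1
  w / rowSums(w)
}

x      <- c(50, 80, 1000)    # the last point is far from both components
lambda <- c(0.4, 0.6)
mu     <- c(55, 80)
sigma  <- c(5, 5)

p_naive  <- posterior_naive(x, lambda, mu, sigma)   # third row is 0/0
p_stable <- posterior_stable(x, lambda, mu, sigma)  # third row is well defined

# M-step for lambda: average the posterior probabilities over the observations
lambda_new <- colMeans(p_stable)
```

For points near the components the two versions agree; they differ only where every density underflows.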
2.3. An example of the EM algorithm
As an example, we consider the univariate normal mixture analysis of the Old Faithful geyser waiting data (the time between eruptions) depicted in Figure 1. This fully parametric case corresponds to the mixture of univariate Gaussian distributions described in Section 1, where the j-th component density $\phi_j(x)$ in (1) is normal with mean $\mu_j$ and variance $\sigma_j^2$.
The M-step for the parameters $(\mu_j, \sigma_j^2)$, $j = 1, \ldots, m$, of this EM algorithm for mixtures of univariate normals is straightforward and can be found, for example, in McLachlan and Peel (2000).
R> library(mixtools)  # provides normalmixEM
R> wait1 <- normalmixEM(waiting, lambda = .5, mu = c(55, 80), sigma = 5)
The code above fits a two-component mixture (because mu is a vector of length 2) in which the standard deviations are assumed to be equal (because sigma is a scalar rather than a vector).
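What such a fit involves can be sketched in base R without any mixture-modelling package. The loop below is a minimal illustration, not the routine used for the figures: it assumes two normal components with a common standard deviation, uses the `faithful` dataset that ships with R, and picks starting values and a stopping rule that are illustrative choices.

```r
# Sketch: EM for a two-component univariate normal mixture with a common sd.
x <- faithful$waiting        # Old Faithful waiting times (base R dataset)
n <- length(x)

lambda <- c(0.5, 0.5)        # starting mixing proportions (assumed)
mu     <- c(55, 80)          # starting means, a vector of length 2 (assumed)
sigma  <- 5                  # common starting standard deviation, a scalar (assumed)

loglik <- numeric(0)
for (iter in 1:500) {
  # E-step: weighted component densities and posterior probabilities
  dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma),
                lambda[2] * dnorm(x, mu[2], sigma))
  loglik <- c(loglik, sum(log(rowSums(dens))))  # observed log-likelihood at theta^(t)
  post <- dens / rowSums(dens)

  # M-step: closed-form updates
  lambda <- colMeans(post)
  mu     <- colSums(post * x) / colSums(post)
  sigma  <- sqrt(sum(post * outer(x, mu, "-")^2) / n)  # pooled over both components

  # stop once the log-likelihood has stabilized (tolerance is an assumption)
  k <- length(loglik)
  if (k > 1 && loglik[k] - loglik[k - 1] < 1e-8) break
}
```

The vector `loglik` holds the sequence of observed log-likelihood values, which should be non-decreasing from one iteration to the next. The plain ratio is used in the E-step here because no observation in this dataset is far enough from both components to cause underflow.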
Figure 1: Sequence of log-likelihood values, $L_x(\theta^{(t)})$
Figure 2: The geyser waiting data fitted with the parametric EM algorithm, showing the fitted Gaussian components.
R> plot(wait1, density = TRUE, cex.axis = 1.4, cex.lab = 1.4, cex.main = 1.8,
+    main2 = "Time between Old Faithful eruptions", xlab2 = "Minutes")
The code above produces two plots: the sequence $t \mapsto L_x(\theta^{(t)})$ of observed log-likelihood values, and the histogram of the data with the m (here m = 2) fitted Gaussian component densities $N(\hat\mu_j, \hat\sigma_j^2)$, $j = 1, \ldots, m$, superimposed. The estimate $\hat\theta$ can be displayed from the fitted object.
Alternatively, the same output can be obtained using the summary method.
3. Cutpoint methods
Traditionally, most of the literature on finite mixture models has assumed that the density functions $\phi_j(x)$ of equation (1) come from a known parametric family. However, some authors have recently considered the problem in which $\phi_j(x)$ is left unspecified, apart from some conditions needed to ensure identifiability of the parameters in the model. Here we use the cutpoint method of Elmore et al. (2004).
Following Elmore et al. (2004), we use cutpoints at intervals of approximately 10.5 from −63 to 63. A multinomial dataset is then created from the original data, as shown below.
R> cutpts <- 10.5*(-6:6)
R> makemultdata(data, cuts = cutpts)
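The discretization step can be sketched in base R as follows. The data here are synthetic and purely hypothetical (5 subjects with 8 repeated measurements each), and the choice of right-closed intervals is an assumption; the point is only to show how 13 cutpoints turn each subject's measurements into a vector of 14 counts.

```r
# Hypothetical data: 5 subjects (rows), 8 repeated angle measurements each
set.seed(1)
angles <- matrix(runif(5 * 8, min = -90, max = 90), nrow = 5)

cutpts <- 10.5 * (-6:6)            # 13 cutpoints from -63 to 63
breaks <- c(-Inf, cutpts, Inf)     # 14 intervals, open-ended at both extremes

# For each subject, count how many measurements fall into each interval
counts <- t(apply(angles, 1, function(row)
  as.vector(table(cut(row, breaks = breaks)))))

dim(counts)       # one row per subject, one column per interval
rowSums(counts)   # each row sums to the number of measurements per subject
```

Each row of `counts` can then be treated as one multinomial observation, which is the form of data a multinomial mixture EM works with.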
Once the multinomial data have been created, we can apply the EM algorithm to estimate the multinomial parameters. Finally, the estimated component distribution functions are computed and plotted. Figure 3 shows plots of the 3-component and 4-component solutions; these plots are very similar to the corresponding plots in Figures 1 and 2 of Elmore et al. (2004).
R> compCDF(data, posterior, lwd = 2,   # posterior: posterior probabilities from the EM fit
+    main = "Three component solution")
Figure 3 (a)
Figure 3 (b)
The summary method can also be used to summarize the EM output.
A univariate symmetric, location-shifted semiparametric example
Under the additional assumption that $\phi(\cdot)$ is absolutely continuous with respect to Lebesgue measure, Bordes et al. (2007) proposed a stochastic algorithm for estimating the model parameters, namely $(\lambda, \mu, \phi)$. A special case
R> plot(wait1, which = 2)
R> wait2 <- spEMsymloc(waiting, mu0 = c(55, 80))
R> plot(wait2, lty = 2)
Figure 4 (a)
Figure 4 (b)
Because the semiparametric version relies on the kernel density estimation step (8), a bandwidth must be selected for this step. By default, Silverman's rule of thumb (Silverman 1986) is applied to the entire dataset.
However, the choice of bandwidth can make a big difference, as shown in Figure 4 (b).
R> wait2a <- spEMsymloc(waiting, mu0 = c(55, 80), bw = 1)
R> plot(wait2a)
R> plot(wait2b)
We find that with a bandwidth near 2, the semiparametric solution looks quite close to the normal mixture solution of Fig. 2. Reducing the bandwidth further produces the “bumpiness” shown by the solid line in Fig. 4 (b). With a bandwidth of 8, on the other hand, the semiparametric solution is very poor, because the algorithm tries to make each component look like the whole mixture distribution.
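The role of the bandwidth can be explored with a small base R sketch. It applies ordinary kernel density estimation to the whole `faithful$waiting` sample, which is only a stand-in for the kernel step inside the semiparametric EM, and the helper `n_modes` is a hypothetical, crude bumpiness measure. Silverman's rule is written out by hand and compared with `bw.nrd0`, the base R implementation of the same rule.

```r
x <- faithful$waiting

# Silverman's rule of thumb, written out
h_manual <- 0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)
h_rule   <- bw.nrd0(x)               # base R implementation of the same rule

# Kernel density estimates with a small, rule-of-thumb, and large bandwidth
d_small <- density(x, bw = 1)
d_rule  <- density(x, bw = h_rule)
d_large <- density(x, bw = 8)

# Hypothetical helper: count local maxima of a density estimate
n_modes <- function(d) sum(diff(sign(diff(d$y))) == -2)

c(small = n_modes(d_small), rule = n_modes(d_rule), large = n_modes(d_large))
```

A small bandwidth typically yields more local maxima, and a large one smooths the estimate toward a single broad shape. This mirrors, in a simpler setting, the behaviour described for Fig. 4 (b).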