In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it, i.e. the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too high dimensional to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the Metropolis–Hastings algorithm. == General explanation == Markov chain Monte Carlo methods create samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, as its expected value or variance. Practically, an ensemble of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities. Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlations of samples introduces the need to use the Markov chain central limit theorem when estimating the error of mean values. These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given. == History == The development of MCMC methods is deeply rooted in the early exploration of Monte Carlo (MC) techniques in the mid-20th century, particularly in physics. These developments were marked by the Metropolis algorithm proposed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller in 1953, which was designed to tackle high-dimensional integration problems using early computers. Then in 1970, W. K. Hastings generalized this algorithm and inadvertently introduced the component-wise updating idea, later known as Gibbs sampling. Simultaneously, the theoretical foundations for Gibbs sampling were being developed, such as the Hammersley–Clifford theorem from Julian Besag's 1974 paper. Although the seeds of MCMC were sown earlier, including the formal naming of Gibbs sampling in image processing by Stuart Geman and Donald Geman (1984) and the data augmentation method by Martin A. Tanner and Wing Hung Wong (1987), its "revolution" in mainstream statistics largely followed demonstrations of the universality and ease of implementation of sampling methods (especially Gibbs sampling) for complex statistical (particularly Bayesian) problems, spurred by increasing computational power and software like BUGS. This transformation was accompanied by significant theoretical advancements, such as Luke Tierney's (1994) rigorous treatment of MCMC convergence, and Jun S. Liu, Wong, and Augustine Kong's (1994, 1995) analysis of Gibbs sampler structure. Subsequent developments further expanded the MCMC toolkit, including particle filters (Sequential Monte Carlo) for sequential problems, Perfect sampling aiming for exact simulation (Jim Propp and David B. Wilson, 1996), RJMCMC (Peter J. Green, 1995) for handling variable-dimension models, and deeper investigations into convergence diagnostics and the central limit theorem. Overall, the evolution of MCMC represents a paradigm shift in statistical computation, enabling the analysis of numerous previously intractable complex models and continually expanding the scope and impact of statistics. == Mathematical setting == Suppose (Xn) is a Markov Chain in the general state space X {\displaystyle {\mathcal {X}}} with specific properties. We are interested in the limiting behavior of the partial sums: S n ( h ) = 1 n ∑ i = 1 n h ( X i ) {\displaystyle S_{n}(h)={\dfrac {1}{n}}\sum _{i=1}^{n}h(X_{i})} as n goes to infinity. Particularly, we hope to establish the Law of Large Numbers and the Central Limit Theorem for MCMC. In the following, we state some definitions and theorems necessary for the important convergence results. In short, we need the existence of invariant measure and Harris recurrent to establish the Law of Large Numbers of MCMC (Ergodic Theorem). And we need aperiodicity, irreducibility and extra conditions such as reversibility to ensure the Central Limit Theorem holds in MCMC. === Irreducibility and aperiodicity === Recall that in the discrete setting, a Markov chain is said to be irreducible if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, φ-irreducibility generalizes irreducibility by using a reference measure φ on the measurable space ( X , B ( X ) ) {\displaystyle ({\mathcal {X}},{\mathcal {B}}({\mathcal {X}}))} . Definition (φ-irreducibility) Given a measure φ {\displaystyle \varphi } defined on ( X , B ( X ) ) {\displaystyle ({\mathcal {X}},{\mathcal {B}}({\mathcal {X}}))} , the Markov chain ( X n ) {\displaystyle (X_{n})} with transition kernel K ( x , y ) {\displaystyle K(x,y)} is φ-irreducible if, for every A ∈ B ( X ) {\displaystyle A\in {\mathcal {B}}({\mathcal {X}})} with φ ( A ) > 0 {\displaystyle \varphi (A)>0} , there exists n {\displaystyle n} such that K n ( x , A ) > 0 {\displaystyle K^{n}(x,A)>0} for all x ∈ X {\displaystyle x\in {\mathcal {X}}} (Equivalently, P x ( τ A < ∞ ) > 0 {\displaystyle P_{x}(\tau _{A}<\infty )>0} , here τ A = inf { n ≥ 1 ; X n ∈ A } {\displaystyle \tau _{A}=\inf\{n\geq 1;X_{n}\in A\}} is the first n {\displaystyle n} for which the chain enters the set A {\displaystyle A} ). This is a more general definition for irreducibility of a Markov chain in non-discrete state space. In the discrete case, an irreducible Markov chain is said to be aperiodic if it has period 1. Formally, the period of a state ω ∈ X {\displaystyle \omega \in {\mathcal {X}}} is defined as: d ( ω ) := g c d { m ≥ 1 ; K m ( ω , ω ) > 0 } {\displaystyle d(\omega ):=\mathrm {gcd} \{m\geq 1\,;\,K^{m}(\omega ,\omega )>0\}} For the general (non-discrete) case, we define aperiodicity in terms of small sets: Definition (Cycle length and small sets) A φ-irreducible Markov chain ( X n ) {\displaystyle (X_{n})} has a cycle of length d if there exists a small set C {\displaystyle C} , an associated integer M {\displaystyle M} , and a probability distribution ν M {\displaystyle \nu _{M}} such that d is the greatest common divisor of: { m ≥ 1 ; ∃ δ m > 0 such that C is small for ν m ≥ δ m ν M } . {\displaystyle \{m\geq 1\,;\,\exists \,\delta _{m}>0{\text{ such that }}C{\text{ is small for }}\nu _{m}\geq \delta _{m}\nu _{M}\}.} A set C {\displaystyle C} is called small if there exists m ∈ N ∗ {\displaystyle m\in \mathbb {N} ^{}} and a nonzero measure ν m {\displaystyle \nu _{m}} such that: K m ( x , A ) ≥ ν m ( A ) , ∀ x ∈ C , ∀ A ∈ B ( X ) . {\displaystyle K^{m}(x,A)\geq \nu _{m}(A),\quad \forall x\in C,\,\forall A\in {\mathcal {B}}({\mathcal {X}}).} === Harris recurrent === Definition (Harris recurrence) A set A {\displaystyle A} is Harris recurrent if P x ( η A = ∞ ) = 1 {\displaystyle P_{x}(\eta _{A}=\infty )=1} for all x ∈ A {\displaystyle x\in A} , where η A = ∑ n = 1 ∞ I A ( X n ) {\displaystyle \eta _{A}=\sum _{n=1}^{\infty }\mathbb {I} _{A}(X_{n})} is the number of visits of the chain ( X n ) {\displaystyle (X_{n})} to the set A {\displaystyle A} . The chain ( X n ) {\displaystyle (X_{n})} is said to be Harris recurrent if there exists a measure ψ {\displaystyle \psi } such that the chain is ψ {\displaystyle \psi } -irreducible and every measurable set A {\displaystyle A} with ψ ( A ) > 0 {\displaystyle \psi (A)>0} is Harris recurrent. A useful criterion for verifying Harris recurrence is the following: Proposition If for every A ∈ B ( X ) {\displaystyle A\in {\mathcal {B}}({\mathcal {X}})} , we have P x ( τ A < ∞ ) = 1 {\displaystyle P_{x}(\tau _{A}<\infty )=1} for every x ∈ A {\displaystyle x\in A} , then P x ( η A = ∞ ) = 1 {\displaystyle P_{x}(\eta _{A}=\infty )=1} for all x ∈ X {\displaystyle x\in {\mathcal {X}}} , and the chain ( X n ) {\displaystyle (X_{n})} is Harris recurrent. This definition is only needed when the state space X {\displaystyle {\mathcal {X}}} is uncountable. In the countable case, recurrence corresponds to E x [ η x ] = ∞ {\displaystyle \mathbb {E} _{x}[\eta _{x}]=\infty } , which is equivalent to P x ( τ x < ∞ ) = 1 {\displaystyle P_{x}(\tau _{x}<\infty )=1} for all x ∈ X {\displaystyle x\i
Read more →