A Markov chain
is a stochastic model
describing a sequence
of possible events in which the probability of each event depends only on the state attained in the previous event. In continuous-time
, it is known as a Markov process.
It is named after the Russia
n mathematician Andrey Markov
Markov chains have many applications as statistical model
s of real-world processes, such as studying cruise control system
s in motor vehicle
s, queues or lines of customers arriving at an airport, currency exchange rate
s and animal population dynamics.
Markov processes are the basis for general stochastic simulation methods known as Markov chain Monte Carlo
, which are used for simulating sampling from complex probability distributions, and have found application in Bayesian statistics
and artificial intelligence
The adjective Markovian
is used to describe something that is related to a Markov process.
A Markov process is a stochastic process
that satisfies the Markov property
(sometimes characterized as "memorylessness
"). In simpler terms, it is a process for which predictions can be made regarding future outcomes based solely on its present state and—most importantly—such predictions are just as good as the ones that could be made knowing the process's full history. In other words, conditional
on the present state of the system, its future and past states are independent
A Markov chain is a type of Markov process that has either a discrete state space
or a discrete index set (often representing time), but the precise definition of a Markov chain varies. For example, it is common to define a Markov chain as a Markov process in either discrete or continuous time
with a countable state space (thus regardless of the nature of time), but it is also common to define a Markov chain as having discrete time in either countable or continuous state space (thus regardless of the state space).
Types of Markov chains
The system's state space
and time parameter index need to be specified. The following table gives an overview of the different instances of Markov processes for different levels of state space generality and for discrete time v. continuous time
Note that there is no definitive agreement in the literature on the use of some of the terms that signify special cases of Markov processes. Usually the term "Markov chain" is reserved for a process with a discrete set of times, that is, a discrete-time Markov chain (DTMC)
, but a few authors use the term "Markov process" to refer to a continuous-time Markov chain (CTMC)
without explicit mention. In addition, there are other extensions of Markov processes that are referred to as such but do not necessarily fall within any of these four categories (see Markov model
). Moreover, the time index need not necessarily be real-valued; like with the state space, there are conceivable processes that move through index sets with other mathematical constructs. Notice that the general state space continuous-time Markov chain is general to such a degree that it has no designated term.
While the time parameter is usually discrete, the state space
of a Markov chain does not have any generally agreed-on restrictions: the term may refer to a process on an arbitrary state space. However, many applications of Markov chains employ finite or countably infinite
state spaces, which have a more straightforward statistical analysis. Besides time-index and state-space parameters, there are many other variations, extensions and generalizations (see Variations
). For simplicity, most of this article concentrates on the discrete-time, discrete state-space case, unless mentioned otherwise.
The changes of state of the system are called transitions. The probabilities associated with various state changes are called transition probabilities. The process is characterized by a state space, a transition matrix
describing the probabilities of particular transitions, and an initial state (or initial distribution) across the state space. By convention, we assume all possible states and transitions have been included in the definition of the process, so there is always a next state, and the process does not terminate.
A discrete-time random process involves a system which is in a certain state at each step, with the state changing randomly between steps. The steps are often thought of as moments in time, but they can equally well refer to physical distance or any other discrete measurement. Formally, the steps are the integers
or natural numbers
, and the random process is a mapping of these to states. The Markov property states that the conditional probability distribution
for the system at the next step (and in fact at all future steps) depends only on the current state of the system, and not additionally on the state of the system at previous steps.
Since the system changes randomly, it is generally impossible to predict with certainty the state of a Markov chain at a given point in the future. However, the statistical properties of the system's future can be predicted. In many applications, it is these statistical properties that are important.
Markov studied Markov processes in the early 20th century, publishing his first paper on the topic in 1906. Markov processes in continuous time were discovered long before Andrey Markov
's work in the early 20th century in the form of the Poisson process
. Markov was interested in studying an extension of independent random sequences, motivated by a disagreement with Pavel Nekrasov
who claimed independence was necessary for the weak law of large numbers
to hold. In his first paper on Markov chains, published in 1906, Markov showed that under certain conditions the average outcomes of the Markov chain would converge to a fixed vector of values, so proving a weak law of large numbers without the independence assumption, which had been commonly regarded as a requirement for such mathematical laws to hold. Markov later used Markov chains to study the distribution of vowels in Eugene Onegin
, written by Alexander Pushkin
, and proved a central limit theorem
for such chains.
In 1912 Henri Poincaré
studied Markov chains on finite group
s with an aim to study card shuffling. Other early uses of Markov chains include a diffusion model, introduced by Paul
and Tatyana Ehrenfest
in 1907, and a branching process, introduced by Francis Galton
and Henry William Watson
in 1873, preceding the work of Markov. After the work of Galton and Watson, it was later revealed that their branching process had been independently discovered and studied around three decades earlier by Irénée-Jules Bienaymé
. Starting in 1928, Maurice Fréchet
became interested in Markov chains, eventually resulting in him publishing in 1938 a detailed study on Markov chains.
developed in a 1931 paper a large part of the early theory of continuous-time Markov processes. Kolmogorov was partly inspired by Louis Bachelier's 1900 work on fluctuations in the stock market as well as Norbert Wiener
's work on Einstein's model of Brownian movement. He introduced and studied a particular set of Markov processes known as diffusion processes, where he derived a set of differential equations describing the processes. Independent of Kolmogorov's work, Sydney Chapman
derived in a 1928 paper an equation, now called the Chapman–Kolmogorov equation
, in a less mathematically rigorous way than Kolmogorov, while studying Brownian movement. The differential equations are now called the Kolmogorov equations or the Kolmogorov–Chapman equations. Other mathematicians who contributed significantly to the foundations of Markov processes include William Feller
, starting in 1930s, and then later Eugene Dynkin
, starting in the 1950s.
s based on integers and the gambler's ruin
problem are examples of Markov processes. Some variations of these processes were studied hundreds of years earlier in the context of independent variables. Two important examples of Markov processes are the Wiener process
, also known as the Brownian motion
process, and the Poisson process
, which are considered the most important and central stochastic processes in the theory of stochastic processes. These two processes are Markov processes in continuous time, while random walks on the integers and the gambler's ruin problem are examples of Markov processes in discrete time.
A famous Markov chain is the so-called "drunkard's walk", a random walk on the number line
where, at each step, the position may change by +1 or −1 with equal probability. From any position there are two possible transitions, to the next or previous integer. The transition probabilities depend only on the current position, not on the manner in which the position was reached. For example, the transition probabilities from 5 to 4 and 5 to 6 are both 0.5, and all other transition probabilities from 5 are 0. These probabilities are independent of whether the system was previously in 4 or 6.
Another example is the dietary habits of a creature who eats only grapes, cheese, or lettuce, and whose dietary habits conform to the following rules:
It eats exactly once a day.
If it ate cheese today, tomorrow it will eat lettuce or grapes with equal probability.
If it ate grapes today, tomorrow it will eat grapes with probability 1/10, cheese with probability 4/10, and lettuce with probability 5/10.
If it ate lettuce today, tomorrow it will eat grapes with probability 4/10 or cheese with probability 6/10. It will not eat lettuce again tomorrow.
This creature's eating habits can be modeled with a Markov chain since its choice tomorrow depends solely on what it ate today, not what it ate yesterday or any other time in the past. One statistical property that could be calculated is the expected percentage, over a long period, of the days on which the creature will eat grapes.
A series of independent events (for example, a series of coin flips) satisfies the formal definition of a Markov chain. However, the theory is usually applied only when the probability distribution of the next step depends non-trivially on the current state.
A non-Markov example
Suppose that there is a coin purse containing five quarters (each worth 25¢), five dimes (each worth 10¢), and five nickels (each worth 5¢), and one by one, coins are randomly drawn from the purse and are set on a table. If represents the total value of the coins set on the table after draws, with , then the sequence is not a Markov process.
To see why this is the case, suppose that in the first six draws, all five nickels and a quarter are drawn. Thus . If we know not just , but the earlier values as well, then we can determine which coins have been drawn, and we know that the next coin will not be a nickel; so we can determine that with probability 1. But if we do not know the earlier values, then based only on the value we might guess that we had drawn four dimes and two nickels, in which case it would certainly be possible to draw another nickel next. Thus, our guesses about are impacted by our knowledge of values prior to .
However, it is possible to model this scenario as a Markov process. Instead of defining to represent the total value of the coins on the table, we could define to represent the count of the various coin types on the table. For instance, could be defined to represent the state where there is one quarter, zero dimes, and five nickels on the table after 6 one-by-one draws. This new model would be represented by 216 possible states (that is, 6x6x6 states, since each of the three coin types could have zero to five coins on the table by the end of the 6 draws). Suppose that the first draw results in state . The probability of achieving now depends on ; for example, the state is not possible. After the second draw, the third draw depends on which coins have so far been drawn, but no longer only on the coins that were drawn for the first state (since probabilistically important information has since been added to the scenario). In this way, the likelihood of the state depends exclusively on the outcome of the state.
Discrete-time Markov chain
A discrete-time Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states:
: if both conditional probabilities are well defined, that is, if
The possible values of Xi form a countable set S called the state space of the chain.
Markov chains are often described by a sequence of directed graphs, where the edges of graph n are labeled by the probabilities of going from one state at time n to the other states at time n + 1, The same information is represented by the transition matrix from time n to time n + 1. However, Markov chains are frequently assumed to be time-homogeneous (see variations below), in which case the graph and matrix are independent of n and are thus not presented as sequences.
These descriptions highlight the structure of the Markov chain that is independent of the initial distribution When time-homogeneous, the chain can be interpreted as a state machine assigning a probability of hopping from each vertex or state to an adjacent one. The probability of the machine's state can be analyzed as the statistical behavior of the machine with an element of the state space as input, or as the behavior of the machine with the initial distribution of states as input, where is the Iverson bracket.
The fact that some sequences of states might have zero probability of occurring corresponds to a graph with multiple connected components, where we omit edges that would carry a zero transition probability. For example, if a has a nonzero probability of going to b, but a and x lie in different connected components of the graph, then is defined, while is not.
Time-homogeneous Markov chains (or stationary Markov chains) are processes where
:for all n. The probability of the transition is independent of n.
A Markov chain with memory (or a Markov chain of order m)
:where m is finite, is a process satisfying
:In other words, the future state depends on the past m states. It is possible to construct a chain from which has the 'classical' Markov property by taking as state space the ordered m-tuples of X values, ie. .
Continuous-time Markov chain
A continuous-time Markov chain (Xt)t ≥ 0 is defined by a finite or countable state space S, a transition rate matrix Q with dimensions equal to that of the state space and initial probability distribution defined on the state space. For i ≠ j, the elements qij are non-negative and describe the rate of the process transitions from state i to state j. The elements qii are chosen such that each row of the transition rate matrix sums to zero, while the row-sums of a probability transition matrix in a (discrete) Markov chain are all equal to one.
There are three equivalent definitions of the process.
Let be the random variable describing the state of the process at time t, and assume the process is in a state i at time t.
Then, knowing , is independent of previous values