



Suppose you had a black box, and all you could measure from it are streams of data, which for the sake of simplicity, are streams of bits.
10100010010…
00101100100…
We would like to construct a theory that explains its behaviour. Indeed, this is essentially what science is about - the construction of theories to explain phenomena we observe.
Of course, there’s an infinite number of different constructions that explain output. We can explain the ways the stars move by saying individual angels moved them each day, but it wouldn’t be a good explanation since you’re left with the question of why the angels are moving them that way. Essentially, good explanations should compress information.
So, our basic goal is too
Construct the simplist machine that accurately captures the patterns within the data.
In this article, I’ll outline one possible approach, known as epsilon-machines, developed by C. Shalizi and J. Crutchfield.
Machines that Generate Strings - A Simple Example:
Essentially, every machine can be thought of as a physical system that can be in one of
possible physical states. The dynamics of the system can then be defined as transitions between different states. Since we want a system that can produce bitstrings, we’re going to add another condition. At every discrete timestep, the system is going to emit a bit, which is determined by what dynamics is happening at that timestep.
For example, one simple system as a two-state system, whose states we label
and
that oscillates between the
and
, emiting a 1 or a 0 depending on its current state. Such a system, when you run it, will clearly always give bitstrings of the form
..1010101010101…
What we want to do is the reverse. We have the bitstring above… and we want to reverse engineer the two state machine.
Any physical system can exists in a number of different states. Thus, the first step for us is to introduce bare minimum number of physical states that allow us to explain the output.
We begin with a finite state machine, where all the information about the future dynamics and hence output of the machine is determined by its current state. Any extra information would be redundant. Thus, the idea is to take a string of bits up to a certain point (the history), and ask what information is required to predict the next bit.
Therefore, we group the set of all possible output histories into equivalence classes, such that replacing one history with another history within the same class does not alter does not affect our prediction of its future evolution. We refer to these classes as Causal States
In our example, for instance, we observe that all strings alternate between 0 and 1. So, any string the ends with a 0 will have identical future evolution (the same also applies to 1). Thus, we can define a machine with two causal states

The next step is to model of dynamics. That is, at each discrete time step, our system jumps from one causal state, to another, and in process, emits a bit. We model these transitions by transition matrices, where
represents the probability of transitioning from
to
while emitting the bit
. Thus, this simple example, features two matrices,
and
, with values as listed in the figure above.
The Formalities:
With the intuition afforded by out example, we can define an epsilon machine more generally by the ordered pair
, where
is the collection of causal states, and
the collection of matrices the define how the states transform.
In out simple example, the string we analysed was deterministic, such that 0 always succeeded a 1 and vice versa. In general, e-machines describe stochastic processes. For example, the completely random string can be described by a trivial e-machine with a single causal state, with
.
Features of E-Machines:
Some useful properties of these machines I list without proof:
where
is the probablity it is in causal state
. This is easy to compute.
is minimal when compared to any other state machines that predicts the dynamics of the process.Note this this more general definition takes into account non-deterministic emissions… the measurement results can contain inherent randomness.
A Few Personal Queries:
Epsilon machines are motivated by Occam’s Razor, the idea the the best explanation of a given phenomena is the simplest explanation. In the case of Epsilon machines, their simplicity was judged on the basis of the entropy of the causal state distribution. Intuitively, this equates to the amount of information contained the knowledge of which causal state your epsilon machine is in.
However, is this an appropraite measure of complexity for Occam’s Razor? There exists many different epsilon machines with the same Causal State distribution, with different transition matrices and hence model different physical processes. Thus, a measure of the simplicity of the model should necessarily include the information required to describe these transition matrices.
Could an alternative construction, with a more complex state space, have a simply transition structure?
References:










More Options ...

Categories
Tag Cloud
Blog RSS
Comments RSS


Void « Default
Life
Earth
Wind
Water
Fire
Light 
4:17 am - March 12th, 2009
[...] Epsilon Machines: Formalising Occam’s Razor [...]
9:21 am - March 18th, 2009
[...] Comments Occam’s Mathematical Razor | MileGu.com on Epsilon Machines - Occam’s mathematical RazorMile Gu on Of Go (Weiqi), intuition, and Artificial IntelligenceBob Hearn on Of Go (Weiqi), [...]