Work of late has required quite a bit of time playing with SVG images, and consequently decrypting the descriptions of Bézier curves.
This in turn got me to thinking about approximating probability distributions over bounded intervals. If you don't know the true distribution, you may be able to make reasonable predictions by using a simplified distribution that shares the same characteristics.
The basic algorithm is fairly straight forward: each observable implies a constraint. In other words, if F is a distribution that delivers each of your observables, and G(z) is a function that satisfies all of your constraints, then F+G(z) is also going to be a distribution with the right properties. What's z? Z is some parameter that you can play with so that F+G(z) satisfies some new condition.
Typically, observables will have the form "L[F] = Y", and the constraints the form "L[G] = 0". If L is Linear, it follows that L[F+G]=Y.
Eventually, we end up with some function p(x) = [F+G1(z1)+G2(z2)+G3(z3)+...](x). In each case, Gn preserves our earlier constraints for all values z, and zN is a parameter that delivers the next observable in our set.
For example, lets suppose we want a distribution on [0,1], which vanishes at the endpoints. In other words p(0) = p(1) = 0. This simplest curve to satisfy this equation is a parabola, with roots at 0 and 1. In other words, p(x) = x(1-x). Furthermore, we are promised that any power series multiplied by this equation will also satisfy these conditions.
p(x) = x(1-x)[F(x)].
Now, as it happens the next two things I'm interested in are the slope of the curve leaving the origin, and the slope of the curve as it arrives at 1. We can call these slopes A and -B (expecting B to be positive, since we want the probability to be strictly greater than zero), and therefore we are looking for some way to use these as parameters to find our next iteration of p(x).
p(x) = x(1-x)[F{A,B}(x)]
The product rule tells us that [FG]' = F'G + FG'
p'(x)
= [x(1-x)F(x)]'
= [(x-x²)F(x)]'
= (1-2x)F(x) + (x-x²)F'(x)
= (1-2x)F(x) + x(1-x)F'(x)
The second term vanishes at the endpoints, so we can concentrate on what the first term tells us
p'(0) = F(0) = A
p'(1) = -F(1) = -B
So any curve at all that passes through (0,A) and (1,B) gets the job done. The simplest one is a line, so lets use that.
p(x) = x(1-x)[ A + (B-A)x ]
Rearranging the terms
p(x) = x(1-x)[ (1-x)A + xB ]
p(x) = x(1-x)² A + x²(1-x)B
You might see a pattern here: to make sure that a particular order derivative vanishes in a particular place, you just need the right number of factors of that root. Guessing that we'll have another constraint, you might suppose that we'll be looking next at
p(x) = x(1-x)² A + x²(1-x)B + x²(1-x)² C
and you would be right. There's another way to see this. If we go back...
p(x) = x(1-x)[ A + (B-A)x + C]
We don't want to change our earlier observables, so C should be zero at both roots. And we already know how to do that, by injecting the parabola
p(x) = x(1-x)[ A + (B-A)x + x(1-x)C ]
and then expanding.
p(x) = x(1-x)² A + x²(1-x)B + [x(1-x)]² C
What can we do with C? How about a local maximum at m? For starters, we'll treat C as a parameterized constant, and try to solve
p'(m) = [ ..... + (x(1-x))² C(m)]' = 0.
There's not much to it, so let's just sketch out the last term.
[ (x(1-x))² C(m)]' = 2(x(1-x))(1-2x) C(m)
Now just think of C(m) as a fraction: we want everything to wash out when x=m, if we make the denominator cancel the factor in front ( 2m(1-m)(1-2m) ), put a minus sign in the numerator, then just copy the derivative of the A and B terms (substituting m for x). Done.
We can go a notch further by adding to C an expression for x where the derivative at m is zero. It's quick to cheat by using our earlier rule that we simply multiply by the right power of (x-m), but there's a surprise hiding if we do the math completely.
We want the derivative of (x(1-x))²[C+G] to cancel out the derivative of the first part of the function. Now the derivative is linear, and we've already found a constant C that gets the job done, so we require that the derivative of (x(1-x))²G vanish. Let's go back to the product rule again: [FG]' = F'G + FG' = 0. From this we can see that there is a trivial solution when G and G' both vanish (this is our friend (x-m)²), but there's also a second solution available, when G and G' are perfectly balanced at point m.
-F(m) G(m) / F'(m) = G'(m)
Which is satisfied by
G(x) = exp[ -F(m) x / F'(m) ]
G(x) = exp[ -m(1-m) x /(2-4m) ]
Both of these solutions are just as valid if we multiply a constant, and so...
p(x) = x(1-x)² A + x²(1-x)B + (x(1-x))² [ C(m) + (x-m)²D(y) + exp[-m(1-m) x /(2-4m) ]E(z) ]
Where to go from here? Well, there's an excess of flexibility, as we have one clear constraint that we haven't addressed (we need ∫p = 1), but two parameters available. If I had no further observables to address, I would most likely discard the exponential and use D(y) to meet the last obligation. But if there are other constraints to meet, then I would probably use E(z) to deliver the correct integral, and expand D subject to the constraint that the integral vanish.
It seems to me that you ought to be able to go in any order you wish. For example, you could start by satisfying the probability distribution by using p(x) = 1, then start adding additional observables with the constraint that the integral vanishes.
Last piece of advice - don't trust the math blindly. Look at the plot for your distribution as a sanity check; if your model includes negative probability, you had better be able to explain why that's a good idea.
July 4, 2007 9:26 PM
| TrackBack