
Benford's Law

Benford's Law - in many naturally occurring collections of data, first digits are not uniformly distributed but follow a logarithmic distribution, with 1 being the most likely.
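For reference, the logarithmic distribution in question gives leading digit d (1 to 9) a probability of log10(1 + 1/d). A minimal Python sketch that tabulates it (the function name `benford_probs` is just an illustrative choice):

```python
import math

def benford_probs(base=10):
    """Benford probability of each leading digit d = 1 .. base-1, i.e. log_base(1 + 1/d)."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

for digit, prob in benford_probs().items():
    print(f"{digit}: {prob:.3f}")
```

The printed values fall from about 0.301 for 1 to about 0.046 for 9, and they sum to 1 because the terms telescope to log10(10).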

Bizarre realization #1: this does not depend on the base used to represent the data.
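One way to see the base independence empirically: build numbers by multiplying many random factors (an assumed stand-in for "real" data), then tally the leading digit in base 10 and in base 8. In each base the frequencies should land close to log_b(1 + 1/d). A rough Python sketch; the sample size, number of factors, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random

def leading_digit(x, base):
    """Leading (most significant) digit of a positive float x in the given base."""
    exponent = math.floor(math.log(x, base))
    d = int(x / base ** exponent)
    # Floating-point rounding right at a power of the base can push d out of range.
    if d >= base:
        return 1
    if d < 1:
        return base - 1
    return d

def leading_digit_freqs(samples, base):
    """Observed frequency of each leading digit 1 .. base-1."""
    counts = [0] * base
    for x in samples:
        counts[leading_digit(x, base)] += 1
    return {d: counts[d] / len(samples) for d in range(1, base)}

def benford(base):
    """Expected frequency log_base(1 + 1/d) for each leading digit."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

rng = random.Random(2003)
# Each sample is a product of 20 random factors in (0, 1]; such products spread
# over many orders of magnitude, which is the situation Benford's Law describes.
samples = [math.prod(1 - rng.random() for _ in range(20)) for _ in range(50_000)]

for base in (10, 8):
    observed = leading_digit_freqs(samples, base)
    expected = benford(base)
    worst = max(abs(observed[d] - expected[d]) for d in expected)
    print(f"base {base}: largest deviation from log_b(1 + 1/d) = {worst:.4f}")
```

The same samples are used for both bases; only the way the digits are read off changes.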

June 7, 2003 11:10 AM

Comments

Danil, although I never thought about it before, I can convince myself that Benford's Law is reasonable because we count in a monotonically increasing direction.

The simplest case to illustrate uses a single-digit positive number from 1 to 9; call it N. Pick N at random, so each value has a 1 in 9 chance. N represents the largest result we can get. You then draw a sample uniformly at random from 1 to N and record it. Repeat this many times and look at the frequencies.

You will find that 1 comes up most often, because whatever N is, a 1 is always possible, whereas a result larger than N is not, since N was defined as the largest result for that round. A digit d can only turn up in the rounds where N is at least d.
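The experiment above is easy to simulate. In this Python sketch (the function names, trial count, and seed are arbitrary), N is drawn uniformly from 1 to 9, then a result uniformly from 1 to N, and the tally is compared with the exact probability (1/9) times the sum of 1/N for N = k to 9. Small numbers do win, though the exact curve (about 0.314 for 1) is not the same as Benford's log10(1 + 1/d).

```python
import random
from fractions import Fraction

def exact_prob(k, n=9):
    """P(result == k) when N ~ Uniform{1..n} and then result ~ Uniform{1..N}."""
    return sum(Fraction(1, n) * Fraction(1, big_n) for big_n in range(k, n + 1))

def simulate(trials=200_000, n=9, seed=7):
    """Run the two-stage draw many times and return the frequency of each result."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        big_n = rng.randint(1, n)          # the largest result allowed this round
        counts[rng.randint(1, big_n)] += 1  # uniform draw from 1..big_n
    return {k: counts[k] / trials for k in range(1, n + 1)}

observed = simulate()
for k in range(1, 10):
    print(f"{k}: simulated {observed[k]:.3f}, exact {float(exact_prob(k)):.3f}")
```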

Another way to rationalize Benford's Law is: if you are randomly deciding whether or not to keep increasing something, you will eventually stop. So the odds of ending with a small total are much higher than of ending with a large one, because reaching a larger total means passing up more opportunities to stop along the way.
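The second argument describes a counter that starts at 1 and, at each step, keeps going with some probability. A minimal sketch, with a continue probability of 0.5 chosen arbitrarily, shows why small totals dominate: the final total is geometrically distributed, P(total = k) = (1 - p) p^(k - 1).

```python
import random
from collections import Counter

def run_counter(rng, p_continue):
    """Start at 1; keep adding 1 while a biased coin says 'continue'."""
    total = 1
    while rng.random() < p_continue:
        total += 1
    return total

def geometric_pmf(k, p_continue):
    # Ending at total k means continuing k - 1 times, then stopping once.
    return (1 - p_continue) * p_continue ** (k - 1)

p = 0.5
rng = random.Random(42)
trials = 100_000
counts = Counter(run_counter(rng, p) for _ in range(trials))
for k in range(1, 6):
    print(f"total {k}: simulated {counts[k] / trials:.3f}, exact {geometric_pmf(k, p):.3f}")
```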

Do either of these explanations feel plausible?


Cheers,
--binkley

Comment by: B. K. Oxley (binkley) June 7, 2003

No, but don't let that stop you.

I don't think monotonic counting has anything to do with it - if it did, we would see the effect in uniform distributions. We don't. We would also see the effect in the least significant digit. We don't.
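The point about uniform data can be checked by exhaustive counting rather than sampling. Over every integer from 1 to 999,999 (a uniform range chosen for illustration), each leading digit 1 to 9 occurs exactly 111,111 times, and each trailing digit occurs about 100,000 times, so there is no Benford skew at either end. A short Python sketch; `digit_counts` is an illustrative helper name.

```python
from collections import Counter

def digit_counts(numbers):
    """Tally the leading and trailing decimal digits of positive integers."""
    first = Counter(str(n)[0] for n in numbers)
    last = Counter(str(n)[-1] for n in numbers)
    return first, last

first, last = digit_counts(range(1, 1_000_000))
print("leading: ", sorted(first.items()))
print("trailing:", sorted(last.items()))
```

Trailing 0 comes up 99,999 times rather than 100,000 only because the range starts at 1.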

Furthermore, although the phenomenon appears regardless of base, it does not appear regardless of representation. Write the numbers out in Roman numerals, and you'll quickly see what I mean.

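The key to the puzzle is that our system of numeration is exponential in nature: each additional digit position multiplies the value by the base, so the leading digit really reflects the logarithm of the number.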
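The step from exponential numeration to the actual percentages can be written out. This assumes, as explanations of the law usually do, that the fractional part of log10 x is spread uniformly over [0, 1) for the data in question:

```latex
\begin{aligned}
x &= m \cdot 10^{k}, \qquad 1 \le m < 10,\; k \in \mathbb{Z} \\
\text{first digit of } x = d &\iff d \le m < d + 1
  \iff \log_{10} d \le \log_{10} m < \log_{10}(d + 1) \\
P(d) &= \log_{10}(d + 1) - \log_{10} d = \log_{10}\!\left(1 + \frac{1}{d}\right)
\end{aligned}
```

Here log10 m is exactly the fractional part of log10 x. Replacing 10 with any base b gives log_b(1 + 1/d), which is why the effect shows up in every positional base.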

As for your probability puzzle,
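p(k,n) = 1/n * ( H[n] - H[k-1] )
where p(k,n) is the probability of ending up with k when the maximum N is drawn uniformly from 1 to n, and H[n] = 1 + 1/2 + ... + 1/n is the nth harmonic number (the nth partial sum of the Harmonic Series), with H[0] = 0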

Some interesting bits here.
p(n,n) = 1/n^2 trivially.

p(n-1,n) = 1/n (1/(n-1) + 1/n)
= 1/n ((2n-1)/(n(n-1)))
= 1/n^2 (2n-1)/(n-1)
= 1/n^2 (2 + 1/(n-1))

p(n-1,n)/p(n,n) = 2 + 1/(n-1)

The tail of this curve starts out at a 3:1 ratio (at n = 2), descending to a 2:1 ratio in the limit. Similarly
p(n-2,n)/p(n,n) = 3 + (3n-4)/((n-2)(n-1))

Hah! 3:1. Any guesses how the rest of the terms go?
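These closed forms can be checked with exact rational arithmetic. This Python sketch (function names are illustrative) builds p(k, n) from the harmonic numbers and compares it against the ratio formulas for a range of n.

```python
from fractions import Fraction

def harmonic(n):
    """H[n] = 1 + 1/2 + ... + 1/n, with H[0] = 0."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))

def p(k, n):
    """P(result == k) when N ~ Uniform{1..n}, then result ~ Uniform{1..N}."""
    return Fraction(1, n) * (harmonic(n) - harmonic(k - 1))

for n in range(3, 30):
    assert sum(p(k, n) for k in range(1, n + 1)) == 1
    assert p(n, n) == Fraction(1, n * n)
    assert p(n - 1, n) / p(n, n) == 2 + Fraction(1, n - 1)
    assert p(n - 2, n) / p(n, n) == 3 + Fraction(3 * n - 4, (n - 2) * (n - 1))

# The ratios approach 2:1 and 3:1 as n grows.
print(float(p(99, 100) / p(100, 100)), float(p(98, 100) / p(100, 100)))
```

At n = 100 the two ratios are already about 2.01 and 3.03.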

Comment by: Danil June 7, 2003

See also Power Law

Comment by: Danil June 9, 2003

Danil,

See the example below from The New York Times, referring to the Dow Jones index. This explains why Benford's law doesn't work for something like a lottery, where the balls are numbered but could just as well be labelled with anything (e.g. girls' names), yet does work for many real sets of numbers, e.g. tax returns. The law is P(d) = log10(1 + 1/d), which gives 1 a 30% likelihood, whereas 9 has only a 4.6% likelihood.


To illustrate Benford's Law, Dr. Mark J. Nigrini offered this example:

"If we think of the Dow Jones stock average as 1,000, our first digit would be 1.

"To get to a Dow Jones average with a first digit of 2, the average must increase to 2,000, and getting from 1,000 to 2,000 is a 100 percent increase.

"Let's say that the Dow goes up at a rate of about 20 percent a year. That means that it would take five years to get from 1 to 2 as a first digit.

"But suppose we start with a first digit 5. It only requires a 20 percent increase to get from 5,000 to 6,000, and that is achieved in one year.

"When the Dow reaches 9,000, it takes only an 11 percent increase and just seven months to reach the 10,000 mark, which starts with the number 1. At that point you start over with the first digit a 1, once again. Once again, you must double the number -- 10,000 -- to 20,000 before reaching 2 as the first digit.

"As you can see, the number 1 predominates at every step of the progression, as it does in logarithmic sequences."

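Nigrini's argument can be turned into a small calculation. Assuming steady compound growth of 20 percent a year (the illustrative rate in the quote), the time to move from leading digit d to d + 1 is ln((d + 1)/d) / ln(1.2) years, and the share of each 1,000-to-10,000 cycle spent at leading digit d works out to exactly log10(1 + 1/d). A Python sketch; `GROWTH` and `years_at_digit` are illustrative names.

```python
import math

GROWTH = 0.20  # assumed annual growth rate, taken from the quoted example

def years_at_digit(d, growth=GROWTH):
    """Years for the index to grow from d x 10^k to (d + 1) x 10^k."""
    return math.log((d + 1) / d) / math.log(1 + growth)

cycle = sum(years_at_digit(d) for d in range(1, 10))  # from 1,000 to 10,000
for d in range(1, 10):
    years = years_at_digit(d)
    print(f"{d}: {years:5.2f} years, {years / cycle:.3f} of the cycle")
```

With compounding, the 1,000 to 2,000 step takes about 3.8 years rather than five; the 5,000 to 6,000 step takes one year and the 9,000 to 10,000 step about seven months, as in the quote.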
Comment by: Andy October 7, 2003