My cousin Whitney has a long article about Eight Belles' injuries, including raising doubts about the amount of pain she was suffering, and raising the issue that we are far too quick to dismiss prosthetic treatment for horses with even catastrophic leg injuries.
On my Livejournal friends list, I have about forty people, at least three of whom keep pet rats. Our intuition says this is an unusually large number. If 3 out of 40 were typical, there'd be about 22 million people in the US with pet rats, and that just doesn't seem reasonable. So there must be something going on, not quite obvious, that ties rat owners together and makes them more likely to appear in the same, otherwise unrelated group, and maybe that same thing makes it more likely that these are the sorts of friends and acquaintances that'll appeal to me.
Or at least, that's what our brains tell us to figure out. Because we're human, we are hard-wired to detect correlations and seek out causations; it's how you survive in a world of unpredictable weather and unreliable food supplies (plus occasional leopards).
But, as a consequence of always looking for correlation-causations, our intuition is actually really bad at detecting when there's not a correlation, when it's just random chance.
This is where mathematics comes in. Math is just one of the ways we have figured out to use our really big brains' general processing capacity to route around the things that confuse our specific processing modules.
It turns out that there's a very direct, very simple bit of math that's aimed squarely at the question of how likely it is that three people out of forty will keep pet rats, and that's the Poisson distribution, the discovery of Siméon-Denis Poisson.
The Poisson distribution connects four pieces of data. First, there's the sample space. That might be a duration, or a physical area or volume, or it might be the number of things in a set. It's basically a measurement of how much room there is for whatever you're interested in to happen; the more room, the more stuff you expect to happen. In our case, the sample space is forty people. Call the sample space N.
Second, there's the probability per unit of sample space that something will happen. For example, there's about one chance in 2×10^17 that an atom of uranium-238 will decay in any given second. There's about one chance in two that a given person will be female. In our case, we're not sure what the probability that someone will keep pet rats is, but we think it's pretty small. It's one of the things we're trying to figure out. Call the probability density p.
Third, there's the number of times we actually see whatever it is we're interested in happening in our sample space. In the 24 hours after this post is made public, the mighty mighty Whiterose webserver gets 22 requests for it. After two years of running, the Fermilab collider sees six events with top quarks in them. In our case, we see three people who keep pet rats. Call the number of events k.
Fourth, there's the probability that, given the sample space and the unit probability, we actually see the number of events we see. Call the event probability f.
I won't reproduce the actual Poisson equation here; it's the first one on the Wikipedia page, with f and k as I've defined them, and λ as N*p. λ is the number of events we expect to see, as distinct from the number of events we actually see.
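For anyone who'd rather not go look it up, here is a minimal sketch of that equation in Python, f(k;λ) = λ^k e^(-λ) / k!. The function names `poisson_pmf` and `expected_events` are my own labels for illustration, not anything standard.

```python
import math


def expected_events(n: float, p: float) -> float:
    """lambda = N * p: the number of events we expect to see."""
    return n * p


def poisson_pmf(k: int, lam: float) -> float:
    """f(k; lambda): probability of seeing exactly k events when lam are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


if __name__ == "__main__":
    lam = expected_events(40, 0.05)  # forty friends, a guessed 5% rat-keeping rate
    print(f"lambda = {lam:.2f}, f(3; lambda) = {poisson_pmf(3, lam):.4f}")
```

Only the standard library is used here; a statistics package would do the same job, but the formula is short enough to write out by hand.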
[It is probable that a statistician has a more direct way of doing what I'm about to do, but I am a physicist, and once I know something, like how to solve the hydrogen atom, I'm going to use it until I'm absolutely certain it doesn't work any more. So we grind numbers into the Poisson equation.]
We can use the Poisson equation to try and evaluate how likely it is that I have three rat-keepers on my friends list out of pure random chance. First things first: The Poisson distribution is pretty fat and squishy, especially when k is small, so that f(k-1;λ) and f(k+1;λ) are not likely to be much larger or smaller than f(k;λ). (For those of you familiar with standard deviations, the σ of k is sqrt(λ). If you expect to see four events, then you're about 80% likely to actually see between two and six events.)
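The squishiness is easy to check numerically. The sketch below, with names of my own choosing, takes the expect-four-events case, compares the probabilities of neighbouring counts, and adds up the chance of landing anywhere from two to six events inclusive.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when lam are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 4.0  # we expect to see four events

# Neighbouring counts near the expected value have similar probabilities.
neighbours = {k: poisson_pmf(k, lam) for k in (3, 4, 5)}

# Chance of actually seeing between two and six events, inclusive.
window = sum(poisson_pmf(k, lam) for k in range(2, 7))

# Standard deviation of a Poisson distribution is sqrt(lambda).
sigma = math.sqrt(lam)

for k, f in neighbours.items():
    print(f"f({k}; {lam}) = {f:.4f}")
print(f"P(2 <= k <= 6) = {window:.4f}, sigma = {sigma:.1f}")
```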
Recall that N=40 and k=3. Let's set p to 5%: we're guessing that 1 person in 20 has pet rats, so λ = N*p = 2. When we plug those numbers in, we get f=18%. It's about 18% likely that I'd see three rat-keepers out of 40 if 1 in 20 people are rat-keepers. That's a number that's perfectly consistent with random chance.
But 5% is an awfully high fraction of rat-fanciers. If we use half that, 2.5%, we find that it's only 6% likely that we'd get three hits. 2% rat-fanciers makes it 4% likely to get three out of forty.
Let's take it down to 1%. If one in a hundred people keeps pet rats, then it's only 0.7% likely that there would be three people who keep pet rats out of the forty people on my friends list. That's not very likely at all. At 1 in 100, we'd expect to find no one at all who keeps rats on my friends list 67% of the time.
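The number-grinding above can be reproduced in a few lines. This is a sketch under the same assumptions as the text (N=40, k=3, and the four guessed values of p); the variable names are mine.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when lam are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


N = 40  # people on the friends list
k = 3   # rat-keepers actually observed

# Guessed fractions of the population that keep pet rats.
guesses = [0.05, 0.025, 0.02, 0.01]

# Probability of exactly three rat-keepers for each guessed fraction.
results = {p: poisson_pmf(k, N * p) for p in guesses}

# Probability of no rat-keepers at all if 1 in 100 people keeps rats.
p_none = poisson_pmf(0, N * 0.01)

for p, f in results.items():
    print(f"p = {p:.1%}: lambda = {N * p:.1f}, f(3) = {f:.2%}")
print(f"p = 1.0%: f(0) = {p_none:.0%}")
```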
So here's what math is telling us: Either the number of rat-fanciers is much larger than you'd expect, somewhere in the one-in-twenty to one-in-forty range, or the rat-fanciers on my friends list are not there by random chance, but by some sort of real correlation/causation effect as yet to be determined.
According to petrats.org, the actual number of rat-fanciers in America is around the 1 in 200 or 1 in 300 range. At those rates, the same calculation says it's only about 0.1% likely, or less, that I'd see three rat-keepers out of forty. So we can be very confident that, in fact, random chance alone does not explain my knowing three rat-fanciers well enough to read their LJs.
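To put a number on that confidence, here is the same Poisson calculation run at the 1 in 200 and 1 in 300 rates. As a cross-check that goes slightly beyond what I did above, the sketch also computes the chance of three or more rat-keepers, which is arguably the fairer question. The names are mine and purely illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when lam are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_at_least(k: int, lam: float) -> float:
    """Probability of seeing k or more events when lam are expected."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


N = 40  # people on the friends list

exactly_three = {}
three_or_more = {}
for one_in in (200, 300):
    lam = N / one_in  # expected number of rat-keepers among forty friends
    exactly_three[one_in] = poisson_pmf(3, lam)
    three_or_more[one_in] = poisson_at_least(3, lam)
    print(
        f"1 in {one_in}: f(3) = {exactly_three[one_in]:.3%}, "
        f"P(k >= 3) = {three_or_more[one_in]:.3%}"
    )
```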
Math: It works.