Hypergeometric Distribution

Much probability theory is based on the notion of a "Bernoulli trial" -- an event that results in a "success" or a "failure." When tossing a coin, for example, you can consider "heads" a success and "tails" a failure, each occurring with a probability of 1/2. To calculate the probability of a sequence of coin tosses -- say, the chance of getting two heads in a row -- you need only multiply the probabilities of the individual events. For two heads in a row, the probability would be 1/2 x 1/2, or 1/4. Odds for the Daily 3 game can be calculated in this way; the general formula is called the "binomial distribution."
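
To make the arithmetic concrete, here is a small Python sketch (ours, not from the original article) that multiplies the independent probabilities and evaluates the general binomial formula; the function name binomial_pmf is just an illustrative choice.

    from math import comb

    # Two heads in a row: independent events, so the probabilities multiply.
    p_heads = 0.5
    print(p_heads * p_heads)   # 0.25

    # Binomial distribution: probability of exactly k successes in n
    # independent trials, each succeeding with probability p.
    def binomial_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binomial_pmf(2, 2, 0.5))   # 0.25, the same two-heads answer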

However, for lotto games such as Powerball and Gopher 5, the binomial distribution does not apply. When tossing coins, the result of one toss does not affect the result of the next; the events are said to be "independent." In a lotto game, by contrast, the selection of successive balls is not independent because the balls are not put back into the machine. For Powerball, the probability of the first ball being a 1 is 1 in 59. If a 1 is selected, the probability of a 1 on the second ball is 0. If a 1 is not selected on the first ball, the probability of selecting it on the second ball becomes 1 in 58, because there is one fewer ball in the machine. This is referred to as "sampling without replacement."
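
As a quick illustration of sampling without replacement (our own sketch, not part of the original article), the simulation below draws 5 of the 59 numbered balls many times and checks how often a particular number turns up; the exact answer is 5/59, slightly higher than what you'd get if the five draws were treated as independent.

    import random

    TRIALS = 200_000
    hits = 0
    for _ in range(TRIALS):
        draw = random.sample(range(1, 60), 5)   # 5 balls out of 59, no replacement
        if 1 in draw:                           # did our number come up?
            hits += 1

    print("simulated, without replacement:", hits / TRIALS)
    print("exact, without replacement:    ", 5 / 59)              # about 0.0847
    print("if draws were independent:     ", 1 - (58 / 59) ** 5)  # about 0.0818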

Assume we have F objects from which to choose. (For Powerball, this would be the 59 white balls.) Of these F, M are successes (numbers that match the ones on your ticket, or 5 in the case of Powerball) and F-M are failures (for Powerball, the 54 balls in the machine that don't match a number on your ticket). Next, we conduct n Bernoulli trials -- we draw n balls (5 for Powerball) from the machine. What we need to know is the probability of getting p successes and n-p failures in those n trials. To match all five white balls, for example, you calculate the probability of 5 successes and 0 failures in 5 trials. The general formula for the probability is (Equation 1):

    P = (ways of getting p successes) x (ways of getting n-p failures) / (total ways of drawing n objects from F)

Each of these terms boils down to counting. To calculate the denominator, begin by realizing that there are F ways of selecting the first object, F-1 ways of selecting the second object, and so on down to F-n+1 ways of selecting the nth object. The total number of ways of making this selection, therefore, is F(F-1)(F-2)...(F-n+1). However, in our case the order of selection does not matter -- (1,2,3,4,5) is the same as (5,4,3,2,1). We need to adjust for the number of combinations that are identical save for order.

To see this, imagine a drawing of two balls from a set of three. There are three ways of picking the first ball and two ways of picking the second, for a total of six outcomes: (1,2), (1,3), (2,1), (2,3), (3,1), and (3,2). However, only three of these are distinct: (1,2), (1,3), and (2,3) -- the others are merely reorderings. In general, if we're picking n objects, there will be (n)(n-1)(n-2)...(2)(1) ways of arranging each unique combination, so we need to divide our first calculation by this term. There's a mathematical shorthand for this, called a "binomial coefficient." Using a binomial coefficient, the total number of ways of selecting n objects from a set of F is:

    F! / (n! x (F-n)!)

where F! is shorthand for (F) x (F-1) x (F-2) x ... x 3 x 2 x 1.
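
The counting argument can be checked by brute force. This short Python sketch (ours; the helper n_choose is just an illustrative name) lists the ordered and unordered selections of two balls from three and compares the counts with F!/(n! x (F-n)!):

    from itertools import combinations, permutations
    from math import factorial

    balls = [1, 2, 3]
    print(list(permutations(balls, 2)))    # 6 ordered selections
    print(list(combinations(balls, 2)))    # 3 distinct combinations: (1,2), (1,3), (2,3)

    def n_choose(F, n):
        # F! / (n! x (F-n)!) -- the binomial coefficient
        return factorial(F) // (factorial(n) * factorial(F - n))

    print(n_choose(3, 2))    # 3, matching the enumeration
    print(n_choose(59, 5))   # 5,006,386 ways of drawing 5 white balls from 59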

Binomial coefficients are notated as:

    C(F, n)   (read "F choose n")

Using the same reasoning, you'll find that the number of ways of getting p successes is

    C(M, p)

and the number of ways of getting n-p failures is

    C(F-M, n-p)

Substituting these terms into Equation 1, our general formula for the probability of p successes in n trials is

    P = C(M, p) x C(F-M, n-p) / C(F, n)

This is referred to as the hypergeometric distribution. It can easily be calculated by using the HYPGEOMDIST function in Microsoft Excel.
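
Outside Excel, the same quantity is easy to compute directly from the formula above. Here is a minimal Python version (our sketch; the name hypergeom_pmf is not a standard library function, though SciPy offers an equivalent in scipy.stats.hypergeom):

    from math import comb

    def hypergeom_pmf(p, n, M, F):
        """Probability of p successes in n draws, without replacement,
        from F objects of which M are successes."""
        return comb(M, p) * comb(F - M, n - p) / comb(F, n)

    # Sanity check: the probabilities of all possible outcomes sum to 1.
    print(sum(hypergeom_pmf(p, 5, 5, 59) for p in range(6)))   # 1.0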

We can apply this formula to Powerball. Suppose we want to find the probability of matching exactly three of the five white balls. In this case, F = 59 (number of white balls), M = 5 (number of balls that match a number on our ticket), n = 5 (number of balls drawn), and p = 3. Placing these numbers in the formula, we get

    P = C(5, 3) x C(54, 2) / C(59, 5) = (10 x 1,431) / 5,006,386 = 14,310 / 5,006,386

or 1 in 349.9.
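
For completeness, the same calculation can be run for every possible number of white-ball matches (again just a sketch, using the values F = 59, M = 5, n = 5 from above):

    from math import comb

    F, M, n = 59, 5, 5   # white balls in the machine, matches on the ticket, balls drawn
    for p in range(n + 1):
        prob = comb(M, p) * comb(F - M, n - p) / comb(F, n)
        print(f"match {p} of 5 white balls: 1 in {1 / prob:,.1f}")
    # p = 3 comes out to roughly 1 in 349.9, matching the figure above.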

For a more mathematically rigorous derivation of this formula, we recommend An Introduction to Probability Theory and Its Applications, Volume 1, by William Feller (New York: John Wiley & Sons, Inc., 1968).