Bayesian probability

from the Artful Common Queries page

In the mid-1700s, the English clergyman Thomas Bayes figured out how to calculate the probability of A given B when you know just three facts ...
  • the unconditional probability of A
  • the conditional probability of B given A
  • the conditional probability of B given notA

When A is an event (say, getting a particular infection), and B is a positive result on a test for A (say, a test for that infection), then P(B|A), the conditional probability of B given A, is the true positive rate or sensitivity of the test, and P(B|notA) is the test's false positive rate.

(In Bayes's time, gamblers were understandably keen on such insights. Wouldn't it have been lovely for Rev Bayes if he'd known his brilliant little theorem would, in the mid-20th century, become the basis of much probabilistic inference?)

His formula for the probability of A given B is ...

                      P(A) x P(B|A)
P(A|B) = -----------------------------------------
         ( P(A) x P(B|A) ) + ( P(notA) x P(B|notA) )

See here for a derivation. In words, the formula says the probability of A given B is ...
  • the probability of A...
  • multiplied by the probability of B given A...
  • divided by the sum of ...
      • the probability of A multiplied by the probability of B given A, and...
      • the probability of not-A multiplied by the probability of B given not-A
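
As a brief sketch of where the formula comes from (a standard textbook argument, not necessarily the derivation linked above), start from the definition of conditional probability and expand P(B) over the two cases A and not-A. The LaTeX below assumes the amsmath package:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \\
P(A \cap B) &= P(A)\,P(B \mid A) \\
P(B) &= P(A)\,P(B \mid A) + P(\text{not}A)\,P(B \mid \text{not}A) \\
\Rightarrow\quad P(A \mid B) &=
  \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\text{not}A)\,P(B \mid \text{not}A)}
\end{align*}
```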

Let A be "you have the infection".

Let B be "you tested positive for the disease".

Then ...

P(A) is the present prevalence of the disease in your population, so...

P(notA) = 1-P(A) = the present rate of non-infection

P(A|B) is what we wish to know, the probability we're infected given that we're asymptomatic and tested positive.

P(B|A) is the true positive rate of the test, as above.

P(B|notA) is the probability of testing positive when you're actually free of infection, the false-positive rate.

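P(A|notB) is the probability of having the disease even though you're asymptomatic and tested negative. That is not the same thing as the test's false negative rate, which is P(notB|A) = 1-P(B|A), the probability of testing negative when you're actually infected.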

The formula is a natural for a simple function ...

delimiter ;
set global log_bin_trust_function_creators=1;
drop function if exists bayes;
delimiter go
create function bayes( 
  pA decimal(4,3),            -- P(A): base rate (prevalence)
  pBgivenA decimal(4,3),      -- P(B|A): true positive rate
  pBgivenNotA decimal(4,3)    -- P(B|notA): false positive rate
) returns decimal(4,3)
begin
  declare ret decimal(4,3) unsigned default 0.0;
  -- return 0 for unusable inputs; this also keeps the denominator above zero
  if pA <= 0 or pBgivenA <=0 or pBgivenNotA < 0 then
    return ret;
  end if;
  -- three decimal places, to match the decimal(4,3) return type
  set ret = round( pA * pBgivenA / 
           ((pA*pBgivenA) + ((1-pA)*pBgivenNotA)), 3 );
  return ret;
end
go
delimiter ;
select bayes( .10, .95, .05 );  -- returns 0.679, about .68

That result says when 10% of your population is infected, a positive result on a test with 95% sensitivity and a 5% false positive rate gives you a probability of about .68 that you have the infection.
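
Worked by hand, the same example looks like this, with P(A) = .10, P(B|A) = .95 and P(B|notA) = .05:

```latex
\[
P(A \mid B)
  = \frac{0.10 \times 0.95}{(0.10 \times 0.95) + (0.90 \times 0.05)}
  = \frac{0.095}{0.095 + 0.045}
  = \frac{0.095}{0.140}
  \approx 0.679
\]
```

Put another way, of every 1,000 people tested, about 95 are true positives and 45 are false positives, so only 95 of the 140 positive results belong to people who are actually infected.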

You also want to know the probability of a false negative---the probability that the test wrongly reports a negative result. You don't need a fancy formula for that---the false negative rate is just 1 - the true positive rate, for the above example 1 - .95 = 0.05.
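
The false negative rate describes the test, not your own odds after a negative result. For those odds, P(A|notB), one minimal sketch is to apply the same formula to the complementary rates: use 1 - P(B|A) in place of P(B|A), and 1 - P(B|notA) in place of P(B|notA). The query below spells out that arithmetic for the example values so that it runs without the bayes function; the column alias is just an illustrative name.

```sql
-- P(A|notB) for prevalence .10, sensitivity .95, false positive rate .05
-- numerator:   P(A) x P(notB|A)                    = .10 x (1 - .95)
-- denominator: numerator + P(notA) x P(notB|notA)  = ... + (1 - .10) x (1 - .05)
select round(
         ( .10 * (1 - .95) ) /
         ( ( .10 * (1 - .95) ) + ( (1 - .10) * (1 - .05) ) ),
         3 ) as p_infected_given_negative;   -- 0.005 / 0.860, about 0.006
```

With the bayes function in place, select bayes( .10, 1-.95, 1-.05 ) should give the same figure.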

With that Bayes probability function in hand, we can easily see how base rate, true positive rate and false positive rate affect the credibility of test results. Build a table of combinations of the three parameters over ranges of interest, for example

  • base rates between 5% and 95%
  • true positive rates between 50% and 95%
  • false positive rates between 1% and 50%

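set @@cte_max_recursion_depth = 500000;
drop table if exists bayes;
create table bayes( 
  baserate decimal(4,3),
  truepos decimal(4,3), 
  falsepos decimal(4,3),
  bayesprob decimal(4,3)   
)
  -- Each seed value is cast because MySQL takes recursive CTE column types
  -- from the non-recursive select; a bare 0.5 would be typed decimal(2,1)
  -- and the .05 steps would be rounded away.
  with recursive 
    cteA as (                                   -- base rates .05 to .95
      select cast(.05 as decimal(4,3)) as pA
      union all
      select pA+.05 as pA
      from cteA
      where pA < .95
    ),
    cteBgivenA as (                             -- true positive rates .50 to .95
      select cast(0.5 as decimal(4,3)) as pBgivenA
      union all
      select pBgivenA + .05 as pBgivenA
      from cteBgivenA
      where pBgivenA < 0.95
    ),
    cteBgivenNotA as (                          -- false positive rates .01 to .50
      select cast(.01 as decimal(4,3)) as pBgivenNotA
      union all
      select pBgivenNotA + .01 as pBgivenNotA
      from cteBgivenNotA
      where pBgivenNotA < .5    )
  select
    cteA.pA as baserate, 
    cteBgivenA.pBgivenA as truepos, 
    cteBgivenNotA.pBgivenNotA as falsepos,
    bayes( cteA.pA, 
           cteBgivenA.pBgivenA, 
           cteBgivenNotA.pBgivenNotA
         ) as bayesprob
  from cteA
  cross join cteBgivenA
  cross join cteBgivenNotA ;

As written here, that builds a table of 9,500 rows (19 base rates x 10 true positive rates x 50 false positive rates). Queries against it can illustrate how base rate, true positive rate and false positive rate affect the Bayesian probability that a positive test result means what it says.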


For example, here's the curve we get for queries against the bayes table where the true positive rate is 70% and the false positive rate is 15%, plotting the Bayes' Theorem result against base rate.

By running such a query with different true and false positive rates, we can see the family of curves to which the above curve belongs. The curve rises with base rate: the higher the prevalence, the more credible a positive result. Raising the true positive rate and lowering the false positive rate both shift the curve up the chart.
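
A query along these lines would produce the data for that curve. It assumes the bayes table built above, and the column list and ordering are one reasonable choice rather than the exact query behind the chart:

```sql
-- Bayes probability by base rate for a test with a 70% true positive rate
-- and a 15% false positive rate
select baserate, bayesprob
from bayes
where truepos = .70
  and falsepos = .15
order by baserate;
```

Changing the two values in the where clause picks out other members of the family of curves.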

Last updated 16 Mar 2020