Bayesian statistics
Outline
Today's agenda
- Bayesian analysis as a measure of uncertainty
- Probability and Bayes' Theorem
- Frequentist confidence intervals
What is probability?
1. Frequency interpretation
   - The limiting proportion of times the event occurs in an infinite sequence of independent repetitions of the experiment.
   - Examples: tosses of coins, rolls of dice.
   - The requirement that an experiment can be repeated is rather limiting: does it make sense to make a statement of the form "probability of rain tomorrow = 0.5"?
   - What does it mean for experiments to be independent? (Defining independence already requires probability, so the definition is circular.)
2. Subjective probability
   - Concerns the assessments of a given person about uncertain outcomes.
   - Interpreted as a personal belief or a statement about uncertainty.
   - Based on the present knowledge of the event; a change in the knowledge base leads to probability updating.
Example
You are given:
1. A portfolio of independent risks is divided into two classes, Class A and Class B. There are twice as many risks in Class A as in Class B.
2. The number of claims for each insured during a single year follows a Bernoulli distribution. Classes A and B have claim size distributions as follows:

   Claim size    Class A    Class B
   50,000        0.60       0.36
   100,000       0.40       0.64

3. The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.

One insured is chosen at random. The insured's loss for two years combined is 100,000. You are asked to determine which class the selected insured is most likely to belong to.
Wayne Zhang (CNA insurance company), Bayesian statistics, June 6, 2011, Chicago
Solution
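Denote by $D$ the event that the two-year combined loss is 100,000; in this case $\theta \in \{A, B\}$.

Prior distribution: $p(\theta = A) = 0.6667$, $p(\theta = B) = 0.3333$.

Data distribution: a combined loss of 100,000 arises either from a 50,000 claim in each of the two years or from a single 100,000 claim in exactly one year, so
$$p(D \mid A) = 0.22^2 \times 0.6^2 + 2 \times 0.22 \times (1 - 0.22) \times 0.4 = 0.1547$$
$$p(D \mid B) = 0.11^2 \times 0.36^2 + 2 \times 0.11 \times (1 - 0.11) \times 0.64 = 0.1269$$

Posterior distribution:
$$P(A \mid D) = \frac{p(D \mid A) P(A)}{p(D \mid A) P(A) + p(D \mid B) P(B)} = \frac{0.1547 \times 0.6667}{0.1547 \times 0.6667 + 0.1269 \times 0.3333} = 0.71$$
$$P(B \mid D) = 1 - P(A \mid D) = 0.29$$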
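The arithmetic in this solution can be checked with a short script. This is only a sketch: the dictionary and function names are illustrative and not part of the original example, while the numbers are taken from the problem statement.

```python
# Posterior class probabilities for the two-class insurance example.
# All names here are illustrative; the numbers come from the problem statement.

prior = {"A": 2 / 3, "B": 1 / 3}         # twice as many risks in Class A
claim_prob = {"A": 0.22, "B": 0.11}      # Bernoulli claim probability per year
size_dist = {                            # claim size distribution by class
    "A": {50_000: 0.60, 100_000: 0.40},
    "B": {50_000: 0.36, 100_000: 0.64},
}


def likelihood(cls):
    """p(D | class), where D = two-year combined loss of 100,000."""
    q = claim_prob[cls]
    sizes = size_dist[cls]
    two_small = (q * sizes[50_000]) ** 2             # a 50,000 claim in each year
    one_large = 2 * q * (1 - q) * sizes[100_000]     # one 100,000 claim in exactly one year
    return two_small + one_large


joint = {c: likelihood(c) * prior[c] for c in prior}
evidence = sum(joint.values())
posterior = {c: joint[c] / evidence for c in prior}

for c in sorted(posterior):
    print(f"P({c} | D) = {posterior[c]:.2f}")
```

The posterior favours Class A even though a single observation cannot settle the question; the probabilities themselves carry the remaining uncertainty.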
What do we learn?
- It illustrates how data are combined with prior information to reach a conclusion.
- The conclusion includes a measure of uncertainty.
- Were we to use another method, say MLE (choose the $\theta$ that maximizes $p(D \mid \theta)$), we would not get an estimate of uncertainty.
- In that case, a confidence interval is also meaningless: it suggests either complete confidence or no confidence.
- In fact, the frequentist construction of confidence intervals encounters many difficulties.
Confidence interval
Consider the following example from Berger (1985). Suppose that $X_1$ and $X_2$ are independent with identical distribution given by
$$P(X_i = \theta - 1) = P(X_i = \theta + 1) = 1/2.$$
Then a frequentist 75% confidence procedure (i.e., $P_\theta(\delta(X) = \theta) = 0.75$ for all $\theta$) is
$$\delta(X) = \begin{cases} (X_1 + X_2)/2, & X_1 \neq X_2 \\ X_1 - 1, & X_1 = X_2 \end{cases}$$
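The 75% coverage claim can be verified by enumerating the four equally likely outcomes. The sketch below also tabulates coverage conditional on whether the two observations differ, which is an added illustration rather than something stated on the slide; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def delta(x1, x2):
    """Point estimate from the 75% confidence procedure."""
    return (x1 + x2) / 2 if x1 != x2 else x1 - 1


def coverage(theta):
    """Exact coverage: overall, given X1 != X2, and given X1 == X2."""
    outcomes = list(product((theta - 1, theta + 1), repeat=2))  # 4 equally likely pairs
    hits = [delta(x1, x2) == theta for x1, x2 in outcomes]
    overall = Fraction(sum(hits), len(outcomes))

    differ = [h for (x1, x2), h in zip(outcomes, hits) if x1 != x2]
    equal = [h for (x1, x2), h in zip(outcomes, hits) if x1 == x2]
    return overall, Fraction(sum(differ), len(differ)), Fraction(sum(equal), len(equal))


overall, given_differ, given_equal = coverage(theta=3)
print(overall, given_differ, given_equal)
```

Under these assumptions the procedure is right every time the observations differ and only half the time when they coincide, so the unconditional 75% describes neither situation.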
Prior information
1. A lady, who adds milk to her tea, claims to be able to tell whether the tea or the milk was poured into the cup first. In all of ten trials conducted to test this, she correctly determines which was poured first.
2. A music expert claims to be able to distinguish a page of Haydn score from a page of Mozart score. In ten trials conducted to test this, he makes a correct determination each time.
3. A drunken friend says he can predict the outcome of a flip of a fair coin. In ten trials conducted to test this, he is correct each time.

In all three situations, the unknown quantity $\theta$ is the probability of the person answering correctly. A classical significance test of the various claims would reject the hypothesis that $\theta = 0.5$ (i.e., the person is guessing) with a (one-tailed) significance level of $2^{-10}$.
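A rough sketch of why prior information matters here: the significance level is identical in all three cases, but a posterior probability of "just guessing" depends on how plausible the claim was beforehand. The prior probabilities below are hypothetical values chosen purely for illustration, and the alternative is simplified to "always correct" ($\theta = 1$); neither comes from the slides.

```python
n_trials = 10
p_value = 0.5 ** n_trials   # P(10 correct out of 10 | theta = 0.5) = 2^-10


def posterior_guessing(prior_guessing, n=n_trials):
    """P(theta = 0.5 | all n answers correct), against the simplified alternative theta = 1."""
    like_guess = 0.5 ** n
    like_skill = 1.0
    numerator = prior_guessing * like_guess
    return numerator / (numerator + (1 - prior_guessing) * like_skill)


# Hypothetical prior probabilities that each person is merely guessing.
hypothetical_priors = {
    "tea lady": 0.5,
    "music expert": 0.1,
    "drunken friend": 0.999999,
}
posteriors = {who: posterior_guessing(p) for who, p in hypothetical_priors.items()}

print(f"one-tailed significance level: {p_value}")
for who, post in posteriors.items():
    print(f"{who}: P(guessing | data) = {post:.4f}")
```

With these assumed priors the same ten successes leave the drunken friend almost certainly guessing while the music expert almost certainly is not.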
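Jeffreys prior: motivated by the desire that inference should not depend on how a model is parameterized.
- E.g., if $\theta \sim U[0, 1]$, a nonlinear transformation of $\theta$ (for instance $\sqrt{\theta}$) has a non-uniform distribution with higher density near 1 than near 0.
- Obtained as $p(\theta) \propto \sqrt{I(\theta)}$, $I(\theta)$ being the Fisher information.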
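As a small worked case, the Fisher information can be computed directly for a Bernoulli model. The choice of model and the function names are assumptions made for illustration; the slides do not specify a model at this point.

```python
import math


def fisher_information_bernoulli(theta):
    """I(theta) = E[(d/dtheta log p(x | theta))^2] for x in {0, 1}."""
    score_if_1 = 1 / theta             # derivative of log(theta)
    score_if_0 = -1 / (1 - theta)      # derivative of log(1 - theta)
    return theta * score_if_1 ** 2 + (1 - theta) * score_if_0 ** 2


def jeffreys_unnormalized(theta):
    """Unnormalized Jeffreys prior density, sqrt(I(theta))."""
    return math.sqrt(fisher_information_bernoulli(theta))


for theta in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(theta, round(jeffreys_unnormalized(theta), 3))
```

For this model $I(\theta) = 1 / (\theta (1 - \theta))$, so the prior is proportional to $\theta^{-1/2} (1 - \theta)^{-1/2}$, the kernel of a Beta(1/2, 1/2) density, which puts more mass near 0 and 1 than a uniform prior does.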
Conjugate priors
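Definition: If $\mathcal{F}$ is a class of sampling distributions $p(D \mid \theta)$, and $\mathcal{P}$ is a class of prior distributions $p(\theta)$, then the class $\mathcal{P}$ is conjugate for $\mathcal{F}$ if
$$p(\theta \mid D) \in \mathcal{P} \quad \text{for all } p(D \mid \theta) \in \mathcal{F} \text{ and } p(\theta) \in \mathcal{P}.$$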
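Example: for $x_1, \dots, x_n \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known and prior $\theta \sim N(0, \sigma_0^2)$, the posterior is again normal:
$$\theta \mid x \sim N\left(\frac{\sum x_i}{\sigma^2} \, v, \; v\right), \qquad v = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}.$$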
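One concrete instance of conjugacy is a normal likelihood with known variance combined with a zero-mean normal prior, where the posterior stays normal. The sketch below assumes exactly that setup; the function name and the sample values are hypothetical.

```python
def normal_posterior(xs, sigma2, sigma0_2):
    """Posterior mean and variance of theta for x_i | theta ~ N(theta, sigma2),
    with sigma2 known and prior theta ~ N(0, sigma0_2)."""
    n = len(xs)
    v = 1.0 / (1.0 / sigma0_2 + n / sigma2)   # posterior variance
    mean = (sum(xs) / sigma2) * v             # posterior mean
    return mean, v


# Hypothetical data, for illustration only.
post_mean, post_var = normal_posterior([1.0, 2.0, 3.0], sigma2=1.0, sigma0_2=1.0)
print(post_mean, post_var)
```

The posterior mean is the sample mean shrunk toward the prior mean of zero, and the shrinkage vanishes as the prior variance grows.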
Exchangeability
Definition: The random quantities $x_1, \dots, x_n$ are said to be finitely exchangeable if
$$P(x_1 \in E_1, \dots, x_n \in E_n) = P(x_{\pi(1)} \in E_1, \dots, x_{\pi(n)} \in E_n) \qquad (1)$$
for any permutation $\pi$ on the set $\{1, \dots, n\}$ and any sets $E_1, \dots, E_n$ of possible values.
- Intuitively, it says the distribution of $x_1, \dots, x_n$ does not depend on the labeling or order.
- Independent and identically distributed is a special case.
- This formalizes the notion of the future being predictable on the basis of past experience.
- An infinite sequence $x_1, x_2, \dots$ is exchangeable if every finite subsequence is exchangeable.
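A standard textbook illustration of exchangeability without independence is the Polya urn, which is not part of the original slides and is used here only as an assumed example. The sketch computes exact sequence probabilities and shows that they depend on the counts but not on the order.

```python
from fractions import Fraction
from itertools import permutations


def polya_sequence_prob(seq, red=1, blue=1):
    """Probability of drawing `seq` (1 = red, 0 = blue) from a Polya urn:
    each drawn ball is returned together with one extra ball of the same colour."""
    prob = Fraction(1)
    for x in seq:
        total = red + blue
        if x == 1:
            prob *= Fraction(red, total)
            red += 1
        else:
            prob *= Fraction(blue, total)
            blue += 1
    return prob


seq = (1, 1, 0)
probs = {perm: polya_sequence_prob(perm) for perm in set(permutations(seq))}
for perm, p in sorted(probs.items()):
    print(perm, p)

# The draws are not independent: P(red, red) differs from P(red) * P(red).
print(polya_sequence_prob((1, 1)), polya_sequence_prob((1,)) ** 2)
```

Every ordering of two reds and one blue has the same probability, yet the first draw changes the odds of the second, so the sequence is exchangeable but not i.i.d.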
Exchangeability (cont'd)
de Finetti's Theorem: If $x_1, x_2, \dots$ is an infinite exchangeable sequence of random variables with probability measure $P$, there exists a probability measure $Q$ on $\mathcal{F}$, the set of all distributions on $\mathbb{R}$, such that the joint distribution of $x_1, \dots, x_n$ has the form
$$P(x_1, \dots, x_n) = \int_{\mathcal{F}} \prod_{i=1}^{n} F(x_i) \, dQ(F) \qquad (2)$$
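For binary data the mixture representation reduces to an integral over a single success probability. The sketch below assumes Bernoulli observations and a uniform mixing measure on $(0, 1)$, both chosen for illustration and not taken from the slides, and compares a numerical integral with the known closed form $s! \, (n - s)! / (n + 1)!$.

```python
import math


def mixture_prob(seq, grid=100_000):
    """P(x_1, ..., x_n) as a uniform mixture of i.i.d. Bernoulli(theta) sequences,
    integrated over theta in (0, 1) with the midpoint rule."""
    s, n = sum(seq), len(seq)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) * h
        total += theta ** s * (1 - theta) ** (n - s)
    return total * h


def closed_form(seq):
    """Exact value of the same integral: s! (n - s)! / (n + 1)!."""
    s, n = sum(seq), len(seq)
    return math.factorial(s) * math.factorial(n - s) / math.factorial(n + 1)


for seq in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]:
    print(seq, mixture_prob(seq), closed_form(seq))
```

The joint probability depends on the sequence only through the number of successes, which is exactly the permutation invariance that exchangeability requires.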
Interval estimate
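A $100(1 - \alpha)\%$ credible interval is a set $C$ such that
$$\int_C p(\theta) \, d\theta = 1 - \alpha,$$
where $p(\theta)$ is the posterior density.
Highest probability density (HPD) region: $p(\theta_1) \geq p(\theta_2)$ for all $\theta_1 \in C$, $\theta_2 \notin C$.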
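Both kinds of interval can be approximated from posterior draws. This is a minimal sketch, assuming a unimodal posterior so that the HPD region is a single interval; the Gamma(2, 1) draws stand in for a hypothetical skewed posterior, and the function names are illustrative.

```python
import random


def equal_tailed_interval(samples, alpha):
    """Interval cutting off a fraction alpha / 2 of the draws in each tail."""
    xs = sorted(samples)
    n = len(xs)
    return xs[int(n * alpha / 2)], xs[int(n * (1 - alpha / 2)) - 1]


def hpd_interval(samples, alpha):
    """Shortest interval containing a fraction (1 - alpha) of the draws,
    which approximates the HPD region when the density is unimodal."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round((1 - alpha) * n)))          # number of draws inside the interval
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return xs[start], xs[start + k - 1]


rng = random.Random(0)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(20_000)]   # hypothetical skewed posterior
et = equal_tailed_interval(draws, 0.5)
hpd = hpd_interval(draws, 0.5)
print("equal-tailed 50%:", et)
print("HPD 50%:", hpd)
```

For a right-skewed density the HPD interval sits closer to the mode and is never wider than the equal-tailed interval with the same coverage.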
Example
Suppose that an insurance policy incurred 5 claims each year over a period of two years ($n = 2$).
- We assume the claim count follows a Poisson distribution, $y_i \mid \theta \sim \text{Pois}(\theta)$.
- The prior comes from a conjugate Gamma distribution, $\theta \sim \text{Gamma}(\alpha, \beta)$.
- Then we have the posterior $\theta \mid y \sim \text{Gamma}(\alpha + \sum y_i, \beta + n)$.
- We rely on posterior simulation to draw inference, as this is more straightforward and also works when the posterior has no closed form.
Prior mean   Prior std   Posterior mean   Posterior median   Posterior 50% interval
3            1           3.79             3.72               [3.16, 4.37]
5            3           4.99             4.79               [3.98, 5.83]
8            4           5.56             5.42               [4.50, 6.47]
8            20          5.05             4.93               [3.99, 5.97]
8            100         5.03             4.84               [3.89, 5.98]
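The posterior simulation for this example might look like the following. It is a sketch under two assumptions: the Gamma prior is parameterized by shape $\alpha$ and rate $\beta$ and is matched to a given prior mean and standard deviation, and the 50% interval is taken between the posterior quartiles. Function names, the number of draws, and the seed are illustrative, so the output will be close to but not identical to the tabulated values.

```python
import random
import statistics


def gamma_params(mean, std):
    """Shape alpha and rate beta of a Gamma with the given mean and standard deviation."""
    beta = mean / std ** 2
    alpha = mean * beta
    return alpha, beta


def posterior_summary(prior_mean, prior_std, y, n_draws=20_000, seed=1):
    """Simulate from Gamma(alpha + sum(y), beta + n) and summarize the draws."""
    alpha, beta = gamma_params(prior_mean, prior_std)
    a_post, b_post = alpha + sum(y), beta + len(y)
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    draws = [rng.gammavariate(a_post, 1.0 / b_post) for _ in range(n_draws)]
    q25, q50, q75 = statistics.quantiles(draws, n=4)
    return {"mean": statistics.fmean(draws), "median": q50, "interval_50": (q25, q75)}


y = [5, 5]   # 5 claims in each of two years
for prior_mean, prior_std in [(3, 1), (5, 3), (8, 4), (8, 20), (8, 100)]:
    print(prior_mean, prior_std, posterior_summary(prior_mean, prior_std, y))
```

As the prior standard deviation grows, the posterior mean moves toward the sample mean of 5, consistent with the pattern in the table.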
[Figure: prior and posterior densities for three priors (mean=3, std=1; mean=8, std=4; mean=8, std=100).]
[Figure: four panels labeled n = 2, n = 5, n = 20, n = 100.]
Nonconjugate prior
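Now suppose I want to use a diffuse (nearly uniform) prior on $\log \theta$, namely $\log \theta \sim N(0, 100^2)$. Then the posterior distribution is
$$p(\theta \mid y) \propto \theta^{\sum y_i - 1} \exp\left(-n\theta - \frac{(\log \theta)^2}{2 \times 100^2}\right) \qquad (3)$$
This is not a distribution (as a function of $\theta$) that we are familiar with, so how do we simulate samples from it?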
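One common answer is a random-walk Metropolis sampler, sketched below. This is an assumed choice for illustration and not necessarily the method the author goes on to use. The sampler works on $\phi = \log \theta$, whose log density up to a constant is $\phi \sum y_i - n e^{\phi} - \phi^2 / (2 \times 100^2)$; the step size, iteration counts, and seed are hypothetical tuning values.

```python
import math
import random


def log_target_phi(phi, sum_y, n, prior_sd=100.0):
    """Log posterior density of phi = log(theta), up to an additive constant:
    Poisson likelihood theta^sum_y * exp(-n * theta) times a N(0, prior_sd^2) prior on phi."""
    return sum_y * phi - n * math.exp(phi) - phi ** 2 / (2 * prior_sd ** 2)


def metropolis(sum_y, n, n_iter=20_000, burn_in=2_000, step=0.5, seed=0):
    """Random-walk Metropolis on phi; returns post-burn-in draws of theta = exp(phi)."""
    rng = random.Random(seed)
    phi = math.log(sum_y / n)                      # start at the log of the sample mean
    current = log_target_phi(phi, sum_y, n)
    draws = []
    for i in range(n_iter):
        proposal = phi + rng.gauss(0.0, step)
        candidate = log_target_phi(proposal, sum_y, n)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            phi, current = proposal, candidate
        if i >= burn_in:
            draws.append(math.exp(phi))
    return draws


theta_draws = metropolis(sum_y=10, n=2)            # 5 claims in each of two years
posterior_mean = sum(theta_draws) / len(theta_draws)
print(f"posterior mean of theta: {posterior_mean:.2f}")
```

Because the prior is so diffuse, the draws should behave much like those from the conjugate analysis with a very vague Gamma prior.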