BAYESIAN LEARNING: LINEAR REGRESSION
Mattias Villani (Statistics, LiU)
▶ Model
$$y_1, \dots, y_n \mid \theta, \sigma^2 \overset{iid}{\sim} N(\theta, \sigma^2)$$
▶ Conjugate prior
$$\theta \mid \sigma^2 \sim N\!\left(\mu_0, \frac{\sigma^2}{\kappa_0}\right), \qquad \sigma^2 \sim \mathrm{Inv}\text{-}\chi^2(\nu_0, \sigma_0^2)$$
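As a concrete illustration, here is a minimal sketch of drawing from this joint prior: σ² is sampled from the scaled inverse-χ² distribution (as ν₀σ₀²/X with X ∼ χ²(ν₀)), then θ | σ² from the conditional normal. The hyperparameter values are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, kappa0, nu0, sigma0_sq = 0.0, 10.0, 5.0, 1.0  # illustrative hyperparameters

n_draws = 10_000
# sigma^2 ~ Inv-chi2(nu0, sigma0^2): draw X ~ chi2(nu0), return nu0 * sigma0^2 / X
sigma2 = nu0 * sigma0_sq / rng.chisquare(nu0, n_draws)
# theta | sigma^2 ~ N(mu0, sigma^2 / kappa0)
theta = rng.normal(mu0, np.sqrt(sigma2 / kappa0))
```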
▶ Parameters: $\theta = (\beta_1, \beta_2, \dots, \beta_k, \sigma^2)$.
▶ Assumptions:
  ▶ $E(y_i) = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$ (linear function)
  ▶ $\mathrm{Var}(y_i) = \sigma^2$ (homoscedasticity)
  ▶ $\mathrm{Corr}(y_i, y_j \mid X, \beta, \sigma^2) = 0$ for $i \neq j$
  ▶ Normality of $\varepsilon_i$
  ▶ The x's are assumed known (non-random)
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix}$$
▶ Usually $x_{i1} = 1$ for all $i$, so that $\beta_1$ is the intercept.
▶ Likelihood for the full sample
$$y \mid \beta, \sigma^2, X \sim N(X\beta, \sigma^2 I_n)$$
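A quick sketch of simulating one dataset from this likelihood; the design matrix, coefficients, and noise level below are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
# first column of ones: x_i1 = 1, so beta_1 is the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])   # illustrative coefficients
sigma = 0.8                         # illustrative noise standard deviation

y = X @ beta + rng.normal(0.0, sigma, size=n)  # y | beta, sigma^2, X ~ N(X beta, sigma^2 I_n)
```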
LINEAR REGRESSION - UNIFORM PRIOR
▶ Standard non-informative prior: uniform on $(\beta, \log \sigma^2)$
$$p(\beta, \sigma^2) \propto \sigma^{-2}$$
▶ Joint posterior of $\beta$ and $\sigma^2$:
$$\beta \mid \sigma^2, y \sim N\!\left(\hat{\beta}, \sigma^2 (X'X)^{-1}\right)$$
$$\sigma^2 \mid y \sim \mathrm{Inv}\text{-}\chi^2(n - k, s^2)$$
where $\hat{\beta} = (X'X)^{-1}X'y$ and $s^2 = (y - X\hat{\beta})'(y - X\hat{\beta})/(n - k)$.
▶ Marginal posterior of $\beta$:
$$\beta \mid y \sim t_{n-k}\!\left(\hat{\beta}, s^2 (X'X)^{-1}\right)$$
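Simulating from this joint posterior is direct: draw σ² from the scaled inverse-χ², then β | σ², y from the conditional normal. A minimal sketch, reusing the simulated X and y from the earlier block:

```python
import numpy as np

def posterior_draws(X, y, n_draws=5000, seed=3):
    """Draws from the joint posterior under the uniform prior p(beta, sigma^2) ∝ sigma^{-2}."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    betahat = XtX_inv @ X.T @ y
    resid = y - X @ betahat
    s2 = resid @ resid / (n - k)
    # sigma^2 | y ~ Inv-chi2(n - k, s^2)
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, n_draws)
    # beta | sigma^2, y ~ N(betahat, sigma^2 (X'X)^{-1}), via Cholesky of (X'X)^{-1}
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal((n_draws, k))
    beta = betahat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2
```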
LINEAR REGRESSION - CONJUGATE PRIOR
▶ Conjugate prior
$$\beta \mid \sigma^2 \sim N\!\left(\mu_0, \sigma^2 \Omega_0^{-1}\right), \qquad \sigma^2 \sim \mathrm{Inv}\text{-}\chi^2(\nu_0, \sigma_0^2)$$
▶ Posterior
$$\beta \mid \sigma^2, y \sim N\!\left(\mu_n, \sigma^2 \Omega_n^{-1}\right), \qquad \sigma^2 \mid y \sim \mathrm{Inv}\text{-}\chi^2(\nu_n, \sigma_n^2)$$
$$\mu_n = (X'X + \Omega_0)^{-1}(X'X\hat{\beta} + \Omega_0 \mu_0)$$
$$\Omega_n = X'X + \Omega_0$$
$$\nu_n = \nu_0 + n$$
$$\nu_n \sigma_n^2 = \nu_0 \sigma_0^2 + y'y + \mu_0' \Omega_0 \mu_0 - \mu_n' \Omega_n \mu_n$$
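These update formulas translate directly into code. A sketch; the prior hyperparameters passed in are whatever the user chooses:

```python
import numpy as np

def conjugate_posterior(X, y, mu0, Omega0, nu0, sigma0_sq):
    """Posterior (mu_n, Omega_n, nu_n, sigma_n^2) for the conjugate linear-regression prior."""
    n, k = X.shape
    XtX = X.T @ X
    betahat = np.linalg.solve(XtX, X.T @ y)            # OLS estimate
    Omega_n = XtX + Omega0
    mu_n = np.linalg.solve(Omega_n, XtX @ betahat + Omega0 @ mu0)
    nu_n = nu0 + n
    sigma_n_sq = (nu0 * sigma0_sq + y @ y
                  + mu0 @ Omega0 @ mu0 - mu_n @ Omega_n @ mu_n) / nu_n
    return mu_n, Omega_n, nu_n, sigma_n_sq
```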
[Figure: polynomial basis functions on x in [0, 1]: constant, linear, and quadratic.]
[Figure: spline basis functions on x in [0, 1]: constant, linear, and the truncated linear basis (x − 0.4)+.]
▶ Spline regression is linear regression on the basis functions:
$$y = X_b \beta + \varepsilon, \qquad X_b = (b_1, \dots, b_m)$$
or, with an intercept and a linear term included,
$$X_b = (1, x, b_1, \dots, b_m).$$
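A small sketch of constructing such a basis with truncated linear terms $b_j = (x - k_j)_+$; the knot locations here are illustrative (the figure above uses a knot at 0.4):

```python
import numpy as np

def spline_basis(x, knots):
    """X_b = (1, x, (x - k_1)+, ..., (x - k_m)+): truncated linear spline basis."""
    truncated = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, truncated])

x = np.linspace(0.0, 1.0, 200)
Xb = spline_basis(x, knots=[0.2, 0.4, 0.6, 0.8])   # illustrative knot locations
```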
[Figure: spline fits to the LIDAR data, LogRatio versus Range (400 to 700), showing the data and the estimated E(y|x) for shrinkage values λ = 1 and λ = 10.]
SMOOTHNESS PRIOR FOR SPLINES, CONT.
▶ The famous Lasso variable selection method is equivalent to using the posterior mode estimate under the prior
$$\beta_i \mid \sigma^2 \overset{iid}{\sim} \mathrm{Laplace}\!\left(0, \frac{\sigma^2}{\lambda}\right)$$
with density
$$p(\beta_i) = \frac{\lambda}{2\sigma^2} \exp\!\left(-\frac{\lambda |\beta_i|}{\sigma^2}\right).$$
▶ The Bayesian shrinkage prior is interpretable, not ad hoc.
▶ The Laplace distribution has heavy tails.
▶ Laplace prior: many $\beta_i$ are close to zero, but some $\beta_i$ may be very large.
▶ The normal distribution has light tails.
▶ Normal prior: most $\beta_i$ are of roughly similar size, and no single $\beta_i$ can be very much larger than the others (compare the tail probabilities in the sketch below).
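To make the heavy-tail contrast concrete, this sketch compares tail probabilities of the Laplace prior above with a normal prior matched to the same variance; the σ² and λ values are illustrative assumptions:

```python
from scipy import stats

sigma2, lam = 1.0, 2.0                 # illustrative values
b = sigma2 / lam                       # Laplace scale: Var = 2 b^2
sd = (2.0 * b**2) ** 0.5               # normal sd matched to the Laplace variance

for c in [1.0, 2.0, 3.0, 4.0]:
    p_laplace = 2.0 * stats.laplace.sf(c, scale=b)   # P(|beta_i| > c) under Laplace
    p_normal = 2.0 * stats.norm.sf(c, scale=sd)      # P(|beta_i| > c) under Normal
    print(f"P(|beta| > {c}): Laplace {p_laplace:.2e}  Normal {p_normal:.2e}")
```

Far out in the tails the Laplace probability dominates by orders of magnitude, which is why a few $\beta_i$ can escape the shrinkage while most are pulled toward zero.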
ESTIMATING THE SHRINKAGE
▶ How do we determine the degree of smoothness, λ? Cross-validation is one possible approach.
▶ Bayesian: λ is unknown ⇒ use a prior for λ.
▶ One possibility: $\lambda \sim \mathrm{Inv}\text{-}\chi^2(\eta_0, \lambda_0)$. The user specifies $\eta_0$ and $\lambda_0$.
▶ Alternative approach: specify the prior on the degrees of freedom.
▶ Hierarchical setup:
$$y \mid \beta, X \sim N(X\beta, \sigma^2 I_n)$$
$$\beta \mid \sigma^2, \lambda \sim N\!\left(0, \sigma^2 \lambda^{-1} I_m\right)$$
so that, conditionally on λ,
$$\beta \mid \sigma^2, \lambda, y \sim N\!\left(\mu_n, \sigma^2 \Omega_n^{-1}\right), \qquad \Omega_n = X'X + \lambda I_m$$
(a sketch of how λ controls the shrinkage follows below).
▶ Treating the knot locations $k_1, \dots, k_m$ as unknown as well gives the joint posterior
$$p(\beta, \sigma^2, \lambda, k_1, \dots, k_m \mid y, X)$$
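Under these assumptions the conditional posterior mean is $\mu_n = (X'X + \lambda I_m)^{-1} X'y$ (the prior mean is zero), so λ acts as a ridge-type shrinkage. A minimal sketch, using the spline basis from the earlier block:

```python
import numpy as np

def posterior_mean(Xb, y, lam):
    """Conditional posterior mean mu_n = (Xb'Xb + lam I)^{-1} Xb'y under the smoothness prior."""
    m = Xb.shape[1]
    Omega_n = Xb.T @ Xb + lam * np.eye(m)
    return np.linalg.solve(Omega_n, Xb.T @ y)

# Larger lam pulls the spline coefficients toward zero, giving a smoother
# fitted curve Xb @ mu_n (compare the lambda = 1 and lambda = 10 panels above).
```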
▶ MCMC can be used to simulate from the joint posterior; see Li and Villani (2013, SJS).
▶ The basic spline model can be extended with:
  ▶ Heteroscedastic errors (also modelled with a spline)
  ▶ Non-normal errors (Student-t or mixture distributions)
  ▶ Autocorrelated/dependent errors (AR process for the error term)
▶ MCMC can again be used to simulate from the joint posterior.