
BAYESIAN LEARNING - LECTURE 5

Mattias Villani

Division of Statistics and Machine Learning


Department of Computer and Information Science
Linköping University



LECTURE OVERVIEW

▶ Normal model with conjugate prior
▶ The linear regression model
▶ Non-linear regression
▶ Regularization priors



NORMAL MODEL - NORMAL PRIOR

▶ Model

      y1, ..., yn | θ, σ² ∼ N(θ, σ²), iid

▶ Conjugate prior

      θ | σ² ∼ N(μ0, σ²/κ0)
      σ² ∼ Inv-χ²(ν0, σ0²)



NORMAL MODEL WITH NORMAL PRIOR

▶ Posterior

      θ | y, σ² ∼ N(μn, σ²/κn)
      σ² | y ∼ Inv-χ²(νn, σn²)

  where

      μn = κ0/(κ0 + n) · μ0 + n/(κ0 + n) · ȳ
      κn = κ0 + n
      νn = ν0 + n
      νn σn² = ν0 σ0² + (n − 1)s² + κ0 n/(κ0 + n) · (ȳ − μ0)².

▶ Marginal posterior

      θ | y ∼ tνn(μn, σn²/κn)
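A minimal simulation sketch (not part of the original slides; assumes numpy) showing how draws from this joint posterior can be obtained by first sampling σ² and then θ given σ²:

import numpy as np

def sim_normal_posterior(y, mu0, kappa0, nu0, sigma02, n_draws=1000, rng=None):
    # Draw (theta, sigma2) from the conjugate posterior above.
    rng = np.random.default_rng() if rng is None else rng
    n, ybar, s2 = len(y), np.mean(y), np.var(y, ddof=1)

    # Posterior hyperparameters (same formulas as on the slide)
    mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    sigma2_n = (nu0 * sigma02 + (n - 1) * s2
                + kappa0 * n / (kappa0 + n) * (ybar - mu0) ** 2) / nu_n

    # sigma2 | y ~ Inv-chi2(nu_n, sigma2_n): scaled draw from a chi-square variate
    sigma2 = nu_n * sigma2_n / rng.chisquare(nu_n, size=n_draws)
    # theta | sigma2, y ~ N(mu_n, sigma2 / kappa_n)
    theta = rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))
    return theta, sigma2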




THE LINEAR REGRESSION MODEL

▶ The ordinary linear regression model:

      yi = β1 xi1 + β2 xi2 + ... + βk xik + εi
      εi ∼ N(0, σ²), iid.

▶ Parameters θ = (β1, β2, ..., βk, σ²).
▶ Assumptions:
   ▶ E(yi) = β1 xi1 + β2 xi2 + ... + βk xik (linear function)
   ▶ Var(yi) = σ² (homoscedasticity)
   ▶ Corr(yi, yj | X, β, σ²) = 0 for i ≠ j
   ▶ Normality of εi
   ▶ The x's are assumed known (non-random).



LINEAR REGRESSION IN MATRIX FORM

▶ The linear regression model in matrix form

      y = Xβ + ε,   with y (n×1), X (n×k), β (k×1), ε (n×1),

      y = (y1, ..., yn)′,  β = (β1, ..., βk)′,  ε = (ε1, ..., εn)′,

      X is the n×k matrix with rows xi′ = (xi1, ..., xik), i = 1, ..., n.

▶ Usually xi1 = 1 for all i, so that β1 is the intercept.
▶ Likelihood for the full sample

      y | β, σ², X ∼ N(Xβ, σ² In)
LINEAR REGRESSION - UNIFORM PRIOR

▶ Standard non-informative prior: uniform on (β, log σ²)

      p(β, σ²) ∝ σ⁻²

▶ Joint posterior of β and σ²:

      β | σ², y ∼ N(β̂, σ²(X′X)⁻¹)
      σ² | y ∼ Inv-χ²(n − k, s²)

  where β̂ = (X′X)⁻¹X′y and s² = (1/(n − k)) · (y − Xβ̂)′(y − Xβ̂).

▶ Simulate from the joint posterior by iteratively simulating from
   ▶ p(σ² | y)
   ▶ p(β | σ², y)
  (see the sketch below).

▶ Marginal posterior of β:

      β | y ∼ tn−k(β̂, s²(X′X)⁻¹)
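A hedged sketch (not part of the slides; assumes numpy) of the simulation scheme just described: draw σ² from its marginal posterior, then β given σ²:

import numpy as np

def sim_posterior_uniform_prior(y, X, n_draws=1000, rng=None):
    # Joint posterior simulation under the prior p(beta, sigma2) proportional to sigma^-2.
    rng = np.random.default_rng() if rng is None else rng
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)

    betas = np.empty((n_draws, k))
    sigma2s = np.empty(n_draws)
    for i in range(n_draws):
        # sigma2 | y ~ Inv-chi2(n - k, s2)
        sigma2s[i] = (n - k) * s2 / rng.chisquare(n - k)
        # beta | sigma2, y ~ N(beta_hat, sigma2 * (X'X)^-1)
        betas[i] = rng.multivariate_normal(beta_hat, sigma2s[i] * XtX_inv)
    return betas, sigma2s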


 



LINEAR REGRESSION - CONJUGATE PRIOR

▶ Joint prior for β and σ²

      β | σ² ∼ N(μ0, σ²Ω0⁻¹)
      σ² ∼ Inv-χ²(ν0, σ0²)

▶ Posterior

      β | σ², y ∼ N(μn, σ²Ωn⁻¹)
      σ² | y ∼ Inv-χ²(νn, σn²)

      μn = (X′X + Ω0)⁻¹(X′X β̂ + Ω0 μ0)
      Ωn = X′X + Ω0
      νn = ν0 + n
      νn σn² = ν0 σ0² + y′y + μ0′Ω0 μ0 − μn′Ωn μn
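As a sketch (not from the slides; assumes numpy, and the function name is illustrative), the posterior hyperparameters above can be computed directly:

import numpy as np

def conjugate_posterior_params(y, X, mu0, Omega0, nu0, sigma02):
    # Posterior hyperparameters for the conjugate normal / Inv-chi2 prior above.
    n, k = X.shape
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)

    Omega_n = XtX + Omega0
    mu_n = np.linalg.solve(Omega_n, XtX @ beta_hat + Omega0 @ mu0)
    nu_n = nu0 + n
    sigma2_n = (nu0 * sigma02 + y @ y
                + mu0 @ Omega0 @ mu0 - mu_n @ Omega_n @ mu_n) / nu_n
    return mu_n, Omega_n, nu_n, sigma2_n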




POLYNOMIAL REGRESSION

▶ Polynomial regression

      f(xi) = β0 + β1 xi + β2 xi² + ... + βk xi^k
      y = XP β + ε,

  where

      XP = (1, x, x², ..., x^k).
[Figure: quadratic regression fit of y on x, together with the constant, linear and quadratic basis functions over x ∈ [0, 1].]
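A small sketch (not from the slides; assumes numpy) of how the polynomial basis matrix XP can be constructed for a one-dimensional covariate:

import numpy as np

def poly_basis(x, k):
    # n x (k+1) matrix with columns 1, x, x^2, ..., x^k.
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** p for p in range(k + 1)])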



SPLINE REGRESSION

▶ Polynomials are too global. Need more local basis functions.
▶ Truncated power splines, given knot locations k1, ..., km:

      bij = (xi − kj)^p  if xi > kj,  and  bij = 0 otherwise.
[Figure: piece-wise linear regression fit of y on x, together with the constant, linear and (x − 0.4)+ basis functions over x ∈ [0, 1].]



SPLINES, CONT.

▶ Note: given the knots, the non-parametric spline regression model is a
  linear regression of y on the m 'dummy variables' bj :

      y = Xb β + ε,

  where Xb is the basis regression matrix

      Xb = (b1, ..., bm).

▶ It is also common to include an intercept and the linear part of the
  model separately. In this case we have (see the sketch below)

      Xb = (1, x, b1, ..., bm).
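A minimal sketch (not part of the slides; assumes numpy) of how this truncated power spline basis matrix, including intercept and linear part, can be built:

import numpy as np

def spline_basis(x, knots, p=1):
    # X_b = (1, x, b_1, ..., b_m) with b_j = (x - k_j)^p for x > k_j, else 0.
    x = np.asarray(x, dtype=float)
    b = [np.where(x > kj, (x - kj) ** p, 0.0) for kj in knots]
    return np.column_stack([np.ones_like(x), x] + b)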



SMOOTHNESS PRIOR FOR SPLINES

▶ Problem: too many knots lead to over-fitting.
▶ Solution: smoothness/shrinkage/regularization prior

      βi | σ² ∼ N(0, σ²/λ),  iid

▶ Larger λ gives a smoother fit. Note: here we have Ω0 = λI.
▶ Equivalent to a penalized likelihood:

      −2 · log p(β | σ², y, X) ∝ RSS(β) + λβ′β

▶ Posterior mean gives the ridge regression estimator

      β̃ = (X′X + λI)⁻¹ X′y

▶ Shrinkage toward zero: as λ → ∞, β̃ → 0.
▶ When X′X = I,

      β̃ = (1/(1 + λ)) β̂OLS
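A brief sketch (not from the slides; assumes numpy) computing the posterior mean / ridge estimator above:

import numpy as np

def ridge_posterior_mean(y, X, lam):
    # beta_tilde = (X'X + lambda * I)^-1 X'y, the posterior mean under the smoothness prior.
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)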
BAYESIAN SPLINE WITH SMOOTHNESS PRIOR

[Figure: data and estimated E(y|x), LogRatio against Range, for λ = 0, λ = 0.5, λ = 1 and λ = 10.]
SMOOTHNESS PRIOR FOR SPLINES, CONT.

▶ The famous Lasso variable selection method is equivalent to using the
  posterior mode estimate under the prior:

      βi | σ² ∼ Laplace(0, σ²/λ),  iid

  with density

      p(βi) = λ/(2σ²) · exp(−λ|βi|/σ²)

▶ The Bayesian shrinkage prior is interpretable. Not ad hoc.
▶ The Laplace distribution has heavy tails.
   ▶ Laplace: many βi are close to zero, but some βi may be very large.
▶ The normal distribution has light tails.
   ▶ Normal prior: most βi are fairly equal in size, and no single βi can be
     very much larger than the other ones.
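A small sketch (not from the slides; assumes numpy) evaluating this Laplace prior density, e.g. for comparing its tails with the normal smoothness prior:

import numpy as np

def laplace_prior_density(beta, lam, sigma2):
    # p(beta_i) = lambda / (2 * sigma2) * exp(-lambda * |beta_i| / sigma2)
    return lam / (2.0 * sigma2) * np.exp(-lam * np.abs(beta) / sigma2)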
ESTIMATING THE SHRINKAGE

▶ How do we determine the degree of smoothness, λ? Cross-validation
  is one possible approach.
▶ Bayesian: λ is unknown ⇒ use a prior for λ.
▶ One possibility: λ ∼ Inv-χ²(η0, λ0). The user specifies η0 and λ0.
▶ Alternative approach: specify the prior on the degrees of freedom.
▶ Hierarchical setup:

      y | β, σ², X ∼ N(Xβ, σ² In)
      β | σ², λ ∼ N(0, σ²λ⁻¹ Im)
      σ² ∼ Inv-χ²(ν0, σ0²)
      λ ∼ Inv-χ²(η0, λ0)

  so Ω0 = λIm in the previous notation.



REGRESSION WITH ESTIMATED SHRINKAGE

▶ The joint posterior of β, σ² and λ is

      β | σ², λ, y ∼ N(μn, σ²Ωn⁻¹)
      σ² | λ, y ∼ Inv-χ²(νn, σn²)
      p(λ | y) ∝ √(|Ω0| / |X′X + Ω0|) · (νn σn²/2)^(−νn/2) · p(λ)

  where Ω0 = λIm, p(λ) is the prior for λ, and

      μn = (X′X + Ω0)⁻¹ X′y
      Ωn = X′X + Ω0
      νn = ν0 + n
      νn σn² = ν0 σ0² + y′y − μn′Ωn μn
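A hedged sketch (not from the slides; assumes numpy, and log_prior is a user-supplied function) of evaluating the unnormalized log marginal posterior of λ, e.g. over a grid of λ values:

import numpy as np

def log_post_lambda(lam, y, X, nu0, sigma02, log_prior):
    # Unnormalized log p(lambda | y) using the formulas above with Omega0 = lambda * I_m.
    n, m = X.shape
    Omega0 = lam * np.eye(m)
    Omega_n = X.T @ X + Omega0
    mu_n = np.linalg.solve(Omega_n, X.T @ y)
    nu_n = nu0 + n
    nu_sigma2_n = nu0 * sigma02 + y @ y - mu_n @ Omega_n @ mu_n

    _, logdet0 = np.linalg.slogdet(Omega0)
    _, logdetn = np.linalg.slogdet(Omega_n)
    return 0.5 * (logdet0 - logdetn) - 0.5 * nu_n * np.log(nu_sigma2_n / 2.0) + log_prior(lam)

# Evaluate over a grid of lambda values and exponentiate/normalize to approximate p(lambda | y).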



MORE COMPLEXITY

▶ The location of the knots can be treated as unknown, and estimated
  from the data. Joint posterior

      p(β, σ², λ, k1, ..., km | y, X)

▶ The marginal posterior for λ, k1, ..., km is a nightmare.
▶ MCMC can be used to simulate from the joint posterior. Li and Villani
  (2013, SJS).
▶ The basic spline model can be extended with:
   ▶ Heteroscedastic errors (also modelled with a spline)
   ▶ Non-normal errors (student-t or mixture distributions)
   ▶ Autocorrelated/dependent errors (AR process for the error term)
▶ MCMC can again be used to simulate from the joint posterior.
