
Lecture 1:

Basic Statistical Tools


Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Madison, 19-23 May 2014

Basic probability
Events are possible outcomes from some
random process

e.g., a genotype is AA, a phenotype is larger


than 20

Pr(E) denotes the probability of an event E


Pr(E) is between zero and one
The sum of the probabilities of all possible
nonoverlapping events is one.
e.g., if the possible events are E1, ..., Ek, then
Pr(E1) + ... + Pr(Ek) = 1

The AND rule


Consider two possible events, E1 and E2.
If these are independent (knowledge that
one has occurred does not change the
probability of the second), then the joint
probability Pr(E1, E2), the probability of E1
AND E2, is Pr(E1, E2) = Pr(E1) * Pr(E2)
Hence, with independence, AND = multiply
Conditional probability is used when the
events are NOT independent
3

Example
Consider the cross AaBbCc X aaBbCc

What is the probability of an aabbcc offspring?


Assuming independent assortment (no linkage)
= Pr(aa | Aa x aa) * Pr(bb | Bb x Bb) * Pr(cc | Cc x
Cc) = (1/2)(1/4)(1/4) = 1/32
How many offspring do we need to score to have a
90% probability of seeing at least one aabbcc?
Let p = 1/32. Prob(not seeing aabbcc in n
offspring) = (1-p)^n.
Prob(at least one) = 0.9 implies Prob(none) = 0.1
(1-p)^n = 0.1, or n = log(0.1)/log(1 - 1/32) = 72.5, so 73 offspring are needed
4
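A quick R check of this calculation (R is the language used for the computing examples later in these notes; the values are those given above):

p <- (1/2) * (1/4) * (1/4)   # Pr(aabbcc) = 1/32 under independent assortment
p                            # 0.03125
log(0.1) / log(1 - p)        # n = 72.5, so 73 offspring are needed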

The OR rule
Again, consider two possible events, E1 and E2.
If these events are NONOVERLAPPING (they contain
no common elements), then Pr(E1 or E2) = Pr(E1) +
Pr( E2)
Hence, OR = add
Example: What is the probability that a genotype is
A-, i.e., that it is AA or Aa?
The events genotype = AA and genotype = Aa
are nonoverlapping
Hence, Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa)

Conditional Probability
It is ALWAYS true that

Pr(A,B) = P(A|B)P(B) = P(B|A)P(A)


P(A|B) is the conditional probability of A given B
P(A) is the marginal probability of A
P(A,B) is the joint probability of A and B
If P(A|B) = P(A) for all possible B values, then A
and B are independent

Note that

P(A|B) = P(A,B)/P(B)

Examples of Prob (cont)


Recall that yellow peas (Y-) are dominant to green
peas (gg). Consider the F2 in a cross of YY x gg.
What is the probability of a yellow F2 offspring?
Pr(yellow) = Pr(YY or Yg)
= Pr(YY) + Pr(Yg) =1/4 + 1/2 = 3/4

What is the probability that a yellow F2 offspring is a YY


homozygote?
Pr(YY | F2 Yellow) = Pr(YY and F2 Yellow)/Pr(F2 yellow)
= (1/4)/(3/4) = 1/3.

Bayes Theorem
Suppose an unobservable random variable (RV) takes on
values b1, ..., bn
Suppose that we observe the outcome A of an RV correlated
with b. What can we say about b given A?
Bayes' theorem:
Pr(bj | A) = Pr(A | bj) Pr(bj) / Pr(A), where Pr(A) = Σi Pr(A | bi) Pr(bi)

A typical application in genetics is that A is some


phenotype and b indexes some underlying (but unknown)
genotype
8

Example: BRCA1/2 & Breast


cancer
NCI statistics:
12% is lifetime risk of breast cancer in females
60% is lifetime risk if carry BRCA 1 or 2 mutation
One estimate of BRCA 1 or 2 allele frequency is
around 2.3%.
Question: Given that a patient has breast cancer, what
is the chance that she has a BRCA 1 or BRCA 2
mutation?

Here

Event B = has a BRCA mutation


Event A = has breast cancer

Bayes: Pr(B|A) = Pr(A|B)* Pr(B)/Pr(A)


Pr(A) = 0.12
Pr(B) = 0.023
Pr(A|B) = 0.60
Hence, Pr(BRCA|Breast cancer)

= [0.60*0.023]/0.12 = 0.115

Hence, for the assumed BRCA frequency


(2.3%), 11.5% of all patients with breast
cancer have a BRCA mutation
10
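A minimal R sketch of the Bayes calculation above (values as quoted from NCI):

prA  <- 0.12     # Pr(breast cancer)
prB  <- 0.023    # Pr(carries a BRCA1/2 mutation)
prAB <- 0.60     # Pr(breast cancer | BRCA mutation)
prAB * prB / prA # Pr(BRCA | breast cancer) = 0.115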

Second example: Suppose height > 70. What is
the probability that the individual is QQ? Qq? qq?
Suppose:
Suppose:
Genotype:                     QQ     Qq     qq
Freq(genotype):               0.5    0.3    0.2
Pr(height > 70 | genotype):   0.3    0.6    0.9

Pr(height > 70) = 0.3*0.5 +0.6*0.3 + 0.9*0.2 = 0.51

Pr(QQ | height > 70)
= Pr(QQ) * Pr(height > 70 | QQ) / Pr(height > 70)
= 0.5*0.3 / 0.51 = 0.294

11
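The same calculation in R (a sketch using the values in the table above):

freq   <- c(QQ = 0.5, Qq = 0.3, qq = 0.2)   # Pr(genotype)
prTall <- c(QQ = 0.3, Qq = 0.6, qq = 0.9)   # Pr(height > 70 | genotype)
prA    <- sum(freq * prTall)                # Pr(height > 70) = 0.51
freq * prTall / prA                         # posteriors: 0.294, 0.353, 0.353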

Discrete Random Variables


A random variable (RV) = outcome (realization) not a set
value, but rather drawn from some probability distribution
A discrete RV x --- takes on values X1, X2, ..., Xk
Probability distribution: Pi = Pr(x = Xi)
Probabilities are non-negative
and sum to one

Pi >= 0,   Σi Pi = 1

Example: Suppose the probability of seeing no


individuals of genotype AABB in our sample is 0.1. What
is the probability of seeing at least one?
Pr(none) + Pr(at least one) = 1, hence
Pr(at least one) = 1-Pr(none) = 0.9
12

The Binomial Distribution

What is the probability of seeing k successes in a series
of n trials where the probability p of success is the
same for each trial?
This is given by the binomial distribution,
Pr(k successes | n, p) = n!/[(n-k)! k!] p^k (1-p)^(n-k)

Example: Suppose p = 0.05 and n = 10. What is the


probability of seeing EXACTLY one success?
Pr(k=1) = 10!/(9!*1!) * 0.05^1 * 0.95^9 = 10 * 0.05^1 * 0.95^9 = 0.315

What is the probability of seeing AT LEAST one


success?
Pr(k > 0) = 1 - Pr(k=0) = 1 - (1-0.05)^10 = 0.401

13
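These binomial probabilities can be checked with R's built-in functions (a sketch using the values above):

dbinom(1, size = 10, prob = 0.05)        # Pr(exactly one success) = 0.315
1 - dbinom(0, size = 10, prob = 0.05)    # Pr(at least one success) = 0.401
pbinom(0, 10, 0.05, lower.tail = FALSE)  # same value via the cdf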

The Poisson Distribution


Given that the expected number of successes
in our sample is λ, what is the probability
that we see k successes?
This is given by the Poisson distribution
Pr(k successes | λ) = e^(-λ) λ^k / k!

Example: suppose λ = 0.5.

Pr(k = 1) = e^(-0.5) 0.5^1 / 1! = 0.303


Pr(at least one success) = 1 - Pr(k = 0)
= 1 - e^(-0.5) = 0.393

Connection with the binomial: λ = n*p

Can either use Poisson as an approximation or


when the sample size n is not given

14
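The corresponding R functions for the Poisson (a sketch with λ = 0.5):

dpois(1, lambda = 0.5)      # Pr(k = 1) = 0.303
1 - dpois(0, lambda = 0.5)  # Pr(at least one success) = 1 - exp(-0.5) = 0.393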

The geometric distribution


Given success probability p per trial, how
many failures k occur before the first
success?
This is a waiting-time (as opposed to a
counting) problem, and is given by the
geometric distribution

Pr(k failures before a success) = (1-p)^k p


Example: Suppose p = 0.05. What is the
probability of AT LEAST one success in the first
10 trials?
= 1 - Pr(none in 1st 10) = 1 - (1-p)^10 = 0.401
15
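In R, dgeom/pgeom use the same parameterization (number of failures before the first success); a sketch for p = 0.05:

dgeom(2, prob = 0.05)   # Pr(exactly 2 failures before the first success) = 0.95^2 * 0.05
pgeom(9, prob = 0.05)   # Pr(first success within the first 10 trials) = 0.401
1 - (1 - 0.05)^10       # the same value computed directly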

Continuous Random Variables


A continuous RV x can take on any possible value in some interval (or
set of intervals). The probability distribution is defined by the
probability density function (or pdf), p(x)

Finally, the cdf, or cumulative probability function,


is defined as cdf(z) = Pr( x < z)

Example: The normal (or Gaussian)


distribution
Mean μ, variance σ^2

Unit normal
(mean 0, variance 1)

17

Mean (μ) = peak of the
distribution

The variance is a measure of spread about the mean. The


smaller σ^2, the narrower the distribution about the mean

18

Joint and Conditional Probabilities


The probability for a pair (x,y) of random variables is
specified by the joint probability density function, p(x,y)

The marginal density of x: p(x) = ∫ p(x,y) dy

Joint and Conditional Probabilities


p(y|x), the conditional density of y given x:
p(y|x) = p(x,y) / p(x)

Relationships among p(x), p(x,y), p(y|x):
p(x,y) = p(y|x) p(x) = p(x|y) p(y)


x and y are said to be independent if p(x,y) = p(x) * p(y)

Note that p(y|x) = p(y) if x and y are independent

20

Expectations of Random Variables


The expected value, E[f(x)], of some function f of the
random variable x is just the average value of that function:
E[f(x)] = Σi f(Xi) Pi (discrete case) or ∫ f(x) p(x) dx (continuous case)

E[x] = the (arithmetic) mean, μ, of the random variable x

21

Expectations of Random Variables


E[(x - μ)^2] = σ^2, the variance of x

More generally, the rth moment about the mean is given


by E[(x - μ)^r]
r = 2: variance (σ^2)
r = 3: skew (value is zero for a normal)
r = 4: (scaled) kurtosis (3σ^4 for a normal)
Useful properties of expectations
22

Covariances
Cov(x,y) = E[(x - μx)(y - μy)]

= E[x*y] - E[x]*E[y]

Cov(x,y) > 0: positive (linear) association between x & y
Cov(x,y) < 0: negative (linear) association between x & y
Cov(x,y) = 0: no linear association between x & y
[Scatterplots of y against x on the slides illustrate each of the three cases]
24

Cov(x,y) = 0 DOES NOT imply no association

[Scatterplot on the slide: a nonlinear association between x and y
for which cov(X,Y) = 0]
If x and y are independent, then cov(x,y) = 0


However, cov(x,y) = 0 DOES NOT imply that
x and y are independent.

25

Correlation
Cov = 10 tells us nothing about the strength of an
association
What is needed is an absolute measure of association
This is provided by the correlation, r(x,y):

r(x,y) = Cov(x,y) / sqrt[ Var(x) Var(y) ]
r = 1 implies a perfect (positive) linear association


r = - 1 implies a perfect (negative) linear association
26

Useful Properties of Variances and


Covariances
Symmetry, Cov(x,y) = Cov(y,x)
The covariance of a variable with itself is the
variance, Cov(x,x) = Var(x)
If a is a constant, then
Cov(ax,y) = a Cov(x,y)
Var(ax) = a^2 Var(x), since
Var(ax) = Cov(ax, ax) = a^2 Cov(x,x) = a^2 Var(x)
Cov(x+y,z) = Cov(x,z) + Cov(y,z)
27

More generally,
Var( Σi xi ) = Σi Var(xi) + 2 Σ(i<j) Cov(xi, xj)

Hence, the variance of a sum equals the sum of the


variances ONLY when the elements are uncorrelated
Question: What is Var(x-y)?
28

Regressions
Consider the best (linear) predictor of y given that we know x:
yhat = μy + b(y|x) (x - μx)

The slope of this linear regression is a function of the covariance:
b(y|x) = Cov(x,y) / Var(x)

The fraction of the variation in y accounted for by knowing
x is r^2; the residual variance is Var(y - yhat) = (1 - r^2) Var(y)
29

Relationship between the correlation and the regression
slope:
r(x,y) = b(y|x) * sqrt[ Var(x)/Var(y) ] = b(x|y) * sqrt[ Var(y)/Var(x) ]

If Var(x) = Var(y), then b(y|x) = b(x|y) = r(x,y)

In this case, the fraction of variation accounted for
by the regression is b^2

30

[Figure: scatterplots illustrating regressions with r^2 = 0.3, 0.6, 0.9, and 1.0]

31

Properties of Least-squares Regressions


The slope and intercept obtained by least-squares
minimize the sum of squared residuals,
Σi ei^2 = Σi (yi - a - b xi)^2
The average value of the residual is zero


The LS solution maximizes the amount of variation in
y that can be explained by a linear regression on x
Fraction of variance in y accounted by the regression
is r2
The residual errors around the least-squares regression
are uncorrelated with the predictor variable x
Homoscedastic vs. heteroscedastic residual variances

32

Different methods of analysis


Parameters of these various models can be
estimated in a number of frameworks
Method of moments

Very few assumptions are made about the underlying distribution.


Typically, the mean of some statistic has an expected value
of the parameter
Example: the estimate of the mean is given by the sample
mean, xbar, as E(xbar) = μ.
While estimation does not require distribution assumptions,
confidence intervals and hypothesis testing do

Distribution-based estimation

The explicit form of the distribution is used

33

Distribution-based estimation
Maximum likelihood estimation

MLE
REML
More in Lynch & Walsh (book) Appendix 3

Bayesian

More in Walsh & Lynch (online chapters = Vol 2)


Appendices 2,3

34

Maximum Likelihood
p(x1, ..., xn | θ) = density of the observed data (x1, ..., xn)
given the (unknown) distribution parameter(s) θ
Fisher suggested the method of maximum likelihood ---
given the data (x1, ..., xn), find the value(s) of θ that
maximize p(x1, ..., xn | θ)
We usually express p(x1, ..., xn | θ) as a likelihood
function l(θ | x1, ..., xn) to remind us that it is dependent
on the observed data
The Maximum Likelihood Estimator (MLE) of θ is the
value(s) that maximize the likelihood function l given
the observed data x1, ..., xn.

35

MLE of θ
The likelihood surface l(θ | x)

This is formalized by looking at the log-likelihood surface,
L = ln[l(θ | x)]. Since ln is a monotonic function, the
value of θ that maximizes l also maximizes L
The curvature of the likelihood surface in the neighborhood of
the MLE informs us as to the precision of the estimator.
A narrow peak = high precision. A broad peak = low precision
The larger the curvature, the smaller
the variance
36

Likelihood Ratio tests


Hypothesis testing in the ML frameworks occurs
through likelihood-ratio (LR) tests,

LR = -2 ln[ l(θhat_r | x) / l(θhat | x) ]

θhat_r is the MLE under the restricted conditions (some
parameters specified, e.g., var = 1)
θhat is the MLE under the unrestricted conditions (no
parameters specified)
For large sample sizes, LR (generally) approaches a
chi-square distribution with r df (r = the number of
parameters assigned fixed values under the null)

37

Bayesian Statistics
An extension of likelihood is Bayesian statistics
Instead of simply obtaining a point estimate (e.g., the
MLE), the goal is to estimate the entire distribution
for the unknown parameter θ given the data x
p(θ | x) = C * l(x | θ) p(θ)
p(θ | x) is the posterior distribution for θ given the data x
l(x | θ) is just the likelihood function
p(θ) is the prior distribution on θ.
38

Bayesian Statistics
Why Bayesian?
Exact for any sample size
Marginal posteriors
Efficient use of any prior information
MCMC (such as Gibbs sampling) methods
Priors quantify the strength of any prior information.
Often these are taken to be diffuse (with a high
variance), so that the prior weight on θ is spread over a wide
range of possible values.
39

p values in Hypothesis testing


The p value of a test statistic is the
probability of seeing a value as large (or
larger) under the null hypothesis
For example, suppose you are assuming a
random variable comes from a normal with
mean zero and variance one.

The probability of seeing a value more extreme


than 2 (i.e., greater than two or less than -2) is
0.0455, the p value associated with this value of
the test statistic.

40
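In R (a sketch; pnorm is the cdf of the unit normal):

2 * pnorm(-2)        # Pr(|Z| > 2) = 0.0455 for Z ~ N(0,1)
2 * (1 - pnorm(2))   # equivalent form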

Significance and multiple comparisons


One could either report a p value or use some criterion (e.g., any
test with a p value less than 0.01) that declares a test to be
significant (and hence a positive result)
p is the probability of a false positive, the probability of
declaring a test under the null as being significant.
The problem of multiple comparisons arises when a large
number of tests are performed.
Suppose our significance threshold is p = 0.005, but 1000
tests are done. Under the null, we still expect 0.005*1000
= 5 significant tests
Bonferroni corrections are done by first setting a significance level γ
for the entire COLLECTION of tests (say γ = 0.05). To have this
level of experiment-wide control of false positives, each test
must use p = γ/n
For n = 1000, an experiment-wide false positive rate
(probability) of 0.05 declares significance only when the p value
for a test is less than 0.05/1000 = 0.00005.

41

Power and Type I/II errors


A Type I error is the probability of declaring a
test to be significant when the null is true (a
false positive)
The power of a statistical test (a function of
the sample size and the true parameters) is
the probability of declaring a test to be
significant when the null is false.
A Type II error occurs when we fail to declare a
test significant when the null is false (i.e., a
false negative)

42

FDR, the false discovery rate


p is the probability of declaring a test under the null
to be significant (the false-positive rate)
When many tests are expected to be significant (i.e.,
looking for differences in expression over a large
number of genes), a more appropriate measure is
the false discovery rate (or FDR): the fraction of false
positives among all tests declared to be significant.

Example: Suppose 1000 tests are performed with a significance threshold of
p = 0.005. We expect 5 false positives, but suppose that
30 significant tests are found. Here the FDR = 5/30 =
0.167.
Hence, 16.7% of the positive tests are false positives

43

Lecture 1:
Intro/refresher in
Matrix Algebra
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Topics
Definitions, dimensionality, addition,
subtraction
Matrix multiplication
Inverses, solving systems of equations
Quadratic products and covariances
The multivariate normal distribution
Eigenstructure
Basic matrix calculations in R
The Singular Value Decomposition (SVD)
2

Matrices: An array of elements


Vectors: A matrix with either one row or one column.
Usually written in bold lowercase, e.g. a, b, c

Examples on the slide: a column vector (3 x 1) and a row vector (1 x 4)

Dimensionality of a matrix: r x c (rows x columns)


think of Railroad Car
3

General Matrices
Usually written in bold uppercase, e.g. A, C, D

Examples on the slide: a square matrix and a (3 x 2) matrix

Dimensionality of a matrix: r x c (rows x columns)


think of Railroad Car
A matrix is defined by a list of its elements.
B has ij-th element Bij -- the element in row i
and column j

Addition and Subtraction of Matrices


If two matrices have the same dimension (both are r x c),
then matrix addition and subtraction simply follows by
adding (or subtracting) on an element by element basis
Matrix addition: (A+B)ij = A ij + B ij
Matrix subtraction: (A-B)ij = A ij - B ij
Examples:

Partitioned Matrices
It will often prove useful to divide (or partition) the
elements of a matrix into a matrix whose elements are
itself matrices.

One useful partition is to write the matrix as


either a row vector of column vectors or
a column vector of row vectors

A column vector whose


elements are row vectors

A row vector whose


elements are column
vectors

Towards Matrix Multiplication: dot products


The dot (or inner) product of two vectors (both of
length n) is defined as follows:

Example:

a . b = 1*4 + 2*5 + 3*7 + 4*9 = 71


8

Matrices are compact ways to write


systems of equations

The least-squares solution for the linear model


yields the following system of equations for the βi.
This can be more compactly written in matrix form as

(X^T X) β = X^T y

or, equivalently,

β = (X^T X)^(-1) X^T y
10

Matrix Multiplication:
The order in which matrices are multiplied affects
the matrix product, e.g., in general AB ≠ BA
For the product of two matrices to exist, the matrices
must conform. For AB, the number of columns of A must
equal the number of rows of B.
The matrix C = AB has the same number of rows as A
and the same number of columns as B.

11

Outer indices give the dimensions of the


resulting matrix, with r rows (A)
and c columns (B)

C(rxc) = A(rxk) B(kxc)


Inner indices must match
columns of A = rows of B
Example: Is the product ABCD defined? If so, what
is its dimensionality? Suppose
A3x5 B5x9 C9x6 D6x23
Yes, defined, as inner indices match. Result is a 3 x 23
matrix (3 rows, 23 columns)
12

More formally, consider the product L = MN


Express the matrix M as a column vector of row vectors

13

Example

ORDER of multiplication matters! Indeed, consider


C3x5 D5x5 which gives a 3 x 5 matrix, versus D5x5 C3x5 ,
which is not defined.
14

Matrix multiplication in R
R fills in the matrix from
the list c by filling in as
columns, here with 2 rows
(nrow=2)
Entering A or B displays what was
entered (always a good thing to check)
The command %*% is the R code
for the multiplication of two matrices
On your own: What is the matrix resulting from BA?
What is A if nrow=1 or nrow=4 is used?
15
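The R code shown on this slide is not reproduced in these notes; a minimal sketch of the same idea, with made-up entries, is:

A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: 1st column (1,2), 2nd column (3,4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
A          # display what was entered (always a good thing to check)
A %*% B    # the matrix product AB
B %*% A    # generally differs from AB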

The Transpose of a Matrix


The transpose of a matrix exchanges the
rows and columns, (A^T)ij = Aji
Useful identities
(AB)^T = B^T A^T
(ABC)^T = C^T B^T A^T
Inner product = a^T b, with a^T (1 x n) and b (n x 1)
Indices match, so the matrices conform
The dimension of the resulting product is 1 x 1 (i.e., a scalar)
Note that b^T a = (b^T a)^T = a^T b
16

Outer product = a b^T, with a (n x 1) and b^T (1 x n)

The resulting product is an n x n matrix

17

R code for transposition


t(A) = transpose of A

Enter the column vector a

Compute inner product aTa


Compute outer product aaT

18
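A sketch of the transposition commands described above (the slide's actual vector is not reproduced; the values here are assumed):

a <- c(1, 2, 3)   # enter a column vector
t(a) %*% a        # inner product a^T a = 14 (returned as a 1 x 1 matrix)
a %*% t(a)        # outer product a a^T, a 3 x 3 matrix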

Solving equations
The identity matrix I

Serves the same role as 1 in scalar algebra, e.g.,


a*1=1*a =a, with AI=IA= A

The inverse matrix A^(-1) (IF it exists)

Defined by A A^(-1) = I, A^(-1) A = I

Serves the same role as scalar division

To solve ax = c, multiply both sides by (1/a) to give
(1/a)*a*x = 1*x = x, hence x = (1/a)c
To solve Ax = c, multiply both sides by A^(-1):
A^(-1) A x = I x = x = A^(-1) c

19

The Identity Matrix, I


The identity matrix serves the role of the
number 1 in matrix multiplication: AI =A, IA = A
I is a square diagonal matrix, with all diagonal elements
equal to one and all off-diagonal elements zero:
Iij = 1 for i = j, 0 otherwise

20

The Identity Matrix in R


diag(k), where k is an integer, returns the k x k identity matrix I

21
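For example:

I4 <- diag(4)   # the 4 x 4 identity matrix I
I4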

The Inverse Matrix, A^(-1)

For a square matrix A, define its inverse A^(-1) as
the matrix satisfying A^(-1) A = A A^(-1) = I

22

If det(A) is not zero, A^(-1) exists and A is said to be
non-singular. If det(A) = 0, A is singular, and no
unique inverse exists (generalized inverses do)
Generalized inverses, and their uses in solving systems
of equations, are discussed in Appendix 3 of Lynch &
Walsh
A^- is the typical notation to denote the G-inverse of a
matrix
When a G-inverse is used, provided the system is
consistent, then some of the variables have a family
of solutions (e.g., x1 =2, but x2 + x3 = 6)
23

Inversion in R
solve(A) computes A^(-1)
det(A) computes the determinant of A
Using the A entered earlier, the slide shows:
computing A^(-1),
verifying that A^(-1) A = I, and
computing the determinant of A
24
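A small self-contained example of these commands (the matrix A from the slide is not reproduced, so the values here are assumed):

A <- matrix(c(2, 1, 1, 3), nrow = 2)
Ainv <- solve(A)   # A^(-1)
Ainv %*% A         # recovers the 2 x 2 identity (up to rounding error)
det(A)             # = 5, nonzero, so A is nonsingular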

Homework
Put the following system of equations in matrix
form, and solve using R
3x1 + 4x2 + 4 x3 + 6x4 = -10
9x1 + 2x2 - x3 - 6x4 = 20
x1 + x2 + x3 - 10x4 = 2
2x1 + 9x2 + 2x3 - x4 = -10

25
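One way to set this up in R (this is essentially the answer, so treat it as a check on your own work):

A <- matrix(c(3, 4,  4,   6,
              9, 2, -1,  -6,
              1, 1,  1, -10,
              2, 9,  2,  -1), nrow = 4, byrow = TRUE)
c_vec <- c(-10, 20, 2, -10)
solve(A, c_vec)      # solves A x = c for x
solve(A) %*% c_vec   # same answer via the explicit inverse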

Example: solve the OLS for β in y = α + β1 z1 + β2 z2 + e

If σ(z1, z2) = 0, these reduce to the two univariate slopes.

Likewise, if the correlation between z1 and z2 is 1, this reduces to a univariate regression.


27

Useful identities

(A^T)^(-1) = (A^(-1))^T
(AB)^(-1) = B^(-1) A^(-1)
For a diagonal matrix D, then det (D), which is also
denoted by |D|, = product of the diagonal elements
Also, the determinant of any square matrix A,
det(A), is simply the product of the eigenvalues λ of A,
which satisfy
A e = λ e
If A is n x n, the λ are the roots of an n-degree polynomial. e is
the eigenvector associated with λ. If any of the roots
are zero, A^(-1) is not defined. In this case, for some
linear combination b, we have Ab = 0.
28

Variance-Covariance matrix
A very important square matrix is the
variance-covariance matrix V associated with
a vector x of random variables.
Vij = Cov(xi,xj), so that the i-th diagonal
element of V is the variance of xi, and off
-diagonal elements are covariances
V is a symmetric, square matrix

29

The trace
The trace, tr(A) or trace(A), of a square matrix
A is simply the sum of its diagonal elements
The importance of the trace is that it equals
the sum of the eigenvalues of A, tr(A) = Σ λi
For a covariance matrix V, tr(V) measures the
total amount of variation in the variables
λi / tr(V) is the fraction of the total variation
in x contained in the linear combination ei^T x, where
ei, the i-th principal component of V, is also the
i-th eigenvector of V (V ei = λi ei)

30

Eigenstructure in R
eigen(A) returns the eigenvalues and vectors of A

In the R example on the slide, the trace = 60 and the leading eigenvalue is 34.4, so
PC 1 accounts for 34.4/60 = 57% of all the variation
PC 1 is the linear combination 0.400*x1 - 0.139*x2 + 0.906*x3
31
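The covariance matrix behind the R output quoted above matches the V used in the variance example two slides below (variances 10, 20, 30; Cov(x1,x2) = -5, Cov(x1,x3) = 10, Cov(x2,x3) = 0), so the calculation can be reproduced as:

V <- matrix(c(10, -5, 10,
              -5, 20,  0,
              10,  0, 30), nrow = 3, byrow = TRUE)
ev <- eigen(V)
ev$values                     # eigenvalues; the largest is 34.4
ev$vectors[, 1]               # PC1 loadings: 0.400, -0.139, 0.906 (up to sign)
sum(diag(V))                  # trace = 60 = sum of the eigenvalues
ev$values[1] / sum(diag(V))   # fraction of variation along PC1 = 0.57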

Quadratic and Bilinear Forms


Quadratic product: for A (n x n) and x (n x 1),
x^T A x is a scalar (1 x 1)
Bilinear form (a generalization of the quadratic product):
for A (m x n), a (n x 1), b (m x 1), their bilinear form is b^T (1 x m) A (m x n) a (n x 1)

Note that b^T A a = a^T A^T b

32

Covariance Matrices for


Transformed Variables
What is the variance of the linear combination
c1 x1 + c2 x2 + ... + cn xn = c^T x? (note this is a scalar)
Var(c^T x) = c^T Var(x) c

Likewise, the covariance between two linear combinations
can be expressed as a bilinear form,
Cov(a^T x, b^T x) = a^T Var(x) b
33

Example: Suppose the variances of x1, x2, and x3 are


10, 20, and 30. x1 and x2 have a covariance of -5,
x1 and x3 of 10, while x2 and x3 are uncorrelated.
What are the variances of the indices
y1 = x1-2x2+5x3 and y2 = 6x2-4x3?

Var(y1) = Var(c1^T x) = c1^T Var(x) c1 = 960
Var(y2) = Var(c2^T x) = c2^T Var(x) c2 = 1200
Cov(y1, y2) = Cov(c1^T x, c2^T x) = c1^T Var(x) c2 = -910
Homework: use R to compute the above values (a sketch follows below)

34
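A sketch of the homework calculation (V is built from the variances and covariances stated above):

V  <- matrix(c(10, -5, 10,
               -5, 20,  0,
               10,  0, 30), nrow = 3, byrow = TRUE)   # Var(x)
c1 <- c(1, -2, 5)     # y1 = x1 - 2 x2 + 5 x3
c2 <- c(0, 6, -4)     # y2 = 6 x2 - 4 x3
t(c1) %*% V %*% c1    # Var(y1)    = 960
t(c2) %*% V %*% c2    # Var(y2)    = 1200
t(c1) %*% V %*% c2    # Cov(y1,y2) = -910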

The Multivariate Normal


Distribution (MVN)
Consider the pdf for n independent normal
random variables, the ith of which has mean
μi and variance σ^2_i

This can be expressed more compactly in matrix form


35

Define the covariance matrix V for the vector x of


the n normal random variable by

Define the mean vector μ by μi = E(xi). This gives,

in matrix form, the MVN pdf:

p(x) = (2π)^(-n/2) |V|^(-1/2) exp[ -(1/2) (x - μ)^T V^(-1) (x - μ) ]

Notice this holds for any mean vector μ and any symmetric,
positive-definite matrix V, as |V| > 0.
36

The multivariate normal


Just as a univariate normal is defined by
its mean and spread, a multivariate
normal is defined by its mean vector μ
(also called the centroid) and its
variance-covariance matrix V

37

Vector of means μ determines location;
spread (geometry) about μ is determined by V

[Contour plots on the slide:
x1, x2 with equal variances, positively correlated;
x1, x2 with equal variances, uncorrelated]

Eigenstructure (the eigenvectors and their corresponding


eigenvalues) determines the geometry of V.
38

Vector of means μ determines location;
spread (geometry) about μ is determined by V

[Contour plots on the slide:
x1, x2 with equal variances, negatively correlated;
Var(x1) < Var(x2), uncorrelated]

Positive tilt = positive correlation
Negative tilt = negative correlation
No tilt = uncorrelated

39

Eigenstructure of V
The direction of the largest axis of
variation is given by the unit-length
vector e1, the 1st eigenvector of V.

[Figure: the major axis of the probability contours lies along e1
(with eigenvalue λ1); the next axis lies along e2 (with eigenvalue λ2)]

The next largest axis of variation, orthogonal to
(at 90 degrees from) e1, is
given by e2, the 2nd eigenvector

40

Properties of the MVN - I


1) If x is MVN, any subset of the variables in x is also MVN
2) If x is MVN, any linear combination of the
elements of x is also MVN: if x ~ MVN(μ, V), then
y = a + A x ~ MVN(a + A μ, A V A^T)

41

Principal components
The principal components (or PCs) of a covariance
matrix define the axes of variation.

PC1 is the direction (linear combination cTx) that explains


the most variation.
PC2 is the next largest direction (at 90 degrees from PC1),
and so on

PCi = ith eigenvector of V


Fraction of variation accounted for by PCi = λi / trace(V)
If V has a few large eigenvalues, most of the variation
is distributed along a few linear combinations (axes
of variation)
The singular value decomposition is the
generalization of this idea to nonsquare matrices

42

Properties of the MVN - II


3) Conditional distributions are also MVN. Partition x
into two components, x1 (m dimensional column vector)
and x2 ( n-m dimensional column vector)

x1 | x2 is MVN with m-dimensional mean vector
μ(x1|x2) = μ1 + V12 V22^(-1) (x2 - μ2)
and m x m covariance matrix
V(x1|x2) = V11 - V12 V22^(-1) V21
(where the Vij are the blocks of V partitioned conformably with x1 and x2)
43

Properties of the MVN - III


4) If x is MVN, the regression of any subset of
x on another subset is linear and homoscedastic

Where e is MVN with mean vector 0 and


variance-covariance matrix

44

The regression is linear because it is a linear function
of x2
The regression is homoscedastic because the variance-covariance
matrix for e does not depend on the value of
the x's

All these matrices are constant, and hence


the same for any value of x

45

Example: Regression of Offspring value on Parental values


Assume the vector of offspring value and the values of
both its parents is MVN. Then from the correlations
among (outbred) relatives,

46

Regression of Offspring value on Parental values (cont.)

Where e is normal with mean zero and variance

47

Hence, the regression of offspring trait value given


the trait values of its parents is
zo = o + h2/2(zs- s) + h2/2(zd- d) + e
where the residual e is normal with mean zero and
Var(e) = z2(1-h4/2)
Similar logic gives the regression of offspring breeding
value on parental breeding value as
Ao = o + (As- s)/2 + (Ad- d)/2 + e
= As/2 + Ad/2 + e
where the residual e is normal with mean zero and
Var(e) = A2/2
48

49

50

A data set for soybeans grown in New York (Gauch 1992) gives the
GE matrix as

Where GEij = value for


Genotype i in envir. j

51

For example, the rank-1 SVD approximation for GE(3,2) is
λ1 g_31 e_12 = 746.10*(-0.66)*0.64 = -315
while the rank-2 SVD approximation is λ1 g_31 e_12 + λ2 g_32 e_22 =
746.10*(-0.66)*0.64 + 131.36*0.12*(-0.51) = -323
The actual value is -324
Generally, the rank-2 SVD approximation for GE(i,j) is
λ1 g_i1 e_1j + λ2 g_i2 e_2j

52
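The soybean GE matrix itself is not reproduced in these notes, but the structure of the calculation can be sketched in R with a made-up matrix:

GE <- matrix(c( 2, -1,  3,
               -4,  5, -2,
                1, -3,  4), nrow = 3, byrow = TRUE)   # hypothetical genotype x environment values
s  <- svd(GE)
s$d                                         # singular values (the lambda_k)
rank1 <- s$d[1] * s$u[, 1] %*% t(s$v[, 1])  # rank-1 approximation: lambda_1 u_1 v_1^T
rank1[3, 2]                                 # rank-1 approximation to GE[3, 2]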

Additional R matrix commands

53

Additional R matrix commands (cont)

54

Additional references
Lynch & Walsh Chapter 8 (intro to
matrices)
Online notes:
Appendix 4 (Matrix geometry)
Appendix 5 (Matrix derivatives)

55

Lecture 3:
Linear and Mixed Models
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Quick Review of the Major Points


The general linear model can be written as

y = Xβ + e
y = vector of observed dependent values
X = design matrix: observations of the variables in the
assumed linear model
β = vector of unknown parameters to estimate
e = vector of residuals (deviations from the model fit),
e = y - Xβ
2

y = Xβ + e
The solution for β depends on the covariance structure
(= covariance matrix) of the vector e of residuals
Ordinary least squares (OLS)
OLS: e ~ MVN(0, σ^2 I)
Residuals are homoscedastic and uncorrelated,
so that we can write the cov matrix of e as Cov(e) = σ^2 I
The OLS estimate is b_OLS = (X^T X)^(-1) X^T y
Generalized least squares (GLS)
GLS: e ~ MVN(0, V)
Residuals are heteroscedastic and/or dependent,
b_GLS = (X^T V^(-1) X)^(-1) X^T V^(-1) y
3

BLUE
Both the OLS and GLS solutions are also
called the Best Linear Unbiased Estimator (or
BLUE for short)
Whether the OLS or GLS form is used
depends on the assumed covariance
structure for the residuals
Special case of Var(e) = σ^2_e I -- OLS
All others, i.e., Var(e) = R -- GLS

Linear Models
One tries to explain a dependent variable y as a linear
function of a number of independent (or predictor)
variables.
A multiple regression is a typical linear model,
y = α + β1 x1 + β2 x2 + ... + βn xn + e
Here e is the residual, or deviation between the true
value observed and the value predicted by the linear
model.
The (partial) regression coefficients are interpreted
as follows: a unit change in xi while holding all
other variables constant results in a change of βi in y
5

Linear Models
As with a univariate regression (y = a + bx + e), the model
parameters are typically chosen by least squares,
wherein they are chosen to minimize the sum of
squared residuals, Σ ei^2
This unweighted sum of squared residuals assumes
an OLS error structure, so all residuals are equally
weighted (homoscedastic) and uncorrelated
If the residuals differ in variances and/or some are
correlated (GLS conditions), then we need to minimize
the weighted sum e^T V^(-1) e, which removes correlations and
gives all residuals equal variance.
6

Predictor and Indicator Variables


Suppose we measure the offspring of p sires. One
linear model would be
y_ij = μ + s_i + e_ij
y_ij = trait value of offspring j from sire i
μ = overall mean. This term is included to give the s_i
terms a mean value of zero, i.e., they are expressed
as deviations from the mean
s_i = the effect for sire i (the mean of its offspring). Recall
that the variance in the s_i estimates Cov(half sibs) = V_A/4
e_ij = the deviation of the jth offspring from the family
mean of sire i. The variance of the e's estimates the
within-family variance.

Predictor and Indicator Variables


In a regression, the predictor variables are
typically continuous, although they need not be.
y_ij = μ + s_i + e_ij
Note that the predictor variables here are the s_i (the
value associated with sire i), something that we are trying
to estimate
We can write this in linear model form, y_ij = μ + Σ_k x_ik s_k + e_ij,
by using indicator variables x_ik (x_ik = 1 if sire k = i, and 0 otherwise)

Models consisting entirely of indicator variables


are typically called ANOVA, or analysis of variance
models
Models that contain no indicator variables (other than
for the mean), but rather consist of observed value of
continuous or discrete values are typically called
regression models
Both are special cases of the General Linear Model
(or GLM)
y_ijk = μ + s_i + d_ij + β x_ijk + e_ijk
Example: Nested half sib/full sib design with an
age correction on the trait

Example: Nested half sib/full sib design with an


age correction on the trait
ANOVA terms (s_i, d_ij) plus a regression term (β x_ijk):
y_ijk = μ + s_i + d_ij + β x_ijk + e_ijk

s_i = effect of sire i
d_ij = effect of dam j crossed to sire i
x_ijk = age of the kth offspring from the i x j cross
10

Linear Models in Matrix Form


Suppose we have 3 variables in a multiple regression,
with four (y,x) vectors of observations.

The design matrix X. Details of both the experimental


design and the observed values of the predictor variables
all reside solely in X
11

In-class Exercise
Suppose you measure height and sprint speed for
five individuals, with heights (x) of 9, 10, 11, 12, 13
and associated sprint speeds (y) of 60, 138, 131, 170, 221
1) Write in matrix form (i.e, the design matrix
X and vector of unknowns) the following models
y = bx
y = a + bx
y = bx2
y = a + bx + cx2
2) Using the X and y associated with these models,
compute the OLS BLUE, b = (X^T X)^(-1) X^T y, for each

12
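A sketch of part 2 for the model y = a + bx (the other three models only change the design matrix X):

x <- c(9, 10, 11, 12, 13)
y <- c(60, 138, 131, 170, 221)
X <- cbind(1, x)                           # design matrix: intercept column plus x
beta <- solve(t(X) %*% X) %*% t(X) %*% y   # OLS estimate (X'X)^(-1) X'y
beta
lm(y ~ x)                                  # the same estimates from R's built-in fit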

Rank of the design matrix


With n observations and p unknowns, X is an n x p
matrix, so that XTX is p x p
Thus, at most X can provide unique estimates for up
to p < n parameters
The rank of X is the number of independent rows of
X. If X is of full rank, then rank = p
A parameter is said to be estimable if we can provide
a unique estimate of it. If the rank of X is k < p, then
exactly k parameters are estimable (some only as linear
combinations, e.g., β1 - 3β3 = 4)
If det(X^T X) = 0, then X is not of full rank
The number of nonzero eigenvalues of X^T X gives the
rank of X.
13

Experimental design and X


The structure of X determines not only which
parameters are estimable, but also the expected
sample variances, as Var(b) = k (X^T X)^(-1)
Experimental design determines the structure of X
before an experiment (of course, missing data
almost always mean the final X is different from the
proposed X)
Different criteria are used for an optimal design. Let V =
(X^T X)^(-1). The idea is to choose a design for X, given
the constraints of the experiment, that satisfies:
A-optimality: minimizes tr(V)
D-optimality: minimizes det(V)
E-optimality: minimizes the leading eigenvalue of V

14

Ordinary Least Squares (OLS)


When the covariance structure of the residuals has a
certain form, we solve for the vector using OLS
If residuals follow a MVN distribution, OLS = ML solution
If the residuals are homoscedastic and uncorrelated,
σ^2(e_i) = σ^2_e and σ(e_i, e_j) = 0, each residual is equally
weighted, and the sum of squared
residuals can be written as
SSE = Σ e_i^2 = e^T e = (y - Xb)^T (y - Xb)
15

Ordinary Least Squares (OLS)


Taking (matrix) derivatives shows this is minimized by
b = (X^T X)^(-1) X^T y

This is the OLS estimate of the vector β

The variance-covariance estimate for the sample estimates
is V_b = σ^2_e (X^T X)^(-1)

The ij-th element gives the covariance between the
estimates of βi and βj.

16

Sample Variances/Covariances
The residual variance can be estimated as
σhat^2_e = (y - Xb)^T (y - Xb) / (n - p)
The estimated residual variance can be substituted into
V_b = σ^2_e (X^T X)^(-1)
to give an approximation for the sampling variance and


covariances of our estimates.
Confidence intervals follow since the vector of estimates
b ~ MVN(β, V_b)
17

Example: Regression Through the Origin

y_i = β x_i + e_i
Here the OLS solution is b = Σ x_i y_i / Σ x_i^2, with Var(b) = σ^2_e / Σ x_i^2

18

Polynomial Regressions
The GLM can easily handle any function of the observed
predictor variables, provided the parameters to estimate
are still linear, e.g., y = α + β1 f(x) + β2 g(x) + ... + e
Quadratic regression:
y = α + β1 x + β2 x^2 + e
19

Interaction Effects
Interaction terms (e.g., sex x age) are handled similarly, e.g.,
y = α + β1 x1 + β2 x2 + β3 x1 x2 + e

With x1 held constant, a unit change in x2 changes y
by β2 + β3 x1 (i.e., the slope in x2 depends on the current
value of x1)
Likewise, a unit change in x1 changes y by β1 + β3 x2
20

The GLM lets you build your


own model!
Suppose you want a quadratic regression
forced through the origin where the slope of
the quadratic term can vary over the sexes
(pollen vs. seed parents)
y_i = β1 x_i + β2 x_i^2 + β3 s_i x_i^2 + e_i
s_i is an indicator (0/1) variable for the sex (0 =
male, 1 = female).
Male slope = β2,
female slope = β2 + β3

21

Generalized Least Squares (GLS)


Suppose the residuals no longer have the same
variance (i.e., display heteroscedasticity). Clearly
we do not wish to minimize the unweighted sum
of squared residuals, because those residuals with
smaller variance should receive more weight.
Likewise in the event the residuals are correlated,
we also wish to take this into account (i.e., perform
a suitable transformation to remove the correlations)
before minimizing the sum of squares.
Either of the above settings leads to a GLS solution
in place of an OLS solution.

22

In the GLS setting, the covariance matrix for the
vector e of residuals is written as R, where
R_ij = σ(e_i, e_j)
The linear model becomes y = Xβ + e, cov(e) = R
The GLS solution for β is
b = (X^T R^(-1) X)^(-1) X^T R^(-1) y

The variance-covariance matrix of the estimated model
parameters is given by
V_b = (X^T R^(-1) X)^(-1)

23

Model diagnostics
It's all about the residuals
Plot the residuals

Quick and easy screen for outliers

Test for normality among estimated residuals


Q-Q plot
Shapiro-Wilk test
If non-normal, try transformations, such as log

24

OLS, GLS summary

25
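A sketch contrasting the two estimators on a toy data set (the data and the residual covariance V are assumed, purely for illustration):

x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 8.1)
X <- cbind(1, x)
V <- diag(c(1, 1, 4, 4))   # heteroscedastic residual variances (assumed)
Vi <- solve(V)
b_gls <- solve(t(X) %*% Vi %*% X) %*% t(X) %*% Vi %*% y   # GLS: (X'V^-1 X)^-1 X'V^-1 y
b_ols <- solve(t(X) %*% X) %*% t(X) %*% y                 # OLS: (X'X)^-1 X'y
cbind(b_gls, b_ols)        # compare the two sets of estimates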

Fixed vs. Random Effects


In linear models we are trying to accomplish two goals:
estimate the values of the model parameters and estimate
any appropriate variances.
For example, in the simplest regression model,
y = α + βx + e, we estimate the values for α and β and
also the variance of e. We can, of course, also
estimate the e_i = y_i - (α + β x_i)
Note that α and β are fixed constants we are trying to
estimate (fixed factors or fixed effects), while the
e_i values are drawn from some probability distribution
(typically normal with mean 0, variance σ^2_e). The
e_i are random effects.

26

This distinction between fixed and random effects is
extremely important in terms of how we analyze a model.
If a parameter is a fixed constant we wish to estimate,
it is a fixed effect. If a parameter is drawn from
some probability distribution and we are trying to make
inferences on either the distribution and/or specific
realizations from this distribution, it is a random effect.
We generally speak of estimating fixed factors (BLUE) and
predicting random effects (BLUP -- best linear unbiased
predictor)
Mixed models (MM) contain both fixed and random factors:
y = Xβ + Zu + e, u ~ MVN(0, G), e ~ MVN(0, σ^2_e I)
Key: we need to specify the covariance structures for a MM

27

Example: Sire model


y_ij = μ + s_i + e_ij
Here μ is a fixed effect, e a random effect
Is the sire effect s fixed or random?
It depends. If we have (say) 10 sires and we are ONLY
interested in the values of these particular 10 sires and
don't care to make any other inferences about the
population from which the sires are drawn, then we can
treat them as fixed effects. In this case, the model is
fully specified by the covariance structure for the residuals.
Thus, we need to estimate μ, s_1 to s_10, and σ^2_e, and we
write the model as y_ij = μ + s_i + e_ij, σ^2(e) = σ^2_e I
28

Random effects models


It is often useful to treat certain effects as
random, as opposed to fixed

Suppose we have k effects. If we treat these as


fixed, we lose k degrees of freedom
If we assume each of the k realizations are drawn
from a normal with mean zero and unknown
variance, only one degree of freedom lost --- that
for estimating the variance
We can then predict the values of the k realizations

29

Environmental effects
Consider yield data measured over several years in a
series of plots.
Standard to treat year-to-year variation at a specific
site as being random effects
Often the plot effects (mean value over years) are
also treated as random.
For example, consider plants group in growing
region i, location j within that region, and year
(season) k for that location-region effect
E_ijk = R_i + L_ij + e_ijk

Typically R can be a fixed effect, while L and e are
random effects, L_ij ~ N(0, σ^2_L) and e_ijk ~ N(0, σ^2_e)

30

Random models
With a random model, one is assuming that
not all levels of a factor are observed.
Rather, the observed values are a subset drawn
from some underlying distribution

For example, year to year variation in rainfall at a


location. Each year is a random sample from the
long-term distribution of rainfall values
Typically, assume a functional form for this
underlying distribution (e.g., normal with mean 0)
and then use observations to estimate the
distribution parameters (here, the variance)

31

Random models (cont)


Key feature:

Only one degree of freedom used (estimate of


the variance)
Using the fixed effects and the estimated
underlying distribution parameters, one then
predicts the actual realizations of the individual
values (i.e., the year effects)
An assumption is needed about the covariance structure among the
individual realizations of the realized effects. If
only a variance is assumed, this implies they are
independent. If they are assumed to be
correlated, this structure must be estimated.

32

Random models
Lets go back to treating yearly effects as random
If assume these are uncorrelated, only use one
degree of freedom, but makes assumptions about
covariance structure
Standard: Uncorrelated
Option: some sort of autocorrelation process, say with a
yearly decay of r (must also be estimated)

Conversely, could all be treated as fixed, but would


use k degrees of freedom for k years, but no
assumptions on their relationships (covariance
structure)

33

y_ij = μ + s_i + e_ij
Conversely, if we are not only interested in these
10 particular sires but also wish to make some
inference about the population from which they
were drawn (such as the additive variance, since
σ^2_A = 4σ^2_s), then the s_i are random effects. In this
case we wish to estimate μ and the variances
σ^2_s and σ^2_w. Since 2s_i also estimates (or predicts)
the breeding value for sire i, we also wish to
estimate (predict) these as well. Under a
random-effects interpretation, we write the model as
y_ij = μ + s_i + e_ij, σ^2(e) = σ^2_e I, σ^2(s) = σ^2_A A

34

Identifiability
Recall that a fixed effect is said to be
estimable if we can obtain a unique estimate
for it (either because X is of full rank or when
using a generalized inverse it returns a
unique estimate)
Lack of estimability arises when the experimental
design confounds effects

The analogous term for random models is


identifiability

The variance components have unique estimates


35

The general mixed model


y = Xβ + Zu + e
y = vector of observations (phenotypes)
β = vector of fixed effects (to be estimated),
e.g., year, sex and age effects
X = incidence (design) matrix for the fixed effects
u = vector of random effects, such as individual
breeding values (to be estimated)
Z = incidence matrix for the random effects
e = vector of residual errors (random effects)
36

The general mixed model


y = Xβ + Zu + e
y = vector of observations (phenotypes)
β = vector of fixed effects
X = incidence matrix for the fixed effects
u = vector of random effects
Z = incidence matrix for the random effects
e = vector of residual errors

We observe y, X, and Z.
We estimate the fixed effects β
and predict the random effects u and e

37

Means & Variances for y = Xβ + Zu + e

Means: E(u) = E(e) = 0, so E(y) = Xβ
Variances:
Let R be the covariance matrix for the
residuals. We typically assume R = σ^2_e I
Let G be the covariance matrix for the vector
u of random effects
The covariance matrix for y becomes
V = Z G Z^T + R
Hence, y ~ MVN(Xβ, V)
Mean Xβ due to fixed effects
Variance V due to random effects

38

Chi-square and F distributions


Let U_i ~ N(0,1), i.e., a unit normal
The sum U_1^2 + U_2^2 + ... + U_k^2 is a chi-square random
variable with k degrees of freedom
Under appropriate normality assumptions, the
sums of squares that appear in linear models
are also chi-square distributed. In particular,
SSE/σ^2_e ~ χ^2 with (n - p) degrees of freedom

The ratio of two chi-squares is an F distribution


39

In particular, an F distribution with k numerator
degrees of freedom and n denominator degrees
of freedom is given by

F(k,n) = (χ^2_k / k) / (χ^2_n / n), with the two chi-squares independent

The expected value of a chi-square with k degrees


of freedom is k, hence numerator and denominator
both have expected value one
F distributions frequently arise in tests
of linear models, as these usually involve ratios
of sums of squares.
40

Sums of Squares in linear models


The total sums of squares (SST) of a linear model
can be written as the sum of the error (or residual)
sum of squares and the model (or regression) sum
of squares
SST = SSM + SSE

r^2, the coefficient of determination, is the
fraction of variation accounted for by the model:

r^2 = SSM/SST = 1 - SSE/SST

41

Sums of Squares are quadratic products

SST = Σi (yi - ybar)^2, which we can write as a quadratic product,

SST = y^T (I - J/n) y
where J is a matrix all of whose elements are 1's

42

Expected value of sums of


squares
In ANOVA tables, the E(MS), or expected
value of the Mean Squares (scaled SS or Sum
of Squares), often appears
This directly follows from the quadratic
product: if E(x) = μ and Var(x) = V, then
E(x^T A x) = tr(AV) + μ^T A μ

43

Hypothesis testing
Provided the residual errors in the model are MVN, then for a model
with n observations and p estimated parameters,
SSE/σ^2_e ~ χ^2 with (n - p) degrees of freedom

Consider the comparison of a full (p parameters)
and a reduced (q < p parameters) model, where SSE_r = error SS for the
reduced model and SSE_f = error SS for the full model

The difference in the error sums of squares for the full and reduced
models provides a test of whether the model fits are the same:

F = [ (SSE_r - SSE_f)/(p - q) ] / [ SSE_f/(n - p) ]

This ratio follows an F(p-q, n-p) distribution

44

Does our model account for a significant fraction of the


variation?
Here the reduced model is just y_i = μ + e_i
In this case, the error sum of squares for the
reduced model is just the total sum of squares,
and the F-test ratio becomes

F = [ SSM/(p - 1) ] / [ SSE/(n - p) ]

This ratio follows an F(p-1, n-p) distribution


45

Different statistical models


GLM = general linear model

OLS ordinary least squares: e ~ MVN(0,cI)


GLS generalized least squares: e ~ MVN(0,R)

Mixed models

Both fixed and random effects (beyond the residual)

Mixture models

A weighted mixture of distributions

Generalized linear models

Nonlinear functions, non-normality

46

Mixture models
Under a mixture model, an observation potentially
comes from one of several different distributions, so
that the density function is π1 φ1 + π2 φ2 + π3 φ3

The mixture proportions πi sum to one

The φi represent the different distributions, e.g., normal with mean μi
and variance σ^2

Mixture models come up in QTL mapping -- an


individual could have QTL genotype QQ, Qq, or qq
See Lynch & Walsh Chapter 13

They also come up in codon models of evolution, where a
site may be neutral, deleterious, or advantageous, each

site may be neutral, deleterious, or advantageous, each
with a different distribution of selection coefficients
See Walsh & Lynch (volume 2A website), Chapters 10,11

47

Generalized linear models

Typically assume non-normal distribution for


residuals, e.g., Poisson, binomial, gamma, etc

48

Likelihoods for GLMs


Under the assumption of MVN, x ~ MVN(μ, V), the likelihood
function becomes

L(μ, V | x) = (2π)^(-n/2) |V|^(-1/2) exp[ -(1/2) (x - μ)^T V^(-1) (x - μ) ]

Variance components (e.g., σ^2_A, σ^2_e, etc.) are included in V

REML = restricted maximum likelihood. Method of


choice for variance components, as it maximizes
that part of the likelihood function that is independent
of the fixed effects, β.
49

Lecture 4:
Introduction to Quantitative
Genetics
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Basic model of Quantitative Genetics


Phenotypic value -- we will occasionally
also use z for this value

Basic model: P = G + E

Environmental value

Genotypic value

G = average phenotypic value for that genotype


if we are able to replicate it over the universe
of environmental values, G = E[P]
Hence, genotypic values are functions of the
environments experienced.

Basic model of Quantitative Genetics


Basic model: P = G + E
G = average phenotypic value for that genotype
if we are able to replicate it over the universe
of environmental values, G = E[P]
G = average value of an inbred line over a series
of environments
G x E interaction --- The performance of a particular
genotype in a particular environment differs from
the sum of the average performance of that
genotype over all environments and the average
performance of that environment over all genotypes.
Basic model now becomes P = G + E + GE
3

East (1911) data


on US maize
crosses

[Figure: trait distributions for the parental lines (P1, P2), the F1, and the F2]

Each sample (P1, P2, F1) has the same G, so all variation in
P within these samples is due to variation in E:
Var(P) = Var(E)

The F2 shows variation in G, so
Var(P) = Var(G) + Var(E)

Hence Var(F2) > Var(F1), due to the variation in G

Johannsen (1903) bean data


Johannsen had a series of fully inbred
(= pure) lines.
There was a consistent between-line
difference in the mean bean size
Differences in G across lines

However, within a given line, size of


parental seed is independent of the size of
offspring seed
No variation in G within a line

The transmission of genotypes versus


alleles
With fully inbred lines, offspring have the same genotype as
their parent, and hence the entire parental genotypic value G is
passed along
Hence, favorable interactions between alleles (such as with
dominance) are not lost by randomization under random mating
but rather passed along.

When offspring are generated by crossing (or random mating),


each parent contributes a single allele at each locus to its
offspring, and hence only passes along a PART of its genotypic
value
This part is determined by the average effect of the allele

The downside is that favorable interactions between alleles are NOT
passed along to their offspring in a diploid (but, as we will see, they are
in an autotetraploid)

Genotypic values
It will prove very useful to decompose the genotypic
value into the difference between homozygotes (2a) and
a measure of dominance (d or k = d/a)
Genotype:   aa      Aa      AA
Value:      C - a   C + d   C + a

Note that the constant C is the average value of


the two homozygotes.
If no dominance, d = 0, as heterozygote value equals
the average of the two parents. Can also write d = ka,
so that G(Aa) = C + ak

10

Computing a and d
Suppose a major locus influences plant height, with
the following values
Genotype:      aa    Aa    AA
Trait value:   10    15    16

C = [G(AA) + G(aa)]/2 = (16+10)/2 = 13
a = [G(AA) - G(aa)]/2 = (16-10)/2 = 3
d = G(Aa) - [G(AA) + G(aa)]/2
  = G(Aa) - C = 15 - 13 = 2

11
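The same arithmetic in R:

Gaa <- 10; GAa <- 15; GAA <- 16   # genotypic values from the table above
C <- (GAA + Gaa) / 2              # midpoint of the homozygotes = 13
a <- (GAA - Gaa) / 2              # half the homozygote difference = 3
d <- GAa - C                      # dominance deviation = 2
k <- d / a                        # scaled dominance d/a = 2/3
c(C = C, a = a, d = d, k = k)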

Population means: Random mating


Let p = freq(A), q = 1-p = freq(a). Assuming
random-mating (Hardy-Weinberg frequencies),
Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    q^2     2pq     p^2

Mean = q^2 (C - a) + 2pq (C + d) + p^2 (C + a)

μ_RM = C + a(p - q) + 2pq d
where a(p - q) is the contribution from the homozygotes
and 2pq d is the contribution from the heterozygotes

12

Population means: Inbred cross F2


Suppose two inbred lines are crossed. If A is fixed
in one population and a in the other, then p = q = 1/2
Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    1/4     1/2     1/4

Mean = (1/4)(C - a) + (1/2)(C + d) + (1/4)(C + a)

μ_F2 = C + d/2
Note that C is the average of the two parental lines, so when d
> 0, F2 exceeds this. Note also that the F1 exceeds
this average by d, so only half of this passed onto F2.

13

Population means: RILs from an F2


A large number of F2 individuals are fully inbred, either by selfing
for many generations or by generating doubled haploids. If p and
q denote the F2 frequencies of A and a, what is the expected
mean over the set of resulting RILs?

Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    q       0       p

μ_RILs = C + a(p - q)
Note this is independent of the amount of dominance (d)
14

The average effect of an allele


The average effect α_A of an allele A is defined as the
difference between the mean of offspring that receive allele A and the mean of a
random offspring:
α_A = mean(offspring value given parent transmits A) - mean(all offspring)
Similar definition for α_a.
Note that while C, a, and d (the genotypic
parameters) do not change with allele frequency, α_x
is clearly a function of the frequencies of the alleles with
which allele x combines.

15

Random mating
Consider the average effect of allele A when a parent is randomly
mated to another individual from its population.

Suppose the parent contributes A:

Allele from other parent:   A        a
Probability:                p        q
Offspring genotype:         AA       Aa
Value:                      C + a    C + d

Mean(A transmitted) = p(C + a) + q(C + d) = C + pa + qd

α_A = Mean(A transmitted) - μ_RM = q[a + d(q - p)]

16

Random mating
Now suppose the parent contributes a:

Allele from other parent:   A        a
Probability:                p        q
Offspring genotype:         Aa       aa
Value:                      C + d    C - a

Mean(a transmitted) = p(C + d) + q(C - a) = C - qa + pd

α_a = Mean(a transmitted) - μ_RM = -p[a + d(q - p)]

17

α, the average effect of an
allelic substitution

α = α_A - α_a is the average effect of an allelic
substitution, the change in the mean trait value when an
a allele in a random individual is replaced by an A
allele

α = a + d(q - p). Note that
α_A = qα and α_a = -pα.

E(α_X) = p α_A + q α_a = pqα - qpα = 0:
the average effect of a random allele is zero,
hence average effects are deviations from the
mean
18

Dominance deviations
Fisher (1918) decomposed the contribution
to the genotypic value from a single locus as
G_ij = μ + α_i + α_j + δ_ij
Here, μ is the mean (a function of p)
and the α_i are the average effects.
Hence, μ + α_i + α_j is the predicted genotypic
value given the average effects (over all
genotypes) of alleles i and j.
The dominance deviation δ_ij associated with
genotype G_ij is the difference between its true
value and the value predicted from the sum of
average effects (essentially a residual)

19

Fishers (1918) Decomposition of G


One of Fishers key insights was that the genotypic value
consists of a fraction that can be passed from parent to
offspring and a fraction that cannot.
In particular, under sexual reproduction, parents only
pass along SINGLE ALLELES to their offspring.
Consider the genotypic value G_ij resulting from an
A_iA_j individual:

G_ij = μ_G + α_i + α_j + δ_ij
α_i = average contribution to the genotypic value from allele i
Mean value μ_G = Σ G_ij Freq(A_iA_j)

20

G_ij = μ_G + α_i + α_j + δ_ij
Since parents pass along single alleles to their
offspring, the α_i (the average effect of allele i)
represent these contributions
The average effect for an allele is POPULATION-SPECIFIC,
as it depends on the types and frequencies
of the alleles that it pairs with
The genotypic value predicted from the individual
allelic effects is thus
Ghat_ij = μ_G + α_i + α_j
21

G_ij = μ_G + α_i + α_j + δ_ij
The genotypic value predicted from the individual
allelic effects is Ghat_ij = μ_G + α_i + α_j
Dominance deviations --- the difference (for genotype
A_iA_j) between the genotypic value predicted from the
two single alleles and the actual genotypic value:

δ_ij = G_ij - Ghat_ij
22

[Figure: regression of genotypic value on N = the number of copies of allele 2.
The genotypic values G_11, G_21, G_22 are plotted at N = 0, 1, 2
(genotypes A1A1, A1A2, A2A2). The fitted line has slope α = α_2 - α_1,
and the deviations of the genotypic values from the line are the
dominance deviations δ_11, δ_21, δ_22.]
23

Fisher's decomposition is a regression

G_ij = μ_G + α_i + α_j + δ_ij
(predicted value plus residual error)

A notational change clearly shows this is a regression:

G_ij = μ_G + 2α_1 + (α_2 - α_1) N + δ_ij
Independent (predictor) variable N = # of A_2 alleles
Note that the slope α_2 - α_1 = α, the average effect
of an allelic substitution
24

G_ij = μ_G + 2α_1 + (α_2 - α_1) N + δ_ij
Here μ_G + 2α_1 is the intercept and (α_2 - α_1) is the regression slope

A key point is that the average effects change with
allele frequencies. Indeed, if overdominance is present
they can change sign with allele frequencies.
25

Allele A_2 common, α_1 > α_2

[Figure: the genotypic values G_11, G_21, G_22 plotted against N = 0, 1, 2,
with a regression slope α_2 - α_1 that is negative]
The size of the circle denotes the weight associated with
that genotype. While the genotypic values do not change,
their frequencies (and hence weights) do.
26

Allele A_1 common, α_2 > α_1

[Figure: the same genotypic values plotted against N = 0, 1, 2,
now with a positive regression slope = α_2 - α_1]

Again, same genotypic values as previous slide, but


different weights, and hence a different slope
(here a change in sign!)

27

Both A_1 and A_2 frequent, α_1 = α_2 = 0

[Figure: the same genotypic values plotted against N; the fitted
regression line is flat (slope = 0)]

With these allele frequencies, both alleles have the same


mean value when transmitted, so that all parents have the
same average offspring value -- no response to selection
28

Average Effects and Additive Genetic Values


The α values are the average effects of an allele
A key concept is the Additive Genetic Value (A) of
an individual:

A(G_ij) = α_i + α_j for a single locus, and
A = Σ_k ( α_i^(k) + α_j^(k) ) summed over loci,

where α_i^(k) = effect of allele i at locus k

A is called the Breeding value or the Additive genetic
value

29

Why all the fuss over A?


Suppose pollen parent has A = 10 and seed parent has
A = -2 for plant height
Expected average offspring height is (10 - 2)/2
= 4 units above the population mean. Offspring A =
average of parental As
KEY: parents only pass single alleles to their offspring.
Hence, they only pass along the A part of their genotypic
value G
30

Genetic Variances
Writing the genotypic value as

G_ij = μ_G + (α_i + α_j) + δ_ij

the genetic variance can be written as

σ^2(G) = σ^2(α_i + α_j) + σ^2(δ_ij)

This follows since Cov(α, δ) = 0
31

Genetic Variances

σ^2(α_i + α_j) = the Additive Genetic Variance, σ^2_A
(or simply the additive variance)

σ^2(δ_ij) = the Dominance Genetic Variance, σ^2_D
(or simply the dominance variance)

Hence, the total genetic variance = additive + dominance
variances,

σ^2_G = σ^2_A + σ^2_D

32

Key concepts (so far)

α_i = average effect of allele i

Property of a single allele in a particular population


(depends on genetic background)

A = Additive Genetic Value (A)

A = sum (over all loci) of average effects


Fraction of G that parents pass along to their offspring
Property of an Individual in a particular population

Var(A) = additive genetic variance


Variance in additive genetic values
Property of a population

Can estimate A or Var(A) without knowing any of the


underlying genetical detail (forthcoming)

33

Genotype:   Q1Q1   Q1Q2      Q2Q2
Value:      0      a(1+k)    2a

Since E[α] = 0,
Var(α) = E[(α - μ_α)^2] = E[α^2]
For one locus with two alleles (frequencies p1 and p2):
σ^2_A = 2 p1 p2 a^2 [ 1 + k(p1 - p2) ]^2

When dominance is present, the additive variance is an
asymmetric function of the allele frequencies
34

Dominance variance

Genotype:   Q1Q1   Q1Q2      Q2Q2
Value:      0      a(1+k)    2a

σ^2_D = (2 p1 p2 a k)^2

This is a symmetric function of the
allele frequencies

It can also be expressed in terms of d = ak:
σ^2_D = (2 p1 p2 d)^2


35

[Figure: the additive variance V_A as a function of allele frequency p
when there is no dominance (k = 0)]

[Figure: V_A and V_D as functions of allele frequency p under
complete dominance (k = 1)]
37

Epistasis

With epistasis, the genotypic value is further decomposed into additive,
dominance, and epistatic (interaction) components. These components are
defined to be uncorrelated (or orthogonal), so that

σ^2_G = σ^2_A + σ^2_D + σ^2_AA + σ^2_AD + σ^2_DD
38

Additive x Additive interactions (αα), AA:
interactions between a single allele
at one locus and a single allele at another
Additive x Dominance interactions (αδ), AD:
interactions between an allele at one
locus and the genotype at another, e.g.,
allele A_i and genotype B_kj
Dominance x Dominance interactions (δδ), DD:
the interaction between the dominance
deviation at one locus and the dominance
deviation at another.

39
