
Lecture 1:

Basic Statistical Tools


Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Madison, 19-23 May 2014

Basic probability
Events are possible outcomes from some
random process

e.g., a genotype is AA, a phenotype is larger


than 20

Pr(E) denotes the probability of an event E


Pr(E) is between zero and one
The sum of the probabilities of all possible
nonoverlapping events is one.
e.g., if the possible events are E1, ..., Ek, then
Pr(E1) + ... + Pr(Ek) = 1

The AND rule


Consider two possible events, E1 and E2.
If these are independent (knowledge that
one has occurred does not change the
probability of the second), then the joint
probability Pr(E1, E2), the probability of E1
AND E2, is Pr(E1, E2) = Pr(E1) * Pr(E2)
Hence, with independence, AND = multiply
Conditional probability is used when the
events are NOT independent
3

Example
Consider the cross AaBbCc X aaBbCc

What is the probability of an aabbcc offspring?


Assuming independent assortment (no linkage)
= Pr(aa | Aa x aa) * Pr(bb | Bb x Bb) * Pr(cc | Cc x
Cc) = (1/2)(1/4)(1/4) = 1/32
How many offspring do we need to score to have a
90% probability of seeing at least one aabbcc?
Let p = 1/32. Prob(not seeing aabbcc in n
offspring) = (1-p)^n.
Prob(at least one) = 0.9 implies Prob(none) = 0.1
(1-p)^n = 0.1, or n = log(0.1)/log(1 - 1/32) = 72.5, so 73 offspring are needed
4
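A quick R check of this calculation (R is the language used for the computing examples later in these notes; the values are those given above):

p <- (1/2) * (1/4) * (1/4)   # Pr(aabbcc) = 1/32 under independent assortment
p                            # 0.03125
log(0.1) / log(1 - p)        # n = 72.5, so 73 offspring are needed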

The OR rule
Again, consider two possible events, E1 and E2.
If these events are NONOVERLAPPING (they contain
no common elements), then Pr(E1 or E2) = Pr(E1) +
Pr( E2)
Hence, OR = add
Example: What is the probability that a genotype is
A-, i.e., that it is AA or Aa?
The events genotype = AA and genotype = Aa
are nonoverlapping
Hence, Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa)

Conditional Probability
It is ALWAYS true that

Pr(A,B) = P(A|B)P(B) = P(B|A)P(A)


P(A|B) is the conditional probability of A given B
P(A) is the marginal probability of A
P(A,B) is the joint probability of A and B
If P(A|B) = P(A) for all possible B values, then A
and B are independent

Note that

P(A|B) = P(A,B)/P(B)

Examples of Prob (cont)


Recall that yellow peas (Y-) are dominant to green
peas (gg). Consider the F2 in a cross of YY x gg.
What is the probability of a yellow F2 offspring?
Pr(yellow) = Pr(YY or Yg)
= Pr(YY) + Pr(Yg) =1/4 + 1/2 = 3/4

What is the probability that a yellow F2 offspring is a YY


homozygote?
Pr(YY | F2 Yellow) = Pr(YY and F2 Yellow)/Pr(F2 yellow)
= (1/4)/(3/4) = 1/3.

Bayes Theorem
Suppose an unobservable random variable (RV) takes on
values b1, ..., bn
Suppose that we observe the outcome A of an RV correlated
with b. What can we say about b given A?
Bayes' theorem:
Pr(bj | A) = Pr(A | bj) Pr(bj) / Pr(A), where Pr(A) = Σi Pr(A | bi) Pr(bi)

A typical application in genetics is that A is some


phenotype and b indexes some underlying (but unknown)
genotype
8

Example: BRCA1/2 & Breast


cancer
NCI statistics:
12% is lifetime risk of breast cancer in females
60% is lifetime risk if carry BRCA 1 or 2 mutation
One estimate of BRCA 1 or 2 allele frequency is
around 2.3%.
Question: Given that a patient has breast cancer, what
is the chance that she has a BRCA 1 or BRCA 2
mutation?

Here

Event B = has a BRCA mutation


Event A = has breast cancer

Bayes: Pr(B|A) = Pr(A|B)* Pr(B)/Pr(A)


Pr(A) = 0.12
Pr(B) = 0.023
Pr(A|B) = 0.60
Hence, Pr(BRCA|Breast cancer)

= [0.60*0.023]/0.12 = 0.115

Hence, for the assumed BRCA frequency


(2.3%), 11.5% of all patients with breast
cancer have a BRCA mutation
10
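A minimal R sketch of the Bayes calculation above (values as quoted from NCI):

prA  <- 0.12     # Pr(breast cancer)
prB  <- 0.023    # Pr(carries a BRCA1/2 mutation)
prAB <- 0.60     # Pr(breast cancer | BRCA mutation)
prAB * prB / prA # Pr(BRCA | breast cancer) = 0.115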

Second example: Suppose height > 70. What is
the probability that the individual is QQ? Qq? qq?
Suppose:
Suppose:
Genotype:                     QQ     Qq     qq
Freq(genotype):               0.5    0.3    0.2
Pr(height > 70 | genotype):   0.3    0.6    0.9

Pr(height > 70) = 0.3*0.5 +0.6*0.3 + 0.9*0.2 = 0.51

Pr(QQ | height > 70)
= Pr(QQ) * Pr(height > 70 | QQ) / Pr(height > 70)
= 0.5*0.3 / 0.51 = 0.294

11
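The same calculation in R (a sketch using the values in the table above):

freq   <- c(QQ = 0.5, Qq = 0.3, qq = 0.2)   # Pr(genotype)
prTall <- c(QQ = 0.3, Qq = 0.6, qq = 0.9)   # Pr(height > 70 | genotype)
prA    <- sum(freq * prTall)                # Pr(height > 70) = 0.51
freq * prTall / prA                         # posteriors: 0.294, 0.353, 0.353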

Discrete Random Variables


A random variable (RV) = outcome (realization) not a set
value, but rather drawn from some probability distribution
A discrete RV x --- takes on values X1, X2, ..., Xk
Probability distribution: Pi = Pr(x = Xi)
Probabilities are non-negative
and sum to one

Pi >= 0,   Σi Pi = 1

Example: Suppose the probability of seeing no


individuals of genotype AABB in our sample is 0.1. What
is the probability of seeing at least one?
Pr(none) + Pr(at least one) = 1, hence
Pr(at least one) = 1-Pr(none) = 0.9
12

The Binomial Distribution

What is the probability of seeing k successes in a series
of n trials where the probability p of success is the
same for each trial?
This is given by the binomial distribution,
Pr(k successes | n, p) = n!/[(n-k)! k!] p^k (1-p)^(n-k)

Example: Suppose p = 0.05 and n = 10. What is the


probability of seeing EXACTLY one success?
Pr(k=1) = 10!/(9!*1!) * 0.05^1 * 0.95^9 = 10 * 0.05^1 * 0.95^9 = 0.315

What is the probability of seeing AT LEAST one


success?
Pr(k > 0) = 1 - Pr(k=0) = 1 - (1-0.05)^10 = 0.401

13
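These binomial probabilities can be checked with R's built-in functions (a sketch using the values above):

dbinom(1, size = 10, prob = 0.05)        # Pr(exactly one success) = 0.315
1 - dbinom(0, size = 10, prob = 0.05)    # Pr(at least one success) = 0.401
pbinom(0, 10, 0.05, lower.tail = FALSE)  # same value via the cdf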

The Poisson Distribution


Given that the expected number of successes
in our sample is λ, what is the probability
that we see k successes?
This is given by the Poisson distribution
Pr(k successes | λ) = e^(-λ) λ^k / k!

Example: suppose λ = 0.5.

Pr(k = 1) = e^(-0.5) 0.5^1 / 1! = 0.303


Pr(at least one success) = 1 - Pr(k = 0)
= 1 - e^(-0.5) = 0.393

Connection with the binomial: λ = n*p

Can either use Poisson as an approximation or


when the sample size n is not given

14
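The corresponding R functions for the Poisson (a sketch with λ = 0.5):

dpois(1, lambda = 0.5)      # Pr(k = 1) = 0.303
1 - dpois(0, lambda = 0.5)  # Pr(at least one success) = 1 - exp(-0.5) = 0.393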

The geometric distribution


Given success probability p per trial, how
many failures k occur before the first
success?
This is a waiting-time (as opposed to a
counting) problem, and is given by the
geometric distribution

Pr(k failures before a success) = (1-p)^k p


Example: Suppose p = 0.05. What is the
probability of AT LEAST one success in the first
10 trials?
= 1 - Pr(none in 1st 10) = 1 - (1-p)^10 = 0.401
15
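In R, dgeom/pgeom use the same parameterization (number of failures before the first success); a sketch for p = 0.05:

dgeom(2, prob = 0.05)   # Pr(exactly 2 failures before the first success) = 0.95^2 * 0.05
pgeom(9, prob = 0.05)   # Pr(first success within the first 10 trials) = 0.401
1 - (1 - 0.05)^10       # the same value computed directly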

Continuous Random Variables


A continuous RV x can take on any possible value in some interval (or
set of intervals). The probability distribution is defined by the
probability density function (or pdf), p(x)

Finally, the cdf, or cumulative probability function,


is defined as cdf(z) = Pr( x < z)

Example: The normal (or Gaussian)


distribution
Mean μ, variance σ^2

Unit normal
(mean 0, variance 1)

17

Mean (μ) = peak of the
distribution

The variance is a measure of spread about the mean. The


smaller σ^2, the narrower the distribution about the mean

18

Joint and Conditional Probabilities


The probability for a pair (x,y) of random variables is
specified by the joint probability density function, p(x,y)

The marginal density of x: p(x) = ∫ p(x,y) dy

Joint and Conditional Probabilities


p(y|x), the conditional density of y given x:
p(y|x) = p(x,y) / p(x)

Relationships among p(x), p(x,y), p(y|x):
p(x,y) = p(y|x) p(x) = p(x|y) p(y)


x and y are said to be independent if p(x,y) = p(x) * p(y)

Note that p(y|x) = p(y) if x and y are independent

20

Expectations of Random Variables


The expected value, E[f(x)], of some function f of the
random variable x is just the average value of that function:
E[f(x)] = Σi f(Xi) Pi (discrete case) or ∫ f(x) p(x) dx (continuous case)

E[x] = the (arithmetic) mean, μ, of the random variable x

21

Expectations of Random Variables


E[(x - μ)^2] = σ^2, the variance of x

More generally, the rth moment about the mean is given


by E[(x - μ)^r]
r = 2: variance (σ^2)
r = 3: skew (value is zero for a normal)
r = 4: (scaled) kurtosis (3σ^4 for a normal)
Useful properties of expectations
22

Covariances
Cov(x,y) = E[(x - μx)(y - μy)]

= E[x*y] - E[x]*E[y]

Cov(x,y) > 0: positive (linear) association between x & y
Cov(x,y) < 0: negative (linear) association between x & y
Cov(x,y) = 0: no linear association between x & y
[Scatterplots of y against x on the slides illustrate each of the three cases]
24

Cov(x,y) = 0 DOES NOT imply no association

[Scatterplot on the slide: a nonlinear association between x and y
for which cov(X,Y) = 0]
If x and y are independent, then cov(x,y) = 0


However, cov(x,y) = 0 DOES NOT imply that
x and y are independent.

25

Correlation
Cov = 10 tells us nothing about the strength of an
association
What is needed is an absolute measure of association
This is provided by the correlation, r(x,y):

r(x,y) = Cov(x,y) / sqrt[ Var(x) Var(y) ]
r = 1 implies a perfect (positive) linear association


r = - 1 implies a perfect (negative) linear association
26

Useful Properties of Variances and


Covariances
Symmetry, Cov(x,y) = Cov(y,x)
The covariance of a variable with itself is the
variance, Cov(x,x) = Var(x)
If a is a constant, then
Cov(ax,y) = a Cov(x,y)
Var(ax) = a^2 Var(x), since
Var(ax) = Cov(ax, ax) = a^2 Cov(x,x) = a^2 Var(x)
Cov(x+y,z) = Cov(x,z) + Cov(y,z)
27

More generally,
Var( Σi xi ) = Σi Var(xi) + 2 Σ(i<j) Cov(xi, xj)

Hence, the variance of a sum equals the sum of the


variances ONLY when the elements are uncorrelated
Question: What is Var(x-y)?
28

Regressions
Consider the best (linear) predictor of y given that we know x:
yhat = μy + b(y|x) (x - μx)

The slope of this linear regression is a function of the covariance:
b(y|x) = Cov(x,y) / Var(x)

The fraction of the variation in y accounted for by knowing
x is r^2; the residual variance is Var(y - yhat) = (1 - r^2) Var(y)
29

Relationship between the correlation and the regression
slope:
r(x,y) = b(y|x) * sqrt[ Var(x)/Var(y) ] = b(x|y) * sqrt[ Var(y)/Var(x) ]

If Var(x) = Var(y), then b(y|x) = b(x|y) = r(x,y)

In this case, the fraction of variation accounted for
by the regression is b^2

30

[Figure: scatterplots illustrating regressions with r^2 = 0.3, 0.6, 0.9, and 1.0]

31

Properties of Least-squares Regressions


The slope and intercept obtained by least-squares
minimize the sum of squared residuals,
Σi ei^2 = Σi (yi - a - b xi)^2
The average value of the residual is zero


The LS solution maximizes the amount of variation in
y that can be explained by a linear regression on x
Fraction of variance in y accounted by the regression
is r2
The residual errors around the least-squares regression
are uncorrelated with the predictor variable x
Homoscedastic vs. heteroscedastic residual variances

32

Different methods of analysis


Parameters of these various models can be
estimated in a number of frameworks
Method of moments

Very few assumptions are made about the underlying distribution.


Typically, the mean of some statistic has an expected value
of the parameter
Example: the estimate of the mean is given by the sample
mean, xbar, as E(xbar) = μ.
While estimation does not require distribution assumptions,
confidence intervals and hypothesis testing do

Distribution-based estimation

The explicit form of the distribution is used

33

Distribution-based estimation
Maximum likelihood estimation

MLE
REML
More in Lynch & Walsh (book) Appendix 3

Bayesian

More in Walsh & Lynch (online chapters = Vol 2)


Appendices 2,3

34

Maximum Likelihood
p(x1, ..., xn | θ) = density of the observed data (x1, ..., xn)
given the (unknown) distribution parameter(s) θ
Fisher suggested the method of maximum likelihood ---
given the data (x1, ..., xn), find the value(s) of θ that
maximize p(x1, ..., xn | θ)
We usually express p(x1, ..., xn | θ) as a likelihood
function l(θ | x1, ..., xn) to remind us that it is dependent
on the observed data
The Maximum Likelihood Estimator (MLE) of θ is the
value(s) that maximize the likelihood function l given
the observed data x1, ..., xn.

35

MLE of θ
The likelihood surface l(θ | x)

This is formalized by looking at the log-likelihood surface,
L = ln[l(θ | x)]. Since ln is a monotonic function, the
value of θ that maximizes l also maximizes L
The curvature of the likelihood surface in the neighborhood of
the MLE informs us as to the precision of the estimator.
A narrow peak = high precision. A broad peak = low precision
The larger the curvature, the smaller
the variance
36

Likelihood Ratio tests


Hypothesis testing in the ML frameworks occurs
through likelihood-ratio (LR) tests,

LR = -2 ln[ l(θhat_r | x) / l(θhat | x) ]

θhat_r is the MLE under the restricted conditions (some
parameters specified, e.g., var = 1)
θhat is the MLE under the unrestricted conditions (no
parameters specified)
For large sample sizes, LR (generally) approaches a
chi-square distribution with r df (r = the number of
parameters assigned fixed values under the null)

37

Bayesian Statistics
An extension of likelihood is Bayesian statistics
Instead of simply obtaining a point estimate (e.g., the
MLE), the goal is to estimate the entire distribution
for the unknown parameter θ given the data x
p(θ | x) = C * l(x | θ) p(θ)
p(θ | x) is the posterior distribution for θ given the data x
l(x | θ) is just the likelihood function
p(θ) is the prior distribution on θ.
38

Bayesian Statistics
Why Bayesian?
Exact for any sample size
Marginal posteriors
Efficient use of any prior information
MCMC (such as Gibbs sampling) methods
Priors quantify the strength of any prior information.
Often these are taken to be diffuse (with a high
variance), so that the prior weight on θ is spread over a wide
range of possible values.
39

p values in Hypothesis testing


The p value of a test statistic is the
probability of seeing a value as large (or
larger) under the null hypothesis
For example, suppose you are assuming a
random variable comes from a normal with
mean zero and variance one.

The probability of seeing a value more extreme


than 2 (i.e., greater than two or less than -2) is
0.0455, the p value associated with this value of
the test statistic.

40
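In R (a sketch; pnorm is the cdf of the unit normal):

2 * pnorm(-2)        # Pr(|Z| > 2) = 0.0455 for Z ~ N(0,1)
2 * (1 - pnorm(2))   # equivalent form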

Significance and multiple comparisons


One could either report a p value or use some criterion (e.g., any
test with a p value less than 0.01) that declares a test to be
significant (and hence a positive result)
p is the probability of a false positive, the probability of
declaring a test under the null as being significant.
The problem of multiple comparisons arises when a large
number of tests are performed.
Suppose our significance threshold is p = 0.005, but 1000
tests are done. Under the null, we still expect 0.005*1000
= 5 significant tests
Bonferroni corrections are done by first setting a significance level γ
for the entire COLLECTION of tests (say γ = 0.05). To have this
level of experiment-wide control of false positives, each test
must use p = γ/n
For n = 1000, an experiment-wide false positive rate
(probability) of 0.05 declares significance only when the p value
for a test is less than 0.05/1000 = 0.00005.

41

Power and Type I/II errors


A Type I error is the probability of declaring a
test to be significant when the null is true (a
false positive)
The power of a statistical test (a function of
the sample size and the true parameters) is
the probability of declaring a test to be
significant when the null is false.
A Type II error occurs when we fail to declare a
test significant when the null is false (i.e., a
false negative)

42

FDR, the false discovery rate


p is the probability of declaring a test under the null
to be significant (the false-positive rate)
When many tests are expected to be significant (i.e.,
looking for differences in expression over a large
number of genes), a more appropriate measure is
the false discovery rate (or FDR): the fraction of false
positives among all tests declared to be significant.

Example: Suppose 1000 tests are performed with a significance threshold of
p = 0.005. We expect 5 false positives, but suppose that
30 significant tests are found. Here the FDR = 5/30 =
0.167.
Hence, 16.7% of the positive tests are false positives

43

Lecture 1:
Intro/refresher in
Matrix Algebra
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Topics
Definitions, dimensionality, addition,
subtraction
Matrix multiplication
Inverses, solving systems of equations
Quadratic products and covariances
The multivariate normal distribution
Eigenstructure
Basic matrix calculations in R
The Singular Value Decomposition (SVD)
2

Matrices: An array of elements


Vectors: A matrix with either one row or one column.
Usually written in bold lowercase, e.g. a, b, c

Examples on the slide: a column vector (3 x 1) and a row vector (1 x 4)

Dimensionality of a matrix: r x c (rows x columns)


think of Railroad Car
3

General Matrices
Usually written in bold uppercase, e.g. A, C, D

Examples on the slide: a square matrix and a (3 x 2) matrix

Dimensionality of a matrix: r x c (rows x columns)


think of Railroad Car
A matrix is defined by a list of its elements.
B has ij-th element Bij -- the element in row i
and column j

Addition and Subtraction of Matrices


If two matrices have the same dimension (both are r x c),
then matrix addition and subtraction simply follows by
adding (or subtracting) on an element by element basis
Matrix addition: (A+B)ij = A ij + B ij
Matrix subtraction: (A-B)ij = A ij - B ij
Examples:

Partitioned Matrices
It will often prove useful to divide (or partition) the
elements of a matrix into a matrix whose elements are
itself matrices.

One useful partition is to write the matrix as


either a row vector of column vectors or
a column vector of row vectors

A column vector whose


elements are row vectors

A row vector whose


elements are column
vectors

Towards Matrix Multiplication: dot products


The dot (or inner) product of two vectors (both of
length n) is defined as follows:

Example:

a . b = 1*4 + 2*5 + 3*7 + 4*9 = 71


8

Matrices are compact ways to write


systems of equations

The least-squares solution for the linear model


yields the following system of equations for the βi.
This can be more compactly written in matrix form as

(X^T X) β = X^T y

or, equivalently,

β = (X^T X)^(-1) X^T y
10

Matrix Multiplication:
The order in which matrices are multiplied affects
the matrix product, e.g., in general AB ≠ BA
For the product of two matrices to exist, the matrices
must conform. For AB, the number of columns of A must
equal the number of rows of B.
The matrix C = AB has the same number of rows as A
and the same number of columns as B.

11

Outer indices give the dimensions of the


resulting matrix, with r rows (A)
and c columns (B)

C(rxc) = A(rxk) B(kxc)


Inner indices must match
columns of A = rows of B
Example: Is the product ABCD defined? If so, what
is its dimensionality? Suppose
A3x5 B5x9 C9x6 D6x23
Yes, defined, as inner indices match. Result is a 3 x 23
matrix (3 rows, 23 columns)
12

More formally, consider the product L = MN


Express the matrix M as a column vector of row vectors

13

Example

ORDER of multiplication matters! Indeed, consider


C3x5 D5x5 which gives a 3 x 5 matrix, versus D5x5 C3x5 ,
which is not defined.
14

Matrix multiplication in R
R fills in the matrix from
the list c by filling in as
columns, here with 2 rows
(nrow=2)
Entering A or B displays what was
entered (always a good thing to check)
The command %*% is the R code
for the multiplication of two matrices
On your own: What is the matrix resulting from BA?
What is A if nrow=1 or nrow=4 is used?
15
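The R code shown on this slide is not reproduced in these notes; a minimal sketch of the same idea, with made-up entries, is:

A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: 1st column (1,2), 2nd column (3,4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
A          # display what was entered (always a good thing to check)
A %*% B    # the matrix product AB
B %*% A    # generally differs from AB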

The Transpose of a Matrix


The transpose of a matrix exchanges the
rows and columns, (A^T)ij = Aji
Useful identities
(AB)^T = B^T A^T
(ABC)^T = C^T B^T A^T
Inner product = a^T b, with a^T (1 x n) and b (n x 1)
Indices match, so the matrices conform
The dimension of the resulting product is 1 x 1 (i.e., a scalar)
Note that b^T a = (b^T a)^T = a^T b
16

Outer product = a b^T, with a (n x 1) and b^T (1 x n)

The resulting product is an n x n matrix

17

R code for transposition


t(A) = transpose of A

Enter the column vector a

Compute inner product aTa


Compute outer product aaT

18
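A sketch of the transposition commands described above (the slide's actual vector is not reproduced; the values here are assumed):

a <- c(1, 2, 3)   # enter a column vector
t(a) %*% a        # inner product a^T a = 14 (returned as a 1 x 1 matrix)
a %*% t(a)        # outer product a a^T, a 3 x 3 matrix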

Solving equations
The identity matrix I

Serves the same role as 1 in scalar algebra, e.g.,


a*1=1*a =a, with AI=IA= A

The inverse matrix A^(-1) (IF it exists)

Defined by A A^(-1) = I, A^(-1) A = I

Serves the same role as scalar division

To solve ax = c, multiply both sides by (1/a) to give
(1/a)*a*x = 1*x = x, hence x = (1/a)c
To solve Ax = c, multiply both sides by A^(-1):
A^(-1) A x = I x = x = A^(-1) c

19

The Identity Matrix, I


The identity matrix serves the role of the
number 1 in matrix multiplication: AI =A, IA = A
I is a square diagonal matrix, with all diagonal elements
equal to one and all off-diagonal elements zero:
Iij = 1 for i = j, 0 otherwise

20

The Identity Matrix in R


diag(k), where k is an integer, returns the k x k identity matrix I

21
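For example:

I4 <- diag(4)   # the 4 x 4 identity matrix I
I4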

The Inverse Matrix, A^(-1)

For a square matrix A, define its inverse A^(-1) as
the matrix satisfying A^(-1) A = A A^(-1) = I

22

If det(A) is not zero, A^(-1) exists and A is said to be
non-singular. If det(A) = 0, A is singular, and no
unique inverse exists (generalized inverses do)
Generalized inverses, and their uses in solving systems
of equations, are discussed in Appendix 3 of Lynch &
Walsh
A^- is the typical notation to denote the G-inverse of a
matrix
When a G-inverse is used, provided the system is
consistent, then some of the variables have a family
of solutions (e.g., x1 =2, but x2 + x3 = 6)
23

Inversion in R
solve(A) computes A^(-1)
det(A) computes the determinant of A
Using the A entered earlier, the slide shows:
computing A^(-1),
verifying that A^(-1) A = I, and
computing the determinant of A
24
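A small self-contained example of these commands (the matrix A from the slide is not reproduced, so the values here are assumed):

A <- matrix(c(2, 1, 1, 3), nrow = 2)
Ainv <- solve(A)   # A^(-1)
Ainv %*% A         # recovers the 2 x 2 identity (up to rounding error)
det(A)             # = 5, nonzero, so A is nonsingular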

Homework
Put the following system of equations in matrix
form, and solve using R
3x1 + 4x2 + 4 x3 + 6x4 = -10
9x1 + 2x2 - x3 - 6x4 = 20
x1 + x2 + x3 - 10x4 = 2
2x1 + 9x2 + 2x3 - x4 = -10

25
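One way to set this up in R (this is essentially the answer, so treat it as a check on your own work):

A <- matrix(c(3, 4,  4,   6,
              9, 2, -1,  -6,
              1, 1,  1, -10,
              2, 9,  2,  -1), nrow = 4, byrow = TRUE)
c_vec <- c(-10, 20, 2, -10)
solve(A, c_vec)      # solves A x = c for x
solve(A) %*% c_vec   # same answer via the explicit inverse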

Example: solve the OLS for β in y = α + β1 z1 + β2 z2 + e

If σ(z1, z2) = 0, these reduce to the two univariate slopes.

Likewise, if the correlation between z1 and z2 is 1, this reduces to a univariate regression.


27

Useful identities

(A^T)^(-1) = (A^(-1))^T
(AB)^(-1) = B^(-1) A^(-1)
For a diagonal matrix D, then det (D), which is also
denoted by |D|, = product of the diagonal elements
Also, the determinant of any square matrix A,
det(A), is simply the product of the eigenvalues λ of A,
which satisfy
A e = λ e
If A is n x n, the λ are the roots of an n-degree polynomial. e is
the eigenvector associated with λ. If any of the roots
are zero, A^(-1) is not defined. In this case, for some
linear combination b, we have Ab = 0.
28

Variance-Covariance matrix
A very important square matrix is the
variance-covariance matrix V associated with
a vector x of random variables.
Vij = Cov(xi,xj), so that the i-th diagonal
element of V is the variance of xi, and off
-diagonal elements are covariances
V is a symmetric, square matrix

29

The trace
The trace, tr(A) or trace(A), of a square matrix
A is simply the sum of its diagonal elements
The importance of the trace is that it equals
the sum of the eigenvalues of A, tr(A) = Σ λi
For a covariance matrix V, tr(V) measures the
total amount of variation in the variables
λi / tr(V) is the fraction of the total variation
in x contained in the linear combination ei^T x, where
ei, the i-th principal component of V, is also the
i-th eigenvector of V (V ei = λi ei)

30

Eigenstructure in R
eigen(A) returns the eigenvalues and vectors of A

In the R example on the slide, the trace = 60 and the leading eigenvalue is 34.4, so
PC 1 accounts for 34.4/60 = 57% of all the variation
PC 1 is the linear combination 0.400*x1 - 0.139*x2 + 0.906*x3
31
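The covariance matrix behind the R output quoted above matches the V used in the variance example two slides below (variances 10, 20, 30; Cov(x1,x2) = -5, Cov(x1,x3) = 10, Cov(x2,x3) = 0), so the calculation can be reproduced as:

V <- matrix(c(10, -5, 10,
              -5, 20,  0,
              10,  0, 30), nrow = 3, byrow = TRUE)
ev <- eigen(V)
ev$values                     # eigenvalues; the largest is 34.4
ev$vectors[, 1]               # PC1 loadings: 0.400, -0.139, 0.906 (up to sign)
sum(diag(V))                  # trace = 60 = sum of the eigenvalues
ev$values[1] / sum(diag(V))   # fraction of variation along PC1 = 0.57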

Quadratic and Bilinear Forms


Quadratic product: for A (n x n) and x (n x 1),
x^T A x is a scalar (1 x 1)
Bilinear form (a generalization of the quadratic product):
for A (m x n), a (n x 1), b (m x 1), their bilinear form is b^T (1 x m) A (m x n) a (n x 1)

Note that b^T A a = a^T A^T b

32

Covariance Matrices for


Transformed Variables
What is the variance of the linear combination
c1 x1 + c2 x2 + ... + cn xn = c^T x? (note this is a scalar)
Var(c^T x) = c^T Var(x) c

Likewise, the covariance between two linear combinations
can be expressed as a bilinear form,
Cov(a^T x, b^T x) = a^T Var(x) b
33

Example: Suppose the variances of x1, x2, and x3 are


10, 20, and 30. x1 and x2 have a covariance of -5,
x1 and x3 of 10, while x2 and x3 are uncorrelated.
What are the variances of the indices
y1 = x1-2x2+5x3 and y2 = 6x2-4x3?

Var(y1) = Var(c1^T x) = c1^T Var(x) c1 = 960
Var(y2) = Var(c2^T x) = c2^T Var(x) c2 = 1200
Cov(y1, y2) = Cov(c1^T x, c2^T x) = c1^T Var(x) c2 = -910
Homework: use R to compute the above values (a sketch follows below)

34
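A sketch of the homework calculation (V is built from the variances and covariances stated above):

V  <- matrix(c(10, -5, 10,
               -5, 20,  0,
               10,  0, 30), nrow = 3, byrow = TRUE)   # Var(x)
c1 <- c(1, -2, 5)     # y1 = x1 - 2 x2 + 5 x3
c2 <- c(0, 6, -4)     # y2 = 6 x2 - 4 x3
t(c1) %*% V %*% c1    # Var(y1)    = 960
t(c2) %*% V %*% c2    # Var(y2)    = 1200
t(c1) %*% V %*% c2    # Cov(y1,y2) = -910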

The Multivariate Normal


Distribution (MVN)
Consider the pdf for n independent normal
random variables, the ith of which has mean
μi and variance σ^2_i

This can be expressed more compactly in matrix form


35

Define the covariance matrix V for the vector x of


the n normal random variable by

Define the mean vector μ by μi = E(xi). This gives,

in matrix form, the MVN pdf:

p(x) = (2π)^(-n/2) |V|^(-1/2) exp[ -(1/2) (x - μ)^T V^(-1) (x - μ) ]

Notice this holds for any mean vector μ and any symmetric,
positive-definite matrix V, as |V| > 0.
36

The multivariate normal


Just as a univariate normal is defined by
its mean and spread, a multivariate
normal is defined by its mean vector μ
(also called the centroid) and its
variance-covariance matrix V

37

Vector of means μ determines location;
spread (geometry) about μ is determined by V

[Contour plots on the slide:
x1, x2 with equal variances, positively correlated;
x1, x2 with equal variances, uncorrelated]

Eigenstructure (the eigenvectors and their corresponding


eigenvalues) determines the geometry of V.
38

Vector of means μ determines location;
spread (geometry) about μ is determined by V

[Contour plots on the slide:
x1, x2 with equal variances, negatively correlated;
Var(x1) < Var(x2), uncorrelated]

Positive tilt = positive correlation
Negative tilt = negative correlation
No tilt = uncorrelated

39

Eigenstructure of V
The direction of the largest axis of
variation is given by the unit-length
vector e1, the 1st eigenvector of V.

[Figure: the major axis of the probability contours lies along e1
(with eigenvalue λ1); the next axis lies along e2 (with eigenvalue λ2)]

The next largest axis of variation, orthogonal to
(at 90 degrees from) e1, is
given by e2, the 2nd eigenvector

40

Properties of the MVN - I


1) If x is MVN, any subset of the variables in x is also MVN
2) If x is MVN, any linear combination of the
elements of x is also MVN: if x ~ MVN(μ, V), then
y = a + A x ~ MVN(a + A μ, A V A^T)

41

Principal components
The principal components (or PCs) of a covariance
matrix define the axes of variation.

PC1 is the direction (linear combination cTx) that explains


the most variation.
PC2 is the next largest direction (at 90 degrees from PC1),
and so on

PCi = ith eigenvector of V


Fraction of variation accounted for by PCi = λi / trace(V)
If V has a few large eigenvalues, most of the variation
is distributed along a few linear combinations (axes
of variation)
The singular value decomposition is the
generalization of this idea to nonsquare matrices

42

Properties of the MVN - II


3) Conditional distributions are also MVN. Partition x
into two components, x1 (m dimensional column vector)
and x2 ( n-m dimensional column vector)

x1 | x2 is MVN with m-dimensional mean vector
μ(x1|x2) = μ1 + V12 V22^(-1) (x2 - μ2)
and m x m covariance matrix
V(x1|x2) = V11 - V12 V22^(-1) V21
(where the Vij are the blocks of V partitioned conformably with x1 and x2)
43

Properties of the MVN - III


4) If x is MVN, the regression of any subset of
x on another subset is linear and homoscedastic

Where e is MVN with mean vector 0 and


variance-covariance matrix

44

The regression is linear because it is a linear function
of x2
The regression is homoscedastic because the variance-covariance
matrix for e does not depend on the value of
the x's

All these matrices are constant, and hence


the same for any value of x

45

Example: Regression of Offspring value on Parental values


Assume the vector of offspring value and the values of
both its parents is MVN. Then from the correlations
among (outbred) relatives,

46

Regression of Offspring value on Parental values (cont.)

Where e is normal with mean zero and variance

47

Hence, the regression of offspring trait value given


the trait values of its parents is
zo = o + h2/2(zs- s) + h2/2(zd- d) + e
where the residual e is normal with mean zero and
Var(e) = z2(1-h4/2)
Similar logic gives the regression of offspring breeding
value on parental breeding value as
Ao = o + (As- s)/2 + (Ad- d)/2 + e
= As/2 + Ad/2 + e
where the residual e is normal with mean zero and
Var(e) = A2/2
48

49

50

A data set for soybeans grown in New York (Gauch 1992) gives the
GE matrix as

Where GEij = value for


Genotype i in envir. j

51

For example, the rank-1 SVD approximation for GE(3,2) is
λ1 g_31 e_12 = 746.10*(-0.66)*0.64 = -315
while the rank-2 SVD approximation is λ1 g_31 e_12 + λ2 g_32 e_22 =
746.10*(-0.66)*0.64 + 131.36*0.12*(-0.51) = -323
The actual value is -324
Generally, the rank-2 SVD approximation for GE(i,j) is
λ1 g_i1 e_1j + λ2 g_i2 e_2j

52
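The soybean GE matrix itself is not reproduced in these notes, but the structure of the calculation can be sketched in R with a made-up matrix:

GE <- matrix(c( 2, -1,  3,
               -4,  5, -2,
                1, -3,  4), nrow = 3, byrow = TRUE)   # hypothetical genotype x environment values
s  <- svd(GE)
s$d                                         # singular values (the lambda_k)
rank1 <- s$d[1] * s$u[, 1] %*% t(s$v[, 1])  # rank-1 approximation: lambda_1 u_1 v_1^T
rank1[3, 2]                                 # rank-1 approximation to GE[3, 2]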

Additional R matrix commands

53

Additional R matrix commands (cont)

54

Additional references
Lynch & Walsh Chapter 8 (intro to
matrices)
Online notes:
Appendix 4 (Matrix geometry)
Appendix 5 (Matrix derivatives)

55

Lecture 3:
Linear and Mixed Models
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Quick Review of the Major Points


The general linear model can be written as

y = Xβ + e
y = vector of observed dependent values
X = design matrix: observations of the variables in the
assumed linear model
β = vector of unknown parameters to estimate
e = vector of residuals (deviations from the model fit),
e = y - Xβ
2

y = Xβ + e
The solution for β depends on the covariance structure
(= covariance matrix) of the vector e of residuals
Ordinary least squares (OLS)
OLS: e ~ MVN(0, σ^2 I)
Residuals are homoscedastic and uncorrelated,
so that we can write the cov matrix of e as Cov(e) = σ^2 I
The OLS estimate is b_OLS = (X^T X)^(-1) X^T y
Generalized least squares (GLS)
GLS: e ~ MVN(0, V)
Residuals are heteroscedastic and/or dependent,
b_GLS = (X^T V^(-1) X)^(-1) X^T V^(-1) y
3

BLUE
Both the OLS and GLS solutions are also
called the Best Linear Unbiased Estimator (or
BLUE for short)
Whether the OLS or GLS form is used
depends on the assumed covariance
structure for the residuals
Special case of Var(e) = σ^2_e I -- OLS
All others, i.e., Var(e) = R -- GLS

Linear Models
One tries to explain a dependent variable y as a linear
function of a number of independent (or predictor)
variables.
A multiple regression is a typical linear model,
y = α + β1 x1 + β2 x2 + ... + βn xn + e
Here e is the residual, or deviation between the true
value observed and the value predicted by the linear
model.
The (partial) regression coefficients are interpreted
as follows: a unit change in xi while holding all
other variables constant results in a change of βi in y
5

Linear Models
As with a univariate regression (y = a + bx + e), the model
parameters are typically chosen by least squares,
wherein they are chosen to minimize the sum of
squared residuals, Σ ei^2
This unweighted sum of squared residuals assumes
an OLS error structure, so all residuals are equally
weighted (homoscedastic) and uncorrelated
If the residuals differ in variances and/or some are
correlated (GLS conditions), then we need to minimize
the weighted sum e^T V^(-1) e, which removes correlations and
gives all residuals equal variance.
6

Predictor and Indicator Variables


Suppose we measure the offspring of p sires. One
linear model would be
y_ij = μ + s_i + e_ij
y_ij = trait value of offspring j from sire i
μ = overall mean. This term is included to give the s_i
terms a mean value of zero, i.e., they are expressed
as deviations from the mean
s_i = the effect for sire i (the mean of its offspring). Recall
that the variance in the s_i estimates Cov(half sibs) = V_A/4
e_ij = the deviation of the jth offspring from the family
mean of sire i. The variance of the e's estimates the
within-family variance.

Predictor and Indicator Variables


In a regression, the predictor variables are
typically continuous, although they need not be.
y_ij = μ + s_i + e_ij
Note that the predictor variables here are the s_i (the
value associated with sire i), something that we are trying
to estimate
We can write this in linear model form, y_ij = μ + Σ_k x_ik s_k + e_ij,
by using indicator variables x_ik (x_ik = 1 if sire k = i, and 0 otherwise)

Models consisting entirely of indicator variables


are typically called ANOVA, or analysis of variance
models
Models that contain no indicator variables (other than
for the mean), but rather consist of observed value of
continuous or discrete values are typically called
regression models
Both are special cases of the General Linear Model
(or GLM)
y_ijk = μ + s_i + d_ij + β x_ijk + e_ijk
Example: Nested half sib/full sib design with an
age correction on the trait

Example: Nested half sib/full sib design with an


age correction on the trait
ANOVA terms (s_i, d_ij) plus a regression term (β x_ijk):
y_ijk = μ + s_i + d_ij + β x_ijk + e_ijk

s_i = effect of sire i
d_ij = effect of dam j crossed to sire i
x_ijk = age of the kth offspring from the i x j cross
10

Linear Models in Matrix Form


Suppose we have 3 variables in a multiple regression,
with four (y,x) vectors of observations.

The design matrix X. Details of both the experimental


design and the observed values of the predictor variables
all reside solely in X
11

In-class Exercise
Suppose you measure height and sprint speed for
five individuals, with heights (x) of 9, 10, 11, 12, 13
and associated sprint speeds (y) of 60, 138, 131, 170, 221
1) Write in matrix form (i.e, the design matrix
X and vector of unknowns) the following models
y = bx
y = a + bx
y = bx2
y = a + bx + cx2
2) Using the X and y associated with these models,
compute the OLS BLUE, b = (X^T X)^(-1) X^T y, for each

12
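A sketch of part 2 for the model y = a + bx (the other three models only change the design matrix X):

x <- c(9, 10, 11, 12, 13)
y <- c(60, 138, 131, 170, 221)
X <- cbind(1, x)                           # design matrix: intercept column plus x
beta <- solve(t(X) %*% X) %*% t(X) %*% y   # OLS estimate (X'X)^(-1) X'y
beta
lm(y ~ x)                                  # the same estimates from R's built-in fit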

Rank of the design matrix


With n observations and p unknowns, X is an n x p
matrix, so that XTX is p x p
Thus, at most X can provide unique estimates for up
to p < n parameters
The rank of X is the number of independent rows of
X. If X is of full rank, then rank = p
A parameter is said to be estimable if we can provide
a unique estimate of it. If the rank of X is k < p, then
exactly k parameters are estimable (some only as linear
combinations, e.g., β1 - 3β3 = 4)
If det(X^T X) = 0, then X is not of full rank
The number of nonzero eigenvalues of X^T X gives the
rank of X.
13

Experimental design and X


The structure of X determines not only which
parameters are estimable, but also the expected
sample variances, as Var(b) = k (X^T X)^(-1)
Experimental design determines the structure of X
before an experiment (of course, missing data
almost always mean the final X is different from the
proposed X)
Different criteria are used for an optimal design. Let V =
(X^T X)^(-1). The idea is to choose a design for X, given
the constraints of the experiment, that satisfies:
A-optimality: minimizes tr(V)
D-optimality: minimizes det(V)
E-optimality: minimizes the leading eigenvalue of V

14

Ordinary Least Squares (OLS)


When the covariance structure of the residuals has a
certain form, we solve for the vector using OLS
If residuals follow a MVN distribution, OLS = ML solution
If the residuals are homoscedastic and uncorrelated,
σ^2(e_i) = σ^2_e and σ(e_i, e_j) = 0, each residual is equally
weighted, and the sum of squared
residuals can be written as
SSE = Σ e_i^2 = e^T e = (y - Xb)^T (y - Xb)
15

Ordinary Least Squares (OLS)


Taking (matrix) derivatives shows this is minimized by
b = (X^T X)^(-1) X^T y

This is the OLS estimate of the vector β

The variance-covariance estimate for the sample estimates
is V_b = σ^2_e (X^T X)^(-1)

The ij-th element gives the covariance between the
estimates of βi and βj.

16

Sample Variances/Covariances
The residual variance can be estimated as
σhat^2_e = (y - Xb)^T (y - Xb) / (n - p)
The estimated residual variance can be substituted into
V_b = σ^2_e (X^T X)^(-1)
to give an approximation for the sampling variance and


covariances of our estimates.
Confidence intervals follow since the vector of estimates
b ~ MVN(β, V_b)
17

Example: Regression Through the Origin

y_i = β x_i + e_i
Here the OLS solution is b = Σ x_i y_i / Σ x_i^2, with Var(b) = σ^2_e / Σ x_i^2

18

Polynomial Regressions
The GLM can easily handle any function of the observed
predictor variables, provided the parameters to estimate
are still linear, e.g., y = α + β1 f(x) + β2 g(x) + ... + e
Quadratic regression:
y = α + β1 x + β2 x^2 + e
19

Interaction Effects
Interaction terms (e.g., sex x age) are handled similarly, e.g.,
y = α + β1 x1 + β2 x2 + β3 x1 x2 + e

With x1 held constant, a unit change in x2 changes y
by β2 + β3 x1 (i.e., the slope in x2 depends on the current
value of x1)
Likewise, a unit change in x1 changes y by β1 + β3 x2
20

The GLM lets you build your


own model!
Suppose you want a quadratic regression
forced through the origin where the slope of
the quadratic term can vary over the sexes
(pollen vs. seed parents)
y_i = β1 x_i + β2 x_i^2 + β3 s_i x_i^2 + e_i
s_i is an indicator (0/1) variable for the sex (0 =
male, 1 = female).
Male slope = β2,
female slope = β2 + β3

21

Generalized Least Squares (GLS)


Suppose the residuals no longer have the same
variance (i.e., display heteroscedasticity). Clearly
we do not wish to minimize the unweighted sum
of squared residuals, because those residuals with
smaller variance should receive more weight.
Likewise in the event the residuals are correlated,
we also wish to take this into account (i.e., perform
a suitable transformation to remove the correlations)
before minimizing the sum of squares.
Either of the above settings leads to a GLS solution
in place of an OLS solution.

22

In the GLS setting, the covariance matrix for the
vector e of residuals is written as R, where
R_ij = σ(e_i, e_j)
The linear model becomes y = Xβ + e, cov(e) = R
The GLS solution for β is
b = (X^T R^(-1) X)^(-1) X^T R^(-1) y

The variance-covariance matrix of the estimated model
parameters is given by
V_b = (X^T R^(-1) X)^(-1)

23

Model diagnostics
It's all about the residuals
Plot the residuals

Quick and easy screen for outliers

Test for normality among estimated residuals


Q-Q plot
Shapiro-Wilk test
If non-normal, try transformations, such as log

24

OLS, GLS summary

25
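A sketch contrasting the two estimators on a toy data set (the data and the residual covariance V are assumed, purely for illustration):

x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 8.1)
X <- cbind(1, x)
V <- diag(c(1, 1, 4, 4))   # heteroscedastic residual variances (assumed)
Vi <- solve(V)
b_gls <- solve(t(X) %*% Vi %*% X) %*% t(X) %*% Vi %*% y   # GLS: (X'V^-1 X)^-1 X'V^-1 y
b_ols <- solve(t(X) %*% X) %*% t(X) %*% y                 # OLS: (X'X)^-1 X'y
cbind(b_gls, b_ols)        # compare the two sets of estimates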

Fixed vs. Random Effects


In linear models we are trying to accomplish two goals:
estimate the values of the model parameters and estimate
any appropriate variances.
For example, in the simplest regression model,
y = α + βx + e, we estimate the values for α and β and
also the variance of e. We can, of course, also
estimate the e_i = y_i - (α + β x_i)
Note that α and β are fixed constants we are trying to
estimate (fixed factors or fixed effects), while the
e_i values are drawn from some probability distribution
(typically normal with mean 0, variance σ^2_e). The
e_i are random effects.

26

This distinction between fixed and random effects is
extremely important in terms of how we analyze a model.
If a parameter is a fixed constant we wish to estimate,
it is a fixed effect. If a parameter is drawn from
some probability distribution and we are trying to make
inferences on either the distribution and/or specific
realizations from this distribution, it is a random effect.
We generally speak of estimating fixed factors (BLUE) and
predicting random effects (BLUP -- best linear unbiased
predictor)
Mixed models (MM) contain both fixed and random factors:
y = Xβ + Zu + e, u ~ MVN(0, G), e ~ MVN(0, σ^2_e I)
Key: we need to specify the covariance structures for a MM

27

Example: Sire model


y_ij = μ + s_i + e_ij
Here μ is a fixed effect, e a random effect
Is the sire effect s fixed or random?
It depends. If we have (say) 10 sires and we are ONLY
interested in the values of these particular 10 sires and
don't care to make any other inferences about the
population from which the sires are drawn, then we can
treat them as fixed effects. In this case, the model is
fully specified by the covariance structure for the residuals.
Thus, we need to estimate μ, s_1 to s_10, and σ^2_e, and we
write the model as y_ij = μ + s_i + e_ij, σ^2(e) = σ^2_e I
28

Random effects models


It is often useful to treat certain effects as
random, as opposed to fixed

Suppose we have k effects. If we treat these as


fixed, we lose k degrees of freedom
If we assume each of the k realizations are drawn
from a normal with mean zero and unknown
variance, only one degree of freedom lost --- that
for estimating the variance
We can then predict the values of the k realizations

29

Environmental effects
Consider yield data measured over several years in a
series of plots.
Standard to treat year-to-year variation at a specific
site as being random effects
Often the plot effects (mean value over years) are
also treated as random.
For example, consider plants group in growing
region i, location j within that region, and year
(season) k for that location-region effect
E_ijk = R_i + L_ij + e_ijk

Typically R can be a fixed effect, while L and e are
random effects, L_ij ~ N(0, σ^2_L) and e_ijk ~ N(0, σ^2_e)

30

Random models
With a random model, one is assuming that
not all levels of a factor are observed.
Rather, the observed values are a subset drawn
from some underlying distribution

For example, year to year variation in rainfall at a


location. Each year is a random sample from the
long-term distribution of rainfall values
Typically, assume a functional form for this
underlying distribution (e.g., normal with mean 0)
and then use observations to estimate the
distribution parameters (here, the variance)

31

Random models (cont)


Key feature:

Only one degree of freedom used (estimate of


the variance)
Using the fixed effects and the estimated
underlying distribution parameters, one then
predicts the actual realizations of the individual
values (i.e., the year effects)
An assumption is needed about the covariance structure among the
individual realizations of the realized effects. If
only a variance is assumed, this implies they are
independent. If they are assumed to be
correlated, this structure must be estimated.

32

Random models
Lets go back to treating yearly effects as random
If assume these are uncorrelated, only use one
degree of freedom, but makes assumptions about
covariance structure
Standard: Uncorrelated
Option: some sort of autocorrelation process, say with a
yearly decay of r (must also be estimated)

Conversely, could all be treated as fixed, but would


use k degrees of freedom for k years, but no
assumptions on their relationships (covariance
structure)

33

y_ij = μ + s_i + e_ij
Conversely, if we are not only interested in these
10 particular sires but also wish to make some
inference about the population from which they
were drawn (such as the additive variance, since
σ^2_A = 4σ^2_s), then the s_i are random effects. In this
case we wish to estimate μ and the variances
σ^2_s and σ^2_w. Since 2s_i also estimates (or predicts)
the breeding value for sire i, we also wish to
estimate (predict) these as well. Under a
random-effects interpretation, we write the model as
y_ij = μ + s_i + e_ij, σ^2(e) = σ^2_e I, σ^2(s) = σ^2_A A

34

Identifiability
Recall that a fixed effect is said to be
estimable if we can obtain a unique estimate
for it (either because X is of full rank or when
using a generalized inverse it returns a
unique estimate)
Lack of estimability arises when the experimental
design confounds effects

The analogous term for random models is


identifiability

The variance components have unique estimates


35

The general mixed model


y = Xβ + Zu + e
y = vector of observations (phenotypes)
β = vector of fixed effects (to be estimated),
e.g., year, sex and age effects
X = incidence (design) matrix for the fixed effects
u = vector of random effects, such as individual
breeding values (to be estimated)
Z = incidence matrix for the random effects
e = vector of residual errors (random effects)
36

The general mixed model


y = Xβ + Zu + e
y = vector of observations (phenotypes)
β = vector of fixed effects
X = incidence matrix for the fixed effects
u = vector of random effects
Z = incidence matrix for the random effects
e = vector of residual errors

We observe y, X, and Z.
We estimate the fixed effects β
and predict the random effects u and e

37

Means & Variances for y = Xβ + Zu + e

Means: E(u) = E(e) = 0, so E(y) = Xβ
Variances:
Let R be the covariance matrix for the
residuals. We typically assume R = σ^2_e I
Let G be the covariance matrix for the vector
u of random effects
The covariance matrix for y becomes
V = Z G Z^T + R
Hence, y ~ MVN(Xβ, V)
Mean Xβ due to fixed effects
Variance V due to random effects

38

Chi-square and F distributions


Let U_i ~ N(0,1), i.e., a unit normal
The sum U_1^2 + U_2^2 + ... + U_k^2 is a chi-square random
variable with k degrees of freedom
Under appropriate normality assumptions, the
sums of squares that appear in linear models
are also chi-square distributed. In particular,
SSE/σ^2_e ~ χ^2 with (n - p) degrees of freedom

The ratio of two chi-squares is an F distribution


39

In particular, an F distribution with k numerator
degrees of freedom and n denominator degrees
of freedom is given by

F(k,n) = (χ^2_k / k) / (χ^2_n / n), with the two chi-squares independent

The expected value of a chi-square with k degrees


of freedom is k, hence numerator and denominator
both have expected value one
F distributions frequently arise in tests
of linear models, as these usually involve ratios
of sums of squares.
40

Sums of Squares in linear models


The total sums of squares (SST) of a linear model
can be written as the sum of the error (or residual)
sum of squares and the model (or regression) sum
of squares
SST = SSM + SSE

r^2, the coefficient of determination, is the
fraction of variation accounted for by the model:

r^2 = SSM/SST = 1 - SSE/SST

41

Sums of Squares are quadratic products

SST = Σi (yi - ybar)^2, which we can write as a quadratic product,

SST = y^T (I - J/n) y
where J is a matrix all of whose elements are 1's

42

Expected value of sums of


squares
In ANOVA tables, the E(MS), or expected
value of the Mean Squares (scaled SS or Sum
of Squares), often appears
This directly follows from the quadratic
product: if E(x) = μ and Var(x) = V, then
E(x^T A x) = tr(AV) + μ^T A μ

43

Hypothesis testing
Provided the residual errors in the model are MVN, then for a model
with n observations and p estimated parameters,
SSE/σ^2_e ~ χ^2 with (n - p) degrees of freedom

Consider the comparison of a full (p parameters)
and a reduced (q < p parameters) model, where SSE_r = error SS for the
reduced model and SSE_f = error SS for the full model

The difference in the error sums of squares for the full and reduced
models provides a test of whether the model fits are the same:

F = [ (SSE_r - SSE_f)/(p - q) ] / [ SSE_f/(n - p) ]

This ratio follows an F(p-q, n-p) distribution

44

Does our model account for a significant fraction of the


variation?
Here the reduced model is just y_i = μ + e_i
In this case, the error sum of squares for the
reduced model is just the total sum of squares,
and the F-test ratio becomes

F = [ SSM/(p - 1) ] / [ SSE/(n - p) ]

This ratio follows an F(p-1, n-p) distribution


45

Different statistical models


GLM = general linear model

OLS ordinary least squares: e ~ MVN(0,cI)


GLS generalized least squares: e ~ MVN(0,R)

Mixed models

Both fixed and random effects (beyond the residual)

Mixture models

A weighted mixture of distributions

Generalized linear models

Nonlinear functions, non-normality

46

Mixture models
Under a mixture model, an observation potentially
comes from one of several different distributions, so
that the density function is π1 φ1 + π2 φ2 + π3 φ3

The mixture proportions πi sum to one

The φi represent the different distributions, e.g., normal with mean μi
and variance σ^2

Mixture models come up in QTL mapping -- an


individual could have QTL genotype QQ, Qq, or qq
See Lynch & Walsh Chapter 13

They also come up in codon models of evolution, where a
site may be neutral, deleterious, or advantageous, each

site may be neutral, deleterious, or advantageous, each
with a different distribution of selection coefficients
See Walsh & Lynch (volume 2A website), Chapters 10,11

47

Generalized linear models

Typically assume non-normal distribution for


residuals, e.g., Poisson, binomial, gamma, etc

48

Likelihoods for GLMs


Under the assumption of MVN, x ~ MVN(μ, V), the likelihood
function becomes

L(μ, V | x) = (2π)^(-n/2) |V|^(-1/2) exp[ -(1/2) (x - μ)^T V^(-1) (x - μ) ]

Variance components (e.g., σ^2_A, σ^2_e, etc.) are included in V

REML = restricted maximum likelihood. Method of


choice for variance components, as it maximizes
that part of the likelihood function that is independent
of the fixed effects, β.
49

Lecture 4:
Introduction to Quantitative
Genetics
Bruce Walsh lecture notes
Short Course on Evolutionary
Quantitative Genetics,
Edinburgh, 31 Oct - 4 Nov 2016

Basic model of Quantitative Genetics


Phenotypic value -- we will occasionally
also use z for this value

Basic model: P = G + E

Environmental value

Genotypic value

G = average phenotypic value for that genotype


if we are able to replicate it over the universe
of environmental values, G = E[P]
Hence, genotypic values are functions of the
environments experienced.

Basic model of Quantitative Genetics


Basic model: P = G + E
G = average phenotypic value for that genotype
if we are able to replicate it over the universe
of environmental values, G = E[P]
G = average value of an inbred line over a series
of environments
G x E interaction --- The performance of a particular
genotype in a particular environment differs from
the sum of the average performance of that
genotype over all environments and the average
performance of that environment over all genotypes.
Basic model now becomes P = G + E + GE
3

East (1911) data


on US maize
crosses

[Figure: trait distributions for the parental lines (P1, P2), the F1, and the F2]

Each sample (P1, P2, F1) has the same G, so all variation in
P within these samples is due to variation in E:
Var(P) = Var(E)

The F2 shows variation in G, so
Var(P) = Var(G) + Var(E)

Hence Var(F2) > Var(F1), due to the variation in G

Johannsen (1903) bean data


Johannsen had a series of fully inbred
(= pure) lines.
There was a consistent between-line
difference in the mean bean size
Differences in G across lines

However, within a given line, size of


parental seed is independent of the size of
offspring seed
No variation in G within a line

The transmission of genotypes versus


alleles
With fully inbred lines, offspring have the same genotype as
their parent, and hence the entire parental genotypic value G is
passed along
Hence, favorable interactions between alleles (such as with
dominance) are not lost by randomization under random mating
but rather passed along.

When offspring are generated by crossing (or random mating),


each parent contributes a single allele at each locus to its
offspring, and hence only passes along a PART of its genotypic
value
This part is determined by the average effect of the allele

The downside is that favorable interactions between alleles are NOT
passed along to their offspring in a diploid (but, as we will see, they are
in an autotetraploid)

Genotypic values
It will prove very useful to decompose the genotypic
value into the difference between homozygotes (2a) and
a measure of dominance (d or k = d/a)
Genotype:   aa      Aa      AA
Value:      C - a   C + d   C + a

Note that the constant C is the average value of


the two homozygotes.
If no dominance, d = 0, as heterozygote value equals
the average of the two parents. Can also write d = ka,
so that G(Aa) = C + ak

10

Computing a and d
Suppose a major locus influences plant height, with
the following values
Genotype:      aa    Aa    AA
Trait value:   10    15    16

C = [G(AA) + G(aa)]/2 = (16+10)/2 = 13
a = [G(AA) - G(aa)]/2 = (16-10)/2 = 3
d = G(Aa) - [G(AA) + G(aa)]/2
  = G(Aa) - C = 15 - 13 = 2

11
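The same arithmetic in R:

Gaa <- 10; GAa <- 15; GAA <- 16   # genotypic values from the table above
C <- (GAA + Gaa) / 2              # midpoint of the homozygotes = 13
a <- (GAA - Gaa) / 2              # half the homozygote difference = 3
d <- GAa - C                      # dominance deviation = 2
k <- d / a                        # scaled dominance d/a = 2/3
c(C = C, a = a, d = d, k = k)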

Population means: Random mating


Let p = freq(A), q = 1-p = freq(a). Assuming
random-mating (Hardy-Weinberg frequencies),
Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    q^2     2pq     p^2

Mean = q^2 (C - a) + 2pq (C + d) + p^2 (C + a)

μ_RM = C + a(p - q) + 2pq d
where a(p - q) is the contribution from the homozygotes
and 2pq d is the contribution from the heterozygotes

12

Population means: Inbred cross F2


Suppose two inbred lines are crossed. If A is fixed
in one population and a in the other, then p = q = 1/2
Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    1/4     1/2     1/4

Mean = (1/4)(C - a) + (1/2)(C + d) + (1/4)(C + a)

μ_F2 = C + d/2
Note that C is the average of the two parental lines, so when d
> 0, F2 exceeds this. Note also that the F1 exceeds
this average by d, so only half of this passed onto F2.

13

Population means: RILs from an F2


A large number of F2 individuals are fully inbred, either by selfing
for many generations or by generating doubled haploids. If p and
q denote the F2 frequencies of A and a, what is the expected
mean over the set of resulting RILs?

Genotype:     aa      Aa      AA
Value:        C - a   C + d   C + a
Frequency:    q       0       p

μ_RILs = C + a(p - q)
Note this is independent of the amount of dominance (d)
14

The average effect of an allele


The average effect α_A of an allele A is defined as the
difference between the mean of offspring that receive allele A and the mean of a
random offspring:
α_A = mean(offspring value given parent transmits A) - mean(all offspring)
Similar definition for α_a.
Note that while C, a, and d (the genotypic
parameters) do not change with allele frequency, α_x
is clearly a function of the frequencies of the alleles with
which allele x combines.

15

Random mating
Consider the average effect of allele A when a parent is randomly
mated to another individual from its population.

Suppose the parent contributes A:

Allele from other parent:   A        a
Probability:                p        q
Offspring genotype:         AA       Aa
Value:                      C + a    C + d

Mean(A transmitted) = p(C + a) + q(C + d) = C + pa + qd

α_A = Mean(A transmitted) - μ_RM = q[a + d(q - p)]

16

Random mating
Now suppose the parent contributes a:

Allele from other parent:   A        a
Probability:                p        q
Offspring genotype:         Aa       aa
Value:                      C + d    C - a

Mean(a transmitted) = p(C + d) + q(C - a) = C - qa + pd

α_a = Mean(a transmitted) - μ_RM = -p[a + d(q - p)]

17

α, the average effect of an
allelic substitution

α = α_A - α_a is the average effect of an allelic
substitution, the change in the mean trait value when an
a allele in a random individual is replaced by an A
allele

α = a + d(q - p). Note that
α_A = qα and α_a = -pα.

E(α_X) = p α_A + q α_a = pqα - qpα = 0:
the average effect of a random allele is zero,
hence average effects are deviations from the
mean
18

Dominance deviations
Fisher (1918) decomposed the contribution
to the genotypic value from a single locus as
G_ij = μ + α_i + α_j + δ_ij
Here, μ is the mean (a function of p)
and the α_i are the average effects.
Hence, μ + α_i + α_j is the predicted genotypic
value given the average effects (over all
genotypes) of alleles i and j.
The dominance deviation δ_ij associated with
genotype G_ij is the difference between its true
value and the value predicted from the sum of
average effects (essentially a residual)

19

Fishers (1918) Decomposition of G


One of Fishers key insights was that the genotypic value
consists of a fraction that can be passed from parent to
offspring and a fraction that cannot.
In particular, under sexual reproduction, parents only
pass along SINGLE ALLELES to their offspring.
Consider the genotypic value G_ij resulting from an
A_iA_j individual:

G_ij = μ_G + α_i + α_j + δ_ij
α_i = average contribution to the genotypic value from allele i
Mean value μ_G = Σ G_ij Freq(A_iA_j)

20

G_ij = μ_G + α_i + α_j + δ_ij
Since parents pass along single alleles to their
offspring, the α_i (the average effect of allele i)
represent these contributions
The average effect for an allele is POPULATION-SPECIFIC,
as it depends on the types and frequencies
of the alleles that it pairs with
The genotypic value predicted from the individual
allelic effects is thus
Ghat_ij = μ_G + α_i + α_j
21

G_ij = μ_G + α_i + α_j + δ_ij
The genotypic value predicted from the individual
allelic effects is Ghat_ij = μ_G + α_i + α_j
Dominance deviations --- the difference (for genotype
A_iA_j) between the genotypic value predicted from the
two single alleles and the actual genotypic value:

δ_ij = G_ij - Ghat_ij
22

[Figure: regression of genotypic value on N = the number of copies of allele 2.
The genotypic values G_11, G_21, G_22 are plotted at N = 0, 1, 2
(genotypes A1A1, A1A2, A2A2). The fitted line has slope α = α_2 - α_1,
and the deviations of the genotypic values from the line are the
dominance deviations δ_11, δ_21, δ_22.]
23

Fisher's decomposition is a regression

G_ij = μ_G + α_i + α_j + δ_ij
(predicted value plus residual error)

A notational change clearly shows this is a regression:

G_ij = μ_G + 2α_1 + (α_2 - α_1) N + δ_ij
Independent (predictor) variable N = # of A_2 alleles
Note that the slope α_2 - α_1 = α, the average effect
of an allelic substitution
24

G_ij = μ_G + 2α_1 + (α_2 - α_1) N + δ_ij
Here μ_G + 2α_1 is the intercept and (α_2 - α_1) is the regression slope

A key point is that the average effects change with
allele frequencies. Indeed, if overdominance is present
they can change sign with allele frequencies.
25

Allele A_2 common, α_1 > α_2

[Figure: the genotypic values G_11, G_21, G_22 plotted against N = 0, 1, 2,
with a regression slope α_2 - α_1 that is negative]
The size of the circle denotes the weight associated with
that genotype. While the genotypic values do not change,
their frequencies (and hence weights) do.
26

Allele A_1 common, α_2 > α_1

[Figure: the same genotypic values plotted against N = 0, 1, 2,
now with a positive regression slope = α_2 - α_1]

Again, same genotypic values as previous slide, but


different weights, and hence a different slope
(here a change in sign!)

27

Both A_1 and A_2 frequent, α_1 = α_2 = 0

[Figure: the same genotypic values plotted against N; the fitted
regression line is flat (slope = 0)]

With these allele frequencies, both alleles have the same


mean value when transmitted, so that all parents have the
same average offspring value -- no response to selection
28

Average Effects and Additive Genetic Values


The α values are the average effects of an allele
A key concept is the Additive Genetic Value (A) of
an individual:

A(G_ij) = α_i + α_j for a single locus, and
A = Σ_k ( α_i^(k) + α_j^(k) ) summed over loci,

where α_i^(k) = effect of allele i at locus k

A is called the Breeding value or the Additive genetic
value

29

Why all the fuss over A?


Suppose pollen parent has A = 10 and seed parent has
A = -2 for plant height
Expected average offspring height is (10 - 2)/2
= 4 units above the population mean. Offspring A =
average of parental As
KEY: parents only pass single alleles to their offspring.
Hence, they only pass along the A part of their genotypic
value G
30

Genetic Variances
Writing the genotypic value as

G_ij = μ_G + (α_i + α_j) + δ_ij

the genetic variance can be written as

σ^2(G) = σ^2(α_i + α_j) + σ^2(δ_ij)

This follows since Cov(α, δ) = 0
31

Genetic Variances

σ^2(α_i + α_j) = the Additive Genetic Variance, σ^2_A
(or simply the additive variance)

σ^2(δ_ij) = the Dominance Genetic Variance, σ^2_D
(or simply the dominance variance)

Hence, the total genetic variance = additive + dominance
variances,

σ^2_G = σ^2_A + σ^2_D

32

Key concepts (so far)

α_i = average effect of allele i

Property of a single allele in a particular population


(depends on genetic background)

A = Additive Genetic Value (A)

A = sum (over all loci) of average effects


Fraction of G that parents pass along to their offspring
Property of an Individual in a particular population

Var(A) = additive genetic variance


Variance in additive genetic values
Property of a population

Can estimate A or Var(A) without knowing any of the


underlying genetical detail (forthcoming)

33

Genotype:   Q1Q1   Q1Q2      Q2Q2
Value:      0      a(1+k)    2a

Since E[α] = 0,
Var(α) = E[(α - μ_α)^2] = E[α^2]
For one locus with two alleles (frequencies p1 and p2):
σ^2_A = 2 p1 p2 a^2 [ 1 + k(p1 - p2) ]^2

When dominance is present, the additive variance is an
asymmetric function of the allele frequencies
34

Dominance variance

Genotype:   Q1Q1   Q1Q2      Q2Q2
Value:      0      a(1+k)    2a

σ^2_D = (2 p1 p2 a k)^2

This is a symmetric function of the
allele frequencies

It can also be expressed in terms of d = ak:
σ^2_D = (2 p1 p2 d)^2


35

[Figure: the additive variance V_A as a function of allele frequency p
when there is no dominance (k = 0)]

[Figure: V_A and V_D as functions of allele frequency p under
complete dominance (k = 1)]
37

Epistasis

With epistasis, the genotypic value is further decomposed into additive,
dominance, and epistatic (interaction) components. These components are
defined to be uncorrelated (or orthogonal), so that

σ^2_G = σ^2_A + σ^2_D + σ^2_AA + σ^2_AD + σ^2_DD
38

Additive x Additive interactions (αα), AA:
interactions between a single allele
at one locus and a single allele at another
Additive x Dominance interactions (αδ), AD:
interactions between an allele at one
locus and the genotype at another, e.g.,
allele A_i and genotype B_kj
Dominance x Dominance interactions (δδ), DD:
the interaction between the dominance
deviation at one locus and the dominance
deviation at another.

39
