Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs

National Taiwan University
Department of Computer Science and Information Engineering

Speaker: Yao-Ting Huang
Advisor: Kun-Mao Chao
Algorithms and Computational Biology Lab.

Variations in DNA Sequence

• Variants in the human genome include
  - Single Nucleotide Polymorphisms (SNPs),
  - deletions (e.g., loss of heterozygosity), and
  - insertions.
• SNPs have become the preferred DNA markers for association studies because of
  - their high abundance (e.g., ~1 SNP per 1,000 base pairs), and
  - high-throughput genotyping technology, which makes it possible to build large SNP databases (e.g., the International HapMap Project).

SNPs Arise from Mutations

[Figure: a timeline from a common ancestor to the present. Mutations accumulate over time and produce the variations observed in a population; one lineage carries a common disease mutation.]

Haplotype
• A haplotype is a set of closely linked SNPs located on one chromosome.

DNA sequences (SNP 1, SNP 2, and SNP 3 are the three columns that vary):
  GATATTCGTACGGA-T
  GATGTTCGTACTGAAT
  GATATTCGTACGGA-T
  GATATTCGTACGGAAT
  GATGTTCGTACTGAAT
  GATGTTCGTACTGAAT

Haplotypes (alleles at SNP 1, SNP 2, SNP 3) and their frequencies:
  AG-  2/6
  GTA  3/6
  AGA  1/6

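The haplotype table can be recomputed from the six sequences. The following is a minimal sketch, assuming the sequences are already aligned and that every column in which they differ counts as a SNP; the function names are illustrative and not part of the original slides.

```python
from collections import Counter

# The six aligned sequences shown on the slide.
sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]

def snp_positions(seqs):
    """Return the 0-based columns in which the aligned sequences differ."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]

def haplotype_counts(seqs):
    """Count haplotypes, i.e. the alleles found at the variable columns only."""
    cols = snp_positions(seqs)
    return Counter("".join(s[i] for i in cols) for s in seqs)

positions = snp_positions(sequences)
counts = haplotype_counts(sequences)
for hap, n in counts.most_common():
    print(f"{hap}  {n}/{len(sequences)}")
```

The three variable columns found this way correspond to SNP 1, SNP 2, and SNP 3 on the slide.
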
Factors Affecting Haplotypes

• Chromosomal recombination breaks up and reorganizes haplotypes.
• If SNPs are closely linked, they tend to be inherited together as haplotypes.
  - There is less chance that recombination will occur between them.
• Linkage Disequilibrium (LD) is a measure of the non-random association of alleles at linked loci.

Linkage Disequilibrium
• Consider only two SNPs, one with alleles B/b and the other with alleles A/a.
• There are 4 possible haplotypes: AB, Ab, aB, and ab.
• The probabilities for each haplotype:

                    SNP 1
               B       b       Total
SNP 2   A      P_AB    P_Ab    P_A
        a      P_aB    P_ab    P_a
        Total  P_B     P_b     1.0

Linkage Equilibrium
Under linkage equilibrium, each haplotype probability equals the product of its allele probabilities:
• P_AB = P_A P_B
• P_Ab = P_A P_b = P_A (1 - P_B)
• P_aB = P_a P_B = (1 - P_A) P_B
• P_ab = P_a P_b = (1 - P_A)(1 - P_B)

Linkage Disequilibrium
Under linkage disequilibrium, the haplotype probabilities deviate from these products:
• P_AB ≠ P_A P_B
• P_Ab ≠ P_A P_b = P_A (1 - P_B)
• P_aB ≠ P_a P_B = (1 - P_A) P_B
• P_ab ≠ P_a P_b = (1 - P_A)(1 - P_B)

An Example of Linkage Disequilibrium

Before mutation (two haplotypes):
  -- A -- -- -- G -- -- --
  -- C -- -- -- G -- -- --
  SNP 1: P_A = 1/2, P_C = 1/2; SNP 2: P_G = 1

After mutation (three haplotypes):
  -- A -- -- -- G -- -- --
  -- C -- -- -- G -- -- --
  -- C -- -- -- C -- -- --
  SNP 1: P_A = 1/3, P_C = 2/3; SNP 2: P_G = 2/3, P_C = 1/3

• There are only three haplotypes: AG, CG, and CC.
• There is no AC haplotype, i.e., P_AC = 0.
• However, P_A P_C = (1/3)(1/3) = 1/9 (A at SNP 1, C at SNP 2), thus P_A P_C ≠ P_AC.
• These two SNPs are in linkage disequilibrium.

An Example of Linkage Equilibrium

Before recombination (three haplotypes):
  -- A -- -- -- G -- -- --
  -- C -- -- -- G -- -- --
  -- C -- -- -- C -- -- --

After recombination (four haplotypes):
  -- A -- -- -- G -- -- --
  -- C -- -- -- G -- -- --
  -- C -- -- -- C -- -- --
  -- A -- -- -- C -- -- --
  SNP 1: P_A = 1/2, P_C = 1/2; SNP 2: P_G = 1/2, P_C = 1/2

• After recombination,
  - P_AG = P_A P_G = 1/4,
  - P_CG = P_C P_G = 1/4,
  - P_CC = P_C P_C = 1/4, and
  - P_AC = P_A P_C = 1/4.
• Thus, these two SNPs are in linkage equilibrium.

D Coefficient
• We can measure the non-randomness of two loci by means of a deviation, D, defined as
  D = P_AB - P_A P_B, or equivalently D = P_AB P_ab - P_Ab P_aB.
• The haplotype probabilities can then be written as
  - P_AB = P_A P_B + D
  - P_Ab = P_A (1 - P_B) - D
  - P_aB = (1 - P_A) P_B - D
  - P_ab = (1 - P_A)(1 - P_B) + D
• The two SNPs are in linkage equilibrium iff D = 0.

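As a quick check that the two definitions of D agree, here is a small sketch that evaluates both on the two earlier examples. It assumes exact haplotype frequencies are known; the function name and the mapping of the example alleles onto A/a and B/b are illustrative choices, not from the slides.

```python
from fractions import Fraction as F

def d_coefficient(p_AB, p_Ab, p_aB, p_ab):
    """Return D computed two ways from the four haplotype frequencies."""
    p_A = p_AB + p_Ab                      # marginal frequency of allele A
    p_B = p_AB + p_aB                      # marginal frequency of allele B
    d_marginal = p_AB - p_A * p_B          # D = P_AB - P_A P_B
    d_cross = p_AB * p_ab - p_Ab * p_aB    # D = P_AB P_ab - P_Ab P_aB
    return d_marginal, d_cross

# Mutation example: haplotypes AG, CG, CC, each with frequency 1/3.
# Assumed mapping: A/a = A/C at SNP 1 and B/b = G/C at SNP 2.
after_mutation = d_coefficient(F(1, 3), F(0), F(1, 3), F(1, 3))

# Recombination example: AG, AC, CG, CC, each with frequency 1/4.
after_recombination = d_coefficient(F(1, 4), F(1, 4), F(1, 4), F(1, 4))

print(after_mutation)        # both entries are 1/9
print(after_recombination)   # both entries are 0
```

With this mapping D = 1/9 for the mutation example, which matches the slide: the missing AC haplotype plays the role of Ab, so P_Ab = P_A P_b - D = 1/9 - 1/9 = 0.
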
Standardization of D Coefficient
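• The D coefficient can be standardized in many ways.
• D’ = D/D_max, where D_max stands for the maximum possible absolute value of D:

\[
D' =
\begin{cases}
\dfrac{D}{\min(P_A P_B,\; P_a P_b)}, & \text{if } D < 0; \\[2ex]
\dfrac{D}{\min(P_A P_b,\; P_a P_B)}, & \text{if } D > 0.
\end{cases}
\]

• The bounds follow because every haplotype probability must be non-negative:

\[
\begin{aligned}
P_A P_B + D = P_{AB} \ge 0 &\;\Rightarrow\; D \ge -P_A P_B \\
P_a P_b + D = P_{ab} \ge 0 &\;\Rightarrow\; D \ge -P_a P_b \\
P_A P_b - D = P_{Ab} \ge 0 &\;\Rightarrow\; D \le P_A P_b \\
P_a P_B - D = P_{aB} \ge 0 &\;\Rightarrow\; D \le P_a P_B
\end{aligned}
\]

[Figure: number line showing the admissible range of D around 0, from -P_A P_B to P_a P_B.]
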
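The standardization can be sketched in a few lines. This is an illustrative implementation under the assumption that the four haplotype frequencies are given exactly; the function name `d_prime` is not from the slides.

```python
from fractions import Fraction as F

def d_prime(p_AB, p_Ab, p_aB, p_ab):
    """Return D' = D / D_max, where D_max depends on the sign of D."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    d = p_AB - p_A * p_B
    if d == 0:
        return d  # linkage equilibrium, D' = 0
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)   # from P_Ab >= 0 and P_aB >= 0
    else:
        d_max = min(p_A * p_B, p_a * p_b)   # from P_AB >= 0 and P_ab >= 0
    return d / d_max

# Only three of the four haplotypes are present, so |D'| reaches 1.
print(d_prime(F(1, 3), F(0), F(1, 3), F(1, 3)))        # AB, aB, ab present
print(d_prime(F(0), F(1, 3), F(1, 3), F(1, 3)))        # Ab, aB, ab present
# All four haplotypes are present, so |D'| is below 1.
print(d_prime(F(2, 5), F(1, 10), F(1, 10), F(2, 5)))
```
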
Interpretation of D’
• D’ is constrained between -1 and +1.
  - D’ = 1 (perfect positive LD between SNP alleles)
  - D’ = 0 (linkage equilibrium between SNP alleles)
  - D’ = -1 (perfect negative LD between SNP alleles)
  - D’ = 0.87 (strong positive LD between SNP alleles)
  - D’ = 0.12 (weak positive LD between SNP alleles)
• Other measures based on the D coefficient:
  - r^2 (also written Δ^2):

\[
r^2 = \frac{D^2}{P_A (1 - P_A)\, P_B (1 - P_B)}
\]

  - Chi-square test.
  - P value.

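The three measures listed above are closely related. The sketch below computes r^2 from haplotype frequencies and then uses the standard identity that the Pearson chi-square statistic of the 2x2 haplotype table equals N r^2, where N is the number of sampled haplotypes; that identity, the 1-degree-of-freedom P value, and all function names are additions for illustration, not taken from the slides.

```python
import math
from fractions import Fraction as F

def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D^2 / (P_A (1 - P_A) P_B (1 - P_B)); both SNPs must be polymorphic."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))

def chi_square(r2, n_haplotypes):
    """Pearson chi-square statistic of the 2x2 haplotype table (1 degree of freedom)."""
    return n_haplotypes * r2

def p_value(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Mutation example: haplotypes AB, aB, ab, each with frequency 1/3.
r2 = r_squared(F(1, 3), F(0), F(1, 3), F(1, 3))
chi2 = chi_square(r2, 3)
print(r2, chi2, p_value(chi2))
```

For the mutation example r^2 is only 1/4 even though |D’| = 1, which shows that the two measures answer different questions. A sample of three haplotypes is far too small for the chi-square approximation to be meaningful; it is used here only to keep the arithmetic visible.
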
Decay of LD over Time

• Chromosomal recombination decreases LD over time, so the loci should eventually reach linkage equilibrium.

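The decay can be made concrete with the textbook model D_t = (1 - c)^t D_0, where c is the recombination fraction between the two loci per generation. This model (random mating, no selection or drift) is an assumption added here for illustration; the slide itself only states the qualitative trend.

```python
import math

def ld_after(d0, c, generations):
    """LD remaining after some generations: D_t = (1 - c)**t * D_0."""
    return d0 * (1.0 - c) ** generations

def generations_to_halve(c):
    """Number of generations for D to fall to half its starting value (0 < c < 1)."""
    return math.log(0.5) / math.log(1.0 - c)

for c in (0.5, 0.1, 0.01):
    print(f"c={c}: D after 10 generations = {ld_after(0.25, c, 10):.4f}, "
          f"half-life = {generations_to_halve(c):.1f} generations")
```

Tightly linked loci (small c) keep their LD for many generations, while unlinked loci (c = 0.5) lose half of it in every generation.
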
Haplotype Blocks in Human Genome

• The human genome has been shown to contain regions of high LD interspersed with regions of low LD.
  - Recombination occurs frequently in low-LD regions.
  - High-LD regions can form haplotype blocks.
• The International HapMap Project aims to build a haplotype map across the human genome.

[Figure: a chromosome divided into haplotype blocks (high-LD regions) separated by recombination hot spots (low-LD regions).]

Genotype Data vs. Haplotype Data

• The use of the haplotype map has been limited by the fact that the human genome is diploid.
  - Genotype data, rather than haplotype data, are obtained.
• Phase problem: the information about which chromosome each base lies on is lost.
  - e.g., we don’t know whether the haplotypes are (GA, TC) or (GC, TA).

[Figure: a diploid individual with bases G and T at SNP1 and bases A and C at SNP2.]

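The ambiguity can be enumerated directly. The following sketch lists every haplotype pair consistent with an unphased genotype; the input representation (one unordered allele pair per SNP) and the function name are assumptions made for this illustration.

```python
from itertools import product

def possible_phasings(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele, allele) tuple per SNP, e.g. [("G", "T"), ("A", "C")].
    """
    pairs = set()
    # For each SNP, choose which of its two alleles goes on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(site[c] for site, c in zip(genotype, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# The example from the slide: G/T at SNP1 and A/C at SNP2.
print(possible_phasings([("G", "T"), ("A", "C")]))
```

With k heterozygous SNPs there are 2^(k-1) candidate phasings, so the ambiguity grows quickly with the number of SNPs.
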
Haplotype Reconstruction with Pedigree

• Haplotype reconstruction with a pedigree (Li and Jiang, 2004).
  - Assume that no mutations occur within a pedigree, only recombination events.
  - Given a pedigree and genotype data for each member of the pedigree, find a haplotype configuration for the pedigree that requires the minimum number of recombinants.

[Figure: an example pedigree whose members are labeled with allele pairs: 1|2, 1|2, 2|2; 3|1, 1|3, 2|2; 1|2, 1|2; 1|2, 3|2.]

Haplotype Block Partition and Tag SNP Selection Using Genotype Data

• Zhang et al. (2004) combine dynamic programming with an EM algorithm to partition haplotype blocks.
  - The EM algorithm infers the haplotypes for a range of SNPs.
  - The dynamic programming algorithm minimizes the number of tag SNPs used in the haplotype block partition.
• The experiments examine the factors that affect the block partition and the tag SNPs used, which include
  - number of haplotypes,
  - density of SNPs,
  - minor allele frequency of SNPs,
  - missing data, and
  - genotyping error rate.

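To give a feel for the EM step, here is a minimal sketch of the classic EM estimate of haplotype frequencies for just two biallelic SNPs. It is not the implementation of Zhang et al., which handles longer ranges of SNPs; the genotype encoding (number of copies of allele 1 at each SNP) and every name in the code are assumptions made for illustration.

```python
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def resolve(a, b):
    """Return the haplotype pair for genotype (a, b), or None if it is ambiguous.

    a and b are the numbers of copies (0, 1, or 2) of allele 1 at SNP 1 and SNP 2.
    Only the double heterozygote (1, 1) has unknown phase.
    """
    if a == 1 and b == 1:
        return None
    first = (0, 1) if a == 1 else (a // 2, a // 2)
    second = (0, 1) if b == 1 else (b // 2, b // 2)
    return [(first[0], second[0]), (first[1], second[1])]

def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate the four haplotype frequencies from unphased genotypes by EM."""
    fixed = {h: 0.0 for h in HAPLOTYPES}   # counts from unambiguous genotypes
    n_ambiguous = 0
    for a, b in genotypes:
        pair = resolve(a, b)
        if pair is None:
            n_ambiguous += 1
        else:
            for h in pair:
                fixed[h] += 1
    total = 2 * len(genotypes)
    freq = {h: 0.25 for h in HAPLOTYPES}   # uniform starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries 00 and 11.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from the expected haplotype counts.
        counts = dict(fixed)
        counts[(0, 0)] += n_ambiguous * w
        counts[(1, 1)] += n_ambiguous * w
        counts[(0, 1)] += n_ambiguous * (1 - w)
        counts[(1, 0)] += n_ambiguous * (1 - w)
        freq = {h: counts[h] / total for h in HAPLOTYPES}
    return freq

estimate = em_haplotype_frequencies([(0, 0), (2, 2), (1, 1)])
print(estimate)
```

In this toy input the two homozygous individuals only carry haplotypes 00 and 11, so the EM iterations assign the double heterozygote to the same pair.
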
Thoughts
• How can the tag SNP selection algorithm be modified to process genotype data?
  - The naïve approach is to infer haplotype data with existing algorithms and then find tag SNPs.
• Is it possible to determine tag SNPs directly from genotype data?
  - Assume 0: homozygous wild type, 1: homozygous mutant, 2: heterozygous.

       P1  P2  P3  P4
  S1    1   1   0   0
  S2    1   0   1   0
  S3    1   2   0   1
  S4    1   2   0   1

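The 0/1/2 encoding can be illustrated by deriving a genotype pattern from a pair of haplotypes. In this sketch the haplotype strings (0 for wild type, 1 for mutant) are hypothetical; only the resulting pattern, which matches column P2 of the table, comes from the slide.

```python
def encode_genotype(hap1, hap2):
    """Encode two haplotypes as a genotype pattern.

    0: homozygous wild type, 1: homozygous mutant, 2: heterozygous.
    Haplotypes are strings of '0' (wild type) and '1' (mutant).
    """
    codes = []
    for x, y in zip(hap1, hap2):
        if x != y:
            codes.append(2)        # the two chromosomes differ at this SNP
        else:
            codes.append(int(x))   # both chromosomes carry the same allele
    return codes

# Two different hypothetical haplotype pairs yield the same pattern 1, 0, 2, 2.
print(encode_genotype("1000", "1011"))
print(encode_genotype("1001", "1010"))
```

Both calls produce the same genotype pattern, which is why selecting tag SNPs directly from genotype data is harder than selecting them from haplotype data.
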
The Relation Between Minor Allele Frequency and Tag SNPs

• The minor allele frequency ranges from 0% to 50%.
• The higher the minor allele frequency, the more useful tag SNPs are available.
  - 0000000011 -> 20%.
  - 0010010011 -> 40%; this SNP can distinguish more haplotype patterns.
• What is the relation between the minor allele frequency and the number of tag SNPs?

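The two percentages can be reproduced as follows. The second function is one possible way to quantify "can distinguish more haplotype patterns": it counts the pairs of haplotypes that a SNP tells apart. That interpretation and both function names are assumptions added for illustration.

```python
def minor_allele_frequency(column):
    """Frequency of the rarer allele in a SNP column written as a '0'/'1' string."""
    ones = column.count("1")
    return min(ones, len(column) - ones) / len(column)

def distinguishable_pairs(column):
    """Number of haplotype pairs that carry different alleles at this SNP."""
    ones = column.count("1")
    return ones * (len(column) - ones)

for snp in ("0000000011", "0010010011"):
    print(snp, minor_allele_frequency(snp), distinguishable_pairs(snp))
```

Under this measure the 40% SNP separates 24 pairs of haplotypes and the 20% SNP separates 16.
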
Block-Free Selection of Tagging SNPs

• Bafna et al. (2004) propose algorithms for selecting tag SNPs without considering haplotype block structure.
  - They define a new measure called “Informativeness,” which measures how well a set of SNPs can predict another set of SNPs.
  - Find a subset of SNPs that has the maximum Informativeness.
• The total number of tag SNPs used across a whole genome is smaller than with block-dependent approaches.
