Professional Documents
Culture Documents
*DNA sequencing:
Determining the number and order of
nucleotides that make up a given molecule of
DNA.
(Relevant) Trivia
How many base pairs (bp) are there in a human genome?
(Relevant) Trivia
How many base pairs (bp) are there in a human genome?
~3 billion (haploid)
How much did it cost to sequence the first human
genome?
~$2.7 billion
How long did it take to sequence the first human
genome?
~13 years
When
was the first human genome sequence complete?
2000-2003
Whose
genome
was it? but actually mostly a dude
Several
peoples,
from Buffalo
Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina,
)
The Future: ? (Nanopore, MinION,
Single-molecule)
Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina, )
The Future: ? (Nanopore, MinION, Singlemolecule)
Method
Sanger
454
Illumina
Ion Torrent
Read Length
Method
Read Length
Sanger
600-1000 bp
454
Illumina
Ion Torrent
Method
Read Length
Sanger
600-1000 bp
454
300-500 bp
Illumina
Ion Torrent
Method
Read Length
Sanger
600-1000 bp
454
300-500 bp
Illumina
~100 bp
Ion Torrent
Method
Read Length
Sanger
600-1000 bp
454
300-500 bp
Illumina
~100 bp
Ion Torrent
~200 bp
But
Phage Genome: 30,000 to 500,000 bp
Bacteria: Several million bp
Human: 3 billion bp
Complete genome
Fragmented
genome
copies
chunks
Assembly, aka
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
66 bp
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
6x coverage
100% identity
Coverage: # of reads underlying the consensus
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
5x coverage
80% identity
Coverage: # of reads underlying the consensus
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
2x coverage
50% identity
Coverage: # of reads underlying the consensus
Assembly
Consensus:
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG TTGTTCCCACAGACCGTG TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
1x coverage
Assembly
Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina, )
The Future: ? (Nanopore, MinION, Singlemolecule)
x millions
x millions
x millions
x millions
G A T
C
G
C
A
C T
G
A T G
G
A
T T A
1. Labeled
2. Terminators
Sanger Sequencing
5
3
Prime
r
T G C G C G G C C C A
A C G C G C C G G G T ???????????????
Sanger Sequencing
5
3
Prime
r
G T C T T G G G C T
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
Sanger Sequencing
5
3
5
Prime
r
G T C T T G G G C T A G C G C
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
Sanger Sequencing
5
3
5
Prime
r
G T C T T G G G C T A
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
T G C G C G G C C C A
G T C T T G G G C T A G C G C
26 bp
Sanger Sequencing
5
3
5
Prime
r
G
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
T G C G C G G C C C A
G T C T T G G G C T A G C G C
26 bp
T G C G C G G C C C A
G T C T T G G G C T A
22 bp
Sanger Sequencing
5
3
5
5
5
Prime
r
G T C T T G G G C
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
T G C G C G G C C C A
G T C T T G G G C T A G C G C
26 bp
T G C G C G G C C C A
G T C T T G G G C T A
22 bp
T G C G C G G C C C A
12 bp
Sanger Sequencing
5
3
5
5
5
5
Prime
r
G T C T T
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
T G C G C G G C C C A
G T C T T G G G C T A G C G C
26 bp
T G C G C G G C C C A
G T C T T G G G C T A
22 bp
T G C G C G G C C C A
12 bp
T G C G C G G C C C A
G T C T T G G G C
20 bp
Sanger Sequencing
3
5
5
5
5
5
A C G C G C C G G G T C A G A A C C C G A T C G C G
T G C G C G G C C C A
G T C T T G G G C T
21 bp
T G C G C G G C C C A
G T C T T G G G C T A G C G C
26 bp
T G C G C G G C C C A
G T C T T G G G C T A
22 bp
T G C G C G G C C C A
12 bp
T G C G C G G C C C A
G T C T T G G G C
20 bp
T G C G C G G C C C A
G T C T T
16 bp
Sanger Sequencing
3
5
5
5
5
5
A C G C G C C G G G T ???????????????
T G C G C G G C C C A
?????????T
21 bp
T G C G C G G C C C A
??????????????C
26 bp
T G C G C G G C C C A
??????????A
22 bp
T G C G C G G C C C A
12 bp
T G C G C G G C C C A
????????C
20 bp
T G C G C G G C C C A
????T
16 bp
Laser
Laser
Reader
Reader
Sanger Sequencing
T G C G C G G C C C A
G T C T T G G G C T A
20
21
19
18
13
22
16
14
15
17
12 bp
Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina,
)
The Future: ? (Nanopore, MinION, Singlemolecule)
Shotgun sequencing by
PGM/454
Genomic
Fragment
Adapters
Shotgun sequencing by
PGM/454
Genomic
Fragment
Barcode
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Bead/ISP
Adapter
Complement
Sequences
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
T
T
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
A
A
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
G
G
G
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
T
T
G T
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
C
C
G T C
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
A
A
G T C
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
T
T
G T C T T
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Shotgun sequencing by
PGM/454
Only give polymerase one nucleotide at a time:
5
3
Prime
r
G
G
G T C T T G G G
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Prime
r
G
G
G T C T T G G G
T G C G C G G C C C A
A C G C G C C G G G T C A G A A C C C G A T C G C G
T C A G T C A G T C A G
Illumina Sequencing
Next-Gen Sequencing
Take home message: Massively Parallel
1,000 monkeys at 1,000 typewriters is
nothing
Were talking 100,000 to 100 million
concurrent reads
Overview
Prologue: Assembly
The Past: Sanger
The Present: Next-Gen (454, Illumina, )
The Future: ? (Nanopore, MinION,
Single-molecule)
Single Molecule
Sequencing
(Relevant) Trivia
How many base pairs (bp) are there in a human genome?
~3 billion (haploid)
How much did it cost to sequence the first human
genome?
~$2.7 billion
How long did it take to sequence the first human
genome?
~13 years
When
was the first human genome sequence complete?
2000-2003
Whose
genome
was it? but actually mostly a dude
Several
peoples,
from Buffalo
Final Thoughts
DNA sequencing is becoming vastly
faster and more affordable
Generating data is no longer the
bottleneck, understanding it is
Bioinformatics types should be in
high demand in the near future
Epilogue
So should we really still be sequencing
more mycobacteriophage genomes? We
have 250+
Disadvantages
Disadvantages
Relatively high cost
per base
Must run at large scale
Medium/high startup
costs
Disadvantages
New, developing
technology
Cost not as low as
Illumina
Read lengths only
~100-200 bp so far
Disadvantages
Must run at very large
scale
Disadvantages