BGi RNA-Seq Analysis

RNA-Seq analysis
The transcriptome is the complete set of transcripts, both mRNA and non-coding RNA, in one cell or a
population of cells under specific conditions. Transcriptome analysis lays the foundation for
research on gene structure and function. Built on next-generation high-throughput sequencing
technologies, RNA-Seq has found applications in many research fields, including fundamental
science, medical research and drug development.
Services
1. RNA-Seq without reference genome (de novo transcriptome)
1.1 Sequencing and basic data processing
We first test the quality of the total RNA provided by the customer. If the sample qualifies, we
proceed with the following technical route: sample preparation→sequencing.
Basic data processing includes image recognition, base calling, filtering of adapter sequences and
detection of sample contamination.
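As an illustration of the adapter-filtering step, the following is a minimal Python sketch, not the pipeline actually used for this service. The function name `trim_adapter`, the `min_overlap` parameter and the adapter string are assumptions made for the example. The sketch uses exact matching only, whereas production tools usually tolerate mismatches.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter (full or partial) from a read using exact matching.

    Hypothetical helper for illustration only.
    """
    # Case 1: the whole adapter occurs inside the read; cut it and everything after it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the start of the adapter is present at the very end of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter detected: return the read unchanged.
    return read


# Placeholder adapter sequence chosen for the example.
ADAPTER = "AGATCGGAAGAGC"

print(trim_adapter("ACGTAGATCGGAAGAGCTTT", ADAPTER))  # full adapter inside the read
print(trim_adapter("ACGTACGTAGATCGG", ADAPTER))       # partial adapter at the 3' end
print(trim_adapter("ACGTACGT", ADAPTER))              # no adapter
```

The `min_overlap` threshold avoids trimming very short chance matches at the read end. Its value here is arbitrary.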
1.2 Bioinformatics analysis
1) Statistics and quality assessment of output data
2) Contig length distribution
3) Scaffold-gene length distribution
4) Functional annotation of the scaffold-genes
5) GO categories of the scaffold-genes
6) Differentially expressed scaffold-genes
7) Protein function prediction and classification
8) Enriched metabolic pathways of scaffold-genes
9) Enriched GO categories of differentially expressed scaffold-genes
Figure 1. RNA-Seq (de novo)
2. RNA-Seq with reference genome (reference-based transcriptome)
1) Summary of data output and alignment to reference sequences
2) Distribution of reads in the reference genome
3) Randomness assessment of sequencing
4) Gene coverage and sequencing depth
5) Differentially expressed genes
6) Optimization of gene structure
7) Identification of alternatively spliced transcripts
8) Identification of novel genes
9) Identification of gene fusions
Figure 2. RNA-Seq (with reference)
3. Non-coding RNA analysis
We first test the quality of the total RNA or size-fractionated RNA (e.g. 200-700 nt) provided by
the customer. If the sample qualifies, we proceed with the following technical route: sample
preparation→sequencing.
Experimental pipeline
Figure 3. Flowchart of RNA-Seq
Applications of RNA-Seq
1. Identification of genes (de novo transcriptome only)
2. Structure of transcripts: identification of untranslated regions (UTRs), intron boundaries,
alternative splicing, start codons, etc.
3. Identification of non-coding units: non-coding RNA, microRNA precursors, etc.
4. Determining gene expression at the transcriptional level
5. Identification of new transcription units
Technical features of RNA-Seq
Capacity            RNA-Seq
Detected signals    Digital signals
Detected range      Nearly all transcripts
Detected accuracy   From several copies to 100,000 copies
Resolution          Allele-specific expression, alternative splicing
Case Study
Discover new alternatively spliced transcripts
Marc Sultan et al. reported that RNA-Seq can detect 25% more genes than
microarrays. A global survey of messenger RNA splicing events
identified 94,241 splice junctions (4,096 of which were previously unidentified)
in a study of human embryonic kidney (HEK) cells and B cells.
Fig. 4. RNA-Seq versus microarrays
A. Comparison of the number of expressed genes detected by RNA-Seq and microarrays.
B. Distribution of the RNA-Seq NEs and the proportion of genes detected on microarrays.
Genes missed by microarrays are shown with gray (HEK) and black (B cells) bars. Genes
detected by microarrays are shown with light red (HEK) and dark red (B cells) bars.
Identify 5′ and 3′ UTRs in yeast
Comparing 5′ RACE results with RNA-Seq results, researchers found that both
methods identified 5′ boundaries within 50 bp of one another for 786 genes
(77.9%). RNA-Seq could also identify the 3′ boundary precisely.
Fig. 5. Identification of 5′ and 3′ UTRs in yeast using RNA-Seq
A. The 5′ UTRs determined by RNA-Seq and by 5′ RACE for gene YKL004W.
B. The 3′ UTR determined by RNA-Seq for gene YDR419W. A colored box represents an ORF and
an arrow indicates the direction of transcription.
Detect more low-abundance transcripts
In a rice RNA-Seq project at the Beijing Genomics Institute, we found that RNA-Seq detects more
low-abundance genes than traditional methods (Fig. 6-A).
Fig. 6. Transcriptome sequencing can detect more low-abundance
transcripts than cDNA sequencing
A. The length distribution of newly identified transcripts.
B. A comparison of expression levels between novel transcripts and cDNA genes.
Detect gene fusion
Researchers at the University of Michigan performed transcriptome sequencing of
patient cell lines and tumor samples using 454 together with the GA (Solexa) to
discover new gene fusions in prostate cancer. This established high-throughput
sequencing as a reliable method for discovering new gene fusions and other
disease-related mutations.
Fig. 7. RNA-Seq detects gene fusion
Quantify RNA expression level
Researchers at Yale University found a strong correlation (R = 0.9775) between
the qPCR and RNA-Seq data for 34 genes predicted to be expressed at
high, medium and low levels.
[Figure: scatter plot of qPCR (log2) against RNA-Seq (log2); correlation = 0.9775]
Fig. 8. Comparison of 34 ORFs identified by RNA-Seq and
qPCR at the transcriptional level
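The reported agreement between qPCR and RNA-Seq is a correlation coefficient computed on log2-scaled expression values. The sketch below shows one way such a value could be calculated, assuming a Pearson correlation, which the source does not specify. The expression values are invented for the example and are not data from the study; the function names are likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log2_values(values):
    """Convert strictly positive expression values to the log2 scale."""
    return [math.log2(v) for v in values]


# Made-up expression values for five genes (illustration only).
rnaseq = [0.5, 4.0, 32.0, 256.0, 2048.0]
qpcr = [0.4, 5.0, 30.0, 300.0, 1800.0]

r = pearson_r(log2_values(rnaseq), log2_values(qpcr))
print(f"R = {r:.4f}")
```

Working on the log2 scale keeps a few highly expressed genes from dominating the coefficient, which matters when expression spans several orders of magnitude.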
References
Sultan M, Schulz MH, et al. A global view of gene activity and alternative splicing by deep sequencing of the human
transcriptome. Science 321 (5891), 956 (2008).
Maher CA, Kumar-Sinha C, Cao X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature
458 (7234), 97 (2009).
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, et al. The Transcriptional Landscape of the Yeast Genome Defined
by RNA Sequencing. Science 320 (5881), 1344 (2008).
Wilhelm BT, Marguerat S, Watt S, et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide
resolution. Nature 453 (7199), 1239 (2008).
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes
by RNA-Seq. Nat Methods 5 (7), 621 (2008).
FAQ
1. What are the sample requirements?
Please provide total RNA at a concentration of no less than 400 ng/μl and a quantity of no less than 20
μg; the minimum acceptable quantity is 10 μg. RNA quality requirements: OD260/280 of 1.8-2.2,
28S/18S > 1.8, RIN ≥ 8. Customers should ship the RNA a week before sequencing.
2. Can the Beijing Genomics Institute (BGI) perform transcriptome analysis of
bacteria?
Yes. We recommend that the customer submit purified mRNA or cDNA rather than total RNA.
3. Can BGI perform non-coding RNA sequencing?
Yes. We recommend that the customer submit RNA free of rRNA and tRNA.
4. How many unigenes can be retrieved from 1 Gb of sequencing data?
In general, more than 6,000 unigenes longer than 1 kb can be identified from 1 Gb of
sequencing data. However, the exact number will vary according to
the nature of the sample.
5. Which species has BGI sequenced?
We have sequenced many model organisms and major crops, for example Homo sapiens,
nematodes, silkworm, Arabidopsis thaliana, rice and corn. Many novel structures and
transcripts were identified. We have also performed transcriptome sequencing of many species
without a reference genome, such as trees, flowers, vegetables, insects, fishes and fungi.
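The sample requirements in FAQ 1 can be written as a simple pre-shipment check. The sketch below is only an illustration: the class `RnaSample`, the function `check_sample` and the split into failures and warnings are assumptions. In particular, treating 20 μg as the recommended quantity and 10 μg as the hard minimum is one reading of the FAQ text.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    """Hypothetical record of the measurements named in FAQ 1."""
    concentration_ng_per_ul: float
    quantity_ug: float
    od260_280: float
    ratio_28s_18s: float
    rin: float


def check_sample(s):
    """Return (failures, warnings) for a total RNA sample."""
    failures, warnings = [], []
    if s.concentration_ng_per_ul < 400:
        failures.append("concentration below 400 ng/ul")
    if s.quantity_ug < 10:
        failures.append("quantity below the 10 ug minimum")
    elif s.quantity_ug < 20:
        warnings.append("quantity below the requested 20 ug")
    if not 1.8 <= s.od260_280 <= 2.2:
        failures.append("OD260/280 outside 1.8-2.2")
    if s.ratio_28s_18s <= 1.8:
        failures.append("28S/18S not above 1.8")
    if s.rin < 8:
        failures.append("RIN below 8")
    return failures, warnings


# Example values are invented for illustration.
sample = RnaSample(concentration_ng_per_ul=520, quantity_ug=15,
                   od260_280=2.0, ratio_28s_18s=2.1, rin=8.6)
failures, warnings = check_sample(sample)
print("failures:", failures)
print("warnings:", warnings)
```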
Small RNA analysis
RNA is one of the most important biological molecules; together with DNA and protein it forms the framework of
life. Small RNAs regulate many processes, such as cell development and
growth, gene transcription and translation, and gene silencing. Small RNA
sequencing is based on Solexa technology. Deep sequencing yields large numbers of small fragments
of 18 to 30 nt, which we compare with those of known and related species to find differences
between samples, predict novel miRNAs and study their functions.
Services
Sequencing and basic data processing
We first test the quality of the small RNA provided by the customer. If the sample qualifies,
we proceed with the following technical route: sample preparation→TA clone→sequencing
reaction.
Basic data processing includes detecting contamination of samples.
Bioinformatics analysis
Items of basic bioinformatics analysis
Length distribution of small RNA
Mapping small RNA sequences to the genome and exploring their distribution along each chromosome
Differential small RNAs between two samples
Comparing small RNA sequences with known miRNAs deposited in miRBase (miRBase 13.0)
Identification of rRNA, tRNA, snRNA and snoRNA against Rfam (9.1) and GenBank
Identifying repeats associated with small RNAs
Identifying degraded mRNA fragments and siRNA candidates
Items of basic bioinformatics analysis
Annotating and classifying miRNA
Prediction of novel miRNA
Expression of miRNA
Differential expression analysis of miRNA gene and construction of miRNA expression profiles
Clustering analysis of differentially expressed miRNA
Target prediction of miRNA (plants only)
Technical features
High throughput: more than 2.5 million reads can be obtained from a single-pass
sequencing run.
High resolution: single-base differences can be detected.
High accuracy: digital signals accurately measure copy numbers ranging from several to
hundreds of thousands.
Figure 5-1 Experimental pipeline of small RNA analysis
Case study
About 20 μg of total RNA was extracted from an animal tissue, subjected to high-throughput sequencing
and analyzed bioinformatically.
Length distribution of small RNA
The lengths of the small RNAs are centered on 22 nt (more than 90%), which indicates that the small RNA
sequencing is reliable. The length distribution of small RNA from the tissue is shown in Fig. 5-2.
Figure 5-2 The distribution of small RNA
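A length distribution like the one in Figure 5-2 can be tabulated directly from the cleaned reads. The following is a minimal sketch under stated assumptions: the function name `length_distribution` and the example reads are hypothetical, and the 18 to 30 nt window is taken from the size range quoted in the introduction to this section.

```python
from collections import Counter


def length_distribution(reads, min_len=18, max_len=30):
    """Percentage of reads at each length within [min_len, max_len]."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: 100.0 * counts[length] / total for length in sorted(counts)}


# Invented reads: three of 22 nt, one of 21 nt, one of 40 nt (outside the window).
reads = ["A" * 22, "C" * 22, "G" * 22, "T" * 21, "A" * 40]

for length, pct in length_distribution(reads).items():
    print(f"{length} nt: {pct:.1f}%")
```

Reads outside the window are dropped before percentages are computed, so the values always sum to 100 over the retained lengths.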
Annotating and classifying small RNA sequences
After removing adaptor contaminants and low-quality sequences, 3,333,504 reads were
generated. The sequences were aligned to the miRBase, mRNA/EST and rRNA databases to identify known
and candidate miRNAs (Fig. 5-3).
Figure 5-3 Proportion of miRNAs and other categories of RNA
Among these data, most unique reads map to exons, but miRNAs make up the major part of the total reads.
This abundance of miRNA data makes the prediction of novel miRNAs more reliable.
Different expression patterns analysis on miRNAs
Different miRNAs show different expression patterns in the same tissue (Fig. 5-4). This
relates to differences between tissues and to the selective expression of genes.
Figure 5-4 Expression profile of selected miRNAs in the same tissue (y-axis: expression level, 10K)
Figure 5-5 Expression profile of selected miRNAs in different tissues
As shown in Fig. 5-5, miRNA expression is tissue-specific (A, B, C, D, E, F, G, H, I and J
indicate different tissues; hsa-let-7b and hsa-miR-22 are different miRNA genes).
Identification of miRNA nucleotide bias
miRNAs are a special class of RNA: the 5′ end is usually U rather than G. Positions 2 and 4 are
depleted in U. Generally speaking, all positions except the fourth are depleted in G. miRNAs show
high sequence conservation and high temporal and tissue specificity. The count of all variants
of a miRNA gene can be used as a digital measure of its expression level (Fig. 5-6).
Figure 5-6 The distribution of all bases (y-axis: percent)
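The per-position base distribution plotted in Figure 5-6 can be computed by counting bases at each position across all miRNA sequences. The sketch below assumes a hypothetical helper `base_composition` and uses short invented sequences. DNA-style reads are converted to RNA by replacing T with U.

```python
from collections import Counter


def base_composition(sequences, n_positions=22):
    """Percentage of A, C, G and U at each of the first n_positions."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        for i, base in enumerate(rna[:n_positions]):
            if base in "ACGU":
                counts[i][base] += 1
    composition = []
    for c in counts:
        total = sum(c.values())
        # Positions with no data report 0.0 for every base.
        composition.append(
            {b: (100.0 * c[b] / total if total else 0.0) for b in "ACGU"}
        )
    return composition


# Invented example sequences (illustration only).
seqs = ["UGAGGUAGUAGG", "UAGCUUAUCAGA", "TGGAATGTAAAG", "AAGCUGCCAGUU"]

comp = base_composition(seqs, n_positions=4)
for pos, freqs in enumerate(comp, start=1):
    print(pos, freqs)
```

With real data, a high U percentage and low G percentage at position 1 would reflect the 5′ bias described above.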
Identification of miRNA related to repeat sequence
In addition to small RNAs that act as sequence-specific guides to regulate mRNA stability or inhibit protein
synthesis, many recent studies have discovered novel small RNA types that bind to
different Argonaute proteins and are involved in important biological processes, such as chromatin
maintenance and transposon control. These small RNAs are usually derived from highly repeated
elements and are called repeat-associated small RNAs (a term often used interchangeably with piRNA).
According to the type of Argonaute protein they bind to, these small RNAs can be further divided into
different classes (Fig. 5-7).
Figure 5-7 The distribution of repeats
Prediction of new miRNA candidates
miRNA precursors have a characteristic fold-back structure, which can be used to predict novel
miRNAs. By folding the genome sequence flanking a small RNA and then analyzing its
structural features, we can identify novel miRNA candidates (Fig. 5-8).
Figure 5-8 The identification of novel miRNA
Expression differences of miRNA between two samples
The same gene can be expressed differentially under different conditions. We compare the expression levels of known
miRNAs between samples using log2-ratio plots and scatter plots (Figures 5-9 and 5-10).
Figure 5-9 Scatter plot of different samples (x-axis: expression level, Day 2; y-axis: expression level, Day 7; logarithmic scales)
Figure 5-10 Log2-ratio of different samples
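The comparison between two samples can be sketched as two steps: normalize read counts so the samples are comparable, then take the log2 ratio for each miRNA. The example below is an assumption-laden illustration, not the method used in the reports. It normalizes to reads per million (written TPM here), replaces zero values with a small floor to avoid division by zero, and uses invented miRNA names and counts.

```python
import math


def to_tpm(counts):
    """Scale raw read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def log2_ratios(sample_a, sample_b, floor=0.01):
    """log2(B / A) of normalized expression for every miRNA seen in either sample.

    Values below `floor` (including absent miRNAs) are raised to `floor`.
    """
    tpm_a, tpm_b = to_tpm(sample_a), to_tpm(sample_b)
    ratios = {}
    for name in sorted(set(tpm_a) | set(tpm_b)):
        a = max(tpm_a.get(name, 0.0), floor)
        b = max(tpm_b.get(name, 0.0), floor)
        ratios[name] = math.log2(b / a)
    return ratios


# Invented read counts for three hypothetical miRNAs.
day2 = {"miR-a": 100, "miR-b": 300, "miR-c": 600}
day7 = {"miR-a": 400, "miR-b": 300, "miR-c": 300}

for name, ratio in log2_ratios(day2, day7).items():
    print(f"{name}: log2(Day7/Day2) = {ratio:+.2f}")
```

A positive ratio means higher expression in the second sample; the choice of floor affects only miRNAs that are absent or nearly absent in one sample.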
Clustering differentially expressed miRNA between two samples
miRNA genes, with expression standardized to 1 TPM, are clustered by sequence similarity so that
similar sequences are grouped together. Red indicates an upward trend, green a downward trend, and gray genes that
are not expressed in any sample.
Figure 5-11 Cluster analysis of miRNA
References
Zhang Y, Zhou X, Ge X, et al. (2009) Insect-Specific microRNA Involved in the Development of the Silkworm
Bombyx mori. PLoS One.
Chen X, Li QB, Wang J, et al. (2009) Identification and characterization of novel amphioxus microRNAs by
Solexa sequencing. Genome Biology.
Guo JM, Miao Y, Xiao BX, et al. (2009) Differential expression of microRNA species in human gastric cancer
versus non-tumorous tissues. J Gastroenterol Hepatol.
Chen X, Ba Y, Ma LJ, et al. (2008) Characterization of microRNAs in serum: a novel class of biomarkers for
diagnosis of cancer and other diseases. Cell Research.
Wang XH, Tang S, Le SY, et al. (2008) Aberrant Expression of Oncogenic and Tumor-Suppressive MicroRNAs in
Cervical Cancer Is Required for Cancer Cell Growth. PLoS One.
Mi S, Cai T, Hu Y, Chen Y, Hodges E, et al. (2008) Sorting of Small RNAs into Arabidopsis Argonaute
Complexes Is Directed by the 5′ Terminal Nucleotide. Cell.
Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE, et al. (2008) Specificity of ARGONAUTE7-miR390
Interaction and Dual Functionality in TAS3 Trans-Acting siRNA Formation. Cell.
Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, et al. (2008) Application of massively parallel
sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res.
Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, et al. (2008) Identification of microRNAs and other small
regulatory RNAs using cDNA library sequencing. Methods 44(1): 3-12.
Ibarra I, Erlich Y, Muthuswamy SK, Sachidanandam R, Hannon GJ (2007) A role for microRNAs in maintenance
of mouse mammary epithelial progenitor cells. Genes Dev 21(24): 3238-3243.
Frequently asked questions
1. What are the sample requirements?
Please provide total RNA at a concentration of no less than 750 ng/μl and a minimum quantity
of 20 μg. We recommend that customers avoid using spin columns to
extract total RNA. We use an Agilent instrument to measure the RIN, so it is best to
send total RNA. You can also check your sample by OD measurement or gel electrophoresis.
4. What is the TA clone for?
TA cloning is performed after library construction to check the quality of the library. We select more
than 80 fragments for TA cloning, sequence them by the Sanger method and compare the inserts with the
database. This step helps avoid bad results after sequencing.
5. What should the customer provide besides samples?
Please provide genome, exon/intron and repeat information for the species. If the sample has no
reference genome, you must provide this information for the most closely related species.
6. How long does it take to get the data?
We promise to deliver the report within 50 working days after confirming receipt of the project payment.
7. How can I understand the data? What should I do?
BGI miRNA result reports include a section describing the analysis methods, which
explains how the results were produced. The README file can also help you find answers.