
RNA-Seq analysis

The transcriptome is the complete set of transcripts, both mRNA and non-coding RNA, in a single cell or
a population of cells under specific conditions. Transcriptome analysis lays the foundation for
research on gene structure and function. Built on next-generation high-throughput sequencing
technologies, RNA-Seq has found applications in many research fields, including basic
science, medical research and drug development.

Services

1. RNA-seq without reference genome (De novo transcriptome)

1.1 Sequencing and basic data processing

We first test the quality of the total RNA provided by the customer. If the sample qualifies, we
proceed along the following technical route: sample preparation→sequencing.

The basic data analysis includes image recognition, base calling, filtering of adapter sequences and
detection of sample contamination.
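
As a rough illustration of the adapter-filtering step, the sketch below removes a 3' adapter from each read by exact string matching and discards reads that become too short. The adapter sequence, the `min_overlap` and `min_length` values and the function names are assumptions made for this example; they are not the parameters of the production pipeline, which would normally also tolerate sequencing errors.

```python
def trim_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching (illustrative only)."""
    # Case 1: the full adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the start of the adapter is present at the 3' end of the read.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


def filter_reads(reads, adapter, min_length=20):
    """Trim every read and keep those that are still at least min_length long."""
    kept = []
    for read in reads:
        trimmed = trim_adapter(read, adapter)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # hypothetical adapter sequence for the example
    reads = [
        "ACGTACGTACGTACGTACGTAGATCGGAAGAGC",  # full adapter at the 3' end
        "ACGTAGATCGGAAGAGC",                  # too short once trimmed
    ]
    print(filter_reads(reads, adapter))
```

Exact matching keeps the example short; a tolerance for mismatches would be the natural next refinement.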

1.2 Bioinformatics analysis

1) Statistics and quality assessment of output data
2) Contig length distribution
3) Scaffold-gene length distribution
4) Functional annotation of the scaffold-gene
5) GO categories of the scaffold-gene
6) Differentially expressed scaffold-gene
7) Protein function prediction and classification
8) Enriched metabolic pathway of scaffold-gene
9) Enriched GO categories of differentially expressed scaffold-gene

Figure 1. RNA-Seq (De novo)

2. RNA-Seq with reference genome (In reference transcriptome)

2.1 Sequencing and basic data processing

We first test the quality of the total RNA provided by the customer. If the sample qualifies, we
proceed along the following technical route: sample preparation→sequencing.

The basic data analysis includes image recognition, base calling, filtering of adapter sequences and
detection of sample contamination.

2.2 Bioinformatics analysis

1) Summary of data output and alignment to reference sequences
2) Distribution of reads in reference genome
3) Randomness assessment of sequencing
4) Gene coverage and sequencing depth
5) Differentially expressed genes
6) Optimization of gene structure
7) Identification of alternative spliced transcripts
8) Identification of novel genes
9) Identification of gene fusion

Figure 2. RNA-Seq (In reference)

3. Non-coding RNA analysis

We first test the quality of the total RNA or size-fractionated RNA (e.g. 200-700 nt) provided by
the customer. If the sample qualifies, we proceed along the following technical route: sample
preparation→sequencing.

Experimental pipeline

Figure 3. Flowchart of RNA-Seq

Application of RNA-Seq

1. Identification of genes (De novo transcriptome only)
2. Structure of transcripts: identification of untranslated regions (UTRs), intron boundaries,
alternative splicing, start codons, etc.
3. Identification of non-coding units: non-coding RNA, microRNA precursors, etc.
4. Determining gene expression at the transcriptional level
5. Identification of new transcription units

Technical features of RNA-Seq

Capacity            RNA-Seq
Detected signals    Digital signals
Detected range      Nearly all the transcripts
Detected accuracy   From several copies to 100,000 copies
Resolution          Allele-specific expression, alternative splicing

Case Study

Discover new alternative spliced transcripts

Marc Sultan et al. reported that RNA-Seq can detect 25% more genes than
microarrays. A global survey of messenger RNA splicing events
identified 94,241 splice junctions (4,096 of which were previously unidentified)
in a study of human embryonic kidney (HEK) cells and B cells.

Fig 4. RNA-Seq versus microarrays

A. Comparison of the number of expressed genes detected by RNA-Seq and microarrays
B. Distribution of the RNA-Seq NEs and the proportion of genes detected on microarrays.
Genes missed by microarrays are shown with gray (HEK) and black (B cells) bars. Genes
detected by microarrays are shown with light red (HEK) and dark red (B cells) bars.

Identify 5’ and 3’ UTRs in yeast

After comparing 5’ RACE results with RNA-Seq results, researchers found that both
methods identified 5’ boundaries within 50 bp of one another for 786 genes
(77.9%). RNA-Seq could also identify the 3’ boundary precisely.

Fig 5. Identifying 5’ and 3’ UTRs in yeast using RNA-Seq

A. The 5′ UTRs determined by RNA-Seq and by 5′ RACE for gene YKL004W
B. The 3′ UTR determined by RNA-Seq for gene YDR419W. A colored box represents an ORF and
an arrow indicates the transcription direction.

Detect more low abundance transcripts

In a rice RNA-Seq project at the Beijing Genomics Institute, we found that RNA-Seq detects more
low-abundance genes than traditional methods (Fig. 6-A).

Fig 6. Transcriptome study can detect more low-abundance
transcripts than cDNA sequencing
A. The length distribution of newly identified transcripts.
B. A comparison of expression level between novel transcripts and cDNA genes.

Detect gene fusion

Researchers at the University of Michigan performed transcriptome sequencing of
patient cell lines and tumor samples using 454 together with the GA (Solexa) to
discover new gene fusions in prostate cancer. This established high-throughput
sequencing as a reliable method for discovering new gene fusions and other
disease-related mutations.

Fig 7. RNA-Seq detects gene fusion

Quantify RNA expression level

Researchers at Yale University found a strong correlation (R = 0.9775) between
the qPCR and RNA-Seq data for 34 genes predicted to be expressed at
high, medium and low levels.
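
A correlation of this kind can be computed, in outline, as a Pearson coefficient over log2-transformed expression values from the two methods. The sketch below is a minimal example under that assumption; the function names and the sample values are made up for illustration and are not data from the cited study.

```python
import math


def log2_values(values):
    """Log2-transform strictly positive expression values."""
    return [math.log2(v) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical expression values for five genes (not from the study).
    rna_seq = [0.5, 4.0, 32.0, 250.0, 2100.0]
    qpcr = [0.4, 5.0, 30.0, 300.0, 1900.0]
    r = pearson_r(log2_values(rna_seq), log2_values(qpcr))
    print(f"R = {r:.4f}")
```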

[Scatter plot: qPCR (log2) versus RNA-Seq (log2), both axes from -10 to 15; correlation = 0.9775]

Fig 8. Comparison of 34 ORFs identified by RNA-Seq and
qPCR at the transcriptional level

References

Sultan M, Schulz MH, et al. A global view of gene activity and alternative splicing by deep sequencing of the human
transcriptome. Science 321 (5891), 956 (2008).
Maher CA, Kumar-Sinha C, Cao X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature
458 (7234), 97 (2009).
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, et al. The transcriptional landscape of the yeast genome defined
by RNA sequencing. Science 320 (5881), 1344 (2008).
Wilhelm BT, Marguerat S, Watt S, et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide
resolution. Nature 453 (7199), 1239 (2008).
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes
by RNA-Seq. Nat Methods 5 (7), 621 (2008).

FAQ

1. What are the sample requirements?
Please provide total RNA at a concentration of at least 400 ng/μl and a quantity of at least 20
μg; the minimum acceptable quantity is 10 μg. RNA quality requirements: OD260/280 of 1.8-2.2,
28S/18S > 1.8, RIN ≥ 8. Customers should ship the RNA a week before sequencing.
2. Can the Beijing Genomics Institute (BGI) perform transcriptome analysis of
bacteria?
Yes. We recommend that the customer submit purified mRNA or cDNA rather than total RNA.
3. Can the BGI perform non-coding RNA sequencing?
Yes. We recommend that the customer submit RNA free of rRNA and tRNA.
4. How many unigenes can be retrieved from 1 Gb of sequencing data?
In general, more than 6,000 unigenes longer than 1 kb can be identified from 1 Gb of
sequencing data. However, the exact number of unigenes longer than 1 kb will vary according to
the nature of the sample.
5. What species has the BGI sequenced?
We have sequenced many model organisms and major crops, for example Homo sapiens,
Nematoda, silkworm, Arabidopsis thaliana, rice and corn. Many novel structures and
transcripts were identified. We have also performed transcriptome sequencing of many species
without a reference genome, such as trees, flowers, vegetables, insects, fishes and fungi.
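
The thresholds given in question 1 can be written down as a simple pre-shipment check. The sketch below is illustrative only: the class and function names are hypothetical, and treating a quantity between 10 μg and 20 μg as a warning rather than a failure is an assumption about how the two quantity figures relate.

```python
from dataclasses import dataclass


@dataclass
class TotalRnaSample:
    concentration_ng_per_ul: float
    quantity_ug: float
    od260_280: float
    ratio_28s_18s: float
    rin: float


def check_sample(sample):
    """Return (failures, warnings) for a total RNA sample against the FAQ thresholds."""
    failures, warnings = [], []
    if sample.concentration_ng_per_ul < 400:
        failures.append("concentration below 400 ng/ul")
    if sample.quantity_ug < 10:
        failures.append("quantity below the 10 ug minimum")
    elif sample.quantity_ug < 20:
        # Assumption: between the minimum and the requested amount is acceptable.
        warnings.append("quantity below the requested 20 ug")
    if not 1.8 <= sample.od260_280 <= 2.2:
        failures.append("OD260/280 outside 1.8-2.2")
    if sample.ratio_28s_18s <= 1.8:
        failures.append("28S/18S not above 1.8")
    if sample.rin < 8:
        failures.append("RIN below 8")
    return failures, warnings


if __name__ == "__main__":
    sample = TotalRnaSample(450.0, 15.0, 2.0, 1.9, 8.5)
    print(check_sample(sample))
```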

Small RNA analysis

RNA is one of the most important biological macromolecules and, together with DNA and protein,
forms the framework of life. Small RNAs regulate many life processes, such as cell development and
growth, gene transcription and translation, and gene silencing. Small RNA
sequencing is based on Solexa technology. Deep sequencing yields numerous small fragments
of 18 to 30 nt; we compare them with known small RNAs from the same and related species, identify
differences between samples, predict novel miRNAs and study their function.

Services

Sequencing and basic data processing


We first test the quality of the small RNA provided by the customer. If the sample qualifies,
we proceed along the following technical route: sample preparation→TA clone→sequencing
reaction.
The basic data analysis includes image recognition, base calling, filtering of adapter sequences and
detection of sample contamination.

Bioinformatics analysis
Items of basic bioinformatics analysis
Length distribution of small RNA
Mapping small RNA sequences to genome sequences and exploring their distribution along each chromosome
Differential small RNA between two samples
Comparing small RNA sequences with known miRNAs deposited at miRBase (miRBase 13.0)
Identification of rRNA, tRNA, snRNA and snoRNA against Rfam (9.1) and GenBank
Identifying repeats associated with small RNAs
Identifying degraded mRNA fragments and siRNA candidates
Items of advanced bioinformatics analysis
Annotating and classifying miRNA
Prediction of novel miRNA
Expression of miRNA
Differential expression analysis of miRNA gene and construction of miRNA expression profiles
Clustering analysis of differentially expressed miRNA
Target prediction of miRNA (only for plant)

Technical features

High throughput: more than 2.5 million reads can be obtained from a single sequencing
pass.
High resolution: single-base differences can be detected.
High accuracy: digital signals accurately measure copy numbers ranging from several to
hundreds of thousands.

Experimental pipeline

Figure 5-1 Experimental pipeline of small RNA analysis

Case study
About 20 μg of total RNA was extracted from an animal tissue, followed by high-throughput sequencing
and bioinformatics analysis.

Length distribution of small RNA

The lengths of the small RNAs are centered on 22 nt (more than 90%), which indicates that the small RNA
sequencing is reliable. The length distribution of small RNA from the tissue is shown in Fig. 5-2.

Figure 5-2 The distribution of small RNA
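
A length distribution of this kind can be tabulated from the cleaned reads with a few lines of code. The sketch below is a minimal example; the function name is hypothetical, and the 18 to 30 nt window is taken from the fragment size range mentioned in the introduction.

```python
from collections import Counter


def length_distribution(reads, min_len=18, max_len=30):
    """Fraction of reads at each length within [min_len, max_len]."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}


if __name__ == "__main__":
    # Toy reads: nine 22-nt reads and one 21-nt read.
    reads = ["A" * 22] * 9 + ["A" * 21]
    for length, fraction in length_distribution(reads).items():
        print(length, f"{fraction:.1%}")
```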

Annotating and classifying small RNA sequences

After removing adaptor contaminants and low-quality sequences, 3,333,504 reads
remained. The sequences were aligned to the miRBase, mRNA/EST and rRNA databases to identify known
and candidate miRNAs (Fig. 5-3).

Figure 5-3 Proportion of miRNAs and other categories of RNA

Among these data, most unique reads map to exons, but miRNAs account for the major part of total reads.
These miRNA data make the prediction of novel miRNAs more reliable.

Different expression patterns analysis on miRNAs

Different miRNAs show different expression patterns in the same tissue (Fig. 5-4). This is
related to differences between tissues and to the selective expression of genes.

Figure 5-4 Expression profile for part of the miRNAs in the same tissue (axis label: expression level, 10K)

Figure 5-5 Expression profile for part of the miRNAs in different tissues

As shown in Fig. 5-5, miRNA expression is tissue-specific (A, B, C, D, E, F, G, H, I and J
indicate different tissues; hsa-let-7b and hsa-miR-22 indicate different miRNA genes).

Identification of miRNA nucleotide bias

As a special class of RNA, miRNAs usually have U rather than G at the 5’ end. Positions 2 and 4 are
depleted in U. Generally speaking, all positions except the fourth are depleted in G. miRNAs are
highly conserved in sequence and show strong temporal and tissue specificity. The count of all variants
of a miRNA gene can be used as a digital measure of its expression level (Fig. 5-6).

Figure 5-6 The distribution of all bases (axis label: percent)
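
The per-position base composition behind a figure like this can be tallied directly from the miRNA sequences. The sketch below is a minimal example with a hypothetical function name and toy sequences; it converts T to U so that reads reported in DNA letters are counted consistently.

```python
from collections import Counter


def positional_base_percent(sequences, n_positions=22):
    """Percentage of A, C, G and U at each position (index 0 is the 5' end)."""
    normalized = [s.upper().replace("T", "U") for s in sequences]
    table = []
    for pos in range(n_positions):
        counts = Counter(s[pos] for s in normalized if len(s) > pos)
        total = sum(counts[b] for b in "ACGU")
        row = {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGU"}
        table.append(row)
    return table


if __name__ == "__main__":
    toy = ["UAGC", "UGGC", "AAGC", "TAGU"]  # made-up sequences
    for pos, row in enumerate(positional_base_percent(toy, n_positions=4), start=1):
        print(pos, row)
```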

Identification of miRNA related to repeat sequence

Besides small RNAs that act as sequence-specific guides to regulate mRNA stability or inhibit protein
synthesis, many recent studies have discovered novel small RNA types that bind to
different Argonaute proteins and are involved in important biological processes, such as chromatin
maintenance and transposon control. These small RNAs are usually derived from highly repeated
elements and are called repeat-associated small RNAs (a term often used interchangeably with piRNA).
According to the type of Argonaute protein they bind to, these small RNAs can be further divided into
different classes (Fig. 5-7).

Figure 5-7 The distribution of repeats

Prediction of new miRNA candidates

miRNA precursors have a characteristic fold-back structure, which can be used to predict novel
miRNAs. By folding the genomic sequence flanking each small RNA and analyzing its
structural features, we can identify novel miRNA candidates (Fig. 5-8).

Figure 5-8 The identification of novel miRNA

Expression differences of miRNA between two samples

The same gene can be expressed at different levels under different conditions. We compare the
expression levels of known miRNAs between samples using log2-ratio plots and scatter plots
(Figures 5-9 and 5-10).

[Scatter plot: expression level (Day 7) versus expression level (Day 2), logarithmic axes]

Figure 5-9 Scatter plot of different samples

Figure 5-10 Log2-ratio of different samples
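
The comparison between two samples can be sketched as two steps: normalize raw miRNA read counts to transcripts per million (TPM), then take the log2 ratio of the normalized values. The function names, the toy counts and the small `floor` value used to avoid taking the logarithm of zero are all assumptions for this example, not the settings of the actual pipeline.

```python
import math


def to_tpm(counts):
    """Normalize raw read counts per miRNA to transcripts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def log2_ratio(tpm_a, tpm_b, floor=0.01):
    """log2(B / A) for every miRNA seen in either sample.

    Values below `floor` (including missing miRNAs) are raised to `floor`
    so that the ratio stays defined; the floor value is an assumption.
    """
    names = sorted(set(tpm_a) | set(tpm_b))
    return {
        name: math.log2(max(tpm_b.get(name, 0.0), floor) / max(tpm_a.get(name, 0.0), floor))
        for name in names
    }


if __name__ == "__main__":
    day2 = {"miR-x": 100, "miR-y": 900}  # hypothetical counts
    day7 = {"miR-x": 400, "miR-y": 600}
    print(log2_ratio(to_tpm(day2), to_tpm(day7)))
```

Plotting the two TPM vectors against each other on logarithmic axes gives the scatter plot, and plotting the ratios per miRNA gives the log2-ratio figure.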

Clustering differentially expressed miRNA between two samples

miRNA genes, with expression normalized to 1 TPM, are clustered by sequence similarity so that
similar sequences are grouped together. Red indicates an upward trend, green indicates a downward
trend, and gray indicates genes that are not expressed in any sample.

Figure 5-11 Cluster analysis of miRNA

References

Y Zhang, X Zhou, X Ge, et al. (2009) Insect-Specific microRNA Involved in the Development of the Silkworm
Bombyx mori. PLoS One.
Xi Chen, QB Li, J Wang, et al. (2009) Identification and characterization of novel amphioxus microRNAs by
Solexa sequencing. Genome Biology.
JM Guo, Y Miao, BX Xiao, et al. (2009) Differential expression of microRNA species in human gastric cancer
versus non-tumorous tissues. J Gastroenterol Hepatol.
Xi Chen, Yi Ba, LJ Ma, et al. (2008) Characterization of microRNAs in serum: a novel class of biomarkers for
diagnosis of cancer and other diseases. Cell Research.
XH Wang, S Tang, SY Le, et al. (2008) Aberrant Expression of Oncogenic and Tumor-Suppressive MicroRNAs in
Cervical Cancer Is Required for Cancer Cell Growth. PLoS One.
Mi S, Cai T, Hu Y, Chen Y, Hodges E, et al. (2008) Sorting of Small RNAs into Arabidopsis Argonaute
Complexes Is Directed by the 5’ Terminal Nucleotide. Cell.
Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE, et al. (2008) Specificity of ARGONAUTE7-miR390
Interaction and Dual Functionality in TAS3 Trans-Acting siRNA Formation. Cell.
Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, et al. (2008) Application of massively parallel
sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res.
Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, et al. (2008) Identification of microRNAs and other small
regulatory RNAs using cDNA library sequencing. Methods 44(1): 3-12.
Ibarra I, Erlich Y, Muthuswamy SK, Sachidanandam R, Hannon GJ (2007) A role for microRNAs in maintenance
of mouse mammary epithelial progenitor cells. Genes Dev 21(24): 3238-3243.

Frequently asked questions

1. What are the sample requirements?
Please provide total RNA at a concentration of at least 750 ng/μl and a quantity
of at least 20 μg. We recommend that customers avoid using spin columns to
extract total RNA. We use an Agilent instrument to measure the RIN, so it is best to
send total RNA. You may also check your sample by OD measurement or on a gel.
2. What is the TA clone for?
TA cloning is performed after library construction to check the quality of the library. We select more
than 80 fragments for TA cloning, sequence them by the Sanger method and compare the inserted
fragments with the database. TA cloning helps avoid bad results after sequencing.
3. What should the customer provide besides samples?
Please provide genome information together with exon/intron and repeat annotation. If the sample
species has no genome, please provide the information for the most closely related species.
4. How long does it take to get the data?
We promise to submit the report within 50 working days after confirming receipt of the project payment.
5. How can I understand the data? What should I do?
The BGI miRNA result report includes a section of remarks on the analysis methods, which
explains the methods used. The README section can also help you find the answer.
