
EMBL

Subject : Bioinformatics

Lesson : European Molecular Biology Laboratory (EMBL)

Lesson Developer: Sandip Das

Department/ College: Department of Botany, University of Delhi

Institute of Life Long Learning 0



Table of Contents

Chapter : European Molecular Biology Laboratory


Introduction
European Bioinformatics Institute

Databases at EBI
Nucleotide databases
Functional Genomics Databases
Protein databases
Structure databases

Sequence Analysis
Pairwise alignment
Multiple sequence alignment
Homology Searching

Summary
Exercises
Glossary
References


Introduction

With the tremendous growth in the field of computational biology, a need was felt to
establish an independent and parallel research institute that would not merely act as a
mirror of the GenBank nucleotide resources of NCBI, but would also develop matching
databases and analysis tools. The European Molecular Biology Laboratory (EMBL) was
established in 1974 and is now supported by funding from 20 member states (European
countries together with Israel) and from Australia as an associate member. EMBL currently
operates five research institutes in different countries, with its main laboratory in
Heidelberg, Germany.

The five institutes of EMBL with their core research activities are (http://www.embl.org/):

a. EMBL Heidelberg (Germany; http://www.embl.de/)

b. EMBL Grenoble (France; http://www.embl.fr/index.php) - Structural Biology

c. EMBL-European Bioinformatics Institute (EBI), Hinxton (UK; http://www.ebi.ac.uk/) - Bioinformatics

d. EMBL Hamburg (Germany; http://www.embl-hamburg.de/index.php) - Structural Biology

e. EMBL Monterotondo (Italy; http://www.embl.it/index.php) - Mouse Biology

The broad goals of EMBL are:

a. Basic research in molecular biology

b. Training of manpower, i.e. students, scientists and visitors

c. Develop new tools, technologies and methods

d. Offer service to the research community

e. Transfer technology to industry for commercialization


In the following sections we limit ourselves to the bioinformatics research, databases and
facilities of EMBL located at EMBL-EBI.

European Bioinformatics Institute (EBI)

The European Bioinformatics Institute (EBI-EMBL) has its origins in the EMBL Nucleotide
Sequence Data Library, established in 1980 at Heidelberg. This was in fact the world's first
public nucleotide sequence database, created with the objective of building a database of
published nucleotide sequences, and it preceded NCBI by eight years (NCBI was established
in 1988). Subsequently, in 1992, EMBL decided to establish EBI as a dedicated research and
analysis facility at the Wellcome Trust Genome Campus, Hinxton (UK), close to the Sanger
sequencing centre.

At present, EBI-EMBL houses databases and provides services and analysis tools for all major
research disciplines requiring computational support. EBI is also a partner in the
International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org),
which coordinates public-domain nucleotide sequence information together with GenBank at
NCBI (www.ncbi.nlm.nih.gov) and the DNA Data Bank of Japan (DDBJ; www.ddbj.nig.ac.jp).

The following are the broad categories of databases at EBI-EMBL:

a. Biological Ontologies

b. Literature

c. Functional Genomics or microarray

d. Nucleotides

e. Pathways and Networks

f. Protein

g. Proteomics

h. Small Molecules

i. Structure


Figure: Web portals of EMBL and EBI

Source: http://www.embl.org/ , http://www.ebi.ac.uk/

Databases at EBI
The following section will deal with selected databases of EBI-EMBL:

Nucleotide databases

a. European Nucleotide Archive (ENA): ENA receives nucleotide data from a variety of
sources, including small-scale sequencing studies, sequencing centres and the INSDC partners
(i.e. GenBank and DDBJ). To manage these sequence resources better, ENA has been divided
into several sub-databases, such as:

ENA-Genome - for genome sequencing data

Sequence Read Archive (SRA) - for Next Generation Sequencing (NGS) data

EMBL-Bank - for assembled and annotated sequence data (note that nucleotide data should
be submitted to only one of GenBank, EBI or DDBJ, not to all three, because data submitted
to any one of these databases is automatically exchanged with the other two)


Figure: ENA database with information about sequences

Source: http://www.ebi.ac.uk/ena/


b. DGVa: The Database of Genomic Variants archive (DGVa) is a publicly accessible database
that stores information about genomic structural variants, including variants that play a
role in causing disease. Such variants range in size from a few nucleotides to several
kilobases or even megabases, and may take the form of:

structural rearrangements, i.e. insertions, deletions and translocations, and

copy number variants (CNV)

The DGVa is analogous to the dbVar database of NCBI. The data at DGVa can be accessed
via the Ensembl (www.ensembl.org) portal.


Figure : DGVa database accessed through ENSEMBL portal

Source: www.ensembl.org


c. EGA: The European Genome-phenome Archive (EGA) stores data from studies that aim to
understand the links between genotype and phenotype, especially in biomedical research.
This database is analogous to the dbGaP database at NCBI. Such data may have been
generated, for example, by genome-wide association studies (GWAS). Because the studies
and datasets generally deal with disorders such as cancer, coronary artery disease,
hypertension, rheumatoid arthritis and diabetes, and contain information about the patients
and subjects taking part in the study, strict control over submission and public access is
implemented on ethical grounds to prevent misuse of the data.

Figure : The European Genome-phenome Archive (EGA)

Source: https://www.ebi.ac.uk/ega/


A Data Access Committee (DAC) decides, on a case-by-case basis, whether users may access an
EGA dataset, and requires users to sign an agreement before downloading and subsequently
using the dataset.

d. ENA-Genome: This database contains completed genome sequence data from a variety of
organisms and genetic elements, such as:

Archaea and archaeal viruses

Bacteria

Eukaryotes

Organelles

Phages

Plasmids

Viroids

EMBL-EBI developed the Ensembl Genomes tool to browse, analyse and visualize genome
sequencing data. Currently, close to 350 completed genome sequences are available for
browsing, analysis and downloading. The sequence analysis tools at the Ensembl Genomes
server support analysis at all levels of genome organization, such as whole genome,
chromosome, genome segment, gene and transcript. The genome visualization and analysis
tool at Ensembl Genomes also provides links to molecular function, gene ontology, protein
summary and structure tables.


Figure : Genome browser at ENSEMBL provides genome visualization and analysis tool
at various levels of genome organization

Source: http://plants.ensembl.org/index.html


e. Several other databases, such as the Immuno Polymorphism Database (IPD) resources (e.g.
IMGT/HLA, IMGT/LIGM, IPD-MHC, IPD-KIR), Metagenomics and Patent data resources, are also
part of the nucleotide resources at EBI-EMBL.

IMGT/HLA database is the nucleotide sequence database for the human major
histocompatibility complex (HLA). This database is part of the International
ImmunoGenetics Project (IMGT), and its allele counts are summarized in the following five
categories (http://www.ebi.ac.uk/ipd/imgt/hla/stats.html):

HLA Class I alleles (6725)

HLA Class II alleles (1771)

Total HLA alleles (8496, i.e. Class I + Class II)

Other non-HLA alleles (148)

Confidential alleles (8)

Alignment tools built into the database allow users to perform analyses and detect
polymorphism at HLA loci.

IMGT/LIGM is, similarly, a database for immunoglobulins and T-cell receptors

IPD-MHC contains major histocompatibility complex (MHC) sequences for a large number
of species

IPD-HPA is the database for human platelet antigens

IPD-KIR is the database for human Killer-cell Immunoglobulin-like Receptors and contains
information about 614 alleles (http://www.ebi.ac.uk/ipd/kir/stats.html)

EBI Metagenomics contains sequence information from microbial samples collected from
various environments. Examples include core gut microflora, aquatic microflora from
Antarctica, glaciers, ocean samples, meat samples and so on. The metagenome sequences are
analyzed to reveal the frequency of predicted CDS (coding DNA sequences), their GO (Gene
Ontology) annotation, and putative proteins with their biochemical, cellular and molecular
functions.
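Predicting CDS in metagenomic reads usually begins with finding open reading frames (ORFs). The sketch below is a deliberately simplified, hypothetical illustration, not the EBI Metagenomics pipeline (which relies on dedicated gene-prediction software): it scans the three forward reading frames for an ATG start codon followed by an in-frame stop codon and counts the resulting ORFs. The function name `find_orfs` and the `min_codons` cut-off are assumptions for illustration; a real analysis would also scan the reverse complement and use a much longer minimum length.

```python
"""Toy forward-strand ORF finder (hypothetical helper, for illustration only)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons from ATG up to (but not including) the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon is found.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((frame, i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop codon: the ORF runs off the end of the read
            i += 3
    return orfs


read = "ATGAAATAGCCATGTTTTGA"  # made-up read
orfs = find_orfs(read)
print(orfs)       # [(0, 0, 9), (2, 11, 20)]
print(len(orfs))  # 2 predicted ORFs in this read
```

Counting such predictions over all reads of a sample gives the kind of CDS frequency described above, although the real pipeline's numbers come from its own gene caller.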

Figure : The IPD and Metagenome web portal at EBI-EMBL

Source: https://www.ebi.ac.uk/metagenomics/


Figure : Analysis of sequence data at EBI Metagenomics to reveal the genetic makeup of the sample

Source: https://www.ebi.ac.uk/metagenomics/

Functional Genomics database:

ArrayExpress: This database contains functional genomics data, including microarray data
that are MIAME compliant (see the chapter on NCBI for MIAME information). As of December
2012, the database contained datasets from 34145 experiments. ArrayExpress also contains a
Gene Expression Atlas with datasets from 3558 experiments, covering 99484 assays performed
under 20806 conditions. Both the ArrayExpress experiments archive and the Gene Expression
Atlas provide a searchable browser through which users can access the data.


Figure : ArrayExpress portal for functional genomics data

Source: https://www.ebi.ac.uk/arrayexpress/

ArrayExpress not only accepts microarray data, but is also a central repository for sequence
data generated as part of functional genomics studies. For example, high-throughput
sequencing data from non-human experiments, and from human experiments in which the
subjects cannot be identified, such as RNA sequencing (RNA-seq) and ChIP-seq, are also
deposited at ArrayExpress. If the data come from human subjects who are potentially
identifiable, they should instead be submitted to the EGA (European Genome-phenome Archive)
database (http://www.ebi.ac.uk/fg/doc/help/UHTS_submissions.html). The sequence data
submitted to ArrayExpress are eventually transferred to the ENA (European Nucleotide
Archive), and the metadata from EGA can also be accessed via ArrayExpress.

Protein Database:

There are several databases at EBI catering to the various bioinformatics needs of protein
analysis.

InterPro is the primary integrated database for protein families, domains and functional
sites. InterPro collates information from various other resources to prepare and catalogue
information about protein structure, domains, function and signatures, and allows users to
search and perform analyses. InterProScan is the software that allows users to analyse their
protein sequences against the InterPro database.

UniProt is the centralized universal database for protein sequences. Each protein sequence
is annotated with functional details, the amino acid sequence, a taxonomic description and
other information from related disciplines. UniProt has several sub-databases, such as:

UniProt Archive (UniParc): contains non-redundant protein sequences from all public
databases, such as EMBL-WGS, WormBase, USPTO, EPTO, FlyBase, PIR, etc.

UniProt Reference Clusters (UniRef): clusters of similar protein sequences, providing
representative subsets of UniProt sequences

UniProtKB/Swiss-Prot: the manually annotated and reviewed section of the UniProt
Knowledgebase (UniProtKB)

UniProt Metagenomic and Environmental Sequences (UniMES): protein database for
environmental and metagenomic samples
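A UniProt entry can also be retrieved programmatically as FASTA text and parsed. The sketch below assumes the UniProt REST URL pattern `https://rest.uniprot.org/uniprotkb/{accession}.fasta` (check the current UniProt API documentation before relying on it); `parse_fasta` and `fetch_uniprot_fasta` are hypothetical helper names. Only the parser runs offline; the demo record is made up and is not a real UniProt entry.

```python
"""Fetch a UniProtKB entry in FASTA format and parse it (illustrative sketch)."""
import urllib.request

# Assumed UniProt REST endpoint pattern; verify against the UniProt API docs.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_uniprot_fasta(accession):
    """Download one entry and parse it (requires network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline usage with a made-up record:
demo = ">sp|X00000|DEMO_TOY Hypothetical toy protein\nMKVLAT\nGHEWRS\n"
print(parse_fasta(demo))
# {'sp|X00000|DEMO_TOY Hypothetical toy protein': 'MKVLATGHEWRS'}
```

With network access, `fetch_uniprot_fasta("P69905")` would, under the assumed URL pattern, return the record for human haemoglobin subunit alpha.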


The database PANDIT (Protein and Associated Nucleotide Domains with Inferred Trees) is no
longer updated but is still available for browsing. It contains a collection of protein domain
families, each provided as a multiple sequence alignment together with an inferred
phylogenetic tree.

Figure: UniProt database provides a collection of non-redundant protein sequences, whereas
PANDIT is a pre-aligned database of proteins based on their domains and covers a wide
phylogenetic range.

Source: http://www.uniprot.org/ , http://www.ebi.ac.uk/goldman-srv/pandit/

The CSA (Catalytic Site Atlas) contains a collection of catalytic sites for proteins and
enzymes from experimental and structural data. CSA contains information for nearly 29,000
entries.

Figure : Catalytic site atlas compiles information about catalytic sites of various enzymes
based on experimental and structural data

Source: http://www.ebi.ac.uk/thornton-srv/databases/CSA/


Structure databases:

Structural information for a variety of molecules, including proteins and other
macromolecules, is grouped under the Structure databases.

a. DSSP database stores nearly 84000 secondary structure assignments for Protein
Data Bank (PDB) entries (http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+DSSP)
b. FSSP (Families of Structurally Similar Proteins) stores structure-based alignments for
PDB entries
c. HSSP (Homology-derived Secondary Structure of Proteins) is a database that merges 2-D
and 3-D (structural) information with 1-D (sequence) information
d. PDBe (Protein Data Bank in Europe) is a database of experimentally derived structures
of proteins and other biomolecules. The structure files can be viewed using free viewers
such as RasMol (rasmol.org/)
e. EMDB (Electron Microscopy Data Bank) stores electron microscopy structural information
for macromolecular complexes and subcellular structures. It currently stores nearly 1570
structures (http://www.ebi.ac.uk/pdbe/emdb/)
f. PDBeNMR stores NMR-derived structures in PDBe
g. PDBeChem is the central repository for chemical and ligand molecules found in PDBe and
currently has over 15000 ligand structures (http://www.ebi.ac.uk/pdbe-srv/pdbechem/)
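DSSP assigns each residue one of eight secondary-structure codes (H alpha-helix, G 3-10 helix, I pi-helix, E strand, B isolated bridge, T turn, S bend, blank for coil). For many analyses these are collapsed into three states. The sketch below uses one common reduction convention (H, G, I to helix; E, B to strand; everything else to coil); other conventions exist, and the helper names and the example string are hypothetical.

```python
"""Collapse DSSP 8-state secondary-structure codes into 3 states (sketch)."""

# One common reduction convention (others map B or G to coil instead):
# helices H, G, I -> H; strands E, B -> E; everything else -> C (coil).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states):
    """Map a string of DSSP codes to H/E/C; T, S, blank and others become coil."""
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in states)


def composition(three_state):
    """Fraction of residues in helix, strand and coil."""
    n = len(three_state)
    return {state: three_state.count(state) / n for state in "HEC"}


dssp_string = "HHGIEBTS "  # made-up example, one DSSP code per residue
three = reduce_dssp(dssp_string)
print(three)               # HHHHEECCC
print(composition(three))
```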

Figure: A few examples of structures from the EMDB and PDBe databases at EBI-EMBL

Source: http://www.ebi.ac.uk/pdbe/emdb/ , http://www.ebi.ac.uk/pdbe/


Sequence Analysis
Sequences can be analysed at EBI for several purposes, including pairwise alignment,
multiple sequence alignment and homology searching.

Pairwise alignment:

Global alignment of nucleotide or protein sequences can be performed with the
Needleman-Wunsch algorithm via the EMBOSS Needle web server
(http://www.ebi.ac.uk/Tools/psa/emboss_needle/). The two sequences to be aligned are
either pasted into the input window or uploaded as FASTA files, and the job is then
submitted.

Local alignment of two sequences can be performed via EMBOSS Water, which implements
the Smith-Waterman algorithm (http://www.ebi.ac.uk/Tools/psa/emboss_water/)

GeneWise compares a protein sequence with a DNA sequence
(http://www.ebi.ac.uk/Tools/psa/genewise/)

PromoterWise compares two DNA sequences that may have undergone inversions and
translocations, and is therefore considered useful for detecting cis elements such as
promoters (http://www.ebi.ac.uk/Tools/psa/promoterwise/)
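The dynamic-programming idea behind EMBOSS Needle (global, Needleman-Wunsch) and EMBOSS Water (local, Smith-Waterman) can be sketched in a few lines. This is a simplified teaching version with a match/mismatch score and a linear gap penalty; the function name `align` and its default scores are assumptions. The EMBOSS tools themselves use substitution matrices (e.g. EBLOSUM62 for proteins, EDNAFULL for DNA) and affine gap penalties (separate gap-open and gap-extend costs), so their scores will differ.

```python
"""Needleman-Wunsch (global) and Smith-Waterman (local) alignment, simplified sketch."""


def align(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return (score, aligned_a, aligned_b) using a linear gap penalty."""
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, n + 1):
            S[i][0] = i * gap
        for j in range(1, m + 1):
            S[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                        S[i - 1][j] + gap,                          # gap in b
                        S[i][j - 1] + gap)                          # gap in a
            if local:
                score = max(0, score)  # Smith-Waterman: never below zero
                if score > best:
                    best, best_pos = score, (i, j)
            S[i][j] = score

    # Traceback: global starts at the bottom-right cell, local at the best cell.
    i, j = best_pos if local else (n, m)
    top, bottom = [], []
    while i > 0 or j > 0:
        if local and S[i][j] == 0:
            break  # a local alignment ends where the score drops to zero
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    score = best if local else S[n][m]
    return score, "".join(reversed(top)), "".join(reversed(bottom))


print(align("ACGT", "AGT"))                       # (1, 'ACGT', 'A-GT')
print(align("TTACGTTT", "GGACGTGG", local=True))  # (4, 'ACGT', 'ACGT')
```

The only differences between the two modes are the zero floor on scores and where the traceback starts and stops, which is why global alignment reports the whole sequences while local alignment reports just the best-matching segment.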


Figure: Pairwise global alignment using EMBOSS Needle, and DNA-protein alignment using
GeneWise

Source: http://www.ebi.ac.uk/Tools/psa/

Multiple sequence alignment:


Multiple sequence alignment attempts to achieve the following objectives:

a. Identify conserved regions, which are slowly evolving

b. Identify diverged regions, which are rapidly evolving

c. Identify functional regions such as domains, motifs and invariant residues

d. Perform molecular phylogeny
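Objectives a to c amount to asking, column by column, how uniform an alignment is. The sketch below scores each column as the fraction of sequences sharing its most frequent residue (gaps count against conservation) and reports fully conserved columns, which Clustal output marks with "*". This scoring rule and the helper names are simplifying assumptions; published tools often use entropy- or matrix-based conservation scores instead.

```python
"""Per-column conservation in a multiple sequence alignment (illustrative sketch)."""
from collections import Counter


def column_conservation(msa):
    """Fraction of sequences sharing the most frequent residue in each column.

    Gaps ('-') are not counted as residues, so gapped columns score lower.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


def conserved_positions(msa, threshold=1.0):
    """0-based columns whose conservation is at least `threshold`."""
    return [i for i, s in enumerate(column_conservation(msa)) if s >= threshold]


msa = ["ACGT-", "ACGTA", "AGGTC"]  # toy alignment
print(column_conservation(msa))    # [1.0, 0.6666666666666666, 1.0, 1.0, 0.3333333333333333]
print(conserved_positions(msa))    # [0, 2, 3]
```

Invariant columns (score 1.0) are candidates for slowly evolving or functional residues, while low-scoring columns point to diverged, rapidly evolving regions.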

ClustalW: ClustalW is a progressive method of multiple sequence alignment and is available
as a web-based service at http://www.ebi.ac.uk/Tools/msa/clustalw2/
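Progressive methods such as ClustalW first compute distances between all pairs of sequences, build a guide tree from them, and then align the sequences in the order given by the tree. The sketch below illustrates only the first two steps, under simplifying assumptions: distances come from shared k-mers (in the spirit of ClustalW's fast k-tuple option), and the tree is built with UPGMA, whereas ClustalW builds its guide tree by neighbour-joining. All names (`kmer_distance`, `upgma_guide_tree`, the `#0`-style internal labels) are hypothetical.

```python
"""Toy guide tree for progressive alignment: k-mer distances + UPGMA (sketch)."""
from itertools import combinations


def kmer_set(seq, k=2):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """1 - (shared k-mers / k-mers in the smaller set); 0.0 = same k-mer content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1.0 - len(ka & kb) / max(1, min(len(ka), len(kb)))


def pair_key(x, y):
    """Distances are stored under an order-independent (sorted) key."""
    return tuple(sorted((x, y)))


def upgma_guide_tree(seqs, k=2):
    """Build a guide tree (nested tuples of names) from a {name: sequence} dict."""
    clusters = {name: (name, 1) for name in seqs}  # label -> (subtree, n_leaves)
    dist = {pair_key(a, b): kmer_distance(seqs[a], seqs[b], k)
            for a, b in combinations(seqs, 2)}
    counter = 0
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # merge the closest pair of clusters
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del dist[(a, b)]
        new = f"#{counter}"  # internal-node label
        counter += 1
        for c in clusters:
            # UPGMA: distance to the merged cluster is the size-weighted average.
            d_ac, d_bc = dist.pop(pair_key(a, c)), dist.pop(pair_key(b, c))
            dist[pair_key(c, new)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC",
        "s3": "TTTTGGGGCC", "s4": "TTTTGGGGCA"}  # toy sequences
print(upgma_guide_tree(seqs))  # (('s1', 's2'), ('s3', 's4'))
```

In a full progressive aligner, the most similar pairs at the tips of this tree would be aligned first and the resulting profiles merged towards the root.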


Figure : A multiple sequence alignment output obtained with progressive alignment tool
Clustal

Source: http://www.ebi.ac.uk/Tools/msa/clustalw2/


MAFFT (Multiple Alignment using Fast Fourier Transform) is considered a fast multiple
sequence alignment program and is available at http://www.ebi.ac.uk/Tools/msa/mafft/.

The third multiple sequence alignment tool at EBI is MUSCLE (Multiple Sequence
Comparison by Log-Expectation), which was reported to be both faster and more accurate
than other MSA tools available at the time (Edgar 2004).


Figure : A MAFFT output for multiple sequence alignment

Source: http://www.ebi.ac.uk/Tools/msa/mafft/

Homology Searches: The local alignment tool BLAST can be used to search databases for
sequences that are homologous or similar to a query sequence.

a. Open the NCBI BLAST service at EBI (http://www.ebi.ac.uk/Tools/sss/ncbiblast/), from
which BLASTN and BLASTP can be launched.

b. Choose the program, i.e. BLASTN or BLASTP.

c. Enter an appropriate query into the input box (a nucleotide sequence for BLASTN, a
protein sequence for BLASTP).

d. Select the appropriate database to be searched.

e. Start the homology search by pressing the Submit button.
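The same search can be scripted instead of using the web form. The sketch below assumes EBI's Job Dispatcher REST interface for NCBI BLAST at `https://www.ebi.ac.uk/Tools/services/rest/ncbiblast`, with `run`, `status` and `result` endpoints and the form parameters `email`, `program`, `stype`, `database` and `sequence`. The default database name `uniprotkb_swissprot`, the job status strings and the `out` result type are assumptions that should be checked against the service's own parameter listing and documentation. Only the request-building step runs offline.

```python
"""Submit a BLAST job to EBI by REST instead of the web form (sketch; see assumptions)."""
import time
import urllib.parse
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/ncbiblast"  # assumed endpoint


def build_blast_request(sequence, email, program="blastp", database="uniprotkb_swissprot"):
    """Steps b-d of the web form: choose program, enter query, choose database."""
    # Query sequence type: nucleotide for blastn/blastx/tblastx, protein otherwise.
    stype = "dna" if program in {"blastn", "blastx", "tblastx"} else "protein"
    form = {"email": email, "program": program, "stype": stype,
            "database": database, "sequence": sequence}
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(f"{BASE_URL}/run", data=data, method="POST")


def run_blast(request, poll_seconds=10):
    """Step e: submit, wait for the job, return the text report (network needed)."""
    with urllib.request.urlopen(request) as resp:
        job_id = resp.read().decode().strip()
    while True:
        with urllib.request.urlopen(f"{BASE_URL}/status/{job_id}") as resp:
            status = resp.read().decode().strip()
        if status not in {"RUNNING", "QUEUED"}:  # assumed status names
            break
        time.sleep(poll_seconds)  # poll politely
    if status != "FINISHED":
        raise RuntimeError(f"BLAST job {job_id} ended with status {status}")
    with urllib.request.urlopen(f"{BASE_URL}/result/{job_id}/out") as resp:
        return resp.read().decode()


req = build_blast_request("MKVLAT", "someone@example.org")  # toy protein query
print(req.get_method(), req.full_url)
```

Keeping request construction separate from submission makes it easy to inspect exactly which parameters would be sent before any job is queued.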

The results of BLASTN include

i. a summary overview of all the subjects that have matches to the query with
ranking based on the percent similarity, query coverage and e-value,

ii. a pairwise alignment between the query and the subject, and

iii. a graphical overview
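The e-value used to rank hits comes from the Karlin-Altschul statistics on which BLAST relies: for a raw alignment score S between a query of length m and a database of total length n, E = K·m·n·e^(-λS), and the equivalent bit score S' = (λS - ln K)/ln 2 gives E = m·n·2^(-S'). The sketch below simply evaluates these formulas. The λ and K values are illustrative (close to values often quoted for BLOSUM62 with gap costs 11/1), the lengths are hypothetical, and real BLAST also corrects m and n for edge effects, so its reported numbers will differ.

```python
"""Karlin-Altschul e-value and bit score (illustrative parameter values)."""
import math

LAMBDA, K = 0.267, 0.041  # illustrative values, not taken from a specific BLAST run


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalised score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score, m, n, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring >= raw_score: E = K m n exp(-lambda S)."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits, m, n):
    """Same quantity expressed via the bit score: E = m n 2^(-S')."""
    return m * n * 2 ** (-bits)


m, n = 300, 50_000_000  # query length, total database length (hypothetical)
for s in (40, 60, 80):  # a higher raw score gives a smaller e-value
    print(s, round(bit_score(s), 1), f"{evalue(s, m, n):.2e}")
```

Because E grows linearly with database size, the same alignment score is less significant when searched against a larger database.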

The results of BLASTP are much richer and include

i. a summary overview of all the subjects that have matches to the query with
ranking based on the percent similarity, query coverage and e-value,

ii. a pairwise alignment between the query and the subject, and

iii. a graphical overview

iv. Links to the gene expression database (ArrayExpress) for the expression profile of the
transcripts (subject)

v. Genome view with location of the subject

vi. Protein family information

vii. Gene Ontology Information i.e. functional classification
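Two of the ranking measures listed above can be computed directly from an alignment. The hypothetical helper below takes the aligned query and subject strings of one hit (gaps written as '-') and returns the percent identity over the alignment length and the percentage of the query covered by the alignment. Note that "similarity" as reported by BLAST also counts conservative substitutions scored positively by the matrix, which this sketch does not do, and BLAST's exact denominators may differ.

```python
"""Percent identity and query coverage for one BLAST-style hit (sketch)."""


def hit_statistics(query_aln, subject_aln, query_length):
    """Return (percent identity over alignment columns, percent of query covered)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    percent_identity = 100.0 * identical / len(query_aln)
    aligned_query_residues = sum(1 for q in query_aln if q != "-")
    query_coverage = 100.0 * aligned_query_residues / query_length
    return percent_identity, query_coverage


# A 6-column alignment covering 5 residues of a 10-residue query (toy data)
pid, cov = hit_statistics("ACGT-A", "ACCTGA", query_length=10)
print(f"identity {pid:.1f}%, coverage {cov:.1f}%")  # identity 66.7%, coverage 50.0%
```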


Figure: Snapshots of the various output windows of BLASTN at EBI-EMBL


Figure: Snapshots of BLASTP result output window-1

Source: http://www.ebi.ac.uk/Tools/sss/ncbiblast/nucleotide.html


Figure : Snapshots of BLASTP result output windows-2 showing links to genome view,
protein families and classification at UniProt and Gene Ontology (functional classification)

Source: http://www.ebi.ac.uk/Tools/sss/ncbiblast/


Exercises
1. What do EMBL and EBI stand for?

2. What are the objectives of EBI/EMBL?

3. EMBL and EBI were established in the years ------ and ------- respectively.

4. What are the various institutes and core research areas of EMBL?

5. NCBI, EMBL and DDBJ are coordinated by ------------------ for public domain nucleotide
sequence information.

6. What are the categories of databases at EBI?

7. What is ENA? What are the various sub-databases of ENA?

8. Which database stores genomic structural variation information? What is the comparable
database at NCBI?

9. Genotype-phenotype relationship data can be found in ------------- database.

10. Which tool developed by EBI can be used to browse and visualize genome sequencing
data? List the attributes of the tool.

11. Where would you find datasets related to Immunologically relevant biomolecules?

12. What is metagenomics? Which database contains the metagenome data? With the help
of flowchart list the steps taken to retrieve and browse metagenome data.

13. What are the various databases for protein and their associated analysis tools at EBI?

14. Which database would you explore to retrieve and analyse gene expression datasets?

15. How do the two databases ArrayExpress and EGA differ from each other?

16. What are the various databases for protein at EBI?

17. How would you retrieve a protein sequence from EBI?

18. List the features associated with PANDIT and CSA databases.

19. Determination and prediction of protein structure is an important feature for proteomic
research. Which databases in EBI cater to such information?


20. Comment on some of the modules and features associated with EMBOSS.

21. How would you perform Pairwise comparison of DNA using BLASTN?

22. How would you perform BLASTP at EBI? Comment on the features of the results obtained
from BLASTP.

23. What are the various multiple sequence alignment tools at EBI?


24. Expand the following:
a. BLASTN
b. DGVa
c. ENA
d. NGS
e. SRA
f. GWAS
g. GEO
h. CSA
i. HSSP
j. FSSP

25. Prepare a list of comparable databases between NCBI and EBI.

Glossary

a. EMBOSS: European Molecular Biology Open Software Suite, a collection of free,
open-source software developed by EMBnet for bioinformatics analysis
b. EST: Expressed Sequence Tags are generated by single-pass sequencing of the 5' and 3'
ends of cDNA clones from cDNA libraries; they are a rapid and inexpensive way to obtain a
snapshot of the transcript profile and to generate sequence data
c. Metagenomics: The study of genetic material recovered directly from environmental
samples, without first isolating or purifying the source organisms from the milieu, mixture
or consortium of organisms. Environmental samples may include those from soil (soil
microflora), oceans, water bodies, gut, skin, etc.
d. MHC: Major histocompatibility complex; cell-surface immunogenic antigens found on
white blood cells (WBC) and on other cells of the body


e. MIAME: Minimum Information About a Microarray Experiment, an internationally accepted
standard for reporting microarray experiments. The MIAME guideline requires researchers to
record and submit the experimental design, sample annotation, raw data and processed data.
Further details can be retrieved at http://www.mged.org/Workgroups/MIAME/miame_2.0.html
or at http://www.ncbi.nlm.nih.gov/geo/info/MIAME.html
f. Next Generation Sequencing (NGS): All new sequencing technologies that do not depend on
Sanger's chain-termination method are broadly grouped under NGS. Some of these include
reversible chain termination reactions, single-molecule sequencing and ligation-based
sequencing.
g. RNA-Seq: A technology that allows researchers to sequence RNA, or the transcriptome,
using NGS-based deep sequencing (Wang et al. 2009).

Summary
The European Molecular Biology Laboratory (EMBL) was established in 1974, and a need was
felt for it to serve as a parallel institute for hosting and developing computational data and
tools. Five institutes meet the overall goals and objectives of EMBL, with the European
Bioinformatics Institute (EBI) dedicated to bioinformatics research. A collaborative effort
under the International Nucleotide Sequence Database Collaboration (INSDC) allows
EMBL-EBI to share data and resources with NCBI (U.S.A.) and DDBJ (Japan). At EMBL-EBI,
the databases are divided into Literature, Ontology, Functional Genomics, Nucleotides,
Pathways, Proteins, Proteomics, Small Molecules and Structures. The nucleotide databases
are further subdivided into ENA-Genome (for genomic sequence), SRA (for NGS data), DGVa
(for genomic variant data) and EGA (for genome-phenome data). The functional genomics
databases are exemplified by ArrayExpress (microarray data); protein databases include
InterPro, UniProt, PANDIT and CSA; and structure databases include DSSP, FSSP, PDBe,
EMDB, etc. All these databases are provided with links to analysis tools. For example,
EMBOSS Needle, EMBOSS Water and PromoterWise are tools for pairwise sequence analysis,
while several tools such as ClustalW, MUSCLE and MAFFT are available for multiple sequence
alignment. EMBL thus provides a complete suite of databases and tools for a wide range of
sequence, structural and functional analyses.

References
Works Cited


1. Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment
search tool. J. Mol. Biol. 215: 403-410.
2. Dayhoff MO, Schwartz RM and Orcutt BC (1978) A model of evolutionary change in
proteins. In: Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, MO Dayhoff (ed.),
pp. 345-352.
3. Henikoff S and Henikoff JG (1992) Amino acid substitution matrices from protein blocks.
Proc. Natl. Acad. Sci. USA 89(22): 10915-10919.
4. Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and
space complexity. BMC Bioinformatics 5: 113. doi:10.1186/1471-2105-5-113
5. Katoh K, Misawa K, Kuma K and Miyata T (2002) MAFFT: a novel method for rapid multiple
sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30: 3059-3066.
6. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H,
Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ and Higgins DG (2007)
Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948.
7. Magrane M and UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated
protein data. Database (Oxford) 2011: bar009.
8. Hunter S et al. (2012) InterPro in 2011: new developments in the family and domain
prediction database. Nucleic Acids Res. 40(D1): D306-D312.

Suggested Readings
1. Pevsner J (2009) Bioinformatics and Functional Genomics, 2nd Edition. Wiley-Blackwell.
2. Wang Z, Gerstein M and Snyder M (2009) RNA-Seq: a revolutionary tool for
transcriptomics. Nature Reviews Genetics 10(1): 57-63.

Web Links
1. http://www.embl.org/
2. www.ebi.ac.uk
3. http://www.insdc.org
4. www.ensembl.org
5. http://www.ebi.ac.uk/fg/doc/help/UHTS_submissions.html


6. http://www.ebi.ac.uk/pdbe/emdb/
7. http://www.ebi.ac.uk/Tools/psa/emboss_needle/ , http://www.ebi.ac.uk/Tools/psa/emboss_water/
8. http://www.ebi.ac.uk/Tools/psa/genewise
9. http://www.ebi.ac.uk/Tools/psa/promoterwise
10. http://www.ebi.ac.uk/Tools/msa/clustalw2/
11. http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+DSSP
12. http://www.ebi.ac.uk/Tools/sss/ncbiblast/
