ISBN : 978-1-63315-205-2
Short Views on Insect Biochemistry and Molecular Biology Vol.(2), October 2014
2014
Section VIII
Insect Bioinformatics
Invited Review
Chapter 28
Overview
1. Introduction
2. Databases
3. Insect Specific Databases
4. Protein Databases
5. Methods in Bioinformatics
5.1. Sequence Analysis
i) B. mori Cytochrome Oxidase Gene
ii) B. mori Lysozyme
5.2. Computational proteomics
i) Electrophoresis analysis
ii) Mass Spectrometry
5.3. Transcriptome analysis
5.4. Phylogenetic Analysis
5.5. Structural Bioinformatics
i) Drosophila melanogaster sex peptide receptor (SPR)
5.6. Molecular Docking
i) Spodoptera litura EcR
6. Conclusion
7. References

1. Introduction
Massive growth in information, driven by experimental and technological advances, has created an absolute requirement for computerized databases to store, organize and index the data, and for specialized tools to view and analyze them. Bioinformatics is the discipline in which complex biological problems are solved with the speed and accuracy of computers. Over the years, bioinformatics has evolved from an application into a fully fledged discipline without which biological research would find itself in dire straits (13). It has applications in almost every corner of the life sciences: crop protection, evolutionary studies, insect-plant interactions, functional and structural genomics, proteomics, drug discovery and development, database development, pest management, agriculture, software development, etc. (14, 15, 16). This chapter therefore describes some of the key concepts and tools of entomo-informatics, together with opportunities for new development and improvement. The entomological databases most relevant to this material are accessible to anyone with a connection to the World Wide Web (WWW) and suitable Internet browser software; their URL addresses are given in Table 1. The first section describes the different types of databases (genome, protein and insect-specific databases). The second section focuses on how online resources can be used in entomology and describes bioinformatics methods in detail with suitable examples (sequence analysis, computational proteomics, phylogenetic analysis, transcriptome analysis, etc.). The third section addresses structural bioinformatics combined with molecular docking, using the Spodoptera litura ecdysone receptor (Sl-EcR) protein as an example.
2. Databases
Insects constitute the largest and most diverse animal group in the world: about 75% of all species are insects (17). Insects are ecologically and economically important, ranging from highly beneficial species to harmful pests. Harmful insects cause or transmit severe diseases and damage crops, leading to agricultural calamities, while crop pollination, crop protection and the production of silk and honey are among the benefits this diverse group provides to mankind. The enormous growth in DNA sequencing has made whole-genome sequences, expressed sequence tags (ESTs) and genetic linkage maps available, and together with insect transgenesis this has opened up new vistas for fundamental research in entomology (18). In the following section we look at the database resources in bioinformatics in general and at those meant exclusively for insects.
Insect genomic databases contain information on the proteins and the biochemical and physiological processes of an insect. The major common platforms for storing biological data are NCBI (19), DDBJ (20) and EMBL (21); these databases store data under various subdisciplines. The Entrez platform at NCBI is a versatile biological search engine that helps to retrieve the required content from NCBI. With so much research activity across the globe, and to avoid redundancy of data among the above-mentioned genome databases, the International Nucleotide Sequence Database Collaboration (INSDC) was formed (22).
Printed in the United States of America, 2014
Under the INSDC agreement, these databases share the data available to them on a regular basis. This ensures public access to all available biological data and, most importantly, eliminates data redundancy, helping the scientific community. The INSDC has a uniform policy of free and unrestricted access to all of the data records contained in its databases. Scientists worldwide can access these records to plan experiments or to publish any analysis or critique.
Fig. 1. Growth of GenBank and WGS over the years: (a) base-wise data; (b) sequence-wise data. From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months.
NCBI provides access to the whole genomes of over 3,200 organisms, covering both completely sequenced organisms and those whose sequencing is in progress. NCBI's sequence databases accept genome data from sequencing projects around the world and serve as the cornerstone of bioinformatics research. Under the nucleotide category it hosts the GenBank, EST, GSS, HomoloGene, HTG, SNP, RefSeq, STS, UniSTS and UniGene databases, each with its own distinct scope. For example, GenBank Release 3 (December 1982) held a mere 606 sequences comprising 680,338 bases; this grew steeply and exponentially to 164,136,731 sequences comprising 151,178,979,155 bases by Release 195 (April 2013) (Fig. 1). NCBI now stores over 80 million sequences (19).
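As a rough consistency check on the doubling time quoted in the Fig. 1 caption, the two GenBank releases mentioned above can be used to estimate how often the base count doubled. This sketch assumes smooth exponential growth between December 1982 and April 2013 (364 months), which real database growth only approximates; only the two base counts come from the text.

```python
import math

# Base counts quoted in the text for two GenBank releases.
bases_1982 = 680_338            # Release 3, December 1982
bases_2013 = 151_178_979_155    # Release 195, April 2013
months_elapsed = (2013 - 1982) * 12 + (4 - 12)   # December 1982 -> April 2013


def doubling_time_months(start, end, months):
    """Doubling time, assuming smooth exponential growth between two counts."""
    doublings = math.log2(end / start)
    return months / doublings


t = doubling_time_months(bases_1982, bases_2013, months_elapsed)
print(f"{math.log2(bases_2013 / bases_1982):.2f} doublings, one every {t:.1f} months")
```

With these figures the script prints about 17.76 doublings, one roughly every 20.5 months, which is of the same order as the approximate 18 months given in the caption.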
With so much data stored, how much of it belongs to the insects, the most diverse group? A search with the keyword "insecta" in the Entrez search engine returned a significant amount of data (Fig. 2). Data categorized under the genome tab show 4,772,610 records in the core subset of nucleotide sequences. At the time of writing, 285 whole-genome sequences of insects and/or their pathogens had been deposited in GenBank. Genome sequences are not the only submissions to these databases: researchers interested in specific genes sequence them and submit them to the various repositories, and from that point of view 406,923 gene-centered records are available for the insect group. So far, 1,748 insect-exclusive BioProjects have been completed or are under way, representing an ocean of insect-specific data that could pave the way for cross-species research and data analysis.
Fig. 2: Hits obtained from a search performed with the NCBI Entrez search engine across various in-house databases using the keyword insecta. The numbers in blue represent the number of available records in a specific database type (in brown), followed by the definition of the database type (grey). (As of April 2013.)
Genes code for proteins, and protein sequences can be more variable than gene sequences because they are built from 20 amino acids rather than 4 bases. Variation in this class of macromolecules therefore holds a vital key as well: it highlights the important concept of mutations leading to an altered phenotype. Analysis at the protein level can thereby unravel many hidden secrets, since structural and functional motifs can be predicted and assigned a specific function based on the pattern of occurrence of amino acids that forms the signature of a specific protein family.
Table 1. URL addresses of the databases referred to in this chapter.
http://www.ncbi.nlm.nih.gov/
http://www.vanderbilt.edu/IIID
http://www.agripestbase.org
http://flybase.bio.indiana.edu/
http://www.fruitfly.org/annot/index.html
http://flybrain.neurobio.arizona.edu/
http://www.fruitfly.org/
http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/7227.html
http://kr.expasy.org/cgi-bin/lists?fly.txt
http://www.tigr.org/tdb/dgi/
http://quantgen.med.yale.edu/
http://www.fruitfly.org/expression/immunity/
http://silkworm.genomics.org.cn/
http://sgp.dna.affrc.go.jp/index.html
http://www.ab.a.u-tokyo.ac.jp/silkbase/
http://www.cdfd.org.in/silksatdb
http://www.ensembl.org/Anopheles_gambiae/
http://mosquito.colostate.edu/
http://konops.anodb.gr/AnoDB/
http://www.anobase.org/
http://bioweb.pasteur.fr/BBMI/index.html
http://www.tigr.org/tdb/aggi/
http://www.pherobase.com/
http://hymenopteragenome.org/
http://titan.biotec.uiuc.edu/bee/honeybeeproject.htm
http://www.ksu.edu/tribolium/
http://www.butterfliesandmoths.org
http://www.sbcollections.org/cbp/cbp_db_informatics.aspx
http://lepidoptera.butterflyhouse.com.au/
http://www.aphidbase.com/aphidbase/
http://bioweb.ensam.inra.fr/spodopbase/
http://antbase.org
http://blattodea.speciesfile.org/HomePage/Blattodea
http://www.cate-sphingidae.org/
http://www.ars-grin.gov/nigrp/robo.html
http://cricket.inhs.uiuc.edu/edwipweb/edwipabout.htm
http://pest.ceris.purdue.edu/
http://www.sel.barc.usda.gov/selhome/database.htm
http://www.chori.org/bacpac/
http://www.genome.clemson.edu/
http://www.malaria.atcc.org
http://bioinf.ibun.unal.edu.co/insecta/
As mentioned at the beginning of this chapter, insects can be beneficial or harmful. From the agricultural point of view, beneficial insects include the silkworm Bombyx mori, which produces silk and also serves as a central model organism for lepidopteran genomics, facilitating comparative genomics and basic research that lead toward new genome-based approaches to sericulture and pest control (10; see the insect genomics resources in Table 1). SilkDB is an integrated genome resource database for B. mori. Apart from providing access to genomic data, including chromosomal mapping, gene products and functional annotation of genes, it also provides extensive biological information such as ESTs, microarray expression data and corresponding references (26).
SilkBase, a B. mori-specific EST database, was developed from randomly selected cDNAs from 36 libraries, yielding 35,000 EST sequences. This was followed by complete genome sequencing of this beneficial species in 2004, which revealed that B. mori has more protein-coding genes than Drosophila and that many B. mori genes are not homologous to Drosophila genes (27, 28).
Microsatellites are short tandem repeats found in all eukaryotic genomes and are widely used in a variety of applications, including genetic distance measures and phylogeny reconstruction, population genetics, genetic mapping, inference of evolutionary history and forensics. The study of microsatellites helps in the genetic fingerprinting of diverse silkmoths and in the construction of molecular linkage maps, in addition to the basic understanding of microsatellites themselves. Therefore, a group of Indian scientists at the Centre for DNA Fingerprinting and Diagnostics (CDFD), India, in collaboration with the National Institute of Agrobiological Sciences (NIAS), Japan, developed SilkSatDb, a relational database of microsatellites extracted from the available ESTs and WGS sequences of B. mori. SilkSatDb is an online relational database that catalogues information about the microsatellite repeats of the silkworm. It stores three kinds of data: the microsatellite repeats found in B. mori EST and WGS sequences, the sequence details, and the primers developed for these microsatellites (29).
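The text does not say how SilkSatDb mines repeats, so the following is only a minimal sketch of the general idea: a regular-expression scan for perfect tandem repeats of 1-6 bp motifs. The function name `find_microsatellites` and the thresholds (`min_repeats=5`) are illustrative assumptions; production tools also handle imperfect and compound repeats.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, end, motif, copies) for perfect tandem repeats.

    Hypothetical helper: a motif of min_unit..max_unit bases repeated at least
    min_repeats times in a row. Thresholds are illustrative, not SilkSatDb's.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # ([ACGT]{unit}) captures a motif; \1{n,} demands n further tandem copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" = "AT" x 2)
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            copies = (m.end() - m.start()) // unit
            hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)


demo = "GGG" + "AT" * 6 + "GCGC" + "CAG" * 5 + "TT"
for hit in find_microsatellites(demo):
    print(hit)
```

On the toy sequence this reports an (AT)6 dinucleotide repeat and a (CAG)5 trinucleotide repeat, together with their 0-based, end-exclusive coordinates.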
Fig. 3. List of completely sequenced genomes of various species hosted at VectorBase, with their gene counts and genome sizes in base pairs.
The SilkSatDb concept was later extended independently by CDFD, India, into a new and much larger database, InSatDb. InSatDb (Fig. 4) is similar to SilkSatDb but larger in scale, as it covers multiple insects and provides more information, such as the genomic location of a repeat (exon, intron or transposon) and its sequence composition (repeat motif and GC%). It offers an interactive interface to query microsatellites across five fully sequenced genomes: the fruit fly, honey bee, malaria mosquito, red flour beetle and silkworm. The CDFD study found that the proportion of microsatellites in exons is highest in Drosophila and lowest in B. mori, whereas in introns B. mori has the greatest proportion of microsatellites (30).
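InSatDb reports where each repeat lies (exon, intron or transposon) and its composition. Assuming repeat coordinates from any microsatellite scan and a list of annotated feature intervals (both hypothetical here), the per-region percentages compared in the CDFD study could be tallied roughly as follows. The helper names and the half-open coordinate convention are illustrative assumptions, not InSatDb's implementation.

```python
from collections import Counter


def gc_percent(seq):
    """GC content of a sequence, as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def annotate_repeat(start, end, features):
    """Label of the first feature overlapping the half-open interval [start, end)."""
    for f_start, f_end, label in features:
        if start < f_end and f_start < end:   # intervals overlap
            return label
    return "intergenic"


def region_percentages(repeats, features):
    """Percentage of repeats falling in each annotated region type."""
    counts = Counter(annotate_repeat(s, e, features) for s, e in repeats)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical annotation and repeat coordinates, for illustration only.
features = [(0, 10, "exon"), (10, 40, "intron"), (60, 90, "transposon")]
repeats = [(3, 15), (19, 34), (70, 80), (45, 50)]
print(region_percentages(repeats, features))
print(round(gc_percent("CAGCAGCAG"), 1))
```

A repeat that straddles two features is simply assigned to the first one listed; a real pipeline would need an explicit rule for such boundary cases.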
Continuing with insects beneficial to mankind, the honey bee (Apis mellifera) is a model species for social behavior and is essential to global ecology as a pollinator. A. mellifera belongs to the order Hymenoptera, which comprises approximately 10% of the species on earth. This group of 'membrane-winged' insects includes sawflies, bees, ants and wasps, which directly affect human health and agriculture through diverse roles as pollinators, pests and parasitoids. The Hymenoptera Genome Database (HGD) is a resource for the genomics of this order. HGD hosts the genomes of the bees A. mellifera, Bombus impatiens and B. terrestris under BeeBase, a comprehensive sequence data source for the bee research community (31). The Bee Pests and Pathogens tab of BeeBase provides the available genomic information on the pests and pathogens of bees, namely Ascosphaera apis (32), Nosema ceranae (33), Paenibacillus larvae (32) and Varroa destructor, well supported by the BeeBase Wiki. BeeBase also supports PSI-BLAST searches of special protein data sets that combine the GenBank non-redundant protein set with honey bee predicted genes. The PSI-BLAST site has enabled researchers to identify divergent paralogs that may not be easily identified with other BLAST programs. Genome browsers for the ant species Acromyrmex echinatior, Atta cephalotes, Camponotus floridanus, Harpegnathos saltator, Linepithema humile, Pogonomyrmex barbatus and Solenopsis invicta, available through the Ant Genomes Portal, and for the parasitoid wasp Nasonia vitripennis make it easy to locate genes on a genome map (31).
Fig. 4. Database query page of InSatDb: Insect Microsatellite Database. One can look up and search for different characteristics of microsatellites across the whole genome of a species of interest.
4. Protein Databases
The genome browser gives a detailed idea of where a gene is located; genes function primarily by being expressed as proteins. Any change in the gene sequence may have consequences ranging from a silent mutation to a lethal mutation in which the function and structure of the protein are completely lost. The next level of bioinformatics study requires knowledge of the abbreviation used for each amino acid and of its physical and chemical properties (the single-letter and three-letter codes are shown in Table 2). Proteins are complex macromolecules necessary for most cellular work, including the structure, function and regulation of the body. Sophisticated protein-level databases are plentiful.
Table 2. Twenty amino acids with their 3-letter and 1-letter codes along with their secondary structural propensity values for alpha helix and beta sheets. *Polar (may participate in hydrogen bonds); $Hydrophobic (normally buried inside the protein core); #Charged.

Amino acid | Single-letter code | Three-letter code | Alpha-helix propensity | Beta-strand propensity
Arginine# | R | ARG | 0.79 | 0.90
Lysine# | K | LYS | 1.07 | 0.74
Aspartic acid# | D | ASP | 0.98 | 0.80
Glutamic acid# | E | GLU | 1.53 | 0.26
Glutamine* | Q | GLN | 1.17 | 1.23
Asparagine* | N | ASN | 0.73 | 0.65
Histidine* | H | HIS | 1.24 | 0.71
Serine* | S | SER | 0.79 | 0.90
Threonine* | T | THR | 0.82 | 1.20
Tyrosine* | Y | TYR | 0.61 | 1.29
Cysteine* | C | CYS | 0.77 | 1.30
Methionine* | M | MET | 1.20 | 1.67
Tryptophan* | W | TRP | 1.14 | 1.19
Alanine$ | A | ALA | 1.45 | 0.97
Isoleucine$ | I | ILE | 1.00 | 1.60
Leucine$ | L | LEU | 1.34 | 1.22
Phenylalanine$ | F | PHE | 1.12 | 1.28
Valine$ | V | VAL | 1.14 | 1.65
Proline$ | P | PRO | 0.59 | 0.62
Glycine$ | G | GLY | 0.53 | 0.81
5. Methods in Bioinformatics
5.1. Sequence Analysis
The modus operandi of biological research is that sequence determines structure, which in turn determines function; bioinformatics takes a shorter route and infers function directly from sequence. Predicting function from sequence in this way is termed sequence analysis. It involves comparing two or more sequences with each other or against a database; if a significant match is found between two sequences, the same function can be assigned to the unknown sequence. This raises the question of what "significant" means. A basic understanding of protein biochemistry helps to settle this, but the essential point is simple: two sequences can be similar without being identical. Two amino acids with the same physical and chemical properties are termed similar, for example aspartate and glutamate (both acidic amino acids), whereas the same amino acid at the same position in both sequences (Asp and Asp) is termed identical. There are more basic terms in bioinformatics to become familiar with. Two sequences are said to be homologous if they have been inherited or derived from a common ancestor.
Fig. 5. Comparative view of a global alignment (top) by the Needle program and a local alignment (bottom) by the Water program, both available among the EBI tools.
to find the matching segments without difficulty. For longer sequences this method becomes a difficult task, a problem that can be addressed by applying a window size and a stringency value.
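A dot plot with a window and a stringency threshold can be sketched in a few lines. As an assumption about a typical implementation, a cell is marked when at least `stringency` of the `window` residue pairs starting at that cell are identical; a larger window with high stringency suppresses the random single-residue matches that clutter plots of long sequences.

```python
def dot_plot(seq1, seq2, window=1, stringency=1):
    """Rows follow seq1 (vertical axis), columns follow seq2 (horizontal axis).

    Cell (i, j) gets '*' when at least `stringency` of the `window` residue
    pairs seq1[i + k] / seq2[j + k] (k = 0 .. window - 1) are identical.
    """
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            matches = sum(a == b for a, b in zip(seq1[i:i + window], seq2[j:j + window]))
            row.append("*" if matches >= stringency else " ")
        rows.append("".join(row))
    return rows


# A repeated motif shows up as parallel diagonals.
for line in dot_plot("ACGTACGT", "ACGTACGT", window=3, stringency=3):
    print("|" + line + "|")
```

With `window=1` and `stringency=1` the function reduces to the plain residue-by-residue dot plot.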
The dot plot is a useful method for comparing two sequences; with multiple sequences, however, the plot would become multidimensional. Hence there is scope for other methods, the next in the category being dynamic programming, a mathematically rigorous and computationally intensive method. This method is to an extent similar to the dot plot, but it involves a scoring scheme. Global alignment is achieved by dynamic programming using the Needleman-Wunsch algorithm (48), and a modified version that aligns local segments is the Smith-Waterman algorithm (49). In this technique, as in the dot plot, one sequence is written across the top of the matrix from left to right and the other down the side from top to bottom, so that each cell corresponds to a pair of characters. One additional row and column are added to accommodate the gap penalty. A gap is a blank space inserted above or below a position in either sequence to obtain as many matches as possible; gaps indicate insertions or deletions that occurred at a particular position during evolution. Since gaps are not real parts of the sequences and are inserted only to obtain the optimal alignment, they are penalized; this is the gap penalty. Generally, a scoring scheme gives a match a positive value, a mismatch zero or a negative value, and a gap a larger negative value. The cells are then filled with values according to the algorithm, and once the matrix is complete, a trace-back is performed to obtain the final alignment. The trace-back may yield more than one alignment; these are scored, and the alignment with the highest score is optimal. Remember that there is not necessarily only one way of aligning two sequences: several alignments may share the highest score.
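The fill and trace-back steps described above can be sketched as a minimal Needleman-Wunsch implementation. The scores (match +1, mismatch -1, gap -2, a linear gap penalty) are illustrative assumptions; protein work would normally use a substitution matrix and affine gap penalties. The Smith-Waterman variant differs mainly in flooring cell values at zero and tracing back from the highest-scoring cell.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b by dynamic programming (linear gap penalty)."""
    n, m = len(a), len(b)
    # The extra row/column (index 0) holds cumulative gap penalties.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    # Fill: each cell takes the best of a diagonal step or a gap in either sequence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

When several alignments tie, this trace-back prefers a diagonal step, then a vertical one, so it returns only one of the optimal alignments.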
Dynamic programming is widely used and highly accurate because it is guaranteed to find the optimal alignment for the chosen scoring scheme. Its biggest drawback is speed: it performs tedious calculations, and the time and memory required grow with the product of the two sequence lengths, so the longer the sequences, the longer it takes to solve the problem.
The next class of methods is the heuristic methods, which work by splitting the sequence into short fragments (words), finding matching words and then gradually extending these matches. One of the most famous and trusted tools for finding similar sequences is the Basic Local Alignment Search Tool (BLAST) (50). It is the fastest method of the lot: for rapid sequence comparison, it directly approximates alignments that optimize a measure of local similarity. BLAST searches the query sequence against large databases and aligns the hits, but the reported alignments are not guaranteed to be the best possible.
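The word-based strategy can be illustrated with a toy seed-and-extend search. This is a simplified sketch, not BLAST itself: it uses exact word matches only (BLAST also accepts high-scoring "neighborhood" words scored with a substitution matrix), a fixed match/mismatch score, and an X-drop rule for ungapped extension. The function names and the choices `w=3` and `x_drop=2` are illustrative assumptions.

```python
from collections import defaultdict


def seed_hits(query, subject, w=3):
    """(query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return [(q, s) for q in range(len(query) - w + 1)
            for s in index.get(query[q:q + w], [])]


def extend_seed(query, subject, q, s, w, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed in both directions with an X-drop rule.

    Returns (query_start, query_end, score) of the best-scoring segment.
    """
    best = w * match                       # the seed itself matches exactly
    best_left = best_right = 0
    run, k = best, 0
    while q + w + k < len(query) and s + w + k < len(subject):   # extend right
        run += match if query[q + w + k] == subject[s + w + k] else mismatch
        k += 1
        if run > best:
            best, best_right = run, k
        elif best - run > x_drop:          # score fell too far below the best
            break
    run, k = best, 0
    while q - k - 1 >= 0 and s - k - 1 >= 0:                     # extend left
        run += match if query[q - k - 1] == subject[s - k - 1] else mismatch
        k += 1
        if run > best:
            best, best_left = run, k
        elif best - run > x_drop:
            break
    return q - best_left, q + w + best_right, best


query, subject = "GGACGTTT", "CCACGTAA"
for q, s in seed_hits(query, subject):
    print(q, s, extend_seed(query, subject, q, s, 3))
```

Both seeds in this toy example extend to the same high-scoring segment ("ACGT"); real BLAST would report it once and then assess its statistical significance.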
Fig. 6. BLAST results of a similar-sequence search performed using the COI sequence of Oxycarenus laetus Kirby as the query.
Substitution matrices are the currency of sequence analysis and play a vital role in determining the quality of alignments. The two major matrices used in protein sequence analysis are the Point (or Percent) Accepted Mutation (PAM) matrices and the BLOcks SUbstitution Matrix (BLOSUM). PAM, derived by Margaret Dayhoff, is based on the frequency with which one amino acid mutates into another, calculated with due regard to the rate of evolution; PAM matrices are therefore considered evolutionary matrices (52). Dayhoff's matrices were derived from closely related proteins and may not be ideal for highly divergent groups of sequences. This was overcome by Henikoff and Henikoff, who introduced BLOSUM, constructed from multiple alignments of evolutionarily divergent sequences. These matrices are based only on highly conserved regions, with an implicit model of evolution. Their values are calculated from blocks of conserved segments, with sequences whose identity exceeds a threshold clustered together (53). With so much to consider, which matrix should one use for an analysis? The choice is not too tricky: if the sequences under study are highly conserved, one would choose a higher-numbered BLOSUM or a lower-numbered PAM matrix, and vice versa for divergent sequences.
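To make the BLOSUM idea concrete, the sketch below turns one ungapped block of aligned sequences into half-bit log-odds scores, s(i,j) = round(2 × log2(q_ij / e_ij)), where q_ij is the observed frequency of the residue pair i/j in the block's columns and e_ij is the frequency expected from the background residue frequencies. It is a simplification: real BLOSUM matrices pool thousands of blocks and first cluster sequences above an identity threshold (for example 62% for BLOSUM62), which this toy version skips. The block and the function name are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores for residue pairs observed in one ungapped block."""
    # Count every residue pair within each column (unordered pairs).
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}
    # Background frequency of each residue, derived from the pair frequencies.
    p = Counter()
    for (x, y), f in q.items():
        p[x] += f / 2
        p[y] += f / 2
    scores = {}
    for (x, y), f in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(f / expected))
    return scores


block = ["AL", "AL", "AI", "AI"]   # hypothetical ungapped block
for pair, score in sorted(blosum_like_scores(block).items()):
    print(pair, score)
```

Because isoleucine and leucine substitute for each other often in this toy block, the I/L pair receives a positive score, which is how conservative substitutions earn positive values in real BLOSUM matrices.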
Pairwise sequence alignment is a comparison between two fixed sequences based on string similarity; any inference is therefore based on the selected sequence alone, which may not always be enough to predict the function of an unknown sequence. It also becomes cumbersome to perform many pairwise alignments. Instead, one can compare a query sequence against a whole class of sequences by means of a multiple sequence alignment (MSA). MSAs are used to find regions of sequence similarity that may point to the structure of an evolutionary ancestor or provide information about the evolutionary history of the sequences. MSAs are also more sensitive to sequence similarities than pairwise alignments, because conserved regions can be so dispersed that a pairwise alignment would not find them.
Fig. 7A shows a typical multiple alignment of mitochondrial CO1 sequences from 12 insect species. Further phylogenetic analysis of B. mori mitochondrial CO1 (Fig. 7B) showed that the sequence is highly conserved across insect species, suggesting a shared protein structure and shared physiological properties (Fig. 7C, D, E). An MSA is easy and intuitive to read when viewed in color. Figure 7 shows that almost all sequences are highly conserved and share a great deal of similarity. A few positions have mismatches; at position 15, for example, the first two species have asparagine (N), a polar amino acid, aligned with glutamate (E), an acidic amino acid, hence the different colors. The asterisk (*, identical characters), colon (:, similar characters) and space (varying characters) shown beneath each column help researchers to track down the variable positions quickly. The image shown here is only partial.
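The conservation line beneath an MSA, and the identity/similarity distinction made earlier, can be sketched as below. The residue groups used for ":" are the "strong" groups of the Clustal convention as we recall them, so treat the list as an assumption; Clustal additionally marks weakly similar columns with ".", which this sketch omits. The example alignment is hypothetical.

```python
# Assumed Clustal-style "strong" groups: a column whose residues all fall in
# one group is marked ':' (similar).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def column_symbol(column):
    residues = set(column)
    if "-" in residues:
        return " "
    if len(residues) == 1:
        return "*"                                    # identical
    if any(residues <= set(g) for g in STRONG_GROUPS):
        return ":"                                    # similar
    return " "                                        # varying


def conservation_line(alignment):
    return "".join(column_symbol(col) for col in zip(*alignment))


def identity_similarity(row1, row2):
    """Percent identical and percent similar positions over gap-free columns."""
    pairs = [(a, b) for a, b in zip(row1, row2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    ident = sum(a == b for a, b in pairs)
    simil = sum(a == b or any({a, b} <= set(g) for g in STRONG_GROUPS)
                for a, b in pairs)
    return 100.0 * ident / len(pairs), 100.0 * simil / len(pairs)


msa = ["MKVLD", "MKILE", "MRVLG"]   # hypothetical aligned fragments
for row in msa:
    print(row)
print(conservation_line(msa))
print(identity_similarity(msa[0], msa[1]))
```

The first two rows are 60% identical but 100% similar, because K/K, V/I and D/E (aspartate/glutamate) all count as similar.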
Constructing an MSA is a trivial exercise if the sequences are highly similar. In our example there are no gaps because there is no ambiguity in the sequences: no indels (insertions/deletions) are needed to make the alignment, and the ungapped sequences can simply be arranged together. However, if the sequences are of different lengths, the problem can become very complex. Various algorithms are available for this task, such as the sum-of-pairs (SP) method, which works well but becomes impractical as the number and length of the sequences increase. Faster methods are therefore needed, and several exist. Other MSA methods include progressive and iterative methods and profile methods based on hidden Markov models (HMMs) (47). A discussion of these is beyond the scope of this chapter.
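The sum-of-pairs (SP) idea is easy to state: a column's score is the sum of the scores of every pair of residues in it, and the alignment's score is the sum over all columns. A minimal scorer with an assumed toy scheme (match +1, mismatch 0, residue-gap -1, gap-gap 0) is shown below. Scoring a given MSA is cheap; the expense arises when searching for the alignment that maximizes this score, because exact dynamic programming over N sequences needs an N-dimensional matrix and its cost grows exponentially with N.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Sum-of-pairs score of an MSA given as equal-length strings."""
    total = 0
    for column in zip(*alignment):
        # Every one of the N(N-1)/2 residue pairs in the column contributes.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue                  # gap-gap pairs are ignored
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


msa = ["AC-", "ACG", "A-G"]   # hypothetical three-sequence alignment
print(sum_of_pairs(msa))
```

Here the first column contributes +3 and each of the other two contributes -1, for a total of 1.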
MSAs are very helpful and pave the way for further understanding of the sequences, since other domains of bioinformatics, such as the identification of conservation patterns and phylogeny reconstruction, depend on MSAs as their input.
genotypes or time courses reveals genes that have highly correlated patterns of transcript expression. Many tools are available to perform a variety of analyses on large microarray data sets (GCOS; GeneSpring: http://www.agilent.com/chem/genespring; caArray: http://caarray.nci.nih.gov/; Bioconductor: http://www.bioconductor.org).
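Co-expression of this kind is commonly quantified with the Pearson correlation between two genes' expression profiles across samples or time points. A minimal sketch follows; the gene names, values and the 0.9 threshold are hypothetical, and the tools listed above provide far more complete (and normalized) analyses.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical expression values over four time points.
profiles = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [2.0, 4.1, 6.0, 8.2],
    "gene_3": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(profiles))
```

A profile with no variation (all values equal) has zero standard deviation and would raise a division error, so such genes are usually filtered out before correlation analysis.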
5.4. Phylogenetic Analysis
Life on this planet diverged early into three lineages: the Bacteria, the Archaea and the Eukarya (57). Since the days when organisms were identified by their morphological characters, the field of taxonomy has gathered pace, and modern evolutionary studies take a very different approach. The study of evolutionary history relies on genealogical theory, which assumes that all alleles, genes, individuals, populations and higher taxa (species, genera, etc.) that have ever existed arose from pre-existing alleles, genes, individuals, populations and higher taxa, respectively (47).
Phylogenetics on sequence data is an attempt to reconstruct the evolutionary history of those sequences. Phylogenetic relationships are usually depicted as trees, in which branches connect ancestors to their descendants; the tips of the tree (individual organisms or sequences) are the leaves, and the individual branch points are nodes. Trees are of two types, rooted and unrooted. Rooted trees have an explicit ancestor, so the direction of time is explicit; unrooted trees do not have an explicit ancestor. The branching pattern of a phylogenetic tree can be used to convey the order in which evolutionary events occurred. Trees can also be classified as scaled or unscaled. In scaled trees, branch lengths are proportional to the differences between pairs of neighboring nodes; scaled trees are also additive, meaning that the physical length of the branches connecting any two nodes accurately represents their accumulated difference. In contrast, unscaled trees line up all terminal nodes and convey only their relative kinship, without representing the number of changes. Briefly, phylogenetic analysis methods are of two types: distance-based and character-based methods (47).
Phenetic methods are also called clustering methods because of the way in which they work. These are basic, easy and convenient methods, and they are the ones most often used by the phylogenetic community. The most common distance-based methods are the Unweighted/Weighted Pair Group Method with Arithmetic Mean (UPGMA/WPGMA), the Neighbor-Joining (NJ) method and the Fitch-Margoliash method.
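UPGMA, the simplest of these clustering methods, repeatedly merges the two closest clusters and replaces their distances to every other cluster with a size-weighted average (WPGMA would use a simple average of the two distances instead). A compact sketch that returns a Newick string follows. The distance matrix and labels are hypothetical, and UPGMA assumes a constant rate of evolution (a molecular clock), an assumption that Neighbor-Joining does not make.

```python
def upgma(dist, labels):
    """UPGMA tree from a symmetric distance matrix, as a Newick string."""
    # Each cluster id maps to (newick_subtree, number_of_leaves, node_height).
    clusters = {i: (labels[i], 1, 0.0) for i in range(len(labels))}
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])   # closest pair
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2                  # new node sits halfway between the pair
        node = f"({ti}:{h - hi:g},{tj}:{h - hj:g})"
        # Size-weighted average distance from the merged cluster to the rest.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = (node, ni + nj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


labels = ["A", "B", "C", "D"]          # hypothetical taxa
dist = [[0, 2, 4, 6],
        [2, 0, 4, 6],
        [4, 4, 0, 6],
        [6, 6, 6, 0]]
print(upgma(dist, labels))             # (D:3,(C:2,(A:1,B:1):1):1);
```

The Newick output can be pasted into most tree viewers, including MEGA, for drawing.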
Fig. 8: Stepwise construction of a phylogenetic tree using MEGA software version 4.0, with different layouts for the tree. 1. Target sequence and NCBI BLAST search; 2. Identification of homologous sequences; 3. Import of the sequences into MEGA 4.0; 4. Sequence alignment and percentage identity; 5. Before constructing the Maximum Likelihood (ML) tree; 6. Screen view of the ML tree being constructed while the program runs; 7. Screen view after the ML tree is constructed; 8. Various layouts of the phylogenetic tree (rectangular, curved, straight and circular).
Common name | Trade name(s) | Mode of action
Fenoxycarb | Preclude |
Kinoprene | Enstar II/AQ |
Pyriproxyfen | Distance |
Buprofezin | Talus |
Cyromazine | Citation |
Diflubenzuron | Adept |
Etoxazole | TetraSan |
Novaluron | Pedestal |
Azadirachtin | | Ecdysone antagonist
where there is no consensus for a helix or a sheet, but $P(\alpha) < P(t) > P(\beta)$ and $P(t) > 100$, the segment is predicted to form a turn. The remaining parts of the sequence, for which no clear assignment can be made, tend to form coils. The CF method is known to be strongly biased toward alpha helices, which turns out to be its disadvantage; its prediction accuracy is 56-60% (Fig. 9A).
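A much-simplified version of the Chou-Fasman helix-nucleation step can be written using the alpha-helix propensities of Table 2. Assumptions: a helix is nucleated wherever at least 4 of 6 consecutive residues are "helix formers" with propensity above a cutoff (1.03 here, treated as an adjustable value), and the extension, beta-sheet and turn rules of the full method are omitted. Table 2's values also differ somewhat from the original Chou-Fasman parameters, so the output is illustrative only.

```python
# Alpha-helix propensities copied from Table 2 (one-letter codes).
P_ALPHA = {"R": 0.79, "K": 1.07, "D": 0.98, "E": 1.53, "Q": 1.17, "N": 0.73,
           "H": 1.24, "S": 0.79, "T": 0.82, "Y": 0.61, "C": 0.77, "M": 1.20,
           "W": 1.14, "A": 1.45, "I": 1.00, "L": 1.34, "F": 1.12, "V": 1.14,
           "P": 0.59, "G": 0.53}


def helix_nuclei(seq, window=6, min_formers=4, former_cutoff=1.03):
    """Start positions of windows with at least `min_formers` helix formers."""
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(P_ALPHA[aa] > former_cutoff for aa in seq[i:i + window])
        if formers >= min_formers:
            starts.append(i)
    return starts


def helix_mask(seq, window=6, **kwargs):
    """Mark every residue covered by a nucleating window as 'H', others '-'."""
    mask = ["-"] * len(seq)
    for i in helix_nuclei(seq, window=window, **kwargs):
        for k in range(i, i + window):
            mask[k] = "H"
    return "".join(mask)


seq = "GPGPEALKAMLGPGP"   # hypothetical peptide
print(seq)
print(helix_mask(seq))
```

Because there is no extension or boundary-trimming step, helix breakers such as G and P at the window edges are marked as well; the full method adds further rules to handle such residues.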
Fig. 9 A) Secondary structure prediction of the B. mori lysozyme sequence using PHD (neural network), CFSSP (Chou-Fasman) and GOR (GOR4), compared with the experimentally determined structure (PDB ID: 1GD6). B) 3D homology model of B. mori lysozyme. C) Metal binding site.
Another problem of the CF method is that it gives little importance to neighboring residues, even though an individual amino acid cannot form a secondary structure on its own. This was overcome to an extent by the Garnier-Osguthorpe-Robson (GOR) method (79), an efficient tool that has undergone various updates over the years as requirements changed. It is a probabilistic method based on the probability $P(S|R)$ of a structural state $S$ (alpha, beta, turn or coil) given that the residue is $R$. It considers the information carried by a residue about its own secondary structure in combination with the information carried by the other residues in a local window of eight residues on either side of the residue concerned. By the definition of conditional probability, $P(S|R) = P(S,R)/P(R)$, where $P(S,R)$ is the joint probability of observing $S$ and $R$ together and $P(R)$ is the probability of observing residue $R$. The information that residue $R$ carries about state $S$ is then $I(S;R) = \log[P(S|R)/P(S)]$, which is easy to estimate from a database of known sequences and their observed secondary structures.

In this way the method draws on 1360 parameters estimated from the structural database (17 window positions × 20 amino acids × 4 states) to reach a conclusion for a single residue. The advantage of this method is that it takes into account the effect of adjacent residues on the central residue; one disadvantage is that it under-predicts strands. The Q3 accuracy of GOR III is between 60 and 65%.
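The window-based information sum can be sketched as a GOR-I-style predictor trained on sequences of known structure. Simplifications (our assumptions): single-residue information only, with the decision information log[P(S|R)/P(not S|R)] + log[P(not S)/P(S)] summed over the window, a pseudocount of 1, and only the states seen in the training data (so three states H/E/C would give 17 × 20 × 3 table entries rather than the 1360 of the four-state method). The function names and toy training data are hypothetical.

```python
import math
from collections import Counter


def train_gor(training, half_window=8, pseudo=1.0):
    """Information table {(offset, residue, state): value} from known structures."""
    state_counts, joint, residue_counts = Counter(), Counter(), Counter()
    for seq, ss in training:
        for j, s in enumerate(ss):
            state_counts[s] += 1
            for k in range(-half_window, half_window + 1):
                if 0 <= j + k < len(seq):
                    joint[(k, seq[j + k], s)] += 1
                    residue_counts[(k, seq[j + k])] += 1
    states = sorted(state_counts)
    total = sum(state_counts.values())
    info = {}
    for (k, r), n in residue_counts.items():
        for s in states:
            n_s = joint[(k, r, s)]
            # log[P(S|R)/P(not S|R)] + log[P(not S)/P(S)], with pseudocounts
            info[(k, r, s)] = (math.log((n_s + pseudo) / (n - n_s + pseudo))
                               + math.log((total - state_counts[s] + pseudo)
                                          / (state_counts[s] + pseudo)))
    return info, states


def predict_gor(seq, info, states, half_window=8):
    """Sum the information over the window and pick the best state per residue."""
    prediction = []
    for j in range(len(seq)):
        score = {s: sum(info.get((k, seq[j + k], s), 0.0)
                        for k in range(-half_window, half_window + 1)
                        if 0 <= j + k < len(seq))
                 for s in states}
        prediction.append(max(states, key=score.get))
    return "".join(prediction)


training = [("AAAAVVVV", "HHHHEEEE")]   # hypothetical toy data
info, states = train_gor(training, half_window=1)
print(predict_gor("AAAAVVVV", info, states, half_window=1))
```

With the default `half_window=8` the window spans the 17 residues used by the GOR method; the toy example uses a 3-residue window only to keep the tables small.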
The latest entrant, considered the next-generation approach among secondary
structure prediction methods, is modeled on the network of neurons
in the human brain and is therefore called the neural network method (80). It is a highly
accurate method, with a Q3 of 70-75%. Like the GOR method, it takes the
neighboring residues into consideration.
The network has three layers. The input layer feeds a window of 17 residues, each encoded by 21 units (17 × 21 inputs), to the
hidden layer, where all the processing takes place (this processing is not readily
interpretable, which is why the method is called a black box). Once processing is over, the hidden layer passes the information
on to the output layer, whose units are trained towards values of 0 and 1. If the helix output is close to 1,
the residue is predicted to be in a helix, and likewise for the other states.
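The 17 × 21 input described above corresponds to a sliding window of 17 residues, each one-hot encoded over 21 units: 20 amino acids plus one unit for window positions that fall beyond the chain ends. The sketch below shows only that encoding and a single forward pass through one hidden layer to three outputs (helix, strand, coil). The weights are random placeholders, not trained values, so the output carries no biological meaning. The hidden-layer size, the sigmoid activation, the omission of bias terms and the example sequence are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 17                      # central residue plus 8 on each side
UNITS = len(AMINO_ACIDS) + 1     # 20 amino acids + 1 "beyond the chain end" unit
STATES = ("helix", "strand", "coil")

def encode_window(seq, centre):
    """One-hot encode the 17-residue window centred on position `centre`."""
    half = WINDOW // 2
    vector = []
    for pos in range(centre - half, centre + half + 1):
        unit = [0.0] * UNITS
        if 0 <= pos < len(seq):
            unit[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            unit[-1] = 1.0       # spacer unit for positions outside the sequence
        vector.extend(unit)
    return vector                # length 17 * 21 = 357

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Input layer -> hidden layer -> three sigmoid outputs in (0, 1); no biases."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

def predict_state(outputs):
    """Pick the state whose output unit is largest (closest to 1)."""
    return STATES[max(range(len(STATES)), key=lambda k: outputs[k])]

# Random, untrained weights: placeholders that only demonstrate the data flow.
rng = random.Random(0)
N_HIDDEN = 10
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(WINDOW * UNITS)] for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in STATES]

seq = "MKRLVLAGLLLLA"               # hypothetical sequence
x = encode_window(seq, 3)
outputs = forward(x, w_hidden, w_out)
print(len(x), [round(o, 3) for o in outputs], predict_state(outputs))
```

In a real predictor the weights are learned by back-propagation from proteins of known structure, and that training is what gives the method its reported accuracy.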
Fig.9B shows a comparative assessment of
the three methods discussed above against
the experimentally determined structure of
B. mori lysozyme (PDB ID: 1GD6). The
selected sequence is 119 residues long. The
results indicate that PHD, which follows the
neural network algorithm, is highly accurate,
as most of its predictions agree with the
experimentally determined structure. The first
helical region, from the fourth residue (R) to the
fourteenth residue (K), is correctly predicted by PHD.
CFSSP also predicts this region to be helical but extends it continuously by a further eighteen
residues, which matches the PDB structure only to some extent. This result is expected, as the
CF method is known to be strongly biased towards helices. The next helical region
was predicted between the seventy-fourth residue (K) and the one hundred and tenth
residue (C); PHD and CFSSP predict this region as it appears in the PDB structure, apart from minor
variation in the form of intermittent coils in the experimentally determined structure. The same comparison
extended to GOR4 tells a completely different story (Fig. 9A). It is therefore
necessary to choose carefully an appropriate method for predicting the secondary
structure of a given protein from its sequence alone, which partially solves
the problem of tertiary structure prediction by threading.
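Q3, the figure quoted above for the GOR and neural network methods, is the percentage of residues whose three-state assignment (helix H, strand E, coil C) matches the observed structure. The sketch below is a minimal version. It assumes both strings already use the same three letters and are aligned residue by residue. The example strings are made up and are not the 1GD6 assignments.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

observed = "CHHHHCCEEEC"    # hypothetical observed assignment
predicted = "CCHHHHCEECC"   # hypothetical prediction
print(f"Q3 = {q3(predicted, observed):.1f}%")
```

Because Q3 counts only per-residue matches, two predictions with similar Q3 can still differ in where helices start and end. That is why the segment-by-segment comparison in Fig.9B is informative.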
Returning to the 3D structure of proteins (Fig.9C): it can be derived readily
from the sequence itself, provided suitable homologs are available. Our understanding of protein structure shows that the main chain tends to be the same
regardless of residue type; variation occurs only in the side chains.
The 3D structure of proteins is stabilized largely by hydrogen bonds, Van der Waals
interactions, Coulombic interactions, etc., all of which fall under the category of
non-bonded interactions. Because sequence is assumed to determine
structure, it follows that homologous
sequences are expected to have similar structures. On
this assumption, one can
predict the 3D structure of a protein
theoretically using in-silico approaches.
Depending upon the sequence similarity, the
approaches for predicting the 3D structure of
proteins can be classified into three types,
namely comparative modeling (also called homology
modeling or knowledge-based modeling),
fold recognition or threading, and ab-initio
methods (47, 81).
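Because the choice among these three approaches depends on sequence similarity, a practical first step is to measure the percent identity between the target and a candidate template from their pairwise alignment. The sketch below assumes the alignment already exists as two gapped strings of equal length. Identity is computed over ungapped columns only, which is one of several conventions in use. The 30% cut-off is a commonly quoted rule of thumb used here as an assumption, and the function names and the `fold_match_found` flag are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Identity over aligned positions where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

def suggest_approach(identity, fold_match_found, cutoff=30.0):
    """Rule-of-thumb choice among the three classes of modeling methods."""
    if identity >= cutoff:
        return "comparative (homology) modeling"
    if fold_match_found:
        return "fold recognition (threading)"
    return "ab initio modeling"

identity = percent_identity("MKV-LAG", "MKIALSG")
print(f"{identity:.1f}% identity -> {suggest_approach(identity, fold_match_found=False)}")
```

In practice the boundary is not sharp. Model quality degrades gradually as identity falls, so the cut-off should be read as guidance rather than a rule.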
different models of protein structure (Fig.10A, B). In this way the basic fold of the
structure is obtained. Many software tools are available for this step, such as
SWISS-MODEL (82) and MODELLER (83), and the resulting models can be inspected in visualization programs such as VMD. Regions that do not share
significant similarity with the template are modeled as loops. Once the model
is ready, it is subjected to evaluation.
Fig.10. Schematic diagram showing the stepwise 3D homology modeling of the
D. melanogaster sex protein (A) and the different models designed (B), rendered with the PyMOL software program.
template in homology modeling, the target sequence is aligned with the template
structure. Confusing? I will clear the air! A residue in a protein can be described by structural parameters
such as polarity and whether it is buried or exposed to the environment, which leads to six (P1, P2, B1, B2, B3
and E) possible environments; combining these with three secondary structural
elements (alpha helix, beta sheet and turn) gives rise to 18 environmental structural
descriptors. This forms the basis and crux of the method. The frequency of occurrence
of each of the twenty residues in each of these environments is calculated and tabulated by
examining the structures already deposited in the PDB, and the frequencies are converted into log-odds scores.
Every residue in every environment now has a numerical value, so a target sequence can be scored against the environments of each template fold, and the highest-scoring alignment identifies the most compatible fold.
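The environment-based scoring can be sketched as follows. Each template position carries one of the 18 environment classes (six burial/polarity classes crossed with three secondary-structure states). Log-odds scores ln[P(residue | environment)/P(residue)] are estimated from how often each residue occurs in each class, and a target sequence is scored by summing those values along an alignment. This is a simplified illustration, not a published threading program. The "database" of (residue, environment) observations is hypothetical, the alignment is ungapped, and a pseudocount of 1 is assumed.

```python
import math
from collections import Counter
from itertools import product

BURIAL = ["B1", "B2", "B3", "P1", "P2", "E"]
SECONDARY = ["alpha", "beta", "turn"]
ENVIRONMENTS = [f"{b}-{s}" for b, s in product(BURIAL, SECONDARY)]  # 18 classes
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_table(observations, pseudo=1.0):
    """score[(residue, env)] = ln(P(residue | env) / P(residue)) from
    (residue, environment) observations taken from known structures."""
    pair = Counter(observations)
    env_total = Counter(env for _, env in observations)
    res_total = Counter(res for res, _ in observations)
    n = len(observations)
    table = {}
    for res in AMINO_ACIDS:
        p_res = (res_total[res] + pseudo) / (n + pseudo * len(AMINO_ACIDS))
        for env in ENVIRONMENTS:
            p_res_env = (pair[(res, env)] + pseudo) / (env_total[env] + pseudo * len(AMINO_ACIDS))
            table[(res, env)] = math.log(p_res_env / p_res)
    return table

def best_ungapped_threading(sequence, profile, table):
    """Slide the sequence along the template profile; return (best offset, score)."""
    best = None
    for offset in range(len(sequence) - len(profile) + 1):
        window = sequence[offset:offset + len(profile)]
        score = sum(table[(res, env)] for res, env in zip(window, profile))
        if best is None or score > best[1]:
            best = (offset, score)
    return best

# Hypothetical observations: buried helix favours L/I, exposed turn favours G/N/D.
observations = ([("L", "B1-alpha")] * 8 + [("I", "B1-alpha")] * 6 + [("V", "B2-beta")] * 6
                + [("G", "E-turn")] * 8 + [("N", "E-turn")] * 5 + [("K", "E-alpha")] * 7
                + [("D", "E-turn")] * 4)
table = log_odds_table(observations)
profile = ["B1-alpha", "B1-alpha", "E-turn", "E-turn"]   # hypothetical template positions
offset, score = best_ungapped_threading("KKLIGNKK", profile, table)
print(offset, round(score, 2))
```

Real threading methods also allow gaps and compare the sequence against a whole library of folds. The ungapped slide keeps the scoring idea visible.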
On many occasions, one comes across a situation where there is neither a template with
sufficient sequence identity to the target nor a matching fold in the fold
database. To handle such cases there is a third category of methods, called
ab-initio methods. These methods are very demanding; one needs a deep
understanding of the laws and principles of physics and chemistry, and they involve a great deal
of computation and require large amounts of memory. One approach is
simulated annealing, in which temperature plays a vital role. The protein is first
subjected to a high temperature, which allows it to escape from its current conformation:
the higher the temperature, the greater the energy available to the system. The temperature is then
decreased gradually, and the protein settles into a conformation that is close
to the global energy minimum. This is continued until there is no further fluctuation in the
energy versus temperature trajectory. Finally, the structure whose energy no longer
changes is taken as the modeled structure. This may sound
easy the way I have explained it, but remember, it is a highly cumbersome method (84).
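The annealing schedule described above can be illustrated on a toy one-dimensional energy function with two minima, which stands in for the far larger conformational space of a protein. The sketch uses the Metropolis acceptance rule and geometric cooling. The energy function, starting temperature, cooling factor, step size and random seed are all arbitrary assumptions chosen for illustration.

```python
import math
import random

def energy(x):
    """Toy energy surface: a local minimum near x = 1.13, the global one near x = -1.30."""
    return x**4 - 3 * x**2 + x

def simulated_annealing(x, t_start=2.0, t_min=1e-3, cooling=0.995, step=0.5, seed=1):
    rng = random.Random(seed)
    e = energy(x)
    best_x, best_e = x, e
    t = t_start
    while t > t_min:
        candidate = x + rng.uniform(-step, step)
        delta = energy(candidate) - e
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-delta / T), which shrinks as T falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = candidate, e + delta
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling      # gradual cooling
    return best_x, best_e

x_final, e_final = simulated_annealing(2.0)
print(f"lowest energy {e_final:.3f} at x = {x_final:.3f}")
```

At high temperature the walker can climb out of the local well near x = 1.13. As the temperature drops, uphill moves become rare and the search settles, which mirrors the "no further fluctuation" stopping point described above.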
Energy is a vital requirement for a living organism to survive. All cellular
processes require energy, whether to make or break a bond or for any interaction to take
place. A biomolecule is considered to be in a stable and functional form if it is in its
ground state, its energy being at the global minimum. The total energy of the system is
calculated as the sum of kinetic energy and potential energy. Treating the electrons
explicitly is demanding, because electrons are far lighter than nuclei (a proton is about
1836 times heavier than an electron) and their motion is much faster (85). According to the Born-Oppenheimer approximation, the energies of
electronic motion and nuclear motion can therefore be calculated separately and summed
later to obtain the overall energy of the system. The branch of mechanics that considers
only the nuclear positions is termed molecular mechanics. The potential
energy of a system in molecular mechanics is calculated using a force field. The
functional form of a force field is the sum of energies due to bonded and
non-bonded interactions. Bond lengths, bond angles and torsion angles make up the bonded
interactions, whereas electrostatic, Van der Waals and similar terms make up the non-bonded interactions.
The bonded terms measure deviations from reference values, which are derived from experimental data or from high-level
quantum mechanical calculations. Force fields may differ from one another in their
functional form, because one force field may take a given contribution (for example, hydrophilic interactions) into account
while another does not. There is a plethora of force fields, and it is up to the
researcher to select an appropriate one. For water as the solvent, a widely used model is
TIP3P (85).
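A commonly used functional form adds four kinds of term. Harmonic terms cover bonds and angles, (k/2)(r - r0)^2 and (k/2)(θ - θ0)^2. A periodic term covers torsions, (V_n/2)[1 + cos(nω - γ)]. Lennard-Jones and Coulomb terms cover non-bonded pairs. The sketch below evaluates these individual terms. It is not the form of any particular force field. The parameter values in the example are placeholders rather than published parameters, and the Coulomb constant of 332.06 kcal·Å/(mol·e²) assumes energies in kcal/mol, distances in Å and charges in units of e.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²); assumes kcal/mol, Å and charges in e

def bond_energy(r, r0, k):
    """Harmonic bond stretching: (k/2)(r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic angle bending, angles in radians: (k/2)(theta - theta0)^2."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion_energy(omega, v_n, n, gamma):
    """Periodic torsion term: (V_n/2)[1 + cos(n*omega - gamma)], angles in radians."""
    return 0.5 * v_n * (1 + math.cos(n * omega - gamma))

def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """Electrostatic term between two point charges."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

# Placeholder parameters, for illustration only.
total = (bond_energy(1.55, 1.53, 600.0)
         + angle_energy(math.radians(112.0), math.radians(109.5), 100.0)
         + torsion_energy(math.radians(60.0), 1.4, 3, 0.0)
         + lennard_jones(4.0, 0.1, 3.4)
         + coulomb(4.0, 0.4, -0.4))
print(f"total energy = {total:.3f} kcal/mol")
```

In a full calculation these terms are summed over every bond, angle, torsion and non-bonded pair in the system. Production force fields also supply the parameters for each atom type.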
5.6. Molecular Docking
Molecular docking predicts the mode of binding between a target and a candidate
small molecule. It helps one narrow down the compounds that
show affinity towards the target, which saves an enormous amount of time and
reduces cost. The interaction is brought about mainly by hydrogen bonds,
electrostatic interactions, Van der Waals interactions, etc. (85).
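The core idea of docking is to search over ligand poses and score each pose by its non-bonded interactions. A deliberately tiny example shows this: a rigid ligand is translated over a grid around a hypothetical pocket, and each pose is scored with a Lennard-Jones term plus a Coulomb term. Real docking programs, such as the Glide module used for Table 4, also sample rotations and ligand flexibility and use their own empirical scoring functions. None of that is reproduced here. All coordinates, charges and parameters are invented, and the distance-dependent dielectric (4r) is an assumed simplification.

```python
import math
from itertools import product

# Hypothetical receptor pocket: (x, y, z, partial charge) for four atoms.
RECEPTOR = [(3.0, 0.0, 0.0, -0.4), (-3.0, 0.0, 0.0, -0.4),
            (0.0, 3.0, 0.0, -0.4), (0.0, -3.0, 0.0, -0.4)]
# Hypothetical rigid ligand: atom offsets from its reference point, with charges.
LIGAND = [(0.0, 0.0, 0.0, 0.3), (0.0, 0.0, 1.5, 0.1)]

EPSILON, SIGMA = 0.15, 3.0     # single placeholder Lennard-Jones parameter pair
COULOMB = 332.06               # kcal·Å/(mol·e²)

def pair_energy(r, q_i, q_j):
    sr6 = (SIGMA / r) ** 6
    # Lennard-Jones plus Coulomb with a distance-dependent dielectric of 4r
    return 4 * EPSILON * (sr6 * sr6 - sr6) + COULOMB * q_i * q_j / (4 * r * r)

def score_pose(shift):
    """Interaction energy of the ligand translated by `shift` (no rotation)."""
    total = 0.0
    for lx, ly, lz, lq in LIGAND:
        atom = (lx + shift[0], ly + shift[1], lz + shift[2])
        for rx, ry, rz, rq in RECEPTOR:
            r = math.dist(atom, (rx, ry, rz))
            total += pair_energy(max(r, 0.5), lq, rq)   # guard against r -> 0
    return total

def grid_dock(extent=4.0, spacing=0.5):
    """Exhaustively score translations on a cubic grid; return (best shift, score)."""
    n = int(round(extent / spacing))
    axis = [i * spacing for i in range(-n, n + 1)]
    return min(((s, score_pose(s)) for s in product(axis, repeat=3)),
               key=lambda pose: pose[1])

best_shift, best_score = grid_dock()
print(f"best translation {best_shift}, score {best_score:.2f} kcal/mol")
```

Exhaustive grid search is only feasible here because the ligand is rigid and rotation is ignored. Practical docking programs use smarter search strategies for the same reason.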
i) Molecular 3D structure of Ecdysone receptor protein from Spodoptera litura
Ecdysteroids are steroid hormones that play an important role in molting,
metamorphosis, reproduction and many other developmental processes in insects and
other arthropods (86). The ecdysteroid hormones act through specific receptor
proteins called ecdysteroid receptors (EcR). EcR is a member of
the nuclear receptor superfamily, a group of receptors containing at
least one of two highly conserved domains: the centrally located DNA-binding
domain (DBD) and the C-terminal ligand-binding domain (LBD). The
availability of cloned genetic sequences encoding both the EcR and USP/RXR
(ultraspiracle/retinoid X receptor) subunits of ecdysone receptors from a range of arthropods
and other animals has advanced our understanding of the receptors' structural
biology, evolution and ligand interactions, and of the selectivity of certain
environmentally friendly insecticides (86, 87, 88). The phylogenetic tree
(Fig.11A) shows how Sl-EcR from Lepidoptera correlates, by amino acid sequence,
with the EcR sequences of other insect groups. Interestingly, we
noticed a distinct separation of the more divergent Diptera and
Lepidoptera from other insects and arthropods. Sequence analysis of
Sl-EcR (Fig.11B) locates the DNA-binding domain in helices 5-6 (E185-E256) and the ligand-binding
domain in the C-terminal region (T381-F487). Homologues of Sl-EcR were used to predict the 3D structure of the ligand-binding
domain, using the PDB template of H. virescens (95%), and of the DNA-binding domain, using the
PDB template of D. melanogaster (76%). The structural analysis reveals that Sl-EcR
consists of 12 helices, a small antiparallel β-sheet located between helices H5 and
H6, and 14 loops (Fig.11C, D and E). A biological zinc-binding region is situated near the
10th helix (His518, His464), and other residues (Cys188, Cys191, Cys205, Cys208,
Cys240, Cys243, His228 and His290) are responsible for the metal-binding site
(Fig.11C). The φ (phi) and ψ (psi) angles of the residues fall in the allowed and
partially allowed regions of the Ramachandran plot (Fig.11F-H), supporting the
validity of the predicted structure of Sl-EcR. The heterodimerization interface
between EcR and USP is centered on a conserved core of residues located in helices
H9 and H10 of their LBDs. The sequences contributing to this surface are relatively
well conserved in all insects. Nevertheless, within this third region, residues Cys394,
Leu397, Leu408 and Trp412 appear strictly conserved in hemipteran, lepidopteran,
dipteran and coleopteran species; of these, all except Leu408 are in contact with the
bound ecdysteroid (86, 87). Recent studies have shown substantial remodeling of
the EcR LBD in the presence of a bound synthetic ligand, affecting
mainly the region encompassing the β-sheet, helices H6 and H7, and the loop
connecting helices H1 to H3. Fig. 11D shows the binding pocket of the DBH ligand and
of an ecdysteroid, as found in the EcR cavities, illustrated for Sl-EcR. Local resistance
to a pesticide could be attributed to variation in the active site, since the binding
site of an enzyme for a pesticide may differ among different classes of insects (89).

Fig.11. A) Phylogenetic tree reconstructed from all 35 EcR amino acid sequences from
insects. The tree was made by the neighbor-joining method using the ClustalW multiple alignment program;
MEGA software version 4.0 can export the drawings to graphics programs and can export trees in
Newick format for use by other programs. B) Sequence alignment of EcR from Spodoptera litura
(SlEcR). Helix positions are based on the crystal structures of Heliothis virescens (EcR-LBD) and
Drosophila melanogaster (EcR-DBD); cylinders: α-helices 1-12; empty arrows: β-sheets 1 and 2; histidine
residues: cyan; cysteine residues: blue.
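Fig.11A was built with the neighbor-joining method (59). The compact sketch below illustrates the algorithm and is not the MEGA or ClustalW implementation. At each step it joins the pair of nodes that minimizes the Q-criterion, Q(i, j) = (n - 2)d(i, j) - R_i - R_j, where R_i is the row sum of distances, then assigns branch lengths and replaces the pair with a new node. It assumes a symmetric distance matrix with at least three taxa and returns the edges of an unrooted tree. The five-taxon matrix in the example is a small illustrative matrix, not EcR data.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return (node, neighbour, branch_length) edges of an unrooted NJ tree."""
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Q-criterion: choose the pair whose joining most reduces total tree length.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        counter += 1
        u = f"node{counter}"
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: connect them to one central node.
    a, b, c = nodes
    edges += [
        ("centre", a, (d[a][b] + d[a][c] - d[b][c]) / 2),
        ("centre", b, (d[a][b] + d[b][c] - d[a][c]) / 2),
        ("centre", c, (d[a][c] + d[b][c] - d[a][b]) / 2),
    ]
    return edges

names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for edge in neighbor_joining(names, dist):
    print(edge)
```

The resulting tree is unrooted. Placing a root, for example with an outgroup, is a separate step before a tree like Fig.11A is drawn.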
Fig.11 C) Three-dimensional structure of SlEcR and the metal-binding region; D) the molecular
graphics program PyMOL used to visualize the ligand-binding domain (LBD, sky blue)
and DNA-binding domain (DBD, brown) regions of SlEcR; E) DNA-binding domain
located in helices 5 and 6 (amino acid residues E185-E256), visualized using the PyMOL
program; F) Ramachandran plot; G) hydrophobicity plot of SlEcR; H) electrostatic surface
model of SlEcR.
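The Ramachandran plot in Fig.11F is built from two backbone torsion angles per residue. φ is defined by atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The minimal sketch below computes a torsion angle from four atomic coordinates using the IUPAC sign convention. The test coordinates are arbitrary points, not atoms from Sl-EcR, and the helper names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Backbone phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Backbone psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)

print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))
```

Plotting the (φ, ψ) pair of every residue against the allowed regions is what the Ramachandran validation in Fig.11F summarizes.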
this forms the prelude to further laboratory experimentation with the insects
as specimens. One way entomologists can extend this experiment is to
perform a growth characteristics study, along with other studies such as the antifeedant
activity of the selected compound, applying Waldbauer's nutritional indices with
suitable controls (90).
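Waldbauer's nutritional indices (90) are simple ratios of dry-weight measurements. The sketch below covers three of them: approximate digestibility (AD), efficiency of conversion of ingested food (ECI) and efficiency of conversion of digested food (ECD). It assumes that food ingested, frass produced and biomass gained are measured in the same units over the same feeding period. The numbers in the example are invented and do not come from the experiments discussed here.

```python
def nutritional_indices(ingested, frass, gained):
    """Waldbauer-style indices (percentages) from dry weights:
    AD  = 100 * (ingested - frass) / ingested
    ECI = 100 * gained / ingested
    ECD = 100 * gained / (ingested - frass)"""
    digested = ingested - frass
    if ingested <= 0 or digested <= 0:
        raise ValueError("ingested food must be positive and exceed frass")
    return {
        "AD": 100.0 * digested / ingested,
        "ECI": 100.0 * gained / ingested,
        "ECD": 100.0 * gained / digested,
    }

# Invented example: control vs. treated larvae (mg dry weight)
control = nutritional_indices(ingested=100.0, frass=40.0, gained=12.0)
treated = nutritional_indices(ingested=60.0, frass=30.0, gained=4.5)
print(control)
print(treated)
```

The three indices are linked by ECI = AD × ECD / 100. A drop in ECI in treated larvae can therefore be traced either to poorer digestion (AD) or to poorer conversion of digested food (ECD).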
Table 4. Molecular docking results of the Spodoptera litura ecdysone receptor with various synthetic pesticides
and biopesticides, performed using the Glide module of the Schrödinger software.
Lipophilic
Ligand
Score
EvdW
Bond
Electro
Plumbagin
-4.99
-1.92
-0.81
Pyridalyl
-4.88
-3
Oleandrin
-4.44
-1.45
Methoxyfenozide
-4.34
-2.66
-0.3
Embelin
-4.33
-2.44
-0.81
Chlorpyrifos
-4.14
-1.66
-0.29
Malathion
-3.45
-1.89
-0.5
Low
HB
Sitemap
MW
Penal
-0.38
-0.5
0.61
-0.67
-0.44
0.9
-1.22
-0.45
0.62
-0.31
-0.27
0.4
-0.74
-0.19
-0.5
0.74
-0.38
-0.33
0.32
-0.28
-0.33
-0.4
0.43
In this way it becomes easier for the agricultural scientist to reach a conclusion
about the binding affinity of a group of compounds, which could pave the way to
quicker containment of the pest menace and the losses it causes. One attractive aspect of
molecular docking is that it avoids testing all the selected compounds in wet-lab
experiments and waiting for serendipity to happen (serendipity, because the word has long been associated
with research scientists). Testing all the compounds in the wet lab means performing replicates and
waiting for at least one compound in the group to have the desired
effect and prove to be a candidate pesticide. This is not only tedious but also wastes
money, specimens, labor, time, etc. A further major disadvantage is that none of the
selected compounds may have any activity against the pest, while
the researcher continues the experiments blindly. To
avoid these blind expenses on multiple fronts, one can apply
molecular docking first and then test only the most promising compounds.
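Once docking scores are available, shortlisting compounds for wet-lab testing becomes a ranking step. The sketch below uses the docking scores reported in Table 4, where a more negative score means stronger predicted binding, and keeps the top three. The cut-off of three is an arbitrary choice for illustration, and the function name is hypothetical.

```python
# Docking scores for the S. litura ecdysone receptor, as listed in Table 4.
TABLE_4_SCORES = {
    "Plumbagin": -4.99,
    "Pyridalyl": -4.88,
    "Oleandrin": -4.44,
    "Methoxyfenozide": -4.34,
    "Embelin": -4.33,
    "Chlorpyrifos": -4.14,
    "Malathion": -3.45,
}

def shortlist(scores, top_n=3):
    """Rank ligands from most to least negative score and keep the first top_n."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:top_n]]

print(shortlist(TABLE_4_SCORES))  # ['Plumbagin', 'Pyridalyl', 'Oleandrin']
```

Differences of a few tenths in a docking score are within the typical uncertainty of scoring functions. A shortlist like this is best treated as a starting order for bioassays rather than a final verdict.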
One may object that the target structure or sequence of the pest selected for study
may not be available in the databases. In that case, the sequence analysis and similarity branch of
bioinformatics allows one to take the same or a related target sequence from a related
species and perform further experiments. Once the sequence is available, the
structure can be obtained as well by applying the practices of structural bioinformatics.
Now, where will one go to perform all these bioinformatics experiments? Is
there software available for them? Is there a server or an expert
available to take care of them? Bioinformatics depends heavily
on computers and the internet, so one can easily search the
internet to find the tools one wants. If that fails, they can be found in
any of the many standard bioinformatics textbooks.
6. Conclusion
Insect bioinformatics is a discipline with wide applications and the potential to
solve complex problems in various fields. Its strength lies in the fact that every living
organism has biological products, nucleotides and amino acids, which form the basis
of this discipline. Modern research targets genes and their products for the
desired activities. Two principal approaches strengthen all studies in
Entomo-informatics. The first is comparing and grouping data according to
biologically meaningful similarities; the second is analyzing one type of data
to interpret and understand observations on another type of data. In short, both
aim to understand and organize the information associated with biological molecules
on a large scale. As a result, Entomo-informatics has not only provided greater depth
to biological investigations but has also added the new dimension of breadth.
Entomo-informatics is an approach that will be an essential part of entomological
research, and we hope that all entomologists and researchers will incorporate more
Acknowledgement
I thank the management of SRM University for their encouragement and continuous
support of innovative research and academic activities, which resulted in this chapter. I thank
Dr. K. P. Sanjayan, Head, Dept. of Zoology, Guru Nanak College, Chennai, for his critical
review of this chapter, and I thank the Almighty, my family, Ashraf Ali, Sewali Ghosh and Pinky Sheetal
Vincent for their contributions to this chapter.
7. References
1. National Center for Biotechnology Information [NCBI]. A Science Primer: Bioinformatics.
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html (15 June 2011, date last accessed).
2. National Institutes of Health [NIH]. NIH Working Definition of Bioinformatics and Computational
Biology. http://www.bisti.nih.gov (17 July 2000, date last accessed).
3. Mansfield, E. (1998) Academic Research and Industrial Innovation: An Update of Empirical Findings.
Research Policy 26: 773-776.
4. Stephen, H.D. (1970) Charles Babbage, Father of the Computer. Crowell-Collier Press. ISBN
0-02-741370-5.
5. Jane Demerica (2009) How the computer has changed through the years. Helium: 1580299.
6. Watson, J.D. and Crick, F.H.C. (1953) A Structure for Deoxyribose Nucleic Acid. Nature 171 (4356):
737-738.
7. Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchison, C.A.,
Slocombe, P.M. et al. (1977) Nucleotide sequence of bacteriophage φX174 DNA. Nature 265 (5596):
687-695.
8. Hood, L. and Galas, D. (2003) The digital code of DNA. Nature 421: 444-448.
9. Chial, H. (2008) DNA sequencing technologies key to the Human Genome Project. Nature Education
1(1).
10. Mardis, E.R. (2008) Next-Generation DNA Sequencing Methods. Annual Rev. Genomics and Human
Genetics 9: 387-402.
11. Kedes, L., Liu, E.T. (2010) The Archon Genomics X PRIZE for whole human genome sequencing.
Nature Genetics 42 (11): 917-918.
12. Kedes, L., Campany, G. (2011) The new date, new format, new goals and new sponsor of the Archon
Genomics X PRIZE Competition. Nature Genetics 43 (11): 1055-1058.
13. Ouzounis, C.A. and Valencia, A. (2003) Early bioinformatics: the birth of a discipline, a personal view.
Bioinformatics 19 (17): 2176-2190.
14. Smith, D.J. (2003) Applications of bioinformatics and computational biology to influenza
surveillance and vaccine strain selection. Vaccine 21 (16): 1758-1761.
15. Buehler, L.K. and Rashidi, H.H. (2006) Review of bioinformatics basics: applications in biological
science and medicine. Biomed. Engg. Online. 5: 41.
16. Xue, J., Zhao, S., Liang, Y., Hou, C., Wang, J. (2008) Bioinformatics and its applications in agriculture.
In: Computer and Computing Technologies in Agriculture, Volume II. Springer, pp. 985-990.
17. Grimmelikhuijzen, C.J., Cazzamali, G., Williamson, C.M. and Hauser, F. (2007) The promise of insect
genomics. Pest Manag. Sci. 63: 413-416.
18. Chilana, P., Sharma, A. and Rai, A. (2012) Insect Genomics Resources: Status, Availability and Future.
Current Sci. 102 (4): 25.
19. Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Sayers, E.W.
(2013) GenBank. NAR D36-D42.
20. Tateno, Y., Imanishi, T., Miyazaki, S., Fukami-Kobayashi, K., Saitou, N., Sugawara, H., et al. (2002)
DNA Data Bank of Japan (DDBJ) for genome scale research in life science. NAR 30 (1): 27-30.
21. Stoesser, G., Baker, W., van den Broek, A., Camon, E., Garcia-Pastor, M., Kanz, C. et al. (2002)
The EMBL Nucleotide Sequence Database. NAR 30 (1): 21-26.
22. Cochrane, G., Karsch-Mizrachi, I., and Nakamura, Y. (2010) The International Nucleotide Sequence
Database Collaboration. NAR DOI: 10.1093/nar/gkq1150.
23. Gilbert, D.G. (2007) DroSpeGe: rapid access database for new Drosophila species genomes. NAR 35:
D480-485.
24. Marygold, S.J., Leyland, P.C., Seal, R.L., Goodman, J.L., Thurmond, J.R., Strelets, V.B., Wilson, R.J. and
the FlyBase Consortium (2013) FlyBase: improvements to the bibliography. NAR 41(D1):D751-D757.
25. Megy, K., Emrich, S.J., Lawson, D., Campbell, D. and Dialynas, E., et al. (2011) VectorBase:
improvements to a bioinformatics resource for invertebrate vector genomics. NAR 40: D729-D734.
26. Duan, J., Li, R., Cheng, D., Fan, W. and Zha, X., et al. (2010) SilkDB v2.0: a platform for silkworm
(Bombyx mori) genome biology. NAR 38: D453-D456.
27. Mita, K., Morimyo, M., Okano, K., Koike, Y., Nohata, J., et al. (2003) The construction of an EST
database for Bombyx mori and its application. Proc. Natl. Acad. Sci. USA. 100: 14121-14126.
28. Mita, K., Kasahara, M., Sasaki, S., Nagayasu, Y., Yamada, T., Kanamori, H. et al. (2004) The genome
sequence of silkworm, Bombyx mori. DNA Res. 11: 27-35.
29. Bose, B., Nagarajaram, H.A., Mita, K., Shimada, T., and Nagaraju, J. et al. (2005) SilkSatDb: a
microsatellite database of the silkworm, Bombyx mori. NAR 33: D403-D406.
30. Sunil, A., Eshwar, M., Sravana, K.P., Nagaraju, J. (2007) InSatDb: a microsatellite database of fully
sequenced insect genomes. NAR 35: D36-D39.
31. Munoz-Torres, M.C., Reese, J.T., Childers, C.P., Bennett, A.K., Sundaram, J.P, Childs, K.L., Anzola,
J.M, Milshina, N., Elsik, C.G. (2011) Hymenoptera Genome Database: integrated community resources
for insect species of the order Hymenoptera. NAR 39: D658-D662.
32. Qin, X., Evans, J.D., Aronstein, K.A., Murray, K.D., Weinstock, G.M. (2006) Genome sequences of the
honey bee pathogens Paenibacillus larvae and Ascosphaera apis. Insect Mol. Biol. 15(5):715-8.
33. Cornman, R.S., Chen, Y.P., Schatz, M.C., Street, C., Zhao, Y., et al (2009) Genomic analyses of the
microsporidian Nosema ceranae, an emergent pathogen of honey bees. PLoS Pathog. 5(6):e1000466.
34. Wu, C.H., Yeh, L-S.L., Huang, H., Arminski, L., Castro-Alvear, J., et al. (2003) The Protein Information
Resource. NAR 31: 345-347.
35. Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement
TrEMBL in 2000. NAR 28:45-48.
36. The UniProt Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in
2013. NAR 41: D43-D47.
37. Magrane, M. and the UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein
data. Database: The Journal of Biological Databases and Curation, bar009, Oxford University Press.
38. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., et al. (2012) The Pfam protein family
database. NAR 40: D290-D301.
39. Sigrist, C.J.A., de Castro, E., Cerutti, L., Cuche, B.A., Hulo, N., Bridge, A., Bougueleret, L., Xenarios, I.
(2012) New and continuing developments at PROSITE. Nucleic Acids Research, 14.
40. Zdobnov, E.M. and Apweiler, R. (2001) InterProScan: an integration platform for the
signature-recognition methods in InterPro. Bioinformatics 17 (9): 847-848.
41. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer Jr., E.E., Brice, M.D., Rodgers, J.R., Kennard, O.,
Shimanouchi, T., Tasumi, T.(1977) The Protein Data Bank: A Computer-based Archival File For
Macromolecular Structures. J. Mol. Biol. 112: 535.
42. Murzin, A.G., Brenner, S.E., Hubbard, T.J.P., Chothia, C. (1995) SCOP: a structural classification of
proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
43. Sillitoe, I., Cuff, A.L., Dessailly, B.H., Dawson, N.L., Furnham, N., Lee, D., et al. (2013) New functional
families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. NAR
D490-498.
44. Sayle, R.A. and Milner-White, E.J. (1995) RasMol: Biomolecular graphics for all. Trends in
Biochemical Sci. 20 (9): 374.
45. Johansson, M.U., Zoete, V., Michielin, O. and Guex, N. (2012) Defining and searching for structural
motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics 13:173.
46. Wang, Y., Geer, L.Y., Chappey, C., Kans, J.A., Bryant, S.H (2000) Cn3D: sequence and structure views
for Entrez. Trends Biochem. Sci. 25(6): 300-302.
47. Mount, D.W. (2004) Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory
Press.
48. Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the
amino acid sequence of two proteins. J. Mol. Biol. 48 (3): 443-453.
49. Smith, T.F. and Waterman, M.S. (1981) Identification of Common Molecular Subsequences. J. Mol. Biol.
147: 195-197.
50. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25: 3389-3402.
51. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl.
Acad. Sci. USA 85(8): 2444-2448.
52. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. (1978) A model of evolutionary change in proteins. Atlas of
Protein Sequence and Structure 5 (3): 345-352.
53. Henikoff, S., Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl.
Acad. Sci. USA 89 (22): 10915-10919.
54. Görg, A., Weiss, W., Dunn, M.J. (2004) Current two-dimensional electrophoresis technology for
proteomics. Proteomics 4 (12): 3665-3685.
55. Bae, S.H., Harris, A.G., Hains, P.G., Chen, H., Garfin, D.E., Hazell, S.L., Paik, Y.K., Walsh,
B.J., Cordwell, S.J. (2003) Strategies for the enrichment and identification of basic proteins in proteome
projects. Proteomics, 3(5): 569-579.
56. Blueggel, M., Chamrad, D. and Meyer, H.E. (2004) Bioinformatics in proteomics. Curr. Pharm.
Biotechnol. 5: 79-88.
57. Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a natural system of organisms:
Proposal for the domains Archaea, Bacteria and Eucarya. Proc. Natl. Acad. Sci. USA 87: 4576-4579.
58. Sokal, R. and Michener, C. (1958) A statistical method for evaluating systematic relationships. Univ.
Kansas Science Bulletin 38: 1409-1438.
59. Saitou, N., and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Mol. Biol. and Evol. 4: 406-425.
60. Yang, Z. (2007) PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24 (8):
1586-1591.
61. Wernegreen, J.J. (2002) Genome evolution in bacterial endosymbionts of insects. Nature Rev. 3:850-861.
62. Thao, M.L., Moran, N.A., Abbot, P., Brennan, E.B., Burckhardt, D.H., Baumann, P. (2000) Cospeciation
of psyllids and their primary prokaryotic endosymbionts. Appl. Environ. Microbiol. 66:2898-2905.
63. Thao, M.L., Moran, N.A., Abbot, P., Brennan, E.B., Burckhardt, D.H., Baumann, P. (2004) Evolutionary
relationships of primary prokaryotic endosymbionts of whiteflies and their hosts. Appl. Environ.
Microbiol. 70:3401-3406.
64. Beckenbach, A.T., Joy, J.B. (2009) Evolution of the mitochondrial genomes of gall midges (Diptera:
Cecidomyiidae): rearrangement and severe truncation of tRNA genes. Genome Bio. Evol. 1(1): 278-287.
65. Lewis, D.L., Farr, C.L., Kaguni, L.S. (1995) Drosophila melanogaster mitochondrial DNA: completion of
the nucleotide sequence and evolutionary comparisons. Insect Mol. Biol. 4(4): 263-278.
66. Hua, J.M., Li, M., Dong, P.Z., Cui, Y., Xie, Q., Bu, W.J. (2008) Comparative and phylogenomic studies
on the mitochondrial genomes of Pentatomomorpha (Insecta: Hemiptera: Heteroptera). BMC Genomics,
9: 610.
67. Shao, R., Barker, S.C. (2003) The highly rearranged mitochondrial genome of the plague thrips, Thrips
imagines (Insecta: Thysanoptera): convergence of two novel gene boundaries and an extraordinary
arrangement of rRNA genes. Mol. Biol. Evol. 20:362-370.
68. Shao, R., Campbell, N.J.H., Schmidt, E.R., Barker, S.C. (2001) Increased rate of gene rearrangement in
the mitochondrial genomes of three orders of Hemipteroid insects. Mol. Biol. Evol. 18:1828-1832.
69. Loxdale, H.D., Lushai, G. (1998) Molecular markers in entomology. Bulletin of Entomol. Res. 88:
577-600.
70. Avise, J.C. (2004) Molecular Markers, Natural History, and Evolution, 2nd edn, pp. 684. Sinauer
Associates, Sunderland, Massachusetts.
71. Avise, J.C. (2000) Phylogeography: the History and Formation of Species. Harvard University Press,
Cambridge, Massachusetts.
72. Severson, D.W., Brown, S.E., Knudson, D.L. (2001) Genetic and physical mapping in mosquitoes:
molecular approaches. Annual Rev. Entomol. 46: 183-219.
73. Heckel, D.G. (2003) Genomics in pure and applied entomology. Annual Rev. Entomol. 48: 235-260.
74. Ware, G.W. (1982) Pesticides: Theory and Application. Thompson publications, Fresno, California.
p.380.
75. Dubey, N.K., Shukla, R., Kumar, A., Singh, P. and Prakash, B. (2010) Prospects of
botanical pesticides in sustainable agriculture. Current Sci. 98 (4): 25.
76. Kendrew, J.C., Bodo, G., Dintzis, H.M., Parrish, R.G., Wyckoff, H., and Phillips, D.C. (1958) A
Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis. Nature 181 (4610):
662-666.
77. Branden, C. and Tooze, J. (1999) Introduction to Protein Structure. New York: Garland Publishing
Company.
78. Chou, P.Y., Fasman, G.D. (1978) Prediction of the secondary structure of proteins from their amino acid
sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 47: 45-148.
79. Garnier, J, Gibrat, J.F., Robson, B. (1996) GOR method for predicting protein secondary structure from
amino acid sequence. Methods Enzymol. 266:540-53.
80. Holley, L.H. and Karplus, M. (1989) Protein secondary structure prediction with a neural network. Proc.
Natl. Acad. Sci. USA, 86 (1):152-156.
81. Attwood, T.K. (1999) Introduction to Bioinformatics. Pearson Education India.
82. Arnold, K., Bordoli, L., Kopp, J. and Schwede, T. (2006) The SWISS-MODEL Workspace: A web-based
environment for protein structure homology modeling. Bioinformatics, 22:195-201.
83. Sali, A., Blundell, T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol.
Biol. 234: 779-815.
84. Krieger, E., Nabuurs, S.B. and Vriend, G. (2005) Homology Modeling, in Structural Bioinformatics,
Vol.44 (eds P. E. Bourne and H. Weissig), John Wiley & Sons, Inc., Hoboken, NJ, USA.
85. Leach, A.R. (2001) Molecular Modelling: Principles and Applications (2nd Edition). Pearson Prentice
Hall.
86. Zotti, M.J., Christiaens, O., Rouge, P., Grutzmacher, A.D., Zimmer, P.D. and Smagghe, G. (2012)
Sequencing and structural homology modeling of the ecdysone receptor in two chrysopids used in
biological control of pest insects. Ecotoxicology, 21:906-918.
87. Hill, R.J., Billas, I.M.L., Bonneton, F., Graham, L.D., Lawrence, M.C. (2013) Ecdysone Receptors: From
the Ashburner Model to Structural Biology. Annu. Rev. Entomol. 58: 251-271.
88. Malik, F.A., Reddy, S. and Venketesh, S. (2010) Sequences analysis and 3D structure prediction of
ecdysone receptor protein in silkworm, Bombyx mori L. Indian J. Seri. 49 (1): 17-27.
89. Habeeb, S.K.M., Anuradha, V. and Praveena, A. (2011) Comparative Molecular Modeling of Insect
Glutathione S-Transferases. Intl. J. Computer Applications 14 (5): 16-22.
90. Waldbauer, G.P. (1968) The Consumption and Utilization of Food by Insects, pp. 229-288. In J.W.L.
Beament, J.E. Treherne and V.B. Wigglesworth (eds), Adv. Insect Physiol. Vol. 5, Academic Press, New York.
Article History: Received 5th July 2013; Revised 15th December 2013; Accepted 10th April 2014 and
Published 30th Oct. 2014
Reviewed by: Brintha, P.G., Kansas State University, USA; Siva Ramamoorthy, VIT University, India.
Table of Contents
Page No.
Preface .... i
Foreword Message .... ii
Contributors .... iii
Reviewers .... iv
Acknowledgement .... v
Volume 1
Section I: Insect Biochemical approaches
Raman Chandrasekar, P.G. Brintha, Enoch Y. Park, Paolo Pelosi, Fei Liu,
Marian Goldsmith, Anthony Ejiofor, B.R. Pittendrigh, Y.S. Han,
Fernando G. Noriega, Manickam Sugumaran, B.K. Tyagi, Zhong Zheng Gui,
Fang Zhu, Bharath Bhusan Patnaik and P. Michailova
2.
57
Sahayaraj, K.
3.
75
4.
99
5.
127
149
Manickam Sugumaran
185
217
Insect Immunity
233
253
271
291
317
331
Paraskeva V. Michailova
355
Dhanenjeyan, K.J., Paramasivam, R., Thanmozhi, V., Chandrasekar, R. and Tyagi, B.K.
Index .... 363
Volume 2
Section V:
373
385
in Lepidoptera.
409
Section VI:
429
449
473
497
509
Ronald J. Nachman
Section VII:
533
549
Usha Rani, P.
575
595
Section VIII: Insect Bioinformatics
621
633
685
Jitrayut Jitonnom
Index .... 709
Book Mission Project # 2: Initiated on June 2010; Completed on March 2014 and Published on Oct. 2014.
PREFACE
Entomology, as a science of interdependent branches such as biochemistry, molecular entomology and
insect biotechnology, has made rapid progress in the light of modern discoveries. This progress also
implies an urgent need to manage the available resources scientifically for the good of humankind.
Over the past five decades, entomology worldwide has taken giant steps forward, and continued
research has led to better pest management through molecular approaches. The aim of Short Views on
Insect Biochemistry and Molecular Biology is to integrate perspectives across the biochemistry,
molecular biology, physiology, immunology, molecular evolution, genetics, developmental biology and
reproduction of insects. This century has been proclaimed the Era of Biotechnology, encompassing all
types of molecular biology applications, which are essential for a thorough understanding of insect
biology. Volumes 1 and 2 (8 sections with 30 chapters) establish a thorough understanding of the
physiological and biochemical functions of proteins and genes in insect life processes. The topics
dealt with in the individual chapters include the chemistry of the insect cuticle; hormones and growth
regulators; biochemical defenses of insects; the biochemistry of toxic action and detoxification;
modern molecular genetics and evolution; inter- and intraspecific chemical communication and
behavior; insect pheromones, their molecular architecture and phylogeny, and the chemical control of
insects using pheromone biotechnology; modern insect biology and novel plant-based chemical and
microbial insecticides for insect control, followed by a discussion of the various mechanisms of
resistance (both behavioral and physiological) and resistance management; modern insect pest
management through biochemical and molecular approaches; mimetic analogs of insect neuropeptides for
pest management; and entomo-informatics and computer-aided pesticide design. In short, this book
provides comprehensive reviews of recent research from various geographic regions of the world, and
the contributing authors are recognized experts (leading entomologists and scientists) in their
respective fields of molecular entomology. We will miss this collaboration now that it has ended, but
will feel rewarded if this book is appreciated by our colleagues and becomes a remarkable milestone
in the field of entomology.
This book emphasizes the need for, and relevance of, studying the molecular aspects of entomology in
universities, agricultural universities and other centers of molecular research. Compiling this
knowledge and, in particular, disseminating it to the scientific community free of cost was the major
inspiring force behind the launch of Short Views on Insect Biochemistry and Molecular Biology.
Editors
Raman Chandrasekar
Brij Kishore Tyagi
Short Views on
Insect Biochemistry and
Molecular Biology
Edited by
Raman Chandrasekar, Ph.D.,
Kansas State University, USA.
B.K. Tyagi, Ph.D.,
Centre for Research in Medical Entomology (ICMR), India.
Zhong Zheng Gui, Ph.D.,
Jiangsu University of Science and Technology,
Sericultural Research Institute, Chinese Academy of
Agricultural Sciences, China.
Gerald R. Reeck, Ph.D.,
Kansas State University, USA.
Contributing Authors
Dr. B.K. Tyagi
Prof. Fernando G. Noriega
Prof. K. Sahayaraj
Prof. Yanyuan Bao
Institute of Insect Science,
Zhejiang University, China.
Prof. Patricia Y. Scaraffia
Department of Tropical Medicine,
Tulane University, New Orleans,
LA 70112, USA.
Dr. P. Somasundaram
Central Sericultural Germplasm Resources Centre,
P.B.No.44, Thally Road,
Hosur-635109,
Tamilnadu, India.
College of Forestry,
Northwest A & F University
Yangling, Shaanxi 712100, China
Dr. R. Srinivasan
School of Biotechnology,
Trident Academy of Creative Technology
(TACT), Bhubaneswar 751013 Odisha, India.
School of Science
University of Phayao, Thailand.
Department of Entomology,
University of Illinois, Urbana-Champaign, IL,
61801, USA
Prof. K. Murugan
Acknowledgements
Writing and publishing a book requires the assistance of individuals who are
creative, talented and hard-working. All of these qualities were present in the
individuals assembled to produce this volume. I would like to express my
heartfelt gratitude to my former teacher Prof. Seo Sook Jae (GSNU, South Korea),
Prof. Subba Reddy Palli (University of Kentucky, USA), and my external mentors
Prof. Marian R. Goldsmith (University of Rhode Island, USA), Prof. Enoch Y. Park
(Shizuoka University, Japan), Prof. M. Kobayashi (Nagoya University, Japan), Prof.
CHU Jang Hann (National University of Singapore, Singapore), Prof. Thomas W.
Sappington (USDA-ARS, USA), Prof. Fernando G. Noriega (Florida International
University, USA), Dr. Srinivasan Ramasamy (AVRDC, The World Vegetable
Center, Taiwan) and Dr. H.C. Sharma (ICRISAT, India), who inspired and
supported me in many ways at the commencement of this International Book
Mission Program. The book mission program was initiated in May 2010,
completed in March 2014 and published in October 2014. I have no words to
express my gratitude to all those from the USA, South Korea, Japan, China, India,
Thailand, Taiwan, Bulgaria, France, Israel and Portugal who provided valuable
contributions (for the list of contributors, see page v) and made the completion of
this book possible. We express our appreciation to the reviewers (for the list of
reviewers, see page vii) who reviewed various parts of the manuscript as it was
being developed and improved the quality of each chapter. I thank the ICMR, New
Delhi, the Chinese Academy of Agricultural Sciences, China, and Kansas State
University for their support in several respects. Many others (scientists and
publishers) have also allowed us to use their materials in the various chapters;
their color images have been converted to grayscale. I am especially indebted to
the International Book Mission Organization, Academic Publishing Services, for
the production of the book. I thank my co-editors for their continuous vigilance
over the book project and for always giving advance notice of the editing and
proofreading schedules. I also thank my wife, P.G. Brintha, whose encouragement
in every possible way helped transform our original efforts into an acceptable
final form. I apologize to those whose work could not be cited owing to space
limitations. Further, I wish to recognize the moral support extended by colleagues
and friends. I hope that this volume will inspire interest in the diverse aspects of
insect biochemistry and molecular biology among aspiring and established scientists.
Raman Chandrasekar
Book Series