Printed in the United States of America, 2014
ISBN : 978-1-63315-205-2
Short Views on Insect Biochemistry and Molecular Biology Vol.(2), October 2014
Section VIII
Insect Bioinformatics
Vol. (2), 621-662, 2014
Invited Review
Chapter 28
Entomo-informatics: A prelude to the concepts in Bioinformatics
Habeeb, S.K.M.*1, and Raman Chandrasekear2
1
Department of Bioinformatics, School of Bioengineering, Faculty of Engineering & Technology,
SRM University, Kattankulathur, Chennai 603203, Tamilnadu, India.
2
Department of Biochemistry and Molecular Biophysics, 238 Burt Hall, Kansas State University,
Manhattan 66506, KS, USA.
Abstract
Entomo-informatics is a scientific discipline in its own right and plays an essential role in today's
entomological sciences. The volume of data generated in sequencing laboratories has given rise to a
new field called bioinformatics. Many researchers in biology are still unfamiliar with the available
bioinformatics methods, tools and databases, which can lead to missed opportunities or
misinterpretation of information. Entomological studies have gained importance because of the rate
at which pests have developed resistance to various insecticides, and the in vitro and field
assessments needed to test a class of pesticide against a pest have become a demanding task for
researchers. The concepts and methodologies of bioinformatics discussed in detail in this chapter
will help agricultural scientists and entomologists to understand the evolutionary history of a pest,
its mechanism of action, its relatedness to other species, etc. This can lead to improvements in crop
quality and yield, better pest control and target-oriented drug design. We also cover certain basic
and applied concepts of bioinformatics briefly and crisply so that entomologists can grasp them
easily. With the application of these concepts to insects in mind, a few worked examples are also
given.
Key words: bioinformatics, computational proteomics, entomo-informatics, ecdysone receptor, homology modeling,
protein databases, phylogenetic analysis, molecular docking
*For Correspondence (email: habeeb.m@ktr.srmunir.ac.in, habeeb_skm@yahoo.co.in, biochandrus@yahoo.com)
Overview
1. Introduction
2. Databases
3. Insect Specific Databases
4. Protein Databases
5. Methods in Bioinformatics
5.1. Sequence Analysis
i) B. mori Cytochrome Oxidase Gene
ii) B. mori Lysozyme
5.2. Computational proteomics
i) Electrophoresis analysis
ii) Mass Spectrometry
5.3. Transcriptome analysis
5.4. Phylogenetic Analysis
5.5. Structural Bioinformatics
i) Drosophila melanogaster sex protein receptor (SPR)
5.6. Molecular Docking
i) Spodoptera litura EcR
6. Conclusion
7. References

1. Introduction
Massive growth in information, due to experimental and technological advances,
has led to an absolute requirement for computerized databases to store, organize, and
index the data and for specialized tools to view and analyze the data. Bioinformatics,
an emerging interdisciplinary field, applies computer science and information
technology to make the vast, diverse and complex life sciences data more
understandable (storing, accessing, analyzing data and 3D visualization) and useful,
and to help realize its full potential (1). The ultimate goal of bioinformatics is to
enable the discovery of new biological insights as well as to create a global
perspective from which unifying principles in biology can be discerned (2).
Bioinformatics is now an integral part of modern biology. With the advent of
modern, well-equipped automated instruments, biological research has reached
heights that were unimaginable in the 1950s. Be it in biotechnology, pharmaceuticals
or the medical sciences, technology has clearly increased the rate at which tasks in
these fields are completed (3). The world remains indebted to Charles Babbage for
his invention, the computer (4). The application of this machine has done wonders in
every field: logistics, defense, education, science and technology, politics, and the
list goes on. The machine itself has not stood still either: as computers have advanced
in capability and efficiency, they have decreased in size (5).
Much of the advancement in biological research can be attributed to computers.
DNA, the genetic material, is built from four bases, adenine (A), guanine (G),
cytosine (C) and thymine (T), and determining their order is the basis of sequencing
(6). The very first DNA genome sequencing, that of bacteriophage ΦX174, was
completed in 1977, and it took considerable time because the methods used were
tedious and slow (7). The first human genome was sequenced at a cost of $3 billion
and took 13 years to complete (8, 9). With advances in instrumentation, it is now
expected that more than ten human genomes can be sequenced in a week, a challenge
that is driving new strategies in third generation sequencing (10, 11, 12).
Biological data are now generated at a stupendous rate; the large quantities of data
produced by DNA sequencing have required the development of new methods and
programs for sequence analysis, data storage, manipulation and data mining. This task
falls to bioinformatics, a field where complex biological problems are solved with the
speed and accuracy of computers. Over the years bioinformatics has evolved from an
application into a fully-fledged discipline without which biological research would
find itself in dire straits (13). It has applications in almost all corners of the life
sciences: crop protection, evolutionary studies, insect-plant interaction, functional
and structural genomics, proteomics, drug discovery and development, database
development, pest management, agriculture, software development, etc. (14, 15, 16).
This chapter is therefore an attempt to describe entomo-informatics: some of its key
concepts, the tools used in the field, and the opportunities for new development and
improvement. The entomological databases most relevant to the material are
accessible to anyone with a connection to the World Wide Web (WWW) and suitable
Internet browser software; URL addresses are given in Table 1. The first section
describes the different types of databases (genome, protein and insect-specific
databases). The second section focuses on how these online resources are used in
entomology and describes methods in bioinformatics in detail, with suitable examples
(sequence analysis, computational proteomics, phylogenetic analysis, transcriptome
analysis, etc.). The third section addresses structural bioinformatics combined with
molecular docking, using the Spodoptera litura ecdysone receptor (Sl-EcR) protein as
an example.
2. Databases
Insects are the largest and one of the most diverse animal groups in the world;
about 75% of all species are insects (17). They are ecologically and economically
important, ranging from highly beneficial species to harmful pests. Harmful insects
transmit severe diseases and damage crops, leading to agricultural calamity. Crop
pollination, crop protection and the production of silk and honey are some of the
benefits this diverse group provides to mankind. The enormous volume of DNA
sequencing has led to the availability of whole genome sequences, expressed sequence
tags and genetic linkage maps and, together with insect transgenesis, has opened up
new vistas for fundamental research in entomology (18). In the following sections we
look at database resources in bioinformatics in general and at those meant exclusively
for insects.
The insect genomic databases contain information on the proteins and on the
biochemical and physiological processes that occur in an insect. The major common
platforms for storing biological data are NCBI (19), DDBJ (20) and EMBL (21); these
databases store data under various sub-disciplines. The Entrez platform at NCBI is a
versatile biological search engine that helps to track down the required content
within NCBI. With so much research activity across the globe, and in order to avoid
redundancy of data in the above-mentioned genome databases, the International
Nucleotide Sequence Database Consortium (INSDC) was formed (22). Under this
consortium and its agreement, these databases share the data available with them on
a regular basis. This ensures public access to all the available biological data and,
most importantly, eliminates data redundancy, which helps the scientific community.
The INSDC has a uniform policy of free and unrestricted access to all of the data
records contained in its databases. Scientists worldwide can access these records to
plan experiments or to publish any analysis or critique.
Fig. 1. Growth of GenBank and WGS over the years: (a) base-wise data; (b) sequence-wise data.
From 1982 to the present, the number of bases in GenBank has doubled approximately every 18
months.
NCBI provides access to the whole genomes of over 3,200 organisms. These
genomes represent both completely sequenced organisms and those for which
sequencing is in progress. NCBI's sequence databases accept genome data from
sequencing projects around the world and serve as the cornerstone of bioinformatics
research. Under the nucleotide database category it hosts the GenBank, EST, GSS,
HomoloGene, HTG, SNP, RefSeq, STS, UniSTS and UniGene databases, each with its own
scope. For example, GenBank Release 3 in December 1982 held a mere 606 sequences
comprising 680,338 bases; this grew exponentially to 164,136,731 sequences
comprising 151,178,979,155 bases by Release 195 in April 2013 (Fig. 1). NCBI now
stores over 80 million sequences (19).
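As a rough check on the growth figures quoted above, the average doubling time can be estimated from the two release sizes. The short Python sketch below assumes steady exponential growth between the two releases, which is a simplification; the function name is ours and is not part of any NCBI tool.

```python
import math
from datetime import date

def doubling_time_months(start_bases, end_bases, start, end):
    """Average doubling time in months, assuming steady exponential growth."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    doublings = math.log2(end_bases / start_bases)
    return months / doublings

# Base counts quoted in the text for GenBank Release 3 and Release 195.
estimate = doubling_time_months(
    680_338, 151_178_979_155, date(1982, 12, 1), date(2013, 4, 1)
)
print(f"Average doubling time: {estimate:.1f} months")
```

Under this assumption the estimate comes out at roughly 20 months, in the same range as the 18-month figure given in the caption of Fig. 1.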
With so much data stored, how much of it belongs to the most diverse group, the
insects? A search with the keyword insecta at the Entrez search engine returned a
significant number of records (Fig. 2). The genome tab reports 4,772,610 records in
the core subset of nucleotide sequences. At the time of writing, 285 whole genome
sequences pertaining to insects and/or their pathogens have been deposited in the
GenBank database. Whole genome sequences are not the only submissions to these
databases; researchers interested in specific genes sequence them and submit them to
the various repositories, and from that point of view 406,923 gene-centered records
are available for insects. So far 1,748 insect-exclusive BioProjects have been
completed or are underway, an ocean of insect-specific data that could pave the way
for cross-species research and data analysis.
Fig. 2. Hits obtained from an NCBI Entrez search across its in-house databases using the keyword
insecta. The numbers in blue give the number of records available in a specific database type (in
brown), followed by the definition of the database type (in grey) (as of April 2013).
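A keyword search of this kind can also be issued programmatically through the NCBI E-utilities ESearch endpoint. The sketch below only builds the query URLs and does not contact NCBI; the helper name and the choice of database identifiers (nuccore, gene, bioproject) are our assumptions for illustration, and the counts returned today will differ from those in Fig. 2.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db, term, retmax=0):
    """Build an ESearch URL; retmax=0 asks for the hit count only."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

# One query per Entrez database of interest (illustrative selection).
for database in ("nuccore", "gene", "bioproject"):
    print(esearch_url(database, "insecta"))
```

Opening one of these URLs in a browser, or fetching it with any HTTP client, returns a JSON document whose count field gives the number of matching records.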
Genes code for proteins, and protein sequences are more variable than gene
sequences because they are built from 20 amino acids rather than 4 bases. Variation
in this class of macromolecules therefore holds a vital key as well. It underlies the
important concept of mutations leading to an altered phenotype. Analysis at the
protein level can thereby unravel many hidden secrets: structural and functional
motifs can be predicted and assigned a specific function based on the pattern in
which the amino acids occur, forming a signature that identifies a specific protein
family.
3. Insect Specific Databases

DroSpeGe is a platform for finding the completed genomes of the genus
Drosophila. Genomes of twelve Drosophila species (persimilis, willistoni,
pseudoobscura, sechellia, simulans, grimshawi, mojavensis, virilis, ananassae, erecta,
kikkawai and yakuba) are available there for comparative studies (23). DroSpeGe
provides a resource for biologists interested in comparing differences and
similarities among species, including novel and known genes, genome structure and
evolution, and gene function associations. It incorporates the BLAST search tool for
finding similar sequences, and the newest assemblies are integrated with genome maps.
A utility named BioMart is an additional feature of DroSpeGe for data mining of
annotations and sequences.
FlyBase is another Drosophila-exclusive database that holds up-to-date genome
sequencing information (24). Scientists in Drosophila research also frequent this
database for gene annotations, phenotypic data and expression data. FlyView and
GIFTS (Gene Interactions in the Fly) provide, respectively, a graphical atlas of gene
expression patterns and a catalog of gene interactions in Drosophila pattern
formation. A catalog of Drosophila-related EST sequences is housed at DRES. Several
other tools are available exclusively for this most studied organism, which serves as
a model for geneticists and for developmental and molecular biologists. The next
most studied category of insects comprises the medically important and most
dangerous pests: the mosquitoes and other vectors of serious diseases such as yellow
fever (the mosquito A. aegypti), Chagas disease (the blood-sucking bug R. prolixus),
malaria (the mosquito A. gambiae), elephantiasis (the mosquito C. pipiens), and
typhus (the body louse P. humanus).
The National Institute of Allergy and Infectious Diseases (NIAID), a component
of the National Institutes of Health (NIH), funds five Bioinformatics Resource
Centers (BRC). One of them, VectorBase, focuses on invertebrate vectors of human
pathogens and helps the sequencing centers and the research community to curate
vector genomes (25). It is a large platform offering support that ranges from access
to the sequenced genomes of selected species, through tools and resources for
analyzing these genome data, to data submission. The data section of the VectorBase
website provides access to genomes, transcripts and transcriptomes, proteins and
proteomes, and mitochondrial sequences. The tools and resources section offers
analysis tools such as a genome browser, BLAST, Expression Map, a population biology
browser, an ontology browser, ClustalW, images, BioMart and insecticide resistance
tools, which help the researcher in cross analysis. Currently, VectorBase hosts nine
complete genomes of different species (Fig. 3). It is a single platform that the
research community can turn to for data and analysis on vectors of human pathogens.
Table 1. World Wide Web resources for insects
Description                                      URL address
General
Insect Genome Databases                          http://www.hgmp.mrc.ac.uk/GenomeWeb/insect-gen-db.html
                                                 http://www.biologie.uni-hamburg.de/b-online/library/genomeweb/
National Center for Biotechnology Information    http://www.ncbi.nlm.nih.gov/
Insect Innate Immunity Database                  http://www.vanderbilt.edu/IIID
Agri. Pest Genomics Database                     http://www.agripestbase.org
Drosophila
FlyBase @ flybase.bio.indiana.edu                http://flybase.bio.indiana.edu/
GadFly: Genome Annotation                        http://www.fruitfly.org/annot/index.html
Flybrain nervous system database                 http://flybrain.neurobio.arizona.edu/
BDGP: Home                                       http://www.fruitfly.org/
Drosophila melanogaster @ NCBI                   http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/7227.html
fly.txt (SwissProt Index - FlyBase Cross-Refs)   http://kr.expasy.org/cgi-bin/lists?fly.txt
TIGR Drosophila gene index                       http://www.tigr.org/tdb/dgi/
Access the Metamorphosis Database                http://quantgen.med.yale.edu/
Drosophila Immune Response                       http://www.fruitfly.org/expression/immunity/
Silkworm
SilkDB, China                                    http://silkworm.genomics.org.cn/
Silkworm Genome Project (SGP)                    http://sgp.dna.affrc.go.jp/index.html
SilkBase (ESTs of Bombyx mori)                   http://www.ab.a.u-tokyo.ac.jp/silkbase/
SilkSatDb                                        http://www.cdfd.org.in/silksatdb
Mosquitoes
ENSEMBL Mosquito Genome Server                   http://www.ensembl.org/Anopheles_gambiae/
The Mosquito Genomics WWW Server                 http://mosquito.colostate.edu/
An Anopheles Database (AnoDB)                    http://konops.anodb.gr/AnoDB/
Anopheles gambiae                                http://www.anobase.org/
BBMI Anopheles gambiae Genome Page               http://bioweb.pasteur.fr/BBMI/index.html
TIGR Anopheles gambiae gene index                http://www.tigr.org/tdb/aggi/
Others
Pherobase - database                             http://www.pherobase.com/
Hymenoptera Genome Database                      http://hymenopteragenome.org/
Honey Bee Brain EST Project                      http://titan.biotec.uiuc.edu/bee/honeybeeproject.htm
KSU Tribolium Genetics Program                   http://www.ksu.edu/tribolium/
Butterflies and Moths of North America           http://www.butterfliesandmoths.org
The California beetle databases                  http://www.sbcollections.org/cbp/cbp_db_informatics.aspx
Caterpillars: Australian region                  http://lepidoptera.butterflyhouse.com.au/
Aphidbase                                        http://www.aphidbase.com/aphidbase/
Spodobase                                        http://bioweb.ensam.inra.fr/spodopbase/
Ant species of the world                         http://antbase.org
Blattodea species database                       http://blattodea.speciesfile.org/HomePage/Blattodea
Lepidoptera, Sphingidae                          http://www.cate-sphingidae.org/
ROBO, USA                                        http://www.ars-grin.gov/nigrp/robo.html
EDWIP                                            http://cricket.inhs.uiuc.edu/edwipweb/edwipabout.htm
CAPS                                             http://pest.ceris.purdue.edu/
Systemic Entomology Laboratory                   http://www.sel.barc.usda.gov/selhome/database.htm
BACPAC Resources                                 http://www.chori.org/bacpac/
CUGI (Clemson University Genomics Institute)     http://www.genome.clemson.edu/
Malaria Research & RR Resource Center            http://www.malaria.atcc.org
Insect protein Domain Analysis                   http://bioinf.ibun.unal.edu.co/insecta/
As mentioned earlier in this chapter, insects can be beneficial or harmful. From
the agricultural point of view, the beneficial insects include the silkworm Bombyx
mori, which produces silk and also serves as a central model organism for
lepidopteran genomics, facilitating comparative genomics and basic research leading
toward new genome-based approaches for sericulture and pest control (10; see the
insect genomics resources in Table 1). SilkDB is an integrated genome resource
database for B. mori. Apart from providing access to genomic data including
chromosomal mapping, gene products and functional annotation of genes, it also
provides extensive biological information such as ESTs, microarray expression data
and corresponding references (26).
SilkBase, a B. mori-exclusive EST database, was developed from randomly
selected cDNAs from 36 libraries, yielding 35,000 EST sequences. This was followed
by the complete genome sequencing of this beneficial species in 2004. The sequencing
revealed that B. mori has more protein-coding genes than Drosophila and that many
B. mori genes have no homologs among Drosophila genes (27, 28).
Microsatellites are an important class of repeats found in all eukaryotic genomes
and are widely used in applications including genetic distance measures and
phylogeny reconstruction, population genetics, genetic mapping, predicting
evolutionary history and forensics. The study of microsatellites helps in genetic
fingerprinting of diverse silkmoths and in the construction of molecular linkage
maps, in addition to the basic understanding of microsatellites. A group of Indian
scientists at the Centre for DNA Fingerprinting and Diagnostics (CDFD), India, in
collaboration with the National Institute of Agrobiological Sciences (NIAS), Japan,
therefore developed SilkSatDb, an online relational database that catalogues
information about the microsatellite repeats of the silkworm, extracted from the
available EST and WGS sequences of B. mori. The database stores three kinds of data:
the microsatellite repeats found in B. mori EST and WGS sequences, sequence details,
and the primers developed for these microsatellites (29).
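To make the notion of a microsatellite concrete, the following Python sketch scans a DNA string for short motifs repeated in tandem. It is only an illustration and not the method used to build SilkSatDb: the motif length limit (1 to 6 bases), the minimum of four copies and the example sequence are all assumptions chosen for the demonstration.

```python
import re

def find_microsatellites(seq, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found in seq."""
    # The lazy quantifier prefers the shortest repeating unit; the
    # back-reference \1 then requires at least min_repeats - 1 further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical sequence for illustration only: five copies of AC flanked by other bases.
example = "TTGACACACACACAGGT"
print(find_microsatellites(example))
```

For the example sequence the function reports a single dinucleotide repeat, the motif AC present in five copies starting at position 3 (counting from zero).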
Fig. 3. Completely sequenced genomes of various species hosted at VectorBase, with the gene
count and genome size (in base pairs) of each.
The idea of SilkSatDb was further extended by CDFD, India, on its own, into a
new and larger database named InSatDb. InSatDb (Fig. 4) is a platform similar to
SilkSatDb but on a larger scale: it covers multiple insects and offers more
information, such as the genomic location of a microsatellite (exon, intron or
transposon) and its sequence composition, including repeat motif and GC%. It
provides an interactive interface for querying microsatellite information across five
fully sequenced genomes: fruit fly, honeybee, malarial mosquito, red flour beetle and
silkworm. The CDFD study found that the percentage of microsatellites in exons is
highest for Drosophila and lowest for B. mori, whereas in intronic regions B. mori
accounts for the greater percentage of microsatellites (30).
Continuing with insects beneficial to mankind, the honey bee (Apis mellifera) is
a model species for social behavior and is essential to global ecology as a
pollinator. A. mellifera belongs to the order Hymenoptera, which comprises
approximately 10% of the species on earth. This group of 'membrane-winged' insects
includes sawflies, bees, ants and wasps, which directly affect human health and
agriculture through diverse roles as pollinators, pests and parasitoids. The
Hymenoptera Genome Database (HGD) is a resource for the genomics of this order. HGD
hosts the genomes of the bees A. mellifera, B. impatiens and B. terrestris under
BeeBase, a comprehensive sequence data source for the bee research community (31).
The Bee Pests and Pathogens tab under BeeBase holds the available genomic
information on pests and pathogens of bees, namely Ascosphaera apis (32), Nosema
ceranae (33), Paenibacillus larvae (32) and Varroa destructor, and is supported by
the BeeBase Wiki. BeeBase also supports PSI-BLAST searches of special protein data
sets that combine the GenBank non-redundant protein set with honey bee predicted
genes. The PSI-BLAST site has enabled researchers to identify divergent paralogs that
may not be easily identified with other BLAST programs. Genome browsers for the ant
species Acromyrmex echinatior, Atta cephalotes, Camponotus floridanus, Harpegnathos
saltator, Linepithema humile, Pogonomyrmex barbatus and Solenopsis invicta,
available through the Ant Genomes Portal, and for the parasitoid wasp Nasonia
vitripennis make it easy to locate genes on a genome map (31).
Fig. 4. Database query page of InSatDb, the Insect Microsatellite Database. One can look up and
search for different characteristics of a microsatellite against the whole genome of a species of
interest.
4. Protein Databases
The genome browser gives a detailed picture of where a gene is located; genes
primarily function by being expressed as proteins. Any change in the gene sequence
may have consequences ranging from a silent mutation to a lethal mutation in which
the function and structure of the protein are completely lost. The next level of
bioinformatics study requires knowledge of each amino acid, from its abbreviations
to its physical and chemical properties (the single-letter and three-letter codes are
shown in Table 2). Proteins are complex macromolecules necessary for most of the
cell's work, including the structure, function and regulation of the body and its
environment. Sophisticated databases at the protein level are plentiful.
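The range of outcomes of a single base change can be illustrated with the standard genetic code. The Python sketch below classifies a point mutation within one codon as silent, missense or nonsense; the function name and the example codons are ours, and the classification is deliberately simplified (it says nothing about whether a missense change is tolerated or lethal).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution at a 0-based position in a codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "silent"      # same amino acid, protein unchanged
    elif after == "*":
        kind = "nonsense"    # premature stop codon, protein truncated
    else:
        kind = "missense"    # different amino acid
    return mutant, before, after, kind

# Illustrative substitutions in a glutamate codon and a tyrosine codon.
for codon, pos, base in (("GAA", 2, "G"), ("GAA", 1, "T"), ("TAT", 2, "A")):
    print(codon, "->", classify_point_mutation(codon, pos, base))
```

The three examples cover one case of each kind: GAA to GAG still encodes glutamate, GAA to GTA replaces glutamate with valine, and TAT to TAA converts a tyrosine codon into a stop codon.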
Table 2. The twenty amino acids with their 3-letter and 1-letter codes and their secondary
structure propensity values for alpha helix and beta strand. * Polar (may participate in hydrogen
bonds); $ Hydrophobic (normally buried inside the protein core); # Charged.
Amino Acid        Single Letter Code   Three Letter Code   Alpha Helix Propensity   Beta Strand Propensity
Arginine#         R                    ARG                 0.79                     0.90
Lysine#           K                    LYS                 1.07                     0.74
Aspartic Acid#    D                    ASP                 0.98                     0.80
Glutamic Acid#    E                    GLU                 1.53                     0.26
Glutamine*        Q                    GLN                 1.17                     1.23
Asparagine*       N                    ASN                 0.73                     0.65
Histidine*        H                    HIS                 1.24                     0.71
Serine*           S                    SER                 0.79                     0.90
Threonine*        T                    THR                 0.82                     1.20
Tyrosine*         Y                    TYR                 0.61                     1.29
Cysteine*         C                    CYS                 0.77                     1.30
Methionine*       M                    MET                 1.20                     1.67
Tryptophan*       W                    TRP                 1.14                     1.19
Alanine$          A                    ALA                 1.45                     0.97
Isoleucine$       I                    ILE                 1.00                     1.60
Leucine$          L                    LEU                 1.34                     1.22
Phenylalanine$    F                    PHE                 1.12                     1.28
Valine$           V                    VAL                 1.14                     1.65
Proline$          P                    PRO                 0.59                     0.62
Glycine$          G                    GLY                 0.53                     0.81
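One simple way to use the propensity values of Table 2 is to average them over a short peptide segment and compare the helix and strand means. The Python sketch below does exactly that, keyed by the standard one-letter codes. It is a bare averaging heuristic written for illustration, not a full secondary structure prediction method, and the two example peptides are hypothetical.

```python
# (alpha helix, beta strand) propensities copied from Table 2,
# keyed by the standard one-letter amino acid codes.
PROPENSITY = {
    "R": (0.79, 0.90), "K": (1.07, 0.74), "D": (0.98, 0.80), "E": (1.53, 0.26),
    "Q": (1.17, 1.23), "N": (0.73, 0.65), "H": (1.24, 0.71), "S": (0.79, 0.90),
    "T": (0.82, 1.20), "Y": (0.61, 1.29), "C": (0.77, 1.30), "M": (1.20, 1.67),
    "W": (1.14, 1.19), "A": (1.45, 0.97), "I": (1.00, 1.60), "L": (1.34, 1.22),
    "F": (1.12, 1.28), "V": (1.14, 1.65), "P": (0.59, 0.62), "G": (0.53, 0.81),
}

def mean_propensity(segment):
    """Return the mean (helix, strand) propensity of a peptide segment."""
    helix = sum(PROPENSITY[aa][0] for aa in segment) / len(segment)
    strand = sum(PROPENSITY[aa][1] for aa in segment) / len(segment)
    return helix, strand

def favoured_structure(segment):
    """Name the structure with the higher mean propensity."""
    helix, strand = mean_propensity(segment)
    return "helix" if helix > strand else "strand"

# Hypothetical peptides for illustration only.
for peptide in ("EALEK", "VIVTY"):
    print(peptide, mean_propensity(peptide), favoured_structure(peptide))
```

With these values the glutamate- and alanine-rich segment EALEK averages higher for helix, whereas the valine- and isoleucine-rich segment VIVTY averages higher for strand.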

Depending on the type of data stored, a database can be classified as primary,
secondary or composite. The first database was created shortly after the insulin
protein sequence became available in the 1950s. Later, in 1972, the first protein
structure database, PDB (http://www.rcsb.org/pdb), was developed with a mere 10
entries, as the three-dimensional structures of proteins were being studied at that
time. A primary database contains information on the sequence or the structure
alone. These data are processed further to populate secondary databases, which are
derived from the primary databases and contain information on conserved sequences,
active-site amino acids, and motifs and domains of structural and functional
importance. Composite databases integrate the information available in more than one
primary database, which makes it easy to search across multiple databases from a
single platform with validated outputs.
Margaret Dayhoff holds a special place in the history of bioinformatics for
compiling the first comprehensive collection of macromolecular sequences, the Atlas
of Protein Sequence and Structure, published from 1965 to 1978 with the support of
the National Biomedical Research Foundation (NBRF). Her effort was the stepping
stone for the concept of biological databases. The primary databases include the
Protein Information Resource (PIR), established in 1984 by the NBRF as an integrated
public bioinformatics resource to support scientific research, drawing heavily on
the information in Dayhoff's Atlas (34). SWISS-PROT is a curated protein sequence
database that strives to provide a high level of annotation (such as the description
of the function of a protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level of redundancy and a high level of
integration with other databases. TrEMBL (Translated European Molecular Biology
Laboratory) is a computer-annotated supplement of SWISS-PROT that contains all the
translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT
(35).
The maintainers of these three major protein databases, namely EMBL-EBI
(http://www.ebi.ac.uk/), SIB (http://www.expasy.ch/swissmod/SWISS-MODEL.html) and
PIR (http://pir.georgetown.edu/), came together in 2002 and now coexist as the
UniProt consortium, supported mainly by the National Institutes of Health (NIH)
(36). The primary mission of this initiative was to support biological research by
maintaining a high quality database that serves as a stable, comprehensive, fully
classified, richly and accurately annotated protein sequence knowledgebase, with
extensive cross-references and querying interfaces freely accessible to the
scientific community. The UniProt Knowledgebase is updated on a daily basis, and its
cross-references to more than 120 databases provide access to additional relevant
information in more specialized data collections (37).
Other protein databases are organized around a specific mode of classification:
for instance, Pfam (38) and Prosite (39), based on protein families; InterProScan
(40), based on blocks and groups; and the structural databases PDB
http://www.rcsb.org/pdb/home/home.do (41), SCOP
http://scop.mrc-lmb.cam.ac.uk/scop/ (42), CATH http://www.cathdb.info/ (43) and
others. The wealth of information in bioinformatics is always dynamic, as databases
and tools are added and updated daily. This leaves a researcher with plenty of scope
to choose a platform relevant to the study at hand. Visualization software such as
RasMol (44), Swiss-PdbViewer http://spdbv.vital-it.ch/ (45), Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml (46) and PyMOL
(http://www.pymol.org/) is vital for analyzing the details hidden in a protein
structure, and these tools perform the task very efficiently.
Tasks ranging from simply viewing a structure to calculating the distance
between two atoms, or checking whether residues fall in the allowed regions of the
Ramachandran plot, are important parts of a bioinformatics study. Visualization and
validation software (http://molprobity.biochem.duke.edu/) lets a researcher view a
structure in whatever manner is required, which makes for a colorful experience.
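The distance measurement mentioned above is simply the Euclidean distance between the Cartesian coordinates of two atoms, which structure files report in angstroms. A minimal Python sketch follows; the coordinates are invented for illustration and do not come from any deposited structure.

```python
import math

def atom_distance(atom_a, atom_b):
    """Euclidean distance between two atoms given as (x, y, z) in angstroms."""
    return math.dist(atom_a, atom_b)

# Hypothetical coordinates for two atoms (illustration only).
donor = (12.0, 8.5, 3.0)
acceptor = (14.0, 10.5, 4.0)
print(f"{atom_distance(donor, acceptor):.2f} angstroms")
```

Visualization programs perform the same calculation when two atoms are picked on screen.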
5. Methods in Bioinformatics
5.1. Sequence Analysis
The modus operandi of biological research is that the sequence defines the
structure which derives its function; bioinformatics takes a shorter route to unravel
the function directly from the sequence. This technique of predicting the function
from its sequence is termed as sequence analysis. It is a phenomenon of comparing
two or more sequences against each other or against a database, where, if a significant
match is found between two sequences, then the same function can be assigned to the
unknown sequence as well. Now this brings us to a question what is that
terminology significant? One must have a basic understanding of the concepts
involved in protein biochemistry to settle this. Anyhow, the answer is simple; two
sequences can be similar, but identical. At a larger scale, two amino acids which have
same physical and chemical properties are termed as similar; example being,
aspartate and glutamate (acidic amino acids). And if a same amino acid is matched in
both the sequences, at the same position, it is termed as identical (Asp & Asp). There
are more basic terms in bioinformatics to be familiarized with. Two sequences are
said to be homologous if they have been inherited or derived from the common
633
ISBN:978-1-63315-205-2
Invited Review
ancestor. Homologous sequences are considered orthologous if they are found in
different organisms as a result of speciation and retain the same function (e.g.
hemoglobin, expressed across diverse life forms), and paralogous if they arose by a
duplication event within the same species, often resulting in an altered function (e.g.
hemoglobin and myoglobin).
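The distinction between identical and similar positions can be made concrete with a short sketch. The code below is only an illustration: the similarity groups are a simplified, hypothetical grouping chosen for this example (real tools derive similarity from a substitution matrix), and the function names are not taken from any library.

```python
# Simplified, hypothetical similarity groups used only for illustration.
SIMILAR_GROUPS = [
    set("DE"),     # acidic
    set("KRH"),    # basic
    set("ST"),     # hydroxyl-containing
    set("NQ"),     # amide-containing
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FYW"),    # aromatic
]


def classify_pair(a, b):
    """Classify one aligned column as gap, identical, similar or different."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def identity_and_similarity(seq1, seq2):
    """Return (percent identity, percent similarity) for two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    counts = {"identical": 0, "similar": 0, "different": 0, "gap": 0}
    for a, b in zip(seq1, seq2):
        counts[classify_pair(a, b)] += 1
    n = len(seq1)
    identity = 100.0 * counts["identical"] / n
    # Similarity counts identical positions as well as conservative replacements.
    similarity = 100.0 * (counts["identical"] + counts["similar"]) / n
    return identity, similarity


if __name__ == "__main__":
    print(identity_and_similarity("ADKL", "AEKV"))
```

In the toy pair above, Asp/Glu and Leu/Val count as similar but not identical, so the similarity is higher than the identity.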
Sequence alignment is a method of arranging two sequences to find out how related
they are. The sequences are written across the page, one under the other, with matching
characters placed in the same column or position, and the alignment is scored using a
scoring matrix. The alignment with the highest score is considered to be the optimal
alignment. An alignment between two sequences is termed a pairwise alignment, and one
involving three or more sequences is called a multiple sequence alignment. A structural
motif or a functional domain is very important to a protein, as it holds the key to its
identity as a member of a protein class. Alignments are of two types, local and global.
In a local alignment, importance is given only to a portion of the sequence that carries
a structural, functional or other important characteristic; an alignment performed from
the start of the sequences to the end (Fig. 5), to find as many matches as possible, is
called a global alignment (47).
Fig. 5. Comparative view of a global alignment (top) produced by the Needle program and a local
alignment (bottom) produced by the Water program, both available at EBI tools.
A sequence can be of any length; therefore, choosing a proper method for this
analysis is crucial. There are several methods; the most basic and intuitive is the
dot plot method of aligning sequences (47), a graphical method (graphical because it
looks like an ordinary graph). In this method one sequence is written across the top of
the page and the other is written vertically from top to bottom, so that each cell of
the grid corresponds to one character from each sequence. A dot is placed in every cell
where the two characters match. One then looks for diagonals formed by connecting these
dots. The length of a diagonal is proportional to the extent of the match between the
two sequences compared. This method is easy
to use for finding matching segments. With longer sequences the plot becomes noisy and
difficult to read; this can be addressed by applying a window size and a stringency
value, so that a dot is placed only when a minimum number of characters match within a
window.
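The dot plot, including the window size and stringency filter, can be sketched in a few lines. This is a minimal illustration rather than the implementation used by any particular tool; the function names and the text rendering are assumptions made for the example.

```python
def dot_plot(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) cells that receive a dot.

    seq_a runs across the top (index i) and seq_b runs down the side (index j).
    A dot is placed at (i, j) when at least `stringency` of the `window`
    characters starting at i in seq_a and j in seq_b are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: '*' for a dot, '.' for an empty cell."""
    lines = ["  " + seq_a]
    for j, ch in enumerate(seq_b):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_a)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "GATTACA", "GATCACA"
    print(render(a, b, dot_plot(a, b)))
    print()
    # A window of 3 with a stringency of 2 removes most isolated dots.
    print(render(a, b, dot_plot(a, b, window=3, stringency=2)))
```

With `window=1` every single matching character produces a dot; raising the window and stringency keeps the diagonals and suppresses background noise, which is the point made above for longer sequences.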
The dot plot is a good method for comparing two sequences; with multiple sequences,
however, the plot would become multidimensional. Hence there is scope for other
methods. The next in this category is dynamic programming, a mathematically and
computationally intensive method. To an extent it is similar to the dot plot method,
but it involves a scoring scheme. Global alignment by dynamic programming is achieved
with the Needleman-Wunsch algorithm (48), and the modified version that aligns local
segments is the Smith-Waterman algorithm (49). As in the dot plot, one sequence is
written across the top from left to right and the other vertically from top to bottom,
each cell corresponding to a pair of characters. One additional row and column are
added to accommodate the gap penalty. A gap is a blank space under or above a column or
position in either sequence, introduced to obtain as many matches as possible. Gaps
indicate an insertion or deletion at a particular position in a sequence during
evolution. Since these gaps are not a real part of the sequence and are inserted only
to obtain the optimal alignment, they have to be penalized; this is the gap penalty.
Generally, a scoring scheme gives a match a positive value, a mismatch zero or
sometimes a negative value, and a gap a larger negative value. The algorithm is then
followed to fill the cells with values. Once the matrix is complete, a traceback is
done to obtain the final alignment. More than one alignment may result from this
procedure. To narrow these down, each alignment is scored and the one with the highest
score is taken as optimal. Remember that there need not be only one possible way of
aligning two sequences.
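The matrix-filling and traceback steps can be sketched as follows. This is a minimal version of global alignment with a linear gap penalty; the default scores (match 1, mismatch 0, gap -2) are arbitrary values chosen for illustration, and when several alignments tie the traceback simply reports one of them.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=0, gap=-2):
    """Global alignment of seq_a (columns) and seq_b (rows).

    Returns (score, aligned_a, aligned_b). Scores are illustrative defaults.
    """
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    score = [[0] * cols for _ in range(rows)]

    # The extra first row and column hold the cumulative gap penalties.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    def pair_score(i, j):
        return match if seq_b[i - 1] == seq_a[j - 1] else mismatch

    # Fill every cell with the best of: diagonal (match/mismatch), up, left (gaps).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(seq_a[j - 1])
            bottom.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append("-")
            bottom.append(seq_b[i - 1])
            i -= 1
        else:
            top.append(seq_a[j - 1])
            bottom.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, a, b = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(a)
    print(b)
```

A local (Smith-Waterman style) variant would differ mainly in clamping cell values at zero and starting the traceback from the highest-scoring cell; that variant is not shown here.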
Dynamic programming is widely used and highly accurate, as it always finds the
optimal alignment for the chosen scoring scheme. It has drawbacks as well, the biggest
being that it performs a large number of calculations: the longer the sequences, the
more time and memory are needed to complete the alignment.
The next class of methods is the heuristic methods, which work by splitting the
sequence into smaller fragments (words), finding matches to those words and gradually
extending them. One of the best known and most trusted tools for finding similar
sequences is the Basic Local Alignment Search Tool (BLAST) (50). It is one of the
fastest methods for sequence comparison; it directly approximates alignments that
optimize a measure of local similarity. BLAST searches the query sequence against large
databases and aligns it to the hits; because the method is heuristic, the results may
not always be the best possible alignments.
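The word-based idea behind heuristic searches can be illustrated with a deliberately simplified sketch. It is not the BLAST algorithm itself: it uses exact word matches and ungapped extension that stops at the first mismatch, whereas BLAST scores neighbouring words with a substitution matrix and applies score thresholds. All function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def split_into_words(seq, k):
    """Return (position, word) for every overlapping word of length k."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) for every exact shared word."""
    index = defaultdict(list)
    for pos, word in split_into_words(subject, k):
        index[word].append(pos)
    seeds = []
    for qpos, word in split_into_words(query, k):
        for spos in index.get(word, []):
            seeds.append((qpos, spos))
    return seeds


def extend_seed(query, subject, qpos, spos, k):
    """Extend a seed left and right, without gaps, while characters match."""
    left = 0
    while (qpos - left - 1 >= 0 and spos - left - 1 >= 0
           and query[qpos - left - 1] == subject[spos - left - 1]):
        left += 1
    right = k
    while (qpos + right < len(query) and spos + right < len(subject)
           and query[qpos + right] == subject[spos + right]):
        right += 1
    return query[qpos - left:qpos + right]


if __name__ == "__main__":
    query, subject = "MKVLAT", "GGKVLAQ"
    for qpos, spos in find_seeds(query, subject):
        print(qpos, spos, extend_seed(query, subject, qpos, spos, 3))
```

Because only positions that share a word are examined, most of the dynamic programming matrix is never computed, which is where the speed of heuristic methods comes from and also why an optimal alignment is not guaranteed.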
Fig. 6 illustrates the use of BLAST to find sequences similar to a query sequence.
In this exercise the query is the mitochondrial Cytochrome Oxidase Subunit I (COI)
sequence of the dusky cotton bug Oxycarenus laetus Kirby, a member of the family
Lygaeidae in the order Hemiptera. This partial COI sequence of 222 amino acids was
searched against the non-redundant database using the BlastP program. The graphical
summary indicates the family to which the query sequence belongs (Cytochrome Oxidase I
superfamily). Further, the hits obtained from this search are confined to the CO family
from lygaeid species of the class Insecta. The results are explained here with just one
hit. The lower section of the figure shows the alignment between the query sequence and
this hit (Oxycarenus sp. WCW-2003): it has no gaps, and the two sequences share 98%
sequence identity with a very low E-value of 3e-136 and 93% query coverage. The first
hit is a match to the query itself and is therefore not considered.
BLAST has several variants: BlastP, BlastN, BlastX, TBlastN and TBlastX. More
specialised versions are also available, such as PHI-Blast, PSI-Blast, Mega Blast,
WU-Blast and others. Explanations of these types can be found in any bioinformatics
textbook. Another similarity search tool is FASTA (51), which also follows a heuristic
approach to aligning sequences.
Fig. 6. Blast results of similar sequence search performed using COI of Oxycarenus laetus Kirby
as query sequence.

Substitution matrices are the currency of sequence analysis and play a vital role in
deciding the quality of alignments. The two major families of matrices used in protein
sequence analysis are the Percent or Point Accepted Mutation (PAM) matrices and the
Blocks Substitution Matrix (BLOSUM) series. PAM, derived by Margaret Dayhoff, is based
on the frequency at which one amino acid mutates into another, calculated with due
regard to the rate of evolution; hence these are considered evolutionary matrices (52).
The Dayhoff matrices were obtained from closely related proteins, and so may not be
ideal for highly divergent groups of sequences. This was addressed by Henikoff and
Henikoff, who introduced the BLOSUM matrices, constructed from multiple alignments of
evolutionarily divergent sequences. These matrices are based only on highly conserved
regions, with an implicit model of evolution. The values are calculated from blocks of
conserved segments, using a sequence identity threshold (53). With so much to choose
from, the question is which matrix to use for a given analysis. The choice is not
difficult: if the sequences under study are highly conserved, one would choose a
higher-numbered BLOSUM or a lower-numbered PAM matrix (e.g. BLOSUM80 or PAM30), and
vice versa for divergent sequences.
Pairwise sequence alignment is a comparison between two fixed sequences based on
string similarity; the inference therefore rests on the selected sequence alone, which
may not always be the best basis for predicting the function of an unknown sequence. It
also becomes cumbersome to perform many pairwise alignments. Instead, one can compare a
query sequence against a class of sequences by means of a multiple sequence alignment
(MSA). MSAs are used to find regions of sequence similarity that could point to the
structure of an evolutionary ancestor or provide information about the evolutionary
history of the sequences. MSAs are also more sensitive to sequence similarities than a
pairwise alignment, because conserved regions can be so dispersed that a pairwise
alignment would not find them.
Fig. 7A shows a typical multiple alignment of mitochondrial CO1 sequences from 12
insect species. Further phylogenetic analysis of B. mori mitochondrial CO1 (Fig. 7B)
showed that the sequences are highly conserved across the other insect species and are
therefore likely to share the same protein structure and physiological properties
(Fig. 7C, D, E). An MSA is easy and intuitive to read when viewed in a color-coded
format. It is evident from Fig. 7 that almost all the sequences are highly conserved
and share a great deal of similarity. A few positions have mismatches; for example, at
position 15 the first two species have asparagine (N), a polar uncharged amino acid,
aligned with glutamate, an acidic amino acid, hence the different colors. The asterisk
(*, identical characters), the colon (:, similar characters) and the space (varying
characters) given at the base of every column help researchers to track down the
variable positions quickly. The image shown here is only partial.
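The conservation symbols at the base of each column can be reproduced with a small sketch. The similarity groups below are a simplified, hypothetical grouping chosen for illustration and are not the exact groups used by ClustalW; the function name is likewise an assumption.

```python
# Simplified, hypothetical similarity groups (not the exact ClustalW groups).
GROUPS = [set("DE"), set("KRH"), set("ST"), set("NQ"), set("AVLIM"), set("FYW")]


def conservation_line(aligned):
    """Return one symbol per column: '*' identical, ':' similar, ' ' variable."""
    if len(set(map(len, aligned))) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    symbols = []
    for column in zip(*aligned):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            symbols.append("*")
        elif any(residues <= group for group in GROUPS):
            symbols.append(":")
        else:
            symbols.append(" ")
    return "".join(symbols)


if __name__ == "__main__":
    msa = ["MDKL", "MEKV", "MDRT"]
    for row in msa:
        print(row)
    print(conservation_line(msa))
```

In this toy alignment the first column is fully conserved, the second and third contain only conservative replacements, and the last column is variable.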

Fig. 7. A) A multiple sequence alignment of 12 Cytochrome Oxidase Subunit I sequences of
various insect species performed using the ClustalW tool available at EBI tools. B) Phylogenetic
analysis of Bombyx mori mt CO1 and 3D homology of B. mori CO1.

Constructing an MSA is a trivial exercise only when the sequences are highly
conserved. In our example there are no gaps because there is no ambiguity in the
sequences; no indels (insertions / deletions) are needed to make the alignment, and the
ungapped sequences can simply be stacked together. However, if the sequences are of
various lengths, the problem becomes potentially very complex. Various algorithms are
available to carry out this task, such as the sum of pairs (SP) method, a very good
method that nevertheless becomes impractical as the sequences increase in length.
Faster methods are therefore needed, and such techniques exist. Other methods of MSA
include the progressive and iterative methods and profile methods based on hidden
Markov models (HMM) (47). Discussion of these is beyond the scope of this chapter.
MSAs are very helpful and pave the way for further understanding of the sequences:
other areas of bioinformatics, such as identifying patterns of conservation and
phylogeny reconstruction, depend on the MSA as their input.
5.2. Computational proteomics

The objectives of proteomics include large-scale identification and quantification
of all protein types in a cell or tissue, analysis of post-translational modifications
and associations with other proteins, and characterization of protein activities and
structures. Applications of proteomics in insects are still in their initial phase,
mostly in protein identification. However, recent efforts such as the insect genome
projects encourage further investigation of protein structures and functions and the
conceptualization of complex systems with descriptive and predictive computational
models.
i) Electrophoresis Analysis
Electrophoresis analysis can be used to investigate, qualitatively and
quantitatively, the expression of proteins under different conditions (54). Several
bioinformatics tools have been developed for two-dimensional (2D) electrophoresis
analysis: to identify proteins (SWISS-2D PAGE map: http://au.expasy.org/ch2d/;
http://us.expasy.org/swiss-2dpage), to annotate gels (Melanie 2D gel:
http://au.expasy.org/melanie/; http://gelbank.anl.gov/index.asp), and to compare gel
images visually (Flicker 2D gel image comparison:
http://open2dprot.sourceforge.net/Flicker/; PDQuest: http://www.protemeworks.bio-rad.com).
ii) Mass Spectrometry
In general, after protein separation by 2D gel electrophoresis or liquid
chromatography and digestion with an enzyme (trypsin, pepsin, etc.), proteins are
typically identified
using mass spectrometry (MS). MS provides a high-throughput approach for large-scale
protein identification. The data generated by mass spectrometers are often complicated,
and computational analysis is critical in interpreting them for protein identification
(55, 56). There are two types of MS-based protein identification methods: peptide mass
fingerprinting (PMF) and tandem mass spectrometry (MS/MS). Several of the most widely
used computational tools are expensive commercial packages rather than open-source
software (Mascot: http://www.matrixscience.com; MS-Fit: http://prospector.ucsf.edu/;
PMF-Emowse: http://emboss.sourceforge.net/; SEQUEST: http://fields.scripps.edu/sequest/).
The key challenge in metabolite profiling is the rapid, consistent and unambiguous
identification of metabolites from complex insect samples. Currently the most mature
technology for rapid metabolite profiling is gas chromatography coupled with electron
impact (EI) quadrupole or time-of-flight (TOF) MS (GC-MS). Using this approach, it is
possible to quantify simultaneously several hundred chemically diverse compounds (i.e.
organic acids, most amino acids, sugars, sugar alcohols, aromatic amines and fatty
acids) from a single insect tissue sample. Volatile metabolites can be separated and
quantified directly by GC-MS. Compatible and reliable label-based and label-free
techniques have also been introduced. These advances now require further developments
in bioinformatics and downstream validation, technologies that are needed to make sense
of complex data and to enable researchers to draw more meaningful and accurate
inferences. A significant advantage of GC-MS with EI ionization is the availability of
many searchable mass spectral libraries. The largest commercial EI libraries are the
2005 NIST/EPA/NIH Mass Spectral Library (http://www.nist.gov/srd/nist1.htm), containing
190,825 spectra, and the 7th edition of the Wiley Registry of Mass Spectral Data
(http://www.wiley.com), which contains 338,000 EI mass spectra (or over 460,000 EI mass
spectra when combined with the NIST library). In addition to the commercial EI
libraries, a number of public libraries are available. Unfortunately, most mass
spectral libraries are tailored toward the chemical industry, drug studies or natural
product discovery, and therefore do not cover a large number of naturally occurring
metabolites and their intermediates. This limits their applicability to metabolomics
studies.
5.3. Transcriptome Analysis
Cutting-edge technologies, especially DNA microarrays, have proved powerful for
observing the transcriptional profile (the measurement of transcript abundance for
thousands of genes) of insect genes at a genome-wide level (expression profiling).
Microarray data are also being combined with other information such as regulatory
sequence analysis, gene ontology, alternative splice variants and pathway information
to infer co-regulated processes. Observing the patterns of transcriptional activity
that occur under different conditions such as
genotypes or time courses reveals genes that have highly correlated patterns of
transcript expression. Many tools are available that perform a variety of analyses on
large microarray data sets (GCOS; GeneSpring: http://www.agilent.com/chem/genespring;
caArray: http://caarray.nci.nih.gov/; Bioconductor: http://www.bioconductor.org).
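One simple way to look for genes with correlated expression patterns is to compute the Pearson correlation between their profiles across conditions. The sketch below is only an illustration of that idea: the gene names, expression values and the 0.9 threshold are invented for the example, and dedicated packages such as those listed above offer far more complete analyses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (a constant profile has zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with r at or above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                pairs.append((g1, g2, round(r, 3)))
    return pairs


if __name__ == "__main__":
    # Hypothetical expression values over four time points.
    profiles = {
        "geneA": [1, 2, 3, 4],
        "geneB": [2, 4, 6, 8],
        "geneC": [4, 3, 2, 1],
    }
    print(correlated_pairs(profiles))
```

Here geneA and geneB rise together and are reported as a correlated pair, whereas geneC follows the opposite trend and is excluded.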
5.4. Phylogenetic Analysis
Life on this planet diverged early into three lineages, known as the bacteria, the
archaea and the eukarya (57). Since the days when an organism was identified by its
morphological characters, the field of taxonomy has gathered pace, and modern
evolutionary studies take a very different approach. The study of evolutionary history
relies on genealogical theory, which assumes that all alleles, genes, individuals,
populations and higher taxa (species, genera, etc.) that have ever existed arose from
pre-existing alleles, genes, individuals, populations and higher taxa, respectively
(47).
Phylogenetics on sequence data is an attempt to reconstruct the evolutionary
history of those sequences. Phylogenetic relationships are usually depicted as trees,
in which branches connect ancestors to their descendants and the tips of the tree
(individual organisms or sequences) are the leaves. Individual branch points are nodes.
Trees can be classified into two types, rooted and unrooted. Rooted trees have an
explicit common ancestor, and the direction of time is explicit in them; unrooted trees
do not have an explicit ancestor. The branching pattern of a phylogenetic tree can be
used to convey information about the order in which evolutionary events occurred. Trees
can also be classified as scaled or unscaled. Scaled trees are those in which branch
lengths are proportional to the differences between pairs of neighboring nodes. Scaled
trees are also additive, meaning that the physical length of the branches connecting
any two nodes is an accurate representation of their accumulated difference. In
contrast, unscaled trees line up all terminal nodes and convey only their relative
kinship, without representing the number of changes. Briefly, phylogenetic analysis
methods are of two types, distance based and character based (47).
Distance based methods are also called phenetic or clustering methods, because of
the way in which they work. They are basic, easy and convenient, and are the methods
most often followed by the phylogenetic community. The most common distance based
methods are the Unweighted / Weighted Pair Group Method with Arithmetic Mean (UPGMA /
WPGMA), the Neighbor Joining (NJ) method and the Fitch-Margoliash method.

In the UPGMA method (58) pairwise evolutionary distances are calculated, usually
as an estimate of the number of non-matching characters between the sequences. A
distance matrix is constructed, and clustering then starts from the pair with the least
distance. First the two most closely related taxa, A and B, are clustered together, and
the distance of the new cluster to each other taxon C is calculated as
$d_{(AB)C} = (d_{AC} + d_{BC})/2$. (When the clusters being joined contain different
numbers of taxa, UPGMA weights this average by cluster size.) Doing this for all the
remaining taxa gives a reduced distance matrix, which is used for the next round of
clustering. This continues until the last two clusters are joined. The length of the
branch at each step is determined by the difference in heights of the nodes at each end
of the branch.
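The clustering loop described above can be sketched compactly. The code is a minimal illustration under stated assumptions: distances are supplied as a dictionary keyed by pairs of taxon names, clusters are represented as nested tuples, ties are broken by whichever pair is encountered first, and the three-taxon distances in the example are invented.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon names.
    dist:  dict mapping (a, b) pairs to distances, one entry per unordered pair.
    Returns (tree, height): the tree as nested tuples and the height of each node.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for name in names}
    height = {name: 0.0 for name in names}
    clusters = list(names)

    while len(clusters) > 1:
        # Select the pair of clusters separated by the smallest distance.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = (a, b)
        size[merged] = size[a] + size[b]
        # The new node sits at half the distance between the joined clusters.
        height[merged] = d[frozenset((a, b))] / 2
        # Distance to every other cluster: average weighted by cluster size.
        for c in clusters:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / size[merged]
        clusters = [c for c in clusters if c != a and c != b] + [merged]

    return clusters[0], height


if __name__ == "__main__":
    distances = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 6}
    tree, height = upgma(["A", "B", "C"], distances)
    print(tree)
    print(height[("A", "B")], height[tree])
```

For the toy distances, A and B are joined first and the distance from the (A, B) cluster to C is (4 + 6)/2 = 5, as in the formula in the text; the branch from the (A, B) node to the root is the difference between the two node heights.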
The UPGMA method is not much in use today; it has given way to the NJ method (59).
Although NJ also works on the concept of distance, it differs in the way it functions.
A distance matrix is created for all the taxa. The pair with the minimal
(rate-corrected) distance is selected, and the distances from these two nodes to the
new node that directly links them are calculated. This new node is then treated as a
taxon for further clustering, and the steps are repeated. The initial tree looks like a
star, which is gradually converted into a tree. Additivity is a property of the
distance measure used. One test of a valid distance measure is that the sum of any two
of the distances among three taxa must be greater than or equal to the third distance.
The NJ method employs common clustering techniques and is therefore easy and efficient
to execute and understand. The resulting tree is unrooted.
The character based methods are highly effective but take an enormous amount of
time to complete a task. The Maximum Parsimony (MP) method measures the number of
changes needed to convert one sequence into another (60). Preference is given to the
tree that requires the smallest number of changes or substitutions. The method works by
searching through the possible tree structures and assigning a cost to each tree. Its
premise is that taxa sharing a common characteristic do so because they inherited that
characteristic from a common ancestor. MP does not find branch lengths but the overall
length of the tree in terms of the number of changes. Since this method evaluates trees
for every column of the alignment and has to choose the best among them, it is slow to
construct a tree in comparison with other methods. However, MP returns the most
parsimonious of the trees examined. Other methods are also available, such as the
Maximum Likelihood (ML) method, which is computationally intensive and the most
flexible. ML (60) optimizes the likelihood of observing the data given a tree topology
and a model of evolution. The advantage of this method is its ability to make
statistical comparisons between topologies and data sets.
MEGA4 contains a wide array of functionalities for the molecular evolutionary
analysis of data (http://www.megasoftware.net/features.html). It is useful to note
that, although new methods and functions have been added continuously up to MEGA 4,
MEGA is not intended to be a catalog of all the evolutionary analysis methods
available. Rather, it is anticipated to become a workbench for the exploration of
sequence data
from evolutionary perspectives. Successive MEGA versions have laid the foundation of a
powerful, easy to use and extensible software package that can be used to form working
hypotheses about the characteristics of a sequence and to carry out evolutionary
analysis. Fig. 8 demonstrates the stepwise construction of a phylogenetic tree with
MEGA version 5.0 and the different tree designs available. MEGA produces annotated
results that are interpreted by the computer and presented in a useful manner (quality
displays and descriptions), more so than a conglomeration of separate analyses.
Insects constitute the most species-rich class among animals, with almost a
million taxa described to date (61). Within the insects, the Heteroptera (true bugs),
the Sternorrhyncha (aphids, scale bugs, whiteflies and psyllids) and the
Auchenorrhyncha (planthoppers, leafhoppers, spittlebugs and cicadas) together make up
the largest non-holometabolan insect assemblage, the Hemiptera (62, 63).
To date, complete or nearly complete mitochondrial genomes have been sequenced
from 175 insect species in 22 orders. Most species come from the orders Diptera,
Hemiptera, Hymenoptera, Lepidoptera and Orthoptera. The mitochondria of insects contain
their own double-stranded circular genomes, which range from 14,503 bp (64) to
19,517 bp in size (65, 66).
Early studies within the class Insecta suggested that gene order is conserved over
a wide range of organisms, indicating an ancestral gene order for this group (56, 64).
However, more recent studies have shown that within the Hemipteroid assemblage there is
considerable variation in gene order in the orders Phthiraptera, Psocoptera and
Thysanoptera, but no variation in the order Hemiptera (which includes the suborder
Sternorrhyncha) (67, 68). Over the past 15 years or so, DNA markers have made a
significant contribution to the rapid rise of molecular studies of genetic relatedness,
phylogeny, population dynamics and gene and genome mapping in insects (69-73). The
application of DNA markers in entomology has undergone, and is still undergoing,
noticeable change as new technologies are adopted for more robust and less expensive
genotyping. Entomologists are becoming more accustomed to the refinement of marker
systems and are applying the new techniques to study insect genomes more efficiently.
Traditionally, mitochondrial DNA (mtDNA) has been a marker of choice for studying
genetic variation in insect species. Mitochondrial gene sequences have been used in
phylogenetic and population-genetic studies to reconstruct the evolutionary history of
related insect species. More precisely, these markers have provided invaluable insights
into the history and genetic basis of speciation and phenotypic evolution of recently
diverged species.

Fig. 8. Stepwise construction of a phylogenetic tree using MEGA software version 4.0, and
different display models for the tree. 1. Target sequence and NCBI BLAST search; 2. Identification
of the homologous sequences; 3. Importing the sequences into MEGA version 4.0; 4. Sequence
alignment and percentage identity; 5. Screen view before construction of the Maximum Likelihood
(ML) tree; 6. Screen view while the program is constructing the ML tree; 7. Screen view after
construction of the ML tree; 8. Various designs for the phylogenetic tree (Rectangular, Curved,
Straight and Circular model trees).
5.5. Structural Bioinformatics

Insect pests have mainly been controlled with synthetic insecticides over the last
fifty years. Most insecticidal compounds fall within four main classes: the
organochlorines, the organophosphates, the carbamates and the pyrethroids (74). Many
countries have banned the use of certain synthetic pesticides because of their negative
effects on non-target organisms, including humans, and on the environment. Sustainable
agriculture aims at reducing the incidence of pests and diseases to such a degree that
they do not seriously damage crops, without upsetting nature's balance. One of its aims
is to rediscover and develop strategies whose cost and ecological side-effects are
minimal. The use of synthetic pesticides has undoubtedly contributed to the green
revolution in different countries through increased crop production. However, in recent
years there has been considerable pressure on
consumers and farmers to reduce or eliminate synthetic pesticides in agriculture (75).
Therefore, there has been a search for alternative methods of insect control.
Whatever its type, a pesticide or controlling agent has a preferred mode of action
on its target. This target may be an enzyme: the controlling agent may completely block
expression of the enzyme, if it acts at the DNA level, or may inhibit the enzyme from
functioning by binding to a site on it through different modes of inhibition such as
competitive, non-competitive or uncompetitive. Most of the pesticides available are
known to function by acting on targets such as carboxylesterases, glutathione
S-transferases and cytochrome P450 monooxygenases (detoxification),
acetylcholinesterases (neurotransmission), proteases (digestion), ecdysone receptors
and NADH dehydrogenases (ATP synthesis). These controlling agents can act as growth
regulators (Table 3), anti-feedants, inhibitors of ATP synthesis, anti-moulting agents,
etc.
Knowledge of the interaction between a pesticide and its target is necessary for
further improvement of the pesticide. Enzymes are vital components of the cell, which
hold the cellular processes together and allow life to function. The first
crystallographic structure of a protein, that of sperm whale myoglobin, was solved in
the 1950s (76). Since then, several thousand structures have been determined and
deposited in repositories, the Protein Data Bank (PDB) in particular. At the time of
writing, 75564 of the 84955 protein structures have been solved by X-ray
crystallography, the remainder by other methods. However, the time taken by these
methods to arrive at three dimensional atomic structures is enormous. This is mainly
due to the intricacies of the experimental procedures, which lead to large expenditure
on chemicals, labor and, most importantly, patience. Given the rate at which genome
projects are being completed and new proteins predicted, structure determination cannot
keep pace, and hence the function of many of these proteins remains a treasure hunt.
This is underlined by the fact that 27485 insect proteins are still annotated as
hypothetical, meaning that neither their function nor their structure has been
predicted yet.
The principles of bioinformatics can be applied to predict the three dimensional
structure of a target quickly. The discovery of drugs with assistance from computers,
termed computer aided drug design, is one of the enduring applications of
bioinformatics. Along the same lines, pesticides may also be designed. To follow this
track, a three dimensional structure of the target is necessary.

Table 3. Commercially available insect growth regulators (IGR)
Common Name      Trade Name(s)              Mode of Action
Fenoxycarb       Preclude                   Juvenile Hormone Mimic
Kinoprene        Enstar II/AQ               Juvenile Hormone Mimic
Pyriproxyfen     Distance                   Juvenile Hormone Mimic
Buprofezin       Talus                      Chitin Synthesis Inhibitor
Cyromazine       Citation                   Chitin Synthesis Inhibitor
Diflubenzuron    Adept                      Chitin Synthesis Inhibitor
Etoxazole        TetraSan                   Chitin Synthesis Inhibitor
Novaluron        Pedestal                   Chitin Synthesis Inhibitor
Azadirachtin     Azatin, Ornazin, Molt-X    Ecdysone Antagonist
The three dimensional structure of a protein is built from secondary structural
elements, which are local, regularly occurring structures formed mainly through
hydrogen bonds between backbone atoms. Helices and sheets (stable) are preferably
located at the core of the molecule, whereas coils and loops (less stable) tend to
reside in the outer regions. If the secondary structure of a protein is known, it is
possible to derive a comparatively small number of possible tertiary structures using
knowledge about the ways in which secondary structural elements pack (77). Predicting
the secondary structure of proteins from sequence data is a specialized field within
bioinformatics (47).
Several algorithms have been devised to achieve this task, each with its pros and
cons. Secondary structure prediction methods have passed through several generations,
depending upon the way the problem is solved. The very first was the algorithm proposed
by Chou and Fasman, known after them as the CF method (78). The CF method works on the
principle of amino acid propensities, that is, the tendency of a given residue type to
adopt a helix, sheet or turn conformation. The propensity of each of the twenty amino
acids to occur in a helix, sheet or turn was calculated from the three dimensional
structures available in the database. The residues of the query sequence are then
assigned their propensities, and the following rule is applied: if four out of six
contiguous residues have $P(\alpha) > 100$, that region nucleates a helix. For a sheet,
three out of five contiguous residues must have $P(\beta) > 100$. The region is
extended on either side until $P(\alpha)$ or $P(\beta)$ falls below 100. Disputed
regions
where there is no consensus for helix or sheet, but $P(\alpha) < P(T) > P(\beta)$, are
predicted as turns when $P(T) > 100$. The remaining parts of the sequence, where there
is no clear assessment, are taken to form coils. The CF method is known to be biased
toward alpha helices, which is a disadvantage, and its prediction accuracy is 56-60%
(Fig. 9A).
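The nucleation rule (four out of six residues for a helix, three out of five for a sheet) lends itself to a short sketch. The propensity values below are placeholder numbers invented for illustration, not the published Chou-Fasman table, and the function only finds nucleation windows; extension and the resolution of overlapping regions are not shown.

```python
# Placeholder helix propensities on a scale where 100 is neutral.
# These numbers are illustrative only, NOT the published Chou-Fasman values.
P_ALPHA = {
    "E": 150, "M": 145, "A": 140, "L": 120, "K": 115,
    "T": 83, "S": 77, "N": 67, "G": 57, "P": 57,
}


def nucleation_sites(seq, propensity, window=6, needed=4, cutoff=100):
    """Return start positions of windows that satisfy the nucleation rule.

    A window qualifies when at least `needed` of its `window` residues have a
    propensity above `cutoff`. Residues missing from the table are treated as
    neutral (exactly at the cutoff) and therefore do not count as formers.
    """
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(propensity.get(res, cutoff) > cutoff for res in seq[i:i + window])
        if formers >= needed:
            starts.append(i)
    return starts


if __name__ == "__main__":
    print(nucleation_sites("GPEALKGS", P_ALPHA))  # helix rule: 4 of 6
    print(nucleation_sites("GPGSNT", P_ALPHA))    # no helix formers at all
```

The sheet rule can be checked with the same function by supplying a sheet propensity table and calling it with `window=5, needed=3`.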
Fig. 9. A) Secondary structure prediction of the B. mori lysozyme sequence using PHD (neural
network), CFSSP (Chou-Fasman) and GOR (GOR4), and comparison of the results with the
experimentally determined structure (PDB ID: 1GD6). B) 3D homology model of B. mori
lysozyme. C) Metal binding site.
Another problem with the CF method is that it gives no weight to the neighboring
residues, even though an individual amino acid cannot form a secondary structure on
its own. This was overcome to an extent by the Garnier, Osguthorpe and Robson (GOR)
method (79). It is an efficient tool and has undergone various updates over the
years. It is a probabilistic method, based on the probability of occurrence of a
state S (alpha, beta, turn or coil) given that the residue is R, written P(S|R). It
considers the information carried by a residue about its own secondary structure in
combination with the information carried by other residues in a local window of
eight residues on either side of the residue concerned. According to the definition
of conditional probabilities, P(S|R) = P(S, R)/P(R), where P(S, R) is the joint
probability of observing the events S and R, and P(R) is the probability of
observing a residue R. The information I(S; R) that residue R carries about state S
is easy to estimate from a database of known sequences and their corresponding
observed secondary structures.
In this way the method has to consult 1360 parameters derived from the structural
database to reach a conclusion for one single residue. The advantage of this method
is that it takes the effect of adjacent residues on the central residue into
account; one disadvantage is that it under-predicts strands. The Q3 accuracy of
this method is found to be between 60 and 65% for GOR III.
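As a small worked illustration of these quantities, the sketch below estimates P(S|R) and the information I(S; R) = ln[P(S|R)/P(S)] from a list of (residue, state) observations. The data set `TOY` is invented purely for demonstration, and reading the figure of 1360 parameters as 17 window positions x 20 amino acids x 4 states is an assumption that happens to reproduce the number quoted in the text.

```python
import math

WINDOW = 17        # central residue plus eight neighbours on either side
AMINO_ACIDS = 20
STATES = 4         # alpha, beta, turn, coil
N_PARAMETERS = WINDOW * AMINO_ACIDS * STATES  # assumed breakdown of the 1360 parameters


def information(observations, state, residue):
    """Estimate I(S; R) = ln[P(S|R) / P(S)] from (residue, state) pairs."""
    n_total = len(observations)
    n_residue = sum(1 for r, _ in observations if r == residue)
    n_state = sum(1 for _, s in observations if s == state)
    n_joint = sum(1 for r, s in observations if r == residue and s == state)
    if n_residue == 0 or n_state == 0:
        raise ValueError("residue or state not present in the observations")
    if n_joint == 0:
        return float("-inf")  # the residue was never observed in this state
    p_state_given_residue = n_joint / n_residue  # P(S|R) = P(S, R) / P(R)
    p_state = n_state / n_total                  # P(S)
    return math.log(p_state_given_residue / p_state)


# Hypothetical observations: (residue, observed state), H = helix, C = coil.
TOY = [("A", "H"), ("A", "H"), ("A", "C"), ("G", "C"), ("G", "C"), ("G", "H")]
```

A positive value, as for alanine in helix in the toy data, means the residue favors the state; a negative value, as for glycine in helix, means it disfavors it.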
The latest in the race, and considered the next-generation approach to secondary
structure prediction, is modeled on the neurons of the human brain and is called
the neural network method (80). It is a highly accurate method, with a Q3 of
70-75%. Like the GOR method, it takes the neighboring residues into consideration,
and it is considered to be a binary algorithm. The network has three layers. The
input layer scans all the 17 x 21 residue inputs and feeds this information to the
hidden layer, where all the processing takes place (this processing is still not
well understood, which is why the approach is called a black-box method). Once the
processing is over, the information from the hidden layer is passed on to the
output layer in the form of 0 and 1. If the value is 1 in the helix output
section, the resulting state is a helix; if it is 0, it is not.
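One way to picture the 17 x 21 input is as a one-hot encoding of a 17-residue window. The sketch below assumes that the 21 symbols are the 20 amino acids plus a spacer used where the window runs past the ends of the sequence; that reading, the spacer symbol `-` and the function name `encode_window` are illustrative assumptions rather than details taken from (80).

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus an assumed spacer symbol
HALF_WIDTH = 8                       # eight residues on either side of the centre


def encode_window(seq, center):
    """One-hot encode the 17-residue window centred on position `center`.

    Positions that fall outside the sequence are encoded as the spacer '-'.
    Returns a flat list of 17 * 21 = 357 zeros and ones.
    """
    encoded = []
    for pos in range(center - HALF_WIDTH, center + HALF_WIDTH + 1):
        symbol = seq[pos] if 0 <= pos < len(seq) else "-"
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(symbol)] = 1
        encoded.extend(one_hot)
    return encoded
```

Each call produces the input vector for one residue; a trained network would map it to output units for helix, sheet and the remaining states, which is outside the scope of this sketch.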
Fig. 9B shows a comparative assessment of the three methods discussed above against
the experimentally determined structure of B. mori lysozyme (PDB ID: 1GD6). The
selected sequence is 119 residues long. The results indicate that PHD, which
follows the neural network algorithm, is highly accurate, as most of its
predictions agree with the experimentally determined structure. The first helical
region, ranging from the fourth residue (R) to the fourteenth residue (K), is
correctly predicted by PHD. CFSSP also predicts this region to be a helix, but
extends it continuously by eighteen more residues, which agrees with the PDB
structure only to an extent. This result is expected, as the CF method is known to
be strongly biased towards helices. The next helical region, between the
seventy-fourth residue (K) and the one hundred and tenth residue (C), is predicted
by PHD and CFSSP as it is in the PDB structure, with a little variation in terms of
intermittent coils in the experimentally determined structure. For this region,
GOR4 tells a completely different story (Fig. 9A). Therefore, it is necessary to
make up one's mind in selecting an appropriate method for secondary structure
prediction of a given protein from its sequence alone, which partially solves the
problem of tertiary structure prediction in terms of threading methodology.
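The comparison between a prediction and the experimentally determined structure is usually summarized as the Q3 accuracy, the percentage of residues whose three-state assignment matches. A minimal sketch follows; the two state strings in the usage note are hypothetical and are not the B. mori lysozyme data.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one.

    Both arguments are equal-length strings over a three-state alphabet,
    for example H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)
```

For example, `q3("HHHHCCEEEC", "HHHCCCEEEE")` compares ten positions, eight of which agree.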
Getting back to the 3D structure of proteins (Fig. 9C), it can be derived readily
from the sequence itself, provided suitable homologs are available. A closer look
at protein structure shows that the main chain tends to be the same no matter what
the residue type is; it is only in the side chains that variations take place. The
3D structure of proteins is stabilized by hydrogen bonds, Van der Waals
interactions, Coulombic interactions, etc., all of which fall under the category of
non-bonded interactions. Because sequence is assumed to determine structure, it is
argued that homologous sequences have similar structures. In accordance with this
assumption, one can predict the 3D structure of a protein theoretically using
in-silico approaches. Depending upon the sequence similarity, the approaches to
predicting the 3D structure of proteins can be classified into three types, namely
comparative modeling (also called homology modeling or knowledge-based modeling),
fold recognition or threading, and ab-initio methods (47, 81).
Homology modeling is a method in which the 3D structure of the target protein is
predicted based on sequence-to-sequence comparison. The target sequence is searched
against the PDB structure database using BLAST to find a suitable template
structure. The criterion for a structure to be chosen as a template is >35%
sequence identity between the target sequence and the template sequence, because
homologous sequences are taken to share at least 35% identity. As the identity
decreases, so does the quality of the model, and the resulting model is not
considered good enough to be used for further studies. Modeling at 25-35% identity
is termed twilight zone modeling (47). Once the template is selected, the next step
is to copy the structural parameters of the template onto the target sequence; this
way the basic fold of the structure is ready. Here we demonstrate stepwise 3D
homology modeling using the D. melanogaster sex protein and design different models
of the protein structure (Fig. 10A, B). This is achieved with the many software
tools available, such as Swissmodel (82), Modeller (83), VMD, etc. The regions
which do not share significant similarity with the template are fitted as loop
regions. Once the model is ready, it is subjected to evaluation.
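The identity thresholds quoted above can be turned into a simple decision helper. The sketch below computes percent identity from a pairwise alignment and maps it to the modeling approach suggested in the text. Counting identity over aligned columns without gaps is an assumption (other definitions exist), and both function names are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_target, aligned_template)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def suggest_approach(identity):
    """Map a percent identity to the approach described in the text."""
    if identity > 35.0:
        return "homology modeling"
    if identity >= 25.0:
        return "twilight zone modeling"
    return "fold recognition or ab-initio"
```

In practice the alignment would come from a BLAST search against the PDB, and the cut-offs would be treated as guidelines rather than hard limits.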
Fig. 10. Schematic diagram showing (A) stepwise 3D homology modeling using the
D. melanogaster sex protein and (B) the design of different models using the PyMOL software program.
Fold recognition, also called the threading approach to 3D structure prediction, is
a versatile method, a bit more complicated than the one explained earlier. This
method is employed only when there is no template with the required sequence
identity to the target (>35%) (84). The fundamental idea behind this method is the
assumption that there are only about 1000 folds among the structures that have been
determined and deposited in the PDB database, and that any new structure is likely
to adopt one of these few folds. So, in this approach, rather than the
sequence-to-sequence alignment between the target and the template used in homology
modeling, the target sequence is aligned with the template structure. Confusing? I
will clear the air! A protein has various structural parameters, such as polarity
and whether a residue is buried or exposed to the environment, leading to 6
possible environments (P1, P2, B1, B2, B3 and E). These, when combined with the
three secondary structural elements (alpha helix, beta sheet and turn), give rise
to 18 environmental structural descriptors. This forms the base and crux of the
method. A library of these environmental descriptors and their frequency of
occurrence for each of the twenty residues is calculated, tabulated and converted
into log-odds scores. This is done by comparing the structures that are already
deposited in the PDB database. Now there is a numerical value for how well each
residue of the target fits each structural environment of the template.
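The bookkeeping behind this scheme can be sketched as follows: six environments combined with three secondary structure elements give 18 descriptors, and each residue type receives a log-odds score for each descriptor. The frequencies in the usage note are invented for illustration, and scoring an ungapped placement is a simplification; real threading programs align the sequence to the structure with gaps.

```python
import math
from itertools import product

ENVIRONMENTS = ["P1", "P2", "B1", "B2", "B3", "E"]
SECONDARY = ["helix", "sheet", "turn"]
# 6 environments x 3 secondary structure elements = 18 descriptors.
DESCRIPTORS = [f"{env}/{ss}" for env, ss in product(ENVIRONMENTS, SECONDARY)]


def log_odds(p_residue_in_descriptor, p_residue_overall):
    """ln of the ratio between the frequency of a residue in one structural
    descriptor and its overall frequency in the database."""
    return math.log(p_residue_in_descriptor / p_residue_overall)


def threading_score(sequence, template_descriptors, table):
    """Sum the log-odds of placing sequence[i] at template position i (no gaps)."""
    return sum(table[(aa, d)] for aa, d in zip(sequence, template_descriptors))
```

With a hypothetical table such as `{("L", "B1/helix"): log_odds(0.2, 0.1), ("K", "E/helix"): log_odds(0.1, 0.05)}`, the call `threading_score("LK", ["B1/helix", "E/helix"], table)` adds the two scores; a higher total indicates a better fit of the sequence to that fold.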
On many occasions, one comes across a situation where neither a template with
sufficient sequence identity to the target nor a matching fold is available in the
fold database. To overcome such situations, there is a third category of methods,
called ab-initio methods. These are very tedious methods; one needs a sound
understanding of the laws and principles of physics and chemistry. They involve a
lot of computation and require a large amount of memory as well. One of the
approaches is simulated annealing, in which temperature plays a vital role. The
protein is first subjected to high temperatures, which makes it lose its ordered
structure: the higher the temperature, the greater the energy. The temperature is
then gradually decreased, and the protein assumes a conformation that is near the
global minimum. This is continued until there is no further fluctuation in the
energy versus temperature trajectory. Finally, the structure for which there is no
further change in the energy of the protein is taken as the modeled structure.
This may sound easy, the way I have explained it. Remember, it is a highly
cumbersome method (84).
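The heat-then-cool idea can be illustrated on a toy problem. The sketch below runs simulated annealing on a hypothetical one-dimensional energy function instead of a protein, and it uses the Metropolis acceptance criterion with a geometric cooling schedule, both of which are common choices assumed here and not details given in (84).

```python
import math
import random


def toy_energy(x):
    """Hypothetical one-dimensional energy landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01, cooling=0.95,
                        steps_per_temperature=50, step_size=0.5, seed=0):
    """Return the lowest-energy point visited while the temperature is lowered."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temperature):
            candidate = x + rng.uniform(-step_size, step_size)
            e_new = energy(candidate)
            # Metropolis criterion: downhill moves are always accepted, uphill
            # moves are accepted with probability exp(-(e_new - e) / t).
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
                x, e = candidate, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # gradually lower the temperature
    return best_x, best_e
```

At high temperature almost every move is accepted, so the search roams widely; as the temperature falls, uphill moves become rare and the search settles into a low-energy basin. For a protein, `x` would be a full set of coordinates or torsion angles and `energy` a force field, which is what makes the method so expensive.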
Energy is a vital requirement for a living organism to survive. All cellular
processes require energy in order to make or break a bond, or for any interaction
to take place. A biomolecule is considered to be in a stable and functional form if
it is in its ground state, with its energy at the global minimum. The total energy
of a system is calculated as the sum of its kinetic energy and potential energy.
Calculating the kinetic energy of a system is a demanding task, because the
electronic behavior fluctuates rapidly: electrons are far lighter than nuclei (a
proton is roughly 1836 times heavier than an electron) and so move much faster
(85). According to the Born-Oppenheimer approximation, the energies of the
electronic motion and the nuclear motion can be calculated separately and summed up
later to get the overall energy of the system. The field of mechanics which
considers only the nuclear contribution is termed molecular mechanics. The
potential energy of all systems in molecular mechanics is calculated using force
fields. The functional form of a force field is the summation of energies due to
bonded and non-bonded interactions. Bond angle, bond distance and torsional angle
terms form the bonded interactions, and electrostatic, Van der Waals, etc. form the
non-bonded interactions. These terms measure deviations from reference values
calculated by high-level quantum mechanical calculations. Force fields may differ
from one another in their functional form, because one force field may take
hydrophilic contributions into consideration and another may not. There is a
plethora of force fields, and it is up to the researcher to select an appropriate
one. For water as solvent, the TIP3P model is considered one of the best in the
business (85).
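To make the "sum of bonded and non-bonded terms" concrete, the sketch below writes out typical textbook forms for a few of them. These are generic expressions and not the functional form of any particular force field: the harmonic terms use the 1/2 k convention, the van der Waals term is the 12-6 Lennard-Jones potential, and the Coulomb constant is the approximate conversion factor for energies in kcal/mol with distances in Angstrom.

```python
COULOMB_CONSTANT = 332.06  # approx., kcal*Angstrom/(mol*e^2)


def bond_energy(r, r0, k):
    """Harmonic penalty for a bond length r deviating from its reference r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic penalty for a bond angle deviating from its reference value."""
    return 0.5 * k * (theta - theta0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones potential for the van der Waals interaction."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def coulomb(r, q1, q2):
    """Electrostatic energy between two point charges (in units of e)."""
    return COULOMB_CONSTANT * q1 * q2 / r


def total_energy(bonded_terms, nonbonded_terms):
    """Potential energy as the sum of all bonded and non-bonded contributions."""
    return sum(bonded_terms) + sum(nonbonded_terms)
```

The reference values (`r0`, `theta0`) and constants (`k`, `epsilon`, `sigma`, partial charges) are the parameters that distinguish one force field from another.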
5.6. Molecular Docking
Molecular docking is the study of the mode of binding between a target and its
candidate small molecule. It helps one narrow down the compounds which show
affinity towards the target; in this way an enormous amount of time is saved and
costs are reduced. The interaction is mainly brought about by hydrogen bonds,
electrostatic interactions, Van der Waals interactions, etc. (85).
i) Molecular 3D structure of Ecdysone receptor protein from Spodoptera litura
Ecdysteroids are steroid hormones that play an important role in molting,
metamorphosis, reproduction and many other developmental processes in insects and
in other arthropods (86). The ecdysteroid hormones act through specific receptor
protein molecules called ecdysteroid receptors (EcR). The EcR is a member of the
nuclear receptor superfamily, which comprises a group of receptors containing at
least one of two highly conserved domains: the centrally located DNA binding domain
(DBD) and the C-terminal ligand-binding domain (LBD). The availability of cloned
genetic sequences encoding both the EcR and USP/RXR (ultraspiracle/retinoid X
receptor) subunits of ecdysone receptors from a range of arthropods and other
animals has advanced our understanding of the receptor's structural biology,
evolution and ligand interactions, and of the selectivity of certain
environmentally friendly insecticides (86, 87, 88). The phylogenetic tree
(Fig. 11A) shows that the Sl-EcR from Lepidoptera groups in the evolutionary tree
with the most similar amino acid sequences from other insect groups. Interestingly,
we noticed a distinct separation of the more divergent Diptera and Lepidoptera from
the other insects and arthropods. The sequence analysis of Sl-EcR (Fig. 11B) shows
the location of the DNA binding domain in helices 5-6 (E185-E256) and of the ligand
binding domain in the C-terminal region (T381-F487). The homologues of the Sl-EcR
were used for predicting the 3D structure of the ligand binding domain, using the
PDB template of H. virescens (95%), and of the DNA binding domain, using the PDB
template of D. melanogaster (76%). The structure analysis reveals that Sl-EcR
consists of 12 helices, a small antiparallel β-sheet located between helices H5 and
H6, and 14 loops (Fig. 11C, D and E). A biological zinc binding region is situated
near the 10th helix (His518, His464), and further residues (Cys188, Cys191, Cys205,
Cys208, Cys240, Cys243, His228 and His290) are responsible for the metal binding
site (Fig. 11C). The Psi and Phi angles of the residues correspond to the allowed
and partially allowed regions of the Ramachandran plot (Fig. 11F-H), supporting the
validity of the predicted structure of Sl-EcR. The hetero-dimerization interface
between EcR and USP is centered on a conserved core of residues localized at
helices H9 and H10 of their LBDs. The sequences contributing to this surface are
relatively well conserved in all insects. Nevertheless, within this third region,
residues Cys394, Leu397, Leu408 and Trp412 appear strictly conserved in hemipteran,
lepidopteran, dipteran and coleopteran species; of these, all except Leu408 are in
contact with the bound ecdysteroid (86, 87). Recent studies have shown that
substantial remodeling of the EcR LBD is observed in the presence of a bound
synthetic ligand, affecting essentially the region encompassing the β-sheet,
helices H6 and H7, and the loop connecting helices H1 to H3. Fig. 11D shows the
binding pocket of the DBH ligand and of an ecdysteroid as found in EcR cavities,
illustrated for Sl-EcR. Local resistance to a pesticide could be attributed to
variation in the active site. The binding site of an enzyme for a pesticide may
differ among different classes of insects (89).
Fig. 11. A) Phylogenetic tree reconstructed from all 35 amino acid sequences of EcR from
insects. The tree was made by the neighbor-joining method using the ClustalW multiple alignment
program. MEGA software version 4.0 can export the drawings to graphics programs, and can export
trees in Newick format for use by other programs. B) Sequence alignment of EcR from Spodoptera
litura (SlEcR). The helix positions are based on the crystal structures of Heliothis virescens
(EcR-LBD) and Drosophila melanogaster (EcR-DBD); cylinders: α-helices 1-12; empty arrows: beta
sheets 1 and 2; histidine residues: cyan; cysteine residues: blue.
Fig. 11. C) Three-dimensional structure view of SlEcR and metal binding region; D) Ligand
binding domain (LBD, sky blue) and DNA binding domain (DBD, brown) regions of SlEcR,
visualized with the molecular graphics program PyMOL; E) DNA binding domain located in
helices 5 and 6, amino acid residues E185-E256, visualized with PyMOL; F) Ramachandran plot;
G) Hydrophobicity plot of SlEcR; H) Electrostatic surface model of SlEcR.
A demonstration of the application of molecular docking studies in selecting a
suitable pesticide is given here. The aim is to find, by molecular docking studies,
the biopesticide that is best at controlling the molting of a pest by inhibiting
the activity of the ecdysone receptor. Plumbagin, Pyridalyl, Oleandrin,
Methoxyfenozide, Embelin, Chlorpyrifos and Malathion are the compounds which form
the basis of our research exercise; they are treated as inhibitors, with the
ecdysone receptor as the target. The Schrodinger Suite of molecular modeling and
docking programs was used to perform the experiments. From Table 4 it is evident
that Plumbagin performed best, followed by Pyridalyl and Oleandrin by close
margins. This is based on the glide score (GlideScore is an empirical scoring
function that approximates the ligand binding free energy). Recall the concept of
molecular force fields, used to calculate the energies of molecules. The table
gives a breakdown of the various factors contributing to the binding energy of the
complex, which is calculated as the sum over all these factors and gives the free
energy as well. It is evident from this exercise that, among the selected
compounds, the biopesticides have the better G-score (dock score) with the
ecdysone receptor target (Fig. 11I); this forms the prelude to further
experimentation in the laboratory with insects as specimens. One way for
entomologists to extend this experiment is to perform a growth characteristics
study, along with other studies such as antifeedancy of the selected compound, by
applying the principles of Waldbauer statistics with suitable controls (90).
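The ranking step itself is simple enough to sketch. The scores below are the Score column of Table 4; the convention that a more negative glide score indicates better predicted binding is assumed here, and it agrees with the ordering described in the text.

```python
# Glide scores taken from the Score column of Table 4.
GLIDE_SCORES = {
    "Plumbagin": -4.99,
    "Pyridalyl": -4.88,
    "Oleandrin": -4.44,
    "Methoxyfenozide": -4.34,
    "Embelin": -4.33,
    "Chlorpyrifos": -4.14,
    "Malathion": -3.45,
}


def rank_ligands(scores):
    """Return (ligand, score) pairs sorted from most to least negative score."""
    return sorted(scores.items(), key=lambda item: item[1])
```

The first few entries of `rank_ligands(GLIDE_SCORES)` are the candidates one would carry forward to laboratory testing.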
Fig. 11I. Hydrogen bonding interaction of Plumbagin (green) with the ecdysone receptor.
Table 4. Molecular docking results of the Spodoptera litura ecdysone receptor with various
synthetic pesticides and biopesticides, performed using the Glide module of the Schrodinger
software. Column headings: Ligand, Score, Lipophilic EvdW, Bond, Electro, Low MW, HB Penal,
Sitemap. Several cells are blank, so the terms after Score are given in their printed order.
Ligand | Score | Other terms
Plumbagin | -4.99 | -1.92, -0.81, -0.38, -0.5, 0.61
Pyridalyl | -4.88 | -3, -0.67, -0.44, 0.9
Oleandrin | -4.44 | -1.45, -1.22, -0.45, 0.62
Methoxyfenozide | -4.34 | -2.66, -0.3, -0.31, -0.27, 0.4
Embelin | -4.33 | -2.44, -0.81, -0.74, -0.19, -0.5, 0.74
Chlorpyrifos | -4.14 | -1.66, -0.29, -0.38, -0.33, 0.32
Malathion | -3.45 | -1.89, -0.5, -0.28, -0.33, -0.4, 0.43
This way it becomes easier for the agricultural scientist to reach a conclusion
about the binding affinity of a group of compounds, which could pave the way to
quicker containment of the pest menace and the losses it causes. One interesting
aspect of molecular docking is that it spares one from testing all the available
compounds in wet lab experiments and waiting for serendipity to happen.
Serendipity, because this word has a very good relationship with research
scientists. Testing all the compounds in the wet lab means one has to perform
replicates and wait for at least one compound of the group to have the desired
effect and prove to be a candidate pesticide. This is not only tedious but also a
waste of money, specimens, labor, time, etc. One huge disadvantage is that none of
the selected compounds may have any activity against the pest menace at all, while
the researcher continues the experiments blindly. Therefore, to avoid all these
blind expenses, one can apply molecular docking first and then test only the most
promising compounds. One may argue that this requires the target structure or
sequence of the pest selected for study to be present in the databases. If it is
not, the sequence analysis and similarity studies branch of bioinformatics allows
one to take the same or a related target sequence from a related species and
perform further experiments. And if one has the sequence, one can obtain the
structure as well by applying the practices of structural bioinformatics.
Now, where will one go to perform all these bioinformatics experiments? Is there
software available for them? Is there a server or an expert available to take care
of them? Bioinformatics is a field which depends heavily on computers and the
internet; therefore, one can easily search the internet to find the weapons one
wants. If one is still unsuccessful, they can be found in any of the ocean of
standard bioinformatics textbooks.
6. Conclusion
Insect bioinformatics is a discipline with wide applications and has the potential
to solve complex problems in various fields. Its strength lies in the fact that
every living organism has biological products, nucleotides and amino acids, which
form the base for this discipline. Modern research targets genes and their related
products for the desired activities. Two principal approaches underpin all the
studies in Entomo-informatics. The first is that of comparing and grouping the data
according to biologically meaningful similarities, and the second is that of
analyzing one type of data to understand the observations for another type of data.
In short, the aims are to understand and organize the information associated with
biological molecules on a large scale. As a result, Entomo-informatics has not only
provided greater depth to biological investigations, but added the new dimension of
extensiveness as well. Entomo-informatics is an approach that will be an essential
part of entomological research, and we hope that every entomologist and researcher
will incorporate more Entomo-informatics/bioinformatics tools and approaches in
their research projects.

We can also integrate data on protein functions; given that particular protein
folds are often related to specific biochemical functions, these findings highlight
the diversity of metabolic pathways in different organisms. Entomo-informatics will
provide the glue with which all of these types of integration will occur. Finally,
docking algorithms could design molecules that bind the model structure, leading
the way for biochemical assays to test their biological activity on the actual
protein. By studying the evolution of insects one can regulate pests and also come
up with new methodologies to understand pesticide resistance, leading to enhanced
crop quality and productivity.
Acknowledgement
I thank the management of SRM University, for their encouragement, continuous
support to innovative research and academic activities, which resulted in this chapter. I thank
Dr. K. P. Sanjayan, Head, Dept. of Zoology, Guru Nanak College, Chennai, for his critical
review of this chapter; I thank Almighty, my family, Ashraf Ali, Sewali Ghosh, Pinky Sheetal
Vincent for their contributions to this chapter.
7. References
1. National Center for Biotechnology Information [NCBI]. A Science Primer: Bioinformatics.
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html (15 June 2011, date last accessed).
2. National Institutes of Health [NIH]. NIH Working Definition of Bioinformatics and Computational
Biology. http://www.bisti.nih.gov (17 July 2000, date last accessed).
3. Mansfield, E. (1998) Academic Research and Industrial Innovation: An Update of Empirical Findings.
Research Policy 26: 773-776.
4. Stephen, H.D. (1970) Charles Babbage, Father of the Computer. Crowell-Collier Press. ISBN
0-02-741370-5.
5. Jane Demerica (2009) How the computer has changed through the years. Helium: 1580299.
6. Watson J.D. and Crick F.H.C. (1953) A Structure for Deoxyribose Nucleic Acid. Nature 171 (4356):
737-738.
7. Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchison, C.A.,
Slocombe, P.M. et al. (1977) Nucleotide sequence of bacteriophage φX174 DNA. Nature 265 (5596):
687-95.
8. Hood, L. and Galas, D. (2003). The digital code of DNA. Nature 421: 444-448.
9. Chial, H. (2008) DNA sequencing technologies key to the Human Genome Project. Nature Education
1(1).
10. Elaine R. Mardis. (2008) Next-Generation DNA Sequencing Methods. Annual Rev. Genomics and Human
Genetics, 9: 387-402.
11. Kedes, L., Liu, E.T. (2010) The Archon Genomics X PRIZE for whole human genome sequencing.
Nature Genetics, 42 (11): 917-918.
12. Kedes, L., Campany, G. (2011) The new date, new format, new goals and new sponsor of the Archon
Genomics X PRIZE Competition. Nature Genetics 43 (11): 1055-1058.
13. Ouzounis, C.A. and Valencia, A. (2003). Early bioinformatics: the birth of a discipline - a personal view
Bioinformatics: Review. Bioinformatics, 19 (17): 2176-2190.
14. Derek, J. Smith (2003) Applications of bioinformatics and computational biology to influenza
surveillance and vaccine strain selection, Vaccine 21 (16): 1758-1761.
15. Buehler, L.K. and Rashidi, H.H. (2006) Review of bioinformatics basics: applications in biological
science and medicine. Biomed. Engg. Online. 5: 41.
16. Xue, J., Zhao, S., Liang, Y., Hou, C., Wang, J. (2008) Bioinformatics and its applications in agriculture:
Computer and Computing Technologies in Agriculture, Volume II. Springer, p. 985-990.
17. Grimmelikhuijzen, C.J., Cazzamali, G., Williamson, C.M. and Hauser, F. (2007) The promise of insect
genomics. Pest Manag. Sci. 63: 413-416.
18. Chilana, P., Sharma, A., and Anil Rai (2012) Insect Genomics Resources: Status, Availability and Future.
Current Sci. 102 (4): 25.
19. Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Sayers, E.W.
(2013) Genbank. NAR D36-42.
20. Tateno, Y., Imanishi, T., Miyazaki, S., Fukami-Kobayashi, K., Saitou, N., Sugawara, H., et al. (2002)
DNA Data Bank of Japan (DDBJ) for genome scale research in life science. NAR 30 (1): 27-30.
21. Stoesser, G., Baker, W., van den Broek, A., Camon, E., Garcia-Pastor, M., Carola Kanz et al (2002)
The EMBL Nucleotide Sequence Database. NAR 30(1): 21-26.
22. Cochrane, G., Karsch-Mizrachi, I., and Nakamura, Y. (2010) The International Nucleotide Sequence
Database Collaboration. NAR DOI: 10.1093/nar/gkq1150.
23. Gilbert, D.G. (2007) DroSpeGe: rapid access database for new Drosophila species genomes. NAR Vol.
35: D480-485.
24. Marygold, S.J., Leyland, P.C., Seal, R.L., Goodman, J.L., Thurmond, J.R., Strelets, V.B., Wilson, R.J. and
the FlyBase Consortium (2013) FlyBase: improvements to the bibliography. NAR 41(D1):D751-D757.
25. Megy, K., Emrich, S.J., Lawson, D., Campbell, D. and Dialynas, E., et al. (2011) VectorBase:
improvements to a bioinformatics resource for invertebrate vector genomics. NAR, Vol. 40: D729-D734.
26. Duan, J., Li, R., Cheng, D., Wei Fan and Zha, X., et al. (2010) SilkDB v2.0: a platform for silkworm
(Bombyx mori) genome biology. NAR, 38: D453-D456.
27. Mita, K., Morimyo, M., Okano, K., Koike, Y., Nohata, J., et al. (2003) The construction of an EST
database for Bombyx mori and its application. Proc. Natl. Acad. Sci. USA. 100: 14121-14126.
28. Mita, K., Kasahara, M., Sasaki, S., Nagayasu, Y., Yamada, T., Kanamori, H. et al. (2004) The genome
sequence of silkworm, Bombyx mori. DNA Res. 11: 27-35.
29. Bose, B., Nagarajaram, H.A., Mita, K., Shimada, T., and Nagaraju, J. et al. (2005) SilkSatDb: a
microsatellite database of the silkworm, Bombyx mori. NAR, 33: D403-D406.
30. Sunil, A., Eshwar, M., Sravana, K.P., Nagaraju, J. (2007) InSatDb: a microsatellite database of fully
sequenced insect genomes. NAR, 35: D36-39.
31. Munoz-Torres, M.C., Reese, J.T., Childers, C.P., Bennett, A.K., Sundaram, J.P, Childs, K.L., Anzola,
J.M, Milshina, N., Elsik, C.G. (2011) Hymenoptera Genome Database: integrated community resources
for insect species of the order Hymenoptera. NAR 39: D658-D662.
32. Qin, X., Evans, J.D., Aronstein, K.A., Murray, K.D., Weinstock, G.M. (2006) Genome sequences of the
honey bee pathogens Paenibacillus larvae and Ascosphaera apis. Insect Mol. Biol. 15(5):715-8.
33. Cornman, R.S., Chen, Y.P., Schatz, M.C., Street, C., Zhao, Y., et al (2009) Genomic analyses of the
microsporidian Nosema ceranae, an emergent pathogen of honey bees. PLoS Pathog. 5(6):e1000466.
34. Wu, C.H., Yeh, L-S.L., Huang, H., Arminski, L., Castro-Alvear, J., et al. (2003) The Protein Information
Resource. NAR, 31: 345-347.
35. Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement
TrEMBL in 2000. NAR 28:45-48.
36. The UniProt Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in
2013. NAR 41: D43-D47.
37. Michele Magrane and UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein
data. Database: The Journal of Biological Databases and Curation, bar009, Oxford University Press.
38. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., et al. (2012) The Pfam protein family
database. NAR 40: D290-D301.
39. Sigrist, C.J.A., de Castro, E., Cerutti, L., Cuche, B.A., Hulo, N., Bridge, A., Bougueleret, L., Xenarios, I.
(2012) New and continuing developments at PROSITE. Nucleic Acids Research, 14.
40. Zdobnov, E.M., and Apweiler, R. (2001) InterProScan - an integration platform for the
signature-recognition methods in InterPro. Bioinformatics 17 (9): 847-848.
41. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer Jr., E.E., Brice, M.D., Rodgers, J.R., Kennard, O.,
Shimanouchi, T., Tasumi, T.(1977) The Protein Data Bank: A Computer-based Archival File For
Macromolecular Structures. J. Mol. Biol. 112: 535.
42. Murzin, A.G., Brenner, S.E., Hubbard, T.J.P., Chothia, C. (1995) SCOP: a structural classification of
proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
43. Sillitoe, I., Cuff, A.L., Dessailly, B.H., Dawson, N.L., Furnham, N., Lee, D., et al. (2013) New functional
families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. NAR
D490-498.
44. Roger Sayle and James Milner-White, E. (1995) RasMol: Biomolecular graphics for all. Trends in
Biochemical Sci. 20 (9): 374.
45. Johansson, M.U., Zoete, V., Michielin, O. and Guex, N. (2012) Defining and searching for structural
motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics 13:173.
46. Wang, Y., Geer, L.Y., Chappey, C., Kans, J.A., Bryant, S.H (2000) Cn3D: sequence and structure views
for Entrez. Trends Biochem. Sci. 25(6): 300-302.
47. David W. Mount (2004) Bioinformatics: sequence and genome analysis. Cold Spring Harbor Laboratory
Press.
48. Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the
amino acid sequence of two proteins. J. Mol. Biol. 48 (3): 443-53.
49. Smith, T.F. and Waterman, M.S. (1981) Identification of Common Molecular Subsequences. J. Mol. Biol.
147: 195-197.
50. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25:3389-3402.
51. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl.
Acad. Sci. USA 85(8): 2444-2448.
52. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. (1978) A model of evolutionary change in proteins. Atlas of
Protein Sequence and Structure 5 (3): 345-352.
53. Henikoff, S., Henikoff, J.G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl.
Acad. Sci. USA 89 (22): 10915-9.
54. Görg, A., Weiss, W., Dunn, M.J. (2004) Current two-dimensional electrophoresis technology for
proteomics. Proteomics, 4(12): 3665-3685.
55. Bae, S.H., Harris, A.G., Hains, P.G., Chen, H., Garfin, D.E., Hazell, S.L., Paik, Y.K., Walsh,
B.J., Cordwell, S.J. (2003) Strategies for the enrichment and identification of basic proteins in proteome
projects. Proteomics, 3(5): 569-579.
56. Blueggel, M., Chamrad, D. and Meyer, H.E. (2004) Bioinformatics in proteomics. Curr. Pharm.
Biotechnol. 5: 79-88.
57. Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a natural system of organisms:
proposal for the domains Archaea, Bacteria and Eucarya. Proc. Natl. Acad. Sci. USA 87: 4576-4579.
58. Sokal, R. and Michener, C. (1958) A statistical method for evaluating systematic relationships. Univ.
Kansas Science Bulletin 38: 1409-1438.
59. Saitou, N., and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Mol. Biol. and Evol. 4: 406-425.
60. Yang, Z. (2007) PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24 (8):
1586-1591.
61. Wernegreen, J.J. (2002) Genome evolution in bacterial endosymbionts of insects. Nature Rev. Genet. 3: 850-861.
62. Thao, M.L., Moran, N.A., Abbot, P., Brennan, E.B., Burckhardt, D.H., Baumann, P. (2000) Cospeciation
of psyllids and their primary prokaryotic endosymbionts. Appl. Environ. Microbiol. 66:2898-2905.
63. Thao, M.L., Moran, N.A., Abbot, P., Brennan, E.B., Burckhardt, D.H., Baumann, P. (2004) Evolutionary
relationships of primary prokaryotic endosymbionts of whiteflies and their hosts. Appl. Environ.
Microbiol. 70:3401-3406.
64. Beckenbach, A.T., Joy, J.B. (2009) Evolution of the mitochondrial genomes of gall midges (Diptera:
Cecidomyiidae): rearrangement and severe truncation of tRNA genes. Genome Bio. Evol. 1(1): 278-287.
65. Lewis, D.L., Farr, C.L., Kaguni, L.S. (1995) Drosophila melanogaster mitochondrial DNA: completion of
the nucleotide sequence and evolutionary comparisons. Insect Mol. Biol. 4(4): 263-278.
66. Hua, J.M., Li, M., Dong, P.Z., Cui, Y., Xie, Q., Bu, W.J. (2008) Comparative and phylogenomic studies
on the mitochondrial genomes of Pentatomomorpha (Insecta: Hemiptera: Heteroptera). BMC Genomics,
9: 610.
67. Shao, R., Barker, S.C. (2003) The highly rearranged mitochondrial genome of the plague thrips, Thrips
imaginis (Insecta: Thysanoptera): convergence of two novel gene boundaries and an extraordinary
arrangement of rRNA genes. Mol. Biol. Evol. 20: 362-370.
68. Shao, R., Campbell, N.J.H., Schmidt, E.R., Barker, S.C. (2001) Increased rate of gene rearrangement in
the mitochondrial genomes of three orders of Hemipteroid insects. Mol. Biol. Evol. 18:1828-1832.
69. Loxdale, H.D., Lushai, G. (1998) Molecular markers in entomology. Bulletin of Entomol. Res. 88:
577-600.
70. Avise, J.C. (2004) Molecular Markers, Natural History, and Evolution, 2nd edn, pp. 684. Sinauer
Associates, Sunderland, Massachusetts.
71. Avise, J.C. (2000) Phylogeography: the History and Formation of Species. Harvard University Press,
Cambridge, Massachusetts.
72. Severson, D.W., Brown, S.E., Knudson, D.L. (2001) Genetic and physical mapping in mosquitoes:
molecular approaches. Annual Rev. Entomol. 46: 183-219.
73. Heckel, D.G. (2003) Genomics in pure and applied entomology. Annual Rev. Entomol. 48: 235-260.
74. Ware, G.W. (1982) Pesticides: Theory and Application. Thompson publications, Fresno, California.
p.380.
75. Dubey, N.K., Shukla, R., Kumar, A., Singh, P. and Prakash, B. (2010) Prospects of
botanical pesticides in sustainable agriculture: Commentary. Current Sci. 98 (4): 25.
76. Kendrew, J.C., Bodo, G., Dintzis, H.M., Parrish, R.G., Wyckoff, H., and Phillips, D.C. (1958) A
Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis. Nature 181 (4610):
662-666.
77. Branden, C. and Tooze, J. (1999) Introduction to Protein Structure. New York: Garland Publishing
Company.
78. Chou, P.Y., Fasman, G.D. (1978) Prediction of the secondary structure of proteins from their amino acid
sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 47: 45-148.
79. Garnier, J., Gibrat, J.F., Robson, B. (1996) GOR method for predicting protein secondary structure from
amino acid sequence. Methods Enzymol. 266: 540-553.
80. Holley, L.H. and Karplus, M. (1989) Protein secondary structure prediction with a neural network. Proc.
Natl. Acad. Sci. USA, 86 (1):152-156.
81. Attwood, T.K. (1999) Introduction to Bioinformatics. Pearson Education India.
82. Arnold, K., Bordoli, L., Kopp, J. and Schwede, T. (2006) The SWISS-MODEL Workspace: A web-based
environment for protein structure homology modeling. Bioinformatics, 22:195-201.
83. Sali, A., Blundell, T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol.
Biol. 234: 779-815.
84. Krieger, E., Nabuurs, S.B. and Vriend, G. (2005) Homology Modeling, in Structural Bioinformatics,
Vol.44 (eds P. E. Bourne and H. Weissig), John Wiley & Sons, Inc., Hoboken, NJ, USA.
85. Leach, A. (2001) Molecular Modelling: Principles and Applications (2nd Edition). Pearson Prentice
Hall.
86. Zotti, M.J., Christiaens, O., Rouge, P., Grutzmacher, A.D., Zimmer, P.D. and Smagghe, G. (2012)
Sequencing and structural homology modeling of the ecdysone receptor in two chrysopids used in
biological control of pest insects. Ecotoxicology, 21:906-918.
87. Hill, R.J., Billas, I.M.L., Bonneton, F., Graham, L.D., Lawrence, M.C. (2013) Ecdysone Receptors: From
the Ashburner Model to Structural Biology. Annu. Rev. Entomol. 58: 251-271.
88. Malik, F.A., Reddy, S. and Venketesh, S. (2010) Sequences analysis and 3D structure prediction of
ecdysone receptor protein in silkworm, Bombyx mori L. Indian J. Seri. 49(1): 17-27.
89. Habeeb, S.K.M., Anuradha, V. and Praveena, A. (2011) Comparative Molecular Modeling of Insect
Glutathione S-Transferases. Intl. J. Computer Applications 14(5): 16-22.
90. Waldbauer, G.P. (1968) The Consumption and Utilization of food by insects, pp. 229-288. In J.W.L.
Beament, J.E. Treherne & V.B. Wigglesworth (eds), Adv. Insect Physiol. Vol. 5, Academic, New York.
Article History:
Received 5th July 2013; Revised 15th December 2013; Accepted 10th April 2014 and
Published 30th Oct. 2014
Reviewed by:
Brintha, P.G., Kansas State University, USA.
Siva Ramamoorthy, VIT University, India.

Table of Contents
Preface
Foreword message
Contributors
Reviewers
Acknowledgement

Volume 1

Section I: Insect Biochemical approaches
1. Introduction to Insect Molecular Biology.
Raman Chandrasekar, P.G. Brintha, Enoch Y. Park, Paolo Pelsoi, Fei Liu,
Marian Goldsmith, Anthony Ejiofor, B.R. Pittendrigh, Y.S. Han,
Fernando G. Noriega, Manickam Sugumaran, B.K. Tyagi, Zhong Zheng Gui,
Fang Zhu, Bharath Bhusan Patnaik, and P. Michailova
2. Modulation of Botanicals on pests biochemistry.
Sahayaraj, K.
3. Detoxication, stress and immune responses in insect antenna: new insights from transcriptomics.
David Siaussat, Thomas Chertemps and Martine Maibeche
4. Application of isotopically labeled compounds and tandem mass spectrometry for studying metabolic pathways in mosquitoes.
Stacy Mazzalupo and Patricia Y. Scaraffia
5. Field Response of Dendroctonus armandi Tsai & Li (Coleoptera: Scolytinae) to Synthetic Semiochemicals in Shaanxi, China.
Shou-An Xie, Shu-Jie L.V., Hui-Chen, Raman Chandrasekar

Section II: Insect Growth
6. Insect Cuticular Sclerotization - Hardening Mechanisms and Enzymes.
Manickam Sugumaran
7. New Approaches to Study Juvenile Hormone Biosynthesis in Insects.
Crisalejandra Rivera-Perez, Marcela Nouzova and Fernando G. Noriega
8. The regulatory biosynthetic pathway of juvenile hormone.
Zhentao Sheng and Raman Chandrasekar

Section III: Insect Immunity
9. The innate immune network in a hemimetabolous insect, the brown planthopper, Nilaparvata lugens.
Yanyuan Bao, Raman Chandrasekar, Chuan-Xi Zhang
10. Immune Pathways in Anopheles gambiae.
Maria L. Simões and Raman Chandrasekar
11. Key biochemical markers in silkworms challenged with immuno-elicitors and their association in genetic resistance for survival.
Somasundaram, P., Chandrasekar, R., Kumar, K.A., and Manjula, A.

Section IV: Insect Molecular Genetics
12. The recent progress of the W and Z chromosome studies of the silkworm, Bombyx mori
Hiroaki Abe, Tsuguru Fujii and Raman Chandrasekar
13. Molecular characterization and DNA barcoding for identification of agriculturally important insects.
Rakshit Ojha, Jalali, S.K., and Venkatesan, T.
14. Polytene chromosomes and their significance for Taxonomy, Speciation and Genotoxicology
Paraskeva V. Michailova
15. Insect exuvium extracted DNA marker: a good complementary molecular taxonomic characteristics with special reference to mosquitoes.
Dhanenjeyan, K.J., Paramasivam, R., Thanmozhi, V., Chandrasekar, R., and Tyagi, B.K.
Index

Volume 2

Section V: Molecular Biology of Insect Pheromones
16. Understanding the functions of sex-peptide receptors?
Orly Hanin, Ada Rafaeli
17. Current views on the function and evolution of olfactory receptors in Lepidoptera.
Arthur de Fouchier, Nicolas Montagné, Olivier Mirabeau, Emmanuelle Jacquin-Joly
18. Molecular architecture, phylogeny and biogeography of pheromone biosynthesis and reception genes / proteins in Lepidoptera.
Jian-Cheng Chang, P. Malini, R. Srinivasan

Section VI: Insect Molecular Biology
19. Application of Nanoparticles in sustainable Agriculture: Its Current Status.
Atanu Bhattacharyya, Raman Chandrasekar, Asit Kumar Chandra,
Timothy T. Epidi and Prakasham, R.S.
20. Mosquito Ribonucleotide Reductase: A Site for Control.
Daphne Q.-D. Pham, Victor H. Perez, Lissette Velasquez, Dharty Bhakta,
Erica L. Berzin, Guoli Zhou, and Joy J. Winzerling
21. Green protocol for synthesis of metal nanoparticles to control insect pests.
Murugan, K., Chandrasekar, R., Panneerselvam, C., Naresh Kumar, A.,
Madhiyazhagan, P., Mahesh Kumar, P., Jiang-Shiou Hwang, Jiang Wei
22. Aquaporins in Blood-Feeding Arthropods.
Lisa L. Drake, Hitoshi Tsujimoto, Immo A. Hansen
23. Mimetic analogs of three insect neuropeptide classes for pest management.
Ronald J. Nachman

Section VII: Insect Pest Management through Biochemical and Molecular approaches
24. Induced resistance in plants against insect pests and counter-adaptation by insect pests.
Abdul Rashid War and Hari C Sharma
25. Insect Chemical communication - an important component of novel approaches to insect pest management.
Usha Rani, P.
26. Mosquito control using biological larvicides: Current Scenario.
Subbiah Poopathi, C. Mani and R. Chandrasekar
27. Application of RNAi toward insecticide resistance management.
Fang Zhu, Yingjun Cui, Douglas B. Walsh, Laura C. Lavine

Section VIII: Insect Bioinformatics
28. Entomo-informatics: A prelude to the concepts in Bioinformatics.
Habeeb, S.K.M. and Raman Chandrasekar
29. Molecular expression and structure-function relationships of apolipophorin III in insects with special reference to innate immunity.
Bharat Bhusan Patnaik, Raman Chandrasekar, Yeon Soo Han
30. Computer-aided pesticide design: A short view
Jitrayut Jitonnom
Index
ISBN No. 978-1-63315-205-2 (USA)
First Edition: Volumes 1 and 2, October 2014
Total No. Pages: 398 + 372 = 770
Edited by Raman Chandrasekar, B.K. Tyagi, Zhong Zheng Gui and Gerald R. Reeck
Copyright Reserved
Published by International Book Mission, Academic Publisher, South India.
Printed in the K-State Union, Copy and Printing Services,
Kansas State University, Manhattan 66506, KS, USA.
This publication is intended to provide accurate and authoritative information with regard to the
subject matter, as obtained by its authors. The publisher has taken reasonable care in the
preparation of this book volume. However, the publisher and its authors shall in no event be liable for
any errors or omissions arising out of use of this information and specifically disclaim any implied
warranties of merchantability or fitness for any particular use. No part of these books may be
reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic,
mechanical, photocopying or otherwise, without the prior permission of the copyright owner.
Applications for such permission, with a statement of the purpose and extent of the reproduction,
should be addressed to the publisher (IBM, Academic Publisher).
Price: US $ 250
Distributors and Subscription:
International Book Mission

Academic Publisher,
Nachiyar Silk and Printing House
76 Circuit House Road, 10th Cross NMK Colony
Tiruchirappalli 620 020, Tamilnadu,
South India.
Email: ibm_secretary@yahoo.com
ibmpublisher@gmail.com
ibmpublisher@yahoo.com
Tel. +91-431-2311187
Tel. +1-859-608-7694 (USA)
Book Mission Project # 2: Initiated on June 2010; Completed on March 2014 and Published on Oct. 2014.
Volumes 1 & 2, October 2014
Short Views on Insect Biochemistry and Molecular Biology
PREFACE
Entomology, as a science of interdependent branches such as biochemistry, molecular entomology and
insect biotechnology, has made rapid progress in the light of modern discoveries. This also
implies that there is an urgent need to manage the available resources scientifically for the good of
mankind. In the past five decades, entomology has taken giant steps ahead worldwide. Continued
research has evolved better pest management through molecular approaches. The aim of the Short
Views on Insect Biochemistry and Molecular Biology book is to integrate perspectives across
biochemistry and molecular biology, physiology, immunology, molecular evolution, genetics,
developmental biology and reproduction of insects. This century is proclaimed as the Era of
Biotechnology, which comprises all types of molecular biology applications and is an essential
component of a thorough understanding of insect biology. Volumes 1 and 2 (8 sections with 30
chapters) establish a thorough understanding of the physiological and biochemical functions of
proteins and genes in insect life processes. The topics dealt with in the individual chapters include
the chemistry of the insect cuticle, hormones and growth regulators; biochemical defenses of insects;
the biochemistry of toxic and detoxification action; modern molecular genetics and evolution; inter-
and intra-specific chemical communication and behavior; insect pheromones, their molecular
architecture and phylogeny, and chemical control of insects using pheromone biotechnology; modern
insect biology and novel plant chemical and microbial insecticides for insect control, followed by a
discussion of the various mechanisms of resistance (both behavioral and physiological) and resistance
management; modern insect pest management through biochemical and molecular approaches; mimetic
analogs of insect neuropeptides for pest management; and entomo-informatics and computer-aided
pesticide design. In short, this book provides comprehensive reviews of recent research from various
geographic areas around the world, and the contributing authors are recognized experts (leading
entomologists/scientists) in their respective fields of molecular entomology. We will miss this
collaboration now that it has ended, but will feel rewarded if this book is appreciated by our
colleagues and becomes a remarkable milestone in the field of entomology.
This book emphasizes the need for and relevance of studying molecular aspects of entomology in
universities, agricultural universities and other centers of molecular research. To gather this
knowledge and, in particular, to disseminate it to the scientific community free of cost was the major
inspiring force behind the launch of Short Views on Insect Biochemistry and Molecular Biology.
Editors
Raman Chandrasekar
Brij Kishore Tyagi
Short Views on
Insect Biochemistry and
Molecular Biology
Edited by
Raman Chandrasekar, Ph.D.,
Kansas State University, USA.
B.K.Tyagi, Ph.D.,
Centre for Research in Medical Entomology (ICMR), India.
Zhong Zheng Gui, Ph.D.,
Jiangsu University of Science and Technology,
Sericultural Research Institute, Chinese Academy of
Agricultural Sciences, China.
Gerald R. Reeck, Ph.D.,
Kansas State University, USA.
Contributing Authors

Dr. B.K. Tyagi
Centre for Research in Medical Entomology,
4 Sarojini Street, Chinna Chokkikulam,
Madurai 625002 (TN), India.

Prof. Fernando G. Noriega
Department of Biological Sciences,
HLS 227, Florida International University,
11200 SW 8th St, Miami, FL 33199, USA.

Prof. Gui Zhongzheng
Sericultural Research Institute,
Chinese Academy of Agricultural Sciences,
Zhenjiang 212018, Jiangsu, P. R. China.

Dr. Zhentao Sheng
Chicago University, Chicago, USA.

Prof. K. Sahayaraj
Dept. of Advanced Zoology and Biotechnology,
St. Xavier's College,
Palayamkottai 627 002, Tamil Nadu, India.

Prof. Yanyuan Bao
Institute of Insect Science,
Zhejiang University, China.

Prof. Chuan-Xi Zhang
Institute of Insect Science,
Zhejiang University, China.

Prof. David Siaussat
Université Pierre et Marie Curie (Paris 6/UPMC),
UMR 1272A Physiologie de l'Insecte:
Signalisation et Communication (PISC),
7 Quai Saint Bernard, Bâtiment A - 4ème étage, bureau 410, 75252 Paris Cedex 05, France.

Dr. Maria L. Simões
UEI Parasitologia Médica,
Centro de Malária e Outras Doenças Tropicais,
Instituto de Higiene e Medicina Tropical,
Rua da Junqueira 96, 1300 Lisboa, Portugal.

Prof. Patricia Y. Scaraffia
Department of Tropical Medicine,
Tulane University, New Orleans,
LA 70112, USA.

Prof. Shou-An Xie
College of Forestry,
Northwest A & F University,
Yangling, Shaanxi 712100, China.

Dr. P. Somasundaram
Central Sericultural Germplasm Resources Centre,
P.B. No. 44, Thally Road,
Hosur 635109, Tamilnadu, India.

Dr. Hiroaki Abe
Tokyo University of Agriculture and Technology, Japan.

Dr. Raman Chandrasekar
Department of Biochemistry and Molecular Biophysics,
Kansas State University,
Manhattan, 66506, KS, USA.

Dr. S.K. Jalali
National Bureau of Agriculturally Important Insects, ICAR, India.

Prof. Gerald R. Reeck
Department of Biochemistry and Molecular Biophysics,
Kansas State University, KS, USA.

Prof. Manickam Sugumaran
Department of Biology,
University of Massachusetts Boston,
100 Morrissey Blvd,
Boston, MA 02125, USA.

Prof. Paraskeva V. Michailova
Institute of Biodiversity and Ecosystem Research,
Bulgarian Academy of Sciences,
1 Tzar Osvoboditel Blvd,
Sofia 1000, Bulgaria.

Prof. Ada Rafaeli
Associate Director for Academic Affairs & International Cooperation,
Agricultural Research Organization,
The Volcani Center, P. O. Box 6,
Bet Dagan 50250, Israel.

Prof. Emmanuelle Jacquin-Joly
UMR PISC Physiologie de l'insecte,
INRA, Route de Saint-Cyr,
78026 Versailles cedex, France.

Dr. Fei Liu
Department of Biological Science & Technol.,
Shaanxi Xueqian Normal University,
Shaanxi, China.

Dr. R. Srinivasan
Entomologist and Head of Entomology Group,
AVRDC-The World Vegetable Center,
60 Yi Ming Liao, Shanhua,
Tainan 74151, Taiwan.

Prof. Marian Goldsmith
Biological Sciences Department,
University of Rhode Island,
Kingston, RI 02881, USA.

Prof. Atanu Bhattacharyya
Vidyasagar College for Women,
Post Graduate Department of Environmental Science,
University of Kolkata, India.

Prof. Anthony Ejiofor
Department of Biological Sciences,
College of Agriculture, Human & Natural Sciences,
Tennessee State University,
3500 John A Merritt Blvd., Nashville,
Tennessee 37209, USA.

Prof. Daphne Q.-D. Pham
Dept. of Biological Sciences,
University of Wisconsin-Parkside,
900 Wood Road, Kenosha,
WI 53144, USA.

Dr. Bharath Bhusan Patnaik
School of Biotechnology,
Trident Academy of Creative Technology (TACT),
Bhubaneswar 751013, Odisha, India.

Prof. Jitrayut Jitonnom
School of Science,
University of Phayao, Thailand.

Prof. B.R. Pittendrigh
Department of Entomology,
University of Illinois, Urbana-Champaign, IL 61801, USA.

Prof. K. Murugan
Department of Zoology, School of Life Sciences,
Bharathiar University,
Coimbatore - 641 046, India.

Dr. Subbiah Poopathi
Unit of Microbiology and Immunology,
Vector Control Research Centre
(Indian Council of Medical Research),
Medical complex, Indira Nagar,
Puducherry 605006, India.

Prof. Immo A. Hansen
Department of Biology,
New Mexico State University,
Las Cruces, NM, USA.

Dr. Ronald J. Nachman
USDA-ARS,
Food Animal Protection Research Laboratory,
USA.

Dr. Hari C Sharma
International Crops Research Institute for the
Semi-Arid Tropics (ICRISAT), Patancheru 502324,
Andhra Pradesh, India.

Prof. Paolo Pelsoi
State Key Laboratory for Biology of Plant Diseases
and Insect Pests, Institute of Plant Protection,
Chinese Academy of Agricultural Sciences,
Beijing, China.

Dr. P. Usha Rani
Biology and Biotechnology Division,
Indian Institute of Chemical Technology
(CSIR), Tarnaka,
Hyderabad - 500 007 (AP), India.

Dr. Fang Zhu
Irrigated Agriculture Research and Extension
Center, Dept. of Entomology,
Washington State University,
Prosser, WA, USA.

Prof. S.K.M. Habeeb
Department of Bioinformatics,
Faculty of Engineering & Technology,
SRM University, Kattankulathur,
Chennai 603203, Tamilnadu, India.

Prof. Yeon Soo Han
Division of Plant Biotechnology,
College of Agriculture & Life Science,
Chonnam National University,
Gwangju 500-757, South Korea.
Reviewers & External Supportive Members
Prof. Michael Riehle, Department of Entomology, University of Arizona, USA.
Dr. Dawn L. Geiser, College of Agriculture and Life Sciences, University of Arizona, USA.
Prof. Young Jung Kwon, School of Applied Biosci., Kyungpook National University, South Korea.
Dr. Kaliappandar Nellaiappan, CuriRx Inc. USA.
Prof. Patricia Y. Scaraffia, Department of Tropical Medicine, Tulane University, USA.
Prof. Richard Newcomb, Plant & Food Research, University of Auckland, New Zealand.
Dr. S. Krishnaswamy, School of Biotechnology, Madurai Kamaraj University, South India.
Dr. Mary-Anne Hartley, University of Lausanne, Switzerland.
Dr. Igor F. Zhimulev, Institute of Molecular and Cellular Biology, Novosibirsk, Russia.
Dr. S. Subramanin, Indian Agricultural Research Institute. India.
Prof. Gustavo F. Martins, Departamento de Biologia Geral, Universidade Federal de Viçosa, Brazil.
Prof. Helena Janols, Infektionskliniken, Skånes Universitetssjukhus, Sweden.
Prof. Donald R. Barnard, USDA, Agricultural Research Service, CMAVE, USA.
Dr. Keith White, Faculty of Life Science, University of Manchester, UK.
Prof. Marten J.Edwards, Biology Department, Muhlenberg College, USA.
Prof. E. Warchalowska-Sliwa, Polish Academy of Sciences, Poland.
Dr. K. Balakrishnan, Department of Immunology, Madurai Kamaraj University, India.
Dr. J. Joe Hull, USDA-ARS, Arid Land Agricultural Research Centre, USA.
Dr. Neil Audsley, The Food & Environment Research Agency, UK.
Dr. Raman Chandrasekar, Kansas State University, USA.
Dr. B.K. Tyagi, Centre for Research in Medical Entomology (ICMR), Madurai, TN, India.
Prof. Zhongzheng Gui, Sericulture Research Institute, Chinese Academy of Agricultural Sci., China.
Dr. Fang Zhu, Irrigated Agril. Research and Extension Center, Washington State University, USA.
Prof. K. Murugan, Department of Zoology, Bharathiar University, Coimbatore, India.
Dr. Xiao-Wei Wang, Institute of Insect Science, Zhejiang University, China.
Dr. Haijun Xu, Institute of Insect Science, Zhejiang University, China.
Dr. Alisha Anderson, CSIRO Ecosystem Sciences, Australia.
Prof. Eric D.Dodds, Department of Chemistry, University of Nebraska-Lincoln, USA.
Prof. P. Mosae Selvakumar, Department of Chemistry, Karunya University, Coimbatore, India.
Prof. A.K. Dikshit, Indian Agriculture Research Institute, New Delhi.
Prof. K.R.S. Sambasiva Rao, Dept. of Biotech. & Zoology, Acharya Nagarjuna University, India.
Dr. R. Rangeshwaran, National Bureau of Agriculturally Important Insects, Bangalore, India.
Dr. V. Selvanarayanan, Faculty of Agriculture, Annamalai University, Tamil Nadu, India.
Prof. Fernando G. Noriega, Florida International University, Miami, USA.
Prof. Ada Rafaeli, Department of Food Quality and Safety, A.R.O., Israel.
Prof. Daphne Q.-D. Pham, Dept. of Biological Sciences, University of Wisconsin-Parkside, USA.
Prof. Emmanuelle Jacquin-Joly, INRA, UMR 1272 Physiologie de l'Insecte, Versailles, France.
Prof. Manickam Sugumaran, University of Massachusetts Boston, USA.
Prof. Nannan Liu, Auburn University, USA.
Prof. Michihiro Kobayashi, Nagoya University, Japan.
Prof. Enoch Y. Park, Innovative Joint Research Center, Shizuoka University, Japan.
Prof. Luiz Paulo Moura Andrioli, Universidade de São Paulo, SP, Brazil.
Prof. Toru Shimada, The University of Tokyo, Japan.
Prof. Erjun Ling, Institute of Plant Physiology and Ecology, China.
Acknowledgements
Writing and publishing a book requires the assistance of individuals who are
creative, talented, and hard-working. All of these qualities were present in the
individuals assembled to produce this book volume. I would like to express my
heartfelt gratitude to my former teacher Prof. Seo Sook Jae (GSNU, South Korea),
Prof. Subba Reddy Palli (University of Kentucky, USA), and other external mentors:
Prof. Marian R. Goldsmith (University of Rhode Island, USA), Prof. Enoch Y. Park
(Shizuoka University, Japan), Prof. M. Kobayashi (Nagoya University, Japan), Prof.
CHU Jang Hann (National University of Singapore, Singapore), Prof. Thomas W.
Sappington (USDA-ARS, USA), Prof. Fernando G. Noriega (Florida International
University, USA), Dr. Srinivasan Ramasamy (AVRDC, The World Vegetable
Center, Taiwan) and Dr. H.C. Sharma (ICRISAT, India), who inspired and
supported me in many ways in the commencement of this International Book
Mission Program. The book mission program was initiated in May 2010,
completed in March 2014 and published in October 2014. I have no words to
express my feelings for all those who provided valuable contributions from the USA,
South Korea, Japan, China, India, Thailand, Taiwan, Bulgaria, France, Israel, and
Portugal (see the list of contributing authors) and made the completion of this
book possible. We express our appreciation to the people (see the list of
reviewers) who reviewed various parts of the manuscript as it was
being developed and improved the quality of each chapter. I thank the ICMR, New
Delhi, the Chinese Academy of Agricultural Sciences, China, and Kansas State
University for support in several respects. Many others (scientists and publishers)
have also allowed us to use their materials in the various chapters; their color
images have been converted to grayscale/black and white. I am especially indebted
to the International Book Mission Organization, Academic Publishing Services, for
the production of the book. I thank my Co-Editors for their continuous vigilance
over the book project and for always giving advance notice of the editing and
proofreading schedules. I also thank my wife, Brintha, P.G., whose encouragement
in every possible way helped transform our original efforts into an acceptable
final form. I apologize to those whose work could not be cited owing to space
limitations. Further, I wish to recognize the moral support extended by colleagues
and friends. I hope that this volume will inspire interest in the diverse aspects
of insect biochemistry and molecular biology in aspiring and established scientists.
Raman Chandrasekar
A Note from the Publisher

Dear Readers,
This edition represents the first number of the Short Views on Insect
Biochemistry and Molecular Biology book series published by International
Book Mission. It serves to show the public how important the field of entomology
is nowadays in expanding basic knowledge and in developing new technologies
in virtually all fields of knowledge. We called for contributions falling into two
volumes (Basic and Advanced aspects).
Far from being complete, the 30 clearly structured and simply explained chapters
contributed by experts may provide an overview of current and prominent
advances in insect biochemistry and molecular biology, which will help students and
researchers to broaden their knowledge and to gain an understanding of both the
challenges and the opportunities behind each approach.
We look forward to receiving new proposals for the new edition 2015 - 2017.
International Book Mission
Academic Publisher
Manager