
Biobytes Vol 5, July - 2009

Mycobacterium tuberculosis systems biology data in R


Srinivasan Ramachandran*, Amit Katiyar, Amit Sinha, Anshu Bharadwaj, Anirban Dutta,
Ayush Raman, Archana Pan, Balwant Kishen Malik, Balvinder Singh, Beena Pillai, Bharati
Dutta, Bhanot Priyamwada Sinha, Bhupesh Taneja, Chhabinath Mandal, Charu Kapil
Richa, Chitra Dutta, Debasis Dash, Debaprasad Mukherjee, Debdas Paul, Debojyoti
Chakraborty, Faraz Alam Ansari, Gajinder Pal Singh, Gajendra Pal Singh Raghava, Gargi
Guhathakurtha, Imran Siddiqui, Manish Kumar, Manoj Hariharan, Mekapati Bala
Subramanyam, Monika Joon, Mridula Bose, Mudgal Haymanti, Muthiah Gnanamani,
Muthukurussi Varieth Raghunandanan, Nanda Ghoshal, Nitin Kumar Singh, Pallavi
Sarmah, Ramaswamy Suyambu Kesava Vijayan, Rajni Verma, Rakesh Sharma, Ravishankar
Ramachandran, Rupanjali Chaudhuri, Sabyasachi Das, Samir Kumar Brahmachari, Sandip
Paul, Sanjib Chatterjee, Savita Bhutoria, Shantanu Chowdhury, Simone Gupta, Souvik Maiti,
Subhagata Ghosh, Suchir Arora, Sudipto Saha, Sumit K. Bag, Sumit Deb, Vani Brahmachari,
Vanika Gupta, Vikram Kumar, Vinod Scaria, Yasha Bhasin, Yogendra Singh, [OSDD
Consortium]
*G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology,
Mall Road, Delhi 110 007, India
*Email: ramuigib@gmail.com

Abstract
M. tuberculosis is a dreaded pathogen that causes the respiratory disease tuberculosis, with high mortality (approximately one death every 1.5 minutes in India alone). Susceptibility to M. tuberculosis infection is extremely high, and the circulation of drug-resistant strains aggravates the problem severalfold. While considerable effort is focused on identifying drug targets and new vaccine candidates, novel strategies and new molecules are required to keep combating drug resistance. In this scenario, it is very important to carry out systems modeling in order to simulate the dynamics of molecular transformations, or to select potential drug targets using an integrative approach. The OSDD Consortium has been contributing towards this aim by dividing the drug discovery pipeline into Work Packages, which in turn are implemented as Projects managed online by Project Managers. To enable data analysis through integrative querying, we have packaged the available data into the R environment. R is open source, comes with many functions, and can be readily interfaced with Bioconductor. Where kinetic data are available, they can be used to carry out simulations with packages from the R repository.
Availability
The M. tuberculosis SysBorg in R is available to download from
http://sysborgtb.osdd.net/bin/view/OpenLabNotebook/SysBorgInR
Introduction
Genomics heralded a radical transformation of the biological sciences through massive data collection, and systems biology is now envisioned as a field that elevates our capability to model integrated data and to predict outcomes that can be tested in appropriately designed experiments. Even if rigorous modeling is not readily accessible to all biologists, integrative approaches can still be applied to address problems at the systems level, using analytical data from different algorithms and experiments in combination. While reductionist approaches have produced immensely valuable results, it is increasingly realized that integrative approaches can provide a more holistic view of biological phenomena. In other words, we now see this transition phase as part of an upward movement of value addition, as shown in Figure 1. Kitano (2002) emphasized examining structure and dynamics at the level of the cell or the whole organism, rather than of isolated parts, as the basis for understanding biological phenomena at the systems level. A key focus has been the robustness of such systems, for example the robustness of a conserved biochemical network (Morohashi et al 2002). The first standard proposed by the systems biology community for the development of models was released as the Systems Biology Markup Language (SBML) (Hucka et al 2003). This standard was developed to facilitate the sharing, evaluation and cooperative development of models.

Figure 1. The Knowledge Elevation Path of systems biology for biologists in the post-genomics era.
The systems biology data in R are at the second stage of this knowledge elevation path.

A few attempts have been made towards systems modeling in tuberculosis. These include the analysis of drug and stress response (Cabusora et al 2005), flux balance analysis of the mycolic acid pathway (Raman et al 2005), kinetic modeling of the tricarboxylic acid cycle and glyoxylate bypass (Singh and Ghosh 2006), an M. tuberculosis metabolic network model (Beste et al 2007), the regulatory network during growth arrest (Balázsi et al 2008), target identification through network analysis (Raman et al 2008), and the origin of drug resistance through interactome analysis (Raman and Chandra 2008). The main purpose of these investigations is to identify drug targets, and by including multiple parameters these authors present in-depth analytical methods for drug target identification. These approaches include comparative analysis of generic stress response versus specific drug response, flux balance analysis of the mycolic acid pathway to identify critical points as drug targets, and more exhaustive stepwise network analysis for the identification of drug targets. Identified drug targets can also be assessed by probing their systemic effects with kinetic modeling, or by investigating the causes of drug resistance through interactome analysis. In all these studies, data were sourced from different sites, which is the usual procedure when setting up such analysis frameworks. Making the data available in an open source environment with many mathematical and statistical tools, such as R, can enable wide-ranging application development and analysis. Very recently, the Council of Scientific and Industrial Research (CSIR) initiated a program on Open Source Drug Discovery, a relatively new idea in the field of drug discovery. The starting base of this program is community-collected data on Mycobacterium tuberculosis, including gene sequences, expression, function, activity, response to drugs, and host-pathogen interactions (Singh 2008).
Methods
(1) Choice of R: The high-level interpreted language R is well suited to developing new computational methods. Bioconductor, an open software project for computational biology and bioinformatics developed in R, is widely appreciated and currently has many users (Gentleman et al 2004). Since then, several computational biology packages have been developed in the R language, and very recently even chemistry packages have appeared (Cao et al 2008). The advantage of developing computational packages in R is that one can carry out the analysis locally and also build further tools and scripts. This facilitates both the development of new applications and the extension of existing ones. In addition, complex tasks can be performed with simple scripts. Another major advantage of preparing datasets and computational biology tools in R is that a large set of statistical and mathematical tools can be applied to the datasets for analysis. Furthermore, R is open source, distributed under the GNU General Public License, which allows wider future development and customization. A core group takes responsibility for maintaining R, which helps ensure the long-term availability of this platform.
(2) Collection of datasets: A large consortium of scientists and students invested their expertise and time to systematically collect curated data, both from the literature and by applying bioinformatics analysis. This consortium, called the M. tuberculosis SysBorg (Systems Biology of Organisms) consortium, was formed through a coordinated effort of the Institute of Genomics and Integrative Biology. Scientists and students participated in thematic work BLOCKS, namely ANNOTATION, DRUGS ACTIVITY, GENE EXPRESSION, HOST-PATHOGEN RELATIONSHIPS, STRAIN POLYMORPHISMS, and PATHWAYS. A seventh BLOCK, called ADMINISTRATION, was responsible for managing the project. The equivalent data packs in R are annot, drugs, geneexpress, hostpatho, strainpoly and pathways. Very recently, a new BLOCK named others has been added to receive new collections. Data folders were prepared using these names. Structured data were prepared both as *.csv files and as R image data files. The *.csv files are stored in the base folders, whereas the R image data files are stored in a subfolder named R_image. All the data objects within a BLOCK can be accessed instantly by loading its R image data files with the load command. Similarly, multiple R image data files can be loaded as desired within seconds. Additional data were sourced from the Open Source Drug Discovery Consortium through the following link:
http://sysborg.osdd.net
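The loading step described above can be scripted. The following is a minimal sketch, not code shipped with SysBorg: load_block() is a hypothetical helper, and the toy folder annot/R_image/annot.RData, its CSV content and the .RData/.rda extensions are assumptions standing in for the real downloaded files.

```r
# Sketch: load every R image file found in a BLOCK's R_image subfolder.
# ASSUMPTION: folder and file names are hypothetical placeholders; check the
# downloaded SysBorg folders for the real names.
load_block <- function(block_dir, envir = globalenv()) {
  image_dir <- file.path(block_dir, "R_image")
  files <- list.files(image_dir, pattern = "\\.(RData|rda)$",
                      full.names = TRUE, ignore.case = TRUE)
  loaded <- character(0)
  for (f in files) {
    # load() invisibly returns the names of the objects it restored
    loaded <- c(loaded, load(f, envir = envir))
  }
  loaded
}

# Self-contained demo: a toy "annot" BLOCK with one CSV file and one image file.
block_dir <- file.path(tempdir(), "annot")
dir.create(file.path(block_dir, "R_image"), recursive = TRUE, showWarnings = FALSE)
ORFFunctions <- data.frame(ORFid = c("Rv0001", "Rv0002"),
                           Function = c("placeholder 1", "placeholder 2"),
                           stringsAsFactors = FALSE)
write.csv(ORFFunctions, file.path(block_dir, "ORFFunctions.csv"), row.names = FALSE)
save(ORFFunctions, file = file.path(block_dir, "R_image", "annot.RData"))
rm(ORFFunctions)

demo_env <- new.env()
objs <- load_block(block_dir, envir = demo_env)
print(objs)
```

Because load() returns the names of the restored objects, the helper's return value doubles as a quick inventory of what each BLOCK contributed.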
Features
(1) Getting to know the contents: A quick overview of the available data objects can be obtained with the command ls(), which lists all data objects currently available from the loaded image data files. Each data object can then be explored step by step. The command names(dataobjectname) displays the headers (column names) of the data table, and dim(dataobjectname) displays its number of rows and columns. This information helps users plan their subsequent steps. Much of the further work depends on knowing the data types in the rows and columns of a data object. For example, a field may hold characters such as an ORFid (Rv no.), a Boolean-style answer such as Yes or No, or numbers such as normalized gene expression values. The command dataobjectname[rowno., columnno.] displays the data in that cell, from which one can note whether the data type is character or numeric. During this process, users will also get a glimpse of how to prepare complex queries, for example scripts that carry out filtering searches meeting conditional criteria.
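A short R session illustrating these exploration commands on a toy table is sketched below. GeneTable and its three columns are hypothetical stand-ins for a SysBorg data object. The stringsAsFactors = FALSE argument matters: before R 4.0.0, read.csv() converted text columns to factors by default, which is presumably why the scripts in this article wrap values in as.character().

```r
# Toy stand-in for a SysBorg data object; column names and values are hypothetical.
GeneTable <- data.frame(
  ORFid      = c("Rv0001", "Rv0002", "Rv0003"),
  Essential  = c("Yes", "No", "Yes"),
  Expression = c(1.25, -0.40, 0.88),
  stringsAsFactors = FALSE   # keep text as character, not factor
)

ls()                        # data objects in the workspace, including GeneTable
names(GeneTable)            # column headers
dim(GeneTable)              # number of rows and columns
GeneTable[2, 3]             # one cell: row 2, column 3
sapply(GeneTable, class)    # data type of every column in one call
```

sapply(..., class) is a compact alternative to inspecting single cells when checking data types.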
(2) Examples of scripts:
(i) For a simple search for the functions of a given set of Rv nos.:
x <- c("Rv----", "Rv----", "Rv----", ...)
or x can be a data object read from a file with many Rv nos. The relevant data object from SysBorg is ORFFunctions. Running the following script will give the results:
n <- length(x)   # number of entries in x
for (i in 1:nrow(ORFFunctions)) { for (j in 1:n) {   # ORFFunctions has 3513 rows
  if (as.character(ORFFunctions[i, 1]) == as.character(x[j])) print(ORFFunctions[i, ]) }}
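The double loop in example (i) can also be written as a single vectorised membership test with %in%. The sketch below assumes, as the loop does, that ORF identifiers sit in column 1; the toy table and query are placeholders rather than SysBorg data.

```r
# Toy stand-in for ORFFunctions (placeholder rows, not SysBorg data).
ORFFunctions <- data.frame(
  ORFid    = c("Rv0001", "Rv0002", "Rv0003", "Rv0004"),
  Function = c("function A", "function B", "function C", "function D"),
  stringsAsFactors = FALSE
)
x <- c("Rv0003", "Rv0001", "Rv9999")   # query; "Rv9999" is a deliberate non-match

# One vectorised membership test replaces both loops: keep the rows of
# ORFFunctions whose first column appears anywhere in x.
hits <- ORFFunctions[as.character(ORFFunctions[, 1]) %in% x, ]
print(hits)
```

Matching rows come back in table order rather than query order, and each row appears once even if x contains duplicates, whereas the loop prints a row once per matching entry of x.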
(ii) Example: do Rv----, Rv----, Rv----, ... have homologs in the human genome? The relevant data object from SysBorg is HostMimicry. Running the following script will give the results:
for (i in 1:nrow(HostMimicry)) { for (j in 1:length(x)) {   # HostMimicry has 3924 rows
  if (as.character(HostMimicry[i, 1]) == as.character(x[j])) print(HostMimicry[i, ]) }}
If there is no human homolog, the second and third columns of the output read No match and NM respectively; if there is one, they give the details of the topmost matching human protein. NOTE: these results are more informative than a simple yes or no answer to the question.
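Building on example (ii), the following sketch reduces the printout to the query ORFs that do have a human homolog. It relies only on the convention described above that column 2 reads No match when none was found; the column names and rows of the toy HostMimicry table are hypothetical.

```r
# Toy stand-in for HostMimicry; column names and values are hypothetical.
HostMimicry <- data.frame(
  ORFid      = c("Rv0001", "Rv0002", "Rv0003"),
  HumanMatch = c("No match", "placeholder human protein", "No match"),
  Detail     = c("NM", "placeholder details", "NM"),
  stringsAsFactors = FALSE
)
x <- c("Rv0001", "Rv0002")

hits <- HostMimicry[as.character(HostMimicry[, 1]) %in% x, ]
# Column 2 reads "No match" when no human homolog was found.
with_homolog <- as.character(hits[hits[, 2] != "No match", 1])
print(with_homolog)
```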
(iii) Example: is Rv--- a target of a known drug? The relevant data objects are FirstLineDrugs and SecondLineDrugs. Running the following scripts will give the results:
for (i in 1:nrow(FirstLineDrugs)) { for (j in 1:length(x)) {   # FirstLineDrugs has 8 rows
  if (as.character(FirstLineDrugs[i, 1]) == as.character(x[j])) print(FirstLineDrugs[i, ]) }}
for (i in 1:nrow(SecondLineDrugs)) { for (j in 1:length(x)) {   # SecondLineDrugs has 11 rows
  if (as.character(SecondLineDrugs[i, 1]) == as.character(x[j])) print(SecondLineDrugs[i, ]) }}
If no result is printed, none of the Rv nos. in x is a target of the drugs listed in these tables; any printed row identifies a known target.
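The two look-ups in example (iii) can be pooled into one yes/no answer per query Rv no. This sketch assumes, as the loops do, that column 1 of each drug table holds the target Rv no.; the toy tables, drug names and query are placeholders.

```r
# Toy stand-ins for the two drug tables (placeholder targets and drug names).
FirstLineDrugs  <- data.frame(Target = c("Rv0001", "Rv0005"),
                              Drug = c("drug A", "drug B"), stringsAsFactors = FALSE)
SecondLineDrugs <- data.frame(Target = "Rv0006",
                              Drug = "drug C", stringsAsFactors = FALSE)
x <- c("Rv0005", "Rv0006", "Rv0100")

# Pool the target column of both tables, then test each query once.
known_targets <- union(as.character(FirstLineDrugs[, 1]),
                       as.character(SecondLineDrugs[, 1]))
is_known <- setNames(x %in% known_targets, x)
print(is_known)
```

Unlike the loops, this reports a FALSE explicitly for queries that are not known targets, instead of relying on the absence of output.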

(iv) What are the genes in a certain pathway? The relevant data object is PathwayReaction, whose fifth column holds the pathway name. Running the following script will give the results:
for (i in 1:nrow(PathwayReaction)) {   # PathwayReaction has 952 rows
  if (as.character(PathwayReaction[i, 5]) == "Purine metabolism") print(paste(PathwayReaction[i, 1], PathwayReaction[i, 5])) }
Caution: the output may contain redundant entries, inherited from the KEGG data.
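To address the caution about redundant entries in example (iv), the gene list can be de-duplicated with unique(). In this sketch only columns 1 and 5 of the toy PathwayReaction table are meaningful; V2 to V4 are hypothetical filler columns so that the pathway name sits in column 5, as in the script above.

```r
# Toy stand-in for PathwayReaction: only columns 1 (ORFid) and 5 (pathway)
# matter here; V2-V4 are placeholder filler columns.
PathwayReaction <- data.frame(
  ORFid   = c("Rv0001", "Rv0002", "Rv0002", "Rv0003"),
  V2 = NA, V3 = NA, V4 = NA,
  Pathway = c("Purine metabolism", "Purine metabolism",
              "Purine metabolism", "Pyrimidine metabolism"),
  stringsAsFactors = FALSE
)

in_pathway <- as.character(PathwayReaction[, 5]) == "Purine metabolism"
genes <- unique(as.character(PathwayReaction[in_pathway, 1]))   # drop repeats
print(genes)
```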

(v) To find Rv nos. that are essential and not polymorphic, select the essential genes and leave out the known polymorphic ones by running the following script:
x <- union(SNPintragenic[, 1], SNPintergenic[, 1])   # Rv nos. with known SNPs
x <- union(x, InDelintragenic[, 1])                  # add Rv nos. with known InDels
x <- union(x, InDelintergenic[, 1])
y <- HighProbabilityOfEssentialGenes[, 1]            # essential genes
z <- setdiff(y, x)                                   # essential minus known polymorphic
z
The data object z contains the required Rv nos. Caution: the results depend strongly on the known data on gene polymorphisms, so absence of data does not imply absence of polymorphism. However, genes already known to be polymorphic are certain to be excluded by this approach.
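The four successive union() calls in example (v) generalise to any number of polymorphism tables with Reduce(). The toy tables below use placeholder Rv nos.; only their first columns are used, matching the script above.

```r
# Toy stand-ins (placeholder Rv nos.) for the polymorphism and essentiality tables.
SNPintragenic   <- data.frame(ORFid = c("Rv0001", "Rv0002"))
SNPintergenic   <- data.frame(ORFid = "Rv0002")
InDelintragenic <- data.frame(ORFid = "Rv0004")
InDelintergenic <- data.frame(ORFid = character(0))
HighProbabilityOfEssentialGenes <- data.frame(ORFid = c("Rv0001", "Rv0003",
                                                        "Rv0004", "Rv0005"))

# Union of column 1 over all polymorphism tables (as.character guards against factors).
polymorphic <- Reduce(union, lapply(
  list(SNPintragenic, SNPintergenic, InDelintragenic, InDelintergenic),
  function(tab) as.character(tab[, 1])))
z <- setdiff(as.character(HighProbabilityOfEssentialGenes[, 1]), polymorphic)
print(z)
```

Adding another polymorphism table then only means adding it to the list.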

(vi) As one example of using Bioconductor, the sequence of a given Rv id can be obtained as follows:
library(annotate)   # getSEQ() is provided by the Bioconductor package annotate
getSEQ(as.numeric(Rv2GI[1, 2]))   # column 2 of Rv2GI holds the GI number
The output is the sequence. Extending this further, the following script retrieves the sequences of multiple Rv nos. using Bioconductor functions; for example, one could combine the queries above with this script to obtain sequence data as well.
for (i in 1:nrow(Rv2GI)) { for (j in 1:length(x)) {   # Rv2GI has 3989 rows
  if (as.character(Rv2GI[i, 1]) == as.character(x[j])) print(getSEQ(as.numeric(Rv2GI[i, 2]))) }}
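For many Rv nos., it may be cleaner to map them to GI numbers once with match() and then fetch each sequence, instead of scanning Rv2GI inside a double loop. In this sketch fetch_sequences() is a hypothetical helper and the fetcher is passed in as an argument: with network access one could pass getSEQ as used above, assuming it returns one character string per GI, while a stub keeps the example runnable offline. The GI numbers are placeholders.

```r
# Map Rv nos. to GI numbers once, then fetch each sequence with a
# user-supplied function (a stub here; getSEQ when online).
fetch_sequences <- function(x, Rv2GI, fetch) {
  rows <- match(x, as.character(Rv2GI[, 1]))        # NA where an Rv no. is absent
  gis  <- as.numeric(as.character(Rv2GI[rows, 2]))
  seqs <- vapply(gis, function(g) if (is.na(g)) NA_character_ else fetch(g),
                 character(1))
  names(seqs) <- x
  seqs
}

# Toy mapping with placeholder GI numbers, and a stub fetcher for offline use.
Rv2GI <- data.frame(Rv = c("Rv0001", "Rv0002"), GI = c(111, 222),
                    stringsAsFactors = FALSE)
stub_fetch <- function(gi) paste0("SEQ", gi)
seqs <- fetch_sequences(c("Rv0002", "Rv0009"), Rv2GI, stub_fetch)
print(seqs)
```

Rv nos. missing from the mapping come back as NA instead of being silently skipped.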

(vii) As an example of using KEGGgraph:
> library(KEGGgraph)
> pumetKGML <- system.file("extdata/mtu00230.xml", package = "KEGGgraph")
> pumetpathway <- parseKGML(pumetKGML)
> pumetpathway
KEGG Pathway
[ Title ]: Purine metabolism
[ Name ]: path:mtu00230
[ Organism ]: mtu
[ Number ] :00230
[ Image ] :http://www.genome.jp/kegg/pathway/mtu/mtu00230.gif
[ Link ] :http://www.genome.jp/dbget-bin/show_pathway?mtu00230
-----------------------------------------------------------
Statistics:
257 node(s)
154 edge(s)
66 reaction(s)
Note that the file mtu00230.xml must first be downloaded from the KEGG site, as described in the KEGGgraph documentation. For system.file() to find it, the downloaded file must be stored in the extdata folder of the installed KEGGgraph package (here C:\Program Files\R\R-2.8.1\library\KEGGgraph\extdata); alternatively, the path to the downloaded file can be passed to parseKGML() directly.
(viii) As a new way to approach drug targets, we may wish to find adhesins that are also involved in persistence and check whether they are consistently expressed across strains. This can be done as follows:
x <- NULL
for (i in 1:nrow(SurfaceAdhesion)) {   # SurfaceAdhesion has 3997 rows
  if (as.numeric(as.character(SurfaceAdhesion[i, 2])) >= 0.7) x <- c(x, as.character(SurfaceAdhesion[i, 1])) }
z <- intersect(x, as.character(MtbPersistance[, 1]))   # adhesins also involved in persistence
w <- NULL
for (i in 1:nrow(MtbStrainWiseExpressionZScores)) { for (j in seq_along(z)) {   # 4686 rows
  if (as.character(MtbStrainWiseExpressionZScores[i, 1]) == z[j]) w <- rbind(w, MtbStrainWiseExpressionZScores[i, ]) }}
The object w then holds the strain-wise expression z-scores of these genes.
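The loops in example (viii) can be replaced by vectorised subsetting. The sketch keeps the 0.7 cut-off from the script and assumes, as it does, that the score is in column 2 and the ORF identifier in column 1 of each table; all values and the StrainA/StrainB column names are placeholders.

```r
# Toy stand-ins; scores, z-scores and column names are placeholders.
SurfaceAdhesion <- data.frame(ORFid = c("Rv0001", "Rv0002", "Rv0003"),
                              Score = c(0.91, 0.42, 0.75), stringsAsFactors = FALSE)
MtbPersistance <- data.frame(ORFid = c("Rv0003", "Rv0004"), stringsAsFactors = FALSE)
MtbStrainWiseExpressionZScores <- data.frame(
  ORFid   = c("Rv0001", "Rv0003", "Rv0004"),
  StrainA = c(0.5, 1.8, -0.2),
  StrainB = c(0.1, 1.6, 0.3),
  stringsAsFactors = FALSE
)

score    <- as.numeric(as.character(SurfaceAdhesion[, 2]))
adhesins <- as.character(SurfaceAdhesion[which(score >= 0.7), 1])   # cut-off 0.7
z <- intersect(adhesins, as.character(MtbPersistance[, 1]))         # also in persistence
w <- MtbStrainWiseExpressionZScores[
  as.character(MtbStrainWiseExpressionZScores[, 1]) %in% z, ]        # their z-scores
print(w)
```

which() is used so that rows with a missing score are skipped rather than producing NA rows.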

(ix) Can we get the ORFids that belong to one pathway, have a high drug target rank, and have known interactions with the human host? Taking the glycine, serine metabolism pathway as an example, and assuming that a rank of 100 or more is high, we have:
x <- grep("^glycine,serine", PathwayReaction[, 5], ignore.case = TRUE)   # rows in the pathway
y <- as.character(PathwayReaction[x, 1])                                 # their ORFids
z <- NULL
for (i in 1:nrow(DrugTargetRank)) {   # DrugTargetRank has 3927 rows
  if (as.numeric(as.character(DrugTargetRank[i, 2])) >= 100) z <- c(z, as.character(DrugTargetRank[i, 1])) }
temp <- intersect(z, y)
result <- intersect(temp, as.character(HumanMtbProteinInteractions[, 1]))
result
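Example (ix) can be wrapped in a reusable function so that the pathway pattern and rank threshold become arguments. rank_pathway_targets() is a hypothetical helper and the toy tables are placeholders; column positions follow the script (ORFid in column 1, pathway name in column 5, rank in column 2).

```r
# Hypothetical helper combining the three filters of example (ix).
rank_pathway_targets <- function(pattern, min_rank, PathwayReaction,
                                 DrugTargetRank, HumanMtbProteinInteractions) {
  rows <- grep(pattern, PathwayReaction[, 5], ignore.case = TRUE)
  in_pathway <- unique(as.character(PathwayReaction[rows, 1]))
  ranks <- as.numeric(as.character(DrugTargetRank[, 2]))
  high_rank <- as.character(DrugTargetRank[!is.na(ranks) & ranks >= min_rank, 1])
  Reduce(intersect, list(in_pathway, high_rank,
                         as.character(HumanMtbProteinInteractions[, 1])))
}

# Toy tables with placeholder values; V2-V4 are filler columns.
PathwayReaction <- data.frame(ORFid = c("Rv0001", "Rv0002", "Rv0003"),
                              V2 = NA, V3 = NA, V4 = NA,
                              Pathway = c("Glycine,serine metabolism",
                                          "glycine,serine metabolism",
                                          "Purine metabolism"),
                              stringsAsFactors = FALSE)
DrugTargetRank <- data.frame(ORFid = c("Rv0001", "Rv0002", "Rv0003"),
                             Rank = c(150, 40, 300), stringsAsFactors = FALSE)
HumanMtbProteinInteractions <- data.frame(ORFid = c("Rv0001", "Rv0003"),
                                          stringsAsFactors = FALSE)

result <- rank_pathway_targets("^glycine,serine", 100, PathwayReaction,
                               DrugTargetRank, HumanMtbProteinInteractions)
print(result)
```

Unparseable ranks become NA and are excluded explicitly, so they cannot break the comparison.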

Conclusion
At present, different types of data on M. tuberculosis and tuberculosis-related topics have to be sourced from different sites. In addition, these data have to be organized into proper formats before they can be analyzed further with other algorithms and software. M. tuberculosis SysBorg in R aims to bridge these gaps. However, it has some shortcomings, in that not all data tables may have been packaged, owing either to restricted access or to data distribution control issues; the data provided at present are the publicly available data. Users can, however, package their own data or other publicly available data into their own package. In this sense, the platform truly adheres to the basic tenets of open source. Relevant data for modeling can also be extracted with simple commands and fed as input into programs such as CellDesigner for simulation work. Alternatively, the deSolve package in R can be used to solve the corresponding differential equations, which should give equivalent results. As shown above, functions from Bioconductor or KEGGgraph can also be used to obtain additional information or to carry out further analysis. As more analysis packages appear in R, they can be integrated with M. tuberculosis SysBorg, further enhancing its capabilities. We as authors envisage many more developments by bright young minds, contributing effectively to tuberculosis research.
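As an illustration of the simulation route mentioned above, the sketch integrates a single Michaelis-Menten reaction S -> P. To keep it runnable without extra packages, it uses a hand-written fixed-step fourth-order Runge-Kutta (RK4) integrator in base R as a stand-in for deSolve's ode(); the rate law and the parameter values Vmax = 1 and Km = 2 are illustrative assumptions, not SysBorg kinetic data. The model function follows the func(t, y, parms) convention that ode() expects, so it could be passed to ode() directly.

```r
# Michaelis-Menten conversion S -> P; parameter values are illustrative only.
mm_model <- function(t, y, parms) {
  rate <- parms[["Vmax"]] * y[["S"]] / (parms[["Km"]] + y[["S"]])
  list(c(S = -rate, P = rate))   # list with derivatives first, as deSolve expects
}

# Classic fixed-step RK4; returns a matrix with one row per time point.
rk4 <- function(y0, times, func, parms) {
  out <- matrix(NA_real_, nrow = length(times), ncol = length(y0) + 1,
                dimnames = list(NULL, c("time", names(y0))))
  y <- y0
  out[1, ] <- c(times[1], y)
  for (i in seq_along(times)[-1]) {
    h  <- times[i] - times[i - 1]
    t0 <- times[i - 1]
    k1 <- func(t0,         y,              parms)[[1]]
    k2 <- func(t0 + h / 2, y + h / 2 * k1, parms)[[1]]
    k3 <- func(t0 + h / 2, y + h / 2 * k2, parms)[[1]]
    k4 <- func(t0 + h,     y + h * k3,     parms)[[1]]
    y  <- y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i, ] <- c(times[i], y)
  }
  out
}

y0    <- c(S = 10, P = 0)
parms <- c(Vmax = 1, Km = 2)
times <- seq(0, 20, by = 0.01)
sim   <- rk4(y0, times, mm_model, parms)
tail(sim, 1)

# With deSolve installed, the equivalent call would be:
# library(deSolve); sim <- ode(y = y0, times = times, func = mm_model, parms = parms)
```

For this rate law the analytic relation Km ln(S0/S) + (S0 - S) = Vmax t gives a convenient check on the numerical result.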


Availability
The M. tuberculosis SysBorg in R is available to download from
http://sysborgtb.osdd.net/bin/view/OpenLabNotebook/SysBorgInR
We encourage students to contribute enhancements to this first release back to the community.
Authors' Contributions
AB, BKM, BS, BP, BT, CM, CD, DD, GPSR, IS, MB, MVR, NG, RS, RR, SKB, SC, SM, VB, VS, YS were advisors, and the remaining authors, who are students, collected the data. Data curation was done by all.
Acknowledgements
SR thanks Prof. S.K. Brahmachari for the opportunity to develop this platform, Mr. Zakir Thomas for guidance on open source, Dr. Andrew M. Lynn for constant encouragement, CSIR for research grants under the task force "In silico biology for drug target identification" (CMM0017) and Open Source Drug Discovery (IAP0008), DBT for the National Bioscience Award Grant, and all colleagues who have contributed to this platform.
References:
1. Balázsi G, Heath AP, Shi L, Gennaro ML. 2008. The temporal response of the Mycobacterium tuberculosis gene regulatory network during growth arrest. Mol Syst Biol 4: 225.
2. Beste DJ, Hooper T, Stewart G, Bonde B, Avignone-Rossa C, Bushell ME, Wheeler P, Klamt S, Kierzek AM, McFadden J. 2007. GSMN-TB: a web-based genome-scale network model of Mycobacterium tuberculosis metabolism. Genome Biol 8(5): R89.
3. Cabusora L, Sutton E, Fulmer A, Forst CV. 2005. Differential network expression during drug and stress response. Bioinformatics 21(12): 2898-2905.
4. Cao Y, Charisi A, Cheng LC, Jiang T, Girke T. 2008. ChemmineR: a compound mining framework for R. Bioinformatics 24(15): 1733-1734.
5. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10): R80.
6. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novère N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J; SBML Forum. 2003. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4): 524-531.
7. Kitano H. 2002. Systems biology: a brief overview. Science 295(5560): 1662-1664.
8. Morohashi M, Winn AE, Borisuk MT, Bolouri H, Doyle J, Kitano H. 2002. Robustness as a measure of plausibility in models of biochemical networks. J Theor Biol 216(1): 19-30.
9. Raman K, Rajagopalan P, Chandra N. 2005. Flux balance analysis of mycolic acid pathway: targets for anti-tubercular drugs. PLoS Comput Biol 1: e46.
10. Raman K, Yeturu K, Chandra N. 2008. targetTB: a target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis. BMC Syst Biol 2: 109.
11. Raman K, Chandra N. 2008. Mycobacterium tuberculosis interactome analysis unravels potential pathways to drug resistance. BMC Microbiol 8: 234.
12. Singh VK, Ghosh I. 2006. Kinetic modeling of tricarboxylic acid cycle and glyoxylate bypass in Mycobacterium tuberculosis, and its application to assessment of drug targets. Theor Biol Med Model 3: 27.
13. Singh S. 2008. India takes an open source approach to drug discovery. Cell 133 (April 18).

Systems Biology of Malaria: An Indian Perspective


Ashis Das³, Dhanpath Kochar² and Utpal Tatu¹*
¹Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, Karnataka, India.
²Department of Medicine, S. P. Medical College, C-54, Sadul Ganj, Bikaner, Rajasthan 334003, India.
³Biological Sciences Group, BITS-Pilani, Rajasthan - 333031, India.
*Corresponding author: Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, Karnataka, India
Tel: +91-080-22932823; Fax: +91-808-23600814/23600683
E-Mail: tatu@biochem.iisc.ernet.in

Introduction:
Malaria is a prehistoric disease. It has been suggested that malaria may even have contributed to the extinction of the dinosaurs. There is documentation of malaria in our ancient civilizations, Egyptian