Background Literature Review Problem Statement Hypothesis Methodology Result Future Work References
Bioinformatics
Any application of computation in biology, including data management, algorithm development and data mining.
Bioinformatics is the field of science in which computer science, information technology, statistics and various branches of biology merge to form a single discipline.
Antibody
Proteins
Hormones
Sr. No   Amino Acid   Three Letter Code
1        Alanine      Ala
2        Arginine     Arg
3        Serine       Ser
4        Threonine    Thr
5        Tyrosine     Tyr
6        Glycine      Gly
7        Histidine    His
.        .            .
20       Leucine      Leu
Experimental Approach
In vivo, in vitro, in silico
Description
Addition of a phosphate group to S, T, Y or H
Addition of a glycosyl group to S, T or N
Addition of a sulfate group to Y
Addition of an acetyl group, usually at the N-terminus of the protein
Addition of a methyl group, usually at K or R residues
Database, Description, Reference
Database of consensus patterns for various PTMs. Sigrist et al. (2002)
HPRD: Human protein reference database of disease-related proteins and their PTMs. Peri et al. (2003)
RESID: Database with a collection of annotations and structures for PTMs. Garabelli (2003)
PhosphoBase ELM: Database of validated phosphorylation sites. Diella et al. (2004)
Sr. No, Statement, References
1. Proteins often perform diverse and multiple functions. (Jeffery, 1999)
2. The proteome is far more complex than the genome: the human genome contains 22,000 to 25,000 genes, whereas the number of proteins exceeds 100,000. (Nicolle H et al., 2007)
3. Identifying protein functions and events relies mainly on the protein's 3-D structure as well as on the occurrence of targeted amino acid modifications. (Attwood, 2000)
4. PTMs regulate various functions of proteins by effecting conformational changes, such as enzyme activation. (Konstantinopoulos et al., 2007)
5. Detecting phosphorylated serine, threonine and tyrosine residues using MS is not easy in vivo. (Mann et al., 2002)
Sr. No, Statement, References
6. Many methods have been developed within the field of proteomics, but these methods are still in their early stages. (Blom, 2004)
7. Machine learning and statistics have always played a core role in bioinformatics, both in understanding proteomics and in the analysis of PTMs. ANN is one such approach that has been used extensively in biological sequence analysis. (Qazi et al., 2006)
8. Most cellular proteins are regulated by reversible phosphorylation, and at least 30% of proteins carry this modification. (Ficarro et al., 2002)
DISPHOS, PredPhospho, GPS, PPSP, KinasePhos 1.0, KinasePhos 2.0, NetPhos, NetPhosK, Neural-genetic
KinasePhos 1.0 (HMM): Sn=91%, Sp=86%, Acc=85%; Sn=80%, Sp=87%, Acc=83%
PredPhospho (SVM): Sn=88%, Sp=91%, Acc=90%; Sn=79%, Sp=86%, Acc=83%
false negative and positive prediction. The BINS method predicts PTMs with high accuracy, identifying the specific sites affected and the kinases that act at each site, and it extracts biologically important information from noisy data. BINS can give better results than the existing PTM prediction methods.
[Figure: BINS architecture. Bootstrapping Module: peptide datasets grouped by modified and non-modified classes, with a training dataset generator and a validation dataset generator. ANN Module: topology and network configuration, sparse encoding, merging of the sparse-encoded datasets grouped by modified and non-modified target classes, then training and validation, each reporting Sn, Sp, Acc and MCC.]
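[Figure: peptide windows cut from protein O08539 around candidate S, T and Y residues (sequence fragments ELSFKQGE and QIYTA shown). Two windows, labeled O08539-2 and O08539-7, are written as comma-separated residues padded with "-" and carry the value 0.1.]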
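The window extraction suggested by the figure can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `extract_windows`, the flank size of 4 and the 1-based position convention are hypothetical and are not taken from BINS; the fragment ELSFKQGE is used purely as toy input.

```python
# Candidate residues for phosphorylation, as listed on the slide.
TARGET_RESIDUES = {"S", "T", "Y"}

def extract_windows(sequence, flank=4):
    """Yield (position, window) for every S, T or Y in the sequence.

    position is 1-based (an assumption); window holds 2*flank + 1 symbols
    and is padded with '-' where it runs past either end of the sequence.
    """
    padded = "-" * flank + sequence + "-" * flank
    for i, residue in enumerate(sequence):
        if residue in TARGET_RESIDUES:
            # In the padded string the residue sits at index i + flank,
            # so the slice below is centered on it.
            yield i + 1, padded[i : i + 2 * flank + 1]

if __name__ == "__main__":
    for position, window in extract_windows("ELSFKQGE"):
        print(position, window)
```

Padding with "-" keeps every window the same length, so each one maps to a fixed-size input vector for the network.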
Coding Scheme
10000000000000000000
01000000000000000000
00100000000000000000
00010000000000000000
00001000000000000000
00000100000000000000
00000010000000000000
00000001000000000000
. .
-   00000000000000000000
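A minimal sketch of this sparse (one-hot) coding scheme follows. Each amino acid maps to a 20-bit string with a single 1, and the padding symbol "-" maps to all zeros, as on the slide. The residue order in `AMINO_ACIDS` and the function names are assumptions; the source does not state which amino acid occupies which bit.

```python
# Assumed residue order: the 20 standard amino acids, alphabetical by
# one-letter code. The ordering BINS actually uses is not given in the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode_residue(residue):
    """Return a 20-character bit string with one '1' at the residue's index,
    or all zeros for the padding symbol '-'."""
    if residue == "-":
        return "0" * len(AMINO_ACIDS)
    index = AMINO_ACIDS.index(residue)  # raises ValueError for unknown symbols
    return "0" * index + "1" + "0" * (len(AMINO_ACIDS) - index - 1)

def sparse_encode_window(window):
    """Encode a peptide window as a flat list of 0/1 integers, 20 per symbol."""
    bits = []
    for residue in window:
        bits.extend(int(bit) for bit in sparse_encode_residue(residue))
    return bits

if __name__ == "__main__":
    print(sparse_encode_residue("A"))
    print(sparse_encode_residue("-"))
    print(len(sparse_encode_window("--ELSFKQG")))
```

A window of nine symbols therefore becomes an input vector of 180 values, which is the kind of fixed-length input a feed-forward ANN expects.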
[Figure: the windows O08539-2 and O08539-7 shown again, first with the value 0.1 in both windows and then with the values 0.1 and 0.9.]
Evaluation Strategy
Sn = TP/(TP+FN)
Sp = TN/(TN+FP)
Acc = (Sn+Sp)/2
MCC = (TP×TN − FP×FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
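The measures above can be computed from the four confusion-matrix counts as in the sketch below. The function name `evaluate` and the choice to return None for undefined values are assumptions for illustration. Acc follows the slide's definition, the mean of Sn and Sp, and MCC uses the standard Matthews correlation coefficient.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Return Sn, Sp, Acc and MCC for the given confusion-matrix counts.

    A measure whose denominator is zero is returned as None.
    """
    sn = tp / (tp + fn) if (tp + fn) else None
    sp = tn / (tn + fp) if (tn + fp) else None
    # Acc as defined on the slide: the mean of sensitivity and specificity.
    acc = (sn + sp) / 2 if sn is not None and sp is not None else None
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else None
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

if __name__ == "__main__":
    print(evaluate(tp=80, tn=90, fp=10, fn=20))
    # No positive predictions at all: Sn is 0, Sp is 1 and MCC is undefined.
    print(evaluate(tp=0, tn=50, fp=0, fn=50))
```

When a classifier predicts no positives, the MCC denominator is zero, which may be why a validation run with Sn 0 and Sp 1 reports no MCC value.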
Evaluation Strategy
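[Table: example of scoring predictions against targets. Columns: PID, Sequence, Position, Target, Clarify. Surviving entries: positions 3, 10, 1, 5, 7; target Mod; outcomes TP and FN.]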
Validation
Sn    0      0.612  0.619  0.622  0.616  0.628
Sp    1      0.999  0.995  0.995  0.999  0.991
MCC   None   0.662  0.663  0.664  0.665  0.663
Validation
Sn    0.688  0.737  0.750  0.771  0.774  0.772  0.761  0.768  0.770  0.770
Sp    0.965  0.932  0.901  0.884  0.875  0.872  0.890  0.880  0.874  0.871
MCC   0.680  0.683  0.658  0.659  0.653  0.648  0.657  0.652  0.647  0.645
Validation
Sn    0.735  0.741  0.778  0.780  0.779  0.779  0.778  0.779  0.778  0.768
Sp    0.951  0.939  0.891  0.890  0.881  0.877  0.876  0.875  0.872  0.879
MCC   0.705  0.697  0.675  0.676  0.665  0.661  0.659  0.659  0.654  0.652
Y
Sn    74%  70%  NA   75%  81%
Sp    95%  68%  NA   75%  78%
Acc   83%  72%  81%  78%  83%
T
Sn    74%  66%  NA   78%  81%
Sp    93%  77%  NA   77%  84%
Acc   81%  69%  76%  72%  75%
S
Sn
Sp    99%  57%  NA   72%  74%
BINS is developed as a desktop application; the current version has no online (WWW) support. Growing opportunities over the internet call for an online version of the application, which would widen its scope and make it available to multiple clients in different regions of the world. This effort would not only help enhance the embedded capability of BINS for efficient PTM prediction but could also become a major resource for multinational research collaborations. BINS is a sub-module of GEARS, so the next version will learn and optimize the parameters and weights of the ANN with a genetic algorithm. BINS will then be integrated with other GEARS modules, such as MAPRes and HMM, to achieve the best classification of protein data by weighing the pros and cons of each technique.
11, 1999.
Bork P., Dandekar T., Diaz-Lazcoz Y., Eisenhaber F., Huynen M. and Yuan Y. Predicting function: from genes to genomes and back. J. Mol. Biol., 283:707-725, 1998.
Attwood T. The quest to deduce protein function from sequence: the role of pattern databases. Int. J. Biochem. Cell Biol., 32:139-155, 2000.
Mann M., Ong S., Gronborg M., Steen H. et al. Trends Biotechnol., 20:261-268, 2002.
Wu C. H. Comput. Chem., 21:237-256, 1997.
Blom N., Sicheritz-Ponten T., Gupta R., Gammeltoft S. and Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics, 4:1633-1649, 2004.