ReadMe file for PAP BIO

Last update: June 19, 2012

Table of Contents

* Introduction
* Software UI Layout
* Bioinformatics Tools
* Workflow
* System Requirements

* Introduction

PAP BIO is software consisting of bioinformatics tools and databases that are loosely coupled into a coordinated system that runs a set of analysis tools in series. The pipeline aims to build a phylogenetic tree of the orthologs detected for a given query sequence (protein) using a similarity search tool. The user needs to provide only the query sequence, which is searched with BLAST against a chosen database. Parsers are provided to read the BLAST output and submit the sequences to a multiple sequence alignment tool such as ClustalW, which then passes its output to the PHYLIP suite for reconstruction of the phylogenetic tree of the orthologs.


* Software UI Layout

PAP BIO has the following layout components:

Panels: contain text fields and buttons for entering and executing parameters in the pipeline.
Output_Window: displays the output files of the pipeline.

* Bioinformatics Tools

BLAST
CLUSTALW
PHYLIP

Tool Name: Blast - Basic Local Alignment Search Tool
Version: 2.2.14
Description: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.


Advanced parameters

Parameter | Allowed Values | Default Values
Query sequence (complete protein sequence) | Fasta sequences | None
Database (complete protein sequences of the organism against which the ortholog is to be predicted) | Fasta sequences | None
Program Name | Blastp | None
Output File | alphanumeric | stdout
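The parameters above correspond closely to the command-line options of the legacy blastall program that shipped with BLAST 2.2.x. The following is a minimal Python sketch (PAP BIO itself is built on Java) that only assembles the argument list and does not run BLAST. The function name build_blast_command and the file names are hypothetical.

```python
def build_blast_command(query_fasta, database, output_file=None, program="blastp"):
    """Return an argument list for the legacy blastall program.

    -p selects the program, -i the query file, -d the database and
    -o the output file. Without -o, blastall writes to stdout.
    """
    command = ["blastall", "-p", program, "-i", query_fasta, "-d", database]
    if output_file is not None:
        command += ["-o", output_file]
    return command


# Hypothetical file names, for illustration only.
blast_command = build_blast_command("query.fasta", "organism_proteins", "blast_out.txt")
print(" ".join(blast_command))
```

The resulting list could be handed to subprocess.run on a machine where blastall is installed and the database has been formatted.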

Tool Name: ClustalW - Multiple sequence alignment program
Version: 0.13
Description: ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.

Parameter | Allowed Values | Default Values
Input | Alphanumeric | Blast parser output
Database (complete protein sequences of the organism against which the ortholog is to be predicted) | Fasta sequences | None
Output File | alphanumeric | stdout
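As an illustration of how the input and output parameters might be passed to the stand-alone program, here is a small Python sketch that builds a ClustalW command line. It assumes the executable is called clustalw and that PHYLIP-format output is wanted for the next stage; the function name and file names are hypothetical, and the command is only constructed, not executed.

```python
def build_clustalw_command(input_fasta, output_file, output_format="PHYLIP"):
    """Return an argument list asking ClustalW for a full multiple alignment.

    -INFILE names the sequences to align, -ALIGN requests a full alignment,
    -OUTPUT sets the alignment format and -OUTFILE names the result file.
    """
    return [
        "clustalw",  # assumed executable name; installations may differ
        "-INFILE=" + input_fasta,
        "-ALIGN",
        "-OUTPUT=" + output_format,
        "-OUTFILE=" + output_file,
    ]


# Hypothetical file names, for illustration only.
clustalw_command = build_clustalw_command("orthologs.fasta", "orthologs.phy")
print(" ".join(clustalw_command))
```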


Tool Name: Consense - From PHYLIP Package
Version: 3.67
Description: CONSENSE reads a file of computer-readable trees and prints out (and may also write to a file) a consensus tree. Basically the consensus tree consists of monophyletic groups that occur as often as possible in the data. If a group occurs in more than 50% of all the input trees it will definitely appear in the consensus tree. The tree printed out has at each fork a number indicating how many times the group which consists of the species to the right of (descended from) the fork occurred.
Input file: Intree
Output file: Outfile
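The 50% rule described for Consense can be sketched in a few lines of Python. This is a simplified illustration, not the CONSENSE implementation: it assumes each input tree has already been reduced to a set of monophyletic groups (tree file parsing is omitted), and it covers only the groups that appear in more than half of the trees.

```python
from collections import Counter


def majority_rule_groups(trees):
    """Return the groups found in more than 50% of the trees, with their counts.

    Each tree is a set of frozensets; each frozenset holds the species
    names that form one monophyletic group in that tree.
    """
    counts = Counter(group for tree in trees for group in tree)
    threshold = len(trees) / 2
    return {group: n for group, n in counts.items() if n > threshold}


# Three hypothetical input trees over the species A, B, C and D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_groups(trees)
for group, n in consensus.items():
    print(sorted(group), n)
```

The count stored for each group plays the role of the number printed at each fork of the consensus tree.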

Tool Name: protdist - From PHYLIP Package
Version: 3.67
Description: ProtDist uses protein sequences to compute a distance matrix, under three different models of amino acid replacement. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. This is an alternative to use of the sequence data itself in the parsimony program PROTPARS.
Input file: Intree
Output file: Outfile
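To make the idea of a protein distance concrete, the Python sketch below computes the fraction of differing sites between two aligned sequences and converts it with Kimura's approximate formula, D = -ln(1 - p - 0.2p^2). The formula was chosen because it needs no substitution matrix; whether it corresponds to one of the three models mentioned above is an assumption, and the sequences are made up.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def kimura_protein_distance(seq_a, seq_b):
    """Kimura's approximate correction for multiple replacements at a site."""
    p = p_distance(seq_a, seq_b)
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0:
        raise ValueError("sequences are too divergent for this formula")
    return -math.log(argument)


# Hypothetical aligned fragments that differ at one of four positions.
print(p_distance("ACDE", "ACDF"))
print(kimura_protein_distance("ACDE", "ACDF"))
```

The corrected distance is always at least as large as the raw fraction of differences, because it allows for replacements that are hidden by later changes at the same site.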


Tool Name: fitch - From PHYLIP Package
Version: 3.67
Description: Fitch carries out Fitch-Margoliash, Least Squares, and a number of similar methods as described in the documentation file for distance methods.
Input file: Intree
Output file: Outfile
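The methods named for Fitch share one goodness-of-fit measure: the sum over all pairs of (observed distance - tree distance)^2 divided by the observed distance raised to a power. A power of 2 gives the Fitch-Margoliash criterion and a power of 0 gives unweighted least squares. The Python sketch below only evaluates this quantity for a given set of tree distances; it does not search for the best tree, and the distances are invented for illustration.

```python
def weighted_sum_of_squares(observed, tree_distances, power=2.0):
    """Sum of squared differences between observed and tree distances.

    Each squared difference is divided by the observed distance raised
    to `power`: 2 for Fitch-Margoliash, 0 for unweighted least squares.
    """
    total = 0.0
    for pair, d_observed in observed.items():
        d_tree = tree_distances[pair]
        total += (d_observed - d_tree) ** 2 / d_observed ** power
    return total


# Hypothetical distances for three species.
observed = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
on_tree = {("A", "B"): 2.0, ("A", "C"): 3.0, ("B", "C"): 5.0}

print(weighted_sum_of_squares(observed, on_tree))
print(weighted_sum_of_squares(observed, on_tree, power=0.0))
```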

Tool Name: kitsch - From PHYLIP Package
Version: 3.67
Description: Kitsch carries out the Fitch-Margoliash and Least Squares methods, plus a variety of others of the same family, with the assumption that all tip species are contemporaneous, and that there is an evolutionary clock (in effect, a molecular clock). This means that branches of the tree cannot be of arbitrary length, but are constrained so that the total length from the root of the tree to any species is the same. The quantity minimized is the same weighted sum of squares described in the Distance Matrix Methods documentation file.
Input file: Intree
Output file: Outfile
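The clock constraint used by Kitsch, equal total length from the root to every species, can be checked with a short Python sketch. The nested-tuple tree representation here is an assumption made for the example and is not a PHYLIP file format.

```python
def root_to_tip_lengths(node, distance_from_root=0.0):
    """Map each tip name to its total branch length from the root.

    A tip is (name, branch_length); an internal node is
    (list_of_child_nodes, branch_length).
    """
    label_or_children, branch_length = node
    total = distance_from_root + branch_length
    if isinstance(label_or_children, str):
        return {label_or_children: total}
    lengths = {}
    for child in label_or_children:
        lengths.update(root_to_tip_lengths(child, total))
    return lengths


def is_clocklike(tree, tolerance=1e-9):
    """True when every tip lies at the same distance from the root."""
    lengths = root_to_tip_lengths(tree).values()
    return max(lengths) - min(lengths) <= tolerance


# Hypothetical trees: the first satisfies the clock constraint, the second does not.
clock_tree = ([([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)], 0.0)
free_tree = ([([("A", 1.0), ("B", 1.0)], 2.0), ("C", 2.5)], 0.0)

print(is_clocklike(clock_tree), is_clocklike(free_tree))
```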

Tool Name: neighbor - From PHYLIP Package
Version: 3.67
Description: Neighbor implements the Neighbor-Joining method. It constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree.
Input file: Intree
Output file: Outfile

Tool Name: protpars - From PHYLIP Package
Version: 3.67
Description: ProtPars infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971). Eck and Dayhoff (1966) allowed any amino acid to change to any other, and counted the number of such changes needed to evolve the protein sequences on each given phylogeny. This has the problem that it allows replacements which are not consistent with the genetic code, counting them equally with replacements that are consistent. Fitch, on the other hand, counted the minimum number of nucleotide substitutions that would be needed to achieve the given protein sequences. This counts silent changes equally with those that change the amino acid.

Input file: Intree
Output file: Outfile
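The simpler of the two counting schemes described for protpars, in which any amino acid may change to any other, can be illustrated with the standard set-intersection procedure for counting the minimum number of changes on a given tree. This Python sketch is not the ProtPars method itself, which also takes the genetic code into account; the tree, the sequences and the function names are hypothetical.

```python
def minimum_changes_at_site(node, states):
    """Return (possible_states, changes) for one alignment column.

    A node is either a tip name (str) or a (left, right) pair of nodes.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_changes = minimum_changes_at_site(node[0], states)
    right_set, right_changes = minimum_changes_at_site(node[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: one replacement is needed at this fork.
    return left_set | right_set, left_changes + right_changes + 1


def count_changes(tree, alignment):
    """Total minimum number of amino acid changes over all columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for column in range(length):
        states = {name: sequence[column] for name, sequence in alignment.items()}
        total += minimum_changes_at_site(tree, states)[1]
    return total


# Hypothetical four-species tree and three-column alignment.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "MKV", "B": "MKI", "C": "MRI", "D": "MRI"}
print(count_changes(tree, alignment))
```

A parsimony program would evaluate this count for many candidate trees and keep the tree or trees that need the fewest changes.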

Tool Name: SeqBoot - From PHYLIP Package
Version: 3.67
Description: SEQBOOT is a general bootstrapping tool. It is intended to allow you to generate multiple data sets that are resampled versions of the input data set. Since almost all programs in the package can analyze these multiple data sets, this allows almost anything in this package to be bootstrapped, jackknifed, or permuted. SEQBOOT can handle molecular sequences, binary characters, restriction sites, or gene frequencies.
Input file: Intree
Output file: Outfile
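Bootstrapping a sequence alignment means drawing its columns at random, with replacement, until a new alignment of the same length has been built. The Python sketch below shows that resampling step only; it is an illustration rather than the SEQBOOT code, and the alignment, seed and function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one replicate by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[i] for i in columns)
        for name, sequence in alignment.items()
    }


def bootstrap_replicates(alignment, count, seed=1):
    """Return `count` resampled data sets; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(count)]


# Hypothetical alignment of three short protein fragments.
alignment = {"A": "MKVLA", "B": "MKILA", "C": "MRILG"}
replicates = bootstrap_replicates(alignment, count=3)
for replicate in replicates:
    print(replicate)
```

Each replicate can then be analyzed separately, and the resulting trees summarized with a consensus tree.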

* Workflow-Pipeline

A Workflow-Pipeline is a logical connection of commonly used bioinformatics tools, run in series, to achieve a scientific goal. The pipeline is created by connecting the tools logically with BioJava code and then setting the appropriate input and output files and advanced parameters.


* Execute Pipeline

A protein, DNA or genomic sequence in FASTA format is given as input to BLAST, which searches the locally installed database for sequences similar to it. The database contains a large number of genome sequences. The result is a file listing the database matches to the query sequence together with their matching statistics, such as percent matches, identity, E-values, etc.

The next step is the multiple sequence alignment, for which the most commonly used tool is CLUSTAL-W. The output file from BLAST is parsed by the BLAST Parser (a Java program), which prepares it for use by CLUSTAL-W. CLUSTAL-W then performs the multiple sequence alignment for the query sequence and writes an output file containing the alignment. This output file is the input to PHYLIP.
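The parsing step between BLAST and CLUSTAL-W might look like the Python sketch below. The actual BLAST Parser is a Java program whose selection rules are not described here, so the details are assumptions: the sketch expects the 12-column tabular output that legacy blastall produces with the -m 8 option, keeps the first hit per subject whose E-value is at or below a hypothetical cutoff, and writes FASTA text from sequences supplied by the caller.

```python
def parse_blast_tabular(lines, max_evalue=1e-5):
    """Return (subject_id, percent_identity, evalue) for each accepted hit.

    Column 2 is the subject ID, column 3 the percent identity and
    column 11 the E-value in the tabular format.
    """
    hits = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue and subject_id not in seen:
            seen.add(subject_id)
            hits.append((subject_id, float(fields[2]), evalue))
    return hits


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(">{}\n{}\n".format(name, sequence) for name, sequence in records)


# Made-up tabular BLAST lines, for illustration only.
sample_output = [
    "query1\tsp|P1\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-110\t390",
    "query1\tsp|P2\t45.00\t180\t99\t2\t5\t184\t10\t189\t2e-30\t120",
    "query1\tsp|P3\t25.00\t60\t45\t1\t10\t69\t3\t62\t0.5\t30",
]
hits = parse_blast_tabular(sample_output)
print(hits)
print(to_fasta([("sp|P1", "MKVLA"), ("sp|P2", "MKILA")]))
```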

The output of each tool is parsed into the format required as input by the next tool, and this parsing is repeated after every tool in the series. The tools are connected by multiple Java programs, which allow the user to supply the parameters for the individual tools.
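The run-then-parse pattern described above can be sketched generically. The Python example below is an illustration only: the real pipeline is connected with Java programs, and the stand-in tools here simply wrap their input in a label so that the order of execution is visible. All names are hypothetical.

```python
def run_pipeline(stages, query):
    """Run each (tool, parser) pair in series.

    The parsed output of one stage becomes the input of the next.
    """
    data = query
    for tool, parser in stages:
        raw_output = tool(data)
        data = parser(raw_output)
    return data


def fake_tool(name):
    """Stand-in for an external program: labels whatever it receives."""
    return lambda data: name + "(" + data + ")"


def passthrough_parser(raw_output):
    """Stand-in parser that leaves the tool output unchanged."""
    return raw_output


stages = [
    (fake_tool("blast"), passthrough_parser),
    (fake_tool("clustalw"), passthrough_parser),
    (fake_tool("phylip"), passthrough_parser),
]
result = run_pipeline(stages, "query.fasta")
print(result)
```

In a real run each tool would be an external program call and each parser would convert that program's output file into the next program's input format.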


* System Requirements

Software Requirement:
Operating system: Microsoft Windows XP or later.
Stand-alone tools: BLAST, CLUSTAL-W and PHYLIP.
BioJava and BioPerl modules.
NetBeans.

Hardware Requirement: PC with a 486 or later microprocessor, 64 MB or more of RAM for good performance, and a 4.2 or above hard disc drive.

Back-up Media: Compact Disc (CD), Pen Drive.
