
A solved example using Sankoff's algorithm is shown on the following pages.

First, let's focus on a small subtree of the tree on the next page: the portion on the left containing the leaf nodes {C} and {A} and their parent. Note that a set of four cells below each node "stores" the value of S for each possible nucleotide assignment. For example, for the {C} leaf node:

A  C  G  T
∞  0  ∞  ∞
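To make the cell layout concrete, here is a minimal Python sketch (an illustration, not code from these notes) that stores a node's four cells as a dictionary keyed by nucleotide. It fills a leaf's cells using the leaf rule explained in the next paragraph: 0 for the observed nucleotide and infinity for the rest. The names `NUCLEOTIDES` and `leaf_scores` are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def leaf_scores(observed: str) -> dict[str, float]:
    """Cells of a leaf node: 0 for the observed nucleotide, infinity otherwise."""
    return {k: (0.0 if k == observed else math.inf) for k in NUCLEOTIDES}


print(leaf_scores("C"))  # {'A': inf, 'C': 0.0, 'G': inf, 'T': inf}
```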

Because {C} and {A} are leaves, their values have already been assigned: 0 for the nucleotide they represent and ∞ for all other nucleotides. To fill in their parent node, we use the last formula on the previous page, which sets $S(k) = \sum_u \min_i \left[ c_{ki} + S_u(i) \right]$ for each parent nucleotide k, summing over the children u. Since each of the four parent nucleotides must be paired with each of the four child nucleotides, this requires examining 16 possible transitions/transversions for each leaf, or 32 possibilities in total. Let's begin by assuming the parent node is an A. According to the formula, for each child we must choose the i that gives the minimum value of $c_{ki} + S_u(i)$, where u is the child node, i is the nucleotide chosen for the child node, and k is the nucleotide chosen for the parent node. In our subtree this choice is easy: because {C} and {A} are leaves, all but one of the scores $S_u$ they hold are ∞. To get the minimum, we must choose the values of i that let us use the score of 0: C for the left child and A for the right child. To compute the score for the A cell of the parent node, we then calculate $c_{ki} + S_u(i)$ for both children and sum them. For the left child, $c_{CA}$ is 2.5 (because we are going from C, the value in the child, to A, the value we are calculating for the parent) and $S_u(C)$ is 0, for a total of 2.5 on the left side. For the right child, $c_{AA}$ is 0 (going from A in the child to A in the parent) and $S_u(A)$ is 0, for a total of 0 on the right side. Therefore, the value of the parent's A cell is 2.5 + 0 = 2.5.

Now let's look at a more complicated portion of the tree: the root and its two children. None of the values in the child cells is ∞, so we can't skip any work this time and must consider all possible values of i for both child nodes. As an example, let's calculate the value of the G cell in the parent node. First, we'll calculate the possible scores from the left child: from A, $c_{AG} + S_u(A) = 1 + 2.5 = 3.5$; from C, $c_{CG} + S_u(C) = 2.5 + 2.5 = 5$; from G, $c_{GG} + S_u(G) = 0 + 3.5 = 3.5$; and from T, $c_{TG} + S_u(T) = 2.5 + 3.5 = 6$.
Likewise, from the right child we get 4.5 from A, 6 from C, 3.5 from G, and 7 from T. The final step is to choose the lowest value from each child, in this case 3.5 from the left and 3.5 from the right, and sum them to get a final score of 7 for the G cell of the parent node. We must then repeat a similar calculation for the A, C, and T cells of the parent node. The parsimony score of the tree is the lowest value among the root's cells, which for this tree is 6.
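The forward pass described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the notes' own code: the cost matrix (0 for no change, 1 for a transition A↔G or C↔T, 2.5 for a transversion) is inferred from the costs quoted in the text, the helper names are hypothetical, and the right child's cells are back-calculated from the scores listed above (for example, "4.5 from A" means $c_{AG} + S(A) = 1 + S(A) = 4.5$, so $S(A) = 3.5$).

```python
import math

NUCLEOTIDES = "ACGT"
# Assumed cost model: transitions (A<->G, C<->T) cost 1, transversions cost 2.5.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def cost(k: str, i: str) -> float:
    """Substitution cost c_ki between parent nucleotide k and child nucleotide i."""
    if k == i:
        return 0.0
    return 1.0 if frozenset((k, i)) in TRANSITIONS else 2.5


def leaf_scores(observed: str) -> dict[str, float]:
    """Leaf cells: 0 for the observed nucleotide, infinity for the rest."""
    return {k: (0.0 if k == observed else math.inf) for k in NUCLEOTIDES}


def parent_scores(children: list[dict[str, float]]) -> dict[str, float]:
    """S(k) = sum over children u of min over i of (c_ki + S_u(i))."""
    return {
        k: sum(min(cost(k, i) + s_u[i] for i in NUCLEOTIDES) for s_u in children)
        for k in NUCLEOTIDES
    }


# Parent of the leaves {C} and {A}.
left = parent_scores([leaf_scores("C"), leaf_scores("A")])
print(left)  # {'A': 2.5, 'C': 2.5, 'G': 3.5, 'T': 3.5}

# Right child of the root, back-calculated from the scores quoted in the text.
right = {"A": 3.5, "C": 3.5, "G": 3.5, "T": 4.5}

root = parent_scores([left, right])
print(root["G"])           # 7.0
print(min(root.values()))  # 6.0
```

The computed parent of {C} and {A} reproduces the 2.5 worked out for its A cell, and feeding that table plus the reconstructed right child into the same function reproduces the G cell value of 7 and the parsimony score of 6.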

Once we complete the table, we must perform a traceback to reconstruct the path(s), that is, the node labeling(s), that generated the parsimony score of 6. This is analogous to the traceback that recovers the lowest-scoring pairwise alignment when using dynamic programming for sequence alignment: at each step we are trying to determine which cells a given value could have come from. We start at the root of the tree and choose the cell with the lowest value. In this case there are two such cells: both the A and C cells hold the value 6, so we must work through both of them.

Let's first determine the traceback from the A cell. To do this, we compute the score that cell would have received from each cell of each child node. From the left child: 2.5 from A, 5 from C, 4.5 from G, and 6 from T. From the right child: 3.5 from A, 6 from C, 4.5 from G, and 7 from T. Next, we find which combinations of one value from each child sum to 6, the known score of the parent's A cell. Here, the only such combination is 2.5 from the left child's A and 3.5 from the right child's A. This means that exactly one pair of traceback arrows emerges from the parent's A cell: one pointing to the A cell of the left child and one pointing to the A cell of the right child.

Next, we determine the traceback from the C cell of the parent. For brevity, we'll skip the calculations for this step and note that its arrows point to the C cell of the left child and the C cell of the right child. We then follow all arrows leading from the parent node to the cells they point to in the child nodes, and perform the traceback calculation for only those cells. Remember, the question of which cell has the lowest value is relevant only at the root of the tree: as in other dynamic programming problems, once the traceback begins, we only follow the arrows.
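The traceback step for the root can be sketched in Python as below. It is a minimal illustration, not code from the notes: it assumes the same inferred cost matrix (transitions 1, transversions 2.5), uses child tables back-calculated from the scores quoted above, and the function names are hypothetical. As in the text, it starts only from the lowest-valued root cells and, for each one, keeps every combination of one child cell per child whose scores sum to the parent's value.

```python
import itertools
import math

NUCLEOTIDES = "ACGT"
# Assumed cost model: transitions (A<->G, C<->T) cost 1, transversions cost 2.5.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def cost(k: str, i: str) -> float:
    if k == i:
        return 0.0
    return 1.0 if frozenset((k, i)) in TRANSITIONS else 2.5


def candidate_scores(k, child):
    """Score the parent's k cell would get from each cell i of one child: c_ki + S_u(i)."""
    return {i: cost(k, i) + child[i] for i in NUCLEOTIDES}


def traceback(k, children):
    """Every combination of child cells (one per child) whose candidate
    scores sum to the parent's value for nucleotide k."""
    per_child = [candidate_scores(k, c) for c in children]
    parent_value = sum(min(scores.values()) for scores in per_child)
    return [
        combo
        for combo in itertools.product(NUCLEOTIDES, repeat=len(children))
        if math.isclose(sum(s[i] for s, i in zip(per_child, combo)), parent_value)
    ]


# Child tables of the root, back-calculated from the scores quoted in the text.
left = {"A": 2.5, "C": 2.5, "G": 3.5, "T": 3.5}
right = {"A": 3.5, "C": 3.5, "G": 3.5, "T": 4.5}
children = [left, right]

# Start only from the root cell(s) holding the lowest value.
root = {k: sum(min(candidate_scores(k, c).values()) for c in children) for k in NUCLEOTIDES}
best = min(root.values())
for k in NUCLEOTIDES:
    if math.isclose(root[k], best):
        print(k, traceback(k, children))
# A [('A', 'A')]
# C [('C', 'C')]
```

Because each child's term is minimized independently, the combinations found are exactly the pairs of per-child minimizers, which is why each root cell here yields a single pair of arrows.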

Again, based on our calculations, the parsimony score for this tree is 6, but more than one labeling of the internal and root nodes could have generated this score. (What are these labelings?)
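Enumerating co-optimal labelings amounts to following every tied traceback arrow from every lowest-valued root cell. The recursive Python sketch below does this under the same assumed cost matrix (transitions 1, transversions 2.5); the nested-tuple tree encoding and all function names are hypothetical. Since the full tree is not given in text form here, it is demonstrated only on the {C}, {A} subtree from the start of the example, whose parent already admits two labelings, A or C, each with score 2.5.

```python
import itertools
import math

NUCLEOTIDES = "ACGT"
# Assumed cost model: transitions (A<->G, C<->T) cost 1, transversions cost 2.5.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def cost(k, i):
    if k == i:
        return 0.0
    return 1.0 if frozenset((k, i)) in TRANSITIONS else 2.5


# Hypothetical tree encoding: a leaf is a one-letter string,
# an internal node is a tuple of subtrees.
def scores(node):
    """Sankoff forward pass: the node's four cells S(k)."""
    if isinstance(node, str):
        return {k: (0.0 if k == node else math.inf) for k in NUCLEOTIDES}
    tables = [scores(child) for child in node]
    return {
        k: sum(min(cost(k, i) + t[i] for i in NUCLEOTIDES) for t in tables)
        for k in NUCLEOTIDES
    }


def labelings(node, k):
    """All co-optimal labelings of the internal nodes under `node`, given that
    `node` is labeled k; each is a preorder list of internal-node nucleotides."""
    if isinstance(node, str):
        return [[]]  # a leaf keeps its observed nucleotide
    options = []
    for child in node:
        table = scores(child)
        terms = {i: cost(k, i) + table[i] for i in NUCLEOTIDES}
        low = min(terms.values())
        # Follow every arrow: each child nucleotide that attains the minimum.
        options.append([lab for i in NUCLEOTIDES if math.isclose(terms[i], low)
                        for lab in labelings(child, i)])
    return [[k] + [x for lab in combo for x in lab]
            for combo in itertools.product(*options)]


def all_optimal(tree):
    """Parsimony score plus every internal-node labeling that achieves it."""
    root = scores(tree)
    best = min(root.values())
    return best, [lab for k in NUCLEOTIDES if math.isclose(root[k], best)
                  for lab in labelings(tree, k)]


print(all_optimal(("C", "A")))  # (2.5, [['A'], ['C']])
```

Recomputing `scores` inside `labelings` keeps the sketch short; for larger trees one would cache each node's table from the forward pass instead.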
