
Data Structures and Algorithms

UNIT - III Snapshots


Introduction
The General Method
Principle of Optimality
Multistage Graphs
All-Pairs Shortest Paths
0/1 Knapsack
The Traveling Salesperson Problem
Introduction to Searching
Basic Search Techniques
Sequential Search
Binary Search
Algorithm for Binary Search
Basic Traversal Techniques
Optimization
AND/OR Graphs
Bi-directional Components
Depth First Search and Traversal

3.0 Introduction

Dynamic programming has developed into a major model of algorithm design in computer science. Richard Bellman coined the name in 1957, in the context of optimal control problems. "Programming" here means a series of choices; "dynamic" conveys that a choice may depend on the current state rather than being decided ahead of time. The main feature of the method is that it can replace an exponential-time computation with a polynomial-time one. This chapter focuses on the general method and develops dynamic programming solutions for problems from different application areas.

3.1 Objective

This chapter discusses the basic traversal methods and various search techniques. Techniques that involve examining every node in a given data object are referred to as traversal methods. The second category includes techniques applicable to graphs; these may not examine all vertices and so are referred to only as search methods.

3.2 Content

3.2.1 The General Method

Dynamic programming is a design method that can be used when the result of a problem can be viewed as the answer for a sequence of decisions. The result of the knapsack problem, for example, can be viewed this way: one has to decide the values of xi, 1 ≤ i ≤ n, forming the decision first on x1, then x2, x3, and so on. An optimal sequence of decisions maximizes the objective function Σ pi·xi. Similarly, finding a shortest path from vertex i to vertex j in a directed graph G is just deciding which vertex is first, second, third, and so on until vertex j is reached; an optimal sequence is one that results in a path of least length.

For some of the problems that may be viewed in this way, an optimal sequence of decisions can be found by making the decisions one at a time and never making an erroneous decision. This is the case for problems solvable by the greedy method. For numerous other problems, it is not possible to make stepwise decisions in such a manner that the sequence of decisions made is optimal.

Example (shortest path): Assume that one wants to discover a shortest path from vertex i to vertex j. Let Ai be the set of vertices adjacent from vertex i. Which of the vertices in Ai should be the second vertex on the path? There is no way to make a decision at this time and guarantee that future decisions leading to an optimal sequence can be made. If, on the other hand, we wish to find a shortest path from vertex i to all other vertices in G, then at each step a correct decision can be made.

One way to solve problems for which it is not possible to make a sequence of stepwise decisions leading to an optimal decision sequence is to try all possible decision sequences: spell out all decision sequences and then pick out the best. But the time and space requirements may be prohibitive. Dynamic programming often drastically reduces the amount of enumeration by avoiding the enumeration of decision sequences that cannot possibly be optimal. In dynamic programming an optimal sequence of decisions is obtained by making explicit appeal to the principle of optimality.

3.2.2 Principle of Optimality


The principle of optimality states that an optimal sequence of decisions has the property that, whatever the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with regard to the state resulting from the first decision. As a result, the important difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated, whereas in dynamic programming many decision sequences may be generated.

3.2.3 Multistage Graphs


A multistage graph G = (V,E) is a directed graph in which the vertices are partitioned into k ≥ 2 disjoint sets Vi, 1 ≤ i ≤ k. In addition, if ⟨u,v⟩ is an edge in E, then u ∈ Vi and v ∈ Vi+1 for some i, 1 ≤ i < k. The sets V1 and Vk are such that |V1| = |Vk| = 1. Let s and t, respectively, be the vertices in V1 and Vk. The vertex s is the source and t the sink. Let c(i,j) be the cost of edge ⟨i,j⟩. The cost of a path from s to t is the sum of the costs of the edges on the path. The multistage graph problem is to find a minimum-cost path from s to t. Each set Vi defines a stage in the graph. Because of the constraints on E, every path from s to t starts in stage 1, goes to stage 2, then to stage 3, and so on, and eventually terminates in stage k. Figure 3.2 shows a five-stage graph; a minimum-cost s to t path is indicated by the broken edges.

[Figure 3.1: Graph]

Many problems can be formulated as multistage graph problems. Consider a resource allocation problem in which n units of resource are to be allocated to r projects. If j, 0 ≤ j ≤ n, units of the resource are allocated to project i, then the resulting net profit is N(i,j). The problem is to allocate the resource to the r projects in such a way as to maximize total net profit. This problem can be formulated as an (r+1)-stage graph problem as follows. Stage i, 1 ≤ i ≤ r, represents project i. There are n+1 vertices V(i,j), 0 ≤ j ≤ n, associated with stage i, 2 ≤ i ≤ r. Stages 1 and r+1 each have one vertex, V(1,0) = s and V(r+1,n) = t, respectively. Vertex V(i,j), 2 ≤ i ≤ r, represents the state in which a total of j units of resource have been allocated to projects 1, 2, ..., i-1. The edges in G are of the form ⟨V(i,j), V(i+1,l)⟩ for all j ≤ l and 1 ≤ i < r. The edge ⟨V(i,j), V(i+1,l)⟩, j ≤ l, is assigned a weight or cost of N(i,l-j) and corresponds to allocating l-j units of resource to project i, 1 ≤ i < r. In addition, G has edges of the type ⟨V(r,j), V(r+1,n)⟩; each such edge is assigned a weight of max over 0 ≤ p ≤ n-j of N(r,p). The resulting graph for a three-project problem with n = 4 is shown in Figure 3.3. It should be easy to see that an optimal allocation of resources is defined by a maximum-cost s to t path. This is easily converted into a minimum-cost problem by changing the sign of all the edge costs.

[Figure 3.2: a five-stage graph with stages V1 through V5, source s = vertex 1 and sink t = vertex 12; edge costs as used in the computations below, with a minimum-cost s to t path shown by broken edges]
Figure 3.2: Five-stage graph

A dynamic programming formulation for a k-stage graph problem is obtained by first noticing that every s to t path is the result of a sequence of k-2 decisions. The ith decision involves determining which vertex in Vi+1, 1 ≤ i ≤ k-2, is to be on the path. It is easy to see that the principle of optimality holds. Let p(i,j) be a minimum-cost path from vertex j in Vi to vertex t, and let cost(i,j) be the cost of this path. Then, using the forward approach,

[Figure 3.3 shows nodes V(i,j) connected by edges weighted N(i,·), with s = V(1,0), t = V(4,4), X = max{N(3,0), N(3,1)} and Y = max{N(3,0), N(3,1), N(3,2)}]

Figure 3.3: Four-stage graph corresponding to a three-project problem

cost(i,j) = min{ c(j,l) + cost(i+1,l) : l ∈ Vi+1 and ⟨j,l⟩ ∈ E }    --(4.3)

Since cost(k-1,j) = c(j,t) if ⟨j,t⟩ ∈ E and cost(k-1,j) = ∞ if ⟨j,t⟩ ∉ E, (4.3) may be solved for cost(1,s) by first computing cost(k-2,j) for all j ∈ Vk-2, then cost(k-3,j) for all j ∈ Vk-3, and so on, and finally cost(1,s). Trying this out on the graph of Figure 3.2, we obtain

cost(3,6) = min{6 + cost(4,9), 5 + cost(4,10)} = 7
cost(3,7) = min{4 + cost(4,9), 3 + cost(4,10)} = 5
cost(3,8) = 7
cost(2,2) = min{4 + cost(3,6), 2 + cost(3,7), 1 + cost(3,8)} = 7
cost(2,3) = 9
cost(2,4) = 18
cost(2,5) = 15
cost(1,1) = min{9 + cost(2,2), 7 + cost(2,3), 3 + cost(2,4), 2 + cost(2,5)} = 16

Note that in the calculation of cost(2,2), the values of cost(3,6), cost(3,7) and cost(3,8) are reused, and so their re-computation is avoided. A minimum-cost s to t path has a cost of 16. This path can be determined easily if one records the decision made at each state (vertex). Let d(i,j) be the value of l (where l is a node) that minimizes c(j,l) + cost(i+1,l). From Figure 3.2 we obtain:

d(3,6) = 10;  d(3,7) = 10;  d(3,8) = 10;
d(2,2) = 7;   d(2,3) = 6;   d(2,4) = 8;  d(2,5) = 8;
d(1,1) = 2

Let the minimum-cost path be s = 1, v2, v3, ..., vk-1, t. It is easy to see that v2 = d(1,1) = 2, v3 = d(2, d(1,1)) = d(2,2) = 7, and v4 = d(3, d(2, d(1,1))) = d(3,7) = 10. Before writing an algorithm to solve (4.3) for a general k-stage graph, let us impose an ordering on the vertices in V; this ordering makes it easier to write the algorithm. We require that the n vertices in V are indexed 1 through n. Indices are assigned in order of stages: first s is assigned index 1, then vertices in V2 are assigned indices, then vertices in V3, and so on. Vertex t has index n. Hence, indices assigned to vertices in Vi+1 are bigger than those assigned to vertices in Vi. As a result of this indexing scheme, cost and d can be computed in the order n-1, n-2, ..., 1. The first subscript in cost, p, and d identifies only the stage number and is omitted in the algorithm. The resulting algorithm, in pseudocode, is FGraph (Algorithm 3.1). The complexity analysis of the function FGraph is fairly straightforward. If G is represented by its adjacency lists, then r in line 9 of Algorithm 3.1 can be found in time proportional to the degree of vertex j. Hence, if G has |E| edges, the time for the for loop of line 7 is Θ(|V| + |E|). The time for the for loop of line 16 is Θ(k). Hence, the total time is Θ(|V| + |E|). In addition to the space needed for the input, space is needed for cost[], d[], and p[].

Algorithm FGraph(G, k, n, p)
// The input is a k-stage graph G = (V,E) with n vertices
// indexed in order of stages. E is a set of edges and
// c[i,j] is the cost of ⟨i,j⟩. p[1:k] is a minimum-cost path.
{
    cost[n] := 0.0;
    for j := n-1 to 1 step -1 do
    {   // Compute cost[j].
        Let r be a vertex such that ⟨j,r⟩ is an edge of G
            and c[j,r] + cost[r] is minimum;
        cost[j] := c[j,r] + cost[r];
        d[j] := r;
    }
    // Find a minimum-cost path.
    p[1] := 1; p[k] := n;
    for j := 2 to k-1 do p[j] := d[p[j-1]];
}
Algorithm 3.1: Multistage graph pseudo code corresponding to the forward approach.
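To make the forward approach concrete, here is a minimal C sketch of FGraph. The cost-matrix representation, the INF sentinel, and all identifiers are our own assumptions rather than part of the text; the stage ordering guarantees that every edge ⟨j,r⟩ satisfies j < r, so a simple scan over r > j finds the minimizing neighbour.

    #define MAXV 64               /* hypothetical upper bound on vertices */
    #define INF  1000000          /* stands in for "no edge" */

    /* Forward multistage-graph DP: cost[j] = min over edges (j,r) of
       c[j][r] + cost[r].  Vertices are indexed 1..n in order of stages,
       with s = 1 and t = n; p[1..k] receives a minimum-cost path. */
    void fgraph(int c[MAXV][MAXV], int n, int k, int p[])
    {
        int cost[MAXV], d[MAXV];
        cost[n] = 0;
        for (int j = n - 1; j >= 1; j--) {
            cost[j] = INF;
            for (int r = j + 1; r <= n; r++)     /* edges only go forward */
                if (c[j][r] < INF && c[j][r] + cost[r] < cost[j]) {
                    cost[j] = c[j][r] + cost[r];
                    d[j] = r;                    /* remember the decision */
                }
        }
        p[1] = 1; p[k] = n;                      /* recover the path from d[] */
        for (int j = 2; j <= k - 1; j++)
            p[j] = d[p[j - 1]];
    }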

The multistage graph problem can also be solved using the backward approach. Let bp(i,j) be a minimum-cost path from vertex s to a vertex j in Vi, and let bcost(i,j) be the cost of bp(i,j). From the backward approach we obtain

bcost(i,j) = min{ bcost(i-1,l) + c(l,j) : l ∈ Vi-1 and ⟨l,j⟩ ∈ E }    --(4.4)

Since bcost(2,j) = c(1,j) if ⟨1,j⟩ ∈ E and bcost(2,j) = ∞ if ⟨1,j⟩ ∉ E, bcost(i,j) can be computed using (4.4) by first computing bcost for i = 3, then for i = 4, and so on. For the graph of Figure 3.2 we obtain

bcost(3,6) = min{bcost(2,2) + c(2,6), bcost(2,3) + c(3,6)} = min{9+4, 7+2} = 9
bcost(3,7) = 11
bcost(3,8) = 10
bcost(4,9) = 15
bcost(4,10) = 14
bcost(4,11) = 16
bcost(5,12) = 16

The corresponding pseudocode algorithm to obtain a minimum-cost s to t path is BGraph (Algorithm 3.2). The first subscript on bcost, p, and d is omitted for the same reasons as before. This algorithm has the same complexity as FGraph, provided G is now represented by its inverse adjacency lists.

Algorithm BGraph(G, k, n, p)
// Same function as FGraph.
{
    bcost[1] := 0.0;
    for j := 2 to n do
    {   // Compute bcost[j].
        Let r be such that ⟨r,j⟩ is an edge of G
            and bcost[r] + c[r,j] is minimum;
        bcost[j] := bcost[r] + c[r,j];
        d[j] := r;
    }
    // Find a minimum-cost path.
    p[1] := 1; p[k] := n;
    for j := k-1 to 2 step -1 do p[j] := d[p[j+1]];
}

Algorithm 3.2: Multistage graph pseudocode corresponding to the backward approach

Both FGraph and BGraph work correctly even on a more generalized version of multistage graphs. In this generalization, the graph is permitted to have edges ⟨u,v⟩ such that u ∈ Vi, v ∈ Vj, and i < j.

3.2.4 All-Pairs Shortest Paths


Let G = (V,E) be a directed graph with n vertices. Let cost be a cost adjacency matrix for G such that cost(i,i) = 0, 1 ≤ i ≤ n. Then cost(i,j) is the length (or cost) of edge ⟨i,j⟩ if ⟨i,j⟩ ∈ E(G), and cost(i,j) = ∞ if i ≠ j and ⟨i,j⟩ ∉ E(G). The all-pairs shortest-path problem is to determine a matrix A such that A(i,j) is the length of a shortest path from i to j. The matrix A can be obtained by solving n single-source problems using the algorithm ShortestPaths; since each application of that procedure requires O(n²) time, the matrix can be obtained in O(n³) time. The alternate solution given here requires a weaker restriction on edge costs than ShortestPaths does: rather than requiring cost(i,j) ≥ 0 for every edge ⟨i,j⟩, it is required only that G have no cycle of negative length. Note that if G is allowed to contain a cycle of negative length, then the shortest path between any two vertices on this cycle has length -∞. Let us examine a shortest i to j path in G, i ≠ j. This path originates at vertex i, goes through some intermediate vertices (possibly none), and terminates at vertex j. One

can assume that this path contains no cycles, for if there is a cycle it can be deleted without increasing the path length (no cycle has negative length). If k is an intermediate vertex on this shortest path, then the subpaths from i to k and from k to j must themselves be shortest paths; otherwise, the i to j path is not of minimum length. So the principle of optimality holds, which alerts us to the prospect of using dynamic programming. If k is the intermediate vertex with highest index, then the i to k subpath is a shortest i to k path in G going through no vertex with index greater than k-1. Similarly, the k to j subpath is a shortest k to j path in G going through no vertex of index greater than k-1. We may regard the construction of a shortest i to j path as first requiring a decision as to which is the highest-indexed intermediate vertex k. Once this decision has been made, two shortest paths must be found, one from i to k and the other from k to j; neither of these may go through a vertex with index greater than k-1. Using A^k(i,j) to represent the length of a shortest path from i to j going through no vertex of index greater than k, we obtain

A(i,j) = min{ min over 1 ≤ k ≤ n of {A^(k-1)(i,k) + A^(k-1)(k,j)}, cost(i,j) }    --(4.5)

Clearly, A^0(i,j) = cost(i,j), 1 ≤ i ≤ n, 1 ≤ j ≤ n. One can obtain a recurrence for A^k(i,j) using an argument similar to that used before. A shortest path from i to j going through no vertex higher than k either goes through vertex k or it does not. If it does, A^k(i,j) = A^(k-1)(i,k) + A^(k-1)(k,j). If it does not, then no intermediate vertex has index greater than k-1, so A^k(i,j) = A^(k-1)(i,j). Combining, one gets

A^k(i,j) = min{ A^(k-1)(i,j), A^(k-1)(i,k) + A^(k-1)(k,j) },  k ≥ 1    --(4.6)

Algorithm AllPaths(cost, A, n)
// cost[1:n,1:n] is the cost adjacency matrix of a graph
// with n vertices; A[i,j] is the cost of a shortest path
// from vertex i to vertex j. cost[i,i] = 0.0 for 1 <= i <= n.
{
    for i := 1 to n do
        for j := 1 to n do
            A[i,j] := cost[i,j];   // Copy cost into A.
    for k := 1 to n do
        for i := 1 to n do
            for j := 1 to n do
                A[i,j] := min(A[i,j], A[i,k] + A[k,j]);
}

Algorithm 3.3: Function to compute lengths of shortest paths

Example: The graph of Figure 3.4(a) has the cost matrix of Figure 3.4(b). The initial A matrix, A^(0), plus the values after three iterations, A^(1), A^(2) and A^(3), are given in Figure 3.4.

[Figure 3.4: (a) an example three-vertex digraph; (b) its cost matrix A^0; (c)-(e) the matrices A^1, A^2 and A^3 produced by successive iterations of AllPaths]

Figure 3.4: Directed graph and associated matrices

Let M = max{cost(i,j) : ⟨i,j⟩ ∈ E(G)}. It is easy to see that A^n(i,j) ≤ (n-1)M. From the working of AllPaths, it is clear that if ⟨i,j⟩ ∉ E(G) and i ≠ j, then one can initialize cost(i,j) to any number greater than (n-1)M (rather than the maximum allowable floating-point number). If, at termination, A(i,j) > (n-1)M, then there is no directed path from i to j in G. Even for this choice of ∞, care should be taken to avoid any floating-point overflows. The time needed by AllPaths (Algorithm 3.3) is especially easy to determine because the looping is independent of the data in the matrix A. Line 11 is iterated n³ times, and so the time for AllPaths is Θ(n³).
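The triple loop of AllPaths translates almost line for line into C. The sketch below is our own illustration with 0-based indexing; the INF sentinel plays the role of ∞ and, as noted above, should be chosen large enough to exceed (n-1)M yet small enough that INF + INF does not overflow an int.

    #define N 4                   /* hypothetical number of vertices */
    #define INF 1000000

    /* AllPaths (often called Floyd-Warshall): a[i][j] becomes the length
       of a shortest i-to-j path once every vertex has been tried as an
       intermediate.  Assumes no negative-length cycles. */
    void all_paths(int cost[N][N], int a[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = cost[i][j];            /* copy cost into a */
        for (int k = 0; k < N; k++)              /* allow k as intermediate */
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    if (a[i][k] + a[k][j] < a[i][j])
                        a[i][j] = a[i][k] + a[k][j];
    }

At termination, an entry a[i][j] greater than (n-1)·M signals that no directed i-to-j path exists.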

3.2.5 0/1 Knapsack
The 0/1 knapsack problem is similar to the knapsack problem except that the xi's are restricted to have a value of either 0 or 1. Using KNAP(l, j, y) to represent the problem

maximize Σ (l ≤ i ≤ j) pi·xi
subject to Σ (l ≤ i ≤ j) wi·xi ≤ y,  xi = 0 or 1, l ≤ i ≤ j    --(4.1)

the knapsack problem is KNAP(1, n, m). Let y1, y2, ..., yn be an optimal sequence of 0/1 values for x1, x2, ..., xn, respectively. If y1 = 0, then y2, y3, ..., yn must constitute an optimal sequence for the problem KNAP(2, n, m); if it did not, then y1, y2, ..., yn would not be an optimal sequence for KNAP(1, n, m). If y1 = 1, then y2, ..., yn must be an optimal sequence for the problem KNAP(2, n, m-w1). If it is not, then there is another 0/1 sequence z2, z3, ..., zn such that Σ (2 ≤ i ≤ n) wi·zi ≤ m-w1 and Σ (2 ≤ i ≤ n) pi·zi > Σ (2 ≤ i ≤ n) pi·yi; hence the sequence y1, z2, z3, ..., zn is a sequence for (4.1) with greater value. Again the principle of optimality applies.

Let S0 be the initial problem state. Assume that n decisions di, 1 ≤ i ≤ n, have to be made. Let D1 = {r1, r2, ..., rj} be the set of possible decision values for d1, and let Si be the problem state following the choice of decision ri, 1 ≤ i ≤ j. Let Γi be an optimal sequence of decisions with respect to the problem state Si. Then, when the principle of optimality holds, an optimal sequence of decisions with respect to S0 is the best of the decision sequences ri, Γi, 1 ≤ i ≤ j.

Let gj(y) be the value of an optimal solution to KNAP(j+1, n, y). Clearly, g0(m) is the value of an optimal solution to KNAP(1, n, m). The possible decisions for x1 are 0 and 1 (D1 = {0,1}). From the principle of optimality it follows that

g0(m) = max{ g1(m), g1(m-w1) + p1 }    --(4.2)

While the principle of optimality has been stated only with respect to the initial state and decision, it can be applied equally well to intermediate states and decisions. Another important feature of the dynamic programming approach is that optimal solutions to subproblems are retained so as to avoid recomputing their values. The use of these tabulated values makes it natural to recast the recursive equations into an iterative algorithm. A solution to the knapsack problem can be obtained by making a sequence of decisions on the variables x1, x2, ..., xn. A decision on variable xi involves determining which of the values 0 or 1 is to be assigned to it. Let us assume that decisions on the xi are made in the order xn, xn-1, ..., x1. Following a decision on xn, one may be in one of two possible states: the capacity remaining in the knapsack is m and no profit has accrued, or the capacity remaining is m-wn and a profit of pn has accrued. It is clear that the remaining decisions xn-1, ..., x1 must be optimal with respect to the problem state resulting from the decision on xn; otherwise xn, ..., x1 will not be optimal. Hence, the principle of optimality holds.

Let fj(y) be the value of an optimal solution to KNAP(1, j, y). Since the principle of optimality holds, we obtain

fn(m) = max{ fn-1(m), fn-1(m-wn) + pn }    --(4.7)

For arbitrary fi(y), i > 0, Equation (4.7) generalizes to

fi(y) = max{ fi-1(y), fi-1(y-wi) + pi }    --(4.8)

Equation (4.8) can be solved for fn(m) by beginning with the knowledge f0(y) = 0 for all y and fi(y) = -∞ for y < 0. Then f1, f2, ..., fn can be successively computed using (4.8). When the wi's are integer, we need to compute fi(y) only for integer y, 0 ≤ y ≤ m. Since fi(y) = -∞ for y < 0, these function values need not be computed explicitly. Since each fi can be computed from fi-1 in Θ(m) time, it takes Θ(mn) time to compute fn. When the wi's are real numbers, fi(y) is needed for real numbers y such that 0 ≤ y ≤ m, so fi cannot be explicitly computed for all y in this range. Even when the wi's are integer, the explicit Θ(mn) computation of fn may not be the most efficient computation, so we explore an alternative method for both cases. Notice that fi(y) is an ascending step function; that is, there are a finite number of y's, 0 = y1 < y2 < ... < yk, such that fi(y1) < fi(y2) < ... < fi(yk); fi(y) = -∞ for y < y1; fi(y) = fi(yk) for y ≥ yk; and fi(y) = fi(yj) for yj ≤ y < yj+1. So we need to compute only fi(yj), 1 ≤ j ≤ k. One uses the ordered set S^i = {(fi(yj), yj) : 1 ≤ j ≤ k} to represent fi(y). Each member of S^i is a pair (P,W), where P = fi(yj) and W = yj. Notice that S^0 = {(0,0)}. One can compute S^(i+1) from S^i by first computing

S^i_1 = {(P,W) : (P - pi+1, W - wi+1) ∈ S^i}    --(4.9)

Now, S^(i+1) can be computed by merging the pairs in S^i and S^i_1 together. Note that if S^(i+1) contains two pairs (Pj, Wj) and (Pk, Wk) with the property that Pj ≤ Pk and Wj ≥ Wk, then the pair (Pj, Wj) can be discarded because of (4.8). Discarding or purging rules such as this one are also known as dominance rules; dominated tuples get purged. In the above, (Pk, Wk) dominates (Pj, Wj). Interestingly, the strategy we have come up with can also be derived by attempting to solve the knapsack problem via a systematic examination of the up to 2^n possibilities for x1, x2, ..., xn. Let S^i represent the possible states resulting from the 2^i decision sequences for x1, x2, ..., xi. A state refers to a pair (Pj, Wj), Wj being the total weight of objects included in the knapsack and Pj being the corresponding profit. To obtain S^(i+1), note that the possibilities for xi+1 are xi+1 = 0 or xi+1 = 1. When xi+1 = 0, the resulting states are the same as for S^i. When xi+1 = 1, the resulting states are obtained by adding (pi+1, wi+1) to each state in S^i. Call the set of these additional states S^i_1; it is the same as in Equation (4.9). Now S^(i+1) can be computed by merging the states in S^i and S^i_1 together.

Example 1:

Consider the knapsack instance n = 3, (w1,w2,w3) = (2,3,4), (p1,p2,p3) = (1,2,5), and m = 6. For these data we have

S^0 = {(0,0)};                         S^0_1 = {(1,2)}
S^1 = {(0,0),(1,2)};                   S^1_1 = {(2,3),(3,5)}
S^2 = {(0,0),(1,2),(2,3),(3,5)};       S^2_1 = {(5,4),(6,6),(7,7),(8,9)}
S^3 = {(0,0),(1,2),(2,3),(5,4),(6,6),(7,7),(8,9)}

Note that the pair (3,5) has been eliminated from S^3 as a result of the purging rule stated above. Many instances of this problem can be solved in a reasonable amount of time. This happens because usually all the p's and w's are integers and m is much smaller than 2^n. The purging rule is effective in purging most of the pairs that would otherwise remain in the S^i's. Algorithm DKnap below can be speeded up by the use of heuristics. The worst-case time is O(2^(n/2)).
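Before looking at the pair-merging algorithm DKnap, it may help to see the simpler Θ(mn) tabulation of equation (4.8) for integer weights. The C sketch below is our own illustration, not code from the text; run on the instance of Example 1 it prints 6, matching the best pair (6,6) in S^3.

    #include <stdio.h>

    /* f[i][y] = value of an optimal solution to KNAP(1,i,y), per (4.8). */
    int knapsack(int n, int m, const int w[], const int p[])
    {
        int f[n + 1][m + 1];
        for (int y = 0; y <= m; y++) f[0][y] = 0;          /* f0(y) = 0 */
        for (int i = 1; i <= n; i++)
            for (int y = 0; y <= m; y++) {
                f[i][y] = f[i - 1][y];                     /* choose xi = 0 */
                if (y >= w[i - 1] &&
                    f[i - 1][y - w[i - 1]] + p[i - 1] > f[i][y])
                    f[i][y] = f[i - 1][y - w[i - 1]] + p[i - 1];  /* xi = 1 */
            }
        return f[n][m];
    }

    int main(void)
    {
        int w[] = {2, 3, 4}, p[] = {1, 2, 5};              /* Example 1 */
        printf("%d\n", knapsack(3, 6, w, p));              /* prints 6 */
        return 0;
    }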

PW = record { float p; float w; }

Algorithm DKnap(p, w, x, n, m)
{
    // pair[] is an array of PW's.
    b[0] := 1; pair[1].p := pair[1].w := 0.0;   // S^0
    t := 1; h := 1;                              // Start and end of S^0
    b[1] := next := 2;                           // Next free spot in pair[]
    for i := 1 to n-1 do
    {   // Generate S^i.
        k := t;
        u := Largest(pair, w, t, h, i, m);
        for j := t to u do
        {   // Generate S^(i-1)_1 and merge.
            pp := pair[j].p + p[i]; ww := pair[j].w + w[i];
            // (pp, ww) is the next element in S^(i-1)_1.
            while ((k <= h) and (pair[k].w <= ww)) do
            {
                pair[next].p := pair[k].p; pair[next].w := pair[k].w;
                next := next + 1; k := k + 1;
            }
            if ((k <= h) and (pair[k].w = ww)) then
            {
                if pp < pair[k].p then pp := pair[k].p;
                k := k + 1;
            }
            if pp > pair[next-1].p then
            {
                pair[next].p := pp; pair[next].w := ww;
                next := next + 1;
            }
            while ((k <= h) and (pair[k].p <= pair[next-1].p)) do
                k := k + 1;
        }
        // Merge in remaining terms from S^(i-1).
        while (k <= h) do
        {
            pair[next].p := pair[k].p; pair[next].w := pair[k].w;
            next := next + 1; k := k + 1;
        }
        // Initialize for S^(i+1).
        t := h + 1; h := next - 1; b[i+1] := next;
    }
    TraceBack(p, w, pair, x, m, n);
}

Algorithm 3.4: Algorithm for the 0/1 knapsack problem

3.2.6 The Traveling Salesperson Problem


Permutation problems are usually much harder to solve than subset problems, as there are n! different permutations of n objects whereas there are only 2^n different subsets of n objects (n! > 2^n). Let G = (V,E) be a directed graph with edge costs cij. The variable cij is defined such that cij > 0 for all i and j, and cij = ∞ if ⟨i,j⟩ ∉ E. Let |V| = n and assume n > 1. A tour of G is a directed simple cycle that includes every vertex in V. The cost of a tour is the sum of the costs of the edges on the tour. The traveling salesperson problem is to find a tour of minimum cost. This problem finds application in a variety of situations. Suppose we have to route a postal van to collect mail from mailboxes placed at n different sites. An (n+1)-vertex graph can be used to represent the situation. One vertex represents the post office from which the postal van starts and to which it must return. Edge ⟨i,j⟩ is assigned a cost equal to the distance from site i to site j. The route taken by the postal van is a tour, and we are interested in finding a tour of minimum length. As a second example, assume that we wish to use a robot arm to tighten the nuts on some piece of machinery on an assembly line. The robot arm will start from its initial position, successively move to each of the remaining nuts, and return to the initial position. The path of the arm is clearly a tour on a graph in which vertices represent the nuts. A minimum-cost tour will reduce the time needed for the arm to complete its task. Every tour consists of an edge ⟨1,k⟩ for some k ∈ V-{1} and a path from vertex k to vertex 1. The path from vertex k to vertex 1 goes through each vertex in V-{1,k} exactly once. It is easy to see that if the tour is optimal, then the path from k to 1 must be a shortest k to 1 path going through all vertices in V-{1,k}. Hence, the principle of

optimality holds. Let g(i,S) be the length of a shortest path starting at vertex i, going through all vertices in S, and terminating at vertex 1. The function g(1, V-{1}) is the length of an optimal salesperson tour. From the principle of optimality it follows that

g(1, V-{1}) = min over 2 ≤ k ≤ n of { c1k + g(k, V-{1,k}) }    --(4.10)

Generalizing (4.10), we obtain (for i ∉ S)

g(i,S) = min over j ∈ S of { cij + g(j, S-{j}) }    --(4.11)

Equation (4.10) can be solved for g(1, V-{1}) if we know g(k, V-{1,k}) for all choices of k. The g values can be obtained by using (4.11). Clearly, g(i,∅) = ci1, 1 ≤ i ≤ n. Hence, we can use (4.11) to obtain g(i,S) for all S of size 1, then for all S with |S| = 2, and so on. When |S| < n-1, the values of i and S for which g(i,S) is needed are such that i ≠ 1, 1 ∉ S, and i ∉ S.

Example: Consider the directed graph of Figure 3.5(a). The edge lengths are given by the matrix c of Figure 3.5(b).

[Figure 3.5(a): a four-vertex digraph]

        | 0  10  15  20 |
    c = | 5   0   9  10 |
        | 6  13   0  12 |
        | 8   8   9   0 |

Figure 3.5: Directed graph and edge length matrix c

Thus g(2,∅) = c21 = 5, g(3,∅) = c31 = 6, and g(4,∅) = c41 = 8. Using (4.11), we obtain

g(2,{3}) = c23 + g(3,∅) = 15        g(2,{4}) = 18
g(3,{2}) = 18                        g(3,{4}) = 20
g(4,{2}) = 13                        g(4,{3}) = 15

Next, we compute g(i,S) with |S| = 2, i ≠ 1, 1 ∉ S and i ∉ S:

g(2,{3,4}) = min{c23 + g(3,{4}), c24 + g(4,{3})} = 25
g(3,{2,4}) = min{c32 + g(2,{4}), c34 + g(4,{2})} = 25
g(4,{2,3}) = min{c42 + g(2,{3}), c43 + g(3,{2})} = 23

Finally, from (4.10) we obtain

g(1,{2,3,4}) = min{c12 + g(2,{3,4}), c13 + g(3,{2,4}), c14 + g(4,{2,3})}
             = min{35, 40, 43}
             = 35

An optimal tour of the graph of Figure 3.5(a) has length 35. A tour of this length can be constructed if we retain with each g(i,S) the value of j that minimizes the right-hand side of (4.11). Let J(i,S) be this value. Then J(1,{2,3,4}) = 2; thus the tour starts from 1 and goes to 2. The remaining tour can be obtained from g(2,{3,4}): J(2,{3,4}) = 4, so the next edge is ⟨2,4⟩. The remaining tour is for g(4,{3}): J(4,{3}) = 3. The optimal tour is 1, 2, 4, 3, 1.

Let N be the number of g(i,S)'s that have to be computed before (4.10) can be used to compute g(1, V-{1}). For each value of |S| there are n-1 choices for i. The number of distinct sets S of size k not including 1 and i is C(n-2, k). Hence

N = (n-1) · Σ (k = 0 to n-2) C(n-2, k) = (n-1)·2^(n-2)

An algorithm that proceeds to find an optimal tour by using (4.10) and (4.11) will require O(n²·2^n) time, as the computation of g(i,S) with |S| = k requires k-1 comparisons when solving (4.11). This is better than enumerating all n! different tours to find the best one. The most serious drawback of this dynamic programming solution is the space needed, O(n·2^n).
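The recurrences (4.10) and (4.11) lend themselves to a bitmask implementation in which a subset S of the vertices {2,...,n} is encoded as an integer. The following C sketch is our own illustration (identifiers and the 0-based encoding are assumed); it computes the tour length only, and storing the minimizing j for each (i,S), as described above, would recover the tour itself. Calling tsp with the matrix of Figure 3.5(b) prints 35.

    #include <stdio.h>

    #define N 4                    /* number of vertices */
    #define INF 1000000

    /* g[S][i] = length of a shortest path that starts at vertex i, visits
       every vertex in subset S (a bitmask; vertex 1 is bit 0 and never
       appears in S), and ends at vertex 1 -- equation (4.11). */
    static int g[1 << N][N];

    int tsp(int c[N][N])
    {
        for (int i = 1; i < N; i++) g[0][i] = c[i][0];  /* g(i,empty) = c(i,1) */
        for (int S = 1; S < (1 << N); S++) {
            if (S & 1) continue;                        /* vertex 1 not in S */
            for (int i = 1; i < N; i++) {
                if (S & (1 << i)) continue;             /* require i not in S */
                g[S][i] = INF;
                for (int j = 1; j < N; j++)             /* next vertex j in S */
                    if ((S & (1 << j)) &&
                        c[i][j] + g[S ^ (1 << j)][j] < g[S][i])
                        g[S][i] = c[i][j] + g[S ^ (1 << j)][j];
            }
        }
        int best = INF, full = ((1 << N) - 1) & ~1;     /* S = V - {1} */
        for (int k = 1; k < N; k++)                     /* equation (4.10) */
            if (c[0][k] + g[full ^ (1 << k)][k] < best)
                best = c[0][k] + g[full ^ (1 << k)][k];
        return best;
    }

    int main(void)
    {
        int c[N][N] = {{0,10,15,20},{5,0,9,10},{6,13,0,12},{8,8,9,0}};
        printf("%d\n", tsp(c));                         /* prints 35 */
        return 0;
    }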

3.2.7 Introduction to Searching

Data stored in an organized manner must be accessed for processing. Locating a particular data item in memory involves searching for it: searching is a technique whereby the memory is scanned for the required data. Computer systems are often used to store large amounts of data from which individual records must be retrieved according to some search specification. Thus the efficient storage of data to facilitate fast searching is an important issue.

3.2.8 Basic Search Techniques

There are several types of searching techniques available, and understanding them requires familiarity with certain terms. A table or a file is a group of elements called records. Associated with each record is a key, which is used to differentiate among the records. When the key is present within a record at a specific offset from the start of the record, it is called an internal key or an embedded key. When there is a separate table of keys that includes pointers to the records, such keys are known as external keys. For each file there is at least one key that is unique; such a key is known as a primary key. In a file of names and addresses, if the state is used as the key for a particular search, it will probably not be unique, since there may be two records with the same state in the file. Such a key is called a secondary key.

A search algorithm is an algorithm that accepts an argument a and tries to find a record whose key is a. The algorithm may return the entire record or, more commonly, a pointer to that record. The search for a particular argument may be unsuccessful; that is, there may be no record in the table with that argument as its key. In that case the algorithm returns a null record or a null pointer. Frequently, if a search is unsuccessful, it may be desirable to add a new record with the argument as its key; such an algorithm is called a search-and-insertion algorithm. A successful search is frequently called a retrieval. A table of records in which a key is used for retrieval is often called a search table or a dictionary. Searches in which the entire table is constantly in main memory are called internal searches, whereas those in which most of the table is kept in auxiliary storage are called external searches.

3.2.9 Sequential Search

The simplest form of search is the sequential or linear search. This search is applicable to a table organized either as an array or as a linked list. The simplest technique for searching an unordered table for a particular record is to scan each entry in the table sequentially until the record is found; it applies even when the storage medium lacks any type of direct-access facility. The logic of sequential search is extremely straightforward: begin with the first available record and repeatedly proceed to the next record until the search key is found or it can be concluded that it will not be found. This method, which traverses the data sequentially to locate an item, is called linear or sequential search. To simplify the loop, the item sought is assigned to the position following the last element of the data (a sentinel), so that an unsuccessful search ends with:

loc = n + 1

Algorithm: linear(data, n, item, loc)
// data is a linear array of n items and item is the item sought.
{
    data[n+1] := item;                  // sentinel
    loc := 1;
    while (data[loc] ≠ item) do         // search for item
        loc := loc + 1;
    if (loc = n+1) then loc := 0;       // not found
}

To examine how long it will take to find an item matching a key in the collections discussed so far, the following are to be considered: the average time, the worst-case time, and the best possible time. However, the general concern is with the worst-case time, as calculations based on worst-case times can lead to guaranteed performance predictions. Conveniently, worst-case times are generally easier to calculate than average times.
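The sentinel trick above carries over directly to C. This sketch is our own illustration; note that the array must have one spare slot for the sentinel.

    #include <stdio.h>

    /* Sequential search with a sentinel: placing the key at position n
       removes the bounds test from the loop.  Returns the index of the
       match, or -1 if the key is absent. */
    int linear_search(int data[], int n, int item)
    {
        data[n] = item;                 /* sentinel in the spare slot */
        int loc = 0;
        while (data[loc] != item)       /* always stops, at worst here */
            loc++;
        return loc < n ? loc : -1;
    }

    int main(void)
    {
        int a[6] = {7, 3, 9, 4, 1};     /* five items + one sentinel slot */
        printf("%d\n", linear_search(a, 5, 9));   /* prints 2 */
        printf("%d\n", linear_search(a, 5, 8));   /* prints -1 */
        return 0;
    }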

If there are n items in the collection, whether stored as an array or as a linked list, it is obvious that in the worst case, when there is no item in the collection with the desired key, n comparisons of the key with the keys of the items in the collection will have to be made. To simplify the analysis and comparison of algorithms, a dominant operation is identified and a count of the number of times that dominant operation is performed is maintained. In the case of searching, the dominant operation is the comparison; since the search requires n comparisons in the worst case, this is an O(n) algorithm. The best case, in which the first comparison returns a match, requires a single comparison and is O(1).

3.2.10 Binary Search

Suppose the items are placed in an array sorted on the key in either ascending or descending order. Much better performance can then be obtained with an extremely efficient searching algorithm known as binary search. In this method:

- the key is compared with the item in the middle position of the array;
- if there is a match, it is returned successfully;
- if the key is less than the middle key, the item sought must lie in the lower half of the array, which is then searched;
- if the key is greater, the item sought must lie in the upper half, which is then searched.

This procedure is repeated on the lower (or upper) half till the array is exhausted or the item is found.

Algorithm for Binary search


// l and u are the lower and upper bounds respectively.
// k is the key searched for; f[] is the sorted array.
l = 1; u = n; done = 'f';
while ((l <= u) && (done == 'f'))
{
    m = (l + u) / 2;
    if (k > f[m]) l = m + 1;
    if (k < f[m]) u = m - 1;
    if (k == f[m]) { i = m; done = 't'; }
}

if (done == 'f')
    printf("\n KEY %d IS NOT FOUND IN THE FILE", k);
else
    printf("\n KEY %d IS FOUND IN POSITION %d", k, i);

Binary search gives a complexity of O(log n), which is much faster than linear search.

Searching an Ordered Table

If the table is stored in ascending or descending order of the record keys, there are several techniques that can be used to improve the efficiency of searching. This is especially true if the table is of fixed size. One obvious advantage in searching a sorted file over an unsorted file arises when the argument key is absent from the file. In an unsorted file, n comparisons are needed to detect this fact. In a sorted file, assuming that the argument keys are uniformly distributed over the range of keys in the file, only n/2 comparisons (on the average) are needed, because it is known that a given key is missing from a file sorted in ascending order of keys as soon as a key greater than the argument is encountered.

Suppose that it is possible to collect a large number of retrieval requests before any of them are processed. For example, in many applications a response to a request for information may be deferred to the next day; in such a case, all requests in a specific day may be collected and the actual searching may be done overnight, when no new requests are coming in. If both the table and the list of requests are sorted, the sequential search can proceed through both concurrently, so it is not necessary to search through the entire table for each retrieval request. In fact, if there are many such requests uniformly distributed over the entire table, each request will require only a few lookups (if the number of requests is less than the number of table entries) or perhaps only a single comparison (if the number of requests is greater than the number of table entries). In such situations sequential searching is probably the best method to use. Because of the simplicity and efficiency of sequential processing on sorted files, it may be worthwhile to sort a file before searching for keys in it. This is especially true in the situation described in the preceding paragraph, when dealing with a master file and a large transaction file of requests for searches.

Indexed Sequential Search

There is another technique to improve search efficiency for a sorted file, but it involves an increase in the amount of space required. This method is called the indexed sequential search method. An auxiliary table, called an index, is set aside in addition to the sorted file itself. Each element in the index consists of a key kindex and a pointer to the record in the file that corresponds to kindex. The elements in the index, as well as the elements in the file, must be sorted on the key. If the index is one eighth the size of

the file, every eighth record of the file is represented in the index. This is illustrated by Figure 3.6.a. The algorithm used for searching an indexed sequential file is straightforward. The records r and keys k are defined as before; let kindex be an array of the keys in the index, and let pindex be the array of pointers within the index to the actual records in the file. We assume that the file is stored as an array, that n is the size of the file, and that indexsize is the size of the index.

Figure 3.6.a: Indexed sequential file

The real advantage of the indexed sequential method is that the items in the table can be examined sequentially if all the records in the file must be accessed, yet the search time for a particular item is sharply reduced. A sequential search is performed on the smaller index rather than on the larger table. Once the correct index position has been found, a second sequential search is performed on a small portion of the record table itself. The use of an index is applicable to a sorted table stored as a linked list, as well as to one stored as an array. Use of a linked list implies a larger space overhead for pointers, although insertions and deletions can be performed much more readily. If the table is so large that even the use of an index does not achieve sufficient efficiency (either because the index is large in order to reduce sequential searching in the table, or because the index is small so that adjacent keys in the index are far from each other in the table), a secondary index can be used. The secondary index acts as an index

to the primary index, which points to entries in the sequential table. This is illustrated in Figure 3.7.b. Deletions from an indexed sequential table can be made most easily by flagging deleted entries. In sequential searching through the table, deleted entries are ignored. Note that if an element is deleted, even if its key is in the index, nothing need be done to the index; only the original table entry is flagged. Insertion into an indexed sequential table is more difficult, since there may not be room between two already existing table entries, thus necessitating a shift in a large number of table elements. However, if a nearby item has been flagged as deleted in the table, only a few items need to be shifted and the deleted item can be overwritten. This may in turn require alteration of the index if an item pointed to by an index element is shifted. An alternative method is to keep an overflow area at some other location and link together any inserted records.
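A C sketch of the two-phase lookup may make the method clearer. Everything here — the record layout, the index arrays, and all names — is a hypothetical illustration, not code from the text.

    /* Indexed sequential search: scan the small index to bound the region,
       then scan that slice of the sorted file.  Returns the record's
       position, or -1 if the key is absent. */
    typedef struct { int key; /* ... other fields ... */ } Record;

    int indexed_search(const Record file[], int n,
                       const int kindex[], const int pindex[],
                       int indexsize, int k)
    {
        int lo = 0, hi = n - 1;
        for (int i = 0; i < indexsize; i++) {    /* phase 1: search the index */
            if (kindex[i] > k) { hi = pindex[i] - 1; break; }
            lo = pindex[i];
        }
        for (int j = lo; j <= hi; j++)           /* phase 2: search the slice */
            if (file[j].key == k) return j;
        return -1;
    }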

Figure 3.6.b: Use of a secondary index

Interpolation Search

Another technique for searching an ordered array is called interpolation search. If the keys are uniformly distributed between k(0) and k(n-1), the method may be even more efficient than binary search. A variation of interpolation search called robust interpolation search (or fast search) attempts to remedy the poor practical behavior of

interpolation search while extending its advantage over binary search to nonuniform key distributions. The expected number of comparisons for robust interpolation search on a random distribution of keys is O(log log n).
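A compact C sketch of (plain) interpolation search follows; it is our own illustration. The only change from binary search is the probe position, which is interpolated from the key values instead of being fixed at the midpoint.

    /* Interpolation search on a sorted int array.  Returns the index of
       key, or -1 if absent.  O(log log n) expected comparisons on
       uniformly distributed keys; O(n) in the worst case. */
    int interpolation_search(const int a[], int n, int key)
    {
        int lo = 0, hi = n - 1;
        while (lo <= hi && key >= a[lo] && key <= a[hi]) {
            if (a[hi] == a[lo])                  /* avoid dividing by zero */
                return a[lo] == key ? lo : -1;
            /* probe where the key "should" lie between a[lo] and a[hi] */
            int mid = lo + (int)((long long)(key - a[lo]) * (hi - lo)
                                 / (a[hi] - a[lo]));
            if (a[mid] == key) return mid;
            if (a[mid] < key)  lo = mid + 1;
            else               hi = mid - 1;
        }
        return -1;
    }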

3.2.11 Basic Traversal Techniques

During a search or traversal the fields of a node may be used several times, and it may be necessary to distinguish the nodes that have been examined from those that have not. Nodes that have been examined are said to be visited. Visiting a node may involve printing out its data field, evaluating the operation specified by the node (in the case of a binary tree representing an expression), setting a mark bit to one or zero, and so on. The term "visited" is used rather than a term for the specific function performed on the node.

Techniques for Binary Trees

Many problems can be solved by manipulating binary trees or graphs. This manipulation requires us to determine a vertex, or a subset of vertices, in the given data object that satisfies a given property; for instance, to find all vertices in a binary tree with a data value less than x, or to find all vertices in a given graph G that can be reached from another given vertex v. The determination of this subset of vertices can be carried out by systematically examining the vertices of the given data object. This often takes the form of a search in the data object. A search that necessitates the examination of every vertex in the object being searched is called a traversal.

There are many operations that could be performed on binary trees. One that is done frequently is traversing a tree, that is, visiting each node in the tree exactly once. A traversal produces a linear order for the information in a tree, and this linear order may be familiar and useful. When traversing a binary tree, each node and its subtrees are treated in the same fashion. If L, D, and R stand for moving left, printing the data, and moving right when at a node, then there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL, and RLD. If one adopts the convention that left is traversed before right, then only three traversals are possible: LDR, LRD, and DLR, named inorder, postorder, and preorder respectively. The recursive formulation of inorder traversal is shown below.

treenode = record
{
    Type data;   // Type is the data type of data.
    treenode *lchild;
    treenode *rchild;
}

Algorithm InOrder(t)
// t is a binary tree. Each node of t has
// three fields: lchild, data and rchild.
{
    if t ≠ 0 then
    {
        InOrder(t→lchild);
        Visit(t);
        InOrder(t→rchild);
    }
}

Algorithm: Recursive formulation of inorder traversal

Theorem: Let T(n) and S(n) respectively represent the time and space needed by any one of the traversal algorithms when the input tree t has n ≥ 0 nodes. If the time and space needed to visit a node are Θ(1), then T(n) = Θ(n) and S(n) = O(n). Figure 3.7 below shows a binary tree.
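The recursive formulation above is easy to render in C; this sketch is our own, with Visit realized as a printf of the data field.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct treenode {
        int data;
        struct treenode *lchild, *rchild;
    } treenode;

    /* Inorder (LDR) traversal: left subtree, then the node, then right. */
    void inorder(const treenode *t)
    {
        if (t != NULL) {
            inorder(t->lchild);
            printf("%d ", t->data);     /* "visit" the node */
            inorder(t->rchild);
        }
    }

    treenode *node(int d, treenode *l, treenode *r)
    {
        treenode *t = malloc(sizeof *t);
        t->data = d; t->lchild = l; t->rchild = r;
        return t;
    }

    int main(void)
    {
        /* builds the tree 1 with children 2 and 3; prints "2 1 3" */
        inorder(node(1, node(2, NULL, NULL), node(3, NULL, NULL)));
        return 0;
    }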
[Figure 3.7: a binary tree with nodes A, B, C, D, E, F, G, H]

Figure 3.7: A binary tree

Techniques for Graphs

In its simplest form, graph search requires us to determine whether there exists a path in the given graph G = (V,E) that starts at vertex v and ends at vertex u. A more general form is to determine, for a given starting vertex v ∈ V, all vertices u such that there is a path from v to u. This can be solved by starting at vertex v and systematically searching the graph G for vertices that can be reached from v. Two such search methods are:

Depth First Search
Breadth First Search

3.2.12 Optimization

Optimization is the process of improving the speed at which a program executes. Depending on context it may refer to the human process of making improvements at the source level or to a compiler's efforts to re-arrange code at the assembly level.

Code Optimization

Recompile your program with profiling enabled and whatever optimization options are compatible with that. Run your program, again on real-world data, and generate a profiling report. Figure out which function uses the most CPU time, then look over it very carefully and see if any of the following approaches might be useful. Make one change at a time and then run the profiler again. Repeat the process until there is no obvious bottleneck or the program runs sufficiently fast.

Choose a Better Algorithm

First decide on what the code has to do. Become familiar with the body of literature that describes your specialty, and learn and use the most appropriate algorithms. Become familiar with the O(n) notation, which is very commonly used. Some of the obvious replacements:

Slow Algorithm               Replace With
sequential search            binary search or hash lookup
insertion or bubble sort     quicksort, merge sort, radix sort

The data structure should also be chosen appropriately: a linked list would be a good option if you will be doing a lot of insertions and deletions at random positions, and an array would be better if you will be doing binary searching.

Write Clear, Simple Code

Code that is clear and readable to humans is also clear and readable to compilers. Complicated expressions are harder to optimize and can cause the compiler to "fall back" to a less intense mode of optimization. Part of the clarity is making chunks of code into functions when appropriate; the cost of a function call is extremely small on modern machines. Writing clean, portable code also helps in quickly transferring the program to the latest, fastest machine, so that this can be offered as a solution to customers who are interested in speed.

Perspective

Take note of the operations that take time. Among the slowest are opening a file, reading or writing significant amounts of data, starting a new process, searching, sorting,

operations on entire arrays, and copying large amounts of data around. The fastest operations are basic elements of the language, like assigning to a variable, dereferencing a pointer, or adding two integers. Even so, if you perform the fastest operation 10 million times, it will take a noticeable amount of time. A sure sign of misunderstanding is this fragment:

if (x != 0)
    x = 0;

The intent is to save time by not initializing x if it is already zero. In reality, the test to see whether it is zero will take about as much time as setting it to zero. The statement

x = 0;

has the same effect and will be somewhat faster. There is no substitute for examining the assembler-level output the compiler generates and counting the instructions. Some optimization and instruction scheduling is put off until link time and will not show up in the assembler output for an individual module.

3.2.13 AND/OR Graphs

Many complex problems can be broken down into a series of subproblems such that the solutions of the subproblems imply a solution to the original problem, or the subproblems become sufficiently primitive as to be trivially solvable. This breaking down of a complex problem into several subproblems can be represented by a directed graph-like structure in which nodes represent problems and descendants of nodes represent the associated subproblems.

Example: The graph of Figure 3.8(a) represents a problem A that can be solved by solving either both of the subproblems B and C, or the single subproblem D, or E.

[Figure 3.8: (a) a problem graph over nodes A, B, C, D, E; (b) the same graph with dummy nodes A' and A'' introduced]

Figure 3.8: Graphs representing problems

Groups of subproblems that must all be solved in order to imply a solution to the parent node are joined together by an arc going across the respective edges. By introducing dummy nodes, as in Figure 3.8(b), all nodes can be made to be such that their solution requires either all descendants to be solved or only one descendant to be solved. Nodes of the first type are called AND nodes and those of the latter type are called OR nodes. Nodes A and A' of Figure 3.8(b) are OR nodes whereas A'' is an AND node. AND nodes are drawn with an arc across all edges leaving the node. Nodes with no descendants are called terminal. Terminal nodes represent primitive problems and are marked either solvable or not solvable; solvable terminal nodes are represented by rectangles.

An AND/OR graph need not always be a tree. Problem reduction is the process of breaking down a problem into several subproblems; it has been used on such problems as theorem proving, symbolic integration, and analysis of industrial schedules. When problem reduction is used, two different problems may generate a common subproblem. In this case, it may be desirable to have only one node representing the subproblem. Figure 3.9 shows two AND/OR graphs for cases in which this is done.


[Figure 3.9: two AND/OR graphs sharing common subproblem nodes among A, B, C, D, E, F]

Figure 3.9: Two AND/OR graphs that are not trees

Note that the graph is no longer a tree. Furthermore, such graphs may have directed cycles, as in Figure 3.9(b). The presence of a directed cycle does not in itself imply the insolvability of the problem. In fact, solving the primitive problems G, H and I can help in solving problem A; this leads to the solution of D and E and hence of B and C. A subgraph of solvable nodes that shows that the problem is solved is called a solution graph. Possible solution graphs for the graphs of Figure 3.9 are shown by heavy edges.

Let there be a cost associated with each edge in the AND/OR graph. The cost of a solution graph H of an AND/OR graph G is the sum of the costs of the edges in H. The AND/OR graph decision problem (AOG) is to determine whether G has a solution graph of cost at most k, for a given input k.

Theorem: CNF-satisfiability reduces to the AND/OR graph decision problem.

Proof: Let P be a propositional formula in CNF. Let us see how to transform a formula P in CNF into an AND/OR graph such that the AND/OR graph so obtained has a certain minimum-cost solution if and only if P is satisfiable. Let

P = C1 ∧ C2 ∧ ... ∧ Ck,  where each clause Ci is a disjunction of literals lij

The variables of P, V(P), are x1, x2, ..., xn. The AND/OR graph will have nodes as follows:

1. A special node S, with no incoming arcs, represents the problem to be solved.

2. The node S is an AND node with descendant nodes P, x1, x2, ..., xn.

3. Each node xi represents the corresponding variable xi in the formula P. Each xi is an OR node with two descendants, denoted Txi and Fxi respectively. Solving Txi corresponds to assigning a truth value of true to the variable xi.

4. The node P, which is an AND node, represents the formula P. It has k descendants C1, C2, ..., Ck. Node Ci corresponds to the clause Ci in the formula P. The nodes Ci are OR nodes.

5. Each node of type Txi or Fxi has exactly one descendant node that is terminal (i.e., has no edges leaving it). These terminal nodes are denoted v1, v2, ..., v2n.

To complete the construction of the AND/OR graph, the following edges and costs are added:

1. From each node Ci an edge ⟨Ci, Txj⟩ is added if xj occurs in clause Ci, and an edge ⟨Ci, Fxj⟩ is added if the negation of xj occurs in clause Ci. This is done for all variables xj appearing in the clause Ci. Clause Ci is designated an OR node.

2. Edges from nodes of type Txi or Fxi to their respective terminal nodes are assigned a weight, or cost, of 1.

3. All other edges have a cost of 0.

The node S can be solved by solving each of the nodes P, x1, x2, ..., xn. Solving the nodes x1, x2, ..., xn costs n. To solve P, we must solve all the nodes C1, C2, ..., Ck. The cost of a node Ci is at most 1. However, if one of its descendant nodes was solved while solving the nodes x1, x2, ..., xn, then the additional cost to solve Ci is 0, as the edges to its descendant nodes have cost 0 and one of its descendants has already been solved. That is, a node Ci can be solved at no cost if one of the literals occurring in the clause Ci has been assigned a value of true. If there is some assignment of truth values to the xi's such that at least one literal in each clause is true under that assignment, that is, if the formula P is satisfiable, it follows that the entire graph can be solved at a cost of n. If P is not satisfiable, then the cost is more than n. The construction clearly takes only polynomial time. This completes the proof.

3.2.14 Bi-directional Components

The ability to combine separate reusable software components to form a complete program is necessary for effective software reuse. Views provide a clean, flexible, and efficient mechanism for combining reusable software components. A View describes how an application data type implements features of an abstract type; it provides a bi-

directional mapping between a generic concept and a particular implementation of that concept. Parameterizing a generic procedure by means of views allows a single copy of the procedure to be specialized for a variety of application data types and target languages. Composition of views and multiple views of the same data are often required for different abstract types. Automated support makes it easy to create views and to generate specialized code for an application in a desired target language. These techniques have been implemented.

3.2.15 Depth First Search and Traversal

A depth first search of a graph differs from a breadth first search in that the exploration of a vertex v is suspended as soon as a new vertex u is reached. At this time the exploration of the new vertex u begins. When this new vertex has been explored, the exploration of v continues. The search terminates when all reached vertices have been fully explored. This search process is best described recursively, as in the algorithm below.

Algorithm DFS(v)
{
    Mark v as visited;
    for each vertex w such that ⟨v,w⟩ is an edge of G do
        if w is unvisited then DFS(w);
}

In other words: visit v, explore from each newly reached vertex as much as possible, and backtrack to v when done.

[Figure 3.10: (a) an 8-vertex undirected graph G; (b) a directed graph]

(c) Adjacency lists for G:

[1] → 2 → 3
[2] → 1 → 4 → 5
[3] → 1 → 6 → 7
[4] → 2 → 8
[5] → 2 → 8
[6] → 3 → 8
[7] → 3 → 8
[8] → 4 → 5 → 6 → 7

Figure 3.10: Example graphs and adjacency lists

Example: A depth first search of the graph of Figure 3.10(a), starting at vertex 1 and using the adjacency lists of Figure 3.10(c), results in the vertices being visited in the order 1, 2, 4, 8, 5, 6, 3, 7.

One can easily prove that DFS visits all vertices reachable from vertex v. If T(n,e) and S(n,e) represent the maximum time and maximum additional space taken by DFS for an n-vertex, e-edge graph, then S(n,e) = Θ(n), and T(n,e) = Θ(n+e) if adjacency lists are used, while T(n,e) = Θ(n²) if adjacency matrices are used. A depth first traversal of a graph is carried out by repeatedly calling DFS, with a new unvisited starting vertex each time. The algorithm for this (DFT) differs from BFT only in that the call to BFS(i) is replaced by a call to DFS(i). The exercises contain some problems that are solved best by BFS and others that are solved best by DFS. BFS and DFS are two fundamentally different search methods. In BFS a node is fully explored before the exploration of any other node begins; the next node to explore is the first unexplored node remaining. In DFS the exploration of a node is suspended as soon as a new unexplored node is reached.
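The following C sketch (our own illustration) runs DFS over the adjacency lists of Figure 3.10(c); starting from vertex 1 it prints exactly the visit order 1 2 4 8 5 6 3 7 stated in the example.

    #include <stdio.h>

    #define N 8
    int adj[N + 1][N + 1];          /* adj[v] = 0-terminated neighbour list */
    int visited[N + 1];

    void dfs(int v)
    {
        visited[v] = 1;
        printf("%d ", v);
        for (int i = 0; adj[v][i] != 0; i++)
            if (!visited[adj[v][i]])
                dfs(adj[v][i]);      /* suspend v, explore the new vertex */
    }

    int main(void)
    {
        int lists[N + 1][5] = { {0}, {2,3,0}, {1,4,5,0}, {1,6,7,0},
                                {2,8,0}, {2,8,0}, {3,8,0}, {3,8,0},
                                {4,5,6,7,0} };
        for (int v = 1; v <= N; v++)
            for (int i = 0; i < 5; i++) adj[v][i] = lists[v][i];
        dfs(1);                      /* prints: 1 2 4 8 5 6 3 7 */
        return 0;
    }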

3.3 Revision Points

Searching: Data stored in an organized manner must be accessed for processing. Locating a particular data item in memory involves searching for it. Searching is a technique whereby the memory is scanned for the required data.

Dynamic Programming: Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions.

Optimization: Optimization is the process of improving the speed at which a program executes.

3.4 Intext Questions
1. Explain the concept of dynamic programming in detail.
2. Find a minimum-cost path from s to t in the multistage graph of figure A below. Do this first using the forward approach, then using the backward approach.

3. Explain multistage graphs in detail.
4. Describe the all-pairs shortest paths problem, with its algorithm, in detail.
5. Give an example of a set of knapsack instances for which |S^i| = 2^i, 0 ≤ i ≤ n. Your set should include one instance for each n.
6. Explain the 0/1 knapsack problem in detail.
7. Explain the usage of AND/OR graphs.

8. Show that depth first search (DFS) visits all vertices in a graph reachable from vertex v.
9. State the differences between depth first search (DFS) and breadth first search (BFS).

3.5 Summary
Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions. Dynamic programming often drastically reduces the amount of enumeration by avoiding the enumeration of decision sequences that cannot possibly be optimal. In dynamic programming an optimal sequence of decisions is obtained by making explicit appeal to the principle of optimality.

The important difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated, whereas in dynamic programming many decision sequences may be generated.

The multistage graph problem is to find a minimum-cost path from source to target. The all-pairs shortest-path problem is to determine a matrix A such that A(i,j) is the length of a shortest path from i to j.

A search that necessitates the examination of every vertex in the object being searched is called a traversal.

Optimization is the process of improving the speed at which a program executes. Problem reduction is the process of breaking down a problem into several subproblems. A subgraph of solvable nodes that shows that the problem is solved is called a solution graph.

In BFS a node is fully explored before the exploration of any other node begins; the next node to explore is the first unexplored node remaining. In DFS the exploration of a node is suspended as soon as a new unexplored node is reached.

Terminal Exercises
1. Design a data representation for the Traveling Salesperson Problem.
2. Define searching and traversal techniques.
3. Define optimization.
4. What is code optimization used for?

Supplementary Materials
1. Ellis Horowitz and Sartaj Sahni, Fundamentals of Computer Algorithms, Galgotia Publications, 1997.
2. Aho, Hopcroft and Ullman, Data Structures and Algorithms, Addison-Wesley, 1987.
3. Jean Paul Tremblay and Paul G. Sorenson, An Introduction to Data Structures with Applications, McGraw-Hill, 1984.

Assignments
1. Prepare an assignment on the applications of searching and their importance.

Suggested Reading/Reference Books/Set Books


1. Mark Allen Weiss, Data Structures and Algorithm Analysis in C++, Addison-Wesley, 1999.
2. Yedidyah Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum, Data Structures Using C and C++, Prentice-Hall, 1997.

Learning Activities
Students can perform the following tasks in small groups:
1. Searching
2. Indexed sequential files

Keywords
0/1 Knapsack
Depth First Search
Searching
Sequential Search
Binary Search
AND/OR Graphs
Traveling Salesperson Problem

