
Overview

Binary Trees
Traversal of Binary Trees
Binary Tree Representations
Threaded Binary Trees
Binary Search Tree
AVL tree (Balanced Binary Tree)
Run Time Storage Management
Garbage Collection
Compaction

Trees

Learning Objectives
Overview
Binary Trees
Traversal of Binary Trees
Binary Tree Representation
Threaded Binary Trees
Binary Search Tree
AVL Tree
Run Time Storage Management
Garbage collection
Compaction
Overview

There are several data structures that you have studied, such as arrays, lists, stacks, queues and graphs. All of these data structures except graphs are linear data structures; graphs fall into the non-linear category. A tree is another data structure and forms an important class of graphs: a tree is an acyclic connected graph, i.e., it contains no loops or cycles. The concept of trees is one of the most fundamental and useful concepts in computer science.
Trees have many variations, implementations, and applications. Trees find their use in applications such as compiler construction, database design, Windows operating system programs, etc.
Binary Trees

A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees, called the left and right subtrees of the original tree. Each element of a binary tree is called a node of the tree.
A conventional method of picturing a binary tree is shown in figure 1.1. This tree consists of nine nodes with A as its root. Its left subtree is rooted at B and its right subtree is rooted at C. This is indicated by the two branches emanating from A: to B on the left and to C on the right. The absence of a branch indicates an empty subtree; for example, the left subtree of the binary tree rooted at C and the right subtree of the binary tree rooted at E are both empty. The binary trees rooted at D, G, H and I have empty right and left subtrees.

If A is the root of a binary tree and B is the root of its left or right subtree, then A is said to be the father of B and B is said to be the left or right son of A. A node that has no sons (such as D, G, H, and I of figure 1.1) is called a leaf. Node n1 is an ancestor of node n2 (and n2 is a descendent of n1) if n1 is either the father of n2 or the father of some ancestor of n2. For example, in the tree of figure 1.1, A is an ancestor of G and H is a descendent of C, but E is neither an ancestor nor a descendent of C. A node n2 is a left descendent of node n1 if n2 is either the left son of n1 or a descendent of the left son of n1. A right descendent may be similarly defined. Two nodes are brothers if they are left and right sons of the same father.
Figure 1.2 illustrates some structures that are not binary trees.

If every nonleaf node in a binary tree has non-empty left and right subtrees, the tree is called a strictly binary tree. Thus the tree of figure 1.3 is a strictly binary tree.

A strictly binary tree with n leaves always contains 2n-1 nodes.


The level of a node in a binary tree is defined as follows: the root of the tree has level 0, and the level of any other node in the tree is one more than the level of its father. For example, in the binary tree of figure 1.1, node E is at level 2 and node H is at level 3. The depth of a binary tree is the maximum level of any leaf in the tree. Thus the depth of the tree of figure 1.1 is 3. A complete binary tree of depth d is the strictly binary tree all of whose leaves are at level d.
If a binary tree contains m nodes at level l, it contains at most 2m nodes at level l+1. A complete binary tree of depth d is the binary tree of depth d that contains exactly 2^l nodes at each level l between 0 and d.
A binary tree of depth d is an almost complete binary tree if:
1. Each leaf in the tree is either at level d or at level d-1.
2. For any node nd in the tree with a right descendent at level d, all the left descendent of nd that are
leaves are also at level d.
The strictly binary tree of figure 1.4a is not almost complete, since it violates these conditions. The binary tree of figure 1.4b is an almost complete binary tree.

Student Activity 1.1


Before going to next section, answer the following questions:
1. What is a Binary Tree?
2. Define Strictly Binary Tree?
3. Find the total number of nodes in a complete binary tree of depth d.
If your answers are correct, then proceed to next section.
Traversal of Binary Trees

In many applications it is necessary not only to find a node within a binary tree, but to be able to move through all the nodes of the binary tree, visiting each one in turn. If there are n nodes in the binary tree, then there are n! different orders in which they could be visited, but most of these have little regularity of pattern. This operation is called tree traversal. We will define three of these traversal methods. In each of these methods, nothing needs to be done to traverse an empty binary tree. The methods are all defined recursively, so that traversing a binary tree involves visiting the root and traversing its left and right subtrees. The only difference among the methods is the order in which these three operations are performed.
To traverse a non-empty binary tree in preorder (also known as depth-first order), we perform the
following three operations:
(1) Visit the root.
(2) Traverse the left sub tree in preorder.
(3) Traverse the right sub tree in preorder.
To traverse a non-empty binary tree in Inorder (or symmetric order)
(1) Traverse the left subtree in inorder.
(2) Visit the root.
(3) Traverse the right subtree in inorder.
To traverse a nonempty binary tree in Postorder
(1) Traverse the left subtree in postorder.
(2) Traverse the right subtree in postorder.
(3) Visit the root.
Figure 1.5 (a & b) illustrates two binary trees and their traversals in preorder, inorder and postorder.

PREORDER : ABDGCEHIF

INORDER : DGBAHEICF

POSTORDER : GDBHIEFCA

PREORDER : ABCEIFJDGHKL

INORDER : EICFJBGDKHLA

POSTORDER : IEJFCGKLHDBA

Student Activity 1.2


Before going to next section, answer the following questions:
1. Find the Inorder, Preorder and Postorder traversals of the following binary tree.

If your answers are correct, then proceed to next section.


Binary Tree Representation

Recall that the n nodes of an almost complete binary tree can be numbered from 1 to n, so that the number assigned to a left son is twice the number assigned to its father, and the number assigned to a right son is 1 more than twice the number assigned to its father. We can therefore represent an almost complete binary tree without father, left or right links. Instead, the nodes can be kept in an array info of size n. We refer to the node at position p simply as "node p"; info[p] holds the contents of node p, info being the array name.

In C, arrays start at position 0; therefore, instead of numbering the tree nodes from 1 to n, we number them from 0 to n - 1. Because of the one-position shift, the two sons of a node numbered p are in positions 2p + 1 and 2p + 2, instead of 2p and 2p + 1.
The root of the tree is at position 0, so that tree, the external pointer to the tree root, always equals 0. The node in position p (that is, node p) is the implicit father of nodes 2p + 1 and 2p + 2. The left son of node p is node 2p + 1 and the right son of node p is node 2p + 2. Given a left son at position p, its right brother is at p + 1 and, given a right son at position p, its left brother is at p - 1. The father of node p is at position (p - 1)/2. p points to a left son if and only if p is odd; thus the test for whether node p is a left son (the isleft operation) is to check whether p % 2 is not equal to 0. Figure 1.6 illustrates arrays that represent almost complete binary trees.
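This index arithmetic can be captured directly in C. The following is only a minimal sketch for the 0-based sequential representation; the macro names are illustrative and do not come from the text:

/* Index arithmetic for the 0-based implicit (sequential) representation. */
#define LEFT(p)    (2 * (p) + 1)        /* left son of node p              */
#define RIGHT(p)   (2 * (p) + 2)        /* right son of node p             */
#define FATHER(p)  (((p) - 1) / 2)      /* father of node p                */
#define ISLEFT(p)  ((p) % 2 != 0)       /* p is a left son iff p is odd    */

For example, the sons of the root (node 0) are nodes LEFT(0) = 1 and RIGHT(0) = 2, and FATHER(1) and FATHER(2) are both 0.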

We can extend this implicit array representation of almost complete binary trees to an implicit array representation of binary trees in general. We do this by identifying an almost complete binary tree that contains the binary tree being represented. Figure 1.7(a) illustrates two (non-almost-complete) binary trees, figure 1.7(b) illustrates the smallest almost complete binary trees that contain them, and finally figure 1.7(c) illustrates the implicit array representation of these almost complete binary trees and, by extension, of the original binary trees. The implicit array representation is also called the sequential representation, as contrasted with the linked representation presented earlier, because it allows a tree to be implemented in a contiguous block of memory (an array) rather than via pointers connecting widely separated nodes. Under the sequential representation an array element is allocated whether or not it serves to contain a node of the tree, so we must be able to mark elements that do not hold tree nodes. This may be accomplished by one of two methods. One method is to set info[p] to a special value if node p is NULL. This special value should be invalid as the information content of a legitimate tree node. For example, in a tree containing positive numbers, a NULL node may be indicated by a negative info value. Alternatively, we may add a logical flag field, used, to each node. Each node then contains two fields, info and used; the entire structure is contained in an array node, and info(p) is implemented as node[p].info. We use this method later in implementing the sequential representation.

(a) Two binary trees


A node can be defined in C as follows:


struct node
{
int info;
struct node *left;
struct node *right;
struct node *father;
};
typedef struct node *nodeptr;
The operations info(p), left(p), right(p) and father(p) can be implemented by references to p->info, p->left, p->right, and p->father, respectively. These operations are used to retrieve the value of node p, the left child of node p, the right child of node p and the father of node p, respectively.

We may implement the traversal of binary trees in C by recursive routines that mirror the traversal definitions. The three C routines preorder, inorder and postorder visit the nodes of a binary tree in preorder, inorder, and postorder respectively.
The parameter to each routine is a pointer to the root node of a binary tree. We use the dynamic node representation of a binary tree.
/* preorder: visit each node of the tree in preorder */
void preorder (nodeptr root)
{
    if (root) {
        visit (root);
        preorder (root->left);
        preorder (root->right);
    }
}
/* inorder: visit each node of the tree in inorder */
void inorder (nodeptr root)
{
    if (root) {
        inorder (root->left);
        visit (root);
        inorder (root->right);
    }
}
/* postorder: visit each node of the tree in postorder */
void postorder (nodeptr root)
{
    if (root) {
        postorder (root->left);
        postorder (root->right);
        visit (root);
    }
}
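As a quick illustration of how these routines might be used, the sketch below builds a three-node tree and prints it in the three orders. The visit and newnode functions shown here are illustrative assumptions, not routines defined in the text; in a complete program a prototype for visit should appear before the traversal routines above.

#include <stdio.h>
#include <stdlib.h>

void visit (nodeptr p)               /* trivial visit: print the node's info as a character */
{
    printf("%c ", p->info);
}

nodeptr newnode (int x)              /* illustrative helper: allocate one node */
{
    nodeptr p = malloc(sizeof(struct node));
    p->info = x;
    p->left = p->right = p->father = NULL;
    return p;
}

int main(void)
{
    nodeptr root = newnode('A');     /* tree:    A      */
    root->left  = newnode('B');      /*         / \     */
    root->right = newnode('C');      /*        B   C    */

    preorder(root);  printf("\n");   /* prints: A B C   */
    inorder(root);   printf("\n");   /* prints: B A C   */
    postorder(root); printf("\n");   /* prints: B C A   */
    return 0;
}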

Student Activity 1.3


Before going to next section, answer the following questions:
1. Write algorithms for inorder, preorder and postorder traversals.
2. Construct the binary tree whose inorder and preorder traversals are given as:
Inorder: EICFJBGDKHLA
Preorder: ABCEIFJDGHKL
If your answers are correct, then proceed to next section.
Threaded Binary Trees

Traversing a binary tree is a common operation, and it would be helpful to find a more efficient method of implementing the traversal. As we have seen, in a binary tree the left or right child pointer of a node is frequently empty, i.e. NULL. We can change these NULL links in a binary tree to special links called threads, so that it is possible to perform traversal, insertion and deletion operations without using either a stack or recursion.
In a right threaded binary tree each NULL right link is replaced by a special link to the successor of that node under inorder traversal, called a right thread. Using right threads we can easily do an inorder traversal of a tree, since we need only follow either an ordinary link or a thread to find the next node to visit.
A left threaded binary tree may be defined similarly as one in which each NULL left pointer is altered to contain a thread to that node's inorder predecessor. A binary tree which has both left and right threads is called a fully threaded binary tree. The word fully is omitted if there is no danger of confusion. The following figure shows a fully threaded binary tree, where the threads are shown as dotted lines.

Following figure shows a Right-threaded binary tree.


To implement a right threaded binary tree under the dynamic node implementation of a binary tree, an extra logical field, rthread, is included within each node to indicate whether or not its right pointer is a thread. For consistency, the rthread field of the rightmost node of a tree (that is, the last node in the tree's inorder traversal) is also set to TRUE, although its right field remains NULL.
Thus a node is defined as follows (we are assuming that no father field exists):
struct node {
    int info;
    struct node *left;
    struct node *right;
    int rthread;    /* TRUE if the right pointer is a thread */
};
typedef struct node *nodeptr;
Now we present a routine to implement inorder traversal of a right threaded binary tree.
void inorder2 (nodeptr root)
{
    nodeptr p, q;
    p = root;
    do {
        q = NULL;
        while (p != NULL) {          /* traverse left branch */
            q = p;
            p = p->left;
        }
        if (q != NULL) {
            visit (q);
            p = q->right;
            while (q->rthread && p != NULL) {
                visit (p);
                q = p;
                p = p->right;
            }
        }
    } while (q != NULL);
}

Student Activity 1.4


Before going to next section, answer the following questions:
1. What are right threaded and left threaded binary trees?
2. Discuss the advantages and disadvantages of threaded binary trees.
If your answers are correct, then proceed to next section.
Binary Search Tree

A Binary Search Tree (BST) is an ordered binary tree such that either it is an empty tree or
1. each data value in its left subtree is less than the root value,
2. each data value in its right subtree is greater than the root value, and
3. the left and right subtrees are again binary search trees.
The following figure shows a binary search tree.

The following operations can be performed on a Binary Search Tree:


1. Initialization of a Binary Search Tree; this operation makes an empty tree.
2. Check whether BST is empty or not.
3. Create a node for the Binary search tree; this operation allocates memory space for the new node;
returns with error if no space is available.
4. Retrieve a node's data.
5. Update a node’s data.
6. Insert a node.
7. Search for a node.
8. Traverse a Binary Search Tree.
The advantage of using a BST over an array is that a tree enables search, insertion, and deletion operations to be performed efficiently. If an array is used, an insertion or deletion requires that approximately half of the elements of the array be moved. Insertion or deletion in a BST, on the other hand, requires that only a few pointers be adjusted.
The following algorithm searches a binary search tree and inserts a new record into the tree if the search is unsuccessful. (We assume the existence of a function maketree that constructs a binary tree consisting of a single node whose contents are passed as arguments and returns a pointer to that tree.)
q = NULL;
p = root;
while (p != NULL) {
    if (key == k(p)) return (p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == NULL)
    root = v;
else
    if (key < k(q))
        left(q) = v;
    else
        right(q) = v;
return (v);
Here key is the key being searched for and rec is the record to be inserted.
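The same search-and-insert logic can be written against the dynamic node representation used earlier. The sketch below is only an illustration: makenode is a simplified, hypothetical stand-in for the maketree of the text, and the father field of the earlier struct node (if present) is simply not used here.

#include <stdlib.h>

nodeptr makenode (int key)                   /* illustrative helper: one-node tree */
{
    nodeptr v = malloc(sizeof(struct node));
    v->info = key;
    v->left = v->right = NULL;
    return v;
}

/* Search the BST rooted at *root for key; insert a new node if it is absent. */
nodeptr insert (nodeptr *root, int key)
{
    nodeptr p = *root, q = NULL, v;

    while (p != NULL) {                      /* descend, remembering the father q */
        if (key == p->info)
            return p;                        /* key already present */
        q = p;
        p = (key < p->info) ? p->left : p->right;
    }
    v = makenode(key);
    if (q == NULL)
        *root = v;                           /* tree was empty */
    else if (key < q->info)
        q->left = v;
    else
        q->right = v;
    return v;
}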

We now present an algorithm to delete a record (node) with key key from a binary search tree. There are three cases to consider. If the node to be deleted has no sons, it may be deleted without further adjustment to the tree. This is illustrated in the following figure.

If the node to be deleted has only one subtree, its only son can be moved up to take its place. This is illustrated in the following figure:

If, however, the node p to be deleted has two subtrees, its inorder successor s (or predecessor) must take its place. The inorder successor cannot have a left subtree (since a left descendent would itself be the inorder successor of p). Thus the right son of s can be moved up to take the place of s. This is illustrated in the following figure, where the node with key 12 replaces the node with key 11 and is replaced in turn by the node with key 13.
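The three cases can be combined into one routine. The following is only a sketch, assuming the nodeptr structure used earlier (without using the father field); as a common simplification it copies the successor's key into p and then removes the successor node, which has the same effect as the replacement described above. The function name delete_key is illustrative.

#include <stdlib.h>

void delete_key (nodeptr *root, int key)
{
    nodeptr p = *root, q = NULL;             /* p: node to delete, q: its father */

    while (p != NULL && p->info != key) {    /* locate the node */
        q = p;
        p = (key < p->info) ? p->left : p->right;
    }
    if (p == NULL)
        return;                              /* key not found */

    if (p->left != NULL && p->right != NULL) {
        /* case 3: two sons - find the inorder successor s and its father sf */
        nodeptr sf = p, s = p->right;
        while (s->left != NULL) {
            sf = s;
            s = s->left;
        }
        p->info = s->info;                   /* copy successor's key into p ...  */
        p = s;                               /* ... and physically remove s      */
        q = sf;
    }
    /* cases 1 and 2: p now has at most one son */
    {
        nodeptr son = (p->left != NULL) ? p->left : p->right;
        if (q == NULL)
            *root = son;
        else if (q->left == p)
            q->left = son;
        else
            q->right = son;
    }
    free(p);
}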

Since the definition of a binary search tree is recursive, it is easiest to describe a recursive search method. Suppose we wish to search for an element with key x. An element could in general be an arbitrary structure that has the key as one of its fields; we assume for simplicity that the element consists just of a key, and use the terms element and key interchangeably. We begin at the root. If the root is 0 (the empty tree), then the search tree contains no elements and the search is unsuccessful. Otherwise, we compare x with the key in the root. If x equals this key, the search terminates successfully. If x is less than the key in the root, then no element in the right subtree can have key value x and only the left subtree is to be searched. If x is larger than the key in the root, only the right subtree needs to be searched. The subtrees can be searched recursively as in the following algorithm.
nodeptr search (nodeptr root, int x)
{
    if (root == 0) return 0;
    else if (x == root->info) return root;
    else if (x < root->info)
        return (search (root->left, x));
    else
        return (search (root->right, x));
}
The following table gives the efficiencies of the search, insert and delete operations in a binary search tree.

" # $ %&' $ %( !&'

& $ %&' $ %( !&'

( $ %&' $ %( !&'

Student Activity 1.5


Before going to next section, answer the following questions:
1. How many binary search trees are possible with key values 1,2,3,4,5?
2. Delete node 10 from the following binary search tree.

If your answers are correct, then proceed to next section.


AVL Tree (Balanced Binary Tree)

The effectiveness of the searching process in a binary search tree depends on how the data are organised to make up a specific tree. For example, consider the two trees shown in the following figures.

[Figures: two binary search trees built from the keys 1 to 10 - one short and compact, the other a long, thin chain.]

The efficiency of search will be rather different in these two cases, even though the same elements are organised in the two structures. The tree in the first figure above is rather short and compact, while the tree in the second figure is long and thin. We may say that the tree in the first figure is somewhat more balanced than that in the second figure.
Let us define more precisely the notion of a "balanced" tree. The height of a binary tree is the maximum level of its leaves (this is also sometimes known as the depth of the tree). For convenience, the height of a NULL tree is defined as -1. A balanced binary tree (AVL tree) is a binary tree in which the heights of the two subtrees of every node never differ by more than 1. The balance of a node in a binary tree is defined as the height of its left subtree minus the height of its right subtree.
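These definitions translate directly into small recursive routines. The sketch below assumes the nodeptr structure used earlier; the function names, including is_avl, are illustrative.

/* height of a binary tree: height(NULL) is -1, as in the definition above */
int height (nodeptr p)
{
    int hl, hr;
    if (p == NULL)
        return -1;
    hl = height(p->left);
    hr = height(p->right);
    return 1 + (hl > hr ? hl : hr);
}

/* balance of a node: height of left subtree minus height of right subtree */
int balance (nodeptr p)
{
    return height(p->left) - height(p->right);
}

/* a tree is balanced (AVL) if every node has balance -1, 0 or +1 */
int is_avl (nodeptr p)
{
    if (p == NULL)
        return 1;
    if (balance(p) < -1 || balance(p) > 1)
        return 0;
    return is_avl(p->left) && is_avl(p->right);
}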
The following figure illustrates a balanced binary tree. Each node in a balanced binary tree has a balance of 1, -1, or 0, depending on whether the height of its left subtree is greater than, less than, or equal to the height of its right subtree. The balance of each node is also indicated in the figure.

Suppose that we are given a balanced binary tree and use the preceding search and insertion algorithm to insert a new node p into the tree. The resulting tree may or may not remain balanced. The following figure illustrates all possible insertions that may be made to the tree of figure 1.16(b).

[Figure 1.16: the balanced tree with every possible insertion position marked B (still balanced) or U1 through U12 (unbalanced).]

Each insertion that yields a balanced tree is indicated by a B. The unbalanced insertions are indicated by a U and are numbered from 1 to 12. It is easy to see that the tree becomes unbalanced if and only if the newly inserted node is a left descendent of a node that previously had a balance of 1 (this occurs in cases U1 through U8 in figure 1.16(b)) or it is a right descendent of a node that previously had a balance of -1 (cases U9 through U12).
Run Time Storage Management

As discussed earlier, allocation of storage and its release is done for one node at a time. This method is convenient with regard to two properties of nodes: (i) the size of a node of a particular type is fixed, and (ii) a node is reasonably small. But these two characteristics don't help in programs where a large amount of contiguous storage is required. At times a program may require storage blocks of varied sizes, and thus arises the need for a memory management system. Run time storage management is such a system and is a convenient tool for processing requests for variable-length blocks.
The illustration below shows why it is necessary to keep track of available space, to allocate it when space is requested, and to combine contiguous free spaces when a block is freed.
As an example of this situation, consider a small memory of 1024 words. Suppose requests are made for three blocks of storage of 348, 110 and 212 words, respectively. Let us further suppose that these blocks are allocated sequentially, as shown in figure 1.17(a). Now suppose that the second block, of size 110, is freed, resulting in the situation depicted in figure 1.17(b). There are now 464 words of free space; yet, because the free space is divided into noncontiguous blocks, a request for a block of 400 words could not be satisfied.

Suppose that block 3 were now freed. Clearly, it is not desirable to retain three free blocks of 110, 212, and
354 words. Rather the blocks should be combined into a single large block of 676 words so that further
large requests can be satisfied. After combination, memory will appear as in Figure 1.17(c).
This example illustrates the necessity to keep track of available space, to allocate portions of that space
when allocation requests are presented, and to combine contiguous free spaces when a block is freed.
Garbage Collection

Deallocation of nodes can take place at two levels:
1. The application which claimed the node releases it back to the operating system.
2. The operating system calls the storage management routines to return free nodes to the free space.
For example, deallocation:
1. Occurs in a C program with the statement free(x), where x is space earlier allocated by a malloc call.
2. Is usually implemented by the method of garbage collection. This requires the presence of a 'marking bit' on each node. It runs in two phases. In the first phase, all non-garbage nodes are marked. In the second phase all non-marked nodes are collected and returned to the free space. Where variable-size nodes are used, it is desirable to keep the free space as one contiguous block; in this case, the second phase is called memory compaction.

Garbage collection is usually called when some program runs out of space. It is a slow process, and the need for it should be minimized by efficient programming.
One field must be set aside in each node to indicate whether the node has or has not been marked. The marking phase sets the mark field to true in each accessible node. As the collection phase proceeds, the mark field in each accessible node is reset to false. Thus, at the start and end of garbage collection, all mark fields are false. User programs do not affect the mark fields. A simplified sketch of the two phases is shown below.
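The sketch below is deliberately simplified: it assumes all nodes live in one fixed array (the node pool), that each node carries left/right indices and a mark field, and that accessible nodes are reachable from a single root. None of these names come from the text, and a production marker would avoid deep recursion.

#define POOLSIZE 100                       /* illustrative pool size */
#define NIL      (-1)

struct gcnode {
    int info;
    int left, right;                       /* indices into the pool, or NIL */
    int mark;                              /* set by the marking phase      */
};

struct gcnode pool[POOLSIZE];
int freelist = NIL;                        /* head of the free list         */

/* Phase 1: mark every node reachable from p. */
void mark (int p)
{
    if (p == NIL || pool[p].mark)
        return;
    pool[p].mark = 1;
    mark(pool[p].left);
    mark(pool[p].right);
}

/* Phase 2: collect unmarked nodes onto the free list and reset marks. */
void collect (void)
{
    int i;
    freelist = NIL;
    for (i = 0; i < POOLSIZE; i++) {
        if (pool[i].mark)
            pool[i].mark = 0;              /* accessible: reset for next collection */
        else {
            pool[i].left = freelist;       /* garbage: link into the free list      */
            freelist = i;
        }
    }
}

void garbage_collect (int root)
{
    mark(root);
    collect();
}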
It is sometimes inconvenient to reserve one field in each node solely for the purpose of marking. In that case a separate area in memory can be reserved to hold a long array of mark bits, one for each node that may be allocated.
One aspect of garbage collection is that it must run when there is very little space available. This means that auxiliary tables and stacks needed by the garbage collector must be kept to a minimum, since there is little space available for them. An alternative is to reserve a specific percentage of memory for the exclusive use of the garbage collector. However, this effectively reduces the amount of memory available to the user and means that the garbage collector will be called more frequently.
Whenever the garbage collector is called, all user processing comes to a halt while the algorithm examines
all allocated nodes in memory. For this reason it is desirable that the garbage collector be called as
infrequently as possible. For real time applications, in which a computer must respond to a user request
within a specific short time span, garbage collection has generally been considered an unsatisfactory
method of storage management. We can picture a space ship drifting off into the infinite as it waits for
directions from a computer occupied with garbage collection. However, methods have recently been
developed whereby garbage collection can be performed simultaneously with user processing. This means
that the garbage collector must be called before all space has been exhausted so that user processing can
continue in whatever space is left, while the garbage collector recovers additional space.
Another important consideration is that users must be careful to ensure that all lists are well formed and that
all pointers are correct. Usually the operations of a list processing system are carefully implemented so that
if garbage collection does occur in the middle of one of them, the entire system still works correctly.
However, some users try to outsmart the system and implement their own pointer manipulations. This
requires great care so that garbage collection will work properly. In a real-time garbage collection system,
we must ensure not only that user operations do not upset the list structures that the garbage collector needs, but also
that the garbage collection algorithm itself does not unduly disturb the list structures that are being used
concurrently by the user.
It is possible that, at the time the garbage collection program is called, users are actually using almost all
the nodes that are allocated. Thus almost all nodes are accessible and the garbage collector recovers very
little additional space. After the system runs for a short time, it will again be out of space; the garbage
collector will again be called only to recover very few additional nodes, and the vicious cycle starts again.
This phenomenon, in which system storage management routines such as garbage collection are executing
almost all the time, is called thrashing.
Clearly thrashing is a situation to be avoided. One drastic solution is to impose the following condition. If
the garbage collector is run and does not recover a specific percentage of the total space, the user who
requested the extra space is terminated and removed from the system. All of that user's space is then
recovered and made available to other users.
Compaction

As a final topic, we shall briefly discuss compaction as a technique for reclaiming storage and introduce an algorithm for this task.
Compaction works by actually moving blocks of data from one location in memory to another so as to collect all the free blocks into one large block. The allocation problem then becomes much simpler: allocation now consists of merely moving a pointer which points to the top of this successively shortening block of free storage. Once this single block gets too small again, the compaction mechanism is invoked once more to reclaim whatever unused storage may now exist among allocated blocks. There is generally no storage release mechanism. Instead, a marking algorithm is used to mark blocks that are still in use. Then, instead of freeing each unmarked block by calling a release mechanism to put it on the free list, the compactor simply collects all unmarked blocks into one large block at one end of the memory segment. The only real problem in this method is the redefining of pointers. This is solved by making extra passes through memory. After blocks are marked, the entire memory is stepped through and the new address for each marked block is determined. This new address is stored in the block itself. Then another pass over memory is made. On this pass, pointers that point to marked blocks are reset to point to where the marked blocks will be after compaction. This is why the new address is stored right in the block: it is easily obtainable. After all pointers have been reset, the marked blocks are moved to their new locations. A general algorithm for the compaction routine is as follows (a simplified C sketch is given after the steps).
1. Invoke the garbage collection marking routine.
2. Repeat step 3 until the end of memory is reached.
3. If the current block of storage being examined has been marked, then set the new address of the block to the starting address of unused memory and update the starting address of unused memory.
4. Redefine variable references so that they refer to the new addresses of the blocks they point to.
5. Define new values for the pointers in marked blocks.
6. Repeat step 7 until the end of memory is reached.
7. Move marked blocks into their new locations and reset the marks.
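The sketch below is a much-simplified illustration of these passes. It models memory as an array of words in which each allocated block starts with a three-word header (size, mark, new address); pointer redefinition (steps 4 and 5) is only indicated by a comment, since it depends on how user pointers are located. All names here are illustrative assumptions.

#include <string.h>

#define MEMSIZE 1024

/* Each block: 3 header words (size incl. header, mark, newaddr), then data. */
int memory[MEMSIZE];
int mem_end = 0;                         /* one past the last allocated word   */

/* Pass 1 (steps 2-3): compute the new address of every marked block. */
void compute_new_addresses (void)
{
    int p = 0, unused = 0;               /* unused = start of unused memory    */
    while (p < mem_end) {
        int size = memory[p];
        if (memory[p + 1]) {             /* block is marked                    */
            memory[p + 2] = unused;      /* store new address in the block     */
            unused += size;
        }
        p += size;
    }
}

/* Steps 4-5 would go here: scan all pointers and replace each pointer to a
   marked block by that block's stored new address. */

/* Pass 2 (steps 6-7): slide marked blocks to their new locations, reset marks. */
void move_blocks (void)
{
    int p = 0, new_end = 0;
    while (p < mem_end) {
        int size = memory[p];
        if (memory[p + 1]) {
            int dest = memory[p + 2];
            memmove(&memory[dest], &memory[p], size * sizeof(int));
            memory[dest + 1] = 0;        /* reset the mark                     */
            new_end = dest + size;
        }
        p += size;
    }
    mem_end = new_end;                   /* everything above mem_end is free   */
}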

Student Activity 1.6


Answer the following questions:
1. What is Garbage collection? What is the disadvantage of garbage collection?
2. Discuss the advantages of AVL Trees.
3. What are the conditions for a tree to be an AVL tree?

Summary

After going through this chapter, we can summarize the main concepts:
A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets.
We can traverse a binary tree in three ways, i.e. preorder, inorder and postorder.
In a right threaded binary tree each NULL right link is replaced by a special link, called a right thread, to the successor of that node under inorder traversal.

A binary search tree (BST) is an ordered binary tree such that either it is an empty tree or each data value in its left subtree is less than the root value, each data value in its right subtree is greater than the root value, and the left and right subtrees are again binary search trees.
A balanced binary tree (AVL tree) is a binary tree in which the heights of the two subtrees of every node never differ by more than 1.

Self-assessment Questions

I. True and False


1. A node of a binary tree can have more than 2 children.
2. Array representation of a binary tree is more efficient than dynamic representation.
II. Fill in the blanks
1. Garbage collection is a method of ____________.
2. The advantage of using a Binary search tree over an array is that a tree enables ____________
and ___________to be performed more efficiently.
3. The average time complexity of binary search is __________.

Answers

I. True and False
1. False
2. False
II. Fill in the blanks
1. run time storage management
2. insertion and deletion
3. log2n.

I. True and False


1. Threaded binary trees are useful in tree traversals.
2. An AVL tree is a more efficient binary search tree.
II. Fill in the blanks
1. A ____________ is a finite set of elements that is either empty or is partitioned into three
disjoint subsets.

2. In a ____________ binary tree each right link is replaced by a special link to the successor of that
node under inorder traversal.
3. The advantage of using a __________ over an array is that a tree enables search insertion and
deletion operation to be performed efficiently.
4. Since the definition of a binary search tree is ________, it is easiest to describe a recursive
search method.
5. The effectiveness of ______ process in a binary search tree depends on how data are organized
to make up a specific tree.

Review Questions
1. Prove that the root of a binary tree is an ancestor of every node in the tree except itself.
2. Prove that a node of a binary tree has at most one father.
3. Prove that a strictly binary tree with n leaves contains 2n-1 nodes.
4. Two binary trees are similar if they are both empty or if their left subtrees are similar and their right
subtrees are similar. Write an algorithm to determine whether two binary trees are similar.
5. Write C routines to traverse a binary tree in preorder and Post order.
Overview
Bubble Sort
Insertion Sort
Selection Sort
Quick Sort
Merge Sort
Radix Sort
Heap Sort
External Sorting
Lower Bound Theory
Adversary Arguments
Minimum Spanning Tree
Shortest Paths
Graph Component Algorithm
String Matching
The Boyer-Moore Algorithm

Sorting Techniques

Learning Objectives
• Overview
• Bubble Sort
• Insertion Sort
• Selection Sort
• Quick Sort
• Merge Sort
• Radix Sort
• Heap Sort
• External Sort
• Lower Bound theory for sorting
• Selection and Adversary Argument
• Minimum Spanning Tree
• Prim’s Algorithm
• Kruskal’s Algorithm
• Shortest Path
• Graph Component Algorithm
• String Matching
• KMP Algorithm

• Boyer Moore Algorithm


Overview

The concept of an ordered set of elements is one that has considerable impact on our daily lives. Consider, for example, the process of finding a telephone number in a telephone directory. This process, called a search, is simplified considerably by the fact that the names in the directory are listed in alphabetical order. Consider the trouble you might have in finding a number if the names were listed in the order in which the customers placed their phone orders with the telephone company; in such a case, the names might as well have been entered in random order. Since the entries are sorted in alphabetical rather than in chronological order, the processing is simplified.
A few years ago, it was estimated that more than half the time on many commercial computers was spent in sorting. This is perhaps no longer true, since sophisticated methods have been devised for organizing data, methods that do not require that it be kept in any special order. Eventually, nonetheless, the information does go out to people, and then it must be sorted in some way. Because sorting is so important, a great many algorithms have been devised for doing it. In fact, so many ideas appear in sorting methods that an entire course could easily be built around this one theme. Among the different methods, the most important distinction is between internal and external sorting, that is, whether there are so many records to be sorted that they must be kept in external files on disks, tapes, or the like, or whether they can all be kept internally in high-speed memory.
We now present some basic terminology. A file of size n is a sequence of n items r(0), r(1), ..., r(n-1). Each item in the file is called a record. With each record is associated a key, k(i), which is usually (but not always) a subfield of the entire record. The file is said to be sorted on the key if i < j implies that k[i] precedes k[j] in some ordering on the keys. In the example of the telephone directory, the file consists of all the entries in the book. Each entry is a record; the key upon which the file is sorted is the name field of the record. Each record also contains fields for an address and a telephone number.
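In C, such a file of records might be declared as shown below. This is only a sketch modeled on the telephone-directory example; the field names and sizes are illustrative assumptions.

#define MAXRECS 100

struct record {
    char name[30];            /* the key on which the file is sorted */
    char address[60];
    char phone[15];
};

struct record r[MAXRECS];     /* the file: records r[0] .. r[n-1]    */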
Bubble Sort

The first sort presented is probably the most widely known among beginners and students of programming: the bubble sort. One of the characteristics of this sort is that it is easy to understand and program. Yet, of all the sorts we shall consider, it is probably the least efficient.
In each of the subsequent examples, x is an array of integers of which the first n are to be sorted so that x[i] ≤ x[j] for 0 ≤ i < j < n. It is straightforward to extend this simple format to one which is used in sorting n records, each with a subfield key k.
The basic idea underlying the bubble sort is to pass through the file sequentially several times. Each pass consists of comparing each element in the file with its successor (x[i] with x[i+1]) and interchanging the two elements if they are not in proper order. Consider the following file:
25 57 48 37 12 92 86 33
The following comparisons are made on the first pass
x[0] with x[1] (25 with 57) no interchange
x[1] with x[2] (57 with 48) interchange
x[2] with x[3] (57 with 37) interchange

x[3] with x[4] (57 with 12) interchange


x[4] with x[5] (57 with 92) no interchange
x[5] with x[6] (92 with 86) interchange
x[6] with x[7] (92 with 33) interchange
Thus, after the first pass, the file is in the following order:
25 48 37 12 57 86 33 92
Notice that after this pass, the largest element (92) is in its proper position within the array. In general, x[n-i] will be in its proper position after iteration i. This method is called the bubble sort because each number slowly "bubbles" up to its proper position. After the second pass the file is
25 37 12 48 57 33 86 92
Notice that 86 has now found its way to the second highest position. Since each iteration places a new element into its proper position, a file of n elements requires no more than n-1 iterations.
The complete set of iterations is the following
iteration 0 (initial file) 25 57 48 37 12 92 86 33
iteration 1 25 48 37 12 57 86 33 92
iteration 2 25 37 12 48 57 33 86 92
iteration 3 25 12 37 48 33 57 86 92
iteration 4 12 25 37 33 48 57 86 92
iteration 5 12 25 33 37 48 57 86 92
iteration 6 12 25 33 37 48 57 86 92
iteration 7 12 25 33 37 48 57 86 92
On the basis of the foregoing discussion we can proceed to code the bubble sort. We present a routine bubble that accepts two parameters x and n: x is an array of numbers, and n is an integer representing the number of elements to be sorted (n may be less than the number of elements in x).
bubble (int x[ ], int n)
{
    int i, j, temp;
    for (i = 0; i < n-1; i++)        /* outer loop controls the number of passes */
        for (j = 0; j < n-1; j++)    /* inner loop governs each individual pass  */
            if (x[j] > x[j+1]) {     /* interchange out-of-order elements        */
                temp = x[j];
                x[j] = x[j+1];
                x[j+1] = temp;
            }
}
What can be said about the efficiency of the bubble sort? The total number of comparisons is (n-1)*(n-1) = n^2 - 2n + 1, which is O(n^2). Of course, the number of interchanges cannot be greater than the number of comparisons. It is likely that it is the number of interchanges rather than the number of comparisons that takes up the most time in the program's execution.
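As a quick check, the routine might be exercised on the sample file used above. This is only a minimal driver sketch, assuming the bubble routine above appears in the same source file:

#include <stdio.h>

int main(void)
{
    int x[8] = {25, 57, 48, 37, 12, 92, 86, 33};
    int i;

    bubble(x, 8);
    for (i = 0; i < 8; i++)
        printf("%d ", x[i]);          /* prints: 12 25 33 37 48 57 86 92 */
    printf("\n");
    return 0;
}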

Student Activity 2.1


Before going to next section, answer the following questions:
1. Discuss the advantages of sorting.
2. Sort the following file using bubble sort
15, 4, 18, 19, 10, 21
If your answers are correct, then proceed to next section.
Insertion Sort

An insertion sort is one that sorts a set of records by inserting records into an existing sorted file.
Example
Initial order: SQ SA C7 H8 DK
Step 1: SQ SA C7 H8 DK
Step 2: C7 SQ SA H8 DK
Step 3: C7 H8 SQ SA DK
Step 4: C7 H8 SQ DK SA
Example of insertion sort
The insertion sort algorithm thus proceeds on the idea of keeping the first part of the list, once examined, in the correct order. An initial list with only one item is automatically in order. If we suppose that we have already sorted the first i-1 items, then we take item i and search through this sorted list of length i-1 to see where to insert item i.
The algorithm for insertion sort is as follows:
insertion (int x[ ], int n)
{
    int i, k, y;
    /* Initially x[0] may be thought of as a sorted file of one element. After each
       repetition of the following loop, the elements x[0] through x[k] are in order. */
    for (k = 1; k < n; k++)
    {
        y = x[k];
        /* move down by one position all elements greater than y */
        for (i = k-1; i >= 0 && y < x[i]; i--)
            x[i+1] = x[i];
        /* insert y at its proper position */
        x[i+1] = y;
    }
}
If the initial file is sorted, only one comparison is made on each pass, so that the sort is O(n). If the file is initially sorted in the reverse order, the sort is O(n^2), since the total number of comparisons is
(n-1) + (n-2) + .... + 3 + 2 + 1 = (n-1) * n / 2
which is O(n^2). However, the insertion sort is still better than the bubble sort. The closer the file is to sorted order, the more efficient the insertion sort becomes. The average number of comparisons in the insertion sort is also O(n^2). The space requirement for the sort consists of only one temporary variable, y.

Student Activity 2.2


Before going to next section, answer the following questions:
1. Compare search efficiencies of Insertion sort and Bubble sort.
2. Sort the following key values using insertion sort
20, 17, 25, 13, 54
If your answers are correct, then proceed to next section.
Selection Sort

A selection sort is one in which successive elements are selected in order and placed into their proper sorted positions. The elements of the input may have to be preprocessed to make the ordered selection possible.
The selection sort consists entirely of a selection phase in which the largest of the remaining elements, large, is repeatedly placed in its proper position i at the end of the array; to do so, large is interchanged with the element x[i]. After n-1 selections the entire array is sorted. Thus the selection process need be done only from n-1 down to 1 rather than down to 0. The following algorithm implements the selection sort.
selection (int x[ ], int n)
{
    int i, j, large, k;
    for (i = n-1; i > 0; i--)
    {
        /* place the largest of x[0] through x[i] into large and its index into k */
        large = x[0];
        k = 0;
        for (j = 1; j <= i; j++)
            if (x[j] > large)
            {
                large = x[j];
                k = j;
            }
        x[k] = x[i];
        x[i] = large;
    }
}
Analysis of the selection sort is straightforward. The first pass makes n-1 comparisons, the second pass makes n-2, and so on. Therefore, there is a total of
(n-1) + (n-2) + ... + 3 + 2 + 1 = n(n-1)/2
comparisons, which is O(n^2). The number of interchanges is always n-1. There is little additional storage required (except to hold a few temporary variables). The sort may therefore be categorized as O(n^2), although it is faster than the bubble sort.

Student Activity 2.3


Before going to next section, answer the following questions:
1. Find the worst case efficiency of selection sort.
2. Compare storage efficiencies of bubble sort and selection sort.
If your answers are correct, then proceed to next section.
Quick Sort

The next sort we consider is the quicksort (or partition exchange sort). Let x be an array, and n the number of elements in the array to be sorted. Choose an element a from a specific position within the array (for example, a can be chosen as the first element so that a = x[0]). Suppose that the elements of x are partitioned so that a is placed into position j and the following conditions hold:
1. Each of the elements in positions 0 through j-1 is less than or equal to a.
2. Each of the elements in positions j+1 through n-1 is greater than or equal to a.
Notice that if these two conditions hold for a particular a and j, a is the jth smallest element of x, so that a remains in position j when the array is completely sorted. If the foregoing process is repeated with the subarrays x[0] through x[j-1] and x[j+1] through x[n-1] and any subarrays created by the process in successive iterations, the final result is a sorted file. Hence it is a divide-and-conquer technique.
Let us illustrate quicksort with an example. If an initial array is given as
25 57 48 37 12 92 86 33
And the first element (25) is placed in its proper position, the resulting array is
12 25 57 48 37 92 86 33
At this point, 25 is in its proper position in the array (x[1]), each element below that position (12) is less than or equal to 25, and each element above that position (57, 48, 37, 92, 86 and 33) is greater than or equal to 25. Since 25 is in its final position, the original problem has been decomposed into the problem of sorting the two subarrays
(12) and (57 48 37 92 86 33)
Nothing need be done to sort the first of these subarrays; a file of one element is already sorted. To sort the second subarray the process is repeated and the subarray is further subdivided. The entire array may now be viewed as
viewed as
12 25 (57 48 37 92 86 33)
Where parentheses enclose the subarrays that are yet to be sorted. Repeating the process on the subarray
x[2] through x[7] yields
12 25 (48 37 33) 57 (92 86)
and further repetitions yield.
12 25 (37 33) 48 57 (92 86)
12 25 (33) 37 48 57 (92 86)
12 25 33 37 48 57 (92 86)
12 25 33 37 48 57 86 92
12 25 33 37 48 57 86 92
Note that the final array is sorted.
By this time you may have noticed that the quicksort is defined most conveniently as a recursive procedure. Now we describe a mechanism to partition the given file, and then present an algorithm, partition, to implement it.

The object of partition is to allow a specific element to find its proper position with respect to the others in the subarray. Note that the manner in which this partition is performed is irrelevant to the sorting method; all that is required by the sort is that the elements be partitioned properly. In the preceding example, the elements in each of the two subfiles remain in the same relative order as they appear in the original file. However, such a partition method is relatively inefficient to implement.
One way to effect a partition efficiently is the following: let a = x[lb] be the element whose final position is sought. Two pointers, up and down, are initialized to the upper and lower bounds of the subarray respectively. At any point during execution, each element in a position above up is greater than or equal to a and each element in a position below down is less than or equal to a. The two pointers up and down are moved toward each other in the following way.
Step 1: repeatedly increase the pointer down by one position until x[down] > a.
Step 2: repeatedly decrease the pointer up by one position until x[up] <= a.
Step 3: if up > down, interchange x[down] with x[up].
The process is repeated until the condition in step 3 fails (up <= down), at which point x[up] is interchanged with x[lb] (which equals a), whose final position was sought, and j is set to up.
We illustrate this process on the sample file, showing the positions of up and down as they are adjusted. The direction of the scan is indicated by an arrow at the pointer being moved. Three asterisks on a line indicate that an interchange is being made.
a = x[lb] = 25
Down→ up
25 57 48 37 12 92 86 33
down up
25 57 48 37 12 92 86 33
down ←up
25 57 48 37 12 92 86 33
down ←up
25 57 48 37 12 92 86 33
down ←up
25 57 48 37 12 92 86 33
down up
25 57 48 37 12 92 86 33
down up
*** 25 12 48 37 57 92 86 33
down→ up
25 12 48 37 57 92 86 33
down up
25 12 48 37 57 92 86 33
down ←up
25 12 48 37 57 92 86 33
down ←up
SORTING TECHNIQUES 31

25 12 48 37 57 92 86 33
←up down
25 12 48 37 57 92 86 33
up down
25 12 48 37 57 92 86 33
12 25 48 37 57 92 86 33 ***
At this point 25 is in its proper position (position 1), and every element to its left is less than or equal to 25,
and every element to its right is greater than or equal to 25. We could now proceed to sort the two subarrays
(12) and (48 37 57 92 86 33) by applying the same method.
The algorithm for partition is as follows:
int partition (int x[ ], int lb, int ub)
{
    int a, down, temp, up;
    a = x[lb];                 /* a is the element whose final position is sought */
    up = ub;
    down = lb;
    while (down < up) {
        while (x[down] <= a && down < ub)
            down++;            /* move up the array */
        while (x[up] > a)
            up--;              /* move down the array */
        if (down < up) {
            /* interchange x[down] and x[up] */
            temp = x[down];
            x[down] = x[up];
            x[up] = temp;
        }   /* end if */
    }       /* end while */
    x[lb] = x[up];
    x[up] = a;
    return (up);
}           /* end partition */
We may now code the quicksort itself.

quicksort (int a[ ], int p, int q)
{
    int j;
    if (p < q)
    {
        /* divide into two subarrays; p and q are inclusive bounds */
        j = partition (a, p, q);
        /* solve the subproblems */
        quicksort (a, p, j-1);
        quicksort (a, j+1, q);
        /* there is no need to combine the solutions */
    }
}
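With partition written for inclusive bounds, the whole array of n elements is sorted by calling quicksort with the bounds 0 and n-1. A minimal driver sketch, assuming the two routines above are in the same source file:

#include <stdio.h>

int main(void)
{
    int x[8] = {25, 57, 48, 37, 12, 92, 86, 33};
    int i, n = 8;

    quicksort(x, 0, n - 1);           /* bounds are inclusive */
    for (i = 0; i < n; i++)
        printf("%d ", x[i]);          /* prints: 12 25 33 37 48 57 86 92 */
    printf("\n");
    return 0;
}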

How efficient is the quicksort? Assume that the file size n is a power of 2, say n = 2^m, so that m = log2n. Assume also that the proper position for the pivot always turns out to be the exact middle of the subarray. In that case there will be approximately n comparisons (actually n-1) on the first pass, after which the file is split into two subarrays (subfiles) of size approximately n/2. For each of these two files there are approximately n/2 comparisons, and a total of four files each of size approximately n/4 are formed. Each of these files requires n/4 comparisons, yielding a total of eight subfiles of size n/8. After halving the subfiles m times, there are n files of size 1. Thus the total number of comparisons for the entire sort is approximately
n + 2*(n/2) + 4*(n/4) + ...... + n*(n/n)
or n + n + ...... + n (m times)
= n*m = n log2n
Thus the total number of comparisons is O(n log n).
Thus, if the foregoing properties describe the file, the quicksort is O(n log n), which is relatively efficient. The analysis for the case in which the file size is not an integral power of 2 is similar but slightly more complex; the results, however, remain the same. It can be shown, however, that on the average (over all files of size n), the quicksort makes approximately 1.386 n log2n comparisons.
For the algorithm quicksort in which x[lb] is used as the pivot value, this analysis assumes that the original array and all the resulting subarrays are unsorted, so that the pivot value x[lb] always finds its proper position at the middle of the subarray. Suppose that the preceding conditions do not hold and the original array is sorted (or almost sorted). If, for example, x[lb] is in its correct position, the original file is split into subfiles of sizes 0 and n-1. If this process continues, a total of n-1 subfiles are sorted, the first of size n, the second of size n-1, the third of size n-2, and so on. Assuming k comparisons to rearrange a file of size k, the total number of comparisons to sort the entire file is
n + (n-1) + (n-2) + .... + 2

which is O(n^2). Similarly, if the original file is sorted in descending order, the final position of x[lb] is ub and the file is again split into two subfiles that are heavily unbalanced (sizes n-1 and 0). Thus the unmodified quicksort has the seemingly absurd property that it works best for files that are completely unsorted and worst for files that are completely sorted. This property is precisely the opposite of that of the bubble sort, which works best for sorted files and worst for unsorted files.

Student Activity 2.4


Before going to next section, answer the following questions :
1. When would quick sort be worse than simple selection sort?
2. Sort the following file using quick sort
27, 45, 15, 50, 23
If your answers are correct, then proceed to next section.
Merge Sort

This sort is an example of the divide-and-conquer technique. It has the nice property that in the worst case its complexity is O(n log n). The algorithm is called merge sort. We assume throughout that the elements are to be sorted in nondecreasing order. Given a sequence of n elements (also called keys) a[1], ..., a[n], the general idea is to split them into two sets a[1], ..., a[n/2] and a[n/2+1], ..., a[n]. Each set is individually sorted, and the resulting sorted sequences are merged to produce a single sorted sequence of n elements. Thus we have an ideal example of the divide-and-conquer strategy, in which the splitting is into two equal-sized sets (as we did in quick sort) and the combining operation is the merging of two sorted sets into one.
The algorithm mergesort describes this process very succinctly using recursion and a function merge which merges two sorted sets. Before executing mergesort, the n elements should be placed in an array a[n]. Then mergesort(1, n) causes the keys to be rearranged into nondecreasing order in a.
int a[20];    /* global array holding the keys to be sorted */

mergesort (int low, int high)
{
    int mid;
    if (low < high)               /* if there is more than one element */
    {
        /* Divide the problem into subproblems: find where to split the array */
        mid = (low + high) / 2;
        /* Solve the subproblems */
        mergesort (low, mid);
        mergesort (mid + 1, high);
        /* Combine the solutions */
        merge (low, mid, high);
    }
}
merge (int low, int mid, int high)
{
    int h, i, j, k, b[20];
    h = low; i = low; j = mid + 1;
    while ((h <= mid) && (j <= high))
    {
        if (a[h] <= a[j]) {
            b[i] = a[h];
            h = h + 1;
        }
        else
        {
            b[i] = a[j];
            j = j + 1;
        }
        i = i + 1;
    }
    if (h > mid)
        for (k = j; k <= high; k++)
        {
            b[i] = a[k];
            i = i + 1;
        }
    else
        for (k = h; k <= mid; k++)
        {
            b[i] = a[k];
            i = i + 1;
        }
    for (k = low; k <= high; k++)
        a[k] = b[k];
}
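Since the routines above work on the global array a, a driver might look like the sketch below, which uses the ten keys of the example that follows (the keys occupy a[1] through a[10], matching the call mergesort(1, n) described above):

#include <stdio.h>

int main(void)
{
    int vals[10] = {310, 285, 179, 652, 351, 423, 861, 254, 450, 520};
    int i;

    for (i = 1; i <= 10; i++)
        a[i] = vals[i - 1];           /* keys occupy a[1] through a[10] */
    mergesort(1, 10);
    for (i = 1; i <= 10; i++)
        printf("%d ", a[i]);          /* prints the keys in nondecreasing order */
    printf("\n");
    return 0;
}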

Consider the array of ten elements a[ ] = {310, 285, 179, 652, 351, 423, 861, 254, 450, 520}. Algorithm mergesort begins by splitting a[ ] into two subarrays, each of size five. The elements in a[1 to 5] are then split into two subarrays of size three (a[1 to 3]) and two (a[4 to 5]). Then the items in a[1 to 3] are split into subarrays of size two (a[1 to 2]) and one (a[3 to 3]). The values in a[1 to 2] are split a final time into one-element subarrays, and now the merging begins. Note that no movement of data has yet taken place. A record of the subarrays is implicitly maintained by the recursive mechanism. Pictorially the file can now be viewed as
{310|285|179|652, 351|423, 861, 254, 450, 520}
Where vertical bars indicate the boundaries of subarrays. Elements a[1] and a[2] are merged to get
(285, 310|179|652, 351|423, 861, 254, 450, 520}
Then a[3] is merged with a [1 to 2] to yield
{179, 285, 310|652, 351|423, 861, 254, 450, 520}
Next, elements a[4] and a[5] are merged:
{179, 285, 310|351, 652|423, 861, 254, 450, 520} and then a[1 to 3] and a[4 to 5]:
{179, 285, 310, 351, 652|423, 861, 254, 450, 520}
At this point the algorithm, has returned to the first invocation of mergesort and is about to process the
second recursive call. Repeated recursive calls produce the following subarrays:
{179, 285, 310, 351, 652|423|861|254|450, 520}
Elements a[6] and a[7] are merged. Then a[8] is merged with a[6 to 7]:
{179, 285, 310, 351, 652|254, 423, 861|450, 520}
Next a[9] and a[10] are merged, and them a[6 to 8] and a[9 to 10]:
{179, 285, 310, 351, 652|254, 423, 450, 520, 861}
At this point there are two sorted subarrays and the final merge produces the fully sorted result:
{179, 254, 285, 310, 351, 423, 450, 520, 652, 861}

There are obviously no more than log2n passes in merge sort, each involving n or fewer comparisons. Thus mergesort requires no more than n log2n comparisons. In fact, it can be shown that mergesort requires fewer than n log2n - n + 1 comparisons, on the average, compared with 1.386 n log2n average comparisons for quicksort. In addition, quicksort can require O(n^2) comparisons in the worst case, whereas mergesort never requires more than n log2n. However, merge sort does require approximately twice as many assignments as quicksort on the average.
Merge sort also requires O(n) additional space for the auxiliary array, whereas quicksort requires only O(log n) additional space for the stack (if implemented using a stack). An algorithm has been developed for an in-place merge of two sorted subarrays in O(n) time. This algorithm would allow mergesort to become an in-place O(n log n) sort. However, that technique requires a great many more assignments and would thus not be as practical as finding the O(n) extra space.

Student Activity 2.5


Before going to next section, answer the following questions:
1. Compare the space requirements of quick sort and merge sort.
2. Sort the following file using merge sort
16, 17, 10, 9, 4, 18
If your answers are correct, then proceed to next section.
Radix Sort

The next sorting method that we consider is called the radix sort. This sort is based on the values of the actual digits in the positional representations of the numbers being sorted. For example, the number 235 in decimal notation is written with a 2 in the hundreds position, a 3 in the tens position, and a 5 in the units position. The larger of two such integers of equal length can be determined as follows: start at the most significant digit and advance through the less significant digits as long as the corresponding digits in the two numbers match. The number with the larger digit in the first position in which the digits of the two numbers do not match is the larger of the two numbers. Of course, if all the digits of both numbers match, the numbers are equal.
We can write a sorting routine based on the foregoing method. Using the decimal base, for example, the numbers can be partitioned into ten groups based on their most significant digit. Thus every element in the "0" group is less than every element in the "1" group, all of whose elements are less than every element in the "2" group, and so on. We can then sort within the individual groups based on the next significant digit. We repeat this process until each subgroup has been subdivided so that the least significant digits are sorted. At this point the original file has been sorted. This method is sometimes called the radix exchange sort.
Let us now consider an alternative to the foregoing method. It is apparent from the foregoing discussion that
considerable bookkeeping is involved in constantly subdividing files and distributing their contents into
subfiles based on particular digits. It would certainly be easier if we could process the entire file as a whole
rather than deal with many individual files.
Suppose that we perform the following actions on the file for each digit, beginning with the least-significant
digit and ending with the most-significant digit. Take each number in the order in which it appears in the file
and place it into one of ten queues, depending on the value of the digit currently being processed. Then
restore each queue to the original file, starting with the queue of numbers with a 0 digit and ending with the
queue of numbers with a 9 digit. When these actions have been performed for each digit, starting with the
least significant and ending with the most significant, the file is sorted. This sorting method is called the radix sort.
Notice that this scheme sorts on the less-significant digits first. Thus when all the numbers are sorted on a
more significant digit, numbers that have the same digit in that position but different digits in a
less-significant position are already sorted on the less-significant position. This allows processing of the
entire file without dividing it into subfiles and without keeping track of where each subfile begins and ends.

Now we illustrate this sort on the following file



25 57 48 37 12 92 86 33
Queue based on the least significant digit
Front Rear
Queue [0]
Queue [1]
Queue [2] 12 92
Queue [3] 33
Queue [4]
Queue [5] 25
Queue [6] 86
Queue [7] 57 37
Queue [8] 48
Queue [9]
After first pass:
12 92 33 25 86 57 37 48
Queue based on most significant digit:
Front Rear
Queue [0]
Queue [1] 12
Queue [2] 25
Queue [3] 33 37
Queue [4] 48
Queue [5] 57
Queue [6]
Queue [7]
Queue [8] 86
Queue [9] 92
Therefore sorted file : 12 25 33 37 48 57 86 92
#define NUM 10        /* number of queues, one per decimal digit 0-9       */
#define MAX 100       /* maximum number of elements to be sorted (assumed) */

radixsort(x, n)
int x[], n;
{
    int front[NUM], rear[NUM];
    struct {
        int info;
        int next;
    } node[MAX];
    int exp, first, i, j, k, p, q, y;

    /* initialize the linked list of records */
    for (i = 0; i < n - 1; i++) {
        node[i].info = x[i];
        node[i].next = i + 1;
    }
    node[n - 1].info = x[n - 1];
    node[n - 1].next = -1;
    first = 0;              /* first is the head of the list   */
    exp = 1;                /* exp holds 10 to the power (k-1) */
    for (k = 1; k <= 4; k++) {
        /* Assume we have four-digit numbers */
        for (i = 0; i < NUM; i++) {
            /* initialize the queues */
            rear[i] = -1;
            front[i] = -1;
        }
        /* process each element on the list */
        while (first != -1) {
            p = first;
            first = node[first].next;
            y = node[p].info;
            /* extract the kth digit */
            j = (y / exp) % 10;
            /* insert y into queue[j] */
            q = rear[j];
            if (q == -1)
                front[j] = p;
            else
                node[q].next = p;
            rear[j] = p;
        }
        /* At this point each record is in its proper queue based on
           digit k.  We now form a single list from all the queues.  */
        /* find the first nonempty queue */
        for (j = 0; j < NUM && front[j] == -1; j++)
            ;
        first = front[j];
        p = j;
        /* link up the remaining queues */
        while (j <= 9) {
            /* find the next nonempty queue */
            for (i = j + 1; i < NUM && front[i] == -1; i++)
                ;
            if (i <= 9) {
                p = i;
                node[rear[j]].next = front[i];
            }
            j = i;
        }
        node[rear[p]].next = -1;
        exp = exp * 10;     /* move on to the next digit */
    }
    /* copy the sorted list back into the original array */
    for (i = 0; i < n; i++) {
        x[i] = node[first].info;
        first = node[first].next;
    }
}

The time requirements for the radix sort clearly depend on the number of digits (m) and the number of
elements in the file (n). Since one pass over the n elements is made for each of the m digits, the sort is
approximately O(m*n). Thus the sort is reasonably efficient if the number of digits in the keys is not too large.

Student Activity 2.6


Before going to next section, answer the following questions:
1. Explain Radix Sort method.
2. Sort the following file using radix sort
637, 455, 987, 462, 982
If your answers are correct, then proceed to next section.
Top

Heap Sort
We begin by defining a new structure, the heap. We have studied binary trees earlier. A binary tree is
illustrated below.

[Figure 2.1: A binary tree whose array representation is given below]

A complete binary tree is said to satisfy the ‘heap condition’ if the key of each node is greater than or equal
to the key in its children. Thus the root node will have the largest key value.
Trees can be represented as arrays by first numbering the nodes level by level, starting from the root and going from left to right within each level.
The key values of the nodes are then assigned to array positions whose index is given by the number of the
node. For the example tree above, the corresponding array would be
Index 1 2 3 4 5 6 7 8 9 10 11 12
Array : Z R P G M J A C D E I C
The relationship of a node can be determined from this array representation. If a node is at position j its
children will be at positions 2j and 2j + 1. Its parent will be at position [j/2].
Consider the node M. It is at position 5. Its parent node is therefore at position [5/2] = 2, i.e. the parent is
R. Its children are at positions 2*5 and (2*5) + 1, i.e. 10 and 11 respectively, i.e. E and I are its children. We
see from the pictorial representation that these relationships are correct.
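A small C fragment makes the index arithmetic explicit (the macro names are illustrative, not part of the book's code):

/* Index arithmetic for the array representation (1-based indexing assumed). */
#define PARENT(j)      ((j) / 2)
#define LEFT_CHILD(j)  (2 * (j))
#define RIGHT_CHILD(j) (2 * (j) + 1)

/* For the node M at position 5:
   PARENT(5) = 2, LEFT_CHILD(5) = 10, RIGHT_CHILD(5) = 11,
   which matches the relationships described above. */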
A Heap is a complete binary tree, in which each node satisfies the heap condition, represented as an array.

We will now study the operations possible on a heap and see how these can be combined to generate a
sorting algorithm.
The operations on a heap work in 2 steps.
1. The required node is inserted/deleted/or replaced.
2. It may cause violation of the heap condition so the heap is traversed and modified to rectify any such
violations.

Insertion
Consider the insertion of node R in the heap of figure 2.1.
(i) Initially R is added as the right child of J and given the number 13.
(ii) But then the heap condition is violated.
(iii) Move R up to position 6 and move J down to position 13.
(iv) But the heap condition is still violated.
(v) Swap R and P.
(vi) The heap condition is now satisfied by all the nodes and we get the following heap.

[Figure 2.2: The heap after inserting R]
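The sift-up performed in steps (i)–(vi) can be sketched in C roughly as follows (a sketch only, assuming the heap is kept in a 1-based array heap[1..n] of character keys; heap_insert is an illustrative name):

/* Insert a new key into a heap stored in heap[1..*n].
   The key is appended at the bottom and sifted up until the heap
   condition holds again, exactly as in steps (i)-(vi) above. */
void heap_insert(char heap[], int *n, char key)
{
    int j;
    (*n)++;
    j = *n;                               /* position of the new node       */
    while (j > 1 && heap[j / 2] < key) {  /* parent smaller: move it down   */
        heap[j] = heap[j / 2];
        j = j / 2;
    }
    heap[j] = key;                        /* final resting place of the key */
}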

Deletion
Consider the deletion of M from the heap of figure 2.2.

The larger of M's children is promoted to position 5, to get:

An efficient sorting method can be based on heap construction followed by removal of nodes from the heap in order. This
algorithm is guaranteed to sort n elements in O(n log n) steps.
We will first see two methods of heap construction and then removal in order from the heap to sort the list.

Method 1
Insert items into an initially empty heap, keeping the heap condition inviolate at all steps.

Now we build a heap for the following array of characters:

[Figure: the twelve characters of the array are inserted one by one into an initially empty heap; the successive heaps after each insertion are shown]

Method 2
Build a heap with the items in the order presented. Then, working from the rightmost node back toward the
root, modify the tree to satisfy the heap condition.
Example: Now we see the above method on the same array

[Figure: the same array arranged as a complete binary tree and then adjusted, node by node from the rightmost node back to the root, until the heap condition holds everywhere]

We will now see how the sorting takes place using the heap built by the top-down approach. The sorted
elements will be placed in A[ ], an array of size 12.

[Figure: the heap before any removals]

1. Remove S and store it in A [12]


2. Remove S and store in A [11]


3. Remove R and Store in A[10]


4. Remove P and store in A [9]


5. Remove O and store in A [8]



6. Remove O and store in A [7]



7. Remove N and store in A [6]



8. Similarly, the remaining nodes are removed and the heap modified after each removal, giving the sorted
list A E E I L N O O P R S S.
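The whole process — building the heap and then repeatedly removing the root — can be sketched in C as follows (a sketch only, assuming a 1-based character array; sift_down and heapsort are illustrative names, not the book's routines):

/* sift_down restores the heap condition for the subtree rooted at i,
   assuming both of its subtrees already satisfy it; heap[1..n], 1-based. */
void sift_down(char heap[], int n, int i)
{
    int j;
    char key = heap[i];
    while ((j = 2 * i) <= n) {
        if (j < n && heap[j + 1] > heap[j])
            j++;                          /* j indexes the larger child    */
        if (key >= heap[j])
            break;                        /* heap condition already holds  */
        heap[i] = heap[j];                /* promote the larger child      */
        i = j;
    }
    heap[i] = key;
}

/* Heapsort: build the heap (method 2), then repeatedly swap the root
   (largest element) with the last position and shrink the heap. */
void heapsort(char a[], int n)
{
    int i;
    char tmp;
    for (i = n / 2; i >= 1; i--)          /* build phase                   */
        sift_down(a, n, i);
    for (i = n; i >= 2; i--) {            /* removal phase                 */
        tmp = a[1]; a[1] = a[i]; a[i] = tmp;
        sift_down(a, i - 1, 1);
    }
}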
Top

So far, all the algorithms we have examined require that the input fit into main memory. There are, however,
applications where the input is much too large to fit into memory. This section will discuss external sorting
algorithms, which are designed to handle very large inputs.

Why We Need New Algorithms


Most of the internal sorting algorithms take advantage of the fact that memory is directly addressable. Shellsort
compares elements A[i] and A[i – hk] in one time unit. Heapsort compares elements A[i] and A[i*2 + 1] in
one time unit. Quicksort, with median-of-three partitioning, requires comparing A[left], A[center],
and A[right] in a constant number of time units. If the input is on a tape, it can only be accessed
sequentially. Even if the data is on a disk, there is still a practical loss of efficiency, because of the delay
required to spin the disk and move the disk head.
To see how slow external accesses really are, create a file that is large, but not too large to fit in main
memory. Read the file in and sort it using an efficient algorithm. The time it takes to sort the input is
certain to be insignificant compared to the time to read the input, even though sorting is an O(n log n)
operation and reading the input is only O(n).

The wide variety of mass storage devices makes external sorting much more device-dependent than internal
sorting. The algorithms that we will consider work on tapes, which are probably the most restrictive storage
medium. Since access to an element on tape is done by winding the tape to the correct location, tapes can be
efficiently accessed only in sequential order (in either direction).
We will assume that we have at least three tape drives to perform the sorting. We need two drives to do
an efficient sort; the third drive simplifies matters. If only one tape drive is present, then we are in trouble:
any algorithm will require Ω(N²) tape accesses.

The Simple Algorithm
The basic external sorting algorithm uses the Merge routine from mergesort. Suppose we have four tapes
Ta1, Ta2, Tb1, Tb2, which are two input and two output tapes. Depending on the point in the algorithm, the a
and b tapes are either input tapes or output tapes. Suppose the data is initially on Ta1. Suppose further that
the internal memory can hold (and sort) M records at a time. A natural first step is to read M records at a
time from the input tape, sort the records internally, and then write the sorted records alternately to Tb1 and
Tb2. We will call each set of sorted records a run. When this is done, we rewind all the tapes. Suppose we
have the same input as our example for Shellsort.

[Figure: the initial input, all on tape Ta1; the other three tapes are empty]

If M = 3, then after the runs are constructed, the tapes will contain the data indicated in the following
figure.
[Figure: after the run-construction pass, the sorted runs of length M = 3 have been written alternately to Tb1 and Tb2]

Now, Tb1 and Tb2 contain a group of runs. We take the first run from each tape and merge them, writing the
result, which is a run twice as long, onto Ta1. Then we take the next run from each tape, merge these, and
write the result onto Ta2. We continue this process, alternating between Ta1 and Ta2, until either Tb1 or Tb2 is
empty. At this point either both are empty or there is one run left. In the latter case, we copy this run to the
appropriate tape. We rewind all four tapes and repeat the same steps, this time using the a tapes as input
and the b tapes as output. This gives runs of length 4M. We continue the process until we get one run of length n.
This algorithm will require [log(n/m)] passes, plus the initial run-constructing pass. For instance, if we
have 10 million records of 128 bytes each, and four megabytes of internal memory, then the first pass will
create 320 runs. We would then need nine more passes to complete the sort. Our example requires
[log(13/3)] = 3 more passes, which are shown in the following figure.

[Figure: the contents of the four tapes after each of the three merge passes]

If we have extra tapes, then we can expect to reduce the number of passes required to sort our input. We do
this extending the basic (two-way) merge to a k – way merge.

Merging two runs is done by winding each input tape to the beginning of each run. Then the smaller
element is found, placed on an output tape, and the appropriate input tape is advanced. If there are k input
tapes, this strategy works the same way, the only difference being that it is slightly more complicated to
find the smallest of the k elements. We can find the smallest of these elements by using a priority queue. To
obtain the next element to write on the output tape, we perform a Delete Min operation. The appropriate
input tape is advanced, and if the run on that input tape is not yet completed, we insert the new element
into the priority queue. Using the same example as before, we distribute the input onto the three tapes.
[Figure: the runs distributed over the three b tapes]

We then need two more passes of three way merging to complete the sort.

[Figure: the tape contents after each of the two three-way merge passes]

After the initial run construction phase, the number of passes required using k-way merging is [logk (n/m)],
because the runs get k times as large in each pass. For the example above, the formula is verified, since
[log3 (13/3)] = 2. If we have 10 tapes then k = 5, and our large example from the previous section would
require [log5 320] = 4 passes.
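A rough C sketch of the k-way merge of sorted runs described above is given below; for brevity a linear scan stands in for the priority queue's Delete Min, and the names and parameters (runs, len, pos) are illustrative assumptions rather than the book's code:

/* Merge k sorted runs into one output array.  runs[i] points to the
   i-th run, len[i] is its length, and pos[i] tracks how far we have
   read in it.  Each step selects the run whose current head is smallest. */
void kway_merge(int *runs[], int len[], int pos[], int k, int out[])
{
    int i, best, n_out = 0, done = 0;
    for (i = 0; i < k; i++)
        pos[i] = 0;
    while (!done) {
        best = -1;
        for (i = 0; i < k; i++)           /* find the run with the smallest head */
            if (pos[i] < len[i] &&
                (best == -1 || runs[i][pos[i]] < runs[best][pos[best]]))
                best = i;
        if (best == -1)
            done = 1;                     /* every run is exhausted              */
        else
            out[n_out++] = runs[best][pos[best]++];
    }
}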

Polyphase Merge

The k-way merging strategy developed in the last section requires the use of 2k tapes. This could be
prohibitive for some applications. It is possible to get by with only k + 1 tapes. As an example, we will
show how to perform two-way merging using only three tapes.
Suppose we have three tapes, T1, T2, and T3, and an input file on T1 that will produce 34 runs. One option is
to put 17 runs on each of T2 and T3. We could then merge this result onto T1, obtaining one tape with 17
runs. The problem is that since all the runs are on one tape, we must now put some of these runs on T2 to
perform another merge. The logical way to do this is to copy the first eight runs from T1 onto T2 and then
perform the merge. This has the effect of adding an extra half pass for every pass we do.
An alternative method is to split the original 34 runs unevenly. Suppose we put 21 runs on T2 and 13 runs on T3.
We would then merge 13 runs onto T1 before T3 was empty. At this point, we could rewind T1 and T3 and
merge T1, with 13 runs, and T2, which has 8 runs, onto T3. We could then merge T1 and T3, and so on. The
following table shows the number of runs on each tape after each pass.

        Run    After    After    After    After    After    After    After
        Const  T2+T3    T1+T2    T1+T3    T2+T3    T1+T2    T1+T3    T2+T3
T1       0      13       5        0        3        1        0        1
T2      21       8       0        5        2        0        1        0
T3      13       0       8        3        0        2        1        0

The original distribution of runs makes a great deal of difference. For instance, if 22 runs are placed on T2
and 12 on T3, then after the first merge we obtain 12 runs on T1 and 10 runs on T2. After another merge,
there are 10 runs on T1 and 2 runs on T3. At this point the going gets slow, because we can only merge two
sets of runs before T3 is exhausted. Then T1 has 8 runs and T2 has 2 runs. Again we can only merge two
sets of runs, obtaining T1 with 6 runs and T3 with 2 runs. After three more passes, T2 has two runs, and then
we can finish the merge.

It turns out that the first distribution we gave is optimal. If the number of runs is a Fibonacci number FN,
then the best way to distribute them is to split them into the two Fibonacci numbers FN–1 and FN–2. Otherwise, it
is necessary to pad the tape with dummy runs in order to get the number of runs up to a Fibonacci number.
We leave the details of how to place the initial set of runs on the tapes as an exercise.

We can extend this to a k-way merge, in which case we need kth-order Fibonacci numbers for the
distribution, where the kth-order Fibonacci number is defined as F(k)(N) = F(k)(N – 1) + F(k)(N – 2) + … +
F(k)(N – k), with the appropriate initial conditions F(k)(N) = 0 for 0 ≤ N ≤ k – 2 and F(k)(k – 1) = 1.
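The kth-order Fibonacci numbers just defined can be computed with a short C function (an illustrative sketch; fib_k is not a routine defined elsewhere in this text):

/* kth-order Fibonacci numbers: F(N) = F(N-1) + ... + F(N-k),
   with F(N) = 0 for 0 <= N <= k-2 and F(k-1) = 1.
   fib_k(2, N) gives the ordinary Fibonacci numbers. */
long fib_k(int k, int N)
{
    int i, j;
    long f[100];                 /* assumes N < 100 */
    for (i = 0; i <= k - 2 && i <= N; i++)
        f[i] = 0;                /* initial conditions */
    if (N >= k - 1)
        f[k - 1] = 1;
    for (i = k; i <= N; i++) {   /* each term is the sum of the previous k */
        f[i] = 0;
        for (j = 1; j <= k; j++)
            f[i] += f[i - j];
    }
    return f[N];
}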

Replacement Selection
The last item we will consider is the construction of the runs. The strategy we have used so far is the simplest
possible: read as many records as possible, sort them, and write the resulting run out. This seems like the best
approach possible, until one realizes that as soon as the first record is written to an output tape, the memory it
used becomes available for another record. If the next record on the input tape is larger than the record we have
just output, then it can be included in the run.

Using this observation, we can give an algorithm for producing runs. This technique is commonly referred
to as replacement selection. Initially, M records are read into memory and placed in a priority queue. We
perform a Delete Min, writing the smallest record to the output tape. We then read the next record from the input
tape. If it is larger than the record we have just written, we can add it to the priority queue; otherwise, it
cannot go into the current run. Since the priority queue is smaller by one element, we can store this new
element in the dead space of the priority queue until the run is completed, and then use the dead-space elements
to start the next run. The table below shows the run construction for the small example we have
been using, with M = 3. Dead elements are indicated by an asterisk.
In this example, replacement selection produces only three runs, compared with the five runs obtained by
sorting. Because of this, a three – way merge finishes in one pass instead of two. If the input is randomly
distributed replacement selection can be shown to produce runs of average length 2M. For our large
example, we would expect 160 runs instead of 320 runs, so a five way merge would require four passes. In
this case, we have not saved a pass although we might if we get lucky and have 125 runs or less. Since
external sorts take so long, every pass saved can make a significant difference in the running time.
As we have seen, it is possible for replacement selection to do no better than the standard algorithm.
However, the input is frequently sorted or nearly sorted to start with, in which case replacement selection
produces only a few very long runs. This kind of input is common for external sorts and makes replacement
selection extremely valuable.
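A rough C sketch of replacement selection is shown below. It is a simplification of the scheme described above: a linear scan takes the place of the priority queue, and records that cannot join the current run are kept in the dead space at the back of the buffer. All names are illustrative assumptions, not the book's code.

#include <stdio.h>

#define MAXBUF 100    /* assumed upper limit on the memory size M */

void replacement_selection(int in[], int nin, int M)
{
    int buf[MAXBUF];
    int alive, filled, next = 0, i, best, run = 1, out;

    for (filled = 0; filled < M && next < nin; filled++)
        buf[filled] = in[next++];         /* initial fill of the buffer   */
    alive = filled;
    while (filled > 0) {
        if (alive == 0) {                 /* current run is finished      */
            alive = filled;               /* the dead records start a new */
            run++;                        /* run                          */
        }
        best = 0;                         /* smallest "alive" record      */
        for (i = 1; i < alive; i++)
            if (buf[i] < buf[best]) best = i;
        out = buf[best];
        printf("run %d: %d\n", run, out);
        if (next < nin && in[next] >= out) {
            buf[best] = in[next++];       /* joins the current run        */
        } else if (next < nin) {
            buf[best] = buf[alive - 1];   /* retire best's slot           */
            buf[alive - 1] = in[next++];  /* new record goes to dead space*/
            alive--;
        } else {                          /* input exhausted: shrink      */
            buf[best] = buf[alive - 1];
            buf[alive - 1] = buf[filled - 1];
            alive--;
            filled--;
        }
    }
}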

[Table: run construction by replacement selection on the small example with M = 3. The columns show the three elements in the heap array, the record output, and the next element read; dead elements are marked with an asterisk.]

Student Activity 2.7


1. Construct a heap from the following key values:
1, 2, 4, 5, 7, 8.
Describe the sort with the help of an example.


2. What is internal sort?
If your answers are correct, then proceed to next section.
Top

Lower Bound Theory
Recall that there is a mathematical notation for expressing lower bounds. If f(n) is the time for some
algorithm, then we write f(n) = Ω(g(n)) to mean that g(n) is a lower bound for f(n). Formally, this equation
can be written if there exist positive constants c and n0 such that |f(n)| ≥ c|g(n)| for all n > n0. In addition to
developing lower bounds to within a constant factor, we are also concerned with determining more exact
bounds whenever this is possible.
Deriving good lower bounds is often more difficult than devising efficient algorithms. Perhaps this is because a
lower bound states a fact about all possible algorithms for solving a problem. Usually we cannot enumerate
and analyse all these algorithms, so lower bound proofs are often hard to obtain. However, for any problem
it is easy to observe that a lower bound equal to n exists, where n is the number of inputs to
the problem.

Now let us consider the sorting problem. Any sorting algorithm that gathers its information only by comparing
keys can be described by a comparison tree. Consider the case in which n numbers A[1:n] are to be sorted and these
numbers are distinct. Any comparison between A[i] and A[j] must result in one of two possibilities:
either A[i] < A[j] or A[i] > A[j]. So if we form a comparison tree, it will be a binary tree in which each
internal node is labeled by the pair i : j, which represents the comparison of A[i] with A[j]. If A[i] is less
than A[j], then the algorithm proceeds down the left branch of the tree; otherwise it proceeds down the right
branch.
The following figure shows a comparison tree for sorting three items.

[Figure: a comparison tree for sorting three items]

We consider the worst case for all comparison-based sorting algorithms. Let T(n) be the minimum
number of comparisons that are sufficient to sort n items in the worst case. Since each of the n! possible
orderings of the input must lead to a distinct external node, the comparison tree has at least n! external nodes.
We also know that if all internal nodes in a binary tree are at levels less than k, then there are at most 2^k
external nodes.
Therefore, if we let k = T(n),
n! ≤ 2^T(n)
Since T(n) is an integer, we get the lower bound
T(n) ≥ ⌈log n!⌉
By Stirling's approximation, it follows that
log n! = n log n – n/ln 2 + (1/2) log n + O(1)
where ln 2 refers to the natural logarithm of 2. This formula shows that T(n) is of order n log n. Hence
we say that any comparison-based sorting algorithm needs Ω(n log n) time.

Student Activity 2.8


Before going to next section, answer the following questions:
1. Describe lower bound theory.
2. Make a comparison tree for sorting the key values a, b, c, d.
If your answers are correct, then proceed to next section.
Top
Oracles and Adversary Arguments
One of the proof techniques that is useful for obtaining lower bounds consists of making use of an oracle.
The most famous oracle in history was called the Delphic oracle, located in Delphi, Greece. This oracle can
still be found, situated on the side of a hill embedded in some rocks. In olden times people would approach
the oracle and ask it a question. After some period of time elapsed, the oracle would reply and a caretaker
would interpret the oracle's answer.
A similar phenomenon takes place when we use an oracle to establish a lower bound. Given some model of
computation such as comparison trees, the oracle tells us the outcome of each comparison. To derive a good
lower bound, the oracle tries its best to cause the algorithm to work as hard as it can. It does this by
choosing as the outcome of the next test, the result that causes the most work to be required to determine
the final answer. And by keeping track of the work that is done, a worst-case lower bound for the problem
can be derived.

Now we consider the merging problem. Given the sets A[1 : m] and B[1 : n], where the items in A and the
items in B are sorted, we investigate lower bounds for algorithms that merge these two sets to give a single
sorted set. As was the case for sorting, we assume that all the m + n elements are distinct and that A[1] <
A[2] < …… < A[m] and B[1] < B[2] < …… < B[n]. It is possible that after these two sets are merged, the n
elements of B can be interleaved within A in every possible way. Elementary combinatorics tells us that
there are C(m+n, m) ways (the binomial coefficient) that the A's and B's can merge together while preserving
the ordering within A and B. For example, if m = 3, n = 2, A[1] = x, A[2] = y, A[3] = z, B[1] = u, and B[2] = v,
there are C(3+2, 3) = 10 ways in which A and B can merge: u, v, x, y, z; u, x, v, y, z; u, x, y, v, z;
u, x, y, z, v; x, u, v, y, z; x, u, y, v, z; x, u, y, z, v; x, y, u, v, z; x, y, u, z, v; and x, y, z, u, v.
Thus if we use comparison trees as our model for merging algorithms, then there will be at least C(m+n, m)
external nodes, and therefore at least
log C(m+n, m)
comparisons are required by any comparison-based merging algorithm. The conventional merging
algorithm that was given earlier takes m + n − 1 comparisons. If we let MERGE(m, n) be the minimum
number of comparisons needed to merge m items with n items, then we have the inequality

⌈log C(m+n, m)⌉ ≤ MERGE(m, n) ≤ m + n − 1

The exercises show that these upper and lower bounds can get arbitrarily far apart as m gets much smaller
than n. This should not be a surprise because the conventional algorithm is designed to work best when m
and n are approximately equal. In the extreme case when m=1, we observe that binary insertion would
require the fewest number of comparisons needed to merge A[1] into B[1], …..,B[n].
When m and n are equal, the lower bound given by the comparison tree model is too low and the number of
comparisons for the conventional merging algorithm can be shown to be optimal.

Theorem
MERGE (m, m)=2m - 1, for m ≥ 1.

Proof
Consider any algorithm that merges the two sets A[1] < …… < A[m] and B[1] < …… < B[m]. We
already have an algorithm that requires 2m − 1 comparisons. If we can show that MERGE(m, m) ≥ 2m − 1, then
the theorem follows. Consider any comparison-based algorithm for solving the merging problem and an
instance for which the final result is B[1] < A[1] < B[2] < A[2] < …… < B[m] < A[m], that is, for which the
B's and A's alternate. Any merging algorithm must make each of the 2m − 1 comparisons B[1] : A[1], A[1]
: B[2], B[2] : A[2], …., B[m] : A[m] while merging the given inputs. To see this, suppose that a
comparison of type B[i] : A[i] is not made for some i. Then the algorithm cannot distinguish between the
previous ordering and the one in which
B[1] < A[1] < ….. < A[i − 1] < A[i] < B[i] < B[i + 1] < ….. < B[m] < A[m]
So the algorithm will not necessarily merge the A’s and B’s properly. If a comparison of type A[i] : B[i +
1] is not made, then the algorithm will not be able to distinguish between the case in which B[1] < A[1] <
B[2] <………< B[m] < A[m] and in which B[1] <A[1] <B[2] <A[2] <….< A[i -1] < B[i] <B[i + 1] < A[i]
< A[i+1]<….< B[m] < A[m]. So any algorithm must make all 2m - 1 comparisons to produce this final
result. The theorem follows.

Finding the Largest and Second-Largest Elements
For another example that we can solve using oracles, consider the problem of finding the largest and the
second-largest elements out of a set of n. What is a lower bound on the number of comparisons required by
any algorithm that finds these two quantities? Comparison trees have already provided us with one answer.
An algorithm that makes n − 1 comparisons to find the largest and then n − 2 to find the second largest gives
an immediate upper bound of 2n − 3. So a large gap still remains.
This problem was originally stated in terms of a tennis tournament in which the values are called players
and the largest value is interpreted as the winner, and the second largest as the runner-up. Figure 2.8 shows
a sample tournament among eight players. The winner of each match (which is the larger of the two values
being compared) is promoted up the tree until the final round, which in this case determines McMohan as
the winner. Now, who are the candidates for second place? The runner-up must be someone who lost to
McMohan but who did not lose to anyone else. In Figure 2.8 that means either Guttag, Rosen, or Francez
is a possible candidate for second place.
Figure 2.8 leads us to another algorithm for determining the runner-up once the winner of a tournament has
been found. The players who have lost to the winner play a second tournament to determine the runner-up.
This second tournament need only be replayed along the path that the winner, in this case McMohan,
followed as he rose through the tree. For a tournament with n players, there are [log n] levels, and hence
only [log n] − 1 comparisons are required for this second tournament.

[Figure 2.8: A sample tournament among eight players]

This new algorithm, which was first suggested by J. Schreier in 1932, requires a total of
n – 2 + [log n] comparisons. Therefore we have an exact agreement between the known upper and lower
bounds for this problem.
Now we show how the same lower bound can be derived using an oracle.

Theorem
Any comparison-based algorithm that computes the largest and second largest of a set of n unordered
elements requires n – 2 + [log n] comparisons.

Proof
Assume that a tournament has been played and the largest element and the second-largest element obtained
by some method. Since we cannot determine the second-largest element without having determined the
largest element, we see that at least n-1 comparisons are necessary. Therefore all we need to show is that
there is always some sequence of comparisons that forces the second largest to be found in [log n]-1
additional comparisons.
Suppose that the winner of the tournament has played x matches. Then there are x people who are
candidates for the runner-up position. The runner-up has lost only once, to the winner, and the other x-1
candidates must have lost to one other person. Therefore we produce an oracle that decides the results of
matches in such a way that the winner plays [log n] other people.
In a match between a and b the oracle declares a the winner if a is previously undefeated and b has lost at
least once or if both a and b are undefeated but a has won more matches than b. In any other case the oracle
can decide arbitrarily as long as it remains consistent.

Now, consider a tournament in which the outcome of each match is determined by the above oracle.
Imagine drawing a directed graph with n vertices corresponding to this tournament. Each vertex
corresponds to one of the n players. Draw a directed edge from vertex b to a, b ≠ a, if and only if either
player a has defeated b or a has defeated another player who has defeated b. It is easy to see by induction
that any player who has played and won only x matches can have at most 2^x − 1 edges pointing into his or her
corresponding node. Since for the overall winner there must be an edge from each of the remaining n − 1
vertices, it follows that the winner must have played at least [log n] matches.

The State Space Method
Another technique for establishing lower bounds that is related to oracles is the state space description
method. Often it is possible to describe any algorithm for solving a given problem by a set of n-tuples. A
state space description is a set of rules that show the possible states (n-tuples) that an algorithm can assume
from a given state and a single comparison. Once the state transitions are given, it is possible to derive
lower bounds by arguing that the finish state cannot be reached using any fewer transitions. As an example
of the state space description method, we consider a problem originally defined and solved earlier:
given n distinct items, find the maximum and the minimum. Recall that the divide-and-conquer-based
solution required [3n/2] − 2 comparisons. We would like to show that this algorithm is indeed optimal.

Theorem
Any algorithm that computes the largest and smallest elements of a set of n unordered elements requires
[3n/2]-2 comparisons.

Proof
The technique we use to establish a lower bound is to define an oracle by a state table. We consider the
state of a comparison-based algorithm as being described by a 4-tuple (a, b, c, d), where a is the number of
items that have never been compared, b is the number of items that have won but never lost, c is the number
of items that have lost but never won, and d is the number of items that have both won and lost. Originally
the algorithm is in state (n, 0, 0, 0) and concludes with (0, 1,1, n-2). Then, after each comparison the tuple
(a, b, c, d) can make progress only if it assumes one of the five possible states shown in Figure 2.9.
To get the state (0, 1, 1, n-2) from the state (n, 0, 0, 0), [3n/2]-2 comparisons are needed. To see this,
observe that the quickest way to get the a component to zero requires n/2 state changes yielding the tuple
(0, n/2,n/2,0). Next the b and c components are reduced; this requires addition and additional n-2 state
changes.
(a, b, c, d) → (a − 2, b + 1, c + 1, d)                          if a ≥ 2 (two previously uncompared items are compared)
(a, b, c, d) → (a − 1, b + 1, c, d) or (a − 1, b, c + 1, d)      if a ≥ 1 (an uncompared item is compared with an already-compared item)
(a, b, c, d) → (a, b − 1, c, d + 1)                              if b ≥ 2 (two winners are compared)
(a, b, c, d) → (a, b, c − 1, d + 1)                              if c ≥ 2 (two losers are compared)

Figure 2.9: The five possible state changes

Selection
We end this section by deriving another lower bound on the selection problem. One of the algorithms
presented earlier has a worst-case complexity of O(n) no matter what value is being selected. Therefore we
know that asymptotically the selection problem requires Θ(n) time. Let SELk(n) be the minimum number
of comparisons needed for finding the kth element of an unordered set of size n. We have already seen that
for k = 1, SEL1(n) = n − 1 and, for k = 2, SEL2(n) = n − 2 + [log n]. In the following paragraphs we present
a state table that shows that
n − k + (k − 1)⌈log(n/(2(k − 1)))⌉ ≤ SELk(n).
We continue to use the terminology
that refers to an element of the set as a player and to a comparison between two players as a match that
must be won by one of the players. A procedure for selecting the kth-largest element is referred to as a
tournament that finds the kth-best player.
To derive this lower bound on the selection problem, an oracle is constructed in the form of a state
transition table that will cause any comparison-based algorithm to make at least
n − k + (k − 1)⌈log(n/(2(k − 1)))⌉ comparisons. The tuple size for states in this case is two (it was four for the max–min
problem), and the components of a tuple, say (Map, Set), are Map, a mapping from the integers 1, 2,
………, n to the nonnegative integers, and Set, an ordered subset of the input. The initial state has Map(i) = 1
for 1 ≤ i ≤ n and Set empty. Intuitively, at any given time, the players in Set are the top
players (from among all). In particular, the ith player that enters Set is the ith-best player. Candidates for
entering Set are chosen according to their Map values. At any time period t the oracle is assumed to be
given two unordered elements from the input, say a and b, and the oracle acts as follows:
1. If a and b are both in Set at time t, then a wins if a > b. The tuple (Map, Set) remains unchanged.
2. If a is in Set and b is not in Set, then a wins and the tuple (Map, Set) remains unchanged.
3. If a and b are both not in Set and if Map(a) > Map(b) at time t, then a wins. If Map(a) = Map(b),
then it doesn't matter who wins as long as no inconsistency with any previous decision is made. In
either case, if Map(a) + Map(b) ≥ n/(k − 1) at time t, then Map is unchanged and the winner is
inserted into Set as a new member. If Map(a) + Map(b) < n/(k − 1), Set stays the same and we set
Map(the loser) := 0 at time t + 1 and Map(the winner) := Map(a) + Map(b) at time t + 1 and, for all other
items w, w ≠ a, w ≠ b, Map(w) stays the same.

Lemma

Using the oracle just defined, the k − 1 best players will have played at least (k − 1)⌈log(n/(2(k − 1)))⌉ matches
when the tournament is completed.

Proof
At time t the number of matches won by any player x is greater than or equal to ⌈log Map(x)⌉. The
elements in Set are ordered so that x1 < …… < xj. Now, summed over all w in the input, Σ Map(w) = n. Let W = { y : y
is not in Set but Map(y) > 0 }. Since for all w in the input Map(w) < n/(k − 1), it follows that the size of Set
plus the size of W is greater than k − 1. However, since the elements y in W can only be less than some xi
in Set, if the size of Set is less than k − 1 at the end of the tournament, then any player in Set or W is a
candidate to be one of the k − 1 best players. This is a contradiction, so it follows that at the end of the
tournament |Set| ≥ k − 1.
We are now in a position to establish the main theorem.

Theorem
[Hyafil] The function SELk(n) satisfies

SELk(n) ≥ n − k + (k − 1)⌈log(n/(2(k − 1)))⌉

Proof

According to the lemma, the k − 1 best players have played at least (k − 1)⌈log(n/(2(k − 1)))⌉ matches. Any player
who is not among the k best players has lost at least one match against a player who is not among the k − 1
best. Thus there are n − k additional matches that were not included in the count of the matches played by
the k − 1 top players.

Student Activity 2.9


Before going to next section, answer the following questions:
1. Let m = αn. Then by Stirling's approximation,
log C(αn + n, αn) = n[(1 + α) log(1 + α) − α log α] − (1/2) log n + O(1).
Show that as α → 0, the difference between this formula and m + n − 1 gets arbitrarily large.
2. Let F(n) be the minimum number of comparisons, in the worst case, needed to insert B[1] into the
ordered set A[1] < A[2] < …… < A[n]. Prove by induction that F(n) ≥ ⌈log(n + 1)⌉.
3. A search program is a finite sequence of instructions of three types: (1) if (f(x) r 0) goto L1; else goto
L2; where r is either <, >, or = and x is a vector; (2) accept; and (3) reject. The sum of subsets
problem asks for a subset I of the integers 1, 2, ……, n for the inputs w1, ….., wn such that
Σ (i ∈ I) wi = b, where b is a given number. Consider search programs for which the function f is
restricted so that it can only make comparisons of the form
Σ (i ∈ I) wi = b.
Using the adversary technique, D. Dobkin and R. Lipton have shown that Ω(2^n) such operations are
required to solve the sum of subsets problem (w1, ………., wn, b). See if you can derive their proof.
If your answers are correct, then proceed to next section.
Top
Minimum Spanning Trees
Let G = (V, E) be an undirected connected graph. A subgraph t = (V, E′) of G is a spanning tree of G if t is a
tree.

Example: Figure 2.10 shows the complete graph on four nodes together with three of its spanning trees.

Spanning trees have many applications. For example, they can be used to obtain an independent set of
circuit equations for an electric network. In practical situations, the edges have weights assigned to them.
These weights may represent the cost of construction, the length of the link, and so on. Given such a
weighted graph, one would then wish to select for construction a set of links that connects all the nodes (cities)
and has minimum total cost or minimum total length. In either case the links selected have to form a tree. We are
therefore interested in finding a spanning tree of G with minimum cost. Figure 2.11 shows a graph and one of its
minimum-cost spanning trees. Since the identification of a minimum-cost spanning tree involves the selection of
a subset of the edges, this problem fits the subset paradigm.

[Figure 2.11: (a) A weighted graph; (b) a minimum-cost spanning tree of the graph]

We present two algorithms for finding a minimum spanning tree of a weighted graph: Prim’s algorithm and
Kruskal’s algorithm.

Prim's Algorithm
A greedy method to obtain a minimum spanning tree is to build the tree edge by edge. The next edge to
include is chosen according to some optimization criterion. The simplest such criterion is to choose an edge
that results in a minimum increase in the sum of the costs of the edges so far included. There are two
possible ways to interpret this criterion. In the first, the set of edges selected so far forms a tree. Thus if A is
the set of edges selected so far, then A forms a tree. The next edge (u, v) to be included in A is a minimum-cost
edge not in A with the property that A ∪ {(u, v)} is also a tree. The following example shows that this selection
criterion results in a minimum spanning tree. The corresponding algorithm is known as Prim's algorithm.

Example: Figure 2.12 shows the working of Prim’s method on the graph of figure 2.11(a). The spanning
tree obtained is shown in figure 2.11(b) and has a cost of 99.

[Figure 2.12: Stages in Prim's algorithm applied to the graph of figure 2.11(a)]

Having seen how Prim's method works, let us obtain an algorithm to find a minimum-cost spanning tree
using this method. The algorithm will start with a tree that includes only a minimum-cost edge of G. Then,
edges are added to this tree one by one. The next edge (i, j) to be added is such that i is a vertex already
included in the tree, j is a vertex not yet included, and the cost of (i, j), cost[i][j], is minimum among all
edges (k, l) such that vertex k is in the tree and vertex l is not in the tree. To determine this edge (i, j)
efficiently, we associate with each vertex j not yet included in the tree a value near[j]; near[j] is a vertex
in the tree such that cost[j][near[j]] is minimum among all choices for near[j]. We define near[j] = 0 for all
vertices j that are already in the tree. The next edge to include is defined by the vertex j such that near[j] ≠ 0
(j not already in the tree) and cost[j][near[j]] is minimum.
Prim(E, cost, n, t)
// E is the set of edges in G. cost[n][n] is
// the cost adjacency matrix of an n-vertex
// graph such that cost[i][j] is either a
// positive real number or infinity if no
// edge (i, j) exists.
// A minimum spanning tree is computed and
// stored as a set of edges in the array
// t[n-1][2]. The final cost is returned.
{
    Let (k, l) be an edge of minimum cost in E;
    mincost = cost[k][l];
    t[1][1] = k; t[1][2] = l;
    for (i = 1; i <= n; i++)            // initialize near[]
        if (cost[i][l] < cost[i][k])
            near[i] = l;
        else
            near[i] = k;
    near[k] = near[l] = 0;
    for (i = 2; i <= n - 1; i++)
    {   // Find n-2 additional edges for t.
        Let j be an index such that near[j] != 0 and
            cost[j][near[j]] is minimum;
        t[i][1] = j; t[i][2] = near[j];
        mincost = mincost + cost[j][near[j]];
        near[j] = 0;
        for (k = 1; k <= n; k++)        // update near[]
            if (near[k] != 0 && cost[k][near[k]] > cost[k][j])
                near[k] = j;
    }
    return (mincost);
}
The time required by algorithm Prim is O(n²), where n is the number of vertices in the
graph G.

Kruskal's Algorithm
There is a second possible interpretation of the optimization criterion mentioned earlier, in which the edges of
the graph are considered in nondecreasing order of cost. This interpretation is that the set t of edges selected so far
for the spanning tree must be such that it is possible to complete t into a tree. Thus t may not be a tree at
all stages of the algorithm. In fact, it will generally only be a forest, since the set of edges t can be completed
into a tree if and only if there are no cycles in t. This method is due to Kruskal.

Example: Consider the graph of figure 2.14(a). We begin with no edges selected; figure 2.14(a) shows the
current graph with no edges selected. Edge (1,6) is the first edge considered. It is included in the spanning
tree being built. This yields the graph of figure 2.14(b). Next the edge (3,4) is selected and included in the
tree (figure 2.14(c)). The next edge to be considered is (2,7). Its inclusion in the tree being built does not
create a cycle, so we get the graph of figure 2.14(d). Edge (2,3) is considered next and included in the tree
(figure 2.14(e)). Of the edges not yet considered, (7,4) has the least cost. It is considered next. Its inclusion in
the tree results in a cycle, so this edge is discarded. Edge (5,4) is the next edge to be added to the tree being
built. This results in the configuration of figure 2.14(f). The next edge to be considered is the edge (7,5). It is
discarded, as its inclusion creates a cycle. Finally, edge (6,5) is considered and included in the tree being built. This
completes the spanning tree. The resulting tree (figure 2.11(b)) has a cost of 99.

[Figure 2.14: Stages in Kruskal's algorithm]

For clarity, Kruskal's method is written out more formally in the following algorithm.
1. t = ∅;
2. while ((t has fewer than n−1 edges) && (E ≠ ∅))
3. {
4.     choose an edge (v,w) from E of lowest cost;
5.     delete (v,w) from E;
6.     if ((v,w) does not create a cycle in t)
           add (v,w) to t;
7.     else
           discard (v,w);
8. }
Initially E is the set of all edges in G. The only functions we wish to perform on this set are
(1) determine an edge with minimum cost (line 4) and (2) delete this edge (line 5). Both these functions can
be performed efficiently if the edges in E are maintained as a sorted sequential list. It is not essential to sort
all the edges so long as the next edge for line 4 can be determined easily. If the edges are maintained as a
min-heap, then the next edge to consider can be obtained in O(log |E|) time. The construction of the heap itself
takes O(|E|) time. To be able to perform step 6 efficiently, the vertices in G should be grouped together in
such a way that one can easily determine whether the vertices v and w are already connected by the earlier
selection of edges. If they are, then the edge (v,w) is to be discarded; if they are not, then (v,w) is to be added
to t. One possible grouping is to place all vertices in the same connected component of t into a set. For example,
when the edge (2,6) is to be considered, the sets are {1,2}, {3,4,6}, and {5}. Vertices 2 and 6 are in different sets,
so these sets are combined to give {1,2,3,4,6} and {5}. The next edge to be considered is (1,4). Since vertices 1
and 4 are in the same set, the edge is rejected. The edge (3,5) connects vertices in different sets and results in the
final spanning tree.
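The grouping of vertices into sets described above is usually implemented with union–find (disjoint-set) operations. The following is a minimal C sketch of such operations as they might be used for the cycle test in line 6; the names and the array-based representation are assumptions, not the book's code:

#define MAXVERT 100

int parent[MAXVERT];

void make_sets(int n)
{
    int i;
    for (i = 1; i <= n; i++)
        parent[i] = i;                  /* each vertex starts in its own set */
}

int find_set(int v)
{
    while (parent[v] != v)
        v = parent[v];                  /* follow links up to the set's root */
    return v;
}

/* Returns 1 and merges the two sets if v and w are in different sets
   (edge accepted); returns 0 if they are already connected (cycle). */
int union_if_disjoint(int v, int w)
{
    int rv = find_set(v), rw = find_set(w);
    if (rv == rw)
        return 0;
    parent[rv] = rw;
    return 1;
}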

Student Activity 2.10


Before going to next section, answer the following questions.
1. Why is Prim’s algorithm called greedy method?
2. Compare and contrast Prim’s method with Kruskal’s method.
3. Draw a spanning tree of edges {2, 6, 8, 18, 35} using Kruskal’s method.
If your answers are correct, then proceed to next section.
Top

Shortest Paths
Graphs can be used to represent the highway structure of a state or country, with vertices representing cities
and edges representing sections of highway. The edges can then be assigned weights, which might be the
distance along that section of highway. A motorist wishing to drive from city A to city B would be
interested in answers to the following questions:
• Is there a path from A to B?
• If there is more than one path from A to B, which is the shortest path?
The problems defined by these questions are special cases of the path problems we study in this section.
The length of a path is now defined to be the sum of the weights of the edges on that path. The starting
vertex of the path is referred to as the source, and the last vertex the destination. The graphs are digraphs to
allow for one-way streets. In the problem we consider, we are given a directed graph G = (V, E), a weighting
function cost for the edges of G, and a source vertex v0. The problem is to determine the shortest paths from
v0 to all the remaining vertices of G. It is assumed that all the weights are positive.

Dijkstra's Algorithm
This algorithm determines the lengths of the shortest paths from v0 to all other vertices in G.
dijkstra(v, cost, dist, n)
// dist[j], 1 <= j <= n, is set to the length
// of the shortest path from vertex v to
// vertex j in a digraph G with n vertices;
// dist[v] is set to zero. G is represented
// by its cost adjacency matrix cost[n][n].
{
    for (i = 1; i <= n; i++)
    {   // initialize
        s[i] = false; dist[i] = cost[v][i];
    }
    s[v] = true; dist[v] = 0.0;           // put v in S
    for (num = 2; num <= n - 1; num++)
    {
        // Determine n-1 paths from v.
        choose u from among those vertices not in S such that dist[u] is minimum;
        s[u] = true;                      // put u in S
        for (each w adjacent to u with s[w] == false)
            // update distances
            if (dist[u] + cost[u][w] < dist[w])
                dist[w] = dist[u] + cost[u][w];
    }
}

Example: Consider the eight-vertex digraph of figure 2.15(a) with the cost adjacency matrix of figure
2.15(b). The values of dist and the vertices selected at each iteration of the for loop of the previous
algorithm, for finding all the shortest paths from Boston, are shown in figure 2.16. To begin with, S
contains only Boston. In the first iteration of the for loop (that is, num = 2), the city u that is not in S and
whose dist[ ] is minimum is identified to be New York. In the next iteration of the for loop, the city that
enters S is Miami, since it has the smallest dist[ ] value from among all the vertices not in S. None of the dist[ ]
values are altered. The algorithm terminates when only seven of the eight vertices are in S. By the definition of dist,
the distance of the last vertex, in this case Los Angeles, is correct, as the shortest path from Boston to Los
Angeles can go through only the remaining six vertices.
[Figure 2.15: (a) A digraph of eight cities; (b) its cost adjacency matrix]
[Figure 2.16: Action of Dijkstra's algorithm on the digraph of figure 2.15: the dist[ ] values and the vertex selected at each iteration]

Top

Connectedness and Components Algorithm
The first questions one is most likely to ask when encountering a new graph G are: Is G connected? If G is
not connected, what are the components of G? Therefore, our first algorithm will be one that determines the
connectedness and components of a given graph.
In addition to being an important question in its own right, the question of connectedness and components
arises in many other algorithms. For example, before testing a graph G for separability, planarity, or
isomorphism to another graph, it may be better for the sake of efficiency to determine the components of G and
then subject each component to the desired scrutiny. The connectedness algorithm is very basic and may serve
as a subroutine in more involved graph-theoretic algorithms. (The reader may be reminded here that although
by drawing a graph one might see whether it is connected or not, connectedness is by no means
obvious to a computer, or even to a human, if the graph is presented in some other form.)
Given the adjacency matrix X of a graph, it is possible to determine whether or not the graph is connected
by trying various permutations of rows with the corresponding columns of X, and then checking whether it is in a
block-diagonal form. This, however, is an inefficient method, because it may involve n! permutations. A
more efficient method could be to check for zeros in the matrix
Y = X + X² + …. + X^(n−1).
This too is not very efficient, as it involves a large number of matrix multiplications. The following is an
efficient algorithm:
Description of the Algorithm: The basic step in the algorithm is the fusion of adjacent vertices. We start with
some vertex in the graph and fuse all vertices that are adjacent to it. Then we take the fused vertex and
again fuse with it all those vertices that are adjacent to it now. This process of fusion is repeated until no more
vertices can be fused. This indicates that a connected component has been "fused" to a single vertex. If this
exhausts every vertex in the graph, the graph is connected. Otherwise, we start with a new vertex (in a
different component) and continue the fusing operation.
In the adjacency matrix the fusion of the jth vertex into the ith vertex is accomplished by OR-ing, that is,
logically adding, the jth row to the ith row as well as the jth column to the ith column. (Remember that in
logical adding 1 + 0 = 0 + 1 = 1 + 1 = 1 and 0 + 0 = 0.) Then the jth row and the jth column are discarded
from the matrix. (If it is difficult or time-consuming to discard the specified rows and columns, one may
leave these rows and columns in the matrix, taking care that they are not considered again in any fusion.)
Note that a self-loop resulting from a fusion appears as a 1 in the main diagonal, but parallel edges are
automatically replaced by a single edge because of the logical addition (OR-ing) operation. These, of
course, have no effect on the connectedness of a graph.
The maximum number of fusions that may have to be performed in this algorithm is n – 1, n being the
number of vertices. And since in each fusion one performs at most n logical additions, the upper bound on
the execution time is proportional to n(n–1).
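A compact C sketch of the fusion idea is given below. Rather than physically deleting rows and columns, fused vertices are simply marked and skipped, as the parenthetical remark above suggests; the function returns the number of components. The names and the fixed matrix size are illustrative assumptions, not the book's code.

#define NV 50

/* x[i][j] is 1 if vertices i and j are adjacent; the matrix is modified. */
int components(int x[NV][NV], int n)
{
    int fused[NV], i, j, k, start, changed, count = 0;

    for (i = 0; i < n; i++)
        fused[i] = 0;
    for (start = 0; start < n; start++) {
        if (fused[start]) continue;       /* already absorbed elsewhere      */
        count++;                          /* start of a new component        */
        do {                              /* fuse everything adjacent to the */
            changed = 0;                  /* growing vertex "start"          */
            for (j = 0; j < n; j++) {
                if (j == start || fused[j] || !x[start][j]) continue;
                for (k = 0; k < n; k++) { /* OR row/column j into row/column start */
                    x[start][k] = x[start][k] || x[j][k];
                    x[k][start] = x[k][start] || x[k][j];
                }
                fused[j] = 1;
                changed = 1;
            }
        } while (changed);
    }
    return count;                         /* 1 means the graph is connected  */
}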

[Figure 2.17: Flow chart of the connectedness and components algorithm]

A proper choice of the initial vertex (to which adjacent vertices are fused) in each component would
improve the efficiency, provided one did not pay too much of a price for selecting the vertex itself.
A flow chart of the “Connectedness and Components Algorithm” is shown in Fig. 2.17.
Top

String Matching
In text editing we frequently encounter the problem of finding all occurrences of a pattern in a text. The
pattern searched for is typically a particular word supplied by the user. String-matching algorithms can also
be used to search for particular patterns in DNA sequences.
The string-matching problem is defined as follows. We assume that the text is an array T[1..n]
of length n and that the pattern is an array P[1..m] of length m. We assume that the elements of
P and T belong to a finite alphabet Σ. For example, the alphabet may be Σ = {0,1} or
Σ = {a, b, …, z}. The character arrays P and T are called strings.

We say that pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs beginning at
position s + 1 in text T) if 0 ≤ s ≤ n − m and T[s + 1..s + m] = P[1..m] (that is, if T[s + j] = P[j] for 1 ≤ j ≤ m).
If P occurs with shift s in T, s is called a valid shift; otherwise, it is an invalid shift. The string-matching
problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
Figure 2.18 illustrates these definitions.
The naive brute-force algorithm for the string-matching problem has worst-case running time
O((n − m + 1)m). An interesting string-matching algorithm, due to Rabin and Karp, also has worst-case
running time O((n − m + 1)m), but it works much better on average and in practice; it also generalizes
nicely to other pattern-matching problems.

[Figure 2.18: An example of the string-matching problem. We wish to find all occurrences of the pattern P = abaa in the text T = abcabaabcabac. The pattern occurs only once in the text, at shift s = 3, which is therefore a valid shift. Each character of the pattern is connected by a vertical line to the matching character in the text, and all matched characters are shown shaded.]
This study then describes a string-matching algorithm that begins by constructing a finite automaton
specifically designed to search for occurrences of the given pattern P in a text. This algorithm runs in time
O(n + m|Σ|). The similar but much cleverer Knuth-Morris-Pratt (or KMP) algorithm is presented further;
the KMP algorithm runs in time O(n + m). Finally, an algorithm due to Boyer and Moore is often the best
practical choice, although its worst-case running time (like that of the Rabin-Karp algorithm) is no better
than that of the naive string-matching algorithm.

Notation and Terminology
Let Σ* denote the set of finite-length strings formed using characters from the alphabet Σ. In this chapter, we
consider only strings of finite length. The zero-length empty string, denoted ε, also belongs to Σ*. The length of a
string x is denoted |x|. The concatenation of two strings x and y is denoted xy; its length is |x| + |y| and it
consists of the characters from x followed by the characters from y.
A string w is a prefix of a string x, denoted w ⊂ x, if x = wy for some string y ∈ Σ*. Note that if w ⊂ x, then
|w| ≤ |x|. Similarly, a string w is a suffix of a string x, denoted w ⊃ x, if x = yw for some y ∈ Σ*.
It follows from w ⊃ x that |w| ≤ |x|. The empty string ε is both a suffix and a prefix of every string. For
example, we have ab ⊂ abcca and cca ⊃ abcca. It is useful to note that for any strings x and y and any
character a, we have x ⊃ y if and only if xa ⊃ ya. Also note that ⊂ and ⊃ are transitive relations. The
following lemma will be useful later.

Student Activity 2.11


Answer the following questions:
1. How is connectedness determined from an adjacency matrix?
2. State and explain string-matching problem.

3. What is the worst-case order of time complexity of the Rabin-Karp algorithm?


Top

Boyer-Moore Algorithm
This algorithm is most efficient if the pattern P is relatively long and the alphabet Σ is reasonably large.
The algorithm is due to Robert S. Boyer and J. Strother Moore.
Boyer-Moore-Matcher(T, P, Σ)
1   n ← length[T]
2   m ← length[P]
3   λ ← Compute-Last-Occurrence-Function(P, m, Σ)
4   γ ← Compute-Good-Suffix-Function(P, m)
5   s ← 0
6   while s ≤ n − m
7       do j ← m
8          while j > 0 and P[j] = T[s + j]
9              do j ← j − 1
10         if j = 0
11             then print "Pattern occurs at shift" s
12                  s ← s + γ[0]
13             else s ← s + max(γ[j], j − λ[T[s + j]])

Aside from the mysterious-looking λ's and γ's, this program looks remarkably like the naive string-matching
algorithm. Suppose we comment out lines 3–4 and replace the updating of s on lines 12–13 with
simple incrementations as follows:
12   s ← s + 1
13   else s ← s + 1
In the modified program, the while loop beginning on line 6 considers each of the n − m + 1 possible shifts s
in turn, and the while loop beginning on line 8 tests the condition P[1..m] = T[s + 1..s + m] by comparing
P[j] with T[s + j] for j = m, m − 1, ……, 1. If the loop terminates with j = 0, a valid shift s has been found, and
line 11 prints out the value of s. At this level, the only remarkable features of the Boyer-Moore algorithm
are that it compares the pattern against the text from right to left and that it increases the shift s on lines
12–13 by a value that is not necessarily 1.
The Boyer-Moore algorithm uses two heuristics that allow it to avoid much of the work that our previous
string-matching algorithms performed. These heuristics are very effective in that they often allow the
algorithm to skip altogether the examination of many text characters. These heuristics are known as the
“bad-character heuristic” and “good-suffix heuristic”. They are illustrated in Figure 2.19. They can be
viewed as operating independently in parallel. When a mismatch occurs, each heuristic proposes an amount
by which s can safely be increased without missing a valid shift. The Boyer-Moore algorithm chooses the
larger amount and increases s by that amount: when line 13 is reached after a mismatch, the bad-character
heuristic proposes increasing s by j –λ[T[s + j]], and the good-suffix heuristic proposes increasing s by γ[j].
Figure 2.19: An illustration of the Boyer-Moore heuristics. (a) Matching the pattern reminiscence against a
text by comparing characters in a right-to-left manner. The shift s is invalid; although a "good suffix" ce of
the pattern matched correctly against the corresponding characters in the text (matching characters are
shown shaded), the "bad character" i, which didn't match the corresponding character n in the pattern, was
discovered in the text. (b) The bad-character heuristic proposes moving the pattern to the right, if possible,
by the amount that guarantees that the bad text character will match the rightmost occurrence of the bad
character in the pattern. In this example, moving the pattern 4 positions to the right causes the bad text
character i in the text to match the rightmost i in the pattern, at position 6. If the bad character doesn't occur
in the pattern, then the pattern may be moved completely past the bad character in the text. If the rightmost
occurrence of the bad character in the pattern is to the right of the current bad-character position, then this
heuristic makes no proposal. (c) With the good-suffix heuristic, the pattern is moved to the right by the least
amount that guarantees that any pattern characters that align with the good suffix ce previously found in the
text will match those suffix characters. In this example, moving the pattern 3 positions to the right satisfies
this condition. Since the good-suffix heuristic proposes a movement of 3 positions, which is smaller than
the 4-position proposal of the bad-character heuristic, the Boyer-Moore algorithm increases the shift by 4.

"# 1 # #
When a mismatch occurs, this heuristic uses information about where the bad text character T[s+j] occurs
in the pattern (if it occurs at all) to propose a new shift. In the best case, the mismatch occurs on the first
comparison (P[m] ≠ T[s+m]) and the bad character T[s+m] does not occur in the pattern at all. (Imagine
searching for the pattern a^m in the text string b^n.) In this case we can increase the shift s by m, since any
shift smaller than s + m will align some pattern character against the bad character, causing a mismatch. If
the best case occurs repeatedly, the Boyer-Moore algorithm examines only a fraction 1/m of the text
characters, since each text character examined yields a mismatch, thus causing s to increase by m. This
best-case behaviour illustrates the power of matching right-to-left instead of left-to-right.
The heuristic works as follows. Assume we have just found a mismatch: P[j] ≠ T[s+j], where 1 ≤ j ≤ m.
Now let k be the largest index in the range 1 ≤ k ≤ m such that T[s+j] = P[k], if any such k exists.
Otherwise, let k = 0. We claim that we may safely increase s by j – k. We must consider three cases to prove
this claim, as illustrated by Figure 2.20.
k = 0: from Figure 2.20(a), the bad character T[s+j] didn't occur in the pattern at all, and so we can
safely increase s by j without missing any valid shifts.
k < j: from Figure 2.20(b), the rightmost occurrence of the bad character is in the pattern to the left of
position j, so that j – k > 0 and the pattern must be moved j – k characters to the right before the bad
text character matches any pattern character. Hence, we increase s by j – k without missing any valid
shifts.
k > j: from Figure 2.20(c), j – k < 0, and so the bad-character heuristic is essentially proposing to
decrease s. This recommendation is ignored by the algorithm, because the good-suffix heuristic
proposes a shift to the right in all cases.
Now we give a simple program that defines λ[a] to be the index of the rightmost position in the pattern at
which character a occurs, for each a ∈ Σ. If a is not found in the pattern, then λ[a] is set to 0. We call λ the
last-occurrence function for the pattern. With this definition, the expression j – λ[T[s+j]] on line 13 of
Boyer-Moore-Matcher implements the bad-character heuristic. (Since j – λ[T[s+j]] is negative if the
rightmost occurrence of the bad character T[s+j] in the pattern is to the right of position j, we rely on the
positivity of γ[j], proposed by the good-suffix heuristic, to ensure that the algorithm makes progress at each
step.)
Figure 2.20: The three cases of the bad-character heuristic: (a) the bad character does not occur in the
pattern at all; (b) its rightmost occurrence in the pattern is to the left of position j; (c) its rightmost
occurrence in the pattern is to the right of position j.

COMPUTE-LAST-OCCURRENCE-FUNCTION (P, m, Σ)
1. for each character a ∈ Σ
2.     do λ[a] ← 0
3. for j ← 1 to m
4.     do λ[P[j]] ← j
5. return λ

The running time of procedure COMPUTE-LAST-OCCURRENCE-FUNCTION is O(|Σ| + m).
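The bookkeeping can be made concrete with a short Python sketch of the last-occurrence function; the
1-based positions follow the pseudocode above, and the helper name is ours, not part of the original text.

def compute_last_occurrence(pattern, alphabet):
    # lam[a] = rightmost (1-based) position of a in the pattern, or 0 if absent
    lam = {a: 0 for a in alphabet}
    for j, ch in enumerate(pattern, start=1):
        lam[ch] = j          # later occurrences overwrite earlier ones
    return lam

# For the pattern "reminiscence", lam['i'] is 6, matching Figure 2.19, and
# lam[a] is 0 for every character a that does not occur in the pattern.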



"# / 2
Here we need to define the relation Q ~ R (read "Q is similar to R") for strings Q and R to mean that
Q ⊃ R or R ⊃ Q. We can align two similar strings with their rightmost characters matched, and no pair of
aligned characters will disagree. The relation "~" is symmetric: Q ~ R if and only if R ~ Q. We also have, as
a consequence of the lemma stated earlier, that
Q ⊃ R and S ⊃ R imply Q ~ S.   ... (1)
If P[j] ≠ T[s+j], where j < m, then the good-suffix heuristic says that we can advance s by

γ[j] = m − max {k : 0 ≤ k < m and P[j+1..m] ~ Pk}.

That is, γ[j] is the least amount we can advance s without causing any characters in the "good suffix"
T[s+j+1..s+m] to be mismatched against the new alignment of the pattern. We call γ the good-suffix
function for the pattern P.
We now show how to compute the good-suffix function γ. We first observe that γ[j] ≤ m − π[m] for all j, as
follows. If w = π[m], then Pw ⊃ P by the definition of π. Furthermore, since P[j+1..m] ⊃ P for any j, we
have Pw ~ P[j+1..m] by equation (1). Therefore γ[j] ≤ m − π[m] for all j.
Now we rewrite our definition of γ as
γ[j] = m − max {k : π[m] ≤ k < m and P[j+1..m] ~ Pk}.
The condition P[j+1..m] ~ Pk holds if either P[j+1..m] ⊃ Pk or Pk ⊃ P[j+1..m]. But the latter possibility
implies that Pk ⊃ P and thus that k ≤ π[m], by the definition of π. This latter possibility cannot reduce the
value of γ[j] below m − π[m]. We can therefore rewrite our definition of γ still further as follows:
γ[j] = m − max ({π[m]} ∪ {k : π[m] < k < m and P[j+1..m] ⊃ Pk}).
(The second set may be empty.) It is worth observing that the definition implies that γ[j] > 0 for all j = 1,
2, …, m, which ensures that the Boyer-Moore algorithm makes progress.
To simplify the expression for γ further, we define P′ as the reverse of the pattern P and π′ as the
corresponding prefix function. That is, P′[i] = P[m − i + 1] for i = 1, 2, …, m, and π′[t] is the largest u such
that u < t and P′u ⊃ P′t.
If k is the largest possible value such that P[j+1..m] ⊃ Pk, then we claim that
π′[l] = m − j,   ... (2)
where l = (m − k) + (m − j). To see that this claim is well defined, note that P[j+1..m] ⊃ Pk implies that
m − j ≤ k, and thus l ≤ m. Also, j < m and k ≤ m, so that l ≥ 1. We prove the claim as follows. Since
P[j+1..m] ⊃ Pk, we have P′m–j ⊃ P′l. Therefore π′[l] ≥ m − j. Suppose now that p > m − j, where p = π′[l].
Then, by the definition of π′, we have P′p ⊃ P′l or, equivalently, P′[1..p] = P′[l−p+1..l]. Rewriting this in
terms of P rather than P′, we have P[m−p+1..m] = P[m−l+1..m−l+p]. Substituting l = 2m − k − j, we obtain
P[m−p+1..m] = P[k−m+j+1..k−m+j+p], which implies P[m−p+1..m] ⊃ Pk–m+j+p. Since p > m − j, we have
j + 1 > m − p + 1, and so P[j+1..m] ⊃ P[m−p+1..m], implying that P[j+1..m] ⊃ Pk–m+j+p by the transitivity
of ⊃. Finally, since p > m − j, we have k′ > k, where k′ = k − m + j + p, contradicting our choice of k as the
largest possible value such that P[j+1..m] ⊃ Pk. This contradiction means that we cannot have p > m − j,
and thus p = m − j, which proves claim (2).
Using equation (2), and noting that π′[l] = m − j implies that j = m − π′[l] and k = m − l + π′[l], we can
rewrite our definition of γ still further:

γ[j] = m − max ({π[m]} ∪ {m − l + π′[l] : 1 ≤ l ≤ m and j = m − π′[l]})
     = min ({m − π[m]} ∪ {l − π′[l] : 1 ≤ l ≤ m and j = m − π′[l]})   ... (3)

Again, the second set may be empty.

Now we see the procedure for computing γ:


COMPUTE-GOOD-SUFFIX-FUNCTION (P, m)
1. π ← COMPUTE-PREFIX-FUNCTION (P)
2. P′ ← reverse (P)
3. π′ ← COMPUTE-PREFIX-FUNCTION (P′)
4. for j ← 0 to m
5.     do γ[j] ← m – π[m]
6. for l ← 1 to m
7.     do j ← m – π′[l]
8.        if γ[j] > l – π′[l]
9.           then γ[j] ← l – π′[l]
10. return γ
The procedure COMPUTE-GOOD-SUFFIX-FUNCTION is a direct implementation of equation (3). Its
running time is O(m).

The worst-case time complexity of the Boyer-Moore algorithm is O((n – m + 1)m + |Σ|): COMPUTE-
LAST-OCCURRENCE-FUNCTION takes time O(m + |Σ|), COMPUTE-GOOD-SUFFIX-FUNCTION
takes time O(m), and the Boyer-Moore algorithm (like the Rabin-Karp algorithm) spends O(m) time
validating each valid shift s.
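To show the overall control flow in runnable form, the following Python sketch uses the bad-character
heuristic only (always advancing the shift by at least 1), a common simplification of the full Boyer-Moore
algorithm; the good-suffix table is deliberately omitted, so this is not the procedure above, just an
illustration of the right-to-left comparison and the bad-character shift.

def bad_character_search(text, pattern):
    # Return all shifts at which pattern occurs in text (0-based indices).
    n, m = len(text), len(pattern)
    lam = {}
    for j, ch in enumerate(pattern):
        lam[ch] = j                      # rightmost occurrence of each character
    shifts = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                       # compare the window right to left
        if j < 0:
            shifts.append(s)             # full match at shift s
            s += 1                       # a safe, if conservative, next shift
        else:
            # line up the rightmost occurrence of the bad character text[s+j]
            # (if any) under position j; never shift by less than one
            s += max(1, j - lam.get(text[s + j], -1))
    return shifts

print(bad_character_search("written notice that reminiscence", "reminiscence"))   # [20]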

"# , #1 1) #$
Knuth, Morris, and Pratt gave a linear-time string-matching algorithm. Their algorithm achieves a Θ(n + m)
running time by avoiding the computation of the transition function δ altogether. It does the pattern
matching using just an auxiliary function π[1..m], precomputed from the pattern in time O(m). The array π
allows the transition function δ to be computed efficiently "on the fly" as needed: for any state q = 0,
1, …, m and any character a ∈ Σ, the value π[q] contains the information that is independent of a and is
needed to compute δ(q, a). (This remark will be clarified shortly.) Since the array π has only m entries,
whereas δ has O(m|Σ|) entries, we save a factor of |Σ| in preprocessing time by computing π rather than δ.

The prefix function π for a pattern encapsulates knowledge about how the pattern matches against shifts of
itself. This information can be used to avoid testing useless shifts in the naïve pattern-matching algorithm
and to avoid precomputing the full transition function δ for a string-matching automaton.

Now consider the operation of the naïve string matcher. Figure 2.21(a) shows a particular shift s of a
template containing the pattern P = ababaca against a text T. For this example, q = 5 of the characters have
matched, but the 6th pattern character fails to match the corresponding text character. The information that
q characters have matched successfully determines the corresponding text characters. Knowing these q text
characters allows us to determine immediately that certain shifts are invalid. In the example of the figure,
the shift s + 1 is necessarily invalid, since the first pattern character, an a, would be aligned with a text
character that is known to match the second pattern character, a b. The shift s + 2 shown in part (b) of the
figure, however, aligns the first three pattern characters with three text characters that must necessarily
match. In general it is useful to know the answer to the following question:

Figure 2.21: The prefix function π. (a) The pattern P = ababaca is aligned with a text T so that the first
q = 5 characters match; the mismatch occurs at the sixth pattern character. (b) The next potentially valid
shift, at which k = 3 pattern characters are already known to match. (c) The same information expressed
purely in terms of the pattern: the longest prefix of P that is also a proper suffix of Pq determines k.

Given that pattern characters P[1..q] match text characters T[s + 1..s + q], what is the least shift
s’ > s such that
P[1..k] = T[s’ + 1..s’ + k] ... (α)
where s’ + k = s + q?
Such a shift s' is the first shift greater than s that is not necessarily invalid due to our knowledge of
T[s + 1..s + q]. In the best case, we have that s' = s + q, and shifts s + 1, s + 2, …, s + q – 1 are all immediately
ruled out. In any case, at the new shift s’ we don’t need to compare the first k characters of P with the
corresponding characters of T, since we are guaranteed that they match by equation (α).
This precomputation is illustrated in Figure 2.21(c). Since T[s' + 1..s' + k] is part of the known portion of
the text, it is a suffix of the string Pq. Equation (α) can therefore be interpreted as asking for the largest
k < q such that Pk ⊃ Pq. Then s' = s + (q – k) is the next potentially valid shift. It turns out to be convenient to store the number k of matching
characters at the new shift s’, rather than storing, say, s’ – s. This information can be used to speed up both
the naïve string-matching algorithm and the finite-automaton matcher.
We formalize the precomputation required as follows. Given a pattern P[1..m], the prefix function for the
pattern P is the function π : {1, 2, …, m} → {0, 1, …, m–1} such that
π[q] = max {k : k < q and Pk ⊃ Pq}.

That is, π[q] is the length of the longest prefix of P that is a proper suffix of Pq.
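For example, for the pattern P = ababaca we have π[1..7] = (0, 0, 1, 2, 3, 0, 1); in particular π[5] = 3,
because aba is the longest proper prefix of P5 = ababa that is also a suffix of it.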
The Knuth-Morris-Pratt matching algorithm is given in pseudocode below as the procedure KMP-Matcher.
It is modeled after Finite-Automaton-Matcher, as we shall see. KMP-Matcher calls the auxiliary procedure
Compute-Prefix-Function to compute π.
KMP-Matcher (T, P)
1 n←length[T]
2 m←length[P]
3 π←Compute-Prefix-Function (P)
4 q←0
5 for i←1 to n
6     do while q > 0 and P[q + 1] ≠ T[i]
7           do q←π[q]
8        if P[q + 1] = T[i]
9           then q←q + 1
10       if q = m
11          then print "Pattern occurs with shift" i – m
12               q←π[q]
Compute-Prefix-Function (P)
1 m←length [P]
2 π[1]←0
3 k←0
4 for q←2 to m
5 do while k> 0 and P[k + 1] ≠ P[q]
6 do k←π[k]
7 if P[k + 1] = P[q]
8 then k←k +1
9 π[q] ←k
10 return π

The time complexity of Compute-Prefix-Function is O(m). We can attach a potential of k with the current
state k of the algorithm. This potential has an initial value of 0 (from line 3). Line 6 decreases k, since π[k]
< k. Since π[k] ≥ 0 for all k, however, k can never become negative. The only other line that affects k is line
8, which increases k by at most one during each execution of the for loop body. Since k < q upon entering
the for loop and since q is incremented in each iteration of the for loop body, k < q always holds. (This
justifies the claim that π[q] < q as well, by line 9.) We can pay for each execution of the while loop body on
line 6 with the corresponding decrease in the potential function, since π[k] < k. Line 8 increases the
potential function by at most one, so that the amortized cost of the loop body on lines 5–9 is O(1). Since the
number of outer-loop iterations is O(m), and since the final potential function is at least as great as the
initial potential function, the total actual worst-case running time of Compute-Prefix-Function is O(m).
The Knuth-Morris-Pratt algorithm has time complexity O(m + n). The call of Compute-Prefix-Function
takes O(m) time as we have just seen, and a similar analysis shows that, using the value of q as the potential
function, the remainder of KMP-Matcher takes O(n) time.
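A direct Python transcription of the two procedures (0-based indices, so pi[q] here plays the role of π[q+1]
above) may help to see the whole algorithm at once; it is a sketch, not part of the original text.

def compute_prefix_function(P):
    m = len(P)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]          # fall back to the next-shorter border
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_matcher(T, P):
    n, m = len(T), len(P)
    pi = compute_prefix_function(P)
    shifts = []
    q = 0                          # number of characters matched so far
    for i in range(n):
        while q > 0 and P[q] != T[i]:
            q = pi[q - 1]
        if P[q] == T[i]:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q - 1]          # continue, allowing overlapping matches
    return shifts

print(kmp_matcher("abababacaba", "ababaca"))   # [2]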

Summary
Average-case time complexities of bubble sort, insertion sort and selection sort are O(n2).
Merge sort has space complexity O(n).
Prim's and Kruskal's algorithms are used to find a minimum spanning tree.
Finding all occurrences of a pattern in a text is the problem known as string matching.

Self-assessment Questions

I. True and False


1. A selection sort is one in which successive elements are selected in order and placed into their
proper sorted positions.
2. Selection sort is more efficient than quick sort.
II. Fill in the blanks
1. Kruskal’s algorithm is used to find______________.
2. Average case time complexity of quick sort is______________.
3. Two string matching algorithms are _________ and ________ algorithms.

I. True and False


1. True
2. False
II. Fill in the blanks
1. minimum spanning tree
2. O(n log n)
3. Boyer-Moore, Knuth-Morris-Pratt

I. True and False


1. Prim’s algorithm is used to find minimum spanning tree.
2. If a pattern is relatively long and the alphabet is reasonably large than Boyer-Moore algorithm is
the most efficient string matching algorithm.
II. Fill in the blanks
1. Time complexity of Bubble Sort is _______________.
2. Space complexity of Merge Sort is _______________.
3. Two string matching algorithms are _______________ and _______________.
4. Kruskal's algorithm is used to create a _______________ tree.
5. Prim's algorithm is a _______________ method for creation of a minimum spanning tree.
Overview
Principle of Optimality
Matrix Multiplication
Optimal Binary Search Trees

Dynamic Programming

Learning Objectives
• Overview
• Principle of Optimality
• Matrix Multiplication
• Optimal Binary Search Trees
Top

Dynamic programming is an algorithm design method that can be used when the solution to a problem can
be viewed as the result of a sequence of decisions.
One way to solve problems for which it is not possible to make a sequence of stepwise decisions leading to
an optimal sequence is to try all possible decision sequences. We could enumerate all decision sequences
and then pick out the best. But the time and space requirements may be prohibitive. Dynamic Programming
often drastically reduces the amount of enumeration by avoiding the enumeration of some decision
sequences that cannot possibly be optimal. In dynamic programming an optimal sequence of decisions is
obtained by making implicit appeal to the principle of optimality.
Top

The principle of optimality states that an optimal sequence of decisions has the property that whatever the
initial state and decision are, the remaining decisions must constitute an optimal decision sequence with
regard to the state resulting from the first decision.
Thus, the essential difference between the greedy method and dynamic programming is that in the greedy
method only one decision sequence is ever generated, whereas in dynamic programming many decision
sequences may be generated. However, sequences containing suboptimal subsequences cannot be optimal
(if the principle of optimality holds) and so will not (as far as possible) be generated.

Another important feature of the dynamic programming approach is that optimal solutions to subproblems
are retained so as to avoid recomputing their values. The use of these tabulated values makes it natural to
recast the recursive equations into an iterative algorithm.

Student Activity 3.1


Before going to next section, answer the following questions:
1. What is Dynamic programming?
2. State principle of optimality.
If your answers are correct, then proceed to next section.
Top

Our first example of dynamic programming is an algorithm that solves the problem of matrix-chain
multiplication. We have a sequence (chain) (A1, A2, …, An) of n matrices to be multiplied, and our goal is
to compute the product
A1 A2 … An   ... (1)
We can evaluate the expression (1) using the standard algorithm for multiplying pairs of matrices as a
subroutine once we have parenthesized it to resolve all ambiguities in how the matrices are multiplied
together. A product of matrices is fully parenthesized if it is either a single matrix or the product of two
fully parenthesized matrix products, surrounded by parentheses. Matrix multiplication is associative, so all
parenthesizations yield the same product. For example, the product of the matrices A1, A2, A3, A4 can be
fully parenthesized in five distinct ways:
(A1 (A2 (A3 A4))),
(A1 ((A2 A3) A4)),
((A1 A2) (A3 A4)),
((A1 (A2 A3)) A4),
(((A1 A2) A3) A4).
The method of parenthesization can have a dramatic impact on the cost of evaluating the product. Consider
the cost of multiplying two matrices. The standard algorithm is given by the following pseudocode
algorithm.
The attributes rows and columns are the number of rows and columns in a matrix.
Matrix Multiply (A, B)
{
  if (columns[A] != rows[B])
      print_error ("incompatible dimensions")
  else
  {
      for i = 1 to rows[A]
          for j = 1 to columns[B]
          {
              C[i, j] = 0
              for k = 1 to columns[A]
                  C[i, j] = C[i, j] + A[i, k] * B[k, j]
          }
      return C
  }
}
To examine the different costs incurred by different parenthesizations of a matrix product, note that Matrix
Multiply performs p·q·r scalar multiplications when it multiplies a p×q matrix by a q×r matrix, so the order
in which a chain of matrices is parenthesized can change the total number of scalar multiplications
dramatically.
First we should convince ourselves that exhaustively checking all possible parenthesizations does not yield
an efficient algorithm; then we solve the matrix-chain multiplication problem by dynamic programming.
Denote the number of alternative parenthesizations of a sequence of n matrices by P(n). Since we can split
a sequence of n matrices between the kth and (k+1)st matrices for any k = 1, 2, …, n–1 and then
parenthesize the two resulting subsequences independently, we obtain the recurrence

P(n) = 1                                     if n = 1
P(n) = Σ_{k=1}^{n–1} P(k) · P(n–k)           if n ≥ 2

The solution to this recurrence is the sequence of Catalan numbers:

P(n) = C(n–1),   where   C(n) = (1/(n+1)) · (2n choose n) = Ω(4^n / n^(3/2)).
The number of solutions is thus exponential in n, and the brute-force method of exhaustive search is
therefore a poor strategy for determining the optimal parenthesization of a matrix chain.
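As a quick check of the recurrence and the closed form, the following short Python sketch (Python 3.8+ for
math.comb) compares the two; it is an illustration only.

from math import comb

def num_parenthesizations(n):
    # P(n): number of ways to fully parenthesize a chain of n matrices
    if n == 1:
        return 1
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))

print([num_parenthesizations(n) for n in range(1, 7)])         # [1, 1, 2, 5, 14, 42]
print([comb(2 * (n - 1), n - 1) // n for n in range(1, 7)])    # Catalan numbers C(n-1)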

In dynamic-programming the first step is to characterize the structure of an optimal solution. For the
matrix-chain-multiplication problem we can perform this step as follows. For convenience, let us adopt the
notation Ai…j for the matrix that results from evaluating the product Ai Ai+1….Aj. An optimal
parenthesization of the product A1 A2…An splits the product between Ak and Ak+1 for some integer k in the
range 1 ≤ k < n. That is, for some value k, we first compute the matrices A1…k and Ak+1…n and then multiply
them together to produce the final product A1…n. The cost of this optimal parenthesization is thus the cost of
computing the matrix A1…k, plus the cost of computing Ak+1…n, plus the cost of multiplying them together.
The key observation is that the parenthesization of the 'prefix' subchain A1 A2…Ak within this optimal
parenthesization of A1…An must itself be an optimal parenthesization of A1 A2…Ak. Why? If there were a
less costly way to parenthesize A1 A2…Ak, substituting that parenthesization in the optimal parenthesization
of A1 A2…An would produce another parenthesization of A1 A2…An whose cost was lower than the
optimum: a contradiction. A similar observation holds for the parenthesization of the subchain Ak+1 Ak+2…An
in the optimal parenthesization of A1 A2…An: it must be an optimal parenthesization of Ak+1 Ak+2…An.
Thus, an optimal solution to an instance of the matrix-chain multiplication problem contains within it
optimal solutions to subproblem instances.
A Recursive Solution
Next we define the value of an optimal solution recursively in terms of the optimal solutions to sub
problem. For this problem, we pick as our subproblems the problems of determining the cost of a
parenthesization of Ai Ai+1…..Aj for 1 ≤ i ≤ j ≤ n. Let m [i, j] be the minimum number of scalar
multiplications needed to compute the matrix Ai…j; the cost of a cheapest way to compute A1….n would thus
be m(1, n).
Now we can define m(i, j) as follows. If i = j, the chain consists of just one matrix, Ai…j = Ai, so no scalar
multiplications are necessary to compute the product. Thus m[i, i] = 0 for i = 1, 2, …, n. To compute m[i, j]
when i < j, we take advantage of the structure of an optimal solution from step 1. Let us suppose that the
optimal parenthesization splits the product Ai Ai+1…Aj between Ak and Ak+1, where i ≤ k < j. Then m[i, j] is
equal to the minimum cost for computing the subproducts Ai…k and Ak+1…j, plus the cost of multiplying
these two matrices together. Since computing the matrix product Ai…k Ak+1…j takes pi–1 pk pj scalar
multiplications, we obtain
m[i, j] = m[i, k] + m[k+1, j] + pi–1 pk pj.
This recurrence relation assumes that we know the value of k, which we don't. There are only j – i possible
values for k, however, namely k = i, i+1, …, j–1. Since the optimal parenthesization must use one of these
values for k, we need only check them all to find the best. Thus, our recursive definition for the minimum
cost of parenthesizing the product Ai Ai+1…Aj becomes

m[i, j] = 0                                                          if i = j
m[i, j] = min_{i ≤ k < j} { m[i, k] + m[k+1, j] + pi–1 pk pj }       if i < j        ... (2)
The m[i, j] values give the costs of optimal solutions to subproblems. To help us keep track of how to
construct an optimal solution, let us define s[i, j] to be a value of k at which we can split the product Ai
Ai+1……Aj to obtain an optimal parenthesization. That is s[i, j] equals a value k such that m[i, j] = m[i, k]
+m[k+1, j] + pi–1 Pk Pj

Now it is simple to write a recursive algorithm based on recurrence (2) to compute the minimum cost
m[1, n] of multiplying A1 A2…An. However, this algorithm takes exponential time, which is no better than
the brute-force method of checking each way of parenthesizing the product.
The important observation is that we have relatively few subproblems: one problem for each choice of i
and j satisfying 1 ≤ i ≤ j ≤ n, or (n choose 2) + n = Θ(n²) in total.
Instead of computing the solution to recurrence (2) recursively, we perform the third step of the dynamic-
programming paradigm and compute the optimal cost by using a bottom-up approach. The following
pseudocode algorithm assumes that matrix Ai has dimensions pi–1 × pi for i = 1, 2, …, n. The input is a
sequence (p0, p1, …, pn), where length[p] = n + 1. The procedure uses an auxiliary table m[1…n, 1…n] for
storing the m[i, j] costs and an auxiliary table s[1…n, 1…n] that records which index k achieved the
optimal cost in computing m[i, j].
Matrix chain order (P)
{
   n = length[p] – 1
   for (i = 1; i <= n; i++)
       m[i, i] = 0
   for (l = 2; l <= n; l++)
       for (i = 1; i <= n – l + 1; i++)
       {
           j = i + l – 1
           m[i, j] = ∞
           for (k = i; k <= j – 1; k++)
           {
               q = m[i, k] + m[k+1, j] + p[i–1] * p[k] * p[j]
               if (q < m[i, j])
               {
                   m[i, j] = q
                   s[i, j] = k
               }
           }
       }
   return m and s
}
This algorithm fills the table m in a manner that corresponds to solving the parenthesization problem on
matrix chains of increasing length.

Now we see the operation of the above algorithm on a chain of n = 6 matrices with given matrix
dimensions. The tables m and s for this problem, calculated by the above algorithm, are shown in the
accompanying figure. The minimum number of scalar multiplications needed to multiply the 6 matrices is
m[1, 6] = 15,125.
A simple inspection of the nested loop structure of Matrix chain order yields a running time of O(n3) for
the algorithm.

Matrix chain order does not show how to multiply the matrices; it only determines the optimal number of
scalar multiplications needed to compute a matrix-chain product.
We will use the table s[1…n, 1…n] to determine the best way to multiply the matrices. Each entry s[i, j]
records the value of k such that the optimal parenthesization of Ai Ai+1…Aj splits the product between Ak
and Ak+1. Thus, we know that the final matrix multiplication in computing A1…n optimally is
A1…s[1,n] · As[1,n]+1…n. The earlier matrix multiplications can be computed recursively, since s[1, s[1, n]]
determines the last matrix multiplication in computing A1…s[1,n] and s[s[1, n]+1, n] determines the last
matrix multiplication in computing As[1,n]+1…n. The following recursive procedure computes the matrix-
chain product Ai…j, given the matrices A = (A1, A2, …, An), the table s computed by Matrix chain order,
and the indices i and j. The initial call is Matrix Chain Multiply (A, s, 1, n).

Matrix Chain Multiply (A, s, i, j)
{
    if (j > i)
    {
        X = Matrix Chain Multiply (A, s, i, s[i, j]);
        Y = Matrix Chain Multiply (A, s, s[i, j] + 1, j);
        return Matrix Multiply (X, Y)
    }
    else return Ai
}
In the above example the call
Matrix Chain Multiply (A, s, 1, 6) computes the matrix-chain product according to the parenthesization
((A1 (A2 A3)) ((A4 A5) A6)).
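The following Python sketch is a direct transcription of Matrix chain order together with a routine that
prints the parenthesization recorded in s. The dimension sequence (30, 35, 15, 5, 10, 20, 25) is an
assumption on our part; it is chosen because it is consistent with the cost 15,125 and the parenthesization
quoted in the example above.

import sys

def matrix_chain_order(p):
    # p[0..n]: matrix Ai has dimensions p[i-1] x p[i]; tables are 1-based
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):                       # chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = sys.maxsize
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def parenthesization(s, i, j):
    if i == j:
        return "A%d" % i
    k = s[i][j]
    return "(%s %s)" % (parenthesization(s, i, k), parenthesization(s, k + 1, j))

p = [30, 35, 15, 5, 10, 20, 25]                     # assumed dimensions
m, s = matrix_chain_order(p)
print(m[1][6])                                      # 15125
print(parenthesization(s, 1, 6))                    # ((A1 (A2 A3)) ((A4 A5) A6))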

Student Activity 3.2


Before going to next section, answer the following questions:
1. What is matrix chain multiplication problem?
2. Describe matrix chain multiplication with an example.
If your answers are correct, then proceed to next section.
Top

" ! #
Given a fixed set of identifiers, we wish to create a binary search tree organization. We may expect
different binary search trees for the same identifier set to have different performance characteristics. The
tree of figure 3.2(a), in the worst case, requires four comparisons to find an identifier, whereas the tree of
figure 3.2(b) requires at most three. On the average the two trees need 12/5 and 11/5 comparisons,
respectively.

For example, in the case of the tree of figure 3.2(a), it takes 1, 2, 2, 3, and 4 comparisons, respectively, to
find the identifiers for, do, while, int and if. Thus the average number of comparisons is
(1 + 2 + 2 + 3 + 4)/5 = 12/5. This calculation assumes that each identifier is searched for with equal
probability and that no unsuccessful searches (i.e., searches for identifiers not in the tree) are made.
In a general situation we can expect different identifiers to be searched for with different frequencies (or
probabilities); in addition, we can expect unsuccessful searches also to be made. Let us assume that the
given set of identifiers is {a1, a2, …, an} with a1 < a2 < … < an. Let p(i) be the probability with which we
search for ai. Let q(i) be the probability that the identifier x being searched for is such that ai < x < ai+1,
0 ≤ i ≤ n (assume a0 = –∞ and an+1 = +∞). Then Σ_{0≤i≤n} q(i) is the probability of an unsuccessful search.
Clearly, Σ_{1≤i≤n} p(i) + Σ_{0≤i≤n} q(i) = 1. Given this data we wish to construct an optimal binary search
tree for {a1, a2, …, an}. First, of course, we must be precise about what we mean by an optimal binary
search tree.
In obtaining a cost function for binary search trees, it is useful to add a fictitious node in place of every
empty subtree in the search tree, such nodes, called external nodes, are drawn square in the figure 3.3. All
other nodes are internal nodes. If a binary search tree represents n identifiers, then there will be exactly n
internal nodes and n + 1 (fictitious) external nodes. Every external node represents a point where an
unsuccessful search may terminate.

If a successful search terminates at an internal node at level l, then l iterations of the while loop of binary
search algorithm are needed. Hence the expected cost contribution from the internal node for ai is P(i) *
level (ai).
Unsuccessful searches terminate at external nodes. The identifiers not in the binary search tree can be
partitioned into n + 1 equivalence classes Ei, 0 ≤ i ≤ n. The class E0 contains all identifiers x such that
x < a1. The class Ei, 1 ≤ i < n, contains all identifiers x such that ai < x < ai+1. The class En contains all
identifiers x such that x > an. It is easy to see that for all identifiers in the same class Ei, the search
terminates at the same external node. For identifiers in different Ei the search terminates at different
external nodes. If the failure node for Ei is at level l, then only l – 1 iterations of the while loop are made.
Hence the cost contribution of this node is q(i) * (level(Ei) – 1).
The preceding discussion leads to the following formula for the expected cost of a binary search tree:
Σ_{1≤i≤n} p(i) * level(ai) + Σ_{0≤i≤n} q(i) * (level(Ei) – 1)
We define an optimal binary search tree for the identifier set {a1, a2……an} to be a binary search tree for
which above equation is minimum,

The possible binary search trees for the identifier set {a1, a2, a3} = (do, if, while) are given in figure 3.4
with equal probabilities
P(i) = q(i) = 1/7 for all i, we have
Cost (tree a) = 15/7 cost (tree b) = 13/7
Cost (tree c) = 15/7 cost (tree d) = 15/7
Cost (tree e) = 15/7
As expected, tree b is optimal. With p(1) = 0.5, p(2) = 0.1, p(3) = 0.05, q(0) = 0.15, q(1) = 0.1,
q(2) = 0.05 and q(3) = 0.05 we have
Cost (tree a) = 2.65    cost (tree b) = 1.9
Cost (tree c) = 1.5     cost (tree d) = 2.05
Cost (tree e) = 1.6
For instance, cost (tree a) can be computed as follows. The contribution from successful searches is
3*0.5 + 2*0.1 + 0.05 = 1.75 and the contribution from unsuccessful searches is
3*0.15 + 3*0.1 + 2*0.05 + 1*0.05 = 0.90. All the other costs can also be calculated in a similar manner.
Tree c is optimal with this assignment of p's and q's.

!" !

To apply dynamic programming to the problems of obtaining an optimal binary search tree, we need to
view the construction of such a tree as the result of a sequence of decisions and then observe that the
principle of optimality holds when applied to the problem state resulting from a decision. A possible
approach to this would be to make a decision as to which of the ai’s should be assigned to the root node of
the tree. If we choose ak, then it is clear that the internal nodes for a1, a2, …, ak–1, as well as the external
nodes for the classes E0, E1, …, Ek–1, will lie in the left subtree l of the root. The remaining nodes will be in
the right subtree r. Define
cost(l) = Σ_{1≤i<k} p(i) * level(ai) + Σ_{0≤i<k} q(i) * (level(Ei) – 1)
and
cost(r) = Σ_{k<i≤n} p(i) * level(ai) + Σ_{k≤i≤n} q(i) * (level(Ei) – 1)
In both cases the level is measured by regarding the root of the respective subtree to be at level 1.
Using w(i, j) to represent the sum q(i) + Σ_{l=i+1}^{j} [q(l) + p(l)], we obtain the following as the expected
cost of the search tree (figure 3.5):

Figure: An optimal binary search tree with root ak, left subtree l and right subtree r

p(k) + cost(l) + cost(r) + w(0, k–1) + w(k, n)   ... (1)

If the tree is optimal, then (1) must be minimum. Hence cost(l) must be minimum over all binary search
trees containing a1, a2, …, ak–1 and E0, E1, …, Ek–1. Similarly cost(r) must be minimum. If we use c(i, j) to
represent the cost of an optimal binary search tree tij containing ai+1, …, aj and Ei, …, Ej, then for the tree to
be optimal we must have cost(l) = c(0, k–1) and cost(r) = c(k, n). In addition, k must be chosen such that
p(k) + c(0, k–1) + c(k, n) + w(0, k–1) + w(k, n)
is minimum. Hence for c(0, n) we obtain
c(0, n) = min_{1≤k≤n} {c(0, k–1) + c(k, n) + p(k) + w(0, k–1) + w(k, n)}   ... (2)
We can generalize equation (2) to obtain, for any c(i, j),
c(i, j) = min_{i<k≤j} {c(i, k–1) + c(k, j) + p(k) + w(i, k–1) + w(k, j)}
c(i, j) = w(i, j) + min_{i<k≤j} {c(i, k–1) + c(k, j)}   ... (3)
Equation (3) can be solved for c(0, n) by first computing all c(i, j) such that j – i = 1 (note that c(i, i) = 0
and w(i, i) = q(i), 0 ≤ i ≤ n). Next we can compute all c(i, j) such that j – i = 2, then all c(i, j) with j – i = 3,
and so on. If during this computation we record the root r(i, j) of each tree tij, then an optimal binary search
tree can be constructed from these r(i, j). Note that r(i, j) is the value of k that minimizes equation (3).

Let n = 4 and (a1, a2, a3, a4) = (do, if, int, while). Let p(1:4) = (3, 3, 1, 1) and q(0:4) = (2, 3, 1, 1, 1). The
p's and q's have been multiplied by 16 for convenience. Initially, we have w(i, i) = q(i), c(i, i) = 0 and
r(i, i) = 0, 0 ≤ i ≤ 4. Using equation (3) and the observation w(i, j) = p(j) + q(j) + w(i, j–1), we get
w(0, 1)=p(1)+ q(1)+ w(0, 0)=8

c(0, 1)=w(0, 1)+min {c(0, 0)+ c(1, 1)}=8

r(0, 1) =1

w(1, 2)=p(2)+q(2)+w(1, 1)=7

c(1, 2)=w(1, 2)+ min {c(1, 1)+ c(2, 2)}=7

r(1, 2)=2
w(2, 3)=p(3)+q(3)+w(2, 2)=3

c(2, 3)=w(2, 3)+ min {c(2, 2)+c(3, 3)}=3

r(2, 3)=3

w(3, 4)=p(4)+q(4)+w(3, 3)=3

c(3, 4)=w(3, 4)+ min {c(3, 3)+c(4, 4)}=3

r(3, 4)=4
Knowing w(i, i+1) and c(i, i+1), 0 ≤ i < 4, we can again use equation (3) to compute w(i, i+2), c(i, i+2) and
r(i, i+2), 0 ≤ i < 3. This process can be repeated until w(0, 4), c(0, 4), and r(0, 4) are obtained. The table of
figure 3.5 shows the results of this computation. The box in row i and column j shows the values of
w(j, j+i), c(j, j+i) and r(j, j+i) respectively. The computation is carried out by rows from row 0 to row 4.
From the table we see that c(0, 4) = 32 is the minimum cost of a binary search tree for (a1, a2, a3, a4). The
root of tree t04 is a2. Hence, the left subtree is t01 and the right subtree is t24. Tree t01 has root a1 and subtrees
t00 and t11. Tree t24 has root a3; its left subtree is t22 and its right subtree is t34. Thus, with the data in the
table it is possible to reconstruct t04. Figure 3.6 shows t04.

      0            1            2            3            4
0     w00=2        w11=3        w22=1        w33=1        w44=1
      c00=0        c11=0        c22=0        c33=0        c44=0
      r00=0        r11=0        r22=0        r33=0        r44=0
1     w01=8        w12=7        w23=3        w34=3
      c01=8        c12=7        c23=3        c34=3
      r01=1        r12=2        r23=3        r34=4
2     w02=12       w13=9        w24=5
      c02=19       c13=12       c24=8
      r02=1        r13=2        r24=3
3     w03=14       w14=11
      c03=25       c14=19
      r03=2        r14=2
4     w04=16
      c04=32
      r04=2

Figure: Computation of w(i, j), c(i, j) and r(i, j), 0 ≤ i ≤ j ≤ 4

Figure 3.6: The optimal binary search tree t04: the root is if, its left child is do, its right child is int, and
while is the right child of int.

The above example shows how equation (3) can be used to determine the c's and r's and also how to
reconstruct t0n knowing the r's. Let us examine the complexity of this procedure to evaluate the c's and r's.
The evaluation procedure described in the above example requires us to compute c(i, j) for j – i = 1, 2, …, n
in that order. When j – i = m, there are n – m + 1 c(i, j)'s to compute. The computation of each of these
c(i, j)'s requires us to find the minimum of m quantities (see equation (3)). Hence, each such c(i, j) can be
computed in time O(m). The total time to evaluate all the c(i, j)'s and r(i, j)'s is therefore

Σ_{1≤m≤n} (nm – m²) = O(n³)

We can do better than this using a result due to D. E. Knuth, which shows that the optimal k in equation (3)
can be found by limiting the search to the range r(i, j–1) ≤ k ≤ r(i+1, j). In this case the computing time
becomes O(n²). The function OBST uses this result to obtain the values of w(i, j), r(i, j) and c(i, j),
0 ≤ i ≤ j ≤ n, in O(n²) time. The tree t0n can be constructed from the values of r(i, j) in O(n) time.
OBST (p, q, n)
//given n distinct identifiers a1 < a2 < … < an
//and probabilities p[i], 1 ≤ i ≤ n, and q[i],
//0 ≤ i ≤ n, this algorithm computes the
//cost c[i, j] of optimal binary search
//tree tij for identifiers ai+1,…….aj. It also
//computes r[i, j], the root of tij. w[i ,j]
// is the weight of tij.
{
for (i=0; i<=n–1; i++)
{
//initialize
w[i, i]=q[i]
r[i, i]=0;
c[i, i]=0.0;
//optimal trees with one node
w[i, i+1]=q[i]+q[i+1]+p[i+1];
r[i, i+1]=i+1;
c[i, i+1]=q[i]+q[i+1]+p[i+1];
}
w[n, n]=q[n];
r[n, n]=0;
c[n, n]=0.0;
for (m=2; m<=n; m++) //find optimal trees with m nodes
for (i = 0; i<=n–m; i++)
{
j= i+m;
w[i, j]=w[i, j–1]+P[j]+q[j];
//solve equation (3) using Knuth’s result.
k = find (c, r, i, j);
//A value of l in the range r[i, j–1] ≤ l
//≤ r[i+1, j] that minimizes
//c[i, l–1]+c[l, j];
c[i, j]=w[i, j]+c[i, k–1]+c[k, j];
r[i, j]=k;
}
write (c[0, n], w[0, n], r[0, n]);
} //end OBST
Find (c, r, i, j)
{
min = ∞;
for (m=r[i, j–1]; m<=r[i+1, j]; m++)
{
if (c[i, m–1]+c[m, j]<min)


{
min = c[i, m–1]+c[m, j];
l = m;
}
}
return (l);
}
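For comparison, a plain Python sketch of the same computation (using the straightforward O(n³) recurrence
of equation (3), without Knuth's range restriction) reproduces the worked example above; it is an
illustration, not a replacement for OBST.

def obst(p, q):
    # p[1..n]: probabilities of the identifiers (p[0] unused);
    # q[0..n]: probabilities of the failure classes E0, ..., En
    n = len(q) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]
    for m in range(1, n + 1):                        # trees with m internal nodes
        for i in range(n - m + 1):
            j = i + m
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][best_k - 1] + c[best_k][j]
            r[i][j] = best_k
    return w, c, r

p = [0, 3, 3, 1, 1]                                  # scaled by 16, as in the example
q = [2, 3, 1, 1, 1]
w, c, r = obst(p, q)
print(c[0][4], r[0][4])                              # 32 2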

Student Activity 3.3


Answer the following questions:
1. Explain the need for optimal binary search tree.
2. Give the formula to find expected cost of a binary search tree.

Summary
• The principle of optimality states that an optimal sequence of decisions has the property that whatever
the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with
regard to the state resulting from the first decision.
• Rather than computing the optimal cost directly from the recursive solution, we compute the optimal
cost by using a bottom-up approach.

I. True and False


1. In greedy method only one decision sequence is ever generated
2. Principle of optimality does not hold for dynamic programming
II. Fill in the blanks
1. Matrix chain multiplication problem can be solved by _______________.
2. To make binary search tree more efficient we need to find ___________________.

I. True and False



1. True
2. False
II. Fill in the blanks
1. Dynamic programming
2. Optimal binary search tree

1. Fill in the blanks:


(a) The essential difference between the greedy method and ______ is that in the _______ method
only one decision sequence is ever generated.
(b) An __________ to an instance of the matrix-chain multiplication problem contains within it optimal
solutions to subproblem instances.
(c) ____________ is an algorithm design method that can be used when the solution to a problem
can be viewed as the result of a sequence of decision.
(d) In dynamic programming an optimal sequence of decision is obtained by making implicit appeal
to the _________.

Exercise
1. Find an optimal parenthesization of a matrix-chain product whose sequence of dimensions is (5, 10,
3, 12, 5, 50, 6).
2. Give an efficient algorithm Print-Optimal-Parens to print the optimal parenthesization of a matrix
chain given the table s computed by Matrix chain order. Analyze your algorithm.
3. Show that a full parenthesization of n-element expression has exactly n–1 pairs of parentheses.
4. Let R(i, j) be the number of times that table entry m[i, j] is referenced by Matrix chain order in
computing other table entries. Show that the total number of references for the entire table is

Σ_{i=1}^{n} Σ_{j=i}^{n} R(i, j) = (n³ – n)/3

(Hint: You may find the identity Σ_{i=1}^{n} i² = n(n+1)(2n+1)/6 useful.)

5. Use the function OBST to compute w(i, j), r(i, j), and c(i, j), 0 ≤ i ≤ j ≤ 4, for the identifier set (a1, a2,
a3, a4) = (cout, float, if, while) with
p(1)=1/20, p(2)=1/5, p(3)=1/10, p(4)=1/20,
q(0)=1/5, q(1)=1/10, q(2)=1/5, q(3)=1/20 and q(4)=1/20.
Using the r(i, j)’s, construct the optimal binary search tree.

6. (a) Show that the computing time of function OBST is O(n2).


(b) Write an algorithm to construct the optimal binary search tree given the roots
r(i, j), 0 ≤ i ≤ j ≤ n. Show that this can be done in time O(n).
Overview
Polynomial Time
NP-completeness and Reducibility
NP-completeness
NP-completeness Proofs
NP-complete Problems

NP Complete Problem

Learning Objectives
• Overview
• Polynomial-time
• NP-Completeness and Reducibility
• NP-Completeness Proofs
• NP-Complete Problems
Top

All of the algorithms we have studied thus far have been polynomial-time algorithms: on inputs of size n,
their worst-case running time is O(n^k), where k is a constant. Can all problems be solved in polynomial
time? The answer is no. Some problems, such as Turing's famous "Halting Problem," cannot be solved by
any computer, no matter how much time is provided. Other problems can be solved, but not in time O(n^k)
for any constant k. Problems that are solvable by polynomial-time algorithms are called tractable, and
problems that require superpolynomial time are regarded as intractable.
We discuss an interesting class of problems called the "NP-complete" problems in this chapter, whose
status is unknown. No polynomial-time algorithm has been discovered for any NP-complete problem, nor
are we yet able to prove a superpolynomial-time lower bound for any of them. The P ≠ NP question has
been one of the deepest, most perplexing open research problems in theoretical computer science since it
was first posed in 1971.
Many scientists believe that the NP-complete problems cannot be solved in polynomial time, i.e., that they
are intractable, because if any single NP-complete problem can be solved in polynomial time, then every
NP-complete problem can be solved in polynomial time.
If you want to design good algorithms you should understand the rudiments of the theory of NP-
completeness. If a problem's intractability can be established, as an engineer you would then do better
spending your time developing an approximation algorithm rather than searching for a fast algorithm that
solves the problem exactly. Some problems that seem no harder than sorting, graph searching or network
flow are in fact NP-complete. Hence we should become familiar with this important class of problems.

Student Activity 4.1


Before going to next section, answer the following questions:
1. Give the formal definition for the problem of finding the longest simple cycle in an undirected graph.
Give a related decision problem. Give the language corresponding to the decision problem.
2. What are abstract problems?
If your answers are correct, then proceed to next section.
Top

We begin our study of NP-completeness by defining polynomial-time solvable problems. Generally these
problems are regarded as tractable. We give three supporting arguments for this view.
The first argument is that, although it is reasonable to regard a problem that requires time Θ(n^100) as
intractable, there are very few practical problems that require time on the order of such a high-degree
polynomial. The practical polynomial-time problems require much less time.
Second, for many reasonable models of computation, a problem that can be solved in polynomial time in
one model can be solved in polynomial time in another.
And the third is that the class of polynomial-time solvable problems has nice closure properties. For
example, if we feed the output of one polynomial-time algorithm into the input of another, the resulting
composite algorithm is polynomial. If a polynomial-time algorithm makes a constant number of calls to
polynomial-time subroutines, the running time of the composite algorithm is polynomial.

To make clear the class of polynomial-time solvable problems, we first define what a problem is. We define
an abstract problem Q to be a binary relation on a set I of problem instances and a set S of problem
solutions. For example, remember the problem SHORTEST-PATH of finding a shortest path between two
given vertices in an unweighted, undirected graph G = (V, E). We define an instance for SHORTEST-PATH
as a triple consisting of a graph and two vertices, and a solution as a sequence of vertices in the graph (with
the empty sequence denoting that no path exists). The problem SHORTEST-PATH itself is the relation that
associates each instance of a graph and two vertices with a shortest path in the graph that joins the two
vertices. We can have more than one solution because shortest paths are not necessarily unique.
This formulation of an abstract problem is sufficient for our purposes. To keep things simple, the theory of
NP-completeness restricts attention to decision problems: those having a yes/no solution. In this case we
can view an abstract decision problem as a function that maps the instance set I to the solution set {0, 1}.
We can see this with an example: a decision problem PATH related to the shortest-path problem is "Given a
graph G = (V, E), two vertices p, q ∈ V and a positive integer k, does a path exist in G between p and q
whose length is at most k?" If i = (G, p, q, k) is an instance of this shortest-path problem, then PATH(i) = 1
(yes) if a shortest path from p to q has length at most k, and otherwise PATH(i) = 0 (no).
Certain other abstract problems are optimization problems, in which some value must be minimized or
maximized; these are not decision problems. If we want to apply the theory of NP-completeness to
optimization problems, however, we must recast them as decision problems. Typically, an optimization
problem can be recast by imposing a bound on the value to be optimized. For example, in recasting the
shortest-path problem as the decision problem PATH we added a bound k to the problem instance.
The requirement to recast optimization problems as decision problems does not diminish the impact of the
theory. Generally, if we are able to solve an optimization problem quickly, we will be able to solve its
related decision problem in short time: we simply compare the value obtained from the solution of the
optimization problem with the bound provided as input to the decision problem. If an optimization problem
is easy, therefore, its related decision problem is easy as well. Stated in a way that has more relevance to
NP-completeness, if we can provide evidence that a decision problem is hard, we also provide evidence that
its related optimization problem is hard. Thus, even though it restricts attention to decision problems, the
theory of NP-completeness applies much more widely.

If we want to make a computer program that can solve an abstract problem, we have to represent problem
instances in a way that the program understands. An encoding of a set S of abstract objects is a mapping e
from S to the set of binary strings. For example, we can encode the natural numbers N = {0, 1, 2, 3, 4, …}
as the strings {0, 1, 10, 11, 100, …}; with this encoding, e(17) = 10001. Anyone who has looked at
computer representations of keyboard characters is familiar with either the ASCII or EBCDIC codes. In the
ASCII code, e(A) = 1000001. Even a compound object can be encoded as a binary string by combining the
representations of its constituent parts. Polygons, graphs, functions, ordered pairs, and programs can all be
encoded as binary strings.
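As a small illustration (the helper names are ours, not from the text), the binary and unary encodings of a
natural number, and their very different lengths, can be written as:

def encode_binary(k):
    return bin(k)[2:]            # e.g. encode_binary(17) == '10001'

def encode_unary(k):
    return '1' * k               # a string of k 1's

print(encode_binary(17), len(encode_binary(17)))     # 10001 5
print(len(encode_unary(17)))                         # 17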
Hence a computer algorithm that solves some abstract decision problem will take an encoding of a problem
instance as input. A concrete problem is a problem whose instance set is a set of binary strings. An
algorithm solves a concrete problem in time O(T(n)) if, when it is provided a problem instance i of length
n = |i|, it can produce the solution in at most O(T(n)) time. A concrete problem is polynomial-time solvable,
therefore, if there exists an algorithm to solve it in time O(n^k) for some constant k.
We will now define the complexity class P as the set of concrete decision problems that can be solved in
polynomial-time.
Encodings can be used to map abstract problems to concrete problems. Given an abstract decision problem
Q mapping an instance set I to {0, 1}, an encoding e : I → {0, 1}* can be used to induce a related concrete
decision problem, which we denote by e(Q). If the solution to an abstract-problem instance i ∈ I is
Q(i) ∈ {0, 1}, then the solution to the concrete-problem instance e(i) ∈ {0, 1}* is also Q(i). There may be
some binary strings that represent no meaningful abstract-problem instance. For convenience, we shall
assume that any such string is mapped arbitrarily to 0. Thus the concrete problem produces the same
solutions as the abstract problem on binary-string instances that represent the encodings of abstract-problem
instances.
Now we generalize the definition of polynomial-time solvability from concrete problems to abstract
problems using encodings as the bridge, but we keep the definition independent of any particular encoding.
We would like to say that the efficiency of solving a problem does not depend on how the problem is
encoded. Unfortunately, it can depend on the encoding quite heavily. For example, suppose that an integer
k is to be provided as the sole input to an algorithm and suppose that the running time of the algorithm is
Θ(k). If the integer k is provided in unary (a string of k 1's), then the running time of the algorithm is O(n)
on length-n inputs, which is polynomial time. If we use the more natural binary representation of the
integer k, however, then the input length is n = ⌊lg k⌋ + 1. In this case, the running time of the algorithm is
Θ(k) = Θ(2^n), which is exponential in the size of the input. Thus, depending on the encoding, the
algorithm runs in either polynomial or superpolynomial time.
The encoding of an abstract problem is therefore quite important to our understanding of polynomial time;
we cannot really talk about solving an abstract problem without first specifying an encoding. Practically, if
we rule out expensive encodings such as unary ones, the actual encoding of a problem makes little
difference to whether the problem can be solved in polynomial time. For example, representing integers in
base 6 instead of binary has no effect on whether a problem is solvable in polynomial time, since an integer
represented in base 6 can be converted to one represented in base 2 in polynomial time.
A function f : {0,1}* → {0,1}* is polynomial-time computable if there is a polynomial-time algorithm A
that, for any input x ∈ {0,1}*, produces as output f(x). For some set I of problem instances, two encodings
e1 and e2 are polynomially related if there exist two polynomial-time computable functions f12 and f21 such
that for any i ∈ I, we have f12(e1(i)) = e2(i) and f21(e2(i)) = e1(i). That is, the encoding e2(i) can be computed
from the encoding e1(i) by a polynomial-time algorithm, and vice versa.

Let Q be an abstract decision problem on an instance set I, and let e1 and e2 be polynomially related
encodings on I. Then e1(Q) ∈ P if and only if e2(Q) ∈ P.

We focus on decision problems because they make it easy to use the machinery of formal-language theory.
We now define some terminology. An alphabet Σ is a finite set of symbols. A language L over Σ is any set
of strings made up of symbols from Σ. For example, if Σ = {a, b}, the set L = {aa, bb, ab, ...} is a language,
and the empty language is denoted by ∅. The language of all strings over Σ is denoted by Σ*. For example,
if Σ = {a, b}, then Σ* = {ε, a, b, aa, ab, ba, bb, aab, ...} is the set of all strings of a's and b's. Hence it is
clear that every language L over Σ is a subset of Σ*.
We can perform a variety of operations on a language. For example, the union and intersection operations
follow directly from the set definitions. The complement of L is defined as Σ* − L. The concatenation of
two languages L1 and L2 is the language

L = {xy : x ∈ L1 and y ∈ L2}.

The closure or Kleene star of a language L is the language

L* = {ε} ∪ L ∪ L² ∪ L³ ∪ …,

where L^k is the language obtained by concatenating L with itself k times.
The set of instances for any decision problem Q is simply the set Σ*, where Σ = {0, 1}. Since Q is
completely characterized by those problem instances that produce a 1 (yes) answer, we can think of Q as a
language L over Σ = {0, 1}, where
L = {x ∈ {0,1}* : Q(x) = 1}.
For example, the decision problem PATH has the corresponding language

PATH = {⟨G, u, v, k⟩ : G = (V, E) is an undirected graph, u, v ∈ V, k ≥ 0 is an integer, and there exists a
path from u to v in G whose length is at most k}.

By using the formal-language framework we are able to express the relation between decision problems
and the algorithms that solve them concisely. An algorithm A is said to accept a string x ∈ {0,1}* if, given
input x, the output of the algorithm is A(x) = 1. The language accepted by algorithm A is the set
L = {x ∈ {0,1}* : A(x) = 1}, that is, the set of strings that the algorithm accepts. An algorithm A rejects a
string x if A(x) = 0.

Even if a language L is accepted by an algorithm A, the algorithm will not necessarily reject a string x ∉ L;
for example, the algorithm may loop forever on such a string. A language L is decided by an algorithm A if
A accepts every string in L and rejects every string not in L. A language L is accepted in polynomial time
by an algorithm A if for any length-n string x ∈ L the algorithm accepts x in time O(n^k) for some constant
k. A language L is decided in polynomial time by an algorithm A if for any length-n string x ∈ {0,1}*, the
algorithm decides x in time O(n^k) for some constant k. Thus, to accept a language, an algorithm need only
worry about strings in L, but to decide a language, it must correctly accept or reject every string in {0,1}*.
As an example, the language PATH can be accepted in polynomial time. One such polynomial-time
algorithm uses breadth-first search to compute the shortest path from u to v in G, and then compares the
distance obtained with k. If the distance is at most k, the algorithm outputs 1 and halts. Otherwise, the
algorithm runs forever. This algorithm does not decide PATH, however, since it does not explicitly output
0 for instances in which the shortest path has length greater than k. A decision algorithm for PATH should
explicitly reject binary strings that do not belong to PATH. For a decision problem such an algorithm is not
difficult to design.
A complexity class is defined as a set of languages, membership in which is determined by a complexity
measure, such as running time, of an algorithm that determines whether a given string x belongs to the
language L.
With the help of this language theoretic framework, we can provide an alternative definition of the
complexity class P:
P = {L ⊆ {0,1}* : there exists an algorithm A that decides L in polynomial-time}.
P is also the class of languages that can be accepted in polynomial-time.

P = {L: L is accepted by a polynomial-time algorithm}.

We need only show that if L is accepted by a polynomial-time algorithm, then it is decided by a
polynomial-time algorithm, since the class of languages decided by polynomial-time algorithms is a subset
of the class of languages accepted by polynomial-time algorithms. Let L be a language accepted by some
polynomial-time algorithm B. Because B accepts L in time O(n^k) for some constant k, there also exists a
constant c such that B accepts L in at most T = cn^k steps. For any input string x, the algorithm B' simulates
the action of B for time T. At the end of time T, algorithm B' inspects the behavior of B. If B has accepted
x, then B' accepts x by outputting a 1. If B has not accepted x, then B' rejects x by outputting a 0. The
overhead of B' simulating B does not increase the running time by more than a polynomial factor, and thus
B' is a polynomial-time algorithm that decides L.
Student Activity 4.2
Before going to next section, answer the following questions:
1. What are abstract problems?
2. Describe a formal language.
If your answers are correct, then proceed to next section.
Top

NP-Completeness and Reducibility
The reason that theoretical computer scientists believe that P ≠ NP is the existence of the class of "NP-complete" problems. This class has an interesting property: if any one NP-complete problem can be solved in polynomial time, then every problem in NP has a polynomial-time solution, that is, P = NP. No polynomial-time algorithm has ever been discovered for any NP-complete problem, despite decades of effort.
The language HAM-CYCLE is one NP-complete problem. If we could decide HAM-CYCLE in polynomial time, then we could solve every problem in NP in polynomial time. In fact, if NP - P should turn out to be nonempty, we could say with certainty that HAM-CYCLE ∈ NP - P.


The NP-complete languages are the "hardest" languages in NP. We shall show how to compare the relative
"hardness" of languages using "polynomial-time reducibility." First, we formally define the NP-complete
languages, and then we sketch a proof that one such language, called CIRCUIT-SAT, is NP-complete. We
use the notion of reducibility to show that many other problems are NP-complete.

Reducibility
A problem P can be reduced to another problem P' if any instance of P can be "easily rephrased" as an instance of P', the solution to which provides a solution to the instance of P. For example, the problem of solving linear equations in an indeterminate x reduces to the problem of solving quadratic equations. Given an instance ax + b = 0, we can transform it to 0x² + ax + b = 0, whose solution provides a solution to ax + b = 0. Thus, if a problem P reduces to another problem P', then P is, in a sense, "no harder to solve" than P'.
Returning to our formal-language framework for decision problems, a language L1 is said to be polynomial-time reducible to a language L2, written L1 ≤P L2, if there exists a polynomial-time computable function f : {0,1}* → {0,1}* such that for all x ∈ {0,1}*,
x ∈ L1 if and only if f(x) ∈ L2. .........................................(1)
The function f is called the reduction function, and a polynomial-time algorithm F that computes f is known as a reduction algorithm.
The idea of a polynomial-time reduction from a language L1 to another language L2 is given in Figure 4.2. Each language is a subset of {0,1}*. The reduction function f gives a polynomial-time mapping such that if x ∈ L1 then f(x) ∈ L2; likewise, if x ∉ L1, then f(x) ∉ L2. Thus, the reduction function maps any instance x of the decision problem represented by the language L1 to an instance f(x) of the problem represented by the language L2. Providing an answer to whether f(x) ∈ L2 directly provides the answer to whether x ∈ L1.

Figure 4.2: An illustration of a polynomial-time reduction: the function f maps instances of the decision problem represented by L1 to instances of the problem represented by L2.

Polynomial-time reductions give us a powerful tool for proving that various languages belong to P.

Lemma
If L1, L2 ⊆ {0,1}* are languages such that L1 ≤P L2, then L2 ∈ P implies L1 ∈ P.

Proof: Let L2 be decided by a polynomial-time algorithm B2, and let F be a polynomial-time reduction algorithm that computes the reduction function f. We shall construct a polynomial-time algorithm B1 that decides L1.
The construction of B1 is given in Figure 4.2. For a given input x ∈ {0,1}*, the algorithm B1 uses F to transform x into f(x), and then it uses B2 to test whether f(x) ∈ L2. The output of B2 is the value provided as the output of B1. The algorithm B1 runs in polynomial time, since both F and B2 run in polynomial time.
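A schematic sketch of this construction follows (the helper names are hypothetical): B1 is simply the composition of the reduction algorithm F with the decider B2.

# A minimal sketch of building B1 from F and B2.
def make_B1(F, B2):
    def B1(x):
        y = F(x)        # polynomial-time reduction: y = f(x)
        return B2(y)    # polynomial-time decision for L2; the answer carries over to L1
    return B1

# Toy example: L1 = binary strings with even value, L2 = strings ending in '0'.
# f maps a string to itself, since x is even exactly when its last bit is 0.
F  = lambda x: x
B2 = lambda y: 1 if y.endswith('0') else 0
B1 = make_B1(F, B2)
print(B1("1010"), B1("0111"))   # 1 0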

Student Activity 4.3


Before going to next section, answer the following questions:
1. What do you mean by NP-completeness?
2. What is reducibility?
If your answers are correct, then proceed to next section.
Top

NP-Completeness
Polynomial-time reductions give a formal means for showing that one problem is at least as hard as another, to within a polynomial-time factor. Hence, if L1 ≤P L2, then L1 is not more than a polynomial factor harder than L2, which is why the "less than or equal to" notation for reduction is mnemonic. We can now define the set of NP-complete languages, which are the hardest problems in NP.
A language L ⊆ {0,1}* is NP-complete if
1. L ∈ NP, and
2. L1 ≤P L for every L1 ∈ NP.

Figure 4.3: How most theoretical computer scientists view the relationships among P, NP, and NPC: both P and NPC are wholly contained within NP, and P ∩ NPC = ∅.

If a language L satisfies property 2, but not necessarily property 1, we say that L is NP-hard. We also define NPC to be the class of NP-complete languages.
As the following theorem shows, NP-completeness is at the crux of deciding whether P is in fact equal to NP.

If any NP-complete problem is polynomial-time solvable then P = NP. If any problem in NP is not
polynomial-time solvable, then all NP-complete problems are not polynomial-time solvable.
Proof: Suppose that L ∈ P and L ∈ NPC. For any L1 ∈ NP, we have L1 ≤P L by property 2 of the definition of NP-completeness. Thus, by the Lemma, we also have L1 ∈ P, which proves the first statement of the theorem.
To prove the second statement, suppose that there exists an L ∈ NP such that L ∉ P. Let L1 ∈ NPC be any NP-complete language, and, for the purpose of contradiction, assume that L1 is polynomial-time solvable, that is, L1 ∈ P. Then, since L ≤P L1, the Lemma gives L ∈ P, contradicting our assumption.

It is for this reason that research into the P ≠ NP question centers around the NP-complete problems. Most computer scientists think that P ≠ NP, which leads to the relationship among P, NP, and NPC described above. But, for all we know, someone may yet come up with a polynomial-time algorithm for an NP-complete problem, thus proving that P = NP. Nevertheless, since no polynomial-time algorithm for any NP-complete problem has yet been discovered, a proof that a problem is NP-complete provides excellent evidence for its intractability.

Circuit Satisfiability
Up to this point, we have not actually proved that any problem is NP-complete, though we have defined the notion. Once we prove that at least one problem is NP-complete, polynomial-time reducibility can be used as a tool to prove the NP-completeness of other problems. So we will focus on showing the existence of an NP-complete problem: the circuit-satisfiability problem.

Figure 4.4: Two instances of the circuit-satisfiability problem: (a) a satisfiable circuit and (b) an unsatisfiable circuit, each with three inputs and a single output.

We shall informally describe a proof that relies on a basic understanding of boolean combinational circuits. Two boolean combinational circuits are shown in Figure 4.4. Each circuit has three inputs and one output. A truth assignment is a set of boolean input values for the circuit. We say that a one-output boolean combinational circuit is satisfiable if it has a satisfying assignment: a truth assignment that causes the output of the circuit to be 1. The circuit in Figure 4.4(a) has the satisfying assignment x1 = 1, x2 = 1, x3 = 0, and so it is satisfiable. No assignment of values to x1, x2, and x3 causes the circuit in Figure 4.4(b) to produce a 1 output; it always produces 0, and so it is unsatisfiable.
Now we state the circuit-satisfiability problem as, "Given a boolean combinational circuit composed of AND, OR, and NOT gates, is it satisfiable?" In order to pose this question formally, however, we must agree on a standard encoding. We can devise a graph-like encoding that maps any given circuit C into a binary string ⟨C⟩ whose length is not much larger than the size of the circuit itself. As a formal language, we can therefore define
CIRCUIT-SAT = {⟨C⟩ : C is a satisfiable boolean combinational circuit}.


The circuit-satisfiability problem is of great importance in the area of computer-aided hardware optimization. If a circuit always produces 0, it can be replaced by a simpler circuit that omits all logic gates and provides the constant 0 value as its output. A polynomial-time algorithm for the problem would therefore have considerable practical application.
Suppose we are given a circuit C. We might attempt to determine whether it is satisfiable by simply checking all possible assignments to the inputs. But if there are k inputs, there are 2^k possible assignments. When the size of C is polynomial in k, checking each one leads to a superpolynomial-time algorithm. In fact, as has been claimed, there is strong evidence that no polynomial-time algorithm exists that solves the circuit-satisfiability problem, because circuit satisfiability is NP-complete. We break the proof of this fact into two parts, based on the two parts of the definition of NP-completeness.
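The brute-force approach just described can be sketched as follows (representing the circuit as a Python function is our own simplification, and the example circuit is hypothetical): all 2^k assignments are enumerated, which is exponential in the number of inputs.

# A minimal sketch of the naive satisfiability check over all 2^k assignments.
from itertools import product

def is_satisfiable(circuit, k):
    for bits in product((0, 1), repeat=k):
        if circuit(*bits) == 1:
            return True, bits      # satisfying assignment found
    return False, None

# Example circuit with three inputs: (x1 AND x2) AND (NOT x3).
circuit = lambda x1, x2, x3: int(x1 and x2 and not x3)
print(is_satisfiable(circuit, 3))   # (True, (1, 1, 0))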

Lemma
The circuit-satisfiability problem is in the class NP.
Proof: We can give a two-input, polynomial-time algorithm B that verifies CIRCUIT-SAT. One of the inputs to B is a boolean combinational circuit C. The other input is a certificate corresponding to an assignment of boolean values to the wires in C.
The algorithm B can be designed as follows. For each logic gate in the circuit, it checks that the value provided by the certificate on the output wire is correctly computed as a function of the values on the input wires. If the output of the entire circuit is 1, the algorithm outputs 1, since the values assigned to the inputs of C provide a satisfying assignment. Otherwise, B outputs 0.
Whenever a satisfiable circuit C is input to algorithm B, there is a certificate whose length is polynomial in the size of C and that causes B to output a 1. Whenever an unsatisfiable circuit is input, no certificate can fool B into believing that the circuit is satisfiable. Algorithm B runs in polynomial time: with a good implementation, linear time suffices. Thus CIRCUIT-SAT can be verified in polynomial time, and CIRCUIT-SAT ∈ NP.
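A minimal sketch of such a verifier B follows (the gate-list representation of a circuit is assumed only for illustration): it checks every gate against the certificate and then checks the output wire.

# Sketch of a gate-by-gate certificate verifier for CIRCUIT-SAT.
OPS = {'AND': lambda a, b: a & b, 'OR': lambda a, b: a | b, 'NOT': lambda a: 1 - a}

def verify_circuit(gates, output_wire, cert):
    # gates: list of (out_wire, op_name, [in_wires]); cert: dict wire -> 0/1
    for out, op, ins in gates:
        if cert[out] != OPS[op](*(cert[w] for w in ins)):
            return 0                       # certificate inconsistent with some gate
    return 1 if cert[output_wire] == 1 else 0

# Example: g = (x1 AND x2), out = NOT g, with a consistent satisfying certificate.
gates = [('g', 'AND', ['x1', 'x2']), ('out', 'NOT', ['g'])]
cert = {'x1': 1, 'x2': 0, 'g': 0, 'out': 1}
print(verify_circuit(gates, 'out', cert))   # 1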
To prove that CIRCUIT-SAT is NP-complete, we now show that the language is NP-hard. Hence we have to show that every language in NP is polynomial-time reducible to CIRCUIT-SAT. The actual proof of this fact is full of technical intricacies, and so we shall settle for a sketch of the proof based on some understanding of the working of computer hardware.

As we know, the memory of a computer stores a program as a sequence of instructions. A typical instruction, as encoded in memory, specifies an operation to be performed, the locations of its operands, and an address where the result is to be stored. The program counter keeps track of which instruction is to be executed next. The program counter is automatically incremented whenever an instruction is fetched, thereby causing the computer to execute instructions sequentially. The execution of an instruction can, however, cause a value to be written to the program counter; then the normal sequential execution is altered, allowing the computer to loop and perform conditional branches.

At any time in the execution of a program, the entire state of the computation is represented in the computer's memory. Any particular state of computer memory is called a configuration. The execution of an instruction maps one configuration to another. Importantly, the computer hardware that accomplishes this mapping can be implemented as a boolean combinational circuit, which we denote by M in the proof of the following lemma.

The circuit-satisfiability problem is NP-hard.



Proof: Let L be any language in NP. We shall give a polynomial-time algorithm F that computes a reduction function f mapping every binary string x to a circuit C = f(x) such that
x ∈ L if and only if C ∈ CIRCUIT-SAT.
Since L ∈ NP, there must exist an algorithm A that verifies L in polynomial time. The algorithm F that we shall construct uses the two-input algorithm A to compute the reduction function f.

Let T(n) denote the worst-case running time of algorithm A on length-n input strings, and let k ≥ 1 be a constant such that T(n) = O(n^k) and the length of the certificate is O(n^k). (The running time of A is actually a polynomial in the total input size, which includes both an input string and a certificate, but since the length of the certificate is polynomial in the length n of the input string, the running time is polynomial in n.)

The basic idea of the proof is to represent the computation of A as a sequence of configurations. As shown in Figure 4.5, each configuration can be broken into parts consisting of the program for A, the program counter and auxiliary machine state, the input x, the certificate y, and working storage. Starting with an initial configuration c0, each configuration ci is mapped to the subsequent configuration ci+1 by the combinational circuit M implementing the computer hardware.

Figure 4.5: The sequence of configurations c0, c1, c2, ..., cT(n) produced by chaining T(n) copies of the combinational circuit M, starting from the initial configuration.

The output of the algorithm A (0 or 1) is written to some designated location in the working storage when A finishes executing, and if we assume that thereafter A halts, the value never changes. Thus, if the algorithm runs for at most T(n) steps, the output appears as one of the bits in cT(n).

The reduction algorithm F constructs a single combinational circuit that computes all configurations produced from a given initial configuration. That is, we paste together T(n) copies of the circuit M. The output of the ith circuit, which produces configuration ci, is fed directly into the input of the (i + 1)st circuit. Thus the configurations, rather than ending up in a state register, simply reside as values on the wires connecting copies of M.
Recall what the polynomial-time reduction algorithm F must do. Given an input x, it must compute a circuit C = f(x) that is satisfiable if and only if there exists a certificate y such that A(x, y) = 1. When F obtains an input x, it first computes n = |x| and constructs a combinational circuit C' consisting of T(n) copies of M. The input to C' is an initial configuration corresponding to a computation on A(x, y), and the output is the configuration cT(n).

The circuit C = f(x) that F computes is obtained by modifying C' slightly. First, the inputs to C' corresponding to the program for A, the initial program counter, the input x, and the initial state of memory are wired directly to these known values. Thus the only remaining inputs to the circuit correspond to the certificate y. Second, all outputs of the circuit are ignored except the one bit of cT(n) corresponding to the output of A. This circuit C, so constructed, computes C(y) = A(x, y) for any input y of length O(n^k). The reduction algorithm F, when provided an input string x, computes such a circuit C and outputs it.
Two properties remain to be proved. First, we must show that F correctly computes a reduction function f, that is, that C is satisfiable if and only if there exists a certificate y such that A(x, y) = 1. Second, we must show that F runs in polynomial time.
To see that F correctly computes a reduction function, suppose first that there exists a certificate y of length O(n^k) such that A(x, y) = 1. Then, if we apply the bits of y to the inputs of C, the output of C is C(y) = A(x, y) = 1. Thus, if a certificate exists, then C is satisfiable. Conversely, suppose that C is satisfiable. Then there exists an input y to C such that C(y) = 1, from which we conclude that A(x, y) = 1. Thus, F correctly computes a reduction function.
To complete the proof, we need only show that F runs in time polynomial in n = |x|. The first observation we make is that the number of bits needed to represent a configuration is polynomial in n. The program for A itself has constant size, independent of the length of its input x. The length of the input x is n, and the length of the certificate y is O(n^k). Since the algorithm runs for at most O(n^k) steps, the amount of working storage required by A is polynomial in n as well. (We assume that this memory is contiguous.)
The combinational circuit M implementing the computer hardware has size polynomial in the length of a configuration, which is polynomial in O(n^k) and hence polynomial in n. (Most of this circuitry implements the logic of the memory system.) The circuit C consists of at most t = O(n^k) copies of M, and hence it has size polynomial in n. The construction of C from x can be accomplished in polynomial time by the reduction algorithm F, since each step of the construction takes polynomial time.

The language CIRCUIT-SAT is therefore at least as hard as any language in NP, and since it belongs to NP, it is NP-complete.

The circuit-satisfiability problem is NP-complete.


Proof: Immediate from the two preceding lemmas and the definition of NP-completeness.

Student Activity 4.4


Before going to next section, answer the following questions:
1. Prove that the circuit-satisfiability problem is NP-hard.
2. Prove that the circuit-satisfiability problem belongs to the class NP.
If your answers are correct, then proceed to next section.
Top

NP-Completeness Proofs
The NP-completeness of the circuit-satisfiability problem depends on a direct proof that L ≤P CIRCUIT-SAT for every language L ∈ NP. Here we shall show how to prove that languages are NP-complete without directly reducing every language in NP to the given language.
The following lemma provides the basis for showing that a language is NP-complete.

If L is a language such that L' ≤P L for some L' ∈ NPC, then L is NP-hard. Moreover, if L ∈ NP, then L ∈ NPC.

Proof: For all L'' ∈ NP, we have L'' ≤P L', because L' is NP-complete. By supposition, L' ≤P L, and thus by transitivity we have L'' ≤P L, which shows that L is NP-hard. If, in addition, L ∈ NP, then L ∈ NPC.

We can say that by reducing a known NP-complete language L' to L, we implicitly reduce every language in NP to L. Thus the lemma gives us a method for proving that a language L is NP-complete:
1. Prove L ∈ NP.
2. Select a known NP-complete language L'.
3. Describe an algorithm that computes a function f mapping every instance of L' to an instance of L.
4. Prove that the function f satisfies x ∈ L' if and only if f(x) ∈ L for all x ∈ {0,1}*.
5. Prove that the algorithm computing f runs in polynomial time.




In the first step of the reduction, we introduce a variable yi for each connective of the input formula φ and rewrite φ as the AND of the root variable and clauses describing the operation of each connective. For the example formula, the resulting expression is
φ' = y1 ∧ (y1 ↔ (y2 ∧ ¬x2))
∧ (y2 ↔ (y3 ∨ y4))
∧ (y3 ↔ (x1 → x2))
∧ (y4 ↔ ¬y5)
∧ (y5 ↔ (y6 ∨ x4))
∧ (y6 ↔ (¬x1 ↔ x3)).

It should be noted that the formula φ' thus obtained is a conjunction of clauses φ'i, each of which has at most 3 literals; the only additional requirement remaining is that each clause be an OR of literals.

Now we convert each clause φ'i into conjunctive normal form. We construct a truth table for φ'i by evaluating all possible assignments of its variables. Each row of the truth table consists of a possible assignment of the variables of the clause, together with the value of the clause under that assignment. Using the truth-table entries that evaluate to 0, we build a formula in disjunctive normal form (or DNF), an OR of ANDs, that is equivalent to ¬φ'i. We then convert this formula into a CNF formula φ''i by applying DeMorgan's laws: we complement all literals and change ORs into ANDs and ANDs into ORs.

As an example, we convert the clause φ'1 = (y1 ↔ (y2 ∧ ¬x2)) into CNF as follows. The DNF formula equivalent to ¬φ'1, obtained from the rows of the truth table of φ'1 that evaluate to 0, is
(y1 ∧ y2 ∧ x2) ∨ (y1 ∧ ¬y2 ∧ x2) ∨ (y1 ∧ ¬y2 ∧ ¬x2) ∨ (¬y1 ∧ y2 ∧ ¬x2).

By applying DeMorgan's laws we get the CNF formula
φ''1 = (¬y1 ∨ ¬y2 ∨ ¬x2) ∧ (¬y1 ∨ y2 ∨ ¬x2)
∧ (¬y1 ∨ y2 ∨ x2) ∧ (y1 ∨ ¬y2 ∨ x2),

which is equivalent to the original clause φ'1.
Each clause φ’i of the formula φ’i has now been converted into a CNF formula φ”i and thus φ’i is
equivalent to the CNF formula φ” consisting of the conjunction of the φ”i. Moreover each clause of φ”i has
at most 3 literals.
The third and final step of the reduction further transforms the formula so that each clause has exactly 3 distinct literals. The final 3-CNF formula φ''' is constructed from the clauses of the CNF formula φ''. It uses two auxiliary variables, p and q. For each clause Ci of φ'', we include the following clauses in φ''':

• If Ci has 3 distinct literals, then simply include Ci as a clause of φ'''.

• If Ci has 2 distinct literals, that is, if Ci = (l1 ∨ l2), where l1 and l2 are literals, then include (l1 ∨ l2 ∨ p) ∧ (l1 ∨ l2 ∨ ¬p) as clauses of φ'''. The literals p and ¬p merely fulfill the syntactic requirement that there be exactly 3 distinct literals per clause: (l1 ∨ l2 ∨ p) ∧ (l1 ∨ l2 ∨ ¬p) is equivalent to (l1 ∨ l2) whether p = 0 or p = 1.

• If Ci has only 1 distinct literal l, then include (l ∨ p ∨ q) ∧ (l ∨ p ∨ ¬q) ∧ (l ∨ ¬p ∨ q) ∧ (l ∨ ¬p ∨ ¬q) as clauses of φ'''. Note that every setting of p and q causes the conjunction of these four clauses to evaluate to l.
We can see that the 3-CNF formula φ''' is satisfiable if and only if φ is satisfiable by inspecting each of the three steps. Like the reduction from CIRCUIT-SAT to SAT, the construction of φ' from φ in the first step preserves satisfiability. The second step produces a CNF formula φ'' that is algebraically equivalent to φ'. The third step produces a 3-CNF formula φ''' that is effectively equivalent to φ'', because any assignment to the variables p and q produces a formula that is algebraically equivalent to φ''.
We also have to show that the reduction can be computed in polynomial time. Constructing φ' from φ introduces at most 1 variable and 1 clause per connective of φ. Constructing φ'' from φ' can introduce at most 8 clauses into φ'' for each clause of φ', since each clause of φ' has at most 3 variables, and its truth table has at most 2^3 = 8 rows. Similarly, the construction of φ''' from φ'' introduces at most 4 clauses into φ''' for each clause of φ''. Hence the size of φ''' is polynomial in the length of the original formula, and each of the constructions can easily be accomplished in polynomial time.
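For concreteness, the following sketch (our own illustration) carries out the truth-table step of the second transformation for a single small clause: every falsifying assignment contributes one CNF clause (the OR of the complemented literals of that row), exactly as DeMorgan's laws dictate.

# Sketch: convert a clause over at most three variables to CNF via its truth table.
from itertools import product

def clause_to_cnf(phi, names):
    cnf = []
    for bits in product((0, 1), repeat=len(names)):
        if phi(*bits) == 0:
            # negate the row: a variable set to 1 appears negated, and vice versa
            cnf.append([('-' if b else '') + v for v, b in zip(names, bits)])
    return cnf

# phi'_1 = (y1 <-> (y2 AND NOT x2)), as in the worked example above.
phi1 = lambda y1, y2, x2: int(y1 == (y2 and not x2))
for clause in clause_to_cnf(phi1, ['y1', 'y2', 'x2']):
    print(' OR '.join(clause))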
Top

NP-Complete Problems
NP-complete problems arise in many domains: boolean logic, arithmetic, automata and language theory, network design, sets and partitions, storage and retrieval, sequencing and scheduling, graphs, mathematical programming, algebra and number theory, games and puzzles, program optimization, and so on. Here we use the reduction methodology to provide NP-completeness proofs for problems related to graph theory and set partitioning.

The Clique Problem
A clique in an undirected graph G = (V, E) is a subset V' ⊆ V of vertices, each pair of which is connected by an edge in E. In other words, a clique is a complete subgraph of G. The size of a clique is defined as the number of vertices it contains. As a formal language, the clique problem is
CLIQUE = {⟨G, k⟩ : G is a graph with a clique of size k}.

A naive algorithm for determining whether a graph G = (V, E) with |V| vertices has a clique of size k is to list all k-subsets of V and check each one to see whether it forms a clique. The running time of this algorithm is Ω(k^2 · (|V| choose k)), which is polynomial if k is a constant. In general, however, k could be proportional to |V|, in which case the algorithm runs in superpolynomial time. We can say that an efficient algorithm for the clique problem is unlikely to exist.
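The naive algorithm can be sketched as follows (an illustration, not an efficient method; the example graph is hypothetical): it enumerates every k-subset of V and tests all pairs inside it.

# Sketch of the naive clique check over all k-subsets of V.
from itertools import combinations

def has_clique(vertices, edges, k):
    E = {frozenset(e) for e in edges}
    for subset in combinations(vertices, k):
        if all(frozenset((u, v)) in E for u, v in combinations(subset, 2)):
            return True
    return False

# Example: a triangle plus a pendant vertex has a clique of size 3 but not 4.
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')]
print(has_clique(V, E, 3), has_clique(V, E, 4))   # True False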

The clique problem is NP-complete.


Proof: We first show that CLIQUE ∈ NP. For a given graph G = (V, E), we use the set V' ⊆ V of vertices in the clique as a certificate for G. Checking whether V' is a clique can be accomplished in polynomial time by checking, for each pair u, v ∈ V', whether the edge (u, v) belongs to E.
We next show that the clique problem is NP-hard by proving that 3-CNF-SAT ≤P CLIQUE. That we should be able to prove this result is somewhat surprising, since on the surface logical formulas seem to have little to do with graphs.

Figure 4.9: The graph G derived from the 3-CNF formula φ = C1 ∧ C2 ∧ C3, where C1 = (x1 ∨ ¬x2 ∨ ¬x3), C2 = (¬x1 ∨ x2 ∨ x3), and C3 = (x1 ∨ x2 ∨ x3), in reducing 3-CNF-SAT to CLIQUE. Each clause contributes a triple of vertices, and a clique of size k = 3 corresponds to a satisfying assignment.

The reduction algorithm begins with an instance of 3-CNF-SAT. Let φ = C1 ∧ C2 ∧ ... ∧ Ck be a boolean formula in 3-CNF with k clauses. For r = 1, 2, ..., k, each clause Cr has exactly three distinct literals l1^r, l2^r, and l3^r. We construct a graph G such that φ is satisfiable if and only if G has a clique of size k.
The graph G = (V, E) is constructed as follows. For each clause Cr = (l1^r ∨ l2^r ∨ l3^r) in φ, we place a triple of vertices v1^r, v2^r, and v3^r in V. We put an edge between two vertices vi^r and vj^s if both of the following hold:

• vi^r and vj^s are in different triples, that is, r ≠ s, and
• their corresponding literals are consistent, that is, li^r is not the negation of lj^s.

The graph G can easily be computed from φ in polynomial time. As an example of this construction, if we have
φ = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x2 ∨ x3),
then G is the graph shown in Figure 4.9.
Now we show that this transformation of φ into G is a reduction. First, suppose that φ has a satisfying assignment. Then each clause Cr contains at least one literal li^r that is assigned 1, and each such literal corresponds to a vertex vi^r. Picking one such "true" literal from each clause yields a set V' of k vertices. We claim that V' is a clique. For any two vertices vi^r, vj^s ∈ V' with r ≠ s, both corresponding literals li^r and lj^s are mapped to 1 by the given satisfying assignment, and thus the literals cannot be complements. Therefore, by the construction of G, the edge (vi^r, vj^s) belongs to E.


Conversely, suppose that G has a clique V' of size k. No edges in G connect vertices in the same triple, and so V' contains exactly one vertex per triple. We can assign 1 to each literal li^r such that vi^r ∈ V' without fear of assigning 1 to both a literal and its complement, since G contains no edges between inconsistent literals. Each clause is then satisfied, and so φ is satisfied. (Any variables that correspond to no vertex in the clique may be set arbitrarily.)
In the example of Figure 4.9, a satisfying assignment of φ is x1 = 0, x2 = 0, x3 = 1. A corresponding clique of size k = 3 consists of the vertices corresponding to ¬x2 from the first clause, x3 from the second clause, and x3 from the third clause.
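A sketch of the reduction itself follows (the literal encoding +m for xm and -m for ¬xm is our own convention): one vertex is created per literal occurrence, and an edge joins two vertices exactly when they lie in different clauses and are not complementary.

# Sketch of the 3-CNF-SAT to CLIQUE reduction.
from itertools import combinations

def cnf3_to_clique(clauses):
    vertices = [(r, i) for r, clause in enumerate(clauses) for i in range(len(clause))]
    edges = []
    for (r, i), (s, j) in combinations(vertices, 2):
        if r != s and clauses[r][i] != -clauses[s][j]:
            edges.append(((r, i), (s, j)))
    return vertices, edges

# phi = (x1 v -x2 v -x3) ^ (-x1 v x2 v x3) ^ (x1 v x2 v x3); k = 3 clauses.
phi = [[1, -2, -3], [-1, 2, 3], [1, 2, 3]]
V, E = cnf3_to_clique(phi)
print(len(V), len(E))   # 9 vertices; phi is satisfiable iff this graph has a 3-clique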

The Vertex-Cover Problem
We define a vertex cover of an undirected graph G = (V, E) as a subset V' ⊆ V such that if (u, v) ∈ E, then u ∈ V' or v ∈ V' (or both). That is, each vertex "covers" its incident edges, and a vertex cover for G is a set of vertices that covers all the edges in E. The size of a vertex cover is the number of vertices in it. For example, the graph in Figure 4.9 has a vertex cover {w, z} of size 2.
The vertex-cover problem is to find a vertex cover of minimum size in a given graph. Restating it as a decision problem, we wish to determine whether a graph has a vertex cover of a given size k. As a language, we define
VERTEX-COVER = {⟨G, k⟩ : graph G has a vertex cover of size k}.

The vertex-cover problem is NP-complete.



Proof: We first show that VERTEX-COVER ∈ NP. Suppose we are given a graph G = (V, E) and an integer k. The certificate we choose is the vertex cover V' ⊆ V itself. The verification algorithm affirms that |V'| = k, and then it checks, for each edge (u, v) ∈ E, whether u ∈ V' or v ∈ V'. This verification can easily be performed in polynomial time.
We prove that the vertex-cover problem is NP-hard by showing that CLIQUE ≤P VERTEX-COVER. This reduction relies on the notion of the "complement" of a graph. Given an undirected graph G = (V, E), we define the complement of G as Ḡ = (V, Ē), where Ē = {(u, v) : u, v ∈ V, u ≠ v, and (u, v) ∉ E}. In other words, Ḡ is the graph containing exactly those edges that are not in G. Figure 4.9 shows a graph and its complement and illustrates the reduction from CLIQUE to VERTEX-COVER.
The reduction algorithm takes as input an instance ⟨G, k⟩ of the clique problem. It computes the complement Ḡ, which is easily done in polynomial time. The output of the reduction algorithm is the instance ⟨Ḡ, |V| - k⟩ of the vertex-cover problem. To complete the proof, we show that this transformation is indeed a reduction: the graph G has a clique of size k if and only if the graph Ḡ has a vertex cover of size |V| - k.
Suppose that G has a clique V' ⊆ V with |V'| = k. We claim that V - V' is a vertex cover in Ḡ. Let (u, v) be any edge in Ē. Then (u, v) ∉ E, which implies that at least one of u or v does not belong to V', since every pair of vertices in V' is connected by an edge of E. Equivalently, at least one of u or v is in V - V', which means that edge (u, v) is covered by V - V'. Since (u, v) was chosen arbitrarily from Ē, every edge of Ē is covered by a vertex in V - V'. Hence the set V - V', which has size |V| - k, forms a vertex cover for Ḡ.
Conversely, suppose that Ḡ has a vertex cover V' ⊆ V, where |V'| = |V| - k. Then, for all u, v ∈ V, if (u, v) ∈ Ē, then u ∈ V' or v ∈ V' or both. The contrapositive of this implication is that for all u, v ∈ V, if u ∉ V' and v ∉ V', then (u, v) ∈ E. In other words, V - V' is a clique, and it has size |V| - |V'| = k.
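A short sketch of the reduction follows (assuming a simple edge-list representation; the example graph is hypothetical): the complement graph together with the target |V| - k forms the VERTEX-COVER instance.

# Sketch of the CLIQUE to VERTEX-COVER reduction via the complement graph.
from itertools import combinations

def clique_to_vertex_cover(vertices, edges, k):
    E = {frozenset(e) for e in edges}
    complement_edges = [frozenset((u, v)) for u, v in combinations(vertices, 2)
                        if frozenset((u, v)) not in E]
    return complement_edges, len(vertices) - k

# Triangle a-b-c with pendant d: clique of size 3, so the complement instance
# asks for a vertex cover of size 4 - 3 = 1 (the single vertex d suffices).
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')]
cover_edges, target = clique_to_vertex_cover(V, E, 3)
print(sorted(tuple(sorted(e)) for e in cover_edges), target)   # [('a','d'), ('b','d')] 1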

The Subset-Sum Problem
In the subset-sum problem, we are given a finite set S ⊂ N and a target t ∈ N. We ask whether there is a subset S' ⊆ S whose elements sum to t. For example, if S = {1, 4, 16, 64, 256, 1040, 1041, 1093, 1285, 1344} and t = 3755, then the subset S' = {1, 16, 64, 256, 1040, 1093, 1285} is a solution.
As a language, we define
SUBSET-SUM = {⟨S, t⟩ : there exists a subset S' ⊆ S such that t = Σ_{s ∈ S'} s}.

As usual, it is important that our standard encoding assumes that the input integers are coded in binary.
Now, we can show that the subset-sum problem is unlikely to have a fast algorithm.

The subset-sum problem is NP-complete.


Proof: To show that SUBSET-SUM belongs to the class NP, for an instance ⟨S, t⟩ of the problem we take the subset S' as the certificate. Checking whether t = Σ_{s ∈ S'} s can be done by a verification algorithm in polynomial time.
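Such a verifier can be sketched in a few lines (the membership test and the summation are both clearly polynomial):

# Sketch of the SUBSET-SUM certificate verifier, using the example from the text.
def verify_subset_sum(S, t, S_prime):
    return 1 if set(S_prime) <= set(S) and sum(S_prime) == t else 0

S = [1, 4, 16, 64, 256, 1040, 1041, 1093, 1285, 1344]
print(verify_subset_sum(S, 3755, [1, 16, 64, 256, 1040, 1093, 1285]))   # 1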
We now show that VERTEX-COVER ≤P SUBSET-SUM. For an instance ⟨G, k⟩ of the vertex-cover problem, the reduction algorithm constructs an instance ⟨S, t⟩ of the subset-sum problem such that G has a vertex cover of size k if and only if there is a subset of S whose sum is exactly t.
At the heart of the reduction is an incidence-matrix representation of G. Let G = (V, E) be an undirected graph, and let V = {v0, v1, ..., v|V|-1} and E = {e0, e1, ..., e|E|-1}. The incidence matrix of G is a |V| × |E| matrix B = (bij) such that
bij = 1 if edge ej is incident on vertex vi, and 0 otherwise.

The incidence matrix for the undirected graph of Figure 4.10 is shown in Figure 4.10. The incidence matrix is shown with lower-indexed edges on the right rather than on the left, as is conventional, in order to simplify the formulas for the numbers in S.
Given a graph G and an integer k, the reduction algorithm computes a set S of numbers and an integer t. To understand how the reduction algorithm works, let us represent numbers in a "modified base-4" fashion. The |E| low-order digits of a number will be in base 4, but the high-order digit can be as large as k. The set of numbers is constructed in such a way that no carries can be propagated from lower digits to higher digits.

Figure 4.10: The reduction from VERTEX-COVER to SUBSET-SUM: a graph G, its incidence matrix B, and the numbers xi (one per vertex) and yj (one per edge) derived from B in the modified base-4 representation, together with the target t.

The set S consists of two types of numbers, corresponding to vertices and edges respectively. For each vertex vi ∈ V, we create a positive integer xi whose modified base-4 representation consists of a leading 1 followed by |E| digits. These digits correspond to vi's row of the incidence matrix B = (bij) for G, as illustrated in Figure 4.10(c). Formally, for i = 0, 1, ..., |V| - 1,
xi = 4^|E| + Σ_{j=0}^{|E|-1} bij · 4^j.

For each edge ej ∈ E, we create a positive integer yj that is just a row of the "identity" incidence matrix. (The identity incidence matrix is the |E| × |E| matrix with 1's only in the diagonal positions.) Formally, for j = 0, 1, ..., |E| - 1,
yj = 4^j.

The leading digit of the target sum t is k, and all |E| lower-order digits are 2's. Formally,
t = k · 4^|E| + Σ_{j=0}^{|E|-1} 2 · 4^j.

All of these numbers have polynomial size when we represent them in binary. The reduction can be
performed in polynomial-time by manipulating the bits of the incidence matrix.
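A sketch of the number construction follows (vertices and edges are indexed from 0; the example graph is hypothetical):

# Sketch of the VERTEX-COVER to SUBSET-SUM reduction:
# x_i = 4^|E| + sum_j b_ij * 4^j, y_j = 4^j, t = k*4^|E| + sum_j 2*4^j.
def vc_to_subset_sum(num_vertices, edges, k):
    m = len(edges)
    x = [4 ** m + sum(4 ** j for j, e in enumerate(edges) if i in e)
         for i in range(num_vertices)]
    y = [4 ** j for j in range(m)]
    t = k * 4 ** m + sum(2 * 4 ** j for j in range(m))
    return x + y, t

# Example: a triangle on vertices 0, 1, 2 has a vertex cover of size 2, so the
# derived subset-sum instance has a solution for k = 2.
S, t = vc_to_subset_sum(3, [(0, 1), (1, 2), (0, 2)], 2)
print(S, t)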

We now show that graph G has a vertex cover of size k if and only if there is a subset S' ⊆ S whose sum is t. First, suppose that G has a vertex cover V' ⊆ V of size k. Let V' = {vi1, vi2, ..., vik}, and define S' by
S' = {xi1, xi2, ..., xik} ∪ {yj : ej is incident on precisely one vertex in V'}.
To see that Σ_{s ∈ S'} s = t, observe that summing the k leading 1's of the xim ∈ S' gives the leading digit k of the modified base-4 representation of t. To get the low-order digits of t, each of which is a 2, consider the digit positions in turn, each of which corresponds to an edge ej. Because V' is a vertex cover, ej is incident on at least one vertex in V'. Thus, for each edge ej, there is at least one xim ∈ S' with a 1 in the jth position. If ej is incident on two vertices in V', then each of them contributes a 1 to the sum in the jth position, for a total of 2; the jth digit of yj contributes nothing, since ej being incident on two vertices of V' implies that yj ∉ S'. Thus, in this case, the sum of S' produces a 2 in the jth position of t. In the other case, when ej is incident on exactly one vertex in V', we have yj ∈ S', and the incident vertex and yj each contribute 1 to the sum of the jth digit of t, thereby also producing a 2. Thus, S' is a solution to the subset-sum instance ⟨S, t⟩.
Now, suppose that there is a subset S' ⊆ S that sums to t. Let S' = {xi1, xi2, ..., xim} ∪ {yj1, yj2, ..., yjp}. We claim that m = k and that V' = {vi1, vi2, ..., vim} is a vertex cover for G. To prove this claim, we start by observing that for each edge ej ∈ E there are three 1's in the set S in the ej digit position: one from each of the two vertices incident on ej, and one from yj. Because we are working with a modified base-4 representation, there are no carries from position ej to position ej+1. Thus, for each of the |E| low-order positions of t, at least one and at most two of the xi must contribute to the sum. Since at least one xi contributes to the sum for each edge, we see that V' is a vertex cover. To see that m = k, and thus that V' is a vertex cover of size k, observe that the only way the leading digit k of target t can be achieved is by including exactly k of the xi in the sum.

In Figure 4.10 the vertex cover V' = {v1, v3, v4} corresponds to the subset S' = {x1, x3, x4, y0, y2, y3, y4}. All of the yj are included in S' with the exception of y1, since edge e1 is incident on two vertices in V'.

The Hamiltonian-Cycle Problem
Figure 4.11: Widget A, used in the reduction from 3-CNF-SAT to HAM-CYCLE. The widget's only connections to the rest of the graph are through the vertices z1, z2, z3, and z4; a hamiltonian cycle of the containing graph can traverse it in only two ways, corresponding to including exactly one of the edges (a, a') and (b, b'). The widget is also given a shorthand representation.

The hamiltonian cycle problem is NP-complete.


Proof: We first show that HAM-CYCLE belongs to NP. Given a graph G = (V, E), our certificate is the sequence of |V| vertices that makes up the hamiltonian cycle. The verification algorithm checks that this sequence contains each vertex in V exactly once and that, with the first vertex repeated at the end, it forms a cycle in G. This verification can be performed in polynomial time.
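A sketch of such a verifier follows (using an edge-list representation of G; the example graph is hypothetical):

# Sketch of the HAM-CYCLE certificate verifier.
def verify_ham_cycle(vertices, edges, order):
    E = {frozenset(e) for e in edges}
    if sorted(order) != sorted(vertices):
        return 0                                   # some vertex missing or repeated
    n = len(order)
    return 1 if all(frozenset((order[i], order[(i + 1) % n])) in E for i in range(n)) else 0

V = ['u', 'v', 'w', 'x']
E = [('u', 'v'), ('v', 'x'), ('x', 'w'), ('w', 'u'), ('u', 'x')]
print(verify_ham_cycle(V, E, ['u', 'v', 'x', 'w']))   # 1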
We now prove that HAM-CYCLE is NP-complete by showing that 3-CNF-SAT ≤P HAM-CYCLE. Given a 3-CNF boolean formula φ over variables x1, x2, ..., xn with clauses C1, C2, ..., Ck, each containing exactly 3 distinct literals, we construct a graph G = (V, E) in polynomial time such that G has a hamiltonian cycle if and only if φ is satisfiable. Our construction is based on widgets, which are pieces of graphs that enforce certain properties.
Our first widget is the subgraph A shown in Figure 4.11. Suppose that A is a subgraph of some graph G and that the only connections between A and the remainder of G are through the vertices z1, z2, z3, and z4. Then a hamiltonian cycle of G can traverse A only in one of the ways shown in Figures 4.11(b) and (c), and so we may treat subgraph A as if it were simply a pair of edges (a, a') and (b, b'), with the restriction that any hamiltonian cycle of G must include exactly one of these edges. We shall represent widget A in the shorthand form shown in Figure 4.11.
The subgraph B in Figure 4.12 is our second widget. Suppose that B is a subgraph of some graph G and that the only connections from B to the remainder of G are through the vertices b1, b2, b3, and b4. A hamiltonian cycle of graph G cannot traverse all of the edges (b1, b2), (b2, b3), and (b3, b4), since then all vertices in the widget other than b1, b2, b3, and b4 would be missed. A hamiltonian cycle of G may, however, traverse any proper subset of these edges. Figures 4.12(a)-(e) show five such subsets; the remaining two subsets can be obtained by performing a top-to-bottom flip of parts (b) and (e). We represent this widget as in Figure 4.12(f), the idea being that at least one of the paths pointed to by the arrows must be taken by a hamiltonian cycle of G.
The graph G that we shall construct consists mostly of copies of these two widgets. The construction is illustrated in Figure 4.13. For each of the k clauses in φ, we include a copy of widget B, and we join these widgets together in series as follows. Letting bi,j be the copy of vertex bj in the ith copy of widget B, we connect bi,4 to bi+1,1 for i = 1, 2, ..., k - 1.

Then, for each variable xm in φ, we include two vertices x'm and x''m. We connect these two vertices by means of two copies of the edge (x'm, x''m), which we denote by em and ēm to distinguish them. The idea is that if the hamiltonian cycle takes edge em, it corresponds to assigning variable xm the value 1; if the hamiltonian cycle takes edge ēm, the variable is assigned the value 0. Each pair of these edges forms a two-edge loop; we connect these small loops in series by adding edges (x''m, x'm+1) for m = 1, 2, ..., n - 1. We connect the left (clause) side of the graph to the right (variable) side by means of two edges (b1,1, x'1) and (bk,4, x''n), which are the topmost and bottommost edges in Figure 4.13.
We are not yet finished with the construction of graph G, since we have yet to relate the variables to the clauses. If the jth literal of clause Ci is xm, then we use an A widget to connect edge (bi,j, bi,j+1) with edge em. If the jth literal of clause Ci is ¬xm, then we instead put an A widget between edge (bi,j, bi,j+1) and ēm. In Figure 4.13, for example, because clause C2 is (x1 ∨ ¬x2 ∨ x3), we place three A widgets as follows:

between (b2,1, b2,2) and e1,
between (b2,2, b2,3) and ē2, and
between (b2,3, b2,4) and e3.

Note that connecting two edges by means of A widgets actually entails replacing each edge by the five edges on the top or bottom of widget A (Figure 4.11(a)) and, of course, adding the connections that pass through the z vertices as well. A given literal lm may appear in several clauses (¬x3 in Figure 4.13, for example), and thus an edge em or ēm may be influenced by several A widgets (edge ē3, for example). In this case, we connect the A widgets in series, as shown in Figure 4.14, effectively replacing edge em or ēm by a series of edges.

Figure 4.12: Widget B, used in the reduction from 3-CNF-SAT to HAM-CYCLE. No hamiltonian cycle of the containing graph can traverse all three of the edges (b1, b2), (b2, b3), and (b3, b4); the allowed traversals and the shorthand representation of the widget are shown.

Figure 4.13: The graph G constructed from a 3-CNF formula φ with k = 3 clauses in the reduction from 3-CNF-SAT to HAM-CYCLE. A satisfying assignment of the variables of φ determines, for each variable xm, whether the hamiltonian cycle traverses edge em or edge ēm.

Figure 4.14: When a literal appears in several clauses, the corresponding A widgets are connected in series, effectively replacing the edge em or ēm by a series of edges.
We claim that formula φ is satisfiable if and only if graph G contains a hamiltonian cycle. We first suppose that G has a hamiltonian cycle h and show that φ is satisfiable. Cycle h must take a particular form:
First, it traverses edge (b1,1, x'1) to go from the top left to the top right.
It then follows all of the x'm and x''m vertices from top to bottom, choosing either edge em or edge ēm, but not both.
It next traverses edge (bk,4, x''n) to get back to the left side.
Finally, it traverses the B widgets from bottom to top on the left.
(It actually traverses edges within the A widgets as well, but we use these subgraphs only to enforce the either/or nature of the edges they connect.)
Given the hamiltonian cycle h, we define a truth assignment for φ as follows. If edge em belongs to h, then we set xm = 1. Otherwise, edge ēm belongs to h, and we set xm = 0.

We claim that this assignment satisfies φ. Consider a clause Ci and the corresponding B widget in G. Each edge (bi,j, bi,j+1) is connected by an A widget to either edge em or edge ēm, depending on whether xm or ¬xm is the jth literal in the clause. The edge (bi,j, bi,j+1) is traversed by h if and only if the corresponding literal is 0. Since each of the three edges (bi,1, bi,2), (bi,2, bi,3), (bi,3, bi,4) of clause Ci is also in a B widget, all three cannot be traversed by the hamiltonian cycle h. One of the three edges, therefore, must have a corresponding literal whose assigned value is 1, and clause Ci is satisfied. This property holds for each clause Ci, i = 1, 2, ..., k, and thus formula φ is satisfied.


Conversely, let us suppose that formula φ is satisfied by some truth assignment s. By following the rules above, we can construct a hamiltonian cycle for graph G: traverse edge em if s(xm) = 1, traverse edge ēm if s(xm) = 0, and traverse edge (bi,j, bi,j+1) if and only if the jth literal of clause Ci is 0 under the assignment. These rules can indeed be followed, since s is a satisfying assignment for formula φ.
Finally, we note that graph G can be constructed in polynomial time. It contains one B widget for each of the k clauses in φ, and so there are 3k A widgets. Since the A and B widgets are of fixed size, the graph G has O(k) vertices and is easily constructed in polynomial time. Thus we have provided a polynomial-time reduction from 3-CNF-SAT to HAM-CYCLE.

The Travelling-Salesman Problem
In the travelling-salesman problem, which is closely related to the hamiltonian-cycle problem, a salesman must visit n cities. Modeling the problem as a complete graph with n vertices, we can say that the salesman wishes to make a tour, or hamiltonian cycle, visiting each city exactly once and finishing at the city he starts from. There is an integer cost c(i, j) to travel from city i to city j, and the salesman wishes to make the tour whose total cost is minimum, where the total cost is the sum of the individual costs along the edges of the tour. For example, in Figure 4.15 a minimum-cost tour is ⟨u, w, v, x, u⟩, with cost 7. The formal language for the travelling-salesman problem is
TSP = {⟨G, c, k⟩ : G = (V, E) is a complete graph,
c is a function from V × V → Z,
k ∈ Z, and
G has a travelling-salesman tour with cost at most k}.
The following theorem shows that a fast algorithm for the travelling-salesman problem is unlikely to exist.

The travelling-salesman problem is NP-complete.


Proof: We first show that TSP belongs to NP. Given an instance of the problem, we use as a certificate the sequence of n vertices in the tour. The verification algorithm checks that this sequence contains each vertex exactly once, sums up the edge costs, and checks whether the sum is at most k. This process can certainly be done in polynomial time.

To prove that TSP is NP-hard, we show that HAM-CYCLE ≤P TSP. Let G = (V, E) be an instance of HAM-CYCLE. We form the complete graph G' = (V, E'), where E' = {(i, j) : i, j ∈ V and i ≠ j}, and we define the cost function c by
c(i, j) = 0 if (i, j) ∈ E, and 1 if (i, j) ∉ E.


The instance of TSP then (G'
,c,o), which is easily formed in polynomial-time.
We now show that graph G has a hamiltonian cycle if and only if graph G' has a tour of cost at most 0.
Suppose that graph G has a hamiltonian cycle h. Each edge in h belong to E and thus has cost 0 in G'has.
Thus, h'is a tour in G'with cost 0. Conversely, suppose that graph G'has a tour h'of cost at most 0. Since
the cost s of the edges in E'are 0 and 1, the cost of tour h'is exactly 0. Therefore, h'contains only edge in
E. We conclude that h is a hamiltonian cycle in graph G.
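A sketch of the reduction follows (assuming an edge-list representation; the example graph is hypothetical): edges of G get cost 0, all other pairs get cost 1, and the budget is k = 0.

# Sketch of the HAM-CYCLE to TSP reduction.
from itertools import combinations

def ham_cycle_to_tsp(vertices, edges):
    E = {frozenset(e) for e in edges}
    cost = {frozenset((i, j)): (0 if frozenset((i, j)) in E else 1)
            for i, j in combinations(vertices, 2)}
    return cost, 0   # (cost function on the complete graph, budget k = 0)

V = ['u', 'v', 'w', 'x']
E = [('u', 'v'), ('v', 'x'), ('x', 'w'), ('w', 'u')]
cost, k = ham_cycle_to_tsp(V, E)
print(cost[frozenset(('u', 'v'))], cost[frozenset(('u', 'x'))], k)   # 0 1 0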

Student Activity 4.5


Answer the following questions:

1. What is the clique problem? Show that the clique problem is NP-complete.
2. What is the vertex-cover problem? Show that this problem is NP-complete.
3. What is the traveling-salesperson problem?

Summary
If any single NP-complete problem can be solved in polynomial time, then every NP-complete problem has a polynomial-time algorithm.
The formal-language framework allows us to express the relation between decision problems and the algorithms that solve them concisely.
The circuit-satisfiability problem is NP-hard.
The vertex-cover problem is NP-complete.

I. True and False


1. The class of polynomial-time solvable problems has closure properties.
2. The class of languages decided by polynomial-time algorithms is not a subset of the class of languages accepted by polynomial-time algorithms.
II. Fill in the blanks
1. The Hamiltonian cycle problem is ______________.
2. The traveling salesman problem is closely related to the _____________________ problem.
3. The NP___________ languages are in a sense, the “hardest” language in NP.
4. If any NP-complete problem is polynomial-time solvable then________________.

I. True and False


1. True
2. False
II. Fill in the blanks
1. NP-complete
2. Hamiltonian-cycle
3. complete
4. P=NP

I. True and False


1. The circuit-satisfiability problem is NP-complete.
2. The subset-sum problem is NP-complete.
II. Fill in the blanks:
1. One of the convenient aspects of focusing on decision problems is that they make it easy to use the machinery of _________ theory.
2. ___________ reductions provide a formal means for showing that one problem is at least as hard as another, to within a polynomial-time factor.
3. If any NP-complete problem is polynomial time solvable then ________.
4. The circuit-satisfiability problem belongs to the class N ________.
5. The clique problem is __________.

Review Questions
1. Show that the hamiltonian-path problem is NP-complete.
2. The longest-simple-cycle problem is the problem of determining a simple cycle of maximum length in a graph (no repeated vertex). Show that this problem is NP-complete.
3. A hamiltonian path in a graph is a simple path that visits every vertex exactly once. Show that the language HAM-PATH = {(G, u, v) : there is a hamiltonian path from u to v in graph G} belongs to NP.
4. Show that a language L is complete for NP if and only if its complement is complete for co-NP.
5. Show that the subset-sum problem is solvable in polynomial-time if the target value t is expressed in
unary.
Overview
Parallelism
Computational Model: PRAM and other Models
Finding Maximum Element
Merging
Sorting

Parallel Algorithms

Learning Objectives
• Overview
• Parallelism
• Computational Model: PRAM and Other Models
• Finding Maximum Element
• Merging
• Sorting
Top

So far our discussion of algorithms has been confined to single-processor computers. In this block we study algorithms for parallel machines (i.e., computers with more than one processor). There are many applications in day-to-day life that demand real-time solutions to problems. For example, weather forecasting has to be done in a timely fashion. In case of severe hurricanes or snowstorms, evacuation has to be done in a short period of time. If an expert system is used to aid a physician in surgical procedures, decisions have to be made within seconds. And so on. Programs written for such applications have to perform an enormous amount of computation. In the forecasting example, large matrices have to be operated on. In the medical example, thousands of rules have to be tried. Even the fastest single-processor machines may not be able to come up with solutions within tolerable time limits. Parallel machines offer the potential of decreasing the solution time enormously.

Assume that you have 5 loads of clothes to wash. Also assume that it takes 25 minutes to wash one load in a washing machine. Then it will take 125 minutes to wash all the clothes using a single machine. On the other hand, if you had 5 machines, the washing could be completed in just 25 minutes. In general, if there are p washing machines and p loads of clothes, then the washing time can be cut down by a factor of p compared to having a single machine. Here we have assumed that every machine takes exactly the same time to wash; if this assumption is invalid, then the washing time will be dictated by the slowest machine.

As another example, say there are 100 numbers to be added and there are two persons A and B. Person A can add the first 50 numbers. At the same time, B can add the next 50 numbers. When they are done, one of them can add the two individual sums to get the final answer. So two people can add the 100 numbers in almost half the time required by one.
Top

The idea of parallel computing is very similar. Given a problem to solve, we partition the problem into many subproblems and let each processor work on a subproblem; when all the processors are done, the partial solutions are combined to arrive at the final answer. If there are p processors, then potentially we can cut down the solution time by a factor of p. We refer to any algorithm designed for a single-processor machine as a sequential algorithm and any algorithm designed for a multiprocessor machine as a parallel algorithm.

Let π be a given problem for which the best known sequential algorithm has a run time of S'(n), where n is the problem size. If a parallel algorithm on a p-processor machine runs in time T'(n, p), then the speedup of the parallel algorithm is defined to be S'(n) / T'(n, p). If the best known sequential algorithm for π has an asymptotic run time of S(n) and if T(n, p) is the asymptotic run time of a parallel algorithm, then the asymptotic speedup of the parallel algorithm is defined to be S(n) / T(n, p). If S(n) / T(n, p) = θ(p), then the algorithm is said to have linear speedup.
Note: In this block we use the terms speedup and asymptotic speedup interchangeably; which one is meant is clear from the context.

For the problem of Example 2, the 100 numbers can be added sequentially in 99 units of time. Person A can add 50 numbers in 49 units of time. At the same time, B can add the other 50 numbers. In one more unit of time, the two partial sums can be added; this means that the parallel run time is 50. So the speedup of this parallel algorithm is 99/50 = 1.98, which is very nearly equal to 2.

There are many sequential algorithms for sorting, such as heap sort, that are optimal and run in time θ(n log n), n being the number of keys to be sorted. Let A be an n-processor parallel algorithm that sorts n keys in θ(log n) time, and let B be an n²-processor algorithm that also sorts n keys in θ(log n) time. Then the speedup of A is θ(n log n) / θ(log n) = θ(n). On the other hand, the speedup of B is also θ(n log n) / θ(log n) = θ(n). Algorithm A has linear speedup, whereas B does not have a linear speedup.

If a p-processor parallel algorithm for a given problem runs in time T(n, p), the total work done by this algorithm is defined to be pT(n, p), and the efficiency of the algorithm is defined to be S(n) / pT(n, p), where S(n) is the asymptotic run time of the best known sequential algorithm for solving the same problem. The parallel algorithm is said to be work-optimal if pT(n, p) = O(S(n)).
Note: A parallel algorithm is work-optimal if and only if it has linear speedup. Also, the efficiency of a work-optimal parallel algorithm is θ(1).

Let w be the time to wash one load of clothes on a single machine in Example 1, and let n be the total number of loads to wash. A single machine will take time nw. If there are p machines, the washing time is ⌈n/p⌉w. Thus the speedup is nw / (⌈n/p⌉w) = n / ⌈n/p⌉. This speedup is at least p/2 if n ≥ p, so the asymptotic speedup is Ω(p), and hence the parallel algorithm has linear speedup and is work-optimal. Also, the efficiency is nw / (p⌈n/p⌉w), which is θ(1) if n ≥ p.

For the algorithm A of Example 4, the total work done is n·θ(log n) = θ(n log n). Its efficiency is θ(n log n) / θ(n log n) = θ(1). Thus A is work-optimal and has a linear speedup. The total work done by the algorithm B is n²·θ(log n) = θ(n² log n), and its efficiency is θ(n log n) / θ(n² log n) = θ(1/n); as a result, B is not work-optimal.
Is it possible to get a speedup of more than p for any problem on a p-processor machine? Assume that it is possible (such a speedup is called superlinear speedup). In particular, let π be the problem under consideration and s be the best known sequential run time. If there is a parallel algorithm on a p-processor machine whose speedup is better than p, it means that the parallel run time T < s/p, that is, pT < s. But then the parallel algorithm could be simulated step by step on a single processor in time pT < s, which is a contradiction, since by assumption s is the run time of the best known sequential algorithm for solving π.
The preceding discussion is valid only when we consider asymptotic speedups. When the speedup is defined with respect to the actual run times on the sequential and parallel machines, it is possible to obtain superlinear speedup. Two of the possible reasons for such an anomaly are (1) the p processors have more aggregate memory than one processor, and (2) the cache-hit frequency may be better for the parallel machine, as the p processors may have more aggregate cache than does one processor.
One way of solving a given problem is to explore many techniques (i.e., algorithms) and identify the one that is the most parallelizable. To achieve a good speedup, it is necessary to parallelize every component of the underlying technique. If a fraction f of the technique cannot be parallelized (i.e., has to be run sequentially), then the maximum speedup that can be obtained is limited by f. Amdahl's law relates the maximum speedup achievable to f and p as follows:

Maximum speedup = 1 / (f + (1 − f) / p)

Consider some technique for solving a problem π, and assume that p = 10. If f = 0.5 for this technique, then the maximum speedup that can be obtained is 1/(0.5 + 0.5/10) = 20/11, which is less than 2. If f = 0.1, then the maximum speedup is 1/(0.1 + 0.9/10) = 100/19, which is slightly more than 5. Finally, if f = 0.01, then the maximum speedup is 1/(0.01 + 0.99/10) = 1000/109, which is slightly more than 9.
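The figures above can be checked with a one-line function implementing Amdahl's law:

# Amdahl's law: maximum speedup for serial fraction f on p processors.
def max_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

for f in (0.5, 0.1, 0.01):
    print(f, round(max_speedup(f, 10), 2))   # 0.5 -> 1.82, 0.1 -> 5.26, 0.01 -> 9.17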

Student Activity 5.1


Before going to next section, answer the following questions:
1. Explain the importance of parallel processing with an example.
2. What do you mean by a work-optimal parallel algorithm?
If your answers are correct, then proceed to next section.
Top

The sequential computational model we have employed so far is the RAM (random access machine). In the RAM model we assume that any of the following operations can be performed in one unit of time: addition, subtraction, multiplication, division, comparison, memory access, assignment, and so on. This model has been widely accepted as a valid sequential model. On the other hand, when it comes to parallel computing, numerous models have been proposed, and algorithms have been designed for each such model.
An important feature of parallel computing that is absent in sequential computing is the need for interprocessor communication. For example, given any problem, the processors have to communicate among themselves and agree on the subproblems each will work on. They also need to communicate to see whether every processor has finished its task, and so on. Each machine or processor in a parallel computer can be assumed to be a RAM. Various parallel models differ in the way they support interprocessor communication. Parallel models can be broadly categorized into two classes: fixed-connection machines and shared-memory machines.

A fixed-connection network is a graph G = (V, E) whose nodes represent processors and whose edges represent communication links between processors. Usually we assume that the degree of each node is either a constant or a slowly increasing function of the number of nodes in the graph. Examples include the mesh, the hypercube, the butterfly, and so on (see Figure 5.1).
Interprocessor communication is done through the communication links. Any two processors connected by an edge in G can communicate in one step. In general, two processors can communicate through any of the paths connecting them. The communication time depends on the lengths of these paths (at least for small packets).

In shared-memory models [also called PRAMs (Parallel Random Access Machines)], a number (say p) of processors work synchronously. They communicate with each other using a common block of global memory that is accessible by all; this global memory is also called common or shared memory (see Figure 5.2). Communication is performed by writing to and/or reading from the common memory. Any two processors i and j can communicate in two steps: in the first step, processor i writes its message into memory cell j, and in the second step processor j reads from this cell. In contrast, in a fixed-connection machine, the communication time depends on the lengths of the paths connecting the communicating processors.
Each processor in a PRAM is a RAM with some local memory. A single step of a PRAM algorithm can be one of the following: an arithmetic operation (such as addition, division, and so on), a comparison, a memory access (local or global), an assignment, etc. The number m of cells in the global memory is typically assumed to be the same as p, but this need not always be the case.

!"

In fact, we present algorithms for which m is much larger or smaller than p. We also assume that the input is
given in the global memory and that there is space for the output and for storing intermediate results. Since the
global memory is accessible by all processors, access conflicts may arise. What happens if more than one
processor tries to access the same global memory cell (for the purpose of reading from or writing into it)? There
are several ways of resolving read and write conflicts, and accordingly several variants of the PRAM arise.
An EREW (Exclusive Read Exclusive Write) PRAM is the shared memory model in which no
concurrent read or write is allowed on any cell of the global memory. Note that exclusive read or exclusive
write does not preclude different processors from simultaneously accessing different memory cells. For example,
at a given time step, processor one might access cell five and at the same time processor two might access cell
twelve, and so on; but processors one and two cannot access memory cell ten, for example, at the same time. A
CREW (Concurrent Read Exclusive Write) PRAM is a variation that permits concurrent reads but not concurrent
writes. Similarly one could also define the ERCW model. Finally, the CRCW PRAM model allows both
concurrent reads and concurrent writes.
In a CREW or CRCW PRAM, if more than one processor tries to read from the same cell, clearly they will all
read the same information. But in a CRCW PRAM, if more than one processor tries to write into the same
cell, they may possibly have different messages to write; thus there has to be an additional mechanism
to determine which message gets written. Accordingly, several variations of the CRCW PRAM can be
derived. In a common CRCW PRAM, concurrent writes are permitted into a cell only if all the processors
conflicting for this cell have the same message to write. In an arbitrary CRCW PRAM, if there is a
conflict for writing, one of the processors succeeds in writing and we do not know which one; any
algorithm designed for this model should work no matter which processor succeeds in the event of
conflicts. The priority CRCW PRAM lets the processor with the highest priority succeed in the case of conflicts.
Typically each processor is assigned a (static) priority to begin with.
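The three write-conflict rules can be made concrete with a small illustrative sketch (the function resolve_write and its interface are hypothetical, not taken from the text): given the values that p processors attempt to write into one cell in the same step, it returns the cell's new contents under the chosen rule.

#include <stdio.h>
#include <stdlib.h>

/* Resolve a concurrent write of vals[0..p-1] into a single cell.
   mode: 'c' = common, 'a' = arbitrary, 'p' = priority (processor 0 assumed highest). */
int resolve_write(const int vals[], int p, char mode)
{
    switch (mode) {
    case 'c':                                   /* common: all values must agree */
        for (int i = 1; i < p; i++)
            if (vals[i] != vals[0]) {
                fprintf(stderr, "illegal common-CRCW write\n");
                exit(1);
            }
        return vals[0];
    case 'a':                                   /* arbitrary: any one write succeeds */
        return vals[rand() % p];
    case 'p':                                   /* priority: highest-priority processor wins */
    default:
        return vals[0];
    }
}

int main(void)
{
    int attempts[] = { 7, 7, 7 };               /* three processors writing the same value */
    printf("%d\n", resolve_write(attempts, 3, 'c'));   /* prints 7 */
    return 0;
}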

Consider a 4-processor machine and also consider an operation in which each processor has to read from
the global cell M[1]. This operation can be denoted as
Processor i (in parallel for 1 ≤ i ≤ 4) does:
Read M[1]
This concurrent read operation can be performed in one unit of time on the CRCW as well as on the CREW
PRAMs. But on the EREW PRAM, concurrent reads are prohibited. Still we can perform this operation on
the EREW PRAM by making sure that at any given time no two processors attempt to read from the same
memory cell. One way of performing this is as follows: processor 1 reads M[1] at the first time unit;
processor 2 reads M[1] at the second time unit; and processors 3 and 4 read M[1] at the third and fourth time
units, respectively. The total run time is four.
Now consider the operation in which each processor has to access M[1] for writing at the same time. Since
only one message can be written into M[1], one has to assume some scheme for resolving such contentions. This
operation can be denoted as
Processor i (in parallel for 1 ≤ i ≤ 4) does:
Write M[1];
Again, on the CRCW PRAM this operation can be completed in one unit of time. On the CREW and
EREW PRAMs, concurrent writes are prohibited; however, these models can simulate the effect of a
concurrent write. Consider our simple example of four processors trying to write into M[1]. Simulating a
common CRCW PRAM requires the four processors to verify that all of them wish to write the same value;
following this, processor 1 can do the writing. Simulating a priority CRCW PRAM requires the four
processors to first determine which has the highest priority, and then the one with this priority does the
write. Other models may be simulated similarly.

Note that any algorithm that runs on a p-processor EREW PRAM in time T(n, p), where n is the problem
size, can also run on a p-processor CREW PRAM or a CRCW PRAM within the same time. But a CRCW
PRAM algorithm or a CREW PRAM algorithm may not be implementable on an EREW PRAM preserving
the asymptotic run time. In example 8, we saw that the implementation of a single concurrent write or
concurrent read step takes much more time on the EREW PRAM. Likewise, a p-processor CRCW PRAM
algorithm may not be implementable on a p-processor CREW PRAM preserving the asymptotic run time.
It turns out that there is a strict hierarchy among the variants of the PRAM in terms of their computational
power. For example, a CREW PRAM is strictly more powerful than an EREW PRAM; this means that there
is at least one problem that can be solved in asymptotically less time on a CREW PRAM than on an EREW
PRAM, given the same number of processors. Also, any version of the CRCW PRAM is more powerful than
a CREW PRAM, as is demonstrated by example 9.

A[0] = A[1] || A[2] || … || A[n] is the Boolean (or logical) OR of the n bits A[1 : n]. A[0] is easily computed in
O(n) time on a RAM. The following algorithm shows how A[0] can be computed in Θ(1) time using an n-
processor common CRCW PRAM.
Assume that A[0] is zero to begin with. In the first time step, processor i, for 1 ≤ i ≤ n, reads memory
location A[i] and proceeds to write a 1 into memory location A[0] if A[i] is a 1. Since several of the A[i] may
be 1, several processors may write into A[0] concurrently. Hence the algorithm cannot be run (as such) on an
EREW or CREW PRAM. In fact, for these two models it is known that the parallel complexity of the
Boolean OR problem is Ω(log n), no matter how many processors are used. Note that this algorithm works
on all three varieties of the CRCW PRAM.
Processor i (in parallel for 1 ≤ i ≤ n) does:
if (A[i]==1) A[0]=A[i];
Theorem
The Boolean OR of n bits can be computed in O(1) time on an n-processor common CRCW PRAM.
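A sequential sketch of this one-step algorithm may also help (an illustration only): the loop below merely imitates the n processors, which on a real common CRCW PRAM would all act within the same time step.

#include <stdio.h>

/* A[1..n] holds the input bits; A[0] receives their Boolean OR. */
void crcw_or(int A[], int n)
{
    A[0] = 0;                               /* assumed initialisation */
    for (int i = 1; i <= n; i++)            /* "in parallel" on the PRAM */
        if (A[i] == 1)
            A[0] = A[i];                    /* all conflicting writes carry the same value 1 */
}

int main(void)
{
    int A[] = { 0, 0, 1, 0, 1, 0 };         /* n = 5 bits stored in A[1..5] */
    crcw_or(A, 5);
    printf("OR = %d\n", A[0]);              /* prints 1 */
    return 0;
}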
There exists a hierarchy among the different versions of the CRCW PRAM also: common, arbitrary, and
priority form an increasing hierarchy of computing power. Let EREW(p, T(n, p)) denote the set of all
problems that can be solved using a p-processor EREW PRAM in time T(n, p) (n being the problem size).
Similarly define CREW(p, T(n, p)) and CRCW(p, T(n, p)). Then
EREW (p, T (n,p)) ⊂ CREW (p, T(n,p)) ⊂ Common CRCW (p, T (n,p))
⊂ Arbitrary CRCW (p, T(n, p)) ⊂ Priority CRCW (p, T(n, p))
In practice a problem of size n is solved on a computer with a constant number p of processors. Algorithms
designed under some assumption about the relationship between n and p can also be used when
fewer processors are available, since there is a general slow-down lemma for the PRAM model.
Let A be a parallel algorithm for solving problem π that runs in time T using p processors. The slow-down
lemma concerns the simulation of the same algorithm on a p′-processor machine (for p′ < p).
Each step of algorithm A can be simulated on the p′-processor machine (call it M) in ≤ ⌈p/p′⌉ steps, since a
processor of M can be put in charge of simulating ⌈p/p′⌉ processors of the original machine.
Thus the simulation time on M is ≤ T⌈p/p′⌉, and the total work done on M is ≤ p′T⌈p/p′⌉
≤ pT + p′T = O(pT). This results in the following lemma.

[Slow-down lemma] Any parallel algorithm that runs on a p-processor machine in time T can be run on a
p’-processor machine in time O(pT/p’), for any p’<p.

The algorithm of example (9) runs in Θ(1) time using n processors. Using the slow-down lemma, the same
algorithm also runs in Θ(log n) time using n/log n processors; it also runs in Θ(√n) time using √n processors;
and so on. When p = 1, the algorithm runs in time Θ(n), which is the same as the run time of the best
sequential algorithm!

Student Activity 5.2


Before going to the next section, answer the following questions:
1. Define the EREW PRAM model.
2. Differentiate between the EREW PRAM and the CRCW PRAM.
If your answers are correct, then proceed to the next section.
Top

Finding Maximum Element
Algorithms to find the maximum element in a list using more than one processor are given below.

" # #
Finding the maximum of n given numbers can be done in O(1) time using an n²-processor CRCW PRAM.
Let k1, k2, …, kn be the input. The idea is to perform all pairs of comparisons in one step using n²
processors. If we name the processors pij (for 1 ≤ i ≤ n, 1 ≤ j ≤ n), processor pij computes xij = (ki < kj).
Without loss of generality assume that all the keys are distinct. Even if they are not, they can be made
distinct by replacing key ki with the tuple (ki, i) (for 1 ≤ i ≤ n); this amounts to appending each key with
only a (log n)-bit number. Of all the input keys, there is exactly one key which, when compared with every
other key, yields the bit zero each time. This key can be identified using the Boolean OR algorithm
and is the maximum of all. The resultant algorithm appears as follows:
Step 0. If n=1, output the key.
Step 1. Processor Pij (for each 1 ≤ i, j ≤ n in parallel) compute xij=(ki<kj).
Step 2. The n2 processors are grouped into n groups G1, G2,…..Gn where Gi (1 ≤ i ≤ n) consists
of the processors pi1, pi2,…..pin. Each group Gi computes the Boolean OR of xi1,
xi2….xin.
Step 3. If Gi computes a zero in step 2, then processor pi1 outputs ki as the answer.
Steps 1 and 3 of this algorithm take unit time each. Step 2 takes O(1) time. Thus the whole algorithm runs in
O(1) time; this implies the following theorem:
Theorem

The maximum of n keys can be computed in O(1) time using n2 common CRCW PRAM processors.
Note that the speedup of the previous algorithm is Θ(n)/Θ(1) = Θ(n). The total work done by this algorithm is Θ(n²).
Hence its efficiency is Θ(n)/Θ(n²) = Θ(1/n). Clearly this algorithm is not work-optimal.
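As an illustration (not part of the original text), the following sequential sketch simulates the three steps of the n²-processor algorithm; on the PRAM each of the nested loops below collapses into a single parallel step.

#include <stdio.h>

/* x[i][j] = (k[i] < k[j]); the key whose row-OR is 0 is the maximum. */
int crcw_max(const int k[], int n)
{
    int loser[n];                           /* loser[i] = Boolean OR of row i */
    for (int i = 0; i < n; i++) loser[i] = 0;
    for (int i = 0; i < n; i++)             /* steps 1 and 2, done by n*n "processors" */
        for (int j = 0; j < n; j++)
            if (k[i] < k[j]) loser[i] = 1;
    for (int i = 0; i < n; i++)             /* step 3 */
        if (!loser[i]) return k[i];
    return k[0];                            /* not reached for n >= 1 */
}

int main(void)
{
    int k[] = { 7, 3, 11, 2, 9 };
    printf("max = %d\n", crcw_max(k, 5));   /* prints 11 */
    return 0;
}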

Finding the Maximum Using n Processors
Now we show that maximal selection can be done in O(log log n) time using n common CRCW PRAM
processors. The technique employed is divide and conquer. To simplify the discussion, we assume that n
is a perfect square (when n is not a perfect square, replace √n by ⌈√n⌉ in the following discussion).
Let the input sequence be k1, k2, …, kn. We are interested in developing an algorithm that can find the
maximum of these n keys using n processors. Let T(n) be the run time of this algorithm. We partition the input
into √n parts so that the maximum of each part can be computed in parallel. Since the recursive maximal
selection of each part involves √n keys and an equal number of processors, this can be done in T(√n) time.
Let M1, M2, …, M√n be the group maxima. The answer we are supposed to output is the maximum of these
maxima. Since there are now only √n keys (and n = (√n)² processors), we can find their maximum by employing
all the n processors (see the following algorithm).
Step 0. If n = 1, return k1.
Step 1. Partition the input keys into √n parts K1, K2, …, K√n, where Ki consists of k(i−1)√n+1,
        k(i−1)√n+2, …, ki√n. Similarly partition the processors into √n groups so that group Pi
        (1 ≤ i ≤ √n) consists of the processors p(i−1)√n+1, p(i−1)√n+2, …, pi√n. Let Pi find the
        maximum of Ki recursively (for 1 ≤ i ≤ √n).
Step 2. If M1, M2, …, M√n are the group maxima, find and output the maximum of these
        maxima employing the theorem of the previous section.
Step 1 of this algorithm takes T(√n) time and step 2 takes O(1) time. Thus T(n) satisfies the recurrence
T(n) = T(√n) + O(1),
which solves to T(n) = O(log log n): writing n = 2^(2^t), each application of the recurrence halves the exponent t,
so only t = log log n applications are needed. Therefore, the following theorem arises.
Theorem
The maximum of n keys can be found in O(log log n) time using n common CRCW PRAM processors.
The total work done by the above algorithm is Θ(n log log n) and its efficiency is
Θ(n)/Θ(n log log n) = Θ(1/log log n).
Thus this algorithm is not work-optimal.
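The recursive structure can be sketched sequentially as follows (an illustration only; the scan over the group maxima at the end stands in for the O(1)-time, n-processor PRAM step of the previous section, and the helper name dc_max is hypothetical).

#include <stdio.h>
#include <math.h>

/* Divide the keys into about sqrt(n) groups, recursively find each group
   maximum, then combine the group maxima. */
int dc_max(const int k[], int n)
{
    if (n <= 2)                                     /* small base case */
        return (n == 2 && k[1] > k[0]) ? k[1] : k[0];
    int g = (int)ceil(sqrt((double)n));             /* group size, about sqrt(n) */
    int ngroups = (n + g - 1) / g;
    int maxima[ngroups];
    for (int i = 0; i < ngroups; i++) {             /* step 1: recurse on each group    */
        int start = i * g;                          /* (in parallel on the PRAM)        */
        int len = (start + g <= n) ? g : n - start;
        maxima[i] = dc_max(k + start, len);
    }
    int best = maxima[0];                           /* step 2: maximum of the maxima;   */
    for (int i = 1; i < ngroups; i++)               /* O(1) time on the PRAM using the  */
        if (maxima[i] > best) best = maxima[i];     /* all-pairs method                 */
    return best;
}

int main(void)
{
    int k[] = { 12, 45, 7, 23, 39, 8, 51, 3, 30 };
    printf("max = %d\n", dc_max(k, 9));             /* prints 51; compile with -lm */
    return 0;
}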

" # $
Consider again the problem of finding the maximum of n given keys. If each of these keys is a single bit, the
problem of finding the maximum reduces to computing the Boolean OR of n bits and hence can be done in
O(1) time using n common CRCW PRAM processors. This raises the following question: how large can the
keys be if we desire a constant time algorithm for maximal selection using n
processors? Answering this question in its full generality is beyond the scope of this syllabus. Instead we
show that if each key is an integer in the range [0, n^c], where c is a constant, maximal selection can be done
work-optimally in O(1) time. The speedup of this algorithm is Θ(n) and its efficiency is Θ(1).
Since each key is of magnitude at most n^c, it follows that each key is a binary number with ≤ c log n bits.
Without loss of generality assume that every key is of length exactly c log n bits. Suppose we find the
maximum M of the n keys only with respect to their (log n)/2 most significant bits (see figure 5.3). Any key
whose (log n)/2 most significant bits differ from those of M can be dropped from future consideration, since it
cannot possibly be the maximum. After this, many keys can potentially survive. Next we compute the maximum
of the remaining keys with respect to their next (log n)/2 most significant bits and again drop keys that cannot
possibly be the maximum.

# $ %" "

We repeat this basic step 2c times (once for every (log n)/2 bits in the input keys). One of the keys that survives
the very last step can be output as the maximum. Refer to the (log n)/2 MSBs of any key as its first part, the
next most significant (log n)/2 bits as its second part, and so on. There are 2c parts for each key; the 2c-th part
may have fewer than (log n)/2 bits. The algorithm is summarized below. To begin with, all the keys are alive.
For (i=1; i<=2c; i++)
{
Step 1. Find the maximum of all alive keys with respect to their ith parts. Let M be this
        maximum.
Step 2. Delete each alive key whose ith part is < M.
}
Output one of the alive keys.
We now show that step 1 of this algorithm can be completed in O(1) time using n common CRCW PRAM
processors. Note that if a key has at most (log n)/2 bits, its maximum magnitude is √n − 1. Thus each step of
this algorithm is nothing but the task of finding the maximum of n keys, where each key is an integer in the
range [0, √n − 1]. Assign one processor to each key, and make use of √n global memory cells (which are
initialized to −∞). Call these cells M0, M1, …, M√n−1. In one parallel write step, if processor i has a key ki, it
tries to write ki into Mki. For example, if processor i has a key valued 10, it attempts to write 10 into M10.
After this write step, the problem of computing the maximum of the n keys reduces to computing the
maximum of the contents of M0, M1, …, M√n−1. Since these are only √n numbers, their maximum can be
found in O(1) time using n processors. As a result we get the following theorem.
Theorem

The maximum of n keys can be found in O(1) time using n CRCW PRAM processors, provided the keys are
integers in the range [0, n^c] for any constant c.
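One stage of this algorithm can be sketched as follows (an illustration, assuming the current parts of the keys lie in the range [0, r − 1] with r about √n; the name small_range_max is hypothetical, and the value −1 plays the role of −∞).

#include <stdio.h>

/* Every "processor" writes its key into the cell indexed by the key's value;
   the maximum is then the largest non-empty cell.  Conflicting writes always
   carry the same value, so a common CRCW PRAM performs them in one step. */
int small_range_max(const int key[], int n, int r)
{
    int M[r];
    for (int j = 0; j < r; j++) M[j] = -1;      /* initialise to "-infinity" */
    for (int i = 0; i < n; i++)                 /* one parallel write step */
        M[key[i]] = key[i];
    for (int j = r - 1; j >= 0; j--)            /* max of r ~ sqrt(n) cells; O(1)  */
        if (M[j] != -1) return M[j];            /* on the PRAM with n processors   */
    return -1;
}

int main(void)
{
    int key[] = { 3, 0, 2, 3, 1 };              /* n = 5 keys in the range [0, 3] */
    printf("%d\n", small_range_max(key, 5, 4)); /* prints 3 */
    return 0;
}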

Student Activity 5.3


Before going to the next section, answer the following questions:
1. Write an algorithm for finding the maximum using n² processors.
2. Describe a method for finding the maximum using n processors.
If your answers are correct, then proceed to the next section.
Top

Merging

The problem of merging is to take two sorted sequences as input and produce a single sorted sequence of all the
elements. Merging is an important problem: an efficient merging algorithm can, for example, lead to an
efficient sorting algorithm. The same is true in parallel computing also. In this section we study the parallel
complexity of merging.

A Logarithmic Time Merging Algorithm
Let X1 = k1, k2, …, km and X2 = km+1, km+2, …, k2m be the input sorted sequences to be merged. Assume without
loss of generality that m is an integral power of 2 and that the keys are distinct. Note that the merging of X1
and X2 can be reduced to computing the rank of each key k in X1 ∪ X2. If we know the rank of each key,
then the keys can be merged by writing the key whose rank is i into global memory cell i. This writing takes
only one time unit if we have n = 2m processors.
For any key k, let its rank in X1 (X2) be denoted r1k (r2k). If k = kj ∈ X1, then note that r1k = j. If we allocate a
single processor π to k, π can perform a binary search on X2 and figure out the number q of keys in X2 that
are less than k. Once q is known, π can compute k's rank in X1 ∪ X2 as j + q. If k belongs to X2, a similar
procedure can be used to compute its rank in X1 ∪ X2. In summary, if we have 2m processors (one processor
per key), merging can be completed in O(log m) time.
Theorem
Merging of two sorted sequences each of length m can be completed in O(log m) time using 2m CREW
PRAM processors.
Since two sorted sequences of length m each can be sequentially merged in Θ(m) time, the speedup of the
above algorithm is Θ(m)/Θ(log m) = Θ(m/log m); its efficiency is Θ(m)/Θ(m log m) = Θ(1/log m). This
algorithm is not work-optimal!
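A sequential sketch of this ranking idea follows (illustrative only; count_less and rank_merge are hypothetical names). On the PRAM, each iteration of the two loops in rank_merge is executed by its own processor, so the whole merge costs one binary search, i.e. O(log m) time, per processor.

#include <stdio.h>

/* Number of entries of the sorted array a[0..m-1] that are smaller than key. */
static int count_less(const int a[], int m, int key)
{
    int lo = 0, hi = m;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (a[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Merge sorted x1[0..m-1] and x2[0..m-1] (distinct keys) into out[0..2m-1]:
   a key's final position is its rank in its own array plus the number of
   smaller keys in the other array. */
void rank_merge(const int x1[], const int x2[], int m, int out[])
{
    for (int j = 0; j < m; j++)                     /* "processors" assigned to X1 */
        out[j + count_less(x2, m, x1[j])] = x1[j];
    for (int j = 0; j < m; j++)                     /* "processors" assigned to X2 */
        out[j + count_less(x1, m, x2[j])] = x2[j];
}

int main(void)
{
    int x1[] = { 2, 5, 8, 11 }, x2[] = { 4, 9, 12, 18 }, out[8];
    rank_merge(x1, x2, 4, out);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);   /* 2 4 5 8 9 11 12 18 */
    printf("\n");
    return 0;
}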

&!
Odd-even merge is a merging algorithm based on divide and conquer that lends itself to efficient
parallelization. If X1 = k1, k2, …, km and X2 = km+1, …, k2m (where m is an integral power of 2) are the two
sorted sequences to be merged, the following algorithm merges them using 2m processors.
Step 0. If m=1, merge the sequences with one comparison.
Step 1. Partition X1 and X2 into their odd and even parts. That is, partition X1 into X1odd=K1,
K3,……,Km–1 and X1even=K2, K4,…..,Km. Similarly, partition X2 into X2odd and X2even.

Step 2. Recursively merge X1odd with X2odd using m processors. Let L1 = l1, l2, …, lm be the
        result. (Note that X1odd, X1even, X2odd and X2even are all in sorted order.) At the same time
        merge X1even with X2even using the other m processors to get L2 = lm+1, lm+2, …, l2m.
Step 3. Shuffle L1 and L2; that is, form the sequence L = l1, lm+1, l2, lm+2, …, lm, l2m. Compare
        every pair (lm+i, li+1) and interchange the two if they are out of order. That is, compare lm+1 with
        l2 and interchange them if need be, compare lm+2 with l3 and interchange them if
        need be, and so on. Output the resultant sequence.
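The following sketch (not part of the original text) follows steps 0-3 literally in a sequential setting; on the PRAM, the two recursive merges of step 2 and all the compare-exchanges of step 3 proceed in parallel.

#include <stdio.h>

/* Odd-even merge: a[0..m-1] and b[0..m-1] are sorted, m is a power of two;
   the merged result is written to out[0..2m-1]. */
void odd_even_merge(const int a[], const int b[], int m, int out[])
{
    if (m == 1) {                                   /* step 0 */
        out[0] = a[0] < b[0] ? a[0] : b[0];
        out[1] = a[0] < b[0] ? b[0] : a[0];
        return;
    }
    int h = m / 2;
    int ao[h], ae[h], bo[h], be[h], l1[m], l2[m];
    for (int i = 0; i < h; i++) {                   /* step 1: odd and even parts       */
        ao[i] = a[2 * i];  ae[i] = a[2 * i + 1];    /* "odd" = 1st, 3rd, ... keys       */
        bo[i] = b[2 * i];  be[i] = b[2 * i + 1];
    }
    odd_even_merge(ao, bo, h, l1);                  /* step 2: recursive merges         */
    odd_even_merge(ae, be, h, l2);                  /* (done in parallel on the PRAM)   */
    for (int i = 0; i < m; i++) {                   /* step 3: shuffle L1 and L2        */
        out[2 * i]     = l1[i];
        out[2 * i + 1] = l2[i];
    }
    for (int i = 1; i + 1 < 2 * m; i += 2)          /* compare-exchange the pairs       */
        if (out[i] > out[i + 1]) {
            int t = out[i]; out[i] = out[i + 1]; out[i + 1] = t;
        }
}

int main(void)
{
    int x1[] = { 2, 5, 8, 11, 13, 16, 21, 25 };
    int x2[] = { 4, 9, 12, 18, 23, 27, 31, 34 };
    int out[16];
    odd_even_merge(x1, x2, 8, out);
    for (int i = 0; i < 16; i++) printf("%d ", out[i]);
    printf("\n");
    return 0;
}

The main program uses the two sequences of the example that follows and prints the merged sequence shown there.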

Let X1 = 2, 5, 8, 11, 13, 16, 21, 25 and X2 = 4, 9, 12, 18, 23, 27, 31, 34. Figure 5.4 shows how the odd-even merge
algorithm merges these two sorted sequences.

X1 = 2, 5, 8, 11, 13, 16, 21, 25              X2 = 4, 9, 12, 18, 23, 27, 31, 34

Odd parts:   X1odd = 2, 8, 13, 21             X2odd = 4, 12, 23, 31
Even parts:  X1even = 5, 11, 16, 25           X2even = 9, 18, 27, 34

Recursive merges:
L1 = 2, 4, 8, 12, 13, 21, 23, 31              L2 = 5, 9, 11, 16, 18, 25, 27, 34

Shuffle:
2, 5, 4, 9, 8, 11, 12, 16, 13, 18, 21, 25, 23, 27, 31, 34

Compare-exchange:
2, 4, 5, 8, 9, 11, 12, 13, 16, 18, 21, 23, 25, 27, 31, 34

Figure 5.4: Odd-even merge applied to X1 and X2

The correctness of the merging algorithm can be established using the zero-one principle. The validity of
this principle is not proved here.
Theorem
[Zero-one principle] If an oblivious comparison-based sorting algorithm correctly sorts every sequence of n
zeros and ones, then it correctly sorts every sequence of n arbitrary keys.

A comparison-based sorting algorithm is said to be oblivious if the sequence of cell positions to be compared by the
algorithm is prespecified. In particular, the next pair of cells to be compared cannot depend on the outcomes
of comparisons made in previous steps.

Student Activity 5.4


Before going to the next section, answer the following questions:
1. Explain the odd-even merge algorithm.
2. Merge the following two sequences using the odd-even merge algorithm:
X1 = 5, 9, 10, 13, 15
X2 = 4, 8, 10, 11, 14
If your answers are correct, then proceed to the next section.
Top

"
Given a sequence of n keys, recall that the problem of sorting is to rearrange this sequence into either
ascending or descending order. In this section we study algorithms for parallel sorting. If we have n²
processors, the rank of each key can be computed in O(log n) time by comparing, in parallel, all possible pairs
of keys. Once we know the rank of each key, the keys can be written in sorted order in one parallel write step
(the key whose rank is i is written into cell i). Thus we have the following theorem.
Theorem
We can sort n keys in O(log n) time using n2 CREW PRAM processors.
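A sequential sketch of sorting by ranking (illustrative only; rank_sort is a hypothetical name) is given below. On the PRAM, the n comparisons for one key are performed by n processors in a single step and the resulting bits are summed in O(log n) time, which gives the bound in the theorem.

#include <stdio.h>

/* The rank of a key is the number of keys smaller than it; writing each key
   into the cell given by its rank produces sorted order (keys assumed distinct). */
void rank_sort(const int k[], int n, int out[])
{
    for (int i = 0; i < n; i++) {
        int rank = 0;
        for (int j = 0; j < n; j++)
            if (k[j] < k[i]) rank++;
        out[rank] = k[i];
    }
}

int main(void)
{
    int k[] = { 7, 3, 11, 2, 9 }, out[5];
    rank_sort(k, 5, out);
    for (int i = 0; i < 5; i++) printf("%d ", out[i]);   /* 2 3 7 9 11 */
    printf("\n");
    return 0;
}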

&! "
Odd-even merge sort employs the classical divide-and-conquer strategy. Assume for simplicity that n is an
integral power of two and that the keys are distinct. If X = k1, k2, …, kn is the given sequence of n keys, it is
partitioned into two subsequences X'1 = k1, k2, …, kn/2 and X'2 = kn/2+1, …, kn of equal length. X'1 and X'2 are
sorted recursively, assigning n/2 processors to each. The two sorted subsequences (call them X1 and X2,
respectively) are then finally merged.
The preceding description is exactly the same as that of sequential merge sort; the difference lies in how the
two sorted subsequences X1 and X2 are merged: we employ the odd-even merge algorithm of the previous section.
Theorem
We can sort n arbitrary keys in O(log2 n) time using n EREW PRAM processors.
Proof : The sorting algorithm is described as follows:
Step 0. If n ≤1, return X.
Step 1. Let X = k1, k2, …, kn be the input. Partition the input into two: X'1 = k1, k2, …, kn/2 and
        X'2 = kn/2+1, …, kn.
Step 2. Allocate n/2 processors to sort X'1 recursively. Let X1 be the result. At the same
        time employ the other n/2 processors to sort X'2 recursively. Let X2 be the result.
Step 3. Merge X1 and X2 using the odd-even merge algorithm and all n = 2m processors.

The algorithm uses n processors. Define T(n) to be the time taken by it to sort n keys using n processors.
Step 1 of the algorithm takes O(1) time, step 2 runs in T(n/2) time, and step 3 takes O(log n) time.
Therefore T(n) satisfies T(n) = O(1) + T(n/2) + O(log n) = T(n/2) + O(log n), which solves to T(n) = O(log² n).

Consider the problem of sorting the 16 numbers 25, 21, 8, 5, 2, 13, 11, 16, 23, 31, 9, 4, 18, 12, 27, 34 using
16 processors. In step 1 of the algorithm, the input is partitioned into two parts:
X'1 = 25, 21, 8, 5, 2, 13, 11, 16 and X'2 = 23, 31, 9, 4, 18, 12, 27, 34. In step 2, processors 1 to 8 work on X'1,
recursively sort it, and obtain X1 = 2, 5, 8, 11, 13, 16, 21, 25. At the same time processors 9 to 16 work on X'2,
sort it, and obtain X2 = 4, 9, 12, 18, 23, 27, 31, 34. In step 3, X1 and X2 are merged as shown in the example of the
previous section to get the final result:
2,4,5,8,9,11,12,13,16,18,21,23,25,27,31,34.
The work done by this algorithm is Θ(n log² n). Therefore its efficiency is Θ(1/log n) and its speedup is
Θ(n/log n).
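A sketch of the whole sort is given below; it reuses the odd_even_merge sketch from the merging section (an assumption: the two sketches are compiled and linked together). The main program sorts the 16 numbers of the example above.

#include <stdio.h>

void odd_even_merge(const int a[], const int b[], int m, int out[]);  /* from the merge sketch */

/* Odd-even merge sort (sequential sketch), for n a power of two. */
void odd_even_merge_sort(int x[], int n)
{
    if (n <= 1) return;                              /* step 0 */
    odd_even_merge_sort(x, n / 2);                   /* step 2: sort both halves      */
    odd_even_merge_sort(x + n / 2, n / 2);           /* (in parallel on the PRAM)     */
    int out[n];
    odd_even_merge(x, x + n / 2, n / 2, out);        /* step 3: odd-even merge        */
    for (int i = 0; i < n; i++) x[i] = out[i];
}

int main(void)
{
    int x[] = { 25, 21, 8, 5, 2, 13, 11, 16, 23, 31, 9, 4, 18, 12, 27, 34 };
    odd_even_merge_sort(x, 16);
    for (int i = 0; i < 16; i++) printf("%d ", x[i]);
    printf("\n");
    return 0;
}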

Student Activity 5.5


Answer the following questions:
1. Describe the odd-even merge sort algorithm.
2. What is the time complexity of sorting n arbitrary keys using n EREW PRAM processors?

" '
In parallel computing a problem is subdivided into many subproblems and submitted to many
processors. The partial solutions are then combined to obtain the final result.
In PRAMs, a number of processors work synchronously and communicate with each other by means of a
common global memory.
Maximum selection can be done in O(log log n) time using n common CRCW PRAM processors.
The problem of merging is to take two sorted sequences as input and produce a single sorted sequence of all the
elements.
Odd-even merge is a merging algorithm based on divide and conquer.

I. True and False


1. Any algorithm designed for multiprocessor machines is called a parallel algorithm.
2. The maximum speed up = 1/(f+p)
II. Fill in the blanks
1. Odd-even merge is based on _____________ technique.

2. The Boolean OR of n bits can be computed in _____________ time on an n-processor common
CRCW PRAM.
3. Finding the maximum of n given numbers can be done in O(1) time using an __________ CRCW
PRAM.

" #

I. True and False


1. True
2. False
II. Fill in the blanks
1. divide and conquer
2. O(1)
3. n2 –processor

Self-Assessment Questions

I. True and False


1. Shared memory models are called PRAMs.
2. Maximum selection can be done in O(log n) time using n common CRCW PRAM processors.
II. Fill in the blanks
1. A parallel algorithm is work-optimal if and only if it has _____.
2. In a CRCW PRAM, if more than one processor tries to read from the same cell, they will read the
________ information.
3. The problem of ________ is to take two sorted sequences as input and produce a sequence of all
the elements.
4. Given a sequence of n keys, the problem of ________ is to rearrange this sequence into either
ascending or descending order.

Review Questions
1. Algorithms A and B are parallel algorithms for solving the problem of finding the maximum element
in a list. Algorithm A uses n^0.5 processors and runs in time Θ(n^0.5). Algorithm B uses n processors
and runs in O(log n) time. Compute the work done, speedups, and efficiencies of these two
algorithms. Are these algorithms work-optimal?
2. Mr. Ultra Smart claims to have found an algorithm for the above problem that runs in time Θ(log n)
using n^(3/4) processors. Is it possible?
3. Present an O(1) time n-processor common CRCW PRAM algorithm for computing the Boolean
AND of n bits.

4. Input is an array of n elements. Give an O(1) time, n-processor common CRCW PRAM algorithm
to check whether the array is in sorted order.
5. Solve the Boolean OR and AND problems on the CRCW and EREW PRAMs. What are the time
and processor bounds of your algorithms?
6. Can exercise (4) be solved in O(1) time using n processors on any of the PRAMs if the keys are
arbitrary? How about if there are n² processors?
7. The algorithm A is a parallel algorithm that has two components. The first runs in θ(log log n) time
using n/log log n EREW PRAM processors. The second component runs in θ(log n) time using
n/logn CREW PRAM processors. Show that the whole algorithm can be run in θ(log n) time using
n/logn CREW PRAM Processors.
8. Present an O(log log n) time algorithm for finding the maximum of n arbitrary numbers using n/log
log n common CRCW PRAM processors.
9. Show that minima computation can be performed in O(log log n) time using n/log log n common
CRCW PRAM processors.
10. Given an array A of n elements, we would like to find the largest i such that A[i] = 1. Give an O(1)
time algorithm for this problem on an n-processor common CRCW PRAM.
11. Given two sorted sequences of length n each, how will you merge them in O(1) time using n²
CRCW PRAM processors?
12. Given two sets A and B of size n each (in the form of arrays), the goal is to check whether the two
sets are disjoint or not. Show how to solve this problem
(a) in O(1) time using n² CRCW PRAM processors;
(b) in O(log n) time using n CRCW PRAM processors.