
UNIT – I

Introduction: Algorithmic Specification - Introduction - Recursive Algorithms - Performance
analysis - Space complexity - Time complexity - Asymptotic notation - Sparse matrices - Polynomials.
Stacks and queues: Stacks - Stacks using dynamic arrays - Queues - Circular queues using
dynamic arrays - Evaluation of expressions - Multiple stacks and queues.
UNIT – II
Linked lists: Singly linked lists and chains - Linked stacks and queues - Polynomials -
Additional list operations.
Trees: Introduction - Binary trees - Binary tree traversals - Threaded binary trees - Heaps -
Binary search trees - Counting binary trees.
UNIT – III
Graphs: The graph abstract data type - Definitions - Graph representation - Elementary graph
operations - Depth first search - Breadth first traversal - Connected components - Spanning trees -
Biconnected components.
Hashing: Introduction - Static hashing - Dynamic hashing.
UNIT – IV
Sorting: Motivation - Insertion sort - Quick sort - Merge sort - Merging - Iterative merge sort -
Recursive merge sort - Heap sort - Sorting on several keys - List and table sorts.
External sorting: Introduction - k-way merging - Buffer handling for parallel operation - Run
generation - Optimal merging of runs.
UNIT - V
Efficient binary search trees: Optimal binary search trees - AVL trees - Red-black trees -
Definition - Representation of a red-black tree - Searching a red-black tree - Inserting into a red-black
tree - Deletion from a red-black tree - Joining red-black trees.
B-trees: Definition and properties - Number of elements in a B-tree - Insertion into B-trees -
Deletion from B-trees.
Text Book:
1. Horowitz, Sahni, Anderson-Freed, "Fundamentals of Data Structures in C", Universities
Press, Second edition, 2008.
Data Structures Page |2

UNIT -1
1. ALGORITHM

DEFINITION:
An algorithm is a finite set of instructions which, if followed, accomplish a particular
task.

Every algorithm must satisfy the following criteria:


(i) INPUT:
There are zero or more quantities which are externally supplied;
(ii) OUTPUT:
At least one quantity is produced;
(iii) DEFINITENESS:
Each instruction must be clear and unambiguous;
(iv) FINITENESS:
If we trace out the instructions of an algorithm, then for all cases the algorithm
will terminate after a finite number of steps;
(v) EFFECTIVENESS:
Every instruction must be sufficiently basic that it can in principle be carried out by a
person using only pencil and paper.
It is not enough that each operation be definite as in (iii); it must also be feasible.

2.HOW TO ANALYZE PROGRAMS?


There are many criteria upon which we can judge a program, for instance:

(i) Does it do what we want it to do?


(ii) Does it work correctly according to the original specifications of the task?
(iii) Is there documentation which describes how to use it and how it works?
(iv) Are subroutines created in such a way that they perform logical sub-functions?
(v) Is the code readable?

There are two criteria related to performance:


Computing time
Storage requirements of the algorithms.

Performance evaluation is divided into 2 major phases:


(a) a priori estimates and
(b) a posteriori testing.
Both of these are equally important.
First consider a priori estimation.
Suppose that somewhere in one of your programs is the statement
X x + 1.
We would like to determine two numbers for this statement.
The amount of time a single execution will take;

The number of times it is executed.


The product of these numbers is the total time taken by this statement; the second number
is called the frequency count.

The following information is needed to compute the time:


(i) the machine we are executing on;
(ii) its machine language instruction set;
(iii) the time required by each machine instruction;
(iv) the translation a compiler will make from the source to the machine language.

Consider the three examples below.

(a)  x ← x + 1

(b)  for i ← 1 to n do
       x ← x + 1
     end

(c)  for i ← 1 to n do
       for j ← 1 to n do
         x ← x + 1
       end
     end

Fig: Three simple programs for frequency counting.

In program (a) we assume that the statement x ← x + 1 is not contained within any loop. Then
its frequency count is one.
In program (b) the same statement will be executed n times, and
in program (c) n² times.

Now 1, n, and n² are said to be different and increasing orders of magnitude, just like 1, 10,
100 would be if we let n = 10.
In determining the order of magnitude, sums such as

    ∑ 1,    ∑ i,    ∑ i²      (each sum taken over 1 <= i <= n)

often occur.
A simple closed form covering all three is

    ∑ iᵏ = nᵏ⁺¹/(k+1) + terms of lower degree,   k >= 0
  1<=i<=n

Let us consider a simple program for computing the n-th Fibonacci number.
The Fibonacci sequence starts as
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
Each new term is obtained by taking the sum of the two previous terms. If we call the first
term of the sequence F0 then F0 = 0, F1 = 1 and
In general,
Fn = Fn-1 + Fn-2, n >= 2.
The program below takes any non-negative integer n and prints the value Fn.

1     procedure FIBONACCI
2       read (n)
3-4     if n < 0 then [print ('error'); stop]
5-6     if n = 0 then [print ('0'); stop]
7-8     if n = 1 then [print ('1'); stop]
9       fnm2 ← 0; fnm1 ← 1
10      for i ← 2 to n do
11        fn ← fnm1 + fnm2
12        fnm2 ← fnm1
13        fnm1 ← fn
14      end
15      print (fn)
16    end FIBONACCI
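The procedure above translates almost line for line into C. The sketch below is ours, not from the text: it returns Fn instead of printing it, keeps the variable names fnm1/fnm2 from the procedure, and leaves the n < 0 error check to the caller.

```c
/* iterative Fibonacci, mirroring steps 9-15 of procedure FIBONACCI */
long fibonacci(int n)
{
    long fnm2, fnm1, fn = 0;
    int i;
    if (n <= 1)
        return n;                /* F0 = 0, F1 = 1 (steps 5-8) */
    fnm2 = 0;                    /* step 9 */
    fnm1 = 1;
    for (i = 2; i <= n; i++) {   /* step 10: body runs n - 1 times */
        fn = fnm1 + fnm2;        /* step 11 */
        fnm2 = fnm1;             /* step 12 */
        fnm1 = fn;               /* step 13 */
    }
    return fn;
}
```

As in the frequency-count analysis below, the loop body executes n - 1 times for n >= 2.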

A complete set would include four cases: n < 0, n = 0, n = 1 and n > 1. Below is a table
which summarizes the frequency counts for the first three cases.

Step n < 0 n=0 n=1


1 1 1 1
2 1 1 1
3 1 1 1
4 1 0 0
5 0 1 1
6 0 1 0
7 0 0 1
8 0 0 1
9-15 0 0 0

When n > 1, steps 1, 2, 3, 5, 7 and 9 will be executed once, but steps 4, 6 and 8 not
at all. Both commands in step 9 are executed once.

Now, for n >= 2, step 10 is executed n times and not n - 1 times. Though 2 to n is only n - 1
executions, remember that there will be a last return to step 10 where i is incremented to n + 1,
the test i > n is made and the branch taken to step 15. Thus, steps 11, 12, 13 and 14 will be executed
n - 1 times but step 10 will be done n times.

We can summarize all of this with a table.

Summary for Execution Count for Computing Fn



STEP   FREQUENCY      STEP   FREQUENCY

  1        1            9        2
  2        1           10        n
  3        1           11       n-1
  4        0           12       n-1
  5        1           13       n-1
  6        0           14       n-1
  7        1           15        1
  8        0           16        1

(Each statement is counted once, so step 9 has 2 statements and is executed once for a total of 2.)
Clearly, the actual time taken by each statement will vary.
The total count then is 5n + 5. We will often write this as O(n),ignoring the two constants 5.
This notation means that the order of magnitude is proportional to n.

The notation f(n) = O(g(n)) (read as f of n equals big-oh of g of n) has a precise mathematical
definition.

Definition: f(n) = O(g(n)) iff there exist two positive constants c and n0 such that |f(n)| <= c|g(n)|
for all n >= n0.

f(n) will normally represent the computing time of some algorithm. When we say that the
computing time of an algorithm is O(g(n)) we mean that its execution takes no more than a
constant times g(n); n is a parameter which characterizes the inputs and/or outputs.
For example, n might be the number of inputs or the number of outputs or their sum or the
magnitude of one of them.
For the Fibonacci program n represents the magnitude of the input and the time for this
program is written as T(FIBONACCI) = O(n).

We write O(1) to mean a computing time which is a constant. O(n) is called linear, O(n²) is
called quadratic, O(n³) is called cubic, and O(2ⁿ) is called exponential. If an algorithm takes
time O(log n) it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n)
is better than O(n²) but not as good as O(n). These seven computing times, O(1), O(log n), O(n),
O(n log n), O(n²), O(n³), and O(2ⁿ), are the ones we will see most often throughout the book.
If we have two algorithms which perform the same task, and the first has a computing time
which is O(n) and the second O(n²), then we will usually take the first as superior.

Rate of Growth of Common Computing Time Functions


log₂n    n    n log₂n     n²      n³           2ⁿ
------------------------------------------------------------------------
  0      1       0         1       1            2
  1      2       2         4       8            4
  2      4       8        16      64           16
  3      8      24        64     512          256
  4     16      64       256    4096        65536
  5     32     160      1024   32768   4294967296

3.SPARSE MATRICES
A matrix is a mathematical object which arises in many physical problems. A sparse matrix
is a matrix in which many entries are zero. A general matrix consists of m rows and n columns of
numbers, as in the figure below.
Figure: Example of 2 matrices

       c1  c2  c3              c1   c2   c3   c4   c5   c6
      ------------            ------------------------------
r1   |  2   3   4 |      r1  |  15    0    0    0   91    0 |
r2   |  1   2   3 |      r2  |   0   11    0    0    0    0 |
r3   |  1   4   5 |      r3  |   0    3    0    0    0   28 |
                         r4  |  22    0   -6    0    0    0 |
                         r5  |   0    0    0    0    0    0 |
                         r6  | -15    0    0    0    0    0 |
The first matrix has three rows and three columns; the second has six rows and six columns (out
of 36 elements only 8 are non-zero; such a matrix is a sparse matrix). In general, we write m x n
(read "m by n") to designate a matrix with m rows and n columns. Such a matrix has mn elements.
When m is equal to n, we call the matrix square.
We store a matrix in a two dimensional array, say A(1:m, 1:n). Now if we look at the second
matrix of the figure above, we see that it has many zero entries. Such a matrix is called sparse.
#include<stdio.h>
#include<conio.h>
void main()
{
    int m, n, i, j, a[10][10], c = 0;
    clrscr();
    printf("enter the no of rows and columns= \n");
    scanf("%d %d", &m, &n);
    printf("enter the matrix= ");
    for (i = 1; i <= m; i++)
    {
        for (j = 1; j <= n; j++)
        {
            scanf("%d", &a[i][j]);
            if (a[i][j] != 0)
                c++;        /* c counts the non-zero elements */
        }
    }
    printf(" sparse matrix=");
    printf("\n \t row\t columns\t element");
    printf("\n \t~~~~\t~~~~~~\t~~~~~~~");
    for (i = 1; i <= m; i++)
    {
        for (j = 1; j <= n; j++)
        {
            if (a[i][j] != 0)
                printf("\n \t %d \t %d \t %d ", i, j, a[i][j]);
        }
    }
    getch();
}
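The program above only prints the non-zero entries; a sparse matrix is more often stored compactly as a list of (row, column, value) triples. A minimal sketch of that representation (the struct name term, the helper to_triples and the fixed 10-column bound are our own choices, not from the text):

```c
#define MAX_TERMS 100

/* one non-zero entry of the matrix */
struct term {
    int row, col, value;
};

/* pack the non-zero entries of the m x n matrix a into t[];
   returns the number of non-zero terms found */
int to_triples(int m, int n, int a[][10], struct term t[])
{
    int i, j, k = 0;
    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            if (a[i][j] != 0) {
                t[k].row = i;
                t[k].col = j;
                t[k].value = a[i][j];
                k++;
            }
    return k;
}
```

For the 6 x 6 matrix in the figure, this stores only 8 triples instead of 36 array entries.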

4.STACK
A stack is an ordered list in which all insertions and deletions are made at one
end, called the top.

One natural example of stacks which arises in computer programming is the processing of
subroutine calls and their returns. Suppose we have a main procedure and three subroutines as
below:

| proc MAIN | | proc A1 | | proc A2 | | proc A3 |


| ___ | | ___ | | ___ | | ___ |
| ___ | | ___ | | ___ | | ___ |
| ___ | | ___ | | ___ | | ___ |
| call A1 | | call A2 | | call A3 | | ___ |
| r: | | s: | | t: | | |
| ___ | | ___ | | ___ | | ___ |
| ___ | | ___ | | ___ | | ___ |
| end | | end | | end | | end |

Figure 2. Sequence of subroutine calls


POLICY: LIFO(last in first out)
OPERATIONS:
Associated with the object stack there are several operations that are necessary:
CREATE (S)
Creates s as an empty stack;
ADD (i,S)
Inserts the element i onto the stack S and returns the new stack;
DELETE (S)
Removes the top element of stack S and returns the new stack;
TOP (S)
Returns the top element of stack S;
ISEMTS (S)
Returns true if S is empty else false;
The ADD and DELETE operations are only a bit more complex.
procedure ADD (item, STACK, n, top)
//insert item into the STACK of maximum size n; top is the number of elements currently in
STACK//
  if top >= n then call STACK_FULL
  top ← top + 1
  STACK(top) ← item
end ADD

procedure DELETE (item, STACK, top)

//removes the top element of STACK and stores it in item unless STACK is empty//
  if top <= 0 then call STACK_EMPTY
  item ← STACK(top)
  top ← top - 1
end DELETE

EXAMPLE:
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
void main()
{
    int n, item, i, y, s[20], ch, top = 0;
    char c1;
    clrscr();
    printf("enter the limit:");
    scanf("%d", &n);
f:  printf("1.push\n");
    printf("2.pop");
    printf("\n3.exit");
    printf("\nenter the choice: ");
    scanf("%d", &ch);
    switch (ch)
    {
    case 1:
        if (top >= n)
        {
            printf("stack full");
            break;
        }
        printf("enter the item to insert\n");
        scanf("%d", &item);
        top = top + 1;
        s[top] = item;
        printf("elements in the stack are...\n");
        for (i = top; i > 0; i--)
            printf("%d ", s[i]);
        break;
    case 2:
        if (top > 0)
        {
            y = s[top];
            top = top - 1;
            printf("one element deleted (deleted element:%d)\n", y);
        }
        if (top <= 0)
            printf("stack empty");
        else
        {
            printf("contents of the stack are....\n");
            for (i = top; i > 0; i--)
                printf("%d ", s[i]);
        }
        break;
    case 3:
        exit(0);
    default:
        printf("invalid choice\n");
        break;
    }
    printf(" do you want to continue(y/n)");
    c1 = getche();
    if ((c1 == 'y') || (c1 == 'Y'))
        goto f;
    else
        getch();
}

5.QUEUE:
DEFINITION:
A queue is an ordered list in which all insertions take place at one end, the rear, while all
deletions take place at the other end, the front. Given a stack S = (a1, ..., an), we say that a1 is
the bottommost element and element ai is on top of element ai-1, 1 < i <= n. When viewed as a
queue with an as the rear element, one says that ai+1 is behind ai, 1 <= i < n.

POLICY: FIFO(first in first out)


OPERATIONS OF QUEUE:
CREATEQ(Q)
Creates Q as an empty queue;
ADDQ(i,Q)
Adds the element i to the rear of a queue and returns the new queue;
DELETEQ(Q)
Removes the front element from the queue Q and returns the resulting queue;
FRONT(Q)
Returns the front element of Q;
ISEMTQ(Q)
Returns true if Q is empty else false.

The following algorithms for ADDQ and DELETEQ result:


procedure ADDQ(item, Q, n, rear)

//insert item into the queue represented in Q(1:n)//
  if rear = n then call QUEUE_FULL
  rear ← rear + 1
  Q(rear) ← item
end ADDQ

procedure DELETEQ(item, Q, front, rear)

//delete an element from a queue//
  if front = rear then call QUEUE_EMPTY
  front ← front + 1
  item ← Q(front)
end DELETEQ

EXAMPLE:
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
void main()
{
    int n, item, i, y, q[20], ch, front = 0, rear = 0;
    char c1;
    clrscr();
    printf("enter the limit:");
    scanf("%d", &n);
f:  printf(" 1.insertion ");
    printf("\n 2.deletion");
    printf("\n 3.exit");
    printf("\n enter the choice: ");
    scanf("%d", &ch);
    switch (ch)
    {
    case 1:
        if (rear >= n)
        {
            printf("overflow");
            break;
        }
        else
        {
            printf("enter the item to insert:\n");
            scanf("%d", &item);
            if (rear == 0)
                front = 1;
            rear = rear + 1;
            q[rear] = item;
        }
        printf(" elements in the queue are...\n");
        for (i = front; i <= rear; i++)
            printf("%d ", q[i]);
        break;
    case 2:
        if ((front > rear) || (rear == 0))
        {
            printf("underflow");
            front = 0;
            rear = 0;
            goto f;
        }
        else
        {
            y = q[front];
            front = front + 1;
            printf(" one element deleted (deleted element: %d )\n", y);
        }
        if (front > rear)
            break;
        else
        {
            printf("contents of the queue are....\n");
            for (i = front; i <= rear; i++)
                printf("%d ", q[i]);
        }
        break;
    case 3:
        exit(0);
    default:
        printf("invalid choice");
        break;
    }
    printf(" do you want to continue(y/n)\n");
    c1 = getche();
    if ((c1 == 'y') || (c1 == 'Y'))
        goto f;
    else
        getch();
}
6.EVALUATION OF EXPRESSIONS
 An expression is made up of operands, operators, and delimiters.
 An example of an expression is X = A/B - C + D*E - A*C.
 Here A, B, C, D, E are one-letter variables; operands can be any legal variable or
constant in our programming language.
 The operations performed on the operands are described by operators.
 First there are the basic arithmetic operators: plus, minus, times, and divide (+, -, *, /).
 There are also the relational operators <, >, <=, >=, =.
 The result of an expression that contains relational operators is one of two constants:
true or false.
 For instance, if A = 4, B = C = 2, D = E = 3, then
A/B - C + D*E - A*C = ((4/2) - 2) + (3*3) - (4*2) = 0 + 9 - 8 = 1.
Expressions are of three types:
Infix expression
Postfix expression
Prefix expression
INFIX EXPRESSION
• The normal way to write an expression.
• Binary operators come in between their left and right operands.
– a * b
– a + b * c
– a * b / c
– (a + b) * (c + d) + e – f/g*h + 3.25
POSTFIX EXPRESSION
• The postfix form of a variable or constant is the same as its infix form.
– a, b, 3.25
• The relative order of operands is the same in infix and postfix forms.
• Operators come immediately after the postfix form of their operands.
Infix = (a + b) / c
Postfix = ab+c/
Postfix Notation
Expressions are converted into postfix notation before the compiler can accept and process them.
X = A / B – C + D * E – A * C
Infix   => A / B – C + D * E – A * C
Postfix => A B / C – D E * + A C * -

Operation         Remaining postfix
T1 = A / B        T1 C – D E * + A C * -
T2 = T1 - C       T2 D E * + A C * -
T3 = D * E        T2 T3 + A C * -
T4 = T2 + T3      T4 A C * -
T5 = A * C        T4 T5 -
T6 = T4 - T5      T6
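The table above can be carried out mechanically with a stack: scan the postfix string left to right, push operands, and on meeting an operator pop its two operands, apply it, and push the result. A minimal sketch for single-digit operands (the function name eval_postfix and the fixed stack size are our own):

```c
#include <ctype.h>

/* evaluate a postfix string of single-digit operands and + - * / */
int eval_postfix(const char *p)
{
    int stack[50], top = -1;
    for (; *p; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';          /* operand: push its value */
        } else {
            int b = stack[top--];             /* right operand */
            int a = stack[top--];             /* left operand  */
            switch (*p) {
            case '+': stack[++top] = a + b; break;
            case '-': stack[++top] = a - b; break;
            case '*': stack[++top] = a * b; break;
            case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                        /* final result left on the stack */
}
```

With A=4, B=C=2, D=E=3 substituted, the example postfix string "42/2-33*+42*-" evaluates to 1, matching the arithmetic shown earlier.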
PREFIX EXPRESSION
• The prefix form of a variable or constant is the same as its infix form: a, b, 3.25
• The relative order of operands is the same in infix and prefix forms.
• Operators come immediately before the prefix form of their operands.
– Infix = a + b ; Postfix = ab+ ; Prefix = +ab
Prefix Examples

• Infix = a + b * c  =>  + a * b c
• Infix = a * b + c  =>  + * a b c
INFIX TO POSTFIX
● Fully parenthesize the expression.
● Move all operators so that they replace their corresponding right parenthesis.
● Delete all parentheses.
Example => A * (B + C) * D

Next token    Stack    Output
none          empty    none
A             empty    A
*             *        A
(             *(       A
B             *(       AB
+             *(+      AB
C             *(+      ABC
)             *        ABC+
*             *        ABC+*
D             *        ABC+*D
done          empty    ABC+*D*

• These examples motivate a priority-based scheme for stacking and unstacking operators.
• When the left parenthesis '(' is not yet in the stack, it behaves as an operator with high priority,
whereas once '(' gets in, it behaves as one with low priority (no operator other than the
matching right parenthesis should cause it to get unstacked).
• Two priorities are therefore assigned to each operator: isp (in-stack priority) and icp (in-coming
priority).
• The isp and icp of all other operators remain unchanged.
• We assume that isp('(') = 8 (the lowest), icp('(') = 0 (the highest),
and isp('#') = 8 (# is the last token, marking the bottom of the stack).
• Resulting rule of priorities:
– Operators are taken out of the stack as long as their isp is numerically less than or equal to
the icp of the new operator.
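Under this priority scheme the conversion itself is a single left-to-right scan. The C sketch below is our own illustration of the isp/icp idea: the exact numeric encoding, the '#' bottom marker and the name to_postfix are our choices, and only single-letter operands, +, -, *, / and parentheses are handled.

```c
/* in-stack priority; smaller number = higher priority */
static int isp(char c)
{
    switch (c) {
    case '*': case '/': return 2;
    case '+': case '-': return 3;
    default:            return 8;   /* '(' in the stack, and '#': the lowest */
    }
}

/* in-coming priority: '(' arrives with the highest priority */
static int icp(char c)
{
    return c == '(' ? 0 : isp(c);
}

/* convert an infix string of single-letter operands to postfix */
void to_postfix(const char *in, char *out)
{
    char stack[50];
    int top = 0, k = 0;
    stack[0] = '#';                      /* bottom-of-stack marker */
    for (; *in; in++) {
        if (*in >= 'a' && *in <= 'z') {
            out[k++] = *in;              /* operands pass straight through */
        } else if (*in == ')') {
            while (stack[top] != '(')    /* unstack down to the matching '(' */
                out[k++] = stack[top--];
            top--;                       /* discard the '(' itself */
        } else {
            /* pop while isp(top) <= icp(new operator), then push */
            while (isp(stack[top]) <= icp(*in))
                out[k++] = stack[top--];
            stack[++top] = *in;
        }
    }
    while (stack[top] != '#')            /* flush the remaining operators */
        out[k++] = stack[top--];
    out[k] = '\0';
}
```

For the earlier example, to_postfix("a/b-c+d*e-a*c", out) produces "ab/c-de*+ac*-", matching the postfix form derived above.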
7.POLYNOMIAL:
A polynomial may also be represented using a linked list. A structure may be defined
such that it contains two parts- one is the coefficient and second is the corresponding exponent.
The structure definition may be given as shown below:
struct polynomial
{
    int coefficient;
    int exponent;
    struct polynomial *next;
};
Thus the above polynomial may be represented using linked list as shown below:

UNIT-2
SINGLY LINKED LIST:
In this type of linked list, two successive nodes are linked together in linear fashion.
 Each node contains the address of the next node to be followed.
 In a singly linked list only linear or forward sequential movement is possible.
 Elements are accessed sequentially; no direct access is allowed.
EXPLANATION:
 It is the most basic type of linked list in C.
 It is a simple sequence of dynamically allocated nodes.
 Each node has a successor and a predecessor.
 The first node does not have a predecessor, while the last node does not have any
successor.
 The last node has its successor reference set to "NULL".
A linked list has 3 types of nodes:
 1. First node
 2. Last node
 3. Intermediate nodes
In a singly linked list access is given in only one direction; thus accessing a singly linked list is
unidirectional. We can have multiple data fields inside a node, but we have only a single "link"
to the next node.
void creat()
{
    char ch;
    struct node *new_node, *current = NULL;  /* current must survive across iterations */
    do
    {
        new_node = (struct node *)malloc(sizeof(struct node));

        printf("\nenter the data : ");
        scanf("%d", &new_node->data);
        new_node->next = NULL;

        if (start == NULL)       /* start is the global head of the list */
        {
            start = new_node;
            current = new_node;
        }
        else
        {
            current->next = new_node;
            current = new_node;
        }
        printf("\ndo you want to create another : ");
        ch = getche();
    } while (ch != 'n');
}
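The same creation idea can be written without keyboard input, which also makes the forward-only sequential access easy to see. This is our own self-contained sketch: it declares its own struct node and the helper names append and length, since the notes do not show the struct definition.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* append a node at the tail, mirroring creat() but without keyboard input */
struct node *append(struct node *start, int data)
{
    struct node *new_node = (struct node *)malloc(sizeof(struct node));
    struct node *current;
    new_node->data = data;
    new_node->next = NULL;
    if (start == NULL)
        return new_node;            /* first node becomes the head */
    for (current = start; current->next != NULL; current = current->next)
        ;                           /* walk to the last node */
    current->next = new_node;
    return start;
}

/* sequential access only: follow the next links until NULL */
int length(struct node *start)
{
    int count = 0;
    for (; start != NULL; start = start->next)
        count++;
    return count;
}
```

Note that both functions can only move forward through the chain, which is exactly the unidirectional access described above.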
INTRODUCTION TO LINKED LIST
 It is a data structure which consists of a group of nodes that forms a sequence.
 It is a very common data structure that is used to create trees, graphs and
other abstract data types.

A linked list comprises a group or list of nodes in which each node has a link to the next node
to form a chain.
LINKED LIST DEFINITION
 A linked list is a series of nodes.
 Each node consists of two parts, viz. a data part & a pointer part.
 The pointer part stores the address of the next node.

A linked list is created using the following elements:

No   Element                  Explanation
1    Node                     A linked list is a collection of a number of nodes
2    Address field in node    The address field in a node is used to keep the address of the next node
3    Data field in node       The data field in a node is used to hold the data inside the linked list
We can represent a linked list in real life using a train, in which all the bogies are nodes and
two coaches are connected using connectors.

POLYNOMIAL:
Representation
Addition
Multiplication
Representation of a Polynomial:
A polynomial is an expression that contains one or more terms. A term is made up of a
coefficient and an exponent. An example of a polynomial is
P(x) = 4x³ + 6x² + 7x + 9
A polynomial thus may be represented using arrays or linked lists. Array representation
assumes that the exponents of the given expression are arranged from 0 to the highest value
(degree), which is represented by the subscript of the array beginning with 0. The coefficients of
the respective exponent are placed at an appropriate index in the array. The array representation
for the above polynomial expression is given below:

A polynomial may also be represented using a linked list. A structure may be defined such
that it contains two parts- one is the coefficient and second is the corresponding exponent. The
structure definition may be given as shown below:
struct polynomial
{
    int coefficient;
    int exponent;
    struct polynomial *next;
};

Thus the above polynomial may be represented using linked list as shown below:

Addition of two Polynomials:


 Adding two polynomials using arrays is a straightforward method, since both
arrays may be added up element-wise, beginning from 0 to n-1, resulting in the addition
of the two polynomials.
 Addition of two polynomials using linked lists requires comparing the exponents;
wherever the exponents are found to be the same, the coefficients are added up.
 For terms with different exponents, the complete term is simply added to the result,
thereby making it a part of the addition result.

Multiplication of two Polynomials:


 Multiplication of two polynomials, however, requires manipulation of each node such
that the exponents are added up and the coefficients are multiplied.
 After each term of the first polynomial is operated upon with each term of the second
polynomial, the result has to be added up by comparing the exponents: adding the
coefficients for similar exponents, and including terms with dissimilar exponents in the
result as they are.
The ‘C’ program for polynomial manipulation is given below:
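A minimal sketch of the addition step only, assuming each list is kept in decreasing order of exponent (the helper attach and the dummy head node are our own devices, not the book's full program):

```c
#include <stdlib.h>

struct polynomial {
    int coefficient;
    int exponent;
    struct polynomial *next;
};

/* append a term after tail; lists stay in decreasing exponent order */
static struct polynomial *attach(struct polynomial *tail, int coef, int exp)
{
    struct polynomial *t = (struct polynomial *)malloc(sizeof *t);
    t->coefficient = coef;
    t->exponent = exp;
    t->next = NULL;
    tail->next = t;
    return t;
}

/* add polynomials a and b term by term, comparing exponents */
struct polynomial *poly_add(struct polynomial *a, struct polynomial *b)
{
    struct polynomial head = {0, 0, NULL}, *tail = &head;
    while (a != NULL && b != NULL) {
        if (a->exponent == b->exponent) {       /* same exponent: add coefficients */
            if (a->coefficient + b->coefficient != 0)
                tail = attach(tail, a->coefficient + b->coefficient, a->exponent);
            a = a->next;
            b = b->next;
        } else if (a->exponent > b->exponent) { /* copy the larger-exponent term */
            tail = attach(tail, a->coefficient, a->exponent);
            a = a->next;
        } else {
            tail = attach(tail, b->coefficient, b->exponent);
            b = b->next;
        }
    }
    for (; a != NULL; a = a->next)              /* copy whatever remains */
        tail = attach(tail, a->coefficient, a->exponent);
    for (; b != NULL; b = b->next)
        tail = attach(tail, b->coefficient, b->exponent);
    return head.next;
}
```

Multiplication can be built on the same helpers: multiply each pair of terms (adding exponents, multiplying coefficients), then merge the partial results with poly_add.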
TREES:
A tree structure means that the data is organized so that items of information are related by
branches.
 One very common place where such a structure arises is in the investigation of
genealogies.
 There are two types of genealogical charts which are used to present such data: the
pedigree and the lineal chart.

Definition:
 A tree is a finite set of one or more nodes such that:
 There is a specially designated node called the root;
 The remaining nodes are partitioned into n >= 0 disjoint sets T1, ...,Tn where each of
these sets is a tree. T1, ...,Tn are called the subtrees of the root.
A node stands for the item of information plus the branches to other items. Consider the tree
in the figure below. This tree has 13 nodes, each item of data being a single letter for convenience. The
root is A and we will normally draw trees with the root at the top. The number of subtrees of a
node is called its degree. The degree of A is 3, of C is 1 and of F is zero. Nodes that have degree
zero are called leaf or terminal nodes. {K,L,F,G,M,I,J} is the set of leaf nodes. Alternatively, the
other nodes are referred to as nonterminals. The roots of the subtrees of a node, X, are the
children of X. X is the parent of its children. Thus, the children of D are H, I, J; the parent of D is
A. Children of the same parent are said to be siblings. H, I and J are siblings. We can extend this
terminology if we need to so that we can ask for the grandparent of M which is D, etc. The
degree of a tree is the maximum degree of the nodes in the tree. The tree of the following figure
has degree 3. The ancestors of a node are all the nodes along the path from the root to that node.
The ancestors of M are A, D and H.

                A
              / | \
             B  C  D
            /|  |  /|\
           E F  G H I J
          /|      |
         K L      M

Binary tree:

Introduction
 We extend the concept of linked data structures to structures containing nodes with
more than one self-referenced field.
 A binary tree is made of nodes, where each node contains a "left" reference, a "right"
reference, and a data element.
 The topmost node in the tree is called the root.
 Every node (excluding a root) in a tree is connected by a directed edge from exactly
one other node. This node is called a parent.
 On the other hand, each node can be connected to a number of further nodes, called
children (at most two in a binary tree).
 Nodes with no children are called leaves, or external nodes.
 Nodes which are not leaves are called internal nodes.
 Nodes with the same parent are called siblings.

MORE TREE TERMINOLOGY:

The depth of a node is the number of edges from the root to the node.
The height of a node is the number of edges from the node to the deepest leaf.
The height of a tree is a height of the root.
A full binary tree is a binary tree in which each node has exactly zero or two
children.
A complete binary tree is a binary tree, which is completely filled, with the possible
exception of the bottom level, which is filled from left to right.

A complete binary tree is a very special tree; it provides the best possible ratio between the
number of nodes and the height. The height h of a complete binary tree with N nodes is at most
O(log N). We can easily prove this by counting nodes on each level, starting with the root and
assuming that each level has the maximum number of nodes:

N = 1 + 2 + 4 + ... + 2^(h-1) + 2^h = 2^(h+1) - 1

Solving this with respect to h, we obtain

h = O(log N)

where the big-O notation hides some superfluous details.



Traversals:
A traversal is a process that visits all the nodes in the tree. Since a tree is a nonlinear data
structure, there is no unique traversal. We will consider several traversal algorithms, which
we group into two kinds:
depth-first traversal and breadth-first traversal.
There are three different types of depth-first traversals:
Preorder traversal - visit the parent first and then the left and right children;
Inorder traversal - visit the left child, then the parent and the right child;
Postorder traversal - visit the left child, then the right child and then the parent.

There is only one kind of breadth-first traversal--the level order traversal. This traversal visits
nodes by levels from top to bottom and from left to right.

As an example consider the following tree and its four traversals:


Preorder - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
Inorder - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
Postorder - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
Levelorder - 8, 5, 4, 9, 7, 11, 1, 12, 3, 2

In the next picture we demonstrate the order of node visitation: number 1 denotes the first
node visited in a particular traversal, and the highest number denotes the last.

These common traversals can be represented as a single algorithm by assuming that we visit
each node three times. An Euler tour is a walk around the binary tree where each edge is treated
as a wall which you cannot cross. In this walk each node will be visited either on the left, or
from below, or on the right. The Euler tour in which we visit nodes on the left produces a
preorder traversal. When we visit nodes from below, we get an inorder traversal. And when
we visit nodes on the right, we get a postorder traversal.
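The three depth-first orders differ only in where the parent is visited relative to the two recursive calls, which a short C sketch makes concrete. Visits are collected into an array rather than printed so the order can be inspected; the names tnode, inorder and preorder are our own.

```c
#include <stdio.h>

struct tnode {
    int data;
    struct tnode *left, *right;
};

/* inorder: left subtree, then parent, then right subtree */
void inorder(struct tnode *t, int out[], int *k)
{
    if (t == NULL)
        return;
    inorder(t->left, out, k);      /* visit the left child first */
    out[(*k)++] = t->data;         /* then the parent            */
    inorder(t->right, out, k);     /* then the right child       */
}

/* preorder: parent first, then left and right children */
void preorder(struct tnode *t, int out[], int *k)
{
    if (t == NULL)
        return;
    out[(*k)++] = t->data;         /* parent before both subtrees */
    preorder(t->left, out, k);
    preorder(t->right, out, k);
}
```

Moving the `out[(*k)++] = t->data;` line after both recursive calls would give the postorder traversal.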

THREADED BINARY TREE
A binary tree is represented using an array representation or a linked list representation. When a
binary tree is represented using the linked list representation, if any node does not have a child we
use a NULL pointer in that position. In any binary tree linked list representation, there are more
NULL pointers than actual pointers. Generally, in any binary tree linked list
representation with 2N reference fields, N+1 of them are filled with NULL
(N+1 are NULL out of 2N). These NULL pointers play no role
except indicating that there is no link (no child).

A. J. Perlis and C. Thornton proposed a new binary tree called the "threaded binary tree",
which makes use of the NULL pointers to improve its traversal processes. In a threaded binary tree,
NULL pointers are replaced by references to other nodes in the tree, called threads.

A threaded binary tree is also a binary tree, in which all left child pointers that are NULL (in
the linked list representation) point to the node's in-order predecessor, and all right child pointers
that are NULL point to the node's in-order successor.
If there is no in-order predecessor or in-order successor, then the pointer points to the root node.
Consider the following binary tree.

To convert the above binary tree into a threaded binary tree, first find the in-order traversal of
that tree.
In-order traversal of the above binary tree:
H-D-I-B-E-A-F-J-C-G
When we represent the above binary tree using the linked list representation, the left child
pointers of nodes H, I, E, F, J and G are NULL. Each NULL is replaced by the address of the node's
in-order predecessor (I to D, E to B, F to A, J to F and G to C), but the node H does not have an
in-order predecessor, so it points to the root node A. Similarly, the right child pointers of nodes
H, I, E, J and G are NULL. These NULL pointers are replaced by the address of the in-order
successor (H to D, I to B, E to A, and J to C), but the node G does not have an in-order successor,
so it points to the root node A.
The example binary tree becomes as follows after converting into a threaded binary tree.
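Since a pointer field alone cannot say whether it holds a real child or a thread, a threaded node carries one flag per side. A minimal sketch of such a declaration (the field names are our own; implementations often pack the flags as one-bit fields):

```c
#include <stddef.h>

/* a threaded binary tree node: a NULL child pointer is reused as a
   thread, and a flag per side records whether the pointer is a real
   child (0) or a thread (1) */
struct tbnode {
    int data;
    struct tbnode *left;
    struct tbnode *right;
    int left_thread;    /* 1: left points to the in-order predecessor */
    int right_thread;   /* 1: right points to the in-order successor  */
};
```

During traversal, a routine follows right pointers: if right_thread is 1 the pointer leads directly to the in-order successor, otherwise the successor is the leftmost node of the right subtree.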

In the above figure,
threads are indicated with dotted links.

Advantages of trees:
Trees are so useful and frequently used because they have some very serious advantages:
 Trees reflect structural relationships in the data.
 Trees are used to represent hierarchies.
 Trees provide efficient insertion and searching.
 Trees are very flexible, allowing subtrees to be moved around with minimum
effort.

#include <stdio.h>
#include<conio.h>
void main()
{
    int c, first, last, middle, n, search, array[100];
    clrscr();
    printf("enter number of elements\n");
    scanf("%d", &n);
    printf("enter %d integers\n", n);
    for (c = 0; c < n; c++)
        scanf("%d", &array[c]);
    printf("enter value to find\n");
    scanf("%d", &search);
    first = 0;
    last = n - 1;
    middle = (first + last) / 2;
    while (first <= last) {
        if (array[middle] < search)
            first = middle + 1;
        else if (array[middle] == search) {
            printf("%d found at location %d.\n", search, middle + 1);
            break;
        }
        else
            last = middle - 1;
        middle = (first + last) / 2;
    }
    if (first > last)
        printf("not found! %d is not present in the list.\n", search);
    getch();
}

UNIT-3
GRAPH:
Graph and its representations
A graph is a data structure that consists of the following two components:
1. A finite set of vertices, also called nodes.
2. A finite set of ordered pairs of the form (u, v), called edges. The pair is ordered because (u,
v) is not the same as (v, u) in the case of a directed graph (di-graph). The pair (u, v) indicates that
there is an edge from vertex u to vertex v. The edges may carry a weight/value/cost.
Graphs are used to represent many real life applications. Graphs are used to represent
networks; the networks may include paths in a city, a telephone network or a circuit network.
Graphs are also used in social networks like LinkedIn and Facebook: for example, in Facebook,
each person is represented with a vertex (or node), and each node is a structure containing
information like person id, name, gender and locale.
Following is an example undirected graph with 5 vertices.

The following two are the most commonly used representations of a graph:


1. Adjacency matrix
2. Adjacency list
There are other representations also, like the incidence matrix and incidence list. The choice of
graph representation is situation-specific. It totally depends on the type of operations to be
performed and ease of use.
Adjacency Matrix:
Adjacency Matrix is a 2D array of size V x V, where V is the number of vertices in the graph. Let
the 2D array be adj[][]; a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex
j. The adjacency matrix for an undirected graph is always symmetric. An adjacency matrix can also be used to
represent weighted graphs: if adj[i][j] = w, then there is an edge from vertex i to vertex j with
weight w.
The adjacency matrix for the above example graph is:
Adjacency Matrix Representation of the above graph


Pros: The representation is easier to implement and follow. Removing an edge takes O(1) time.
Queries such as whether there is an edge from vertex 'u' to vertex 'v' are efficient and can be done
in O(1).
Cons: Consumes more space, O(V^2). Even if the graph is sparse (contains few
edges), it consumes the same space. Adding a vertex takes O(V^2) time.

Adjacency List:
An array of linked lists is used. The size of the array is equal to the number of vertices. Let the array be
array[]. An entry array[i] represents the linked list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph; the weights of edges can be
stored in the nodes of the linked lists. Following is the adjacency list representation of the above graph.

Adjacency List Representation of the above Graph
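A minimal C sketch of this representation follows. It is illustrative only: the structure and function names (Graph, AdjNode, createGraph, addEdge, degree) are our own, not taken from the textbook.

```c
#include <stdlib.h>

/* one node in a vertex's linked list of neighbours */
struct AdjNode {
    int dest;
    struct AdjNode *next;
};

/* the graph: an array of list heads, one per vertex */
struct Graph {
    int V;
    struct AdjNode **head;
};

struct Graph *createGraph(int V) {
    struct Graph *g = malloc(sizeof *g);
    g->V = V;
    g->head = calloc(V, sizeof *g->head);   /* all lists start empty */
    return g;
}

/* undirected edge: add v to u's list and u to v's list */
void addEdge(struct Graph *g, int u, int v) {
    struct AdjNode *a = malloc(sizeof *a);
    a->dest = v; a->next = g->head[u]; g->head[u] = a;
    struct AdjNode *b = malloc(sizeof *b);
    b->dest = u; b->next = g->head[v]; g->head[v] = b;
}

/* length of vertex u's list = its degree */
int degree(struct Graph *g, int u) {
    int d = 0;
    for (struct AdjNode *p = g->head[u]; p != NULL; p = p->next)
        d++;
    return d;
}
```

For a weighted graph, an extra weight field would be stored in each AdjNode, as described above.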

Depth First Traversal:-

The Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a
stack to remember where to get the next vertex to start a search when a dead end occurs in any
iteration.
As in the example given above, the DFS algorithm traverses from A to B to C to D first, then to
E, then to F and lastly to G. It employs the following rules.

 Visit the adjacent unvisited vertex. Mark it visited. Display it. Push it onto the stack.
 If no adjacent vertex is found, pop a vertex from the stack. (This will pop all the vertices
from the stack which do not have unvisited adjacent vertices.)
 Repeat Rule 1 and Rule 2 until the stack is empty.

Step-by-step description (the traversal figure accompanying each step is omitted):

1. Initialize the stack.

2. Mark S as visited and put it onto the stack. Explore any unvisited adjacent node from S. We have three nodes and we can pick any of them. For this example, we shall take the nodes in alphabetical order.

3. Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D are adjacent to A, but we are concerned with unvisited nodes only.

4. Visit D, mark it visited and put it onto the stack. Here we have the nodes B and C, which are adjacent to D, and both are unvisited. But we shall again choose in alphabetical order.

5. We choose B, mark it visited and put it onto the stack. Here B does not have any unvisited adjacent node, so we pop B from the stack.

6. We check the stack top to return to the previous node and check whether it has any unvisited nodes. Here, we find D to be on the top of the stack.

7. The only unvisited node adjacent to D is now C. So we visit C, mark it visited and put it onto the stack.

As C does not have any unvisited adjacent node, we keep popping the stack until we find
a node which has an unvisited adjacent node. In this case, there is none and we keep popping until the
stack is empty.
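The stack-based rules above can be sketched in C over an adjacency matrix. This is an illustrative sketch, not code from the textbook: vertices are numbered 0 to n-1 (lower numbers standing in for alphabetical order), and the names dfs, MAXV and order are ours.

```c
#define MAXV 10

/* Iterative DFS following the three rules above: visit/mark/push an
   unvisited neighbour of the stack top, or pop on a dead end.
   Fills order[] with the visiting sequence and returns its length. */
int dfs(int adj[MAXV][MAXV], int n, int start, int order[]) {
    int visited[MAXV] = {0};
    int stack[MAXV];
    int top = -1, count = 0;

    visited[start] = 1;            /* rule 1: visit, mark, push */
    order[count++] = start;
    stack[++top] = start;

    while (top >= 0) {             /* rule 3: until the stack is empty */
        int v = stack[top];
        int w, found = 0;
        for (w = 0; w < n; w++) {  /* smallest-numbered unvisited neighbour */
            if (adj[v][w] && !visited[w]) { found = 1; break; }
        }
        if (found) {
            visited[w] = 1;        /* rule 1 again */
            order[count++] = w;
            stack[++top] = w;
        } else {
            top--;                 /* rule 2: dead end, pop */
        }
    }
    return count;
}
```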
Breadth First Traversal:-

The Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a
queue to remember where to get the next vertex to start a search when a dead end occurs in any
iteration.

As in the example given above, the BFS algorithm traverses from A to B to E to F first, then to C
and G, and lastly to D. It employs the following rules.

 Visit the adjacent unvisited vertex. Mark it visited. Display it. Insert it into the queue.
 If no adjacent vertex is found, remove the first vertex from the queue.
 Repeat Rule 1 and Rule 2 until the queue is empty.

Step-by-step description (the traversal figure accompanying each step is omitted):

1. Initialize the queue.

2. We start by visiting S (the starting node) and mark it visited.

3. We then see an unvisited adjacent node from S. In this example, we have three nodes, but alphabetically we choose A, mark it visited and enqueue it.

4. The next unvisited adjacent node from S is B. We mark it visited and enqueue it.

5. The next unvisited adjacent node from S is C. We mark it visited and enqueue it.

6. Now S is left with no unvisited adjacent nodes, so we dequeue and find A.

7. From A we have D as an unvisited adjacent node. We mark it visited and enqueue it.

At this stage we are left with no unmarked (unvisited) nodes. But as per the algorithm we keep
on dequeuing in order to get all unvisited nodes. When the queue gets emptied, the program is
over.
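The queue-based rules above can likewise be sketched in C over an adjacency matrix. Again an illustrative sketch with our own names (bfs, MAXV, order); vertices are numbered 0 to n-1, lower numbers standing in for alphabetical order.

```c
#define MAXV 10

/* BFS following the rules above: mark and enqueue each unvisited
   neighbour, dequeue when the current vertex is exhausted.
   Fills order[] with the visiting sequence and returns its length. */
int bfs(int adj[MAXV][MAXV], int n, int start, int order[]) {
    int visited[MAXV] = {0};
    int queue[MAXV];
    int front = 0, rear = 0, count = 0;

    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {         /* until the queue is empty */
        int v = queue[front++];    /* remove the first vertex */
        order[count++] = v;
        for (int w = 0; w < n; w++) {
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;    /* rule 1: mark and enqueue */
                queue[rear++] = w;
            }
        }
    }
    return count;
}
```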

HASHING:

Content:

 Hashing
 Basic operation of hashing
 Static hashing
 Operation of static hashing
 Dynamic hashing
 Operation of dynamic hashing.

Hashing:-

Hashing is a technique to convert a range of key values into a range of indexes of an array.
We are going to use the modulo operator to get a range of key values. Consider an example of a
hash table of size 20, in which the following items are to be stored. Items are in (key, value) format.
 (1,20)
 (2,70)
 (42,80)
 (4,25)
 (12,44)
 (14,32)
 (17,11)
 (13,78)
 (37,98)

S.n.   Key   Hash            Array Index
1      1     1 % 20 = 1      1
2      2     2 % 20 = 2      2
3      42    42 % 20 = 2     2
4      4     4 % 20 = 4      4
5      12    12 % 20 = 12    12
6      14    14 % 20 = 14    14
7      17    17 % 20 = 17    17
8      13    13 % 20 = 13    13
9      37    37 % 20 = 17    17

Linear Probing:

As we can see, it may happen that the hashing technique produces an index of the
array that is already used. In such a case, we can search for the next empty location in the array
by looking into the next cell until we find an empty cell. This technique is called linear probing.
S.n.   Key   Hash            Array Index   After Linear Probing
1      1     1 % 20 = 1      1             1
2      2     2 % 20 = 2      2             2
3      42    42 % 20 = 2     2             3
4      4     4 % 20 = 4      4             4
5      12    12 % 20 = 12    12            12
6      14    14 % 20 = 14    14            14
7      17    17 % 20 = 17    17            17
8      13    13 % 20 = 13    13            13
9      37    37 % 20 = 17    17            18
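The "After Linear Probing" column above can be reproduced with a small C helper. This is an illustrative sketch: the probe function and the occupied[] map are our own names, separate from the hash-table code given later.

```c
#define TABLE_SIZE 20

/* Division hashing with linear probing: start at key % TABLE_SIZE and
   step forward (wrapping around) until a free slot is found, then
   claim it. Returns the slot finally used. */
int probe(int occupied[TABLE_SIZE], int key) {
    int idx = key % TABLE_SIZE;
    while (occupied[idx])
        idx = (idx + 1) % TABLE_SIZE;   /* look into the next cell */
    occupied[idx] = 1;
    return idx;
}
```

Inserting the keys in the order listed (1, 2, 42, 4, 12, 14, 17, 13, 37) yields exactly the array indexes shown in the table: 42 collides with 2 and lands in slot 3, and 37 collides with 17 and lands in slot 18.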

Basic Operations

Following are the basic primary operations of a hash table.

 Search − Searches an element in the hash table.
 Insert − Inserts an element in the hash table.
 Delete − Deletes an element from the hash table.

DataItem

Define a data item having some data, and a key based on which the search is to be conducted in the
hash table.

struct DataItem {

int data;
int key;
};
Hash Method

Define a hashing method to compute the hash code of the key of the data item.

int hashCode(int key) {
   return key % SIZE;
}

Search Operation

Whenever an element is to be searched, compute the hash code of the key passed and locate the
element using that hash code as an index in the array. Use linear probing to move ahead if the
element is not found at the computed hash code.

struct DataItem *search(int key) {
   //get the hash
   int hashIndex = hashCode(key);

   //move in array until an empty cell
   while(hashArray[hashIndex] != NULL) {
      if(hashArray[hashIndex]->key == key)
         return hashArray[hashIndex];

      //go to next cell
      ++hashIndex;

      //wrap around the table
      hashIndex %= SIZE;
   }

   return NULL;
}

Insert Operation

Whenever an element is to be inserted, compute the hash code of the key passed and locate the
index using that hash code as an index in the array. Use linear probing to find an empty location
if an element already exists at the computed hash code.

void insert(int key, int data) {
   struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
   item->data = data;
   item->key = key;

   //get the hash
   int hashIndex = hashCode(key);

   //move in array until an empty or deleted cell
   while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1) {
      //go to next cell
      ++hashIndex;

      //wrap around the table
      hashIndex %= SIZE;
   }

   hashArray[hashIndex] = item;
}

Delete Operation

Whenever an element is to be deleted, compute the hash code of the key passed and locate the
index using that hash code as an index in the array. Use linear probing to move ahead if an
element is not found at the computed hash code. When found, store a dummy item there to keep
the performance of the hash table intact.

struct DataItem* delete(struct DataItem* item) {
   int key = item->key;

   //get the hash
   int hashIndex = hashCode(key);

   //move in array until an empty cell
   while(hashArray[hashIndex] != NULL) {
      if(hashArray[hashIndex]->key == key) {
         struct DataItem* temp = hashArray[hashIndex];

         //assign a dummy item at deleted position
         hashArray[hashIndex] = dummyItem;
         return temp;
      }

      //go to next cell
      ++hashIndex;

      //wrap around the table
      hashIndex %= SIZE;
   }

   return NULL;
}

Static Hashing

In static hashing, when a search-key value is provided, the hash function always computes the
same address. For example, if the mod-4 hash function is used, it generates only 4 values (0
through 3). The output address is always the same for a given key. The number of buckets provided
remains unchanged at all times.

Operation

 Insertion − When a record is required to be entered using static hash, the hash function
h computes the bucket address for search key K, where the record will be stored.

Bucket address = h(K)


 Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
 Delete − This is simply a search followed by a deletion operation.

Bucket Overflow

The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.

 Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.

 Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
Dynamic Hashing

The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets
are added and removed dynamically and on demand. Dynamic hashing is also known as
extended hashing. In dynamic hashing, the hash function is made to produce a large number of
values, of which only a few are used initially.

Organization
The prefix of the entire hash value is taken as a hash index. Only a portion of the hash value is
used for computing bucket addresses. Every hash index has a depth value to signify how many
bits are used for computing the hash function. These bits can address 2^n buckets. When all these
bits are consumed − that is, when all the buckets are full − then the depth value is increased
linearly and twice as many buckets are allocated.
Operation
 Querying − Look at the depth value of the hash index and use those bits to compute the bucket
address.
 Update − Perform a query as above and update the data.
 Deletion − Perform a query to locate the desired data and delete it.
 Insertion − Compute the address of the bucket.
o If the bucket is already full:
 Add more buckets.
 Add additional bits to the hash value.
 Re-compute the hash function.
o Else:
 Add data to the bucket.
o If all the buckets are full, perform the remedies of static hashing.

Hashing is not favorable when the data is organized in some ordering and the queries require a
range of data. When data is discrete and random, hash performs the best.
UNIT-4

Sorting:
 Sorting is the process of re-arranging a given set of objects in a specific order.

 The purpose of sorting is to facilitate searching a specific object in the sorted list.

Two broad categories of sorting methods are:

 internal methods, i.e., methods to be used when the file to be sorted is small enough
so that the entire sort can be carried out in main memory;
 external methods, i.e., methods to be used on larger files.

In this chapter we shall study the following internal sorting methods:


 Insertion sort
 Quick sort
 Merge sort
 Heap sort
 Radix sort

INSERTION SORT:
Suppose an array 'A' with 'n' elements A(1), A(2), ..., A(n) is in memory.
The insertion sort algorithm scans A from A(1) to A(n), inserting each element A(k) into
its proper position in the previously sorted subarray A(1), A(2), ..., A(k-1).
The problem is deciding how to insert A(k) in its proper place in the sorted subarray
A(1), A(2), ..., A(k-1). This is accomplished by comparing A(k) with A(k-1), comparing A(k)
with A(k-2), comparing A(k) with A(k-3) and so on, until meeting an element A(j) such that
A(j) <= A(k). Then each of the elements A(k-1), A(k-2), ..., A(j+1) is moved forward one location, and
A(k) is then inserted in the (j+1)th position in the array.

Algorithm for insertion sort:

Algorithm insertion(A, n)
// this algorithm sorts the array A with 'n' elements //
A(0) = -∞
for k = 2 to n
{
    temp = A(k)
    pos = k - 1
    while (temp < A(pos))
    {
        A(pos+1) = A(pos)
        pos = pos - 1
    }
    A(pos+1) = temp
}
return

Eg: Suppose an array A contains 8 elements as follows: 77, 33, 44, 11, 88, 22, 66, 55

Insertion sort for n = 8 items:
The circled element indicates A(k) in each pass of the algorithm.

K A(0) A(1) A(2) A(3) A(4) A(5) A(6) A(7) A(8)


1 | -∞ 77 33 44 11 88 22 66 55
2 | -∞ 77 33 44 11 88 22 66 55
3 | -∞ 33 77 44 11 88 22 66 55
4 | -∞ 33 44 77 11 88 22 66 55
5 | -∞ 11 33 44 77 88 22 66 55
6 | -∞ 11 33 44 77 88 22 66 55
7 | -∞ 11 22 33 44 77 88 66 55
8 | -∞ 11 22 33 44 66 77 88 55
Sorted -∞ 11 22 33 44 55 66 77 88
Array

Complexity of Insertion Sort


The number of comparisons f(n) in insertion sort can be easily computed.
The worst case occurs when array A is in reverse order and the inner loop must use the
maximum number k-1 of comparisons.

f(n) = 1 + 2 + ... + (n-1)
     = n(n-1)/2
     = O(n^2)

In the average case, there will be approximately (k-1)/2 comparisons in the inner loop, so
f(n) = 1/2 + 2/2 + ... + (n-1)/2 = n(n-1)/4
     = O(n^2)
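A C version of the algorithm above may look as follows. This is a sketch with 0-based indexing, so the sentinel A(0) = -∞ is replaced by an explicit bound check on pos.

```c
/* insertion sort: grow a sorted prefix a[0..k-1], inserting a[k]
   into its proper place on each pass */
void insertion_sort(int a[], int n) {
    for (int k = 1; k < n; k++) {
        int temp = a[k];
        int pos = k - 1;
        /* shift elements larger than temp one place to the right */
        while (pos >= 0 && a[pos] > temp) {
            a[pos + 1] = a[pos];
            pos--;
        }
        a[pos + 1] = temp;   /* drop temp into the gap */
    }
}
```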
QUICKSORT
 The quick sort scheme developed by C. A. R. Hoare has the best average behavior
among all the sorting methods we shall be studying.
 The purpose of quick sort is to move a data item in the correct direction just enough
for it to reach its final place in the array. This method reduces unnecessary swaps and
moves an item a great distance in one move.

In insertion sort, the key Ki currently controlling the insertion is placed into the right spot
with respect to the sorted subfile (R1, ..., Ri-1).

In quicksort, the key Ki controlling the process is placed at the right spot with respect to
the whole file.

Approach used in quick sort:


The divide-and-conquer approach can be used to arrive at an efficient sorting method
different from merge sort.

In merge sort, the file a(1:n) was divided at its midpoint into subarrays which were
independently sorted and later merged.

In quick sort, the division into two subarrays is made so that the sorted subarrays do not
need to be merged later.
This is accomplished by rearranging the elements in a(1:n) such that a(i) ≤ a(j) for all 'i'
between 1 and m and all 'j' between m+1 and 'n', for some 'm', 1 ≤ m ≤ n.
Thus the elements in a(1:m) and a(m+1:n) can be independently sorted. No merge is
needed.

Partition the array a(m:p-1) about a(m)


Algorithm partition(a, m, p)
// partitions the segment a(m : p-1) about the pivot v = a(m); 'm' is the starting position and 'p' is one past the end position //
{
    v = a(m); i = m; j = p
    repeat
    {
        repeat
            i = i + 1
        until (a(i) ≥ v)
        repeat
            j = j - 1
        until (a(j) ≤ v)
        if (i < j) then interchange(a, i, j)
    } until (i ≥ j)
    a(m) = a(j)
    a(j) = v
    return j
}

Algorithm interchange(a, i, j)    // swap a(i) and a(j) //
{
    p = a(i)
    a(i) = a(j)
    a(j) = p
}

Sorting

Algorithm quicksort(p, q)
// p is the starting and q is the end position of the array a //
if (p < q) then    // if there is more than one element //
{
    j = partition(a, p, q+1)    // divide into two subproblems //
    // solve the subproblems //
    quicksort(p, j-1)
    quicksort(j+1, q)
}

Eg: We have to sort 45 30 10 50 15 12 35 using Quick sort

(1) (2) (3) (4) (5) (6) (7) i j


45 30 10 50 15 12 35 4 7
45 30 10 35 15 12 50 7 6
[12 30 10 35 15] 45 [50] 2 5
12 10 30 35 15 [45] [50] 3 2
[10] 12 [30 35 15] 45 50 4 5
10 12 [30 15 35] 45 50 5 4
10 12 [15] 30 [35] 45 50
10 12 15 30 35 45 50 // sorted list //

Complexity of Quicksort
Worst case : O(n^2)
Average computing time: O(n log2 n)
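The partition and quicksort algorithms above translate to C roughly as follows. This sketch uses 0-based indexing; as in the pseudocode, partition(a, m, p) works on the segment a[m..p-1] with a[m] as the pivot, and a bounds check on i is added in place of a sentinel.

```c
void interchange(int a[], int i, int j) {   /* swap a[i] and a[j] */
    int t = a[i]; a[i] = a[j]; a[j] = t;
}

/* partition a[m..p-1] about the pivot v = a[m]; returns the pivot's
   final position j, with a[m..j-1] <= v <= a[j+1..p-1] */
int partition(int a[], int m, int p) {
    int v = a[m], i = m, j = p;
    for (;;) {
        do { i++; } while (i < p && a[i] < v);
        do { j--; } while (a[j] > v);
        if (i >= j) break;
        interchange(a, i, j);
    }
    interchange(a, m, j);   /* put the pivot into place */
    return j;
}

void quicksort(int a[], int p, int q) {   /* sorts a[p..q] */
    if (p < q) {
        int j = partition(a, p, q + 1);
        quicksort(a, p, j - 1);
        quicksort(a, j + 1, q);
    }
}
```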

The quick sort uses divide and conquer to gain the same advantages as the merge sort,
while not using additional storage. As a trade-off, however, it is possible that the list may not be
divided in half. When this happens, we will see that performance is diminished.
A quick sort first selects a value, which is called the pivot value. Although there are many
different ways to choose the pivot value, we will simply use the first item in the list. The role of
the pivot value is to assist with splitting the list. The actual position where the pivot value
belongs in the final sorted list, commonly called the split point, will be used to divide the list
for subsequent calls to the quick sort.
In our example, 54 serves as the first pivot value. Since we have looked at this example a few
times already, we know that 54 will eventually end up in the position currently holding 31. The
partition process will happen next. It will find the split point and at the same time move other
items to the appropriate side of the list, either less than or greater than the pivot value.

Partitioning begins by locating two position markers—let’s call them leftmark and
rightmark—at the beginning and end of the remaining items in the list (positions 1 and 8 in
Figure 13). The goal of the partition process is to move items that are on the wrong side with
respect to the pivot value while also converging on the split point. Figure 13 shows this process
as we locate the position of 54.

We begin by incrementing leftmark until we locate a value that is greater than the pivot
value. We then decrement rightmark until we find a value that is less than the pivot value. At
this point we have discovered two items that are out of place with respect to the eventual split
point. For our example, this occurs at 93 and 20. Now we can exchange these two items and
then repeat the process again.
At the point where rightmark becomes less than leftmark, we stop. The position of
rightmark is now the split point. The pivot value can be exchanged with the contents of the split
point and the pivot value is now in place (Figure 14). In addition, all the items to the left of the
split point are less than the pivot value, and all the items to the right of the split point are greater
than the pivot value. The list can now be divided at the split point and the quick sort can be
invoked recursively on the two halves.

HEAP SORT

While the Merge Sort scheme has a computing time of O(n log n) both in the worst case and
as average behavior, it requires additional storage proportional to the number of records in the
file being sorted.
But heap sort will require only a fixed amount of additional storage and at the same time
will have as its worst case and average computing time O(n log n).

Heap
Suppose H is a binary tree with 'n' elements. H is called a heap if each node N of H has the
following property:
the value at N is greater than or equal to the value at each of the children of N, and hence
the value at N is greater than or equal to the value at any of the descendants of N.

Partial Order Tree


A tree T is a partial order tree if and only if the key at any node is greater than or equal to
the keys at each of its children.

Heap Sort Strategies


If the elements to be sorted are arranged in a heap, then we can build a sorted sequence in
reverse order by repeatedly removing the element from the root, and rearranging the elements
still in the heap to reestablish the partial order tree property, thus bringing the next largest key
to the root.

The initial file should be arranged as a complete binary tree, with x1 at the root, x2 and x3 as its
children, x4 through x7 on the next level, and so on down to xn-2, xn-1, xn.

Fig: A complete binary tree

[The parent of the node at location i is at ⌊i/2⌋, the left child at 2i and the right child at 2i + 1]
Heap Sort Algorithm
Suppose an array A with ‘n’ elements is given. The heap sort algorithm to sort A consists of
the two following phases:
Phase I : Build a heap H out of the elements of A
Phase II : Repeatedly delete the root element of H. Since the root of H always contains the
largest element in H, Phase II deletes the elements of A in decreasing order.

Building a Heap
A heap is defined to be a complete binary tree with the property that the value of each node
is at least as large as the value of its children nodes (if they exist) .
This implies that the root of the heap has the largest key in the tree. In the second stage the
output sequence is generated in decreasing order by successively outputting the root and
restructuring the remaining tree into a heap.

Now, delete the elements one by one, adding each deleted element to the sorted list. (Each heap
below is written level by level, root first; the original tree diagrams are omitted.)

Stage I: Delete item 77. The new heap is [60 | 55, 55 | 50, 30, 45, 20].
Sorted array : 77

Stage II: Delete item 60. The heap is [55 | 55, 45 | 50, 30, 20].
Sorted array : 77, 60

Stage III: Delete item 55. The heap is [55 | 50, 45 | 20, 30].
Sorted array : 77, 60, 55

Stage IV: Delete item 55. The heap is [50 | 30, 45 | 20].
Sorted array : 77, 60, 55, 55

Stage V: Delete item 50. The heap is [45 | 30, 20].
Sorted array : 77, 60, 55, 55, 50

Stage VI: Delete item 45. The heap is [30 | 20].
Sorted array : 77, 60, 55, 55, 50, 45

Stage VII: Delete item 30. The heap is [20].
Sorted array : 77, 60, 55, 55, 50, 45, 30

Stage VIII: Delete item 20.
Sorted array : 77, 60, 55, 55, 50, 45, 30, 20

Essential to any algorithm for Heap Sort is a subalgorithm that takes a binary tree T whose
left and right subtrees satisfy the heap property but whose root may not and adjusts T so that the
entire binary tree satisfies the heap property. Algorithm ADJUST does this.

Procedure ADJUST (i,n)
//Adjust the binary tree with root i to satisfy the heap property. The left and right subtrees
of i, i.e., with roots 2i and 2i + 1, already satisfy the heap property. The nodes of the trees
contain records R with keys K. No node has index greater than n//

R ← Ri; K ← Ki; j ← 2i
while j ≤ n do
    if j < n and Kj < Kj+1 then j ← j + 1    //find max of left and right child//
    //compare max child with K; if K is max then done//
    if K ≥ Kj then [R⌊j/2⌋ ← R; exit]
    R⌊j/2⌋ ← Rj; j ← 2j    //move Rj up the tree//
end
R⌊j/2⌋ ← R
end ADJUST

ANALYSIS OF ALGORITHM ADJUST


If the depth of the tree with root i is k, then the while loop is executed at most k times.
Hence the computing time of the algorithm is O(k).
The heap sort algorithm may now be stated.

Procedure HSORT (R,n)
//The file R = (R1, ..., Rn) is sorted into nondecreasing order of the key K//

for i ← ⌊n/2⌋ to 1 by -1 do    //convert R into a heap//
    call ADJUST (i,n)
end
for i ← n - 1 to 1 by -1 do    //sort R//
    T ← Ri+1; Ri+1 ← R1; R1 ← T    //interchange R1 and Ri+1//
    call ADJUST (1,i)    //recreate heap//
end
end HSORT

Analysis of Algorithm HSORT

Suppose 2^(k-1) ≤ n < 2^k, so that the tree has k levels and the number of nodes on level i is 2^(i-1).
In the first for loop, ADJUST is called once for each node that has a child. Hence the time
required for this loop is the sum, over each level, of the number of nodes on a level times the
maximum distance the node can move. This is no more than O(n). In the next for loop, n - 1
applications of ADJUST are made with maximum depth k = ⌈log2(n + 1)⌉. Hence the computing
time for this loop is O(n log n). Consequently, the total computing time is O(n log n).
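ADJUST and HSORT can be sketched in C with 0-based indexing, where the children of node i are at 2i+1 and 2i+2 and the parent of node j is (j-1)/2; the names adjust and heap_sort are ours.

```c
/* sift the record at a[i] down within the heap a[0..n-1], assuming
   both subtrees of i already satisfy the heap property */
void adjust(int a[], int i, int n) {
    int k = a[i];          /* the key being positioned */
    int j = 2 * i + 1;     /* left child of i */
    while (j < n) {
        if (j + 1 < n && a[j] < a[j + 1])
            j++;                    /* max of left and right child */
        if (k >= a[j])
            break;                  /* heap property holds: done */
        a[(j - 1) / 2] = a[j];      /* move the child up */
        j = 2 * j + 1;
    }
    a[(j - 1) / 2] = k;
}

void heap_sort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)   /* phase I: build the heap */
        adjust(a, i, n);
    for (int i = n - 1; i > 0; i--) {      /* phase II: delete the root */
        int t = a[i]; a[i] = a[0]; a[0] = t;
        adjust(a, 0, i);                   /* recreate the heap */
    }
}
```

This sorts in place in ascending order; reading the array backwards gives the decreasing deletion order shown in the stages above.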

RADIX SORT

Radix sort is an algorithm that sorts numbers by processing individual digits. n numbers
consisting of k digits each are sorted in O(n · k) time. Radix sort can process the digits of each
number starting either from the least significant digit (LSD) or from the most significant digit
(MSD). The LSD algorithm first sorts the list by the least significant digit while preserving their
relative order using a stable sort. Then it sorts them by the next digit, and so on from the least
significant to the most significant, ending up with a sorted list.

While the LSD radix sort requires the use of a stable sort, the MSD radix sort algorithm
does not (unless stable sorting is desired). It is common for a stable sort (such as counting sort)
to be used internally by the radix sort. A hybrid sorting approach, such as using a simpler sort
for small bins, improves the performance of radix sort significantly.

Radix Sort is a clever and intuitive little sorting algorithm. Radix Sort puts the elements in
order by comparing the digits of the numbers.
Consider the following 9 numbers:


493 812 715 710 195 437 582 340 385
We should start sorting by comparing and ordering the one's digits:
Digit Sublist
0 340 710
1
2 812 582
3 493
4
5 715 195 385
6
7 437
8
9

Notice that the numbers were added onto the list in the order that they were found, which is
why the numbers appear to be unsorted in each of the sublists above. Now, we gather the
sublists (in order from the 0 sublist to the 9 sublist) into the main list again:
340 710 812 582 493 715 195 385 437
Note: The order in which we divide and reassemble the list is extremely important, as this is
one of the foundations of this algorithm.

Now, the sublists are created again, this time based on the ten's digit:
Digit  Sublist
0
1      710 812 715
2
3      437
4      340
5
6
7
8      582 385
9      493 195
Now the sublists are gathered in order from 0 to 9:

710 812 715 437 340 582 385 493 195

Finally, the sublists are created according to the hundred's digit:
Digit  Sublist
0
1      195
2
3      340 385
4      437 493
5      582
6
7      710 715
8      812
9

At last, the list is gathered up again:
195 340 385 437 493 582 710 715 812

And now we have a fully sorted array! Radix Sort is very simple, and a computer can do it fast.
When it is programmed properly, Radix Sort is in fact one of the fastest sorting algorithms for
numbers or strings of letters.

Disadvantages
Still, there are some tradeoffs for Radix Sort that can make it less preferable than other
sorts. The speed of Radix Sort largely depends on the inner basic operations, and if the
operations are not efficient enough, Radix Sort can be slower than some other algorithms such
as Quick Sort and Merge Sort. These operations include the insert and delete functions of the
sublists and the process of isolating the digit you want.
In the example above, the numbers were all of equal length, but many times, this is not the
case. If the numbers are not of the same length, then a test is needed to check for additional
digits that need sorting. This can be one of the slowest parts of Radix Sort, and it is one of the
hardest to make efficient.
Radix Sort can also take up more space than other sorting algorithms, since in addition to
the array that will be sorted, you need to have a sublist for each of the possible digits or letters.
If you are sorting pure English words, you will need at least 26 different sublists, and if you are
sorting alphanumeric words or sentences, you will probably need more than 40 sublists in
all! Since Radix Sort depends on the digits or letters, Radix Sort is also much less flexible than
other sorts. For every different type of data, Radix Sort needs to be rewritten, and if the sorting
order changes, the sort needs to be rewritten again. In short, Radix Sort takes more time to
write, and it is very difficult to write a general purpose Radix Sort that can handle all kinds of
data.

Conclusion: In practice, radix sort is fast for large inputs, as well as simple to code and
maintain. For many programs that need a fast sort, Radix Sort is a good choice. Still, there are
faster sorts, which is one reason why Radix Sort is not used as much as some other sorts.
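The scatter/gather passes described above can be sketched in C for non-negative integers. The fixed bin capacity MAXN and the function name radix_sort are assumptions of this sketch.

```c
#define BINS 10   /* one sublist per decimal digit */
#define MAXN 100  /* capacity of each sublist (assumed bound) */

void radix_sort(int a[], int n) {
    int max = 0;
    for (int i = 0; i < n; i++)
        if (a[i] > max) max = a[i];

    /* one pass per digit, least significant first */
    for (int exp = 1; max / exp > 0; exp *= 10) {
        int bin[BINS][MAXN];
        int count[BINS] = {0};
        for (int i = 0; i < n; i++) {          /* scatter into sublists */
            int d = (a[i] / exp) % 10;
            bin[d][count[d]++] = a[i];
        }
        int k = 0;                             /* gather in order 0..9 */
        for (int d = 0; d < BINS; d++)
            for (int i = 0; i < count[d]; i++)
                a[k++] = bin[d][i];
    }
}
```

Because elements are appended to each sublist in the order encountered, every pass is stable, which is exactly what the LSD scheme requires.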
2-WAY MERGE SORT


 Merging of two ordered lists into a single ordered list is called 2-way merge sort.
Merge sort can be done in two ways:
Recursive approach .
Non-recursive approach.

1. The recursive formulation of merge sort

Given a sequence of 'n' elements a(1), a(2), ..., a(n), the general idea is to imagine them split
into two sets a(1), a(2), ..., a(n/2) and a(n/2 + 1), ..., a(n).

Each set is individually sorted, and the resulting sorted sequences are merged to produce a
single sorted sequence of 'n' elements. Thus we have an example of the divide-and-conquer
strategy in which the splitting is into two equal-sized sets and the combining operation is the
merging of two sorted sets into one. This is the recursive approach.

Recursive Formulation of Merge Sort.


Merge sort may also be arrived at recursively. In the recursive formulation we divide the
file to be sorted into two roughly equal parts called the left and the right subfiles. These subfiles
are sorted using the algorithm recursively and then the two subfiles are merged together to
obtain the sorted file. First, let us consider an example:
The input file (26, 5, 77, 1, 61, 11, 59, 15, 48, 19) is to be sorted using the recursive formulation
of 2-way merge sort. If the subfile from l to u is currently to be sorted, then its two subfiles are
indexed from l to ⌊(l + u)/2⌋ and from ⌊(l + u)/2⌋ + 1 to u. The sequence of splits and merges that
takes place is shown below; the subfile partitioning is described by the binary tree of calls that follows.
(26 | 5 | 77 | 1 | 61 | 11 59 15 48 19)
( 5 , 26 | 77 | 1 | 61 | 11 59 15 48 19)
( 5 , 26 , 77 | 1 | 61 | 11 59 15 48 19)
( 5 , 26 , 77 | 1 , 61 | 11 59 15 48 19)
( 1, 5 , 26 , 61 ,77 | 11 59 15 48 19)
( 1, 5 , 26 , 61 ,77 | 11 | 59 | 15 | 48 | 19)
( 1, 5 , 26 , 61 ,77 | 11 , 59 | 15 | 48 | 19)
( 1, 5 , 26 , 61 ,77 | 11 , 15, 59 | 48 | 19)
( 1, 5 , 26 , 61 ,77 | 11 , 15, 59 | 19, 48)
( 1, 5 , 26 , 61 , 77 | 11 , 15 , 19 , 48 , 59)
(1, 5, 11, 15, 19, 26, 48, 59, 61, 77)

Tree of calls of merge sort (1,10):

                            1,10
               1,5                        6,10
         1,3         4,5            6,8          9,10
      1,2   3,3   4,4   5,5      6,7   8,8    9,9   10,10
    1,1 2,2                    6,6 7,7

Each record is assumed to have two fields, LINK and KEY. LINK(i) and KEY(i) are the
link and key value fields in record i, 1 ≤ i ≤ n. We assume that initially LINK(i) = 0, 1 ≤ i ≤ n. Thus
each record is initially in a chain containing only itself. Let Q and R be pointers to two chains of
records. The records on each chain are assumed linked in nondecreasing order of the key field.
Let RMERGE(Q,R,P) be an algorithm to merge the two chains Q and R to obtain P, which is
also linked in nondecreasing order of key values. Then the recursive version of merge sort is
given by algorithm RMSORT. To sort the records X1, ..., Xn this algorithm is invoked as
call RMSORT(X,1,n,P). P is returned as the start of a chain ordered as described earlier.

Procedure RMSORT(X,l,u,P)
//The file X = (Xl, ..., Xu) is to be sorted on the field KEY. LINK is a link field in each
record and is initially set to 0. The sorted file is a chain beginning at P//
if l ≥ u then P ← l
else [mid ← ⌊(l + u)/2⌋
      call RMSORT(X,l,mid,Q)
      call RMSORT(X,mid + 1,u,R)
      call RMERGE(Q,R,P)]
end RMSORT

The algorithm RMERGE below uses a dummy record with index d. It is assumed that d is
provided externally and that d is not one of the valid indexes of records, i.e., d is not one of the
numbers 1 through n.

Procedure RMERGE(X,Y,Z)
//The linked files X and Y are merged to obtain Z. KEY(i) denotes the key field and LINK(i)
the link field of record i. In X, Y and Z the records are linked in order of nondecreasing KEY
values. A dummy record with index d is made use of. d is not a valid index in X or Y//

i ← X; j ← Y; z ← d
while i ≠ 0 and j ≠ 0 do
    if KEY(i) ≤ KEY(j) then [LINK(z) ← i
                             z ← i; i ← LINK(i)]
    else [LINK(z) ← j
          z ← j; j ← LINK(j)]
end

//move remainder//
if i = 0 then LINK(z) ← j
else LINK(z) ← i
Z ← LINK(d)
end RMERGE

The computing time is O(n log n).

2. The non recursive formulation of merge sort

Two files (X1, ...,Xm) and (Xm+1, ...,Xn) that are already sorted are merged to get a third file
(Zl, ...,Zn) that is also sorted.

procedure MERGE(X,l,m,n,Z)
//(Xl, ...,Xm) and (Xm+1, ...,Xn) are two sorted files with keys xl ≤ ... ≤ xm and xm+1 ≤ ... ≤ xn.
They are merged to obtain the sorted file (Zl, ...,Zn) such that zl ≤ ... ≤ zn//

i ← k ← l; j ← m + 1   //i, j and k are positions in the three files//

while i ≤ m and j ≤ n do
    if xi ≤ xj then [Zk ← Xi; i ← i + 1]
    else [Zk ← Xj; j ← j + 1]
    k ← k + 1
end
if i > m then (Zk, ...,Zn) ← (Xj, ...,Xn)
else (Zk, ...,Zn) ← (Xi, ...,Xm)
end MERGE
Analysis of Algorithm MERGE
At each iteration of the while loop k increases by 1; the total increment in k is n - l + 1,
so the total time is O(n - l + 1).
If records are of length M then this time is really O(M(n - l + 1)).
A total of ⌈log2 n⌉ passes are made over the data, and each pass of merge sort takes O(n) time.

As there are ⌈log2 n⌉ passes, the total computing time is O(n log n).
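Procedure MERGE translates almost line for line into C; the 0-based indexing and the function name below are our own choices, not the text's.

```c
/* Merge the sorted runs x[l..m] and x[m+1..n] (inclusive, 0-based)
   into z[l..n], following procedure MERGE. */
void merge(const int x[], int l, int m, int n, int z[]) {
    int i = l, k = l, j = m + 1;   /* positions in the three files */
    while (i <= m && j <= n) {
        if (x[i] <= x[j]) z[k++] = x[i++];
        else              z[k++] = x[j++];
    }
    while (i <= m) z[k++] = x[i++];   /* copy remainder of first run  */
    while (j <= n) z[k++] = x[j++];   /* copy remainder of second run */
}
```

Each element is moved exactly once, matching the O(n - l + 1) bound above.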

In formally writing the algorithm for 2-way merge, it is convenient to first present an
algorithm to perform one merge pass of the merge sort.
procedure MPASS(X,Y,n,l)
//This algorithm performs one pass of merge sort. It merges adjacent pairs of subfiles of
length l from file X to file Y. n is the number of records in X//
i ← 1
while i ≤ n - 2l + 1 do
    call MERGE(X, i, i + l - 1, i + 2l - 1, Y)
    i ← i + 2l
end
//merge remaining file of length < 2l//
if i + l - 1 < n then call MERGE(X, i, i + l - 1, n, Y)
else (Yi, ...,Yn) ← (Xi, ...,Xn)
end MPASS
The merge sort algorithm then takes the form:
procedure MSORT(X,n)
//Sort the file X = (X1, ...,Xn) into nondecreasing order of the keys x1, ...,xn//
declare X(n), Y(n)   //Y is an auxiliary array; l is the size of subfiles currently being merged//
l ← 1
while l < n do
    call MPASS(X,Y,n,l)
    l ← 2*l
    call MPASS(Y,X,n,l)   //interchange role of X and Y//
    l ← 2*l
end
end MSORT
Example : Consider the input file (26, 5, 77, 1, 61, 11, 59, 15, 48, 19).
[26] [5] [77] [1] [61] [11] [59] [15] [48] [19]

[5 26] [1 77] [11 61] [15 59] [19 48]

[1 5 26 77] [11 15 59 61] [19 48]

[1 5 11 15 26 59 61 77] [19 48]

[1 5 11 15 19 26 48 59 61 77]
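The MPASS/MSORT pair can be sketched in C as a bottom-up merge sort; the 0-based indexing, the memcpy tail copy, and the function names are our own adaptation.

```c
#include <string.h>

/* Merge the sorted runs x[l..m] and x[m+1..u] into z[l..u] (0-based). */
static void merge(const int x[], int l, int m, int u, int z[]) {
    int i = l, k = l, j = m + 1;
    while (i <= m && j <= u) z[k++] = (x[i] <= x[j]) ? x[i++] : x[j++];
    while (i <= m) z[k++] = x[i++];
    while (j <= u) z[k++] = x[j++];
}

/* One merge pass (procedure MPASS): merge adjacent subfiles of
   length len from x into y; n is the number of records in x. */
static void mpass(const int x[], int y[], int n, int len) {
    int i = 0;
    while (i + 2 * len <= n) {          /* two full subfiles remain */
        merge(x, i, i + len - 1, i + 2 * len - 1, y);
        i += 2 * len;
    }
    if (i + len < n) merge(x, i, i + len - 1, n - 1, y);
    else memcpy(y + i, x + i, (n - i) * sizeof(int));  /* copy tail */
}

/* Procedure MSORT: alternate passes between x and the auxiliary array
   y, doubling the subfile length; when len >= n the second mpass just
   copies the sorted data back into x. */
void msort(int x[], int y[], int n) {
    int len = 1;
    while (len < n) {
        mpass(x, y, n, len);  len *= 2;
        mpass(y, x, n, len);  len *= 2;
    }
}
```

Running msort on the example file (26, 5, 77, 1, 61, 11, 59, 15, 48, 19) reproduces the passes traced above.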

UNIT-5
AVL Trees
 It is observed that a BST's worst-case performance approaches that of linear search, that is, O(n). In
real data we cannot predict the key pattern and access frequencies, so a need arises to balance the
existing BST.
 Named after their inventors Adelson-Velskii and Landis.
 AVL trees are height-balanced binary search trees. An AVL tree checks the heights of the left and right
subtrees and ensures that the difference is not more than 1. This difference is called the Balance Factor.
 Here we see that the first tree is balanced and the next two trees are not balanced −

 In the second tree, the left subtree of C has height 2 and the right subtree has height 0, so the difference is 2.
In the third tree, the right subtree of A has height 2 and the left is missing, so it is 0, and the difference is 2
again. An AVL tree permits a difference (balance factor) of only 1.
 BalanceFactor = height(left subtree) − height(right subtree)
 If the difference in the heights of the left and right subtrees is more than 1, the tree is balanced using some
rotation techniques.
AVL Rotations

To make itself balanced, an AVL tree may perform four kinds of rotations −

 Left rotation
 Right rotation
 Left-Right rotation
 Right-Left rotation
 The first two rotations are single rotations and the next two are double rotations. To
have an unbalanced tree we need a tree of height at least 2. With this simple tree, let's
understand them one by one.

Left Rotation
Data Structures P a g e | 58

 If a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree, then we
perform a single left rotation −

 In our example, node A has become unbalanced as a node is inserted in the right subtree of A's right
subtree. We perform a left rotation by making A the left subtree of B.

Right Rotation
An AVL tree may become unbalanced if a node is inserted in the left subtree of the left subtree.
The tree then needs a right rotation.

As depicted, the unbalanced node becomes the right child of its left child by performing a right
rotation.

Left-Right Rotation
Double rotations are slightly more complex versions of the rotations already explained. To
understand them better, we should take note of each action performed during rotation. Let's first
check how to perform a Left-Right rotation. A left-right rotation is a combination of a left rotation
followed by a right rotation.
Data Structures P a g e | 59

A node has been inserted into the right subtree of the left subtree. This makes C an unbalanced
node. These scenarios cause the AVL tree to perform a left-right rotation.

We first perform a left rotation on the left subtree of C. This makes A the left subtree of B.

Node C is still unbalanced, but now it is because of the left subtree of the left subtree.

We now right-rotate the tree, making B the new root node of this subtree. C becomes the right
subtree of its own left subtree.

The tree is now balanced.

Right-Left Rotation:
Data Structures P a g e | 60

The second type of double rotation is the Right-Left Rotation. It is a combination of a right
rotation followed by a left rotation.

A node has been inserted into the left subtree of the right subtree. This makes A an unbalanced
node, with balance factor 2.

First, we perform a right rotation along node C, making C the right subtree of its own left
subtree B. Now, B becomes the right subtree of A.

Node A is still unbalanced because of the right subtree of its right subtree, and it requires a left
rotation.

A left rotation is performed by making B the new root node of the subtree. A becomes the left
subtree of its right subtree B.

The tree is now balanced.
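The single rotations and the balance factor can be sketched in C as follows; the node layout, the height bookkeeping, and the function names are our own illustration, not the text's.

```c
#include <stddef.h>

typedef struct avl {
    int key;
    int height;               /* height of the subtree rooted here */
    struct avl *left, *right;
} avl;

static int height(const avl *t) { return t ? t->height : 0; }
static int imax(int a, int b)   { return a > b ? a : b; }
static void fix_height(avl *t)  {
    t->height = 1 + imax(height(t->left), height(t->right));
}

/* BalanceFactor = height(left subtree) - height(right subtree) */
int balance_factor(const avl *t) {
    return height(t->left) - height(t->right);
}

/* Single right rotation: the unbalanced node x becomes the right
   child of its left child y.  Returns the new subtree root. */
avl *rotate_right(avl *x) {
    avl *y = x->left;
    x->left = y->right;
    y->right = x;
    fix_height(x);
    fix_height(y);
    return y;
}

/* Single left rotation: the mirror image. */
avl *rotate_left(avl *x) {
    avl *y = x->right;
    x->right = y->left;
    y->left = x;
    fix_height(x);
    fix_height(y);
    return y;
}

/* A double rotation is two single rotations, e.g. left-right: */
avl *rotate_left_right(avl *x) {
    x->left = rotate_left(x->left);
    return rotate_right(x);
}
```

For the left-left chain C → B → A described under Right Rotation, a single rotate_right on C makes B the root with A and C as its children.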

OPTIMAL BINARY SEARCH TREES

 An optimal binary search tree is a binary search tree for which the nodes are arranged on
levels such that the tree cost is minimum.

 For a better presentation of optimal binary search trees, we will consider "extended binary
search trees", which have the keys stored at their internal nodes. Suppose n keys k1, k2, ..., kn
are stored at the internal nodes of a binary search tree. It is assumed that the keys are given in
sorted order, so that k1 < k2 < ... < kn.

 An extended binary search tree is obtained from the binary search tree by adding successor
nodes to each of its terminal nodes, as indicated in the following figure by squares:
In the extended tree:
 The squares represent terminal nodes.

 These terminal nodes represent unsuccessful searches of the tree for key values.

 These searches did not end successfully because they represent key values that are
not actually stored in the tree.



 The round nodes represent internal nodes; these are the actual keys stored in the tree;

 Assuming that the relative frequency with which each key value is accessed is known,
weights can be assigned to each node of the extended tree (p1…p6).

 They represent the relative frequencies of searches terminating at each node; that is, they
mark the successful searches.

 If the user searches for a particular key in the tree, two cases can occur:

1 – the key is found, so the corresponding weight ‘p’ is incremented;

2 – the key is not found, so the corresponding ‘q’ value is incremented.
GENERALIZATION:
 The terminal node in the extended tree that is the left successor of k1 can be interpreted
as representing all key values that are not stored and are less than k1.

 Similarly, the terminal node in the extended tree that is the right successor of kn
represents all key values not stored in the tree that are greater than kn.

 The terminal node visited between ki−1 and ki in an inorder traversal represents all key
values not stored that lie between ki−1 and ki.

Example:
 One way to find an optimal binary search tree is to generate each possible binary search tree
for the keys, calculate the weighted path length, and keep the tree with the smallest
weighted path length.

 This search through all possible solutions is not feasible, since the number of such trees
grows exponentially with n.
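The number of distinct binary search trees on n keys is the Catalan number, which grows roughly as 4^n; the short sketch below (our own illustration, not from the text) computes it from the standard recurrence and shows why exhaustive search is infeasible.

```c
/* Number of binary search trees on n keys: the Catalan number,
   via the recurrence C(0) = 1, C(k) = sum over i of C(i)*C(k-1-i),
   where i counts the keys placed in the left subtree. */
unsigned long long num_bsts(int n) {
    unsigned long long c[n + 1];
    c[0] = 1;
    for (int k = 1; k <= n; k++) {
        c[k] = 0;
        for (int i = 0; i < k; i++)
            c[k] += c[i] * c[k - 1 - i];   /* left subtree has i keys */
    }
    return c[n];
}
```

Already for n = 10 there are 16796 trees, so generating every tree quickly becomes hopeless; the practical approach is the dynamic-programming construction of the optimal tree.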

B Trees
A B-tree of order m is a multi-way tree such that:
- the root has at least two subtrees, unless it is a leaf
- each non-root, non-leaf node holds k − 1 data values and k pointers to subtrees,
  where ⌈m/2⌉ ≤ k ≤ m
- each leaf node holds k − 1 data values, where ⌈m/2⌉ ≤ k ≤ m
- all leaves are on the same level
- the data values in each node are in ascending order
- for all i, the data values in the first i children are less than the i-th data value
- for all i, the data values in the last m − i children are larger than the i-th data value

So, a B-tree is generally at least half full, has a relatively small number of levels, and
is perfectly balanced. Typically, m will be fairly large.

Example: A B-tree of order 5:

Since a binary search may be applied to the data values in each node, searching is
highly efficient.
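A B-tree node and the in-node binary search might look like this in C; the order, the field names, and the return convention are our own assumptions, not the textbook's.

```c
#define M 5   /* order of the B-tree (assumed, as in the example) */

typedef struct bnode {
    int nkeys;                 /* number of data values, k - 1 */
    int key[M - 1];            /* data values, in ascending order */
    struct bnode *child[M];    /* child[i] subtends values < key[i] */
} bnode;

/* Binary-search the data values of one node.  Returns the index of
   the matching key, or -(pos + 1) where pos is the child to descend
   into when the key is not in this node. */
int node_search(const bnode *n, int target) {
    int lo = 0, hi = n->nkeys - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (n->key[mid] == target) return mid;
        else if (n->key[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -(lo + 1);   /* not found: descend into child[lo] */
}
```

A full B-tree search repeats node_search from the root, following the indicated child pointer until the key is found or a leaf is reached.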

B Tree Insertion Algorithm



B Tree Deletion

Deletion of a value from a node has an interesting consequence, since the number of
children is related to the number of values in the node.

For a leaf node, deleting a value may drop the number of data values in the node below
the mandatory floor. If that happens, the leaf must borrow a value from an adjacent
sibling node if one has a value to spare, or be merged with an adjacent sibling node. But the
latter will decrease the number of children the parent node has, and so a value must be
moved from the parent node into the merged leaf.

Consider deleting T from the B-tree of order 5 below:

Deletion from a Leaf (one case)



Deletion from an Internal Node

Deleting a value from an internal node is accomplished by reducing it to the former case.

Denote the value to be deleted by VK.

The immediate predecessor of VK, which must be in a leaf node, is borrowed to replace
the value that is being deleted, and is then deleted from the leaf node.

Consider deleting K from the following B-tree of order 5:

B Tree Deletion Algorithm



B Tree Storage Efficiency

In a B tree:
- nodes are guaranteed to be (essentially) at least 50% full
- node could also be only 50% full, wasting half the data space in the nodes
- but that "wasted" space is available to service future insertions
- analysis and simulation indicates that in typical use a B tree will be about 70% full

This expectation of wasted space is a motivation for some variants of the basic B tree.

In B* trees:
- all nodes except the root are required to be at least 2/3 full rather than 1/2 full
- splitting transforms 2 nodes into 3, rather than 1 node into 2
- analysis indicates the average utilization of a B* tree will be about 81%
- can be generalized to specify a fill factor of (n+1)/(n+2); a Bn tree

In B+ trees:
- Internal nodes store only key values and pointers.
- All records, or pointers to records, are stored in leaves.
- Commonly, the leaves are simply the logical blocks of a database file index, storing key
  values and offsets. In this case, many key values will occur twice in the tree: once at an
  internal node to guide searching, and again in a leaf.
- If the leaves are simply an index, it is common to implement the leaf level as a linked list
  of B tree nodes.

The B+ tree is the most commonly implemented variant of the B-tree family, and the
structure of choice for large databases.
