
Sorting

Sorting is fundamental to almost every computer application. Sorting refers to arranging data
elements in some given order. Many sorting algorithms are available to sort a given set of
elements. We will now discuss two categories of sorting techniques and analyze their
performance. The two categories are:

Internal Sorting
External Sorting

Internal Sorting
Internal sorting takes place entirely in the main memory of a computer. The internal sorting
methods are applied to small collections of data; that is, the entire collection of data
to be sorted is small enough that the sorting can take place within main memory. We will
study the following methods of internal sorting:

Insertion sort
Selection sort
Merge Sort
Radix Sort
Quick Sort
Heap Sort
Bubble Sort

Insertion Sort
In this sort we read the given elements from 1 to n, inserting each element into its
proper position among those already considered. An example of an insertion sort occurs in
everyday life while playing cards. To sort the cards in your hand you extract a card, shift the
remaining cards, and then insert the extracted card in the correct place. This process is
repeated until all the cards are in the correct sequence. Both the average-case and
worst-case times are O(n²).
This sorting algorithm is frequently used when n is small. The insertion sort algorithm scans
A from A[1] to A[N], inserting each element A[K] into its proper position in the previously
sorted subarray A[1], A[2], ..., A[K-1]. That is:
Step-1: A[1] by itself is trivially sorted.
Step-2: A[2] is inserted either before or after A[1] so that: A[1], A[2] is sorted.
Step-3: A[3] is inserted into its proper place in A[1], A[2], that is, before A[1], between A[1]
and A[2], or after A[2], so that: A[1], A[2], A[3] is sorted.
Step-4: A[4] is inserted into its proper place in A[1], A[2], A[3] so that: A[1], A[2], A[3],
A[4] is sorted.
...
Step-N: A[N] is inserted into its proper place in A[1], A[2], ..., A[N - 1] so that: A[1], A[2],
..., A[N] is sorted.

Algorithm
INSERTION (A, N)
This algorithm sorts the array A with N elements.
1. Set A[0] := -∞. [Initializes the sentinel element.]
2. Repeat Steps 3 to 5 for K = 2, 3, ..., N:
3.     Set TEMP := A[K] and PTR := K - 1.
4.     Repeat while TEMP < A[PTR]:
           (a) Set A[PTR + 1] := A[PTR]. [Moves element forward.]
           (b) Set PTR := PTR - 1.
       [End of loop.]
5.     Set A[PTR + 1] := TEMP. [Inserts element in proper place.]
   [End of Step 2 loop.]
6. Return.
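To make the steps concrete, here is a minimal C sketch of the same algorithm (the function name insertion_sort and the driver in main are ours; we use 0-based indexing and a PTR >= 0 test in place of the sentinel A[0] := -∞):

#include <stdio.h>

/* Sort a[0..n-1] in ascending order by insertion sort. */
void insertion_sort(int a[], int n)
{
    for (int k = 1; k < n; k++) {
        int temp = a[k];              /* element to insert */
        int ptr = k - 1;              /* scan the sorted prefix a[0..k-1] */
        while (ptr >= 0 && temp < a[ptr]) {
            a[ptr + 1] = a[ptr];      /* move larger element forward */
            ptr--;
        }
        a[ptr + 1] = temp;            /* insert element in its proper place */
    }
}

int main(void)
{
    int a[] = {77, 33, 44, 11, 88, 22, 66, 55};
    int n = sizeof a / sizeof a[0];
    insertion_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);          /* 11 22 33 44 55 66 77 88 */
    printf("\n");
    return 0;
}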
Complexity of Insertion Sort:
The insertion sort algorithm is a very slow algorithm when n is very large.

Worst Case
The worst case occurs when the array A is in reverse order, so the inner loop must use the
maximum number K - 1 of comparisons. Hence

f(n) = 1 + 2 + ... + (n - 1) = n(n - 1)/2 = O(n²)

Average Case
The average case occurs when there are approximately (K - 1)/2 comparisons in the inner
loop, giving

f(n) = 1/2 + 2/2 + ... + (n - 1)/2 = n(n - 1)/4 = O(n²)

Selection Sort
In this sort we find the smallest element in the list and put it in the first position. Then we
find the second smallest element in the list and put it in the second position, and so on.
Step-1: Find the location LOC of the smallest element in the list of N elements A[1], A[2], ..., A[N],
and then interchange A[LOC] and A[1]. Then A[1] is sorted.
Step-2: Find the location LOC of the smallest element in the sub-list of N - 1 elements A[2], A[3],
..., A[N], and then interchange A[LOC] and A[2]. Then: A[1], A[2] is sorted, since
A[1] ≤ A[2].
Step-3: Find the location LOC of the smallest element in the sub-list of N - 2 elements A[3],
A[4], ..., A[N], and then interchange A[LOC] and A[3]. Then: A[1], A[2],
A[3] is sorted, since A[2] ≤ A[3].
...
Step-(N-1): Find the location LOC of the smaller of the elements A[N - 1], A[N], and then
interchange A[LOC] and A[N - 1]. Then: A[1], A[2], ..., A[N] is sorted, since A[N - 1] ≤ A[N]. Thus A is sorted after N - 1 passes.

Example:
Suppose an array A contains 8 elements as follows:
77, 33, 44, 11, 88, 22, 66, 55

Algorithm 2.2:
1. To find the minimum element:
MIN (A, K, N, LOC)
An array A is in memory. This procedure finds the location
LOC of the smallest element among A[K], A[K+1], ..., A[N].
1. Set MIN := A[K] and LOC := K. [Initializes pointers.]
2. Repeat for J = K + 1, K + 2, ..., N:
       If MIN > A[J], then: Set MIN := A[J] and LOC := J.
   [End of loop.]
3. Return.
2. To sort the elements:
SELECTION (A, N)
1. Repeat Steps 2 and 3 for K = 1, 2, ..., N - 1:
2.     Call MIN(A, K, N, LOC).
3.     [Interchange A[K] and A[LOC].]
       Set TEMP := A[K], A[K] := A[LOC] and A[LOC] := TEMP.
   [End of Step 1 loop.]
4. Exit.
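A minimal C sketch of the two procedures above, assuming 0-based indexing (the names min_index and selection_sort are ours; the example array is the one given above):

#include <stdio.h>

/* Find the index of the smallest element among a[k..n-1]. */
int min_index(const int a[], int k, int n)
{
    int loc = k;                      /* location of the current minimum */
    for (int j = k + 1; j < n; j++)
        if (a[j] < a[loc])
            loc = j;
    return loc;
}

/* Sort a[0..n-1] by repeatedly selecting the minimum. */
void selection_sort(int a[], int n)
{
    for (int k = 0; k < n - 1; k++) {
        int loc = min_index(a, k, n);
        int temp = a[k];              /* interchange a[k] and a[loc] */
        a[k] = a[loc];
        a[loc] = temp;
    }
}

int main(void)
{
    int a[] = {77, 33, 44, 11, 88, 22, 66, 55};
    selection_sort(a, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);          /* 11 22 33 44 55 66 77 88 */
    printf("\n");
    return 0;
}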
Complexity of the Selection Sort Algorithm
First note that the number f(n) of comparisons in the selection sort algorithm is independent
of the original order of the elements. Observe that MIN(A, K, N, LOC) requires n - K
comparisons. That is, there are n - 1 comparisons during Pass 1 to find the smallest
element, n - 2 comparisons during Pass 2 to find the second smallest element,
and so on. Accordingly,

f(n) = (n - 1) + (n - 2) + ... + 2 + 1 = n(n - 1)/2 = O(n²)

Merge Sort
Combining two sorted lists into one is called merging. For example, suppose A is a sorted list
with r elements and B is a sorted list with s elements. The operation that combines the
elements of A and B into a single sorted list C with n = r + s elements is called merging.
The merging procedure works the way one combines two sorted decks of cards: at each step,
the two front cards are compared and the smaller one is placed in the combined deck. When
one of the decks is empty, all of the remaining cards in the other deck are put at the end of
the combined deck. Similarly, suppose we have two lines of students sorted by increasing
height, and suppose we want to merge them into a single sorted line. The new line is formed
by choosing, at each step, the shorter of the two students at the head of their respective
lines. When one of the lines has no more students, the remaining students line up at the end
of the combined line.

1. Divide Step
If a given array A has zero or one element, simply return; it is already sorted. Otherwise,
split A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r], each containing about half of
the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].

2. Conquer Step
Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].

3. Combine Step
Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and
A[q + 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure
MERGE (A, p, q, r).

The first part of the accompanying figure shows the arrays at the start of the "for k ← p to r"
loop, where A[p .. q] is copied into L[1 .. n1] and A[q + 1 .. r] is copied into R[1 .. n2].
Succeeding parts show the situation at the start of successive iterations.

Entries in A with slashes have had their values copied to either L or R and have not
had a value copied back in yet. Entries in L and R with slashes have been copied back
into A.

The last part shows that the subarrays are merged back into A[p .. r], which is now
sorted, and that only the sentinels (∞) are exposed in the arrays L and R.

Algorithm:
Assume that a sequence A has n elements, which need not be distinct. We can sort A in
ascending order using merge sort, as follows. To sort the entire sequence A[1 .. n], make
the initial call to the procedure MERGE-SORT (A, 1, n).

MERGE-SORT (A, p, r)
    IF p < r                           // Check for base case
        THEN q = FLOOR[(p + r)/2]      // Divide step
             MERGE-SORT (A, p, q)      // Conquer step
             MERGE-SORT (A, q + 1, r)  // Conquer step
             MERGE (A, p, q, r)        // Combine step

MERGE (A, p, q, r)
    n1 ← q - p + 1
    n2 ← r - q
    Create arrays L[1 .. n1 + 1] and R[1 .. n2 + 1]
    FOR i ← 1 TO n1
        DO L[i] ← A[p + i - 1]
    FOR j ← 1 TO n2
        DO R[j] ← A[q + j]
    L[n1 + 1] ← ∞
    R[n2 + 1] ← ∞
    i ← 1
    j ← 1
    FOR k ← p TO r
        DO IF L[i] ≤ R[j]
            THEN A[k] ← L[i]
                 i ← i + 1
            ELSE A[k] ← R[j]
                 j ← j + 1
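The pseudocode translates almost line for line into C. The sketch below (function names ours) uses INT_MAX to play the role of the ∞ sentinels and 0-based indexing; the initial call is merge_sort(a, 0, n - 1):

#include <stdlib.h>
#include <limits.h>

/* Merge the sorted subarrays a[p..q] and a[q+1..r]. */
static void merge(int a[], int p, int q, int r)
{
    int n1 = q - p + 1;
    int n2 = r - q;
    int *L = malloc((n1 + 1) * sizeof *L);
    int *R = malloc((n2 + 1) * sizeof *R);
    for (int i = 0; i < n1; i++) L[i] = a[p + i];
    for (int j = 0; j < n2; j++) R[j] = a[q + 1 + j];
    L[n1] = INT_MAX;                  /* sentinels: never the smaller value */
    R[n2] = INT_MAX;
    int i = 0, j = 0;
    for (int k = p; k <= r; k++) {    /* repeatedly take the smaller front element */
        if (L[i] <= R[j])
            a[k] = L[i++];
        else
            a[k] = R[j++];
    }
    free(L);
    free(R);
}

/* Sort a[p..r] by merge sort. */
void merge_sort(int a[], int p, int r)
{
    if (p < r) {                      /* base case: one element is already sorted */
        int q = (p + r) / 2;          /* divide */
        merge_sort(a, p, q);          /* conquer left half */
        merge_sort(a, q + 1, r);      /* conquer right half */
        merge(a, p, q, r);            /* combine */
    }
}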

[Figure: bottom-up view of the above procedure for n = 8.]

Analyzing Merge Sort

For simplicity, assume that n is a power of 2 so that each divide step yields two
subproblems, both of size exactly n/2.

The base case occurs when n = 1.

When n ≥ 2, the time for the merge sort steps is as follows:

Divide: Just compute q as the average of p and r, which takes constant time, i.e.
Θ(1).

Conquer: Recursively solve 2 subproblems, each of size n/2, which contributes 2T(n/2).

Combine: MERGE on an n-element subarray takes Θ(n) time.

The divide and combine costs summed together give a function that is linear in n, i.e. Θ(n).
Therefore, the recurrence for the merge sort running time is

T(n) = Θ(1)            if n = 1,
T(n) = 2T(n/2) + Θ(n)  if n > 1.

Solving the Merge Sort Recurrence

By the master theorem (CLRS Chapter 4, page 73), we can show that this recurrence has
the solution

T(n) = Θ(n lg n).

Reminder: lg n stands for log₂ n.
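As a quick check, the recurrence fits case 2 of the master theorem; in LaTeX notation (a, b and f(n) are the standard master-theorem parameters):

T(n) = 2\,T(n/2) + \Theta(n), \qquad a = 2,\ b = 2,\ f(n) = \Theta(n).
% Since n^{\log_b a} = n^{\log_2 2} = n and f(n) = \Theta(n^{\log_b a}),
% case 2 applies and gives:
T(n) = \Theta\!\left(n^{\log_b a}\,\lg n\right) = \Theta(n \lg n).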
Compared to insertion sort [Θ(n²) worst-case time], merge sort is faster. Trading a factor of
n for a factor of lg n is a good deal. On small inputs, insertion sort may be faster. But for
large enough inputs, merge sort will always be faster, because its running time grows more
slowly than insertion sort's.

Radix Sort

Radix sort is the method that many people intuitively use or begin to use when
alphabetizing a large list of names. Specifically, the list of names is first sorted according to
the first letter of each name. That is, the names are arranged in 26 classes, where the first
class consists of those names that begin with "A," the second class consists of those names
that begin with "B," and so on. During the second pass, each class is alphabetized according
to the second letter of the name. And so on. If no name contains, for example, more than
12 letters, the names are alphabetized with at most 12 passes.
The radix sort is the method used by a card sorter. A card sorter contains 13 receiving
pockets labeled as follows:
9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 11, 12, R (reject)
Each pocket other than R corresponds to a row on a card in which a hole can be punched.
Decimal numbers, where the radix is 10, are punched in the obvious way and hence use
only the first 10 pockets of the sorter. The sorter uses a radix reverse-digit sort on numbers.
That is, suppose a card sorter is given a collection of cards where each card contains a
3-digit number punched in columns 1 to 3. The cards are first sorted according to the units
digit. On the second pass, the cards are sorted according to the tens digit. On the third and
last pass, the cards are sorted according to the hundreds digit. We illustrate with an
example.
Example
Suppose 9 cards are punched as follows:
348, 143, 361, 423, 538, 128, 321, 543, 366
Given to a card sorter, the numbers would be sorted in three phases,

In the first pass, the units digits are sorted into pockets. The cards are collected
pocket by pocket, from pocket 9 to pocket 0. The cards are now re-input to the
sorter.
In the second pass, the tens digits are sorted into pockets. Again the cards are
collected pocket by pocket and re-input to the sorter.
In the third and final pass, the hundreds digits are sorted into pockets.

[Figure: pocket distributions after the first, second and third passes.]
When the cards are collected after the third pass, the numbers are in the following order:
128, 143, 321, 348, 361, 366, 423, 538, 543
Thus the cards are now sorted. The number C of comparisons needed to sort nine such
3-digit numbers is bounded as follows:

C ≤ 9 × 3 × 10

The 9 comes from the nine cards, the 3 comes from the three digits in each number, and
the 10 comes from the radix d = 10 digits.
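A card sorter distributes cards into pockets rather than comparing keys pairwise; in software the same pass-per-digit idea is commonly implemented with a counting pass per digit. A C sketch under that assumption (the function name is ours, and the scratch array is sized only for this small example):

#include <stdio.h>

/* LSD radix sort for non-negative integers of at most `digits`
   decimal digits, one distribution pass per digit position. */
void radix_sort(int a[], int n, int digits)
{
    int output[64];                       /* scratch array; assumes n <= 64 here */
    int divisor = 1;                      /* 1, 10, 100, ... selects the digit */
    for (int pass = 0; pass < digits; pass++) {
        int count[10] = {0};
        for (int i = 0; i < n; i++)       /* count keys per "pocket" */
            count[(a[i] / divisor) % 10]++;
        for (int d = 1; d < 10; d++)      /* starting position of each pocket */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--)  /* distribute; right-to-left keeps it stable */
            output[--count[(a[i] / divisor) % 10]] = a[i];
        for (int i = 0; i < n; i++)
            a[i] = output[i];
        divisor *= 10;
    }
}

int main(void)
{
    int a[] = {348, 143, 361, 423, 538, 128, 321, 543, 366};
    radix_sort(a, 9, 3);
    for (int i = 0; i < 9; i++)
        printf("%d ", a[i]);              /* 128 143 321 348 361 366 423 538 543 */
    printf("\n");
    return 0;
}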

Complexity of Radix Sort:


Suppose a list A of n items A1, A2, ..., An is given. Let d denote the radix (e.g., d = 10 for
decimal digits, d = 26 for letters and d = 2 for bits), and suppose each item Ai is
represented by means of s digits:

Ai = di1 di2 ... dis

The radix sort algorithm will require s passes, one for each digit position. Pass K will
compare each diK with each of the d digits. Hence the number C(n) of comparisons for the
algorithm is bounded as follows:

C(n) ≤ d × s × n

Although d is independent of n, the number s does depend on n. In the worst case, s = n,
so C(n) = O(n²). In the best case, s = logd n, so C(n) = O(n log n). In other words, radix
sort performs well only when the number s of digits in the representation of the items Ai is
small.
Another drawback of radix sort is that one may need d*n memory locations. This comes
from the fact that all the items may be "sent to the same pocket" during a given pass. This
drawback may be minimized by using linked lists rather than arrays to store the items
during a given pass. However, one will still require 2*n memory locations.
Quicksort:
Quicksort was invented and named by C. A. R. Hoare and is one of the best general-purpose
sorting algorithms. It is built on the idea of partitions, and it uses the divide-and-conquer
strategy. The basic algorithm for a one-dimensional array is as follows.
1. Partition Step: Select an element to place in its final position in the array. That is, all
the elements to its left will be less than the selected element, and all the elements to
its right will be greater than it. We will select the first element in
the array and put it in its final place in the array. Then we have one element in its
proper location and two unsorted subarrays.
2. Recursive Step: Repeat the process on each unsorted subarray.
Each time the partition step is repeated, another element is placed in its final position in the
sorted array, and two additional subarrays are created. When a subarray eventually contains
only one element, that subarray is sorted and the element is in its final location.
Algorithm:
void quicksort(int array[], int left, int right)
{
    int index;
    if (right > left) {
        index = partition(array, left, right);  /* pivot's final position */
        quicksort(array, left, index - 1);      /* sort the left subarray  */
        quicksort(array, index + 1, right);     /* sort the right subarray */
    }
}
Partitioning the Array Algorithm:
PARTITION (A, p, r)
    x ← A[p]
    i ← p - 1
    j ← r + 1
    while TRUE do
        Repeat j ← j - 1
        until A[j] ≤ x
        Repeat i ← i + 1
        until A[i] ≥ x
        if i < j
            then exchange A[i] ↔ A[j]
            else return j
PARTITION selects the first key, A[p], as a pivot key about which the array will be partitioned:
Keys ≤ A[p] will be moved towards the left.
Keys ≥ A[p] will be moved towards the right.
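Note that the C driver shown earlier expects partition to place the pivot at its final position and return that index, which differs slightly from the Hoare-style pseudocode above (whose matching recursion would be quicksort(a, p, j) and quicksort(a, j + 1, r)). Here is a sketch of a compatible partition that picks the first element as the pivot, as the text describes; the implementation details are ours:

/* Partition a[left..right] around a[left], place the pivot at its
   final position, and return that position. */
int partition(int a[], int left, int right)
{
    int pivot = a[left];
    int i = left + 1, j = right;
    while (i <= j) {
        while (i <= j && a[i] <= pivot)   /* skip keys <= pivot on the left  */
            i++;
        while (a[j] > pivot)              /* skip keys >  pivot on the right */
            j--;
        if (i < j) {                      /* both out of place: exchange them */
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
            j--;
        }
    }
    a[left] = a[j];                       /* move the pivot into its final slot */
    a[j] = pivot;
    return j;
}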

Best Case
The best thing that could happen in quicksort would be that each partitioning stage divides
the array exactly in half. In other words, the best case occurs when the pivot is the median
of the keys in A[p .. r] every time the procedure PARTITION is called, so that PARTITION
always splits the array into two equal-sized pieces.

If the procedure PARTITION produces two regions of size n/2, the recurrence relation is then

T(n) = T(n/2) + T(n/2) + Θ(n)
     = 2T(n/2) + Θ(n)

and from case 2 of the master theorem,

T(n) = Θ(n lg n)

Worst case Partitioning


The worst case occurs if the given array A[1 .. n] is already sorted. The PARTITION (A, p, r)
call then always returns p, so successive calls to PARTITION will split arrays of length n, n-1,
n-2, ..., 2, giving a running time proportional to n + (n-1) + (n-2) + ... + 2 =
[(n+2)(n-1)]/2 = Θ(n²). The worst case also occurs if A[1 .. n] starts out in reverse order.

Heap Sort
A heap is a complete binary tree in which each node satisfies the heap condition. Heap sort
is a comparison-based sorting algorithm that creates a sorted array (or list), and is part of
the selection sort family. Although somewhat slower in practice on most machines than a
well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n)
running time. Heap sort is an in-place algorithm, but it is not a stable sort.
Heap condition
The key of each node is greater than or equal to the key in its children. Thus the root node
will have the largest key value.
MaxHeap
Suppose H is a complete binary tree with n elements. Then H is called a heap, or maxheap,
if the value at each node N is greater than or equal to the value at any of the children of N.
Thus the root holds the largest value.
MinHeap
H is called a minheap if the value at each node N is less than or equal to the value at any of
the children of N.
The operations on a heap
(i) New node is inserted into a Heap
(ii) Deleting the Root of a Heap
Heap sort is a two step algorithm.
The first step is to build a heap out of the data.
The second step begins with removing the largest element from the heap. We insert the
removed element into the sorted array. For the first element, this would be position 0 of the
array. Next we reconstruct the heap and remove the next largest item, and insert it into the
array. After we have removed all the objects from the heap, we have a sorted array. We can
vary the direction of the sorted elements by choosing a min-heap or max-heap in step one.
Heap sort can be performed in place. The array can be split into two parts, the sorted array
and the heap. A heap is naturally stored as an array, with the children of the node at index
i located at indices 2i + 1 and 2i + 2 (0-based). The heap's invariant is preserved after each
extraction, so the only cost is that of extraction.
Algorithm:
function heapify(a, count) is
    (end is assigned the index of the first (left) child of the root)
    end := 1
    while end < count
        (sift up the node at index end to the proper place such that
         all nodes above the end index are in heap order)
        siftUp(a, 0, end)
        end := end + 1
    (after sifting up the last node all nodes are in heap order)

function siftUp(a, start, end) is
    input: start represents the limit of how far up the heap to sift.
           end is the node to sift up.
    child := end
    while child > start
        parent := floor((child - 1) / 2)
        if a[parent] < a[child] then (out of max-heap order)
            swap(a[parent], a[child])
            child := parent (repeat to continue sifting up the parent now)
        else
            return
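A complete C sketch of heap sort (function names ours). It builds the max-heap with the sift-up heapify above; the extraction phase described earlier also needs a sift-down step, which is not shown in the pseudocode and is our addition:

static void swap_ints(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sift the node at index end up until max-heap order holds. */
static void sift_up(int a[], int start, int end)
{
    int child = end;
    while (child > start) {
        int parent = (child - 1) / 2;
        if (a[parent] < a[child]) {          /* out of max-heap order */
            swap_ints(&a[parent], &a[child]);
            child = parent;                  /* continue sifting up */
        } else {
            return;
        }
    }
}

/* Build a max-heap in a[0..count-1] by sifting up each node in turn. */
static void heapify(int a[], int count)
{
    for (int end = 1; end < count; end++)
        sift_up(a, 0, end);
}

/* Restore heap order in a[0..end-1] after the root has been replaced. */
static void sift_down(int a[], int end)
{
    int root = 0;
    while (2 * root + 1 < end) {
        int child = 2 * root + 1;            /* left child */
        if (child + 1 < end && a[child] < a[child + 1])
            child++;                         /* pick the larger child */
        if (a[root] < a[child]) {
            swap_ints(&a[root], &a[child]);
            root = child;
        } else {
            return;
        }
    }
}

/* Heap sort: step 1 builds the heap; step 2 extracts the maximum
   n - 1 times, growing the sorted region at the end of the array. */
void heap_sort(int a[], int n)
{
    heapify(a, n);
    for (int end = n - 1; end > 0; end--) {
        swap_ints(&a[0], &a[end]);           /* largest element to its final place */
        sift_down(a, end);                   /* re-establish the heap */
    }
}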

Bubble Sort
Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm
that works by repeatedly stepping through the list to be sorted, comparing each pair of
adjacent items and swapping them if they are in the wrong order. The pass through the list
is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm
gets its name from the way smaller elements "bubble" to the top of the list. Because it only
uses comparisons to operate on elements, it is a comparison sort. Although the algorithm is
simple, most other algorithms are more efficient for sorting large lists. In this sorting
algorithm several swaps may take place in one pass, and smaller elements move, or bubble,
up to the top of the list. In this method we compare adjacent members of the list to be
sorted; if the item on top is greater than the item immediately below it, they are swapped.

Performance

Bubble sort has worst-case and average complexity both Θ(n²), where n is the
number of items being sorted. There exist many sorting algorithms with substantially better
worst-case or average complexity of O(n log n). Even other Θ(n²) sorting algorithms, such
as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort
is not a practical sorting algorithm when n is large.

Algorithm:
procedure bubbleSort( A : list of sortable items )
    repeat
        swapped = false
        for i = 1 to length(A) - 1 inclusive do:
            /* if this pair is out of order */
            if A[i-1] > A[i] then
                /* swap them and remember something changed */
                swap( A[i-1], A[i] )
                swapped = true
            end if
        end for
    until not swapped
end procedure

1. Compare each pair of adjacent elements from the beginning of the array and, if they
are in reversed order, swap them.
2. If at least one swap has been done, repeat step 1.
You can imagine that on every step big bubbles float to the surface and stay there. At the
step when no bubble moves, sorting stops. Let us see an example of sorting an array to
make the idea of bubble sort clearer.
Example. Sort {5, 1, 12, -5, 16} using bubble sort.
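The step-by-step diagram for this example is not reproduced here, but a direct C translation of the pseudocode above sorts it; the swapped flag gives the early exit discussed in the complexity analysis below (function name and driver ours):

#include <stdio.h>
#include <stdbool.h>

/* Bubble sort with the early-exit check: stop as soon as a full
   pass makes no swap. */
void bubble_sort(int a[], int n)
{
    bool swapped;
    do {
        swapped = false;
        for (int i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {        /* this pair is out of order */
                int t = a[i - 1];         /* swap and remember something changed */
                a[i - 1] = a[i];
                a[i] = t;
                swapped = true;
            }
        }
    } while (swapped);
}

int main(void)
{
    int a[] = {5, 1, 12, -5, 16};
    bubble_sort(a, 5);
    for (int i = 0; i < 5; i++)
        printf("%d ", a[i]);              /* -5 1 5 12 16 */
    printf("\n");
    return 0;
}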

Complexity analysis
The average and worst-case complexity of bubble sort is O(n²), and it makes O(n²) swaps in
the worst case. Bubble sort is adaptive: for an almost-sorted array it runs in O(n) time.
Avoid implementations that do not check on every pass whether any swaps were made; this
check is necessary to preserve the adaptive property.

Turtles and rabbits


One more problem of bubble sort is that its running time depends badly on the initial order
of the elements. Big elements (rabbits) move up fast, while small ones (turtles) move down
very slowly. This problem is addressed by cocktail sort.

Complexity:
The total number of comparisons in bubble sort is:

(N - 1) + (N - 2) + ... + 2 + 1 = N(N - 1)/2 = O(N²)

The time required to execute the bubble sort algorithm is proportional to n², where n is the
number of input items; the algorithm uses O(n²) comparisons on average.
EXTERNAL SORT
External sorting methods are applied when the number of data elements to be
sorted is too large to fit in main memory. These methods involve as much external
processing as processing in the CPU. To study external sorting, we need to study the
various external devices used for storage, in addition to the sorting algorithms themselves.
This kind of sorting requires auxiliary storage. The following are examples of external
sorting:

Sorting with Disk


Sorting with Tapes

Sorting with Disks:


The following example illustrates the concept of sorting with disks. A file F containing 6000
records is to be sorted. The main memory is capable of sorting 1000 records at a time.
The input file F is stored on one disk and we have, in addition, another scratch disk. The
block length of the input file is 500 records.
We see that the file can be treated as 6 sets of 1000 records each. Each set is sorted and
stored on the scratch disk as a run. These 6 runs will then be merged as follows. Allocate 3
blocks of memory, each capable of holding 500 records. Two of these buffers, B1 and B2,
will be treated as input buffers and the third, B3, as the output buffer. We now have:

6 runs R1, R2, R3, R4, R5, R6 on the scratch disk
3 buffers B1, B2 and B3
o Read 500 records from R1 into B1.
o Read 500 records from R2 into B2.
o Merge B1 and B2 and write into B3.
o When B3 is full, write it out to the disk as run R11.
o Similarly merge R3 and R4 to get run R12.
o Merge R5 and R6 to get run R13.
Thus, from 6 runs of size 1000 each, we now have 3 runs of size 2000 each.
The steps are repeated for runs R11 and R12 to get a run of size 4000.
This run is merged with R13 to get a single sorted run of size 6000.
Sorting with Tapes
Sorting with tapes is essentially similar to the merge sort used for sorting with disks. The
differences arise due to the sequential access restriction of tapes. This makes the selection
time prior to data transmission an important factor, unlike seek time and latency time for
disks. Thus in sorting with tapes we will be more concerned with the arrangement of blocks
and runs on the tape so as to reduce the selection or access time.
Example
A file of 6000 records is to be sorted. It is stored on a tape and the block length is 500. The
main memory can sort up to 1000 records at a time. We have, in addition, 4 scratch tapes
T1-T4.
SEARCHING

Searching refers to the operation of finding the location of a given item in a collection of
items. The search is said to be successful if ITEM does appear in DATA and unsuccessful
otherwise. The following searching algorithms are discussed in this chapter.

Sequential Searching
Binary Search
Binary Tree Search

Sequential Search:
This is the most natural searching method. The most intuitive way to search for a given
ITEM in DATA is to compare ITEM with each element of DATA one by one. One of the most
straightforward and elementary searches is the sequential search, also known as a linear
search; its algorithm is presented below.
As a real-world example, pick up the nearest phonebook and open it to the first page of
names. We're looking to find the first "Smith". Look at the first name. Is it "Smith"?
Probably not. Now look at the next name. Is it "Smith"? Probably not. Keep looking at the
next name until you find "Smith".
The above is an example of a sequential search. You started at the beginning of a sequence
and went through each item one by one, in the order they existed in the list, until you found
the item you were looking for. Of course, this probably isn't how you normally look up a
name in the phonebook; we'll cover a method similar to the way you probably look up
phone numbers later in this guide.
Algorithm:
SEQUENTIAL SEARCH
INPUT : List of size N, target value T
OUTPUT : Position of T in the list, or -1 if T is not present
BEGIN
    Set FOUND := false
    Set I := 0
    While (I < N) and (FOUND is false)
        IF List[I] == T THEN
            FOUND := true
        ELSE
            I := I + 1
    IF FOUND == false THEN
        T is not present in the list (report -1)
END
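A minimal C version of the algorithm (the function name and the sample data are ours; we return the 0-based position, or -1 when T is absent):

#include <stdio.h>

/* Return the index of the first occurrence of t in list[0..n-1],
   or -1 if t is not present. */
int sequential_search(const int list[], int n, int t)
{
    for (int i = 0; i < n; i++)
        if (list[i] == t)
            return i;                  /* FOUND */
    return -1;                         /* t is not present in the list */
}

int main(void)
{
    int list[] = {14, 7, 9, 3, 21};    /* hypothetical sample data */
    int pos = sequential_search(list, 5, 3);
    printf("%d\n", pos);               /* 3: found at the fourth position */
    return 0;
}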
Example:

[Figure: The array we're searching.]

Let's search for the number 3. We start at the beginning and check the first element in the
array. Is it 3?

[Figure: Is the first value 3?]


No, not it. Is it the next element?

[Figure: Is the second value 3?]


Not there either. The next element?

[Figure: Is the third value 3?]


Not there either. Next?

[Figure: Is the fourth value 3? Yes!]


Now you understand the idea of linear searching; we go through each element, in order,
until we find the correct value.

Binary Search
The binary search algorithm can only be applied if the data are sorted. You can exploit the
knowledge that they are sorted to speed up the search.
The idea is analogous to the way people look up an entry in a dictionary or telephone book.
You don't start at page 1 and read every entry! Instead, you turn to a page somewhere
about where you expect the item to be. If you are lucky you find the item straight away. If
not, you know which part of the book will contain the item (if it is there), and repeat the
process with just that part of the book.
If you always split the data in half and check the middle item, you halve the number of
remaining items to check each time. This is much better than linear search, where each
unsuccessful comparison eliminates just one item.

The binary search algorithm applied to our array DATA works as follows. During each stage
of our algorithm, our search for ITEM is reduced to a segment of elements of DATA:
DATA[BEG], DATA[BEG + 1], DATA[BEG + 2], ..., DATA[END].
Note that the variable BEG and END denote the beginning and end locations of the segment
respectively. The algorithm compares ITEM with the middle element DATA[MID] of the
segment, where MID is obtained by
MID = INT((BEG + END) / 2)

If DATA[MID] = ITEM, then the search is successful and we set LOC := MID.
Otherwise a new segment of DATA is obtained as follows:

If ITEM < DATA[MID], then ITEM can appear only in the left half of the segment:
DATA[BEG], DATA[BEG + 1], ..., DATA[MID - 1]. So we reset END := MID - 1 and
begin searching again.

If ITEM > DATA[MID], then ITEM can appear only in the right half of the segment:
DATA[MID + 1], DATA[MID + 2], ..., DATA[END]. So we reset BEG := MID + 1 and
begin searching again.

Initially, we begin with the entire array DATA; i.e. we begin with BEG = 1 and
END = n. If ITEM is not in DATA, then eventually we obtain END < BEG.

This condition signals that the search is unsuccessful, and in this case we assign
LOC: = NULL. Here NULL is a value that lies outside the set of indices of DATA. We
now formally state the binary search algorithm.

Algorithm
(Binary Search) BINARY(DATA, LB, UB, ITEM, LOC)
Here DATA is a sorted array with lower bound LB and upper bound UB, and ITEM is a given
item of information. The variables BEG, END and MID denote, respectively, the beginning,
end and middle locations of a segment of elements of DATA. This algorithm finds the
location LOC of ITEM in DATA or sets LOC = NULL.
1. [Initialize segment variables.]
   Set BEG := LB, END := UB and MID := INT((BEG + END)/2).
2. Repeat Steps 3 and 4 while BEG ≤ END and DATA[MID] ≠ ITEM.
3.     If ITEM < DATA[MID], then:
           Set END := MID - 1.
       Else:
           Set BEG := MID + 1.
       [End of If structure.]
4.     Set MID := INT((BEG + END)/2).
   [End of Step 2 loop.]
5. If DATA[MID] = ITEM, then:
       Set LOC := MID.
   Else:
       Set LOC := NULL.
   [End of If structure.]
6. Exit.
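A C sketch of the algorithm (the function name is ours; -1 plays the role of NULL, and the test values come from the examples below):

#include <stdio.h>

/* Binary search in the sorted array data[lb..ub] (0-based).
   Return the location of item, or -1 if it is absent. */
int binary_search(const int data[], int lb, int ub, int item)
{
    int beg = lb, end = ub;
    while (beg <= end) {
        int mid = (beg + end) / 2;    /* middle of the current segment */
        if (data[mid] == item)
            return mid;               /* successful search */
        else if (item < data[mid])
            end = mid - 1;            /* continue in the left half */
        else
            beg = mid + 1;            /* continue in the right half */
    }
    return -1;                        /* END < BEG: unsuccessful search */
}

int main(void)
{
    int data[] = {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114};
    printf("%d\n", binary_search(data, 0, 9, 6));    /* 2 (0-based) */
    printf("%d\n", binary_search(data, 0, 9, 103));  /* -1: absent  */
    return 0;
}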

Examples
Example 1. Find 6 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 > 6): continue in the left half {-1, 5, 6, 18}.
Step 2 (middle element is 5 < 6): continue in the right half {6, 18}.
Step 3 (middle element is 6 == 6): found.

Example 2. Find 103 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 < 103): continue in the right half {25, 46, 78, 102, 114}.
Step 2 (middle element is 78 < 103): continue in the right half {102, 114}.
Step 3 (middle element is 102 < 103): continue in the right half {114}.
Step 4 (middle element is 114 > 103): the remaining segment is empty.
Step 5: the searched value is absent.

Complexity of the Binary Search Algorithm:


The complexity is measured by the number of comparisons f(n) needed to locate ITEM in
DATA, where DATA contains n elements. Observe that each comparison reduces the sample
size by half. Hence we require at most f(n) comparisons to locate ITEM, where

f(n) = ⌊log₂ n⌋ + 1

That is, the running time for the worst case is approximately proportional to log₂ n. The
running time for the average case is approximately equal to the running time for the worst
case.
Limitations of the Binary Search Algorithm
The algorithm requires two conditions:
(1) The list must be sorted and
(2) One must have direct access to the middle element in any sublist.

Binary Search Tree:


Suppose T is a binary tree. Then T is called a binary search tree if each node N of T has the
following property:
The value at N is greater than every value in the left subtree of N and is less than every
value in the right subtree of N.

SEARCHING AND INSERTING IN BINARY SEARCH TREES:


Suppose an ITEM of information is given. The following algorithm finds the location of ITEM
in the binary search tree T, or inserts ITEM as a new node in its appropriate place in the
tree.
(a) Compare ITEM with the root node N of the tree.
(i) If ITEM < N, proceed to the left child of N.
(ii) If ITEM > N, proceed to the right child of N.
(b) Repeat Step (a) until one of the following occurs:
(i) We meet a node N such that ITEM = N. In this case the search is successful.
(ii) We meet an empty subtree, which indicates that the search is unsuccessful, and
we insert ITEM in place of the empty subtree.
In other words, proceed from the root R down through the tree T until finding ITEM in T or
inserting ITEM as a terminal node in T.
Example
Consider the binary search tree T in Fig. 2.6. Suppose ITEM = 20 is given.
1. Compare ITEM = 20 with the root, 38, of the tree T. Since 20 < 38, proceed to the left
child of 38, which is 14.
2. Compare ITEM = 20 with 14. Since 20 > 14, proceed to the right child of 14, which is 23.
3. Compare ITEM = 20 with 23. Since 20 < 23, proceed to the left child of 23, which is 18.
4. Compare ITEM = 20 with 18. Since 20 > 18 and 18 does not have a right child, insert 20
as the right child of 18.
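A C sketch of combined search-and-insert on a linked tree (the node type and function name are ours; inserting the keys listed at the end reproduces the path traced in the example):

#include <stdlib.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Search for item in the subtree rooted at root; if it is absent,
   insert it as a new leaf in its proper place. Returns the root. */
node *search_insert(node *root, int item)
{
    if (root == NULL) {                    /* empty subtree: insert here */
        node *n = malloc(sizeof *n);
        n->key = item;
        n->left = n->right = NULL;
        return n;
    }
    if (item < root->key)                  /* proceed to the left child  */
        root->left = search_insert(root->left, item);
    else if (item > root->key)             /* proceed to the right child */
        root->right = search_insert(root->right, item);
    /* item == root->key: already present; the search is successful */
    return root;
}

/* Example: insert 38, then 14, 23, 18; inserting 20 then makes it
   the right child of 18, as in the worked example above. */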

DELETING IN A BINARY SEARCH TREE:


Suppose T is a binary search tree, and suppose an ITEM of information is given. This
section gives an algorithm which deletes ITEM from the tree T. Let N denote the node
containing ITEM, and let P(N) denote its parent.
Case 1. N has no children. Then N is deleted from T by simply replacing the location of N in
the parent node P(N) by the null pointer.
Case 2. N has exactly one child. Then N is deleted from T by simply replacing the location
of N in P(N) by the location of the only child of N.
Case 3. N has two children. Let S(N) denote the inorder successor of N. Then N is deleted
from T by first deleting S(N) from T (by using Case 1 or Case 2) and then replacing node N
in T by the node S(N).

Observe that the third case is much more complicated than the first two cases. In all three
cases, the memory space of the deleted node N is returned to the AVAIL list.
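A C sketch of the three deletion cases, reusing the node type from the previous sketch (the recursive structure is our choice; freeing the node plays the role of returning it to the AVAIL list):

/* Delete item from the subtree rooted at root; returns the new root. */
node *delete_item(node *root, int item)
{
    if (root == NULL)
        return NULL;                        /* item not in the tree */
    if (item < root->key) {
        root->left = delete_item(root->left, item);
    } else if (item > root->key) {
        root->right = delete_item(root->right, item);
    } else if (root->left == NULL) {        /* Case 1 or 2: no left child  */
        node *child = root->right;          /* may be NULL (Case 1)        */
        free(root);                         /* return node to AVAIL        */
        return child;
    } else if (root->right == NULL) {       /* Case 2: only a left child   */
        node *child = root->left;
        free(root);
        return child;
    } else {                                /* Case 3: two children        */
        node *succ = root->right;           /* find inorder successor S(N) */
        while (succ->left != NULL)
            succ = succ->left;
        root->key = succ->key;              /* replace N's key by S(N)'s   */
        root->right = delete_item(root->right, succ->key);  /* delete S(N) */
    }
    return root;
}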

[Figure: (a) node 44 is deleted; (b) the linked representation.]


Hashing:
Hashing is a technique that supports nearly constant-time search, insertion and deletion.
As a very simple example, an array whose index serves as the key is a hash table: each
index (key) can be used to access the associated value in constant search time. The
key-to-value mapping must be simple to compute and must help in identifying the
associated value. A function which generates such a key-value mapping is known as a
hash function.
A hash table, a.k.a. hash map, is a data structure which uses a hash function to generate
the key corresponding to the associated value.
Let us look at some sample hash functions for strings.
Folding Method:

int h(String x, int D)
{
    int i, sum;
    for (sum = 0, i = 0; i < x.length(); i++)
        sum += (int) x.charAt(i);    // add up the character codes
    return (sum % D);                // compress into a table of size D
}
Cyclic Shift:

static int hashCode(String key, int D)
{
    int h = 0;
    for (int i = 0; i < key.length(); i++)
    {
        h = (h << 5) | (h >>> 27);   // 5-bit cyclic shift of the running sum
        h += (int) key.charAt(i);    // add in the next character
    }
    return Math.abs(h) % D;          // compress into a table of size D
}
Collision resolution:
Since it is not always possible to design, with minimal overhead, a perfect hash function
that generates a unique key for every value, collisions must be resolved. The two main
collision resolution techniques are:
1) Open hashing, also known as separate chaining
2) Closed hashing, also known as open addressing
1) Open Hashing: In this strategy a collision is resolved by keeping the conflicting
element in a list. That is, all elements which generate the same hash are kept in a linked
list attached to that slot.

[Figure: open hashing.] As the figure shows, a collision is resolved by keeping the colliding
elements on a linked list.
2) Closed Hashing: In this strategy a collision is resolved by placing the conflicting
element in another slot near the one generated by the hash function. Associated with
closed hashing is a rehash strategy: if we try to place x in bucket h(x) and find it occupied,
we try alternative locations h1(x), h2(x), etc. in order; if none is empty, the table is full.
Let us take an example to understand it.

HASH_TABLE_SIZE = 8
Input data: a, b, c, d with hash values H(a) = 0, H(b) = 3, H(c) = 7 and H(d) = 3.
Finding a position for d using linear probing:
h1(d) = (h(d) + 1) % 8 = 4 % 8 = 4
Adding 1 to the hash value of d gives the new position 4, and slot 4 is currently
unoccupied, so d is entered at position 4. This is how closed hashing works.
A disadvantage of closed hashing is that it consumes more space than open hashing
and is less flexible in accommodating duplicate hash values.
Its major advantage is that it avoids the overhead of introducing a new data
structure and the cost of a new memory allocation per element insertion.
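A minimal C sketch of closed hashing with linear probing for this example (the constants and the function name are ours; keys are stored as character codes):

#include <stdio.h>

#define TABLE_SIZE 8
#define EMPTY (-1)

/* Insert key whose home slot is h(key); on collision, probe
   (home + 1) % 8, (home + 2) % 8, ... Returns the slot used,
   or -1 when every probe fails (the table is full). */
int insert(int table[], int key, int home)
{
    for (int k = 0; k < TABLE_SIZE; k++) {
        int slot = (home + k) % TABLE_SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;                        /* table is full */
}

int main(void)
{
    int table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    insert(table, 'a', 0);            /* H(a) = 0 */
    insert(table, 'b', 3);            /* H(b) = 3 */
    insert(table, 'c', 7);            /* H(c) = 7 */
    printf("%d\n", insert(table, 'd', 3));  /* H(d) = 3 is taken -> slot 4 */
    return 0;
}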
