Sorting is important in almost every computer application. Sorting refers to arranging data
elements in some given order. Many sorting algorithms are available to sort a given set of
elements. We will now discuss two classes of sorting techniques and analyze their performance.
The two techniques are:
Internal Sorting
External Sorting
Internal Sorting
Internal sorting takes place in the main memory of a computer. Internal sorting
methods are applied to small collections of data: the entire collection of data
to be sorted is small enough that the sorting can take place within main memory. We will
study the following methods of internal sorting:
Insertion sort
Selection sort
Merge Sort
Radix Sort
Quick Sort
Heap Sort
Bubble Sort
Insertion Sort
In this sorting technique we read the given elements from 1 to n, inserting each element into its
proper position among the elements already considered. An example of an insertion sort occurs in everyday life while playing cards.
To sort the cards in your hand you extract a card, shift the remaining cards, and then insert
the extracted card in the correct place. This process is repeated until all the cards are in the
correct sequence. Both the average-case and worst-case times are O(n²).
This sorting algorithm is frequently used when n is small. The insertion sort algorithm scans
A from A[1] to A[N], inserting each element A[K] into its proper position in the previously
sorted subarray A[1], A[2], . . . , A[K-1]. That is:
Step-1: A[1] by itself is trivially sorted.
Step-2: A[2] is inserted either before or after A[1] so that: A[1], A[2] is sorted.
Step-3: A[3] is inserted into its proper place in A[1], A[2], that is, before A[1], between A[1]
and A[2], or after A[2], so that: A[1], A[2], A[3] is sorted.
Step-4: A[4] is inserted into its proper place in A[1], A[2], A[3] so that: A[1], A[2], A[3],
A[4] is sorted.
Step-N: A[N] is inserted into its proper place in A[1], A[2], . . . , A[N-1] so that: A[1], A[2],
. . . , A[N] is sorted.
Example: Applying these steps to the list 77, 33, 44, 11 gives 77; then 33, 77; then 33, 44, 77; and finally 11, 33, 44, 77.
Algorithm
INSERTION (A, N)
This algorithm sorts the array A with N elements.
1. Set A[0] := −∞ [Initializes the sentinel element]
2. Repeat Steps 3 to 5 for K = 2, 3, . . . , N:
3. Set TEMP := A[K] and PTR := K − 1
4. Repeat while TEMP < A[PTR]:
(a) Set A[PTR + 1] := A[PTR] [Moves element forward]
(b) Set PTR := PTR − 1
[End of loop]
5. Set A[PTR + 1] := TEMP [Inserts element in proper place]
[End of Step 2 loop]
6. Return
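The steps above can be sketched in Python. This is an illustrative 0-based version, so it shifts within the sorted prefix directly instead of using the −∞ sentinel of the pseudocode:

```python
def insertion_sort(a):
    """Sort list a in place by inserting each element into the sorted prefix."""
    for k in range(1, len(a)):
        temp = a[k]               # element to insert (TEMP in the pseudocode)
        ptr = k - 1
        # shift larger elements one position to the right
        while ptr >= 0 and temp < a[ptr]:
            a[ptr + 1] = a[ptr]
            ptr -= 1
        a[ptr + 1] = temp         # insert into its proper place
    return a

print(insertion_sort([77, 33, 44, 11, 88, 22, 66, 55]))
# [11, 22, 33, 44, 55, 66, 77, 88]
```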
Complexity of Insertion Sort:
The insertion sort algorithm is a very slow algorithm when n is very large.
Worst Case
The worst case occurs when the array A is in reverse order and the inner loop must use the
maximum number K − 1 of comparisons. Hence
f(n) = 1 + 2 + · · · + (n − 1) = n(n − 1)/2 = O(n²)
Average Case
The average case occurs when there are approximately (K − 1)/2 comparisons in the inner loop, giving f(n) = (1 + 2 + · · · + (n − 1))/2 = n(n − 1)/4 = O(n²).
Selection Sort
In this sorting technique we find the smallest element in the list and put it in the first position. Then
we find the second smallest element in the list and put it in the second position, and so on.
Step-1: Find the location LOC of the smallest element in the list of N elements A[1], A[2], . . . , A[N],
and then interchange A[LOC] and A[1]. Then A[1] is sorted.
Step-2: Find the location LOC of the smallest element in the sub-list of N − 1 elements A[2], A[3], . .
. , A[N], and then interchange A[LOC] and A[2]. Then: A[1], A[2] is sorted, since
A[1] ≤ A[2].
Step-3: Find the location LOC of the smallest element in the sub-list of N − 2 elements A[3],
A[4], . . . , A[N], and then interchange A[LOC] and A[3]. Then: A[1], A[2],
A[3] is sorted, since A[2] ≤ A[3].
. . .
Step-(N − 1): Find the location LOC of the smaller of the elements A[N − 1], A[N], and then
interchange A[LOC] and A[N − 1]. Then: A[1], A[2], . . . , A[N] is sorted, since A[N − 1] ≤ A[N]. Thus A is sorted after N − 1 passes.
Example:
Suppose an array A contains 8 elements as follows:
77, 33, 44, 11, 88, 22, 66, 55
Algorithm 2.2:
1. To find the minimum element
MIN (A, K, N, LOC)
An array A is in memory. This procedure finds the location
LOC of the smallest element among A[K], A[K+1], . . . , A[N].
1. Set MIN := A[K] and LOC := K [Initializes pointers]
2. Repeat for J = K + 1, K + 2, . . . , N:
If MIN > A[J], then: Set MIN := A[J] and LOC := J
[End of loop]
3. Return
2. To sort the elements
SELECTION (A, N)
1. Repeat Steps 2 and 3 for K = 1, 2, . . . , N − 1:
2. Call MIN(A, K, N, LOC)
3. [Interchange A[K] and A[LOC]]
Set TEMP := A[K], A[K] := A[LOC] and A[LOC] := TEMP
[End of Step 1 loop]
4. Exit.
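The two procedures can be combined into one short Python sketch, with the inner loop playing the role of MIN and the swap playing the role of the interchange step:

```python
def selection_sort(a):
    """Repeatedly find the smallest element of the unsorted suffix
    and swap it into the next position (0-based)."""
    n = len(a)
    for k in range(n - 1):
        loc = k                         # location of the current minimum
        for j in range(k + 1, n):
            if a[j] < a[loc]:
                loc = j
        a[k], a[loc] = a[loc], a[k]     # interchange A[K] and A[LOC]
    return a

print(selection_sort([77, 33, 44, 11, 88, 22, 66, 55]))
# [11, 22, 33, 44, 55, 66, 77, 88]
```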
Complexity of the Selection Sort Algorithm
First note that the number f(n) of comparisons in the selection sort algorithm is independent
of the original order of the elements. Observe that MIN(A, K, N, LOC) requires n - K
comparisons. That is, there are n - 1 comparisons during Pass 1 to find the smallest
element, there are n - 2 comparisons during Pass 2 to find the second smallest element,
and so on. Accordingly,
f(n) = (n − 1) + (n − 2) + · · · + 2 + 1 = n(n − 1)/2 = O(n²)
Merge Sort
Combining two sorted lists into one is called merging. For example, suppose A is a sorted list with r elements
and B is a sorted list with s elements. The operation that combines the elements of A and B
into a single sorted list C with n = r + s elements is called merging. To picture how the
merging algorithm works, suppose one is given
two sorted decks of cards. At each step, the two front cards are compared and the
smaller one is placed in the combined deck. When one of the decks is empty, all of the
remaining cards in the other deck are put at the end of the combined deck. Similarly,
suppose we have two lines of students sorted by increasing heights, and suppose we want
to merge them into a single sorted line. The new line is formed by choosing, at each step,
the shorter of the two students who are at the head of their respective lines. When one of
the lines has no more students, the remaining students line up at the end of the combined
line.
1. Divide Step
If a given array A has zero or one element, simply return; it is already sorted. Otherwise,
split A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r], each containing about half of
the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].
2. Conquer Step
Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].
3. Combine Step
Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and
A[q + 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure
MERGE (A, p, q, r).
The first part shows the arrays at the start of the "for k ← p to r" loop, where A[p . .
q] is copied into L[1 . . n1] and A[q + 1 . . r] is copied into R[1 . . n2].
Succeeding parts show the situation at the start of successive iterations.
Entries in A with slashes have had their values copied to either L or R and have not
had a value copied back in yet. Entries in L and R with slashes have been copied back
into A.
The last part shows that the subarrays are merged back into A[p . . r], which is now
sorted, and that only the sentinels (∞) are exposed in the arrays L and R.
Algorithm.
Assume that a sequence A has n elements, which need not be distinct. We can sort A in
ascending order using merge sort, as follows:
To sort the entire sequence A[1 .. n], make the initial call to the procedure MERGE-SORT
(A, 1, n).
MERGE-SORT (A, p, r)
IF p < r
THEN q = FLOOR[(p + r)/2]
MERGE-SORT (A, p, q)
MERGE-SORT (A, q + 1, r)
MERGE (A, p, q, r)
MERGE (A, p, q, r)
n1 ← q − p + 1
n2 ← r − q
Create arrays L[1 . . n1 + 1] and R[1 . . n2 + 1]
FOR i ← 1 TO n1
DO L[i] ← A[p + i − 1]
FOR j ← 1 TO n2
DO R[j] ← A[q + j]
L[n1 + 1] ← ∞
R[n2 + 1] ← ∞
i ← 1
j ← 1
FOR k ← p TO r
DO IF L[i] ≤ R[j]
THEN A[k] ← L[i]
i ← i + 1
ELSE A[k] ← R[j]
j ← j + 1
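The divide, conquer and combine steps can be rendered as a short Python sketch (0-based, inclusive bounds; float('inf') plays the role of the ∞ sentinels):

```python
def merge(a, p, q, r):
    """Merge sorted subarrays a[p..q] and a[q+1..r] using infinity sentinels."""
    left = a[p:q + 1] + [float('inf')]
    right = a[q + 1:r + 1] + [float('inf')]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

def merge_sort(a, p=None, r=None):
    """Sort a[p..r] in place by divide and conquer."""
    if p is None:
        p, r = 0, len(a) - 1
    if p < r:
        q = (p + r) // 2            # divide: halfway point
        merge_sort(a, p, q)         # conquer: sort left half
        merge_sort(a, q + 1, r)     # conquer: sort right half
        merge(a, p, q, r)           # combine: merge sorted halves
    return a

print(merge_sort([77, 33, 44, 11, 88, 22, 66, 55]))
# [11, 22, 33, 44, 55, 66, 77, 88]
```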
For simplicity, assume that n is a power of 2 so that each divide step yields two
subproblems, both of size exactly n/2.
Divide: Just compute q as the average of p and r, which takes constant time, i.e. Θ(1).
Conquer: Recursively sort two subarrays of about n/2 elements each, contributing 2T(n/2).
Combine: Merging an n-element subarray takes Θ(n) time.
Summed together, the divide and combine steps give a function that is linear in n, which is Θ(n). Therefore, the
recurrence for the merge sort running time is
T(n) = 2T(n/2) + Θ(n)
By the master theorem in CLRS-Chapter 4 (page 73), we can show that this recurrence has
the solution
T(n) = Θ(n lg n).
Reminder: lg n stands for log2 n.
Compared to insertion sort [Θ(n²) worst-case time], merge sort is faster. Trading a factor of
n for a factor of lg n is a good deal. On small inputs, insertion sort may be faster. But for
large enough inputs, merge sort will always be faster, because its running time grows more
slowly than insertion sort's.
Radix Sort
Radix sort is the method that many people intuitively use or begin to use when
alphabetizing a large list of names. Specifically, the list of names is first sorted according to
the first letter of each name. That is, the names are arranged in 26 classes, where the first
class consists of those names that begin with "A," the second class consists of those names
that begin with "B," and so on. During the second pass, each class is alphabetized according
to the second letter of the name. And so on. If no name contains, for example, more than
12 letters, the names are alphabetized with at most 12 passes.
The radix sort is the method used by a card sorter. A card sorter contains 13 receiving
pockets labeled as follows:
9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 11, 12, R (reject)
Each pocket other than R corresponds to a row on a card in which a hole can be punched.
Decimal numbers, where the radix is 10, are punched in the obvious way and hence use
only the first 10 pockets of the sorter. The sorter uses a radix reverse-digit sort on numbers.
That is, suppose a card sorter is given a collection of cards where each card contains a 3-digit number punched in columns 1 to 3. The cards are first sorted according to the units
digit. On the second pass, the cards are sorted according to the tens digit. On the third and
last pass, the cards are sorted according to the hundreds digit. We illustrate with an
example.
Example
Suppose 9 cards are punched as follows:
348, 143, 361, 423, 538, 128, 321, 543, 366
Given to a card sorter, the numbers would be sorted in three passes:
In the first pass, the cards are sorted into pockets according to their units digits. The
cards are then collected pocket by pocket, in ascending pocket order, and re-input to the
sorter.
In the second pass, the tens digits are sorted into pockets. Again the cards are
collected pocket by pocket and re-input to the sorter.
In the third and final pass, the hundreds digits are sorted into pockets.
First Pass (units digit): 361, 321, 143, 423, 543, 366, 348, 538, 128
Second Pass (tens digit): 321, 423, 128, 538, 143, 543, 348, 361, 366
Third Pass (hundreds digit): 128, 143, 321, 348, 361, 366, 423, 538, 543
When the cards are collected after the third pass, the numbers are in the following order:
128, 143, 321, 348, 361, 366, 423, 538, 543
Thus the cards are now sorted. The number C of comparisons needed to sort nine such 3-digit numbers is bounded as follows:
C ≤ 9 · 3 · 10
The 9 comes from the nine cards, the 3 comes from the three digits in each number, and
the 10 comes from the radix d = 10 digits.
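The three-pass card-sorter procedure can be imitated in Python. This is a sketch assuming non-negative integers and a fixed digit count; the lists play the role of the sorter's pockets, collected in ascending order after each pass:

```python
def radix_sort(nums, digits=3):
    """LSD radix sort: one pass per digit, 10 'pockets',
    pockets collected in order 0..9 after each pass (a stable distribution)."""
    for d in range(digits):
        pockets = [[] for _ in range(10)]
        for x in nums:
            pockets[(x // 10 ** d) % 10].append(x)   # drop card into its pocket
        nums = [x for pocket in pockets for x in pocket]  # collect pocket by pocket
    return nums

print(radix_sort([348, 143, 361, 423, 538, 128, 321, 543, 366]))
# [128, 143, 321, 348, 361, 366, 423, 538, 543]
```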
Quick Sort
Quick sort works by partitioning the array about a pivot key and then sorting the two parts recursively.
Partitioning the Array Algorithm:
PARTITION (A, p, r)
x ← A[p]
i ← p − 1
j ← r + 1
while TRUE do
Repeat j ← j − 1
until A[j] ≤ x
Repeat i ← i + 1
until A[i] ≥ x
if i < j
then exchange A[i] ↔ A[j]
else return j
Partition selects the first key, A[p], as a pivot key about which the array will be partitioned:
Keys ≤ A[p] will be moved towards the left.
Keys ≥ A[p] will be moved towards the right.
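The Hoare-style partition above, together with the recursive driver, can be sketched in Python (0-based indexing; with this partition the left recursion uses q, not q − 1):

```python
def partition(a, p, r):
    """Hoare-style partition about pivot x = a[p]; returns split index j such
    that every element of a[p..j] is <= every element of a[j+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:          # repeat j := j-1 until a[j] <= x
            j -= 1
        i += 1
        while a[i] < x:          # repeat i := i+1 until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]   # exchange the out-of-place pair
        else:
            return j

def quick_sort(a, p=0, r=None):
    """Sort a[p..r] in place by partitioning and recursing on both parts."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quick_sort(a, p, q)
        quick_sort(a, q + 1, r)
    return a

print(quick_sort([5, 1, 12, -5, 16]))
# [-5, 1, 5, 12, 16]
```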
Best Case
The best thing that could happen in quick sort would be that each partitioning stage divides
the array exactly in half. In other words, the pivot happens to be the median of the keys in A[p . . r]
every time the procedure PARTITION is called, so that PARTITION always splits the array to
be sorted into two equal-sized parts.
If the procedure PARTITION produces two regions of size n/2, the recurrence relation is then
T(n) = T(n/2) + T(n/2) + Θ(n)
= 2T(n/2) + Θ(n)
which has the solution
T(n) = Θ(n lg n)
Heap Sort
A heap is a complete binary tree in which each node satisfies the heap condition. Heap sort
is a comparison-based sorting algorithm that creates a sorted array (or list), and is part of the
selection sort family. Although somewhat slower in practice on most machines than a well-implemented quick sort, it has the advantage of a more favorable worst-case O(n log n)
runtime. Heap sort is an in-place algorithm, but is not a stable sort.
Heap condition
The key of each node is greater than or equal to the key in its children. Thus the root node
will have the largest key value.
MaxHeap
Suppose H is a complete binary tree with n elements. Then H is called a heap, or maxheap, if
the value at each node N is greater than or equal to the value at any of the children of N.
MinHeap
The value at each node N is less than or equal to the value at any of the children of N.
The operations on a heap
(i) New node is inserted into a Heap
(ii) Deleting the Root of a Heap
Heap sort is a two step algorithm.
The first step is to build a heap out of the data.
The second step begins with removing the largest element from the heap. We insert the
removed element into the sorted array. For the first element, this would be position 0 of the
array. Next we reconstruct the heap and remove the next largest item, and insert it into the
array. After we have removed all the objects from the heap, we have a sorted array. We can
vary the direction of the sorted elements by choosing a min-heap or max-heap in step one.
Heap sort can be performed in place: the array is split into two parts, the heap and the
sorted portion. The heap's invariant is preserved after each extraction, so the only cost is
that of extraction.
Algorithm:
function heapify(a,count) is
(end is assigned the index of the first (left) child of the root)
end := 1
while end < count
(sift up the node at index end to the proper place such that all nodes above
the end index are in heap order)
siftUp(a, 0, end)
end := end + 1
(after sifting up the last node all nodes are in heap order)
function siftUp(a, start, end) is
input: start represents the limit of how far up the heap to sift.
end is the node to sift up.
child := end
while child > start
parent := floor((child - 1) / 2)
if a[parent] < a[child] then (out of max-heap order)
swap(a[parent], a[child])
child := parent (repeat to continue sifting up the parent now)
else
return
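The two-step algorithm (build a max-heap, then repeatedly extract the maximum) can be sketched in Python. The heapify/siftUp phase matches the pseudocode above; sift_down is our assumed helper for restoring heap order after each extraction:

```python
def sift_up(a, start, end):
    """Move the node at index end up until the path to start is in max-heap order."""
    child = end
    while child > start:
        parent = (child - 1) // 2
        if a[parent] < a[child]:        # out of max-heap order
            a[parent], a[child] = a[child], a[parent]
            child = parent              # continue sifting up from the parent
        else:
            return

def sift_down(a, start, end):
    """Move the node at index start down until a[start..end] is in max-heap order."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                  # pick the larger of the two children
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heap_sort(a):
    """In-place heapsort: build a max-heap, then swap the root (largest)
    to the end of the unsorted region and restore the heap."""
    n = len(a)
    for end in range(1, n):             # step 1: heapify by sifting up
        sift_up(a, 0, end)
    for end in range(n - 1, 0, -1):     # step 2: repeated extraction
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)
    return a

print(heap_sort([77, 33, 44, 11, 88, 22, 66, 55]))
# [11, 22, 33, 44, 55, 66, 77, 88]
```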
Bubble Sort
Bubble sort, sometimes incorrectly referred to as sinking sort, is a simple sorting algorithm
that works by repeatedly stepping through the list to be sorted, comparing each pair of
adjacent items and swapping them if they are in the wrong order. The pass through the list
is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm
gets its name from the way smaller elements "bubble" to the top of the list. Because it only
uses comparisons to operate on elements, it is a comparison sort. Although the algorithm is
simple, most other algorithms are more efficient for sorting large lists. In this sorting
algorithm, multiple swapping take place in one iteration. Smaller elements move or bubble
up to the top of the list. In this method, we compare the adjacent members of the list to be
sorted, if the item on top is greater than the item immediately below it, they are swapped.
Performance
Bubble sort has worst-case and average complexity both Θ(n²), where n is the
number of items being sorted. There exist many sorting algorithms with substantially better
worst-case or average complexity of O(n log n). Even other Θ(n²) sorting algorithms, such
as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort
is not a practical sorting algorithm when n is large.
Algorithm:
procedure bubbleSort( A : list of sortable items )
repeat
swapped = false
for i = 1 to length(A) - 1 inclusive do:
/* if this pair is out of order */
if A[i-1] > A[i] then
/* swap them and remember something changed */
swap( A[i-1], A[i] )
swapped = true
end if
end for
until not swapped
end procedure
1. Compare each pair of adjacent elements from the beginning of the array and, if they
are in reversed order, swap them.
2. If at least one swap has been done, repeat step 1.
You can imagine that on every step big bubbles float to the surface and stay there. At the
step when no bubble moves, sorting stops. Let us see an example of sorting an array to
make the idea of bubble sort clearer.
Example. Sort {5, 1, 12, -5, 16} using bubble sort.
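The pseudocode above, applied to this example, can be rendered in Python (0-based; the swapped flag provides the adaptive early exit discussed below):

```python
def bubble_sort(a):
    """Repeatedly sweep the list, swapping out-of-order neighbors,
    until a full pass makes no swaps."""
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(a)):
            if a[i - 1] > a[i]:          # this pair is out of order
                a[i - 1], a[i] = a[i], a[i - 1]
                swapped = True           # remember something changed
    return a

print(bubble_sort([5, 1, 12, -5, 16]))
# [-5, 1, 5, 12, 16]
```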
Complexity analysis
The average-case and worst-case complexity of bubble sort is O(n²). It also makes O(n²) swaps in the
worst case. Bubble sort is adaptive, which means that for an almost-sorted array it gives an O(n)
estimate. Avoid implementations that don't check on every pass whether the array is already
sorted (i.e. whether any swaps were made); this check is necessary in order to preserve the adaptive property.
Complexity:
The total number of comparisons in Bubble sort is:
= (N-1)+(N-2).+2+1
=(N-1)*N/2=O(N2)
The time required to execute the bubble sort algorithm is proportional to n², where n is the
number of input items. The bubble sort algorithm uses O(n²) comparisons on average.
EXTERNAL SORT
External sorting methods are applied when the number of data elements to be
sorted is too large to fit in main memory. These methods involve as much external processing as
processing in the CPU. To study external sorting, we need to study the various external
devices used for storage in addition to the sorting algorithms themselves. This kind of sorting
requires auxiliary storage, such as tapes or disks.
SEARCHING
Searching refers to the operation of finding the location of a given item in a collection of
items. The search is said to be successful if ITEM does appear in DATA and unsuccessful
otherwise. The following searching algorithms are discussed in this chapter:
Sequential Searching
Binary Search
Binary Tree Search
Sequential Search:
This is the most natural searching method, and one of the most straightforward and
elementary searches; it is also known as a linear search. The most intuitive way to search for a given
ITEM in DATA is to compare ITEM with each element of DATA one by one. The algorithm for
a sequential search procedure is now presented.
As a real world example, pick up the nearest phonebook and open it to the first page of
names. We're looking to find the first "Smith". Look at the first name. Is it "Smith"?
Probably not. Now look at the next name. Is it "Smith"? Probably not. Keep looking at the
next name until you find "Smith".
The above is an example of a sequential search. You started at the beginning of a sequence
and went through each item one by one, in the order they existed in the list, until you found
the item you were looking for. Of course, this probably isn't how you normally look up a
name in the phonebook; we'll cover a method similar to the way you probably look up
phone numbers later in this guide.
Algorithm :
SEQUENTIAL SEARCH
INPUT : List of size N, target value T
OUTPUT : Position of T in the List, or -1 if T is not present
BEGIN
Set FOUND := false
Set I := 1
While (I <= N) and (FOUND is false)
IF List[I] == T THEN
FOUND := true
ELSE
I := I + 1
IF FOUND == false THEN
T is not present in the List (return -1)
END
Example:
Let's search for the number 3. We start at the beginning and check the first element in the
array. Is it 3? If not, we check the next element, and so on, until 3 is found or the array is exhausted.
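The scan above can be sketched in Python (0-based positions, returning -1 on failure as in the pseudocode's output convention):

```python
def sequential_search(data, target):
    """Compare target with each element of data one by one;
    return its (0-based) position, or -1 if it is not present."""
    for i, item in enumerate(data):
        if item == target:
            return i
    return -1

print(sequential_search([5, 1, 12, 3, 16], 3))
# 3
print(sequential_search([5, 1, 12, 3, 16], 99))
# -1
```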
Binary Search
The binary search algorithm can only be applied if the data are sorted. You can exploit the
knowledge that they are sorted to speed up the search.
The idea is analogous to the way people look up an entry in a dictionary or telephone book.
You don't start at page 1 and read every entry! Instead, you turn to a page somewhere
about where you expect the item to be. If you are lucky you find the item straight away. If
not, you know which part of the book will contain the item (if it is there), and repeat the
process with just that part of the book.
If you always split the data in half and check the middle item, you halve the number of
remaining items to check each time. This is much better than linear search, where each
unsuccessful comparison eliminates just one item.
The binary search algorithm applied to our array DATA works as follows. During each stage
of our algorithm, our search for ITEM is reduced to a segment of elements of DATA:
DATA[BEG], DATA[BEG + 1], DATA[BEG + 2], ...... DATA[END].
Note that the variables BEG and END denote the beginning and end locations of the segment
respectively. The algorithm compares ITEM with the middle element DATA[MID] of the
segment, where MID is obtained by
MID = INT((BEG + END) / 2)
If DATA[MID] = ITEM, then the search is successful and we set LOC := MID.
Otherwise a new segment of DATA is obtained as follows:
If ITEM < DATA[MID], then ITEM can appear only in the left half of the segment:
DATA[BEG], DATA[BEG + 1], . . . , DATA[MID - 1]. So we reset END := MID - 1 and
begin searching again.
If ITEM > DATA[MID], then ITEM can appear only in the right half of the segment:
DATA[MID + 1], DATA[MID + 2], . . . , DATA[END]. So we reset BEG := MID + 1 and
begin searching again.
Initially, we begin with the entire array DATA; i.e. we begin with BEG = 1 and
END = n. If ITEM is not in DATA, then eventually we obtain
END < BEG
This condition signals that the search is unsuccessful, and in this case we assign
LOC := NULL. Here NULL is a value that lies outside the set of indices of DATA. We
now formally state the binary search algorithm.
Algorithm
(Binary Search) BINARY(DATA, LB, UB, ITEM, LOC). Here DATA is a sorted array with lower
bound LB and upper bound UB, and ITEM is a given item of information. The variables BEG,
END and MID denote, respectively, the beginning, end and middle locations of a segment of
elements of DATA. This algorithm finds the location LOC of ITEM in DATA or sets LOC = NULL.
1. [Initialize segment variables.]
Set BEG := LB, END := UB and MID := INT((BEG + END)/2).
2. Repeat Steps 3 and 4 while BEG ≤ END and DATA[MID] ≠ ITEM.
3. If ITEM < DATA[MID], then:
Set END := MID - 1.
Else:
Set BEG := MID + 1.
[End of If structure.]
4. Set MID := INT((BEG + END)/2).
[End of Step 2 loop.]
5. If DATA[MID] = ITEM, then:
Set LOC := MID.
Else:
Set LOC := NULL.
[End of If structure.]
6. Exit.
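The algorithm can be sketched in Python. This 0-based version returns None in place of the pseudocode's NULL:

```python
def binary_search(data, item):
    """Return the (0-based) index of item in the sorted list data, or None."""
    beg, end = 0, len(data) - 1
    while beg <= end:
        mid = (beg + end) // 2
        if data[mid] == item:
            return mid               # successful search: LOC := MID
        elif item < data[mid]:
            end = mid - 1            # item can only be in the left half
        else:
            beg = mid + 1            # item can only be in the right half
    return None                      # END < BEG: unsuccessful search

nums = [-1, 5, 6, 18, 19, 25, 46, 78, 102, 114]
print(binary_search(nums, 6))
# 2
print(binary_search(nums, 103))
# None
```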
Examples
Example 1. Find 6 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 > 6): search the left half.
Step 2 (middle element is 5 < 6): search the right half.
Step 3 (middle element is 6 == 6): the item is found.
Example 2. Find 103 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1 (middle element is 19 < 103): search the right half.
Step 2 (middle element is 78 < 103): search the right half.
Step 3 (middle element is 102 < 103): search the right half.
Step 4 (middle element is 114 > 103): search the left half.
Step 5: the segment is now empty, so 103 is not in the array.
Open Hashing
1) Open Hashing: - In this strategy a collision is resolved by chaining: each slot of the hash
table keeps a linked list of all the elements whose hash value maps to that slot.
2) Closed Hashing: - In this strategy collision is resolved by placing the conflicting
element near to the slot generated by the hash function. Associated with closed hashing is a
rehash strategy:
If we try to place x in bucket h(x) and find it occupied, find an alternative location h1(x),
h2(x), etc. Try each in order; if none is empty, the table is full.
Let's take an example to understand it.
HASH_TABLE_SIZE = 8
Input data: a, b, c, d with hash values H(a) = 0, H(b) = 3, H(c) = 7 and H(d) = 3
Since slot 3 is already occupied by b, we find a position for d using linear probing:
h1(d) = (h(d) + 1) % 8 = 4 % 8 = 4
Adding 1 to the hash value of d, we get the new position 4, and slot 4 is currently
unoccupied, so we enter d at position 4. This is how closed hashing works.
The disadvantage of closed hashing is that it consumes more space as compared to open
hashing, and it has less flexibility in accommodating duplicate hash values.
The major advantage of closed hashing is that it avoids the overhead of introducing a new data
structure and reduces the cost of a new memory allocation per element insertion.
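The linear-probing example above can be sketched in Python. The class name and the convention of passing precomputed hash values (H(a) = 0, etc., as in the example) are ours, chosen to mirror the worked example rather than a general-purpose hash table:

```python
class LinearProbingTable:
    """Closed hashing: on collision, probe slots h(x)+1, h(x)+2, ...
    (modulo the table size) until an empty slot is found."""
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size

    def insert(self, key, h):
        """Insert key whose hash value is h; return the slot actually used."""
        for step in range(self.size):
            pos = (h + step) % self.size
            if self.slots[pos] is None:
                self.slots[pos] = key
                return pos
        raise RuntimeError("table is full")

t = LinearProbingTable(8)
t.insert('a', 0)          # goes to slot 0
t.insert('b', 3)          # goes to slot 3
t.insert('c', 7)          # goes to slot 7
print(t.insert('d', 3))   # slot 3 occupied by 'b', so d lands in slot 4
# 4
```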