
Design and Analysis of Algorithms

Graduate Course-Number CSC5011 Fall Semester 2011

Lecture 3 Searching Tactics

Dr. Md. Shamim Akhter


Assistant Professor Computer Science Department American International University Bangladesh Email: shamimakhter@aiub.edu

Searching Concept (1/3)


Common problem in computer science
Involves storing and maintaining large data sets, and then searching the data for particular values
Data storage and retrieval are key to many industry applications
Search algorithms are necessary for storing and retrieving data efficiently

Searching Concept (2/3)


For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words. Problems of this kind are called searching problems.

Searching Concept (3/3)


There are many searching algorithms. The natural searching method is linear search (or sequential search, or exhaustive search)
very simple but takes a long time to apply with large lists

A binary search repeatedly subdivides the list to locate an item


much faster than linear search

Like a binary search, an interpolation search repeatedly subdivides the list to locate an item

Linear / Sequential Search


Special case of brute-force search
This is a very simple algorithm
It uses a loop to sequentially step through an array, starting with the first element.
It compares each element with the value being searched for and stops when that value is found or the end of the array is reached.

Linear Search (2/8)


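Sub LinearSearch(x: Int, a[]: Int, loc: Int)
    ' n is the number of elements in a; loc receives the position of x, or 0 if x is absent
    i := 1
    While (i <= n) And (x <> a[i])
        i := i + 1
    End While
    If i <= n Then loc := i Else loc := 0
End Sub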
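The pseudocode above can be sketched in Python roughly as follows. The function name `linear_search` and the choice to return the position instead of writing to `loc` are assumptions made for illustration; positions stay 1-based to match the slides, and the sample values are those named in the lecture's example trace.

```python
def linear_search(a, x):
    """Return the 1-based position of x in a, or 0 if x is absent."""
    for i, value in enumerate(a, start=1):
        if value == x:
            return i          # stop at the first match
    return 0                  # reached the end without finding x


numlist = [17, 23, 5, 11, 2, 29, 3]
print(linear_search(numlist, 11))   # examines 17, 23, 5, 11
print(linear_search(numlist, 7))    # examines every element
```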

Linear Search (3/8)


Array numlist contains
17 23 5 11 2 29 3

Searching for the value 11, linear search examines 17, 23, 5, and 11 -> Found
Searching for the value 7, linear search examines 17, 23, 5, 11, 2, 29, and 3 -> Not Found

Linear Search (4/8)


The advantage is its simplicity.
It is easy to understand
Easy to implement
Does not require the array to be in order

The disadvantage is its inefficiency


If there are 20,000 items in the array and what you are looking for is the 19,999th element, you need to search through almost the entire list.

Linear Search (5/8)


Whenever the number of entries doubles, so does the running time, roughly. If a machine does 1 million comparisons per second, 4 billion comparisons take about 4,000 seconds, which is more than an hour.

Linear Search (6/8)

Linear Search (7/8)


Use a Sentinel to Improve the Performance
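Sub LinearSearch2(x: Int, a[]: Int, loc: Int)
    ' n is the number of elements; a[] must have room for one extra slot
    a[n+1] = x          ' sentinel: guarantees that the loop stops
    i = 1
    While (x <> a[i])
        i = i + 1
    End While
    ' n is left unchanged, so i = n + 1 means only the sentinel matched
    If i <= n Then loc = i Else loc = 0
End Sub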
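A minimal Python sketch of the sentinel idea, assuming the list may be extended temporarily by one slot. The name `sentinel_search` is hypothetical. The point is that the loop tests only one condition per step, because the sentinel guarantees a match.

```python
def sentinel_search(a, x):
    """Return the 1-based position of x in list a, or 0 if x is absent."""
    n = len(a)
    a.append(x)               # sentinel: the loop below must stop
    i = 0
    while a[i] != x:          # no bounds check needed
        i += 1
    a.pop()                   # remove the sentinel again
    return i + 1 if i < n else 0   # i == n means only the sentinel matched


numlist = [17, 23, 5, 11, 2, 29, 3]
print(sentinel_search(numlist, 29))
print(sentinel_search(numlist, 7))
```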

Linear Search (8/8)


Apply Linear Search to Sorted Lists
Sub LinearSearch3(x: Int, a[]: Int, loc: Int)
    ' a[1..n] is sorted in ascending order
    i = 1
    While (i <= n) And (x > a[i])
        i = i + 1
    End While
    If (i <= n) And (a[i] = x) Then loc = i Else loc = 0
End Sub
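On a sorted list the scan can stop as soon as it passes the place where x would be. The sketch below assumes ascending order and uses a hypothetical function name; it also guards against running past the end when x is larger than every element.

```python
def sorted_linear_search(a, x):
    """Return the 1-based position of x in ascending list a, or 0 if absent."""
    i, n = 0, len(a)
    while i < n and a[i] < x:     # skip elements smaller than x
        i += 1
    # Either we ran off the end, or a[i] >= x; only equality is a hit.
    return i + 1 if i < n and a[i] == x else 0


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(sorted_linear_search(numlist2, 11))
print(sorted_linear_search(numlist2, 7))   # stops early at 11
```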

Binary Search (1/9)


Can We Search More Efficiently?
Yes, provided the list is in some kind of order, for example alphabetical order with respect to the names. If this is the case, we use a divide and conquer strategy to find an item quickly. This strategy is what one would use in a number guessing game, for example.

Binary Search (2/9)


I'm thinking of a number between 1 and 1000. Guess it!
Is it 500? Nope, too low.
Is it 750? Nope, too high.
Is it 625? etc.

This strategy guarantees a correct guess in no more than ten guesses!

Binary Search (3/9)


Apply This Strategy to Searching
The resulting algorithm is called the Binary Search algorithm. We check the middle key in our list.
If it is beyond what we are looking for (too high), we look only at the 1st half of the list. If it's not far enough in (too low), we look at the 2nd half.

Then iterate!

Binary Search (4/9)


1. Divide a sorted array into three sections:
   the middle element
   the elements on one side of the middle element
   the elements on the other side of the middle element

2. If the middle element is the correct value, done. Otherwise, go to step 1, using only the half of the array that may contain the correct value.

Binary Search (5/9)


3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.

Binary Search (6/9)


Binary Search Example
Array numlist2 contains
2 3 5 11 17 23 29

Searching for the value 11, binary search examines 11 and stops. Found.
Searching for the value 7, binary search examines 11, 3, 5, and stops. Not Found.

Binary Search (7/9)


Algorithm for Binary search
Sub BinarySearch(x: Int, a[]: Int, loc: Int)
    ' a[1..n] is sorted in ascending order
    i = 1: j = n
    While i < j
        m = (i + j) \ 2
        If x > a[m] Then i = m + 1 Else j = m
    End While
    If x = a[i] Then loc = i Else loc = 0
End Sub
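The same narrowing loop can be written in Python as a rough sketch. The function name is an assumption, the result is returned instead of stored in `loc`, and an explicit check for an empty list is added. Like the pseudocode, this variant only tests for equality once, after the range has shrunk to a single element.

```python
def binary_search(a, x):
    """Return the 1-based position of x in ascending list a, or 0 if absent."""
    if not a:
        return 0
    i, j = 0, len(a) - 1
    while i < j:
        m = (i + j) // 2          # middle of the current range
        if x > a[m]:
            i = m + 1             # x can only be in the upper half
        else:
            j = m                 # x is at m or in the lower half
    return i + 1 if a[i] == x else 0


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(binary_search(numlist2, 11))
print(binary_search(numlist2, 7))
```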

Binary Search (8/9)


The worst-case number of comparisons grows by only 1 comparison every time the list size is doubled. Only about 32 comparisons would be needed on a list of 4 billion using Binary Search.
Sequential Search could need all 4 billion comparisons, which takes over an hour at 1 million comparisons per second!
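To see the "one more comparison per doubling" behaviour, the sketch below counts worst-case probes for a binary search that compares against the middle element and keeps the larger remaining half. The helper name `worst_case_probes` is hypothetical, and the count is of middle-element probes, not of individual `<`/`=` tests.

```python
def worst_case_probes(n):
    """Worst-case number of middle-element probes on n sorted items."""
    probes = 0
    while n > 0:
        probes += 1
        n //= 2        # after a miss, at most floor(n / 2) items remain
    return probes


for size in (1_000, 2_000, 4_000, 4_000_000_000):
    print(size, worst_case_probes(size))
```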

Binary Search (9/9)


Benefit
Much more efficient than linear search. For an array of N elements, it performs at most about log2 N + 1 comparisons.

Disadvantage
Requires that array elements be sorted.

Interpolation Search (1/9)


Binary search is a great improvement over linear search
it eliminates a large portion of the list without actually examining every item

If the values are fairly evenly distributed, interpolation can be used to eliminate more values at each step.

Interpolation Search (2/9)


Interpolation is the process of using known values to guess the position of an unknown value.
Here, the indexes of known values in the list are used to guess what index the target value should have.
Interpolation search selects the dividing point by interpolation using the following code:
m = l + (x - a[l])*(r-l)/(a[r]-a[l])

Interpolation Search (3/9)


Compare x to a[m]
If x = a[m]: Found.
If x < a[m]: set r = m - 1
If x > a[m]: set l = m + 1

If the search is not yet finished, continue searching with the new l and r. Stop searching when Found or x<a[l] or x>a[r].

Interpolation Search (4/9)


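Example: Find the key x = 32 in the list
index:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
value:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

(m is rounded to the nearest integer)
1: l=1, r=20 -> m = 1+(32-1)*(20-1)/(70-1) = 9.5 (approx.) -> 10; a[10]=21<32=x -> l=11
2: l=11, r=20 -> m = 11+(32-24)*(20-11)/(70-24) = 12.6 (approx.) -> 13; a[13]=36>32=x -> r=12
3: l=11, r=12 -> m = 11+(32-24)*(12-11)/(32-24) = 12; a[12]=32=x -> Found at m = 12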

Interpolation Search (5/9)


Example: Find the key x = 30 in the list
index:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
value:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70

(m is rounded to the nearest integer)
1: l=1, r=20 -> m = 1+(30-1)*(20-1)/(70-1) = 9.0 (approx.) -> 9; a[9]=19<30=x -> l=10
2: l=10, r=20 -> m = 10+(30-21)*(20-10)/(70-21) = 11.8 (approx.) -> 12; a[12]=32>30=x -> r=11
3: l=10, r=11 -> m = 10+(30-21)*(11-10)/(24-21) = 13; m=13>11=r: Not Found

Interpolation Search (6/9)


Private Sub Interpolation(a[]: Int, x: Int, n: Int, Found: Boolean)
    l = 1: r = n
    Do While (r > l)
        m = l + ((x - a[l]) / (a[r] - a[l])) * (r - l)
        ' Verify and decide what to do next
    Loop
End Sub

Interpolation Search (7/9)


' Verify and decide what to do next
If (m < l) Or (m > r) Then
    Found = False: Exit Do        ' x lies outside a[l..r]
ElseIf (a[m] = x) Then
    Found = True: Exit Do
ElseIf (a[m] < x) Then
    l = m + 1
Else
    r = m - 1
End If
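Putting the probe formula and the decision step together, a possible Python sketch looks like this. It assumes integer keys in ascending order, rounds the probe position to the nearest integer (an assumption; truncation is also common), and stops as soon as x falls outside a[l..r]. The function name and the equal-endpoint guard against division by zero are additions for illustration.

```python
def interpolation_search(a, x):
    """Return the 1-based position of x in ascending list a, or 0 if absent."""
    l, r = 0, len(a) - 1
    while l <= r and a[l] <= x <= a[r]:
        if a[l] == a[r]:                  # avoid dividing by zero
            return l + 1                  # here a[l] == x necessarily
        num = (x - a[l]) * (r - l)
        den = a[r] - a[l]
        m = l + (2 * num + den) // (2 * den)   # l + round(num / den)
        if a[m] == x:
            return m + 1
        if a[m] < x:
            l = m + 1
        else:
            r = m - 1
    return 0                              # x < a[l] or x > a[r]: not found


keys = [1, 4, 7, 9, 9, 12, 13, 17, 19, 21,
        24, 32, 36, 44, 45, 54, 55, 63, 66, 70]
print(interpolation_search(keys, 32))
print(interpolation_search(keys, 30))
```

Because x stays between a[l] and a[r] inside the loop, the probe position m always lands within the current range, so no separate bounds check on m is needed in this variant.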

Interpolation Search (8/9)


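Binary search is very fast (O(log n)). When the keys are roughly uniformly distributed, interpolation search is faster still on average (O(log log n)), although its worst case is O(n). For n = 2^32 (about four billion items)
Binary search takes about 32 steps of verification
Interpolation search takes only about 5 steps of verification on average.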
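The two step counts follow directly from the growth rates, ignoring constant factors. A quick check, offered only as an illustration of the arithmetic:

```python
import math

n = 2 ** 32                                   # about four billion items
binary_steps = math.log2(n)                   # log2(n)
interpolation_steps = math.log2(binary_steps) # log2(log2(n))
print(binary_steps, interpolation_steps)
```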

Interpolation Search (9/9)


Interpolation search's running time is nearly constant over a large range of n. It is even more useful when the data is stored on a hard disk or another relatively slow device, where every probe is expensive.

Binary Search Tree (BST)


It's a binary tree! For each node in a BST
every key in its left subtree is smaller than the node's key; and every key in its right subtree is greater than the node's key.

Search Operation

Search operation takes time O(h), where h is the height of a BST

Operation Insert

Worst Case

Performance
Depends on the shape of the tree
Best Case:
Perfectly balanced tree, about log2 N nodes from root to leaf

Worst Case:
N nodes in a search path

Average Case:
About 1.39 log2 N comparisons for N keys inserted in random order
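The effect of tree shape can be seen with a small unbalanced BST. The sketch below is illustrative only: the class and function names are assumptions, duplicates are ignored, and height is measured as the number of nodes on the longest root-to-leaf path. Inserting keys in sorted order produces the worst case, while a shuffled order usually gives a much shallower tree.

```python
import random


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Insert key into the BST (duplicates ignored) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while key != node.key:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))   # attach the new leaf
            break
        node = child
    return root


def search(root, key):
    """Return (found, nodes_visited); the cost is bounded by the height."""
    visited, node = 0, root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


def height(root):
    """Number of nodes on the longest root-to-leaf path (level by level)."""
    level, h = ([root] if root else []), 0
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return h


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


N = 500
sorted_tree = build(range(N))            # worst case: a single long path
shuffled = list(range(N))
random.Random(0).shuffle(shuffled)       # fixed seed, for repeatability
shuffled_tree = build(shuffled)
print(height(sorted_tree), height(shuffled_tree))
```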

Balanced Tree
Tree structures support the basic dynamic-set operations in time proportional to the height of the tree, e.g. Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete

Ideally, a tree is balanced and its height is about log n, where n is the number of nodes in the tree. Balancing keeps the height of the tree as small as possible and therefore provides the best running time

Balanced BST
BST worst case: O(N)
The tree needs to be balanced

Approach:
Rebalance the BST explicitly: recursive and linear time
However, rebalancing after every insertion makes N insertions cost quadratic time in total

Is there a type of BST which guarantees that every insert and search will be logarithmic?

Top Down 2-3-4 Trees


Nodes store 1, 2, or 3 keys and have 2, 3, or 4 children, respectively
All leaves have the same depth

2-3-4 Tree Nodes


Introduction of nodes with more than 1 key, and more than 2 children
2-node: 1 key, 2 links (same as a binary node)
3-node: 2 keys, 3 links
4-node: 3 keys, 4 links

Why 2-3-4? (1/2)


Why not minimize height by maximizing children in a d-tree? Let each node have d children so that we get O(log_d N) search time! Right?

That means if d = N^(1/2), we get a height of 2

Why 2-3-4? (2/2)


However, finding the correct child on each level requires O(log N^(1/2)) time by binary search. Over both levels that is 2 log N^(1/2) = log N = O(log N), which is not as good as we had hoped for! 2-3-4 trees guarantee O(log N) height using only 2, 3, or 4 children per node.
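Written out step by step, with base-2 logarithms assumed for the binary search inside each node, the argument is:

```latex
\begin{align*}
\text{height} &= \log_d N = \log_{N^{1/2}} N = 2, \\
\text{cost per level} &= O(\log_2 d) = O\!\left(\log_2 N^{1/2}\right)
                       = O\!\left(\tfrac{1}{2}\log_2 N\right), \\
\text{total cost} &= 2 \cdot \tfrac{1}{2}\log_2 N = \log_2 N = O(\log N).
\end{align*}
```

So the wide tree saves levels but pays the same amount back inside each node.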

Insertion into 2-3-4 Trees (1/3)


Insert the new key at the lowest internal node reached in the search
2-node becomes 3-node
3-node becomes 4-node

What about a 4-node?
We can't insert another key!

Insertion into 2-3-4 Trees (2/3)


On our way down the tree, whenever we reach a 4-node, we break it up into two 2-nodes and move the middle element up into the parent node

Insertion into 2-3-4 Trees (3/3)


Now we can perform the insertion using one of the previous two cases. Since we follow this method from the root down to the leaf, it is called top-down insertion

Splitting the Tree


As we travel down the tree, if we encounter any 4-node we will break it up into 2-nodes. This guarantees that we will never have the problem of inserting the middle element of a former 4-node into its parent 4-node.
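Top-down insertion with splitting on the way down might be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: the names `Node234`, `split_child`, `insert`, `inorder`, and `leaf_depths` are hypothetical, keys are assumed to be distinct, and a full root is split by first placing it under a new empty root.

```python
import bisect


class Node234:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []            # 1 to 3 keys, kept sorted
        self.children = children or []    # empty for the lowest level

    def is_leaf(self):
        return not self.children


def split_child(parent, idx):
    """Split the 4-node parent.children[idx]; its middle key moves up."""
    node = parent.children[idx]
    left = Node234(node.keys[:1], node.children[:2])
    right = Node234(node.keys[2:], node.children[2:])
    parent.keys.insert(idx, node.keys[1])
    parent.children[idx:idx + 1] = [left, right]


def insert(root, key):
    """Insert key top-down and return the (possibly new) root."""
    if root is None:
        return Node234([key])
    if len(root.keys) == 3:               # a full root is split first
        root = Node234([], [root])
        split_child(root, 0)
    node = root
    while not node.is_leaf():
        idx = bisect.bisect_left(node.keys, key)
        if len(node.children[idx].keys) == 3:
            split_child(node, idx)        # node itself is never a 4-node here
            if key > node.keys[idx]:      # choose a side of the promoted key
                idx += 1
        node = node.children[idx]
    bisect.insort(node.keys, key)         # 2-node -> 3-node or 3-node -> 4-node
    return root


def inorder(node):
    """Keys in ascending order."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one value if balanced)."""
    if node.is_leaf():
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = None
for k in [50, 10, 70, 30, 20, 60, 40, 80, 90, 5]:
    root = insert(root, k)
print(inorder(root), leaf_depths(root))
```

Because every 4-node met on the way down is split before descending into it, the parent always has room for the promoted key, which is the guarantee the slide describes.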

Time Complexity of Insertion in 2-3-4 Trees


Time complexity:
A search visits O(log N) nodes
An insertion requires at most O(log N) node splits
Each node split takes constant time
Operations Search and Insert each take time O(log N)

Beyond 2-3-4 Trees


What do we know about 2-3-4 Trees?
Balanced
O(log N) search time
Different node structures

Can we get 2-3-4 tree advantages in a binary tree format???


Welcome to the world of Red-Black Trees!!!

Best of both methods


Search as in a BST
Insert as in a 2-3-4 search tree

Red-Black Tree
A red-black tree is a binary search tree with the following properties:
edges are colored red or black
no two consecutive red edges on any root-leaf path
same number of black edges on any root-leaf path (= black height of the tree)
edges connecting leaves are black
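These properties can be checked mechanically. In the sketch below each node stores the color of its incoming edge, external leaves are represented by `None` and count as black edges, and the root's own color is treated as one more black edge on every path, which does not affect the comparison. The names `RBNode` and `black_height` are assumptions for illustration.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color        # color of the edge coming from the parent
        self.left = left
        self.right = right


def black_height(node, parent_is_red=False):
    """Return the number of black edges on every path from this node's
    incoming edge down to a leaf, or -1 if a property is violated."""
    if node is None:
        return 1                  # edges connecting leaves are black
    is_red = node.color == RED
    if is_red and parent_is_red:
        return -1                 # two consecutive red edges
    left = black_height(node.left, is_red)
    right = black_height(node.right, is_red)
    if left == -1 or left != right:
        return -1                 # unequal black counts below this node
    return left + (0 if is_red else 1)


ok = RBNode(10, BLACK, RBNode(5, RED), RBNode(15, RED))
double_red = RBNode(10, BLACK, RBNode(5, RED, RBNode(2, RED)))
lopsided = RBNode(10, BLACK, RBNode(5, BLACK))
print(black_height(ok), black_height(double_red), black_height(lopsided))
```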

2-3-4 Tree Evolution


How 2-3-4 trees relate to red-black trees

Insertion into Red-Black Tree


1. Perform a standard search to find the leaf where the key should be added
2. Replace the leaf with an internal node with the new key
3. Color the incoming edge of the new node red
4. Add two new leaves, and color their incoming edges black

Insertion into Red-Black Tree


If the parent had an incoming red edge, we now have two consecutive red edges!
We must reorganize the tree to remove that violation. What must be done depends on the sibling of the parent.

Insertion - Plain and Simple

Case 1: Incoming edge of p is black

Right Left Rotation

Restructuring
Case 2: Incoming edge of p is red, and its sibling is black

Similar to a right rotation, we can do a left rotation...

Double Rotation
What if the new node is between its parent and grandparent in the inorder sequence? We must perform a double rotation (which is no more difficult than a single one)

This would be called a left-right double rotation

Last of the Rotations


And this would be called a right-left double rotation
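The single and double rotations can be sketched on plain binary nodes as below. Edge recoloring is deliberately left out, and the class and function names are hypothetical. Each function takes the top node of the section being restructured (the grandparent g) and returns the node that takes its place.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(g):
    """Lift g's left child p above g."""
    p = g.left
    g.left, p.right = p.right, g
    return p


def rotate_left(g):
    """Lift g's right child p above g."""
    p = g.right
    g.right, p.left = p.left, g
    return p


def rotate_left_right(g):
    """Double rotation: new node is the right child of g's left child."""
    g.left = rotate_left(g.left)
    return rotate_right(g)


def rotate_right_left(g):
    """Double rotation: new node is the left child of g's right child."""
    g.right = rotate_right(g.right)
    return rotate_left(g)


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


g = Node(30, Node(10, None, Node(20)))   # 20 lies between 10 and 30
top = rotate_left_right(g)
print(top.key, inorder(top))
```

In every case the inorder sequence is unchanged; only the shape of the three-node section changes.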

Bottom-Up Rebalancing
Case 3: Incoming edge of p is red and its sibling is also red
We call this a promotion
Note how the black depth remains unchanged for all of the descendants of g
This process will continue upward beyond g if necessary: rename g as n and repeat.

Summary of Insertion
If two consecutive red edges are present, we do either
a restructuring (with a single or double rotation) and stop, or
a promotion and continue

A restructuring takes constant time and is performed at most once. It reorganizes an unbalanced section of the tree.
Promotions may continue up the tree and are executed at most O(log N) times.
The time complexity of an insertion is O(log N).
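The whole insertion procedure, with restructuring and promotion, could be sketched as follows. This is one possible rendering under stated assumptions: each node stores the color of its incoming edge, `None` stands for a leaf with a black edge, the root is always kept black, duplicates are ignored, and all names are hypothetical.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.color = RED          # incoming edge of a new node is red
        self.left = None
        self.right = None
        self.parent = parent


class RedBlackTree:
    def __init__(self):
        self.root = None

    def _rotate_up(self, x):
        """Rotate x above its parent y, preserving the inorder sequence."""
        y, z = x.parent, x.parent.parent
        if x is y.left:
            y.left, x.right = x.right, y
            if y.left is not None:
                y.left.parent = y
        else:
            y.right, x.left = x.left, y
            if y.right is not None:
                y.right.parent = y
        y.parent, x.parent = x, z
        if z is None:
            self.root = x
        elif z.left is y:
            z.left = x
        else:
            z.right = x

    def insert(self, key):
        parent, node = None, self.root
        while node is not None:               # standard BST search
            if key == node.key:
                return                        # ignore duplicates
            parent, node = node, (node.left if key < node.key else node.right)
        n = Node(key, parent)
        if parent is None:
            self.root = n
        elif key < parent.key:
            parent.left = n
        else:
            parent.right = n
        while n.parent is not None and n.parent.color == RED:
            p, g = n.parent, n.parent.parent  # g exists: a red p is not the root
            s = g.right if p is g.left else g.left
            if s is not None and s.color == RED:
                p.color = s.color = BLACK     # Case 3: promotion
                g.color = RED
                n = g                         # rename g as n and repeat
            else:
                if (n is p.left) != (p is g.left):
                    self._rotate_up(n)        # first half of a double rotation
                    p = n
                self._rotate_up(p)            # Case 2: restructuring, then stop
                p.color, g.color = BLACK, RED
                break
        self.root.color = BLACK


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def black_height(node):
    """Black edges on every downward path, or -1 if a property is broken."""
    if node is None:
        return 1
    for child in (node.left, node.right):
        if node.color == RED and child is not None and child.color == RED:
            return -1
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or lh != rh:
        return -1
    return lh + (1 if node.color == BLACK else 0)


tree = RedBlackTree()
for k in range(1, 101):                       # sorted input, worst case for a plain BST
    tree.insert(k)
print(black_height(tree.root) > 0, inorder(tree.root) == list(range(1, 101)))
```

The loop performs at most one restructuring, after which it stops, while promotions move the possible violation two levels up each time, which matches the O(log N) bound stated above.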