Design and Analysis of Algorithms: Lecture 3 - Searching Tactics

Design and Analysis of Algorithms
Graduate Course-Number CSC5011 Fall Semester 2011
Lecture 3 Searching Tactics
Dr. Md. Shamim Akhter

Assistant Professor, Computer Science Department, American International University Bangladesh. Email: shamimakhter@aiub.edu
Searching Concept (1/3)

A common problem in computer science involves storing and maintaining a large data set and then searching the data for particular values. Data storage and retrieval are key to many industry applications, and search algorithms are necessary for storing and retrieving data efficiently.

For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words. Problems of this kind are called searching problems.

There are many searching algorithms. The natural searching method is linear search (also called sequential search or exhaustive search).
It is very simple but takes a long time on large lists.
A binary search repeatedly subdivides the list to locate an item.

It is much faster than linear search.
Like a binary search, an interpolation search repeatedly subdivides the list to locate an item.
Linear / Sequential Search

Linear search is a special case of brute-force search and a very simple algorithm. It uses a loop to step sequentially through an array, starting with the first element. It compares each element with the value being searched for and stops when that value is found or the end of the array is reached.
Linear Search (2/8)

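Sub LinearSearch(x As Integer, a() As Integer, n As Integer, ByRef loc As Integer)
    ' a(1) .. a(n) hold the data; loc receives the position of x, or 0
    Dim i As Integer = 1
    ' AndAlso short-circuits, so a(i) is never read once i > n
    While (i <= n) AndAlso (x <> a(i))
        i = i + 1
    End While
    If i <= n Then loc = i Else loc = 0
End Sub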
Linear Search (3/8)

Array numlist contains: 17 23 5 11 2 29 3
Searching for the value 11, linear search examines 17, 23, 5, and 11 -> Found. Searching for the value 7, linear search examines 17, 23, 5, 11, 2, 29, and 3 -> Not Found.
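A minimal Python sketch of the LinearSearch procedure, assuming the same convention of returning a 1-based position, or 0 for "not found". The function name `linear_search` and the `examined` list are illustrative additions, used only to reproduce the numlist trace above.

```python
from typing import List, Tuple


def linear_search(x: int, a: List[int]) -> Tuple[int, List[int]]:
    """Return (1-based position of x or 0, list of elements examined)."""
    examined = []
    for i, value in enumerate(a, start=1):
        examined.append(value)
        if value == x:
            return i, examined  # found: stop immediately
    return 0, examined  # reached the end of the array without a match


numlist = [17, 23, 5, 11, 2, 29, 3]
print(linear_search(11, numlist))  # (4, [17, 23, 5, 11])
print(linear_search(7, numlist))   # (0, [17, 23, 5, 11, 2, 29, 3])
```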
Linear Search (4/8)

The advantage is its simplicity:
It is easy to understand. It is easy to implement. It does not require the array to be in order.
The disadvantage is its inefficiency.

If there are 20,000 items in the array and what you are looking for is in the 19,999th element, you need to search through almost the entire list.
Linear Search (5/8)

Whenever the number of entries doubles, so does the running time, roughly. If a machine does 1 million comparisons per second, 4 billion comparisons take 4,000 seconds (about 67 minutes); an average successful search examines about half the list and takes roughly 33 minutes.
Linear Search (6/8)
Linear Search (7/8)

Use a Sentinel to Improve the Performance
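Sub LinearSearch2(x As Integer, a() As Integer, n As Integer, ByRef loc As Integer)
    ' a() must have a spare slot at index n + 1 for the sentinel
    a(n + 1) = x
    Dim i As Integer = 1
    ' No bounds test: the loop always stops at the sentinel
    While x <> a(i)
        i = i + 1
    End While
    ' Stopping at n + 1 means only the sentinel matched (n itself is not changed)
    If i <= n Then loc = i Else loc = 0
End Sub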
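A minimal Python sketch of the sentinel idea, assuming a growable list rather than a fixed array with a spare slot: the sentinel is appended before the loop and removed afterwards. The name `linear_search_sentinel` is illustrative.

```python
from typing import List


def linear_search_sentinel(x: int, a: List[int]) -> int:
    """Sentinel linear search; returns the 1-based position of x, or 0 if absent."""
    n = len(a)
    a.append(x)          # sentinel at 0-based index n (position n + 1)
    i = 0
    while a[i] != x:     # no bounds test: the sentinel guarantees the loop stops
        i += 1
    a.pop()              # remove the sentinel so the caller's list is unchanged
    return i + 1 if i < n else 0


data = [17, 23, 5, 11, 2, 29, 3]
print(linear_search_sentinel(11, data))  # 4
print(linear_search_sentinel(7, data))   # 0
```

The loop body does one comparison per element instead of two (bound check plus value check), which is the whole point of the sentinel.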
Linear Search (8/8)

Apply Linear Search to Sorted Lists
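Sub LinearSearch3(x As Integer, a() As Integer, n As Integer, ByRef loc As Integer)
    ' a(1) .. a(n) must be sorted in ascending order
    Dim i As Integer = 1
    ' Stop at the first element >= x; no later element can equal x
    While (i <= n) AndAlso (x > a(i))
        i = i + 1
    End While
    If (i <= n) AndAlso (a(i) = x) Then loc = i Else loc = 0
End Sub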
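A Python sketch of linear search on a sorted list, assuming ascending order: the scan stops at the first element that is not smaller than x, so an unsuccessful search can end early. The name `linear_search_sorted` is illustrative.

```python
from typing import List


def linear_search_sorted(x: int, a: List[int]) -> int:
    """Linear search on an ascending list; returns a 1-based position or 0."""
    i = 0
    while i < len(a) and a[i] < x:
        i += 1
    # a[i] (if it exists) is the first element not smaller than x
    return i + 1 if i < len(a) and a[i] == x else 0


sorted_list = [2, 3, 5, 11, 17, 23, 29]
print(linear_search_sorted(11, sorted_list))  # 4
print(linear_search_sorted(7, sorted_list))   # 0, after examining 2, 3, 5, 11
```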
Binary Search (1/9)

Can We Search More Efficiently?
Yes, provided the list is in some kind of order, for example alphabetical order with respect to the names. If this is the case, we use a divide and conquer strategy to find an item quickly. This strategy is what one would use in a number guessing game, for example.
Binary Search (2/9)

I'm thinking of a number between 1 and 1000. Guess it!
Is it 500? Nope, too low. Is it 750? Nope, too high. Is it 625? etc.
This strategy guarantees a correct guess in no more than ten guesses!
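A small Python check of the ten-guess claim, assuming the guesser always picks the middle of the remaining range (rounding down). The name `guesses_needed` is illustrative.

```python
def guesses_needed(secret: int, low: int = 1, high: int = 1000) -> int:
    """Count the guesses the halving strategy makes to find `secret` in [low, high]."""
    count = 0
    while low <= high:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess < secret:   # "too low": discard the guess and everything below it
            low = guess + 1
        else:                # "too high": discard the guess and everything above it
            high = guess - 1
    raise ValueError("secret is outside the range")


worst = max(guesses_needed(s) for s in range(1, 1001))
print(worst)  # 10
```

Ten guesses suffice because 2^10 = 1024 >= 1000.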
Binary Search (3/9)

Apply this strategy to searching. The resulting algorithm is called the binary search algorithm. We check the middle key in our list.
If it is beyond what we are looking for (too high), we look only at the first half of the list. If it is not far enough in (too low), we look at the second half.
Then iterate!
Binary Search (4/9)

1. Divide a sorted array into three sections:
   the middle element,
   the elements on one side of the middle element, and
   the elements on the other side of the middle element.
2. If the middle element is the correct value, done. Otherwise, go to step 1, using only the half of the array that may contain the correct value.
Binary Search (5/9)

3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.
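Steps 1 to 3 can be sketched in Python as follows, assuming a list sorted in ascending order and returning a 1-based position (0 if absent). The `examined` list is an illustrative addition so the numlist2 trace can be checked; the function name is hypothetical.

```python
from typing import List, Tuple


def binary_search_trace(x: int, a: List[int]) -> Tuple[int, List[int]]:
    """Three-section binary search; returns (1-based position or 0, middles examined)."""
    lo, hi = 0, len(a) - 1
    examined = []
    while lo <= hi:                 # step 3: stop when no elements remain
        mid = (lo + hi) // 2        # step 1: pick the middle element
        examined.append(a[mid])
        if a[mid] == x:             # step 2: correct value, done
            return mid + 1, examined
        if a[mid] < x:
            lo = mid + 1            # x can only be in the upper section
        else:
            hi = mid - 1            # x can only be in the lower section
    return 0, examined


numlist2 = [2, 3, 5, 11, 17, 23, 29]
print(binary_search_trace(11, numlist2))  # (4, [11])
print(binary_search_trace(7, numlist2))   # (0, [11, 3, 5])
```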
Binary Search (6/9)

Binary Search Example: Array numlist2 contains
2 3 5 11 17 23 29
Searching for the value 11, binary search examines 11 and stops: Found. Searching for the value 7, binary search examines 11, 3, and 5, and stops: Not Found.
Binary Search (7/9)

Algorithm for Binary search
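Sub BinarySearch(x As Integer, a() As Integer, n As Integer, ByRef loc As Integer)
    ' a(1) .. a(n) must be sorted in ascending order
    Dim i As Integer = 1
    Dim j As Integer = n
    Dim m As Integer
    ' Invariant: if x is present, it lies in a(i) .. a(j)
    While i < j
        m = (i + j) \ 2        ' integer division
        If x > a(m) Then
            i = m + 1          ' x can only be in the upper half
        Else
            j = m              ' x, if present, is in a(i) .. a(m)
        End If
    End While
    ' One equality test at the end; n >= 1 guards against an empty array
    If (n >= 1) AndAlso (x = a(i)) Then loc = i Else loc = 0
End Sub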
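A Python port of BinarySearch, assuming a list sorted in ascending order. Like the pseudocode, it makes one comparison per step to narrow a[i..j] and tests for equality only once at the end; the 1-based return value (0 if absent) mirrors `loc`.

```python
from typing import List


def binary_search(x: int, a: List[int]) -> int:
    """Two-way binary search with a single final equality test."""
    if not a:
        return 0
    i, j = 0, len(a) - 1
    while i < j:
        m = (i + j) // 2
        if x > a[m]:
            i = m + 1   # x, if present, is in a[m+1..j]
        else:
            j = m       # x, if present, is in a[i..m]
    return i + 1 if a[i] == x else 0


print(binary_search(11, [2, 3, 5, 11, 17, 23, 29]))  # 4
print(binary_search(7, [2, 3, 5, 11, 17, 23, 29]))   # 0
```

Deferring the equality test means the loop examines different middle elements than the three-section version (for 11 it probes 11, 3, and 5 before testing equality), but it never exits early.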
Binary Search (8/9)

The worst-case number of comparisons grows by only 1 every time the list size doubles. Only about 32 comparisons would be needed on a list of 4 billion (about 2^32) items using binary search.
Sequential search would need up to 4 billion comparisons, which at 1 million comparisons per second takes over an hour!
Binary Search (9/9)

Benefit
Much more efficient than linear search. For an array of N elements, it performs about log2 N comparisons.
Disadvantage
Requires that array elements be sorted.
Interpolation Search (1/9)

Binary search is a great improvement over linear search:
it eliminates a large portion of the list without examining every element.
When the values are fairly evenly distributed, interpolation can be used to eliminate even more values at each step.

Interpolation is the process of using known values to estimate an unknown one. Interpolation search uses the indexes and values at the ends of the current range to guess what index the target value should have. It selects the dividing point by interpolation, using the following formula: m = l + (x - a[l]) * (r - l) / (a[r] - a[l])

Compare x to a[m]:
If x = a[m]: Found.
If x < a[m]: set r = m - 1.
If x > a[m]: set l = m + 1.
If the search is not yet finished, continue with the new l and r. Stop when x is found or when x < a[l] or x > a[r].

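Example: Find the key x = 32 in the list (m is rounded to the nearest integer)
i:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
a[i]:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70
1: l=1, r=20 -> m=1+(32-1)*(20-1)/(70-1) = 9.54, rounded to 10; a[10]=21<32=x -> l=11
2: l=11, r=20 -> m=11+(32-24)*(20-11)/(70-24) = 12.57, rounded to 13; a[13]=36>32=x -> r=12
3: l=11, r=12 -> m=11+(32-24)*(12-11)/(32-24) = 12; a[12]=32=x -> Found at m = 12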

Example: Find the key x = 30 in the list
i:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
a[i]:  1  4  7  9  9 12 13 17 19 21 24 32 36 44 45 54 55 63 66 70
1: l=1, r=20 -> m=1+(30-1)*(20-1)/(70-1) = 8.99, rounded to 9; a[9]=19<30=x -> l=10
2: l=10, r=20 -> m=10+(30-21)*(20-10)/(70-21) = 11.84, rounded to 12; a[12]=32>30=x -> r=11
3: l=10, r=11 -> m=10+(30-21)*(11-10)/(24-21) = 13; m=13>11=r (equivalently, x=30>24=a[r]): Not Found

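Private Sub Interpolation(a() As Integer, x As Integer, n As Integer, ByRef Found As Boolean)
    ' a(1) .. a(n) must be sorted in ascending order
    Dim l As Integer = 1, r As Integer = n, m As Integer
    Found = False
    ' Search only while x can still lie between a(l) and a(r)
    Do While (l <= r) AndAlso (x >= a(l)) AndAlso (x <= a(r))
        If a(r) = a(l) Then
            m = l    ' all keys in range equal x; avoids dividing by zero
        Else
            ' CInt rounds the estimate to the nearest integer
            m = l + CInt((x - a(l)) * (r - l) / (a(r) - a(l)))
        End If
        ' Verify and decide what to do next
        If a(m) = x Then
            Found = True
            Exit Do
        ElseIf a(m) < x Then
            l = m + 1
        Else
            r = m - 1
        End If
    Loop
End Sub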
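A Python sketch of interpolation search, assuming an ascending list and returning a 1-based position (0 if absent). One deliberate difference, labeled here as an assumption: it picks m with integer floor division instead of rounding, so its probe sequence can differ from the hand-worked traces even though the results agree. The name `interpolation_search` is illustrative.

```python
from typing import List


def interpolation_search(x: int, a: List[int]) -> int:
    """Interpolation search on an ascending list; returns a 1-based position or 0."""
    l, r = 0, len(a) - 1
    # Keep searching only while x can still lie within a[l]..a[r]
    while l <= r and a[l] <= x <= a[r]:
        if a[r] == a[l]:
            m = l       # all keys in range equal x; avoids division by zero
        else:
            # Floor division keeps m inside [l, r] because a[l] <= x <= a[r]
            m = l + (x - a[l]) * (r - l) // (a[r] - a[l])
        if a[m] == x:
            return m + 1
        if a[m] < x:
            l = m + 1
        else:
            r = m - 1
    return 0


keys = [1, 4, 7, 9, 9, 12, 13, 17, 19, 21, 24, 32, 36, 44, 45, 54, 55, 63, 66, 70]
print(interpolation_search(32, keys))  # 12
print(interpolation_search(30, keys))  # 0
```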

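Binary search is very fast (O(log n)), but on uniformly distributed data interpolation search is much faster on average (O(log log n)); in the worst case it can degrade to O(n). For n = 2^32 (about four billion items):
binary search takes 32 steps of verification, while interpolation search takes only about 5 (log2 log2 2^32 = log2 32 = 5).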

Interpolation search performance is nearly constant over a large range of n. Interpolation search is even more useful when the data are stored on a hard disk or another relatively slow device, where every probe is expensive.
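A rough Python experiment (an illustrative sketch, not a benchmark) that counts the elements each search examines on about 200,000 pseudo-random, roughly uniformly distributed keys. The seed, sizes, and function names are arbitrary assumptions; the exact averages depend on them, so no specific numbers are claimed.

```python
import random
from typing import List


def binary_probes(x: int, a: List[int]) -> int:
    """Number of elements examined by three-section binary search."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] == x:
            break
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def interpolation_probes(x: int, a: List[int]) -> int:
    """Number of elements examined by interpolation search."""
    l, r, probes = 0, len(a) - 1, 0
    while l <= r and a[l] <= x <= a[r]:
        m = l if a[r] == a[l] else l + (x - a[l]) * (r - l) // (a[r] - a[l])
        probes += 1
        if a[m] == x:
            break
        if a[m] < x:
            l = m + 1
        else:
            r = m - 1
    return probes


rng = random.Random(42)                                  # fixed seed for repeatability
n = 200_000
data = sorted(rng.randrange(10**9) for _ in range(n))    # roughly uniform keys
targets = [data[rng.randrange(n)] for _ in range(1000)]  # successful searches only
avg_bin = sum(binary_probes(t, data) for t in targets) / len(targets)
avg_int = sum(interpolation_probes(t, data) for t in targets) / len(targets)
print(f"binary: {avg_bin:.1f} probes, interpolation: {avg_int:.1f} probes")
```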
Binary Search Tree (BST)

It's a binary tree! For each node in a BST:
all keys in its left subtree are smaller than its key, and all keys in its right subtree are greater.
Search Operation
Search operation takes time O(h), where h is the height of a BST
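A minimal Python sketch of BST search and insertion, assuming integer keys and ignoring duplicates. Both follow a single root-to-leaf path, which is why they take O(h) time. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_search(root: Optional[Node], key: int) -> Optional[Node]:
    """Walk down one path; each step discards one whole subtree."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Insert key (duplicates ignored) and return the possibly new root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root                      # already present
        if key < node.key:
            if node.left is None:
                node.left = Node(key)        # attach as a new leaf
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = bst_insert(root, k)
print(bst_search(root, 60) is not None)  # True
print(bst_search(root, 65) is None)      # True
```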
Operation Insert
Worst Case
Performance
Depends on the shape of the tree. Best case:
Perfectly balanced tree, about log N nodes from root to leaf
Worst case:
N nodes in a search path
Average case:
About 1.39 log N comparisons for N keys inserted in random order
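A Python sketch contrasting the worst and average cases, assuming keys are inserted one by one into an unbalanced BST. Sorted input produces a chain (height N); a shuffled order stays far shorter. The function name `bst_height` and the seed are illustrative; the shuffled height depends on the order, so no value is claimed for it.

```python
import random
from typing import List, Optional


def bst_height(keys: List[int]) -> int:
    """Insert keys in order into an unbalanced BST and return its height
    (number of nodes on the longest root-to-leaf path)."""
    root: Optional[list] = None   # each node is [key, left, right]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                child = 1 if key < node[0] else 2   # 1 = left, 2 = right
                if node[child] is None:
                    node[child] = [key, None, None]
                    break
                node = node[child]
        height = max(height, depth)
    return height


rng = random.Random(1)
shuffled = list(range(1000))
rng.shuffle(shuffled)
print(bst_height(list(range(1000))))  # 1000: sorted input gives the worst case
print(bst_height(shuffled))           # much smaller for a random order
```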
Balanced Tree
Tree structures support various basic dynamic-set operations in time proportional to the height
of the tree, e.g., Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete.
Ideally, a tree will be balanced, with height about log n, where n is the number of nodes in the tree. Keeping the height as small as possible provides the best running time.
Balanced BST
BST worst case: O(N).
It needs to be balanced.
Approach:
Rebalance the BST explicitly. Rebalancing is recursive and takes linear time; however, if we rebalance frequently (e.g., after every insertion), the total insertion cost becomes quadratic.
Is there a type of BST that guarantees

that every insert and search will be logarithmic?
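A Python sketch of the explicit, linear-time rebalancing approach, assuming it works by flattening the tree in order and rebuilding it with the middle key as root at every level. The function names are illustrative, not taken from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder(node: Optional[Node], out: List[int]) -> None:
    """Append the subtree's keys in sorted (in-order) order."""
    if node is not None:
        inorder(node.left, out)
        out.append(node.key)
        inorder(node.right, out)


def build_balanced(keys: List[int], lo: int, hi: int) -> Optional[Node]:
    """Build a balanced BST from sorted keys[lo:hi], using the middle key as root."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build_balanced(keys, lo, mid), build_balanced(keys, mid + 1, hi))


def rebalance(root: Optional[Node]) -> Optional[Node]:
    """O(N) rebalancing: every node is visited once to flatten and once to rebuild."""
    keys: List[int] = []
    inorder(root, keys)
    return build_balanced(keys, 0, len(keys))


def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# A degenerate "chain" 0 -> 1 -> ... -> 499, as sorted insertions would produce
chain = None
for k in reversed(range(500)):
    chain = Node(k, None, chain)
balanced = rebalance(chain)
print(height(chain), height(balanced))  # 500 9
```

Each rebalance costs O(N), so doing it after each of N insertions costs O(N^2) in total, which is the drawback noted above.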
Top Down 2-3-4 Trees

Nodes store 1, 2, or 3 keys and have 2, 3, or 4 children, respectively. All leaves have the same depth.
2-3-4 Tree Nodes

Introduction of nodes with more than 1 key and more than 2 children:
2-node: 1 key, 2 links (same as a binary node)
3-node: 2 keys, 3 links
4-node: 3 keys, 4 links
Why 2-3-4? (1/2)

Why not minimize height by maximizing the number of children in a d-ary tree? Let each node have d children so that we get O(log_d N) search time! Right?
That means if d = N^(1/2), we get a height of 2.
Why 2-3-4? (2/2)

However, finding the correct child at each level requires O(log N^(1/2)) comparisons by binary search, so a search takes 2 levels x O(log N^(1/2)) = O(log N) comparisons, which is not as good as we had hoped for! 2-3-4 trees guarantee O(log N) height using only 2, 3, or 4 children per node.
Insertion into 2-3-4 Trees (1/3)

Insert the new key at the lowest internal node reached in the search: a 2-node becomes a 3-node;
a 3-node becomes a 4-node. What about a 4-node?

We can't insert another key!

On our way down the tree, whenever we reach a 4-node, we break it up into two 2-nodes and move the middle element up into the parent node.

Now we can perform the insertion using one of the previous two cases. Since we apply this method from the root down to the leaf, it is called top-down insertion.
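A compact Python sketch of top-down 2-3-4 insertion, assuming integer keys: every 4-node met on the way down is split (its middle key moves up), so the node that finally receives the key always has room. In this sketch the "lowest internal node" of the lecture is represented as a node with no children. All class and function names are illustrative.

```python
from typing import List, Optional, Set


class Node234:
    """A 2-, 3-, or 4-node: 1 to 3 sorted keys and either no children or len(keys) + 1."""

    def __init__(self, keys: List[int], children: Optional[List["Node234"]] = None) -> None:
        self.keys = keys
        self.children = children if children is not None else []

    def is_leaf(self) -> bool:
        return not self.children


class Tree234:
    """Top-down 2-3-4 tree: 4-nodes are split on the way down."""

    def __init__(self) -> None:
        self.root: Optional[Node234] = None

    @staticmethod
    def _split_child(parent: Node234, i: int) -> None:
        """Split the 4-node parent.children[i] into two 2-nodes; its middle key moves up."""
        full = parent.children[i]
        left = Node234(full.keys[:1], full.children[:2])
        right = Node234(full.keys[2:], full.children[2:])
        parent.keys.insert(i, full.keys[1])
        parent.children[i:i + 1] = [left, right]

    def insert(self, key: int) -> None:
        if self.root is None:
            self.root = Node234([key])
            return
        if len(self.root.keys) == 3:
            # Splitting a 4-node root is the only way the tree grows taller
            self.root = Node234([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while not node.is_leaf():
            i = sum(1 for k in node.keys if k < key)   # child whose range holds key
            if len(node.children[i].keys) == 3:
                self._split_child(node, i)
                if key > node.keys[i]:                 # key goes right of the promoted key
                    i += 1
            node = node.children[i]
        # Every 4-node on the path was split, so this node has room for one more key
        node.keys.insert(sum(1 for k in node.keys if k < key), key)


def collect(node: Node234, depth: int, keys: List[int], leaf_depths: Set[int]) -> None:
    """Gather keys in sorted (in-order) order and the depth of every leaf."""
    if node.is_leaf():
        keys.extend(node.keys)
        leaf_depths.add(depth)
        return
    for i, child in enumerate(node.children):
        collect(child, depth + 1, keys, leaf_depths)
        if i < len(node.keys):
            keys.append(node.keys[i])


tree = Tree234()
for k in [10, 20, 30, 40, 50, 60, 70, 80, 90, 5, 15, 25]:
    tree.insert(k)
```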
Splitting the Tree

As we travel down the tree, if we encounter any 4-node we will break it up into 2-nodes. This guarantees that we will never have the problem of inserting the middle element of a former 4-node into its parent 4-node.
Time Complexity of Insertion in 2-3-4 Trees

Time complexity:
A search visits O(log N) nodes.
An insertion requires O(log N) node splits.
Each node split takes constant time.
Hence, operations Search and Insert each take O(log N) time.
Beyond 2-3-4 Trees

What do we know about 2-3-4 Trees?
Balanced O(log N) search time Different node structures
Can we get 2-3-4 tree advantages in a binary tree format?

Welcome to the world of Red-Black Trees!
Best of both methods:

Search as in a BST; insert as in a 2-3-4 search tree.
Red-Black Tree
A red-black tree is a binary search tree with the following properties:
edges are colored red or black;
no two consecutive red edges on any root-leaf path;
the same number of black edges on any root-leaf path (= black height of the tree);
edges connecting to leaves are black.
Red-Black Tree
2-3-4 Tree Evolution

How 2-3-4 trees relate to red-black trees
Insertion into Red-Black Tree

1. Perform a standard search to find the leaf where the key should be added.
2. Replace the leaf with an internal node with the new key.
3. Color the incoming edge of the new node red.
4. Add two new leaves, and color their incoming edges black.
Insertion into Red-Black Tree

If the parent had an incoming red edge, we now have two consecutive red edges!
We must re-organize tree to remove that violation. What must be done depends on the sibling of the parent.
Insertion - Plain and Simple
Case 1: Incoming edge of p is black: there are no two consecutive red edges, so the insertion is complete.
Right Left Rotation
Restructuring
Case 2: Incoming edge of p is red, and its sibling is black
Similar to a right rotation, we can do a left rotation...
Double Rotation
What if the new node is between its parent and grandparent in the inorder sequence? We must perform a double rotation (which is no more difficult than a single one)
This would be called a left-right double rotation
Last of the Rotations

And this would be called a right-left double rotation
Bottom-Up Rebalancing
Case 3: Incoming edge of p is red and its sibling is also red
We call this a promotion Note how the black depth remains unchanged for all of the descendants of g This process will continue upward beyond g if necessary: rename g as n and repeat.
Summary of Insertion
If two red edges are present, we do either
a restructuring (with a simple or double rotation) and stop, or a promotion and continue
A restructuring takes constant time and is performed at most once. It reorganizes an off-balance section of the tree. Promotions may continue up the tree and are executed at most O(log N) times. The time complexity of an insertion is O(log N).
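A Python sketch of red-black insertion. The lecture colors edges; this sketch assumes the common equivalent convention of giving each node the color of its incoming edge (the root is black, and missing children count as black leaves). A red parent with a red sibling triggers a promotion that continues upward; a red parent with a black sibling triggers a single or double rotation and stops. All names are illustrative, and `black_height` is a hypothetical checker added for testing.

```python
from typing import Iterator, Optional

RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key: int, parent: Optional["RBNode"] = None) -> None:
        self.key = key
        self.color = RED          # a new node's incoming edge is red
        self.parent = parent
        self.left: Optional["RBNode"] = None
        self.right: Optional["RBNode"] = None


class RedBlackTree:
    def __init__(self) -> None:
        self.root: Optional[RBNode] = None

    def _rotate(self, x: RBNode, left: bool) -> None:
        """Left (left=True) or right rotation at x: x's child y takes x's place."""
        y = x.right if left else x.left
        assert y is not None
        inner = y.left if left else y.right   # the subtree that changes parent
        if left:
            x.right, y.left = inner, x
        else:
            x.left, y.right = inner, x
        if inner is not None:
            inner.parent = x
        y.parent, x.parent = x.parent, y
        if y.parent is None:
            self.root = y
        elif y.parent.left is x:
            y.parent.left = y
        else:
            y.parent.right = y

    def insert(self, key: int) -> None:
        # Standard BST insertion of a red node
        parent, node = None, self.root
        while node is not None:
            parent, node = node, (node.left if key < node.key else node.right)
        n = RBNode(key, parent)
        if parent is None:
            self.root = n
        elif key < parent.key:
            parent.left = n
        else:
            parent.right = n
        # Repair two consecutive reds (Case 1, black p, needs nothing)
        while n.parent is not None and n.parent.color == RED:
            p = n.parent
            g = p.parent
            assert g is not None                   # a red node is never the root
            p_is_left = g.left is p
            s = g.right if p_is_left else g.left   # sibling of p
            if s is not None and s.color == RED:
                # Case 3, promotion: push the red up to g and continue from g
                p.color = s.color = BLACK
                g.color = RED
                n = g
            else:
                # Case 2, restructuring: n between p and g needs a double rotation
                if (p.right is n) == p_is_left:
                    self._rotate(p, left=p_is_left)
                    n, p = p, n
                self._rotate(g, left=not p_is_left)
                p.color, g.color = BLACK, RED
                break
        assert self.root is not None
        self.root.color = BLACK


def black_height(node: Optional[RBNode]) -> int:
    """Check the red-black rules below node; return its black height (leaves count 1)."""
    if node is None:
        return 1
    if node.color == RED and any(c is not None and c.color == RED for c in (node.left, node.right)):
        raise ValueError("two consecutive red edges")
    lh, rh = black_height(node.left), black_height(node.right)
    if lh != rh:
        raise ValueError("unequal black heights")
    return lh + (1 if node.color == BLACK else 0)


def in_order(node: Optional[RBNode]) -> Iterator[int]:
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


tree = RedBlackTree()
for k in range(1, 16):        # sorted input: the worst case for a plain BST
    tree.insert(k)
print(black_height(tree.root), list(in_order(tree.root)) == list(range(1, 16)))
```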
