Professional Documents
Culture Documents
Overview
Trees
G is parent of N
and child of A
A is an ancestor of P
P is a descendant of A
M is child of F and
grandchild of A
Generic
Tree:
Definitions
Each
E
h non-empty
t ttree h
has a roott node
d and
d zero or more subb
trees T1, , Tk
Recursive definition
Each sub-tree is a tree
An internal node is connected to its children by a directed
edge
Definitions
The length of a path is the number of edges on the path (i.e., k-1)
Each node has a path of length 0 to itself
There is exactly one path from the root to each node in a tree
Nodes ni,,nk are descendants of ni and ancestors of nk
Nodes ni+1,, nk are proper descendants
Nodes ni,,nk-1 are proper ancestors of ni
Cpt S 223. School of EECS, WSU
Can there be
more than one?
Trees
e g height(E)=2
e.g.,
height(E)=2, height(L)=0
= 3 (height of longest path from root)
e.g., depth(E)
depth(E)=1,
1, depth(L)=2
depth(L) 2
= 3 (length of the path to the deepest node)
Cpt S 223. School of EECS, WSU
Implementation of Trees
Number of children
can be dynamically
determined but.
more time
i
to
access children
Implementation of Trees
10
Binary Trees
11
E.g., (a + b * c)+((d * e + f) * g)
12
Evaluate expression
Post-order:
Post
order: left,
left right
right, root
Pre-order: root, left, right
In order: left,
In-order:
left root,
root right
Cpt S 223. School of EECS, WSU
13
14
Traversals
Pre-order: + + a * b c * + * d e f g
Post-order: a b c * + d e * f + g * +
In-order:
In
order: a + b * c + d * e + f * g
Cpt S 223. School of EECS, WSU
15
16
E.g., a b + c d e + * *
top
top
stack
(3)
(1)
top
top
((4))
((2))
+
a
+
b
17
E.g., a b + c d e + * *
top
t
top
(6)
(5)
+
b
c
d
+
e
+
d
18
19
Searching in BSTs
Contains
C
t i
(T
(T, x)
)
{
if (T == NULL)
then return NULL
if (T->element == x)
then return T
if (x < T->element)
then return Contains (T
(T->leftChild,
>leftChild x)
else return Contains (T->rightChild, x)
}
Typically assume no duplicate elements.
If duplicates, then store counts in nodes, or
each node has a list of objects.
Cpt S 223. School of EECS, WSU
20
Searching in BSTs
2
1
4
6
8
21
Searching in BSTs
findMin (T)
{
if (T == NULL)
then return NULL
if (T->leftChild == NULL)
then return T
else return findMin (T->leftChild)
}
C
Complexity
l it ?
O(h)
22
Searching in BSTs
findMax (T)
{
if (T == NULL)
then return NULL
if (T->rightChild == NULL)
then return T
else return findMax (T->rightChild)
}
C
Complexity
l it ?
O(h)
23
Printing BSTs
Complexity?
(n)
Cpt S 223. School of EECS, WSU
24
E.g., insert 5
Old ttree:
(1)
New tree:
insert(5)
(2)
(3)
(4)
25
Insert (x, T)
{
if (T == NULL)
Complexity?
then T = new Node(x)
else
if (x < T->element)
then if (T->leftChild == NULL)
then T->leftChild = new Node(x)
else Insert (x, T->leftChild)
else if (T->rightChild
(T >rightChild == NULL)
then (T->rightChild = new Node(x)
else Insert (x, T->rightChild)
26
}
E.g., remove(4):
(2)
remove(4):
(1)
(1)
8
(3)
(2)
(3)
Action:
(2)
Old tree:
Becomes
case 1 h
here
Cpt S 223. School of EECS, WSU
New tree:
28
29
Implementation of BST
30
const ?
Pointer to tree
node passed by
reference so it
can be
reassigned
within function.
31
Public member
functions calling
private recursive
member functions.
32
33
34
35
Case 2:
Copy successor data
Delete successor
Case 1: Just delete it
36
Post-order traversal
37
BST Analysis
Always (N)
Worst case: h = ?
Best case: h = ?
Average case: h = ?
(N)
(
(
lg N)
llg N)
38
HOW?
39
D(N)
+1
D(i)
()
+1 (adjustment)
D(N-i-1)
A similar
i il analysis
l i will
ill b
be
used in QuickSort
40
Randomly Generated
500-node BST (insert only)
Average node depth = 9.98
log2 500 = 8.97
41
42
B l
Balanced
d Binary
Bi
Search
S
h Trees
T
43
Why?
Solutions?
44
Balanced BSTs
AVL trees
45
AVL Trees
Definition:
E
Every
AVL tree is
i a BST such
h that:
h
1.
46
AVL Trees
= (2h)
47
AVL Trees
Which of these is a valid AVL tree?
x
h=0
h=1
h=2
h=2
48
How?
h = O(lg(N))
N = (2h) =>
49
AVL Insert
Insert(6):
( )
violation
balanced
Rotating 7-8
restores balance
Inserting 6 violates AVL
balance condition
50
AVL Insert
root
Fix at the
violoated
node
inserted
node
51
CASE
CASE
CASE
CASE
1:
1
2:
3:
4:
4
Insertt
Inse
Insert
Insert
I
Insert
t
into
into
into
iinto
t
the
the
the
th
the
left ssubtree
bt ee of the left child of k
right subtree of the left child of k
left subtree of the right child of k
right
i ht subtree
bt
off th
the right
i ht child
hild off k
52
Insert
CASE 2
Insert
CASE 3
Insert
right
e
subtree
right
subtree
left
subtree
CASE 1
left
subtree
right child
left child
CASE 4
Insert
53
Case 1 for
f AVL insert
Let this be the node with the
violation ((i.e, imbalance))
(nearest to the last insertion
site)
CASE 1
Insert
54
Remember: X, Y, Z could be empty trees, or single node trees, or mulltiple node trees.
Before:
Imbalance
After:
Balanced
inserted
Invariant:
55
Case 1 example
After:
Before:
Imbalance
Balanced
inserted
Cpt S 223. School of EECS, WSU
56
1.
2.
3.
4.
57
Case 4 for
f AVL insert
Let this be the node with the
violation ((i.e, imbalance))
(nearest to the last insertion
site)
CASE 4
Insert
58
Before:
After:
I b l
Imbalance
inserted
Invariant:
Balanced
59
Case 4 example
Automatically fixed
Imbalance
4
2
Imbalance
Fix this
node
balanced
2
inserted
7
Cpt S 223. School of EECS, WSU
60
Case 2 for
f AVL insert
Let this be the node with the
violation ((i.e, imbalance))
(nearest to the last insertion
site)
CASE 2
Insert
61
Note: X, Z can be empty trees, or single node trees, or mulltiple node trees
But Y should have at least one or more nodes in it because of insertion.
AVL Insert
Before:
inserted
Think of Y as =
After:
Imbalance
remains!
AVL Insert
Before:
Imbalance
After:
Balanced!
#2
#1
=Z
=X
X
=Y
inserted
Can be implemented as
two successive single rotations
63
Case 2 example
Imbalance
Balanced!
5
3
2
#1
#2
6
2
5
4
1
4
inserted
Approach: push 3 to 5s place
64
Case 3 for
f AVL insert
Let this be the node with the
violation ((i.e, imbalance))
(nearest to the last insertion
site)
CASE 3
Insert
65
AVL Insert
imbalance
#2
#1
inserted
66
Delete(2): ?
10
7
5
2
15
19
13
11
14
16
17
25
18
67
68
69
70
Q) Is it guaranteed
that the deepest
node with
imbalance is the
one that gets
fixed?
Insert first,
and then fix
A) Yes, recursion
will ensure that.
Case 1
Case 2
Case 4
Case 3
Cpt S 223. School of EECS, WSU
71
72
#2
#1
// #1
// #2
Cpt S 223. School of EECS, WSU
73
Splay Tree
Observation:
Idea:
Strategy:
74
Splay Tree
Solution 1
Perform single
g rotations with accessed/new
/
node and parent until accessed/new node
is the root
P bl
Problem
75
Splay Tree
Solution 2
76
Splaying: Zig-zag
77
Splaying: Zig-zig
78
Splay Tree
79
N
Now
att root;
t no right
i ht child
hild
80
Balanced BSTs
AVL trees
S l trees
Splay
t
Other self-balancing
self balancing BSTs:
All these trees assume N-node tree can fit in main memory
If not?
81
82
83
Set Insertion
84
pair<Type1,Type2>
Methods: first,
first second,
second
first_type, second_type
#include <utility>
pair<iterator,bool> insert (const Object & x)
{
iterator itr;
bool found;
return
t
pair<itr,found>;
i <it f
d>
}
Cpt S 223. School of EECS, WSU
85
86
87
Set Insertion
set<int> s;
for (int i = 0; i < 1000000; i++)
s.insert (s.end(), i);
88
Set Deletion
Remove x, if found
Return number of items deleted (0 or 1)
iterator erase (iterator itr);
Remove objects
j
from start up
p to (but
(
not including)
g) end
Returns iterator for object after last deleted object
Again, iterator advances from start to end
using in-order
in order traversal
Cpt S 223. School of EECS, WSU
89
Set Search
90
Associative container
91
key
(as well as
the value)
<
Key
<
>
Value
(can be a
struct byy
itself)
>
92
Methods
93
94
Example
struct ltstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) < 0;
Comparator if
}
Key type not
};
Key type
Value type
yp
primitive
int main()
{
map<const char*, int, ltstr> months;
months["january"]
months[
january ] = 31;
You really dont have to call
months["february"] = 28;
insert() explicitly.
months["march"] = 31;
This syntax will do it for you.
months["april"] = 30;
...
exists, then
If element already exists
key
value
95
Example (cont.)
...
months["may"] = 31;
months["june"] = 30;
...
months["december"] = 31;
cout << february -> " << months[february"] << endl;
map<const char*, int, ltstr>::iterator cur = months.find("june");
map<const char*, int, ltstr>::iterator prev = cur;
map<const char*, int, ltstr>::iterator next = cur;
++next; --prev;
cout << "Previous (in alphabetical order) is " << (*prev).first << endl;
cout << "Next (in alphabetical order) is " << (*next).first << endl;
months["february"] = 29;
cout << february -> " << months[february"] << endl;
}
Implementation of
set and map
Tree node
T
d points
i t to
t it
its predecessor
d
and
d
successor
97
set
map
98
B-Trees
A Tree Data Structure for Disks
99
Database Size
WDCC
6,000 TBs
NERSC
2 800 TBs
2,800
AT&T
323 TBs
Sprint
ChoicePoint
250 TBs
Yahoo!
100 TBs
YouTube
45 TBs
Amazon
42 TBs
Lib
Library
off Congress
C
20 TB
TBs
100
Kilo
Mega
g
Giga
Tera
Peta
Exa
Zeta
x
x
x
x
x
x
x
103
106
109
1012
1015
1018
1021
Needs more
sophisticated disk/IO
machine
101
Secondary Storage
Hardware
Storage capacity
>100 MB to 2-4GB
Giga (109) to
Terabytes (1012) to..
D t persistence
Data
i t
Transient
T
i t (erased
(
d
after process
terminates)
Persistent
P
i t t
(permanently stored)
Data access
speeds
~ a few
f
clock
l k cycles
l
(ie., x 10-9 seconds)
ll
d (10
( -33
milliseconds
sec) =
Data seek time +
be million times slower
read
d time
i couldthan
main memory read
102
103
Why ?
N=10
N
106 lg
l 106 is
i roughly
hl 20
20 disk seeks for each level would be too much!
104
2 keys
3 children
1 2
3 6
4 5
7 8
Height
g of a balanced 3-wayy
search tree?
105
4-way tree
106
Bigger Idea
key2
child0 child
1
child2
keyM-1
childM-3
Cpt S 223. School of EECS, WSU
childM-1
107
Using B-trees
key1
key2
keyM-1
Main memory
M-way B-tree
disk
108
Factors
key1
key2
child0 child
1
How big are
the keys?
keyM-1
child
hild2
childM-3
childM-1
Capacity of a
single
disk block
Tree height
dictates
#disk reads
dominates
Overall
search time
109
Example
key1
key2
child0 child
1
child
hild2
keyM-1
childM-3
childM-1
110
111
B+ trees: Definition
3.
4.
5.
6.
Parameters:
N, M, L
112
B+ tree of order 5
Root
Internal
nodes
Leaves
113
B+ tree of order 5
Index to the Data (store only keys)
Each internal node = 1 disk block
114
115
116
Leaf node
M child pointers
4 x M bytes
M-1
M
1 key entries
(M-1) x K bytes
L x D bytes
117
118
==>
B = 4M + (M-1)K
4 x M + (M-1) x K bytes
M = 1,024
119
Calculating L:
L = floor[ B / D ]
L = 2,048
,
ie., each leaf has to store 1,024 to 2,048
data items
120
121
122
how
= O ( logM #leaves)
123
B+ tree: Insertion
124
125
No more
room
Next: Insert(55)
Empty now
S split
So,
lit the
th previous
i
lleaff iinto
t 2 parts
t
126
127
128
129
130
Summary: Trees
STL set
t and map classes use balanced trees
to support logarithmic insert, delete and
search
S
Search
h tree
t
ffor Di
Disks
k
B+ tree
131