Trees
5.1 Introduction
The data type tree is familiar from everyday life. Examples include family trees and the
tree of directories and files on a disk.
A tree consists of a set of nodes, which can contain data, linked by edges. A tree
satisfies two properties.
1. A tree is connected. It is possible to get from any node to any other by following the
edges.
2. A tree has no loops. There is only one way to get from one node to another.
In computer science a tree always has a root node from which to start. We normally
draw a tree upside-down with the root node at the top.
[Figure: part of a directory tree with root H:, containing web, workspace and test.zip. Lower levels contain file.html and myproject; myproject contains file1.java, file1.class and .project.]
CHAPTER 5. TREES 2
Subtree The subtree below a node is the tree with that node as root and with all its
descendants.
For example, in the directory tree above, workspace has a subtree of height 3.
Here is some code from AnimalFinder. The data is held in a tree of strings. This
method is called when the user answers yes or no to a question. The variable here represents
the place we have arrived at in the tree.
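A minimal sketch of a method like this follows. It assumes a binary tree in which internal nodes hold questions and leaves hold animals, with a yes answer leading to the left child and a no answer to the right child. The class AnimalSketch, its nested Node class and the methods answer and prompt are hypothetical names, not taken from AnimalFinder.

```java
// Hypothetical sketch of a yes/no step through a tree of strings.
// Assumption: questions at internal nodes, animals at leaves,
// "yes" goes to the left child and "no" to the right child.
class AnimalSketch {
    static class Node {
        String data;
        Node left, right; // both null at a leaf

        Node(String data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    Node here; // the place we have arrived at in the tree

    AnimalSketch(Node root) {
        here = root;
    }

    // Called when the user answers the question held at 'here'.
    void answer(boolean yes) {
        if (!here.isLeaf()) {
            here = yes ? here.left : here.right;
        }
    }

    // The text to show next: a question, or a guess once a leaf is reached.
    String prompt() {
        return here.isLeaf() ? "Is it a " + here.data + "?" : here.data;
    }

    public static void main(String[] args) {
        Node root = new Node("Does it fly?",
                new Node("bird", null, null),
                new Node("Does it swim?",
                        new Node("fish", null, null),
                        new Node("dog", null, null)));
        AnimalSketch game = new AnimalSketch(root);
        System.out.println(game.prompt()); // Does it fly?
        game.answer(false);
        System.out.println(game.prompt()); // Does it swim?
        game.answer(true);
        System.out.println(game.prompt()); // Is it a fish?
    }
}
```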
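import java.util.Iterator;

/** MakeFileTree
 * @author C.T. Stretch
 */
public class MakeFileTree
{
    static Tree<String> theTree;

    // ... (other members of the class are not shown)

    // Count the leaves (files) below t whose names end with ext.
    // Assumes Tree<E> declares numberOfChildren() (as ListTree does),
    // E getData() and childIterator().
    static int countType(Tree<String> t, String ext)
    {
        int count = 0;
        if (t.numberOfChildren() == 0)   // a leaf: test its name
        {
            String s = t.getData();
            if (s.endsWith(ext))
                count++;
        }
        else                             // otherwise add up the counts of the children
        {
            Iterator<Tree<String>> it = t.childIterator();
            while (it.hasNext())
            {
                count += countType(it.next(), ext);
            }
        }
        return count;
    }
}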
/** MyBinaryTree
 * Simple implementation of a binary tree
 * @author C.T. Stretch
 */
public class MyBinaryTree<E> implements BinaryTree<E>
{
    private BinaryTree<E> left, right;
    private E data;

    MyBinaryTree(E data)
    {
        this.data = data;
    }

    // Number of nodes: this node plus the sizes of both subtrees.
    public int size()
    {
        int n = 1;
        if (left != null) n += left.size();
        if (right != null) n += right.size();
        return n;
    }

    // Height: one more than the taller subtree (a leaf has height 1).
    public int height()
    {
        int m = 0, n = 0;
        if (left != null) m = left.height();
        if (right != null) n = right.height();
        return 1 + ((m > n) ? m : n);
    }

    public E getData()
    {
        return data;
    }

    public boolean hasLeft()
    {
        return left != null;
    }

    public boolean hasRight()
    {
        return right != null;
    }

    public BinaryTree<E> getLeft()
    {
        return left;
    }

    public BinaryTree<E> getRight()
    {
        return right;
    }

    public boolean isLeaf()
    {
        return (left == null) && (right == null);
    }
}
Notice that we do not use a separate Node class: the tree object itself represents its root
node. We can do this only because we do not allow empty trees.
Notice also that we do not keep a separate size variable; instead we calculate the size when
it is needed. Keeping a size variable would be tricky, as the size would change whenever a
subtree was altered.
The height and size are both calculated by recursive methods. The height of a tree is
the maximum of the heights of its child subtrees, plus one.
The size of a tree is the sum of the sizes of its child subtrees, plus one.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * @author chris
 */
public class ListTree<E> implements Tree<E>
{
    private E data;
    private List<Tree<E>> children;

    ListTree(E data)
    {
        this.data = data;
        children = new ArrayList<Tree<E>>();
    }

    // Size: one for this node plus the sizes of all the child subtrees.
    public int size()
    {
        int n = 0;
        Iterator<Tree<E>> it = children.iterator();
        while (it.hasNext())
            n += it.next().size();
        return n + 1;
    }

    // Height: one more than the tallest child subtree.
    public int height()
    {
        int n = 0, m;
        Iterator<Tree<E>> it = children.iterator();
        while (it.hasNext())
        {
            m = it.next().height();
            if (m > n)
                n = m;
        }
        return n + 1;
    }

    public int numberOfChildren()
    {
        return children.size();
    }

    public E getData()
    {
        return data;
    }

    // ... (the remaining methods of Tree<E> are not shown)
}
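[Figure: the example binary tree used for the traversals below. A has children B and C; B has children D and E; D has a left child G; E has a left child H; C has a right child F; F has children I and J.]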
Inorder traversal (Only for binary trees.) Visit the left subtree, then the node, then the
right subtree. For the tree above this gives GDBHEACIFJ.
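The inorder sequence can be checked with a short recursive sketch. The Node class below is a hypothetical minimal node, not MyBinaryTree, and exampleTree builds the tree whose inorder sequence is given above: A with children B and C, B with children D and E, D with left child G, E with left child H, C with right child F, and F with children I and J.

```java
// Sketch: recursive inorder traversal of a binary tree.
// Node is a hypothetical minimal class, not the MyBinaryTree class above.
class TraversalSketch {
    static class Node {
        char data;
        Node left, right;

        Node(char data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Inorder: left subtree, then the node, then the right subtree.
    static void inorder(Node t, StringBuilder out) {
        if (t == null) return;
        inorder(t.left, out);
        out.append(t.data);
        inorder(t.right, out);
    }

    // The example tree used in the text.
    static Node exampleTree() {
        Node d = new Node('D', new Node('G', null, null), null);
        Node e = new Node('E', new Node('H', null, null), null);
        Node f = new Node('F', new Node('I', null, null), new Node('J', null, null));
        Node b = new Node('B', d, e);
        Node c = new Node('C', null, f);
        return new Node('A', b, c);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        inorder(exampleTree(), out);
        System.out.println(out); // GDBHEACIFJ
    }
}
```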
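// Assumed to be an inner class of MyBinaryTree<E>
// (needs java.util.Iterator and java.util.Stack).
private class PreOrderIterator implements Iterator<E>
{
    // Subtrees still to be visited; the top is visited next.
    private Stack<BinaryTree<E>> s = new Stack<BinaryTree<E>>();
    PreOrderIterator()
    {
        s.push(MyBinaryTree.this);
    }
    public boolean hasNext()
    {
        return !s.isEmpty();
    }
    public E next()
    {
        BinaryTree<E> t = s.pop();
        // Push the right child first so that the left child is visited first.
        if (t.hasRight()) s.push(t.getRight());
        if (t.hasLeft()) s.push(t.getLeft());
        return t.getData();
    }
}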
If we iterate over the tree above, the stack looks like this (the bottom of the stack is on
the left, and . marks the empty stack):
A
C B
C E D
C E G
C E
C H
C
F
J I
J
.
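The trace can be reproduced with a self-contained sketch of the same stack algorithm. It assumes a hypothetical Node class and uses an ArrayList as the stack, with the top at the end of the list, so the recorded contents read from bottom to top, left to right, as in the trace.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: iterative preorder traversal with an explicit stack,
// recording the stack contents before each pop (bottom on the left).
class PreorderStackSketch {
    static class Node {
        char data;
        Node left, right;

        Node(char data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    static String preorder(Node root, List<String> trace) {
        List<Node> stack = new ArrayList<>();
        StringBuilder out = new StringBuilder();
        stack.add(root);
        while (!stack.isEmpty()) {
            trace.add(contents(stack));
            Node t = stack.remove(stack.size() - 1); // pop the top
            if (t.right != null) stack.add(t.right); // right pushed first...
            if (t.left != null) stack.add(t.left);   // ...so left is popped first
            out.append(t.data);
        }
        trace.add(".");
        return out.toString();
    }

    static String contents(List<Node> stack) {
        StringBuilder sb = new StringBuilder();
        for (Node n : stack) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.data);
        }
        return sb.toString();
    }

    static Node exampleTree() {
        Node d = new Node('D', new Node('G', null, null), null);
        Node e = new Node('E', new Node('H', null, null), null);
        Node f = new Node('F', new Node('I', null, null), new Node('J', null, null));
        return new Node('A', new Node('B', d, e), new Node('C', null, f));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(preorder(exampleTree(), trace)); // ABDGEHCFIJ
        for (String line : trace) System.out.println(line);
    }
}
```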
The level order traversal can be done using a queue. We start with the root in the
queue. At each step we remove a node from the front of the queue, process it and add its
children, in order, to the back of the queue.
If we iterate over the tree above, the queue looks like this (the front of the queue is on
the right, and . marks the empty queue):
A
C B
E D C
F E D
G F E
H G F
J I H G
J I H
J I
J
.
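A matching sketch for the level order traversal uses java.util.ArrayDeque as the queue. The Node class and example tree are hypothetical stand-ins for the tree above. The recorded contents list the front of the queue on the right, as in the trace, by walking the deque with descendingIterator().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: level order traversal with a queue, recording the queue
// contents before each removal (front of the queue on the right).
class LevelOrderSketch {
    static class Node {
        char data;
        Node left, right;

        Node(char data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    static String levelOrder(Node root, List<String> trace) {
        ArrayDeque<Node> queue = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        queue.addLast(root);
        while (!queue.isEmpty()) {
            trace.add(contents(queue));
            Node t = queue.removeFirst();                // remove the front node
            out.append(t.data);                          // process it
            if (t.left != null) queue.addLast(t.left);   // add its children in order
            if (t.right != null) queue.addLast(t.right);
        }
        trace.add(".");
        return out.toString();
    }

    static String contents(ArrayDeque<Node> queue) {
        StringBuilder sb = new StringBuilder();
        Iterator<Node> it = queue.descendingIterator(); // back of the queue first
        while (it.hasNext()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(it.next().data);
        }
        return sb.toString();
    }

    static Node exampleTree() {
        Node d = new Node('D', new Node('G', null, null), null);
        Node e = new Node('E', new Node('H', null, null), null);
        Node f = new Node('F', new Node('I', null, null), new Node('J', null, null));
        return new Node('A', new Node('B', d, e), new Node('C', null, f));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(levelOrder(exampleTree(), trace)); // ABCDEFGHIJ
        for (String line : trace) System.out.println(line);
    }
}
```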
[Figure 5.5: a binary search tree with root 10. 10 has children 6 and 12; 6 has children 4 and 7; 4 has children 1 and 5; 12 has only a right child 14; 14 has children 13 and 16.]
Adding
Adding a value to a BST is easy. Search for the value; if it is not there, the search ends
at a node that has no child where our value should be. We just add a new leaf containing the
value at that point.
For example, to add 11 to the tree in Figure 5.5 we search until we reach 12, which has
no left child, so we add a left child containing 11. To add 3, our search reaches 1, where we
add a right child.
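Here is a minimal sketch of adding to a BST of ints. It assumes a hypothetical Node class and simply ignores duplicate values. The insertion order in main rebuilds the tree of Figure 5.5, and the two further additions follow the example above.

```java
// Sketch: adding a value to a binary search tree of ints.
// Assumptions: hypothetical Node class; duplicate values are ignored.
class BstAddSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value) {
            this.value = value;
        }
    }

    // Search for value; if it is not found, hang a new leaf where the search stopped.
    static Node add(Node root, int value) {
        if (root == null) return new Node(value);
        Node t = root;
        while (true) {
            if (value == t.value) return root;           // already present
            if (value < t.value) {
                if (t.left == null) { t.left = new Node(value); return root; }
                t = t.left;
            } else {
                if (t.right == null) { t.right = new Node(value); return root; }
                t = t.right;
            }
        }
    }

    static Node find(Node t, int value) {
        while (t != null && t.value != value)
            t = value < t.value ? t.left : t.right;
        return t;
    }

    public static void main(String[] args) {
        Node root = null;
        // This insertion order rebuilds the tree of Figure 5.5.
        for (int v : new int[] {10, 6, 12, 4, 7, 14, 1, 5, 13, 16})
            root = add(root, v);
        root = add(root, 11); // search stops at 12, which has no left child
        root = add(root, 3);  // search stops at 1, which has no right child
        System.out.println(find(root, 12).left.value); // 11
        System.out.println(find(root, 1).right.value); // 3
    }
}
```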
Removing
Removing a leaf node is easy. To remove a non-leaf node we cannot leave a hole where
that node was. To remove a node with only one child, we can replace it by that child. In
particular, we can remove the smallest or largest node of any subtree, as it cannot
have two children. If the node has two children, we can remove the smallest value
from its right subtree (or the largest from its left subtree) and put that value in the node
to be removed. To find the smallest (or largest) value, keep going left (or right).
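The following recursive sketch covers all three removal cases, again assuming a hypothetical int Node class. For a node with two children it uses the smallest value of the right subtree, which is one of the two options described above. Each call returns the new root of the subtree it was given.

```java
// Sketch: removing a value from a BST of ints (hypothetical Node class).
class BstRemoveSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value) {
            this.value = value;
        }
    }

    static Node add(Node t, int value) {
        if (t == null) return new Node(value);
        if (value < t.value) t.left = add(t.left, value);
        else if (value > t.value) t.right = add(t.right, value);
        return t;
    }

    // Remove value from the subtree rooted at t and return the new subtree root.
    static Node remove(Node t, int value) {
        if (t == null) return null;                        // value not present
        if (value < t.value) t.left = remove(t.left, value);
        else if (value > t.value) t.right = remove(t.right, value);
        else if (t.left == null) return t.right;           // leaf, or only a right child
        else if (t.right == null) return t.left;           // only a left child
        else {
            // Two children: copy in the smallest value of the right subtree
            // (keep going left), then remove that value from the right subtree.
            Node m = t.right;
            while (m.left != null) m = m.left;
            t.value = m.value;
            t.right = remove(t.right, m.value);
        }
        return t;
    }

    // Values in increasing order, separated by single spaces.
    static String inorder(Node t) {
        if (t == null) return "";
        return (inorder(t.left) + " " + t.value + " " + inorder(t.right)).trim();
    }

    public static void main(String[] args) {
        Node root = null;
        for (int v : new int[] {10, 6, 12, 4, 7, 14, 1, 5, 13, 16})
            root = add(root, v);
        root = remove(root, 10); // two children: 12 moves up into the root
        root = remove(root, 4);  // two children: 5 moves up
        System.out.println(root.value);     // 12
        System.out.println(inorder(root)); // 1 5 6 7 12 13 14 16
    }
}
```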
Tree balancing
There are many ways that we can arrange data in a BST. The numbers 1, 2, 3 can be held
in five different ways.
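[Figure 5.7: the five BSTs holding 1, 2, 3. From left to right: 3 with left child 2, which has left child 1; 3 with left child 1, which has right child 2; 2 with children 1 and 3; 1 with right child 3, which has left child 2; 1 with right child 2, which has right child 3.]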
Notice that once we have chosen the shape of the tree we have no choice where to put
the numbers.
With larger data sets we have many more choices of tree.
For searching purposes it is clear that we want to make the height of the tree as small
as possible, as the height is the maximum number of comparisons needed. The worst case
is where the data is in a single chain, such as the outside four cases in Figure 5.7.
A binary tree with levels $1, 2, \ldots, k$ all filled has $n = 2^k - 1$ nodes. So the smallest
height for a binary tree with $n$ nodes is about $\log_2 n$.
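Here is a short derivation of this bound, taking the height of a single node to be 1 as in the height() methods above. Level $i$ holds at most $2^{i-1}$ nodes, so a tree of height $h$ with $n$ nodes satisfies:

```latex
n \le 1 + 2 + 4 + \cdots + 2^{h-1} = 2^{h} - 1
\quad\Longrightarrow\quad
h \ge \log_2(n + 1)
```

For example, with $n = 1000$ we have $2^9 - 1 = 511 < 1000 \le 1023 = 2^{10} - 1$, so no binary tree with 1000 nodes has height less than 10.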
To achieve a fast search we can use a complete binary tree. A binary tree is complete
if all levels except the last are full, and the last level is filled from left to right.
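[Figure 5.8: two complete BSTs. The first has root 5 with children 3 and 7; 3 has children 2 and 4, and 7 has a left child 6. The second has root 4 with children 2 and 6; 2 has children 1 and 3, and 6 has children 5 and 7.]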
The two trees in Figure 5.8 are complete. Notice that if we add the value 1 to the first
tree, the result must be the second tree. This means that we need to move the value in every
node. In other words, if we want to keep our tree complete, adding a node is O(n).
This is no better than keeping a sorted list and using binary search: we have a fast search
but a slow add.
A complete tree is fast to search but slow to maintain. By allowing a bit more flexibility
in the tree we can make it fast to maintain while keeping it fast to search.
We say a BST is balanced if, at every node of the tree, the heights of its two subtrees
differ by no more than one (an empty subtree counts as height 0).
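A sketch of a balance check under this definition follows. It assumes a hypothetical int Node class and the height convention of height() above, where an empty subtree has height 0 and a leaf has height 1. Heights are computed bottom-up, and -1 is returned as soon as an unbalanced node is found, so each node is visited once.

```java
// Sketch: checking the balance condition at every node (hypothetical Node class).
class BalanceSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int v) {
        return new Node(v, null, null);
    }

    // Height of the subtree (empty = 0, leaf = 1), or -1 if any node in it is unbalanced.
    static int checkedHeight(Node t) {
        if (t == null) return 0;
        int l = checkedHeight(t.left);
        if (l < 0) return -1;
        int r = checkedHeight(t.right);
        if (r < 0) return -1;
        if (Math.abs(l - r) > 1) return -1; // subtree heights differ by more than one
        return 1 + Math.max(l, r);
    }

    static boolean isBalanced(Node t) {
        return checkedHeight(t) >= 0;
    }

    public static void main(String[] args) {
        // The tree of Figure 5.5.
        Node fig55 = new Node(10,
                new Node(6, new Node(4, leaf(1), leaf(5)), leaf(7)),
                new Node(12, null, new Node(14, leaf(13), leaf(16))));
        System.out.println(isBalanced(fig55));                            // false
        System.out.println(isBalanced(new Node(2, leaf(1), leaf(3))));    // true
    }
}
```

By this definition the tree of Figure 5.5 is not balanced: node 12 has an empty left subtree (height 0) and a right subtree of height 2.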
A balanced tree is close enough to a complete tree that the search operation is still
O(log n), but flexible enough that addition is also O(log n).
If you add a node to a balanced tree it may become unbalanced. We shall see that it
can be restored to balance by doing operations called rotations on the tree.
In the diagrams A, B and C are subtrees. Notice that either tree is a BST if the values
in A are less than l, the values in B are between l and r and the values in C are greater
than r. So if we rotate at any node in a BST we still have a BST.
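Here is a sketch of the two single rotations, assuming a hypothetical Node class with left and right fields. The names l and r and the subtree B follow the diagrams. Each method returns the new root of the rotated subtree, which the caller must link back into the tree. An inorder listing is the same before and after, which is the BST property being preserved.

```java
// Sketch: single rotations on a BST (hypothetical Node class).
class RotationSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Right rotation at r: its left child l becomes the subtree root,
    // and l's right subtree B becomes r's left subtree.
    static Node rotateRight(Node r) {
        Node l = r.left;
        r.left = l.right; // B
        l.right = r;
        return l;
    }

    // Left rotation at l: the inverse of rotateRight.
    static Node rotateLeft(Node l) {
        Node r = l.right;
        l.right = r.left; // B
        r.left = l;
        return r;
    }

    static String inorder(Node t) {
        if (t == null) return "";
        return (inorder(t.left) + " " + t.value + " " + inorder(t.right)).trim();
    }

    public static void main(String[] args) {
        Node r = new Node(20, new Node(10, new Node(5, null, null), new Node(15, null, null)),
                new Node(30, null, null));
        Node top = rotateRight(r);
        System.out.println(top.value + ": " + inorder(top)); // 10: 5 10 15 20 30
        top = rotateLeft(top);
        System.out.println(top.value + ": " + inorder(top)); // 20: 5 10 15 20 30
    }
}
```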
If we have added a node to a balanced tree we can keep it balanced by either a single
rotation or a pair of rotations as in Figure 5.11.
The left right double rotation is the mirror image.
A pair of rotations is needed when the long branch at an unbalanced node has a dog-leg.
That is, in Figure 5.11 we want to rotate left at x, but the left subtree of z is taller than
its right subtree, so we must first rotate right at z to cure this. In a double rotation the
first rotation is always at a child of the unbalanced node.
[Figure: single rotations. Left: r has left child l and right subtree C; l has subtrees A and B. Right: l has left subtree A and right child r; r has subtrees B and C. A right rotation at r turns the left tree into the right one, and a left rotation at l turns it back.]
Using these operations we can rebalance the tree after adding or removing a node.
A BST whose addition and removal methods keep it balanced in this way is called an AVL
tree, after Adelson-Velskii and Landis. Searching, adding and removing are all O(log n).
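Here is a compact sketch of AVL insertion. It assumes int keys, ignores duplicates, and caches each node's height in a field, which is a common choice but an assumption here. After each recursive insert, rebalance checks the node and applies a single rotation, or a double rotation when the taller branch has a dog-leg (the first rotation being at the child). Removal is not shown.

```java
import java.util.List;

// Sketch of AVL insertion with cached heights (empty = 0, leaf = 1).
class AvlSketch {
    static class Node {
        int key, height = 1;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    static int h(Node t) {
        return t == null ? 0 : t.height;
    }

    static void update(Node t) {
        t.height = 1 + Math.max(h(t.left), h(t.right));
    }

    static Node rotateRight(Node r) {
        Node l = r.left;
        r.left = l.right;
        l.right = r;
        update(r); // r is now below l, so update it first
        update(l);
        return l;
    }

    static Node rotateLeft(Node l) {
        Node r = l.right;
        l.right = r.left;
        r.left = l;
        update(l);
        update(r);
        return r;
    }

    // Restore the balance condition at t after an insertion below it.
    static Node rebalance(Node t) {
        update(t);
        int diff = h(t.left) - h(t.right);
        if (diff > 1) {                                  // left side too tall
            if (h(t.left.right) > h(t.left.left))
                t.left = rotateLeft(t.left);             // dog-leg: rotate at the child first
            return rotateRight(t);
        }
        if (diff < -1) {                                 // right side too tall
            if (h(t.right.left) > h(t.right.right))
                t.right = rotateRight(t.right);          // dog-leg: rotate at the child first
            return rotateLeft(t);
        }
        return t;
    }

    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        else return t;                                   // duplicate: nothing to do
        return rebalance(t);
    }

    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 1000; k++)
            root = insert(root, k); // sorted input: the worst case for a plain BST
        System.out.println("height " + h(root));
    }
}
```

With sorted input a plain BST would become a single chain of height 1000, while the AVL tree stays logarithmic.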
We can use an AVL tree to give another fast sorting algorithm called a tree sort. Put
the data into the tree one at a time, then list them using an inorder traversal. This is
O(n log n).
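Here is a tree sort sketch. Instead of an AVL tree it swaps in java.util.TreeMap, which the Java library implements as a red-black tree. The map counts copies of each value so that duplicates survive. Iterating over entrySet() visits the keys in sorted order, which plays the role of the inorder traversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a tree sort using java.util.TreeMap (a red-black tree, not an AVL tree).
class TreeSortSketch {
    static List<Integer> treeSort(int[] data) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : data)
            counts.merge(x, 1, Integer::sum);            // O(log n) per insertion
        List<Integer> sorted = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) // keys come back in order
            for (int i = 0; i < e.getValue(); i++)
                sorted.add(e.getKey());
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(treeSort(new int[] {5, 3, 8, 3, 1})); // [1, 3, 3, 5, 8]
    }
}
```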
A sorted set A set of items (each item can appear at most once), where the items have
an ordering and are kept in order. The iterator runs through the items in the order
they are kept in.
A sorted map A map or dictionary stores pairs of objects, the key and the value. The
keys must all be different. From a key you can retrieve its value. For a sorted map
the keys have an ordering and are kept in order.
For a sorted set we store the items in the nodes of our search tree.
For a sorted map we store both the key and the value in the nodes of our search tree.
The Java library provides an interface SortedSet<E> and a class TreeSet<E> that
implements it. To use a SortedSet you can either set a Comparator in the constructor
and use elements that can be compared by this comparator or you can use elements that
implement the Comparable interface. Most of the methods of SortedSet are the same as
those of Set, which in turn are the same as those of Collection. The extra features are that
you can't add the same element twice, and that the iterator returns the elements in order.
There are a few extra methods, such as E first() and E last(), which return the first and
last elements. TreeSet uses a tree to provide fast O(log n) methods to add and remove
elements.
[Figure 5.11: a right-left double rotation. Before: x has left subtree A and right child z; z has left child y and right subtree D; y has subtrees B and C. After rotating right at z: x has left subtree A and right child y; y has left subtree B and right child z; z has subtrees C and D. After rotating left at x: y is at the top, with left child x (subtrees A and B) and right child z (subtrees C and D).]
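Here is a short sketch of TreeSet in use, showing both ways of defining the ordering. The class and method names are illustrative.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: TreeSet with natural ordering and with a Comparator.
class SortedSetSketch {
    static SortedSet<String> natural(String... items) {
        // String implements Comparable, so no Comparator is needed.
        SortedSet<String> set = new TreeSet<String>();
        for (String s : items)
            set.add(s); // an item that is already present is not added again
        return set;
    }

    static SortedSet<String> reversed(String... items) {
        // A Comparator passed to the constructor defines the ordering instead.
        SortedSet<String> set = new TreeSet<String>(Comparator.<String>reverseOrder());
        for (String s : items)
            set.add(s);
        return set;
    }

    public static void main(String[] args) {
        SortedSet<String> s = natural("pear", "apple", "fig", "apple");
        System.out.println(s);                                // [apple, fig, pear]
        System.out.println(s.first() + " " + s.last());       // apple pear
        System.out.println(reversed("pear", "apple", "fig")); // [pear, fig, apple]
    }
}
```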
The library provides an interface SortedMap<K,V> and a class TreeMap<K,V> that
implements it.
K is the type of the keys and V is the type of the values.
Some of the methods of SortedMap are:
// Methods of Map
void clear();
boolean containsKey(Object key);
boolean containsValue(Object value);
V get(Object key);
boolean isEmpty();
Set<K> keySet();             // Returns the set of keys
V put(K key, V value);       // Sets the value for this key;
                             // returns the previous value or null
V remove(Object key);
int size();
Collection<V> values();      // Returns the collection of values
// Methods of SortedMap
K firstKey();
K lastKey();
As an example of using a sorted set, we will look at a program for listing all the different
words in a text file. We use a sorted set of strings created by
private SortedSet<String> set = new TreeSet<String>();
We then find all the words in our file as strings and add them to the set. Remember
that if a word is already in the set, it will not be added again. We can then display the set;
it will appear in alphabetical order.
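Here is a sketch of the word-listing idea. To stay self-contained, it works on a string rather than a file. Treating a word as a run of the letters a to z after lowercasing is an assumption, not necessarily how the original program splits its text.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: the distinct words of a text, in alphabetical order.
// Assumption: a word is a maximal run of the letters a-z after lowercasing.
class DistinctWordsSketch {
    static SortedSet<String> distinctWords(String text) {
        SortedSet<String> set = new TreeSet<String>();
        for (String w : text.toLowerCase().split("[^a-z]+"))
            if (!w.isEmpty()) // split can produce a leading empty string
                set.add(w);   // adding a word that is already present has no effect
        return set;
    }

    public static void main(String[] args) {
        System.out.println(distinctWords("The cat saw the other cat.")); // [cat, other, saw, the]
    }
}
```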
To count how often each word occurs, we instead use a sorted map from each word to its
frequency, created by
private SortedMap<String, Integer> map = new TreeMap<String, Integer>();
If the word is not yet in the map it is added with frequency 1. If the word is in the
map the frequency is increased by 1.
a : 636
abide : 1
able : 1
about : 93
...
yourself : 10
youth : 6
zealand : 1
zigzag : 1
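The frequency count can be sketched in the same way. It again works on a string and uses the same hypothetical definition of a word. It relies only on get and put from the method list above, and prints in the same "word : count" layout as the listing.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: word frequencies in alphabetical order of the words.
// Assumption: a word is a maximal run of the letters a-z after lowercasing.
class WordFrequencySketch {
    static SortedMap<String, Integer> frequencies(String text) {
        SortedMap<String, Integer> map = new TreeMap<String, Integer>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.isEmpty()) continue;
            Integer old = map.get(w);
            if (old == null)
                map.put(w, 1);       // not yet in the map: frequency 1
            else
                map.put(w, old + 1); // already in the map: increase by 1
        }
        return map;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> map = frequencies("the cat saw the other cat and the dog");
        for (String key : map.keySet())
            System.out.println(key + " : " + map.get(key));
        // and : 1, cat : 2, dog : 1, other : 1, saw : 1, the : 3 (one per line)
    }
}
```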
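[Figure: a 2-3 tree. The root holds 15 and 30. Its three children hold 6 and 10, 20, and 40. The node holding 6 and 10 has the leaves 1 and 3 (one leaf), 8, and 12; the node holding 20 has the leaves 17 and 23; the node holding 40 has the leaves 35 and 45.]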
Searching a 2-3 tree is as before, except that we have three choices at some nodes. To find
8 in the example: look at the root; 8 < 15, so go left; 8 is between 6 and 10, so take the
middle child. Found!
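Here is a sketch of 2-3 tree search. It assumes a hypothetical Node class whose keys array holds one or two keys in increasing order and whose children array is null at a leaf. The example tree has 15 and 30 in the root, with children 6 and 10, 20, and 40, so the search for 8 follows the same path as described above.

```java
// Sketch: searching a 2-3 tree (hypothetical Node class).
class TwoThreeSearchSketch {
    static class Node {
        int[] keys;      // one key (2-node) or two keys (3-node), in increasing order
        Node[] children; // null at a leaf, otherwise keys.length + 1 children

        Node(int[] keys, Node... children) {
            this.keys = keys;
            this.children = children.length == 0 ? null : children;
        }
    }

    static boolean contains(Node t, int x) {
        while (t != null) {
            int i = 0;
            // Skip the keys smaller than x; i then says which child to follow.
            while (i < t.keys.length && x > t.keys[i]) i++;
            if (i < t.keys.length && x == t.keys[i]) return true;
            if (t.children == null) return false; // reached a leaf without finding x
            t = t.children[i];
        }
        return false;
    }

    static Node exampleTree() {
        Node left = new Node(new int[] {6, 10},
                new Node(new int[] {1, 3}), new Node(new int[] {8}), new Node(new int[] {12}));
        Node middle = new Node(new int[] {20}, new Node(new int[] {17}), new Node(new int[] {23}));
        Node right = new Node(new int[] {40}, new Node(new int[] {35}), new Node(new int[] {45}));
        return new Node(new int[] {15, 30}, left, middle, right);
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(contains(root, 8)); // true
        System.out.println(contains(root, 9)); // false
    }
}
```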
To add to a 2-3 tree, find the leaf node where the value belongs and add the value. If this node
now has three values, split it and move the middle value up. This may require splitting
further nodes up the tree. A new level starts when the root gets split.
In our example, first add 9. This just goes in the same leaf as 8. Now add 2. This goes
in the 1, 3 leaf, which splits, pushing the 2 up into the 6, 10 node. This node now splits,
pushing the 6 up into the root. Finally the root splits, pushing the 15 up into a new root.
Notice that to add to a 2-3 tree we need to search down the tree and then work back up
the tree, splitting nodes as we go.
We will not look at removing a value, which can be done but is complicated.
2-4 trees
2-4 trees can have nodes with 2, 3 or 4 children. 2-nodes and 3-nodes are as above; 4-nodes
have three data values, and the values in the four subtrees are distributed between them as
you would expect.
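[Figure: the 2-3 tree from the example after adding 9 and 2. The root holds 15; its children hold 6 and 30; below 6 are nodes holding 2 and 10, and below 30 are nodes holding 20 and 40. The leaves are 1, 3, 8 and 9 (one leaf), 12, 17, 23, 35 and 45.]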
In a 2-4 tree we can avoid having to work up and down the tree during addition. We
split any 4-nodes on the way down; then we can always add a value without having to go
back up the tree.
Red-Black trees
A red-black tree is equivalent to a 2-4 tree. Instead of using three different types of node,
it uses only binary nodes. A 3-node or 4-node is represented by two or three binary nodes, as
in Figure 5.14. We mark the nodes that sit below the top node of one of these groups, since
they belong to the same 2-4 node as their parent. We think of this as colouring them red,
while the top node of each group is black.
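[Figure 5.14: a 4-node holding 5, 10 and 20, and the equivalent group of binary nodes: 10 with children 5 and 20. The four subtrees of the 4-node become the children of 5 and 20.]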
The Java library uses red-black trees for its TreeMap and TreeSet classes.
B-trees
We can use nodes with more than 4 children. These are called B-trees (B for
Block). For most purposes these make things slower, as we need to search along the block
of data in a node to find which child to follow. They are useful if our data is so large that
we need to store it on disk. Disks read their data in blocks, so if we make our nodes fill a
block we are using the slow disk operations most efficiently. Large databases use versions
of the B-tree to store their data.