Chapter 7. Multidimensional Data Structures

Chapter 3 was devoted to searching problems in one-dimensional space. In this
chapter we will reconsider these problems in higher dimensional space and also
treat a number of problems which only become interesting in higher dimensions.
Let $U$ be some ordered set and let $S \subseteq U^d$ for some $d$. An element
$x \in S$ is a $d$-tuple $(x_0, \ldots, x_{d-1})$. The simplest searching problem
is to specify a point $y \in U^d$ and to ask whether $y \in S$; this is called an
exact match query and can in principle be solved by the methods of Chapter 3:
order $U^d$ lexicographically and use a balanced search tree. A very general form
of query is to specify a region $R \subseteq U^d$ and to ask for all points in
$R \cap S$. General region queries can only be solved by exhaustive search of
set $S$. Special and more tractable cases are obtained by restricting the query
region $R$ to some subclass of regions. Restricting $R$ to polygons gives us polygon
searching, restricting it further to rectangles with sides parallel to the axes gives
us range searching, and finally restricting the class of rectangles even further gives
us partial match retrieval. In one-dimensional space balanced trees solve all these
problems efficiently. In higher dimensions we will need different data structures
for different types of queries; d-dimensional trees, range trees and polygon trees
are therefore treated in 7.2. There is one other major difference to one-dimensional
space. It seems to be very difficult to deal with insertions and deletions; i.e., the data
structures described in 7.2 are mainly useful for static sets. No efficient algorithms
are known as of today to balance these structures after insertions and deletions.
However, there is a general approach to dynamization which we treat in 7.1. It is
applicable to a wide class of problems and yields reasonably efficient dynamic data
structures.
In Section 7.2.3 we discuss lower bounds. We will first prove a lower bound
on the complexity of partial match retrieval where no redundancy in storage space
is allowed. The lower bound implies the optimality of d-dimensional trees. The
second lower bound relates the complexity of insertions, deletions and queries with
a combinatorial quantity. The spanning bound implies the optimality of range trees
and near-optimality of polygon trees.
Multidimensional searching problems appear in numerous applications, most
notably database systems. In these applications $U$ is an arbitrary ordered set, e.g., a
set of names or a set of possible incomes. Region queries arise in these applications
in a natural way; e.g., in a database containing information about persons, say
name, income and number of children, we might ask for all persons with
    #children = 3, a partial match query;
    #children = 3, 1000 $\le$ income $\le$ 2000, a range query;
    income = 1000 + 1000 $\cdot$ #children, a polygon query.
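
The three example queries can be made concrete by the exhaustive-search baseline
mentioned above. The following Python fragment is only a sketch: the record type
`Person`, the sample data and the helper `region_query` are hypothetical names
introduced for illustration, and the polygon predicate uses the equality exactly as it
is printed in the list above.

```python
from typing import Callable, List, NamedTuple


class Person(NamedTuple):
    name: str
    income: int
    children: int


# Hypothetical sample data, for illustration only.
PEOPLE = [
    Person("Ada", 1500, 3),
    Person("Bob", 4000, 3),
    Person("Cy", 2000, 1),
    Person("Dee", 900, 0),
]


def region_query(people: List[Person],
                 in_region: Callable[[Person], bool]) -> List[str]:
    """Exhaustive search: test every stored point against the region."""
    return [p.name for p in people if in_region(p)]


def partial_match(p: Person) -> bool:
    return p.children == 3


def range_query(p: Person) -> bool:
    return p.children == 3 and 1000 <= p.income <= 2000


def polygon_query(p: Person) -> bool:
    return p.income == 1000 + 1000 * p.children
```

Every query costs time linear in the size of the set; the data structures of 7.2 aim
to beat this for the restricted classes of regions.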

Version: 19.10.99 Time: 11:39 {1{


7.1. A Black Box Approach to Data Structures

In Chapter 3 we designed clever data structures for variants of the dictionary
problem. With a little bit of unfairness one might say that all we did in Chapter 3 is the
following: We started with binary search on sorted arrays and generalized it in two
directions. First we generalized to weighted static trees in order to cope with access
frequencies, and second to unweighted dynamic trees in order to cope with insertions
and deletions. Finally we combined both extensions and arrived at weighted
dynamic trees. Suppose now that we want to repeat the generalization process for
a different data structure, say interpolation search. Do we have to start all over
again or can we profit from the development of Chapter 3? In this section we will
describe some general techniques for generalization: dynamization and weighting.
We start out with a static solution for some searching problem, i.e., a solution
which only supports queries, but supports neither insertions and deletions nor
weighted data. Dynamization is a method which allows us to also support
insertions and deletions, weighting is a method which allows us to support queries
to weighted data, and finally weighted dynamization combines both extensions. Of
course, we cannot hope to arrive at the very best data structure by only applying
general principles. Nevertheless, the general principles can very quickly give us fully
dynamic solutions with reasonable running time. Also there are data structures,
e.g., d-dimensional trees, where all special purpose attempts at dynamization have
failed.
Binary search on sorted arrays will be our running example. Given a set of $n$
elements one can construct a sorted array in time $O(n \log n)$ (preprocessing time
is $O(n \log n)$), we can search the array in time $O(\log n)$ (query time is $O(\log n)$),
and the array consumes space $O(n)$ (space requirement is $O(n)$). Dynamization
produces a solution for the dictionary problem (operations Insert, Delete, Member)
with running time $O(\log n)$ for Inserts and Deletes and $O(\log^2 n)$ for Member. Thus
Inserts and Deletes are as fast as in balanced trees but queries are less efficient.
Weighting produces weighted static dictionaries with access time $O(\log 1/p)$ for an
element of probability $p$. This is the same order of magnitude as the special purpose
solution of Section 3.4, although the factor of proportionality is much larger. Finally
weighted dynamization produces a solution for the weighted, dynamic dictionary
problem with running time $O((\log 1/p)^2)$ for Member operations and running time
$O(\log 1/p)$ for Insert, Delete, Promote and Demote operations. Note that only
access time is worse than what we obtained by dynamic weighted trees in 3.6.
Although sorted arrays are our main example, they are not an important
application of our general principles. The most important applications are data
structures for higher dimensional searching problems described in this chapter. In many
of these cases only static solutions are known and all attempts to construct dynamic
or weighted solutions by special purpose methods have failed so far. The only
dynamic or weighted solutions known today are obtained by applying the general
principles described in this section.

7.1.1. Dynamization

We start with a definition of searching problem.

Definition: Let $T_1$, $T_2$, $T_3$ be sets. A searching problem $Q$ of type
$T_1$, $T_2$, $T_3$ is a function $Q : T_1 \times 2^{T_2} \to T_3$.

A searching problem takes a point in $T_1$ and a subset of $T_2$ and produces an answer
in $T_3$. There are plenty of examples. In the member problem we have $T_1 = T_2$,
$T_3 = \{true, false\}$ and $Q(x, S) = $ "$x \in S$". In the nearest neighbor problem in the
plane we have $T_1 = T_2 = R^2$, $T_3 = R$ and $Q(x, S) = \delta(x, y)$, where $y \in S$ and
$\delta(x, y) \le \delta(x, z)$ for all $z \in S$. Here $\delta$ is some metric. In the inside the convex hull
problem we have $T_1 = T_2 = R^2$, $T_3 = \{true, false\}$ and $Q(x, S) = $ "is $x$ inside the
convex hull of point set $S$". In fact, our definition of searching problem is so general
that just about everything is a searching problem.
A static data structure S for a searching problem supports only the query
operation $Q$, i.e., for every $S \subseteq T_2$ one can build a static data structure S such that
the function $Q(x, S) : T_1 \to T_3$ can be computed efficiently. We deliberately use the
same name for set $S$ and data structure S because the internal workings of
structure S are of no concern in this section. We associate three measures of efficiency
with structure S: query time $Q_S$, preprocessing time $P_S$ and space requirement $S_S$.

    $Q_S(n)$ = time for a query on a set of $n$ points using data structure S.

    $P_S(n)$ = time to build S for a set of $n$ points.

    $S_S(n)$ = space requirement of S for a set of $n$ points.

We assume throughout that $Q_S(n)$, $P_S(n)/n$ and $S_S(n)/n$ are nondecreasing.
A semi-dynamic data structure D for a searching problem supports in addition
the operation Insert, i.e., we can not only query D but also insert new points into D.
A dynamic structure supports Insert and Delete. We use the following notation for
the resource requirements of D.

    $Q_D(n)$ = time for a query on a set of $n$ points using structure D.

    $S_D(n)$ = space requirement of D for a set of $n$ points.

    $I_D(n)$ = time for inserting a new point into a set of $n$ points stored in D.

    $\bar{I}_D(n)$ = amortized time for the $n$-th insertion, i.e., (maximal total time
    spent on executing insertions in any sequence of $n$ operations starting with the
    empty set)$/n$.

    $D_D(n)$ = time for deleting a point from a set of $n$ points stored in D.

    $\bar{D}_D(n)$ = amortized time for the $n$-th deletion, i.e., (maximal total time
    spent on executing deletions in any sequence of $n$ operations (insertions,
    deletions, queries) starting with the empty set)$/n$.

We will next describe a general method for turning static data structures into
semi-dynamic data structures. This method is only applicable to a subclass of searching
problems, the decomposable searching problems.
Definition: A searching problem $Q$ of type $T_1, T_2, T_3$ is decomposable if there
is a binary operation $\sqcup : T_3 \times T_3 \to T_3$ such that for all $S \subseteq T_2$ and all partitions
$A, B$ of $S$, i.e., $S = A \cup B$, $A \cap B = \emptyset$, and all $x \in T_1$:
    $$Q(x, S) = \sqcup(Q(x, A), Q(x, B)).$$
Moreover, $\sqcup$ is computable in constant time.

In decomposable searching problems we can put together the answer to a query
with respect to set $S$ from the answers with respect to pieces $A$ and $B$ of $S$ using
operator $\sqcup$. We note as a consequence of the definition of decomposability that $T_3$
with operation $\sqcup$ is basically a commutative semigroup with unit element $Q(x, \emptyset)$.
The member problem is decomposable with $\sqcup$ = or, the nearest neighbor problem
is decomposable with $\sqcup$ = min. However, the inside the convex hull problem is not
decomposable.
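
To make the definition concrete, the following Python sketch evaluates the member
problem and the nearest neighbor problem once on the whole set and once piecewise
on a partition. It is an illustration only: the function names are hypothetical, the
Euclidean distance stands in for the unspecified metric, and both queries are
answered by brute force.

```python
import math
import operator


def member(x, s):
    """Member problem: Q(x, S) = (x in S); the answer for the empty set is False."""
    return x in s


def nearest(x, s):
    """Nearest neighbor distance; the answer for the empty set is +infinity."""
    return min((math.dist(x, y) for y in s), default=math.inf)


def answer_from_pieces(query, compose, x, a, b):
    """Compose Q(x, A) and Q(x, B) into the answer for the union of A and B."""
    return compose(query(x, a), query(x, b))


S = {(0, 0), (3, 4), (6, 8), (1, 1)}
A = {(0, 0), (6, 8)}
B = S - A
```

For example, `answer_from_pieces(nearest, min, (3, 3), A, B)` agrees with
`nearest((3, 3), S)`, and `answer_from_pieces(member, operator.or_, (1, 1), A, B)`
agrees with `member((1, 1), S)`. The answers for the empty set, `False` and
`math.inf`, play the role of the unit element.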
Theorem 1. Let S be a static data structure for a decomposable searching
problem $Q$. Then there is a semi-dynamic solution D for $Q$ with
    $$Q_D(n) = O(Q_S(n) \cdot \log n),$$
    $$S_D(n) = O(S_S(n)),$$
    $$\bar{I}_D(n) = O((P_S(n)/n) \cdot \log n).$$

Proof: The proof is based on a simple yet powerful idea. At any point of time the
dynamic structure consists of a collection of static data structures for parts of $S$,
i.e., set $S$ is partitioned into blocks $S_i$. Queries are answered by querying the blocks
and composing the partial answers by $\sqcup$. Insertions are dealt with by suitably
combining blocks.
The details are as follows. Let $S$ be any set of $n$ elements and let
$n = \sum_{i \ge 0} a_i 2^i$, $a_i \in \{0, 1\}$, be the binary representation of $n$.
Let $S_0, S_1, \ldots$ be any partition of $S$
with $|S_i| = a_i 2^i$, $0 \le i \le \log n$. Then structure D is just a collection of static data
structures, one for each non-empty $S_i$.
The space requirement of D is easily computed as
    $$S_D(n) = \sum_i S_S(a_i 2^i) = \sum_i (S_S(a_i 2^i)/(a_i 2^i)) \cdot a_i 2^i
             \le (S_S(n)/n) \cdot \sum_i a_i 2^i = S_S(n),$$
where both sums range over the indices $i$ with $a_i = 1$.

The inequality follows from our basic assumption that $S_S(n)/n$ is nondecreasing.
Next note that $Q(x, S) = \bigsqcup_{0 \le i \le \log n} Q(x, S_i)$ and that there are never more
than $\log n + 1$ non-empty $S_i$'s. Hence a query can be answered in time
    $$\log n + \sum_i Q_S(a_i 2^i) \le \log n \cdot (1 + Q_S(n)) = O(\log n \cdot Q_S(n)).$$
Finally consider operation Insert$(x, S)$. Let $n = \sum_i \alpha_i 2^i$ and
$n + 1 = \sum_i \beta_i 2^i$ be the binary representations, and let $j$ be such that
$\alpha_j = 0$, $\alpha_{j-1} = \alpha_{j-2} = \cdots = \alpha_0 = 1$. Then $\beta_j = 1$,
$\beta_{j-1} = \cdots = \beta_0 = 0$.
We process the $(n + 1)$-st insertion by taking the new point $x$ and the
$2^j - 1 = \sum_{i=0}^{j-1} 2^i$ points stored in structures $S_0, S_1, \ldots, S_{j-1}$ and
constructing a new static data structure for
$\{x\} \cup S_0 \cup S_1 \cup \cdots \cup S_{j-1}$. Thus the cost of the $(n + 1)$-st insertion
is $P_S(2^j)$. Next note that a cost of $P_S(2^j)$ has to be paid at insertions number
$2^j \cdot (2l + 1)$, $l = 0, 1, 2, \ldots$, and hence at most $n/2^j$ times during the first $n$
insertions. Thus the total cost of the first $n$ insertions is bounded by
    $$\sum_{j=0}^{\lfloor \log n \rfloor} P_S(2^j) \cdot n/2^j
      \le n \cdot \sum_{j=0}^{\lfloor \log n \rfloor} P_S(n)/n
      \le P_S(n) \cdot (\lfloor \log n \rfloor + 1),$$
where the first inequality uses that $P_S(n)/n$ is nondecreasing.
Hence $\bar{I}_D(n) = O((P_S(n)/n) \cdot \log n)$.

Let us apply Theorem 1 to binary search on sorted arrays. We have $S_S(n) = n$,
$Q_S(n) = \log n$ and $P_S(n) = n \log n$. Hence we obtain a semi-dynamic solution for
the member problem with $S_D(n) = O(n)$, $Q_D(n) = O((\log n)^2)$ and
$\bar{I}_D(n) = O((\log n)^2)$.
Actually, the bound on $\bar{I}_D(n)$ is overly pessimistic. Note that we can merge two
sorted arrays in linear time. Hence we can construct a sorted array of size $2^k$ out of
a point and sorted arrays of size $1, 2, 4, 8, \ldots, 2^{k-1}$ in time $O(2^k)$ by first merging
the two arrays of length 1, obtaining an array of length 2, merging it with the array
of length 2, and so on. Plugging this bound into the bound on $\bar{I}_D(n)$ derived above yields
$\bar{I}_D(n) = O(\log n)$.
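
The construction in the proof of Theorem 1, specialized to sorted arrays and using
the merging trick just described, might be sketched in Python as follows. The class
name `SemiDynamicSortedArrays` and its methods are hypothetical; the sketch
assumes that keys are comparable and treats the stored collection as a set.

```python
from bisect import bisect_left
from heapq import merge


class SemiDynamicSortedArrays:
    """Logarithmic method: blocks[i] is empty or a sorted list of 2**i keys."""

    def __init__(self):
        self.blocks = []

    def insert(self, x):
        # Adding 1 in binary: merge the carry with every full low-order block.
        carry = [x]
        i = 0
        while i < len(self.blocks) and self.blocks[i]:
            carry = list(merge(carry, self.blocks[i]))  # linear-time merge
            self.blocks[i] = []
            i += 1
        if i == len(self.blocks):
            self.blocks.append([])
        self.blocks[i] = carry  # carry now holds exactly 2**i keys

    def member(self, x):
        # Query every non-empty block and compose the answers with "or".
        for block in self.blocks:
            j = bisect_left(block, x)
            if j < len(block) and block[j] == x:
                return True
        return False
```

After 13 insertions the block sizes are 1, 0, 4, 8, the binary representation of 13
read from the least significant bit. A Member query performs one binary search per
non-empty block, which gives the $O((\log n)^2)$ query bound.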
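There are other situations where the bounds stated in Theorem 1 are overly
pessimistic. If either $Q_S(n)$ or $P_S(n)/n$ grows fast, i.e., is of order at least $n^\epsilon$ for
some $\epsilon > 0$, then better bounds hold. Suppose for example that $Q_S(n) = \Theta(n^\epsilon)$ for
some $\epsilon > 0$. Then (cf. the proof of Theorem 1)
    $$Q_D(n) = \sum_i Q_S(a_i 2^i)
             = Q_S(2^{\lfloor \log n \rfloor}) \cdot \sum_{i=0}^{\lfloor \log n \rfloor}
               Q_S(a_i 2^i)/Q_S(2^{\lfloor \log n \rfloor})
             = \Theta\Big(n^\epsilon \cdot \sum_{i=0}^{\lfloor \log n \rfloor}
               2^{\epsilon (i - \lfloor \log n \rfloor)}\Big)
             = \Theta(n^\epsilon) = \Theta(Q_S(n)).$$
The last sum is a geometric series in $2^{-\epsilon}$ and hence bounded by a constant.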

Thus if either $Q_S(n)$ or $P_S(n)/n$ grows fast then the $\log n$ factor in the corresponding
bound on $Q_D(n)$ or $\bar{I}_D(n)$ can be dropped.
The bound on insertion time derived in Theorem 1 is amortized. In fact, the
time required to process insertions fluctuates widely. More precisely, the $2^k$-th
insertion takes time $P_S(2^k)$, a non-trivial amount of time indeed. Theorem 2 shows
that we can turn the amortized time bound into a worst case time bound without
increasing the order of query time and space requirement.
Theorem 2. Let S be a static data structure for a decomposable searching
problem. Then there is a semi-dynamic data structure D with
    $$Q_D(n) = O(Q_S(n) \cdot \log n),$$
    $$S_D(n) = O(S_S(n)),$$
    $$I_D(n) = O((P_S(n)/n) \cdot \log n).$$

Proof: The basic idea is to use the construction of Theorem 1, but to spread work
over time. More precisely, whenever a structure of size $2^k$ has to be constructed we
will spread the work over the next $2^k$ insertions. This will have two consequences.
First, the structure will be ready in time to process an overflow into a structure of
size $2^{k+1}$ and second, the time required to process a single insertion is bounded by
    $$\sum_{k=0}^{\lfloor \log n \rfloor} P_S(2^k)/2^k = O((P_S(n)/n) \cdot \log n).$$
The details are as follows. The dynamic structure D consists of bags $BA_0, BA_1,
\ldots$. Each bag $BA_i$ contains at most three blocks $B_i^u[1]$, $B_i^u[2]$ and $B_i^u[3]$ of size $2^i$
that are "in use" and at most one block $B_i^c$ of size $2^i$ that is "under construction".
More precisely, at any point of time the blocks $B_i^u[j]$, $i \ge 0$, $1 \le j \le 3$, form a partition
of set $S$, and static data structures are available for them. Furthermore, the static
data structure for block $B_i^c$ is under construction. Block $B_i^c$ is the union of two
blocks $B_{i-1}^u[j]$. We proceed as follows. As soon as two $B_i^u$'s are available, we start
building a $B_{i+1}^c$ of size $2^{i+1}$ out of them. The work is spread over the next $2^{i+1}$
insertions, each time doing $P_S(2^{i+1})/2^{i+1}$ steps of the construction. When $B_{i+1}^c$ is
finished it becomes a $B_{i+1}^u$ and the two $B_i^u$'s are discarded. We have to show that
there will never be more than three non-empty $B_i^u$'s.
Lemma 1. When we complete a $B_i^c$ and turn it into a block in use there are at
most two non-empty $B_i^u$'s.

Proof: Consider how blocks in $BA_i$ develop. Consider the moment, say after the
$t$-th insertion, when $BA_i$ contains two $B_i^u$'s and we start building a $B_{i+1}^c$ out of
them. The construction will be finished $2^{i+1}$ insertions later. Observe that $BA_i$ got
a second $B_i^u$ because the construction of $B_i^c$ was completed after the $t$-th insertion
and hence $B_i^c$ was turned into a $B_i^u$. Thus $B_i^c$ was empty after insertion $t$ and it
will take exactly $2^i$ insertions until it is full again and hence gives rise to a third $B_i^u$
and it will take another $2^i$ insertions until it gives rise to a fourth $B_i^u$. Exactly at
this point of time the construction of $B_{i+1}^u$ is completed and hence two $B_i^u$'s are
discarded. Thus we can start a new cycle with just two $B_i^u$'s completed.
It follows from Lemma 1 that there will never be more than three $B_i^u$'s and one $B_i^c$
for any $i$. Hence
    $$S_D(n) = O(S_S(n)),$$
    $$Q_D(n) = O(Q_S(n) \cdot \log n)$$ and
    $$I_D(n) = \sum_{i=0}^{\lfloor \log n \rfloor} O(P_S(2^i)/2^i) = O((P_S(n)/n) \cdot \log n).$$
The remarks following Theorem 1 also apply to Theorem 2. The "logarithmic"
dynamization method described above has a large similarity to the binary number
system. The actions following the insertion of a point into a dynamic structure of $n$
elements are in complete analogy to adding a 1 to integer $n$ written in binary. The
main difference is the cost of processing a carry. The cost is $P_S(2^k)$ for processing
a carry from the $k$-th position in logarithmic dynamization, whilst it is $O(1)$ in
processing integers. The analogy between logarithmic dynamization and the binary
number system suggests that other number systems give rise to other dynamization
methods. This is indeed the case. For example, for every $k$ one can uniquely write
every integer $n$ as
    $$n = \sum_{i=1}^{k} \binom{a_i}{i}$$
with $i - 1 \le a_i$ and $a_1 < a_2 < \cdots < a_k$ (Exercise 1). This representation gives
rise to the $k$-binomial transformation. We represent a set $S$ of $n$ elements by $k$ static
structures, the $i$-th structure holding $\binom{a_i}{i}$ elements. Then $Q_D(n) = O(Q_S(n) \cdot k)$
and $\bar{I}_D(n) = O(k \cdot n^{1/k} \cdot P_S(n)/n)$ (Exercise 1). More generally we have
Theorem 3. Let S be any static data structure for a decomposable searching
problem and let $k : N \to N$ be any "smooth" function. Then there is a
semi-dynamic data structure D such that
a) if $k(n) = O(\log n)$ then
    $$Q_D(n) = O(k(n) \cdot Q_S(n)),$$
    $$\bar{I}_D(n) = O(k(n) \cdot n^{1/k(n)} \cdot P_S(n)/n);$$
b) if $k(n) = \Omega(\log n)$ then
    $$Q_D(n) = O(k(n) \cdot Q_S(n)),$$
    $$\bar{I}_D(n) = O((\log n/\log(k(n)/\log n)) \cdot P_S(n)/n).$$
Proof: The proof can be found in K. Mehlhorn, M.H. Overmars: "Optimal
Dynamization of Decomposable Searching Problems", IPL 12 (1981), 93-98. The
details on the definition of smoothness can be found there; functions like $\log n$,
$\log \log n$, $\log \log \log n$, $n$, $(\log n)^2$ are smooth in the sense of Theorem 3. The proof
is outlined in Exercise 2.

Let us look at some examples. Taking $k(n) = \log n$ gives the logarithmic
transformation (note that $n^{1/\log n} = 2$), $k(n) = k$ yields an analogue to the $k$-binomial
transformation, $k(n) = k \cdot n^{1/k}$ yields a transformation with
$Q_D(n) = O(k \cdot n^{1/k} \cdot Q_S(n))$ and $\bar{I}_D(n) = O(k \cdot P_S(n)/n)$, a dual to the
$k$-binomial transformation, and
$k(n) = (\log n)^2$ yields a transformation with $Q_D(n) = O((\log n)^2 \cdot Q_S(n))$ and
$\bar{I}_D(n) = O((\log n/\log \log n) \cdot P_S(n)/n)$. Again it is possible to turn amortized time
bounds into worst case time bounds by the techniques used in the proof of
Theorem 2. The interesting fact about Theorem 3 is that it describes exactly how far
we can go by dynamization.
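
The $k$-binomial representation that underlies the $k$-binomial transformation can
be computed greedily, starting with the largest index. The Python sketch below is
an illustration under that assumption (the greedy rule is the standard one for the
combinatorial number system, not something spelled out in the text); the function
name `k_binomial` is hypothetical.

```python
from math import comb


def k_binomial(n, k):
    """Return [a_1, ..., a_k] with n = sum of C(a_i, i) over i = 1..k,
    where i - 1 <= a_i and a_1 < a_2 < ... < a_k."""
    digits = []
    for i in range(k, 0, -1):
        a = i - 1                      # C(i - 1, i) = 0, the smallest choice
        while comb(a + 1, i) <= n:     # largest a with C(a, i) <= n
            a += 1
        digits.append(a)
        n -= comb(a, i)
    return digits[::-1]
```

For instance, `k_binomial(7, 3)` yields `[0, 3, 4]`, i.e., 7 = C(0,1) + C(3,2) + C(4,3)
= 0 + 3 + 4, so a set of 7 elements would be split into blocks of sizes 0, 3 and 4.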

Theorem 4. Let $h, k : N \to N$ be functions. If there is a dynamization method which
turns every static data structure S for any decomposable searching problem into a
dynamic data structure D with $Q_D(n) = k(n) \cdot Q_S(n)$ and $I_D(n) = h(n) \cdot P_S(n)/n$
then $h(n) = \Omega(OP(k)(n))$ where
    $$OP(k)(n) = \log n/\log(k(n)/\log n)$$  if $k(n) > 2 \cdot \log n$,
    $$OP(k)(n) = k(n) \cdot n^{1/k(n)}$$  if $k(n) \le 2 \cdot \log n$.

Proof: The proof can be found in K. Mehlhorn: "Lower Bounds on the Efficiency
of Transforming Static Data Structures into Dynamic Data Structures", Math.
Systems Theory 15, 1-16 (1981).

Theorem 4 states that there is no way to considerably improve upon the results of
Theorem 3. There is no way to decrease the order of the query penalty factor
($= Q_D(n)/Q_S(n)$) without simultaneously increasing the order of the update penalty
factor ($= I_D(n) \cdot n/P_S(n)$) and vice versa. Thus all combinations of query and
update penalty factor described in Theorem 3 are optimal. Moreover, all optimal
transformations can be obtained by an application of Theorem 3.
Turning static into semi-dynamic data structures is completely solved by
Theorems 1 to 4. How about deletions? Let us consider the case of the sorted array
first. At first sight deletions from sorted arrays are very costly. After all, we might
have to shift a large part of the array after a deletion. However, we can do a "weak"
deletion very quickly. Just mark the deleted elements and search as usual. As long
as only a few, let's say no more than 1/2 of the elements are deleted, search time is
still logarithmic in the number of remaining elements. This leads to the following
definition.
Definition: A decomposable searching problem together with its static structure S
is deletion decomposable iff, whenever S contains $n$ points, a point can be deleted
from S in time $D_S(n)$ without increasing the query time, deletion time and storage
required for S.

We assume that $D_S(n)$ is non-decreasing. The Member problem with static
structure sorted array is deletion decomposable with $D_S(n) = \log n$, i.e., we can delete
an arbitrary number of elements from a sorted array of length $n$ and still keep query
and deletion time at $\log n$. Of course, if we delete most elements then $\log n$ may be
arbitrarily large as a function of the actual number of elements stored.
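
A weak deletion on a sorted array can be sketched in a few lines of Python. This is
an illustration only: the class name `WeakDeleteArray` and the `alive` flags are
hypothetical, and duplicate keys are assumed away by building the array from a set.

```python
from bisect import bisect_left


class WeakDeleteArray:
    """Static sorted array with weak deletions: entries are marked, never moved."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        self.alive = [True] * len(self.keys)
        self.count = len(self.keys)      # actual number of stored elements

    def _find(self, x):
        j = bisect_left(self.keys, x)
        return j if j < len(self.keys) and self.keys[j] == x else -1

    def member(self, x):
        j = self._find(x)
        return j >= 0 and self.alive[j]

    def delete(self, x):
        """Mark x as deleted; takes one binary search, i.e., D_S(n) = O(log n)."""
        j = self._find(x)
        if j < 0 or not self.alive[j]:
            return False
        self.alive[j] = False
        self.count -= 1
        return True
```

Both operations cost one binary search on the original array, so their running time
depends on the size for which the array was built and not on `count`.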
Theorem 5. Let searching problem $Q$ together with static structure S be deletion
decomposable. Then there is a dynamic structure D with
    $$Q_D(n) = O(\log n \cdot Q_S(8 \cdot n)),$$
    $$S_D(n) = O(S_S(8 \cdot n)),$$
    $$\bar{I}_D(n) = O(\log n \cdot P_S(n)/n),$$
    $$\bar{D}_D(n) = O(P_S(n)/n + D_S(n) + \log n).$$

Proof: The proof is a refinement of the construction used in the proof of Theorem 1.
Again we represent a set $S$ of $n$ elements by a partition $B_0$, $B_1$, $B_2$, $\ldots$. We
somewhat relax the condition on the size of blocks $B_i$; namely, a $B_i$ is either empty
or $2^{i-3} < |B_i| \le 2^i$. Here $|B_i|$ denotes the actual number of elements in block $B_i$.
$B_i$ may be stored in a static data structure which was originally constructed for
more points but never more than $2^i$ points. In addition, we store all points of $S$ in
a balanced tree $T$. In this tree we store along with every element a pointer to the
block $B_i$ containing the element. This will be useful for deletions. We also link all
elements belonging to $B_i$, $i \ge 0$, in a linear list.
Since $|B_i| \ge 2^{i-3}$ for every non-empty block there are never more than
$\log n + 3$ non-empty blocks. Also
since the structure containing $B_i$ might have been constructed for a set eight times
the size we have
    $$Q_D(n) \le Q_S(8 \cdot n) \cdot (\log n + 3) = O(Q_S(8 \cdot n) \log n).$$
Also
    $$S_D(n) \le \sum_i S_S(8 \cdot |B_i|)
             = \sum_i (S_S(8 \cdot |B_i|)/(8 \cdot |B_i|)) \cdot 8 \cdot |B_i|
             \le (S_S(8 \cdot n)/(8 \cdot n)) \cdot \sum_i 8 \cdot |B_i|
             = S_S(8 \cdot n).$$
It remains to describe the algorithms for insertion and deletion. We need two
definitions first. A non-empty block $B_i$ is deletion-safe if $|B_i| \ge 2^{i-2}$ and it is safe
if $2^{i-2} \le |B_i| \le 2^{i-1}$.
Insertions are processed as follows. After an insertion of a new point $x$ we
find the least $k$ such that $1 + |B_0| + \cdots + |B_k| \le 2^k$. We build a new static data
structure $B_k$ for $\{x\} \cup B_0 \cup \cdots \cup B_k$ in time $P_S(2^k)$ and discard the structures
for the old blocks $B_0, \ldots, B_k$. In addition we have to update the dictionary for a cost
of $O(\log n + 2^k)$, $\log n$ for inserting the new point and $2^k$ for updating the
information associated with the points in the new $B_k$. Note that time $O(1)$ per
element suffices if we chain all elements which belong to the same block in a linked
list.
Lemma 2. Insertions build only deletion-safe structures.
Proof: This is obvious if $k = 0$. If $k > 0$ then
$1 + |B_0| + \cdots + |B_{k-1}| > 2^{k-1}$ by the
choice of $k$ and hence the claim follows.


The algorithm for deletions is slightly more difficult. In order to delete $x$ we
first use the dictionary to locate the block, say $B_i$, which contains $x$. This takes
time $O(\log n)$. Next we delete $x$ from $B_i$ in time $D_S(2^i)$. If $|B_i| > 2^{i-3}$ or $B_i$ is
empty after the deletion then we are done. Otherwise, $|B_i| = 2^{i-3}$ and we have to
"rebalance". If $|B_{i-1}| > 2^{i-2}$ then we interchange blocks $B_i$ and $B_{i-1}$. This will
cost $O(|B_i| + |B_{i-1}|) = O(2^i)$ steps for changing the dictionary; also $B_{i-1}$ and $B_i$
are safe after the interchange. If $|B_{i-1}| \le 2^{i-2}$ then we join $B_{i-1}$ and $B_i$ (the
resulting set has size at least $2^{i-3}$ and at most $2^{i-3} + 2^{i-2} \le 2^{i-1}$) and construct
either a new $B_{i-1}$ (if $|B_{i-1} \cup B_i| < 2^{i-2}$) or a new $B_i$ (if $|B_{i-1} \cup B_i| \ge 2^{i-2}$). This
will cost at most $P_S(2^i) + O(2^i) = O(P_S(2^i))$ time units; also $B_{i-1}$ and $B_i$ are safe
after the deletion.
Lemma 3. If a deletion from $B_i$ causes $|B_i| = 2^{i-3}$ then $B_{i-1}$ and $B_i$ are safe after
restructuring.

Proof: Immediate from the discussion above.

Lemma 4. $\bar{D}_D(n) = O(P_S(m)/m + D_S(m) + \log m)$, where $m$ is the maximal size
of set $S$ during the first $n$ updates.

Proof: By Lemmas 2 and 3 only deletion-safe blocks are built. Hence at least $2^{i-3}$
points have to be deleted from a block $B_i$ before it causes restructuring after a
deletion. Hence the cost for restructuring is at most $8 \cdot P_S(m)/m$ per deletion. In
addition, $\log m$ time units are required to update the dictionary and $D_S(m)$ time
units to actually perform the deletion.

Lemma 5. $\bar{I}_D(n) = O((P_S(m)/m) \cdot \log m)$, where $m$ is the maximal size of
set $S$ during the first $n$ updates.
Proof: Consider any sequence of $n$ insertions and deletions into an initially empty
set. Suppose that we build a new $B_k$ after the $t_0$-th update operation and that this
update is an insertion. Then $B_0, B_1, \ldots, B_{k-1}$ are empty after the $t_0$-th update.
Suppose also that the next time a $B_l$, $l \ge k$, is constructed after an insertion
is after the $t_1$-th update operation. Then immediately before the $t_1$-th update
$1 + |B_0| + \cdots + |B_{k-1}| > 2^{k-1}$.
We will show that $t_1 - t_0 \ge 2^{k-2}$. Assume otherwise. Then at most $2^{k-2} - 1$
points in $B_0 \cup \cdots \cup B_{k-1}$ are points which were inserted after time $t_0$. Hence at
least $2^{k-2}$ points must have moved from $B_k$ into $B_0 \cup \cdots \cup B_{k-1}$ by restructuring
after a deletion. However, the restructuring algorithm constructs only
deletion-safe structures and hence $B_k$ can underflow ($|B_k| = 2^{k-3}$) at most once during a
sequence of $2^{k-2}$ updates. Thus at most $2^{k-3} < 2^{k-2}$ points can move from $B_k$
down to $B_0 \cup \cdots \cup B_{k-1}$ between the $t_0$-th and the $t_1$-th update, a contradiction.
Thus $t_1 - t_0 \ge 2^{k-2}$.
In particular, a new $B_k$ is constructed at most $n/2^{k-2}$ times after an insertion
during the first $n$ updates. The construction of a new $B_k$ has cost $P_S(2^k)$ for
building the $B_k$, $O(2^k)$ for updating the dictionary and $\log m$ for inserting the new
point into the dictionary. Hence
    $$\bar{I}_D(n) = O\Big(\Big(\sum_{k=0}^{\log m} (P_S(2^k) \cdot n/2^{k-2}
                   + 2^k \cdot n/2^{k-2}) + n \cdot \log m\Big)/n\Big)
                   = O(\log m \cdot P_S(m)/m).$$
In our example (binary search on sorted arrays) we have $Q_S(n) = D_S(n) = \log n$
and $P_S(n) = n$ (cf. the remark following Theorem 1). Hence $Q_D(n) = O((\log n)^2)$
and $\bar{I}_D(n) = \bar{D}_D(n) = O(\log n)$. There is something funny happening here. We
need balanced trees to dynamize sorted arrays. This is not a serious objection.
We could do away with balanced trees if we increase the time bound for deletes to
$O((\log n)^2)$. Just use the Member instruction provided by the data structure itself
to process a Delete.
Theorem 5 can be generalized in several ways. Firstly, one can turn amortized
bounds into worst case bounds and secondly one can choose any of the
transformations outlined in Theorem 3. This yields:
Theorem 6. Let searching problem $Q$ together with static structure S be
deletion-decomposable, and let $k(n)$ be any smooth function. Then there is a dynamic
structure D with
    $$Q_D(n) = O(k(n) \cdot Q_S(n)),$$
    $$D_D(n) = O(\log n + P_S(n)/n + D_S(n)),$$
    $$I_D(n) = O((\log n/\log(k(n)/\log n)) \cdot P_S(n)/n)$$  if $k(n) = \Omega(\log n)$,
    $$I_D(n) = O(k(n) \cdot n^{1/k(n)} \cdot P_S(n)/n)$$  if $k(n) = O(\log n)$.


Proof: The proof combines all methods described in this section so far. It can
be found in M.H. Overmars/J.v. Leeuwen: "Worst Case Optimal Insertion and
Deletion Methods for Decomposable Searching Problems", IPL 12 (1981), 168-173.

7.1.2. Weighting and Weighted Dynamization

In this section we describe weighting and then combine it with dynamization
described in the previous section. This will give us dynamic weighted data structures
for a large class of searching problems.

Definition: A searching problem $Q : T_1 \times 2^{T_2} \to T_3$ is monotone decomposable
if there are functions $q : T_3 \to \{true, false\}$ and $\sqcup : T_3 \times T_3 \to T_3$ such that for all
$x \in T_1$, $S \subseteq T_2$ and all partitions $A, B$ of $S$, i.e., $A \cup B = S$, $A \cap B = \emptyset$:

    $Q(x, S)$ = if $q(Q(x, A))$ then $Q(x, A)$ else $\sqcup(Q(x, A), Q(x, B))$ fi

Again, there are plenty of examples. Member is monotone decomposable with $q$ the
identity and $\sqcup$ = or. $\epsilon$-diameter search, i.e., $Q((x, \epsilon), S) = true$ if
$\exists y \in S : \delta(x, y) \le \epsilon$, is monotone decomposable with $q$ the identity and
$\sqcup$ = or. Also orthogonal range
searching is monotone decomposable. Here $T_2 = R^2$, $T_1$ = all rectangles with sides
parallel to the axes and $Q(R, S) = (|R \cap S| \ge 1)$.
A query $Q(x, S)$ is successful if there is a $y \in S$ such that $q(Q(x, \{y\}))$. If
$Q(x, S)$ is successful then any $y \in S$ with $q(Q(x, \{y\}))$ is called a witness for $x$
(with respect to $S$). If $y$ is a witness for $x$ then
$Q(x, S) = Q(x, \{y\} \cup (S - \{y\}))$ =
if $q(Q(x, \{y\}))$ then $Q(x, \{y\})$ else $\ldots$ fi $= Q(x, \{y\})$.
Weighting is restricted to successful searches (but cf. Exercise 6). Let
$S = \{y_1, \ldots, y_n\} \subseteq T_2$ and let $\mu$ be a probability distribution on
$Suc = \{x \in T_1;\ Q(x, S)$ is successful$\}$. We define a reordering $\pi$ of $S$ and a
discrete probability distribution
$p_1, \ldots, p_n$ on $S$ as follows. Suppose that $\pi(1), \ldots, \pi(k-1)$ and $p_1, \ldots, p_{k-1}$ are
already defined. For $y_j \in S - \{y_{\pi(1)}, \ldots, y_{\pi(k-1)}\}$ let
$p(y_j) = \mu\{x \in Suc;\ y_j$ is a
witness for $x$ but none of $y_{\pi(1)}, \ldots, y_{\pi(k-1)}$ is$\}$. Define $\pi(k) = j$ such that $p(y_j)$ is
maximal and let $p_k = p(y_{\pi(k)})$. Then $p_1 \ge p_2 \ge \cdots \ge p_n$. We assume from now on
that $S$ is reordered such that $\pi$ is the identity. Then $p_k$ is the probability that $y_k$
is a witness in a successful search but none of $y_1, \ldots, y_{k-1}$ is.
Theorem 7. Let $Q$ be any monotone decomposable searching problem and suppose
that we have a static data structure with query time $Q_S(n)$, $Q_S(n)$ non-decreasing,
for $Q$. Let $S = \{y_1, \ldots, y_n\} \subseteq T_2$, and let $\mu, p_1, \ldots, p_n$ be defined as above. Then
there is a weighted data structure W for $Q$ where the expected time of a successful
search is at most
    $$4 \sum_i p_i \cdot Q_S(i) \le 4 \cdot \sum_i p_i \cdot Q_S(1/p_i).$$
i

Proof: Define $f : N_0 \to N_0$ by $f(0) = 0$ and $Q_S(f(i)) = 2^i$ for $i \ge 1$. Then $f(i)$
is increasing. We divide set $S$ into blocks $B_1, B_2, \ldots$, where
$B_i = \{y_j;\ f(i-1) < j \le f(i)\}$. Then W consists of a collection of static data
structures, one for each
$B_i$. A query $Q(x, S)$ is answered by the following algorithm.

    i := 0;
    repeat i := i + 1 until Q(x, B_i) is successful od;
    output Q(x, B_i).
    Program 1

The correctness of this algorithm is immediate from the definition of monotone
decomposability. It remains to compute the expected query time of a successful
query. Let $Suc_j = \{x \in T_1;\ y_j$ is a witness for $x$ but $y_1, \ldots, y_{j-1}$ are not$\}$. Then
$p_j = \mu(Suc_j)$. The cost of a query $Q(x, S)$ for $x \in Suc_j$ and $f(i-1) < j \le f(i)$ is
    $$\sum_{h=1}^{i} Q_S(f(h) - f(h-1)) \le \sum_{h=1}^{i} Q_S(f(h))
      = \sum_{h=1}^{i} 2^h
      \le 4 \cdot 2^{i-1}
      = 4 \cdot Q_S(f(i-1))
      \le 4 \cdot Q_S(j).$$
Thus the expected cost of a successful query is
    $$4 \sum_j p_j \cdot Q_S(j) \le 4 \cdot \sum_j p_j \cdot Q_S(1/p_j).$$
The last inequality follows from $p_1 \ge p_2 \ge \cdots$ and hence $1/p_j \ge j$.
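
For the member problem with sorted arrays, Program 1 can be sketched in Python as
follows. The sketch assumes $Q_S(n) = \log_2 n$, so that $f(i) = 2^{2^i}$ and the blocks
end at positions 4, 16, 256, and so on; the function names `build_weighted` and
`query` are hypothetical, and the input list is assumed to be ordered by
non-increasing access probability already.

```python
from bisect import bisect_left


def build_weighted(keys_by_weight):
    """Split the keys into blocks B_1, B_2, ... with B_i = keys f(i-1)+1 .. f(i),
    where f(i) = 2 ** (2 ** i); each block is stored as a sorted array."""
    blocks, start, i = [], 0, 1
    while start < len(keys_by_weight):
        end = 2 ** (2 ** i)
        blocks.append(sorted(keys_by_weight[start:end]))
        start, i = end, i + 1
    return blocks


def query(blocks, x):
    """Search the blocks in order and stop at the first successful one.
    Returns (found, number of blocks inspected)."""
    for depth, block in enumerate(blocks, 1):
        j = bisect_left(block, x)
        if j < len(block) and block[j] == x:
            return True, depth
    return False, len(blocks)
```

A key of high probability sits in an early, small block and is found after few
binary searches on short arrays; an unsuccessful search inspects every block.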


It is worthwhile to go through our examples at this point. Let us look at the member
problem first. If we use sorted arrays and binary search then $Q_S(n) = \log n$ and
hence the expected time for a successful search is
    $$4 \sum_i p_i \cdot \log i \le 4 \cdot \sum_i p_i \cdot \log 1/p_i.$$
This bound relates quite nicely to the bounds derived in Section 3.4 on weighted
trees. There we derived a bound of $\sum_i p_i \log 1/p_i + 1$ on the expected search time in
weighted trees. Thus the bound derived by weighting is about four times the entire
truth. The bound of $4 \cdot \log i$ derived now on individual searches can sometimes be
considerably better than the bound of $\log 1/p_i$ derived in 3.4 (cf. Exercise 3).
Binary search is not the only method for searching sorted arrays. If the keys
are drawn from a uniform distribution then interpolation search is a method with
$O(\log \log n)$ expected query time. If the weights of keys are independent of key
values then every block $B_i$ is a random sample drawn from a uniform distribution
and hence the expected time of a successful search is
    $$O\Big(\sum_i p_i \cdot \log \log i\Big) = O\Big(\sum_i p_i \cdot \log \log 1/p_i\Big)$$
(cf. Exercise 4).


Let us finally look at orthogonal range searching in two-dimensional space.
2-dimensional trees (cf. 7.2.1 below) are a solution with $Q_S(n) = \sqrt{n}$. The weighting
yields an expected search time of order $\sum_i p_i \sqrt{i}$.
The construction used in the proof of Theorem 7 is optimal among a large class
of algorithms, namely all algorithms which divide set $S$ into blocks, construct static
data structures for each block, and search through these blocks sequentially in some
fixed order. If an element in the $i$-th block has
higher probability than an element in the $(i-1)$-st block then interchanging the
elements will reduce average search time. Thus a search with witness $y_i$ (recall that
$S$ is reordered such that $p_1 \ge p_2 \ge \cdots$) must certainly have cost $Q_S(n_1) + \cdots + Q_S(n_k)$
where $n_1 + \cdots + n_k \ge i$. If we assume that $Q_S(x + y) \le Q_S(x) + Q_S(y)$ for
all $x, y$, i.e., $Q_S$ is subadditive, then $Q_S(n_1) + \cdots + Q_S(n_k) \ge Q_S(n_1 + \cdots + n_k) \ge Q_S(i)$.
Hence a search with witness $y_i$ has cost at least $Q_S(i)$ under the modest assumption
of subadditivity of $Q_S$. Thus the construction used in the proof of Theorem 7 is
optimal because it achieves query time $O(Q_S(i))$ for a search with witness $y_i$ for
all $i$.
We close this section by putting all concepts together. We start with a static
data structure for a monotone and deletion decomposable searching problem $Q$ and
then use dynamization and weighting to produce a dynamic weighted data structure
$W$ for $Q$. $W$ supports queries on a weighted set $S$, i.e., a set $S = \{y_1, \ldots, y_n\}$
and weight function $w : S \to \mathbb{N}$, with query time depending on weight. It also
supports operations Promote$(y, a)$ and Demote$(y, a)$, $y \in T_2$, $a \in \mathbb{N}$. Promote$(y, a)$
increases the weight of element $y$ by $a$ and Demote$(y, a)$ decreases the weight of $y$
by $a$. Insert and Delete are special cases of Promote and Demote (cf. 3.6 for the
special case $Q$ = Member).
We obtain $W$ in a two step process. In the first step we use dynamization
and turn the static structure $S$ into a dynamic data structure $D$ with $Q_D(n) = Q_S(n) \cdot \log n$ and
$U_D(n) = \max(I_D(n), D_D(n)) = O((P_S(n)/n) \cdot \log n + D_S(n))$ (cf. Theorem 6). $U_D(n)$
is the time to perform an update (either Insert or Delete) on $D$. In the second
step we use weighting and turn $D$ into a weighted structure $W$. More precisely, we
define $f$ by $Q_D(f(n)) = 2^n$ and store a set $S = \{y_1, \ldots, y_n\}$ by cutting it into blocks
as described in Theorem 7, i.e., block $B_i$ contains all $y_j$ with $f(i-1) < j \le f(i)$.
Here we assumed w.l.o.g. that $w(y_1) \ge w(y_2) \ge \cdots \ge w(y_n)$. This suffices to
support queries. A query in $W$ with witness $y \in S$ takes time $O(Q_D(w(S)/w(y)))$
by Theorem 7. Here $w(S) = \sum \{w(y);\ y \in S\}$.
We need to add additional data structures in order to support Promote and
Demote. We store set $S$ in a weighted dynamic tree $T$ (cf. 3.6). Every element in
tree $T$ points to the block $B_i$ containing the element. Furthermore, we keep for
every block $B_i$ the weights of the points in $B_i$ in a balanced tree. This allows us to
find the smallest and largest weight in a block fast.
We are now in a position to describe a realization of Promote$(y, a)$. We first
use the weighted dynamic tree to find the block which contains $y$. This takes time
$O(\log(w(S)/w(y)))$. Suppose that block $B_i$ contains $y$. We then run through the
following routine.

(1) delete y from block B_i; h := i;
(2) while h > 1 and w(y) + a > minimal weight of any element in B_{h-1}
(3) do delete the element with minimal weight from B_{h-1} and insert it into B_h;
(4)    h := h - 1
    od;
(5) insert y (with weight w(y) + a) into B_h
Program 2
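
A minimal Python sketch of this routine follows. It assumes that each block is a plain list of (weight, key) pairs kept sorted by increasing weight, that blocks are indexed from 0 with the heaviest elements in block 0, and that the loop compares the new weight with the lightest element of the preceding block, in line with the prose description that the element of minimal weight moves down one block. The function name `promote` and the list representation are illustrative only; the text uses balanced trees and a weighted dynamic tree instead of linear scans.

```python
import bisect

def promote(blocks, i, key, a):
    """Increase the weight of `key`, currently stored in blocks[i], by a.
    Block sizes are preserved; returns the index of the block that finally
    holds the element."""
    block = blocks[i]
    idx = next(k for k, (_, name) in enumerate(block) if name == key)
    w, _ = block.pop(idx)                      # step (1): remove y from B_i
    w += a
    h = i
    # step (2): y is heavier than the lightest element one block up
    while h > 0 and blocks[h - 1] and w > blocks[h - 1][0][0]:
        lightest = blocks[h - 1].pop(0)        # step (3): move it down one block
        bisect.insort(blocks[h], lightest)
        h -= 1                                 # step (4)
    bisect.insort(blocks[h], (w, key))         # step (5): insert y
    return h
```

Each loop iteration performs one deletion and one insertion, which corresponds to the per-block update cost in the time bound.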

The algorithm above is quite simple. If the weight of $y$ is increased it might
have to move to a block with smaller index. We make room by deleting it from the
old block and moving the element with minimal weight down one block for every
intermediate block. We obtain the following time bound for Promote$(y, a)$:
\[
O\Bigl( \log(w(S)/w(y)) + \sum_{h=0}^{i} \bigl[ \log f(h) + U_D(f(h)) \bigr] \Bigr).
\]
Here $\log(w(S)/w(y))$ is the cost of searching for $y$ in tree $T$, and $\log f(h) + U_D(f(h))$
is the cost of inserting and deleting an element from a structure of size $f(h)$ and
updating the balanced tree which holds the weights. Observing $U_D(n) \ge \log n$ for
all $n$, $i \le \lceil f^{-1}(w(S)/w(y)) \rceil$ and $U_D(f(\lceil f^{-1}(w(S)/w(y)) \rceil)) \ge U_D(w(S)/w(y)) \ge \log(w(S)/w(y))$,
this bound simplifies to
\[
O\Bigl( \sum_{0 \le h \le \lceil f^{-1}(w(S)/w(y)) \rceil} U_D(f(h)) \Bigr).
\]

The algorithm for Demote$(y, a)$ is completely symmetric. The details are left for
the reader (Exercise 5). We obtain exactly the same running time as for Promote,
except for the fact that $w(y)$ has to be replaced by the new weight $w(y) - a$.
Theorem 8. A static data structure with query time $Q_S$, preprocessing time $P_S$,
and weak deletion time $D_S$ for a monotone deletion decomposable searching problem
can be extended to a dynamic weighted data structure $W$ such that:
a) A query in weighted set $S$ (with weight function $w : S \to \mathbb{N}$) with witness
$y \in S$ takes time $O(Q_D(w(S)/w(y)))$. Here $Q_D(n) = Q_S(n) \cdot \log n$.
b) Promote$(y, a)$ takes time
\[
O\Bigl( \sum_{0 \le h \le \lceil f^{-1}(w(S)/w(y)) \rceil} U_D(f(h)) \Bigr).
\]
c) Demote$(y, a)$ takes time
\[
O\Bigl( \sum_{0 \le h \le \lceil f^{-1}(w(S)/(w(y) - a)) \rceil} U_D(f(h)) \Bigr).
\]

Proof: Immediate from the discussion above.


Let us look again at binary search in sorted arrays as a static data structure
for Member. Then $Q_S(n) = D_S(n) = O(\log n)$ and $P_S(n) = O(n)$ (cf. the remark
following Theorem 1). Hence $Q_D(n) = O((\log n)^2)$, $U_D(n) = O(\log n)$, and
$f(n) = 2^{\sqrt{2^n}}$. A query for $y$ with weight $w(y)$ takes time $O((\log(w(S)/w(y)))^2)$, the
square of the search time in weighted dynamic trees. Also Promote$(y, a)$ takes
time $O(U_D(f(\lceil f^{-1}(w(S)/w(y)) \rceil))) = O(\log(w(S)/w(y)))$ and Demote$(y, a)$ takes time
$O(\log(w(S)/(w(y) - a)))$. This is the same order as in weighted dynamic trees. Of
course, weighted dynamic trees are part of the data structure $W$ considered here.
Again (cf. Theorem 5) this is not a serious objection. Since Member is the query
considered here we can replace the use of weighted dynamic trees by a use of the
data structure itself. This will square the time bounds for Promote and Demote.
Also binary search in sorted arrays is not a very important application of weighted
dynamization. In the important applications in this chapter the cost of using a weighted
dynamic dictionary is negligible with respect to the complexity of the data structure
itself.
Dynamization and weighting are powerful techniques. They provide reasonably
efficient dynamic weighted data structures very quickly which can then be used as
a reference point for more special developments. Tuning to the special case under
consideration is always necessary, as weighting and dynamization tend to produce
somewhat clumsy solutions if applied blindly.

7.1.3. Order Decomposable Problems

In Sections 7.1.1 and 7.1.2 we developed the theory of dynamization and weighting for
decomposable searching problems and subclasses thereof. Although a large number
of problems are decomposable searching problems, not all problems are. An example
is provided by the inside the convex hull problem. Here we are given a set $S \subseteq \mathbb{R}^2$
and a point $x \in \mathbb{R}^2$ and are asked to decide whether $x \in CH(S)$ (= the convex
hull of $S$). In general, there is no relation between $CH(S)$ and $CH(A)$, $CH(B)$ for
arbitrary partitions $A$, $B$ of $S$. However, if we choose the partition intelligently then
there is a relation. Suppose that we order the points in $S$ according to $x$-coordinate
and split into sets $A$ and $B$ such that the $x$-coordinate of any point in $A$ is no
larger than the $x$-coordinate of any point in $B$. Then the convex hull of $S$ can be
constructed from $CH(A)$ and $CH(B)$ by adding a "low" and a "high" tangent.

Figure 1. Convex hull of S


These tangents can be constructed in time $O(\log n)$ given suitable representations
of $CH(A)$ and $CH(B)$. The details are spelled out in 8.2 and are of
no importance here. We infer two things from this observation. First, if we
choose $A$ and $B$ such that $|A| = |B| = |S|/2$ and apply the same splitting
process recursively to $A$ and $B$ then we can construct the convex hull in time
$T(n) = 2 \cdot T(n/2) + O(\log n) = O(n)$. This does not include the time for sorting $S$
according to $x$-coordinate. The details are described in Theorem 9. Second, convex
hulls can be maintained efficiently. If we actually keep around the recursion tree
used in the construction of $CH(S)$ then we can insert a new point in $S$ by going
down a single path in this tree and redoing the construction along this path only.
Since the path has length $O(\log n)$ and we spend time $O(\log n)$ in every node for
merging convex hulls this will consume $O((\log n)^2)$ time units per insertion or
deletion of a point. The details are described in Theorem 10 below.
Definition: Let $T_1$ and $T_2$ be sets and let $P : 2^{T_1} \to T_2$ be a set problem. $P$ is
order decomposable if there is a linear order $<$ on $T_1$ and an operator
$\sqcup : T_2 \times T_2 \to T_2$ such that for every $S \subseteq T_1$, $S = \{a_1 < a_2 < \cdots < a_n\}$, and every $i$
\[
P(\{a_1, \ldots, a_n\}) = \sqcup(P(\{a_1, \ldots, a_i\}), P(\{a_{i+1}, \ldots, a_n\})).
\]
Moreover, $\sqcup$ is computable in time $C(n)$ in this situation.
We assume throughout that $C(n)$ is non-decreasing. In the convex hull example we
have $T_1 = \mathbb{R}^2$, $T_2$ = the set of convex polygons in $\mathbb{R}^2$, $\sqcup$ merges two convex hulls,
and $C(n) = O(\log n)$. We outlined above that convex hulls can be constructed
efficiently by divide and conquer. This is true in general for order decomposable
problems.
Theorem 9. Let $P$ be order decomposable. Then $P(S)$, $S \subseteq T_1$, can be computed
in time $\mathrm{Sort}(|S|) + T(|S|)$ where $\mathrm{Sort}(n)$ is the time required to sort a set of $n$
elements according to $<$, and $T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(C(n))$ for $n > 1$
and $T(1) = c$ for some constant $c$.

Proof: The proof is a straightforward application of divide and conquer. We first
sort $S$ in time $\mathrm{Sort}(|S|)$ and store $S$ in sorted order in an array. This will allow
us to split $S$ in constant time. Next we either compute $P(S)$ directly in constant
time if $|S| = 1$, or we split $S$ into sets $A$ and $B$ of size $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$ respectively
in constant time (if $S = \{a_1 < a_2 < \cdots < a_n\}$ then $A = \{a_1, \ldots, a_{\lfloor n/2 \rfloor}\}$ and
$B = \{a_{\lfloor n/2 \rfloor + 1}, \ldots, a_n\}$), compute $P(A)$ and $P(B)$ in time $T(\lfloor n/2 \rfloor)$ and $T(\lceil n/2 \rceil)$
respectively by applying the algorithm recursively, and then compute $P(S) = \sqcup(P(A), P(B))$
in time $C(n)$. Hence $T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(1) + C(n) =
T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(C(n))$.
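
The divide-and-conquer scheme of this proof can be sketched generically in Python. The example problem used here, the largest gap between consecutive elements of a set of numbers, is not taken from the text; it is chosen only because its merging operator is a one-liner with $C(n) = O(1)$. The names `solve`, `gap_base`, `gap_merge` and `max_gap` are hypothetical.

```python
def solve(sorted_items, base, merge):
    """Compute P(S) for S given in sorted order: split at the middle,
    recurse on both halves, and combine the two answers with `merge`."""
    if not sorted_items:
        raise ValueError("S must be non-empty")

    def rec(lo, hi):                      # half-open index range [lo, hi)
        if hi - lo == 1:
            return base(sorted_items[lo])
        mid = lo + (hi - lo) // 2         # A receives floor(n/2) elements
        return merge(rec(lo, mid), rec(mid, hi))

    return rec(0, len(sorted_items))

def gap_base(a):
    # P({a}) = (minimum, maximum, largest gap)
    return (a, a, 0)

def gap_merge(pa, pb):
    # every element of A precedes every element of B in the order
    return (pa[0], pb[1], max(pa[2], pb[2], pb[0] - pa[1]))

def max_gap(points):
    return solve(sorted(points), gap_base, gap_merge)[2]
```

Sorting is done once up front, so splitting is a constant-time index computation, as in the proof.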
Recurrence $T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + C(n)$ is easily solved for most $C$ (cf. 2.1.3).
In particular, $T(n) = O(n)$ if $C(n) = O(n^{\epsilon})$ for some $\epsilon < 1$, $T(n) = O(C(n))$ if
$C(n) = \Theta(n^{1+\epsilon})$ for some $\epsilon > 0$, and $T(n) = O(n \cdot (\log n)^{k+1}) = O(C(n) \cdot \log n)$ if
$C(n) = \Theta(n \cdot (\log n)^k)$ for some $k \ge 0$.
The proof of Theorem 9 reflects the close relation between order decomposable
problems and divide and conquer. A non-recursive view of divide and conquer is to
take any binary tree with $|S|$ leaves, to write the elements of $S$ into the leaves (in
sorted order), to solve the basic problems in the leaves and then to use operator $\sqcup$
to compute $P$ for larger subsets of $S$. What tree should we use? A complete binary
tree will give us the most efficient algorithm, but any reasonably weight-balanced
tree will not be much worse. If we want to support insertions and deletions this is
exactly what we should do. So let $D$ be a BB[$\alpha$]-tree with $|S|$ leaves for some $\alpha$.
(Exercise 9 shows that we cannot obtain the same efficiency by using $(a, b)$-trees,
or AVL-trees, or ....) We store the elements of $S$ in sorted order (according to $<$)
in the leaves of $D$ and use $D$ as a search tree for $S$. What should we store in the
internal nodes of $D$ beside the search tree information? A first idea is to store
$P(S(v))$ in node $v$ where $S(v)$ is the set stored in the leaves below $v$. Is $P(S(v))$
easily computed bottom-up, starting at the leaves and working towards the root?
Not quite: if $v$ has sons $x$, $y$ and we compute $P(S(v)) = \sqcup(P(S(x)), P(S(y)))$ then
application of $\sqcup$ will in general destroy (the representation of) $P(S(x))$ and $P(S(y))$.
Making a copy of $P(S(x))$ and $P(S(y))$ before applying $\sqcup$ might cost a lot more
than $C(|S(v)|)$ and is therefore excluded. A different strategy is called for.
We store $P(S(r))$ only in the root $r$. In internal nodes $v \ne r$ we store two things.
First, the sequence $a(v)$ of actions executed to compute $\sqcup$ applied to $P(S(x))$ and
$P(S(y))$, where $x$ and $y$ are the sons of $v$. This sequence has length $O(C(n))$. Second, the piece $P^*(S(v))$ which is
left over from $P(S(v))$ when $P(S(\mathrm{father}(v)))$ is computed by applying $\sqcup$ to $P(S(v))$
and $P(S(\mathrm{brother}(v)))$. We call tree $D$ augmented by this additional information an
augmented tree.
Lemma 6. An augmented tree $D$ for set $S$ has space requirement $T(|S|)$ and can
be constructed in time $\mathrm{Sort}(|S|) + T(|S|)$ where
\[
T(n) = \max_{\alpha \le \beta \le 1 - \alpha} \bigl[ T(\beta \cdot n) + T((1 - \beta) \cdot n) + O(C(n)) \bigr].
\]

Proof: The recursion for $T(n)$ follows from the fact that $\alpha \le |S(x)|/|S(v)| \le 1 - \alpha$
for any node $v$ with sons $x$, $y$ in a BB[$\alpha$]-tree $D$. The space bound follows since at
most $t$ storage cells can be used in $t$ time units for any $t$.
The remark following Theorem 9 also applies to Lemma 6. In particular, $T(n) =
O(n)$ if $C(n) = O(n^{\epsilon})$ for some $\epsilon < 1$, and so on. The space bound stated in Lemma 6
is usually overly pessimistic. One does not use a new storage cell every time unit
in general.
We will next describe how to insert into and delete from an augmented tree.
We describe insertion in detail and leave deletion for the reader, deletion being very
similar to insertion. Let $a$ be a new point which we want to insert in $S$. Let $D$ be
an augmented tree for $S$. We first use $D$ as a search tree. This will outline a path $p$
down tree $D$. Let $p = v_0, v_1, \ldots, v_k$ with $v_0$ being the root. We walk down this path
and reconstruct the $P(S(v_i))$'s as we walk down. More precisely, we start in root $v_0$
with $P(S(v_0))$ in our hands and use the sequence of actions $a(v_0)$ stored in $v_0$ and
the leftover pieces $P^*(S(v_1))$ and $P^*(S(\mathrm{brother}(v_1)))$ stored in $v_1$ and its brother
to reconstruct $P(S(v_1))$ and $P(S(\mathrm{brother}(v_1)))$ by running $a(v_0)$ backwards. This
will take time $O(C(|S(v_0)|))$. Next we repeat this process with $v_1, \ldots, v_k$. At the
end we have reconstructed $P(S(\mathrm{brother}(v_i)))$, $1 \le i \le k$, and $P(S(v_k))$.
Lemma 7. Let $D$ be an augmented tree for $S$, $|S| = n$, and let $p = v_0, \ldots, v_k$ be a
path from the root $v_0$ to a leaf. Then $P(S(\mathrm{brother}(v_i)))$, $1 \le i \le k$, and $P(S(v_k))$
can be reconstructed in time $O(C(n) \cdot \log n)$.
Proof: The algorithm outlined above has running time
\[
\sum_i C(|S(v_i)|) \le \sum_i C(n \cdot (1 - \alpha)^i) \le \sum_i C(n) = O(C(n) \cdot \log n)
\]
since the depth of the tree is $O(\log n)$ and $|S(v_i)| \le n \cdot (1 - \alpha)^i$.
If $C(n) = \Theta(n^{\epsilon})$ for some $\epsilon > 0$ then $\sum_i C(n \cdot (1 - \alpha)^i) = O(n^{\epsilon} \cdot \sum_i (1 - \alpha)^{\epsilon i}) = O(n^{\epsilon}) =
O(C(n))$. In $(a, b)$-trees this improved claim is not true in general, i.e., there are
$(a, b)$-trees where reconstruction along a path has cost of order $n^{\epsilon} \cdot \log n$ if $C(n) = \Theta(n^{\epsilon})$
(cf. Exercise 9).
The remainder of the insertion algorithm is now almost routine. We insert the
new point $a$, walk back to the root and merge the $P$'s as we go along. More precisely,
we first compute $P(\{a\})$, then merge it with $P(S(v_k))$, then with $P(S(\mathrm{brother}(v_k)))$,
and so on. The time bound derived in Lemma 7 applies again except that we forgot about
rotations and double rotations.
[Figure: a rotation at node v_i with son v_{i+1}, shown before and after; the subtrees involved are D_1, D_2 and D_3.]
Figure 2.
Suppose that we have to rotate at node $v_i$ and assume that $v_{i+1}$ is the root of
subtree $D_1$. As we walk back to the root we have already computed $P(S(v_{i+1}))$.
Also $P(S(\mathrm{brother}(v_{i+1})))$ is available from the top-down pass. We reverse the
construction at $\mathrm{brother}(v_{i+1})$ and thus compute $P$ for the relevant nodes after the
rotation. Double rotations are treated similarly; the details are left to the reader.
The time bound derived in Lemma 7 still applies, because
rotations and double rotations at most require us to extend the reconstruction process
to a constant vicinity of the search path. We summarize in:

Theorem 10. Let $P$ be an order decomposable problem with merging operator
$\sqcup$ computable in time $C(n)$. Then $P$ can be dynamized such that insertions and
deletions take time $O(C(n))$ if $C(n) = \Theta(n^{\epsilon})$ for some $\epsilon > 0$ and time $O(C(n) \cdot \log n)$
otherwise.

Proof: By the discussion above. The time bound follows from Lemma 7 and the
remark following it.

In the convex hull problem we have $C(n) = O(\log n)$. Thus we can maintain convex
hulls under insertions and deletions with time bound $O((\log n)^2)$ per update. More
examples of order decomposable problems are discussed in Exercises 10-20.

7.2. Multi-dimensional Searching Problems

This section is devoted to searching problems in multi-dimensional space. Let $U_i$,
$0 \le i < d$, be an ordered set and let $U = U_0 \times U_1 \times \cdots \times U_{d-1}$. An element $x =
(x_0, \ldots, x_{d-1}) \in U$ is also called point or record or $d$-tuple; it is customary to talk
about points in geometric applications and about records in database applications.
No such distinction is made here. Components $x_i$ are also called coordinates or
attributes.
A region searching problem is specified by a set $\Gamma \subseteq 2^U$ of regions in $U$. The
problem is then to organize a static set $S \subseteq U$ such that queries of the form
"list all elements in $S \cap R$" or "count the number of points in $S \cap R$" can be
answered efficiently for arbitrary $R \in \Gamma$. We note that region searching problems
are decomposable searching problems and hence the machinery developed in 7.1.1
and 7.1.2 applies to them. Thus we automatically have dynamic solutions for region
searching problems once a static solution is found. We address four types of region
queries.
a) Orthogonal Range Queries: Here $\Gamma$ is the set of hypercubes (axis-parallel boxes) in $U$, i.e.,
\[
\Gamma_{OR} = \{R;\ R = [l_0, h_0] \times [l_1, h_1] \times \cdots \times [l_{d-1}, h_{d-1}] \text{ where } l_i, h_i \in U_i \text{ and } l_i \le h_i\}.
\]
b) Partial Match Queries: Here $\Gamma$ is the set of degenerate hypercubes where
every side is either a single point or all of $U_i$, i.e.,
\[
\Gamma_{PM} = \{R;\ R = [l_0, h_0] \times [l_1, h_1] \times \cdots \times [l_{d-1}, h_{d-1}] \text{ where } l_i, h_i \in U_i \text{ and either } l_i = h_i \text{ or } l_i = -\infty \text{ and } h_i = +\infty \text{ for every } i\}.
\]
If $l_i = h_i$ then the $i$-th coordinate is specified, otherwise it is unspecified.
c) Exact Match Queries: Here $\Gamma$ is the set of singletons, i.e.,
\[
\Gamma_{EM} = \{R;\ R = \{x\} \text{ for some } x \in U\}.
\]
d) Polygon Queries: Polygon queries are only defined for $U = \mathbb{R}^2$. We have
\[
\Gamma_{P} = \{R;\ R \text{ is a simple polygonal region in } \mathbb{R}^2\}.
\]
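
To make the first three region classes concrete, the following Python sketch represents a region by its lower and upper corner and answers queries by exhaustive search. It is only a baseline for comparison, not one of the data structures of this section; numeric coordinates are assumed, and the names `in_orthogonal_range`, `partial_match_region`, `exact_match_region` and `report` are illustrative.

```python
import math

def in_orthogonal_range(x, lows, highs):
    """True if l_i <= x_i <= h_i holds for every coordinate i."""
    return all(l <= xi <= h for xi, l, h in zip(x, lows, highs))

def partial_match_region(spec):
    """spec[i] is a value (coordinate i specified) or None (unspecified).
    Returns the corners (lows, highs) of the degenerate box."""
    lows = [-math.inf if v is None else v for v in spec]
    highs = [math.inf if v is None else v for v in spec]
    return lows, highs

def exact_match_region(x):
    """A singleton region {x}: every coordinate is specified."""
    return list(x), list(x)

def report(S, lows, highs):
    """List all elements of S inside the region, by exhaustive search."""
    return [x for x in S if in_orthogonal_range(x, lows, highs)]
```

Exact match and partial match regions are special cases of orthogonal ranges, which is why a single membership test serves all three.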


Exact match queries are not really a new challenge; however the three other types
of problems are. There seems to be no single data structure doing well on all of
them and we therefore describe three data structures: $d$-dimensional trees, polygon
trees and range trees. $d$-dimensional trees and polygon trees use linear space and
solve partial match queries and polygon queries in time $O(n^{\epsilon})$ where $\epsilon$ depends on
the type of the problem. Range trees allow us to solve orthogonal range queries
in time $O((\log n)^d)$ but they use non-linear space $O(n \cdot (\log n)^{d-1})$. In fact they
exhibit a tradeoff between speed and space.
In view of Chapter 3 these results are disappointing. In one-dimensional space
we could solve a large number of problems in linear space and logarithmic time; in
higher dimensions all data structures mentioned above either use non-linear space
or use "rootic" time $O(n^{\epsilon})$ for some $\epsilon$, $0 < \epsilon < 1$. Section 7.2.3 is devoted to
lower bounds and explains this behavior. We show that partial match requires
rootic time when space is restricted to its minimum and that orthogonal range
queries and polygon queries either require large query or large update time. Large
update time usually points to large space requirement, although it is not conclusive
evidence.

7.2.1. D-dimensional Trees and Polygon Trees

We start with $d$-dimensional trees and show that they support partial match
retrieval and orthogonal range queries with rootic search time. However, they do not
do well for arbitrary polygon queries. A discussion of why they fail for polygon
retrieval leads to polygon trees.
$d$-dimensional trees are a straightforward, yet powerful extension of one-dimensional
trees. At every level of a dd-tree we split the set according to one of the
coordinates. Fairness demands that we use the different coordinates with the same
frequency; this is most easily achieved if we go through the coordinates in cyclic
order.
Definition: Let $S \subseteq U_0 \times \cdots \times U_{d-1}$, $|S| = n$. A dd-tree for $S$ (starting at
coordinate $i$) is defined as follows:
1) If $d = n = 1$ then it consists of a single leaf labeled by the unique element
$x \in S$.
2) If $d > 1$ or $n > 1$ then it consists of a root labeled by some element $d_i \in U_i$
and three subtrees $T_<$, $T_=$ and $T_>$. Here $T_<$ is a dd-tree starting at coordinate
$(i + 1) \bmod d$ for set $S_< = \{x \in S;\ x = (x_0, \ldots, x_{d-1}) \text{ and } x_i < d_i\}$, $T_>$
is a dd-tree starting at coordinate $(i + 1) \bmod d$ for set $S_> = \{x \in S;\ x =
(x_0, \ldots, x_{d-1}) \text{ and } x_i > d_i\}$, and $T_=$ is a $(d-1)$-dimensional tree starting at
coordinate $i \bmod (d - 1)$ for set $S_= = \{(x_0, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{d-1});\ x =
(x_0, \ldots, x_{i-1}, d_i, x_{i+1}, \ldots, x_{d-1}) \in S\}$.
Figure 3 shows a 2d-tree for set $S = \{(1, II), (1, III), (2, I), (2, III), (3, I), (3, II)\}$ starting
at coordinate 0. Here $U_0 = U_1 = \{1, 2, 3\}$. Arabic and roman numerals are used
to distinguish coordinates.
It is very helpful to visualize 2d-trees as subdivisions of the plane. The root
node splits the plane by vertical line $x_0 = 2$ into three parts: left halfplane, right
halfplane and the line itself. The left son of the root then splits the left halfplane
by horizontal line $x_1 = II$, and so on.
[Figure: the 2d-tree for S. The root is labeled 2; its sons via the <-, =- and >-pointers are labeled II, II and I; the leaves, from left to right, are (1, II), (1, III), (2, I), (2, III), (3, I), (3, II).]
Figure 3.

[Figure: the subdivision of the plane induced by this tree, with the lines x_0 = 2, x_1 = II and x_1 = I and the six points of S.]
Figure 4.
The three sons of a node $v$ in a dd-tree do not all have the same quality. The
root of $T_=$ (the son via the =-pointer) represents a set of one smaller dimension. In
general we will not be able to bound the size of this set. The roots of $T_<$ and $T_>$
(the sons via the <-pointer and the >-pointer) represent sets of the same dimension
but generally smaller size. Thus every edge of a dd-tree reduces the complexity of
the set represented: either in dimension or in size. In 1d-trees, i.e., ordinary search
trees, only reductions in size are required.
It is clear how to perform exact match queries in dd-trees. Start at the root,
compare the search key with the value stored in the node and follow the correct
pointer. Running time is proportional to the height of the tree. Our first task is
therefore to derive bounds on the height of dd-trees.
Definition:
a) Let $T$ be a dd-tree and let $v$ be a node of $T$. Then $S(v)$ is the set of leaves in
the subtree with root $v$, $d(v)$ is the depth of node $v$, and $sd(v)$, the number of
<-pointers and >-pointers on the path from the root to $v$, is the strong depth
of $v$. Node $x$ is a proper son of node $v$ if it is a son via a <- or >-pointer.
b) A dd-tree is ideal if $|S(x)| \le |S(v)|/2$ for every node $v$ and all proper sons $x$
of $v$.
Ideal dd-trees are a generalization of perfectly balanced 1d-trees.
Lemma 1. Let $T$ be an ideal dd-tree for set $S$, $|S| = n$.
a) $d(v) \le d + \log n$ for every node $v$ of $T$.
b) $sd(v) \le \log n$ for every node $v$ of $T$.

Proof: a) follows from b) and the fact that at most $d$ =-pointers can be on the path
to any node $v$. Part b) is immediate from the definition of ideal tree.

Theorem 1. Let $S \subseteq U = U_0 \times \cdots \times U_{d-1}$, $|S| = n$.
a) An exact match query in an ideal dd-tree for $S$ takes time $O(d + \log n)$.
b) An ideal dd-tree for $S$ can be constructed in time $O(n \cdot (d + \log n))$.
Proof: a) Immediate from Lemma 1, a).
b) We describe a procedure which constructs ideal dd-trees in time $O(n \cdot (d + \log n))$.
Let $S_0 = \{x_0;\ (x_0, \ldots, x_{d-1}) \in S\}$ be the multi-set of 0-th coordinates of $S$. We
use the linear time median algorithm of 2.4 to find the median $d_0$ of $S_0$. $d_0$ will
be the label of the root. Then clearly $|S_<| \le |S|/2$ and $|S_>| \le |S|/2$ where $S_< =
\{x \in S;\ x_0 < d_0\}$ and $S_> = \{x \in S;\ x_0 > d_0\}$. We use the same algorithm
recursively to construct dd-trees for $S_<$ and $S_>$ (starting at coordinate 1) and a
$(d-1)$-dimensional tree for $S_=$. This algorithm will clearly construct an ideal dd-tree
$T$ for $S$. The bound on the running time can be seen as follows. In every
node $v$ of $T$ we spend $O(|S(v)|)$ steps to compute the median of a set of size $|S(v)|$.
Furthermore, $S(v) \cap S(w) = \emptyset$ if $v$ and $w$ are distinct nodes of the same depth and hence
\[
\sum_{d(v) = k} |S(v)| \le n
\]
for every $k$, $0 \le k \le d + \log n$. Thus the running time is bounded by
\[
\sum_{v \text{ node of } T} O(|S(v)|) = O\Bigl( \sum_{0 \le k \le d + \log n}\ \sum_{d(v) = k} |S(v)| \Bigr) = O(n \cdot (d + \log n)).
\]
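
The construction and the exact match search can be sketched in Python as follows. The sketch makes several assumptions that are not in the text: `statistics.median_low` (which sorts) replaces the linear time median algorithm, so the construction time is worse by a logarithmic factor; a subtree whose dimension has dropped to 0 is represented by a leaf; and coordinates are handled by position within the remaining tuple. The names `Node`, `Leaf`, `build` and `exact_match` are hypothetical.

```python
from statistics import median_low

class Node:
    def __init__(self, coord, key, lt, eq, gt):
        self.coord, self.key = coord, key      # split position and split value
        self.lt, self.eq, self.gt = lt, eq, gt

class Leaf:
    def __init__(self, point):
        self.point = point

def build(points):
    """Build an ideal dd-tree for a collection of distinct d-tuples."""
    return _build([(p, p) for p in set(points)], 0)

def _build(items, i):
    # items: list of (remaining coordinates, original point)
    if not items:
        return None
    dim = len(items[0][0])
    if dim == 0 or (dim == 1 and len(items) == 1):
        return Leaf(items[0][1])
    key = median_low(rest[i] for rest, _ in items)
    lt = [it for it in items if it[0][i] < key]
    gt = [it for it in items if it[0][i] > key]
    # the =-subtree drops coordinate i and is one dimension smaller
    eq = [(rest[:i] + rest[i + 1:], p) for rest, p in items if rest[i] == key]
    nxt = (i + 1) % dim
    eq_start = i % (dim - 1) if dim > 1 else 0
    return Node(i, key, _build(lt, nxt), _build(eq, eq_start), _build(gt, nxt))

def exact_match(root, x):
    """Follow one pointer per node; compare the full point at the leaf."""
    node, rest = root, tuple(x)
    while isinstance(node, Node):
        v = rest[node.coord]
        if v < node.key:
            node = node.lt
        elif v > node.key:
            node = node.gt
        else:
            rest = rest[:node.coord] + rest[node.coord + 1:]
            node = node.eq
    return node is not None and node.point == tuple(x)
```

Because the split value is a median that occurs in the set, each of the two proper sons receives at most half of the points, so the resulting tree is ideal.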


Insertions into dd-trees are a non-trivial problem. A first idea is to use an analogue
to the naive insertion algorithm into one-dimensional trees. If $x$ is to be inserted
into tree $T$, search for $x$ in $T$ until a leaf is reached and replace that leaf by a small
subtree with two leaves. Of course, the tree will not be ideal after the insertion
in general. We might define weight-balanced dd-trees to remedy the situation, i.e.,
we choose some parameter $\alpha$, say $\alpha = 1/4$, and require that $|S(x)| \le (1 - \alpha)|S(v)|$
for every node $v$ and all proper sons $x$ of $v$. This is a generalization of BB[$\alpha$]-trees.
Two problems arise. Both problems illustrate a major difference between
one-dimensional and multi-dimensional searching.
The first problem is that although Theorem 1 is true for weight-balanced dd-trees,
Theorems 2 and 3 below are false, i.e., query time in near-ideal dd-trees may
have a different order than query time in ideal trees. More precisely, partial match
in ideal 2d-trees has running time $O(\sqrt{n})$ but it has running time
$\Omega(n^{1/\log(8/3)}) = \Omega(n^{0.706})$ in weight-balanced dd-trees, $\alpha = 1/4$ (Exercise 14). Thus weight-balanced
dd-trees are only useful for exact match queries.
A second problem is that weight-balanced dd-trees are hard to rebalance.
Rotations are of no use since splitting is done with respect to different coordinates on
different levels. Thus it is impossible to change the depth of a node as rotations do.
There is a way out. Suppose that we followed path $p = v_0, v_1, \ldots$ to insert point $x$.
Let $i$ be minimal such that $v_i$ goes out of balance by the insertion. Then rebalance
the tree by replacing the subtree rooted at $v_i$ by an ideal tree for set $S(v_i)$. This
ideal tree can be constructed in time $O(m \cdot (d + \log m))$ where $m = |S(v_i)|$. Thus
rebalancing is apparently not as simple and cheap as in one-dimensional trees. The
worst case cost for rebalancing after an insertion is clearly $O(n \cdot (d + \log n))$ since
we might have to rebuild the entire tree. However, amortized time bounds are
much better as we will sketch. We use techniques developed in 3.5.1 (in particular
in the proof of Theorem 4). We showed there (Lemmas 2 and 3 in the proof of
Theorem 4) that the total number of rebalancing operations caused by nodes $v$
with $1/(1 - \alpha)^i \le |S(v)| \le 1/(1 - \alpha)^{i+1}$ during the first $n$ insertions (and deletions)
is $O(n \cdot (1 - \alpha)^i)$. A rebalancing operation caused by such a node has cost
$O((1 - \alpha)^{-i} \cdot (d + i))$ in weight-balanced dd-trees. Hence the total cost of restructuring
a weight-balanced dd-tree during a sequence of $n$ insertions and deletions
is
\[
\sum_{0 \le i \le O(\log n)} O\bigl(n \cdot (1 - \alpha)^i \cdot (1 - \alpha)^{-i} \cdot (d + i)\bigr) = O(n \cdot \log n \cdot (d + \log n)).
\]
Thus the amortized cost of an insertion or deletion is $O(\log n \cdot (d + \log n))$. The
details of this argument are left for Exercise 13.
Dynamization (cf. 7.1) also gives us dynamic dd-trees with $O((d + \log n) \cdot \log n)$
insertion and deletion time. Query time for exact match queries is $O((d + \log n) \cdot \log n)$
which is not quite as good as for weight-balanced dd-trees. However, dynamization
has one major advantage. The time bounds for partial match and orthogonal range
queries (Theorems 2, 3 and 4 below) stay true for dynamic dd-trees.
It is about time that we move to partial match queries. Let $R = [l_0, h_0] \times
\cdots \times [l_{d-1}, h_{d-1}]$ with $l_i = h_i$ or $l_i = -\infty$, $h_i = +\infty$ be a partial match query. If
$l_i = h_i$ then the $i$-th coordinate is called specified. We use $s$ to denote the number
of specified coordinates. The algorithm for partial match queries is an extension
of the exact match algorithm. As always we start searching in the root. Suppose
that the search reached node $v$. Suppose further that we split according to the $i$-th
coordinate in $v$ and that key $d_i$ is stored in $v$. If the $i$-th coordinate is specified in
query $R$, then the search proceeds to exactly one son of $v$, namely the son via the
<-pointer if $l_i = h_i < d_i$, the son via the =-pointer if $l_i = h_i = d_i$, and the son via the
>-pointer if $l_i = h_i > d_i$. If the
$i$-th coordinate is unspecified in query $R$ then the search proceeds to all three sons
of $v$. Once we reach a leaf, we return it if it belongs to region $R$. The running time
of this algorithm depends heavily on set $S$. We treat a favourable special case first:
invertible sets.
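
The partial match search can be sketched in Python as follows. The sketch is self-contained and therefore repeats a compact tree construction; it assumes that nodes are nested tuples, that `statistics.median_low` supplies the split value, that a subtree of dimension 0 is a leaf, and that an unspecified coordinate is written as `None`. None of these names or representations come from the text.

```python
from statistics import median_low

def build(items, i=0):
    """items: list of (remaining coordinates, original point).
    Returns ('leaf', point), ('node', i, key, lt, eq, gt), or None."""
    if not items:
        return None
    dim = len(items[0][0])
    if dim == 0 or (dim == 1 and len(items) == 1):
        return ('leaf', items[0][1])
    key = median_low(rest[i] for rest, _ in items)
    lt = [it for it in items if it[0][i] < key]
    gt = [it for it in items if it[0][i] > key]
    eq = [(r[:i] + r[i + 1:], p) for r, p in items if r[i] == key]
    nxt = (i + 1) % dim
    return ('node', i, key,
            build(lt, nxt), build(eq, i % max(dim - 1, 1)), build(gt, nxt))

def partial_match(tree, spec):
    """spec holds a value for each specified coordinate and None otherwise.
    Returns (sorted list of matching points, number of nodes visited)."""
    full, out, visited = tuple(spec), [], 0

    def matches(p):
        return all(s is None or s == v for s, v in zip(full, p))

    def visit(node, rest):
        nonlocal visited
        if node is None:
            return
        visited += 1
        if node[0] == 'leaf':
            if matches(node[1]):          # final check against region R
                out.append(node[1])
            return
        _, i, key, lt, eq, gt = node
        s = rest[i]
        if s is None or s < key:          # unspecified: descend everywhere
            visit(lt, rest)
        if s is None or s == key:
            visit(eq, rest[:i] + rest[i + 1:])
        if s is None or s > key:
            visit(gt, rest)

    visit(tree, full)
    return sorted(out), visited
```

The visit counter makes it easy to observe how the number of inspected nodes grows with the number of unspecified coordinates.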
Definition: $S \subseteq U = U_0 \times U_1 \times \cdots \times U_{d-1}$ is invertible if for all $x = (x_0, \ldots, x_{d-1}) \in S$,
$y = (y_0, \ldots, y_{d-1}) \in S$: $x_i = y_i$ for some $i$ implies $x = y$.
A set is invertible if all projection functions are injective when restricted to $S$.
Theorem 2. Let $T$ be an ideal dd-tree for invertible set $S \subseteq U = U_0 \times U_1 \times \cdots \times U_{d-1}$,
$|S| = n$. Then a partial match query with $s < d$ specified components takes time
\[
O(d \cdot 2^{d-s} \cdot n^{1 - s/d}).
\]

Proof: Let $T'$ be the subtree of $T$ consisting of all nodes visited by the search. It
suffices to show that the number of nodes of $T'$ is bounded by $O(d \cdot 2^{d-s} \cdot n^{1 - s/d})$. A
node of $T'$ is called branching if it has a proper son and non-branching otherwise.
Since $S$ is invertible all descendants of non-branching nodes are non-branching.
Hence all branching nodes can be reached by following <- and >-pointers only. A
branching node of $T'$ is a proper branching node if it has two proper sons.
We claim that there are at most $2^{\lceil (\log n)/d \rceil (d-s)}$ proper branching nodes in $T'$.
This follows from the fact that at most $d - s$ out of any $d$ consecutive nodes on any
path through $T'$ are proper branching nodes, because only $d - s$ out of $d$ consecutive
nodes split according to unspecified components. Also $d(v) = sd(v) \le \log n$ for all
branching nodes. Hence there are at most $\lceil (\log n)/d \rceil \cdot (d - s)$ proper branching
nodes on any path through $T'$ and thus the bound follows. It remains to count the
improper branching nodes and the non-branching nodes in $T'$. Again consider any
path through $T'$. Then there can be at most $d$ consecutive nodes which are not
proper branching nodes and hence the total number of nodes of $T'$ is
\[
O\bigl(d \cdot 2^{\lceil (\log n)/d \rceil (d-s)}\bigr) = O\bigl(d \cdot 2^{d-s} \cdot n^{(d-s)/d}\bigr) = O\bigl(d \cdot 2^{d-s} \cdot n^{1 - s/d}\bigr).
\]
The behavior of the partial match algorithm on general sets is harder to analyze.
Let us look at an example first. Let $U_i = \mathbb{R}$, $0 \le i < d$, and let $S = \{0\}^k \times
\{0, \ldots, m-1\}^{d-k}$ for some $m$ and $k$. Then $|S| = m^{d-k}$. Consider first partial
match query $R_1$ which specifies the first $s = k$ coordinates as being 0 and leaves the
remaining coordinates unspecified. Then the answer to the query is the entire set $S$
and hence the running time of any algorithm must be at least linear. Consider next
partial match query $R_2$ which specifies the first $s = k + 1$ coordinates as being 0
and leaves the remaining coordinates unspecified. Then the query is "equivalent"
to a partial match query in a $d - k = d - s + 1$ dimensional set with one specified
coordinate. In view of Theorem 2 we therefore cannot hope to do better than
$O(n^{1 - 1/(d-s+1)})$ time units. This is indeed the bound.
Theorem 3. Let $T$ be an ideal dd-tree for $S$, $|S| = n$. Then a partial match query
with $s$ specified components takes time
\[
O\bigl( f(d, d-s) \cdot n^{\max(1/2,\ 1 - 1/(d-s+1))} + (d + 1) \cdot |A| \bigr).
\]
Here $A$ is the set of answers to the query and $f(d, d-s)$ is some function increasing
in both arguments. $f$ is independent of $T$ and $S$.

Proof: Let $T'$ be the subtree of $T$ consisting of all nodes visited in the search. We
split the set of nodes of $T'$ into three classes which we count separately. A node $v$ is
a tertiary node (belongs to the third class) if all descendants of $v$ belong to $A$, i.e.,
if $S(v) \subseteq A$. The number of tertiary nodes is clearly bounded by $(d+1) \cdot |A|$. A
non-tertiary node is a primary node if it is reachable without using an =-pointer.
All other nodes of $T'$ are secondary nodes. We will show that the number of primary
and secondary nodes is bounded by
$$f(d, d-s) \cdot n^{\max(1/2,\; 1 - 1/(d-s+1))}$$
for some suitable function $f$. The proof is by induction on $d - s$ and for fixed $d - s$
by induction on $s$ and $n$.
If $d = s$ then partial match is equivalent to exact match and the claim follows
from Theorem 1, a). So let us assume $d > s$. If $s = 0$ then all the nodes are tertiary
and the claim is trivial. This leaves the case $d > s \ge 1$. If $n$ is small then the claim
is certainly true by suitable choice of $f(d, d-s)$.
The primary nodes are easy to count. We have shown in the proof of Theorem 2
that their number is $O(d \cdot 2^{d-s} \cdot n^{1 - s/d})$. It remains to count the secondary nodes.
We group the secondary nodes into maximal subtrees. If $v$ is the root of such
a subtree then $v$ is reached via an =-pointer and there is no other =-pointer on the
path to $v$. Thus $sd(v) = d(v) - 1$ and $|S(v)| \le n/2^{sd(v)} \le 2 \cdot n/2^{d(v)}$. Also there
can be at most $2^{\lceil j/d \rceil (d-s)}$ such nodes $v$ with $d(v) = j$. This follows from the fact
that all nodes on the path to $v$ are primary nodes and hence at most $\lceil j/d \rceil \cdot (d - s)$
of these nodes can be proper branching nodes; cf. the proof of Theorem 2.
In the subtree with root $v$ we have to compute a partial match query on a
$(d-1)$-dimensional set with $s'$ specified components. Here $s' = s$ or $s' = s - 1$.
Also, $s' \ge 1$. Note that $v$ and all its descendants are tertiary nodes if $s' = 0$. By
induction hypothesis there are at most
$$f(d-1, d-1-s') \cdot m^{\max(1/2,\; 1 - 1/(d-1-s'+1))}$$
non-tertiary nodes visited in the subtree with root $v$ where $m = |S(v)|$. For the
remainder of the argument we have to distinguish two cases, $s = 1$ and $s \ge 2$.
Case 1: $s \ge 2$.
Since $d - 1 - s' \le d - s$, $f$ is increasing, and $d - s \ge 1$ we conclude that the number
of non-tertiary nodes below $v$ is bounded by $f(d-1, d-s) \cdot m^{1 - 1/(d-s+1)}$. We finish
the proof by summing this bound for all roots of maximal subtrees of secondary
nodes. Let $RT$ be the set of such roots. Then
$$\begin{aligned}
\sum_{v \in RT} f(d-1, d-s) \cdot |S(v)|^{1 - 1/(d-s+1)}
 &\le \sum_{j \ge 1} f(d-1, d-s) \cdot 2^{\lceil j/d \rceil (d-s)} \cdot (2n/2^j)^{1 - 1/(d-s+1)} \\
 &\le (2n)^{1 - 1/(d-s+1)} \cdot f(d-1, d-s) \cdot 2^{d-s} \cdot \sum_{j \ge 1} \left[ 2^{(d-s)/d - 1 + 1/(d-s+1)} \right]^j \\
 &\le (f(d, d-s) - d \cdot 2^{d-s}) \cdot n^{1 - 1/(d-s+1)}
\end{aligned}$$
for suitable choice of $f(d, d-s)$. Note that $(d-s)/d - 1 + 1/(d-s+1) = -s/d + 1/(d-s+1) < 0$
for $2 \le s < d$ (recall that we assume $d > s$), so the geometric series converges.
Adding the bound for the number of primary nodes proves the theorem.

Case 2: $s = 1$. Define $RT$ as in Case 1. Since $s \ge s' \ge 1$ we have $s' = 1$. Consider
the case $d = 2$ first. Then the query below $v$ degenerates to an exact match query
in a one-dimensional set, and hence there are at most $d + \log |S(v)|$ non-tertiary
nodes below $v$. Summing this bound for all nodes in $RT$ we obtain
$$\begin{aligned}
\sum_{v \in RT} (d + \log |S(v)|)
 &\le \sum_{j=1}^{d + \lceil \log n \rceil} 2^{\lceil j/d \rceil (d-s)} \cdot (d + \log(2n/2^j)) \\
 &\le 2^{d-s} \cdot \sum_{k=-d}^{\lceil \log n \rceil - 1} 2^{(\lceil \log n \rceil - k)/d} \cdot (d + 1 + k)
   \qquad \text{(substituting } k = \lceil \log n \rceil - j\text{)} \\
 &\le 2^{d-s+1} \cdot n^{1/d} \cdot \sum_{k=-d}^{\lceil \log n \rceil} (d + 1 + k)/2^{k/d} \\
 &\le (f(2, 1) - d \cdot 2^{d-s}) \cdot n^{\max(1/2,\; 1 - 1/(d-s+1))}
\end{aligned}$$
by suitable choice of $f(2, 1)$; recall that $d = 2$ and $s = 1$. Adding the bound for the
number of primary nodes proves the theorem.
It remains to consider the case $d \ge 3$. We infer from the induction hypothesis that
the number of non-tertiary nodes below $v \in RT$ is bounded by
$f(d-1, d-2) \cdot |S(v)|^{1 - 1/(d-1)}$ in this case. Summing this bound for all $v \in RT$ we obtain
$$\begin{aligned}
\sum_{v \in RT} f(d-1, d-2) \cdot |S(v)|^{1 - 1/(d-1)}
 &\le \sum_{j=1}^{d + \lceil \log n \rceil} f(d-1, d-2) \cdot 2^{\lceil j/d \rceil (d-1)} \cdot (2n/2^j)^{1 - 1/(d-1)} \\
 &\le (2n)^{1 - 1/(d-1)} \cdot f(d-1, d-2) \cdot 2^{d-1} \cdot \sum_{j=1}^{d + \lceil \log n \rceil} \left[ 2^{(d-1)/d - 1 + 1/(d-1)} \right]^j \\
 &\le (2n)^{1 - 1/(d-1)} \cdot f(d-1, d-2) \cdot c \cdot (2n)^{1/(d(d-1))}
   \qquad \text{where } c \text{ is a constant depending on } d \\
 &\le (f(d, d-1) - d \cdot 2^{d-s}) \cdot n^{1 - 1/d}
\end{aligned}$$
by suitable choice of $f(d, d-1)$. Adding the bound for the number of primary nodes
proves the theorem.
Theorem 3 shows that d-dimensional trees support partial match queries with rootic
running time. In particular if $d = 2$ and $s = 1$ then the running time is $O(\sqrt{n} + |A|)$
even in the case of general sets. We will see in Section 7.4.1 that this cannot be
improved without increasing storage. However it is trivial to improve upon this
result by using $O(d! \cdot n)$ storage.
Let $S \subseteq U = U_0 \times \cdots \times U_{d-1}$. For any of the $d!$ possible orderings of the
attributes build a search tree as follows: Order $S$ lexicographically and build a
standard one-dimensional search tree for $S$. A partial match query with $s$ specified
components is then easily answered in time $O(d \cdot \log n + |A|)$. Assume w.l.o.g.
that the first $s$ attributes are specified, i.e., $R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ with
$l_i = h_i$ for $0 \le i < s$ and $l_i = -\infty$, $h_i = +\infty$ for $s \le i < d$. Search for
key $(l_0, \ldots, l_{s-1}, -\infty, \ldots, -\infty)$ in the tree $T$ corresponding to the natural order of
attributes. This takes time $O(d \cdot \log n)$. The answer to the query will then consist of
the next $|A|$ leaves of $T$ in increasing order. Thus logarithmic search time can be
obtained at the expense of increased storage requirement. For small $d$, say $d = 2$,
this approach is feasible and in fact we use it daily. After all, there is a
German-English and an English-German dictionary and no one ever complained about the
redundancy in storage.
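The redundant-storage idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each of the $d!$ search trees is replaced by a lexicographically sorted list searched with `bisect`, and the names `build_indexes` and `partial_match` are hypothetical, not taken from the text.

```python
from bisect import bisect_left
from itertools import permutations


def build_indexes(points):
    """One lexicographically sorted copy of the point set per attribute ordering.

    Assumption: a sorted list stands in for the one-dimensional search tree.
    """
    d = len(points[0])
    return {
        order: sorted(tuple(p[i] for i in order) for p in points)
        for order in permutations(range(d))
    }


def partial_match(indexes, query):
    """query[i] is a value (specified attribute) or None (unspecified)."""
    d = len(query)
    spec = [i for i in range(d) if query[i] is not None]
    unspec = [i for i in range(d) if query[i] is None]
    order = tuple(spec + unspec)            # specified attributes come first
    keys = indexes[order]
    prefix = tuple(query[i] for i in spec)
    # A proper prefix sorts before all of its extensions, so this finds
    # the first candidate answer in O(d log n) comparisons.
    pos = bisect_left(keys, prefix)
    answers = []
    while pos < len(keys) and keys[pos][:len(prefix)] == prefix:
        point = [None] * d
        for k, i in enumerate(order):       # undo the attribute reordering
            point[i] = keys[pos][k]
        answers.append(tuple(point))
        pos += 1
    return answers
```

After the binary search the answers are simply the next $|A|$ entries of the sorted list, which mirrors the "next $|A|$ leaves" argument.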
Another remark about Theorem 3 is also in order at this place. The running
time stated in Theorem 3 is for the enumerative version of partial match retrieval:
"Enumerate all points in $S \cap R$". A simpler version is to count only $|S \cap R|$. If we
store in every node $v$ of a dd-tree the cardinality $|S(v)|$ of $S(v)$ then the counting
version of partial match retrieval has running time
$O(f(d, d-s) \cdot n^{\max(1/2,\; 1 - 1/(d-s+1))})$; cf. Exercise 15.


The next harder type of queries are orthogonal range queries. Let
$R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ be a hypercube in $U_0 \times \cdots \times U_{d-1}$. Before we can
explain the search algorithm we need to introduce one more concept: the range of a node.
We can associate a hypercube $Reg(v)$ with every node of a dd-tree in a natural way, namely
$Reg(v) = \{x \in U_0 \times \cdots \times U_{d-1};$ an exact match query of $x$ goes through $v\}$. $Reg(v)$
is easily determined recursively. If $v$ is the root then $Reg(v) = U_0 \times \cdots \times U_{d-1}$.
If $v$ is a son of $w$, say via the <-pointer, and $w$ is labeled with $d \in U_i$ then
$Reg(v) = Reg(w) \cap \{(x_0, \ldots, x_{d-1});\ x_i < d\}$.
We are now in a position to describe the search algorithm for orthogonal range
queries. Let $R$ be the query hypercube. As always we start the search in the root $r$.
Then $R \cap Reg(r) \ne \emptyset$. Assume for the inductive step that the search has reached
node $v$ with $R \cap Reg(v) \ne \emptyset$. If $v$ is not a leaf then the search proceeds to all
sons $x$ of $v$ with $R \cap Reg(x) \ne \emptyset$. There is at least one such son and all sons with that
property can be found in time $O(1)$. Finally, if $v$ is a leaf then we output the leaf
if $v \in R$.
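The search rule can be sketched as follows. This is a simplified illustration, not the dd-tree of the text: it assumes a two-way tree (sons for "$\le$ key" and "$\ge$ key", no =-pointer) that cycles through the coordinates and stores the points in the leaves. The names `build` and `range_query` are hypothetical. Given $R \cap Reg(v) \ne \emptyset$, the test `lo[axis] <= key` is exactly the test $R \cap Reg(x) \ne \emptyset$ for the left son $x$, and symmetrically for the right son.

```python
def build(points, depth=0):
    """Leaf: ('leaf', point). Inner node: ('node', axis, key, left, right).

    Assumes a non-empty list of equal-length tuples.
    """
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % len(points[0])               # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                         # split evenly, as in an ideal tree
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))


def range_query(tree, lo, hi, out):
    """Append to out all points p with lo[i] <= p[i] <= hi[i] for every i."""
    if tree[0] == "leaf":
        p = tree[1]
        if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
            out.append(p)
        return
    _, axis, key, left, right = tree
    if lo[axis] <= key:        # region of the left son still meets the query
        range_query(left, lo, hi, out)
    if hi[axis] >= key:        # region of the right son still meets the query
        range_query(right, lo, hi, out)
```

Only sons whose region meets the query are visited, so the running time is governed by the number of regions that the boundary of the query cuts, which is what the analysis below counts.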
We analyze the running time of this algorithm only in two dimensions and leave
the higher-dimensional case to the reader. The proof for the higher-dimensional case
is completely analogous but somewhat more tedious.
Theorem 4. Let $T$ be an ideal 2d-tree for $S \subseteq U_0 \times U_1$, $|S| = n$. Then an
orthogonal range query takes time
$$O(d \cdot 4^d \cdot n^{1 - 1/d} + d \cdot |A|)$$
where $A$ is the set of answers.

Proof: Let $R = [l_0, h_0] \times [l_1, h_1]$ be a rectangle in $U_0 \times U_1$ and let $T'$ be the subtree
of all nodes visited when answering query $R$. Observe first that $Reg(v) \cap R \ne \emptyset$ iff
$v$ is visited in the search. Observe next that $Reg(v) \subseteq R$ implies $S(v) \subseteq A$. Hence
the number of nodes $v$ of $T'$ with $Reg(v) \subseteq R$ is certainly bounded by $d \cdot |A|$. It
remains to count the number of nodes $v$ with $Reg(v) \cap R \ne \emptyset$ and $Reg(v) - R \ne \emptyset$.
Let $N$ be the set of such nodes. If $v \in N$ then there must be one of the four
bounding line segments of $R$ which intersects $Reg(v)$ but does not contain $Reg(v)$.
Thus $|N| \le 4 \cdot t$ where $t$ is the maximal number of nodes $v$ such that $Reg(v)$ intersects
but is not contained in any fixed horizontal or vertical line segment.
Claim: Let $T$ be an ideal 2d-tree with $n$ leaves and let
$L = \{(x, y) \in U_0 \times U_1;\ x = l_0,\ l_1 \le y \le h_1\}$ be a vertical line segment. Then the
number of nodes $v$ such that $Reg(v)$ intersects $L$ but is not contained in $L$ is $O(\sqrt{n})$.

Proof: A node $v$ of $T$ is called a primary node if there is no =-pointer on the path
from the root to $v$. Let $P_k$ be the number of primary nodes $v$ of depth $k$ such that
$Reg(v)$ intersects $L$ (but is not contained in $L$). Then $P_0 \le P_1 \le 2$ and $P_{k+2} \le 2 \cdot P_k$
follows from the observation that a vertical line can intersect at most two of the
four regions $R_1, R_2, R_3, R_4$ associated with the proper grandsons of any node $v$.
This fact is illustrated in Figure 5. From $P_0 \le P_1 \le 2$ and $P_{k+2} \le 2 \cdot P_k$ we infer
$P_k \le 2 \cdot 2^{k/2}$.
[Figure 5: the regions $R_1$, $R_2$, $R_3$, $R_4$.]

Next consider any primary node $v$ of depth $k$ such that $Reg(v)$ intersects $L$.
Let $x$ be the son of $v$ via the =-pointer. Then $S(x) \subseteq S(v)$ and hence
$|S(x)| \le |S(v)| \le n/2^k$. Also there are at most $2 \cdot \log |S(x)|$ descendants $w$ of $x$ such that
$Reg(w)$ intersects $L$ but is not contained in $L$. This can be seen as follows.
The tree with root $x$ is a one-dimensional search tree for a set of points which
lie either on a horizontal or a vertical line. If they lie on a horizontal line then the
search below $x$ follows exactly one path down the tree. If they lie on a vertical
line (which then must be the line $x = l_0$) then $Reg(w)$ intersects $L$ but is not
contained in $L$ iff either $l_1 \in Reg(w)$ or $h_1 \in Reg(w)$. The set of nodes $w$ with
$l_1 \in Reg(w)$ ($h_1 \in Reg(w)$) forms a path in the tree with root $x$. Thus there are
at most $2 \cdot \log |S(x)|$ descendants $w$ of $x$ such that $Reg(w)$ intersects $L$ but is not
contained in $L$.
Putting everything together we have shown that the number of nodes $v$ in $T$
such that $Reg(v)$ intersects $L$ but is not contained in $L$ is at most
$$\sum_{0 \le k \le \log n} 2 \cdot 2^{k/2} \cdot 2 \cdot \log(n/2^k)
 = 4 \cdot \sqrt{n} \cdot \sum_{0 \le k \le \log n} 2^{(k - \log n)/2} \cdot (\log n - k)
 = O(\sqrt{n}).$$
This proves the claim and the theorem.
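The final step, that the remaining sum is bounded by a constant, can be checked by a change of index. The following is a sketch that assumes for simplicity that $\log n$ is an integer; the index $i = \log n - k$ is introduced here only for illustration.

$$\sum_{0 \le k \le \log n} 2^{(k - \log n)/2} \cdot (\log n - k)
 = \sum_{0 \le i \le \log n} i \cdot 2^{-i/2}
 \le \sum_{i \ge 0} i \cdot \left( \tfrac{1}{\sqrt{2}} \right)^i
 = \frac{1/\sqrt{2}}{(1 - 1/\sqrt{2})^2} = O(1).$$

The closed form uses $\sum_{i \ge 0} i x^i = x/(1-x)^2$ for $|x| < 1$. The contribution of the deep levels dominates, and the total is a constant multiple of $\sqrt{n}$.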

So in two dimensions ($d = 2$) 2d-trees support even orthogonal range queries with
running time $O(\sqrt{n})$. Can we stretch the use of dd-trees even further? If we want to
talk about more complicated queries we have to make some additional assumptions
about the $U_i$'s. Let us assume for the sequel that $d = 2$ and $U_0 = U_1 = \mathbb{R}$. It is
then natural to generalize orthogonal range queries to arbitrary polygon queries.
In a database which contains persons stored by income and number of children we
might ask for all persons where the income exceeds \$1000 plus \$200 for every child.
This query describes a triangle in two-space. Do 2d-trees support efficient polygon
searching? The answer is no (cf. Exercise 18) and the reason for this can be seen
clearly in the proof of Theorem 4. A line segment in arbitrary position can intersect
the regions associated with all four proper grandsons of a node $v$ and in fact can
intersect the regions of all nodes of a 2d-tree. What can we do to overcome this
difficulty? First, every node $v$ of the tree should define a subdivision of $Reg(v)$
such that any line segment can intersect only a proper subset of the regions in the
subdivision. One possible way of achieving this is to divide $Reg(v)$ into four regions
by two straight lines.

[Figure 6: $Reg(v)$ divided by two lines $L_1$ and $L_2$ into four regions $R_1$, $R_2$, $R_3$, $R_4$.]

Then any straight line can intersect at most three of the four regions $R_1$, $R_2$, $R_3$,
$R_4$ plus a number of the "one-dimensional" regions defined by the lines themselves.
With the notation of the claim in the proof of Theorem 4 we would obtain
$P_{k+1} \le 3 \cdot P_k$ and hence could hope for a search time of
$3^{\log n/\log 4} = n^{\log 3/\log 4} \approx n^{0.8}$. Note
that the depth of the tree will be $\log n/\log 4$ because we divide into four pieces in
every step. However, we have to be careful. The arrangement above is only correct
if the depth of the tree is indeed $\log n/\log 4$, i.e., if the tree is ideal. Thus lines $L_1$
and $L_2$ above have to be chosen such that $|R_i \cap S(v)| \le \lceil |S(v)|/4 \rceil$ for $1 \le i \le 4$.
The following lemma shows that this is always possible.
Lemma 2. Let $S \subseteq \mathbb{R}^2$, $|S| = n$ and let $n_1, n_2, n_3, n_4$ be such that
$n_1 + n_2 + n_3 + n_4 \le n$. If $L_1$ is a line such that $n_1 + n_2$ points of $S$ are on one side of $L_1$ and $n_3 + n_4$
points of $S$ are on the other side of $L_1$ then there is a line $L_2$ such that the four
open regions $R_1, R_2, R_3$ and $R_4$ defined by $L_1$ and $L_2$ contain at most $n_1, n_2, n_3, n_4$
points of $S$ respectively. Also $L_2$ can be computed in time $O(n^2)$.

[Figure 7: lines $L_1$ and $L_2$ intersecting in point $P$ and defining the regions $R_1$, $R_2$, $R_3$, $R_4$.]

Proof: For any point $P$ on $L_1$ let $f(P)$ be the minimum angle between $L_1$ and $L_2$
such that regions $R_1, R_2$ contain at most $n_1, n_2$ points respectively. Then $f(P)$ is
a continuous function of $P$. Also $\lim_{P \to -\infty} f(P) = 0$ and $\lim_{P \to +\infty} f(P) = \pi$.
Similarly define $g(P)$ to be the minimum angle between lines $L_1$ and $L_2$ such that
regions $R_3, R_4$ contain at most $n_3$ and $n_4$ points respectively. Then $g(P)$ is a
continuous function of $P$ and $\lim_{P \to -\infty} g(P) = \pi$ and $\lim_{P \to +\infty} g(P) = 0$. Hence
there is a point $P$ such that $f(P) = g(P)$. Then $P$ and $f(P)$ define line $L_2$ with
the desired property. This shows the existence of line $L_2$. It also shows that line $L_2$
can be assumed to go through two points of $S$. Thus there are only $n^2$ candidates
for $L_2$.

[Figure 8: line $L_1$ with point $P_i$ and the sectors $S_1$ and $S_2$.]

Let $K_1, \ldots, K_k$, $k = n \cdot (n-1)/2$, be the lines defined by all pairs of points
of $S$ ordered according to their intersection point with $L_1$. Let $P_i$ be the point of
intersection of $K_i$ and $L_1$. Consider any fixed $P_i$. Express all points of $S$ in polar
coordinates with respect to $P_i$ and find among the $n_1 + n_2$ points "above" $L_1$ two
points which define the $n_1$-th and $(n_1 + 1)$-th largest angle between line $L_1$ and the
line defined by $P_i$ and the point. This can be done in time $O(n_1 + n_2)$ by the linear
time selection algorithm (cf. 2.4). In this way we have computed a sector $S_1$ through
which line $L_2$ must go if it were to intersect $L_1$ in $P_i$. In a similar way we compute
sector $S_2$ based on the points "below" $L_1$. If there is a line which goes through
sectors $S_1$ and $S_2$ then we are done and have found line $L_2$. If sectors $S_1$ and $S_2$
do not have a line in common (as is the case in Figure 8) then we can restrict
the search to one of the halflines defined by $L_1$ and $P_i$. In Figure 8 this halfline is
shown bold. We summarize. In time $O(n)$ we can either determine that $L_2$ goes
through $P_i$ or exclude one of the halflines defined by $L_1$ and $P_i$.
This suggests that we can use binary search to find line $L_2$. We first compute
in time $O(n^2)$ lines $K_1, \ldots, K_k$ and points $P_1, \ldots, P_k$. Next we find the median
point of $P_1, \ldots, P_k$ in time $O(n^2)$. Then we are either done or can restrict the
search to $k/2$ points. This decision takes time $O(n)$. Thus line $L_2$ can be found
in $O(\log n^2)$ iterations and the cost of the $i$-th iteration is $O(k/2^i + n)$. Total cost
is thus $O(n^2)$.
Lemma 2 and the preceding discussion lead to:
Definition:
a) A 4-way polygon tree $T$ for set $S \subseteq \mathbb{R}^2$, $|S| = n$ is defined as follows: If set $S$
is collinear then $T$ is an ordinary one-dimensional search tree for $S$. If set $S$
is not collinear then $T$ consists of a root $r$ and six subtrees. There are two
lines $L_1$ and $L_2$ associated with $r$ and there is one subtree for each of the six
sets $S \cap R_1$, $S \cap R_2$, $S \cap R_3$, $S \cap R_4$, $S \cap L_1$, $S \cap L_2$. Here $R_1, R_2, R_3, R_4$ are
the four open regions defined by lines $L_1$ and $L_2$.
b) A 4-way polygon tree $T$ is ideal if for every node $v$ of $T$ and son $x$ of $v$: If $S(v)$
is collinear then $|S(x)| \le \lceil |S(v)|/2 \rceil$ and if $S(v)$ is not collinear and $x$ is one of
the four sons corresponding to regions $R_1, \ldots, R_4$ then $|S(x)| \le \lceil |S(v)|/4 \rceil$.

Theorem 5. Let $S \subseteq \mathbb{R}^2$, $|S| = n$.
a) An ideal 4-way polygon tree for set $S$ can be constructed in time $O(n^2)$.
b) If $T$ is an ideal 4-way polygon tree for $S$ and $R$ is a polygonal region with $s$
sides then $A = R \cap S$ can be computed in time $O(s \cdot n^{\log 3/\log 4} + |A|)$.

Proof: a) If $S$ is collinear then an ideal tree can be constructed in time $O(n \cdot \log n)$,
the time required to sort $S$. If $S$ is not collinear then lines $L_1, L_2$ dividing the plane
into four open regions containing at most $\lceil n/4 \rceil$ points of $S$ each can be computed
in time $O(n^2)$ by Lemma 2. Hence $T(n)$, the time required to build a 4-way polygon
tree for $n$ points, satisfies the recurrence
$$T(n) \le O(n^2 + n \cdot \log n) + 4 \cdot T(\lceil n/4 \rceil).$$
Thus $T(n) = O(n^2)$ by Theorem 2.1.3.4.
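The $O(n^2)$ bound can also be seen by unrolling the recurrence directly. The following is only a sketch: it ignores the ceilings, bounds $n \cdot \log n$ by $n^2$, and uses a hypothetical constant $a$ for the $O(\cdot)$ term.

$$T(n) \le \sum_{i \ge 0} 4^i \cdot 2a \cdot \left( \frac{n}{4^i} \right)^2
 = 2a \cdot n^2 \cdot \sum_{i \ge 0} 4^{-i}
 = \frac{8}{3} \cdot a \cdot n^2 = O(n^2).$$

The work per level shrinks by a factor of four, so the cost of the root dominates.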
b) Let $R$ be a polygonal region with $s$ sides. We triangulate $R$ (cf. Section 8.4.2) and
compute $R' \cap S$ separately for each of the $s - 1$ triangles $R'$ in the triangulation. It
therefore suffices to show that $A' = R' \cap S$ can be computed in time
$O(n^{\log 3/\log 4} + |A'|)$ for a triangle $R'$. This shows that we may assume w.l.o.g. that $R$ is a triangle.
We describe the search algorithm next. The search reaches only nodes $v$ of
the polygon tree $T$ with $Reg(v) \cap R \ne \emptyset$. Let us assume inductively that when
the search reaches node $v$ we have determined $Reg(v) \cap e_i$, $1 \le i \le 3$, for each of
the three sides $e_i$ of triangle $R$. Note that $Reg(v)$ is convex and hence $Reg(v) \cap e_i$ is
a line segment. Also note that $Reg(v) \subseteq R$ iff $Reg(v) \cap e_i = \emptyset$ for $1 \le i \le 3$ or
$Reg(v) \subseteq e_i$ for some $i$ (recall that we assume $Reg(v) \cap R \ne \emptyset$). If $Reg(v) \subseteq R$
then the search proceeds to all six (two, if $Reg(v)$ is one-dimensional) sons of $v$ and
clearly $Reg(w) \subseteq R$ for all sons $w$ of $v$.
The case $Reg(v) \not\subseteq R$ is slightly more complicated. Let $w$ be a son of $v$. Then
$Reg(w) = Reg(v) \cap C$ where $C$ is either a line or a cone-shaped region, as indicated
in Figure 9. Then $e_i \cap Reg(w) = (e_i \cap Reg(v)) \cap (e_i \cap C)$ and hence $e_i \cap Reg(w)$
is readily computed for $1 \le i \le 3$. If $e_i \cap Reg(w) \ne \emptyset$ for some $i$ then certainly
$R \cap Reg(w) \ne \emptyset$ and hence the search proceeds to node $w$. If $e_i \cap Reg(w) = \emptyset$ for all $i$
then the search proceeds to node $w$ iff $c \in R$ where $c = L_1 \cap L_2$ is the intersection
of the two lines which are associated with node $v$. Note that $Reg(w) \subseteq R$ if $c \in R$
and that $Reg(w) \cap R = \emptyset$ if $c \notin R$.
It remains to estimate the complexity of this algorithm. Let $T'$ be the subtree
of all nodes visited in the search. It suffices to bound the number of nodes of $T'$.
If $v \in T'$ then $Reg(v) \cap R \ne \emptyset$ and hence either $Reg(v) \subseteq R$ or $Reg(v) \cap R \ne \emptyset$
and $Reg(v) - R \ne \emptyset$. In the former case we have $S(v) \subseteq A$ and hence the number
of nodes with $Reg(v) \subseteq R$ is $O(|A|)$. In the latter case there must be an edge $e$ of
region $R$ such that $Reg(v) \cap e \ne \emptyset$ but $Reg(v)$ is not contained in $e$. It therefore
suffices to bound $t$ where $t$ is the maximal number of nodes $v$ such that $Reg(v)$
intersects but is not contained in any fixed line segment $L$.

[Figure 9: a possibility for $C$, shown with lines $L_1$, $L_2$ and region $Reg(v)$.]

Claim: $t = O(n^{\log 3/\log 4})$.
Proof: Let $L$ be any line segment. Let $P_k$ be the number of primary nodes $v$ (i.e.,
$Reg(v)$ is not a line segment) of depth $k$ such that $Reg(v) \cap L \ne \emptyset$. Then $P_1 = 1$ and
$P_{k+1} \le 3 \cdot P_k$ since $L$ can intersect at most 3 of the four open regions associated
with the sons of any primary node. Thus $P_k \le 3^k$.
Let $v$ be a primary node of depth $k$. Then $v$ has two sons $x$ and $y$ which are
not primary nodes. We have $S(x) \cup S(y) \subseteq S(v)$ and $|S(v)| \le \lceil n/4^k \rceil$ since $T$ is an
ideal 4-way tree. The argument used in the proof of Theorem 4 shows that there
are at most $2 \cdot \log |S(x)|$ descendants $w$ of $x$ such that $Reg(w)$ intersects $L$ but is
not contained in $L$. The analogous claim holds true for $y$. Putting both bounds
together we conclude that
$$\begin{aligned}
t &\le \sum_{0 \le k \le \log n/\log 4} 4 \cdot P_k \cdot \log \lceil n/4^k \rceil \\
  &\le 3^{\log n/\log 4} \cdot 8 \cdot \sum_{0 \le k \le \log n/\log 4} 3^{k - \log n/\log 4} \cdot (\log n - 2 \cdot k) \\
  &= O(3^{\log n/\log 4}) = O(n^{\log 3/\log 4}).
\end{aligned}$$


Can we improve upon 4-way polygon trees? Exercise 19 shows that one can always
cut any set of $n$ non-collinear points into $2j$ open regions such that any straight line
will intersect at most $j + 1$ out of these regions and such that no region contains
more than $\lceil n/2j \rceil$ points. Here $j \ge 2$ is any integer. Polygon trees based on
subdivisions of this form allow us to do polygon retrieval in time $O(n^{\log(j+1)/\log 2j})$.
The exponent is minimized for $j = 3$ and is 0.77 in this case.
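The claim about the exponent is easy to check numerically. The sketch below assumes the subdivision is into $2j$ regions, as with $j$ lines through a common point, which is the reading under which $j = 3$ gives 0.77. The function name `exponent` is hypothetical.

```python
import math


def exponent(j):
    """Exponent log(j + 1) / log(2j) of the query time for a 2j-way subdivision."""
    return math.log(j + 1) / math.log(2 * j)


# Tabulate a few values; j = 2 is the 4-way polygon tree discussed above.
for j in range(2, 7):
    print(j, round(exponent(j), 4))

best = min(range(2, 50), key=exponent)
```

The exponent tends to 1 as $j$ grows, so only small values of $j$ are of interest.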

7.2.2. Range Trees and Multidimensional Divide and Conquer

D-dimensional trees support orthogonal range queries with linear space $O(n)$ and
rootic time $O(n^{1 - 1/d})$. Range trees will allow us to trade space for time. More
specifically, we can obtain polylogarithmic query time at the expense of non-linear
storage or rootic query time $O(n^{d\varepsilon})$ and space $O((1/\varepsilon)^d \cdot n)$ for any $\varepsilon > 0$. Also
range trees support insertions and deletions in a natural way.
Orthogonal range queries in one-dimensional space are particularly simple. If
$S \subseteq U_0$ then any ordinary balanced tree will do. We can compute $S \cap [l_0, h_0]$ by
running down two paths in the tree (the search path according to $l_0$ and the search
path according to $h_0$) and then listing all leaves between those paths. The query
time is $O(\log n + |A|)$ and space requirement is $O(n)$. The counting version, i.e.,
to compute $|S \cap [l_0, h_0]|$, only takes time $O(\log n)$ if we store in every node the
number of leaf descendants. This comes from the fact that we have to add up
at most $O(\log n)$ counts to get the final answer, namely the counts of all nodes
which are sons of a node on one of the two paths and which lie between the two
paths. It is very helpful at this point to interpret search trees geometrically. We
can view a search tree as a hierarchical decomposition of $S$ into intervals, namely
sets $Reg(v) \cap S$. The decomposition process is balanced, i.e., we try to split set $S$
evenly at every step, and it is continued to the level of singleton sets. The important
fact is that for every conceivable interval $[l_0, h_0]$ we can decompose $S \cap [l_0, h_0]$ into
only $O(\log n)$ pieces from the decomposition. Hence the $O(\log n)$ query time for
counting $S \cap [l_0, h_0]$.
This idea readily generalizes to two-dimensional (and $d$-dimensional) space.
Let $S \subseteq U_0 \times U_1$. We first project $S$ onto $U_0$ and build a balanced decomposition
of the projection as described above. Suppose now that we have to compute
$S \cap ([l_0, h_0] \times [l_1, h_1])$. We can first decompose $[l_0, h_0]$ into $O(\log n)$ intervals. For each
of these intervals we only have to solve a one-dimensional problem. This we can
do efficiently if we also have data structures for all these one-dimensional problems
around. Each one-dimensional problem will cost $O(\log n)$ steps and so total run
time is $O((\log n)^2)$. However, space requirement goes up to $O(n \cdot \log n)$ because
every point has to be stored in $\log n$ data structures for one-dimensional problems.
The details are as follows.

Definition: Let $S \subseteq U_0 \times U_1 \times \cdots \times U_{d-1}$ and let $P = \{i_1, \ldots, i_k\} \subseteq \{0, \ldots, d-1\}$.
Then $p(S, P) = \{(x_{i_1}, \ldots, x_{i_k});\ x \in S\}$ is the projection of $S$ onto coordinates $P$.
If $P = \{i\}$ then we also write $p_i(S)$ instead of $p(S, \{i\})$.

Definition: Let $m \in \mathbb{N}$ and let $\alpha \in (1/4, 1 - \sqrt{2}/2)$. $m$ is a slack parameter and
$\alpha$ is a weight-balancing parameter. A $d$-fold range tree for multiset
$S \subseteq U_0 \times U_1 \times \cdots \times U_{d-1}$, $|S| = n$ is defined as follows. If $d = 1$ then $T$ is any BB[$\alpha$]-tree for $S$.
If $d > 1$ then $T$ consists of a BB[$\alpha$]-tree $T_0$ for $p_0(S)$. $T_0$ is called the primary tree.
Furthermore, for every node $v$ of $T_0$ with $depth(v) \in m\mathbb{N}$, i.e., $m$ divides $depth(v)$,
there is an auxiliary tree $T_a(v)$. $T_a(v)$ is a $(d-1)$-fold tree for set
$p(S(v), \{1, \ldots, d-1\})$. Here $S(v)$ is
the set of $x = (x_0, \ldots, x_{d-1}) \in S$ such that leaf $x_0$ is a descendant of $v$ in $T_0$.
The precise definition of range trees differs in two respects from the informal
discussion. First, we do not insist on perfect balance. This will slightly degrade query
time but will allow us to support insertions and deletions directly. Also we
introduce slack parameter $m$ which we can use to control space requirement and query
time.
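For the static two-dimensional case with slack $m = 1$ the construction can be sketched as follows. This is an illustration under simplifying assumptions, not the definition above: the primary tree is perfectly balanced rather than a BB[$\alpha$]-tree, every node (not only every $m$-th level) carries an auxiliary structure, and that structure is a sorted list of second coordinates searched with `bisect`. The names `Node`, `build` and `count` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class Node:
    """Primary-tree node over points sorted by first coordinate."""

    def __init__(self, pts):
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        # Auxiliary structure for S(v): the second coordinates in sorted order.
        self.ys = sorted(p[1] for p in pts)
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])


def build(points):
    """Build the tree for a non-empty list of (x, y) pairs."""
    return Node(sorted(points))


def count(node, x1, x2, y1, y2):
    """Number of points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None or node.xmax < x1 or node.xmin > x2:
        return 0
    if x1 <= node.xmin and node.xmax <= x2:
        # One piece of the decomposition of [x1, x2]:
        # solve the one-dimensional problem on the second coordinate.
        return bisect_right(node.ys, y2) - bisect_left(node.ys, y1)
    return (count(node.left, x1, x2, y1, y2)
            + count(node.right, x1, x2, y1, y2))
```

Every point appears in the sorted list of each of its ancestors, which is where the extra $\log n$ factor in the space bound comes from.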
Lemma 3. Let $S_m(d, n)$ be the space requirement of a $d$-fold tree with slack
parameter $m$ for a set of $n$ elements. Then $S_m(d, n) = O(n \cdot (c \cdot \log n/m)^{d-1})$ where
$c = 1/\log(1/(1 - \alpha))$.
Proof: Note first that the depth of a BB[$\alpha$]-tree with $n$ leaves is at most $c \cdot \log n$.
Thus every point $x \in S$ is stored in the primary tree, in at most $c \cdot \log n/m$ primary
trees of auxiliary trees, in at most $(c \cdot \log n/m)^2$ primary trees of auxiliary-auxiliary
trees, .... Thus the total number of nodes (counting duplicates) stored in all trees
and hence space requirement is
$$\sum_{0 \le i \le d-1} O(n \cdot ((c \cdot \log n)/m)^i) = O(n \cdot ((c \cdot \log n)/m)^{d-1}).$$
We will use two examples to illustrate the results about range trees: $m = 1$ and
$m = \varepsilon \log n$ for some $\varepsilon > 0$. If $m = 1$ then $S_m(d, n) = O(n \cdot (c \log n)^{d-1})$ and if
$m = \varepsilon \log n$ then $S_m(d, n) = O((c/\varepsilon)^{d-1} \cdot n)$.
Lemma 4. Ideal $d$-fold range trees, i.e., $|S(x)| \le \lceil |S(v)|/2 \rceil$ for all nodes $v$ (primary
or otherwise) and sons $x$ of $v$, can be constructed in time
$O(d \cdot n \cdot \log n + n \cdot ((\log n)/m)^{d-1})$. Here $m$ is the slack parameter.
Proof: We start by sorting $S$ $d$ times, once according to the 0-th coordinate, once
according to the first coordinate, .... This will take time $O(d \cdot n \log n)$. Let $T_m(d, n)$
be the time required to build an ideal $d$-fold tree for a set of $n$ elements if $S$ is sorted
according to every coordinate. We will show that $T_m(d, n) = O(n \cdot ((\log n)/m)^{d-1})$.
This is clearly true for $d = 1$ since $O(n)$ time suffices to build an ideal BB[$\alpha$]-tree
from a sorted list. For $d > 1$ we construct the primary tree in time $O(n)$ and we have
to construct auxiliary trees of sizes $n_1, \ldots, n_t$. We have $n_1 + \cdots + n_t \le n \cdot (\log n)/m$
since every point is stored in $(\log n)/m$ auxiliary trees. Note that the primary tree
has depth $\log n$ since it is ideal. Hence
$$\begin{aligned}
T_m(d, n) &= O(n) + \sum_i T_m(d-1, n_i) \\
          &= O(n) + O\Big( \sum_i n_i \cdot (\log n/m)^{d-2} \Big) \\
          &= O(n \cdot (\log n/m)^{d-1}).
\end{aligned}$$
If $m = 1$ then ideal $d$-fold trees can be constructed in time $O(n \cdot (\log n)^{\max(1, d-1)})$
and if $m = \varepsilon \log n$ they can be constructed in time $O(d \cdot n \cdot \log n)$.
Lemma 5. Let $Q_m(d, n)$ be the time required to answer a range query in a $d$-fold
tree for a set of $n$ elements. Then $Q_m(d, n) = O(\log n \cdot (c \cdot (2^m/m) \cdot \log n)^{d-1} + |A|)$.
Here $c$ and $m$ are as in Lemma 3.

Proof: The claim is obvious for $d = 1$. So let $d > 1$ and let
$R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ be an orthogonal range query. We search for $l_0$ and $h_0$ in the primary
tree $T_0$. This will define two paths of length at most $c \cdot \log n$ in $T_0$. Consider one
of these paths. There are at most $\log n$ nodes $v$ such that $v$ is a son of one of
the nodes on the path and $v$ is between the two paths. Every such node represents
a subset of points of $S$ whose 0-th coordinate is contained in $[l_0, h_0]$. We have to
solve $(d-1)$-dimensional problems on these subsets. Let $v$ be any such node and
let $v_1, \ldots, v_t$ be the closest descendants of $v$ such that $m$ divides $depth(v_i)$. Then
$t \le 2^{m-1}$ and auxiliary trees exist for all $v_i$'s. Also we can compute $S(v) \cap R$ by
forming the union of $S(v_i) \cap ([l_1, h_1] \times \cdots \times [l_{d-1}, h_{d-1}])$ over all $v_i$'s. Since the
number of $v_i$'s is bounded by $2 \cdot c \cdot ((\log n)/m) \cdot 2^{m-1}$ we have:
$$Q_m(d, n) \le c \cdot (2^m/m) \cdot \log n \cdot Q_m(d-1, n) + |A|.$$
This proves Lemma 5.
If $m = 1$ then $Q_m(d, n) = O(\log n \cdot (2 \cdot c \cdot \log n)^{d-1} + |A|)$ and if $m = \varepsilon \log n$ then
$Q_m(d, n) = O(\log n \cdot (c/\varepsilon)^{d-1} \cdot n^{\varepsilon d})$.
We close our discussion of range trees by discussing insertion and deletion
algorithms. We will show that the amortized cost of an insertion or deletion is
polylogarithmic. Suppose that point $x = (x_0, x_1, \ldots, x_{d-1})$ has to be inserted (deleted).
We search for $x_0$ in the primary tree and insert or delete it, whichever is appropriate.
This has cost $O(\log n)$. Furthermore, we have to insert $x$ into (delete $x$ from) at
most $(c \cdot \log n)/m$ auxiliary trees, $((c \cdot \log n)/m)^2$ auxiliary-auxiliary trees, .... Thus
the total cost of an insertion or deletion is $O(\log n \cdot (c \log n/m)^{d-1})$ not counting
the cost for rebalancing. Rebalancing is done as follows. For every (primary or
auxiliary or auxiliary-auxiliary or ...) tree into which $x$ is inserted (from which $x$ is
deleted) we find a node $v$ of minimal depth which goes out of balance. We replace
the subtree rooted at $v$ by an ideal $d'$-fold tree for the set $S(v)$ of descendants of $v$.
Here $d' = d$ if $v$ is a node of the primary tree, $d' = d - 1$ if $v$ is a node of the primary
tree of an auxiliary tree, ...; $d'$ is called the level of node $v$. This will take time
$O(d' \cdot q \cdot \log q + q \cdot (\log q/m)^{d'-1})$ by Lemma 4
where $q = |S(v)|$. Rebalancing on the last level ($d' = 1$) is done differently. On
level 1 we use the standard algorithm for rebalancing BB[$\alpha$]-trees.
Worst case insertion/deletion cost is now easily computed. It is
$O(d^2 \cdot n \cdot \log n + n \cdot (\log n/m)^{d-1})$, essentially the cost of constructing a new $d$-fold tree
from scratch. Amortized insertion/deletion cost is much smaller as we demonstrate
next. We use Theorem III.5.1.4 to obtain a polylogarithmic bound on amortized
insertion/deletion cost.
Note first that a point $x$ is inserted into (deleted from) at most $((c \log n)/m)^{d-1}$
trees of level 1 for a (worst case) cost of $O(\log n)$ each. Thus total rebalancing cost
on level 1 is $O(\log n \cdot (c \cdot \log n/m)^{d-1})$.
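We next consider levels $l$, $2 \le l \le d$. We showed (Lemmas 2 and 3 in the
proof of Theorem III.5.1.4) that the total number of rebalancing operations caused
by nodes $v$ at level $l$ with $1/(1-\alpha)^i \le |S(v)| \le 1/(1-\alpha)^{i+1}$ during the first
$n$ insertions/deletions is $O(TA_{i,l} \cdot (1-\alpha)^i)$, where $TA_{i,l}$ is the total number of
transactions which go through nodes $v$ at level $l$ with
$1/(1-\alpha)^i \le |S(v)| \le 1/(1-\alpha)^{i+1}$; here $0 \le i \le c \cdot \log n$. The cost of a rebalancing operation caused by
such a node $v$ is $O(l \cdot (1-\alpha)^{-(i+1)} \cdot ((i+1) + ((i+1)/m)^{l-1}))$ by Lemma 4. Also
$TA_{i,l} \le n \cdot ((c \cdot \log n)/m)^{d-l}$ by a simple induction on $l$ starting with $l = d$. Thus
total rebalancing cost at levels $l \ge 2$ is at most
$$\begin{aligned}
&\sum_{2 \le l \le d}\ \sum_{0 \le i \le c \log n} n \cdot ((c \cdot \log n)/m)^{d-l} \cdot (1-\alpha)^i \cdot l \cdot (1-\alpha)^{-(i+1)} \cdot \big( (i+1) + ((i+1)/m)^{l-1} \big) \\
&\quad = O\Big( \sum_{2 \le l \le d} n \cdot ((c \cdot \log n)/m)^{d-l} \cdot l \cdot \big( (c \log n)^2 + (c \cdot \log n/m)^l \cdot (m/l) \big) \Big) \\
&\quad = O(n \cdot (m^2 + m \cdot d) \cdot ((c \cdot \log n)/m)^d).
\end{aligned}$$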


We summarize in
Lemma 6. Amortized insertion/deletion cost in $d$-fold range trees with slack
parameter $m$ is $O((m^2 + m \cdot d) \cdot ((c \cdot \log n)/m)^d)$.

Proof: By the preceding discussion.

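Theorem 6. $d$-fold range trees with slack parameter $m \ge 1$ and balance parameter
$\alpha \in (1/4, 1 - \sqrt{2}/2)$ for a set of $n$ elements take space $O(n \cdot ((c \cdot \log n)/m)^{d-1})$, support
orthogonal range queries with time bound $O(\log n \cdot (c \cdot (2^m/m) \cdot \log n)^{d-1} + |A|)$,
and have amortized insertion/deletion cost $O((m^2 + m \cdot d) \cdot ((c \cdot \log n)/m)^d)$. Here
$c = 1/\log(1/(1 - \alpha))$. In particular, we have:

slack                     space                             query time                                                    insertion/deletion time
$1$                       $n \cdot (c \cdot \log n)^{d-1}$  $\log n \cdot (2 \cdot c \cdot \log n)^{d-1}$                  $d \cdot (c \cdot \log n)^d$
$\varepsilon \log n$      $n \cdot (c/\varepsilon)^{d-1}$   $(c/\varepsilon)^{d-1} \cdot n^{\varepsilon d} \cdot \log n$   $(c/\varepsilon)^d \cdot ((\varepsilon \log n)^2 + d \cdot \varepsilon \cdot \log n)$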

Proof : Immediate from Lemmas 1 to 6.


Search trees are always examples for divide and conquer. Range trees and
dd-trees exemplify a variant of divide and conquer which is particularly useful for
multidimensional problems: multidimensional divide and conquer. A problem of
size $n$ in $d$-space is solved by reducing it to two problems of size at most $n/2$ in
$d$-space and one problem of size at most $n$ in $(d-1)$-dimensional space. Range
trees (with slack parameter $m = 1$) fit well into this paradigm. A set of size $n$ is
split into two subsets of size $n/2$ each at the root. In addition, an auxiliary tree is
associated with the root which solves the $(d-1)$-dimensional range query problem
for the entire set. Other applications of multidimensional divide and conquer are
dd-trees and polygon trees, domination problems (Exercises 22, 23) and the closest
point problem (Exercises 24, 25).
In the fixed radius near neighbors problem we are given a set $S \subseteq \mathbb{R}^d$ and a real
$\delta > 0$ and are asked to compute the set of all pairs $(x, y) \in S \times S$ such that
$dist_2(x, y) < \delta$. Here $dist_2(x, y) = (\sum_{0 \le i < d} (x_i - y_i)^2)^{1/2}$ is the Euclidean or
$L_2$-norm, but similar approaches work for other norms. We denote the set of such pairs
by $NN(S)$. Of course, $NN(S)$ might be as large as $n^2$, $n = |S|$, if the points of $S$
lie very dense. In most applications dense sets do not arise. We therefore restrict
our considerations to sparse sets.

Definition: Let $\delta > 0$, $c > 0$. Set $S \subseteq \mathbb{R}^d$ is $(\delta, c)$-sparse if for every $x \in \mathbb{R}^d$ we
have $|\{y \in S;\ dist_2(x, y) < \delta\}| < c$, i.e., any sphere of radius $\delta$ contains at most
$c$ points of $S$.

If $S$ is $(\delta, c)$-sparse then $|NN(S)| \le c \cdot n$, i.e., the size of the output is at most
linear. We apply the paradigm of multidimensional divide and conquer to solve the
fixed radius near neighbors problem.
If d = 1 then a simple method will do. Sort set S in time $O(n \log n)$ and
then make one linear scan through (the sorted version of) S. For every point $x \in S$
look at the preceding points in the linear order and find out which of them have
distance less than $\epsilon$ from x. In this way, we can produce NN(S) in time $O(c \cdot n)$ from
the sorted list. Altogether we have an $O(n \log n + c \cdot n)$ algorithm in one-dimensional
space.
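The one-dimensional method can be sketched as follows. This is a minimal illustration, not a tuned implementation; the function name `near_pairs_1d` and the representation of points as a list of numbers are assumptions made for the example.

```python
def near_pairs_1d(xs, eps):
    """Return all index pairs (i, j) with |xs[i] - xs[j]| < eps.

    Sort once, then for every point walk backwards through the sorted
    order until the gap reaches eps. For a sparse input the backward
    walk is short, so the scan is linear after sorting.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    pairs = []
    for pos, i in enumerate(order):
        k = pos - 1
        while k >= 0 and xs[i] - xs[order[k]] < eps:
            pairs.append((order[k], i))  # earlier point first
            k -= 1
    return pairs
```

Each reported pair is found exactly once, namely when the scan reaches the later of its two points in sorted order.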
If $d \ge 2$ then we project S onto the 0-th coordinate and find the median
of the multiset $p_0(S)$ of projected points. Let that median be m. We split S
into two sets A and B of n/2 points each, namely A contains only points $x \in S$
with $x_0 \le m$ and B contains only points $x \in S$ with $x_0 \ge m$. We apply the
algorithm recursively to the d-dimensional point sets A and B. This will compute
all pairs $(x, y) \in NN(S)$ where both points are in either A or B. It remains to
compute pairs $(x, y) \in NN(S)$ with $x \in A$ and $y \in B$. If $x \in A$, $y \in B$ and
$(x, y) \in NN(S)$ then x and y both belong to the slab SL of width $2\epsilon$ around
hyperplane $x_0 = m$, i.e., $SL = \{x = (x_0, \ldots, x_{d-1}) \in S;\ |x_0 - m| < \epsilon\}$. So all we
have to do is to solve NN on point set SL. SL is not quite (d-1)-dimensional.
We make it (d-1)-dimensional by projecting the points in SL onto hyperplane
$x_0 = m$, i.e., we compute $S' = \{x';$ there is $x = (x_0, \ldots, x_{d-1}) \in SL$ such that
$x' = (x_1, \ldots, x_{d-1})\}$. The crucial observation is that S' is still sparse, and that
NN(S') "contains" NN(SL).
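The divide step, the slab, and the projection can be put together in a short sketch. It is an illustration under stated assumptions rather than the algorithm as analyzed in the text: the name `near_pairs` is hypothetical, points are tuples of equal length, the median is found by sorting instead of a linear-time selection, and candidate pairs returned for the projected slab are filtered by their distance in the current coordinates.

```python
from math import sqrt


def near_pairs(points, eps):
    """Return the set of index pairs (i, j), i < j, with Euclidean distance < eps."""
    if not points:
        return set()

    def dist(i, j, dims):
        return sqrt(sum((points[i][c] - points[j][c]) ** 2 for c in dims))

    def solve(idx, dims):
        # Returns all pairs of idx that are closer than eps when only the
        # coordinates listed in dims are taken into account.
        if len(idx) < 2:
            return set()
        c = dims[0]
        order = sorted(idx, key=lambda i: points[i][c])
        if len(dims) == 1:
            # One-dimensional base case: scan the sorted list backwards.
            out = set()
            for pos, i in enumerate(order):
                k = pos - 1
                while k >= 0 and points[i][c] - points[order[k]][c] < eps:
                    j = order[k]
                    out.add((min(i, j), max(i, j)))
                    k -= 1
            return out
        half = len(order) // 2
        m = points[order[half]][c]  # coordinate of the dividing hyperplane
        out = solve(order[:half], dims) | solve(order[half:], dims)
        # Slab of width 2*eps around the hyperplane, projected to dims[1:].
        slab = [i for i in order if abs(points[i][c] - m) < eps]
        for i, j in solve(slab, dims[1:]):
            if dist(i, j, dims) < eps:  # throw out pairs that are too far apart
                out.add((i, j))
        return out

    return solve(list(range(len(points))), tuple(range(len(points[0]))))
```

Working with indices rather than coordinates keeps distinct points apart even when their projections coincide.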

Lemma 7.
a) If S is $(\epsilon, c)$-sparse then S' is $(\epsilon, (1 + 2^d) \cdot c)$-sparse.
b) If $x, y \in SL$ and $dist_2(x, y) < \epsilon$ then $dist_2(x', y') < \epsilon$.
Proof: a) Consider any point $x' = (m, x_1, \ldots, x_{d-1})$ on hyperplane $x_0 = m$.
We have to compute a bound on the number of points in the "strange" sphere
$SSPH(x') = \{y \in S;\ |y_0 - m| < \epsilon$ and $(\sum_{1 \le i < d} (x_i - y_i)^2)^{1/2} < \epsilon\}$ with center x'
because exactly the projections of the points in SSPH(x') have distance less than $\epsilon$
from x' in the (d-1)-dimensional set S'. It is easy to see (cf. Figure 10 for an
illustration in 2-space) that SSPH(x') can be covered with $(1 + 2^d)$ d-dimensional
spheres of radius $\epsilon$. Any such sphere can contain at most c points of S and hence
$|SSPH(x')| \le (1 + 2^d) \cdot c$. This shows that S' is $(\epsilon, (1 + 2^d) \cdot c)$-sparse.
Figure 10. Illustration in 2-space.
b) Obvious.
Lemma 7 holds true for other norms as well; however, the factor $(1 + 2^d)$ in 7a) depends
on the norm. We infer from Lemma 7 that we can compute NN(SL) by solving
the (d-1)-dimensional problem on S' and then going through list NN(S') and
throwing out some pairs. This leads to
Theorem 7. Let d be fixed and let $S \subseteq \mathbb{R}^d$ be $(\epsilon, c)$-sparse. Then NN(S) can be
computed in time

$$O(n \cdot (\log n)^d / d! + (1 + \hat{c}) \cdot n \cdot (\log n)^{d-1} / (d-2)!),$$

where $\hat{c} = c \cdot \prod_{2 \le i \le d} (1 + 2^i)$ and $n = |S|$.

Proof: We will first derive a recurrence on T(i, n), the time to compute NN(S)
for any $(\epsilon, c)$-sparse set S, $|S| = n$ and $S \subseteq \mathbb{R}^i$, $i \le d$. We have
$T(i, n) \le 2 \cdot T(i, n/2) + T(i-1, n) + O(n)$
since in order to solve an i-dimensional problem on n points we spend O(n) time
on computing the median and splitting the set and then solve two i-dimensional
problems on n/2 points each and one (i-1)-dimensional problem on at most n
points. Also
$T(i, 1) = 0$
since subproblems of size 1 are trivial and
$T(1, n) = O(n \cdot \log n + \hat{c} \cdot n)$
since all one-dimensional problems generated are $(\epsilon, \hat{c})$-sparse by Lemma 7a) and
therefore can be solved in time $O(n \cdot \log n + \hat{c} \cdot n)$. It is not too hard to verify by
induction on n and i that $T(i, n) = O(n \cdot (\log n)^i / i! + (1 + \hat{c}) \cdot n \cdot (\log n)^{i-1} / (i-1)!)$.
We leave this for Exercise 25. We will rather show how one arrives at the bound
for T(i, n).
Observe first that it suffices to study the recurrence
$U(i, n) = 2 \cdot U(i, n/2) + U(i-1, n) + n$ for $i \ge 2$, $n \ge 2$
$U(i, 1) = 0$ for $i \ge 1$
$U(1, n) = n \cdot \log n + \hat{c} \cdot n$ for $n \ge 2$
because we have T(i, n) = O(U(i, n)). We solve this recurrence for n a power of
two. Let $V(i, k) = U(i, 2^k)/2^k$. By substitution we obtain
$V(i, k) = V(i, k-1) + V(i-1, k) + 1$ for $i \ge 2$, $k \ge 1$
$V(i, 0) = 0$ for $i \ge 1$
$V(1, k) = k + \hat{c}$ for $k \ge 1$
This is further simplified by setting $V(i, k) = W(i, k) - 1$. Then
$W(i, k) = W(i, k-1) + W(i-1, k)$ for $i \ge 2$, $k \ge 1$
$W(i, 0) = 1$ for $i \ge 1$
$W(1, k) = k + 1 + \hat{c}$ for $k \ge 1$
If the boundary conditions were simpler, namely all equal to one, then this recursion
would have a simple combinatorial interpretation. It counts a set of paths. More precisely,
if
$X(i, k) = X(i, k-1) + X(i-1, k)$ for $i \ge 1$, $k \ge 1$
$X(i, 0) = X(0, k) = 1$ for $i, k \ge 0$
then X(i, k) is exactly the number of paths from the origin (0, 0) to point (i, k) where
the set of edges consists of unit length horizontal and vertical lines.
Every path from (0, 0) to (i, k) has length (number of edges) i + k and contains
exactly i horizontal edges. Hence $X(i, k) = \binom{i+k}{i}$. In particular, X(1, k) = k + 1. It
is now easy to express W in terms of X. Write $W(i, k) = W_1(i, k) + W_2(i, k)$ where
$W_j(i, k) = W_j(i-1, k) + W_j(i, k-1)$ for $j = 1, 2$, $i \ge 2$, $k \ge 1$
$W_1(1, k) = k + 1$, $W_2(1, k) = \hat{c}$ for $k \ge 1$
$W_1(i, 0) = 1$, $W_2(i, 0) = 0$ for $i \ge 1$

 k
 3 |  1   4  10  20
 2 |  1   3   6  10
 1 |  1   2   3   4
 0 |  1   1   1   1
   +-----------------
      0   1   2   3   i

Figure 11. Values of X(i, k).


Then $W_1(i, k) = X(i, k)$ and $W_2(i, k) = \hat{c} \cdot X(i-1, k-1)$ and therefore $W(i, k) =
X(i, k) + \hat{c} \cdot X(i-1, k-1)$. Reversing all substitutions we obtain
$$T(i, n) = O\left(n \cdot \binom{i + \log n}{i} + \hat{c} \cdot n \cdot \binom{i - 1 + \log n - 1}{i - 1}\right)$$
for n a power of two. Finally using the approximation
$$\binom{a+b}{a} = \frac{b^a}{a!} + O\left(\frac{b^{a-1}}{(a-2)!}\right)$$
for a fixed and b growing we have
$$T(d, n) = O(n \cdot (\log n)^d / d! + (1 + \hat{c}) \cdot n \cdot (\log n)^{d-1} / (d-2)!)$$
for n a power of two. It is now tedious but straightforward to verify by induction
that this formula holds for all n (Exercise 25).
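The closed form for W used in this proof can be checked mechanically for small arguments. The sketch below evaluates the recurrence for W directly and compares it with $X(i, k) + \hat{c} \cdot X(i-1, k-1)$, where X is a binomial coefficient. The function name `closed_form_holds`, the use of an integer value for $\hat{c}$ (to keep the arithmetic exact), and the convention $X(i, -1) = 0$ are choices made for this example.

```python
from functools import lru_cache
from math import comb


def closed_form_holds(c_hat, max_i, max_k):
    """Check W(i,k) == X(i,k) + c_hat * X(i-1,k-1) for 1 <= i <= max_i, 0 <= k <= max_k."""

    @lru_cache(maxsize=None)
    def W(i, k):
        if k == 0:
            return 1                      # W(i, 0) = 1
        if i == 1:
            return k + 1 + c_hat          # W(1, k) = k + 1 + c_hat
        return W(i, k - 1) + W(i - 1, k)  # recurrence for i >= 2, k >= 1

    def X(i, k):
        # Number of lattice paths from (0, 0) to (i, k); zero outside the grid.
        return comb(i + k, i) if k >= 0 else 0

    return all(
        W(i, k) == X(i, k) + c_hat * X(i - 1, k - 1)
        for i in range(1, max_i + 1)
        for k in range(max_k + 1)
    )
```

For example, `closed_form_holds(7, 6, 10)` compares all 66 entries with `1 <= i <= 6` and `0 <= k <= 10`.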
We will next describe two improvements upon the basic algorithm for the fixed
radius near neighbors problem. Presorting, the first improvement, is of general
interest and was used already in the proof of Lemma 2; the strategy of finding good
dividing lines, the second improvement, helps only in a few situations.
We observed already that the one-dimensional problem can be solved in linear
time if set S is sorted, but that it takes time $O(n \log n)$ for general inputs. When
we solve a two-dimensional problem we reduce it to a collection of one-dimensional
problems of total size $O(n \log n)$. (The recurrence for the total size S(n) of all
one-dimensional problems generated from a two-dimensional problem is $S(n) =
n + 2 \cdot S(n/2)$ which solves to $S(n) = \Theta(n \cdot \log n)$.) We have to sort all these problem
instances for a total cost of $O(n \cdot (\log n)^2)$. A better strategy is to sort all of S
according to y-coordinate once and then to pull out only sorted subproblems in
the divide step. If we proceed according to this strategy then all one-dimensional
problem instances generated are sorted and hence can be solved in linear time. Thus
two-dimensional problems can be solved in time $O(n \cdot \log n)$. This generalizes to
Theorem 8. Let $d \ge 2$ be fixed and let $S \subseteq \mathbb{R}^d$ be $(\epsilon, c)$-sparse. Then NN(S)
can be computed in time $O((1 + \hat{c}) \cdot n \cdot (\log n)^{d-1} / (d-2)!)$ where $n = |S|$ and
$\hat{c} = c \cdot \prod_{2 \le i \le d} (1 + 2^i)$.

Proof: We sort S once according to the last coordinate in time $O(n \log n)$. Then
we proceed as described above. With a little care all subproblems generated are
also sorted. Hence we obtain the same recurrence as in the proof of Theorem 7 with
the only change that $T(1, n) = O(n + \hat{c} \cdot n)$ now. This will save one factor of log n
throughout.
Theorems 7 and 8 derive upper bounds on the performance of a multidimensional
divide and conquer algorithm for the fixed radius near neighbors problem. Are there
any sets $S \subseteq \mathbb{R}^d$ where this upper bound is actually achieved? Let us look at
the two-dimensional case. If the points of S crowd into a very narrow vertical slab,
say of width less than $\epsilon$, then all subproblems generated will indeed have maximal size
and so our algorithm will run very long. A similar observation holds true in higher
dimensional space. However, this observation also suggests a major improvement
upon the basic algorithm. There is no a priori reason for only looking at vertical
dividing lines; we can also look for horizontal dividing lines and choose whatever
is better. A "good" dividing line is a line which divides set S into (nearly) equal
parts, defines a small (size $O(\sqrt{n})$) lower dimensional subproblem, and is easy to
find. Good dividing lines always exist. We content ourselves with a discussion of
two-dimensional space and leave the general case to the reader.
Lemma 8. Let $S \subseteq \mathbb{R}^2$, $|S| = n$, be $(\epsilon, c)$-sparse. Then there exists a line L
orthogonal to one of the axes such that
1) no half-space defined by L contains more than $4 \cdot n/5$ points of S;
2) the slab of width $2\epsilon$ around L contains at most $\sqrt{36 \cdot c \cdot n/5}$ points of S.
Proof: For i, i = 0, 1, let $l_i = \min\{a;\ x_i \le a$ for at least n/5 points of S$\}$ and
$h_i = \min\{a;\ x_i \le a$ for at least $4 \cdot n/5$ points of S$\}$. Next consider lines $L_{ij} = \{y \in
\mathbb{R}^2;\ y_i = l_i + (2j + 1) \cdot \epsilon\}$, $0 \le j \le (h_i - l_i)/2\epsilon - 1$, and the slabs of width $2\epsilon$
around them, i.e., $SL_{ij} = \{y \in \mathbb{R}^2;\ l_i + 2j\epsilon < y_i < l_i + 2(j + 1)\epsilon\}$.

Figure 12. The lines $L_{i0}, L_{i1}, L_{i2}, \ldots$ between $l_i$ and $h_i$ and the slabs of width $2\epsilon$ around them.
Claim:
a) For every i and j: No half-space defined by $L_{ij}$ contains more than $4 \cdot n/5$
points of S.
b) For every i: If $(h_i - l_i)/2\epsilon \ge \sqrt{n/(20c)}$ then there is a j such that $|S \cap SL_{ij}| \le
\sqrt{36 \cdot c \cdot n/5}$.
c) There is an i such that $(h_i - l_i)/2\epsilon \ge \sqrt{n/(20c)}$.
Proof: a) Since there are n/5 points x of S with $x_i \le l_i$ there are clearly that many
points with $x_i \le l_i + (2j + 1) \cdot \epsilon$. Also there are less than $4 \cdot n/5$ points $x \in S$ with
$x_i < h_i$ and hence less than $4 \cdot n/5$ points $x \in S$ with $x_i \le l_i + (2j + 1) \cdot \epsilon < h_i$.
This proves a).
b) Slabs $SL_{ij}$, $j \ge 0$, are pairwise disjoint and contain at most $3 \cdot n/5$ points of
S together. If $(h_i - l_i)/2\epsilon \ge \sqrt{n/(20c)}$ then there are at least $\sqrt{n/(20c)}$ slabs, and
hence there must be one j such that $|S \cap SL_{ij}| \le (3n/5)/\sqrt{n/(20c)} = \sqrt{36 \cdot c \cdot n/5}$.

Figure 13. Illustration of part c).
c) Assume otherwise. Then $(h_i - l_i) < \epsilon \cdot \sqrt{n/(5c)}$ for i = 0, 1. Let $R_i = \{y \in
\mathbb{R}^2;\ l_i \le y_i \le h_i\}$ and let $C = R_0 \cap R_1$. Furthermore, let $f = |C \cap S|$ and
$n_i = |(R_i - C) \cap S|$. Then $f + n_i \ge 3 \cdot n/5$ since $|R_i \cap S| \ge 3 \cdot n/5$ and $n_0 + n_1 + f \le n$
since sets $R_0 - C$, $R_1 - C$, C are pairwise disjoint. Thus $n \ge n_0 + n_1 + f =
(n_0 + f) + (n_1 + f) - f \ge 6 \cdot n/5 - f$ or $f \ge n/5$. C is a rectangle whose sides have
length at most $\epsilon \cdot \sqrt{n/(5c)}$ and is hence easily covered by $n/(5c)$ circles of radius $\epsilon$.
Since S is $(\epsilon, c)$-sparse any such circle contains at most c points of S and hence
$f < (n/(5c)) \cdot c = n/5$, a contradiction.

Note that Lemma 8 also suggests a linear algorithm for finding a good dividing line.
Compute $l_0, h_0, l_1, h_1$ in linear time using the linear median algorithm (Section 2.4).
Let us assume w.l.o.g. that $(h_0 - l_0) \ge \epsilon \cdot \sqrt{n/(5c)}$. The proof of Lemma 8 shows
that one of the slabs $SL_{0,j}$, $0 \le j \le \sqrt{n/(20c)}$, contains at most $\sqrt{36 \cdot c \cdot n/5}$ points
of S. The number of points in these slabs can be determined in linear time by
bucket sort (Section 2.2.2). Thus a good dividing line can be determined in linear
time. We obtain the following recurrence for T(2, n), the time to compute NN(S)
for an $(\epsilon, c)$-sparse set $S \subseteq \mathbb{R}^2$, $|S| = n$.

$$T(2, n) = \max_{n/5 \le n_1 \le 4n/5} [T(2, n_1) + T(2, n - n_1)] + T(1, \sqrt{36 \cdot c \cdot n/5}) + O(n).$$

Since $T(1, n) = O(n \log n)$ we conclude

$$T(2, n) = \max_{n/5 \le n_1 \le 4n/5} [T(2, n_1) + T(2, n - n_1)] + O(n).$$

Theorem 9. The good dividing line approach to the fixed radius near neighbors
problem leads to an $O(n \log n)$ algorithm in 2-dimensional space.

Proof: In Section 3.5.1, Theorem 2a), we showed that the recurrence above has
solution $T(2, n) = O(n \log n)$.
Theorem 9 also holds true in higher-dimensional space. In d-space one can always
find a dividing hyperplane which splits S into nearly equal parts (1/5 to 4/5 at the
worst) and such that the slab around this hyperplane contains at most $O(n^{1-1/d})$
points. This leads directly to an $O(n \log n)$ algorithm in d-space (Exercise 26).

7.2.3. Lower Bounds

This section is devoted to lower bounds. We cover two approaches. The first
approach deals with partial match retrieval in minimum space and shows that rootic
search time is the best we can hope for. In particular, we show that dd-trees are
an optimal data structure. The second, more general approach deals with a wide
class of dynamic multi-dimensional region searching problems. A region searching
problem (cf. introduction to 7.2) over universe U is specified by a class $\Gamma \subseteq 2^U$
of regions. We show that the cost of insert, delete and query operations can be
bounded from below by a combinatorial quantity, the spanning bound of class $\Gamma$.
The spanning bound is readily computed for polygon and orthogonal range queries
and can be used to show that polygon trees and range trees are nearly optimal.

7.2.3.1. Partial Match Retrieval in Minimum Space

dd-trees are a solution for the partial match retrieval problem with rootic search
time and linear space. In fact, dd-trees are a minimum space solution because
dd-trees are easily stored as linear arrays. Figure 14 shows an ideal dd-tree
for the (invertible) set S = {(1, II), (2, IV), (3, III), (4, V), (5, I)} and its representation
as an array. The correspondence between tree and array is the same as for binary
search (cf. Section 3.3.1).

Figure 14. An ideal dd-tree and its representation as an array with rows
(1, II), (2, IV), (3, III), (5, I), (4, V).

The aim of this section is to show that dd-trees are an optimum minimum space
solution for the partial match retrieval problem; more precisely, we show that $\Omega(n^{1-1/d})$
is a lower bound on the time complexity of partial match retrieval in d-dimensional
space with one specified component in a decision tree model of computation. The
exact model of computation is as follows.
Let $S_n$ be the set of permutations of elements 0, 1, ..., n-1. For $\pi_1, \ldots, \pi_{d-1} \in
S_n$ let $A(\pi_1, \ldots, \pi_{d-1}) = \{(i, \pi_1(i), \ldots, \pi_{d-1}(i));\ 0 \le i < n\}$ and let $I_n =
\{A(\pi_1, \ldots, \pi_{d-1});\ \pi_1, \ldots, \pi_{d-1} \in S_n\}$. Then $|I_n| = (n!)^{d-1}$. $I_n$ is the class of
invertible d-dimensional sets of cardinality n with components drawn from the
range 0, 1, ..., n-1. We restrict ourselves to this range because in the decision
tree model of computation only the relative size of elements is relevant. A decision
tree algorithm for the partial match retrieval problem of size n consists of
1) a storage assignment SA which specifies for every $A \in I_n$ the way of storing
   A in a table M[0..n-1, 0..d-1] with n rows and d columns, i.e., $SA :
   (S_n)^{d-1} \to S_n$ with the following interpretation. For all $\pi_1, \ldots, \pi_{d-1} \in S_n$ and
   $\sigma = SA(\pi_1, \ldots, \pi_{d-1})$: Tuple $(i, \pi_1(i), \ldots, \pi_{d-1}(i))$ of set $A(\pi_1, \ldots, \pi_{d-1})$ is
   stored in row $\sigma(i)$ of table M, i.e., $M[\sigma(i), j] = \pi_j(i)$ for $0 \le j \le d-1$,
   $0 \le i < n$. Here $\pi_0$ is the identity permutation.
2) d decision trees $T_0, \ldots, T_{d-1}$. Trees $T_j$ are ternary trees. The internal nodes
   of tree $T_j$ are labelled by expressions of the form X ? M[i, j] where $0 \le i < n$.
   The three edges out of a node are labelled <, = and >. Leaves are labelled yes
   or no.
A decision tree algorithm is used as follows. Let $A \in I_n$, let $y \in \mathbb{R}$ and let $j \in
[0..d-1]$. In order to decide whether there is $x = (x_0, x_1, \ldots, x_{d-1}) \in A$ with
$x_j = y$ we store A in table M as specified by SA and then use decision tree $T_j$ to
decide the question, i.e., we compare y with elements in the j-th column of M as
prescribed by $T_j$.

Theorem 10. If $SA, T_0, \ldots, T_{d-1}$ solves the partial match retrieval problem of
size n in d-space then there is a j such that $depth(T_j) = \Omega(n^{1-1/d})$, i.e., the worst
case time complexity of a decision tree algorithm for the partial match retrieval
problem is $\Omega(n^{1-1/d})$.

Proof: The proof consists of two parts. In the first part we reformulate the problem
as a membership problem and in the second part we actually derive a lower bound.
Consider tree $T_j$. It decides whether there is a tuple $x = (x_0, \ldots, x_{d-1}) \in A$
with $x_j = y$, i.e., it decides membership of y in the projection of A onto the j-th
coordinate. It does so by searching in an array, namely the j-th column of table M.
The j-th column of table M contains n distinct elements, here integers 0, ..., n-1.
The crucial observation is that these n different elements appear in many different
orderings. This observation leads to the following definitions.
For $0 \le j < d$ let OT(j) be the set of order types occurring in the j-th column,
i.e.,
$OT(j) = \{\tau \in S_n;$ there are $\pi_1, \ldots, \pi_{d-1} \in S_n$ such that
$\tau = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}\}$.
This definition needs some explanation. Let $\pi_1, \ldots, \pi_{d-1} \in S_n$, let $\sigma = SA(\pi_1, \ldots,
\pi_{d-1})$, and let $A = A(\pi_1, \ldots, \pi_{d-1})$. When set A is stored in table M then tuple
$(i, \pi_1(i), \ldots, \pi_{d-1}(i))$ is stored in row $\sigma(i)$ of table M, i.e., $M[\sigma(i), j] = \pi_j(i)$. In
other words, $M[\sigma(\pi_j^{-1}(l)), j]$ contains integer l, $0 \le l < n$, i.e., $\sigma \circ \pi_j^{-1}$ is one of the
order types occurring in the j-th column.
Lemma 9. There is a j such that $|OT(j)| \ge (n!)^{1-1/d}$.
Proof: The discussion following the definition of OT(j) shows that the mapping
$(\pi_1, \ldots, \pi_{d-1}) \mapsto (\tau_0, \ldots, \tau_{d-1})$ where $\tau_j = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}$ is injective.
Hence $\prod_{0 \le j \le d-1} |OT(j)| \ge (n!)^{d-1}$.
Next, we describe precisely the computational power of decision trees $T_j$. Let
$\Pi \subseteq S_n$ be a set of permutations. A decision tree T solves problem $SST(\Pi)$
(searching semi-sorted tables) if for every $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$, every
x and every $\tau \in \Pi$: If B is stored in linear array M[0..n-1] according to order
type $\tau$, i.e., $M[\tau(l)] = x_l$ for $0 \le l < n$, then T correctly decides $x \in B$.
Lemma 10. $T_j$ solves SST(OT(j)) for $0 \le j \le d-1$.
Proof: Note first that $T_j$ solves SST(OT(j)) for every $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$
if it does so for B = {0, 1, ..., n-1}. Next let $\tau \in OT(j)$. Then there must be
$\pi_1, \ldots, \pi_{d-1}$ such that $\tau = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}$. In particular, if our partial
match retrieval algorithm is applied to set $A = A(\pi_1, \ldots, \pi_{d-1})$ then A is stored in
table M[0..n-1, 0..d-1] such that $M[\tau(l), j] = l$ for all l; i.e., B = {0, ..., n-1}
is stored in the j-th column of M according to order type $\tau$. Thus $T_j$ solves
SST(OT(j)).

Lemmas 9 and 10 reduce the partial match retrieval problem to the searching
semi-sorted tables problem. Lemma 11 gives a lower bound on the complexity of the
latter problem.
Lemma 11. Let $\Pi \subseteq S_n$ and let decision tree T solve $SST(\Pi)$.
a) For every injective mapping $\tau : [0..k-1] \to [0..n-1]$: $|\{\pi(k);\ \pi \in \Pi$ and
$\pi(i) = \tau(i)$ for $0 \le i < k\}| \le depth(T)$.
b) $|\Pi| \le depth(T)^n$.
Proof: b) is a simple consequence of part a). Namely, let $\Pi_k = \{\pi|_{[0..k-1]};\ \pi \in \Pi\}$.
Then $|\Pi_0| = 1$ and $|\Pi_{k+1}| \le depth(T) \cdot |\Pi_k|$ by part a). Hence $|\Pi| = |\Pi_n| \le
depth(T)^n$.
a) Let $\tau : [0..k-1] \to [0..n-1]$ be injective, let $\pi \in \Pi$ with $\pi(i) = \tau(i)$ for
$0 \le i < k$, and let $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$ be stored in table M[0..n-1] according
to $\pi$. Consider a search for x, $x_{k-1} < x < x_k$. It defines a path in tree T leading to a leaf
which is labelled "no". This path is the same for all such $\pi$, because x is larger than the
elements in positions $\tau(0), \ldots, \tau(k-1)$ and smaller than the elements in all other positions.
On this path x is compared with at most depth(T) distinct table positions,
say $M[i_1], \ldots, M[i_h]$, $h \le depth(T)$. We claim $\pi(k) = i_l$ for some l, $1 \le l \le h$.
Assume otherwise. Then $M[i_l] \ne x_k$ for all l. Consider a search for $x = x_k$.
It will lead to exactly the same leaf because the outcome of all comparisons is
unchanged. Hence T decides that $x_k$ does not belong to B, a contradiction. We
have thus shown that $\pi(k) = i_l$ for some l, $1 \le l \le h \le depth(T)$.
Theorem 10 is now an immediate consequence of Lemmas 9, 10 and 11. By Lemma 9,
there is a j with $|OT(j)| \ge (n!)^{1-1/d}$. By Lemma 10, $T_j$ solves SST(OT(j)) and
hence has depth at least $|OT(j)|^{1/n}$ by Lemma 11. Finally, $|OT(j)|^{1/n} \ge ((n!)^{1-1/d})^{1/n} =
((n!)^{1/n})^{1-1/d} = \Omega(n^{1-1/d})$ since $n! \ge \sqrt{2 \pi n} \cdot (n/e)^n$ by Stirling's approximation.

It is open whether Theorem 10 is also valid for more general models of computation.
In particular, it is not known whether the lower bound is valid in a more general
decision tree model where comparisons of the form M[i, j] ? M[h, j] are also allowed.
It is conceivable that comparisons of this form can speed up searches considerably,
because they can be used to infer information about the storage assignment. This
point is followed up in Exercise 29. We should also emphasize at this point that the
restriction to minimum space solutions, which is captured in the definition of storage
assignment, is essential for the argument. After all, range trees provide us with
polylogarithmic search time if we are willing to use non-linear space. Exercises 30 to
32 discuss various extensions.
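The last step in the proof of Theorem 10 bounds $((n!)^{1/n})^{1-1/d}$ from below by a constant times $n^{1-1/d}$. A small numerical sketch makes the size of this quantity concrete. The function name `depth_lower_bound` is hypothetical, and the value is computed in floating point via `math.lgamma`, so it is an approximation rather than an exact bound.

```python
from math import e, exp, lgamma


def depth_lower_bound(n, d):
    """Approximate ((n!)^(1/n))^(1 - 1/d), the depth bound used for Theorem 10."""
    log_factorial = lgamma(n + 1)  # natural logarithm of n!
    return exp((1 - 1 / d) * log_factorial / n)


def stirling_floor(n, d):
    """(n/e)^(1 - 1/d), which n! >= (n/e)^n shows to be no larger than the bound."""
    return (n / e) ** (1 - 1 / d)
```

For d = 2 both quantities grow like the square root of n; the first always dominates the second.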

7.2.3.2. The Spanning Bound

We introduce the spanning bound and use it to prove lower bounds on the
complexity of polygon retrieval and orthogonal range queries.
We will first define the region searching problem in an abstract setting. Let U
be the key space, let M be a commutative monoid (i.e., a set M with a commutative,
associative operation $+ : M \times M \to M$ and an element $0 \in M$ such that x + 0 = x
for all $x \in M$) and let $\Gamma \subseteq 2^U$ be a set of regions of U. The $\Gamma$-region searching problem
is to (efficiently) maintain a partial function $S : U \to M$ under the operations
Insert(x, m):  precondition: $x \notin dom\, S$, $x \in U$, $m \in M$
               effect: $S \leftarrow S \cup \{(x, m)\}$
Delete(x):     precondition: $x \in dom\, S$, $x \in U$
               effect: $dom\, S \leftarrow dom\, S - \{x\}$
Query(R):      precondition: $R \in \Gamma$
               effect: output $\sum_{x \in R \cap dom\, S} S(x)$
This is in complete agreement with our previous discussion of searching problems. U
is the key space. The problem is to maintain a set of pairs (x, m), where $x \in U$,
$m \in M$; m is the "information" associated with key x. Insert and Delete add and
delete pairs and Query sums the information over a region R.
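The three operations can be pinned down by a deliberately naive reference implementation. It is only meant to fix the semantics: every query scans the whole domain, so it says nothing about efficiency. The class name `NaiveRegionSearch`, the representation of a region as a predicate on keys, and passing the monoid as a pair `(plus, zero)` are assumptions made for this sketch.

```python
class NaiveRegionSearch:
    """Maintain a partial function S: U -> M over a commutative monoid (plus, zero)."""

    def __init__(self, plus, zero):
        self.plus = plus
        self.zero = zero
        self.S = {}  # the partial function, stored as key -> monoid value

    def insert(self, x, m):
        if x in self.S:
            raise KeyError("precondition violated: x is already in dom S")
        self.S[x] = m

    def delete(self, x):
        if x not in self.S:
            raise KeyError("precondition violated: x is not in dom S")
        del self.S[x]

    def query(self, region):
        # Sum S(x) over all x in the region; region is a predicate on keys.
        total = self.zero
        for x, m in self.S.items():
            if region(x):
                total = self.plus(total, m)
        return total
```

With the monoid of natural numbers under addition and every inserted value equal to 1, a query counts the keys in the region.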
Next we fix the model of computation. There is an infinite supply $v_0, v_1, v_2, \ldots$
of variables which take values in M. Initially, 0 is stored in every variable. The
instruction repertoire consists of $v_i \leftarrow v_j + v_k$, $v_i \leftarrow$ Input, and Output $v_i$, $i, j, k \ge
0$. Exercise 33 discusses a larger instruction repertoire. A program is given by
an (infinite) state space Z, an initial state $z_0 \in Z$ corresponding to the empty
function S, and three functions $f_I, f_D, f_Q$. Here $f_I : U \times M \times Z \to Z \times Ins^*$,
$f_D : U \times Z \to Z \times Ins^*$ and $f_Q : \Gamma \times Z \to Z \times Ins^*$ where $Ins^*$ is the set of
all sequences of instructions from the repertoire. Function $f_I$ has the following
semantics. If the algorithm is in state $z \in Z$, operation Insert(x, m) is to be
executed, and $f_I(x, m, z) = (z', \sigma)$ then z' is the new state and sequence $\sigma \in Ins^*$
is to be executed. The first instruction of $\sigma$ is of the form $v_i \leftarrow$ Input and places
m into register $v_i$. The remaining instructions of $\sigma$ are of the form $v_i \leftarrow v_j + v_k$.
The semantics of $f_D$ and $f_Q$ are defined similarly, i.e., after a deletion a sequence
of additions is executed and after a query a sequence of additions followed by an
output instruction is executed.
A program $Z, z_0, f_I, f_D, f_Q$ is correct if it is correct for all choices of monoid M.
It is correct for a particular choice of M if the answers to all queries are computed
correctly.
The cost of inserting (x, m) in control state z is the number of instructions in $\sigma$,
where $(z', \sigma) = f_I(x, m, z)$. The cost of a sequence of operations is the sum of the
costs of the operations in the sequence. We use $C_n$ to denote the maximal cost of
any sequence of n insertions, deletions and query operations (starting with the empty
function S).
Example 1 (One-dimensional range trees): Let $U = \mathbb{R}$, $M = (\mathbb{N}_0, +, 0)$, and
let $\Gamma$ be the set of intervals. The set Z of control states is the set of all BB[$\alpha$]-
trees T for finite subsets of $\mathbb{R}$; $z_0$ is the empty tree. Let T be a BB[$\alpha$]-tree. With
every node of T we associate a variable v which contains the weight (= number of
leaves) of the subtree rooted at that node. An insert or delete requires the update
of O(log n) variables; the update requires only additions if we start updating at the
leaves. Also a query can be answered by summing O(log n) variables.
The basic idea for the lower bound argument is as follows. It is intuitively clear and
will be made precise below that every variable contains the sum of S(x) over some
subset of U. A query for region R is then answered by summing some variables, i.e.,
by assembling $R \cap dom\, S$ from smaller pieces. If all queries are "easy" to answer,
then set $R \cap dom\, S$ can be assembled from only a few pieces for every $R \in \Gamma$.
This implies that we need to store information about some $x \in dom\, S$ in many (the
precise number depends on the structure of $\Gamma$) different places. If we delete x at this
point then a lot of variables become useless and must be recomputed after inserting
x with a different monoid value m. This argument suggests that updates are costly
if queries are cheap. The lower bound is then obtained by balancing the cost of the
two operations. More generally it suggests that there is a trade-off between query
and update cost. In the case of range trees we have seen such a trade-off (as an
upper bound) in Section 7.2.2.
Definition:
a) Let $X \subseteq U$, X finite and let $R_1, R_2, \ldots, R_l$ be all sets of the form $X \cap R$,
   $R \in \Gamma$. Then $F = \{Y_1, \ldots, Y_m\}$, $\emptyset \ne Y_i \subseteq X$, is a spanning family for X
   (with respect to $\Gamma$) if
   1) every $R_i$ is the disjoint union of some $Y_j$'s and
   2) every $Y_i$ which is not a singleton is the disjoint union of some $Y_j$ and $Y_h$.
b) For $F = \{Y_1, \ldots, Y_m\}$ a spanning family define
   $t(F) = \max_i \min\{t;$ there is a representation of $R_i$ by t disjoint $Y_j$'s in F$\}$
   and
   $\rho(F) = \max_{x \in X} \{d;$ x is contained in d $Y_j$'s$\}$.
c) For $X \subseteq U$, X finite, let
   $B(X) = \min\{\max(t(F), \rho(F));$ F is a spanning family for X$\}$
   and
   $B_n = \max\{B(X);\ X \subseteq U, |X| \le n\}$.
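For small examples the two quantities t(F) and $\rho(F)$ can be computed by brute force, which helps to build intuition for the definition. The sketch below is exponential in |F| and is meant only for toy inputs; the function names `t_of` and `rho_of` and the encoding of sets as `frozenset` objects are assumptions made for the illustration.

```python
from itertools import combinations


def t_of(family, regions):
    """Largest, over all non-empty regions, of the fewest disjoint family members that span it."""

    def min_pieces(r):
        usable = [y for y in family if y <= r]  # only subsets of r can take part
        for t in range(1, len(usable) + 1):
            for combo in combinations(usable, t):
                # The members are disjoint and cover r exactly when their
                # sizes add up to |r| and their union equals r.
                if sum(map(len, combo)) == len(r) and frozenset().union(*combo) == r:
                    return t
        raise ValueError("family does not span region %r" % sorted(r))

    return max(min_pieces(r) for r in regions if r)


def rho_of(family, xs):
    """Largest number of family members that contain a single point."""
    return max(sum(x in y for y in family) for x in xs)
```

As a usage example, take X = {1, 2, 3, 4}, let the regions be all intervals {a, ..., b}, and let F consist of the four singletons together with {1, 2}, {3, 4} and {1, 2, 3, 4}. Every interval is then the disjoint union of at most two members of F, and every point lies in exactly three members.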

We can now state the main theorem of this section.

Theorem 11. For every program $Z, z_0, f_I, f_D, f_Q$: $C_n \ge \lfloor n/16 \rfloor \cdot B_n$.
Proof: We construct a sequence of operations $Op_1, Op_2, \ldots, Op_n$ of total cost at
least $\lfloor n/16 \rfloor \cdot B_n$. The construction is in three steps. In step one we show that we
can restrict attention to normal form programs, in step two we associate the cost
of normal form programs with the spanning bound and in step three we finally
construct a hard sequence of operations.
Definition: A program $Z, z_0, f_I, f_D, f_Q$ is in normal form if no variable is assigned
to twice.

Lemma 12. For every program there is a normal form program of the same cost.

Proof: Lemma 12 states that space can be used in an intentionally wasteful way, and we
are all experts in that. A formal argument goes as follows. Let the normal form program
have variables $v'_0, v'_1, \ldots$ and control set $Z' = Z \times W$ where W is the set of finite,
injective mappings from $V = \{v_0, v_1, \ldots\}$ to $V' = \{v'_0, v'_1, \ldots\}$. Any sequence $\sigma$
of instructions is replaced by a sequence of instructions which assigns to unused
variables only. Association $w \in W$ is updated accordingly.

We open step two by fixing monoid M. Let M be the set of multi-subsets of $U \times \mathbb{N}$
with operation union. We will only consider sequences of operations $Op_1, \ldots, Op_n$
where each Insert is of the form Insert(x, (x, t)). In addition, t counts the number
of times x was inserted so far. Moreover, x was deleted exactly (t-1) times before
it is inserted for the t-th time. Let v be any variable. Then val(v), the value
stored in v, is a multi-subset of $U \times \mathbb{N}$. set(v) is the projection of val(v) on U.
Let $Op_1, Op_2, \ldots$ be a sequence of Inserts, Deletes and Queries. Let $S_h$ denote
function S after execution of $Op_1, \ldots, Op_h$. Then $S_h(x) = (x, t)$ for some t for every
$x \in dom\, S_h$; t is the number of insertions Insert(x, ·) in $Op_1, \ldots, Op_h$. We say that
variable v is useless at h iff $val(v) \not\subseteq Range(S_h)$. If v is not useless at h then v is
useful at h.

Lemma 13.
a) If v is useless at h then v is useless at h' for all $h' \ge h$.
b) For every h: $F = \{set(v);$ v is useful at h and $set(v) \ne \emptyset\}$ is a spanning family
for $dom\, S_h$.

Proof: a) If v is useless at h then $val(v) \not\subseteq Range(S_h)$, i.e., there is a pair $(x, t) \in
val(v) - Range(S_h)$. Since $(x, t) \in val(v)$ and val(v) must be a sum of some of
the monoid elements assigned to variables after insertions, x was inserted at least t
times during $Op_1, \ldots, Op_h$. Since $(x, t) \notin Range(S_h)$ it was also deleted at least t
times. Hence $(x, t) \notin Range(S_{h'})$ for all $h' \ge h$ by our choice of $Op_1, Op_2, \ldots$. Since
val(v) will never change we infer that v is useless at all $h' \ge h$.
b) We have to verify properties 1) and 2) of a spanning family. Let us verify
property 2) first. If v was assigned by $v \leftarrow$ Input, then set(v) is a singleton.
Hence if set(v) is not a singleton then v was assigned by $v \leftarrow u + w$ and hence
val(v) = val(u) + val(w). Since v is useful at h and hence $val(v) \subseteq Range(S_h)$ we
conclude that $set(v) = set(u) \cup set(w)$ and that $set(u) \cap set(w) = \emptyset$. (For
this inference it is important that we take the monoid of multi-sets and not the
monoid of subsets under union.) This proves property 2).
Property 1) can be seen as follows. Let $R \in \Gamma$ and suppose (for the moment) that
$Op_{h+1} = Query(R)$. The answer to this query, i.e., $\sum \{S(x);\ x \in R \cap dom\, S_h\}$, is
computed as a sum of some variables. Call the set of these variables A. No variable
$v \in A$ can be useless at h since $val(v) \not\subseteq Range(S_h)$ implies $\sum \{val(v);\ v \in A\} \not\subseteq
Range(S_h)$. Also sets set(v), $v \in A$, must be pairwise disjoint by the argument used
to prove property 2).
We are now ready to construct sequence $Op_1, \ldots, Op_n$ of cost at least $\lfloor n/16 \rfloor \cdot B_n$.
Let $m = \lceil n/2 \rceil$ and let $X = \{x_1, \ldots, x_k\} \subseteq U$, $|X| \le m$, be such that $B(X) = B_m$.
The following program defines $Op_1, \ldots, Op_n$.
a) Let $Op_i = Insert(x_i, (x_i, 1))$ for $1 \le i \le k$.
b) do $\lfloor n/4 \rfloor$ times
   co at this point $F = \{set(v);\ set(v) \ne \emptyset$ and v useful$\}$ is a spanning family
   for X and hence $B_m = B(X) \le \max\{\rho(F), t(F)\}$ oc
   Case 1: $t(F) \ge B_m$:
   Then there is $R \in \Gamma$ such that at least $B_m$ elements of F are needed to span
   $R \cap dom\, S = R \cap X$. We let the next operation be Query(R). Answering this
   query requires summing at least $B_m$ variables.
   Case 2: $\rho(F) \ge B_m$:
   Then there is $x \in X$ such that x is contained in at least $B_m$ elements of F. Let the
   next two operations be Delete(x), Insert(x, (x, t)) for the appropriate t. This
   will make all variables v with $(x, t-1) \in val(v)$ and $set(v) \in F$ useless. There
   are at least $B_m$ such variables.
It remains to estimate the complexity of the sequence $Op_1, Op_2, \ldots, Op_n$ defined
above. Let a (b) be the number of times case 1 (2) was executed. Then $a + b = \lfloor n/4 \rfloor$.
Also the total cost of Case 1 is at least $a \cdot B_m$. In case 2 at least $b \cdot B_m$ variables
are made useless. Hence at least that many variables must be assigned to. Thus
$$C_n \ge \min_{a + b = \lfloor n/4 \rfloor} \max\{a \cdot B_m, b \cdot B_m\} \ge \lfloor n/8 \rfloor \cdot B_m \ge \lfloor n/16 \rfloor \cdot B_n$$
where the last inequality follows from
Lemma 14.
a) $B_m \le B_n$ for $m \le n$.
b) $B_{m+n} \le B_m + B_n$ for all m and n.
Proof: a) Immediate from the definition.
b) Let $X \subseteq U$, $|X| = m + n$, be such that $B(X) = B_{m+n}$. Let $X_1, X_2$ be a
partition of X with $|X_1| \le m$, $|X_2| \le n$. Then there are spanning families $F_1$
and $F_2$ for $X_1$ and $X_2$ respectively with $\max(t(F_i), \rho(F_i)) \le B(X_i)$ for i = 1, 2.
$F = F_1 \cup F_2$ is a spanning family for $X = X_1 \cup X_2$ with $t(F) = t(F_1) + t(F_2)$ and
$\rho(F) = \max(\rho(F_1), \rho(F_2))$. Thus $B_{n+m} = B(X) \le \max(t(F), \rho(F)) \le B(X_1) +
B(X_2) \le B_n + B_m$.
The significance of Theorem 11 lies in the fact that it relates the complexity of an
algorithm, a quantity which involves time and is therefore difficult to handle, with
a purely combinatorial quantity, which is much easier to deal with. Before we apply
the spanning bound to orthogonal range queries and polygon retrieval it is helpful
to visualize spanning families in terms of graphs.
Let $\Gamma \subseteq 2^U$ be a set of regions and let $X = \{x_1, \ldots, x_n\} \subseteq U$. Let
$R_1, R_2, \ldots, R_n$ be all sets of the form $X \cap R$, $R \in \Gamma$. (We may assume w.l.o.g.
that the number of sets is equal to the number of points because we can always add
either fictitious points or regions.) Furthermore, let $F = \{Y_1, \ldots, Y_m\}$ be a spanning
family. Let us construct a bipartite graph G with node set $\{x_1, \ldots, x_n, R_1, \ldots, R_n\}$
and edge set $E = \{(x_i, R_j);\ x_i \in R_j\}$. For every region $R_j$ let $S_j \subseteq \{1, \ldots, m\}$ be
such that $R_j$ is the disjoint union of $Y_l$, $l \in S_j$.
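[Figure 15: the points $x_1, x_2, \ldots, x_n$ on the left and the regions $R_1, R_2, \ldots, R_n$ on the right are linked through a set $Y_i$ in the middle; the links on the left are labelled "contains", the links on the right "is used to represent".]

Figure 15.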
We can now "factor" graph $G$ into disjoint complete bipartite graphs as follows. For every $Y_l$ consider the complete bipartite graph with nodes $\{x_i;\ x_i \in Y_l\}$ on the $X$-side and $\{R_j;\ l \in S_j\}$ on the $R$-side.

Lemma 15. $E$ is the disjoint union of the sets $\{(x_i, R_j);\ x_i \in Y_l \text{ and } l \in S_j\}$, $1 \le l \le m$.

Proof: Let $(x_i, R_j) \in E$, i.e., $x_i \in R_j$. Since $R_j$ is the disjoint union of the $Y_l$, $l \in S_j$, there is exactly one $l$ such that $x_i \in Y_l$ and $l \in S_j$.

For $x_i$ ($R_j$) let $\deg(x_i)$ ($\deg(R_j)$) be the degree of $x_i$ ($R_j$) in the factored graph, i.e., $\deg(x_i) = |\{l;\ x_i \in Y_l\}|$ and $\deg(R_j) = |\{l;\ l \in S_j\}|$. Then $t(F) = \max_j \deg(R_j)$ and $\rho(F) = \max_i \deg(x_i)$. We want to derive lower bounds on $\max(t(F), \rho(F)) = \max_{i,j}(\deg(R_j), \deg(x_i))$, which is certainly no smaller than the average degree
\[ \Bigl( \sum_i \deg(x_i) + \sum_j \deg(R_j) \Bigr) \Big/ (2n) \ =\ \Bigl( \sum_l \bigl( \mathrm{ldeg}(Y_l) + \mathrm{rdeg}(Y_l) \bigr) \Bigr) \Big/ (2n). \]
Here $\mathrm{ldeg}(Y_l) = |Y_l|$ and $\mathrm{rdeg}(Y_l) = |\{j;\ l \in S_j\}|$. It thus suffices to prove lower bounds on the total degree of the sets $Y_l$, $1 \le l \le m$.
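The factoring of $G$ and the averaging argument can be tried out on a toy instance. The following Python sketch is illustrative only: the function name `factor_graph`, the list-of-sets encoding and the prefix-region example are assumptions made here, not part of the text. It checks Lemma 15 for the given family and returns $t(F)$, the maximum number of sets of $F$ containing a point, and the average degree that bounds their maximum from below.

```python
def factor_graph(regions, family, cover):
    """Check Lemma 15 on a small instance and return (t, rho, average).

    regions[j] -- set of points of region R_j
    family[l]  -- set of points of Y_l
    cover[j]   -- index set S_j: R_j is the disjoint union of family[l], l in S_j
    """
    points = set().union(*regions, *family)
    # Edge set E of the bipartite graph G: (x, j) stands for the edge (x, R_j).
    edges = {(x, j) for j, region in enumerate(regions) for x in region}
    # Each Y_l contributes a complete bipartite graph.
    blocks = [
        {(x, j) for x in y_l for j, s_j in enumerate(cover) if l in s_j}
        for l, y_l in enumerate(family)
    ]
    # Lemma 15: the blocks are pairwise disjoint and their union is E.
    assert sum(len(block) for block in blocks) == len(edges)
    assert set().union(*blocks) == edges

    t = max(len(s_j) for s_j in cover)                          # max_j deg(R_j)
    rho = max(sum(x in y_l for y_l in family) for x in points)  # max_i deg(x_i)
    ldeg = [len(y_l) for y_l in family]
    rdeg = [sum(l in s_j for s_j in cover) for l in range(len(family))]
    average = (sum(ldeg) + sum(rdeg)) / (len(points) + len(regions))
    return t, rho, average


# Toy example (an assumption for illustration): prefix regions over four points.
regions = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]
family = [{1}, {1, 2}, {3}, {4}]
cover = [{0}, {1}, {1, 2}, {1, 2, 3}]
t, rho, average = factor_graph(regions, family, cover)
```

On this instance the maximum of the two degrees is 3 while the average degree is 1.5, so the averaging bound holds with room to spare; the lower-bound proofs that follow work by showing that the average itself must be large.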
Application 1: Polygon Retrieval

We consider a special case of polygon retrieval: line retrieval. More precisely, we assume $U = \mathbb{R}^2$ and $\Gamma$ the set of all lines in $\mathbb{R}^2$, i.e., $\Gamma = \{\{(x_0, x_1) \in \mathbb{R}^2;\ a x_0 + b x_1 = c\};\ a, b, c \in \mathbb{R};\ a \neq 0 \text{ or } b \neq 0\}$.

Lemma 16. Let $S = \{y_1, y_2, \ldots, y_n\} \subseteq \mathbb{R}^2$ and let $L_1, \ldots, L_n$ be a set of $n$ pairwise distinct lines. Let $r_i = |L_i \cap S|$ be the number of points of $S$ on line $L_i$ and let $F = \{Y_1, \ldots, Y_m\}$ be a spanning family for $S$ with respect to $\Gamma$. Then
\[ \max(t(F), \rho(F)) \ \ge\ \sum_{i=1}^{n} r_i \Big/ (2n). \]

Proof: Consider any $Y_l$. We claim that $\min(\mathrm{ldeg}(Y_l), \mathrm{rdeg}(Y_l)) \le 1$. Assume $\mathrm{ldeg}(Y_l) \ge 2$, i.e., there are points $y_j, y_h$, $j \neq h$, such that $\{y_j, y_h\} \subseteq Y_l$. Since two points determine a line there is at most one line $L_k$ such that $Y_l \subseteq L_k$. Thus $\mathrm{rdeg}(Y_l) \le 1$.
Next observe that
\begin{align*}
|E| &= \sum_l \mathrm{ldeg}(Y_l) \cdot \mathrm{rdeg}(Y_l) && \text{[by Lemma 15]} \\
    &\le \sum_l \bigl( \mathrm{ldeg}(Y_l) + \mathrm{rdeg}(Y_l) \bigr) && \text{[since } \min(\mathrm{ldeg}(Y_l), \mathrm{rdeg}(Y_l)) \le 1 \text{]} \\
    &\le 2n \cdot \max(t(F), \rho(F)) && \text{[by the discussion following Lemma 15]}
\end{align*}
Thus $\max(t(F), \rho(F)) \ge |E|/(2n) = \sum_{i=1}^{n} r_i / (2n)$.
We can now prove a lower bound on the complexity of line retrieval by exhibiting a set of lines of large total rank.

Theorem 12. The complexity of line retrieval is $\Omega(n^{4/3})$, i.e., there is a sequence of $n$ insertions, deletions and line retrievals of total cost $\Omega(n^{4/3})$.

Proof: In view of Lemma 16 and Theorem 11 it suffices to construct a set of $n$ points and $n$ lines such that most lines contain many points. Let $A = \lfloor \sqrt{n} \rfloor$ and let $S = [1\,..\,A] \times [1\,..\,A]$. For integers $i, j, a, b$ let $L(i, j, a, b)$ be the line through points $(i, j)$ and $(i + a, j + b)$. We consider the set $\mathcal{L}$ of such lines given by $1 \le i \le a \le A^{1/3}$, $1 \le j \le A/2$, $1 \le b \le a$, $\gcd(a, b) = 1$.

Claim:
a) If $(i, j, a, b) \neq (i', j', a', b')$ then the lines $L(i, j, a, b)$, $L(i', j', a', b')$ are distinct.
b) The number $N$ of lines in $\mathcal{L}$ satisfies $A^2 \ge N = \Omega(A^2)$.
c) The total number of points from $S$ on lines in $\mathcal{L}$ is $\Omega(n^{4/3})$.

Proof: a) Assume that the two lines are identical. Then they must have identical slopes and hence $b/a = b'/a'$. Since $\gcd(a, b) = \gcd(a', b') = 1$ we conclude $a = a'$ and $b = b'$. Next, from $(i, j) \in L(i', j', a, b)$ we conclude $(i, j) = (i', j') + x \cdot (a, b)$ for some $x \in \mathbb{R}$. Since $\gcd(a, b) = 1$ we must have $x \in \mathbb{Z}$ and hence $i \equiv i' \bmod a$. From $1 \le i, i' \le a$ we infer $i = i'$ and hence $j = j'$.

b) The number $N$ of lines is certainly no larger than $A/2 \cdot (A^{1/3})^3 = A^2/2$. Also it is at least
\begin{align*}
\frac{A}{2} \cdot \sum_{a=1}^{A^{1/3}} a \cdot |\{b;\ \gcd(a, b) = 1 \text{ and } b \le a\}|
  &\ \ge\ \frac{A^{4/3}}{4} \cdot \sum_{a=A^{1/3}/2}^{A^{1/3}} |\{b;\ \gcd(a, b) = 1 \text{ and } b \le a\}| \\
  &\ =\ \Omega(A^{4/3} \cdot (A^{1/3})^2) = \Omega(A^2)
\end{align*}
since $\sum_{a=1}^{m} |\{b;\ \gcd(a, b) = 1 \text{ and } b \le a\}| = (3/\pi^2) \cdot m^2 + O(m \cdot \log m)$ (cf. G. Hardy, E. Wright: The Theory of Numbers, Fourth Edition, Oxford University Press, 1965, p. 265).

c) Every line in $\mathcal{L}$ contains at least $(A/2)/A^{1/3} = A^{2/3}/2$ points of $S$. Thus the total number of points from $S$ on lines in $\mathcal{L}$ is $\Omega(A^{8/3})$ by part b), which in turn is $\Omega(n^{4/3})$.
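The construction can be checked by brute force for a small grid. The sketch below is an illustration under stated assumptions: it rounds $A^{1/3}$ and $A/2$ down to integers, and the helper names (`integer_cube_root`, `line_family`, `points_on_line`, `canonical`) are made up here. It enumerates the parameter tuples $(i, j, a, b)$ with $1 \le i \le a$, counts the grid points on each line, and provides a canonical key for testing that the lines are pairwise distinct.

```python
from math import gcd


def integer_cube_root(A):
    """Largest integer c with c**3 <= A."""
    c = round(A ** (1 / 3))
    while c ** 3 > A:
        c -= 1
    while (c + 1) ** 3 <= A:
        c += 1
    return c


def line_family(A):
    """All (i, j, a, b) with 1 <= i <= a <= A^(1/3), 1 <= j <= A/2,
    1 <= b <= a and gcd(a, b) = 1 (bounds rounded down)."""
    c = integer_cube_root(A)
    return [
        (i, j, a, b)
        for a in range(1, c + 1)
        for i in range(1, a + 1)
        for j in range(1, A // 2 + 1)
        for b in range(1, a + 1)
        if gcd(a, b) == 1
    ]


def points_on_line(A, i, j, a, b):
    """Number of points of [1..A] x [1..A] on L(i, j, a, b).

    Because gcd(a, b) = 1 and i <= a, the grid points on the line are
    exactly (i + k*a, j + k*b) for k = 0, 1, 2, ...
    """
    return min((A - i) // a, (A - j) // b) + 1


def canonical(i, j, a, b):
    """Direction plus the constant of the line equation b*x - a*y = const."""
    return (a, b, b * i - a * j)


A = 27
lines = line_family(A)
incidences = sum(points_on_line(A, *line) for line in lines)
```

For $A = 27$ the family has 117 lines. Comparing `len(lines)` with the number of distinct `canonical` keys confirms claim a) on this instance, and `incidences` is the total rank $\sum r_i$ that enters Lemma 16.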

Similar arguments can be used to show lower bounds of the same order for half-space retrieval and circular queries (Exercises 34, 35). The best upper bound $n^{0.77}$ on polygon retrieval is by polygon trees. There is still a gap to close.

Application 2: Orthogonal Range Queries

The lower bound for orthogonal range queries is somewhat harder to obtain. However, the effort pays off: the lower bound agrees with the upper bound.
Theorem 13. The complexity of orthogonal range queries in $\mathbb{R}^d$ is $\Omega(n \cdot (\log n)^d)$, i.e., for every $n$ there is a sequence of $n$ insertions, deletions and orthogonal range queries of cost at least $\Omega(n \cdot (\log n)^d)$.

Proof: We will prove a lower bound of order $(\log n)^d$ on the spanning bound. Let $A = \lfloor n^{1/d} \rfloor$ and let $X = [1\,..\,A]^d$. Then $|X| = A^d$. Also we consider the following class of $A^d$ "one-sided" range queries. For $y \in X$ let $R_y = \{x \in U;\ x \le y\}$, where $x \le y$ if $x_i \le y_i$ for $0 \le i < d$.

Let $F = \{Y_1, \ldots, Y_m\}$ be a spanning family for $X$. As above consider the complete bipartite graph associated with $Y_l$ (cf. the discussion following the proof of Theorem 11), i.e., let $\mathrm{In}(Y_l) = \{x \in X;\ x \in Y_l\}$ and let $\mathrm{Out}(Y_l) = \{R_y;\ Y_l \text{ is used to represent } R_y,\ y \in X\}$. Then $Y_l$ contributes all of $\mathrm{In}(Y_l) \times \mathrm{Out}(Y_l)$ to the bipartite graph $G$ with edge set $\{(x, R_y);\ x \le y\}$ associated with the orthogonal range query problem.

The idea for the proof is now as follows. If $Y_l$ contributes many edges to graph $G$ then most edges $(x, R_y)$ contributed by $Y_l$ must have $x \ll y$, i.e., $y - x$ must be large. This suggests weighting the edges $(x, R_y)$ of $G$ such that the weight is a decreasing function of $y - x$. We can then hope to bound the weight of the edges covered by any $Y_l$ from above and the weight of all edges from below. This would give the bound. What weight function should we choose? It should be symmetric with respect to the coordinates. About the simplest decreasing function with this property is to assign weight
\[ w(x, y) = \bigl( (y_0 - x_0 + 1)(y_1 - x_1 + 1) \cdots (y_{d-1} - x_{d-1} + 1) \bigr)^{-1} \]
to edge $(x, R_y)$ for $x \le y$.

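Lemma 17. For every $Y_l \in F$:
\[ \sum_{\substack{x \in \mathrm{In}(Y_l) \\ R_y \in \mathrm{Out}(Y_l)}} w(x, y) \ \le\ (2\pi)^d \cdot \bigl( |\mathrm{In}(Y_l)| + |\mathrm{Out}(Y_l)| \bigr) \]

Proof: Let $m_i = \max\{x_i;\ (x_0, x_1, \ldots, x_{d-1}) \in \mathrm{In}(Y_l)\}$, $m = (m_0, \ldots, m_{d-1})$, and let $B = \{(m_0 - x_0, \ldots, m_{d-1} - x_{d-1});\ (x_0, \ldots, x_{d-1}) \in \mathrm{In}(Y_l)\}$, $C = \{(y_0 - m_0, \ldots, y_{d-1} - m_{d-1});\ R_{(y_0, \ldots, y_{d-1})} \in \mathrm{Out}(Y_l)\}$. Then all elements of $B$ and $C$ are non-negative. This is obvious for $B$ and follows for $C$ from the fact that $x \in \mathrm{In}(Y_l)$, $R_y \in \mathrm{Out}(Y_l)$ implies $x \in R_y$ and hence $x \le y$. We have
\begin{align*}
\sum_{\substack{x \in \mathrm{In}(Y_l) \\ R_y \in \mathrm{Out}(Y_l)}} w(x, y)
  &= \sum_{u \in B,\ v \in C} w(m - u, m + v) \\
  &= \sum_{u \in B,\ v \in C} \bigl( (u_0 + v_0 + 1) \cdots (u_{d-1} + v_{d-1} + 1) \bigr)^{-1} \\
  &\le \sum_{u \in B \cup C,\ v \in B \cup C} \bigl( (u_0 + v_0 + 1) \cdots (u_{d-1} + v_{d-1} + 1) \bigr)^{-1}
\end{align*}
For $i_0 \ge 0$, $i_1 \ge 0$, ..., $i_{d-1} \ge 0$ let
\[ a_{i_0 i_1 \ldots i_{d-1}} = \begin{cases} 1, & \text{if } (i_0, i_1, \ldots, i_{d-1}) \in B \cup C; \\ 0, & \text{otherwise.} \end{cases} \]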
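Then
\begin{align*}
\sum_{\substack{x \in \mathrm{In}(Y_l) \\ R_y \in \mathrm{Out}(Y_l)}} w(x, y)
  &\le \sum_{\substack{i_0 \ge 0, \ldots, i_{d-1} \ge 0 \\ j_0 \ge 0, \ldots, j_{d-1} \ge 0}} \frac{a_{i_0 i_1 \ldots i_{d-1}} \cdot a_{j_0 j_1 \ldots j_{d-1}}}{(i_0 + j_0 + 1) \cdots (i_{d-1} + j_{d-1} + 1)} \\
  &\le (2\pi)^d \cdot \sum_{i_0, \ldots, i_{d-1} \ge 0} a_{i_0 i_1 \ldots i_{d-1}}^2 \\
  &= (2\pi)^d \cdot |B \cup C| \\
  &\le (2\pi)^d \cdot (|B| + |C|) = (2\pi)^d \cdot \bigl( |\mathrm{In}(Y_l)| + |\mathrm{Out}(Y_l)| \bigr)
\end{align*}
Here the next to last inequality follows from the following fact.

Fact: Let $a_{i_0 \ldots i_{d-1}} \ge 0$, $i_0, \ldots, i_{d-1} \ge 0$, be $d$-fold subscripted variables. Then
\[ \sum_{\substack{i_0 \ge 0, \ldots, i_{d-1} \ge 0 \\ j_0 \ge 0, \ldots, j_{d-1} \ge 0}} \frac{a_{i_0 i_1 \ldots i_{d-1}} \cdot a_{j_0 j_1 \ldots j_{d-1}}}{(i_0 + j_0 + 1) \cdots (i_{d-1} + j_{d-1} + 1)} \ \le\ (2\pi)^d \cdot \sum_{i_0, \ldots, i_{d-1} \ge 0} a_{i_0 i_1 \ldots i_{d-1}}^2 \]

Proof: Case $d = 1$ is implied by Hilbert's inequality
\[ \sum_{i, j \ge 0} a_i \cdot a_j / (i + j + 1) \ \le\ \pi \cdot \sum_{i \ge 0} a_i^2, \]
cf. G. Hardy, J. Littlewood, G. Polya, Inequalities, Cambridge University Press, 1967, p. 235. The general case can be shown along similar lines. A complete proof can be found in M.L. Fredman, A Lower Bound on the Complexity of Orthogonal Range Queries, JACM 28 (1981), 696-705.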
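Hilbert's inequality for $d = 1$ is easy to probe numerically on finite sequences. The sketch below is only a sanity check, not a proof; the function names `hilbert_form` and `hilbert_bound` and the sample sequences are assumptions chosen for illustration. The 0/1 sequence mirrors the indicator variables used in the proof of Lemma 17.

```python
import math
import random


def hilbert_form(a):
    """Left-hand side: sum over i, j >= 0 of a_i * a_j / (i + j + 1)."""
    n = len(a)
    return sum(a[i] * a[j] / (i + j + 1) for i in range(n) for j in range(n))


def hilbert_bound(a):
    """Right-hand side of Hilbert's inequality: pi * sum of a_i ** 2."""
    return math.pi * sum(x * x for x in a)


random.seed(0)
# Random non-negative sequences of several lengths (illustrative choice).
samples = [[random.random() for _ in range(n)] for n in (1, 5, 50)]
# A 0/1 indicator sequence, as in the proof of Lemma 17.
indicator = [1] * 200
```

For every finite non-negative sequence, `hilbert_form(a) <= hilbert_bound(a)` holds, since a finite sequence is a special case of the infinite one padded with zeros.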

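Lemma 18. For all $n \ge 1$, $A = \lfloor n^{1/d} \rfloor$:
\[ \sum_{(1, \ldots, 1) \le x \le y \le (A, \ldots, A)} w(x, y) = \Omega((A \cdot \log A)^d) = \Omega(n \cdot (\log n)^d) \]

Proof: We have
\begin{align*}
\sum_{(1, \ldots, 1) \le x \le y \le (A, \ldots, A)} w(x, y)
  &\ge \sum_{\substack{(1, \ldots, 1) \le x \le (A/2, \ldots, A/2) \\ (0, \ldots, 0) \le y - x \le (A/2, \ldots, A/2)}} \bigl( (y_0 - x_0 + 1) \cdots (y_{d-1} - x_{d-1} + 1) \bigr)^{-1} \\
  &= (A/2)^d \cdot \Bigl( \sum_{0 \le y_0 - x_0 \le A/2} 1/(y_0 - x_0 + 1) \Bigr)^d \\
  &= \Omega((A \cdot \log A)^d) = \Omega(n \cdot (\log n)^d)
\end{align*}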
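The total weight in Lemma 18 can be computed exactly for small $A$ and $d$. The sketch below is illustrative; the function names and the use of exact `Fraction` arithmetic are choices made here, not part of the text. It evaluates the sum by brute force and also through the observation that the sum factorizes over the coordinates: in one dimension there are $A - k$ pairs $x \le y$ with $y - x = k$, each of weight $1/(k+1)$.

```python
from fractions import Fraction
from itertools import product


def weight(x, y):
    """w(x, y) = 1 / ((y_0 - x_0 + 1) * ... * (y_{d-1} - x_{d-1} + 1)) for x <= y."""
    w = Fraction(1)
    for xi, yi in zip(x, y):
        w /= (yi - xi + 1)
    return w


def total_weight_brute(A, d):
    """Sum of w(x, y) over all pairs x <= y in [1..A]^d, by enumeration."""
    grid = list(product(range(1, A + 1), repeat=d))
    return sum(
        weight(x, y)
        for x in grid
        for y in grid
        if all(xi <= yi for xi, yi in zip(x, y))
    )


def total_weight(A, d):
    """Same sum via factorization into d identical one-dimensional sums."""
    one_dim = sum(Fraction(A - k, k + 1) for k in range(A))
    return one_dim ** d
```

For instance, `total_weight(2, 1)` is `Fraction(5, 2)`, from the pairs (1,1), (1,2) and (2,2). Since the one-dimensional sum is a weighted harmonic sum, it grows like $A \log A$, which is where the $(A \log A)^d$ in the lemma comes from.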
The proof of Theorem 13 is now easily completed. We have
\begin{align*}
\max(\rho(F), t(F))
  &\ge \Bigl( \sum_l \bigl( |\mathrm{In}(Y_l)| + |\mathrm{Out}(Y_l)| \bigr) \Bigr) \Big/ (2n) && \text{[by the discussion following Lemma 15]} \\
  &\ge \sum_l \sum_{\substack{x \in \mathrm{In}(Y_l) \\ R_y \in \mathrm{Out}(Y_l)}} w(x, y) \Big/ \bigl( (2\pi)^d \cdot 2n \bigr) && \text{[by Lemma 17]} \\
  &= \sum_{(1, \ldots, 1) \le x \le y \le (A, \ldots, A)} w(x, y) \Big/ \bigl( (2\pi)^d \cdot 2n \bigr) \\
  &= \Omega((\log n)^d) && \text{[by Lemma 18]}
\end{align*}
We have thus shown an $\Omega((\log n)^d)$ lower bound on the spanning bound of orthogonal range queries. An application of Theorem 2 finishes the proof.

Theorem 13 shows that range trees are optimal. They process $n$ insertions, deletions and queries in time $O(n \cdot (\log n)^d)$ and no data structure can do better.

7.3. Exercises

1) Show that every integer $n$ can be uniquely written as $n = \sum_{i=1}^{k} \binom{a_i}{i}$ where $i - 1 \le a_i$ and $a_1 < a_2 < \cdots < a_k$. [Hint: Use the identity $\sum_{i=0}^{k} \binom{r+i}{i} = \binom{r+k+1}{k}$.] Analyze the $k$-binomial transformation based on this representation.

2) Let $f : \mathbb{N} \to \mathbb{N}$ be any non-decreasing function with $f(i) \ge 2$ for all $i$. Let $S$ be any set with $n$ elements. Let $i = \lfloor \log n \rfloor$ and let $n - 2^i = \sum_{j \ge 0} a_j b^j$ where $b = f(i)$ and $a_j \in \mathbb{N}_0$ and $0 \le a_j < b$.
a) Design a dynamization method based on the following representation of set $S$. $S$ is represented by a large block $S_{large}$ containing $2^i$ points and structures $S_j$, $j \ge 0$. $S_j$ contains exactly $a_j \cdot b^j$ points of $S$.
b) Design a dynamization method based on the following representation of $S$. $S$ is represented by a large block $S_{large}$ containing $2^i$ points and structures $S_{j,l}$, $j \ge 0$, $1 \le l \le a_j$. A structure $S_{j,l}$ contains exactly $b^j$ points of $S$.
Determine $Q_D(n)$ and $I_D(n)$ in both cases. Reformulate your answers and prove Theorem 3.

3) Reconsider weighted trees as investigated in Section 3.4. Let $\beta_1, \ldots, \beta_n$ be a probability distribution and let $\mathrm{rank}(i) = |\{j;\ \beta_j \ge \beta_i\}|$. Is the depth of node $i$ in a weighted tree bounded by $O(\mathrm{rank}(i))$?

4) Work out weighted interpolation search in detail. In particular, state precisely under what assumption the $O(\log \log n)$ bound on search time applies to all blocks constructed by weighting.

5) Describe algorithm Demote($y, a$) in a weighted dynamic data structure in detail. Analyze its running time.

6) Use weighting to turn sorted arrays + binary search into weighted dictionaries. Support not only successful but also unsuccessful searches. There are probabilities associated with unsuccessful searches as well, i.e., start with a distribution $(\alpha_0, \beta_1, \ldots, \beta_n, \alpha_n)$ as in Section 3.4. [Hint: Define distribution $\gamma_1, \ldots, \gamma_n$ by $\gamma_i := \beta_i + (\alpha_{i-1} + \alpha_i)/2$ and use ideas similar to the ones used to prove Theorem 7.]

7) Do Exercise 6) for interpolation search.

8) Develop self-organizing (cf. 3.7) data structures for monotone decomposable searching problems. [Hint: Use algorithm Promote of Theorem 8 to implement a "Move to first group" or "Move up one group" heuristic. Choose the elements which move down carefully (randomly!).]

9) Let $T$ be an $(a, b)$-tree (cf. 3.5.2) with $n$ leaves. For a node $v$ let $w(v)$ be the number of leaves in the subtree with root $v$ and let $d(v)$ be the depth of $v$. Is there a constant $c > 1$ such that $w(v) \le n/c^{d(v)}$ for all $v$ and $T$? If not, what does this mean for the dynamization of order decomposable problems based on $(a, b)$-trees? In particular, is the remark following Lemma 2 valid?

10) Let $VD(S)$ be the Voronoi diagram (cf. 8.3) of point set $S \subseteq \mathbb{R}^2$. Show that Voronoi diagrams can be maintained in time $O(n)$ per insertion and deletion. [Hint: Use order decomposability.]

11) A half-space in $\mathbb{R}^2$ is a set $\{(x, y) \in \mathbb{R}^2;\ ax + by \le c\}$ for some $a, b, c \in \mathbb{R}$. Show that the intersection of $n$ half-spaces can be computed in time $O(n \log n)$ and that the intersection can be maintained under insertions and deletions in time $O((\log n)^2)$ per update. [Hint: The intersection is always a convex polygon. Use order decomposability and the results of Section 8.1.]

12) For $(x_1, y_1), (x_2, y_2) \in \mathbb{R}^2$ let $(x_1, y_1) \le (x_2, y_2)$ if $x_1 \le x_2$ and $y_1 \le y_2$. Show that the maximal elements of a set $S \subseteq \mathbb{R}^2$, $|S| = n$, can be computed in time $O(n \log n)$ and that they can be maintained in time $O((\log n)^2)$ per insertion and deletion.

13) Define weight-balanced dd-trees (as outlined in the remarks following 7.2.1, Theorem 2). Rebalance a weight-balanced dd-tree after an insertion/deletion by replacing the largest subtree which went out of balance by an ideal dd-tree. Show that the amortized cost of an insertion/deletion is $O((d + \log n) \cdot \log n)$.

14) Are Theorems 2, 3 and 4 true for weight-balanced dd-trees? [Hint: Consider weight-balanced 2d-trees with $\alpha = 1/4$. Take a tree where every node of even depth has balance 1/4 and every node of odd depth has balance 1/2. Consider a partial match query with specified 0-th coordinate. This coordinate is chosen such that the search is always directed into the heavier subtree.]

15) Show that the counting version of partial match retrieval has time complexity $O(n^{1 - 1/(d-s+1)})$ in ideal dd-trees.

16) Compute function $f(d, d-s)$ of Theorem 3 explicitly. Can you improve upon the argument used to prove Theorem 3 in order to get a better bound on $f(d, d-s)$?

17) Prove Theorem 2.1.4 for arbitrary $d$.

18) Show that an arbitrary polygon query may have linear running time in an ideal 2d-tree.

19) A $j$-way subdivision of the plane consists of two infinite parallel lines $L_1, L_2$ and half-lines $L_3, L_4, \ldots, L_j$ such that the starting point of $L_i$ lies on $L_{i-1}$, $L_i$ intersects $L_1$ and is fully to the right of $L_{i-1}$. A $j$-way subdivision divides the plane into $2 \cdot j$ open regions and $j$ one-dimensional regions. Show that for every set $S \subseteq \mathbb{R}^2$, $|S| = n$, not all points of $S$ collinear, there is a $j$-way subdivision such that $|S \cap R_i| \le \lceil n/2j \rceil$ for any of the open regions $R_i$. Discuss polygon trees based on $j$-way subdivisions and show that they yield $O(s \cdot n^{\log(j+1)/\log 2j})$ retrieval time. Here $s$ is the number of sides of the polygon.

20) Design a static data structure for orthogonal range queries which uses space $O(n^{1+\epsilon})$ for some $\epsilon > 0$ and has query time $O(d \cdot \log n)$. [Hint: Find a hierarchical decomposition of set $S$ into contiguous subsets such that every contiguous subset of $S$ can be found by using only a few pieces.]

21) Base range trees on $(a, b)$-trees (cf. Section 3.5.2). Reprove some or all of Lemmas 1-4 in Section 2.2.

22) For $x = (x_0, \ldots, x_{d-1})$ and $y = (y_0, \ldots, y_{d-1})$ define $x \le y$ iff $x_i \le y_i$ for all $i$. For $S \subseteq U_0 \times \cdots \times U_{d-1}$ and $x \in S$ let $\mathrm{rank}(x) = |\{y \in S;\ y < x\}|$ be the number of points less than $x$. rank is also called the empirical cumulative distribution function. Show that $\mathrm{rank}(x)$, $x \in S$, can be computed in time $O(n \cdot (\log n)^{\max(1, d-1)})$. [Hint: Use range trees.]

23) Let $S \subseteq U_0 \times \cdots \times U_{d-1}$ and let $\le$ be defined as in Exercise 22). Show how to compute the set of maxima of $S$ in time $O(n \cdot (\log n)^{\max(1, d-2)})$. [Hint: Use multi-dimensional divide-and-conquer.]

24) Design an algorithm for the fixed radius near neighbors problem with respect to the $L_p$-norm, $p > 0$. We have $\mathrm{dist}_p(x, y) = \bigl( \sum_{0 \le i < d} |x_i - y_i|^p \bigr)^{1/p}$. Cases $p = 1$ and $p = \infty$ are particularly interesting. Here $\mathrm{dist}_\infty(x, y) = \max_i |x_i - y_i|$ is the maximum metric and $\mathrm{dist}_1$ is the city-block metric.

25) Complete the proof of Theorem 7 of Section 7.2.2.

26) Extend Lemma 6 of Section 7.2.2 to $d$-space, $d \ge 3$. Use the extension to generalize Theorem 9 to $d$-space.

27) Study the average case complexity of the $\epsilon$-nearest neighbor problem under the following assumption. $S$ is drawn from $[0, 1]^d$ according to the uniform distribution.

28) (Closest Pair). Given $S \subseteq \mathbb{R}^d$, find $x, y \in S$, $x \neq y$, such that $\mathrm{dist}(x, y) \le \mathrm{dist}(x', y')$ for all $x', y' \in S$, $x' \neq y'$. [Hint: Extend the algorithm for the $\epsilon$-nearest neighbor problem.]

29) (Searching semi-sorted tables): Let $\Pi = \{\pi;\ \text{there is a } j \in [0\,..\,n-1] \text{ such that } \pi(i) = (i + j) \bmod n \text{ for all } i\}$. Show a linear lower bound for $SST(\Pi)$ in the decision tree model considered in Section 7.2.2.1. Show that $O(\log n)$ comparisons suffice if comparisons of the form $T[h]$ ? $T[k]$ are permitted! [Hint: Use the proof technique of Lemma 3 to prove the lower bound; use comparisons $T[h]$ ? $T[k]$ to find $j$ for the upper bound.]

30) Extend Section 7.2.3.1, Theorem 1 to an average case lower bound.

31) Let $SA, T_0, \ldots, T_{d-1}$ be a solution for the partial match retrieval problem in the sense of Section 7.2.3.1. Show $\prod_{0 \le i < d} \mathrm{depth}(T_i) = \Omega(n^{d-1})$, in particular $\mathrm{depth}(T_0) \cdot \mathrm{depth}(T_1) = \Omega(n)$ for $d = 2$. Modify dd-trees such that a query with specified 0-th coordinate takes time $O(n^{\alpha})$ and a query with specified 1-st coordinate takes time $O(n^{1-\alpha})$. Here $0 < \alpha < 1$.

32) Show that a partial match query with $s$ specified components takes time $\Omega(n^{1-s/d})$ in the worst case in the decision tree model.

33) Show that Section 7.2.3, Theorem 2 stays true if additional instructions $v_i \leftarrow c \cdot v_k$ and if $v_i = v_j$ then ..., $i, j \ge 0$, $c \in \mathbb{N}$, are allowed.

34) Show an $n^{4/3}$ lower bound on the complexity of half-space queries. A half-space in $\mathbb{R}^2$ is of the form $\{(x_0, x_1);\ a x_0 + b x_1 \le c\}$ for some $a, b, c \in \mathbb{R}$.

35) Show an $n^{4/3}$ lower bound on the complexity of circular queries, i.e., queries of the form $\{(x_0, x_1);\ (x_0 - a)^2 + (x_1 - b)^2 \le c\}$.

36) Let $\Gamma \subseteq 2^U$ be a set of regions and let $B : \mathbb{N} \to \mathbb{N}$ be the spanning bound with respect to $\Gamma$. Show: there is an algorithm in the sense of Section 7.2.3.2 with $C_n = O(B_n \log n)$. [Hint: Use Section 7.1, Theorem 5 on deletion decomposable searching problems; show that there is a data structure $S$ which supports deletions in time $B_n$, i.e., $D_S(n) = B_n$, and which can be built in time $n \cdot B_n$, i.e., $P_S(n) = n \cdot B_n$.]

7.4. Bibliographic Notes

Dynamization was introduced by Bentley (79) and later explored by Bentley/Saxe (80) (Theorem 1 and Exercise 1), Overmars/v. Leeuwen (81, 81) (Theorems 2, 5 and 6), Mehlhorn/Overmars (81) (Theorem 3) and Mehlhorn (81) (Theorem 4). The section on weighting follows Frederickson (82) and Alt/Mehlhorn (82). Overmars (82) introduced order decomposable problems.

d-dimensional trees were introduced by Bentley (75) and Theorems 1 and 2 of Section 2.1 are taken from there. Weight-balanced dd-trees were discussed by Overmars/v. Leeuwen (82). The analysis of orthogonal range queries (Theorem 4) in dd-trees is taken from Lee/Wong (77). Theorem 3 has not appeared before.

Polygon trees with slack parameter 1 are due to Bentley (79), Lueker (78), and Willard (78). The treatment of range trees with general slack parameter seems to be new. A static trade-off between space and query time is established in Bentley/Maurer (80). The treatment of multi-dimensional divide and conquer follows Bentley (80); Exercises 22-26 can also be found there. Monier (80) treats recurrences arising in this area.

Section 2.3.1 follows Alt/Mehlhorn/Munro (81); Exercises 29-31 can also be found there. Section 2.3.2 is the work of Fredman (81, 81, 81). Yao (82) proves lower bounds on time/space trade-offs for a similar model of computation.
