In Chapter 3 we designed clever data structures for variants of the dictionary problem. With a little bit of unfairness one might say that all we did in Chapter 3 is the following: We started with binary search on sorted arrays and generalized it in two directions. First we generalized to weighted static trees in order to cope with access frequencies, and then to dynamic trees in order to cope with insertions and deletions. Finally we combined both extensions and arrived at weighted dynamic trees. Suppose now that we want to repeat the generalization process for a different data structure, say interpolation search. Do we have to start all over again or can we profit from the development of Chapter 3? In this section we will describe some general techniques for generalization: dynamization and weighting.
We start out with a static solution for some searching problem, i.e., a solution which only supports queries, but supports neither insertions and deletions nor weighted data. Then dynamization is a method which allows us to also support insertions and deletions, weighting is a method which allows us to support queries to weighted data, and finally weighted dynamization combines both extensions. Of course, we cannot hope to arrive at the very best data structure by only applying general principles. Nevertheless, the general principles can very quickly give us fully dynamic solutions with reasonable running time. Also there are data structures, e.g., d-dimensional trees, where all special purpose attempts at dynamization have failed.
Binary search on sorted arrays will be our running example. Given a set of $n$ elements one can construct a sorted array in time $O(n \log n)$ (preprocessing time is $O(n \log n)$), we can search the array in time $O(\log n)$ (query time is $O(\log n)$), and the array consumes space $O(n)$ (space requirement is $O(n)$). Dynamization produces a solution for the dictionary problem (operations Insert, Delete, Member) with running time $O(\log n)$ for Inserts and Deletes and $O(\log^2 n)$ for Member. Thus Inserts and Deletes are as fast as in balanced trees but queries are less efficient. Weighting produces weighted static dictionaries with access time $O(\log 1/p)$ for an element of probability $p$. This is the same order of magnitude as the special purpose solution of Section 3.4; the factor of proportionality is much larger though. Finally weighted dynamization produces a solution for the weighted, dynamic dictionary problem with running time $O((\log 1/p)^2)$ for Member operations and running time $O(\log 1/p)$ for Insert, Delete, Promote and Demote operations. Note that only access time is worse than what we obtained by dynamic weighted trees in 3.6.
Although sorted arrays are our main example, they are not an important application of our general principles. The most important applications are data structures for higher dimensional searching problems described in this chapter. In many of these cases only static solutions are known and all attempts to construct dynamic or weighted solutions by special purpose methods have failed so far. The only dynamic or weighted solutions known today are obtained by applying the general principles described in this section.
A searching problem takes a point in $T_1$ and a subset of $T_2$ and produces an answer in $T_3$. There are plenty of examples. In the member problem we have $T_1 = T_2$, $T_3 = \{\mathrm{true}, \mathrm{false}\}$ and $Q(x, S) = $ "$x \in S$". In the nearest neighbor problem in the plane we have $T_1 = T_2 = \mathbb{R}^2$, $T_3 = \mathbb{R}$ and $Q(x, S) = \delta(x, y)$, where $y \in S$ and $\delta(x, y) \le \delta(x, z)$ for all $z \in S$. Here $\delta$ is some metric. In the inside the convex hull problem we have $T_1 = T_2 = \mathbb{R}^2$, $T_3 = \{\mathrm{true}, \mathrm{false}\}$ and $Q(x, S) = $ "is $x$ inside the convex hull of point set $S$". In fact, our definition of searching problem is so general that just about everything is a searching problem.
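To make the notion concrete, here is a minimal Python sketch of two of the searching problems above, written as plain functions $Q(x, S)$. The function names are illustrative only, and the Euclidean metric is an assumed choice for $\delta$; the text allows any metric.

```python
import math
from typing import Hashable, Iterable, Tuple

Point = Tuple[float, float]


def member(x: Hashable, S: Iterable[Hashable]) -> bool:
    """Member problem: T1 = T2, T3 = {True, False}, Q(x, S) = (x in S)."""
    return any(x == y for y in S)


def nearest_neighbor_distance(x: Point, S: Iterable[Point]) -> float:
    """Nearest neighbor problem in the plane: Q(x, S) = min over y in S of delta(x, y).

    delta is taken to be the Euclidean metric here (an assumption for illustration).
    """
    return min(math.dist(x, y) for y in S)
```

Both functions scan the whole set; the point of the static data structures discussed next is to answer the same queries faster after some preprocessing.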
A static data structure $S$ for a searching problem supports only query operation $Q$, i.e., for every $S \subseteq T_2$ one can build a static data structure $S$ such that function $Q(x, S) : T_1 \to T_3$ can be computed efficiently. We deliberately use the same name for set $S$ and data structure $S$ because the internal workings of structure $S$ are of no concern in this section. We associate three measures of efficiency with structure $S$: query time $Q_S$, preprocessing time $P_S$ and space requirement $S_S$.
$Q_S(n)$ = time for a query on a set of $n$ points using data structure $S$.
$I_D(n)$ = time for inserting a new point into a set of $n$ points stored in $D$.
$\bar{I}_D(n)$ = amortized time for the $n$-th insertion, i.e., (maximal total time spent on executing insertions in any sequence of $n$ operations starting with the empty set)$/n$.
Proof: The proof is based on a simple yet powerful idea. At any point of time the dynamic structure consists of a collection of static data structures for parts of $S$, i.e., set $S$ is partitioned into blocks $S_i$. Queries are answered by querying the blocks and composing the partial answers by $\sqcup$. Insertions are dealt with by suitably combining blocks.
The details are as follows. Let $S$ be any set of $n$ elements and let $n = \sum_{i \ge 0} a_i 2^i$, $a_i \in \{0, 1\}$, be the binary representation of $n$. Let $S_0, S_1, \ldots$ be any partition of $S$ with $|S_i| = a_i 2^i$, $0 \le i \le \log n$. Then structure $D$ is just a collection of static data structures, one for each non-empty $S_i$.
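The space requirement of $D$ is easily computed as
$$S_D(n) = \sum_i S_S(a_i 2^i) = \sum_i \bigl(S_S(a_i 2^i)/(a_i 2^i)\bigr)\, a_i 2^i \le \sum_i \bigl(S_S(n)/n\bigr)\, a_i 2^i = S_S(n),$$
where the sums range over the non-empty blocks.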
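The block structure described above can be sketched for the running example, Member on sorted arrays. The following Python sketch is only an illustration under stated assumptions: the class and method names are hypothetical, and the insertion step is assumed to work like a carry in a binary counter (collect the blocks $S_0, \ldots, S_{k-1}$ below the first empty position $k$ and rebuild them together with the new element as one static structure of size $2^k$).

```python
import bisect
from typing import List


class LogarithmicDictionary:
    """Semi-dynamic Member structure made of static sorted arrays.

    blocks[i] is either empty or a sorted list of exactly 2**i elements,
    mirroring the binary representation of the current size n.
    """

    def __init__(self) -> None:
        self.blocks: List[List[int]] = []

    def insert(self, x: int) -> None:
        # Find the least k whose block is empty (the position the carry stops at).
        k = 0
        while k < len(self.blocks) and self.blocks[k]:
            k += 1
        if k == len(self.blocks):
            self.blocks.append([])
        # Collect x and all lower blocks: 1 + (2**0 + ... + 2**(k-1)) = 2**k elements.
        merged = [x]
        for i in range(k):
            merged.extend(self.blocks[i])
            self.blocks[i] = []
        # "Preprocessing": build the static structure (a sorted array) from scratch.
        self.blocks[k] = sorted(merged)

    def member(self, x: int) -> bool:
        # Query every block and compose the partial answers with "or".
        for block in self.blocks:
            j = bisect.bisect_left(block, x)
            if j < len(block) and block[j] == x:
                return True
        return False
```

After 13 insertions the block sizes are 1, 0, 4, 8, matching $13 = 1101_2$; a query performs one binary search per non-empty block, which is where the extra $\log n$ factor in the query time comes from.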
Proof: The basic idea is to use the construction of Theorem 1, but to spread work over time. More precisely, whenever a structure of size $2^k$ has to be constructed we will spread the work over the next $2^k$ insertions. This will have two consequences. First, the structure will be ready in time to process an overflow into a structure of size $2^{k+1}$ and second, the time required to process a single insertion is bounded by
$$\sum_{k=0}^{\lfloor \log n \rfloor} P_S(2^k)/2^k = O\bigl((P_S(n)/n) \log n\bigr).$$
The details are as follows. The dynamic structure $D$ consists of bags $BA_0, BA_1, \ldots$. Each bag $BA_i$ contains at most three blocks $B_i^u[1]$, $B_i^u[2]$ and $B_i^u[3]$ of size $2^i$ that are "in use" and at most one block $B_i^c$ of size $2^i$ that is "under construction". More precisely, at any point of time blocks $B_i^u[j]$, $i \ge 0$, $1 \le j \le 3$, form a partition of set $S$, and static data structures are available for them. Furthermore, the static data structure for block $B_i^c$ is under construction. Block $B_i^c$ is the union of two blocks $B_{i-1}^u[j]$. We proceed as follows. As soon as two $B_i^u$'s are available, we start building a $B_{i+1}^c$ of size $2^{i+1}$ out of them. The work is spread over the next $2^{i+1}$ insertions, each time doing $P_S(2^{i+1})/2^{i+1}$ steps of the construction. When $B_{i+1}^c$ is finished it becomes a $B_{i+1}^u$ and the two $B_i^u$'s are discarded. We have to show that there will never be more than three non-empty $B_i^u$'s.
Lemma 1. When we complete a $B_i^c$ and turn it into a block in use there are at most two non-empty $B_i^u$'s.
Proof: Consider how blocks in $BA_i$ develop. Consider the moment, say after the $t$-th insertion, when $BA_i$ contains two $B_i^u$'s and we start building a $B_{i+1}^c$ out of them. The construction will be finished $2^{i+1}$ insertions later. Observe that $BA_i$ got
Version: 19.10.99 Time: 11:39 {6{
7.1.1. Dynamization 7
a second $B_i^u$ because the construction of $B_i^c$ was completed after the $t$-th insertion and hence $B_i^c$ was turned into a $B_i^u$. Thus $B_i^c$ was empty after insertion $t$ and it will take exactly $2^i$ insertions until it is full again and hence gives rise to a third $B_i^u$, and it will take another $2^i$ insertions until it gives rise to a fourth $B_i^u$. Exactly at this point of time the construction of $B_{i+1}^u$ is completed and hence two $B_i^u$'s are discarded. Thus we can start a new cycle with just two $B_i^u$'s completed.
It follows from Lemma 1 that there will never be more than three $B_i^u$'s and one $B_i^c$ for any $i$. Hence
$$S_D(n) = O(S_S(n)),$$
$$Q_D(n) = O(Q_S(n) \log n) \quad\text{and}$$
$$I_D(n) = \sum_{i=0}^{\lfloor \log n \rfloor} O(P_S(2^i)/2^i) = O\bigl((P_S(n)/n) \log n\bigr).$$
The remarks following Theorem 1 also apply to Theorem 2. The "logarithmic" dynamization method described above has a large similarity to the binary number system. The actions following the insertion of a point into a dynamic structure of $n$ elements are in complete analogy to adding a 1 to integer $n$ written in binary. The main difference is the cost of processing a carry. The cost is $P(2^k)$ for processing a carry from the $k$-th position in logarithmic dynamization, whilst it is $O(1)$ in processing integers. The analogy between logarithmic dynamization and the binary number system suggests that other number systems give rise to other dynamization methods. This is indeed the case. For example, for every $k$ one can uniquely write every integer $n$ as
$$n = \sum_{i=1}^{k} \binom{a_i}{i}$$
with $i - 1 \le a_i$ and $a_1 < a_2 < \cdots < a_k$ (Exercise 1). This representation gives rise to the $k$-binomial transformation. We represent a set $S$ of $n$ elements by $k$ static structures, the $i$-th structure holding $\binom{a_i}{i}$ elements. Then $Q_D(n) = O(Q_S(n) \cdot k)$ and $I_D(n) = O(k \cdot n^{1/k} \cdot P_S(n)/n)$ (Exercise 1). More generally we have
Theorem 3. Let $S$ be any static data structure for a decomposable searching problem and let $k : \mathbb{N} \to \mathbb{N}$ be any "smooth" function. Then there is a semi-dynamic data structure $D$ such that
a) if $k(n) = O(\log n)$ then
$Q_D(n) = O(k(n) \cdot Q_S(n))$,
$I_D(n) = O(k(n) \cdot n^{1/k(n)} \cdot P_S(n)/n)$.
b) if $k(n) = \Omega(\log n)$ then
$Q_D(n) = O(k(n) \cdot Q_S(n))$,
$I_D(n) = O\bigl((\log n / \log(k(n)/\log n)) \cdot P_S(n)/n\bigr)$.
Proof: The proof can be found in K. Mehlhorn, M.H. Overmars: "Optimal Dynamization of Decomposable Searching Problems", IPL 12 (1981), 93-98. The details on the definition of smoothness can be found there; functions like $\log n$, $\log\log n$, $\log\log\log n$, $n$, $(\log n)^2$ are smooth in the sense of Theorem 3. The proof is outlined in Exercise 2.
Let us look at some examples. Taking $k(n) = \log n$ gives the logarithmic transformation (note that $n^{1/\log n} = 2$), $k(n) = k$ yields an analogue to the $k$-binomial transformation, $k(n) = k \cdot n^{1/k}$ yields a transformation with $Q_D(n) = O(k \cdot n^{1/k} \cdot Q_S(n))$ and $I_D(n) = O(k \cdot P_S(n)/n)$, a dual to the $k$-binomial transformation, and $k(n) = (\log n)^2$ yields a transformation with $Q_D(n) = O((\log n)^2 \cdot Q_S(n))$ and $I_D(n) = O((\log n/\log\log n) \cdot P_S(n)/n)$. Again it is possible to turn amortized time bounds into worst case time bounds by the techniques used in the proof of Theorem 2. The interesting fact about Theorem 3 is that it describes exactly how far we can go by dynamization.
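The number representation behind the $k$-binomial transformation, $n = \sum_{i=1}^{k} \binom{a_i}{i}$ with $a_1 < a_2 < \cdots < a_k$, can be computed greedily from the highest position down. The Python sketch below is an illustration only; the function name is hypothetical and the greedy rule is the standard one for this kind of representation, not a procedure taken from the text.

```python
from math import comb
from typing import List


def k_binomial_representation(n: int, k: int) -> List[int]:
    """Return [a_1, ..., a_k] with n == sum(comb(a_i, i)) and a_1 < ... < a_k."""
    digits = []
    remaining = n
    for i in range(k, 0, -1):
        # comb(i - 1, i) == 0, so i - 1 is the smallest admissible value of a_i.
        a = i - 1
        # Take the largest a with comb(a, i) <= remaining.
        while comb(a + 1, i) <= remaining:
            a += 1
        digits.append(a)
        remaining -= comb(a, i)
    digits.reverse()  # digits were produced as a_k, ..., a_1
    return digits
```

For instance, `k_binomial_representation(10, 3)` returns `[0, 1, 5]`, i.e. $10 = \binom{0}{1} + \binom{1}{2} + \binom{5}{3}$; the sizes $\binom{a_i}{i}$ are the sizes of the $k$ static structures.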
Theorem 4 states that there is no way to considerably improve upon the results of Theorem 3. There is no way to decrease the order of the query penalty factor ($= Q_D(n)/Q_S(n)$) without simultaneously increasing the order of the update penalty factor ($= I_D(n) \cdot n/P_S(n)$) and vice versa. Thus all combinations of query and update penalty factor described in Theorem 3 are optimal. Moreover, all optimal transformations can be obtained by an application of Theorem 3.
Turning static into semi-dynamic data structures is completely solved by Theorems 1 to 4. How about deletions? Let us consider the case of the sorted array first. At first sight deletions from sorted arrays are very costly. After all, we might have to shift a large part of the array after a deletion. However, we can do a "weak" deletion very quickly. Just mark the deleted elements and search as usual. As long as only a few, let's say no more than 1/2 of the elements are deleted, search time is still logarithmic in the number of remaining elements. This leads to the following definition.
Definition: A decomposable searching problem together with its static structure $S$ is deletion decomposable iff, whenever $S$ contains $n$ points, a point can be deleted from $S$ in time $D_S(n)$ without increasing the query time, deletion time and storage required for $S$.
We assume that $D_S(n)$ is non-decreasing. The Member problem with static structure sorted array is deletion decomposable with $D_S(n) = \log n$, i.e., we can delete an arbitrary number of elements from a sorted array of length $n$ and still keep query and deletion time at $\log n$. Of course, if we delete most elements then $\log n$ may be arbitrarily large as a function of the actual number of elements stored.
Theorem 5. Let searching problem $Q$ together with static structure $S$ be deletion decomposable. Then there is a dynamic structure $D$ with
$Q_D(n) = O(\log n \cdot Q_S(8n))$,
$S_D(n) = O(S_S(8n))$,
$I_D(n) = O(\log n \cdot P_S(n)/n)$,
$D_D(n) = O(P_S(n)/n + D_S(n) + \log n)$.
Proof: The proof is a refinement of the construction used in the proof of Theorem 1. Again we represent a set $S$ of $n$ elements by a partition $B_0, B_1, B_2, \ldots$. We somewhat relax the condition on the size of blocks $B_i$; namely, a $B_i$ is either empty or $2^{i-3} < |B_i| \le 2^i$. Here $|B_i|$ denotes the actual number of elements in block $B_i$. $B_i$ may be stored in a static data structure which was originally constructed for more points but never more than $2^i$ points. In addition, we store all points of $S$ in a balanced tree $T$. In this tree we store along with every element a pointer to the block $B_i$ containing the element. This will be useful for deletions. We also link all elements belonging to $B_i$, $i \ge 0$, in a linear list.
Since $|B_i| > 2^{i-3}$ there are never more than $\log n + 3$ non-empty blocks. Also since the structure containing $B_i$ might have been constructed for a set eight times the size we have
$$Q_D(n) \le Q_S(8n)\,(\log n + 3) = O(Q_S(8n) \log n).$$
Also
$$S_D(n) \le \sum_i S_S(8|B_i|) = \sum_i \bigl(S_S(8|B_i|)/(8|B_i|)\bigr)\, 8|B_i| \le \sum_i \bigl(S_S(8n)/(8n)\bigr)\, 8|B_i| = S_S(8n).$$
It remains to describe the algorithms for insertion and deletion. We need two definitions first. A non-empty block $B_i$ is deletion-safe if $|B_i| \ge 2^{i-2}$ and it is safe if $2^{i-2} \le |B_i| \le 2^{i-1}$.
Insertions are processed as follows. After an insertion of a new point $x$ we find the least $k$ such that $1 + |B_0| + \cdots + |B_k| \le 2^k$. We build a new static data structure $B_k$ for $\{x\} \cup B_0 \cup \cdots \cup B_k$ in time $P_S(2^k)$ and discard the old structures for blocks $B_0, \ldots, B_k$. In addition we have to update the dictionary for a cost of $O(\log n + 2^k)$: $\log n$ for inserting the new point and $2^k$ for updating the information associated with the points in the new $B_k$. Note that time $O(1)$ per element suffices if we chain all elements which belong to the same block in a linked list.
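Lemma 2. Insertions build only deletion-safe structures.
Proof: This is obvious if $k = 0$. If $k > 0$ then $1 + |B_0| + \cdots + |B_{k-1}| > 2^{k-1}$ by the minimality of $k$, so the new block $B_k$ contains more than $2^{k-1} \ge 2^{k-2}$ points and is deletion-safe.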
Lemma 4. $D_D(n) = O(P_S(m)/m + D_S(m) + \log m)$, where $m$ is the maximal size of set $S$ during the first $n$ updates.
Proof: By Lemmas 2 and 3 only deletion-safe blocks are built. Hence at least $2^{i-3}$ points have to be deleted from a block $B_i$ before it causes restructuring after a deletion. Hence the cost for restructuring is at most $8\,P_S(m)/m$ per deletion. In addition, $\log m$ time units are required to update the dictionary and $D_S(m)$ time units to actually perform the deletion.
Proof: The proof combines all methods described in this section so far. It can be found in M.H. Overmars, J. van Leeuwen: "Worst Case Optimal Insertion and Deletion Methods for Decomposable Searching Problems", IPL 12 (1981), 168-173.
In this section we describe weighting and then combine it with dynamization described in the previous section. This will give us dynamic weighted data structures for a large class of searching problems.
Again, there are plenty of examples. Member is monotone decomposable with $q$ the identity and $\sqcup = \mathrm{or}$. $\varepsilon$-diameter search, i.e., $Q((x, \varepsilon), S) = \mathrm{true}$ if $\exists y \in S : \delta(x, y) \le \varepsilon$, is monotone decomposable with $q$ the identity and $\sqcup = \mathrm{or}$. Also orthogonal range searching is monotone decomposable. Here $T_2 = \mathbb{R}^2$, $T_1$ = all rectangles with sides parallel to the axes and $Q(R, S) = (|R \cap S| \ge 1)$.
A query $Q(x, S)$ is successful if there is a $y \in S$ such that $q(Q(x, \{y\}))$. If $Q(x, S)$ is successful then any $y \in S$ with $q(Q(x, \{y\}))$ is called a witness for $x$ (with respect to $S$). If $y$ is a witness for $x$ then $Q(x, S) = Q(x, \{y\} \cup (S - \{y\})) =$ if $q(Q(x, \{y\}))$ then $Q(x, \{y\})$ else $\ldots$ fi $= Q(x, \{y\})$.
Weighting is restricted to successful searches (but cf. Exercise 6). Let $S = \{y_1, \ldots, y_n\} \subseteq T_2$ and let $\mu$ be a probability distribution on $Suc = \{x \in T_1;\ Q(x, S)$ is successful$\}$. We define a reordering $\pi$ of $S$ and a discrete probability distribution $p_1, \ldots, p_n$ on $S$ as follows. Suppose that $\pi(1), \ldots, \pi(k-1)$ and $p_1, \ldots, p_{k-1}$ are already defined. For $y_j \in S - \{y_{\pi(1)}, \ldots, y_{\pi(k-1)}\}$ let $p(y_j) = \mu\{x \in Suc;\ y_j$ is a witness for $x$ but none of $y_{\pi(1)}, \ldots, y_{\pi(k-1)}$ is$\}$. Define $\pi(k) = j$ such that $p(y_j)$ is maximal and let $p_k = p(y_{\pi(k)})$. Then $p_1 \ge p_2 \ge \cdots \ge p_n$. We assume from now on that $S$ is reordered such that $\pi$ is the identity. Then $p_k$ is the probability that $y_k$ is a witness in a successful search but none of $y_1, \ldots, y_{k-1}$ is.
7.1.2. Weighting and Weighted Dynamization 13
Theorem 7. Let $Q$ be any monotone decomposable searching problem and suppose that we have a static data structure with query time $Q_S(n)$, $Q_S(n)$ non-decreasing, for $Q$. Let $S = \{y_1, \ldots, y_n\} \subseteq T_2$, and let $\mu, p_1, \ldots, p_n$ be defined as above. Then there is a weighted data structure $W$ for $Q$ where the expected time of a successful search is at most
$$4 \sum_i p_i\, Q_S(i) \le 4 \sum_i p_i\, Q_S(1/p_i).$$
i := 0;
repeat i := i + 1 until Q(x, B_i) is successful;
output Q(x, B_i).
Program 1
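Program 1 can be sketched in Python for the sorted-array example. The sketch rests on assumptions that the surviving text does not spell out: the blocks are taken to be $B_i = \{y_j;\ f(i-1) < j \le f(i)\}$ with $f(0) = 0$ and $f(i) = 2^{2^i}$ (so that binary search on $f(i)$ elements costs $2^i$), the input is assumed to be already ordered by non-increasing probability, and the function names are hypothetical. Unlike Program 1, the loop also terminates on unsuccessful searches.

```python
import bisect
from typing import List, Sequence


def build_weighted_blocks(keys_by_rank: Sequence[int]) -> List[List[int]]:
    """Split keys (ordered by non-increasing access probability) into blocks.

    Block i (1-based) holds the keys of rank f(i-1)+1 .. f(i) with the assumed
    boundaries f(0) = 0 and f(i) = 2**(2**i); each block is a sorted array.
    """
    blocks: List[List[int]] = []
    start, i = 0, 1
    while start < len(keys_by_rank):
        end = min(2 ** (2 ** i), len(keys_by_rank))
        blocks.append(sorted(keys_by_rank[start:end]))
        start, i = end, i + 1
    return blocks


def weighted_member(blocks: List[List[int]], x: int) -> bool:
    """Probe B_1, B_2, ... in turn and stop at the first successful query."""
    for block in blocks:
        j = bisect.bisect_left(block, x)
        if j < len(block) and block[j] == x:
            return True
    return False  # unsuccessful search: every block was probed
```

With 20 keys the blocks have sizes 4, 12 and 4, so the four most probable keys are found with a single binary search over four elements.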
The correctness of this algorithm is immediate from the definition of monotone decomposability. It remains to compute the expected query time of a successful query. Let $Suc_j = \{x \in T_1;\ y_j$ is a witness for $x$ but $y_1, \ldots, y_{j-1}$ are not$\}$. Then $p_j = \mu(Suc_j)$. The cost of a query $Q(x, S)$ for $x \in Suc_j$ and $f(i-1) < j \le f(i)$ is
$$\sum_{h=1}^{i} Q_S(f(h) - f(h-1)) \le \sum_{h=1}^{i} Q_S(f(h)) = \sum_{h=1}^{i} 2^h \le 4 \cdot 2^{i-1} = 4\,Q_S(f(i-1)) \le 4\,Q_S(j).$$
Thus the expected cost of a successful query is
$$4 \sum_j p_j\, Q_S(j) \le 4 \sum_j p_j\, Q_S(1/p_j).$$
This bound relates quite nicely with the bounds derived in Section 3.4 on weighted trees. There we derived a bound of $\sum_i p_i \log 1/p_i + 1$ on the expected search time in weighted trees. Thus the bound derived by weighting is about four times the entire truth. The bound of $4 \log i$ derived now on individual searches can sometimes be considerably better than the bound of $\log 1/p_i$ derived in 3.4 (cf. Exercise 3).
Binary search is not the only method for searching sorted arrays. If the keys are drawn from a uniform distribution then interpolation search is a method with $O(\log\log n)$ expected query time. If the weights of keys are independent of key values then every block $B_i$ is a random sample drawn from a uniform distribution and hence the expected time of a successful search is
$$O\Bigl(\sum_i p_i \log\log i\Bigr) = O\Bigl(\sum_i p_i \log\log 1/p_i\Bigr).$$
Here $\log w(S)/w(y)$ is the cost of searching for $y$ in tree $T$ and $\log f(h) + U(S(h))$ is the cost of inserting and deleting an element from a structure of size $h$ and updating the balanced tree which holds the weights. Observing $U(n) \ge \log n$ for all $n$, $i \le \lceil f^{-1}(w(S)/w(y)) \rceil$ and $U_D(\lceil f^{-1}(w(S)/w(y)) \rceil) \ge U_D(w(S)/w(y)) \ge \log w(S)/w(y)$ this bound simplifies to
$$O\Bigl(\sum_{0 \le h \le \lceil f^{-1}(w(S)/w(y)) \rceil} U_D(f(h))\Bigr).$$
The algorithm for Demote$(y, a)$ is completely symmetric. The details are left for the reader (Exercise 5). We obtain exactly the same running time as for Promote, except for the fact that $w(y)$ has to be replaced by the new weight $w(y) - a$.
Theorem 8. A static data structure with query time $Q_S$, preprocessing time $P_S$, and weak deletion time $D_S$ for a monotone deletion decomposable searching problem can be extended to a dynamic weighted data structure $W$ such that:
a) A query in weighted set $S$ (with weight function $w : S \to \mathbb{N}$) with witness $y \in S$ takes time $O(Q_D(w(S)/w(y)))$. Here $Q_D(n) = Q_S(n) \log n$.
b) Promote$(y, a)$ takes time
$$O\Bigl(\sum_{0 \le h \le \lceil f^{-1}(w(S)/w(y)) \rceil} U_D(f(h))\Bigr).$$
$f(n) = 2^{\cdots}$. A query for $y$ with weight $w(y)$ takes time $O((\log w(S)/w(y))^2)$, the square of the search time in weighted dynamic trees. Also Promote$(y, a)$ takes time $U_D(f(\lceil f^{-1}(w(S)/w(y)) \rceil)) = O(\log w(S)/w(y))$ and Demote$(y, a)$ takes time $O(\log w(S)/(w(y) - a))$. This is the same order as in weighted dynamic trees. Of course, weighted dynamic trees are part of the data structure $W$ considered here. Again (cf. Theorem 5) this is not a serious objection. Since Member is the query considered here we can replace the use of weighted dynamic trees by a use of the data structure itself. This will square the time bounds for Promote and Demote. Also binary search in sorted arrays is not a very important application of weighted dynamization. In the important applications in this chapter the use of a weighted, dynamic dictionary is negligible with respect to the complexity of the data structure itself.
Dynamization and weighting are powerful techniques. They provide reasonably efficient dynamic weighted data structures very quickly which can then be used as a reference point for more special developments. Tuning to the special case under consideration is always necessary, as weighting and dynamization tend to produce somewhat clumsy solutions if applied blindly.
7.1.3. Order Decomposable Problems
Proof: The proof is a straightforward application of divide and conquer. We first sort $S$ in time $\mathrm{Sort}(|S|)$ and store $S$ in sorted order in an array. This will allow us to split $S$ in constant time. Next we either compute $P(S)$ directly in constant time if $|S| = 1$ or we split $S$ into sets $A$ and $B$ of size $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$ respectively in constant time (if $S = \{a_1 < a_2 < \cdots < a_n\}$ then $A = \{a_1, \ldots, a_{\lfloor n/2 \rfloor}\}$ and $B = \{a_{\lfloor n/2 \rfloor + 1}, \ldots, a_n\}$), compute $P(A)$ and $P(B)$ in time $T(\lfloor n/2 \rfloor)$ and $T(\lceil n/2 \rceil)$ respectively by applying the algorithm recursively, and then compute $P(S) = \sqcup(P(A), P(B))$ in time $C(n)$. Hence $T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(1) + C(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + O(C(n))$.
Recurrence $T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + C(n)$ is easily solved for most $C$ (cf. 2.1.3). In particular, $T(n) = O(n)$ if $C(n) = O(n^{\varepsilon})$ for some $\varepsilon < 1$, $T(n) = O(C(n))$ if $C(n) = \Theta(n^{1+\varepsilon})$ for some $\varepsilon > 0$, and $T(n) = O(C(n) (\log n)^{k+1})$ if $C(n) = \Theta(n (\log n)^k)$ for some $k \ge 0$.
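The divide-and-conquer scheme of the proof can be sketched generically in Python. The driver below sorts once and then splits by index, so each split takes constant time. The example problem (largest gap between consecutive elements of a set of numbers) and all names are illustrative assumptions, chosen because its merge step takes constant time, so $C(n) = O(1)$.

```python
from typing import Callable, Sequence, Tuple, TypeVar

R = TypeVar("R")


def solve_order_decomposable(
    points: Sequence[float],
    base: Callable[[float], R],
    combine: Callable[[R, R], R],
) -> R:
    """Compute P(S) for a non-empty S by sorting once and merging halves."""
    ordered = sorted(points)  # the Sort(|S|) term

    def solve(lo: int, hi: int) -> R:
        # Works on the half-open index range [lo, hi); splitting costs O(1).
        if hi - lo == 1:
            return base(ordered[lo])
        mid = lo + (hi - lo) // 2  # left part gets floor(n/2) elements
        return combine(solve(lo, mid), solve(mid, hi))

    return solve(0, len(ordered))


Gap = Tuple[float, float, float]  # (minimum, maximum, largest gap)


def gap_base(a: float) -> Gap:
    return (a, a, 0)


def gap_combine(left: Gap, right: Gap) -> Gap:
    # Every element on the left precedes every element on the right.
    return (left[0], right[1], max(left[2], right[2], right[0] - left[1]))
```

For example, `solve_order_decomposable([7, 1, 4, 12, 5], gap_base, gap_combine)` returns `(1, 12, 5)`: the sorted set is 1, 4, 5, 7, 12 and the largest gap is 12 - 7 = 5.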
The proof of Theorem 9 reflects the close relation between order decomposable problems and divide and conquer. A non-recursive view of divide and conquer is to take any binary tree with $|S|$ leaves, to write the elements of $S$ into the leaves (in sorted order), to solve the basic problems in the leaves and then to use operator $\sqcup$ to compute $P$ for larger subsets of $S$. What tree should we use? A complete binary tree will give us the most efficient algorithm, but any reasonably weight-balanced tree will not be much worse. If we want to support insertions and deletions this is exactly what we should do. So let $D$ be a BB[$\alpha$]-tree with $|S|$ leaves for some $\alpha$. (Exercise 9 shows that we cannot obtain the same efficiency by using $(a, b)$-trees, or AVL-trees, or $\ldots$). We store the elements of $S$ in sorted order (according to $<$) in the leaves of $D$ and use $D$ as a search tree for $S$. What should we store in the internal nodes of $D$ beside the search tree information? A first idea is to store $P(S(v))$ in node $v$ where $S(v)$ is the set stored in the leaves below $v$. $P(S(v))$ is easily computed bottom-up starting at the leaves and working towards the root. Not quite: if $v$ has sons $x$, $y$ and we compute $P(S(v)) = \sqcup(P(S(x)), P(S(y)))$ then application of $\sqcup$ will in general destroy (the representation of) $P(S(x))$ and $P(S(y))$. Making a copy of $P(S(x))$ and $P(S(y))$ before applying $\sqcup$ might cost a lot more than $C(|S(v)|)$ and is therefore excluded. A different strategy is called for.
We store $P(S(r))$ only in the root $r$. In internal nodes $v \ne r$ we store two things. First, the sequence $a(v)$ of actions executed to compute $\sqcup$ applied to $P(S(x))$ and $P(S(y))$, where $x$ and $y$ are the sons of $v$. This sequence has length $O(C(n))$. Second, the piece $P^*(S(v))$ which is left over from $P(S(v))$ when $P(S(\mathrm{father}(v)))$ is computed by applying $\sqcup$ to $P(S(v))$ and $P(S(\mathrm{brother}(v)))$. We call tree $D$ augmented by this additional information an augmented tree.
Lemma 6. An augmented tree $D$ for set $S$ has space requirement $T(|S|)$ and can be constructed in time $\mathrm{Sort}(|S|) + T(|S|)$ where
$$T(n) = \max_{\alpha \le \beta \le 1 - \alpha} \bigl[T(\beta n) + T((1 - \beta) n) + O(C(n))\bigr].$$
Proof: The recursion for $T(n)$ follows from the fact that $\alpha \le |S(x)|/|S(v)| \le 1 - \alpha$ for any node $v$ with sons $x$, $y$ in a BB[$\alpha$]-tree $D$. The space bound follows since at most $t$ storage cells can be used in $t$ time units for any $t$.
The remark following Theorem 9 also applies to Lemma 6. In particular, $T(n) = O(n)$ if $C(n) = O(n^{\varepsilon})$ for some $\varepsilon < 1$, and $\ldots$. The space bound stated in Lemma 6 is usually overly pessimistic. One does not use a new storage cell every time unit in general.
We will next describe how to insert into and delete from an augmented tree. We describe insertion in detail and leave deletion for the reader, deletion being very similar to insertion. Let $a$ be a new point which we want to insert in $S$. Let $D$ be an augmented tree for $S$. We first use $D$ as a search tree. This will outline a path $p$ down tree $D$. Let $p = v_0, v_1, \ldots, v_k$ with $v_0$ being the root. We walk down this path and reconstruct the $P(S(v_i))$'s as we walk down. More precisely, we start in root $v_0$ with $P(S(v_0))$ in our hands and use the sequence of actions $a(v_0)$ stored in $v_0$ and the leftover pieces $P^*(S(v_1))$ and $P^*(S(\mathrm{brother}(v_1)))$ stored in $v_1$ and its brother to reconstruct $P(S(v_1))$ and $P(S(\mathrm{brother}(v_1)))$ by running $a(v_0)$ backwards. This will take time $O(C(|S(v_0)|))$. Next we repeat this process with $v_1, \ldots, v_k$. At the end we have reconstructed $P(S(\mathrm{brother}(v_i)))$, $1 \le i \le k$, and $P(S(v_k))$.
Lemma 7. Let $D$ be an augmented tree for $S$, $|S| = n$, and let $p = v_0, \ldots, v_k$ be a path from the root $v_0$ to a leaf. Then $P(S(\mathrm{brother}(v_i)))$, $1 \le i \le k$, and $P(S(v_k))$ can be reconstructed in time $O(C(n) \log n)$.
Proof: The algorithm outlined above has running time
$$\sum_i C(|S(v_i)|) \le \sum_i C(n (1 - \alpha)^i) \le \sum_i C(n) = O(C(n) \log n)$$
since the depth of the tree is $O(\log n)$ and $|S(v_i)| \le n (1 - \alpha)^i$.
If $C(n) = \Theta(n^{\varepsilon})$ for some $\varepsilon > 0$ then $\sum_i C(n (1 - \alpha)^i) = n^{\varepsilon} \sum_i (1 - \alpha)^{\varepsilon i} = O(n^{\varepsilon}) = O(C(n))$. In $(a, b)$-trees this improved claim is not true in general, i.e., there are $(a, b)$-trees where reconstruction along a path has cost of order $n^{\varepsilon} \log n$ if $C(n) = \Theta(n^{\varepsilon})$ (cf. Exercise 9).
The remainder of the insertion algorithm is now almost routine. We insert the new point $a$, walk back to the root and merge the $P$'s as we go along. More precisely, we first compute $P(a)$, then merge it with $P(S(v_k))$, then with $P(S(\mathrm{brother}(v_k)))$, $\ldots$. The time bound derived in Lemma 7 applies again except that we forgot about rotations and double rotations.
Figure 2. A rotation at node $v_i$ with son $v_{i+1}$; $D_1$, $D_2$ and $D_3$ are the subtrees involved.
Suppose that we have to rotate at node $v_i$ and assume that $v_{i+1}$ is the root of subtree $D_1$. As we walk back to the root we have already computed $P(S(v_{i+1}))$. Also $P(S(\mathrm{brother}(v_{i+1})))$ is available from the top-down pass. We reverse the construction at $\mathrm{brother}(v_{i+1})$ and thus compute $P$ for the relevant nodes after the rotation. Double rotations are treated similarly; the details are left to the reader. Also it is obvious that the time bound derived in Lemma 7 still applies, because rotations and double rotations at most require extending the reconstruction process to a constant vicinity of the search path. We summarize in:
Proof: By the discussion above. The time bound follows from Lemma 7 and the remark following it.
In the convex hull problem we have $C(n) = O(\log n)$. Thus we can maintain convex hulls under insertions and deletions with time bound $O((\log n)^2)$ per update. More examples of order decomposable problems are discussed in Exercises 10-20.
We start with d-dimensional trees and show that they support partial match retrieval and orthogonal range queries with rootic search time. However, they do not do well for arbitrary polygon queries. A discussion of why they fail for polygon retrieval leads to polygon trees.
d-dimensional trees are a straightforward, yet powerful extension of one-dimensional trees. At every level of a dd-tree we split the set according to one of the coordinates. Fairness demands that we use the different coordinates with the same frequency; this is most easily achieved if we go through the coordinates in cyclic order.
Definition: Let $S \subseteq U_0 \times \cdots \times U_{d-1}$, $|S| = n$. A dd-tree for $S$ (starting at coordinate $i$) is defined as follows:
1) If $d = n = 1$ then it consists of a single leaf labeled by the unique element $x \in S$.
2) If $d > 1$ or $n > 1$ then it consists of a root labeled by some element $d_i \in U_i$ and three subtrees $T_<$, $T_=$ and $T_>$. Here $T_<$ is a dd-tree starting at coordinate $(i + 1) \bmod d$ for set $S_< = \{x \in S;\ x = (x_0, \ldots, x_{d-1})$ and $x_i < d_i\}$, $T_>$ is a dd-tree starting at coordinate $(i + 1) \bmod d$ for set $S_> = \{x \in S;\ x = (x_0, \ldots, x_{d-1})$ and $x_i > d_i\}$, and $T_=$ is a $(d-1)$-dimensional tree starting at coordinate $i \bmod (d - 1)$ for set $S_= = \{(x_0, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{d-1});\ x = (x_0, \ldots, x_{i-1}, d_i, x_{i+1}, \ldots, x_{d-1}) \in S\}$.
Figure 3 shows a 2d-tree for set $S = \{(1, \mathrm{II}), (1, \mathrm{III}), (2, \mathrm{I}), (2, \mathrm{III}), (3, \mathrm{I}), (3, \mathrm{II})\}$ starting at coordinate 0. Here $U_0 = U_1 = \{1, 2, 3\}$. Arabic and roman numerals are used to distinguish coordinates.
It is very helpful to visualize 2d-trees as subdivisions of the plane. The root node splits the plane by vertical line $x_0 = 2$ into three parts: left halfplane, right halfplane and the line itself. The left son of the root then splits the left halfplane by horizontal line $x_1 = 2$, $\ldots$.
7.2.1. D-dimensional Trees and Polygon Trees 23
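Figure 3. A 2d-tree for $S$. The root is labeled 2. Its $<$-son is labeled II and has $=$-leaf $(1, \mathrm{II})$ and $>$-leaf $(1, \mathrm{III})$; its $=$-son is labeled II and has $<$-leaf $(2, \mathrm{I})$ and $>$-leaf $(2, \mathrm{III})$; its $>$-son is labeled I and has $=$-leaf $(3, \mathrm{I})$ and $>$-leaf $(3, \mathrm{II})$.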
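Figure 4. The subdivision of the plane corresponding to the 2d-tree of Figure 3: the vertical line $x_0 = 2$, the horizontal lines $x_1 = \mathrm{II}$ and $x_1 = \mathrm{I}$, and the points $(1, \mathrm{II})$, $(1, \mathrm{III})$, $(2, \mathrm{I})$, $(2, \mathrm{III})$, $(3, \mathrm{I})$, $(3, \mathrm{II})$.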
The three sons of a node $v$ in a dd-tree do not all have the same quality. The root of $T_=$ (the son via the $=$-pointer) represents a set of one smaller dimension. In general we will not be able to bound the size of this set. The roots of $T_<$ and $T_>$ (the sons via the $<$-pointer and the $>$-pointer) represent sets of the same dimension but generally smaller size. Thus every edge of a dd-tree reduces the complexity of the set represented: either in dimension or in size. In 1d-trees, i.e., ordinary search trees, only reductions in size are required.
It is clear how to perform exact match queries in dd-trees. Start at the root, compare the search key with the value stored in the node and follow the correct pointer. Running time is proportional to the height of the tree. Our first task is therefore to derive bounds on the height of dd-trees.
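The definition and the exact match procedure can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not taken from the text: coordinates are integers (roman numerals are mapped to 1, 2, 3), every one-element set becomes a leaf (as drawn in Figure 3), the split value is a median found by sorting rather than by a linear time algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, ...]


@dataclass
class Node:
    split: Optional[int] = None    # label d_i of an inner node; None for a leaf
    point: Optional[Point] = None  # full point stored in a leaf
    lt: Optional["Node"] = None    # T_<
    eq: Optional["Node"] = None    # T_=
    gt: Optional["Node"] = None    # T_>


def _build(items: List[Tuple[Point, Point]], i: int) -> Optional[Node]:
    """items pairs the coordinates still to be split on with the full point."""
    if not items:
        return None
    if len(items) == 1:
        return Node(point=items[0][1])
    d = len(items[0][0])
    values = sorted(key[i] for key, _ in items)
    m = values[len(values) // 2]  # a median: each proper son gets at most n/2 points
    lt = [(k, p) for k, p in items if k[i] < m]
    gt = [(k, p) for k, p in items if k[i] > m]
    eq = [(k[:i] + k[i + 1:], p) for k, p in items if k[i] == m]  # drop coordinate i
    nxt = (i + 1) % d
    eq_start = i % (d - 1) if d > 1 else 0
    return Node(split=m, lt=_build(lt, nxt), eq=_build(eq, eq_start), gt=_build(gt, nxt))


def build_dd_tree(points: Iterable[Point]) -> Optional[Node]:
    distinct = {tuple(p) for p in points}
    return _build([(p, p) for p in distinct], 0)


def exact_match(root: Optional[Node], x: Point) -> bool:
    node, key, i = root, tuple(x), 0
    while node is not None and node.split is not None:
        d = len(key)
        if key[i] < node.split:
            node, i = node.lt, (i + 1) % d
        elif key[i] > node.split:
            node, i = node.gt, (i + 1) % d
        else:
            key = key[:i] + key[i + 1:]  # one dimension less below an =-pointer
            node, i = node.eq, (i % (d - 1) if d > 1 else 0)
    return node is not None and node.point == tuple(x)
```

For the set of Figure 3 the root receives label 2, as in the figure; the labels further down may differ from the figure because any median is an admissible split value.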
Definition:
a) Let $T$ be a dd-tree and let $v$ be a node of $T$. Then $S(v)$ is the set of leaves in the subtree with root $v$, $d(v)$ is the depth of node $v$, and $sd(v)$, the number of $<$-pointers and $>$-pointers on the path from the root to $v$, is the strong depth of $v$. Node $x$ is a proper son of node $v$ if it is a son via a $<$- or $>$-pointer.
b) A dd-tree is ideal if $|S(x)| \le |S(v)|/2$ for every node $v$ and all proper sons $x$ of $v$.
Ideal dd-trees are a generalization of perfectly balanced 1d-trees.
Lemma 1. Let $T$ be an ideal dd-tree for set $S$, $|S| = n$.
a) $d(v) \le d + \log n$ for every node $v$ of $T$.
b) $sd(v) \le \log n$ for every node $v$ of $T$.
Proof: a) follows from b) and the fact that at most $d$ $=$-pointers can be on the path to any node $v$. Part b) is immediate from the definition of ideal tree.
Theorem 1. Let $S \subseteq U = U_0 \times \cdots \times U_{d-1}$, $|S| = n$.
a) An exact match query in an ideal dd-tree for $S$ takes time $O(d + \log n)$.
b) An ideal dd-tree for $S$ can be constructed in time $O(n (d + \log n))$.
Proof : a) Immediate from Lemma 1, a).
b) We describe a procedure which constructs ideal dd-trees in time $O(n(d + \log n))$. Let $S_0 = \{x_0;\ (x_0, \ldots, x_{d-1}) \in S\}$ be the multiset of 0-th coordinates of $S$. We use the linear time median algorithm of 2.4 to find the median $d_0$ of $S_0$; $d_0$ will be the label of the root. Then clearly $|S_<| \le |S|/2$ and $|S_>| \le |S|/2$ where $S_< = \{x \in S;\ x_0 < d_0\}$ and $S_> = \{x \in S;\ x_0 > d_0\}$. We use the same algorithm recursively to construct dd-trees for $S_<$ and $S_>$ (starting at coordinate 1) and a $(d-1)$-dimensional tree for $S_=$. This algorithm will clearly construct an ideal dd-tree $T$ for $S$. The bound on the running time can be seen as follows. In every node $v$ of $T$ we spend $O(|S(v)|)$ steps to compute the median of a set of size $|S(v)|$. Furthermore, $S(v) \cap S(w) = \emptyset$ if $v$ and $w$ are nodes of the same depth and hence
$$\sum_{d(v) = k} |S(v)| \le n$$
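The construction in the proof of Theorem 1 b) and the exact match search can be sketched as follows. This is a minimal illustration, not the text's implementation: the names `Leaf`, `Node`, `build`, `member` and `strong_depth` are hypothetical, points are assumed to be distinct tuples, and the median is found by sorting (`statistics.median_low`) rather than by the linear time algorithm, so the sketch does not achieve the stated construction time.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import Optional, Tuple, Union

Point = Tuple[int, ...]

@dataclass
class Leaf:
    point: Point

@dataclass
class Node:
    coord: int                 # coordinate this node splits on
    key: int                   # median value stored in the node
    lt: Optional["Tree"]       # son via the <-pointer
    eq: Optional["Tree"]       # son via the =-pointer
    gt: Optional["Tree"]       # son via the >-pointer

Tree = Union[Leaf, Node]

def build(points, coords, pos=0):
    """Build a tree for distinct `points`, splitting on coords[pos] first."""
    if not points:
        return None
    if len(points) == 1:
        return Leaf(points[0])
    c = coords[pos]
    key = median_low(p[c] for p in points)      # at most half strictly below/above
    rest = coords[:pos] + coords[pos + 1:]      # the =-subtree drops coordinate c
    nxt = (pos + 1) % len(coords)               # proper sons use the next coordinate
    return Node(
        c, key,
        build([p for p in points if p[c] < key], coords, nxt),
        build([p for p in points if p[c] == key], rest, pos % max(len(rest), 1)),
        build([p for p in points if p[c] > key], coords, nxt),
    )

def member(tree, x):
    """Exact match query: follow one pointer per node."""
    while isinstance(tree, Node):
        if x[tree.coord] < tree.key:
            tree = tree.lt
        elif x[tree.coord] > tree.key:
            tree = tree.gt
        else:
            tree = tree.eq
    return tree is not None and tree.point == x

def strong_depth(tree):
    """Largest number of <- and >-pointers on a root-to-leaf path."""
    if tree is None:
        return -1
    if isinstance(tree, Leaf):
        return 0
    return max(1 + strong_depth(tree.lt), strong_depth(tree.eq),
               1 + strong_depth(tree.gt))
```

A call such as `build(points, (0, 1, 2))` builds a tree for 3-dimensional points; because each proper son receives at most half of the points, `strong_depth` stays within $\log n$, in line with Lemma 1 b).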
$(n^{0.706})$ in weight-balanced dd-trees, $\alpha = 1/4$ (Exercise 14). Thus weight-balanced dd-trees are only useful for exact match queries.
A second problem is that weight-balanced dd-trees are hard to rebalance. Rotations are of no use since splitting is done with respect to different coordinates on different levels. Thus it is impossible to change the depth of a node as rotations do. There is a way out. Suppose that we followed path $p = v_0, v_1, \ldots$ to insert point $x$. Let $i$ be minimal such that $v_i$ goes out of balance by the insertion. Then rebalance the tree by replacing the subtree rooted at $v_i$ by an ideal tree for set $S(v_i)$. This ideal tree can be constructed in time $O(m(d + \log m))$ where $m = |S(v_i)|$. Thus rebalancing is apparently not as simple and cheap as in one-dimensional trees. The worst case cost for rebalancing after an insertion is clearly $O(n(d + \log n))$ since we might have to rebuild the entire tree. However, amortized time bounds are much better as we will sketch. We use techniques developed in 3.5.1 (in particular in the proof of Theorem 4). We showed there (Lemmas 2 and 3 in the proof of Theorem 4) that the total number of rebalancing operations caused by nodes $v$ with $1/(1-\alpha)^i \le |S(v)| \le 1/(1-\alpha)^{i+1}$ during the first $n$ insertions (and deletions) is $O(n(1-\alpha)^i)$. A rebalancing operation caused by such a node has cost $O((1-\alpha)^{-i}(d + i))$ in weight-balanced dd-trees. Hence the total cost of restructuring a weight-balanced dd-tree during a sequence of $n$ insertions and deletions is
$$\sum_{0 \le i \le O(\log n)} O\big(n(1-\alpha)^i (1-\alpha)^{-i}(d + i)\big) = O(n \log n\,(d + \log n)).$$
Thus the amortized cost of an insertion or deletion is $O(\log n\,(d + \log n))$. The details of this argument are left for Exercise 13.
Dynamization (cf. 7.1) also gives us dynamic dd-trees with $O((d + \log n)\log n)$ insertion and deletion time. Query time for exact match queries is $O((d + \log n)\log n)$ which is not quite as good as for weight-balanced dd-trees. However, dynamization has one major advantage. The time bounds for partial match and orthogonal range queries (Theorems 2, 3 and 4 below) stay true for dynamic dd-trees.
It is about time that we move to partial match queries. Let $R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ with $l_i = h_i$ or $l_i = -\infty$, $h_i = +\infty$ be a partial match query. If $l_i = h_i$ then the $i$-th coordinate is called specified. We use $s$ to denote the number of specified coordinates. The algorithm for partial match queries is an extension of the exact match algorithm. As always we start searching in the root. Suppose that the search reached node $v$. Suppose further that we split according to the $i$-th coordinate in $v$ and that key $d_i$ is stored in $v$. If the $i$-th coordinate is specified in query $R$, then the search proceeds to exactly one son of $v$, namely the son via the <-pointer if $l_i = h_i < d_i$, the son via the =-pointer if $l_i = h_i = d_i$, and $\ldots$. If the $i$-th coordinate is unspecified in query $R$ then the search proceeds to all three sons of $v$. Once we reach a leaf, we return it if it belongs to region $R$. The correctness of this algorithm heavily depends on set $S$. We treat a favourable special case first: invertible sets.
Definition: $S \subseteq U = U_0 \times U_1 \times \cdots \times U_{d-1}$ is invertible if for all $x = (x_0, \ldots, x_{d-1}) \in S$, $y = (y_0, \ldots, y_{d-1}) \in S$: $x_i = y_i$ for some $i$ implies $x = y$.
A set is invertible if all projection functions are injective when restricted to $S$.
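The partial match search described above, together with a check for invertibility, might be sketched as follows. The nested-tuple tree layout and the names `build`, `partial_match` and `is_invertible` are assumptions of this sketch; a query is written as a tuple whose entry is a value for a specified coordinate and `None` for an unspecified one, and points are assumed distinct.

```python
from statistics import median_low

def build(points, coords, pos=0):
    """dd-tree as nested tuples: ('leaf', p) or ('node', coord, key, lt, eq, gt)."""
    if not points:
        return None
    if len(points) == 1:
        return ("leaf", points[0])
    c = coords[pos]
    key = median_low(p[c] for p in points)
    rest = coords[:pos] + coords[pos + 1:]      # =-subtree has one dimension less
    nxt = (pos + 1) % len(coords)
    return ("node", c, key,
            build([p for p in points if p[c] < key], coords, nxt),
            build([p for p in points if p[c] == key], rest, pos % max(len(rest), 1)),
            build([p for p in points if p[c] > key], coords, nxt))

def partial_match(tree, query, out=None):
    """Collect all points that agree with `query` on its specified coordinates."""
    out = [] if out is None else out
    if tree is None:
        return out
    if tree[0] == "leaf":
        p = tree[1]
        if all(q is None or q == x for q, x in zip(query, p)):
            out.append(p)
        return out
    _, c, key, lt, eq, gt = tree
    q = query[c]
    if q is None:                       # unspecified: visit all three sons
        for son in (lt, eq, gt):
            partial_match(son, query, out)
    elif q < key:                       # specified: exactly one son
        partial_match(lt, query, out)
    elif q == key:
        partial_match(eq, query, out)
    else:
        partial_match(gt, query, out)
    return out

def is_invertible(points):
    """True if every projection is injective on the (distinct) points."""
    d = len(points[0])
    return all(len({p[i] for p in points}) == len(points) for i in range(d))
```

For example, on the 4 x 4 grid of points the query `(2, None)` returns the four points with first coordinate 2; the grid itself is not invertible, whereas a set such as `[(0, 1), (1, 0), (2, 2)]` is.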
Theorem 2. Let $T$ be an ideal dd-tree for invertible set $S \subseteq U = U_0 \times U_1 \times \cdots \times U_{d-1}$. Then a partial match query with $s < d$ specified components takes time
$$O(d\,2^{d-s}\,n^{1 - s/d}).$$
Proof: Let $T'$ be the subtree of $T$ consisting of all nodes visited by the search. It suffices to show that the number of nodes of $T'$ is bounded by $O(d\,2^{d-s}\,n^{1-s/d})$. A node of $T'$ is called branching if it has a proper son and non-branching otherwise. Since $S$ is invertible all descendants of non-branching nodes are non-branching. Hence all branching nodes can be reached by following <- and >-pointers only. A branching node of $T'$ is a proper branching node if it has two proper sons.
We claim that there are at most $2^{\lceil (\log n)/d \rceil (d-s)}$ proper branching nodes in $T'$. This follows from the fact that at most $d - s$ out of any $d$ consecutive nodes on any path through $T'$ are proper branching nodes, because only $d - s$ out of $d$ consecutive nodes split according to unspecified components. Also $d(v) = sd(v) \le \log n$ for all branching nodes. Hence there are at most $\lceil (\log n)/d \rceil (d - s)$ proper branching nodes on any path through $T'$ and thus the bound follows. It remains to count the improper branching nodes and the non-branching nodes in $T'$. Again consider any path through $T'$. Then there can be at most $d$ consecutive nodes which are not proper branching nodes and hence the total number of nodes of $T'$ is
$$O\big(d\,2^{\lceil (\log n)/d \rceil (d-s)}\big) = O\big(d\,2^{d-s}\,n^{(d-s)/d}\big) = O\big(d\,2^{d-s}\,n^{1 - s/d}\big).$$
The behavior of the partial match algorithm on general sets is harder to analyze. Let us look at an example first. Let $U_i = \mathbb{R}$, $0 \le i < d$, and let $S = \{0\}^k \times \{0, \ldots, m-1\}^{d-k}$ for some $m$ and $k$. Then $|S| = m^{d-k}$. Consider first partial match query $R_1$ which specifies the first $s = k$ coordinates as being 0 and leaves the remaining coordinates unspecified. Then the answer to the query is the entire set $S$ and hence the running time of any algorithm must be at least linear. Consider next partial match query $R_2$ which specifies the first $s = k + 1$ coordinates as being 0 and leaves the remaining coordinates unspecified. Then the query is "equivalent" to a partial match query in a $d - k = d - s + 1$ dimensional set with one specified coordinate. In view of Theorem 2 we therefore cannot hope to do better than $O(n^{1 - 1/(d-s+1)})$ time units. This is indeed the bound.
7.2.1. D-dimensional Trees and Polygon Trees
Theorem 3. Let $T$ be an ideal dd-tree for $S$, $|S| = n$. Then a partial match query with $s$ specified components takes time
Here $A$ is the set of answers to the query and $f(d, d-s)$ is some function increasing in both arguments. $f$ is independent of $T$ and $S$.
Proof: Let $T'$ be the subtree of $T$ consisting of all nodes visited in the search. We split the set of nodes of $T'$ into three classes which we count separately. A node $v$ is a tertiary node (belongs to the third class) if all descendants of $v$ belong to $A$, i.e., if $S(v) \subseteq A$. The number of tertiary nodes is clearly bounded by $(d + 1)\,|A|$. A non-tertiary node is a primary node if it is reachable without using an =-pointer. All other nodes of $T'$ are secondary nodes. We will show that the number of primary and secondary nodes is bounded by
$$f(d, d-s)\,n^{\max(1/2,\ 1 - 1/(d-s+1))}$$
for some suitable function $f$. The proof is by induction on $d - s$ and for fixed $d - s$ by induction on $s$ and $n$.
If $d = s$ then partial match is equivalent to exact match and the claim follows from Theorem 1, a). So let us assume $d > s$. If $s = 0$ then all the nodes are tertiary and the claim is trivial. This leaves the case $d > s \ge 1$. If $n$ is small then the claim is certainly true by suitable choice of $f(d, d-s)$.
The primary nodes are easy to count. We have shown in the proof of Theorem 2 that their number is $O(d\,2^{d-s}\,n^{1-s/d})$. It remains to count the secondary nodes. We group the secondary nodes into maximal subtrees. If $v$ is the root of such a subtree then $v$ is reached via an =-pointer and there is no other =-pointer on the path to $v$. Thus $sd(v) = d(v) - 1$ and $|S(v)| \le n/2^{sd(v)} \le 2n/2^{d(v)}$. Also there can be at most $2^{\lceil j/d \rceil (d-s)}$ such nodes $v$ with $d(v) = j$. This follows from the fact that all nodes on the path to $v$ are primary nodes and hence at most $\lceil j/d \rceil (d - s)$ of these nodes can be proper branching nodes; cf. the proof of Theorem 2.
In the subtree with root $v$ we have to compute a partial match query on a $(d-1)$-dimensional set with $s'$ specified components. Here $s' = s$ or $s' = s - 1$. Also, $s' \ge 1$. Note that $v$ and all its descendants are tertiary nodes if $s' = 0$. By induction hypothesis there are at most
$$f(d-1, d-1-s')\,m^{\max(1/2,\ 1 - 1/(d-1-s'+1))}$$
non-tertiary nodes visited in the subtree with root $v$ where $m = |S(v)|$. For the remainder of the argument we have to distinguish two cases, $s = 1$ and $s \ge 2$.
Case 1: $s \ge 2$.
Since $d - 1 - s' \le d - s$, $f$ is increasing, and $d - s \ge 1$ we conclude that the number of non-tertiary nodes below $v$ is bounded by $f(d-1, d-s)\,m^{1 - 1/(d-s+1)}$. We finish the proof by summing this bound for all roots of maximal subtrees of secondary nodes. Let $RT$ be the set of such roots. Then
$$\sum_{v \in RT} f(d-1, d-s)\,|S(v)|^{1 - 1/(d-s+1)}$$
$$\le \sum_{j \ge 1} f(d-1, d-s)\,2^{\lceil j/d \rceil (d-s)}\,(2n/2^j)^{1 - 1/(d-s+1)}$$
$$\le (2n)^{1 - 1/(d-s+1)}\,f(d-1, d-s)\,2^{d-s} \sum_{j \ge 1} \big[2^{(d-s)/d - 1 + 1/(d-s+1)}\big]^j$$
$$\le \big(f(d, d-s) - d\,2^{d-s}\big)\,n^{1 - 1/(d-s+1)}$$
for suitable choice of $f(d, d-s)$. Note that $(d-s)/d - 1 + 1/(d-s+1) = -s/d + 1/(d-s+1) < 0$ for $2 \le s \le d$. Adding the bound for the number of primary nodes proves the theorem.
$$\le 2^{d-s} \sum_{k=-d}^{\lceil \log n \rceil} 2^{(\lceil \log n \rceil - k)/d}\,(d + 1 + k)$$
where we used the substitution $k = \lceil \log n \rceil - j$
$$\le 2^{d-s+1}\,n^{1/d} \sum_{k=-d}^{\lceil \log n \rceil} (d + 1 + k)/2^{k/d}$$
$$\le \big(f(2, 1) - d\,2^{d-s}\big)\,n^{\max(1/2,\ 1 - 1/(d-s+1))}$$
by suitable choice of $f(2, 1)$; recall that $d = 2$ and $s = 1$. Adding the bound for the number of primary nodes proves the theorem.
It remains to consider the case $d \ge 3$. We infer from the induction hypothesis that the number of non-tertiary nodes below $v \in RT$ is bounded by $f(d-1, d-2)\,|S(v)|^{1 - 1/(d-1)}$ in this case. Summing this bound for all $v \in RT$ we obtain
$$\sum_{v \in RT} f(d-1, d-2)\,|S(v)|^{1 - 1/(d-1)}$$
$$\le \sum_{j=1}^{d + \lceil \log n \rceil} f(d-1, d-2)\,2^{\lceil j/d \rceil (d-1)}\,(2n/2^j)^{1 - 1/(d-1)}$$
$$\le (2n)^{1 - 1/(d-1)}\,f(d-1, d-2)\,2^{d-1} \sum_{j=1}^{d + \lceil \log n \rceil} \big[2^{(d-1)/d + 1/(d-1) - 1}\big]^j$$
$$\le (2n)^{1 - 1/(d-1)}\,f(d-1, d-2)\,c\,(2n)^{1/(d(d-1))}$$
where $c$ is a constant depending on $d$
$$\le \big(f(d, d-1) - d\,2^{d-s}\big)\,n^{1 - 1/d}$$
by suitable choice of $f(d, d-1)$. Adding the bound for the number of primary nodes proves the theorem.
Theorem 3 shows that d-dimensional trees support partial match queries with rootic running time. In particular if $d = 2$ and $s = 1$ then the running time is $O(\sqrt{n} + |A|)$ even in the case of general sets. We will see in Section 7.4.1 that this cannot be improved without increasing storage. However it is trivial to improve upon this result by using $O(d!\,n)$ storage.
Let $S \subseteq U = U_0 \times \cdots \times U_{d-1}$. For any of the $d!$ possible orderings of the attributes build a search tree as follows: Order $S$ lexicographically and build a standard one-dimensional search tree for $S$. A partial match query with $s$ specified components is then easily answered in time $O(d \log n + |A|)$. Assume w.l.o.g. that the first $s$ attributes are specified, i.e., $R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ with $l_i = h_i$ for $0 \le i < s$ and $l_i = -\infty$, $h_i = +\infty$ for $s \le i < d$. Search for key $(l_0, \ldots, l_{s-1}, -\infty, \ldots, -\infty)$ in tree $T$ corresponding to the natural order of attributes. This takes time $O(d \log n)$. The answer to the query will then consist of the next $|A|$ leaves of $T$ in increasing order. Thus logarithmic search time can be obtained at the expense of increased storage requirement. For small $d$, say $d = 2$, this approach is feasible and in fact we use it daily. After all, there is a German-English and an English-German dictionary and no one ever complained about the redundancy in storage.
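The $d!$-orderings idea can be illustrated with sorted arrays in place of search trees (an assumption of this sketch; binary search on a sorted array plays the role of the tree search). The names `build_indexes` and `partial_match_lex` are hypothetical, and a query uses `None` for an unspecified attribute.

```python
from bisect import bisect_left
from itertools import permutations

def build_indexes(points):
    """One lexicographically sorted copy of the points per attribute ordering."""
    d = len(points[0])
    return {perm: sorted(tuple(p[i] for i in perm) for p in points)
            for perm in permutations(range(d))}

def partial_match_lex(indexes, query):
    """Answer a partial match query by one binary search and a forward scan."""
    d = len(query)
    spec = [i for i in range(d) if query[i] is not None]
    unspec = [i for i in range(d) if query[i] is None]
    perm = tuple(spec + unspec)              # specified attributes come first
    prefix = tuple(query[i] for i in spec)
    arr = indexes[perm]
    # A shorter tuple sorts before every tuple extending it, so `prefix`
    # acts like the key (l_0, ..., l_{s-1}, -inf, ..., -inf).
    pos = bisect_left(arr, prefix)
    out = []
    while pos < len(arr) and arr[pos][:len(prefix)] == prefix:
        p = [None] * d
        for k, i in enumerate(perm):         # undo the attribute permutation
            p[i] = arr[pos][k]
        out.append(tuple(p))
        pos += 1
    return out
```

The dictionary returned by `build_indexes` holds $d!$ copies of the point set, which is the storage price the text mentions; each query then costs one binary search plus the size of the answer.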
Another remark about Theorem 3 is also in order at this place. The running time stated in Theorem 3 is for the enumerative version of partial match retrieval: "Enumerate all points in $S \cap R$". A simpler version is to count only $|S \cap R|$. If we store in every node $v$ of a dd-tree the cardinality $|S(v)|$ of $S(v)$ then the counting version of partial match retrieval has running time $O(f(d, d-s)\,n^{\max(1/2,\ 1 - 1/(d-s+1))})$;
$$O(d\,4^d\,n^{1 - 1/d} + d\,|A|)$$
[Figure 5: regions $R_1$, $R_3$, $R_2$, $R_4$]
$|S(v)| \le n/2^k$. Also there are at most $2 \log |S(x)|$ descendants $w$ of $x$ such that $Reg(w)$ intersects $L$ but is not contained in $L$. This can be seen as follows. The tree with root $x$ is a one-dimensional search tree for a set of points which lie either on a horizontal or a vertical line. If they lie on a horizontal line then the search below $x$ follows exactly one path down the tree. If they lie on a vertical line (which then must be the line $x = l_0$) then $Reg(w)$ intersects $L$ but is not contained in $L$ iff either $l_1 \in Reg(w)$ or $h_1 \in Reg(w)$. The set of nodes $w$ with $l_1 \in Reg(w)$ ($h_1 \in Reg(w)$) forms a path in the tree with root $x$. Thus there are at most $2 \log |S(x)|$ descendants $w$ of $x$ such that $Reg(w)$ intersects $L$ but is not contained in $L$.
Putting everything together we have shown that the number of nodes $v$ in $T$ such that $Reg(v)$ intersects $L$ but is not contained in $L$ is at most
$$\sum_{0 \le k \le \log n} 2 \cdot 2^{k/2} \cdot 2 \log(n/2^k) = 4\sqrt{n} \sum_{0 \le k \le \log n} 2^{(k - \log n)/2}\,(\log n - k) = O(\sqrt{n}).$$
This proves the claim and the theorem.
[Figure 6: region $Reg(v)$ with lines $L_1$, $L_2$ and regions $R_1$, $R_2$, $R_3$, $R_4$]
Then any straight line can intersect at most three of the four regions $R_1$, $R_2$, $R_3$, $R_4$ plus a number of the "one-dimensional" regions defined by the lines themselves. With the notation of the claim in the proof of Theorem 4 we would obtain $P_{k+1} \le 3 P_k$ and hence could hope for a search time of $3^{\log n/\log 4} = n^{\log 3/\log 4} \approx n^{0.8}$. Note that the depth of the tree will be $\log n/\log 4$ because we divide into four pieces in every step. However, we have to be careful. The arrangement above is only correct if the depth of the tree is indeed $\log n/\log 4$, i.e., if the tree is ideal. Thus lines $L_1$ and $L_2$ above have to be chosen such that $|R_i \cap S(v)| \le \lceil |S(v)|/4 \rceil$ for $1 \le i \le 4$. The following lemma shows that this is always possible.
Lemma 2. Let $S \subseteq \mathbb{R}^2$, $|S| = n$ and let $n_1, n_2, n_3, n_4$ be such that $n_1 + n_2 + n_3 + n_4 \ge n$. If $L_1$ is a line such that $n_1 + n_2$ points of $S$ are on one side of $L_1$ and $n_3 + n_4$ points of $S$ are on the other side of $L_1$ then there is a line $L_2$ such that the four open regions $R_1$, $R_2$, $R_3$ and $R_4$ defined by $L_1$ and $L_2$ contain at most $n_1, n_2, n_3, n_4$ points of $S$ respectively. Also $L_2$ can be computed in time $O(n^2)$.
[Figure 7: lines $L_1$, $L_2$, point $P$ and regions $R_1$, $R_2$, $R_3$, $R_4$]
Proof: For any point $P$ on $L_1$ let $f(P)$ be the minimum angle between $L_1$ and $L_2$ such that regions $R_1$, $R_2$ contain at most $n_1$, $n_2$ points respectively. Then $f(P)$ is a continuous function of $P$. Also $\lim_{P \to -\infty} f(P) = 0$ and $\lim_{P \to +\infty} f(P) = \pi$. Similarly define $g(P)$ to be the minimum angle between lines $L_1$ and $L_2$ such that regions $R_3$, $R_4$ contain at most $n_3$ and $n_4$ points respectively. Then $g(P)$ is a continuous function of $P$ and $\lim_{P \to -\infty} g(P) = \pi$ and $\lim_{P \to +\infty} g(P) = 0$. Hence there is a point $P$ such that $f(P) = g(P)$. Then $P$ and $f(P)$ define line $L_2$ with the desired property. This shows the existence of line $L_2$. It also shows that line $L_2$ can be assumed to go through two points of $S$. Thus there are only $n^2$ candidates for $L_2$.
[Figure 8: line $L_1$, point $P_i$ and sectors $S_1$, $S_2$]
Let $K_1, \ldots, K_k$, $k = n(n-1)/2$, be the lines defined by all pairs of points of $S$ ordered according to their intersection point with $L_1$. Let $P_i$ be the point of intersection of $K_i$ and $L_1$. Consider any fixed $P_i$. Express all points of $S$ in polar coordinates with respect to $P_i$ and find among the $n_1 + n_2$ points "above" $L_1$ two points which define the $n_1$-th and $(n_1 + 1)$-th largest angle between line $L_1$ and the line defined by $P_i$ and the point. This can be done in time $O(n_1 + n_2)$ by the linear time selection algorithm of 2.4. In this way we have computed a sector $S_1$ through which line $L_2$ must go if it were to intersect $L_1$ in $P_i$. In a similar way we compute sector $S_2$ based on the points "below" $L_1$. If there is a line which goes through sectors $S_1$ and $S_2$ then we are done and have found line $L_2$. If sectors $S_1$ and $S_2$ do not have a line in common (as is the case in Figure 8) then we can restrict the search to one of the halflines defined by $L_1$ and $P_i$. In Figure 8 this halfline is shown bold. We summarize. In time $O(n)$ we can either determine that $L_2$ goes through $P_i$ or exclude one of the halflines defined by $L_1$ and $P_i$.
This suggests that we can use binary search to find line $L_2$. We first compute in time $O(n^2)$ lines $K_1, \ldots, K_k$ and points $P_1, \ldots, P_k$. Next we find the median point of $P_1, \ldots, P_k$ in time $O(n^2)$. Then we are either done or can restrict the search to $k/2$ points. This decision takes time $O(n)$. Thus line $L_2$ can be found in $O(\log n^2)$ iterations and the cost of the $i$-th iteration is $O(k/2^i + n)$. Total cost is thus $O(n^2)$.
Lemma 2 and the preceding discussion lead to:
Definition:
a) A 4-way polygon tree $T$ for set $S \subseteq \mathbb{R}^2$, $|S| = n$, is defined as follows: If set $S$ is collinear then $T$ is an ordinary one-dimensional search tree for $S$. If set $S$ is not collinear then $T$ consists of a root $r$ and six subtrees. There are two lines $L_1$ and $L_2$ associated with $r$ and there is one subtree for each of the six sets $S \cap R_1$, $S \cap R_2$, $S \cap R_3$, $S \cap R_4$, $S \cap L_1$, $S \cap L_2$. Here $R_1, R_2, R_3, R_4$ are the four open regions defined by lines $L_1$ and $L_2$.
b) A 4-way polygon tree $T$ is ideal if for every node $v$ of $T$ and son $x$ of $v$: If $S(v)$ is collinear then $|S(x)| \le \lceil |S(v)|/2 \rceil$ and if $S(v)$ is not collinear and $x$ is one of the four sons corresponding to regions $R_1, \ldots, R_4$ then $|S(x)| \le \lceil |S(v)|/4 \rceil$.
Theorem 5. Let $S \subseteq \mathbb{R}^2$, $|S| = n$.
a) An ideal 4-way polygon tree for set $S$ can be constructed in time $O(n^2)$.
b) If $T$ is an ideal 4-way polygon tree for $S$ and $R$ is a polygonal region with $s$ sides then $A = R \cap S$ can be computed in time $O(s\,n^{\log 3/\log 4} + |A|)$.
Proof: a) If $S$ is collinear then an ideal tree can be constructed in time $O(n \log n)$, the time required to sort $S$. If $S$ is not collinear then lines $L_1$, $L_2$ dividing the plane into four open regions containing at most $\lceil n/4 \rceil$ points of $S$ each can be computed in time $O(n^2)$ by Lemma 2. Hence $T(n)$, the time required to build a 4-way polygon tree for $n$ points, satisfies the recurrence
$$T(n) \le O(n^2 + n \log n) + 4\,T(\lceil n/4 \rceil).$$
Thus $T(n) = O(n^2)$ by Theorem 2.1.3.4.
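As a rough check of this recurrence, one can unroll it directly. The sketch below ignores the ceilings and absorbs the $n \log n$ term into $n^2$; both simplifications, and the constant $c$ standing for the $O(n^2)$ term, are assumptions made only for this illustration.

```latex
\begin{align*}
T(n) &\le c\,n^2 + 4\,T(n/4) \\
     &\le c\,n^2 + 4c\,(n/4)^2 + 16\,T(n/16) \\
     &\le c\,n^2 \sum_{i \ge 0} 4^{i} \cdot 4^{-2i} + n\,T(1)
      = c\,n^2 \sum_{i \ge 0} 4^{-i} + n\,T(1)
      = \tfrac{4}{3}\,c\,n^2 + O(n).
\end{align*}
```

The cost per level shrinks by a factor of 4, so the root level dominates and the total stays quadratic.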
b) Let $R$ be a polygonal region with $s$ sides. We triangulate $R$ (cf. Section 8.4.2) and compute $R' \cap S$ separately for each of the $s - 1$ triangles $R'$ in the triangulation. It therefore suffices to show that $A' = R' \cap S$ can be computed in time $O(n^{\log 3/\log 4} + |A'|)$ for a triangle $R'$. This shows that we may assume w.l.o.g. that $R$ is a triangle.
We describe the search algorithm next. The search reaches only nodes $v$ of the polygon tree $T'$ with $Reg(v) \cap R \ne \emptyset$. Let us assume inductively that when the search reaches node $v$ we have determined $Reg(v) \cap e_i$, $1 \le i \le 3$, for each of the three sides of triangle $R$. Note that $Reg(v)$ is convex and hence $Reg(v) \cap e_i$ is a line segment. Also note that $Reg(v) \subseteq R$ iff $Reg(v) \cap e_i = \emptyset$ for $1 \le i \le 3$ or $Reg(v) \subseteq e_i$ for some $i$ (recall that we assume $Reg(v) \cap R \ne \emptyset$). If $Reg(v) \subseteq R$ then the search proceeds to all six (two, if $Reg(v)$ is one-dimensional) sons of $v$ and clearly $Reg(w) \subseteq R$ for all sons $w$ of $v$.
The case $Reg(v) \not\subseteq R$ is slightly more complicated. Let $w$ be a son of $v$. Then $Reg(w) = Reg(v) \cap C$ where $C$ is either a line or a cone-shaped region, as indicated in Figure 9. Then $e_i \cap Reg(w) = (e_i \cap Reg(v)) \cap (e_i \cap C)$ and hence $e_i \cap Reg(w)$ is readily computed for $1 \le i \le 3$. If $e_i \cap Reg(w) \ne \emptyset$ for some $i$ then certainly $R \cap Reg(w) \ne \emptyset$ and hence the search proceeds to node $w$. If $e_i \cap Reg(w) = \emptyset$ for all $i$ then the search proceeds to node $w$ iff $c \in R$ where $c = L_1 \cap L_2$ is the intersection of the two lines which are associated with node $v$. Note that $Reg(w) \subseteq R$ if $c \in R$ and that $Reg(w) \cap R = \emptyset$ if $c \notin R$.
It remains to estimate the complexity of this algorithm. Let $T'$ be the subtree of all nodes visited in the search. It suffices to bound the number of nodes of $T'$. If $v \in T'$ then $Reg(v) \cap R \ne \emptyset$ and hence either $Reg(v) \subseteq R$ or $Reg(v) \cap R \ne \emptyset$ and $Reg(v) - R \ne \emptyset$. In the former case we have $S(v) \subseteq A$ and hence the number of nodes with $Reg(v) \subseteq R$ is $O(|A|)$. In the latter case there must be an edge $e$ of region $R$ such that $Reg(v) \cap e \ne \emptyset$ but $Reg(v)$ is not contained in $e$. It therefore suffices to bound $t$ where $t$ is the maximal number of nodes $v$ such that $Reg(v)$ intersects but is not contained in any fixed line segment $L$.
[Figure 9: region $Reg(v)$, lines $L_1$ and $L_2$, and a possibility for $C$]
Claim: $t = O(n^{\log 3/\log 4})$.
Proof: Let $L$ be any line segment. Let $P_k$ be the number of primary nodes $v$, i.e., $Reg(v)$ is not a line segment, of depth $k$ such that $Reg(v) \cap L \ne \emptyset$. Then $P_1 = 1$ and $P_{k+1} \le 3 P_k$ since $L$ can intersect at most 3 of the four open regions associated with the sons of any primary node. Thus $P_k \le 3^k$.
Let $v$ be a primary node of depth $k$. Then $v$ has two sons $x$ and $y$ which are not primary nodes. We have $S(x) \cup S(y) \subseteq S(v)$ and $|S(v)| \le \lceil n/4^k \rceil$ since $T$ is an ideal 4-way tree. The argument used in the proof of Theorem 4 shows that there are at most $2 \log |S(x)|$ descendants $w$ of $x$ such that $Reg(w)$ intersects $L$ but is not contained in $L$. The analogous claim holds true for $y$. Putting both bounds together we conclude that
$$t \le \sum_{0 \le k \le \log n/\log 4} 4\,P_k \log\lceil n/4^k \rceil$$
$$\le 3^{\log n/\log 4} \cdot 8 \sum_{0 \le k \le \log n/\log 4} 3^{k - \log n/\log 4}\,(\log n - 2k)$$
D-dimensional trees support orthogonal range queries with linear space $O(n)$ and rootic time $O(n^{1 - 1/d})$. Range trees will allow us to trade space for time. More specifically, we can obtain polylogarithmic query time at the expense of non-linear storage or rootic query time $O(n^{\epsilon d})$ and space $O((1/\epsilon)^d\,n)$ for any $\epsilon > 0$. Also range trees support insertions and deletions in a natural way.
Orthogonal range queries in one-dimensional space are particularly simple. If $S \subseteq U_0$ then any ordinary balanced tree will do. We can compute $S \cap [l_0, h_0]$ by running down two paths in the tree (the search path according to $l_0$ and the search path according to $h_0$) and then listing all leaves between those paths. The query time is $O(\log n + |A|)$ and space requirement is $O(n)$. The counting version, i.e., to compute $|S \cap [l_0, h_0]|$, only takes time $O(\log n)$ if we store in every node the number of leaf descendants. This comes from the fact that we have to add up at most $O(\log n)$ counts to get the final answer, namely the counts of all nodes which are sons of a node on one of the two paths and which lie between the two paths. It is very helpful at this point to interpret search trees geometrically. We can view a search tree as a hierarchical decomposition of $S$ into intervals, namely sets $Reg(v) \cap S$. The decomposition process is balanced, i.e., we try to split set $S$ evenly at every step, and it is continued to the level of singleton sets. The important fact is that for every conceivable interval $[l_0, h_0]$ we can decompose $S \cap [l_0, h_0]$ into only $O(\log n)$ pieces from the decomposition. Hence the $O(\log n)$ query time for counting $S \cap [l_0, h_0]$.
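The decomposition into $O(\log n)$ pieces can be made concrete on a sorted array that is split at its midpoint recursively, which is an implicit balanced tree. This is only an illustrative sketch: the names `canonical_pieces` and `range_count` are hypothetical, and each tree node is represented by the index range `(a, b)` of the keys below it, so its stored count is simply `b - a`.

```python
from bisect import bisect_left, bisect_right

def canonical_pieces(keys, lo, hi):
    """Decompose the keys in [lo, hi] into maximal nodes of the implicit tree.

    `keys` must be sorted. Each returned pair (a, b) stands for the node
    covering keys[a:b].
    """
    i, j = bisect_left(keys, lo), bisect_right(keys, hi)   # answer is keys[i:j]
    pieces = []

    def visit(a, b):
        if a >= b or b <= i or a >= j:     # node is disjoint from the query
            return
        if i <= a and b <= j:              # node lies completely inside: one piece
            pieces.append((a, b))
            return
        m = (a + b) // 2                   # otherwise descend into both sons
        visit(a, m)
        visit(m, b)

    visit(0, len(keys))
    return pieces

def range_count(keys, lo, hi):
    """Counting version: add up the node counts of the pieces."""
    return sum(b - a for a, b in canonical_pieces(keys, lo, hi))
```

For 32 keys the query below is answered from five pieces; only nodes along the two boundary paths are ever split further.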
This idea readily generalizes to two-dimensional (and d-dimensional) space. Let $S \subseteq U_0 \times U_1$. We first project $S$ onto $U_0$ and build a balanced decomposition of the projection as described above. Suppose now that we have to compute $S \cap ([l_0, h_0] \times [l_1, h_1])$. We can first decompose $[l_0, h_0]$ into $O(\log n)$ intervals. For each of these intervals we only have to solve a one-dimensional problem. This we can do efficiently if we also have data structures for all these one-dimensional problems around. Each one-dimensional problem will cost $O(\log n)$ steps and so total run time is $O((\log n)^2)$. However, space requirement goes up to $O(n \log n)$ because every point has to be stored in $\log n$ data structures for one-dimensional problems.
The details are as follows.
7.2.2. Range Trees and Multidimensional Divide and Conquer
Definition: Let $m \in \mathbb{N}$ and let $\alpha \in (1/4, 1 - \sqrt{2}/2)$. $m$ is a slack parameter and $\alpha$ is a weight-balancing parameter. A d-fold range tree for multiset $S \subseteq U_0 \times U_1 \times \cdots \times U_{d-1}$, $|S| = n$, is defined as follows. If $d = 1$ then $T$ is any BB[$\alpha$]-tree for $S$. If $d > 1$ then $T$ consists of a BB[$\alpha$]-tree $T_0$ for $p_0(S)$. $T_0$ is called the primary tree. Furthermore, for every node $v$ of $T_0$ with depth$(v) \in m\mathbb{N}$ (i.e., $m$ divides depth$(v)$) there is an auxiliary tree $T_a(v)$. $T_a(v)$ is a $(d-1)$-fold tree for set $p(S(v), \{1, \ldots, d-1\})$. Here $S(v)$ is the set of $x = (x_0, \ldots, x_{d-1}) \in S$ such that leaf $x_0$ is a descendant of $v$ in $T_0$.
The precise definition of range trees differs in two respects from the informal discussion. First, we do not insist on perfect balance. This will slightly degrade query time but will allow us to support insertions and deletions directly. Also we introduce slack parameter $m$ which we can use to control space requirement and query time.
Lemma 3. Let $S_m(d, n)$ be the space requirement of a d-fold tree with slack parameter $m$ for a set of $n$ elements. Then $S_m(d, n) = O(n\,(c \log n/m)^{d-1})$ where $c = 1/\log(1/(1-\alpha))$.
Proof: Note first that the depth of a BB[$\alpha$]-tree with $n$ leaves is at most $c \log n$. Thus every point $x \in S$ is stored in the primary tree, in at most $c \log n/m$ primary trees of auxiliary trees, in at most $(c \log n/m)^2$ primary trees of auxiliary-auxiliary trees, $\ldots$. Thus the total number of nodes (counting duplicates) stored in all trees and hence space requirement is
$$\sum_{0 \le i \le d-1} O\big(n\,((c \log n)/m)^i\big) = O\big(n\,((c \log n)/m)^{d-1}\big).$$
We will use two examples to illustrate the results about range trees: $m = 1$ and $m = \epsilon \log n$ for some $\epsilon > 0$. If $m = 1$ then $S_m(d, n) = O(n\,(c \log n)^{d-1})$ and if $m = \epsilon \log n$ then $S_m = O((c/\epsilon)^{d-1}\,n)$.
Lemma 4. Ideal d-fold range trees, i.e., $|S(x)| \le \lceil |S(v)|/2 \rceil$ for all nodes $v$ (primary or otherwise) and sons $x$ of $v$, can be constructed in time $O(d\,n \log n + n\,((\log n)/m)^{d-1})$. Here $m$ is the slack parameter.
Proof: We start by sorting $S$ $d$ times, once according to the 0-th coordinate, once according to the first coordinate, $\ldots$. This will take time $O(d\,n \log n)$. Let $T_m(d, n)$ be the time required to build an ideal d-fold tree for a set of $n$ elements if $S$ is sorted according to every coordinate. We will show that $T_m(d, n) = O(n\,((\log n)/m)^{d-1})$. This is clearly true for $d = 1$ since $O(n)$ time suffices to build an ideal BB[$\alpha$]-tree from a sorted list. For $d > 1$ we construct the primary tree in time $O(n)$ and we have to construct auxiliary trees of sizes $n_1, \ldots, n_t$. We have $n_1 + \cdots + n_t \le n\,(\log n)/m$ since every point is stored in $(\log n)/m$ auxiliary trees. Note that the primary tree has depth $\log n$ since it is ideal. Hence
$$T_m(d, n) = O(n) + \sum_i T_m(d-1, n_i) = O(n) + \sum_i O\big(n_i\,(\log n/m)^{d-2}\big) = O\big(n\,(\log n/m)^{d-1}\big).$$
If $m = 1$ then ideal d-fold trees can be constructed in time $O(n\,(\log n)^{\max(1, d-1)})$ and if $m = \epsilon \log n$ they can be constructed in time $O(d\,n \log n)$.
Lemma 5. Let $Q_m(d, n)$ be the time required to answer a range query in a d-fold tree for a set of $n$ elements. Then $Q_m(d, n) = O(\log n\,(c\,(2^m/m) \log n)^{d-1} + |A|)$. Here $c$ and $m$ are as in Lemma 3.
Proof: The claim is obvious for $d = 1$. So let $d > 1$ and let $R = [l_0, h_0] \times \cdots \times [l_{d-1}, h_{d-1}]$ be an orthogonal range query. We search for $l_0$ and $h_0$ in the primary tree $T_0$. This will define two paths of length at most $c \log n$ in $T_0$. Consider one of these paths. There are at most $c \log n$ nodes $v$ such that $v$ is a son of one of the nodes on the path and $v$ is between the two paths. Every such node represents a subset of points of $S$ whose 0-th coordinate is contained in $[l_0, h_0]$. We have to solve $(d-1)$-dimensional problems on these subsets. Let $v$ be any such node and let $v_1, \ldots, v_t$ be the closest descendants of $v$ such that $m$ divides depth$(v_i)$. Then $t \le 2^{m-1}$ and auxiliary trees exist for all $v_i$'s. Also we can compute $S \cap R$ by forming the union of $S(v_i) \cap ([l_1, h_1] \times \cdots \times [l_{d-1}, h_{d-1}])$ over all $v_i$'s. Since the number of $v_i$'s is bounded by $2c\,((\log n)/m)\,2^{m-1}$ we have:
$$Q_m(d, n) \le c\,(2^m/m) \log n \cdot Q_m(d-1, n) + |A|.$$
This proves Lemma 5.
If $m = 1$ then $Q_m(d, n) = O(\log n\,(2c \log n)^{d-1} + |A|)$ and if $m = \epsilon \log n$ then $Q_m(d, n) = O(\log n\,(c/\epsilon)^{d-1}\,n^{\epsilon d})$.
We close our discussion of range trees by discussing insertion and deletion algorithms. We will show that the amortized cost of an insertion or deletion is polylogarithmic. Suppose that point $x = (x_0, x_1, \ldots, x_{d-1})$ has to be inserted (deleted). We search for $x_0$ in the primary tree and insert or delete it, whatever is appropriate. This has cost $O(\log n)$. Furthermore, we have to insert $x$ into (delete $x$ from) at most $(c \log n)/m$ auxiliary trees, $((c \log n)/m)^2$ auxiliary-auxiliary trees, $\ldots$. Thus the total cost of an insertion or deletion is $O(\log n\,(c \log n/m)^{d-1})$ not counting the cost for rebalancing. Rebalancing is done as follows. For every (primary or auxiliary or auxiliary-auxiliary or $\ldots$) tree into which $x$ is inserted (from which $x$ is deleted) we find a node $v$ of minimal depth which goes out of balance. We replace the subtree rooted at $v$ by an ideal $d'$-fold tree for the set $S(v)$ of descendants of $v$. Here $d' = d$ if $v$ is a node of the primary tree, $d' = d - 1$ if $v$ is a node of the primary tree of an auxiliary tree, $\ldots$; $d'$ is called the level of node $v$. This will take time $O(d'\,q \log q + q\,(\log q/m)^{d'-1})$ by Lemma 4 where $q = |S(v)|$. Rebalancing on the last level ($d' = 1$) is done differently. On level 1 we use the standard algorithm for rebalancing BB[$\alpha$]-trees.
Worst case insertion/deletion cost is now easily computed. It is $O(d^2\,n \log n + n\,(\log n/m)^{d-1})$, essentially the cost of constructing a new d-fold tree from scratch. Amortized insertion/deletion cost is much smaller as we demonstrate next. We use Theorem III.5.1.4 to obtain a polylogarithmic bound on amortized insertion/deletion cost.
Note first that a point $x$ is inserted into (deleted from) at most $((c \log n)/m)^{d-1}$ trees of level 1 for a (worst case) cost of $O(\log n)$ each. Thus total rebalancing cost on level 1 is $O(\log n\,(c \log n/m)^{d-1})$.
We next consider levels $l$, $2 \le l \le d$. We showed (Lemmas 2 and 3 in the proof of Theorem III.5.1.4) that the total number of rebalancing operations caused by nodes $v$ at level $l$ with $1/(1-\alpha)^i \le |S(v)| \le 1/(1-\alpha)^{i+1}$ during the first $n$ insertions/deletions is $O(TA_{i,l}\,(1-\alpha)^i)$, where $TA_{i,l}$ is the total number of transactions which go through nodes $v$ at level $l$ with $1/(1-\alpha)^i \le |S(v)| \le 1/(1-\alpha)^{i+1}$; here $0 \le i \le c \log n$. The cost of a rebalancing operation caused by such a node $v$ is $O(l\,(1-\alpha)^{-(i+1)}\,(i + 1 + ((i+1)/m)^{l-1}))$ by Lemma 4. Also $TA_{i,l} \le n\,((c \log n)/m)^{d-l}$ by a simple induction on $l$ starting with $l = d$. Thus total rebalancing cost at levels $l \ge 2$ is at most
$$\sum_{2 \le l \le d}\ \sum_{0 \le i \le c \log n} n\,((c \log n)/m)^{d-l}\,(1-\alpha)^i\,l\,(1-\alpha)^{-(i+1)}\,\big(i + 1 + ((i+1)/m)^{l-1}\big)$$
$$= O\Big(\sum_{2 \le l \le d} n\,((c \log n)/m)^{d-l}\,l\,\big((c \log n)^2 + (c \log n/m)^l\,(m/l)\big)\Big)$$
Theorem 6. d-fold range trees with slack parameter $m \ge 1$ and balance parameter $\alpha \in (1/4, 1 - \sqrt{2}/2)$ for a set of $n$ elements take space $O(n\,((c \log n)/m)^{d-1})$, support orthogonal range queries with time bound $O(\log n\,(c\,(2^m/m) \log n)^{d-1} + |A|)$, and have amortized insertion/deletion cost $O((m^2 + m\,d)\,((c \log n)/m)^d)$. Here $c = 1/\log(1/(1-\alpha))$. In particular, we have:
If $S$ is $(\epsilon, c)$-sparse then $|NN(S)| \le c\,n$, i.e., the size of the output is at most linear. We apply the paradigm of multidimensional divide and conquer to solve the fixed radius near neighbors problem.
If $d = 1$ then a simple method will do. Sort set $S$ in time $O(n \log n)$ and then make one linear scan through (the sorted version of) $S$. For every point $x \in S$ look at the $c$ preceding points in the linear order and find out which of them have distance at most $\epsilon$ from $x$. In this way, we can produce $NN(S)$ in time $O(c\,n)$ from the sorted list. Altogether we have an $O(n \log n + c\,n)$ algorithm in one-dimensional space.
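The one-dimensional method can be written down directly. In this sketch the function name `near_neighbors_1d` is hypothetical, a pair is reported when its distance is at most `eps` (following the wording above), and sparseness is taken to mean that any interval of radius `eps` holds at most `c` points, so looking at the `c` preceding points is enough.

```python
def near_neighbors_1d(points, eps, c):
    """Report all pairs of 1-d points at distance at most eps.

    Assumes the input is (eps, c)-sparse, so every near neighbor of a point
    is among its c predecessors in sorted order.
    """
    xs = sorted(points)                       # O(n log n)
    pairs = []
    for i, x in enumerate(xs):                # one linear scan
        for j in range(max(0, i - c), i):     # only the c preceding points
            if x - xs[j] <= eps:
                pairs.append((xs[j], x))
    return pairs
```

The inner loop runs at most `c` times per point, which gives the $O(c\,n)$ scan after sorting.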
If $d \ge 2$ then we project $S$ onto the 0-th coordinate and find the median of the multiset $p_0(S)$ of projected points. Let that median be $m$. We split $S$ into two sets $A$ and $B$ of $n/2$ points each, namely $A$ contains only points $x \in S$ with $x_0 \le m$ and $B$ contains only points $x \in S$ with $x_0 \ge m$. We apply the algorithm recursively to d-dimensional point sets $A$ and $B$. This will compute all pairs $(x, y) \in NN(S)$ where both points are in either $A$ or $B$. It remains to compute pairs $(x, y) \in NN(S)$ with $x \in A$ and $y \in B$. If $x \in A$, $y \in B$ and $(x, y) \in NN(S)$ then $x$ and $y$ both belong to the slab $SL$ of width $2\epsilon$ around hyperplane $x_0 = m$, i.e., $SL = \{x = (x_0, \ldots, x_{d-1}) \in S;\ |x_0 - m| < \epsilon\}$. So all we have to do is to solve NN on point set $SL$. $SL$ is not quite $(d-1)$-dimensional. We make it $(d-1)$-dimensional by projecting the points in $SL$ onto hyperplane $x_0 = m$, i.e., we compute $S' = \{x';\ \text{there is } x = (x_0, \ldots, x_{d-1}) \in SL \text{ such that } x' = (x_1, \ldots, x_{d-1})\}$. The crucial observation is that $S'$ is still sparse, and that $NN(S')$ "contains" $NN(SL)$.
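The divide and conquer scheme just described can be sketched as follows. This is an illustration under stated assumptions, not the text's algorithm in full: the name `near_pairs` is hypothetical, pairs are reported as index pairs with Euclidean distance strictly less than `eps`, the median is found by sorting, and the one-dimensional base case scans backwards until the gap reaches `eps` instead of using the sparseness constant. Projection is realized by dropping the leading coordinate from the tuple `dims` of coordinates still in play.

```python
def near_pairs(points, eps):
    """Set of index pairs (i, j), i < j, with Euclidean distance < eps."""
    d = len(points[0])

    def close(i, j, dims):
        # distance restricted to the coordinates in `dims`
        return sum((points[i][k] - points[j][k]) ** 2 for k in dims) < eps * eps

    def solve(idx, dims):
        if len(idx) < 2:
            return set()
        k = dims[0]
        idx = sorted(idx, key=lambda i: points[i][k])
        if len(dims) == 1:
            # one-dimensional case: sort and scan over the predecessors
            out = set()
            for a in range(len(idx)):
                b = a - 1
                while b >= 0 and points[idx[a]][k] - points[idx[b]][k] < eps:
                    out.add((min(idx[a], idx[b]), max(idx[a], idx[b])))
                    b -= 1
            return out
        half = len(idx) // 2
        m = points[idx[half]][k]                       # median of coordinate k
        # pairs with both points in A or both points in B
        out = solve(idx[:half], dims) | solve(idx[half:], dims)
        # pairs across the split lie in the slab of width 2 * eps around m
        slab = [i for i in idx if abs(points[i][k] - m) < eps]
        # solve the projected problem, then throw out pairs that are not
        # close in the current dimensions
        out |= {(i, j) for (i, j) in solve(slab, dims[1:]) if close(i, j, dims)}
        return out

    return solve(list(range(len(points))), tuple(range(d)))
```

The filtering step in the last line of `solve` is where the sketch goes through the candidate pairs of the projected slab and discards those that are not near neighbors in the higher dimension.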
Lemma 7.
a) If $S$ is $(\delta, c)$-sparse then $S'$ is $(\delta, (1 + 2^d)\,c)$-sparse.
7.2.2. Range Trees and Multidimensional Divide and Conquer 41
b) If $x, y \in SL$ and $dist_2(x, y) < \delta$ then $dist_2(x', y') < \delta$.
Proof: a) Consider any point $x' = (m, x_1, \ldots, x_{d-1})$ on hyperplane $x_0 = m$. We have to compute a bound on the number of points in the "strange" sphere $SSPH(x') = \{y \in S;\ |y_0 - m| < \delta \text{ and } (\sum_{1 \le i < d} (x_i - y_i)^2)^{1/2} < \delta\}$ with center $x'$ because exactly the projections of the points in $SSPH(x')$ have distance at most $\delta$ from $x'$ in the $(d-1)$-dimensional set $S'$. It is easy to see (cf. Figure 10 for an illustration in 2-space) that $SSPH(x')$ can be covered with $(1 + 2^d)$ $d$-dimensional spheres of radius $\delta$. Any such sphere can contain at most $c$ points of $S$ and hence $|SSPH(x')| \le (1 + 2^d)\,c$. This shows that $S'$ is $(\delta, (1 + 2^d)\,c)$-sparse.
Figure 10. Illustration in 2-space (point $x'$ on the line $x_0 = m$).
b) Obvious.
Lemma 7 holds true for other norms as well; however, the factor $(1 + 2^d)$ in 7a) depends on the norm. We infer from Lemma 7 that we can compute $NN(SL)$ by solving the $(d-1)$-dimensional problem on $S'$ and then going through list $NN(S')$ and throwing out some pairs. This leads to
Theorem 7. Let $d$ be fixed and let $S \subseteq \mathbb{R}^d$ be $(\delta, c)$-sparse. Then $NN(S)$ can be computed in time
Proof: We will first derive a recurrence on $T(i, n)$, the time to compute $NN(S)$ for any $(\delta, c)$-sparse set $S$, $|S| = n$ and $S \subseteq \mathbb{R}^i$, $i \le d$. We have
$T(i, n) \le 2\,T(i, n/2) + T(i-1, n) + O(n)$
since in order to solve an $i$-dimensional problem on $n$ points we spend $O(n)$ time on computing the median and splitting the set and then solve two $i$-dimensional problems on $n/2$ points each and one $(i-1)$-dimensional problem on at most $n$ points. Also
$T(i, 1) = 0$
since subproblems of size 1 are trivial and
$T(1, n) = O(n \log n + \hat{c}\,n)$
since all one-dimensional problems generated are $(\delta, \hat{c})$-sparse by Lemma 7a) and therefore can be solved in time $O(n \log n + \hat{c}\,n)$. It is not too hard to verify by induction on $n$ and $i$ that $T(i, n) = O(n (\log n)^i / i! + (1 + \hat{c})\, n (\log n)^{i-1}/(i-1)!)$. We leave this for Exercise 25. We will rather show how one arrives at the bound for $T(i, n)$.
Observe first that it suffices to study the recurrence
$U(i, n) = 2\,U(i, n/2) + U(i-1, n) + n$ for $i \ge 2$, $n \ge 2$
$U(i, 1) = 0$ for $i \ge 1$
$U(1, n) = n \log n + \hat{c}\,n$ for $n \ge 2$
because we have $T(i, n) = O(U(i, n))$. We solve this recurrence for $n$ a power of two. Let $V(i, k) = U(i, 2^k)/2^k$. By substitution we obtain
$V(i, k) = V(i, k-1) + V(i-1, k) + 1$ for $i \ge 2$, $k \ge 1$
$V(i, 0) = 0$ for $i \ge 1$
$V(1, k) = k + \hat{c}$ for $k \ge 1$
This is further simplified by setting $V(i, k) = W(i, k) - 1$. Then
$W(i, k) = W(i, k-1) + W(i-1, k)$ for $i \ge 2$, $k \ge 1$
$W(i, 0) = 1$ for $i \ge 1$
$W(1, k) = k + 1 + \hat{c}$ for $k \ge 1$
If the boundary conditions were simpler, namely all equal to one, then this recursion would have a simple combinatorial interpretation. It counts a set of paths. More precisely, if
$X(i, k) = X(i, k-1) + X(i-1, k)$ for $i \ge 1$, $k \ge 1$
$X(i, 0) = X(0, k) = 1$ for $i, k \ge 0$
then $X(i, k)$ is exactly the number of paths from the origin $(0, 0)$ to point $(i, k)$ where the set of edges consists of unit length horizontal and vertical lines. Every path from $(0, 0)$ to $(i, k)$ has length (number of edges) $i + k$ and contains exactly $i$ horizontal edges. Hence $X(i, k) = \binom{i+k}{i}$. In particular, $X(1, k) = k + 1$. It is now easy to express $W$ in terms of $X$. Write $W(i, k) = W_1(i, k) + W_2(i, k)$ where
$W_j(i, k) = W_j(i-1, k) + W_j(i, k-1)$ for $j = 1, 2$, $i \ge 2$, $k \ge 1$
$W_1(1, k) = k + 1$, $W_2(1, k) = \hat{c}$ for $k \ge 1$
$W_1(i, 0) = 1$, $W_2(i, 0) = 0$ for $i \ge 1$
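The recurrences for $X$, $W$, $W_1$ and $W_2$ are easy to check numerically. The sketch below is only a sanity check under assumptions: the constant `C_HAT` is a hypothetical integer stand-in for $\hat{c}$, and the function names are illustrative.

```python
from functools import lru_cache
from math import comb

C_HAT = 5  # hypothetical integer value standing in for c-hat

@lru_cache(maxsize=None)
def X(i, k):
    """Number of monotone lattice paths from (0, 0) to (i, k)."""
    if i == 0 or k == 0:
        return 1
    return X(i, k - 1) + X(i - 1, k)

@lru_cache(maxsize=None)
def W_part(j, i, k):
    """W_1 (j = 1) or W_2 (j = 2) with the boundary conditions from the text."""
    if k == 0:
        return 1 if j == 1 else 0
    if i == 1:
        return k + 1 if j == 1 else C_HAT
    return W_part(j, i - 1, k) + W_part(j, i, k - 1)

@lru_cache(maxsize=None)
def W(i, k):
    if k == 0:
        return 1
    if i == 1:
        return k + 1 + C_HAT
    return W(i, k - 1) + W(i - 1, k)
```

With these definitions one can confirm that `X(i, k)` equals `comb(i + k, i)`, that `W_part(1, i, k)` coincides with `X(i, k)` for `i >= 1`, and that `W` is the sum of its two parts.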
k
3 | 1  4  10  20
2 | 1  3   6  10
1 | 1  2   3   4
0 | 1  1   1   1
    0  1   2   3   i
$\hat{c} = c \prod_{i \le d} (1 + 2^i)$.
2
Proof: We sort $S$ once according to the last coordinate in time $O(n \log n)$. Then we proceed as described above. With a little care all subproblems generated are also sorted. Hence we obtain the same recurrence as in the proof of Theorem 7 with the only change that $T(1, n) = O(n + \hat{c}\,n)$ now. This will save one factor of $\log n$ throughout.
Theorems 7 and 8 derive upper bounds on the performance of a multidimensional divide and conquer algorithm for the fixed radius near neighbor problem. Are there any sets $S \subseteq \mathbb{R}^d$ where this upper bound is actually achieved? Let us look at the two-dimensional case. If the points of $S$ crowd into a very narrow, say width $< \delta$, vertical slab then all subproblems generated will indeed have maximal size and so our algorithm will run very long. A similar observation holds true in higher dimensional space. However, this observation also suggests a major improvement upon the basic algorithm. There is no a-priori reason for only looking at vertical dividing lines; we can also look for horizontal dividing lines and choose whatever is better. A "good" dividing line is a line which divides set $S$ into (nearly) equal parts, defines a small (size $o(n)$) lower dimensional subproblem, and is easy to find. Good dividing lines always exist. We content ourselves with a discussion of the two-dimensional space and leave the general case to the reader.
Lemma 8. Let $S \subseteq \mathbb{R}^2$, $|S| = n$, be $(\delta, c)$-sparse. Then there exists a line $L$ orthogonal to one of the axes such that
1) no half-space defined by $L$ contains more than $4n/5$ points of $S$;
2) the slab of width $2\delta$ around $L$ contains at most $\sqrt{36\,c\,n/5}$ points of $S$.
Proof: For $i$, $i = 0, 1$, let $l_i = \min\{a;\ x_i \le a \text{ for at least } n/5 \text{ points of } S\}$ and $h_i = \min\{a;\ x_i \le a \text{ for at least } 4n/5 \text{ points of } S\}$. Next consider lines $L_{ij} = \{y \in \mathbb{R}^2;\ y_i = l_i + (2j+1)\delta\}$, $0 \le j \le (h_i - l_i)/2\delta - 1$, and the slabs of width $2\delta$ around them, i.e., $SL_{ij} = \{y \in \mathbb{R}^2;\ l_i + 2j\delta < y_i < l_i + 2(j+1)\delta\}$.
Figure 12. (Slabs of width $2\delta$ between $l_i$ and $h_i$ with dividing lines $L_{i0}, L_{i1}, L_{i2}, \ldots$; slab $SL_{i1}$ is marked.)
Claim:
a) For every $i$ and $j$: No half-space defined by $L_{ij}$ contains more than $4n/5$ points of $S$.
b) For every $i$: If $(h_i - l_i)/2\delta \ge \sqrt{n/(20c)}$ then there is a $j$ such that $|S \cap SL_{ij}| \le \sqrt{36\,c\,n/5}$.
c) There is an $i$ such that $(h_i - l_i)/2\delta \ge \sqrt{n/(20c)}$.
Proof: a) Since there are $n/5$ points $x$ of $S$ with $x_i \le l_i$ there are clearly that many points with $x_i \le l_i + (2j+1)\delta$. Also there are less than $4n/5$ points $x \in S$ with $x_i < h_i$ and hence less than $4n/5$ points $x \in S$ with $x_i \le l_i + (2j+1)\delta < h_i$. This proves a).
b) Slabs $SL_{ij}$, $j \ge 0$, are pairwise disjoint and contain at most $3n/5$ points of $S$ together. If $(h_i - l_i)/2\delta \ge \sqrt{n/(20c)}$ then there must be one $j$ such that $|S \cap SL_{ij}| \le (3n/5)/\sqrt{n/(20c)} = \sqrt{36\,c\,n/5}$.
Figure 13. Illustration of part c) (strips $R_0$ between $l_0$ and $h_0$, $R_1$ between $l_1$ and $h_1$, and their intersection $C$).
c) Assume otherwise. Then $(h_i - l_i) < \sqrt{n/(5c)}\,\delta$ for $i = 0, 1$. Let $R_i = \{y \in \mathbb{R}^2;\ l_i \le y_i \le h_i\}$ and let $C = R_0 \cap R_1$. Furthermore, let $f = |C \cap S|$ and $n_i = |(R_i - C) \cap S|$. Then $f + n_i \ge 3n/5$ since $|R_i \cap S| \ge 3n/5$ and $n_0 + n_1 + f \le n$ since sets $R_0 - C$, $R_1 - C$, $C$ are pairwise disjoint. Thus $n \ge n_0 + n_1 + f = (n_0 + f) + (n_1 + f) - f \ge 6n/5 - f$ or $f \ge n/5$. $C$ is a rectangle whose sides have length at most $\sqrt{n/(5c)}\,\delta$ and is hence easily covered by $n/(5c)$ circles of radius $\delta$. Since $S$ is $(\delta, c)$-sparse any such circle contains at most $c$ points of $S$ and hence $f < (n/(5c)) \cdot c = n/5$, a contradiction.
Note that Lemma 8 also suggests a linear algorithm for finding a good dividing line. Compute $l_0, h_0, l_1, h_1$ in linear time using the linear median algorithm (Section 2.4). Let us assume w.l.o.g. that $(h_0 - l_0) \ge \sqrt{n/(5c)}\,\delta$. The proof of Lemma 8 shows that one of the slabs $SL_{0,j}$, $0 \le j \le \sqrt{n/(20c)}$, contains at most $\sqrt{36\,c\,n/5}$ points of $S$. The number of points in these slabs can be determined in linear time by bucket sort (Section 2.2.2). Thus a good dividing line can be determined in linear time. We obtain the following recurrence for $T(2, n)$, the time to compute $NN(S)$ for a $(\delta, c)$-sparse set $S \subseteq \mathbb{R}^2$, $|S| = n$:
$T(2, n) = \max_{n/5 \le n_1 \le 4n/5} [T(2, n_1) + T(2, n - n_1)] + T(1, \sqrt{36\,c\,n/5}) + O(n)$.
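The search for a good dividing line can be sketched as follows. The sketch makes several assumptions that are not in the text: the name `good_dividing_line` is hypothetical, the input is a non-empty list of 2-tuples, the rank statistics $l_i$ and $h_i$ are found by sorting (a stand-in for the linear median algorithm), and the function simply returns the emptiest candidate slab over both axes, or `None` when no slab fits between $l_i$ and $h_i$ on either axis.

```python
from math import ceil, floor

def good_dividing_line(points, delta):
    """Return (axis, position, points_in_slab) for the sparsest candidate line, or None."""
    n = len(points)
    best = None
    for axis in (0, 1):
        coords = sorted(p[axis] for p in points)     # stand-in for linear-time selection
        lo = coords[ceil(n / 5) - 1]                 # l_i: rank n/5
        hi = coords[ceil(4 * n / 5) - 1]             # h_i: rank 4n/5
        slabs = floor((hi - lo) / (2 * delta))       # number of candidate lines L_ij
        if slabs < 1:
            continue
        counts = [0] * slabs                         # bucket counts for the open slabs SL_ij
        for x in coords:
            j = floor((x - lo) / (2 * delta))
            if 0 <= j < slabs and x > lo + 2 * j * delta:
                counts[j] += 1
        j = min(range(slabs), key=counts.__getitem__)
        cand = (axis, lo + (2 * j + 1) * delta, counts[j])
        if best is None or cand[2] < best[2]:
            best = cand
    return best
```

For example, on a 10 by 10 integer grid with `delta = 0.5` every open slab between consecutive integer coordinates is empty, so the first candidate line is chosen.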
Theorem 9. The good dividing line approach to the fixed radius near neighbor problem leads to an $O(n \log n)$ algorithm in 2-dimensional space.
Proof: In Section 3.5.1, Theorem 2a) we showed that the recurrence above has solution $T(2, n) = O(n \log n)$.
Theorem 9 also holds true in higher-dimensional space. In $d$-space one can always find a dividing hyperplane which splits $S$ into nearly equal parts (1/5 to 4/5 at the worst) and such that the slab around this hyperplane contains at most $O(n^{1-1/d})$ points. This leads directly to an $O(n \log n)$ algorithm in $d$-space (Exercise 26).
This section is devoted to lower bounds. We cover two approaches. The first approach deals with partial match retrieval in minimum space and shows that rootic search time is the best we can hope for. In particular, we show that dd-trees are an optimal data structure. The second, more general approach deals with a wide class of dynamic multi-dimensional region searching problems. A region searching problem (cf. introduction to 7.2) over universe $U$ is specified by a class $\Gamma \subseteq 2^U$ of regions. We show that the cost of insert, delete and query operations can be bounded from below by a combinatorial quantity, the spanning bound of class $\Gamma$. The spanning bound is readily computed for polygon and orthogonal range queries and can be used to show that polygon trees and range trees are nearly optimal.
dd-trees are a solution for the partial match retrieval problem with rootic search time and linear space. In fact, dd-trees are a minimum space solution because dd-trees are easily stored as linear arrays. Figure 14 shows an ideal dd-tree for the (invertible) set $S = \{(1, II), (2, IV), (3, III), (4, V), (5, I)\}$ and its representation as an array. The correspondence between tree and array is the same as for binary search (cf. Section 3.3.1).
7.2.3.1. Partial Match Retrieval in Minimum Space 47
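Figure 14. An ideal dd-tree. (Root $(3, III)$ with split value 3; the nodes on the next level split at IV and V; leaves $(1, II)$, $(2, IV)$, $(5, I)$, $(4, V)$. Array representation, row by row: $(1, II)$, $(2, IV)$, $(3, III)$, $(5, I)$, $(4, V)$.)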
The aim of this section is to show that dd-trees are an optimum minimum space solution for the partial match retrieval problem; more precisely, we show that $\Omega(n^{1-1/d})$ is a lower bound on the time complexity of partial match retrieval in $d$-dimensional space with one specified component in a decision tree model of computation. The exact model of computation is as follows.
Let $S_n$ be the set of permutations of elements $0, 1, \ldots, n-1$. For $\pi_1, \ldots, \pi_{d-1} \in S_n$ let $A(\pi_1, \ldots, \pi_{d-1}) = \{(i, \pi_1(i), \ldots, \pi_{d-1}(i));\ 0 \le i < n\}$ and let $I_n = \{A(\pi_1, \ldots, \pi_{d-1});\ \pi_1, \ldots, \pi_{d-1} \in S_n\}$. Then $|I_n| = (n!)^{d-1}$. $I_n$ is the class of invertible $d$-dimensional sets of cardinality $n$ with components drawn from the range $0, 1, \ldots, n-1$. We restrict ourselves to this range because in the decision tree model of computation only the relative size of elements is relevant. A decision tree algorithm for the partial match retrieval problem of size $n$ consists of
1) a storage assignment $SA$ which specifies for every $A \in I_n$ the way of storing $A$ in a table $M[0\,..\,n-1,\ 0\,..\,d-1]$ with $n$ rows and $d$ columns, i.e., $SA : (S_n)^{d-1} \to S_n$ with the following interpretation. For all $\pi_1, \ldots, \pi_{d-1} \in S_n$ and $\sigma = SA(\pi_1, \ldots, \pi_{d-1})$: Tuple $(i, \pi_1(i), \ldots, \pi_{d-1}(i))$ of set $A(\pi_1, \ldots, \pi_{d-1})$ is stored in row $\sigma(i)$ of table $M$, i.e., $M[\sigma(i), j] = \pi_j(i)$ for $0 \le j \le d-1$, $0 \le i < n$. Here $\pi_0$ is the identity permutation.
2) $d$ decision trees $T_0, \ldots, T_{d-1}$. Trees $T_j$ are ternary trees. The internal nodes of tree $T_j$ are labelled by expressions of the form $X\ ?\ M[i, j]$ where $0 \le i < n$. The three edges out of a node are labelled $<$, $=$ and $>$. Leaves are labelled yes or no.
A decision tree algorithm is used as follows. Let $A \in I_n$, let $y \in \mathbb{R}$ and let $j \in [0\,..\,d-1]$. In order to decide whether there is $x = (x_0, x_1, \ldots, x_{d-1}) \in A$ with $x_j = y$ we store $A$ in table $M$ as specified by $SA$ and then use decision tree $T_j$ to decide the question, i.e., we compare $y$ with elements in the $j$-th column of $M$ as prescribed by $T_j$.
$\ldots = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}\}$.
This definition needs some explanation. Let $\pi_1, \ldots, \pi_{d-1} \in S_n$, let $\sigma = SA(\pi_1, \ldots, \pi_{d-1})$, and let $A = A(\pi_1, \ldots, \pi_{d-1})$. When set $A$ is stored in table $M$ then tuple $(i, \pi_1(i), \ldots, \pi_{d-1}(i))$ is stored in row $\sigma(i)$ of table $M$, i.e., $M[\sigma(i), j] = \pi_j(i)$. In other words, $M[\sigma(\pi_j^{-1}(l)), j]$ contains integer $l$, $0 \le l < n$, i.e., $\sigma \circ \pi_j^{-1}$ is one of the order types occurring in the $j$-th column.
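The relation between a storage assignment and the order types of the columns can be made concrete with a small sketch. The function names `store` and `order_type` are hypothetical, permutations are given as Python lists, and the value `sigma` of the storage assignment is passed in directly rather than computed by some function $SA$.

```python
def store(perms, sigma):
    """Table M for the set A(pi_1, ..., pi_{d-1}) when row sigma[i] holds tuple i."""
    n = len(sigma)
    pis = [list(range(n))] + [list(p) for p in perms]    # pi_0 is the identity
    M = [[None] * len(pis) for _ in range(n)]
    for i in range(n):
        for j, pi in enumerate(pis):
            M[sigma[i]][j] = pi[i]
    return M

def order_type(perms, sigma, j):
    """The permutation sigma composed with the inverse of pi_j, as a list indexed by l."""
    n = len(sigma)
    pi = list(range(n)) if j == 0 else perms[j - 1]
    inverse = [0] * n
    for i, value in enumerate(pi):
        inverse[value] = i
    return [sigma[inverse[l]] for l in range(n)]
```

For every column `j` and every value `l`, the entry `M[order_type(perms, sigma, j)[l]][j]` is `l`, which is exactly the statement that column `j` is stored according to that order type.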
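Lemma 9. There is a $j$ such that $|OT(j)| \ge (n!)^{1-1/d}$.
Proof: The discussion following the definition of $OT(j)$ shows that the mapping $(\pi_1, \ldots, \pi_{d-1}) \mapsto (\sigma_0, \ldots, \sigma_{d-1})$ where $\sigma_j = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}$ is injective. Hence $\prod_{0 \le j \le d-1} |OT(j)| \ge (n!)^{d-1}$, and therefore at least one of the $d$ factors is at least $((n!)^{d-1})^{1/d} = (n!)^{1-1/d}$.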
Next, we describe precisely the computational power of decision trees $T_j$. Let $\hat{\Pi} \subseteq S_n$ be a set of permutations. A decision tree $T$ solves problem $SST(\hat{\Pi})$ (searching semi-sorted tables) if for every $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$, every $x$ and every $\sigma \in \hat{\Pi}$: If $B$ is stored in linear array $M[0\,..\,n-1]$ according to order type $\sigma$, i.e., $M[\sigma(l)] = x_l$ for $0 \le l \le n-1$, then $T$ correctly decides $x \in B$.
Lemma 10. $T_j$ solves $SST(OT(j))$ for $0 \le j \le d-1$.
Proof: Note first that $T_j$ solves $SST(OT(j))$ for every $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$ if it does so for $B = \{0, 1, \ldots, n-1\}$. Next let $\sigma \in OT(j)$. Then there must be $\pi_1, \ldots, \pi_{d-1}$ such that $\sigma = SA(\pi_1, \ldots, \pi_{d-1}) \circ \pi_j^{-1}$. In particular, if our partial match retrieval algorithm is applied to set $A = A(\pi_1, \ldots, \pi_{d-1})$ then $A$ is stored in table $M[0\,..\,n-1,\ 0\,..\,d-1]$ such that $M[\sigma(l), j] = l$ for all $l$; i.e., $B = \{0, \ldots, n-1\}$ is stored in the $j$-th column of $M$ according to order type $\sigma$. Thus $T_j$ solves $SST(OT(j))$.
Lemmas 9 and 10 reduce the partial match retrieval problem to the searching semi-sorted tables problem. Lemma 11 gives a lower bound on the complexity of the latter problem.
7.2.3.2. The Spanning Bound 49
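Lemma 11. Let $\hat{\Pi} \subseteq S_n$ and let decision tree $T$ solve $SST(\hat{\Pi})$.
a) For every injective mapping $\tau : [0\,..\,k-1] \to [1\,..\,n]$: $|\{\sigma(k);\ \sigma \in \hat{\Pi} \text{ and } \sigma(i) = \tau(i) \text{ for } 0 \le i < k\}| \le depth(T)$.
b) $|\hat{\Pi}| \le depth(T)^n$.
Proof: b) is a simple consequence of part a). Namely, let $\hat{\Pi}_k = \{\sigma|_{[0..k-1]};\ \sigma \in \hat{\Pi}\}$. Then $|\hat{\Pi}_0| = 1$ and $|\hat{\Pi}_{k+1}| \le depth(T) \cdot |\hat{\Pi}_k|$ by part a). Hence $|\hat{\Pi}| = |\hat{\Pi}_n| \le depth(T)^n$.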
a) Let $\tau : [0\,..\,k-1] \to [1\,..\,n]$ be injective, let $\sigma \in \hat{\Pi}$ and let $B = \{x_0 < x_1 < \cdots < x_{n-1}\}$ be stored in table $M[0\,..\,n-1]$ according to $\sigma$. Consider a search for $x$, $x_{k-1} < x < x_k$. It defines a path in tree $T$ leading to a leaf which is labelled "no". On this path $x$ is compared with at most $depth(T)$ distinct table positions, say $M[i_1], \ldots, M[i_h]$, $h \le depth(T)$. We claim $\sigma(k) = i_l$ for some $l$, $1 \le l \le h$. Assume otherwise. Then $M[i_l] \ne x_k$ for all $l$. Consider a search for $x = x_k$. It will lead to exactly the same leaf because the outcome of all comparisons is unchanged. Hence $T$ decides that $x_k$ does not belong to $B$, a contradiction. We have thus shown that $\sigma(k) = i_l$ for some $l$, $1 \le l \le h \le depth(T)$.
Theorem 10 is now an immediate consequence of Lemmas 9, 10 and 11. By Lemma 9, there is a $j$ with $|OT(j)| \ge (n!)^{1-1/d}$. By Lemma 10, $T_j$ solves $SST(OT(j))$ and hence has depth $\ge |OT(j)|^{1/n}$ by Lemma 11. Finally, $|OT(j)|^{1/n} \ge ((n!)^{1-1/d})^{1/n} = ((n!)^{1/n})^{1-1/d} = \Omega(n^{1-1/d})$ since $n! \ge \sqrt{2\pi n}\,(n/e)^n$ by Stirling's approximation.
It is open whether Theorem 10 is also valid for more general models of computation. In particular, it is not known whether the lower bound is valid in a more general decision tree model where comparisons of the form $M[i, j]\ ?\ M[h, j]$ are also allowed. It is conceivable that comparisons of this form can speed up searches considerably, because they can be used to infer information about the storage assignment. This point is followed up in Exercise 29. We should also emphasize at this point that the restriction to minimum space solutions which is captured in the definition of storage assignment is essential for the argument. After all, range trees provide us with polylogarithmic search time if we are willing to use non-linear space. Exercises 30-32 discuss various extensions.
We introduce the spanning bound and use it to prove lower bounds on the complexity of polygon retrieval and orthogonal range queries.
We will first define the region searching problem in an abstract setting. Let $U$ be the key space, let $M$ be a commutative monoid (i.e., a set $M$ with a commutative, associative operation $+ : M \times M \to M$ and an element $0 \in M$ such that $x + 0 = x$ for all $x \in M$) and let $\Gamma \subseteq 2^U$ be a set of regions in $U$. The $\Gamma$-region searching problem is to (efficiently) maintain a partial function $S : U \to M$ under the operations
Insert$(x, m)$: precondition: $x \notin dom\,S$, $x \in U$, $m \in M$
: effect: $S \leftarrow S \cup \{(x, m)\}$
Delete$(x)$: precondition: $x \in dom\,S$, $x \in U$
: effect: $dom\,S \leftarrow dom\,S - \{x\}$
Query$(R)$: precondition: $R \in \Gamma$
: effect: output $\sum_{x \in R \cap dom\,S} S(x)$
This is in complete agreement with our previous discussion of searching problems. $U$ is the key space. The problem is to maintain a set of pairs $(x, m)$, where $x \in U$, $m \in M$; $m$ is the "information" associated with key $x$. Insert and Delete add and delete pairs and Query sums the information over a region $R$.
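The three operations can be illustrated by a deliberately naive sketch. Everything in it is an assumption made for illustration: the class name `NaiveRegionSearch`, the representation of a region as a Python predicate on keys, and the monoid passed in as an `add` function with a `zero` element. It answers a query by scanning all stored pairs, so it says nothing about efficient solutions.

```python
class NaiveRegionSearch:
    """Maintains a partial function S: U -> M and sums S over a queried region."""

    def __init__(self, add, zero):
        self.add, self.zero = add, zero      # commutative monoid (M, +, 0)
        self.S = {}                          # the partial function S

    def insert(self, x, m):
        assert x not in self.S               # precondition: x is not in dom S
        self.S[x] = m

    def delete(self, x):
        assert x in self.S                   # precondition: x is in dom S
        del self.S[x]

    def query(self, region):
        """region is a predicate on keys, standing in for a member R of the region class."""
        total = self.zero
        for x, m in self.S.items():
            if region(x):
                total = self.add(total, m)
        return total
```

With integer addition as the monoid and a region such as `lambda p: p[0] <= 3 and p[1] <= 4`, a query sums the values of all stored points in that quadrant.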
Next we fix the model of computation. There is an infinite supply $v_0, v_1, v_2, \ldots$ of variables which take values in $M$. Initially, 0 is stored in every variable. The instruction repertoire consists of $v_i \leftarrow v_j + v_k$, $v_i \leftarrow$ Input, Output $v_i$, $i, j, k \ge 0$. Exercise 33 discusses a larger instruction repertoire. A program is given by an (infinite) state space $Z$, an initial state $z_0 \in Z$ corresponding to the empty function $S$, and three functions $f_I, f_D, f_Q$. Here $f_I : U \times M \times Z \to Z \times Ins$, $f_D : U \times Z \to Z \times Ins$ and $f_Q : \Gamma \times Z \to Z \times Ins$ where $Ins$ is the set of all sequences of instructions from the repertoire. Function $f_I$ has the following semantics. If the algorithm is in state $z \in Z$, operation Insert$(x, m)$ is to be executed, and $f_I(x, m, z) = (z', \beta)$ then $z'$ is the new state and sequence $\beta \in Ins$ is to be executed. The first instruction of $\beta$ is of the form $v_i \leftarrow$ Input and places $m$ into register $v_i$. The remaining instructions of $\beta$ are of the form $v_i \leftarrow v_j + v_k$. The semantics of $f_D$ and $f_Q$ are defined similarly, i.e., after a deletion a sequence of additions is executed and after a query a sequence of additions followed by an output instruction is executed.
A program $Z, z_0, f_I, f_D, f_Q$ is correct if it is correct for all choices of monoid $M$. It is correct for a particular choice of $M$ if the answers to all queries are computed correctly.
The cost of inserting $(x, m)$ in control state $z$ is the number of instructions in $\beta$, where $(z', \beta) = f_I(x, m, z)$. The cost of a sequence of operations is the sum of the costs of the operations in the sequence. We use $C_n$ to denote the maximal cost of any sequence of $n$ insertions, deletions and query operations (starting with empty function $S$).
Example 1 (One-dimensional range trees): Let $U = \mathbb{R}$, $M = (\mathbb{N}_0, +, 0)$, and let $\Gamma$ be the set of intervals. The set $Z$ of control states is the set of all BB[$\alpha$]-trees $T$ for finite subsets of $\mathbb{R}$; $z_0$ is the empty tree. Let $T$ be a BB[$\alpha$]-tree. With every node of $T$ we associate a variable $v$ which contains the weight (= number of leaves) in the subtree rooted at that node. An insert or delete requires the update of $O(\log n)$ variables; the update requires only additions if we start updating at the leaves. Also a query can be answered by summing $O(\log n)$ variables.
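The flavour of Example 1 can be sketched with a simpler structure than a BB[$\alpha$]-tree. The sketch below is a substitute, not the structure of the example: it assumes a fixed, known universe of keys (so no rebalancing is needed), stores one variable per node of a complete binary tree, updates a path bottom-up using additions only, and answers an interval query by summing $O(\log n)$ node variables. The class name `RangeSum` and its methods are hypothetical.

```python
from bisect import bisect_left, bisect_right

class RangeSum:
    """Complete binary tree over a fixed sorted key universe; every node stores a sum."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.size = 1
        while self.size < len(self.keys):
            self.size *= 2
        self.v = [0] * (2 * self.size)               # the variables, all initially 0

    def set(self, key, m):
        """Insert key with value m (use m = 0 to delete); key must be in the universe."""
        i = self.size + bisect_left(self.keys, key)
        self.v[i] = m
        i //= 2
        while i >= 1:                                # recompute the path by additions only
            self.v[i] = self.v[2 * i] + self.v[2 * i + 1]
            i //= 2

    def query(self, lo, hi):
        """Return (sum over keys in [lo, hi], number of variables that were summed)."""
        l = self.size + bisect_left(self.keys, lo)
        r = self.size + bisect_right(self.keys, hi)
        total, used = 0, 0
        while l < r:
            if l & 1:
                total += self.v[l]; used += 1; l += 1
            if r & 1:
                r -= 1; total += self.v[r]; used += 1
            l //= 2
            r //= 2
        return total, used
```

The second component of the query result counts how many stored variables were added, which is the cost measure used in the model above.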
The basic idea for the lower bound argument is as follows. It is intuitively clear and will be made precise below that every variable contains the sum of $S(x)$ over some subset of $U$. A query for region $R$ is then answered by summing some variables, i.e., by assembling $R \cap dom\,S$ from smaller pieces. If all queries are "easy" to answer, then set $R \cap dom\,S$ can be assembled from only a few pieces for every $R \in \Gamma$. This implies that we need to store information about some $x \in dom\,S$ in many (the precise number depends on the structure of $\Gamma$) different places. If we delete $x$ at this point then a lot of variables become useless and must be recomputed after inserting $x$ with a different monoid value $m$. This argument suggests that updates are costly if queries are cheap. The lower bound is then obtained by balancing the cost of the two operations. More generally it suggests that there is a trade-off between query and update cost. In the case of range trees we have seen such a trade-off (as an upper bound) in Section 7.2.2.
Definition:
a) Let $X \subseteq U$, $X$ finite and let $R_1, R_2, \ldots, R_l$ be all sets of the form $X \cap R$, $R \in \Gamma$. Then $F = \{Y_1, \ldots, Y_m\}$, $\emptyset \ne Y_i \subseteq X$, is a spanning family for $X$ (with respect to $\Gamma$) if
1) every $R_i$ is the disjoint union of some $Y_j$'s and
2) every $Y_i$ which is not a singleton is the disjoint union of some $Y_j$ and $Y_h$.
b) For $F = \{Y_1, \ldots, Y_m\}$ a spanning family define
$t(F) = \max_i \min\{t;\ \text{there is a representation of } R_i \text{ by } t \text{ disjoint } Y_j\text{'s in } F\}$
and
$\rho(F) = \max_{x \in X} \{d;\ x \text{ is contained in } d\ Y_j\text{'s}\}$.
c) For $X \subseteq U$, $X$ finite, let
$B(X) = \min\{\max(t(F), \rho(F));\ F \text{ is a spanning family for } X\}$
and
$B_n = \max\{B(X);\ X \subseteq U,\ |X| \le n\}$.
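For small examples the two quantities of part b) can be computed by brute force. The sketch below is an illustration only: the function names are hypothetical, sets are `frozenset`s, regions are assumed to be non-empty, and the search over all sub-collections is exponential. The name `rho_of` stands for the second quantity of part b), the maximal number of members of the family containing a single point.

```python
from itertools import combinations

def min_representation(R, F):
    """Smallest t such that R is a disjoint union of t members of F, or None."""
    parts = [Y for Y in F if Y <= R]
    for t in range(1, len(parts) + 1):
        for choice in combinations(parts, t):
            # equal total size plus equal union implies pairwise disjointness
            if sum(len(Y) for Y in choice) == len(R) and frozenset().union(*choice) == R:
                return t
    return None

def is_spanning_family(F, regions):
    spans = all(min_representation(R, F) is not None for R in regions)
    splits = all(
        len(Y) == 1 or any(A | B == Y and not (A & B) for A, B in combinations(F, 2))
        for Y in F
    )
    return spans and splits

def t_of(F, regions):
    return max(min_representation(R, F) for R in regions)

def rho_of(F, X):
    return max(sum(x in Y for Y in F) for x in X)
```

As a usage example, take $X = \{1, 2, 3, 4\}$, all non-empty intervals as regions, and the family consisting of the four singletons, $\{1, 2\}$, $\{3, 4\}$ and $X$ itself; then every interval needs at most two pieces and every point lies in three members.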
Lemma 12. For every program there is a normal form program of the same cost.
Proof: Lemma 12 states that space can be used in an intentionally wasteful way, and we all are experts in that. A formal argument goes as follows. Let the normal form program have variables $v'_0, v'_1, \ldots$ and control set $Z' = Z \times W$ where $W$ is the set of finite, injective mappings from $V = \{v_0, v_1, \ldots\}$ to $V' = \{v'_0, v'_1, \ldots\}$. Any sequence of instructions is replaced by a sequence of instructions which assigns to unused variables only. Association $w \in W$ is updated accordingly.
Lemma 13.
a) If $v$ is useless at $h$ then $v$ is useless at $h'$ for all $h' \ge h$.
b) For every $h$: $F = \{set(v);\ v \text{ is useful at } h \text{ and } set(v) \ne \emptyset\}$ is a spanning family for $dom\,S_h$.
Proof: a) If $v$ is useless at $h$ then $val(v) \not\subseteq Range(S_h)$, i.e., there is a pair $(x, t) \in val(v) - Range(S_h)$. Since $(x, t) \in val(v)$ and $val(v)$ must be a sum of some of the monoid elements assigned to variables after insertions, $x$ was inserted at least $t$ times during $Op_1, \ldots, Op_h$. Since $(x, t) \notin Range(S_h)$ it was also deleted at least $t$ times. Hence $(x, t) \notin Range(S_{h'})$ for all $h' \ge h$ by our choice of $Op_1, Op_2, \ldots$. Since $val(v)$ will never change we infer that $v$ is useless at all $h' \ge h$.
b) We have to verify properties 1) and 2) of a spanning family. Let us verify property 2) first. If $v$ was assigned by $v \leftarrow$ Input, then $set(v)$ is a singleton. Hence if $set(v)$ is not a singleton then $v$ was assigned by $v \leftarrow u + w$ and hence $val(v) = val(u) + val(w)$. Since $v$ is useful at $h$ and hence $val(v) \subseteq Range(S_h)$ we conclude that $set(v) = set(u) \cup set(w)$ and that $set(u) \cap set(w) = \emptyset$ (for this inference it is important that we take the monoid of multi-sets and not the monoid of subsets under union). This proves property 2).
Property 1) can be seen as follows. Let $R \in \Gamma$ and suppose (for the moment) that $Op_{h+1} = Query(R)$. The answer to this query, i.e., $\sum\{S(x);\ x \in R \cap dom\,S_h\}$, is computed as a sum of some variables. Call the set of these variables $A$. No variable $v \in A$ can be useless at $h$ since $val(v) \not\subseteq Range(S_h)$ implies $\sum\{val(v);\ v \in A\} \not\subseteq Range(S_h)$. Also sets $set(v)$, $v \in A$, must be pairwise disjoint by the argument used to prove property 2).
We are now ready to construct a sequence $Op_1, \ldots, Op_n$ of cost at least $\lfloor n/16 \rfloor B_n$. Let $m = \lceil n/2 \rceil$ and let $X = \{x_1, \ldots, x_k\} \subseteq U$, $|X| \le m$, be such that $B(X) = B_m$. The following program defines $Op_1, \ldots, Op_n$.
a) Let $Op_i = Insert(x_i, (x_i, 1))$ for $1 \le i \le k$.
b) do $\lfloor n/4 \rfloor$ times
co at this point $F = \{set(v);\ set(v) \ne \emptyset \text{ and } v \text{ useful}\}$ is a spanning family for $X$ and hence $B_m = B(X) \le \max\{\rho(F), t(F)\}$ oc
Case 1: $t(F) \ge B_m$:
Then there is $R \in \Gamma$ such that at least $B_m$ elements of $F$ are needed to span $R \cap dom\,S = R \cap X$. We let the next operation be $Query(R)$. Answering this query requires summing at least $B_m$ variables.
Case 2: $\rho(F) \ge B_m$:
Then there is $x \in X$ such that $x$ is contained in at least $B_m$ elements of $F$. Let the next two operations be $Delete(x)$, $Insert(x, (x, t))$ for the appropriate $t$. This will make all variables $v$ with $(x, t-1) \in val(v)$ and $set(v) \in F$ useless. There are at least $B_m$ such variables.
It remains to estimate the complexity of the sequence $Op_1, Op_2, \ldots, Op_n$ defined above. Let $a$ ($b$) be the number of times case 1 (2) was executed. Then $a + b = \lfloor n/4 \rfloor$. Also the total cost of Case 1 is at least $a \cdot B_m$. In case 2 at least $b \cdot B_m$ variables are made useless. Hence at least that many variables must be assigned to. Thus
$C_n \ge \min_{a+b=\lfloor n/4 \rfloor} \max\{a \cdot B_m,\ b \cdot B_m\} \ge \lfloor n/8 \rfloor B_m \ge \lfloor n/16 \rfloor B_n$
where the last inequality follows from
Lemma 14.
a) $B_m \le B_n$ for $m \le n$.
b) $B_{m+n} \le B_m + B_n$ for all $m$ and $n$.
Proof: a) Immediate from the definition.
b) Let $X \subseteq U$, $|X| = m + n$, be such that $B(X) = B_{m+n}$. Let $X_1, X_2$ be a partition of $X$ with $|X_1| \le m$, $|X_2| \le n$. Then there are spanning families $F_1$ and $F_2$ for $X_1$ and $X_2$ respectively with $\max(t(F_i), \rho(F_i)) \le B(X_i)$ for $i = 1, 2$. $F = F_1 \cup F_2$ is a spanning family for $X = X_1 \cup X_2$ with $t(F) \le t(F_1) + t(F_2)$ and $\rho(F) = \max(\rho(F_1), \rho(F_2))$. Thus $B_{n+m} = B(X) \le \max(t(F), \rho(F)) \le B(X_1) + B(X_2) \le B_n + B_m$.
The significance of Theorem 11 lies in the fact that it relates the complexity of an algorithm, a quantity which involves time and is therefore difficult to handle, with a purely combinatorial quantity, which is much easier to deal with. Before we apply the spanning bound to orthogonal range queries and polygon retrieval it is helpful to visualize spanning families in terms of graphs.
Let $\Gamma \subseteq 2^U$ be a set of regions and let $X = \{x_1, \ldots, x_n\} \subseteq U$. Let $R_1, R_2, \ldots, R_n$ be all sets of the form $X \cap R$, $R \in \Gamma$ (we may assume w.l.o.g. that the number of sets is equal to the number of points because we can always add either fictitious points or regions). Furthermore, let $F = \{Y_1, \ldots, Y_m\}$ be a spanning family. Let us construct a bipartite graph $G$ with node set $\{x_1, \ldots, x_n, R_1, \ldots, R_n\}$ and edge set $E = \{(x_i, R_j);\ x_i \in R_j\}$. For every region $R_j$ let $S_j \subseteq \{1, \ldots, m\}$ be such that $R_j$ is the disjoint union of $Y_l$, $l \in S_j$.
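Figure 15. (Points $x_1, \ldots, x_n$ on the left, regions $R_1, \ldots, R_n$ on the right; a set $Y_l$ is linked to the points it contains and to the regions it is used to represent.)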
We can now "factor" graph $G$ into disjoint complete bipartite graphs as follows. For every $Y_l$ consider the complete bipartite graph with nodes $\{x_i;\ x_i \in Y_l\}$ on the $X$-side and $\{R_j;\ l \in S_j\}$ on the $R$-side.
Lemma 15. $E$ is the disjoint union of the sets $\{(x_i, R_j);\ x_i \in Y_l \text{ and } l \in S_j\}$, $1 \le l \le m$.
Proof: Let $(x_i, R_j) \in E$, i.e., $x_i \in R_j$. Then there is exactly one $l$ such that $x_i \in Y_l$ and $l \in S_j$.
For $x_i$ ($R_j$) let $deg(x_i)$ ($deg(R_j)$) be the degree of $x_i$ ($R_j$) in the factored graph, i.e., $deg(x_i) = |\{l;\ x_i \in Y_l\}|$ and $deg(R_j) = |\{l;\ l \in S_j\}|$. Then $t(F) = \max_j deg(R_j)$ and $\rho(F) = \max_i deg(x_i)$. We want to derive lower bounds on $\max(t(F), \rho(F)) = \max_{i,j}(deg(R_j), deg(x_i))$ which is certainly no smaller than
$\left(\sum_i deg(x_i) + \sum_j deg(R_j)\right)/2n = \sum_l (ldeg(Y_l) + rdeg(Y_l))/2n$.
Here $ldeg(Y_l) = |Y_l|$ and $rdeg(Y_l) = |\{j;\ l \in S_j\}|$. It thus suffices to prove lower bounds on the total degree of sets $Y_l$, $1 \le l \le m$.
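The factoring of $G$ and the degree identity can be checked on a toy instance. The sketch is illustrative only: the helper names `factor` and `degree_sums` are hypothetical, indices are 0-based, and the sets $S_j$ are supplied by hand rather than computed.

```python
def factor(F, S):
    """For each Y_l, the complete bipartite piece Y_l x {j; l in S_j} as a set of (x, j) edges."""
    return [{(x, j) for x in Y for j, Sj in enumerate(S) if l in Sj}
            for l, Y in enumerate(F)]

def degree_sums(X, F, S):
    """(sum of deg(x_i) plus sum of deg(R_j), sum over l of ldeg(Y_l) + rdeg(Y_l))."""
    point_side = sum(sum(x in Y for Y in F) for x in X)
    region_side = sum(len(Sj) for Sj in S)
    total_degree = sum(len(Y) + sum(l in Sj for Sj in S) for l, Y in enumerate(F))
    return point_side + region_side, total_degree
```

For instance, with points $1, \ldots, 4$, family members $\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{3, 4\}$ and regions $\{1, 2\}$, $\{2, 3\}$, $\{3, 4\}$, $\{1, 2, 3\}$ represented by the index sets `[4]`, `[1, 2]`, `[5]`, `[4, 2]`, the pieces returned by `factor` partition the edge set, as Lemma 15 states.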
Application 1: Polygon Retrieval
We consider a special case of polygon retrieval: line retrieval. More precisely, we assume $U = \mathbb{R}^2$ and $\Gamma$ the set of all lines in $\mathbb{R}^2$, i.e., $\Gamma = \{\{(x_0, x_1) \in \mathbb{R}^2;\ a x_0 + b x_1 = c\};\ a, b, c \in \mathbb{R};\ a \ne 0 \text{ or } b \ne 0\}$.
Lemma 16. Let $S = \{y_1, y_2, \ldots, y_n\} \subseteq \mathbb{R}^2$ and let $L_1, \ldots, L_n$ be a set of $n$ pairwise distinct lines. Let $r_i = |L_i \cap S|$ be the number of points of $S$ on line $L_i$ and let $F = \{Y_1, \ldots, Y_m\}$ be a spanning family for $S$ with respect to $\Gamma$. Then
$\max(t(F), \rho(F)) \ge \sum_{i=1}^{n} r_i / 2n$.
Proof: Consider any $Y_l$. We claim that $\min(ldeg(Y_l), rdeg(Y_l)) \le 1$. Assume $ldeg(Y_l) \ge 2$, i.e., there are points $y_j, y_h$, $j \ne h$, such that $\{y_j, y_h\} \subseteq Y_l$. Since two points determine a line there is at most one line $L_k$ such that $Y_l \subseteq L_k$. Thus $rdeg(Y_l) \le 1$.
Next observe that
$|E| = \sum_l ldeg(Y_l) \cdot rdeg(Y_l)$ [by Lemma 15]
$\le \sum_l (ldeg(Y_l) + rdeg(Y_l))$ [since $\min(ldeg(Y_l), rdeg(Y_l)) \le 1$]
$\le 2n \cdot \max(t(F), \rho(F))$ [by the discussion following Lemma 15]
Thus $\max(t(F), \rho(F)) \ge |E|/2n = \sum_{i=1}^{n} r_i / 2n$.
E. Wright: The theory of Numbers, Fourth Edition, Oxford University Press, 1965, p. 265).
c) Every line in $L$ contains at least $(A/2)/A^{1/3} = A^{2/3}/2$ points of $S$. Thus the total number of points from $S$ on lines in $L$ is $\Omega(A^{8/3})$ by part b) which in turn is $\Omega(n^{4/3})$.
Similar arguments can be used to show lower bounds of the same order for half-space retrieval and circular queries (Exercises 34, 35). The best upper bound $n^{0.77}$ on polygon retrieval is by polygon trees. There is still a gap to close.
Application 2: Orthogonal Range Queries
The lower bound for orthogonal range queries is somewhat harder to obtain. However, there is a merit to that. It agrees with the upper bound.
Proof: We will prove a lower bound of order $(\log n)^d$ on the spanning bound. Let $A = \lfloor n^{1/d} \rfloor$, let $X = [1\,..\,A]^d$. Then $|X| = A^d$. Also we consider the following class of $A^d$ "one-sided" range queries. For $y \in X$ let $R_y = \{x \in U;\ x \le y\}$, where $x \le y$ if $x_i \le y_i$ for $0 \le i < d$.
Let $F = \{Y_1, \ldots, Y_m\}$ be a spanning family for $X$. As above consider the complete bipartite graph associated with $Y_l$ (cf. discussion following the proof of Theorem 11), i.e., let $In(Y_l) = \{x \in X;\ x \in Y_l\}$ and let $Out(Y_l) = \{R_y;\ Y_l \text{ is used to represent } R_y;\ y \in X\}$. Then $Y_l$ contributes all of $In(Y_l) \times Out(Y_l)$ to the bipartite graph $G$ with edge set $\{(x, R_y);\ x \le y\}$ associated with the orthogonal range query problem.
The idea for the proof is now as follows. If $Y_l$ contributes many edges to graph $G$ then most edges $(x, R_y)$ contributed by $Y_l$ must have $x \ll y$. This suggests weighting the edges $(x, R_y)$ of $G$ such that the weight is a decreasing function of $y - x$. We can then hope to bound the weight of the edges covered by any $Y_l$ from above and the weight of all edges from below. This would give the bound. What weight function should we choose? It should be symmetric with respect to the coordinates. About the simplest decreasing function with this property is to assign weight
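[...] $\sum_{u \in B,\, v \in C}$ [...]
$\sum_{u \in B \cup C,\ v \in B \cup C} ((u_0 + v_0 + 1) \cdots (u_{d-1} + v_{d-1} + 1))^{-1}$
For $i_0 \ge 0$, $i_1 \ge 0$, ..., $i_{d-1} \ge 0$ let $a_{i_0 i_1 \ldots i_{d-1}} = 1$ if $(i_0, i_1, \ldots, i_{d-1}) \in B \cup C$, and $0$ otherwise.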
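Proof: We have
$\sum_{(1,\ldots,1) \le x \le y \le (A,\ldots,A)} w(x, y) \ge \sum_{(1,\ldots,1) \le x \le (A/2,\ldots,A/2),\ (0,\ldots,0) \le y - x \le (A/2,\ldots,A/2)} ((y_0 - x_0 + 1) \cdots (y_{d-1} - x_{d-1} + 1))^{-1}$
$= (A/2)^d \left(\sum_{0 \le y_0 - x_0 \le A/2} 1/(y_0 - x_0 + 1)\right)^d$
$= \Omega((A \log A)^d) = \Omega(n (\log n)^d)$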
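The chain of inequalities can be checked by brute force for small parameters. The sketch assumes the weight $w(x, y) = ((y_0 - x_0 + 1) \cdots (y_{d-1} - x_{d-1} + 1))^{-1}$ that appears in the sum above; the function names are hypothetical and `A` is assumed to be even.

```python
from itertools import product

def total_weight(A, d):
    """Sum of w(x, y) over all pairs x <= y (componentwise) in [1..A]^d, by brute force."""
    pts = list(product(range(1, A + 1), repeat=d))
    total = 0.0
    for x in pts:
        for y in pts:
            if all(a <= b for a, b in zip(x, y)):
                w = 1.0
                for a, b in zip(x, y):
                    w /= (b - a + 1)
                total += w
    return total

def lower_bound(A, d):
    """(A/2)^d * (sum over 0 <= k <= A/2 of 1/(k+1))^d, the restricted sum for even A."""
    h = sum(1.0 / (k + 1) for k in range(A // 2 + 1))
    return ((A // 2) * h) ** d
```

The restricted sum only keeps pairs with $x \le (A/2, \ldots, A/2)$ and offsets $y - x \le (A/2, \ldots, A/2)$, so `lower_bound(A, d)` never exceeds `total_weight(A, d)`; the harmonic sum inside is what produces the $\log A$ factor per dimension.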
7.3. Exercises 59
The proof of Theorem 13 is now easily completed. We have
$\max(\rho(F), t(F)) \ge \sum_l (|In(Y_l)| + |Out(Y_l)|)/2n$ [by the discussion following Lemma 15]
$\ge \sum_l \sum_{x \in In(Y_l),\ R_y \in Out(Y_l)} w(x, y)/((2\pi)^d\, 2n)$
$= \Omega((\log n)^d)$ [by Lemma 18]
We have thus shown an $\Omega((\log n)^d)$ lower bound on the spanning bound of orthogonal range queries. An application of Theorem 11 finishes the proof.
Theorem 13 shows that range trees are optimal. They allow us to process $n$ insertions, deletions and queries in time $O(n (\log n)^d)$ and no data structure can do better.
1) Show that every integer $n$ can be uniquely written as $n = \sum_{i=0}^{k} \binom{a_i}{i}$ where $i - 1 \le a_i$ and $a_1 < a_2 < \cdots < a_k$. [Hint: Use the identity $\sum_{i=0}^{k} \binom{r+i}{i} = \binom{r+k+1}{k}$.] Analyze the $k$-binomial transformation based on this representation.
2) Let $f : \mathbb{N} \to \mathbb{N}$ be any non-decreasing function with $f(i) \ge 2$ for all $i$. Let $S$ be any set with $n$ elements. Let $i = \lfloor \log n \rfloor$ and let $n - 2^i = \sum_{j \ge 0} a_j b^j$ where $b = f(i)$ and $a_j \in \mathbb{N}_0$ and $0 \le a_j < b$.
a) Design a dynamization method based on the following representation of set $S$. $S$ is represented by a large block $S_{large}$ containing $2^i$ points and structures $S_j$, $j \ge 0$. $S_j$ contains exactly $a_j b^j$ points of $S$.
b) Design a dynamization method based on the following representation of $S$. $S$ is represented by a large block $S_{large}$ containing $2^i$ points and structures $S_{j,l}$, $j \ge 0$, $1 \le l \le a_j$. A structure $S_{j,l}$ contains exactly $b^j$ points of $S$.
Determine $Q_D(n)$ and $I_D(n)$ in both cases. Reformulate your answers and prove Theorem 3.
14) Are Theorems 2, 3 and 4 true for weight-balanced dd-trees? [Hint: Consider weight-balanced 2d-trees with $\alpha = 1/4$. Take a tree where every node of even depth has balance 1/4 and every node of odd depth has balance 1/2. Consider a partial match query with specified 0-th coordinate. This coordinate is chosen such that the search is always directed into the heavier subtree.]
15) Show that the counting version of partial match retrieval has time complexity $O(n^{1-1/(d-s+1)})$ in ideal dd-trees.
16) Compute function $f(d, d-s)$ of Theorem 3 explicitly. Can you improve upon the argument used to prove Theorem 3 in order to get a better bound on $f(d, d-s)$?
19) A $j$-way subdivision of the plane consists of two infinite parallel lines $L_1, L_2$ and half-lines $L_3, L_4, \ldots, L_j$ such that the starting point of $L_i$ lies on $L_{i-1}$, $L_i$ intersects $L_1$ and is fully to the right of $L_{i-1}$. A $j$-way subdivision divides the plane into $2j$ open regions and $j$ one-dimensional regions. Show that for every set $S \subseteq \mathbb{R}^2$, $|S| = n$, not all points of $S$ collinear, there is a $j$-way subdivision such that $|S \cap R_i| \le \lceil n/2j \rceil$ for any of the open regions $R_i$. Discuss polygon trees based on $j$-way subdivisions and show that they yield $O(s\, n^{\log(j+1)/\log 2j})$ retrieval time. Here $s$ is the number of sides of the polygon.
20) Design a static data structure for orthogonal range queries which uses space $O(n^{1+\epsilon})$ for some $\epsilon > 0$ and has query time $O(d \log n)$. [Hint: Find a hierarchical decomposition of set $S$ into contiguous subsets such that every contiguous subset of $S$ can be found by using only a few pieces.]
21) Base range trees on $(a, b)$-trees (cf. Section 3.5.2). Reprove some or all of Lemmas 1-4 in Section 2.2.