18
Parallel Databases
This chapter is suitable for an advanced course, but can also be used for independent study projects by students of a first course. The chapter covers several
aspects of the design of parallel database systems: partitioning of data, parallelization of individual relational operations, and parallelization of relational
expressions. The chapter also briefly covers some systems issues, such as cache
coherency and failure resiliency.
The most important applications of parallel databases today are for warehousing and analyzing large amounts of data. Therefore partitioning of data and
parallel query processing are covered in significant detail. Query optimization is
also of importance, for the same reason. However, parallel query optimization is
still not a fully solved problem; exhaustive search, as is used for sequential query
optimization, is too expensive in a parallel system, forcing the use of heuristics.
The description of parallel query processing algorithms is based on the
shared-nothing model. Students may be asked to study how the algorithms can
be improved if shared-memory machines are used instead.
13.9 For each of the three partitioning techniques, namely round-robin, hash
When a
a. Hash partitioning?
b. Range partitioning?
a. Hash-partitioning:
Too many records with the same value for the hashing attribute, or a
poorly chosen hash function without the properties of randomness
and uniformity, can result in a skewed partition. To improve the
situation, we should experiment with better hashing functions for
that relation.
b. Range-partitioning:
Non-uniform distribution of values for the partitioning attribute
(including duplicate values for the partitioning attribute) which are
not taken into account by a bad partitioning vector is the main
reason for skewed partitions. Sorting the relation on the partitioning
attribute and then dividing it into n ranges with an equal number of
tuples per range will give a good partitioning vector with very low
skew.
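The sort-and-divide construction described for range partitioning can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the function names `balanced_partition_vector` and `assign_partition` and the sample data are assumptions made for the example, and a real system would typically sort a sample of the relation rather than the whole relation.

```python
from bisect import bisect_right


def balanced_partition_vector(values, n):
    """Return n-1 cut points so that each range holds about len(values)/n tuples."""
    ordered = sorted(values)
    size = len(ordered)
    # Cut point i is the value found at the boundary between range i-1 and range i.
    return [ordered[(i * size) // n] for i in range(1, n)]


def assign_partition(value, vector):
    """Index of the range a value falls into; a value equal to a cut point goes right."""
    return bisect_right(vector, value)


# Hypothetical, non-uniformly distributed partitioning-attribute values.
values = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
vector = balanced_partition_vector(values, 3)

counts = [0, 0, 0]
for v in values:
    counts[assign_partition(v, vector)] += 1

print(vector, counts)
```

Note that a single value with very many duplicates still lands in one range, so this construction reduces skew but cannot remove it when one value dominates the relation.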
U.U. Give an example of a join that is not a simple equi-join for which partitioned parallelism can be used. What attributes should be used for partitioning?
Answer: We give two examples of such joins.
a. r ⋈_{(r.A = s.B) ∧ (r.A < s.C)} s
Here we have an equi-join condition which can be executed first, and
the extra conditions can be checked independently on each tuple in
the join result. Partitioned parallelism is useful to execute the
equi-join.
b. r ⋈_{(r.A ≥ ⌊s.B/20⌋ * 20) ∧ (r.A < (⌊s.B/20⌋ + 1) * 20)} s
This is a query in which an r tuple and an s tuple join with each
other if they fall into the same range of values. Hence partitioned
parallelism applies naturally to this scenario, even though the join
is not an equi-join.
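To make the second example concrete, the sketch below simulates a partitioned join in which tuples join when their values fall into the same fixed-width range. Everything specific here is an assumption for illustration: the bucket width of 20, the use of bare integers to stand for the join attributes of r and s, the name `partitioned_band_join`, and the simulation of processors as dictionary entries in one process.

```python
from collections import defaultdict

WIDTH = 20  # assumed range width, chosen only for illustration


def bucket(x):
    """Lower bound of the fixed-width range that x falls into."""
    return (x // WIDTH) * WIDTH


def partitioned_band_join(r_vals, s_vals, n_procs):
    """Join r and s values that share a range, using range-based partitioning."""
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    # Both relations are partitioned on the range number, so every pair of
    # tuples that can join is sent to the same (simulated) processor.
    for a in r_vals:
        r_parts[(a // WIDTH) % n_procs].append(a)
    for b in s_vals:
        s_parts[(b // WIDTH) % n_procs].append(b)

    result = []
    for p in range(n_procs):
        # Local join: the non-equality condition is checked only within a partition.
        for a in r_parts[p]:
            for b in s_parts[p]:
                if bucket(b) <= a < bucket(b) + WIDTH:
                    result.append((a, b))
    return result


r_vals = [5, 19, 20, 45]
s_vals = [0, 39, 40, 59, 100]
joined = sorted(partitioned_band_join(r_vals, s_vals, 3))
print(joined)
```

The design point is that the partitioning function is applied to the range number rather than to the raw attribute value, which is what lets a join that is not an equi-join be evaluated without replication.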
For both queries, r should be partitioned on attribute A and s on
attribute
For the second query, the partitioning of s should actually be
o.
us
b.
c.
d.
e. Left outer join, if the join condition involves only equality
f. Left outer join, if the join condition involves comparisons other than
equality
g. Full outer join, if the join condition involves comparisons other than
equality
Answer:
a. We can parallelize the difference operation by partitioning the relations on all the attributes, and then computing differences locally at
each processor. As in aggregation, the cost of transferring tuples during partitioning can be reduced by partially computing differences
at each processor, before partitioning.
b.
c. For this, partial counts cannot be computed locally before partitioning. Each processor instead transfers all unique B values for each A
value to the correct destination processor. After partitioning, each
processor locally counts the number of unique tuples for each value
of A, and then outputs the final result.
d. This can again be implemented like sum, except that for each value
of A, a sum of the B values as well as a count of the number of tuples
in the group is transferred during partitioning. Then each processor
outputs its local result, by dividing the total sum by the total number of
tuples for each A value assigned to its partition.
e. This can be performed just like partitioned natural join. After partitioning, each processor computes the left outer join locally using
any of the strategies of Chapter 12.
f. The left outer join can be computed using an extension of the
Fragment-and-Replicate scheme to compute non equi-joins. Consider r ⟕ s. The relations are partitioned, and r ⋈ s is computed at
each site. We also collect tuples from r that did not match any tuples
from s; call the set of these dangling tuples at site i d_i. After the
above step is done at each site, for each fragment of r, we take the
intersection of the d_i's from every processor in which the fragment
of r was replicated. The intersections give the real set of dangling
tuples; these tuples are padded with nulls and added to the result.
The intersections themselves, followed by addition of padded tuples
to the result, can be done in parallel by partitioning.
g. The algorithm is basically the same as above, except that when combining results, the processing of dangling tuples must be done for both
relations.
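Three of the answers above (aggregation by count distinct, aggregation by avg, and the left outer join on a non-equality condition) can be illustrated with a small single-process simulation. The sketch below is hedged accordingly: processors are modeled as Python lists, the relation is assumed to hold (A, B) tuples, r is assumed to contain distinct hashable tuples, at least one fragment of s is assumed to exist, and all function names are hypothetical.

```python
from collections import defaultdict


def repartition(fragments, n_procs, key):
    """Send every item to simulated processor hash(key(item)) % n_procs."""
    parts = [[] for _ in range(n_procs)]
    for frag in fragments:
        for item in frag:
            parts[hash(key(item)) % n_procs].append(item)
    return parts


def parallel_count_distinct(fragments, n_procs):
    """count(distinct B) grouped by A; each fragment holds (A, B) tuples."""
    # Local step: only duplicate (A, B) pairs can be dropped; counts cannot
    # be computed before partitioning.
    local_unique = [set(frag) for frag in fragments]
    counts = {}
    for part in repartition(local_unique, n_procs, key=lambda t: t[0]):
        seen = defaultdict(set)
        for a, b in part:
            seen[a].add(b)
        counts.update({a: len(bs) for a, bs in seen.items()})
    return counts


def parallel_avg(fragments, n_procs):
    """avg(B) grouped by A, shipping one (A, sum, count) triple per local group."""
    partials = []
    for frag in fragments:
        acc = defaultdict(lambda: [0, 0])
        for a, b in frag:
            acc[a][0] += b
            acc[a][1] += 1
        partials.append([(a, s, c) for a, (s, c) in acc.items()])
    averages = {}
    for part in repartition(partials, n_procs, key=lambda t: t[0]):
        total = defaultdict(lambda: [0, 0])
        for a, s, c in part:
            total[a][0] += s
            total[a][1] += c
        averages.update({a: s / c for a, (s, c) in total.items()})
    return averages


def fr_left_outer_join(r_frags, s_frags, theta):
    """Left outer join with an arbitrary predicate; one (r_i, s_j) pair per site."""
    result = []
    for r_i in r_frags:
        dangling_sets = []
        for s_j in s_frags:
            matched = set()
            for t in r_i:
                for u in s_j:
                    if theta(t, u):
                        result.append((t, u))
                        matched.add(t)
            # Tuples of r_i that found no partner at this particular site.
            dangling_sets.append(set(r_i) - matched)
        # A tuple is truly dangling only if it is dangling at every site
        # to which its fragment was replicated.
        for t in set.intersection(*dangling_sets):
            result.append((t, None))  # None stands in for null padding
    return result


fragments = [[(1, 10), (1, 10), (2, 40)], [(1, 30), (2, 40)]]
distinct_counts = parallel_count_distinct(fragments, 2)
averages = parallel_avg(fragments, 2)
outer = fr_left_outer_join([[1, 5], [9]], [[2], [6]], lambda t, u: t < u)
print(distinct_counts, averages, outer)
```

The contrast between the two aggregates is the point of the example: avg ships one small partial result per group, while count distinct must ship every locally unique (A, B) pair.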
b.
c. Suppose most transactions accessed one account record, which includes an account_type attribute, and an associated account_type_master
record, which provides information about the account type. How
would you partition and/or replicate data to speed up transactions?
You may assume that the account_type_master relation is rarely updated.
Answer:
a.
b.
c.
all.
1.&.15
b. How would you choose between the alternative partitioning techniques, based on the workload?
c.
a.
b.
Another issue is that the workload may have a very large number
of queries/updates. Techniques to reduce this number include the
following: (a) combining repeated occurrences of a query that only
differ in constants, replacing them by one parametrized query along
with a count of the number of occurrences, and (b) dropping queries
which are very cheap anyway, or not likely to be affected by the
partitioning choice.
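The two workload-reduction techniques just described can be sketched as follows. This is a simplified illustration under stated assumptions: queries are plain SQL strings, only numeric and single-quoted string literals are treated as constants, and `est_cost` is a hypothetical caller-supplied cost estimator rather than any real optimizer interface.

```python
import re
from collections import Counter

# Matches single-quoted string literals first, then standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parametrize(sql):
    """Replace constants with '?' so queries differing only in constants coincide."""
    return _LITERAL.sub("?", sql)


def compress_workload(queries, est_cost=None, min_cost=0.0):
    """Map each parametrized query to its number of occurrences.

    If est_cost is given, queries estimated to cost less than min_cost are dropped.
    """
    counts = Counter()
    for q in queries:
        if est_cost is not None and est_cost(q) < min_cost:
            continue
        counts[parametrize(q)] += 1
    return counts


workload = [
    "select * from account where id = 17",
    "select * from account where id = 42",
    "select name from branch where city = 'Pune'",
]
compressed = compress_workload(workload)
print(compressed)
```

Deciding which queries are unlikely to be affected by the partitioning choice needs knowledge of the schema and is not attempted here; only the cost-threshold filter is shown.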
c.