α2 ∈ S(σ), either dom(α1) ⊆ dom(α2) or dom(α2) ⊆ dom(α1),
for every comp_expr "α θ β" or "β θ α" (where "θ" denotes a comparator) such
that α ∈ S(σ) and β is a literal, β ∈ dom(α)
Further, let γ(τ) denote the application of a well-formed predicate expression γ to a tuple
τ ∈ T(σ). γ(τ) reduces γ in the context of τ, ie. the occurrence of any α ∈ S(σ) in γ is
first replaced by τ•α. The resulting expression is then reduced to a truth-value according
to the accepted semantics of comparators and boolean operators.
Then ρ, the resultant relation of the selection operation, is characterised by the following:
[3] The syntax of <attribute_name> and <literal> is unimportant in what follows and we leave it
unspecified.
Introduction to Databases & Relational DM
S(ρ) = S(σ)
T(ρ) = { τ | τ ∈ T(σ) ∧ γ(τ) }
5.4 Projection
Whereas a selection operation extracts rows of a relation meeting specified conditions, a
projection operation extracts specified columns of a relation. The desired columns are
simply specified by name. The general effect is illustrated in Figure 5-4.
Figure 5.4 The projection operation
We could think of selection as eliminating rows (tuples) not meeting the specified
conditions. In like manner, we can think of a projection as eliminating columns not
named in the operation. However, an additional step is required for projection because
removing columns may result in duplicate rows, which are not allowed in relations. Quite
simply, any duplicate occurrence of a row must be removed so that the result is a relation
(a desired property of relational algebra operators).
For example, using again the customer relation:
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
its projection over the attribute Ccity would yield (after eliminating all columns other
than Ccity):
Result
Ccity
London
Paris
London
(rows 1 and 3 are duplicates)
Note the duplication of row 1 in row 3. Projection can result in duplication because the
resultant tuples have a smaller degree whereas the uniqueness of tuples in the source
relation is only guaranteed for the original degree of the relation. For the final result to be
a relation, duplicated occurrences must be removed, ie.
Result
Ccity
London
Paris
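The duplicate-removal step can be sketched in a few lines of Python. This is only an illustration, not part of the algebra's syntax: the function name project and the representation of a relation as a list of dictionaries are assumptions made for the example.

```python
def project(relation, attributes):
    """Keep only the named columns, then drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation may not contain duplicate tuples
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# The Customer relation from the text, as a list of dictionaries (assumed layout).
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]

cities = project(customer, ["Ccity"])
print(cities)  # [{'Ccity': 'London'}, {'Ccity': 'Paris'}]
```

Three source rows yield two result rows because the second London row is discarded as a duplicate.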
The form of a projection operation is:
project <source-relation-name>
over <list-of-attribute-names>
giving <result-relation-name>
Thus the above operation would be written as:
project Customer
over Ccity
giving Result
As with selection, <source-relation-name> must be a valid relation, that is, a relation name
defined in the database schema or the name of the result of a previous operation.
<list-of-attribute-names> is a comma-separated list of at least one identifier. Each identifier
appearing in the list must be a valid attribute name of <source-relation-name>. And
finally, <result-relation-name> must be a unique identifier used to name the resultant
relation.
Why would we want to project a relation over some attributes and not others? Quite
simply, we sometimes are interested in only a subset of an entity's attributes given a
particular situation. Thus, if we needed to telephone all customers to inform them of
some new product line, data about a customer's number and the city of residence are
superfluous. The relevant data, and only the relevant data, can be presented using:
project Customer
over Cname, Cphone
giving Result
Result
Cname   Cphone
Codd    2263035
Martin  5555910
Deen    2234391
Extending this example, suppose further that we have multiple offices sited in major
cities and the task of calling customers is distributed amongst such offices, ie. the office
in London will call up customers resident in London, etc. Now the simple projection
above will not do, because it presents customer names and phone numbers without regard
to their place of residence. If it were used by each office, customers would receive multiple
calls and you would probably have many annoyed customers on your hands, not to mention
the huge phone bills you unnecessarily incurred!
The desired relation in this case must be restricted to only customers from a given city.
How can we specify this? The simple answer is that we cannot, not with just the
projection operation. However, the alert reader would have realised that the requirement
to restrict resultant rows to only those from a given city is exactly the sort of requirement
that the selection operation is designed for! In other words, here we have an example of a
situation that needs a composition of operations to compute the desired relation. Thus, for
the office in London, the list of customers and phone numbers relevant to it is computed
by first selecting customers from London, then projecting the result over customer names
and phone numbers. This is illustrated in Figure 5-5. For offices in other cities, only the
predicate of the selection needs to be appropriately modified.
Note that the order of the operations is significant, ie. a selection followed by a
projection. It would not work the other way around, because a projection over customer
names and phone numbers discards the Ccity attribute that the selection needs (you can
verify this by trying it out yourself).
Figure 5.5 Combining operators to compute a desired relation
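The composition described above, and the reason the order matters, can be sketched in Python. The helper names select and project and the list-of-dictionaries representation are assumptions made for illustration; they are not part of the algebra's syntax.

```python
def select(relation, predicate):
    """Keep the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def in_london(row):
    return row["Ccity"] == "London"


customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]

# Selection first, then projection: the list for the London office.
a = select(customer, in_london)
london_list = project(a, ["Cname", "Cphone"])
print(london_list)

# Projection first: Ccity is gone, so the selection predicate cannot be evaluated.
try:
    select(project(customer, ["Cname", "Cphone"]), in_london)
    reversed_order_works = True
except KeyError:
    reversed_order_works = False
print(reversed_order_works)  # False
```

For another office only the predicate changes, exactly as the text notes.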
Formal Definition
If σ denotes a relation, then let
S(σ) denote the finite set of attribute names of σ (ie. its intension)
T(σ) denote the finite set of tuples of σ (ie. its extension)
τ•α, where τ ∈ T(σ) and α ∈ S(σ), denote the value of attribute α in tuple τ
The projection operation takes the form
project σ over δ giving ρ
where δ is a comma-separated list of attribute names. Formally, δ (as a discrete
structure) may be considered a tuple, but having a concrete enumeration syntax
(comma-separated list).
Let S_tuple(x) denote the set of elements in the tuple x. Then, δ must observe the following
constraint:
S_tuple(δ) ⊆ S(σ)
ie. every name occurring in δ must be a valid attribute name in the relation σ.
Furthermore, if τ ∈ T(σ) and τ' denotes a tuple, we define:
R(τ, δ, τ') ≡ ∀α: α ∈ S_tuple(δ) ⇔ τ•α ∈ S_tuple(τ')
ie. a tuple element τ•α is in the tuple τ' if and only if the attribute name α occurs in δ.
Then ρ, the resultant relation of the projection, is characterised by the following:
S(ρ) = S_tuple(δ)
T(ρ) = { τ' | τ ∈ T(σ) ∧ R(τ, δ, τ') }
5.5 Natural Join
The next operation we will look at is the Natural Join (hereafter referred to simply as
Join). This operation takes two source relations as inputs and produces a relation whose
tuples are formed by concatenating tuples from each input source. It is basically a
cartesian product of the extensions of each input source. However, not all possible
combinations of tuples necessarily end up in the result. This is because it implicitly
selects from among all possible tuple combinations only those that have identical values
in attributes shared by both relations.
Figure 5.6 The Join combines two relations over one or more common domains
Thus, in a typical application of a Join, the intensions of the input sources share at least
one attribute name or domain (we assume here that attribute names are global to a
schema, ie. the same name occurring in different relations denotes the same attribute and
value domain). The Join is said to occur over such domain(s). Figure 5-6 illustrates the
general effect. The shaded left-most two columns of the inputs are notionally the shared
attributes. The result comprises these and the concatenation of the other columns from
each input. More precisely, if the degrees of the input sources were m and n respectively,
and the number of shared attributes was s, then the degree of the resultant relation is
(m+n−s).
As an example, consider the two relations below:
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
2   2   29.01  20
These relations share the attribute 'C#' (s = 1). To compute the join of these
relations, consider in turn every possible pair of tuples formed by taking one tuple from
each relation, and examine the values of their shared attribute. So if the pair under
consideration was
<1, Codd, London, 2263035> and <1, 1, 21.01, 20>
we would find that the values match exactly. In such a case, we concatenate them and add
the concatenation to the resultant relation. It doesn't matter if the second tuple is
concatenated to the end of the first, or the first to the second, as long as we are consistent
about it. By convention, we use the former. Additionally, we omit the second occurrence
of the shared attribute in the result (repeated occurrence is superfluous). This gives us the
tuple
<1, Codd, London, 2263035, 1, 21.01, 20>
If, on the other hand, the pair under consideration was
<3, Deen, London, 2234391> and <1, 1, 21.01, 20>
we would ignore it because the values of their shared attributes do not match exactly.
Thus, the resultant relation after considering all pairs would be:
Result
C#  Cname   Ccity   Cphone   P#  Date   Qnt
1   Codd    London  2263035  1   21.01  20
1   Codd    London  2263035  2   23.01  30
2   Martin  Paris   5555910  1   26.01  25
2   Martin  Paris   5555910  2   29.01  20
The foregoing description is in fact general enough to admit operations on relations that
do not share any attributes at all (s = 0). The join, in such a case, is simply the cartesian
product of the input sources' extensions (the condition that tuple combinations have
identical values over shared attributes is vacuously true since there are no shared
attributes!). However, such uses of the operation are atypical.
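The pairwise procedure just described can be sketched in Python. The function name natural_join, the list-of-dictionaries representation, and the small office relation used for the s = 0 case are all assumptions made for illustration.

```python
def natural_join(left, right):
    """Pair every tuple of left with every tuple of right and keep the pairs
    that agree on all shared attributes; each shared attribute appears once."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    result = []
    for lrow in left:
        for rrow in right:
            # With no shared attributes this test is vacuously true.
            if all(lrow[a] == rrow[a] for a in shared):
                result.append({**lrow, **rrow})
    return result


customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]
transaction = [
    {"C#": 1, "P#": 1, "Date": "21.01", "Qnt": 20},
    {"C#": 1, "P#": 2, "Date": "23.01", "Qnt": 30},
    {"C#": 2, "P#": 1, "Date": "26.01", "Qnt": 25},
    {"C#": 2, "P#": 2, "Date": "29.01", "Qnt": 20},
]

result = natural_join(customer, transaction)
print(len(result), len(result[0]))  # 4 7  (degree m + n - s = 4 + 4 - 1)

# Hypothetical relation sharing no attribute with Customer (s = 0):
office = [{"Office": "North"}, {"Office": "South"}]
print(len(natural_join(customer, office)))  # 6, the full cartesian product
```

Customer 3 (Deen) has no matching transaction and so contributes nothing to the result.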
Syntactically, we will write the Join operation as follows:
join <source-relation-name>1 AND <source-relation-name>2
over <attribute-name-list>
giving <result-relation-name>
where again
<source-relation-name>i is a valid relation name (in the schema or the result
of a previous operation)
<attribute-name-list> is a comma-separated non-empty list of attribute names,
each of which must occur in both input sources, and
<result-relation-name> is a unique identifier denoting the resultant relation
With this syntax, particularly with the over-clause, we have in fact taken the liberty
(1) to insist that the join must be over at least one shared attribute, ie. we disallow
expressions of pure cartesian products of two relations that do not share any
attribute. This restriction is of no practical consequence, however, as in practice a
Join is used to bring together information from different relations related through
some common value.
(2) to allow a join over a subset of shared attributes, ie. we relax (generalise) the
restriction that a Join is over all shared attributes.
If a Join is over a proper subset of shared attributes, then shared attributes not specified in
the over-clause will each have its own column in the result relation. But in such cases, the
respective column labels will be qualified names. We will adopt the convention of
writing a qualified name as 'ρ.α', where α is the column label and ρ the relation name in
which α appears. As an illustration, consider the relations below:
R1
A1  A2  X
1   2   abc
1   3   def
2   4   ijk
R2
A1  A2  Y
2   3   pqr
2   2   xyz
The operation
join R1 AND R2 over A1 giving Result
will yield
Result
A1  R1.A2  X    R2.A2  Y
2   4      ijk  3      pqr
2   4      ijk  2      xyz
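A join over a proper subset of the shared attributes, with qualified names for the remaining shared columns, might be sketched as follows. The function join_over and its parameters are hypothetical names chosen for this illustration.

```python
def join_over(left, left_name, right, right_name, over):
    """Join on the attributes listed in `over`; shared attributes that are not
    listed keep separate columns labelled '<relation>.<attribute>'."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    unspecified = [a for a in shared if a not in over]

    def relabel(row, name):
        return {(f"{name}.{a}" if a in unspecified else a): v
                for a, v in row.items()}

    result = []
    for lrow in left:
        for rrow in right:
            if all(lrow[a] == rrow[a] for a in over):
                result.append({**relabel(lrow, left_name),
                               **relabel(rrow, right_name)})
    return result


r1 = [
    {"A1": 1, "A2": 2, "X": "abc"},
    {"A1": 1, "A2": 3, "X": "def"},
    {"A1": 2, "A2": 4, "X": "ijk"},
]
r2 = [
    {"A1": 2, "A2": 3, "Y": "pqr"},
    {"A1": 2, "A2": 2, "Y": "xyz"},
]

result = join_over(r1, "R1", r2, "R2", ["A1"])
for row in result:
    print(row)
```

Only the R1 tuple with A1 = 2 finds partners, and A2 appears twice in the result, once per source relation, under its qualified name.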
To see why Join is a necessary operation in the algebra, consider the following situation
(assume as context the Customer and Transaction relations above): the company decided
that customers who purchased product number 1 (P# = 1) should be informed that a fault
has been discovered in the product and that, as a sign of good faith and of how it values
its customers, it will replace the product with a brand new fault-free one. To do this, we
need to list, therefore, the names and phone numbers of all such customers.
First, we need to identify all customers who purchased product number 1. This
information is in the Transaction relation and, using the following selection operation, it
is easy to limit its extension to only such customers:
select Transaction where P# = 1 giving A
Next, we note that the resultant relation only identifies such customers by their customer
numbers. What we need, though, are their names and phone numbers. In other words, we
would like to extend each tuple in A with the customer name and phone number
corresponding to the customer number. As such items are found in the relation Customer
which shares the attribute C# with A, the join is a natural operation to perform:
join A AND Customer over C# giving B
With B, we have practically derived the information we need; in fact, more than we
need, since we are interested only in the customer name (the Cname column) and phone
number (the Cphone column). But as we've learned, the irrelevant columns may be
easily removed using projection, as shown below.
project B over Cname, Cphone giving Result
As a final example, let us also assume we have the Product relation, in addition to the
Customer and Transaction relations:
Product
P#  Pname  Pprice
1   CPU    1000
2   VDU    1200
The task is to get the names of products sold to customers in London. Once again, this
task will require a combination of operations which must involve a Join at some point
because not all the information required is contained in one relation. The sequence of
operations required is shown below.
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
select Customer where Ccity = 'London' giving A
A
C#  Cname  Ccity   Cphone
1   Codd   London  2263035
3   Deen   London  2234391
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
2   2   29.01  20
join A AND Transaction over C# giving B
B
C#  Cname  Ccity   ...  P#  Date   Qnt
1   Codd   London  ...  1   21.01  20
1   Codd   London  ...  2   23.01  30
Product
P#  Pname  Pprice
1   CPU    1000
2   VDU    1200
join B AND Product over P# giving C
C
C#  Cname  Ccity   Cphone   P#  Date   Qnt  Pname  Pprice
1   Codd   London  2263035  1   21.01  20   CPU    1000
1   Codd   London  2263035  2   23.01  30   VDU    1200
project C over Pname giving Result
Result
Pname
CPU
VDU
Formal Definition
As before, if σ denotes a relation, then let
S(σ) denote the finite set of attribute names of σ (ie. its intension)
T(σ) denote the finite set of tuples of σ (ie. its extension)
τ•α, where τ ∈ T(σ) and α ∈ S(σ), denote the value of attribute α in tuple τ
Further, if τ1 and τ2 are tuples, let τ1 ^ τ2 denote the tuple resulting from appending τ2 to the
end of τ1.
We will also have need to use the terminology introduced in defining projection above, in
particular, S_tuple and the definition:
R(τ, δ, τ') ≡ ∀α: α ∈ S_tuple(δ) ⇔ τ•α ∈ S_tuple(τ')
The (natural) join operation takes the form
join σ AND ω over δ giving ρ
As with other operations, the input sources σ and ω must denote valid relations that are
either defined in the schema or are results of previous operations, and ρ must be a unique
identifier to denote the result of the join. δ is a tuple of attribute names such that:
S_tuple(δ) ⊆ (S(σ) ∩ S(ω))
Let μ = (S(σ) ∩ S(ω)) − S_tuple(δ), ie. the set of shared attribute names not specified in the
over-clause. We next define, for any relation r:
Rename(r, μ) = { α | α ∈ S(r) − μ ∨ (α = 'r.p' ∧ p ∈ S(r) ∩ μ) }
In the case that μ = {} or S(r) ∩ μ = {}, Rename(r, μ) = S(r).
The Join operation can then be characterised by the following:
S(ρ) = Rename(σ, μ) ∪ Rename(ω, μ)
T(ρ) = { τ1 ^ τ2 | τ1 ∈ T(σ) ∧ τ ∈ T(ω) ∧ R(τ, δ', τ2) ∧ ∀α ∈ S_tuple(δ): τ1•α = τ•α }
where
S_tuple(δ') = S(ω) − S_tuple(δ)
6 Relational Algebra (Part II)
6.1 Introduction
In the previous chapter, we introduced relational algebra as a fundamental model of
relational database manipulation. In particular, we defined and discussed three important
operations it provides: Select, Project and Natural Join. These constitute what is called
the basic set of operators and all relational DBMS, without exception, support them.
We have presented examples of the power of these operations to construct solutions
(derived relations) to various queries. However, there are classes of practical queries for
which the basic set is insufficient. This is best illustrated with an example. Using again
the same example domain of customers and products they purchase, let us consider the
following requirement:
"Get the names of customers who had purchased both product number 1 and product
number 2"
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
2   2   29.01  20
All the required pieces of data are in the relations shown above. It is quite easy to see
what the answer is: from the Transaction relation, customers number 1 and number 2 are
the ones we are interested in, and cross-referencing the Customer relation (to retrieve
their names) the customers are Codd and Martin respectively. Now, how can we
construct this solution using the basic operation set?
Working backwards, the final relation we wish to construct is a single-column relation
with the attribute 'Cname'. Thus, the last operation needed will be a projection of some
relation over that attribute. Such a relation must first be the result of joining Customer
and Transaction (over 'C#'), since Customer alone does not have data on products
purchased. Second, it must contain only tuples of customers who had purchased products
1 and 2, ie. some form of selection must be applied. This analysis suggests that the
required sequence of operations is a Join, followed by a Select, and finally a Project.
The following then may be a possible solution:
join Customer AND Transaction over C# giving A
select A where P# = 1 AND P# = 2 giving B
project B over Cname giving Result
The join results in:
A
C#  Cname   Ccity   Cphone   P#  Date   Qnt
1   Codd    London  2263035  1   21.01  20
1   Codd    London  2263035  2   23.01  30
2   Martin  Paris   5555910  1   26.01  25
2   Martin  Paris   5555910  2   29.01  20
At this point, however, we discover a problem: the selection on A results in an empty
relation!
The problem is the selection condition: no tuple can possibly satisfy a condition that
requires a single attribute to have two different values ("P# = 1 AND P# = 2"). This is
obvious once it is pointed out, although it might not have been so at first glance. Thus
while the selection statement is syntactically correct, its logic is erroneous. What is
needed, effectively, is to select tuples of a particular customer only if there exists one
with P# = 1 and another with P# = 2, ie. the form of selection needed is dependent across
tuples. But the basic Select operator cannot express this because it operates on each tuple
in turn and independently of one another. [4]
Thus the proposed solution above is not a solution at all. In fact, no combination of the
basic operations can handle the query or other queries of this sort, for example:
Get the names of customers who bought the product CPU but not the
product VDU, or
Get the names of customers who bought every product type that the
company sells, etc
These examples suggest that additional operations are needed. In the following, we shall
present them and show how they are used.
We will round up this chapter and our discussion of relational algebra with a discussion
of two other important topics: how operations handle "null" values, and how sequences
of operations can be optimised for performance. A null value is inserted into a tuple field
to denote an (as yet) unknown value. Clearly, this affects the evaluation of conditions
involving attribute values. Exactly how will be explained in Section 6.4. Finally, we will
see that there may be several different sequences of operations that derive the same
result. In such cases, we may well ask which sequence is more efficient, ie. least costly or
better in performance, in some sense. A more precise notion of 'efficiency' of operators
and how a given operator sequence can be made more efficient will be discussed in
Section 6.5.
[4] Some readers may have noted that if OR was used instead of AND in the selection operation,
the desired result would be constructed. However, this is coincidental. The use of OR is logically
erroneous: it means one or the other, but not necessarily both. To see this, change the example
slightly by deleting the last tuple in Transaction and recompute the result (using OR). Your
answer would still be Codd and Martin, but the correct answer should be Codd alone!
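The failure of both the AND and the OR conditions can be checked with a short Python sketch. The rows below are a cut-down version of relation A (only C#, Cname and P# are kept), and the helper bought_all is a hypothetical cross-tuple check written in ordinary Python to show the intended answer; it is not one of the basic operators.

```python
# Cut-down join result A: one row per purchase (assumed representation).
joined = [
    {"C#": 1, "Cname": "Codd", "P#": 1},
    {"C#": 1, "Cname": "Codd", "P#": 2},
    {"C#": 2, "Cname": "Martin", "P#": 1},
    {"C#": 2, "Cname": "Martin", "P#": 2},
]


def names(rows):
    return sorted({row["Cname"] for row in rows})


# A per-tuple AND condition can never hold: P# has one value per tuple.
with_and = names([r for r in joined if r["P#"] == 1 and r["P#"] == 2])
print(with_and)  # []

# OR gives the expected names here, but only by coincidence.
with_or = names([r for r in joined if r["P#"] == 1 or r["P#"] == 2])
print(with_or)  # ['Codd', 'Martin']

# Delete the last tuple: Martin no longer bought product 2, yet OR still lists him.
reduced = joined[:-1]
with_or_reduced = names([r for r in reduced if r["P#"] == 1 or r["P#"] == 2])
print(with_or_reduced)  # ['Codd', 'Martin']


def bought_all(rows, products):
    """Cross-tuple check: customers whose purchases cover every listed product."""
    bought = {}
    for r in rows:
        bought.setdefault(r["Cname"], set()).add(r["P#"])
    return sorted(name for name, seen in bought.items() if products <= seen)


print(bought_all(reduced, {1, 2}))  # ['Codd']
```

The last check looks at several tuples of the same customer at once, which is exactly what a tuple-at-a-time selection cannot do.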
6.2 Division
As the name of this operation implies, it involves dividing one relation by another.
Division is in principle a partitioning operation. Thus, 6 ÷ 2 can be paraphrased as
partitioning a single group of 6 into a number of groups of 2: in this case, 3 groups of 2.
The basic terminology used in arithmetic will be used here as well. Thus in an expression
like x ÷ y, x is the dividend and y the divisor. Division does not always yield whole
groups of the divisor, eg. 7 ÷ 2 gives 3 groups of 2 and a remainder group of 1. Relational
division too can leave remainders but, much like integer division, we ignore remainders
and focus only on constructing whole groups of the divisor.
The manner in which a relational dividend is partitioned is a little more complex. First
though, we should ask what aspect of a relation is being partitioned? The answer simply
is the set of tuples in the relation. Next, we ask how we decide to group some tuples
together and not others? Not surprisingly, the basis for such decisions has to do with the
attribute values in the tuples. Let's take a look at an example first before we describe the
process more precisely.
R          R' (R / {a,b})   Result
A1  A2     A1  A2           A1
1   a      1   a            1
1   b      1   b            2
2   c      2   a
2   b      2   b
2   a
3   c
The illustration above shows how we may divide a relation R, which is a simple binary
relation in this case with two attributes A1 and A2. For clarity, the values of attribute A1
have been sorted so that a given value appears in contiguous rows (where there's more
than one). The question we're interested in is which of these values have in common an
arbitrary subset of values of attribute A2.
For example,
"which values of A1 share the subset {a,b} of A2?"
By inspecting R, the reader can verify that the answer is the values 1 and 2, because
only tuples with these A1 values have corresponding A2 entries of both 'a' and 'b'. Put
another way, the tuples of R are grouped by the common denominator or divisor {a,b}.
This is shown in the relation R', where each group of rows sharing an A1 value is one
such group. Other tuples (the remainder of the division) are ignored. Note that R' is not the
final result of division; it is only an intermediate working result. The desired result is
the values of attribute A1 in it, or put another way, the projection of R' over A1.
From this example, we can see that a division of a relation R is performed over some
attribute of R. The divisor is a subset of values from that attribute domain and the result is
a relation comprising the remaining attributes of R. In relational algebra expressions, the
divisor is in fact specified by another relation D. For this to be meaningful at all, D must
have at least one attribute in common with R. The division is over the common
attribute(s) and the set of values used as the actual divisor are the values found in D. The
general operation is depicted in the figure below.
Figure 6.1 The Division Operation
Figure 6-2 shows a simple example of dividing a binary relation R1 by a unary relation
R2. The division is over the shared attribute I2. The divisor is the set {1,2,3}, these being
the values found in the shared attribute in R2. Inspecting the tuples of R1, the value 'a'
occurs in tuples such that their I2 values match the divisor. So 'a' is included in the result.
'b' is not, however, as there is no tuple <b,2>.
Figure 6.2 Division of a binary relation by a unary relation
We can now specify the form of the operation:
divide <dividend-relation-name> by <divisor-relation-name>
giving <result-relation-name>
<dividend-relation-name> and <divisor-relation-name> must be names of defined
relations or results of previous operations. <result-relation-name> must be a unique name
used to denote the result relation. As mentioned above, the divisor must share attributes
with the dividend. In fact, we shall insist (on a stronger condition) that the intension of
the divisor must be a subset of the dividend's. This is not really a restriction as any
relation that shares attributes with the dividend can be turned into the required form
simply by projecting over them.
We can now show how division can be used for the type of queries mentioned in the
introduction. Take the query:
"Get the names of customers who bought every product type that the company sells"
The Transaction relation records customers who have ever bought anything. For this
query, however, we are not interested in the dates or purchase quantities but only in the
product types a customer purchased. So we project Transaction over C# and P# to give us
a working relation A. Next, we need all the product types the company sells, and these
may be obtained by projecting the relation Product over P# to give us a working relation B.
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
3   2   29.01  20
Product
P#  Pname  Pprice
1   CPU    1000
2   VDU    1200
project Transaction over C#, P# giving A
project Product over P# giving B
A
C#  P#
1   1
1   2
2   1
3   2
B
P#
1
2
Now as we are interested in only those customers that purchased all products (ie. all the
values in B), B is thus used to divide A to result in the working relation C. In this case,
there is only one such customer. Finally, the details of the customer are obtained by
joining C with the Customer relation over C#.
divide A by B giving C
join Customer AND C over C# giving Result
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
C
C#
1
Result
C#  Cname  Ccity   Cphone
1   Codd   London  2263035
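The grouping idea behind division can be sketched in Python. The function name divide and the list-of-dictionaries representation are assumptions made for illustration, and the sketch assumes both relations are non-empty and that the divisor's attributes are a subset of the dividend's.

```python
def divide(dividend, divisor):
    """Return the values of the non-divisor attributes that occur together
    with every tuple of the divisor (remainders are ignored)."""
    divisor_attrs = list(divisor[0])
    rest = [a for a in dividend[0] if a not in divisor_attrs]
    needed = {tuple(d[a] for a in divisor_attrs) for d in divisor}

    # Group the dividend's tuples by their values on the remaining attributes.
    groups = {}
    for row in dividend:
        key = tuple(row[a] for a in rest)
        groups.setdefault(key, set()).add(tuple(row[a] for a in divisor_attrs))

    # Keep a group only if it covers the whole divisor.
    return [dict(zip(rest, key)) for key, found in groups.items()
            if needed <= found]


# Working relations A and B from the text.
a = [{"C#": 1, "P#": 1}, {"C#": 1, "P#": 2}, {"C#": 2, "P#": 1}, {"C#": 3, "P#": 2}]
b = [{"P#": 1}, {"P#": 2}]
c = divide(a, b)
print(c)  # [{'C#': 1}]

# The earlier example: R divided by {a, b} over A2.
r = [
    {"A1": 1, "A2": "a"}, {"A1": 1, "A2": "b"}, {"A1": 2, "A2": "c"},
    {"A1": 2, "A2": "b"}, {"A1": 2, "A2": "a"}, {"A1": 3, "A2": "c"},
]
quotient = divide(r, [{"A2": "a"}, {"A2": "b"}])
print(quotient)  # [{'A1': 1}, {'A1': 2}]
```

Joining c with Customer over C# then yields the single row for Codd, as in the Result relation above.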
Formal Definition
To formally define the Divide operation, we will use the notation introduced and used in
Chapter 5. However, for convenience, we repeat here principal definitions to be used.
If σ denotes a relation, then let
S(σ) denote the finite set of attribute names of σ (ie. its intension)
T(σ) denote the finite set of tuples of σ (ie. its extension)
τ•α, where τ ∈ T(σ) and α ∈ S(σ), denote the value of attribute α in tuple τ
S_tuple(x) denote the set of elements in tuple x
Furthermore, if τ ∈ T(σ), τ' denotes a tuple, and S_tuple(δ) ⊆ S(σ), we define:
R(τ, δ, τ') ≡ ∀α: α ∈ S_tuple(δ) ⇔ τ•α ∈ S_tuple(τ')
The Divide operation takes the form
divide σ by ω giving ρ
As with other operations, the input sources σ and ω must denote valid relations that are
either defined in the schema or are results of previous operations, and ρ must be a unique
identifier to denote the result of the division. The intensions of σ and ω must be such that
S(ω) ⊆ S(σ)
The Divide operation can then be characterised by the following:
S(ρ) = S(σ) − S(ω)
T(ρ) = { τ | τ1 ∈ T(σ) ∧ R(τ1, δ, τ) ∧ T(ω) ⊆ IM(τ) }
where
S_tuple(δ) = S(ρ),
S_tuple(δ') = S(ω), and
IM(τ) = { t' | t ∈ T(σ) ∧ R(t, δ', t') ∧ R(t, δ, τ) }
6.3 Set Operations
Relations are basically sets. We should, therefore, be able to apply standard set operations
on them. To do this, however, we must observe a basic rule: a set operation on two or
more sets is meaningful if the sets comprise values of the same type. This is so that
comparison of values from different sets is meaningful. It is quite pointless, for example,
to attempt an intersection of a set of integers and a set of names. We can still perform the
operation, of course, but we can already tell at the outset that the result will be a null set
because any value from one will never be equal to any value from the other.
To ensure this rule is observed for relations, we need to state what it means for two
relations to comprise values of the same type. As a relation is a set of tuples, the values
we are interested in are the tuples themselves. So when is it meaningful to compare two
tuples for equality? Clearly, the structure of the tuples must be identical, ie. the tuples
must be of equal length and their corresponding elements must be of the same type. Only
then can two tuples be equal, ie. when their corresponding element values are equal. The
structure of a tuple, put another way, is in fact the intension or schema of the relation it
occurs in. Thus, meaningful set operations on relations require that the source relations
have identical intensions/schemas. Such relations are said to be union-compatible.
The set operations included in relational algebra are Union, Intersection, and Difference.
Keeping in mind that they are applied to whole tuples, these operations behave in exactly
the standard way. It goes without saying that their results are also relations with
intensions identical to the source relations.
The Union operation takes the form
<source-relation-1> union <source-relation-2> giving <result-relation>
where <source-relation-i> are valid relations or results of previous operations and are
union-compatible, and <result-relation> is a unique identifier denoting the resulting
relation.
Figure 6-3 illustrates this operation.
Figure 6.3 Relational Union Operation
The Intersection operation takes the form
<source-relation-1> intersect <source-relation-2> giving <result-relation>
where <source-relation-i> are valid relations or results of previous operations and are
union-compatible, and <result-relation> is a unique identifier denoting the resulting
relation.
Figure 6-4 illustrates this operation.
Figure 6.4 Relational Intersection Operation
The Difference operation takes the form
<source-relation-1> minus <source-relation-2> giving <result-relation>
where <source-relation-i> are valid relations or results of previous operations and are
union-compatible, and <result-relation> is a unique identifier denoting the resulting
relation.
Figure 6-5 illustrates this operation.
Figure 6.5 Relational Difference Operation
As an example of the need for set operations, consider the query: "which customers
purchased the product CPU but not the product VDU?"
The sequence of operations to answer this question is quite lengthy, but not difficult.
Probably the best way to construct a solution is to work backwards and observe that if we
had a set of customers who purchased CPU (say W1) and another set of customers who
purchased VDU (say W2), then the solution is obvious: we only want customers that
appear in W1 but not in W2, or in other words, the operation "W1 minus W2".
The problem now has been reduced to constructing the sets W1 and W2. Their
constructions are similar, the difference being that one focuses on the product CPU while
the other the product VDU. We show the construction for W1 below.
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
3   2   29.01  20
Product
P#  Pname  Pprice
1   CPU    1000
2   VDU    1200
join Transaction AND Product over P# giving X
The above Join operation is needed to bring the product name into the resulting
relation. This is then used as the basis of a selection.
X
C#  P#  Date   Qnt  Pname  Pprice
1   1   21.01  20   CPU    1000
1   2   23.01  30   VDU    1200
2   1   26.01  25   CPU    1000
3   2   29.01  20   VDU    1200
select X where Pname = 'CPU' giving Y1
Y1
C#  P#  Date   Qnt  Pname  Pprice
1   1   21.01  20   CPU    1000
2   1   26.01  25   CPU    1000
Y1 now has only customer numbers that purchased the product CPU. As we are interested
only in the customers and not other details, we perform the following projection:
project Y1 over C# giving Z1
Finally, details of such customers are obtained by joining Z1 and Customer, giving the
desired relation W1.
Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
Z1
C#
1
2
join Customer AND Z1 over C# giving W1
W1
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
The construction for W2 is practically identical to that above except that the selection
operation specifies the condition Pname = 'VDU'. The reader may like to perform these
steps as an exercise and verify that the following relation is obtained:
W2
C#  Cname  Ccity   Cphone
1   Codd   London  2263035
3   Deen   London  2234391
Now we need only perform the difference operation "W1 minus W2 giving Result" to
construct a solution to the query:
Result
C#  Cname   Ccity  Cphone
2   Martin  Paris  5555910
Introduction to Databases & Relational DM
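The reader who prefers to check the construction mechanically may find the following Python sketch useful. It models each relation as a set of tuples (an assumption of this sketch, not a notation used in the text), and the helper name buyers_of is hypothetical. It follows the same join, select, project, join steps used to build W1 and W2 before taking their difference.

```python
# Relations as sets of tuples; attribute order is fixed by position.
transaction = {(1, 1, "21.01", 20), (1, 2, "23.01", 30),
               (2, 1, "26.01", 25), (3, 2, "29.01", 20)}   # (C#, P#, Date, Qnt)
product = {(1, "CPU", 1000), (2, "VDU", 1200)}              # (P#, Pname, Pprice)
customer = {(1, "Codd", "London", "2263035"),
            (2, "Martin", "Paris", "5555910"),
            (3, "Deen", "London", "2234391")}               # (C#, Cname, Ccity, Cphone)


def buyers_of(pname):
    """Customers who bought `pname`, built with the same steps as W1."""
    # join Transaction AND Product over P# giving X
    x = {t + p[1:] for t in transaction for p in product if t[1] == p[0]}
    # select X where Pname = pname giving Y
    y = {row for row in x if row[4] == pname}
    # project Y over C# giving Z
    z = {row[0] for row in y}
    # join Customer AND Z over C# giving W
    return {c for c in customer if c[0] in z}


w1 = buyers_of("CPU")
w2 = buyers_of("VDU")
result = w1 - w2          # "W1 minus W2 giving Result"
print(result)
```

Because Python sets discard duplicates automatically, the projection step needs no explicit duplicate removal here.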
Formal Definition
If σ denotes a relation, then let
S(σ) denote the finite set of attribute names of σ (ie. its intension)
T(σ) denote the finite set of tuples of σ (ie. its extension)
The form of set operations is
σ <set operator> ω giving ρ
where <set operator> is one of 'union', 'intersect' or 'minus'; σ, ω are source relations
and ρ the result relation. The source relations must be union-compatible, ie. S(σ) = S(ω).
The set operations are characterised by the following:
S(ρ) = S(σ) = S(ω) for all <set operator>s
for 'union'
T(ρ) = { t | t ∈ T(σ) ∨ t ∈ T(ω) }
for 'intersect'
T(ρ) = { t | t ∈ T(σ) ∧ t ∈ T(ω) }
for 'minus'
T(ρ) = { t | t ∈ T(σ) ∧ t ∉ T(ω) }
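The formal definition above can be sketched in Python as follows. The Relation type and the functions make and set_op are hypothetical names introduced only for this illustration. A relation is represented by its set of attribute names (the intension) and its set of tuples (the extension), and the union-compatibility requirement is checked before any operator is applied.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: frozenset    # the attribute names of the relation (its intension)
    tuples: frozenset   # the tuples of the relation (its extension)


def make(attrs, rows):
    """Build a relation; each tuple is a frozenset of (attribute, value) pairs."""
    return Relation(frozenset(attrs),
                    frozenset(frozenset(zip(attrs, row)) for row in rows))


def set_op(op, a, b):
    """Apply 'union', 'intersect' or 'minus' to two union-compatible relations."""
    if a.attrs != b.attrs:
        raise ValueError("source relations are not union-compatible")
    tuples = {"union": a.tuples | b.tuples,
              "intersect": a.tuples & b.tuples,
              "minus": a.tuples - b.tuples}[op]
    return Relation(a.attrs, tuples)   # the result has the same attributes


r1 = make(("C#", "Cname"), [(1, "Codd"), (2, "Martin")])
r2 = make(("C#", "Cname"), [(1, "Codd"), (3, "Deen")])
difference = set_op("minus", r1, r2)
print(sorted(sorted(t) for t in difference.tuples))
```

Storing each tuple as a set of (attribute, value) pairs means that attribute order plays no part in tuple equality, which matches the definition of a relation's intension as a set.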
6.4 Null Values
In populating a database with data objects, it is not uncommon that some of these objects
may not be completely known. For example, in capturing new customer information
through forms that customers are requested to fill in, some fields may have been left blank
(some customers may take exception to revealing their age or phone numbers!). In these
cases, rather than not have any information at all, we can still record what we do know.
But what value do we insert into the unknown fields of data objects? Leaving a
field blank is not good enough as it can be interpreted as an empty string which may be a
valid value for some domains. We need a value that denotes 'unknown' and that cannot
be confused with valid domain values.
It is here that the Null value is used. We can think of it as a special value different from
any other value from any attribute domain. At the same time, we may think of it as
belonging to every attribute domain in the database, ie. it may appear as a value for any
attribute and not violate any type constraints. Syntactically, different DBMSs may use
different symbols to denote null values. For our purposes, we will use the symbol '?'.
How do null values affect relational operations? All relational operations involve
comparing values in tuples, including Projection (which involves comparison of result
tuples for duplicates). The key to answering this question is in how we evaluate boolean
operations involving null values. Thus, for example, what does "? > 5" evaluate to? The
unknown value could be greater than 5. But then again, it may not be. That is, the value
of the boolean expression cannot be determined on the basis of available information. So
perhaps we should consider the result of the comparison as unknown as well?
Unfortunately, if we did this, the relational operations we've discussed would cease to be
well-defined! They all rely on comparisons evaluating categorically to one of two values:
TRUE or FALSE. For example, if the above comparison ("? > 5") was generated in the
process of selection, we would not know whether to include or exclude the associated
tuple in the result if we were to admit a third value (UNKNOWN). If we wanted to do
that, we would have to go back and redefine all these operations based on some form of
three-valued logic.
To avoid this problem, most systems that allow null values simply interpret any
comparison involving them as FALSE. The rationale is that even though they could be
true, they are not demonstrably true on the basis of what is known. That is, the result of
any relational operation conservatively includes only tuples that demonstrably satisfy
conditions of the operation. Adopting this convention, all the operations defined
previously still hold without any amendment. Some implications on the outcome of each
operation are considered below.
For the Select operation, an unknown value cannot identify a tuple. This is illustrated in
Figure 6-6 which shows two Select operations applied to the relation R1. Note that
between the two operations, the selection criteria range over the entire domain of the
attribute I2. One would expect therefore, that any tuple in R1 would either be in the result
of the first or the second. This is not the case, however, as the second tuple in R1 (<b,?>)
is not selected in either operation: the unknown value in it falsifies the selection criteria
of both operations!
Figure 6.6 Selecting over null values
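The effect on selection can be sketched in Python as follows. The tuples in r1 and the threshold 5 are hypothetical stand-ins, since the figure's actual values are not reproduced here; only the tuple <b,?> comes from the text. Python's None plays the role of the '?' symbol, and the function compare implements the convention that any comparison involving a null value is FALSE.

```python
import operator

NULL = None  # stands for the '?' symbol


def compare(op, a, b):
    """Two-valued comparison: anything involving NULL evaluates to False."""
    if a is NULL or b is NULL:
        return False
    return op(a, b)


def select(relation, attr_index, op, value):
    """Keep only the tuples that demonstrably satisfy the condition."""
    return [t for t in relation if compare(op, t[attr_index], value)]


r1 = [("a", 1), ("b", NULL), ("c", 7)]   # hypothetical tuples <I1, I2>
low = select(r1, 1, operator.le, 5)      # I2 <= 5
high = select(r1, 1, operator.gt, 5)     # I2 > 5
missing = [t for t in r1 if t not in low + high]
print(low, high, missing)
```

The two conditions together cover every possible known value of I2, yet the tuple with the null value appears in neither result.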
For Projection, tuples containing null values that are otherwise identical are not
considered to be duplicates. This is because the comparison "? = ?", by the above
convention, evaluates to FALSE. This leads to the situation as illustrated in Figure 6-7
below. The reader should note from this example that the symbol '?', while it denotes
some value much like a mathematical variable, is quite unlike the latter in that its
occurrences do not always denote the same value. Thus "? = ?" is not demonstrably true
and therefore considered FALSE.
Figure 6.7 Projecting over null values
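A minimal Python sketch of this behaviour follows. The relation r and the function names are hypothetical, chosen only to illustrate the convention. Duplicate detection compares tuples component by component, and any comparison that touches a null value fails.

```python
NULL = None  # stands for the '?' symbol


def same_tuple(s, t):
    """Tuples are duplicates only if every pair of components is known and equal."""
    return all(a is not NULL and b is not NULL and a == b for a, b in zip(s, t))


def project(relation, indexes):
    """Project over the given attribute positions, dropping demonstrable duplicates."""
    result = []
    for row in relation:
        candidate = tuple(row[i] for i in indexes)
        if not any(same_tuple(candidate, kept) for kept in result):
            result.append(candidate)
    return result


r = [("a", 1, "x"), ("a", 1, "y"), ("b", NULL, "x"), ("b", NULL, "y")]
projected = project(r, (0, 1))
print(projected)
```

The two ("a", 1) rows collapse into one, while the two ("b", ?) rows both survive because "? = ?" is treated as FALSE.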
In a Join operation, tuples having null values under the common attributes are not
concatenated. This is illustrated in Figure 6-8 ("?=1", "1=?" and "?=?" are all FALSE).
Figure 6.8 Joining over null values
In Division, the occurrence of even one null value in the divisor means that the result will
be an empty relation, as any value in the dividend's common attribute(s) will fail when
matched with it. This is illustrated in Figure 6-9 below. Note, however, that this is not
necessarily the case if only the dividend contains null values under the common
attribute(s): division may still be successful on tuples not containing null values.
Figure 6.9 Division with null divisors
In set operations, because tuples are treated as a single unit in comparisons, a single rule
applies: tuples otherwise identical but containing null values are considered to be
different (as was the case for Projection above). Figure 6-10 illustrates this for each set
operation. Note that because of the occurrence of null values, the tuples in R2 are not
considered duplicates of R1's tuples. Thus their union simply collects tuples from both
relations; subtracting R2 from R1 simply results in R1; and their intersection is empty.
Figure 6.10 Set operations involving null values
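The same rule can be sketched for the three set operations. The relations r1 and r2 below are hypothetical (the figure's actual values are not reproduced here), and the function names are illustrative only. Relations are kept as lists so that tuples which merely look identical can coexist.

```python
NULL = None  # stands for the '?' symbol


def same_tuple(s, t):
    """Equal only if every pair of components is known and equal."""
    return all(a is not NULL and b is not NULL and a == b for a, b in zip(s, t))


def contains(relation, t):
    return any(same_tuple(t, u) for u in relation)


def union(r, s):
    return r + [t for t in s if not contains(r, t)]


def intersect(r, s):
    return [t for t in r if contains(s, t)]


def minus(r, s):
    return [t for t in r if not contains(s, t)]


r1 = [("a", NULL), ("b", NULL)]   # hypothetical relations whose tuples
r2 = [("a", NULL), ("b", NULL)]   # look identical but contain nulls
print(union(r1, r2), minus(r1, r2), intersect(r1, r2))
```

Under this convention the union holds all four tuples, the difference is r1 unchanged, and the intersection is empty.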
6.5 Optimisation
Each relational operation entails a certain amount of work: retrieving a tuple, examining a
tuple's attribute values, comparing attribute values, creating new tuples, repeating a
process on each tuple in a relation, etc. For a given operation, the amount of work clearly
varies with the cardinality of source relation(s). For example, a selection performed on a
relation twice the cardinality of another (of the same degree) would involve twice as
much work.
We can also compare the relative amount of work needed between different operations
based on the number of tuples processed. An operation with two source inputs, for
example, needs to repeat its logic on every possible tuple-pair formed by taking a tuple
from each input relation. Thus if we had two relations of cardinalities M and N
respectively, a total of M×N tuple-pairs must be processed, ie. M (or N) times more than,
say, a selection operation on each individual relation. Of course, this is not an exact
relative measure of work, as there are also differences in the amount of work expended
by different operations at the tuple level. By and large, however, we are interested in the
order of magnitude of work (rather than the exact amount of work) and this is fairly well
approximated by the number of tuples processed.
We will call such a measure the efficiency of an operation. Thus, the efficiency of
selection and projection is the cardinality of its single input relation, while the efficiency
of join, divide and set operations is the product of the respective cardinalities of their two
input relations.
Why should the efficiency of operations interest us? Consider the following sequence of
operations:
join Customer AND Transaction over C# giving X;
select X where Ccity = "London" giving Result
Suppose the cardinality of Customer was 100 and that of Transaction was 1000. Then the
efficiency of the join operation is 100×1000 = 100000. The cardinality of X is 1000 (as it
is certainly intended that the C# in every Transaction tuple matches a C# in one of the
Customer tuples). Therefore, the efficiency of the selection is 1000. As these two
operations are performed one after another, the efficiency of the entire sequence of
operations is naturally the sum of their individual efficiencies, ie. 100000+1000 =
101000.
Now consider the following sequence:
select Customer where Ccity = "London" giving X;
join X AND Transaction over C# giving Result
The reader can verify that this sequence is relationally equivalent to the first, ie. they
produce identical results. But how does its efficiency compare with that of the first? Let
us calculate using the same assumptions about the cardinalities. The efficiency of the
selection is 100. To estimate the efficiency of the join, we need to make an assumption on
the cardinality of X. Let's say that 10 customers live in London. Then the efficiency of
the join is 10×1000 = 10000, and the efficiency of the sequence as a whole is 100+10000
= 10100, ie. ten times more efficient than the first!
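The arithmetic of the two sequences can be restated as a short Python calculation. The function names are hypothetical; the cardinalities, including the assumption that 10 customers live in London, are those used in the text.

```python
def eff_unary(cardinality):
    """Efficiency of selection or projection: one pass over the input."""
    return cardinality


def eff_binary(card_a, card_b):
    """Efficiency of join, divide and set operations: every tuple-pair."""
    return card_a * card_b


customers, transactions, london = 100, 1000, 10   # 10 is the text's assumption

# Sequence 1: join first, then select over the 1000-tuple join result.
seq1 = eff_binary(customers, transactions) + eff_unary(transactions)
# Sequence 2: select first, then join the London customers with Transaction.
seq2 = eff_unary(customers) + eff_binary(london, transactions)

print(seq1, seq2, seq1 / seq2)
```

Changing the value of london to 100 (every customer lives in London) gives the worst case for the second sequence under these assumptions.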
Of course, the reader may think that the assumption about X's cardinality was contrived
to give this dramatic performance improvement. The point, however, is that the second
sequence can do no worse than the first, ie. if all customers in the Customer relation live
in London, then it performs as poorly as the first. More likely, however, we expect a
performance improvement.
The above example illustrates a very important point about relational algebra: there can
be more than one (sequence of) expression that describes a desired result. The main aim of
optimisation, therefore, is to translate a given (sequence of) expression into its most
efficient equivalent form. Such optimisation may be done manually by a human user or
automatically by the database management system. Automatic optimisation may in fact
do better because the automatic optimiser has access to information that is not readily
available to a human optimiser, eg. current cardinalities of source relations, current data
values, etc. But the overwhelming majority of relational DBMSs available today merely
execute operations requested by users as is. Thus, it is important that users know how to
perform optimisations manually.
For manual optimisation, it is perhaps less important to derive the most efficient form of
a query than to follow certain guidelines, heuristics or rules-of-thumb that lead to more
efficient expressions. Frequently the latter will lead to acceptable performance and
expending more effort to find the optimal expression may not significantly improve that
performance if good heuristics are used. There is, in fact, a simple and effective rule to
remember when writing queries: delay as long as possible the use of expensive
operations! In particular, we should wherever possible put selection ahead of other
operations because it reduces the cardinality of relations. Figure 6-11 illustrates the
application of this principle. The reader should be able to verify that the two sequences of
operations are logically equivalent and that intuitively the selection operations before the
joins can significantly improve the efficiency of the query.
Figure 6.11 Delay expensive operations
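The principle can be checked on a small scale with the following Python sketch. The relations are a cut-down, hypothetical version of Customer and Transaction, and the counters simply record how many tuples or tuple-pairs each operation examines; they are a rough stand-in for the efficiency measure, not a model of any real DBMS.

```python
customer = [(1, "Codd", "London"), (2, "Martin", "Paris"), (3, "Deen", "London")]
transaction = [(1, 1), (1, 2), (2, 1), (3, 2)]          # (C#, P#)


def join(a, b, counter):
    """Join over the first attribute (C#), counting every tuple-pair examined."""
    out = []
    for x in a:
        for y in b:
            counter[0] += 1
            if x[0] == y[0]:
                out.append(x + y[1:])
    return out


def select(relation, city, counter):
    """Select on Ccity (third attribute), counting every tuple examined."""
    counter[0] += len(relation)
    return [t for t in relation if t[2] == city]


late, early = [0], [0]
# Join first, select afterwards.
result_late = select(join(customer, transaction, late), "London", late)
# Select first, join afterwards.
result_early = join(select(customer, "London", early), transaction, early)

print(result_late == result_early, late[0], early[0])
```

Both sequences produce the same tuples, but the sequence that selects first examines fewer tuples; the gap widens as the relations grow.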
7 Relational Calculus (Part I)
7.1 Introduction
We established earlier the fundamental role of relational algebra and calculus in relational
databases (see 5.1). More specifically, relational calculus is the basis for the notion of
relational completeness of a database language, ie. any language that can define any
relation expressible in relational calculus is relationally complete.
Relational Algebra (see chapters 5 and 6) is one such language. Its approach is
procedural, ie. it provides a number of basic operations on relations and successive
applications of these operations must be properly sequenced to derive answers to
database queries. The basic operators are in themselves quite simple and easy to
understand. However, except for fairly simple queries, the construction of operation
sequences can be quite complex. Furthermore, such constructions must also consider
efficiency issues and strive to find optimal ones (see 6.5). A considerable amount of
programming skill is therefore required to effectively use relational algebra.
Relational Calculus takes a different approach to the human-database interface. Rather
than requiring users to specify how relations are to be manipulated, it only requires them
to define what the desired result is. How the result is actually computed, ie. the operations
used, their sequencing and optimisation, is left to the database management system to
work out. As it doesn't deal with procedures (ie. sequencing of operations), this approach
is frequently termed non-procedural or declarative.
Relational Calculus is mainly based on the well-known Propositional Calculus, which is a
method of calculating with sentences or declarations. Such sentences or declarations, also
termed propositions, are ones for which a truth value (ie. "true" or "false") may be
assigned. These can be simple sentences, such as "the ball is red", or they may be more
complex involving one or more simple sentences, such as "the ball is red AND the
playing field is green". The truth value of complex sentences will of course depend on the
truth values of their components. This is in fact what the calculus 'calculates', using rules
for combining truth values of component sentences.
In Relational Calculus, the sentences we deal with are simpler and refer specifically to
the relations and values in the database of interest. Simple sentences typically take the
form of comparisons of values denoted by variables or constants, eg. X ≤ 3, X < Y, etc.
More complex sentences are built using logical connectives And ('&') and Or ('|'), eg.
X > 7 & X < Y | X ≤ 5. Simple and complex sentences like these are examples of
Well-Formed Formulae, which we will define fully later.
Regardless of their exact syntax, a formula is in principle a logical function with one or
more free variables. For purposes of illustration, we will write such functions as in the
following example:
F(X) =: X > 12 & X < 18
In the above example, there is one free variable, X. The value of the function can be
computed for specific instances of X. Thus,
F(15) ≡ (15 > 12 & 15 < 18) ≡ (true & true) ≡ true
F(10) ≡ (10 > 12 & 10 < 18) ≡ (false & true) ≡ false
Additionally, free variables are deemed to range over a set of permitted values, ie. only
such values can instantiate them. We shall see the significance of this later, as applied to
relations. But just to illustrate the concept for now, consider the following function over
two free variables:
F(X,Y) =: X > Y & Y < 12
Suppose X ranges over {8, 15} and Y ranges over {7, 14}. Then F(8, 14) and F(15, 7) are
allowable instantiations of the function, with truth values false and true respectively,
whereas F(1000, 200) is not a valid instantiation. Such restrictions of values over which
free variables range become significant when we interpret a formula as the simple query:
"get the set of values of free variables for which the formula evaluates to true". Thus, for
the above formula, we need only construct the following table involving only the
permitted values:
X     Y     F(X,Y)
8     7     true
8     14    false
15    7     true
15    14    false
The desired set of values can then be read from the rows where F(X,Y) evaluated to true,
ie. the set {(8,7), (15,7)}.
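Interpreting a formula as a query over permitted values translates directly into a Python set comprehension. The sketch below uses the function and ranges from the text; the variable names are otherwise arbitrary.

```python
from itertools import product


def f(x, y):
    """F(X,Y) =: X > Y & Y < 12"""
    return x > y and y < 12


x_range = {8, 15}   # permitted values of X
y_range = {7, 14}   # permitted values of Y

# Truth table restricted to the permitted values.
table = {(x, y): f(x, y) for x, y in product(x_range, y_range)}
# "Get the set of values of free variables for which the formula evaluates to true."
answer = {pair for pair, truth in table.items() if truth}
print(sorted(answer))
```

Values outside the ranges, such as (1000, 200), are never considered because the enumeration is driven by the ranges rather than by the formula.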
Relational Calculus is an application of the above ideas to relations. We will develop
these ideas in greater detail in the following sections.
7.2 Tuple Variables
Free variables in logical functions can in principle range over any type of value. A feature
that distinguishes Relational Calculus from other similar calculi is that the free variables
range over relations. More specifically, any free variable ranges over the extension of a
designated relation, ie. the current set of tuples in the relation. Thus, a free variable may
be instantiated with a tuple selected from the designated relation.
Suppose, for example, we introduced a variable C to range over the relation Customer, as
in Figure 7-1. Then C may be instantiated with any one of the three tuples at any one
time. The example shows C instantiated with the second tuple. Equivalently, we may
sometimes say that C 'holds' a value instead of being instantiated with that value [5]. In
any case, because variables like C range over tuples (or are only permitted to hold a tuple),
they are termed tuple variables.
Figure 7.1 A Tuple Variable C ranging over the Customer Relation
A tuple has component parts, and unless we have a means of referring to such parts, the
logical functions we formulate over relations will have limited expressive power. Given,
for example, two variables X and Y that range over two different relations with a
common domain, we may want to specify a condition where their current instantiations
are such that the values under the common domain are identical. Thus while X (and Y)
denote a tuple as a whole, we really wish to compare tuple component values. The
syntactic mechanism provided for this purpose takes the form:
<tuple-variable-name>.<attribute-name>
and is interpreted to mean the value associated with <attribute-name> in the current
instantiation of <tuple-variable-name>. Thus, assuming the instantiation of C as in Figure
7-1:
C.C# = 2
C.Cname = 'Martin'
...etc
This denotation of a particular data item within a tuple variable is often referred to as a
projection of the tuple variable over a domain (eg. "C.Cname" is a projection of tuple
variable C over the domain Cname).
Relational Calculus is a collection of rules of inference of the form:
<target list> : <logical expression>
where <target list> is a list of free variables and/or their projections that are referenced in
<logical expression>. This list is thought of as the "target list" because the set of
instantiations of the list items that makes <logical expression> true is the desired result.
In other words, an inference rule may be thought of as a query, and may be informally
understood as a request to find all variable instantiations that satisfy <logical expression>
and, for each such instantiation, to extract the data items mentioned in <target list>.
For example, consider the inference rule in Figure 7-2. It references one free variable, C,
which ranges over Customer. The <target list> specifies items we are interested in (only
the phone number in this case) but only of those tuples that satisfy the <logical
expression>. In other words, the rule may be paraphrased as the query to "get the set of
phone numbers of customers who live in London". Note that the use of the variable C
both in <target list> and in <logical expression> denotes the same instantiation, thereby
ensuring that "C.Cphone" is extracted from the same tuple that satisfies the comparison
"C.Ccity = London". The computed set in this case would be {2263035, 2234391},
corresponding to the phone numbers in the first and last tuples, these being the only
tuples satisfying "C.Ccity = London".
[5] This terminology may perhaps be favoured by programmers who are used to programming
language variables and to thinking about them as memory locations that can 'hold' one value at a
time.
Figure 7.2 An inference rule over the Customer relation
The reader should note the simplicity and declarative character of the inference rule,
which merely states the desired result (the <target list>) and the conditions that must be
satisfied (the <logical expression>) for a value to be included in the result. Contrast this
with relational algebra which would require the following construction:
select Customer where Ccity = "London" giving X;
project X over Cphone giving Result
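The declarative flavour of an inference rule has a close analogue in a Python set comprehension, which also states a target and a condition without prescribing a sequence of operations. The sketch below is only an analogy: the namedtuple and its field names are hypothetical renderings of the Customer relation, and the variable c plays the role of the tuple variable C.

```python
from collections import namedtuple

# Field names are illustrative stand-ins for C#, Cname, Ccity and Cphone.
Customer = namedtuple("Customer", "cno cname ccity cphone")

customers = [
    Customer(1, "Codd", "London", "2263035"),
    Customer(2, "Martin", "Paris", "5555910"),
    Customer(3, "Deen", "London", "2234391"),
]

# C.Cphone : C.Ccity = London
# The same instantiation of c is used in the target and in the condition.
phones = {c.cphone for c in customers if c.ccity == "London"}
print(sorted(phones))
```

As in the inference rule, the target (c.cphone) and the condition (c.ccity == "London") refer to the same instantiation of the variable.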
The above example only used a single variable. However, a single variable can only
range over a single relation, while often the data items of interest are spread over more
than one relation. In such cases, we will need more than one tuple variable.
Figure 7-3 illustrates such a case involving two variables, P and T, ranging over
relations Product and Transaction respectively. Note that the inference rule at the top of
the figure
• lists items from both variables in the target list (ie. P.Pname, T.C#)
• compares in the logical expression projections of the two different variables over the
  same domain (T.P# = P.P#)
It further illustrates specific instantiations of each variable and evaluation of the logical
expression in the context of these instantiations. In this case, the logical expression is true
and therefore the items in the target list are extracted from the variables (shown in the
"result" table). It is important to note that a given inference, as in this illustration, is
entirely in the context of a specific instantiation of each tuple variable. It is meaningless,
for example, to evaluate "T.P# = P.P#" using one instance of P and "P.Pprice > 1000"
using another. The total number of inferences that can be attempted for any given rule is
therefore the product of the cardinality of each variable's range.
The inference rule in this example may be paraphrased as the query "find the customer
numbers and the names of the products, priced at more than 1000, that they purchased".
As an exercise, the reader should attempt to construct this query in relational algebra
(hint: it will involve the basic operators Select, Project and Join).
Figure 7.3 Multiple variable inference
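The way inferences are attempted over every combination of instantiations can be sketched in Python as follows. The rule is assumed to be "(P.Pname, T.C#) : T.P# = P.P# & P.Pprice > 1000", pieced together from the fragments quoted in the text, and the data is borrowed from the Product and Transaction relations of the earlier relational algebra example; the figure itself may use different tuples.

```python
from collections import namedtuple
from itertools import product

# Field names are illustrative stand-ins for the attribute names.
Product = namedtuple("Product", "pno pname pprice")
Transaction = namedtuple("Transaction", "cno pno date qnt")

products = [Product(1, "CPU", 1000), Product(2, "VDU", 1200)]
transactions = [
    Transaction(1, 1, "21.01", 20),
    Transaction(1, 2, "23.01", 30),
    Transaction(2, 1, "26.01", 25),
    Transaction(3, 2, "29.01", 20),
]

# Every (P, T) pair is one attempted inference.
pairs = list(product(products, transactions))

# Both comparisons are evaluated against the same instantiation of p and t.
result = {(p.pname, t.cno) for p, t in pairs
          if t.pno == p.pno and p.pprice > 1000}

print(len(pairs), sorted(result))
```

The number of attempted inferences equals the product of the two cardinalities, here 2 × 4.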
7.3 Quantifiers
Logical expressions may also include variable quantifiers, specifically:
1. the existential quantifier, denoted by the symbol '∃', and
2. the universal quantifier, denoted by the symbol '∀'
These quantifiers quantify variables. An existentially quantified variable, say x, is written
"∃x" and is read as "there exists an x such that...". A universally quantified variable is
written as "∀x" and is read as "for all x ...".
Quantification is applied to a formula and is written preceding it. For example,
∃x (x < y & y < 12)
would be read as "there exists an x such that x is less than y and y is less than 12". The
formula to which the quantification is applied is called the scope of quantification.
Occurrences of quantified variables in the scope of quantification are said to be bound
(existentially or universally). The scope is normally obvious from the written
expressions, but if ambiguities might otherwise arise, we will use parentheses to delimit
scope.
Informally, the formula "∃x ( <expr> )" asserts that there exists at least one value of x
(from among its range of values) such that <expr> is true. This assertion is false only
when no value of x can be found to satisfy <expr>. On the other hand, if the assertion is
true, there may be more than one such value of x, but we don't care which. In other
words, the truth of an existentially quantified expression is not a function of the
quantified variable(s) [6].
[6] The truth of a quantified expression does depend, of course, on the range of permitted values of
the quantified variables.
As an example, consider the unquantified expression
x < y & y < 12
and suppose x ranges over {4,15} and y over {7,14}. The truth table for the expression is:
x     y     x<y & y<12
4     7     true
4     14    false
15    7     false
15    14    false
Now consider the same expression but with x existentially quantified:
∃x (x < y & y < 12)
Since we don't care which value of x makes the expression true as long as there is at least
one, its truth depends only on the unbound variable y:
y     ∃x (x<y & y<12)
7     true
14    false
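Python's built-in any() behaves like the existential quantifier over a finite range, so the two truth tables above can be reproduced with a few lines. The names expr, unquantified and exists_x are chosen only for this sketch.

```python
x_range = [4, 15]
y_range = [7, 14]


def expr(x, y):
    """x < y & y < 12"""
    return x < y and y < 12


# Unquantified: truth depends on both x and y.
unquantified = {(x, y): expr(x, y) for x in x_range for y in y_range}

# Existentially quantified over x: truth depends only on y.
exists_x = {y: any(expr(x, y) for x in x_range) for y in y_range}

print(unquantified)
print(exists_x)
```

Note that any() stops at the first value of x that satisfies the expression, which mirrors the remark that we do not care which value of x makes it true.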
An existentially quantified expression therefore has a distinctly different meaning from
the same expression unquantified. In particular, when <logical expression> of an
inference rule is existentially quantified, it becomes a query on the free variables only,
since it is a function of only those variables.
Figure 7.4 Product and Transaction relations with associated tuple variables
Consider, for example, the Product and Transaction relations in Figure 7-4, with tuple
variables P ranging over the former and T over the latter. The rule
P.Pname: ∃T (T.P# = P.P# And T.C# = 1)
is interpreted as the query to find values of the free variable P such that there exists at
least one value of the bound variable T satisfying the formula "T.P# = P.P# And T.C# =
1". As before, evaluation of the expression is in the context of some instantiations of the
variables. All possible values of P must be considered, but for each we need only find one
value of T to satisfy the expression. Once we have done that, other possible values of T, if
any, may be ignored. For example, with P instantiated to the first tuple of Product, we
consider in turn tuples of Transaction as values of T. We will find in fact that the first
already satisfies the expression and we may therefore ignore the others. With P set to the
second tuple, however, we will find no value for T to satisfy the expression. The result
for this example therefore is only one value for P, with P.Pname = CPU.
As another example, consider the relations in Figure 7-5 with associated tuple variables
as shown. Suppose we are interested in finding the names of customers who bought the
product CPU. That is, our target is the value X.Cname, but only if X is a customer who
has bought the product CPU. In other words, there must exist a Y such that "X.C# =
Y.C#". This would establish that X bought a product, denoted by Y.P# (the product
number). Furthermore, this product must be a CPU, ie. there must exist a Z such that
"Y.P# = Z.P#" and "Z.Pname = CPU". Thus, the rule corresponding to our query is
X.Cname: ∃Y ∃Z ( X.C# = Y.C# & Y.P# = Z.P# & Z.Pname = CPU )
The reader can verify that the answer satisfying this query is {Codd, Martin}.
Figure 7.5 Product, Customer and Transaction relations with associated tuple variables
Using again the relations in Figure 7-5, let's look at a more complex query: get the
names of customers who bought the product CPU and VDU. At first glance, this seems a
simple extension of the above query:
X.Cname: ∃Y ∃Z ( X.C# = Y.C# & Y.P# = Z.P# &
                 Z.Pname = CPU & Z.Pname = VDU )
But the reader who remembers a similar example in section 6.1 would have noted a
problem. Specifically, the subexpression "Z.Pname = CPU & Z.Pname = VDU" can
never be true for a given value of Z: a field of a given tuple can only hold one value, so
only one or the other can be true but not both! Of course what we mean to specify is that
the customer purchased at least one product which is a CPU, and another which is a
VDU. Since a tuple variable can hold only one value at a time, this clearly cannot be
done using only one tuple variable. The solution, therefore, is to introduce additional
distinct variables to range over the same relation when more than one tuple is to be
considered at a time (note that relational calculus places no restriction on the number of
distinct variables that can range over a relation). For this particular example, we need
only introduce one additional variable each for the relations Transaction and Product
respectively, as shown in Figure 7-6. This will allow us to consider two separate
purchases at one time. The correct formulation, therefore, is:
X.Cname: ∃T1 ∃T2 ∃P1 ∃P2 ( X.C# = T1.C# & X.C# = T2.C# &
                           T1.P# = P1.P# & P1.Pname = CPU &
                           T2.P# = P2.P# & P2.Pname = VDU )
Figure 7-6 additionally shows particular values of these variables that satisfy our query.
Figure 7.6 Multiple variables ranging over a relation
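The difference between the two formulations can be seen by evaluating both. In the Python sketch below the data is assumed to match the Customer, Transaction and Product relations of the earlier relational algebra example (the figures are not reproduced here), and each bound variable becomes one loop inside any().

```python
customer = [(1, "Codd"), (2, "Martin"), (3, "Deen")]      # (C#, Cname)
transaction = [(1, 1), (1, 2), (2, 1), (3, 2)]            # (C#, P#)
product = [(1, "CPU"), (2, "VDU")]                        # (P#, Pname)

# First attempt: a single variable z must be both a CPU and a VDU.
wrong = {x[1] for x in customer
         if any(x[0] == y[0] and y[1] == z[0]
                and z[1] == "CPU" and z[1] == "VDU"
                for y in transaction for z in product)}

# Correct formulation: two variables per relation, one pair per purchase.
right = {x[1] for x in customer
         if any(x[0] == t1[0] and x[0] == t2[0]
                and t1[1] == p1[0] and p1[1] == "CPU"
                and t2[1] == p2[0] and p2[1] == "VDU"
                for t1 in transaction for t2 in transaction
                for p1 in product for p2 in product)}

print(wrong, right)
```

The first attempt can never succeed, whatever the data, because one tuple cannot carry two different product names.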
Let's turn now to the universal quantifier. Informally, the formula "∀x ( <expr> )" asserts
that for every value of x (from among its range of values) <expr> is true. As with the
existential quantifier, the truth of a universally quantified expression is not a function
of the quantified variable(s). Consider, for example, the unquantified expression
x < y | y < 12
and suppose x ranges over {4,15} and y over {7,14}. The truth table for the expression is:
x     y     x<y | y<12
4     7     true
4     14    true
15    7     true
15    14    false
Now consider the same expression but with x universally quantified:
∀x (x < y | y < 12)
In a sense, as with the existentially quantified variable, we don't care what the values of x
are, as long as every one of them makes the expression true for any given y. Thus its truth
table is:
y     ∀x (x<y | y<12)
7     true
14    false
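Over a finite range, Python's built-in all() plays the part of the universal quantifier. The following sketch, with names chosen only for illustration, reproduces the two truth tables above.

```python
x_range = [4, 15]
y_range = [7, 14]


def expr(x, y):
    """x < y | y < 12"""
    return x < y or y < 12


# Unquantified: truth depends on both x and y.
unquantified = {(x, y): expr(x, y) for x in x_range for y in y_range}

# Universally quantified over x: every x must satisfy the expression for a given y.
forall_x = {y: all(expr(x, y) for x in x_range) for y in y_range}

print(unquantified)
print(forall_x)
```

For y = 14 the value x = 15 falsifies the expression, so the quantified expression is false even though x = 4 satisfies it.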
The universal quantifier will be needed for queries like the following:
"get the names of customers who bought every type of product"
Assume the relations as in Figure 7-5. The phrase "every type of product" clearly means
every tuple of the Product relation. However, the Product relation does not record
purchases, which are found only in the Transaction relation, ie. a product is purchased
(by someone) if there is a transaction recording its purchase. In other words, a customer
(ie. X) satisfies this query if for every product (ie. Z) there is a transaction (ie. Y)
recording its purchase by the customer. This can now be quite simply rewritten in the
calculus:
X.Cname: ∀Z ∃Y (X.C# = Y.C# & Y.P# = Z.P#)
Note that the different types of quantifiers can be mixed. But note also that their order is
significant, ie. ∀x ∃y ( <expr> ) is not the same as ∃y ∀x ( <expr> ). For example,
∀x ∃y ( y is the mother of x )
asserts that everyone has a mother. Whereas,
∃y ∀x ( y is the mother of x )
asserts that there is a single individual (y) who is the mother of everyone!
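Both points, the "every type of product" rule and the significance of quantifier order, can be illustrated with nested all() and any() calls in Python. The data below is assumed to match the relations of the earlier relational algebra example, since the figure is not reproduced here.

```python
customer = [(1, "Codd"), (2, "Martin"), (3, "Deen")]      # (C#, Cname)
transaction = [(1, 1), (1, 2), (2, 1), (3, 2)]            # (C#, P#)
product = [(1, "CPU"), (2, "VDU")]                        # (P#, Pname)

# For every product Z there exists a transaction Y linking it to customer X.
every_product = {x[1] for x in customer
                 if all(any(x[0] == y[0] and y[1] == z[0] for y in transaction)
                        for z in product)}

# Quantifiers swapped: there exists ONE transaction Y that matches every product Z.
swapped = {x[1] for x in customer
           if any(all(x[0] == y[0] and y[1] == z[0] for z in product)
                  for y in transaction)}

print(every_product, swapped)
```

With the quantifiers swapped, the rule demands a single transaction that records the purchase of every product, which no transaction in this data can do.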
7.4 Well-Formed Formulae
Let us now be more precise about the valid forms of logical expressions involving tuple
variables. Such valid forms are called well-formed formulae (wff) and are defined as
follows:
1. A θ M is a wff
   if A is a projection of a tuple variable,
   M is a constant or a projection of a tuple variable, and
   θ is one of the comparison operators: =, ≠, <, >, ≤, ≥
2. F1 & F2 and F1 | F2 are wffs if F1 and F2 are wffs
3. (F) is a wff if F is a wff
4. ∃x (F(x)) and ∀x (F(x)) are wffs if F(x) is a wff with a free occurrence of the
   variable x.
The operator precedence for the '&' and '|' operators follows the standard precedence
rules, ie. '&' binds stronger than '|'. Thus,
"F1 & F2 | F3" ≡ "( F1 & F2 ) | F3"
Explicit use of parentheses, as in rule (3) above, is required to override this default
precedence. Thus if the intention is for the '|' operator to bind stronger in the above
expression, it has to be written as
F1 & (F2 | F3)
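Python's and/or operators follow the same convention ('and' binds stronger than 'or'), so the effect of the parentheses can be checked exhaustively. The function names below are chosen only for this sketch.

```python
from itertools import product


def default(f1, f2, f3):
    """F1 & F2 | F3 with no parentheses: default precedence applies."""
    return f1 and f2 or f3


def grouped_and(f1, f2, f3):
    """( F1 & F2 ) | F3"""
    return (f1 and f2) or f3


def grouped_or(f1, f2, f3):
    """F1 & ( F2 | F3 )"""
    return f1 and (f2 or f3)


cases = list(product([False, True], repeat=3))

# The unparenthesised form always agrees with ( F1 & F2 ) | F3 ...
same = all(default(*c) == grouped_and(*c) for c in cases)
# ... but differs from F1 & ( F2 | F3 ) for some truth assignments.
differs = [c for c in cases if default(*c) != grouped_or(*c)]

print(same, differs)
```

The two groupings disagree exactly when F1 is false and F3 is true.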
We can now be more specific about the form of a query in relational calculus:
*Rtarget listS+6*RwffS+
As final examples for this chapter, consider the following queries:
Query 1: "Get the names, cities and phone numbers of customers who bought the
product CPU before the 25th of January"
Assume the tuple variables C, T and P ranging over relations Customer, Transaction and
Product respectively. The appropriate query is as follows:
(C.Cname, C.Ccity, C.Cphone) :
∃T ∃P (C.C# = T.C# & T.Date < 25.01 & T.P# = P.P# & P.Pname = CPU)
Query 2: "Get the names of products bought by customers living in London or by
the customer named Smith"
Assume tuple variables as in Query 1 above. The appropriate query is as follows:
(P.Pname) :
∃T ∃C ( T.P# = P.P# & T.C# = C.C# & ( C.Ccity = London | C.Cname = Smith ))
Note that the use of parentheses around the 'or' expression above is necessary.
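As a rough illustration of how such a query can be read, the sketch below evaluates Query 2 in Python. Each existentially quantified tuple variable becomes a loop inside any(), and the free variable P becomes the outer loop of a set comprehension. The sample tuples and the lowercase field names are assumptions made only for this sketch.

```python
from collections import namedtuple

# Hypothetical record types standing in for the three relations.
Customer = namedtuple("Customer", "cno cname ccity")
Transaction = namedtuple("Transaction", "cno pno")
Product = namedtuple("Product", "pno pname")

# Sample data, assumed for illustration only.
customers = {Customer(1, "Codd", "London"), Customer(2, "Martin", "Paris"),
             Customer(3, "Deen", "London")}
transactions = {Transaction(1, 1), Transaction(2, 2)}
products = {Product(1, "CPU"), Product(2, "VDU")}

# (P.Pname) : ∃T ∃C ( T.P# = P.P# & T.C# = C.C# &
#                     ( C.Ccity = London | C.Cname = Smith ) )
query2 = {
    p.pname                      # target list: projection of the free variable P
    for p in products
    if any(                      # ∃T ∃C: one satisfying pair (T, C) is enough
        t.pno == p.pno and t.cno == c.cno
        and (c.ccity == "London" or c.cname == "Smith")
        for t in transactions
        for c in customers
    )
}
print(query2)
```

With this sample data only Codd (a London customer) qualifies, so the result contains just the product he bought.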
8 Relational Calculus (Part II)
Relational Calculus, as defined in the previous chapter, provides the theoretical
foundations for the design of practical data sub-languages (DSL). In this chapter, we will
look at an example of one: DSL Alpha, in fact the first practical DSL based on relational
calculus.
Further to this, we will also look at an alternative calculus, still a relational calculus (ie.
relations are still the objects of the calculus) but based on Domain Variables rather than
Tuple Variables. Because of this, the relational calculus covered earlier is more
accurately termed Relational Calculus with Tuple Variables. The reader will recall that
Tuple Variables range over tuples of relations and were central in the formulation of
inference rules and in the definition of well-formed formulae. Domain Variables, on the
other hand, range over domain values rather than tuples and consequently require a
different construction of well-formed formulae. We will discuss this alternative in the
second part of this chapter.
8.1 The Data Sub-Language Alpha
DSL Alpha is directly based on relational calculus with tuple variables. It provides,
however, additional constructions that increase the query formulation power of the
language. Such constructions are in fact found in most practical DSL in use today.
8.1.1 Alpha Command
DSL Alpha is a set of Alpha commands, each taking the form:
Get <workspace> ( <target list> ) : <WFF>
<workspace> is an identifier or label that names a temporary working relation to hold the
result of the command (similar to the named working relation in the giving clause of
relational algebra; see section 5.3). The attributes of this relation are specified by <target
list> which is a list of tuple variable projections as in the previous chapter. <WFF> is of
course a well-formed formula of relational calculus that must be satisfied before the
values in <target list> are extracted as a result tuple.
As an example, suppose the variable P ranges over
the Product relation as shown in Figure 8-1. Then
the following construction is a valid Alpha
command:
Get W(P.Pname) : P.Price ≤ 1000
Product
P#  Pname  Price
1   CPU    1000
2   VDU    1200
W
P.Pname
CPU
Figure 8.1 Example relations
The reader can see that except for the keyword Get and the naming of the result relation
(W in this example), the basic form is identical to the one used in the previous chapter,
which would simply be written
(P.Pname) : P.Price ≤ 1000
The semantics of the Alpha command is also exactly the same, except that the result is a
named relation, as shown in the illustration.
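The effect of this command can be sketched in Python as a set comprehension over the Product tuples of Figure 8.1. The sketch takes the condition to be P.Price ≤ 1000, which is consistent with the result relation W in the figure containing only CPU. The tuple layout (P#, Pname, Price) is an assumption of the sketch.

```python
# Product relation from Figure 8.1 as (P#, Pname, Price) tuples.
product = [(1, "CPU", 1000), (2, "VDU", 1200)]

# Get W(P.Pname) : P.Price <= 1000
# P ranges over Product; W collects the projection P.Pname of each tuple
# that satisfies the WFF. A set is used because a relation has no duplicates.
W = {pname for (_pno, pname, price) in product if price <= 1000}
print(W)
```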
8.1.2 Range Statement
In our exposition of relational calculus, tuple variables used in queries were introduced
informally. We did this in the above example too (viz. "suppose the variable P …"). This
will not do, of course, if we wish the language to be interpreted by a computer. Thus,
tuple variables must be introduced and associated with the relations over which they
range using formal constructions. In DSL Alpha, this is achieved by the range
declaration statement, which takes the basic form:
Range <relation name> <variable name>
where <relation name> must name an existing relation and <variable name> introduces a
unique variable identifier. The variable <variable name> is taken to range over <relation
name> upon encountering such a declaration. The above example can now be written
more completely and formally as:
Range Product P;
Get W(P.Pname) : P.Price ≤ 1000
DSL Alpha statements and commands, as the above construction shows, are separated by
semi-colons (';').
DSL Alpha also differs from relational calculus in the way it quantifies variables. First,
for a practical language, mathematical symbols like '∀' and '∃' need to be replaced by
symbols easier to key in. DSL Alpha uses the symbols 'ALL' and 'SOME' to stand for
'∀' and '∃' respectively. Second, rather than using the quantifiers in the <WFF>
expression, they are introduced in the range declarations. Thus, the full syntax of range
declarations is:
Range <relation name> <variable name> { SOME | ALL }
Note that the use of quantifiers in the declaration is optional. If omitted, the variable is
taken to be a free variable whenever it occurs in an Alpha command.
Let us look at a number of examples.
Query 1: "Get the names and phone numbers of customers who live in London"
Assume the Customer relation as in Figure 8-2. This query will only need a single free
variable to range over Customer. The Alpha construction required is:
Range Customer X;
Get WA( X.Cname, X.Cphone ) : X.Ccity = London
Figure 8-2 also highlights the tuples in Customer satisfying the WFF of the command and the
associated result relation WA.
Figure 8.2 Query 1
Query 2: "Get the names of products bought by Customer #2"
For this query, we will need to access the Transaction relation, with records of which
customer bought which product, and the Product relation, which holds the names of
products. Assume these relations are as given in Figure 8-3.
Figure 8.3 Query 2
The object of our query is the Pname attribute of Product, thus the tuple variable for
Product must necessarily be a free variable:
Range Product A;
The condition of the query requires us to look in the Transaction relation for a record of
purchase by Customer #2: as long as we can find one such record, the associated product
is one that we are interested in. This is a clear case of existential quantification, and the
variable introduced to range over Transaction is therefore given by:
Range Transaction B SOME;
The Alpha command for the query can now be written:
Get W ( A.Pname ): A.P# = B.P# And B.C# = 2
The associated tuples satisfying the WFF above are highlighted in Figure 8-3 (the
result relation is not shown).
Query 3: "Get the names and phone numbers of customers in London who bought the
product VDU"
This is a more complex example that will involve three relations, as shown in Figure
8-4. The target data items are in the Customer relation (names and phone numbers). So the
tuple variable assigned to it must be free:
Range Customer X;
Part of the condition specified is that the customer must live in London (ie. X.Ccity =
London), but the rest of the condition ("… who bought the product VDU") can only be
ascertained from the Transaction relation (record of purchase by some customer) and
Product relation (name of product). In both these cases, we are just interested in finding
one tuple from each, ie. that there exists a tuple from each relation that satisfies the query
condition. Thus, the variables introduced for them are given by:
Range Transaction Y SOME;
Range Product Z SOME;
The Alpha command can now be written as:
Get W( X.Cname, X.Cphone ):
X.Ccity = London And X.C# = Y.C# And Y.P# = Z.P# And Z.Pname = VDU
Figure 8-4 highlights one instantiation of each variable that satisfies the above WFF.
Figure 8.4 Query 3
Query 4: "Get the names of customers who bought all types of the company's
products"
As with the previous example, this one also requires access to three relations as shown in
Figure 8-5. A customer will satisfy this query if for every product there is a transaction
recording that he/she purchased it. This time, therefore, we have a case for universal
quantification ("… all types of the company's products") which will require that the
variable ranging over Product be universally quantified. The variable for Transaction,
on the other hand, is existentially quantified ("… there is a transaction …"). The full Alpha
construction therefore is:
Range Customer C;
Range Product P ALL;
Range Transaction T SOME;
Get W (C.Cname): P.P# = T.P# And T.C# = C.C#
Figure 8-5 highlights tuples from the various relations that satisfy this construction.
Note that the order of quantified variable declarations is important. The order above is
equivalent to "∀P ∃T". If variable T was declared before P, it would be equivalent to "∃T
∀P" which would mean something quite different! (see section 7.1)
Figure 8.5 Query 4
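The difference between the two declaration orders can be made concrete with a small Python sketch, where all() plays the role of ALL and any() the role of SOME. The sample customers, product numbers and transactions below are assumed for illustration and are not taken from the figure.

```python
# Sample data, assumed for illustration only.
customers = [(1, "Codd"), (2, "Martin")]      # (C#, Cname)
products = [1, 2]                             # P# values
transactions = {(1, 1), (1, 2), (2, 1)}       # (C#, P#)

# Declaration order P ALL, then T SOME: for every product there is
# some transaction recording that this customer bought it.
forall_exists = {
    name for cno, name in customers
    if all(any(t == (cno, pno) for t in transactions) for pno in products)
}

# Declaration order T SOME, then P ALL: there is one single transaction
# that records the purchase of every product by this customer.
exists_forall = {
    name for cno, name in customers
    if any(all(t == (cno, pno) for pno in products) for t in transactions)
}

print(forall_exists, exists_forall)
```

In this sample only Codd bought both products, so the first reading returns his name. The second reading returns nothing, since no single transaction tuple can carry two different product numbers.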
Query 5: "Get the name of the most expensive product"
This query involves only one relation: the Product relation (assume the Product relation
as in the above examples). Now, the most expensive product is that for which every
product has a price less than or equal to it. Or, in relational calculus, X is such a product
provided that ∀Y X.Price ≥ Y.Price. Thus two variables are required, both ranging
over Product but one of them is universally quantified:
Range Product X;
Range Product Y ALL;
Get W(X.Pname): X.Price ≥ Y.Price
It is perhaps interesting to note in passing that the choice by DSL Alpha designers to
quantify variables at the point of declaration rather than at the point of use makes Alpha
commands a little harder to read: it is not clear which variables are quantified just by
looking at the Alpha command. One must search for the variable declaration to see how,
if at all, it is quantified.
8.1.3 Additional Facilities
DSL Alpha provides additional facilities that operate on the results of its commands.
While these are outside the realm of relational calculus, they are useful and practical
functions that enhance the utility of the language. These facilities fall loosely under two
headings: qualifiers, and library functions.
The qualifiers affect the order of presentation of tuples in the result relation, based on the
ordering of values of a specified attribute in either an ascending or descending order, ie.
they may be thought of as sort functions over a designated attribute. Note that in
relational theory the order of tuples in a relation is irrelevant since a relation is a set of
values. So the qualifiers affect only the presentation of a relation.
Syntactically, the qualifier is appended to the WFF and takes the following form:
{ UP | DOWN } <attribute name>
As an example, consider the requirement for the names of products bought by Customer
#1 in descending order of their prices. The Alpha construction for this would be:
Range Product X;
Range Transaction Y SOME;
Get UWA( X.Pname, X.Price ): (X.P# = Y.P# And Y.C# = 2) DOWN X.Price
Figure 8-6 shows the relations highlighting tuples satisfying the WFF. It also shows the
result relation UWA which can be seen to be ordered in descending order of price.
Figure 8.6 Result of qualified command
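A qualifier can be pictured as a sort applied after the WFF has selected the tuples. The Python sketch below follows that two-step reading. The sample relations and the helper name products_bought_down_by_price are hypothetical, introduced only for this illustration.

```python
# Sample relations, assumed for illustration only.
product = [(1, "CPU", 1000), (2, "VDU", 1200)]     # (P#, Pname, Price)
transaction = [(1, 1), (1, 2), (2, 1), (2, 2)]     # (C#, P#)

def products_bought_down_by_price(cno):
    """Evaluate the WFF for customer cno, then present the result DOWN Price."""
    # Step 1: the basic command yields an unordered set of result tuples.
    satisfying = {
        (pname, price)
        for pno, pname, price in product
        if any(t == (cno, pno) for t in transaction)   # Y declared SOME
    }
    # Step 2: the qualifier only affects presentation order.
    return sorted(satisfying, key=lambda row: row[1], reverse=True)

uwa = products_bought_down_by_price(2)
print(uwa)
```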
The library functions, on the other hand, derive (compute) new values from the data
items extracted from the database. Another way to put this is that the result relation of the
basic Alpha command is further transformed by library functions to yield the final result.
Why would we want to do this? Consider for example that we have a simple set of
integers, say {1,2,3}. There are a variety of values we may wish to derive from it, such as
   the number of items, or cardinality, of the set (library function COUNT, ie.
   COUNT {1,2,3} = 3)
   the sum of the values in the set (library function TOTAL, ie. TOTAL {1,2,3} = 6)
   the minimum, or maximum, value in the set (library function MIN and MAX, ie.
   MIN {1,2,3} = 1, or MAX {1,2,3} = 3)
   the average of values in the set (library function AVERAGE, ie.
   AVERAGE {1,2,3} = 2)
Extending this idea to relations, and in particular the Alpha command, library functions
are applied to attributes in the target list, taking the form:
<library function>(<attribute name>)
As an example, consider the need to find the number of customers who bought the
product VDU. This is quite a practical requirement to help management track how well
some products are doing on the market. Pure relational calculus, however, has no facility
to do this. But using the library function COUNT in DSL Alpha, we can write the
following:
Range Transaction T;
Range Product P SOME;
Get AAA( COUNT(T.C#) ): T.P# = P.P# And P.Pname = VDU
Figure 8-7 highlights the tuples satisfying the WFF and shows the result relation.
Figure 8.7 Using library function (COUNT)
As another example, suppose we wanted to know how many products were bought by the
customer Codd. The data items to answer this question are in the quantity field (Qnt) of
the Transaction relation, but pure relational calculus can only retrieve the set of quantity
values associated with purchases by Codd. What we need is the sum of these values. The
library function TOTAL of DSL Alpha allows us to do this:
Range Transaction T; Range Customer C SOME;
Get BBB( TOTAL( T.Qnt ) ): T.C# = C.C# And C.Cname = Codd
Figure 8-8 summarises the execution of this Alpha command.
Figure 8.8 Using library function (TOTAL)
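The library functions correspond closely to ordinary aggregate functions. The Python sketch below first reproduces the {1,2,3} examples with built-in functions, then mimics the TOTAL command for Codd. The sample Transaction tuples and the customer dictionary are assumptions made for the sketch, not the contents of the figure.

```python
from statistics import mean

values = {1, 2, 3}
count = len(values)        # COUNT   {1,2,3}
total = sum(values)        # TOTAL   {1,2,3}
smallest = min(values)     # MIN     {1,2,3}
largest = max(values)      # MAX     {1,2,3}
average = mean(values)     # AVERAGE {1,2,3}

# Sample data, assumed for illustration only.
customers = {1: "Codd", 2: "Martin", 3: "Deen"}                     # C# -> Cname
transactions = [(1, 1, 20), (1, 2, 30), (2, 1, 25), (2, 2, 20)]     # (C#, P#, Qnt)

# Get BBB( TOTAL( T.Qnt ) ): T.C# = C.C# And C.Cname = Codd
# The WFF selects Codd's transactions; TOTAL then sums their quantities.
total_codd = sum(qnt for cno, _pno, qnt in transactions
                 if customers[cno] == "Codd")
print(count, total, smallest, largest, average, total_codd)
```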
As a final remark, we note that we have only sampled a few library functions. It is not
our aim to cover DSL Alpha comprehensively, but only to illustrate real DSLs based on
the relational calculus, and to look at added features or facilities needed to turn them into
practical languages.
8.2 Relational Calculus with Domain Variables
8.2.1 Domain Variables
As noted in the introduction, there is an alternative to using tuple variables as the basis
for a relational calculus, and that is to use domain variables. Recall that a domain (see
section 2.2) in the relational model refers to the current set of values of a given kind
under an attribute name and is defined over all relations in the database, ie. an attribute
name denotes the same domain in whatever relation it occurs. A domain variable ranges
over a designated domain, ie. it can be instantiated to, or hold, any value from that
domain.
For example, consider the domain Cname found in the Customer relation. This domain
has three distinct values as shown in Figure 8-9. If we now introduced a variable, 'Cn',
and designate it to range over Cname, then Cn can be instantiated to any of these values
(the illustration shows it holding the value 'Martin').
Figure 8.9 A Domain Variable
As with tuple variables:
   a domain variable can hold only one value at any time
   domain variables can be introduced for any domain in the database
   more than one domain variable may be used to range over the same domain
Note also that the value of a domain variable is an atomic value, ie. it does not comprise
component values as was the case with tuple variables. Thus there is no need for any
syntactic mechanism like the 'dot notation' to denote component atomic values of tuple
variables. It also means that in constructing simple comparison expressions, domain
variables appear directly without any embellishments, eg. A > 1000, B = London, C ≤
2000, D ≠ Paris, etc. (assuming of course that the variables A, B, C and D have been
designated to range over appropriate domains).
In a relational calculus with domain variables we can write predicates of the form:
<relation name>( x1, …, xn )
where
   <relation name> is the name of a relation currently defined in the database schema,
   and
   each xi is a domain variable ranging over a domain from the intension of <relation
   name>
Thus, suppose we have the situation as in Figure 8-10. It is then syntactically valid to
write:
Customer( A, B )
as 'Customer' is a valid relation name, and the variables 'A' and 'B' range over domains
that are in the intension of the Customer relation.
Figure 8.10 Variables ranging over domains of a relation
The meaning of such a predication can be stated as follows:
a predicate "<relation name>( x1, …, xn )" is true for some given
instantiation of each variable xi if and only if there exists a tuple in <relation
name> that contains corresponding values of the variables x1, …, xn
Thus, for example, Customer(A,B) is true when A=Codd and B=London, since the first
tuple of Customer has the corresponding values. In contrast, Customer(A,B) is false when
A=Codd and B=Paris, as no tuple in Customer has these values. In fact, the values that
make Customer(A,B) true are:
Cname   Ccity
Codd    London
Martin  Paris
Deen    London
that is, in relational algebra terms, a projection of <relation name> over the domains that
variables x1, …, xn range over.
A query in relational calculus with domain variables takes the form:
(<target list>) : (<logical expression>)
where
   <target list> is a comma-separated list of domain variable names, and
   <logical expression> is a truth-valued expression involving predicates and
   comparisons over domain variables and constants (the rules for constructing well-
   formed <logical expressions> will be detailed later)
The result of such a query is a set of instantiations of variables in <target list> that make
<logical expression> true.
For example, consider the database state in Figure 8-11 and the query
(x,y) : (Product(x,y) & y > 1000)
which can be paraphrased as "get product names and their prices for those products
costing more than 1000".
Figure 8.11 Database state for the query "(x,y) : (Product(x,y) & y > 1000)"
The only (x,y) instantiation satisfying the logical expression in this case is
(VDU,1200), ie. the result of the query is
x    y
VDU  1200
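To see how this differs from the tuple-variable style, the Python sketch below lets x and y range over whole domains and treats Product(x,y) as a membership test. The Product facts are assumed to be the two products used throughout the chapter (CPU at 1000 and VDU at 1200).

```python
# Product(x, y) facts as (Pname, Price) pairs; assumed database state.
product = {("CPU", 1000), ("VDU", 1200)}

# Domains over which the domain variables x and y range.
pname_domain = {x for x, _ in product}
price_domain = {y for _, y in product}

# (x,y) : (Product(x,y) & y > 1000)
# Every instantiation of (x, y) is tried; the predicate Product(x,y) is
# true exactly when the pair occurs as a tuple of the relation.
result = {
    (x, y)
    for x in pname_domain
    for y in price_domain
    if (x, y) in product and y > 1000
}
print(result)
```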
Domain variables, like tuple variables, may also be quantified with either the universal or
existential quantifier. Expressions involving quantified domain variables are interpreted
in the same way as for quantified tuple variables (see 7.1).
Consider the query: "get the names of products bought by customer #1". The required
data items are in two relations: Product and Transaction, as follows.
Product
P#  Pname  Price
1   CPU    1000
2   VDU    1200
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
2   2   29.01  20
We can paraphrase the query to introduce variables and make it easier to formulate the
correct formal query:
x is such a product name if there is a product number y for x and there is a customer
number z that purchases y and z is equal to 1
The phrase "x is such a product name" makes it clear that it is a variable for the 'Pname'
domain, and as this is our target data value, x must be a free variable. The phrase "there is
a product number y for x" clarifies two points: (1) that y is a variable for the P# domain,
and (2) that its role is existential. Similarly, the phrase "there is a customer number z that
purchases y" states that (1) z is a variable for the domain C#, and (2) its role is
existential. This can now be quite easily rewritten as the formal query (assuming the
variables x, y and z range over Pname, P# and C# respectively):
(x) : ∃y ∃z (Product(x,y) & Transaction(y,z) & z = 1)
where the subexpressions
   Product(x,y) captures the condition "there is a product number y for x"
   Transaction(y,z) captures the condition "there is a customer number z that purchases
   y", and
   z = 1 clearly requires that the customer number is 1
The reader should be able to work out the solution to the query as an exercise.
As a final example, consider the query: "get the names of customers who bought all types
of the company's products". The reader can perform an analysis of this query as was
done above to confirm that the relevant database state is as shown in Figure 8-12 and
that the correct formal query is:
(x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))
Figure 8.12 Database state for "(x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))"
This example illustrates a universally quantified domain variable y ranging over P#. For
this query, this means that the "Transaction(y,z)" part of the logical expression must
evaluate to true for every possible instantiation of y given a particular instantiation of z.
Thus, when x = Codd and z = 1, both Transaction(1,1) and Transaction(2,1) must
evaluate to true. They do in this case and Codd will therefore be part of the result set.
But, when x = Martin and z = 2, Transaction(1,2) is true but Transaction(2,2) is not! So
Martin is not part of the result set. Continuing in this fashion for every possible
instantiation of x will eventually yield the full result.
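The evaluation just described can be traced with a short Python sketch. It encodes only the facts stated in the text for Codd and Martin; the figure's remaining tuples are not reproduced, so the data below is a partial, assumed database state.

```python
# Facts stated in the text; the rest of Figure 8-12 is not reproduced here.
customer = {("Codd", 1), ("Martin", 2)}          # Customer(x, z)
transaction = {(1, 1), (2, 1), (1, 2)}           # Transaction(y, z): (P#, C#)

pname_holders = {x for x, _ in customer}         # domain for x (Cname)
p_domain = {1, 2}                                # domain for y (P#)
c_domain = {z for _, z in customer}              # domain for z (C#)

# (x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))
result = {
    x for x in pname_holders
    if all(                                      # ∀y: every product number
        any((x, z) in customer and (y, z) in transaction
            for z in c_domain)                   # ∃z: some customer number
        for y in p_domain
    )
}
print(result)
```

Martin fails on y = 2 because Transaction(2,2) is not among the facts, which matches the reasoning in the text.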
8.2.2 Well-Formed Formula
We have not formally defined above what constitutes valid <logical expression>s. We do
so here, but for the sake of a uniform terminology, we will use the phrase well-formed
formula (WFF) instead of <logical expression> just as we did for relational calculus with
tuple variables. Thus a formal query in relational calculus with domain variables takes the
form:
(<target list>) : (WFF)
where <target list> is a comma-separated list of free variable names, and a WFF is
defined by the following rules:
1. R(a, b, …) is a WFF if R is a relation name and a, b, … is a list of free variables
2. A θ M is a WFF if A is a variable, M is a constant or a variable, and
   θ ∈ {=, ≠, <, >, ≤, ≥}
3. F1 & F2 and F1 | F2 are WFFs if F1 and F2 are WFFs
4. (F) is a WFF if F is a WFF
5. ∃x (F(x)) and ∀x (F(x)) are WFFs if F(x) is a WFF with the variable x occurring free in it
As usual, the operator precedence for the '&' and '|' operators follows the standard
precedence rules, ie. '&' binds stronger than '|'. Thus,
'F1 & F2 | F3' ≡ '( F1 & F2 ) | F3'
Explicit use of parentheses, as in rule (4) above, is required to override this default
precedence. Thus if the intention is for the '|' operator to bind stronger in the above
expression, it has to be written as
F1 & ( F2 | F3 )
9 Data Sub-Language SQL
9.1 Introduction
In this chapter, we shall learn more about the essentials of the relational model's standard
language that will allow us to manipulate the data stored in the databases. This language
is powerful yet flexible, thus making it popular. It is in fact one of the factors that has led
to the dominance of the relational model in the database market today.
Following Codd's papers on the relational model and relational algebra and calculus
languages, research communities were prompted to work on the realisation of these
concepts. Several implemented versions of the relational languages were developed,
amongst the most noted were SQL (Structured Query Language), QBE (Query-By-
Example) and QUEL (Query Language). Here, we shall look into SQL in greater detail
as it is the most widely used relational language today. One often hears remarks that say,
"It's not relational if it doesn't use SQL". It is currently being standardised as a
standard language for the Relational Data Model.
SQL had its origins back in 1974 from IBM's System R research project as Structured
English Query Language (or SEQUEL) for use on the IBM VS/2 mainframes. It was
developed by Chamberlin et al. The name was subsequently changed to Structured
Query Language or SQL. It is pronounced "sequel" by some and S-Q-L by others. IBM's
products such as SQL/DS and the popular DB2 emerged from this. SQL is based on the
Relational Calculus with tuple variables. In 1986, the American National Standards
Institute (ANSI) adopted SQL standards, contributing to its widespread adoption. Whilst
many commercial SQL products exist with various "dialects", the basic command set and
structure remain fairly standard.
Although SQL is called a query language, it is capable of more than just getting data off
relations in the databases. It can also handle data updates and even data definitions: add
new data, change existing data, delete or create new structures. Thus SQL is capable of:
1. Data Query
   The contents of the database are accessed via a set of commands whereby useful
   information is returned to the end user
2. Data Maintenance
   The data within the relations can be created, corrected, deleted and modified
3. Data Definition
   The structure of the database and its relations can be defined and created
The end user is given an interface, as we have seen in Chapter 3, to interact with the
database via menus, query operations, report generators, etc. Behind this lies the SQL
engine that performs the more difficult tasks of creating relation structures, maintaining
the systems catalogues and data dictionary, etc.
SQL belongs to the category of the so-called Fourth-Generation Language (4GL) because
of its power, conciseness and low level of procedurality. As a non-procedural language it
allows the user to specify what must be done without detailing how it must be done. The
user's SQL request specification is then translated by the RDBMS into the technical
details needed to get the required data. As a result, the relational database is said to
require less programming than any other database or file system environment. This
makes SQL relatively easy to learn.
9.2 Operations
9.2.1 Mapping: The SQL Select Statement
The basic operation in SQL is called mapping, which transforms values from a database
to user requirements. This operation is syntactically represented by the following block:
Figure 9.1. SQL Select
This uncomplicated structure can be used to construct queries ranging from very simple
inquiries to more complex ones by essentially defining the conditions of the predicate. It
thus provides immense flexibility.
The SQL Select command combines the Relational Algebra operators Select, Project,
Join and the Cartesian Product. Because a single declarative-style command can be used
to retrieve virtually any stored data, it is also regarded by many to be an implementation
of the Relational Calculus. If we need to extract information from only one relation of the
database, we may encounter similarities and a few differences between the Relational
Calculus-based DSL Alpha and SQL. In this case we may substitute key words of DSL
Alpha for matching key words of SQL as follows:
Figure 9.2. Similarities of DSL Alpha and SQL Select
Let us refer back to the earlier example with the Customer relation.
Suppose we wish to "Get the names and phone numbers of customers living in London".
With DSL Alpha, we would specify this query as:
Range Customer X;
Get (X.Cname, X.Cphone): X.Ccity = London;
whereas in SQL its equivalent would be:
Select Cname, Cphone
From Customer
Where Ccity = 'London'
In either case, the result would be the retrieval of the following two tuples:
Cname  Cphone
Codd   2263035
Deen   2234391
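The SQL form of this query can be tried out with Python's built-in sqlite3 module, as in the sketch below. The table definition is an assumption (the text gives no column types), the attribute C# is quoted because '#' is not valid in a bare SQLite identifier, and an Order By is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema for the Customer relation; column types are a guess.
conn.execute('CREATE TABLE Customer ("C#" INTEGER, Cname TEXT, Ccity TEXT, Cphone TEXT)')
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    [(1, "Codd", "London", "2263035"),
     (2, "Martin", "Paris", "5555910"),
     (3, "Deen", "London", "2234391")],
)

# Select ... From ... Where, as in the text.
london_customers = conn.execute(
    "SELECT Cname, Cphone FROM Customer WHERE Ccity = 'London' ORDER BY Cname"
).fetchall()
print(london_customers)
conn.close()
```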
This simple query highlights the three most used SQL clauses:
1. The SELECT clause
   This effectively gets the columns that we are interested in getting from the relation. We
   may be interested in a single column, thus we may for example write "Select
   Cphone" if we only wish to list just the telephone numbers. We may also however be
   interested in listing the customer's name, city and telephone number; in which case,
   we write "Select Cname, Ccity, Cphone".
2. The FROM clause
   We need to identify the relations that our query refers to and this is done via the From
   clause. The columns that we have chosen from the Select clause must be found in the
   relation names of the From clause as in "From Customer".
3. The WHERE clause
   This holds the conditions that allow us to restrict the tuples of the relation(s). In the
   example, "Where Ccity = 'London'" asserts that we wish to select only the tuples which
   contain the city name that is equal to the value 'London'.
The system first processes the From clause (and all tuples of the chosen relation(s) are
placed in the processing work area), followed by the Where clause (which chooses, one
by one, the tuples that satisfy the clause conditions and eliminates those which do not),
and finally the Select clause (which takes the resultant tuples and displays only the values
under the Select clause column names).
9.2.2 Output Restriction
Most queries do not need every tuple in the relation but rather only a subset of the tuples.
As described previously in section 5.3, the following mathematical operators can be used
in the predicate to restrict the output:
Symbol Meaning
=       Equal to
<       Less than
>       Greater than
<=      Less than or equal to
>=      Greater than or equal to
<>      Not equal to
Additionally, the logical operators AND, OR and NOT may be used to place further
restrictions. These logical operators, along with parentheses, may be combined to
produce quite complex conditional expressions.
Suppose we need to retrieve the tuples from the Transaction relation such that the
following conditions apply:
1. The transaction date is before 26 Jan and the quantity is at least 25
2. Or, the customer number is 2
The SQL statement that could get the desired result would be:
Select C#, Date, Qnt From Transaction
Where (Date < '26.01' And Qnt >= 25) Or C# = 2
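The combined condition (before 26 Jan and at least 25, or customer number 2) can be checked with sqlite3 in Python, as sketched below. The sample Transaction tuples and column types are assumptions. Dates are kept as 'DD.MM' strings as in the text, so the '<' comparison is a plain string comparison that only behaves like a date comparison within a single month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TRANSACTION is an SQL keyword and '#' is not allowed in bare identifiers,
# so these names are quoted. Schema and rows are assumed for illustration.
conn.execute('CREATE TABLE "Transaction" ("C#" INTEGER, "P#" INTEGER, Date TEXT, Qnt INTEGER)')
conn.executemany(
    'INSERT INTO "Transaction" VALUES (?, ?, ?, ?)',
    [(1, 1, "21.01", 20), (1, 2, "23.01", 30),
     (2, 1, "26.01", 25), (2, 2, "29.01", 20)],
)

restricted = conn.execute(
    'SELECT "C#", Date, Qnt FROM "Transaction" '
    "WHERE (Date < '26.01' AND Qnt >= 25) OR \"C#\" = 2 "
    "ORDER BY Date"
).fetchall()
print(restricted)
conn.close()
```

The first sample tuple is rejected: its date qualifies but its quantity is below 25, and its customer number is not 2.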
9.2.3 Recursive Mapping: Sub-queries
The main idea of SQL is the recursive usage of the mapping operation instead of using
the existential and universal quantifiers. So far in our examples, we always know the
values that we want to put in our predicate. For example,
Where Ccity = 'London'
Where Date < '26.01' And Qnt > 25
Suppose we now wish to "Get the personal numbers of customers who bought the
product CPU". We could start off by writing the SQL statement:
Select C#
From Transaction
Where P# = ?
We cannot of course write "Where P# = CPU" because CPU is a part name not its number.
However as we may recall, part number P# is stored in the Transaction relation, but the
part name is in fact in another relation, the Product relation. Thus one needs to first of all
get the part number from Product via another SQL statement:
Select P#
From Product
Where Pname = 'CPU'
Having obtained the equivalent P#, the value is then used to complete the earlier query.
The way this is to be expressed is by writing the whole mapping operator in the right
hand side of comparison expressions of another mapping operator. This effectively means
the use of an inner block (sub-query) within the outer block (main query) as depicted in
the figure below.
Figure 9.3. Query nesting
The query in the outer block thus executes by using the value set generated earlier by the
sub-query of the inner block.
It is important to note that because the sub-query replaces the value in the predicate of the
main query, the value retrieved from the sub-query must be of the same domain as the
value in the main predicate.
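A nested query of this kind can be run with sqlite3 in Python, as in the sketch below. The inner block supplies the P# of the product named CPU and the outer block uses it in its own predicate. The schemas and sample tuples are assumptions, and the quoted names are needed only because of the '#' character and the keyword TRANSACTION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schemas and sample tuples, for illustration only.
conn.execute('CREATE TABLE Product ("P#" INTEGER, Pname TEXT, Price INTEGER)')
conn.execute('CREATE TABLE "Transaction" ("C#" INTEGER, "P#" INTEGER)')
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "CPU", 1000), (2, "VDU", 1200)])
conn.executemany('INSERT INTO "Transaction" VALUES (?, ?)',
                 [(1, 1), (1, 2), (2, 1), (2, 2)])

# Outer block: customers in Transaction; inner block: the P# of 'CPU'.
# The inner block returns a P# value, the same domain as the outer P#.
cpu_buyers = conn.execute(
    'SELECT "C#" FROM "Transaction" '
    'WHERE "P#" = (SELECT "P#" FROM Product WHERE Pname = \'CPU\') '
    'ORDER BY "C#"'
).fetchall()
print(cpu_buyers)
conn.close()
```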
9.2.4 Multiple Nesting
It is also possible that there may be two or more inner blocks within an outer SQL block.
For instance, we next wish to: "Get a date when customer Codd bought the product CPU".
The SQL statement we would start out with would probably look like this:
Select Date
From Transaction
Where P# = ?
And C# = ?
As in the earlier query, the part number P# can be obtained via the part name Pname in
the relation Product. The customer name, Codd, however has to have its equivalent
customer number which has to be obtained from C# of the relation Customer. Thus to
complete the above query, one would have to work two sub-queries first as follows:
Figure 9.4. Interpretation of sub-queries
Note that the original SQL notation utilises brackets or parentheses to determine inner
SQL blocks as:
Select Date
From Transaction
Where P# =
    ( Select P#
      From Product
      Where Pname = 'CPU' )
And C# =
    ( Select C#
      From Customer
      Where Cname = 'Codd' )
Similarly, an inner block may contain further inner SQL blocks. For instance, if we
wish to "Get the names of customers who bought more than 20 pieces of the product
CPU" we need to specify:
Select Cname
From Customer
Where C# =
    ( Select C#
      From Transaction
      Where P# =
          ( Select P#
            From Product
            Where Pname = 'CPU' )
      And Qnt > 20 )
Thus we may visualise the nesting of sub-queries as:
Select …
From …
Where <attribute1> <operator>
    ( Select <attribute1>
      From …
      Where <attribute2> <operator>
          ( Select <attribute2>
            From …
            Where <attribute3> <operator>
                ( Select <attribute3>
                  From …
                  Where … ) ) )
The number of inner blocks or levels of nesting may, however, be limited by the storage
available in the workspace of the DBMS in use.
9.2.5 Multiple Data Items
Standard comparison operators ( =, >, <, >=, <=, <> ) operate on two data items, as in x =
y or p >= 4. They cannot be applied to multiple data items. However, a particular SQL
block normally returns a set of values (i.e. not a single value which can be used in a
comparison).
For instance: "Get the product numbers of items which were bought by customers from
London".
Select P#
From Transaction
Where C# =
    ( Select C#
      From Customer
      Where Ccity = 'London' )
Given the sample database of the earlier examples, the result of the inner SQL block
would yield two values for C#, which are 1 and 3 (or more precisely, the set {1, 3}).
The outer SQL block, in testing C# = {1, 3}, would effectively test if {1, 2} = {1, 3} or
not. Thus the above SQL statement is not correct!
To overcome the error caused by the testing of multiple values returned by the sub-query,
SQL allows the use of comparison expressions in the form:
<attribute name> In <set of values>
<attribute name> Not In <set of values>
This logical expression is true if the current value of an attribute is included (or not
included, respectively) in the set of values.
For instance,
Smith In {Codd, Smith, Deen} is True,
and
Smith Not In {Codd, Smith, Deen} is False.
Thus in re-writing the earlier erroneous statement, we now replace the equal operator (=)
with the set membership operator 'In' as follows:
Select P#
From Transaction
Where C# In
    ( Select C#
      From Customer
      Where Ccity = 'London' )
This time the outer SQL block would effectively test C# In {1, 3}. The outer SQL block
would now retrieve only the P#s of transactions whose C# is in the set {1, 3}, i.e.
testing {1, 2} In {1, 3}. This would result in returning P# 1 only, which is the expected
right answer.
Illustrating with another example, consider the query to "Find the names of customers
who bought the product CPU". Its corresponding SQL statement would thus be:
Select Cname From Customer
Where C# In
    ( Select C# From Transaction
      Where P# In
          ( Select P# From Product
            Where Pname = 'CPU' ) )
Executing this step-by-step:
(1) From the inner-most block,
Select P# From Product
Where Pname = 'CPU'
would first yield P# 1 from Product, i.e. {1}
(2) The next block would thus be
Select C# From Transaction
Where P# In {1}
and this would yield C#s 1 and 2 (as they bought P# 1), i.e. {1, 2}
(3) And finally, the outer-most block would execute
Select Cname From Customer
Where C# In {1, 2}
which would result in the names of customers 1 and 2, which are Codd and Martin
respectively.
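The three steps can be replayed one at a time, and compared with the single nested statement, in a short sqlite3 sketch. The data is assumed to match the sample relations; cno, pno and the table name Trans are illustrative stand-ins for C#, P# and Transaction, and select_in is a hypothetical helper introduced only for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (cno INTEGER, cname TEXT);
    CREATE TABLE Product  (pno INTEGER, pname TEXT);
    CREATE TABLE Trans    (cno INTEGER, pno INTEGER);
    INSERT INTO Customer VALUES (1,'Codd'), (2,'Martin'), (3,'Deen');
    INSERT INTO Product  VALUES (1,'CPU'), (2,'VDU');
    INSERT INTO Trans    VALUES (1,1), (1,2), (2,1), (3,2);
""")

def select_in(sql, values):
    """Run a query whose IN ({}) list is filled in from a Python list."""
    marks = ",".join("?" for _ in values)
    return [row[0] for row in db.execute(sql.format(marks), list(values))]

# Step (1): inner-most block.
pnos = [r[0] for r in db.execute("SELECT pno FROM Product WHERE pname = 'CPU'")]
# Step (2): middle block, fed with the set from step (1).
cnos = select_in("SELECT DISTINCT cno FROM Trans WHERE pno IN ({}) ORDER BY cno", pnos)
# Step (3): outer-most block, fed with the set from step (2).
cnames = select_in("SELECT cname FROM Customer WHERE cno IN ({}) ORDER BY cno", cnos)

# The same query written as one nested statement.
nested = [r[0] for r in db.execute("""
    SELECT cname FROM Customer
    WHERE cno IN (SELECT cno FROM Trans
                  WHERE pno IN (SELECT pno FROM Product WHERE pname = 'CPU'))
    ORDER BY cno
""")]
print(pnos, cnos, cnames, nested)
```

Feeding each intermediate set into the next block by hand gives the same names as letting the DBMS evaluate the nested statement in one go.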
We next go on to a slightly more complex example. Suppose we now wish to "Get the
names of customers who bought the product CPU but did not buy the product VDU".
In SQL, the statement would be:
Select Cname From Customer
Where C# In
    ( Select C# From Transaction Where P# In
        ( Select P# From Product Where Pname = 'CPU' ) )
And C# Not In
    ( Select C# From Transaction Where P# In
        ( Select P# From Product Where Pname = 'VDU' ) )
Why don't you try to figure out, step-by-step, the sequence of results from the inner-most
blocks up to the final result of execution of the outer-most block?
Note that the comparison operators
<attribute name> In <set of values>
<attribute name> Not In <set of values>
are used instead of existential quantifiers (∃). This is an implementation of multiple logical
OR conditions which is more efficiently handled.
Similarly, comparison expressions
<attribute name> = All <set of values>
are used instead of universal quantifiers (∀).
This logical expression is valid (i.e. produces the logical value "True") if the collection of
attribute name values in the database includes the given set of values.
For instance, "Get personal numbers of those customers who bought all kinds of
company's products" would have the following SQL statement for it:
Select C# From Transaction
Where P# =
    All ( Select P#
          From Product )
The inner block would yield the set {1, 2} of P# values. Executing the outer block would
effectively test if the 3 customers in the Transaction relation, i.e. C# 1, 2 and 3, would
have P# in {1, 2}.
This test is as follows:
C#   Transaction (P# = 1)   Transaction (P# = 2)   All P#
1    True                   True                   True
2    True                   False                  False
3    False                  True                   False
The only customer that has P# equal to all P# as found in Product would be C# 1.
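Not every DBMS accepts the "= All" form with the meaning described here; SQLite, for one, has no All operator on sub-queries. As a hedged alternative, the same "bought every product" test can be sketched by counting: a customer qualifies when the number of distinct products they bought equals the number of products. This is a swapped-in technique rather than the notation of the text, it assumes every P# in the transactions also appears in Product, and the names cno, pno and Trans are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Product (pno INTEGER, pname TEXT);
    CREATE TABLE Trans   (cno INTEGER, pno INTEGER);
    INSERT INTO Product VALUES (1,'CPU'), (2,'VDU');
    INSERT INTO Trans   VALUES (1,1), (1,2), (2,1), (3,2);
""")

# A customer bought "all" products when their distinct P# count
# equals the total number of products.
all_buyers = [r[0] for r in db.execute("""
    SELECT cno FROM Trans
    GROUP BY cno
    HAVING COUNT(DISTINCT pno) = (SELECT COUNT(*) FROM Product)
    ORDER BY cno
""")]
print(all_buyers)
```

On the assumed sample data this agrees with the truth table above: only customer 1 has a transaction for both P# 1 and P# 2.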
9.3 Further Retrieval Facilities
9.3.1 Joining Relations
In the examples that have been used so far, our retrievals have been of values taken from
one relation, as in "Select C# From Transaction". However, often we have to retrieve
information from two or more relations simultaneously. In other words, a number of
relation names may be used in the From clause of SQL. For example, if we wish to
access the relations Customer and Transaction, we may write the SQL statement as
follows:
Select ...
From Customer, Transaction
Where ...
The target list in the Select clause may contain attributes from various relations, as in
Select Cname, Date, Qnt
From Customer, Transaction
Where ...
where, if you recall, Cname is an attribute of Customer and Date and Qnt are attributes of
Transaction.
Similarly, comparison expressions in the Where clause may include attribute names from
various relations,
Select Cname, Date, Qnt
From Customer, Transaction
Where (Customer.C# = Transaction.C#) And P# = 1
Note the so-called qualification technique, which is used to refer to attributes of the
same name belonging to different relations. Customer.C# refers to the C# of the
Customer relation whereas Transaction.C# refers to the C# of the Transaction relation.
Figure 9-5 Qualifying attributes
Thus the query "Get customer names, dates and number of pieces for transactions of the
product number 1" will result in:
It must be noted that the two (or more) relations must be combined on at least one
common linking attribute (as in the Relational Algebra's JOIN operator). As in the above
example, the link is established on C# as in the clause
Where Customer.C# = Transaction.C#
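A minimal sqlite3 sketch of this join follows. The rows are assumed sample data, and cno, pno, tdate and Trans are illustrative stand-ins for C#, P#, Date and Transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (cno INTEGER, cname TEXT);
    CREATE TABLE Trans    (cno INTEGER, pno INTEGER, tdate TEXT, qnt INTEGER);
    INSERT INTO Customer VALUES (1,'Codd'), (2,'Martin'), (3,'Deen');
    INSERT INTO Trans    VALUES (1,1,'21.01',20), (1,2,'23.01',30),
                                (2,1,'26.01',25), (3,2,'29.01',20);
""")

# cno exists in both relations, so it must be qualified by relation name.
rows = db.execute("""
    SELECT Customer.cname, Trans.tdate, Trans.qnt
    FROM Customer, Trans
    WHERE Customer.cno = Trans.cno AND Trans.pno = 1
    ORDER BY Trans.tdate
""").fetchall()
print(rows)
```

Leaving out the linking condition Customer.cno = Trans.cno would pair every customer with every transaction, which is why the text insists on at least one common linking attribute.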
9.3.2 Alias
In order to avoid a possible ambiguity in a query definition, SQL also allows the use of an
alias for the relation name in the From clause. The alias is an alternate name that is used
to identify the source relation, and the attribute names may include an alias as a prefix:
<alias>.<attribute name>
Suppose we use T and C as the aliases for the Transaction and Customer relations
respectively. We may use these to label the attributes as in:
Select ... From Customer C, Transaction T
Where C.C# = T.C# And ...
An alias is especially useful when we wish to join a relation to itself because of grouping,
as in the query to "Find the names and phone numbers of customers living in the same
city as the customer Codd":
Select C2.Cname, C2.Cphone
From Customer C1, Customer C2
Where C2.Ccity = C1.Ccity
And C1.Cname = 'Codd'
The resulting interpretation of the SQL statement is depicted in Figure 9-6 below:
Figure 9-6 Using an alias
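The self-join can be sketched with sqlite3 as follows, again on assumed sample rows and with cno as an illustrative stand-in for C#.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (cno INTEGER, cname TEXT, ccity TEXT, cphone TEXT);
    INSERT INTO Customer VALUES (1,'Codd','London','2263035'),
                                (2,'Martin','Paris','5555910'),
                                (3,'Deen','London','2234391');
""")

# C1 locates Codd's tuple; C2 ranges over all customers in the same city.
same_city = db.execute("""
    SELECT C2.cname, C2.cphone
    FROM Customer C1, Customer C2
    WHERE C2.ccity = C1.ccity
      AND C1.cname = 'Codd'
    ORDER BY C2.cno
""").fetchall()
print(same_city)
```

Note that Codd himself appears in the result, since his own tuple also satisfies C2.ccity = C1.ccity; a further condition on C2.cname would be needed to leave him out.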
9.4 Library Functions and Arithmetic Expressions
The SQL Select clause (target list) may contain also so-called SQL library functions that
will perform various arithmetic summaries such as to find the smallest value or to sum up
the values in a specified column. The attribute name for such library functions must be
derived from the relations specified in the From clause as follows:
Figure 9-7 Using a library function with SQL Select
The common SQL functions available are:
Function name   Task
COUNT           To count the number of tuples containing a specified attribute value
SUM             To sum up the values of an attribute
AVG             To find the arithmetic mean (average value) of an attribute
MAX             To find the maximum value of an attribute
MIN             To find the minimum value of an attribute
Examples
(1) "Get the average quantity of VDUs per transaction"
Select AVG (Qnt) From Transaction
Where P# =
    ( Select P# From Product
      Where Pname = 'VDU' )
Working first with the inner Select clause, we get a P# of 2 from the Product relation as
the part number for the product named VDU. Thus the query is now reduced to
Select AVG (Qnt) From Transaction
Where P# = 2
Accessing the Transaction relation now would yield the following two tuples
where the average quantity value is easily computed as (30+20)/2, which is 25.
(2) "Get the total quantity of VDUs transacted" would similarly be expressed as
Select SUM (Qnt) From Transaction
Where P# =
    ( Select P# From Product
      Where Pname = 'VDU' )
where the total value is easily computed as (30 + 20), giving 50.
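Both summaries can be checked in one sqlite3 sketch. The rows are assumed sample data consistent with the quantities 30 and 20 quoted above; cno, pno and Trans are illustrative stand-ins for C#, P# and Transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Product (pno INTEGER, pname TEXT);
    CREATE TABLE Trans   (cno INTEGER, pno INTEGER, qnt INTEGER);
    INSERT INTO Product VALUES (1,'CPU'), (2,'VDU');
    INSERT INTO Trans   VALUES (1,1,20), (1,2,30), (2,1,25), (3,2,20);
""")

# The inner block reduces to pno = 2; AVG and SUM then summarise
# the qnt column over the two matching tuples.
avg_qnt, total_qnt = db.execute("""
    SELECT AVG(qnt), SUM(qnt) FROM Trans
    WHERE pno = (SELECT pno FROM Product WHERE pname = 'VDU')
""").fetchone()
print(avg_qnt, total_qnt)
```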
An asterisk (*) in the Select clause is interpreted as "all attribute names of the relations
specified in the From clause".
Select * From Transaction
is equivalent to
Select C#, P#, Date, Qnt From Transaction
Thus a query to "Get all available information on customers who bought the product
VDU" can be written as:
Select * From Customer
Where C# In
    ( Select C# From Transaction
      Where P# In
          ( Select P# From Product
            Where Pname = 'VDU' ) )
The interpretation of this query would be worked out as shown in the following sequence
of accesses, starting from the access of the product relation to the Transaction and finally
to the Customer relation:
Figure 9-8 Working through 3 nested Selects
The outcome would be the following relation:
(3) "Get the total number of customers who bought the product VDU" would be written
as:
Select COUNT (*) From Customer
Where C# In
    ( Select C# From Transaction
      Where P# In
          ( Select P# From Product
            Where Pname = 'VDU' ) )
and this would yield a value of 2 for Count (*).
Arithmetic expressions are also permitted in SQL, and the possible operations include:
addition         +
subtraction      -
multiplication   *
division         /
Expressions may be written in the Select clause as:
Select C#, P#, Qnt*Price From Transaction, Product
Where Transaction.P# = Product.P#
which is used to "Get a total price for each transaction" resulting in:
Arithmetic expressions, likewise, can also be used as parameters of SQL library
functions. For example, Get a total price of all VDUs sold to customers may be written
as the following SQL statement:
Select SUM (Qnt*Price) From Transaction, Product
Where Transaction.P# = Product.P#
And Product.Pname = 'VDU'
Work this out. You should get an answer of 60000.
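The arithmetic can be verified with a small sqlite3 sketch. The VDU price of 1200 is inferred from the stated answer (60000 over 50 units), while the CPU price of 1000 is purely an assumption for illustration; cno, pno and Trans are stand-ins for C#, P# and Transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Product (pno INTEGER, pname TEXT, price INTEGER);
    CREATE TABLE Trans   (cno INTEGER, pno INTEGER, qnt INTEGER);
    -- Prices are assumptions (see the lead-in).
    INSERT INTO Product VALUES (1,'CPU',1000), (2,'VDU',1200);
    INSERT INTO Trans   VALUES (1,1,20), (1,2,30), (2,1,25), (3,2,20);
""")

# Arithmetic expression in the target list: one total per transaction.
per_transaction = db.execute("""
    SELECT Trans.cno, Trans.pno, Trans.qnt * Product.price
    FROM Trans, Product
    WHERE Trans.pno = Product.pno
    ORDER BY Trans.cno, Trans.pno
""").fetchall()

# The same expression used as the parameter of a library function.
vdu_total = db.execute("""
    SELECT SUM(Trans.qnt * Product.price)
    FROM Trans, Product
    WHERE Trans.pno = Product.pno AND Product.pname = 'VDU'
""").fetchone()[0]
print(per_transaction, vdu_total)
```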
The attribute names for both library functions and arithmetic expressions must be derived
from the relations specified in the From clause.
Thus, it should be noted that the following query definition is NOT correct, because
Product is referenced without being named in the From clause.
Select SUM (Qnt*Price) From Transaction
Where Transaction.P# = Product.P#
And Product.Pname = 'VDU'
Additionally, SQL also permits the use of library functions not only in the Select clause
but also in the Where clause as a part of comparison expressions.
The query to "Get all available information on such customers who bought the most
expensive product" would be:
Select * From Customer
Where C# In
    ( Select C# From Transaction
      Where P# In
          ( Select P# From Product
            Where Price = MAX (Price) ) )
9.5 Additional Facilities
9.5.1 Ordering
The result of a mapping operation may be sorted in ascending or descending order of the
selected attribute value.
The form of the Order clause is
Order By <attribute name> Up | Down
Examples
(1) Get a list of all transactions of the product CPU sorted in descending order of the
attribute Qnt
Select * From Transaction
Where P# In
    ( Select P# From Product
      Where Pname = 'CPU' )
Order By Qnt Down
The result would be
If instead, the last clause had been Order By Qnt Up, the result would be listed in
ascending order:
The Order By clause is only a logical sorting process; the actual contents of the original
relations are not affected.
A multi-level ordered sequence may also be produced, as in:
Select * From Transaction
Order By C# Up,
Qnt Down
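The ordering examples can be tried in SQLite, which spells the Up and Down options as ASC and DESC. The sketch below uses assumed sample rows, with cno, pno and Trans as illustrative stand-ins for C#, P# and Transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Product (pno INTEGER, pname TEXT);
    CREATE TABLE Trans   (cno INTEGER, pno INTEGER, qnt INTEGER);
    INSERT INTO Product VALUES (1,'CPU'), (2,'VDU');
    INSERT INTO Trans   VALUES (1,1,20), (1,2,30), (2,1,25), (3,2,20);
""")

# Single-level ordering: CPU transactions, largest quantity first.
cpu_desc = db.execute("""
    SELECT cno, qnt FROM Trans
    WHERE pno IN (SELECT pno FROM Product WHERE pname = 'CPU')
    ORDER BY qnt DESC
""").fetchall()

# Multi-level ordering: by customer ascending, then quantity descending.
multi = db.execute("""
    SELECT cno, pno, qnt FROM Trans
    ORDER BY cno ASC, qnt DESC
""").fetchall()

# The stored relation itself is untouched by either query.
print(cpu_desc, multi)
```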
9.5.2 Handling Duplicates
The result of an SQL mapping operation is however not perceived as a relation, i.e. it
may include duplicate tuples. Consider for example:
Select C# From Transaction
Where P# In
    ( Select P# From Product
      Where Price >= 1000 )
The result is actually
Imagine if we have thousands of transactions and yet a handful of customers. The result
would yield hundreds (even thousands) of duplicates. Fortunately, duplicate tuples can be
removed by using the Unique option in the Select clause of the operation as follows:
Select C# Unique From Transaction
Where P# In
    ( Select P# From Product
      Where Price >= 1000 )
This yields a much reduced result containing only the distinct (unique) customer
numbers.
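The effect of removing duplicates can be seen in a short sqlite3 sketch. SQLite spells the Unique option as DISTINCT and places it directly after Select. The prices below are assumptions chosen so that both products satisfy the condition; cno, pno and Trans are illustrative stand-ins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Product (pno INTEGER, pname TEXT, price INTEGER);
    CREATE TABLE Trans   (cno INTEGER, pno INTEGER);
    -- Assumed prices: both products cost at least 1000.
    INSERT INTO Product VALUES (1,'CPU',1000), (2,'VDU',1200);
    INSERT INTO Trans   VALUES (1,1), (1,2), (2,1), (3,2);
""")

query = """
    SELECT {} cno FROM Trans
    WHERE pno IN (SELECT pno FROM Product WHERE price >= 1000)
    ORDER BY cno
"""
with_duplicates = [r[0] for r in db.execute(query.format(""))]
unique_only     = [r[0] for r in db.execute(query.format("DISTINCT"))]
print(with_duplicates, unique_only)
```

Customer 1 appears twice in the first result because two of that customer's transactions qualify; the second result lists each customer number once.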
9.5.3 Grouping of Data
Usually, the result of a library function is calculated for the whole relation. For example,
consider wanting to find the total number of transactions,
Select Count (*)
From Transaction
Given this relation, the result of Count (*) is 4.
However, sometimes we need to calculate a library function, not for the entire relation,
but only for a subset of it. Such subsets of tuples are called groups. For instance, in the
relation Transaction, a collection of tuples with the same value of attribute C# is a
"group". In this case, C# is called the "Group By" attribute.
Figure 9-9 Grouping by customer numbers
The form of the Group By clause is
Group By <attribute name>
Examples
(1) "Get the list of all customer numbers and the quantity of products bought by each of
them". Note that the relation will have many transactions for any one customer. The
transactions for each customer will have to be grouped and the quantities totalled. This is
then to be done for each different customer. Thus the SQL statement would be:
Select C#, Sum (Qnt) From Transaction Group By C#
Thus all transactions with the same C# are grouped together and the quantities summed
to yield the summarised result:
Why would the following statement be impossible to execute?
Select * From Transaction Group By P#
(2) Normally, the Where clause would contain conditions for the selection of tuples as in:
Select Cname, Sum (Qnt) From Customer, Transaction
Where Customer.C# = Transaction.C#
Group By C#
This statement will "Get a list of all customer names and the quantity of products bought
by each of them" as follows:
Figure 9-10 Restriction followed by Grouping
9.5.4 Further Filtering: Having
We can further filter out unwanted groups generated by the Group By clause by using a
"Having" clause which will include in the final result only those groups that satisfy the
stated condition. Thus the additional "Having" clause provides a possibility to define
conditions for selection of groups.
For example, if we wish to just "Get such customers who bought more than 45 units of
products", the SQL statement would be:
Select * From Customer
Where C# In
    ( Select C# From Transaction
      Group By C#
      Having SUM (Qnt) > 45 )
Figure 9-11 Grouping followed by Restriction
In this case, those grouped customers with 45 units or less will not be in the final result.
The result will thus only be:
It is important to note that in the further filtering of values, the Where clause is used to
exclude values before the Group By clause is applied, whereas the Having clause is used
to exclude values after they have been grouped.
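Grouping and the Having filter can be sketched together in sqlite3. The rows are assumed sample data; cno, pno and Trans are illustrative stand-ins for C#, P# and Transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (cno INTEGER, cname TEXT);
    CREATE TABLE Trans    (cno INTEGER, pno INTEGER, qnt INTEGER);
    INSERT INTO Customer VALUES (1,'Codd'), (2,'Martin'), (3,'Deen');
    INSERT INTO Trans    VALUES (1,1,20), (1,2,30), (2,1,25), (3,2,20);
""")

# Group By: one summary tuple per customer number.
totals = db.execute("""
    SELECT cno, SUM(qnt) FROM Trans
    GROUP BY cno
    ORDER BY cno
""").fetchall()

# Having: keep only the groups whose total exceeds 45 units.
big_buyers = [r[0] for r in db.execute("""
    SELECT cname FROM Customer
    WHERE cno IN (SELECT cno FROM Trans
                  GROUP BY cno
                  HAVING SUM(qnt) > 45)
""")]
print(totals, big_buyers)
```

Moving the condition into a Where clause would not work here, because SUM(qnt) only exists once the tuples have been grouped.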
10 Query-By-Example (QBE)
10.1 Introduction
Data Query Languages were developed in the early seventies when the man-machine
interface was, by today's standards, limited and rudimentary. In particular, interaction
with the computer was through the processing of batched jobs, where jobs (computation
requests such as "run this program on that data", "evaluate this database query", etc) were
prepared off-line on some computer readable media (eg. punch cards), gathered into a
'batch' and then submitted for processing. No interaction took place between the user
and computer while the jobs were processed. End results were instead typically printed
for the user to inspect (again off-line) and to determine the next course of action. The
batch cycle continued until the user had obtained the desired results.
This was pretty much the way database queries were handled (see Figure 10-1). As data
coding devices were exclusively textual in nature and as processing was non-interactive,
queries had to be defined textually and each query had to be self-contained (ie. have all the
components required to complete the evaluation). The design of early languages was
influenced by, and in fact had to comply with, these constraints to be usable. Thus, for
example, the SQL query:
Select P# from Transaction
where C# IN ( Select C# from Customer
              where Ccity = London )
could be easily encoded as a job for batched submission. Needless to say, the turnaround
time in such circumstances was high, taking hours or even days before a user saw the
results of submitted queries. Many hours were typically spent off-line for a job that would
take seconds to evaluate, and it was even worse if you made an error in your submission!
Figure 10-1 Early batch processing of queries
Over the past 20 years, however, man-machine interfaces or human-computer interaction
(HCI) has progressed in leaps and bounds. Today, graphical user interfaces (GUI) are
taken for granted and the batched mode of processing is largely a past relic replaced by
highly interactive computing. Nevertheless, many database query languages today still
retain the old 'batch' characteristics and do not exploit features of interactive interfaces.
This is perhaps not surprising as, first, a large body of techniques for processing textual
languages had grown over the years (eg. compiling and optimisation) and, second, they
were well suited for embedding in more general purpose programming languages. The
latter especially provides great flexibility and power in database manipulation. Also, as
the paradigm shifted to interactive computing, its application to database queries was not
immediately obvious. But end-user computing is, in any case, increasing and many tasks
that previously required the skills of expert programmers are now being performed by
end-users through visual, interactive interfaces.
Query-By-Example (QBE) is the first interactive database query language to exploit such
modes of HCI. In QBE, a query is a construction on an interactive terminal involving
two-dimensional 'drawings' of one or more relations, visualised in tabular form, which
are filled in selected columns with 'examples' of data items to be retrieved (thus the
phrase query-by-example). The system answers the query by fetching data items based on
the given example and drawing the result on the same screen (see Figure 10-2).
Figure 10-2 A QBE query and its results
Typically, the 'drawing' of relations is aided by interactive commands made available
through pull-down menus. The menu selection is constrained to relations available
in the schema and thus eliminates errors in specifying relation structures or attribute
names as can occur in text-based languages like SQL. The interface provided is in effect
a structured editor for a graphical language.
For the remainder of this chapter, we will focus exclusively on the principal features of
QBE. In contrast to SQL, QBE is based on relational calculus with domain variables (see
8.2). To close this introduction, we should mention that QBE was developed by M.M. Zloof
at the IBM Yorktown Heights Laboratory.
10.2 Variables and Constants
In filling out a selected table with an example, the simplest item that can be entered under
a column is a free variable or a constant. A free variable in QBE must be an underlined
name (identifier) while a constant can be a number, string or other construction denoting
a single data value. A query containing such combinations of free variables and constants
is a request for a set of values instantiating the specified variables while matching the
constants under the specified columns.
As an example, look at Figure 10-4. Two variables are introduced in the query: a and b. By
placing a variable under a column, we are in effect assigning that variable to range over the
domain of that column. Thus, the variable a ranges over the domain P# while b ranges over
Pname.
The reader would have also noted that the variables are prefixed by "P.". In QBE, this is
required if the instantiation found for the specified variable is to be displayed, ie. the
prefix "P." may be thought of as a command to print. We will say more about prefix
commands like this later. Suffice it for now to say that if neither variable in Figure 10-4
was preceded by "P." then the result table would display nothing!
The query in Figure 10-4 is in fact equivalent to the following construction of relational
calculus with domain variables:
a : P#; b : Pname;
(a, b): ( Product (a, b) )
Assuming the usual Product relation extension as in previous chapters, the result of the
query is shown in Figure 10-5.
Let us consider another simple example and walk through the basic interactions necessary
to formulate the query and get the desired results. Suppose we wanted the names and
cities of all customers. The basic interactions are summarised in Figure 10-6.
Figure 10-6 Basic sequence of interactions
1. The user first uses a pull-down menu to select the appropriate relation(s)
containing the desired items. For this query, the Customer relation would seem the
most appropriate and selecting it would result in an empty template being displayed.
2. Inspecting the template, the user can ascertain that the desired data items are indeed in
the selected template (viz. the Cname and Ccity columns). Next, the user invents
variable identifiers (a and b) and types each under the appropriate column. This is all
that is required for this query.
3. Finally, the example is evaluated by the system and the results displayed on the screen.
This is the basic interaction even for more complex queries - select relation templates, fill
in example items, then let the system evaluate and display the results. Of course, with
more complex queries, more than one relation may be used and constructing the example
will usually involve more than just free variables, as we shall see in due course.
Free variables unconditionally match data values in their respective domains and thus, by
themselves, cannot express conditional queries, such as "get the names and phone
numbers of customers who live in London" (the italicised phrase is the condition). The
simplest specification of a condition in QBE is a constant, which is a single data value
entered under a column and interpreted as the condition:
<attribute name> = <constant>
Figure 10-7 Use of a constant to specify a condition in a query
Thus, the condition 'live in London' is quite simply captured by typing 'London' under
the 'Ccity' attribute in the Customer template, as shown in Figure 10-7.
More generally, the QBE syntax for conditions is:
[<comparator>] <constant>
where comparator is any one of '=', '≠', '<', '≤', '>', and '≥', and is interpreted as the
condition
<attribute name> <comparator> <constant>
If <comparator> is omitted, it defaults to '=' (as in the above example). As an example of
the use of other comparators, the query "get the names of products costing more than
1000" would be as shown in Figure 10-8.
Figure 10-8 Comparators in conditions
A query can also spread over several rows. This is the QBE equivalent form for
expressing complex conjunctions and disjunctions of conditions. To correctly interpret
multiple row queries, bear in mind the following:
• the ordering of rows is immaterial
• a variable identifier denotes the same instantiation wherever it occurs
The second point above is particularly important when a variable occurs in more than one
row. But let's consider first the simpler case where distinct rows do not share any
variable. In this case, the rows are unrelated and can be evaluated independently of one
another and the final result is simply the union of the results of each row. The collective
condition of such a query is thus a disjunction of the conditions specified in each row.
For example, consider the query: "Get the names of customers who either live in London
or Paris and whose personal number is greater than 1". The QBE query for this is shown
in Figure 10-9. Looking at row 1, note that two conditions are specified. These must be
satisfied by values from a single tuple, ie. the condition may be restated as
C# > 1 AND Ccity=London
Similarly, the condition specified in row 2 is
C# > 1 AND Ccity=Paris
As the two rows do not share variables, the collective condition is a disjunction
(C# > 1 AND Ccity=London) OR (C# > 1 AND Ccity=Paris)
which may be simplified to
C# > 1 AND (Ccity=London OR Ccity=Paris)
Figure 10-9 Multiple disjunctive rows
In contrast, if a variable occurs in more than one row, then the conditions specified for
each row must be true for the same value of that variable. Consider, for example, the
query in Figure 10-10 where the variable X occurs in both rows.
This means that a value of X must be found such that both row 1 and row 2 are
simultaneously satisfied. In other words, the condition for this query is equivalent to
Ccity = London AND C# > 1 AND C# < 4
(Given the above Customer relation, only the value "Deen" satisfies both rows in this
case.)
There is another possibly simpler way of describing the meaning and evaluation of
multiple row queries. Specifically, we treat each row as a sub-query, evaluate each
separately, then merge the results (a set of tuples for each sub-query) into a single table.
The merging of two sets of tuples is simply a union, if their corresponding sub-queries do
not share variables. Otherwise, their intersection over attributes that share variables is
computed instead.
Thus, for the query in Figure 10-9, the first sub-query (row 1) results in the set {Deen},
while that of the second sub-query (row 2) is {Martin}. As the sub-queries do not share
variables, the final result is simply the union of these results: {Deen, Martin}.
In contrast, for the query in Figure 10-10, the first sub-query (row 1) results in {Deen},
while the second (row 2) results in {Codd, Deen}. But as the sub-queries share the
variable X under attribute Cname, the merged result is the intersection of the two, ie.
{Deen}.
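The merging rule can be mimicked with ordinary Python sets, as a sketch of the idea rather than of how a QBE system is implemented. The conditions for the disjunctive rows come from the text; the exact contents of the two rows that share a variable are an assumed reading, chosen to be consistent with the stated sub-query results. The helper row_result is hypothetical.

```python
# Assumed Customer extension, as used throughout the examples.
customers = [
    {"cno": 1, "cname": "Codd",   "ccity": "London"},
    {"cno": 2, "cname": "Martin", "ccity": "Paris"},
    {"cno": 3, "cname": "Deen",   "ccity": "London"},
]

def row_result(condition):
    """Evaluate one query row as a sub-query: the set of matching names."""
    return {c["cname"] for c in customers if condition(c)}

# Rows that share no variable: the results are merged by union.
row1 = row_result(lambda c: c["cno"] > 1 and c["ccity"] == "London")
row2 = row_result(lambda c: c["cno"] > 1 and c["ccity"] == "Paris")
disjunctive = row1 | row2

# Rows that share a variable under Cname: merged by intersection.
# (Row contents here are an assumption; see the lead-in.)
shared1 = row_result(lambda c: c["cno"] > 1 and c["ccity"] == "London")
shared2 = row_result(lambda c: c["cno"] < 4 and c["ccity"] == "London")
conjunctive = shared1 & shared2

print(disjunctive, conjunctive)
```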
Before proceeding with the next section, we should just mention here some syntactical
constraints and options of QBE. First, the prefix P. can be used on any example item,
not just free variables. This underlines its earlier interpretation, ie. it is a command to
print or display the value of the item it prefixes (variable or comparison). Thus, if the
query in Figure 10-10 had been:
then the displayed result would be:
Note that, in general, prefixing a comparison prints the value that satisfies it. Of course,
in the case of a constant (implicitly a "=" comparison), the constant itself will be printed.
QBE also allows the user to simplify a query to only essential components. This is largely
optional and the user may choose (perhaps for greater clarity) to include redundant
constructs. Basically, there are two rules that can be applied:
1. If a particular variable is used only once, then it may be omitted. This saves the user
the trouble of otherwise having to invent names. Application of this rule is illustrated
in Figure 10-11, where it is applied to the first table (variables X1 and X2) to result in
the second. Note that unless this rule is kept in mind when reading simplified queries,
the appearance of the prefix "P." by itself may not only look odd but confusing too.
The prefixes in the second table must be correctly read as prefixing implicit but
distinct variables.
2. Duplicate prefixes and constants occurring over multiple rows may be "factorised"
into just one row. This is illustrated also in Figure 10-11 where it is applied to the
second table to result in the third. Again, unless this rule is kept in mind, queries such
as that in the third table may seem meaningless.
Figure 10-11 Simplifying queries
While the above rules are optional, the following is a syntactic constraint that must be
observed: if a free variable occurs in more than one row, then the prefix "P." may be used
on at most one of its occurrences.
The query below illustrates a valid construction - note that X occurs in two rows but only
one of them has the P. prefix.
10.3 Example Elements
Each row of a query table may be seen as an example of tuples from the associated
relation: specifically, tuples that match the row. A tuple matches a row if each attribute
value in the tuple matches the corresponding query item in the row. We have seen above
exactly when a value matches a query item. In summary:
1. Any value matches a blank query item or a variable
2. A value matches a comparison item if it satisfies the specified comparison
Using these rules, it is relatively easy to ascertain tuples exemplified by a query row. This
is illustrated in Figure 10-12. This is why variables in QBE are called example elements.
Figure 10-12 A query row is an example of matching tuples
In extracting facts from several relations that share attribute domains, example elements
are the key to finding related target tuples from the different relations. Consider the
query:
"Get the names and phone numbers of customers that have purchased both product
number 1 and product number 2".
Figure 10-13 Example elements over several relations
The Transaction relation has part of the information we are after. Specifically, we look
for records of purchase of each item by the same customer, ie. a tuple where the product
number is 1, another where the product number is 2, but both with the same customer
number. The entries in the Transaction template in Figure 10-13 capture this requirement.
However, this tells us only the customer number (the instantiation of X). Information
about the customer's name and phone number must be obtained from the Customer
relation. We need to ensure, though, that these values are obtained from a customer tuple
that represents the same customer found in the Transaction relation. In QBE, this is
simply achieved by specifying the same example element X in the customer number
column of the Customer relation (as shown in the Customer template of Figure 10-13).
The query in Figure 10-13 may be evaluated, assuming the following extensions of
Transaction and Customer, as follows.
Transaction                      Customer
C#   P#   Date    Qnt            C#   Cname    Ccity    Cphone
1    1    21.01   20             1    Codd     London   2263035
1    2    23.01   30             2    Martin   Paris    5555910
2    1    26.01   25             3    Deen     London   2234391
3    2    29.01   20
1. The subquery in the first row of the Transaction template is matched by the first and
third tuples of the Transaction relation, ie. X = {1,2}
2. The subquery in the second row of the Transaction template is matched by the second
and fourth tuples of the Transaction relation, ie. X = {1,3}
3. The result of evaluating the Transaction template is therefore {1,2} ∩ {1,3} = {1}.
4. The subquery in the Customer template matches all the tuples in the Customer
relation, ie. the entire relation is the result.
5. The final result is the intersection, over C#, of the results in (3) and (4), ie. {<Codd,
2263035>}
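The five evaluation steps can be traced with Python sets over the two extensions given above. This is only a sketch of the reasoning, not of a QBE implementation; the variable names are illustrative.

```python
# Extensions of Transaction (C#, P#, Date, Qnt) and
# Customer (C#, Cname, Ccity, Cphone) as listed in the text.
transactions = [
    (1, 1, "21.01", 20),
    (1, 2, "23.01", 30),
    (2, 1, "26.01", 25),
    (3, 2, "29.01", 20),
]
customers = [
    (1, "Codd",   "London", "2263035"),
    (2, "Martin", "Paris",  "5555910"),
    (3, "Deen",   "London", "2234391"),
]

# Steps 1 and 2: each Transaction row is a sub-query on the shared element X.
x_row1 = {cno for cno, pno, _, _ in transactions if pno == 1}
x_row2 = {cno for cno, pno, _, _ in transactions if pno == 2}

# Step 3: the rows share X, so their results are intersected.
x = x_row1 & x_row2

# Steps 4 and 5: the Customer row matches every tuple; intersecting over C#
# keeps only customers whose number is an instantiation of X.
result = {(cname, cphone) for cno, cname, _, cphone in customers if cno in x}
print(x_row1, x_row2, x, result)
```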
Figure 10-14 shows another example of a multi-table query and illustrates also the
relative ease in "reading" or paraphrasing QBE constructs. First, the Customer subquery
makes it clear, from the use of the "P." prefix, that the desired result is a set of customer
names and their phone numbers (the elements a and b respectively). The element X links
Customer to Transaction, ie. a customer included in the result must have purchased
something, denoted by yet another element Y. Furthermore, Y must be such that it is the
product CPU.
Figure 10-14 Another example of a multi-table query with example elements
In other words, the query can be paraphrased as:
"Get the names and phone numbers of those customers who bought the product CPU".
The preceding two examples should be enough for the reader to realise that (unadorned)
example elements spread across tables are in fact existentially quantified. For example,
there may be more than one Transaction tuple that can match the customer number found
in Customer, but we don't care which! The examples also show that, more generally, a
QBE query can spread over a number of rows of a single relation and across other
relations. A few further examples will serve to highlight QBE's power and features.
In Figure 10-15, we see a complex-looking QBE query. A closer examination will reveal,
however, that within each relation template the rows do not share elements, although the
elements are shared across relations. In fact, there are two disjoint sets of rows: one
taken from the first row of each relation and the other from the second row of each
relation.
The first set is actually equivalent to the QBE query in Figure 10-14.
Figure 10-15 Disjunctive multi-table query
The second differs only in the specified product (replace 'CPU' by 'VDU' in the above
paraphrased query). By analogy with earlier constructions involving unrelated multiple
rows, this type of construction therefore denotes a disjunctive query. In other words,
combining the two sets of rows yields the query:
Get the names and phone numbers of those customers who bought the product CPU or
the product VDU
Earlier, we've seen examples of elements used in multiple rows of the same relation.
However, given now an understanding of multi-table queries, such constructions can
equivalently be seen as a multi-table query involving the same table! This is shown in
Figure 10-16 below.
Figure 10-16 Multi-row (with shared elements) and equivalent multi-table form
Example elements may also be negated. Negated elements are written with the prefix '¬',
e.g. ¬X (read "not X"). The negated form can only be used if there is at least one
occurrence of the unnegated element elsewhere in the query. It is then interpreted as
matching any corresponding domain value that the unnegated form did not match.
Consider, for example, the illustration in Figure 10-17. There are two parts to the
illustration, labelled (a) and (b), each with a query table and an extension of the
corresponding relation. For purposes of this example, the two query tables constitute a
multi-table query, i.e. the example element X is the same one in both. Note, however, that
X is negated in (b).
Given the extension of Transaction as shown, the domain values matching the example
element X in (a) are {1,2}. Turning now to the subquery in (b), the specification of '¬X' in
it means that the only tuples that can match it are tuples such that the C# value is not in
{1,2}. Given the extension of Customer as shown, this means that only the third tuple
matches the example, i.e. the answers returned for elements A and B are 'Deen' and
'2234391' respectively.
Figure 10-17 Negated Element
10.4 The Prefix ALL
The prefix ALL can be applied to example elements. The occurrence of such an element
in an arbitrary query row of an arbitrary relation denotes a set of values such that each,
together with a particular instantiation of other items in the row, matches a tuple of the
relation. As an example, consider the following relation and query:
Figure 10-18 Example relation and query with ALL
In this case, there is only one other item in the query row: another element X. The set of
values denoted by 'All.W' therefore needs to be determined for each value that X takes.
Thus,
when X = 1, there are two possible values for W, i.e. 1 and 2. Thus, 'All.W' is the set
{1,2}
when X = 2, there is only one value for W, i.e. the set {1}
when X = 3, there is also only one value for W, i.e. the set {2}
If the query items had been prefixed with 'P.', the result displayed would be:
R1
I1  I2     …
1   {1,2}
2   {1}
3   {2}
In the simplest case, a query row contains only one element prefixed with ALL. In this
case, the element simply denotes the set of values in the corresponding domain. This is
illustrated in Figure 10-19 below.
Figure 10-19 Simple use of ALL
The use of ALL is more interesting when it involves multi-table queries. For example,
combining the query in Figure 10-18 and Figure 10-19 into a single query, we
effectively restrict X to just the value 1. This is because ALL.W occurs in both tables and
must denote the same set, and the only set satisfying this is {1,2}.
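The behaviour of ALL described above can be imitated by collecting, for each value of X, the W values that occur with it. This is a sketch only: the figures are not reproduced here, so both extensions below are hypothetical, chosen to agree with the walk-through (X = 1 gives {1,2}, X = 2 gives {1}, X = 3 gives {2}, and the single-column relation holds the values 1 and 2).

```python
from collections import defaultdict

# Hypothetical extension of the two-column relation, columns (X, W).
r1 = [(1, 1), (1, 2), (2, 1), (3, 2)]
# Hypothetical extension of the single-column relation used with plain ALL.W.
r2 = [(1,), (2,)]

def all_w_per_x(rows):
    """Collect the values denoted by ALL.W for each instantiation of X."""
    groups = defaultdict(list)
    for x, w in rows:
        groups[x].append(w)  # a list, because ALL keeps duplicate values
    return dict(groups)

per_x = all_w_per_x(r1)
print(per_x)

# Simple use of ALL: every value in the column.
all_w_simple = [w for (w,) in r2]

# Combined query: ALL.W must denote the same collection in both tables,
# which restricts X to the values whose collection equals all_w_simple.
matching_x = [x for x, ws in per_x.items() if sorted(ws) == sorted(all_w_simple)]
print(matching_x)
```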
It should be clear now that ALL is used in QBE in the same way that a universal
quantifier is used in relational calculus with domain variables. To highlight this, consider
the query:
"Get the names of customers who bought all types of the company's product"
Three relations are required to resolve this query: Customer, Transaction and Product.
The QBE query is shown in Figure 10-20 which is also annotated with explanations.
Figure 10-20 The query "Get the names of customers who bought all types of the
company's product"
One final word about ALL: it does not remove duplicate values, in contrast to an
unprefixed element which will return only unique matching values. This is illustrated in
Figure 10-21 below. We shall see in the next section how this property is used (in fact, is
necessary) in order to answer certain classes of practical queries.
Figure 10-21 ALL does not remove duplicates!
10.5 Library Functions
As with SQL, QBE also provides arithmetic operations and a number of built-in functions
which are necessary to manipulate the values in ways not otherwise within the scope of
relational calculus, e.g. to count the number of occurrences of returned values or to sum
them up. As you may expect by now, these operations are provided in the form of
prefixes. For example, suppose we wish to know how many transactions were related to
the purchase of a particular product, say product number 1. We can extract, for example,
all customer numbers in transactions involving product number 1:
Transaction                  Transaction (Query)           Transaction (result)
C#  P#  Date   Qnt           C#       P#  Date  Qnt        C#  P#  Date  Qnt
1   1   21.01  20            P.All.X  1                    1
1   2   23.01  30                                          2
2   1   26.01  25                                          1
1   1   29.01  20
But what we are really interested in is counting the number of such values. QBE allows
us to do this with the prefix CNT (equivalent to the function COUNT in SQL), which
counts the number of values matching the element it prefixes.
Thus the same query above, different only in the addition of the CNT prefix, achieves the
desired result:
Transaction                  Transaction (Query)              Transaction (result)
C#  P#  Date   Qnt           C#           P#  Date  Qnt       C#  P#  Date  Qnt
1   1   21.01  20            P.CNT.All.X  1                   3
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20
Note that the use of ALL is necessary. If the example element was simply "P.CNT.X",
the result would be 2! This is because without the ALL prefix, the values matching the
element X are returned with duplicate values removed (as illustrated earlier in Figure
10-21).
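The difference between counting with and without ALL has a close SQL analogue in COUNT versus COUNT(DISTINCT). The sketch below uses Python's built-in sqlite3 module with the Transaction extension from the text; the table name Trans and the column names Cno and Pno are stand-ins chosen here, since TRANSACTION is an SQL keyword and '#' is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER)")
conn.executemany(
    "INSERT INTO Trans VALUES (?, ?, ?, ?)",
    [(1, 1, "21.01", 20), (1, 2, "23.01", 30), (2, 1, "26.01", 25), (1, 1, "29.01", 20)],
)

# Analogue of P.CNT.All.X: duplicates are kept, so every matching tuple counts.
count_all = conn.execute(
    "SELECT COUNT(Cno) FROM Trans WHERE Pno = 1"
).fetchone()[0]

# Analogue of P.CNT.X: duplicate customer numbers are removed before counting.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT Cno) FROM Trans WHERE Pno = 1"
).fetchone()[0]

print(count_all, count_distinct)
```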
Another frequently used function is SUM, which sums up the values matching the
example element it prefixes. Suppose we wish to know the total number of product
number 1 that has been sold. Instead of counting the number of customers that purchased
it, we sum instead the quantities recorded in the relevant transactions. Thus:
Transaction                  Transaction (Query)              Transaction (result)
C#  P#  Date   Qnt           C#  P#  Date  Qnt                C#  P#  Date  Qnt
1   1   21.01  20                1         P.SUM.All.X                      65
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20
QBE also allows us to group tuples in a relation based on a specified example element.
That is, tuples with the same value of the example element are collected into one group
(there will be as many groups as there are distinct values matching the example element).
Grouping is specified using the G prefix (this is similar to the 'Group By' clause in SQL).
Thus:
Transaction                  Transaction (Query)           (resulting groups)
C#  P#  Date   Qnt           C#     P#  Date  Qnt          1   1   21.01  20
1   1   21.01  20            P.G.X                         1   2   23.01  30
1   2   23.01  30                                          1   1   29.01  20
2   1   26.01  25
1   1   29.01  20                                          2   1   26.01  25
Arithmetic functions may be applied to groups. Thus, if we wanted to know the total
number of items purchased by each customer, we can modify the above query as follows:
Transaction                  Transaction (Query)              Transaction (result)
C#  P#  Date   Qnt           C#     P#  Date  Qnt             C#  P#  Date  Qnt
1   1   21.01  20            P.G.X            P.SUM.All.B     1             70
1   2   23.01  30                                             2             25
2   1   26.01  25
1   1   29.01  20
Groups may additionally be selected based on conditions that are specified in an
additional column (this corresponds to the 'Having' clause of SQL). This additional
conditions column may be created by means of a special menu item in the QBE interface.
Thus, if we are only interested in finding customers who have purchased more than 45
items, our query would be as follows:
Transaction                  Transaction (Query)                          Transaction (result)
C#  P#  Date   Qnt           C#     P#  Date  Qnt    Conditions           C#  P#  Date  Qnt
1   1   21.01  20            P.G.X            All.B  SUM.All.B > 45       1
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20
In summary, grouping and arithmetic functions can be used in combination to obtain
useful derived values from the database.
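Since the text relates SUM, the G prefix and the conditions column to SQL's SUM, 'Group By' and 'Having', the three queries above can be cross-checked against SQL. The following sketch uses sqlite3 with the Transaction extension from this section; the names Trans, Cno and Pno are stand-ins chosen here for Transaction, C# and P#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER)")
conn.executemany(
    "INSERT INTO Trans VALUES (?, ?, ?, ?)",
    [(1, 1, "21.01", 20), (1, 2, "23.01", 30), (2, 1, "26.01", 25), (1, 1, "29.01", 20)],
)

# Total quantity of product number 1 sold (the P.SUM.All.X query).
total_p1 = conn.execute("SELECT SUM(Qnt) FROM Trans WHERE Pno = 1").fetchone()[0]

# Total quantity per customer (P.G.X together with P.SUM.All.B).
per_customer = conn.execute(
    "SELECT Cno, SUM(Qnt) FROM Trans GROUP BY Cno ORDER BY Cno"
).fetchall()

# Customers whose total exceeds 45 (the conditions column).
big_buyers = conn.execute(
    "SELECT Cno FROM Trans GROUP BY Cno HAVING SUM(Qnt) > 45 ORDER BY Cno"
).fetchall()

print(total_p1, per_customer, big_buyers)
```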
11 Architecture of Database Systems
11.1 Introduction
Software systems generally have an architecture, i.e. they possess a structure (form) and
organisation (function). The former describes identifiable components and how they
relate to one another structurally; the latter describes how the functions of the various
structural components interact to provide the overall functionality of the system as a
whole. Since a database system is basically a software system (albeit complex), it too
possesses an architecture. A typical architecture must define a particular configuration of
and interaction between data, software modules, meta-data, interfaces and languages (see
Figure 11-1).
The architecture of a database system determines its capability, reliability, effectiveness
and efficiency in meeting user requirements. But besides the visible functions seen
through some data manipulation language, a good database architecture should provide:
a) Independence of data and programs
b) Ease of system design
c) Ease of programming
d) Powerful query facilities
e) Protection of data
Figure 11-1 General Database System Architecture
The features listed above become especially important in large organisations where
corporate data are held centrally. In such situations, no single user department has
responsibility over, nor can they be expected to know about, all of the organisation's
data. This becomes the job of a Database Administrator (DBA) who has a daunting range
of responsibilities that include creating, expanding, protecting and maintaining the
integrity of all data while addressing the interests of different present and future user
communities. To create a database, a DBA has to analyse and assess the data
requirements of all users and from these determine its logical structure (database
schema). This, on the one hand, will need to be efficiently mapped onto a physical
structure that optimises retrieval performance and the use of storage. On the other, it
would also have to be mapped to multiple user views suited to the respective user
applications. For large databases, DBA functions will in fact require the full time services
of a team of many people. A good database architecture should have features that can
significantly facilitate these activities.
11.2 Data Abstraction
To meet the requirements above, a more sophisticated architecture is in fact used,
providing a number of levels of data abstraction or data definition. The database schema,
also known as Conceptual Schema, mentioned above represents an information model at
the logical level of data definition. At this level, we abstract out details like computer
storage structures, their restrictions, or their operational efficiencies. The view of a
database as a collection of relations or tables, each with fixed attributes and primary keys
ranging over given domains, is an example of a logical level of data definition.
The details of efficiently organising and storing objects of the conceptual schema in
computers with particular hardware configurations are dealt with at the internal (storage)
level of data definition. This level is also referred to as the Internal Schema. It maps the
contents of the conceptual schema onto structures representing tuples, associated key
organisations and indexes, etc, taking into account application characteristics and
restrictions of a given computer system. That is, the DBA describes at this level how
objects of the conceptual schema are actually organised in a computer. Figure 11-2
illustrates these two levels of data definition.
Figure 11-2 The Logical and Internal Levels of Data Abstraction
At a higher level of abstraction, objects from the conceptual schema are mapped onto
views seen by end-users of the database. Such views are also referred to as External
Schemas. An external schema presents only those aspects of the conceptual schema that
are relevant to the particular application at hand, abstracting out all other details. Thus,
depending on the requirements of the application, the view may be organised differently
from that in the conceptual schema, e.g. some tables may be merged, attributes may be
suppressed, etc. There may thus be many views created, one for each type of
application. In contrast, there is only one conceptual and one internal schema. All views
are derived from the same conceptual schema. This is illustrated in Figure 11-3 which
shows two different user views derived from the same conceptual schema.
Thus, modern database systems support three levels of data abstraction: External
Schemas (User Views), Conceptual Schema, and Internal (Storage) Schema.
The DDL we discussed in earlier chapters is basically a tool only for conceptual schema
definition. The DBA will therefore usually need special languages to handle the external
and internal schema definitions. The internal schema definition, however, varies widely
over different implementation platforms, i.e. there are few common principles for such
definition. We will therefore say little more about them in this book.
Figure 11-3 User Views (External Schema)
As to external schema definitions, note that in the relational model, the Data Sub-
Languages can be used to both describe and manipulate data. This is because the
expressions of a Data Sub-Language themselves denote relations. Thus, a collection of
new (derived) relations can be defined as an external schema.
For example, suppose the following relations are defined:
    Customer( C#, Cname, Ccity, Cphone )
    Product( P#, Pname, Price )
    Transaction( C#, P#, Date, Qnt )
We can then define an external view with a construct like the following:
    Define View My_Transaction_1 As
    Select Cname, Ccity, Date, Total_Sum=Price*Qnt
    From Customer, Transaction, Product
    Where Customer.C# = Transaction.C#
    & Transaction.P# = Product.P#
which defines the relation (view):
    My_Transaction_1( Cname, Ccity, Date, Total_Sum )
This definition effectively maps the conceptual database structure into a form more
convenient for a particular user or application. The extension of this derived table is itself
derived from the extensions of the source relations. This is illustrated in Figure 11-4
below.
Figure 11-4 External View Definition
This is a very important property of the relational data model: a unified approach to data
definition and data manipulation.
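For readers who want to try the idea, the view can be approximated in standard SQL. This is a sketch under stated assumptions rather than the book's own notation: it uses sqlite3, renames Transaction, C# and P# to Trans, Cno and Pno (TRANSACTION is an SQL keyword and '#' is not valid in an unquoted identifier), reuses the Customer and Transaction extensions from the previous chapter, and invents the Product rows and prices purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Cno INTEGER PRIMARY KEY, Cname TEXT, Ccity TEXT, Cphone TEXT);
CREATE TABLE Product  (Pno INTEGER PRIMARY KEY, Pname TEXT, Price INTEGER);
CREATE TABLE Trans    (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);

INSERT INTO Customer VALUES
    (1, 'Codd', 'London', '2263035'),
    (2, 'Martin', 'Paris', '5555910'),
    (3, 'Deen', 'London', '2234391');
-- Hypothetical products and prices, for illustration only.
INSERT INTO Product VALUES (1, 'CPU', 1000), (2, 'VDU', 1200);
INSERT INTO Trans VALUES
    (1, 1, '21.01', 20), (1, 2, '23.01', 30),
    (2, 1, '26.01', 25), (3, 2, '29.01', 20);

-- The external schema: a derived relation defined by a query.
CREATE VIEW My_Transaction_1 AS
    SELECT Cname, Ccity, Date, Price * Qnt AS Total_Sum
    FROM Customer, Trans, Product
    WHERE Customer.Cno = Trans.Cno AND Trans.Pno = Product.Pno;
""")

# The view is queried like any other relation; its extension is derived on demand.
rows = conn.execute("SELECT * FROM My_Transaction_1 ORDER BY Date").fetchall()
for row in rows:
    print(row)
```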
11.3 Data Administration
Functions of a DBA include:
Creation of the database
To create a database, a DBA has to analyse and assess the requirements of the users and
from these determine its logical structure. In other words, the DBA has to design a
conceptual schema and a first variant of an internal schema. When the internal schema is
ready, the DBA must load the database with actual data.
Acting as intermediary between users and the database
A DBA is responsible for all user facilities determined by external schemas, i.e. the DBA
is responsible for defining all external schemas or user views.
Ensuring data privacy, integrity and security
In analysing user requirements, a DBA must determine who should have access to which
data and subsequently arrange for appropriate privacy locks (passwords) for identified
individuals and/or groups. The DBA must also determine integrity constraints and
arrange for appropriate data validation to ensure that such constraints are never violated.
Last, but not least, the DBA must make arrangements for data to be regularly backed up
and stored in a safe place as a measure against unrecoverable data losses for one reason
or another.
At first glance, it may seem that a database can be developed using the conventional
"waterfall" technique. That is, the development process is a sequence of stages, with
work progressing from one stage to the next only when the preceding stage has been
completed. For relational database development, this sequence will include stages like
eliciting user requirements, analysing data relationships, designing the conceptual
schema, designing the internal schema, loading the database, defining user views and
interfaces, etc, through to the deployment of user facilities and database operations.
In practice, however, when users start to work with the database, the initial requirements
inevitably change for a number of reasons including experience gained, a growing
amount of data to be processed, and, in this fast changing world, changes in the nature of
the business it supports. Thus, a database needs to evolve, learning from experience and
allowing for changes in requirements. In particular, we may expect periodic changes to:
improve database performance as data usage patterns change or become clearer
add new applications to meet new processing requirements
modify the conceptual schema as understanding of the enterprise's perception of data
improves
Changing a database, once the conceptual and internal schemas have been defined and
data actually loaded, can be a major undertaking even for seemingly small conceptual
changes. This is because the data structures at the storage layer will need to be
reorganised, perhaps involving complete regeneration of the database. A good DBMS
should therefore provide facilities to modify a database with a minimum of
inconvenience. The desired facilities can perhaps be broadly described to cover:
performance monitoring
database reorganisation
database restructuring
By performance monitoring we mean the collection of usage statistics and their analysis.
Statistics necessary for performance optimisation generally fall under two headings: static
and dynamic statistics. The static statistics refer to the general state of the database and
can be collected by special monitoring programs when the database is inactive. Examples
of such data include the number of tuples per relation, the population of domains, the
distribution of relations over available storage space, etc. The dynamic statistics refer to
run-time characteristics and can be collected only when the database is running.
Examples include frequency of access to and updating of each relation and domain, use
of each type of data manipulation operator and associated response times, frequency of
disk access for different usage types, etc.
It is the DBA's responsibility to analyse such data, interpret them and where necessary
take steps to reorganise the storage schema to optimise performance. Reorganising the
storage schema also entails the subsequent physical reorganisation of the data themselves.
This is what we mean by database reorganisation.
The restructuring of the conceptual schema implies changing its contents, such as:
adding/removing data items (i.e. columns of a relation)
adding/removing entire relations
splitting/recombining relations
changing a relation's primary keys
… etc
For example, assuming the relations as on page 148, suppose we now wish to record also
for each purchase transaction the sales representative responsible for the sale. We will
need therefore to add a column into the Transaction relation, say with column name R#:
    Transaction( C#, P#, R#, Date, Qnt )
The intention, of course, is to record a unique value under this column to denote a
particular sales representative. Details of such sales representatives will then be given in a
new relation:
    Representative( R#, Rname, Rcity, Rphone )
A restructured conceptual schema will normally be followed by a database reorganisation
in the sense explained above.
11.4 Data Independence
Data independence refers to the independence of one user view (external schema) with
respect to others. A high degree of independence is desirable as it will allow a DBA to
change one view, to meet new requirements and/or to optimise performance, without
affecting other views. Relational databases with appropriate relational sub-languages
have a high degree of data independence.
For example, suppose that the view
    My_Transaction_1( Cname, Ccity, Date, Total_Sum )
as defined on page 148 no longer meets the user's needs. Let's say that Ccity and Date are
no longer important, and that it is more important to know the product name and quantity
purchased. This change is easily accommodated by changing the select-clause in the
definition thus:
    Define View My_Transaction_1 As
    Select Cname, Pname, Qnt, Total_Sum=Price*Qnt
    From Customer, Transaction, Product
    Where Customer.C# = Transaction.C#
    & Transaction.P# = Product.P#
If each view is defined separately over the conceptual schema, then as long as the
conceptual schema does not change, a view may be redefined without affecting other
views. Thus the above change will have no effect on other views, unless they were built
upon My_Transaction_1.
Data independence is also used to refer to the independence of user views relative to the
conceptual schema. For example, the reader can verify that the change in the conceptual
schema in the last section (adding the attribute R# to Transaction and adding the new
relation Representative) does not affect My_Transaction_1, neither the original nor the
changed view! In general, if the relations and attributes referred to in a view definition
are not removed in a restructuring, the view will not be affected. Thus we can
accommodate new (additive) requirements without affecting existing applications.
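The claim that additive restructuring leaves a view untouched can be checked directly. The sketch below is an illustration in SQLite rather than the book's notation: Trans, Cno, Pno and Rno stand in for Transaction, C#, P# and R#, and the data rows (including the product price) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Cno INTEGER PRIMARY KEY, Cname TEXT, Ccity TEXT, Cphone TEXT);
CREATE TABLE Product  (Pno INTEGER PRIMARY KEY, Pname TEXT, Price INTEGER);
CREATE TABLE Trans    (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);
-- Hypothetical data, for illustration only.
INSERT INTO Customer VALUES (1, 'Codd', 'London', '2263035');
INSERT INTO Product  VALUES (1, 'CPU', 1000);
INSERT INTO Trans    VALUES (1, 1, '21.01', 20);

CREATE VIEW My_Transaction_1 AS
    SELECT Cname, Pname, Qnt, Price * Qnt AS Total_Sum
    FROM Customer, Trans, Product
    WHERE Customer.Cno = Trans.Cno AND Trans.Pno = Product.Pno;
""")

before = conn.execute("SELECT * FROM My_Transaction_1").fetchall()

# Additive restructuring: a new column and a new relation.
conn.executescript("""
ALTER TABLE Trans ADD COLUMN Rno INTEGER;
CREATE TABLE Representative (Rno INTEGER PRIMARY KEY, Rname TEXT, Rcity TEXT, Rphone TEXT);
""")

after = conn.execute("SELECT * FROM My_Transaction_1").fetchall()
print(before == after)
```

The view names only attributes that still exist after the change, so its definition and its extension are the same before and after.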
Lastly, data independence may also refer to the extent to which we may change the
storage schema without affecting the conceptual or external schemas. We will not
elaborate on this as we have pointed out earlier that the storage level is too diverse for
meaningful treatment here.
11.5 Data Protection
There are generally three types of data protection that any serious DBMS must provide.
These were briefly described in Chapter 1 and we summarise them here:
1. Authorisational Security
This refers to protection against unauthorised access and includes measures such as user
identification and password control, privacy keys, etc.
2. Operational Security
This refers to maintaining the integrity of data, i.e. protecting the database from the
introduction of data that would violate identified integrity constraints.
3. Physical Security
This refers to procedures to protect the physical data against accidental loss or damage of
storage equipment, theft, natural disaster, etc. It will typically involve making periodic
backup copies of the database, transaction journalling, error recovery techniques, etc.
In the context of the relational data model, we can use relational calculus as a notation to
define integrity constraints, i.e. we define them as formulae of relational calculus. In this
case, however, all variables must be bound variables as we are specifying properties over
their ranges rather than looking for particular instantiations satisfying some predicate. For
example, suppose that for the Product relation, the Price attribute should only have a
value greater than 100 and less than 99999. This can be expressed (in DSL Alpha style)
as:
    Range Product X ALL
    (X.Price > 100 & X.Price < 99999)
This is interpreted as an assertion that must always be true. Any data manipulation that
would make it false would be disallowed (typically generating messages informing the
user of the violation). Thus, not only does the relational data model unify data definition
and manipulation, but its control as well.
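In SQL-based systems, a constraint of this kind is commonly written as a CHECK clause. The following sketch is an analogue in SQLite, not the DSL Alpha notation itself; the column name Pno and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        Pno   INTEGER PRIMARY KEY,
        Pname TEXT,
        Price INTEGER CHECK (Price > 100 AND Price < 99999)
    )
""")

# A row satisfying the assertion is accepted.
conn.execute("INSERT INTO Product VALUES (1, 'CPU', 1000)")

# A row that would make the assertion false is disallowed.
try:
    conn.execute("INSERT INTO Product VALUES (2, 'VDU', 50)")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("violation reported:", err)

stored = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(rejected, stored)
```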
In the area of physical security, database backups should of course be done periodically.
For this purpose, it is perhaps best to view a database as a large set of physical pages,
where each page is a block of fixed size serving as the basic unit of interaction between
the DBMS and storage devices. A database backup is thus essentially a copy of the entire
set of pages onto another storage medium that is kept in a secure and safe place. Aside
from the obvious need for backups against damage of storage devices, theft, natural
disasters and the like, backups are necessary to recover a consistent database in the event
of a database 'crash'. Such crashes can occur in the course of a sequence of database
transactions, particularly transactions that modify the database content.
Suppose, for example, that the last backup was done at time $t_0$, and subsequent to that, a
number of update transactions were applied one after another. Suppose further that the
first $n$ transactions were successfully completed, but during the $(n+1)$th transaction a
system failure occurred (e.g. disk malfunction, operating system crash, power failure, etc)
leaving some pages in a corrupted state. In general, it is not possible to just reapply the
failed transaction: the failure could have corrupted the updates performed by previous
transactions as well, or worse, it could have damaged the integrity of the storage model so
as to make some pages of the database unreadable! We have no recourse at this point but
to go back to the last known consistent state of the database at time $t_0$, i.e. the entire
contents of the last backup are reinstated as the current database. Of course, in doing so,
all the transactions applied after $t_0$ are lost.
At this point it may seem reasonable that, to guard against losing too much work,
backups should perhaps be done after each transaction; then at most only the work of
one transaction is lost in case of failure. However, many database applications today are
transaction intensive, typically involving many online users generating many transactions
frequently (e.g. an online airline reservation system). Many databases, on the other hand,
are very large and an entire backup could take hours to complete. While backup is being
performed the database must be inactive. Thus, it should be clear that this proposition is
impractical.
As it is clearly desirable that transactions since the last backup are also somehow saved in
the event of crashes, an additional mechanism is needed. Essentially, such mechanisms
are based on journalling successful transactions applied to a database. This simply means
that a copy of each transaction (or affected pages) is recorded in a sequential file as they
are applied to the database.
The simplest type of journalling is the Forward System Journal. In this, whenever a page
is modified, a copy of the modified page is also simultaneously recorded into the forward
journal.
To illustrate this mechanism, let the set of pages in a database be
$P = \{p_1, p_2, \ldots, p_n\}$. If the application of an update transaction $T$ on the database
changes $P_T$, where $P_T \subseteq P$, then $T(P_T)$ will be recorded in the forward journal. We
use the notation $T(P_T)$ to denote the set of pages $P_T$ after the transaction $T$ has changed
each page in $P_T$. Likewise, we write $T(p_i)$ to denote a page $p_i$ after it has been changed
by transaction $T$. Furthermore, if $T$ was applied successfully (i.e. no crash during its
processing), a separator mark, say ';', would be written to the journal. Thus, after a
number of successful transactions, the journal would look as follows:
$\langle\, T(P_{T_1}) \;;\; T(P_{T_2}) \;;\; \ldots \; T(P_{T_k}) \;;\, \rangle$
As a more concrete example, suppose transaction $T_1$ changed $\{p_1, p_2, p_3\}$, $T_2$ changed
$\{p_2, p_3, p_4\}$, and $T_3$ changed $\{p_3, p_4, p_5\}$, in that order and all successfully carried out.
Then the journal would contain:
$\langle\, T_1(\{p_1, p_2, p_3\}) \;;\; T_2(\{T_1(p_2), T_1(p_3), p_4\}) \;;\; T_3(\{T_2(T_1(p_3)), T_2(p_4), p_5\}) \;;\, \rangle$
Now suppose a crash occurred just after $T_3$ has been applied. The recovery procedure
consists of two steps:
a) replace the database with the latest backup
b) read the system journal in the forward direction (hence the term 'forward' journal)
and, for each set of journal pages that precedes the separator ';', use it to replace the
corresponding pages in the database. Effectively, this duplicates the effect of applying
transactions in the order they were applied prior to the crash.
The technique is applicable even if the crash occurred during the last transaction. In this
case, the journal for the last transaction would be incomplete and, in particular, the
separator ';' would not be written out. Say that transaction $T_3$ was interrupted after
modifying pages $p_3$ and $p_4$ but before it could complete modifying $p_5$. Then the journal
would look as follows:
$\langle\, T_1(\{p_1, p_2, p_3\}) \;;\; T_2(\{T_1(p_2), T_1(p_3), p_4\}) \;;\; T_3(\{T_2(T_1(p_3)), T_2(p_4), \ldots\}) \,\rangle$
In this case, recovery is exactly as described above except that the last incomplete block
of changes will be ignored (no separator ';'). Of course, the work of the last transaction is
lost, but this is unavoidable. It is possible, however, to augment the scheme further by
saving the transaction itself until its effects are completely written to the journal. Then $T_3$
above can be reapplied, as a third step in the recovery procedure.
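The forward recovery procedure can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real DBMS: pages are dictionary entries, page contents are strings that spell out which transactions produced them, and the journal is a flat list of (page, after-image) pairs interleaved with the separator.

```python
SEP = ";"  # separator mark written after each successful transaction

def recover_forward(backup, journal):
    """Rebuild the database from the last backup plus the forward journal."""
    db = dict(backup)          # step (a): reinstate the latest backup
    pending = {}               # after-images of the block currently being read
    for entry in journal:      # step (b): read the journal forwards
        if entry == SEP:
            db.update(pending)  # block is complete, so apply its pages
            pending = {}
        else:
            page, after_image = entry
            pending[page] = after_image
    # Any block left in `pending` has no separator and is ignored.
    return db

# Backup taken at time t0: five untouched pages.
backup = {f"p{i}": f"p{i}" for i in range(1, 6)}

# T1 and T2 completed; T3 crashed after writing p3 and p4 (no separator).
journal = [
    ("p1", "T1(p1)"), ("p2", "T1(p2)"), ("p3", "T1(p3)"), SEP,
    ("p2", "T2(T1(p2))"), ("p3", "T2(T1(p3))"), ("p4", "T2(p4)"), SEP,
    ("p3", "T3(T2(T1(p3)))"), ("p4", "T3(T2(p4))"),
]

recovered = recover_forward(backup, journal)
print(recovered)
```

The recovered state reflects T1 and T2 only; the partial work of T3 is discarded, as the text describes.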
While the forward journal can recover (almost) fully from a crash, its disadvantage is that
it is still a relatively slow process: hundreds or even thousands of transactions may have
been applied since the last full backup, and the corresponding journals of each of these
transactions must be copied back in sequence to restore the state of the database. In some
applications, very fast recovery is needed.
In these cases, the Backward System Journal will be the more appropriate journalling and
recovery technique. With this technique, whenever a transaction changes a page, the page
contents before the update are saved. As before, if the transaction successfully completes, a
separator is written. Thus the backward journal for the same example as above would be:
$\langle\, \{p_1, p_2, p_3\} \;;\; \{T_1(p_2), T_1(p_3), p_4\} \;;\; \{T_2(T_1(p_3)), T_2(p_4), \ldots\} \,\rangle$
Since each block of journal pages represents the state immediately before a transaction is
applied, recovery consists of only one step: read the journal in the backward direction
until the first separator and replace the pages in the database with the corresponding
pages read from the journal. Thus, the backward journal is like an 'undo' file: the last
block cancels the last transaction, the second last cancels the second last transaction, etc.
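Backward recovery can be sketched in the same toy model. The assumptions are again illustrative only: pages are dictionary entries, the journal holds (page, before-image) pairs and separators, and the crash is taken to have happened before $T_3$ touched $p_5$, so no before-image of $p_5$ appears in the last block.

```python
SEP = ";"  # separator mark written after each successful transaction

def undo_incomplete(db, journal):
    """Undo the interrupted transaction using the backward journal."""
    restored = dict(db)
    for entry in reversed(journal):  # read the journal backwards
        if entry == SEP:
            break                    # stop at the first separator
        page, before_image = entry
        restored[page] = before_image
    return restored

# State of the database at the time of the crash: T3 has overwritten p3 and p4.
crashed = {
    "p1": "T1(p1)",
    "p2": "T2(T1(p2))",
    "p3": "T3(T2(T1(p3)))",
    "p4": "T3(T2(p4))",
    "p5": "p5",
}

# Before-images saved by T1, T2 and the interrupted T3.
journal = [
    ("p1", "p1"), ("p2", "p2"), ("p3", "p3"), SEP,
    ("p2", "T1(p2)"), ("p3", "T1(p3)"), ("p4", "p4"), SEP,
    ("p3", "T2(T1(p3))"), ("p4", "T2(p4)"),
]

restored = undo_incomplete(crashed, journal)
print(restored)
```

Only the last block is read, which is why this form of recovery is fast: no backup needs to be reloaded and earlier blocks are never touched.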
Features such as those discussed above can significantly facilitate the management of
corporate data resources. Such features, together with the overall architecture and the
Data Model examined in previous chapters, determine the quality of a DBMS and are
thus often used as part of the principal criteria used in critical evaluation of competing
DBMSs.