DBMS

Introduction to Databases & Relational Data Model

Table of Contents
1 Introduction to Databases
2 Basic Relational Data Model
3 Data Updating Facilities
4 Normalisation
5 Relational Algebra (Part I)
6 Relational Algebra (Part II)
7 Relational Calculus (Part I)
8 Relational Calculus (Part II)
9 Data Sub-Language SQL
10 Query-By-Example (QBE)
11 Architecture of Database Systems
1 Introduction to Databases
1.1 Introduction
We live in an information age. By this we mean that, first, we accept the universal fact
that information is required in practically all aspects of human enterprise. The term
'enterprise' is used broadly here to mean any organisation of activities to achieve a stated
purpose, including socio-economic activities. Second, we recognise further the
importance of efficiently providing timely, relevant information to an enterprise, and the
importance of the proper use of technology to achieve that. Finally, we recognise that the
unparalleled development in the technology to handle information has changed and will
continue to change the way we work and live, ie. not only does the technology support
existing enterprises but it changes them and makes possible new enterprises that would not
have otherwise been viable.
The impact is perhaps most visible in business enterprises where there are strong
elements of competition. This is especially so as businesses become more globalised. The
ability to coordinate activities, for example, across national borders and time zones
clearly depends on the timeliness and quality of information made available. More
important perhaps, strategic decisions made by top management in response to perceived
opportunities or threats will decide the future viability of an enterprise, one way or the
other. In other words, in order to manage a business (or any) enterprise, future
development must be properly estimated. Information is a vital ingredient in this regard.
Information must therefore be collected and analysed in order to make decisions. It is
here that the proper use of technology will prove to be crucial to an enterprise. Up-to-date
management techniques should include computers, since they are very powerful tools for
processing large quantities of information. Collecting and analysing information using
computers is facilitated by current Database Technology, a relatively mature technology
which is the subject of this book.
1.2 Information Model
Information stored in computer memory is called data. In current computer systems, such
data can (persistently) reside on a number of memory devices, the most common of which
are floppy disks, CD-ROMs, and hard disks.
Data that we store and manipulate using computers are meaningful only to the extent that
they are associated with some real world object in a given context. Take, for example, the
number '23'. This is a piece of data, but by itself a meaningless quantity. If it was
associated with, say, a person and interpreted to denote that person's age (in years), then
it begins to be more meaningful. Or, if it was associated with, say, an organisation that
sells electronic goods and interpreted to mean the number of television sets sold in a
given month, then again it becomes more meaningful. Notice that in both preceding
examples, other pieces of data had to be brought into context: a person, a person's age, a
shop, television sets, a given month, etc.
If the data is a collection of related facts about some enterprise (eg. a business, an
organisation, an activity, etc), then it is called a database. The data stored need not
include every conceivable fact about that enterprise. Usually, only facts relevant
to some area of an enterprise are captured and organised, typically to provide information
to support decision making at various levels (operational, management, etc). Such a
constrained area of focus is also often referred to as the problem domain or domain of
interest, and is typical of databases. In this sense, a database is an information model of
some (real-world) problem domain.
1.2.1 Entities
Information models operate on so-called entities and entity relationships. In this section
we will clarify what an entity is. Entity relationships are described in 1.2.2.
An entity is a particular object in the problem domain. For example, we can extend the
electronics organisation above to identify three distinct entities: products, customers and
sales representatives (see ). They are distinct from one another in the sense that each has
characteristic properties or attributes that distinguish it from the others. Thus a product
has properties such as type, function, dimensions, weight, brand name, cost and price; a
customer has properties such as name, city of residence, age, credit rating, etc.; and a
sales representative has properties such as name, address, sales region, basic salary, etc.
Each entity is thus modelled in the database as a collection of data items corresponding to
its relevant attributes. (Note that we distinguish between entities even if in the real world
they are from the same class of objects. For example, a customer and a sales
representative are both people, but a customer is a person who purchases goods while a
sales representative is one who sells goods. The different 'roles' played distinguish
each from the other.)
Note also the point made earlier that an information model captures only what is relevant
in the given problem domain. Certainly, there are other entities in the organisation
(regional offices, warehouses, store keepers, distributors, etc) but these may be irrelevant
in the context of, say, analysing sales transactions and are thus omitted from the
information model. Even at the level of entity attributes, not all conceivable properties
need be captured as data items. A customer's height, weight, hair colour, hobby, formal
qualification, favourite foods, etc, are probably irrelevant and can thus be omitted from the
model.
Strictly speaking, the objects we referred to above as entities are perhaps more accurately
called entity classes because they each denote a set of objects (individual entities), each
of which exhibits the properties/attributes described for the class. Thus the entity class
'customer' is made up of individual entities, each of which has attributes 'name', 'city of
residence', 'age', etc. Every individual entity will then have these attributes but one
individual will differ from another in the values (data items) associated with attributes.
For example, one customer entity might have the value 'Smith' as its 'name' attribute,
while another might have the value 'Jones'.
Figure 1.1 Problem domain entities and their attributes
Notice now that in our information model an attribute is really a pair: an attribute
description or name (such as 'age') and an attribute value (such as '56'), or simply, an
'attribute-value' pair. An individual entity is completely modelled only when all its
attribute descriptions have been associated with appropriate attribute values. The
collection of attribute-value pairs that model a particular individual entity is termed a
data object. Figure 1-2 illustrates three data objects in the database, each being a
complete model of an individual from its corresponding entity class.
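As a minimal sketch of this idea (not the notation of any particular DBMS), a data object can be pictured as a mapping from attribute names to attribute values. The attribute list and the values other than 'Smith', 'Jones', 56 and 23 are assumptions made up for illustration.

```python
# Hypothetical entity class: just the attribute names (labels) of a customer.
CUSTOMER_ATTRIBUTES = ("name", "city of residence", "age")

# Two data objects: each pairs every attribute name with a value.
# The cities are invented for illustration.
smith = {"name": "Smith", "city of residence": "London", "age": 56}
jones = {"name": "Jones", "city of residence": "Paris", "age": 23}


def is_complete(data_object, attributes):
    """A data object is complete when every attribute has a value."""
    return all(data_object.get(attr) is not None for attr in attributes)


print(is_complete(smith, CUSTOMER_ATTRIBUTES))             # True
print(is_complete({"name": "Deen"}, CUSTOMER_ATTRIBUTES))  # False
```

Both objects share the same attribute names and differ only in the values attached to them, which is the distinction between an entity class and its individual entities.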
Figure 1.2. Data Objects model particular entities in the real world
1.2.2 Entity Relationships
An entity by itself is often not as interesting or as informative as when it relates in some
way to some other entity or entities. A particular product, say a CD-ROM drive, itself
only tells us about its intrinsic properties as recorded in its associated data object. A
database, however, models more than individual entities in the problem domain. It also
models relationships between entities.
In the real world, entities do not stand alone. A CD-ROM drive is supplied by a supplier,
is stored in a warehouse, is bought by a customer, is sold by a sales representative, is
serviced by a technician, etc. Each of these is an example of a relationship, a logical
association between two or more entities.
Figure 1.3. Relationships between entities
Figure 1-3 illustrates such relationships by using links labelled with the type of
association between entities. In the figure, a representative sells a product and a customer
buys a product. Taken together, these links and the entities they link model a sales
transaction: a particular sales transaction will have a product data object related (through
the 'sells' relation) to a representative data object and (through the 'buys' relation) to a
customer data object.
As with entity attributes, there are many more relationships than are typically captured in
an information model. The choices are of course based on judgements of relevance given a
problem domain. Once choices are made and the database populated with particular data
objects and relationships, it can then be used to analyse the data to support decision
making. In the simple example developed so far, there are already many types of analysis
that can be carried out, eg. the distribution of sales of a particular product type by sales
region, the performance of sales representatives (measured perhaps by total value of sales
in some time interval), product preferences by customer age groups, etc.
1.3 Database Management
The real world is dynamic. As an organisation goes about its business, entities are
created, modified or destroyed. The same is true of entity relationships. This is easy to see
even for the simple problem domain above, eg. when a sale is made, the product sold is
then logically linked to the customer that bought it and to the representative that sold it.
Many sales transactions could take place each day and thus many new logical links are
created between many individual entities. New entities can also be introduced, eg. a new
customer arrives on the scene, a new product is offered, or a new salesperson is hired.
Likewise, some entities may no longer be of concern to the organisation, eg. a product is
discontinued, a salesperson quits or is fired, etc (these entities may still exist in the real
world but have become irrelevant for the problem domain). Clearly, an information
model must also change to reflect the changes in the problem domain that it models.
If the problem domain is small, involving only a few entities and relationships, and the
dynamic changes are relatively few or infrequent, manual methods may be sufficient to
maintain an accurate record of the state of the business. But if hundreds or thousands of
entities are involved and the business is very dynamic, then maintaining accurate records
of its state becomes more of a problem. This is when computers with their immense
power to handle large amounts of information become crucial to an organisation.
Frequently, it is not just a question of efficiency, but of survival, especially in intensely
competitive business sectors.
The need to use computers to efficiently and effectively store databases and to keep them
current has led, over the years, to the development of special software packages called
Database Management Systems (DBMS). A DBMS enables users to create, modify, access and
protect their databases (Figure 1-4).
Figure 1.4 A DBMS is a tool to create and use databases
In other words, a DBMS is a tool to be applied by users to build an accurate and useful
information model of their enterprise.
Conceptually, database management is based on the idea of separating a database
structure from its contents. Quite simply, a database structure is a collection of static
descriptions of entity classes and relationships between them. At this point, it is perhaps
simplest to think of an entity class description as a collection of attribute labels. Entity
contents can then be thought of as the values that get associated with attribute labels,
creating data objects. In other words, the distinction between structure and content is little
more than the distinction made earlier between attribute label and attribute value.
Figure 1.5 Separation of structure from content
Relationship descriptions likewise are simply labelled links between entity descriptions.
They specify possible links between data objects, ie. two data objects can be linked only
if the database structure describes a link between their respective entity classes. Figure 1-5
illustrates this separation of structure from content.
The database structure is also called a schema (or meta-structure, because it describes
the structure of data objects). It predefines all possible states of a database, in the sense
that no state of the database can contain a data object that is not the result of instantiating
an entity schema, and likewise no state can contain a link between two data objects unless
such a link was defined in the schema.
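The way a schema constrains database states can be sketched as follows. This is only an illustration under stated assumptions: the entity classes, attribute labels and link triples are hypothetical, and a real DBMS would express them in its own DDL.

```python
# Hypothetical schema: entity classes as sets of attribute labels,
# plus the labelled links permitted between entity classes.
ENTITY_CLASSES = {
    "Customer": {"name", "age"},
    "Product": {"type", "price"},
    "Representative": {"name", "sales region"},
}
LINKS = {
    ("Customer", "buys", "Product"),
    ("Representative", "sells", "Product"),
}


def valid_object(entity_class, data_object):
    """A data object must instantiate a declared entity class exactly."""
    labels = ENTITY_CLASSES.get(entity_class)
    return labels is not None and set(data_object) == labels


def valid_link(from_class, label, to_class):
    """A link is allowed only if the schema declares it."""
    return (from_class, label, to_class) in LINKS


print(valid_object("Customer", {"name": "Smith", "age": 56}))  # True
print(valid_object("Customer", {"name": "Smith"}))             # False
print(valid_link("Customer", "buys", "Product"))               # True
print(valid_link("Customer", "sells", "Product"))              # False
```

The structure (the two constants) never holds any customer or product; it only decides which contents are admissible.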
Figure 1.6 Architecture of Database Systems
Moreover, data manipulation procedures can be separated from the data as well. Thus the
architecture of database systems may be portrayed as in Figure 1-6.
1.4 Database Languages
We see from the foregoing that to build a database, a user must
1. Define the Database Schema
2. Apply a collection of operators supported by the DBMS to create, store,
retrieve and modify data of interest
A typical DBMS would provide tools to facilitate the above tasks. At the heart of these
tools, a DBMS typically maintains two closely related languages:
1. A Data Description Language (DDL), which is used to define database schemas, and
2. A Data Manipulation Language (DML), which allows the user to manipulate data
objects and relationships between them in the context of given database schemas
These languages may vary from one DBMS to another, in their underlying data model,
complexity, functionality, and ease of use (user interface).
So far, we have talked about 'users' as if they were all equal in interacting with a DBMS.
In actual fact, though, there may be several types of users distinguished by their role (a
division of labour, often necessary because of highly technical aspects of DBMSs). For
example, an organisation that uses a DBMS will normally have a Database Administrator
(DBA) whose job is to create and maintain a consistent set of database schemas to satisfy
the needs of different parts of the organisation. The DBA is the principal user of the
DDL. Then there are application developers who develop specific functions around the
database (eg. product inventory, customer information, point-of-sale transaction handling,
etc). They are the principal users of the DML. And finally, there are the end-users who
use the applications developed to support their work in the organisation. They normally
don't see (and don't care to know about!) the DDL or the DML.
Figure 1.7 (notional) DDL definition
The DDL is a collection of statements for the description of data types. The DBA must
define the target database structure in terms of these data types.
For instance, the data object, attribute and link mentioned above are data types, and hence
may be perceived as a simple DDL. Thus the data structures in Figure 1- are notionally
DDL descriptions of a database schema, as illustrated in Figure 1-7 ('notional' because
the actual language will have specific syntactical constructs that may differ from one
DBMS to another).
A DML is a collection of operators that can be applied to valid instances (ie. data objects)
of the data types in the schema. As illustrated in Figure 1-8, the DML is used to
manipulate instances, including the creation, modification and retrieval of instances.
(Like the DDL above, the illustration here is notional; more concrete forms of these
languages will be covered in later sections of this book.)
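To make the DDL/DML split a little more tangible, here is a small sketch using SQL through Python's standard sqlite3 module. The choice of SQLite, the table name and the column names are assumptions for illustration only; the notional languages in the figures are not tied to any product.

```python
import sqlite3

# An in-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# DDL: describe the structure (a hypothetical Customer entity class).
conn.execute("CREATE TABLE Customer (name TEXT, city TEXT, age INTEGER)")

# DML: create, modify and retrieve instances within that structure.
conn.execute("INSERT INTO Customer VALUES (?, ?, ?)", ("Smith", "London", 56))
conn.execute("UPDATE Customer SET age = ? WHERE name = ?", (57, "Smith"))
rows = conn.execute("SELECT name, age FROM Customer").fetchall()

print(rows)  # [('Smith', 57)]
```

The single CREATE TABLE statement plays the role of the schema definition, while the remaining statements operate only on contents.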
Figure 1.8 DML manipulations of instances
1.5 Data Protection
Databases can be a major investment. There are costs, obviously, associated with the
hardware and specialised software such as a DBMS. Less obvious, perhaps, are costs of
creating databases. Large and complex databases can require many man-years of analysis
and design effort involving specialist skills that may not be present in the organisation.
Thus expensive consultants and other technical specialists may have to be brought in.
Furthermore, in the long-term, an organisation must also develop internal capability to
maintain the databases and deal with changing requirements. This usually means hiring
and retaining a group of technical specialists, such as DBAs, who need to be trained (and
re-trained to keep up with the technology). End-users too will need to be trained to use
the system properly. In other words, there are considerable running costs as well. In all,
databases can be very expensive.
Aside from the expenses above, databases often are crucial to a business. Imagine what
would happen, say, if databases of customer accounts maintained by a bank were
(accidentally or maliciously) destroyed! Because of these actual and potential costs,
databases must be deliberately protected against any conceivable harm.
Generally, there are three types of security that must be put in place:
1. Physical Protection: these are protective measures to guard against natural
disasters (eg. fire, floods, earthquakes, etc), theft, accidental damage to
equipment and other threats that can cause the physical loss of data. This is
generally the area of physical installation management and is outside the
scope of this book.
2. Operational Protection: these are measures to minimise or even eliminate the
impact of human errors on the databases' integrity. Errors can occur, for
example, in assigning values to attributes: a value may be unreasonable (eg.
an age of 213!) or of the wrong type (eg. the value 'Smith' assigned to the age
attribute). These measures are typically embodied in a set of integrity
constraints (a set of assertions) that must be enforced (ie. the truth of the
assertions must be preserved across all database transactions). An example of
an assertion might be 'the price of a product must be a positive number'. Any
operation then is invalid if it violates a stated constraint, eg. an attempt to
store a Price of -9.99. These constraints are typically specified by a DBA in the
database schema.
3. Authorisational Protection: these are measures to ensure that access to the
databases is by authorised users only, and then only for specific modes of
access (eg. some users may only be allowed to read while others can modify
database contents). They are necessary to ensure that confidentiality and
correctness of information are preserved. Access control can be applied at
various levels in the system. At the installation level, access through computer
terminals may be controlled using special access cards or passwords. At
successively lower levels, control may be applied to an entire database, to its
physical devices (or parts thereof), or to its logical parts (parts of the schema).
In extremely sensitive problem domains, access control may even be applied
to individual instances or data objects in the database.
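The integrity constraint mentioned under Operational Protection ('the price of a product must be a positive number') can be sketched with a CHECK clause in SQL, again via Python's sqlite3 module. The table, the product names and the helper try_insert are hypothetical; only the constraint itself and the offending value -9.99 come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The assertion is recorded in the schema, as a DBA would do.
conn.execute("CREATE TABLE Product (name TEXT, price REAL CHECK (price > 0))")
conn.execute("INSERT INTO Product VALUES (?, ?)", ("Camera", 250.0))


def try_insert(name, price):
    """Attempt an insert; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?)", (name, price))
        return True
    except sqlite3.IntegrityError:
        # The operation violated the stated constraint and was refused.
        return False


accepted = try_insert("Radio", -9.99)
count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(accepted, count)  # False 1
```

Because the constraint lives in the schema rather than in each application, every operation is checked in the same way regardless of who issues it.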
2 Basic Relational Data Model
2.1 Introduction
Basic concepts of information models, their realisation in databases comprising data
objects and object relationships, and their management by DBMSs that separate
structure (schema) from content, were introduced in the last chapter. The need for a DDL
to define the structure of entities and their relationships, and for a DML to specify
manipulation of database contents, was also established. These concepts, however, were
presented in quite abstract terms, with no commitment to any particular data structure for
entities or links nor to any particular function to manipulate data objects.
There is no single method for organising data in a database, and many methods have in
fact been proposed and used as the basis of commercial DBMSs. A method must fully
specify:
1. the rules according to which data are structured, and
2. the associated operations permitted
The first is typically expressed and encapsulated in a DDL, while the second, in an
associated DML. Any such method is termed a Logical Data Model (often simply
referred to as a Data Model). In short,
Data Model = DDL + DML
and may be seen as a technique for the formal description of data structure, usage
constraints and allowable operations. The facilities available typically vary from one Data
Model to another.
Figure 2.1 Logical Data Model
Each DBMS may therefore be said to maintain a particular Data Model (see Figure 2-1).
More formally, a Data Model is a combination of at least three components:
(1) A collection of data structure types
(2) A collection of operators or rules of inference, which can be applied
to any valid instance of data types in (1)
(3) A collection of general integrity rules, which implicitly or explicitly
define the set of consistent database states or change of state or both
It is important to note at this point that a Data Model is a logical representation of data
which is then realised on specific hardware and software platforms (its implementation,
or physical representation as illustrated in Figure 2-1). In fact, there can be many
different implementations of a given model, running on different hardware and operating
systems and differing perhaps in their efficiency, performance, reliability, user interface,
additional utilities and tools, physical limitations (eg. maximum size of databases), costs,
etc. (see Figure 2-2). All of them, however, will support a mandatory minimal set of
facilities defined for that data model. This is analogous to programming languages and
their implementations, eg. there are many C compilers and many of them implement an
agreed set of standard features regardless of the hardware and software platforms they
run on. But as with programming languages, we need not concern ourselves with the
variety of implementations when developing database applications: knowledge of the
basic logical data model is sufficient for us to do that.
Figure 2.2 Multiple realisations of a single Data Model
It is also important not to confuse the terms information model and data model. The
former is an abstraction of a real world problem domain and talks of entities,
relationships and instances (data objects) specific to that domain. The latter provides a
domain independent formal framework for expressing and manipulating the abstractions
of any information model. In other words, an information model is a description, by
means of a data model, of the real world.
2.2 Relation
Perhaps the simplest approach to data modelling is offered by the Relational Data Model,
proposed by Dr. Edgar F. Codd of IBM in 1970. The model was subsequently expanded
and refined by its creator and very quickly became the main focus of practically all
research activities in databases. The basic relational model specifies a data structure, the
so-called Relation, and several forms of high-level languages to manipulate relations.
The term relation in this model refers to a two-dimensional table of data. In other words,
according to the model, information is arranged in columns and rows. The term relation,
rather than matrix, is used here because data values in the table are not necessarily
homogeneous (ie. not all of the same type as, for example, in matrices of integers or real
numbers). More specifically, the values in any row are not necessarily homogeneous. Values
in any given column, however, are all of the same type (see Figure 2-3).
Figure 2.3 A Relation
A relation has a unique name and represents a particular entity. Each row of a relation,
referred to as a tuple, is a collection of facts (values) about a particular individual of that
entity. In other words, a tuple represents an instance of the entity represented by the
relation.
Figure 2.4 Relation and Entity
Figure 2-4 illustrates a relation called 'Customer', intended to represent the set of persons
who are customers of some enterprise. Each tuple in the relation therefore represents a
single customer.
The columns of a relation hold values of attributes that we wish to associate with each
entity instance, and each is labelled with a distinct attribute name at the top of the
column. This name, of course, provides a unique reference to the entire column or to a
particular value of a tuple in the relation. But more than that, it denotes a domain of
values that is defined over all relations in the database.
The term domain is used to refer to a set of values of the same kind or type. It should be
clearly understood, however, that while a domain comprises values of a given type, it is
not necessarily the same as that type. For example, the columns 'Cname' and 'Ccity' in
Figure 2-4 both have values of type string (ie. valid values are any string). But they
denote different domains, ie. 'Cname' denotes the domain of customer names while
'Ccity' denotes the domain of city names. They are different domains even if they share
common values. For example, the string 'Paris' can conceivably occur in the column
'Cname' (a person named Paris). Its meaning, however, is quite different from the
occurrence of the string 'Paris' in the column 'Ccity' (a city named Paris)! Thus it is
quite meaningless to compare values from different domains even if they are of the same
type.
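The difference between a type and a domain can be sketched by tagging each value with the domain it belongs to. The class DomainValue and the function same are hypothetical helpers written for this illustration; the relational model itself does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainValue:
    """A string value tagged with the domain (attribute name) it belongs to."""
    domain: str
    value: str


def same(a, b):
    """Compare two values, but only if they come from the same domain."""
    if a.domain != b.domain:
        raise TypeError(f"cannot compare {a.domain} with {b.domain}")
    return a.value == b.value


person = DomainValue("Cname", "Paris")  # a person named Paris
city = DomainValue("Ccity", "Paris")    # a city named Paris

print(same(city, DomainValue("Ccity", "Paris")))  # True
try:
    same(person, city)
except TypeError as err:
    print("rejected:", err)
```

Both values are plain strings, yet the comparison across domains is refused, which mirrors the point that sharing a type does not make two values comparable.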
Moreover, in the relational model, the term domain refers to the current set of values
found under an attribute name. Thus, if the relation in Figure 2-4 is the only relation in
the database, the domain of 'Cname' is the set {Codd, Martin, Deen}, while that of
'Ccity' is {London, Paris}. But if there were other relations and an attribute name occurs
in more than one of them, then its domain is the union of values in all columns with that
name. This is illustrated in Figure 2-5 where two relations each have a column labelled
'C#'. It also clarifies the statement above that a domain is defined over all relations, ie. an
attribute name always denotes the same domain in whatever relation it occurs.
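The definition of a domain as the union of values under an attribute name can be sketched as below. The relations are modelled as lists of dictionaries; the assignment of cities to customers and all 'C#' and 'P#' values are assumptions, since the figures are not reproduced here.

```python
# Hypothetical contents; only the sets of names and cities come from the text.
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London"},
]
transaction = [
    {"C#": 1, "P#": 2},
    {"C#": 2, "P#": 1},
]


def domain(attribute, relations):
    """Union of the values found under `attribute` in every relation."""
    return {
        t[attribute]
        for relation in relations
        for t in relation
        if attribute in t
    }


database = [customer, transaction]
print(domain("Cname", database) == {"Codd", "Martin", "Deen"})  # True
print(domain("Ccity", database) == {"London", "Paris"})         # True
print(domain("C#", database) == {1, 2, 3})                      # True
```

The 'C#' domain is computed across both relations, so the name means the same thing wherever it appears.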
Figure 2.5 Domain of an attribute
This property of domains allows us to represent relationships between entities. That is,
when two relations share a domain, identical domain values act as a link between tuples
that contain them (because such values mean the same thing). As an example, consider a
database comprising three relations as shown in Figure 2-6. It highlights a Transaction
tuple and a Customer tuple linked through the C# domain value '2', and the same
Transaction tuple and a Product tuple linked through the P# domain value '1'. The
Transaction tuple is a record of a purchase by customer number '2' of product number
'1'. Through such links, we are able to retrieve the name of the customer and the product,
ie. we are able to state that the customer 'Martin' bought a 'Camera'. They help to avoid
redundancy in recording data. Without them, the Transaction relation in Figure 2-6 will
have to include information about the appropriate Customer and Product in its table. This
duplication of data can lead to integrity problems later, especially when data needs to be
modified.
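Following a link through a shared domain value amounts to a lookup in the other relation. The sketch below assumes small hypothetical relations; the attribute name 'Pname' and every tuple other than customer 2 ('Martin') and product 1 ('Camera') are invented for illustration.

```python
customer = [
    {"C#": 1, "Cname": "Codd"},
    {"C#": 2, "Cname": "Martin"},
    {"C#": 3, "Cname": "Deen"},
]
product = [
    {"P#": 1, "Pname": "Camera"},
    {"P#": 2, "Pname": "Radio"},
]
# The Transaction tuple stores only the linking values, not the names.
transaction = [{"C#": 2, "P#": 1}]


def lookup(relation, attribute, value):
    """Return the first tuple whose `attribute` equals `value`."""
    return next(t for t in relation if t[attribute] == value)


purchases = [
    (lookup(customer, "C#", t["C#"])["Cname"],
     lookup(product, "P#", t["P#"])["Pname"])
    for t in transaction
]
print(purchases)  # [('Martin', 'Camera')]
```

The customer and product names are stored once each, and a transaction reaches them through its C# and P# values rather than by copying them.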
Figure 2.6 Links through domain sharing
2.3 Properties of a Relation
A relation with N columns and M rows (tuples) is said to be of degree N and cardinality
M. This is illustrated in Figure 2-7 which shows the Customer relation of degree four
and cardinality three. The product of a relation's degree and cardinality is the number of
attribute values it contains.
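As a quick worked check of these definitions, the sketch below builds a Customer relation of degree four and cardinality three. The phone numbers and the pairing of cities with customers are made-up values.

```python
# Attribute names (the columns) and tuples (the rows) of a Customer relation.
attributes = ("C#", "Cname", "Ccity", "Cphone")
tuples = {
    (1, "Codd", "London", "2263035"),
    (2, "Martin", "Paris", "5555910"),
    (3, "Deen", "London", "2234391"),
}

degree = len(attributes)   # number of columns, N
cardinality = len(tuples)  # number of rows, M

print(degree, cardinality, degree * cardinality)  # 4 3 12
```

The relation therefore holds 4 x 3 = 12 attribute values.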
Figure 2.7 Degree and Cardinality of a Relation
The characteristic properties of a relation are as follows:
1. All entries in a given column are of the same kind or type
2. The ordering of columns is immaterial. This is illustrated in Figure 2-8 where the
two tables shown are identical in every respect except for the ordering of their
columns. In the relational model, column values (or the value of an attribute of a
given tuple) are not referenced by their position in the table but by name. Thus the
display of a relation in tabular form is free to arrange columns in any order. Of
course, once an order is chosen, it is good practice to use it every time the relation (or
a tuple from it) is displayed to avoid confusion.
Figure 2.8 Column ordering is unimportant
3. No two tuples are exactly the same. A relation is a set of tuples. Thus a table that
contains duplicate tuples is not a relation and cannot be stored in a relational
database.
4. There is only one value for each attribute of a tuple. Thus a table such as in Figure 2-9
is not allowed in the relational model, despite the clear intended representation, ie.
that of customers with two abodes (eg. Codd has one in London and one in Madras).
In situations like this, the multiple values must be split into multiple tuples to be a
valid relation.
Figure 2.9 A tuple attribute may only have one value
5. The ordering of tuples is immaterial. This follows directly from defining a relation as
a set of tuples, rather than a sequence or list. One is free therefore to display a relation
in any convenient way, eg. sorted on some attribute.
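Several of these properties follow directly from treating a relation as a set of tuples whose values are addressed by name. The sketch below is one possible modelling in Python, with hypothetical data; the helper row is introduced only for this illustration.

```python
def row(values):
    """Model a tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(values.items())


# The same customers, written with different column and row orders.
table_a = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris"},
]
table_b = [
    {"Ccity": "Paris", "Cname": "Martin", "C#": 2},
    {"Ccity": "London", "Cname": "Codd", "C#": 1},
    {"Ccity": "London", "Cname": "Codd", "C#": 1},  # duplicate row
]

relation_a = {row(t) for t in table_a}
relation_b = {row(t) for t in table_b}

print(relation_a == relation_b)  # True: column and row order are immaterial
print(len(relation_b))           # 2: the duplicate tuple collapses
```

Because both the relation and each tuple are sets, neither ordering can carry information and a duplicate tuple cannot exist.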
The extension of a relation refers to the current set of tuples in it (see Figure 2-10). This
will of course vary with time as the database changes, ie. as we insert new tuples, or
modify or delete existing ones. Such changes are effected through a DML, or put another
way, a DML operates on the extensions of relations.
The more permanent parts of a relation, viz. the relation name and attribute names, are
collectively referred to as its intension or schema. A relation's schema effectively
describes (and constrains) the structure of tuples it is permitted to contain. DML
operations on tuples are allowed only if they observe the expressed intensions of the
affected relations (this partially addresses database integrity concerns raised in the last
chapter). Any given database will have a database schema which records the intensions of
every relation in it. Schemas are defined using a DDL.
Figure 2.10 The Intension and Extension of a Relation
2.4 Keys of a Relation
A key is a part of a tuple (one or more attributes) that uniquely distinguishes it from other
tuples in a given relation. Of course, in the extreme, the entire tuple is the key since each
tuple in the relation is guaranteed to be unique. However, we are interested in smaller
keys if they exist, for a number of practical reasons. First, keys will typically be used as
links, ie. key values will appear in other relations to represent their associated tuples (as
in Figure 2-6 above). Thus keys should be as small as possible and comprise only
nonredundant attributes to avoid unnecessary duplication of data across relations. Second,
keys form the basis for constructing indexes to speed up retrieval of tuples from a
relation. Small keys will decrease the size of indexes and the time to look up an index.
Consider Figure 2-11 below. The customer number (C#) attribute is clearly designed to
uniquely identify a customer. Thus we would not find two or more tuples in the relation
having the same customer number and it can therefore serve as a unique key to tuples in
the relation. However, there may be more than one such key in any relation, and these
keys may arise from natural attributes of the entity represented (rather than a contrived
one, like customer number). Examining again Figure 2-11, no two or more tuples have
the same value combination of Ccity and Cphone. If we can safely assume that no
customer will share a residence and phone number with any other customer, then this
combination is one such key. Note that Cphone alone is not a key: there are two tuples with
the same Cphone value (telephone numbers in different cities that happen to be the same).
And neither is Ccity alone, as we may expect many customers to live in a given city.
Figure 2.11 -andidate Qeys
While a relation may have two or more candidate keys, one must be selected and
designated as the primary key in the database schema. For the example above, C# is the
obvious choice as a primary key for the reasons stated earlier. When the primary key
values of one relation appear in other relations, they are termed foreign keys. Note that
foreign keys may have duplicate occurrences in a relation, while primary keys may not.
For example, in Figure 2.6, the C# in Transaction is a foreign key and the key value "1"
occurs in two different tuples. This is allowed because a foreign key is only a reference to
a tuple in another relation, unlike a primary key value, which must uniquely identify a
tuple in the relation.
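The behaviour of candidate, primary and foreign keys described above can be tried out with a small sketch. It uses Python's standard sqlite3 module purely as a convenient test bed; the column names (cno for C#, pno for P#), the table name txn (for Transaction) and all telephone numbers are assumptions made for illustration, not values taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE customer (
        cno    INT  NOT NULL PRIMARY KEY,    -- C#: the chosen primary key
        cname  TEXT,
        ccity  TEXT,
        cphone TEXT,
        UNIQUE (ccity, cphone)               -- the other candidate key
    );
    CREATE TABLE txn (                       -- stands in for the Transaction relation
        cno  INT REFERENCES customer (cno),  -- foreign key: duplicates are allowed
        pno  INT,
        date TEXT,
        qnt  INT
    );
""")

def accepted(sql, row):
    """Try one INSERT; True if the DBMS accepts it, False if a key rule rejects it."""
    try:
        with conn:
            conn.execute(sql, row)
        return True
    except sqlite3.IntegrityError:
        return False

ADD_CUSTOMER = "INSERT INTO customer VALUES (?, ?, ?, ?)"
ADD_TXN = "INSERT INTO txn VALUES (?, ?, ?, ?)"

first = accepted(ADD_CUSTOMER, (1, "Codd", "London", "2263035"))
# Same phone number in a different city: Cphone alone is not a key.
same_phone_other_city = accepted(ADD_CUSTOMER, (2, "Martin", "Paris", "2263035"))
# Same city AND phone: violates the (Ccity, Cphone) candidate key.
same_city_and_phone = accepted(ADD_CUSTOMER, (3, "Deen", "London", "2263035"))
# Repeated customer number: violates the primary key.
repeated_cno = accepted(ADD_CUSTOMER, (1, "Deen", "London", "2234391"))
# The same foreign key value may occur in several Transaction tuples.
fk_first = accepted(ADD_TXN, (1, 1, "21.01", 20))
fk_again = accepted(ADD_TXN, (1, 2, "23.01", 30))
```

Only the two inserts that repeat a candidate key value are refused; the repeated foreign key value is accepted, which is the asymmetry the text points out.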
2.5 Relational Schema
A Relational Database Schema comprises
1. the definition of all domains
2. the definition of all relations, specifying for each
   a) its intension (all attribute names), and
   b) a primary key
Figure 2.12 shows an example of such a schema which has all the components mentioned
above. The primary keys are designated by shading the component attribute names. Of
course, this is only an informal view of a schema. Its formal definition must rely on the
use of a specific DDL whose syntax may vary from one DBMS to another.
Figure 2.12 An Example Relational Schema
There is, however, a useful notation for relational schemas commonly adopted to
document and communicate database designs free of any specific DDL. It takes the
simple form:
<relation name>: <list of attribute names>
Additionally, attributes that are part of the primary key are underlined.
Thus, for the example in Figure 2.12, the schema would be written as follows:
Customer: ( C#, Cname, Ccity, Cphone )
Transaction: ( C#, P#, Date, Qnt )
Product: ( P#, Pname, Price )
This notation is useful in clarifying the overall organisation of the database but omits
some details, particularly the properties of domains. As an example of a more complete
definition using a more concrete DDL, we rewrite some of the schema above using Codd's
original notation. The principal components of his notation are annotated alongside.
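For readers who want to see the same schema in a DDL they can run, here is a minimal sketch in SQL, executed through Python's sqlite3 module. It is not Codd's original notation. The column names, the table name txn and the column types (a rough stand-in for domains) are assumptions; the composite primary key of Transaction follows the (C#, P#, Date) key mentioned later in the text.

```python
import sqlite3

DDL = """
CREATE TABLE customer (
    cno    INT  NOT NULL PRIMARY KEY,       -- C#
    cname  TEXT,
    ccity  TEXT,
    cphone TEXT
);
CREATE TABLE product (
    pno   INT  NOT NULL PRIMARY KEY,        -- P#
    pname TEXT,
    price REAL
);
CREATE TABLE txn (                           -- the Transaction relation
    cno  INT  NOT NULL REFERENCES customer (cno),
    pno  INT  NOT NULL REFERENCES product (pno),
    date TEXT NOT NULL,
    qnt  INT,
    PRIMARY KEY (cno, pno, date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

def intension(table):
    """Return (attribute names, primary-key attribute names) of one relation."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    names = [r[1] for r in rows]
    # Column 5 of table_info is the 1-based position within the primary key (0 = not a key part).
    key = [r[1] for r in sorted(rows, key=lambda r: r[5]) if r[5] > 0]
    return names, key

txn_names, txn_key = intension("txn")
customer_names, customer_key = intension("customer")
```

The DDL only declares structure: after it runs, every relation exists but its extension is still empty.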
3 Data Updating Facilities
3.1 Introduction
We have seen that a Data Model comprises the Data Description facilities (through the
DDL) and the Data Manipulation facilities (through the DML). As explained in Chapter
2, a DDL is used to specify the schema for the database - its entities can be created, and
its attributes, domains, and keys can be defined through language statements of the DDL.
The structure of the entities is defined, but not the data within them. DDL thus supports
only the declaration of the database structure.
Figure 3.1: Data Definition and Manipulation facilities of a Data Model
In this chapter, we shall see how the second component, the DML, can be used to support
the manipulation or processing of the data within the database structures defined by the
DDL. Manipulation begins with the placement of data values into the structures. When
the data values have been stored, the end user should be able to get at the data. The user
would have a specific purpose for the piece of data he/she wants to get - perhaps to
display the data on the screen in order to know the value, to write the data to an output
report file to be printed, to use the data as part of a computation, or to make changes to it.
3.2 Data Manipulation Facilities
The manipulative part of the relational model comprises the DML, which contains
commands to insert new data and to delete and modify existing data. These facilities of a
Database Management System are known as Updating Facilities or Data Update
Functions, because unlike the DDL which executes operations on the structure of the
database's entities, the DML performs operations on the data within those structures.
Given a relation declared in a database schema, the main update functions of a DML are:
1. To insert or add new tuples into a particular relation
2. To delete or erase existing tuples from a relation
3. To modify or change data in an existing relation
Examples:
1. To insert a new tuple
The company receives a new customer. To ensure that its database is up-to-date, a
new tuple containing the data that is normally kept about customers must be created and
inserted. This is thus a way to load data into the database.
Step 1. A user (through an application program) chooses a relation, say the Customer
relation. It has 4 attributes, and 3 existing tuples.
Step 2. The user prepares a new tuple of the relation (database) on the screen or in the
computer's memory.
Step 3. Through a DML command specified by the user, the DBMS puts a new tuple into
the relation of the database according to the definition of the DDL to place data in that
row. The Customer relation now has 4 tuples.
2. To delete an existing tuple
An existing customer no longer does any business with the company, in which case, the
corresponding tuple must be erased from the customer database.
Step 1. The user chooses the relation, Customer.
Step 2. The user issues a DML command to retrieve the tuple to be deleted.
Step 3. The DBMS deletes the tuple from the relation.
The updated relation now has one less tuple.
3. To modify an existing tuple
An existing customer has moved to a new location (or the current value in the data
field is incorrect). He has new values for the city and telephone number. These new
values must replace the previous values.
Step 1. The user chooses the relation.
Step 2. The user issues a DML retrieval command to get the tuple to be changed.
Step 3. The user modifies one or more data items.
Step 4. The DBMS inserts the modified tuple into the relation.
Two types of modifications are normally done:
1. Assigned - an assigned modification entails the simple assignment of a new value into
the data field (as in the example above)
2. Computed - in computed modification, the existing value is retrieved, then some
computation is done on it before restoring the updated value into the field (e.g. all
Cphone numbers beginning with the digit 5 are to be changed to begin with the digits
58).
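The three update functions, including both kinds of modification, can be sketched with SQL statements issued through Python's sqlite3 module. This is only one possible concrete DML; the column names, the fourth customer and every telephone number below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cno INT NOT NULL PRIMARY KEY, cname TEXT, ccity TEXT, cphone TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Codd", "London", "5551001"),
        (2, "Martin", "Paris", "5551002"),
        (3, "Deen", "London", "7771003"),
    ],
)

def count():
    """Number of tuples currently in the extension of Customer."""
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# 1. Insert a new tuple (hypothetical new customer).
conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", (4, "Smith", "Graz", "5121004"))
after_insert = count()

# 2. Delete an existing tuple.
conn.execute("DELETE FROM customer WHERE cno = ?", (2,))
after_delete = count()

# 3a. Assigned modification: new values simply replace the old ones.
conn.execute(
    "UPDATE customer SET ccity = ?, cphone = ? WHERE cno = ?", ("Paris", "6661001", 1)
)

# 3b. Computed modification: the new value is derived from the existing one.
#     Every Cphone beginning with 5 is changed to begin with 58.
conn.execute("UPDATE customer SET cphone = '58' || substr(cphone, 2) WHERE cphone LIKE '5%'")

phones = dict(conn.execute("SELECT cno, cphone FROM customer"))
conn.commit()
```

Note that in SQL the retrieval step and the modification step of the text are folded into a single UPDATE or DELETE statement, with the WHERE clause selecting the tuples to be changed.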

Additionally, it is possible to insert new tuples into a relation with one or more unknown
values. Such unknown values, called NULL-VALUEs, are denoted by ?
To insert a tuple with unknown values
A new customer, Deen, is created. But Deen has yet to notify the company of his living
place and telephone number. At a later point in time, when Deen has confirmed his living
place and telephone, the tuple with his details can be modified by replacing the ?s with
the appropriate values.
At this stage, we only mention these update functions via logical definitions such as
above. In the implementation of DMLs, there exist many RDBMS systems with wide
variations in the actual language notations. We shall not discuss these update functions
of concrete data manipulation languages yet.
3.3 User Interface
3.3.1 Users
Now that data values are stored in the database, we shall look at how users can
communicate with the database system in order to access the data for further
manipulation. First let us take a look at the characteristics of users and the processing and
inquiries they tend to make.
There are essentially two types of users:
1. End Users: End users are those who directly use the information obtained from the
database. They are likely to be computer novices or neophytes, using the computer
system as an extended tool to assist them with their main job function (which may be to
do financial analysis or to register students). End users may feel uncomfortable with
computers or may simply be not interested in the technical details, but their lack of
knowledge should not be a handicap to the main job which they have to do.
End users should be able to retrieve the data in the database in any manner they wish,
which is likely to be in the form of:
- casual, unanticipated or ad hoc queries which often must be satisfied within a short
  space of time, if not immediately (e.g. "How many students over the age of 30 are
  taught by professors below the age of 30?")
- standard or predictable queries that need to be executed on a routine basis (e.g.
  "Produce the monthly cash flow analysis report")
2. Database specialists: Specialist users are knowledgeable about the
technicalities of the database management system. They are likely to hold
positions such as the database administrator, database programmer, database
support person, systems analyst or the like. They are likely to be responsible
for tasks like:
- defining the database schema
- handling complex queries, reports or tailored software applications
- defining data security and ensuring data integrity
- performing database backups and recoveries
- monitoring database performance
3.3.2 Interface
Interactions with the database would require some form of interface with the users. There
are two basic ways in which the User-Database interface may be organized, i.e. the
database may be accessed from a:
1. purpose-built, non-procedural Self-Contained Language, or a
2. Host Language (such as C, C++, COBOL, Pascal, etc.)
The Self-Contained Language tends to be the tool favored by end-users to access the
database, whereas access through a host language is a tool of the technical experts and
skilled programmers who use it to develop specialised software or database applications.
In either case, the access is still through the database schema and using the DML.
Figure 3.2: Two different interfaces to the database
3.3.3 Self-Contained Language
Let us first take a look at the tool favored by the end users:
Figure 3.3: Expanding DML with additional functions
Here we have a collection of DML statements (e.g. GET, SELECT) to access the
database. These statements can be expanded with other statements that are capable of
doing arithmetic operations, computing statistical functions, and so on. The DML
statements, as we have seen, are dependent on the database schema. However, the
additional statements for statistical functions etc. are not, and thus add a form of
independence from the schema. This is illustrated in Figure 3.3. Hence the name
Self-Contained language for such a database language.
The self-contained language is most suitable for end users to gain rapid or online access
to the data. It is often used to make ad hoc inquiries into the database, involving mainly
the retrieval operation. It is often described as being user friendly, and can be learnt quite
quickly. Instead of waiting for the database specialists or some technical whiz to program
their requests, end users can now by themselves create queries and format the resulting
data into reports.
The language is usually non-procedural and command-oriented where the user specifies
English-like text. To get a flavor of such a language, let us look at some simple examples
which use a popular command-oriented data-sublanguage, SQL. (More will be covered in
Chapter 9).
Suppose we take the 3 relations introduced earlier:
Customer (C#, Cname, Ccity, Cphone)
Product (P#, Pname, Price)
Transaction (C#, P#, Date, Qnt)
with the following sample values in the tuples:
Figure 3.4: Sample database
We shall illustrate the use of some simple commands:
    SELECT * FROM CUSTOMER
This will retrieve all data values of the Customer relation, with the
following resulting relation:
    SELECT CNAME
    FROM CUSTOMER
    WHERE CCITY = 'London'
This will retrieve, from the Customer relation, the names of all
customers whose city is London:
    SELECT CNAME, P#
    FROM CUSTOMER, TRANSACTION
    WHERE CUSTOMER.C# = TRANSACTION.C#
This will access the two relations, Customer and Transaction, and
in effect, retrieve from them the Names of customers who have
transactions and the Part numbers supplied to them (note, customers with
no transactions will not be retrieved). The resultant relation is:
    SELECT COUNT (*), AVG (PRICE) FROM PRODUCT
Here, the DML/SELECT statement is expanded with additional
arithmetic/statistical functions. This will access the Product relation and
perform functions to
(1) count the total number of products and
(2) get the average of the Price values:
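The four commands above can be run against a small test database. The sketch below uses Python's sqlite3 module as the SQL engine. Since the sample database figure is not reproduced here, the tuples are assumptions: the customers and transactions loosely follow the examples used elsewhere in these notes, while product names, prices, phone numbers and one quantity are invented. C# and P# are renamed cno and pno, and Transaction is renamed txn, because "#" is not allowed in an unquoted SQL identifier and TRANSACTION is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cno INT NOT NULL PRIMARY KEY, cname TEXT, ccity TEXT, cphone TEXT);
    CREATE TABLE product  (pno INT NOT NULL PRIMARY KEY, pname TEXT, price REAL);
    CREATE TABLE txn      (cno INT, pno INT, date TEXT, qnt INT);

    INSERT INTO customer VALUES (1, 'Codd',   'London', '2263035'),
                                (2, 'Martin', 'Paris',  '5555910'),
                                (3, 'Deen',   'London', '2234391');
    INSERT INTO product  VALUES (1, 'CPU', 1000), (2, 'VDU', 1200);
    INSERT INTO txn      VALUES (1, 1, '21.01', 20), (1, 2, '23.01', 30),
                                (2, 1, '26.01', 25), (3, 2, '29.01', 20);
""")

# SELECT * FROM CUSTOMER
all_customers = conn.execute("SELECT * FROM customer").fetchall()

# Names of all customers whose city is London
londoners = conn.execute("SELECT cname FROM customer WHERE ccity = 'London'").fetchall()

# Customer names paired with the part numbers of their transactions (a join)
names_and_parts = conn.execute(
    "SELECT cname, pno FROM customer, txn WHERE customer.cno = txn.cno"
).fetchall()

# SELECT expanded with statistical functions
product_count, average_price = conn.execute(
    "SELECT COUNT(*), AVG(price) FROM product"
).fetchone()
```

Each query returns a relation (here, a list of tuples); no ordering of the result is implied unless the query asks for one.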
Once end users know how to define queries in terms of a particular language, it would
seem that they can quite easily do their own queries like the above. It is a matter of a
few lines of commands which may be quickly formulated to get the desired information.
However if the query is too involved or complex, like the following example, then the
end-users will have to be quite expert database users or will have to rely on help from the
technical specialists.
    SELECT DISTINCT P# FROM PRODUCT, TRANSACTION
    WHERE NOT EXISTS
        (SELECT * FROM PRODUCT, TRANSACTION
         WHERE PRODUCT.P# = TRANSACTION.P#
         AND NOT EXISTS
            ( SELECT * FROM PRODUCT, CUSTOMER
              WHERE PRODUCT.P# = TRANSACTION.P#
              AND CUSTOMER.C# = '3' ) )
Can you figure what this query statement does?
3.3.4 Embedded Host Language
Apart from simple queries, end users need specialised reports that require technical
specialists to write computer programs to process them.
The interface for the skilled programmers is usually in the form of a database command
language and a programming language with utilities to support other data operations.
Here in the second case, the DML statements are embedded into the text of an application
program written in a general purpose host programming language. Thus SQL statements
to access the relational database, for example, are embedded in C, C++ or Cobol
programs.
Figure 3.5: Embedding DML in a Host Language
Embedded host language programs provide the application with full access to the
databases to:
- manipulate data structures (such as to create, destroy or update the
  database tables),
- manipulate the data (such as to retrieve, delete, append or update the
  data items in the database tables),
- manage groups of statements as a single transaction (such as to abort a
  group of statements), and
- perform a host of other database management functions as well (such
  as to create access permits on database tables).
The DML/SQL statements embedded in the program code are usually placed between
delimiters such as
    EXEC SQL
    SELECT ...
    END-EXEC
The program is then pre-compiled to convert the DML into the host source code that can
subsequently be compiled into object code for running.
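To give a feel for SQL working together with a host language, here is a small sketch in Python. It is an analogy rather than true embedded SQL: Python's sqlite3 module is a call-level interface, so there is no EXEC SQL delimiter and no pre-compilation step, and the SQL text is handed to the DBMS at run time. The function name, column names and sample data are assumptions made for illustration.

```python
import sqlite3

def customers_in(conn, city):
    """Host-language routine that binds the host variable `city` into a SQL statement."""
    cursor = conn.execute(
        "SELECT cno, cname FROM customer WHERE ccity = ? ORDER BY cno",
        (city,),  # the "?" placeholder plays the role of an embedded host variable
    )
    # The result is processed one tuple at a time using ordinary host-language code.
    return [f"{cno:03d} {cname.upper()}" for cno, cname in cursor]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cno INT NOT NULL PRIMARY KEY, cname TEXT, ccity TEXT, cphone TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Codd", "London", "2263035"),
        (2, "Martin", "Paris", "5555910"),
        (3, "Deen", "London", "2234391"),
    ],
)

report = customers_in(conn, "London")
```

The division of labour is the same as in a pre-compiled program: SQL selects the data, and the host language supplies parameters, loops and formatting.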
Compared to the command lines of queries written in self-contained languages, an
application program such as the above takes more effort and time. Good programming
abilities are required. Applications are written in an embedded host language for various
reasons, including for:
- large or complex databases which contain a hundred million characters
  or more
- a well known set of applications transactions, perhaps running
  hundreds of times a day (e.g. to make airline seat reservations) or
  standard/predictable queries that need to be executed on a routine basis
  (e.g. "generate the weekly payroll")
- unskilled end-users or if the query becomes too involved or
  complicated for the end-user.
However, for end-users, again special interfaces may be designed to facilitate the access
to these more complex programs.
Figure 3.6: Easy-to-use End User Interface
Such interfaces are usually in the form of simple, user-friendly screens comprising easy-
to-learn languages, a series of menus, fill-in-the-blank data-entry panels or report screens
- sufficient for the user to check, edit and format data, make queries, or see the results,
and without much technical staff intervention.
In this section we have seen that user interfaces are important to provide contact with the
underlying database. One of the advantages of relational databases is the use of languages
that are standardized, such as SQL, and the availability of interface products that are
easy-to-use. Often it takes just a few minutes to define a query in terms of a Self-Contained
language (but the realization of such a query may take much more time). End users can
thus create queries and generate their own reports without having to rely heavily on the
programming staff to respond to their requests. More importantly, they can also be made
more responsible for their own data, information retrieval and report generation. The
technical staff must thus ensure that good communication exists between them and the
end-users, that sufficient training is always given, and that there is good coordination all
around, which is vital to ensure that these users know what they are doing.
These are the things that give relational databases their true flexibility and power
(compared to the other models such as the hierarchical or network databases).
3.4 Integrity
Having seen what and how data can be manipulated in a database, we shall next see the
importance of assuring the reliability of data and how this can be achieved through the
imposition of constraints during data manipulation operations.
Apart from providing data access to the user who wishes to retrieve or update the data, a
database management system must also provide its users with other utilities to assure
them of the proper maintenance, protection and reliability of the system. For example, in
single-user systems, where the entire database is used, owned and maintained by a single
user, the problems associated with data sharing do not arise. But when an enterprise-wide
database is used by many users, often at the same time, the problems of who can access
what, when and how - confidentiality and update - become a very big concern. Thus data
security and data integrity are crucial.
People should not see what is not intended for them (e.g. individuals must have privacy
rights on their medical or criminal records, businesses must safeguard their commercially
sensitive information, and the military must secure their operation plans). Additionally,
people who are not authorized to update data must not be allowed to change them (e.g.
an electronic bank transfer must have the proper authorization).
While the issue of data security concerns itself with the protection of the database against
unauthorized users who may disclose, alter or destroy data they are not supposed to have
access to, data integrity concerns itself with protecting the database against authorized
users. In this section, we shall focus on the latter (the former will be covered in greater
detail in Chapter 11).
Thus integrity protection is an important part of the data model. By integrity we mean the
correctness and reliability of the data maintained in the database. One must be confident
that the data accessed is accurate, correct, relevant and valid. The protection can be
viewed as a set of constraints that prevents any undesirable updates on the database. Two
types of constraints may be applied:
- Implicit Integrity Constraints
- Explicit Integrity Constraints
3.4.1 Implicit Integrity Constraints
The term "Implicit Integrity Constraints" means that a user need not explicitly define
these Integrity Constraints in a database schema. They are a property of the relational
data model as such. The Relational Data Model provides two types of Implicit Integrity
Constraints:
1. Entity Integrity
Recall that an attribute is usually chosen from a relation to be its primary key. The
primary key is used to identify the tuple and is useful when one wishes to sort or access
the tuples efficiently. It is a unique identifier for a given tuple. As such, no two tuples can
have the same key values, and nor can the values be null. Otherwise, uniqueness cannot
be guaranteed.
Figure 3.7: Entity Integrity Violation
A primary key that is null would be a contradiction in terms, for it would effectively state
that there is an entity that has no known identity. Hence, the term entity integrity.
Note that whilst the primary key cannot be null (in this case, C# cannot have a
null value), the other attributes may be so (for example, Cname, Ccity or Cphone may
have null values).
Thus the rule for the "Entity Integrity" constraint asserts that no attribute participating in
the primary key of a relation is permitted to accept null values.
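The entity integrity rule can be observed with a short sketch using Python's sqlite3 module. One caveat specific to this choice of engine: SQLite, unlike the pure relational model, does not automatically forbid nulls in a primary key column of an ordinary table, so NOT NULL is spelled out in the declaration. Column names and sample values other than the customer names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cno    INT NOT NULL PRIMARY KEY,   -- C#: must be unique and must not be null
        cname  TEXT,                       -- the other attributes may be null
        ccity  TEXT,
        cphone TEXT
    )
""")

def accepted(row):
    """Try to insert one Customer tuple; False means the DBMS refused it."""
    try:
        with conn:
            conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_complete   = accepted((1, "Codd", "London", "2263035"))
ok_with_nulls = accepted((3, "Deen", None, None))          # unknown city and phone
duplicate_key = accepted((1, "Martin", "Paris", "5555910"))
null_key      = accepted((None, "Smith", "Graz", None))    # hypothetical customer

# Later, the unknown values can be replaced once they become known.
conn.execute("UPDATE customer SET ccity = ?, cphone = ? WHERE cno = ?", ("London", "2234391", 3))
deen = conn.execute("SELECT ccity, cphone FROM customer WHERE cno = 3").fetchone()
```

The tuple with null non-key attributes is accepted, while both the duplicate key and the null key are rejected.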
2. Referential Integrity
We have seen earlier how a primary or secondary key in one relation may be used by
another relation to handle many-to-many relationships. For example, the Transaction
relation has the attribute C# which is also an attribute in the Customer relation. But C# is
a primary key of Customer, thus making C# a foreign key in Transaction.
The foreign key in the Transaction relation cross-references data in the Customer
relation, e.g. using the value of C# in Transaction to get details on Cname which is not
found directly in it but in Customer. When relations make references to another relation
via foreign keys, the database management system must ensure that data between the
relations are valid. For example, Transaction cannot have a tuple with a C# value that is
not found in the Customer relation for the tuple would then be referring to a customer that
does not exist.
Thus, for referential integrity a foreign key can have only two possible values - either the
relevant primary key or a null value. No other values are allowed.
Figure 3.8: Referential Integrity Violation
Figure 3.8 above shows that adding a tuple with C# 4 would give us a foreign
key that does not reference a valid key in the parent Customer relation. This is a violation
caused by an insert operation. Likewise, if we were to delete C# 2 from the Customer relation,
we would again have a foreign key that no longer references a matching key in the base
or parent relation.
Also note that the foreign key here, C#, is not allowed to have a null value either since it
is a part of Transaction's primary key (which is the combined attributes of C#, P#, Date).
But if the foreign key is a simple attribute, and not a combined/compound one, then it
may have null values. In other words, a foreign key cannot be partially null, it must be
wholly null if it does not refer to any particular key in the base relation. Unlike primary
keys which are not permitted to accept null values, there may be instances when foreign
keys have to be null. For example, a database about employees and departments would
have a foreign key, say Dept# in the Employee relation which indicates the department to
which the employee is assigned. But when a new employee joins the company, it is
possible that the employee is not assigned to any department yet. Her Employee tuple
may then have a null Dept#.
Thus the rule for the Referential Integrity constraint asserts that if a relation R2
includes a foreign key that matches the primary key of another relation R1, then every
attribute value of the foreign key in R2 must either (i) be equal to the value of the
primary key in some tuple of R1 or (ii) be wholly null.
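The referential integrity rule, including the case of a wholly null foreign key, can be sketched as follows with Python's sqlite3 module. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection. The Employee and Department tables follow the example in the text, but their column names and all key values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (cno INT NOT NULL PRIMARY KEY, cname TEXT);
    CREATE TABLE txn (                                 -- the Transaction relation
        cno  INT  NOT NULL REFERENCES customer (cno),  -- FK inside the primary key: never null
        pno  INT  NOT NULL,
        date TEXT NOT NULL,
        qnt  INT,
        PRIMARY KEY (cno, pno, date)
    );
    CREATE TABLE department (dept_no INT NOT NULL PRIMARY KEY);
    CREATE TABLE employee (
        emp_no  INT NOT NULL PRIMARY KEY,
        dept_no INT REFERENCES department (dept_no)    -- simple FK: may be null
    );
    INSERT INTO customer VALUES (1, 'Codd'), (2, 'Martin'), (3, 'Deen');
    INSERT INTO txn VALUES (1, 1, '21.01', 20), (2, 1, '26.01', 25);
    INSERT INTO department VALUES (10);
""")

def allowed(sql, params=()):
    """Run one update; False means it was rejected as an integrity violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# Insert violation: there is no customer 4 for the new Transaction tuple to refer to.
insert_unknown_customer = allowed("INSERT INTO txn VALUES (?, ?, ?, ?)", (4, 2, "30.01", 10))
# Delete violation: customer 2 is still referenced by a Transaction tuple.
delete_referenced_customer = allowed("DELETE FROM customer WHERE cno = ?", (2,))
# A new employee not yet assigned to a department: a wholly null foreign key is fine.
employee_without_dept = allowed("INSERT INTO employee VALUES (?, ?)", (100, None))
employee_valid_dept = allowed("INSERT INTO employee VALUES (?, ?)", (101, 10))
employee_unknown_dept = allowed("INSERT INTO employee VALUES (?, ?)", (102, 99))
```

A foreign key value is therefore accepted in exactly two cases: it matches an existing primary key value in the parent relation, or it is null.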
3.4.2 Explicit Integrity Constraints
In addition to the general, implicit constraints of the relational model, any specific
database will often have its own set of local rules that apply to it alone. This is again to
ensure that the data values maintained are reliable. Specific validity checks are done on
them, for otherwise unexpected or erroneous data may be created. Occasionally, one
hears for example, of the case of the telephone subscriber getting an unreasonably large
bill.
Various kinds of checks can be imposed. Amongst the usual constraints practised in data
processing are tests on:
- class or data type, e.g. alphabetic or numeric type
- sign, e.g. positive or negative number
- presence or absence of data, e.g. if spaces or null
- value, e.g. if value > 100
- range or limit, e.g. x between -10 and +10
- reasonableness, e.g. if y positive and < 10000
- consistency, e.g. if x < y and y < 100
In a Relational Data Model, explicit integrity constraints may be declared to handle the
above cases as follows:

1. Domain Constraints
Such constraints characterize the constraints that can be defined on domains, such as
value set restriction, upper and lower limits, etc. For example:
Figure 3.9: Domain Constraint Violation
Whilst the Pname of the third tuple in Figure 3.9 complied with the allowable values, its
Price should have been less than 2000. An error message is flagged and the tuple cannot
be inserted into the relation.
2. Tuple Constraints
The second type of explicit constraint characterizes the rules that can be defined on
tuples, such as inter-attribute restrictions. For example:
Figure 3.10: Tuple Constraint Violation
With such a declaration, the above tuple with P# 3 cannot have a Price value that is
greater than or equal to 1500.
3. Relation Constraints
Relation constraints characterize the constraints that can be defined on a relation
structure, such as alternate keys or inter-relation restrictions. For example,
Figure 3.11: Relational Constraint Violation
Pname, being an alternate key to the primary key, P#, should have unique values.
However, Pname may be null (unlike P# which if null, would violate the entity integrity).
Figure 3.12: Allowable Null Foreign Key
Other attributes too may have null or duplicate values without violating any integrity
constraint.
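The three kinds of explicit constraint can be approximated in SQL with CHECK and UNIQUE clauses, as in the following sketch run through Python's sqlite3 module. The limit "Price less than 2000" and the alternate key on Pname come from the text. The tuple constraint is hypothetical, because the declaration shown in the original figure is not reproduced here: it simply ties the price limit of 1500 to one invented product name. Product names and prices are likewise invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        pno   INT  NOT NULL PRIMARY KEY,
        pname TEXT UNIQUE,                        -- relation constraint: alternate key
        price REAL CHECK (price < 2000),          -- domain constraint: upper limit
        CHECK (pname <> 'VDU' OR price < 1500)    -- hypothetical inter-attribute (tuple) rule
    )
""")

def accepted(row):
    """Try to insert one Product tuple; False means a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO product VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

valid_tuple      = accepted((1, "CPU", 1000))
domain_violation = accepted((3, "Printer", 2500))   # Price is not below 2000
tuple_violation  = accepted((3, "VDU", 1600))       # breaks the inter-attribute rule
valid_vdu        = accepted((2, "VDU", 1200))
duplicate_name   = accepted((4, "CPU", 900))        # Pname must be unique
null_name_one    = accepted((5, None, 500))         # an alternate key may be null
null_name_two    = accepted((6, None, 700))
```

Unlike the implicit constraints, each of these rules exists only because the schema designer declared it for this particular database.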
3.5 Database Transactions
All these data manipulation operations are part of the fundamental purpose of any
database system, which is to carry out database "transactions" (not to be confused with
the relation named "Transaction" that has been used in many of the earlier examples). A
database transaction is a logical unit of work. It consists of the execution of an
application-specified sequence of data manipulation operations.
Recall that the terminology Database system is used to refer to the special system that
includes at least the following components:
- the database
- the Database Management System
- the particular database schema, and
- a set of special-purpose application programs
Figure 3.13: Database System Components
Normally one database transaction invokes a number of DML operators. Each DML
operator, in turn, invokes a number of data updating operations on a physical level, as
illustrated in Figure 3.14 below.
Figure 3.14: Correspondence between Transaction, DML and Physical Operations
If the "self contained language" is used, the data manipulation operators and database
transactions have a "one-to-one" correspondence. For example, a transaction to put data
values into the database entails a single DML INSERT operator command.
However, in other instances, especially where embedded language programs are used, a
single transaction may in fact comprise several operations.
Suppose we have an additional attribute, TotalQnt, in the Product relation, i.e. our
database will contain the following relations:
Customer (C#, Cname, Ccity, Cphone)
Product (P#, Pname, Price, TotalQnt)
Transaction (C#, P#, Date, Qnt)
TotalQnt will hold the running total of Qnt of the same P# found in the Transaction
relation. TotalQnt is in fact a value that is computed as follows:
\[ \text{Product.TotalQnt} = \sum_{\text{P\#}} \text{Transaction.Qnt} \]
Consider for example if we wish to add a new Transaction tuple. With this single task,
the system will effectively perform the following 2 sequential operations:
- insert a new tuple into the Transaction relation
- update the Product relation such that the new Transaction.Qnt is added
  on to the value of Product.TotalQnt for that same P#
A transaction must be executed as an intact unit, for otherwise if a failure happens after
the insert but before the update operation, then the database will be left in an inconsistent
state with the new value inserted but the total quantity not updated. But with transaction
management and processing support, if an unplanned error or system crash occurs before
normal termination, then those earlier operations will be undone - a transaction is
executed in its entirety or totally aborted. It is this support to group operations into a
transaction that helps guarantee that a database would be in a consistent state in the
event of any system failure during its data manipulation operations.
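The all-or-nothing behaviour described above can be sketched with Python's sqlite3 module, whose connection object can be used as a context manager that commits when the block succeeds and rolls back when an exception escapes it. The function name, the fail_midway switch used to simulate a crash, the column names and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        pno INT NOT NULL PRIMARY KEY, pname TEXT, price REAL,
        total_qnt INT NOT NULL DEFAULT 0             -- TotalQnt
    );
    CREATE TABLE txn (                               -- the Transaction relation
        cno INT NOT NULL, pno INT NOT NULL, date TEXT NOT NULL, qnt INT NOT NULL,
        PRIMARY KEY (cno, pno, date)
    );
    INSERT INTO product VALUES (1, 'CPU', 1000, 0);
""")

def add_transaction(cno, pno, date, qnt, fail_midway=False):
    """Insert a Transaction tuple and update Product.TotalQnt as one unit of work."""
    with conn:  # commit if the block completes, roll back if an exception escapes
        conn.execute("INSERT INTO txn VALUES (?, ?, ?, ?)", (cno, pno, date, qnt))
        if fail_midway:
            raise RuntimeError("simulated failure between the two operations")
        conn.execute(
            "UPDATE product SET total_qnt = total_qnt + ? WHERE pno = ?", (qnt, pno)
        )

def state():
    """Return (number of Transaction tuples, TotalQnt of product 1)."""
    n = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]
    total = conn.execute("SELECT total_qnt FROM product WHERE pno = 1").fetchone()[0]
    return n, total

add_transaction(1, 1, "21.01", 20)
after_commit = state()

try:
    add_transaction(2, 1, "26.01", 25, fail_midway=True)
except RuntimeError:
    pass
after_failure = state()   # the half-finished insert has been undone
```

After the simulated failure the database is in the same state as before the second transaction began, so TotalQnt still agrees with the tuples actually stored in Transaction.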
And finally, it must be noted that as far as the end-user is concerned, he/she "can see"
database transactions as undivided portions of information sent to the system, or received
from the system. It is also not important how the data is actually physically stored, only
how it is logically available. This flexibility of data access is readily achieved with
relational database management systems.
4 Normalisation
4.1 Introduction
Suppose we are now given the task of designing and creating a database. How do we
produce a good design? What relations should we have in the database? What attributes
should these relations have? Good database design, needless to say, is important. Careless
design can lead to uncontrolled data redundancies that will lead to problems with data
anomalies.
In this chapter we will examine a process known as Normalisation, a rigorous design
tool that is based on the mathematical theory of relations which will result in very
practical operational implementations. A properly normalised set of relations actually
simplifies the retrieval and maintenance processes and the effort spent in ensuring good
structures is certainly a worthwhile investment. Furthermore, if database relations were
simply seen as file structures of some vague file system, then the power and flexibility of
RDBMS cannot be exploited to the full.
For us to appreciate good design, let us begin by examining some bad ones.
4.1.1 A Bad Design
E. F. Codd has identified certain structural features in a relation which create retrieval and
update problems. Suppose we start off with a relation with a structure and details like:
    Customer details         Transaction details
    C#  Cname   Ccity   ..   P1#  Date1  Qnt1  P2#  Date2  ..  P9#  Date9
    1   Codd    London  ..   1    21.01  20    2    23.01
    2   Martin  Paris   ..   1    26.01  25
    3   Deen    London  ..   2    29.01  20
Figure 4.1: Simple Structure
This is a simple and straightforward design. It consists of one relation where we have a
single tuple for every customer and under that customer we keep all his transaction
records about parts, up to a possible maximum of 9 transactions. For every new
transaction, we need not repeat the customer details (of name, city and telephone), we
simply add on a transaction detail.
However, we note the following disadvantages:
- The relation is wide and clumsy
- We have set a limit of 9 (or whatever reasonable value) transactions per customer.
  What if a customer has more than 9 transactions?
- For customers with fewer than 9 transactions, it appears that we have to store null
  values in the remaining spaces. What a waste of space!
- The transactions appear to be kept in ascending order of P#s. What if we have to
  delete, for customer Codd, the part numbered 1 - should we move the part numbered
  2 up (or rather, left)? If we did, what if we decide later to re-insert part 2? The
  additions and deletions can cause awkward data shuffling.
- Let us try to construct a query to "Find which customer(s) bought P# 2". The query
  would have to access every customer tuple and for each tuple, examine every one of its
  transactions looking for
  (P1# = 2) OR (P2# = 2) OR (P3# = 2) ... OR (P9# = 2)
  A comparatively simple query seems to require a clumsy retrieval formulation!
4.1.2 Another Bad Design
Alternatively, why don't we re-structure our relation such that we do not restrict the
number of transactions per customer? We can do this with the following structure:
This way, a customer can have just any number of Part transactions without worrying
about any upper limit or wasted space through null values (as it was with the previous
structure). Constructing a query to "Find which customer(s) bought P# 2" is not as
cumbersome as before, as one can now simply state: P# = 2.
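The difference between the two query formulations can be sketched in Python. This is only an illustration: the relations are modelled as lists of dictionaries, the part numbers follow Figure 4.1 as far as it can be read, dates and quantities are left out, and the names `wide`, `flat`, `buyers_wide` and `buyers_flat` are hypothetical.

```python
SLOTS = 9  # the fixed upper limit of the first design

# First design: one tuple per customer with nine part slots (None marks an unused slot).
wide = [
    {"C#": 1, "Cname": "Codd", "parts": [1, 2] + [None] * (SLOTS - 2)},
    {"C#": 2, "Cname": "Martin", "parts": [1] + [None] * (SLOTS - 1)},
    {"C#": 3, "Cname": "Deen", "parts": [2] + [None] * (SLOTS - 1)},
]

# Second design: one tuple per transaction, customer details repeated.
flat = [
    {"C#": 1, "Cname": "Codd", "P#": 1},
    {"C#": 1, "Cname": "Codd", "P#": 2},
    {"C#": 2, "Cname": "Martin", "P#": 1},
    {"C#": 3, "Cname": "Deen", "P#": 2},
]

def buyers_wide(relation, part):
    # Mirrors (P1# = part) OR (P2# = part) OR ... OR (P9# = part).
    return sorted(t["Cname"] for t in relation
                  if any(slot == part for slot in t["parts"]))

def buyers_flat(relation, part):
    # A single comparison per tuple: P# = part.
    return sorted({t["Cname"] for t in relation if t["P#"] == part})

print(buyers_wide(wide, 2))  # ['Codd', 'Deen']
print(buyers_flat(flat, 2))  # ['Codd', 'Deen']
```

Both functions return the same customers; the second needs only one comparison per tuple, while the first must inspect every slot, including the unused ones.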
But again, this structure is not without its faults:
• It seems a waste of storage to keep repeated values of Cname, Ccity and Cphone.
• If C# 1 were to change his telephone number, we would have to ensure that we
  update ALL occurrences of C# 1's Cphone values. This means updating tuple 1, tuple
  2 and all other tuples where there is an occurrence of C# 1. Otherwise, our database
  would be left in an inconsistent state.
• Suppose we now have a new customer with C# 4. However, there is no part
  transaction yet with the customer as he has not ordered anything yet. We may find
  that we cannot insert this new information because we do not have a P# which serves
  as part of the 'primary key' of a tuple. (A primary key cannot have null values.)
• Suppose the third transaction has been cancelled, i.e. we no longer need information
  about 20 of P# 1 being ordered on 26 Jan. We thus delete the third tuple. We are then
  left with the following relation:
But then, suppose we need information about the customer Martin, say the city he is
located in. Unfortunately, as information about Martin was held in only that tuple,
deleting the entire tuple because of its P# transaction means that we have also lost
all information about Martin from the relation.
As illustrated in the above instances, we note that badly designed, unnormalised relations
waste storage space. Worse, they give rise to the following storage irregularities:
1. Update anomaly
Data inconsistency or loss of data integrity can arise from data redundancy/repetition and
partial update.
2. Insertion anomaly
Data cannot be added because some other data is absent.
3. Deletion anomaly
Data may be unintentionally lost through the deletion of other data.
4.1.3 The Need for Normalisation
Intuitively, it would seem that these undesirable features can be removed by breaking a
relation into other relations with desirable structures. We shall attempt by splitting the
above Transaction relation into the following two relations, Customer and Transaction,
which can be viewed as entities with a one to many relationship.
Figure 4.2: 1:M data relationships
Let us see if this new design will alleviate the above storage anomalies:
1. Update anomaly
If C# 1 were to change his telephone number, as there is only one occurrence of the tuple
in the Customer relation, we need to update only that one tuple as there are no
redundant/duplicate tuples.
2. Addition anomaly
Adding a new customer with C# 4 can be easily done in the Customer relation of which
C# serves as the primary key. With no P# yet, a tuple in Transaction need not be created.
3. Deletion anomaly
Canceling the third transaction about 20 of P# 1 being ordered on 26 Jan would now
mean deleting only the third tuple of the new Transaction relation above. This leaves
information about Martin still intact in the new Customer relation.
This process of reducing a relation into simpler structures is the process of
Normalisation. Normalisation may be defined as a step by step reversible process of
transforming an unnormalised relation into relations with progressively simpler
structures. Since the process is reversible, no information is lost in the transformation.
Normalisation removes (or more accurately, minimises) the undesirable properties by
working through a series of stages called Normal Forms. Originally, Codd defined three
types of undesirable properties:
1. Data aggregates
2. Partial key dependency
3. Indirect key dependency
and the three stages of normalisation that remove the associated problems are known,
respectively, as the:
• First Normal Form (1NF)
• Second Normal Form (2NF), and
• Third Normal Form (3NF)
We shall now show a more formal process on how we can decompose relations into
multiple relations by using the Normal Form rules for structuring.
4.2 First Normal Form (1NF)
The purpose of the First Normal Form (1NF) is to simplify the structure of a relation by
ensuring that it does not contain data aggregates or repeating groups. By this we mean
that no attribute value can have a set of values. In the example below, any one customer
has a group of several telephone entries:
Figure 4.3. Presence of Repeating Groups
This is thus not in 1NF. It must be "flattened". This can be achieved by ensuring that
every tuple defines a single entity by containing only atomic values. One can either
re-organise into one relation as in:
Figure 4.4: Atomic values in tuples
or split into multiple relations as in:
Figure 4.4: Reduction to 1NF
Note that earlier we defined 1NF as one of the characteristics of a relation (Lesson 2).
Thus we consider that every relation is at least in the first normal form (thus the
Figure 4-3 is not even a relation). The Transaction relation of Figure 4-2 is however a 1NF
relation.
We may thus generalise by saying that "A relation is in the 1NF if the values in the
relation are atomic for every single attribute of the relation".
Before we can look into the next two normal forms, 2NF and 3NF, we need to first
explain the notion of 'functional dependency' as these two forms are constrained by
functional dependencies.
4.3 Functional Dependencies
4.3.1 Determinant
The value of an attribute can uniquely determine the value in another attribute. For
example, in every tuple of the Transaction relation in Figure 4-2:
• C# uniquely determines Cname
• C# also uniquely determines Ccity as well as Cphone
Given C# 1, we will know that its Cname is 'Codd' and no other. On the other hand, we
cannot say that given Ccity 'London', we will know that its Cname is 'Codd' because
Ccity 'London' will also give Cname of 'Deen'. Thus Ccity cannot uniquely determine
Cname (in the same way that C# can).
Additionally, we see that:
• (C#, P#, Date) uniquely determines Qnt
We can now introduce the definition of a "determinant" as being an attribute (or a set of
non-redundant attributes) which can act as a unique identifier of another attribute (or
another set of attributes) of a given relation.
We may thus say that:
• C# is a unique key for Cname, Ccity and Cphone.
• (C#, P#, Date) is a unique key for Qnt.
These keys are non-redundant keys as no member of the composite attribute can be left
out of the set. Hence, C# is a determinant of Cname, Ccity, and Cphone. (C#, P#, Date) is
a determinant of Qnt.
A determinant is written as:
A → B
and can be read as "A determines B" (or A is a determinant of B). If any two tuples in the
relation R have the same value for the A attribute, then they must also have the same
value for their B attribute.
Applying this to the Transaction relation above, we may then say:
C# → Cname
C# → Ccity
C# → Cphone
(C#, P#, Date) → Qnt
The value of the attribute on the left-hand side of the arrow is the determinant because its
value uniquely determines the value of the attribute on the right.
Note also that:
(Ccity, Cphone) → Cname
(Ccity, Cphone) → C#
The converse notation
A ↛ B
can be read as A "does not determine" B.
Taking again the Transaction relation, we may say therefore that Ccity cannot uniquely
determine Cname:
Ccity ↛ Cname
because there exist a number of customers living in the same city.
Likewise:
Cname ↛ (Ccity, Cphone)
Cname ↛ C#
as there may exist customers with the same name.
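The rule "any two tuples with the same A value must have the same B value" can be tested mechanically against a given set of tuples. The following is a minimal sketch, assuming relations are held as lists of dictionaries; the function name `determines` is hypothetical, and the sample tuples reuse the customer details quoted in the text. Note that such a test can only refute a dependency for the data at hand. Whether A → B holds in general is a statement about the meaning of the data, not about one particular set of tuples.

```python
def determines(relation, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

customers = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]

print(determines(customers, ["C#"], ["Cname"]))               # True:  C# → Cname
print(determines(customers, ["Ccity"], ["Cname"]))            # False: Ccity ↛ Cname
print(determines(customers, ["Ccity", "Cphone"], ["Cname"]))  # True
```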
4.3.2 Functional Dependence
The role of determinants is also expressed as "functional dependencies" whereby we can
say:
"If an attribute A is a determinant of an attribute B, then B is said to be functionally
dependent on A"
and likewise
"Given a relation R, attribute B of R is functionally dependent on attribute A if and only
if each A-value in R has associated with it one B-value in R at any one time".
"C# is a determinant of Cname, Ccity and Cphone" is thus also "Cname, Ccity and
Cphone are functionally dependent on C#". Given a particular C# value, there
exists precisely one corresponding value for each of Cname, Ccity and Cphone. This is
more clearly seen via the following functional dependency diagram:
Figure 4.5: Functional dependencies in the Transaction relation
Similarly, "(C#, P#, Date) is a determinant of Qnt" is thus also "Qnt is functionally
dependent on the set of attributes (C#, P#, Date)". The set of attributes is also known as a
composite attribute.
Figure 4.6: Functional dependency on a composite attribute
4.3.3 Full Functional Dependence

"If an attribute (or a set of attributes) A is a determinant of an attribute (or a set of
attributes) B, then B is said to be fully functionally dependent on A"
and likewise
"Given a relation R, attribute B of R is fully functionally dependent on attribute A of R if
it is functionally dependent on A and not functionally dependent on any proper subset of A
(the subset condition only matters when A is composite)".
Figure 4.7: Functional dependencies in the Transaction relation
For the Transaction relation, we may now say that:
• Cname is fully functionally dependent on C#
• Ccity is fully functionally dependent on C#
• Cphone is fully functionally dependent on C#
• Qnt is fully functionally dependent on (C#, P#, Date)
• Cname is not fully functionally dependent on (C#, P#, Date); it is only partially
  dependent on it (and similarly for Ccity and Cphone).
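The distinction between full and partial dependence can be illustrated with a small check over every proper subset of the determinant. This is a sketch under stated assumptions: the helper names `determines` and `fully_determines` are hypothetical, and the four transaction tuples are invented sample data chosen so that no proper subset of (C#, P#, Date) determines Qnt.

```python
from itertools import combinations

def determines(relation, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def fully_determines(relation, lhs, rhs):
    """lhs determines rhs, and no proper non-empty subset of lhs does."""
    if not determines(relation, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(relation, list(subset), rhs):
                return False
    return True

# Hypothetical sample tuples of the Transaction relation.
transaction = [
    {"C#": 1, "Cname": "Codd", "P#": 1, "Date": "21 Jan", "Qnt": 20},
    {"C#": 1, "Cname": "Codd", "P#": 2, "Date": "21 Jan", "Qnt": 30},
    {"C#": 1, "Cname": "Codd", "P#": 1, "Date": "23 Jan", "Qnt": 25},
    {"C#": 2, "Cname": "Martin", "P#": 1, "Date": "21 Jan", "Qnt": 10},
]
key = ["C#", "P#", "Date"]

print(fully_determines(transaction, key, ["Qnt"]))       # True
print(fully_determines(transaction, key, ["Cname"]))     # False: C# alone suffices
print(fully_determines(transaction, ["C#"], ["Cname"]))  # True
```

As with any test on sample data, a True result only means that the tuples at hand do not contradict the dependency.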
Having understood about determinants and functional dependencies, we are now in a
position to explain the rules of the second and third normal forms.
4.4 Second Normal Form (2NF)
Consider again the Transaction relation which was in 1NF. Recall the operations we tried
to do in Section 4.1.2 above and the problems encountered:
1. Update
What happens if Codd's telephone number changes and we update only the first tuple
(but not the second)?
2. Insertion
If we wish to introduce a new customer, we cannot do so unless an appropriate
transaction is effected.
3. Deletion
If the data about a transaction is deleted, the information about the customer is also
deleted. If this happens to the last transaction for that customer the information about the
customer will be lost.
Clearly, the Transaction relation, although it is normalised to 1NF, still has storage
anomalies. The reason for such violations of the database's integrity and consistency
rules is the partial dependency on the primary key.
Recall the functional dependencies as shown in Figure 4-x. The determinant (C#, P#,
Date) is the composite key of the Transaction relation: its value will uniquely determine
the value of every other non-key attribute in a tuple of the relation. Note that whilst Qnt is
fully functionally dependent on all of (C#, P#, Date), Cname, Ccity and Cphone are only
partially functionally dependent on the composite key (as they each depend only on the
C# part of the key but not on P# or Date).
The problems are avoided by eliminating partial key dependence in favour of full
functional dependence, and we can do so by separating the dependencies as follows:
The source relation is thus split into two (or more) relations whereby each resultant
relation no longer has any partial key dependencies:
Figure 4.8: Relations in 2NF
We now have two relations, both of which are in the second normal form. These are the
same relations of Figure 4-3 above, and the discussion we had earlier clearly shows that
the storage anomalies caused by the 1NF relation have now been eliminated:
1. Update anomaly
There are no redundant/duplicate tuples in the relation, thus updates are done just at one
place without any worry for database inconsistencies.
2. Addition anomaly
Adding a new customer can be done in the Customer relation without concern whether
there is a transaction for a part or not.
3. Deletion anomaly
Deleting a tuple in Transaction does not cause loss of information about Customer
details.
More generally, we shall summarise by stating the following:
1. Suppose there is a relation R
where the composite attribute (K1, K2) is the Primary Key. Suppose also that there exist
the following functional dependencies:
(K1, K2) → I1 i.e. a full functional dependency on the composite key (K1, K2).
K2 → I2 i.e. a partial functional dependency on the composite key (K1, K2).
The partial dependencies on the primary key must be eliminated. The reduction of 1NF
into 2NF consists of replacing the 1NF relation by appropriate projections such that
every non-key attribute in each relation is fully functionally dependent on the primary
key of the respective relation. The steps are:
1. Create a new relation R2 from R. Because of the functional dependency K2 → I2,
R2 will contain K2 and I2 as attributes. The determinant, K2, becomes the key of R2.
2. Reduce the original relation R by removing from it the attribute on the right hand side
of K2 → I2. The reduced relation R1 thus contains all the original attributes but
without I2.
3. Repeat steps 1 and 2 if more than one functional dependency prevents the relation
from becoming a 2NF.
4. If a relation has the same determinant as another relation, place the dependent
attributes of the relation to be non-key attributes in the other relation for which the
determinant is a key.
Figure 4.9: Reduction of 1NF into 2NF
Thus, A relation R is in 2NF if it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key.
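The reduction by projection, and the claim that it is reversible, can be sketched as follows. This is an illustration only: the helpers `project`, `natural_join` and `as_set` are hypothetical names, the relations are lists of dictionaries, and the dates and quantities are invented sample values.

```python
def project(relation, attrs):
    """Keep only attrs, dropping duplicate tuples (first occurrence wins)."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

def natural_join(r, s):
    """Combine tuples of r and s that agree on all attributes they share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

def as_set(relation):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(t.items()) for t in relation}

# 1NF relation with key (C#, P#, Date); Cname and Ccity depend on C# only.
transaction_1nf = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "P#": 1, "Date": "21 Jan", "Qnt": 20},
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "P#": 2, "Date": "23 Jan", "Qnt": 30},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "P#": 1, "Date": "26 Jan", "Qnt": 20},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "P#": 2, "Date": "29 Jan", "Qnt": 20},
]

# Step 1: new relation holding the partial dependency C# → Cname, Ccity.
customer = project(transaction_1nf, ["C#", "Cname", "Ccity"])
# Step 2: original relation reduced by the partially dependent attributes.
transaction = project(transaction_1nf, ["C#", "P#", "Date", "Qnt"])

print(len(customer))  # 3: Codd's details are now stored once
# Joining the projections gives back the original relation.
print(as_set(natural_join(transaction, customer)) == as_set(transaction_1nf))  # True
```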
4.5 Third Normal Form (3NF)
A relation in the Second Normal Form can still be unsatisfactory and show further data
anomalies. Suppose we add another attribute, Salesperson, to the Customer relation,
recording the salesperson who attends to the needs of the customer.
Its associated functional dependencies are:
C# → Cname, Ccity, Cphone, Salesperson
Consider again operations that we may want to do on the data:
1. Update
Can we change the salesperson servicing customers in London? Here, we find that there
are several occurrences of London customers (e.g. Codd and Deen). Thus we must ensure
that we update all tuples such that 'Smith' is now replaced by the new salesperson, say
'Jones', otherwise we end up with a database inconsistency problem again.
2. Insertion
Can we enter data that Fatimah is the salesperson for the city of Sarawak although no
customer exists there yet?
As C# is the primary key, a null value in C# cannot be allowed. Thus the tuple cannot be
created.
3. Deletion
What happens if we delete the second tuple, i.e. we no longer need to keep information
about the customer Martin, his city, telephone and salesperson? If this tuple is removed,
we will also lose all information that the salesperson Ducruet services the city of Paris as
no other tuple holds this information.
As another, more complex example, consider keeping information about parts that are
being kept in bins, in the following relation called Stock
which contains information on:
• the bin number (B#)
• the part number (P#) of the part kept inside a given bin
• the quantity of pieces of the part in a given bin (QB)
• the lead time (LT) taken to deliver a part after an order has been placed for it
• the re-order level (RL) of the part number, which indicates that an order must be
  placed to re-order new stock of that part whenever the existing stock quantity gets
  too low, i.e. when QB ≤ RL
We further assume that:
• parts of a given part number may be stored in several bins
• the same bin holds only one type of part, i.e. it does not hold parts of more than
  one part number
• the lead time and re-order level are fixed for each part number
The only candidate key for this relation is B#, hence it must be selected as the primary
key. Since B# is a single attribute, the question of partial dependency does not arise
(the relation is in Second Normal Form).
Its associated functional dependencies (which are full functional dependencies) are:
B# → P#, QB, LT, RL
But in this case, we also have data anomalies:
1. Update
Suppose the re-order level for part number 1 is updated, i.e. RL for P# 1 must be changed
from 1 to 4. We must ensure that we update all tuples containing P# 1, i.e. tuples 1 and
2; any partial updates will lead to an inconsistent database.
2. Insertion
We cannot store LT and RL information for a new expected part number in the
database unless we actually have a bin number allocated to it.
3. Deletion
If the data (tuple) about a particular bin is deleted, the information about the part is
also deleted. If this happens to the last bin containing that part, the information about
the concrete part (LT, RL) will also be lost.
From the two examples above, it is still evident that despite having relations in 2NF,
problems can still arise and we should now try to eliminate them. It would seem we need
to further normalise them, i.e. we need a third normal form to eliminate these anomalies.
Scrutinising the functional dependencies of these examples, we notice the existence of
"other" dependencies:
Figure 4.10: All functional dependencies in the Customer relation
Notice for example that the dependency of the attribute Salesperson on the key C#, i.e.
C# → Salesperson
is only an indirect or transitive dependency, which is also indicated in the diagram as a
dotted arrow.
This is considered indirect because C# → Ccity and Ccity → Salesperson, and thus
C# → Salesperson.

Thus for the Stock relation:
B# → P#, QB and P# → LT, RL
then B# → P#, QB, LT, RL
Figure 4.11: All functional dependencies in the Stock relation
The Indirect Dependency obviously causes data duplication (e.g. note the two
occurrences of P# 1 and LT 12 in the first two tuples of Stock), which leads to the above
anomalies. This can be eliminated by removing all indirect/transitive dependencies. We
do this by splitting the source relation into two or more other relations, as illustrated in
the following example:
where we can then get
We can say that the storage anomalies caused by the 2NF relation can now be eliminated:
1. Update anomaly
To update the re-order level for part number 1, we need only change one (the first) tuple
in the new Part relation without concern for other duplicates that used to exist before.
2. Addition anomaly
We can now store LT and RL information for a new part number in the database by
creating a tuple in the new Part relation, without concern whether there is a bin number
allocated to it or not.
3. Deletion anomaly
Deleting the tuple about a particular bin will remove a tuple from the new Stock relation.
Should the deleted bin be the only bin where the part could be found, this
however does not mean the loss of data about that part. Information on the LT and RL of
the part is still in the new Part relation.
More generally, we shall summarise by stating the following:
Suppose there is a relation R with attributes A, B and C. A is the determinant.
If A → B and B → C
then
A → C is the 'Indirect Dependency'
(Of course, if A → C and B does not exist, then A → C is a 'Direct Dependency')
The Indirect Dependencies on the primary key must be eliminated. The reduction of 2NF
into 3NF consists of replacing the 2NF relation by appropriate projections such that
Indirect Key Dependencies are eliminated in favour of the Direct Key Dependencies.
The steps are:
1. Reduce the original relation R by removing from it the attribute on the right hand side
of any indirect dependencies A → C. The reduced relation R1 thus contains all the
original attributes but without C.
2. Form a new relation R2 that contains all attributes that are in the dependency B →
C.
3. Repeat steps 1 and 2 if more than one indirect dependency prevents the relation from
becoming a 3NF.
4. Verify that every determinant in every relation is a key in that relation.
Figure 4.12. Reduction of 2NF into 3NF
Thus, "A relation R is in 3NF if it is in 2NF and every non-key attribute is fully and
directly dependent on the primary key".
There is another definition of 3NF which states that "A relation is in third normal form if
every data item outside the primary key is identifiable by the primary key, the whole
primary key and by nothing but the primary key".
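The effect of removing the indirect dependency B# → LT, RL from the Stock relation can be sketched as below. The tuples are hypothetical sample data (the text only states that the first two tuples hold P# 1 with LT 12), and the names `stock_2nf`, `stock`, `part` and `project` are assumptions made for the illustration.

```python
def project(relation, attrs):
    """Keep only attrs, dropping duplicate tuples (first occurrence wins)."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# 2NF relation: B# → P#, QB directly, but LT and RL depend on B# only via P#.
stock_2nf = [
    {"B#": 1, "P#": 1, "QB": 100, "LT": 12, "RL": 1},
    {"B#": 2, "P#": 1, "QB": 50, "LT": 12, "RL": 1},
    {"B#": 3, "P#": 2, "QB": 30, "LT": 7, "RL": 2},
]

# Step 1: reduce the original relation by the indirectly dependent attributes.
stock = project(stock_2nf, ["B#", "P#", "QB"])
# Step 2: new relation for the dependency P# → LT, RL.
part = project(stock_2nf, ["P#", "LT", "RL"])

# Update: the re-order level of part 1 now lives in exactly one tuple.
for t in part:
    if t["P#"] == 1:
        t["RL"] = 4

# Deletion: removing bin 3, the only bin holding part 2, keeps the part's LT and RL.
stock = [t for t in stock if t["B#"] != 3]

print(part)   # two tuples: part 1 with RL 4, part 2 unchanged
print(stock)  # bins 1 and 2 only
```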
In order to avoid certain update anomalies, each relation declared in the database
schema should be at least in the Third Normal Form. Structurally, 2NF is better than
1NF, and 3NF is better than 2NF. There are of course other higher normal forms like the
Boyce-Codd Normal Form (BCNF), the Fourth Normal Form (4NF) and the Fifth
Normal Form (5NF).
However, the Third Normal Form is quite sufficient for most business database design
purposes, although some very specialised applications may require the higher-level
normalisation.
It must be noted that although normalisation is a very important database design
component, we should not always design in the highest level of normalisation, thinking
that it is the best. Often, at the physical implementation level, the decomposition of
relations into higher normal forms means that more pointer movements are required to
access the data, and thus the response time of the database system is slower. This may
conflict with the end-user demand for fast performance. The designer may sometimes have to
denormalise some portions of a database design in order to meet performance
requirements at the expense of data redundancy and its associated storage anomalies.
5 Relational Algebra (Part I)
5.1 Relational Algebra and Relational Calculus
Many different DMLs can be designed to express database manipulations. Different
DMLs will probably differ in syntax, but more importantly they can differ in the basic
operations provided. For those familiar with programming, these differences are not
unlike those found in different programming languages. There, the basic constructs
provided can greatly differ in syntax and in their underlying approach to specifying
computations. The latter can be very contrasting indeed: look at, for example, the
differences between procedural (e.g. C) and declarative (e.g. Prolog) approaches to
specifying computations.
Relational Algebra and Relational Calculus are two approaches to specifying
manipulations on relational databases. The distinction between them is somewhat
analogous to that between procedural and declarative programming. Thus, before looking
at the details of these approaches, it is instructive to briefly digress and look at procedural
and declarative programming a little more closely. We will then try to put into context
how the algebra and calculus help in the design and implementation of DMLs.
Briefly, the procedural approach specifies a computation to be performed as explicit
sequences of operations. The operations themselves are built from a basic set of primitive
operations and structuring/control primitives that build higher level operations. An
operation can determine the flow of control (i.e. determines the next operation to be
executed) and/or cause data to be transformed. Thus programming in the procedural style
involves working out the operations and their appropriate order of execution to effect the
desired transformation(s) on input data.
The declarative approach, in contrast, would specify the same computation as a
description of the logical relationship between input and output data. These relationships
are typically expressed as a set of truth-valued sentences about data objects. Neither
operations nor their sequences are made explicit by the programmer. Operations are
instead implicit in the predetermined set of rules of inference used to derive new
sentences from those given (or derived earlier).
In other words, if you used a procedural programming language, you must specify how
input is to be transformed to desired outputs using the basic operations available in the
language. However, if you used a declarative language, you must instead describe what
relationships must be satisfied between inputs and outputs, but essentially say nothing
about how a given input is to be transformed to the desired output (it is for the system to
determine how, typically by some systematic application of inference rules).
Amidst such differences, users are right to raise some basic concerns. Are these
differences superficial? Or do they mean that there can be computations specifiable in
one but not in the other? If the latter were the case, programmers must avoid choosing
languages that simply cannot express some desired computation (assuming we know
exactly the limitations of each and every programming language)! Fortunately for
programmers, this is not the case. It is a well-known result that every general-purpose
programming language (whether procedural or declarative) is equivalent to every other.
That is, if something is computable at all, it can be specified in any of these languages.
Such equivalence is established by reference to a fundamental model of computation [1]
that underlies the notion of computability.
Now what can we say about the different (existing and future) database languages? Can
two different languages be said to be equivalent in some sense? To answer this question
we must first ask whether, in relational databases, there is a fundamental model of
database manipulation. The answer is yes: Relational Calculus defines that model!
Let us first state a fundamental definition:
A language that can define any relation definable in relational calculus is
relationally complete
This definition provides a benchmark by which any existing (and future) language may
be judged as to its power of expression. Different languages may thus be equivalent in the
sense of being relationally complete [2].
Relational Calculus, as we shall see later, provides constructs (well-formed expressions)
to specify relations. These constructs are very much in a declarative style. Relational
Algebra, on the other hand, also provides constructs for relational database manipulations
but in a more procedural style. Moreover, it has also been established that Relational
Algebra is equivalent to Relational Calculus, in that every expression in one has an
equivalent expression in the other. Thus relational completeness of a database language
can also be established by showing that it can define any relation expressible in
Relational Algebra.
Why then should we bother with different languages and styles of expression if they are
all in some sense equivalent? The answer is that besides equivalence (relational or
computational), there are other valid issues that different language designs try to address
including the level of abstraction, precision, comprehensibility, economy of expression,
ease of writing specifications, efficiency of execution, etc. Declarative constructs, by
virtue of their descriptive nature (as opposed to the prescriptive nature of procedural
constructs), are closer to natural language and thus easier to write and understand.
Designers of declarative languages try to provide constructs that even end-users (with a
little training) can use to formulate, for example, ad hoc queries. Declarative constructs,
however, execute less efficiently than equivalent procedural ones. Attempts to make them
more efficient typically involve first, automatic translation to equivalent procedural form,
and second, optimising the resulting expressions according to some predetermined rules
of optimisation. In fact, this was the original motivation for the Algebra, i.e. providing a
set of operations to which expressions of the Calculus could be translated and
subsequently optimised for execution. But the efficiency of even this approach cannot
match that of carefully hand-coded procedural specifications. Thus for periodical and
frequently executed manipulations, it is more efficient to use algebraic forms of database
languages.
[1] The Universal Turing Machine (model of computation) is accepted as defining the class of all
computable functions. Every programming language shown to be equivalent to it is therefore
equivalent with every other.
[2] Note however, that relational completeness is not the same as computational completeness, i.e.
Relational Calculus is not equivalent to general-purpose programming languages. It is a
specialised calculus intended for the Relational Database Model. Thus while two languages may
be relationally complete, each may have features over and above that required for relational
completeness (but these need not concern us here).
Because of the fundamental nature and role of the Relational Algebra and Calculus in
relational databases, we will look at them in depth in the ensuing chapters. This will
provide readers with the basic knowledge of database manipulations possible in the
model. We begin in this chapter with Relational Algebra.
5.2 Overview of Relational Algebra
Relational Algebra comprises a set of basic operations. An operation is the application of
an operator to one or more source (or input) relations to produce a new relation as a
result. This is illustrated in Figure 5-1 below. More abstractly, we can think of such an
operation as a function that maps arguments from specified domains to a result in a
specified range. In this case, the domain and range happen to be the same, i.e. relations.
Figure 5.1 A Relational Algebraic Operation
This is no different in principle to, say, operations in arithmetic. For example, the 'add'
operation in arithmetic takes two numbers and produces a number as a result. In
arithmetic, we are used to writing expressions to denote operations and their results.
Thus, the 'add' operation is usually written using the '+' symbol (the operator) placed
between its two arguments, e.g. 145 + 168. Moreover, because an expression denotes the
result of the operation (which is of the same type as its input arguments), it itself can be
written as an argument in another operation, allowing us to construct complex
expressions to denote one result, e.g. 145 + 168 − 20 × 3.
Complex expressions that combine different operations are evaluated by a sequence of
reductions. The sequence, in the case of arithmetic expressions, is determined by the
familiar precedence of operators. Thus, the expression 145 + 168 − 20 × 3 would be
reduced as follows:
145 + 168 − 20 × 3 → 145 + 168 − 60 → 313 − 60 → 253
This default precedence can be overridden with the explicit use of parentheses. Thus,
(145 + (168 − 20)) × 3 → (145 + 148) × 3 → 293 × 3 → 879
All these would of course be elementary to the reader! The point though is that the basic
operations of relational algebra are defined to allow manipulations of relations in much
the same way that arithmetic operations manipulate numbers above. With appropriately
defined symbols to denote operators and syntax to denote the application of operators to
arguments (relations), relational expressions combining multiple operations can be
constructed to denote a resultant relation. And as with arithmetic expressions, (algebraic)
relational expressions are evaluated by reduction in some specified (default or explicit)
order. Thus the earlier statement that relational algebra is basically procedural in nature,
i.e. operations and their sequencing are explicitly specified to achieve a particular
transformation of input(s) to output.
The analogy with arithmetic above has been useful to highlight the basic nature of
relational algebra. However, the algebra's basic operations are much more complex than
those of arithmetic. They are in fact much more related to operations on sets (viz.
intersection, union, difference, etc). This is not surprising as relations are after all special
kinds of sets.
As mentioned above, a relational operation maps relations to a relation. As a relation is
completely defined by its intension and extension, the complete definition of a relational
operation must specify both the schema of the resultant relation and the rules for
generating its tuples from the input relation(s). In the following, we will do just that.
Moreover, in the interest of clarity as well as precision of presentation, we will define
each basic operation both informally and formally.
While notations used to denote the basic operators and operations may differ in the
literature, there is no disagreement in their basic logical definitions. It will be necessary
for us to use some concrete notation in what follows and, rather than introducing yet
more notations, we have chosen in fact to use Codd's original notation.
5.3 Selection
Selection is used to extract tuples from a relation. A tuple from the source relation is
selected (or not) based on whether it satisfies a specified predicate (condition). A
predicate is a truth-valued expression involving tuple component values and their
relationships. All tuples satisfying the predicate are then collected into the resultant
relation. The general effect is illustrated in Figure 5.2.
Figure 5.2 The Select Operation
For example, consider the Customer source relation below:
Customer
C#   Cname    Ccity    Cphone
1    Codd     London   2263035
2    Martin   Paris    5555910
3    Deen     London   2234391
The result of a selection operation applied to it with the condition that the attribute
Ccity must have the value London will be:
Result
C#   Cname    Ccity    Cphone
1    Codd     London   2263035
3    Deen     London   2234391
because these are the only tuples that satisfy the stated condition. Procedurally, you may
think of the operation as examining each tuple of the source relation in turn (say from top
to bottom), checking to see whether it meets the specified condition before turning
attention to the next tuple. If a tuple satisfies the condition, it is included in the
resultant relation; otherwise it is ignored.
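The procedural reading above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: representing a relation as a list of dictionaries (one per tuple) and the function name `select` are choices made here, not part of the algebra's notation.

```python
# Hypothetical in-memory copy of the Customer relation (one dict per tuple).
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]


def select(relation, predicate):
    """Return the tuples of `relation` for which `predicate` is true."""
    result = []
    for row in relation:       # examine each tuple in turn, top to bottom
        if predicate(row):     # keep it only if it satisfies the condition
            result.append(row)
    return result


# select Customer where Ccity = 'London' giving Result
london = select(customer, lambda t: t["Ccity"] == "London")
for row in london:
    print(row)
```

Every tuple in `london` keeps all four attributes of `customer`; only the number of rows changes.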
We shall use the following syntax to express a selection operation:
select <source-relation-name>
where <predicate>
giving <result-relation-name>
The <source-relation-name> must already have been defined in the database schema, or
be the name of the result of a previous operation.
In its simplest form, the <predicate> is a simple scalar comparison operation, ie.
an expression of the form
<value1> <comparator> <value2>
where <comparator> is any comparison operator (ie. =, <, >, ≤, ≥, etc). Each <value i>
denotes a scalar value and is either a valid attribute name in the source relation or a
constant value. If an attribute name is specified, it denotes the corresponding value in the
tuple under consideration. Thus, the example operation above could have been denoted
by the following construct:
select Customer where Ccity = 'London'
Of course, the arguments to a comparator must reduce to values from the same domain.
The <predicate> may also take the more complex form of a boolean combination of the
simple comparison operations above, using the boolean operators 'AND', 'OR' and
'NOT'.
The <result-relation-name> is a unique name that is associated with the result relation. It
can be viewed as a convenient abbreviation that can be used as a <source-relation-name>
in subsequent operations. Thus, the full selection specification corresponding to the
example above is:
select Customer where Ccity = 'London' giving Result
Note that the intension of the resultant relation is identical to that of the source relation.
In other words, the result of a selection has the same number of attributes (columns) and
the same attribute names (column labels) as the source relation. Its overall effect,
therefore, is to derive a 'horizontal' subset of the source relation.
As another example, consider the following relation. Each tuple represents a sales
transaction recording the customer number of the customer purchasing the goods (C#),
the product number of the goods sold (P#), the date of the transaction (Date) and the
quantity sold (Qnt).
Transaction
C#   P#   Date    Qnt
1    1    21.01   20
1    2    23.01   30
2    1    26.01   25
2    2    29.01   20
Suppose now that we are interested in looking at only those transactions which took place
before February 26 with quantities of more than 25 or involving customer number 2. This
need would be translated into the following selection:
select Transaction
where Date < 26.01 AND Qnt > 25 OR C# = 2
giving Result
(where Transaction is a valid relation name, Date is a valid attribute name and 26.01 is a
literal value) and would yield the relation:
Result
C#   P#   Date    Qnt
1    2    23.01   30
2    1    26.01   25
2    2    29.01   20
This example illustrates the use of boolean combinations in the <predicate>. However,
formulating complex predicates is not as simple and straightforward as it may seem. The
basic problem is having to deal with ambiguities at two levels.
First, the informal statement (typically in natural language) of the desired condition may
itself be ambiguous. The alert reader would have noted that the phrase (in the above
example)
"…only those transactions which took place before February 26
with quantities of more than 25 or involving customer number
2…"
has two possible interpretations:
a) a transaction is selected if it is before February 26 and its
quantity is more than 25, or it is selected if it involves
customer number 2
b) all selected transactions must be those that are before February
26 but additionally, each must either involve a quantity of
more than 25 or a customer number of 2 (or both)
Such ambiguities must be resolved before construction of the equivalent boolean
expression is attempted. In the above example, the first interpretation was assumed.
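To see that the two interpretations really do select different tuples, here is a small Python sketch applied to the Transaction relation above. The list-of-dictionaries representation and the names `reading_a` and `reading_b` are assumptions made for illustration; dates are kept as "DD.MM" strings, which compare correctly here only because all four dates fall in the same month.

```python
# Transaction relation from the example above (one dict per tuple).
transaction = [
    {"C#": 1, "P#": 1, "Date": "21.01", "Qnt": 20},
    {"C#": 1, "P#": 2, "Date": "23.01", "Qnt": 30},
    {"C#": 2, "P#": 1, "Date": "26.01", "Qnt": 25},
    {"C#": 2, "P#": 2, "Date": "29.01", "Qnt": 20},
]


def reading_a(t):
    # (before the 26th AND quantity over 25) OR customer number 2
    return (t["Date"] < "26.01" and t["Qnt"] > 25) or t["C#"] == 2


def reading_b(t):
    # before the 26th AND (quantity over 25 OR customer number 2)
    return t["Date"] < "26.01" and (t["Qnt"] > 25 or t["C#"] == 2)


# Identify each selected tuple by its (C#, P#) pair.
result_a = [(t["C#"], t["P#"]) for t in transaction if reading_a(t)]
result_b = [(t["C#"], t["P#"]) for t in transaction if reading_b(t)]
print("interpretation (a):", result_a)
print("interpretation (b):", result_b)
```

Interpretation (a) keeps three tuples, matching the Result relation shown earlier, whereas interpretation (b) keeps only the transaction of customer 1 for product 2.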
Second, the formal boolean expressions involving AND, OR and NOT may also be
ambiguous. The source of ambiguity is not unlike that for natural language (ambiguity of
strength or order of binding).
Thus
Cname = 'Codd' AND Ccity = 'London' OR Ccity = 'Paris'
may be interpreted as
a) a customer Codd who lives in either London or Paris
(ie. the OR binds stronger and before AND)
b) a customer Codd who lives in London, or any customer who
lives in Paris (ie. the AND binds stronger and before OR)
Because of its more formal nature, however, such ambiguity is easier to overcome. It
can in fact be eliminated through a convention of operator precedences and explicit
(syntactical) devices to override default precedences. The conventions used are as
follows:
1. Expressions enclosed within parentheses have greater
precedence (ie. bind stronger). Thus,
Cname = 'Codd' AND (Ccity = 'London' OR Ccity = 'Paris')
can only take interpretation (a) above.
2. The order of precedence of the boolean connectives, unless
overridden by parentheses, is (from highest to lowest) NOT,
AND, OR. Thus,
Cname = 'Codd' AND Ccity = 'London' OR Ccity = 'Paris'
can only take interpretation (b) above.
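Python's own `not`, `and` and `or` happen to follow the same relative precedence (NOT, then AND, then OR), so the two conventions can be tried out directly. The sketch below is illustrative only; the helper `names` and the dictionary representation of tuples are assumptions, not part of the notation used in this book.

```python
# Hypothetical in-memory copy of the Customer relation.
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London"},
]


def names(predicate):
    """Names of the customers whose tuple satisfies `predicate`."""
    return [t["Cname"] for t in customer if predicate(t)]


# Default precedence: AND binds before OR, giving interpretation (b).
default_binding = names(
    lambda t: t["Cname"] == "Codd" and t["Ccity"] == "London" or t["Ccity"] == "Paris"
)

# Parentheses override the default, giving interpretation (a).
parenthesised = names(
    lambda t: t["Cname"] == "Codd" and (t["Ccity"] == "London" or t["Ccity"] == "Paris")
)

print(default_binding)
print(parenthesised)
```

Under the default binding both Codd and Martin qualify (Martin because he lives in Paris); with the parentheses only Codd does.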
There is no limit to the level of 'nesting' of parentheses, ie. a parenthesised expression
may have within it a parenthesised subexpression, which may in turn have within it a
parenthesised sub-subexpression, etc. Given any boolean expression and the above
conventions, we can in fact construct a precedence tree that visually depicts its unique
meaning. Figure 5.3 illustrates this. A leaf node represents a basic comparison
operation, whereas a non-leaf node represents a boolean combination of its children. A
node deeper in the tree binds stronger than those above it. Alternatively, the tree can be
viewed as a prescription for reducing (evaluating) a boolean expression to a truth-value
(true or false). To reduce a node:
a) if it is a leaf node, replace it with the result of its associated comparison
operation
b) if it is a non-leaf node, reduce each of its children; then replace it with
the result of applying its associated boolean operation to the truth-values
of its children
Figure 5.3 A boolean expression and its precedence tree
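The reduction procedure can be sketched as a short recursive function. The encoding of the tree as nested Python tuples, the node labels ("CMP", "NOT", "AND", "OR") and the name `reduce_node` are all assumptions chosen for this illustration; the tree shown is the one for the unparenthesised expression discussed above, not necessarily the one drawn in Figure 5.3.

```python
import operator

# Comparators a leaf node may use (a deliberately small, assumed subset).
COMPARATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def reduce_node(node, row):
    """Reduce a precedence-tree node to a truth-value for one tuple `row`."""
    kind = node[0]
    if kind == "CMP":
        # Leaf: replace it with the result of its comparison operation.
        _, attribute, comparator, literal = node
        return COMPARATORS[comparator](row[attribute], literal)
    if kind == "NOT":
        return not reduce_node(node[1], row)
    # Binary non-leaf: reduce both children first, then combine them.
    left = reduce_node(node[1], row)
    right = reduce_node(node[2], row)
    if kind == "AND":
        return left and right
    if kind == "OR":
        return left or right
    raise ValueError(f"unknown node kind: {kind}")


# Cname = 'Codd' AND Ccity = 'London' OR Ccity = 'Paris'
tree = (
    "OR",
    ("AND", ("CMP", "Cname", "=", "Codd"), ("CMP", "Ccity", "=", "London")),
    ("CMP", "Ccity", "=", "Paris"),
)

martin = {"Cname": "Martin", "Ccity": "Paris"}
deen = {"Cname": "Deen", "Ccity": "London"}
print(reduce_node(tree, martin), reduce_node(tree, deen))
```

The AND node sits deeper in the tree than the OR node, which is how the tree records that AND binds more strongly.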
Using these simple conventions, we can check that the expressions we construct indeed
carry the intended meanings. (The reader can go back to the last example and ascertain that
the intended interpretation was indeed correctly captured in the predicate of the selection
statement.)
At this point, we should say a few words about the notation, particularly in the context of
the analogy to arithmetic expressions in the last section. Strictly speaking, the full
selection syntax above is not an expression that can be used as an argument to another
operation. This does not contradict the analogy, however. The selection syntax, in fact,
has an expression component comprising the select- and where-clauses only, ie. without
the giving-clause:
select <source-relation-name> where <predicate> giving <result-relation-name>
Thus,
select Customer where Ccity = 'London'
is an expression that completely specifies a selection operation while also denoting its
result, in much the same way that '2+3' completely specifies an addition operation while
also denoting the resultant number (ie. 5). The expression, therefore, can syntactically
occur where a relation is expected, and it would then be valid to write:
select (select Customer where Ccity = 'London') where C# < 3
Strictly speaking, this is all we need to define the selection operation. So, of what use is
the giving-clause? The answer, in fact, was alluded to earlier when we described the
clause as allowing us to introduce a convenient abbreviation. It is convenient, and useful,
especially in simplifying and making more readable what may otherwise be unwieldy and
confusing expressions. Even the simple double selection expression above may already
look unwieldy to the reader (imagine what the expression would look like if it involved
10 algebraic operations, say!). It would be clearer to write:
select Customer where Ccity = 'London' giving Result;
select Result where C# < 3 giving Result2
(of course, this operation could have been more simply and clearly written as a single
selection operation involving a boolean combination of comparison operations; the reader
should attempt to write it as an exercise)
Mathematical notation, too, has various devices for introducing abbreviations to simplify
expressions and make them more readable. What we are doing here with the giving-clause
is analogous to, for example, writing:
let x = 2+3
let y = 7−2
let z = (x−y) × (x+y)
instead of the unabbreviated "((2+3)−(7−2)) × ((2+3)+(7−2))". The giving-clause is thus
mainly a stylistic device. It is important to note that that is all it is: introducing a
temporary abbreviation to be used in another operation. In particular, it is not to be
interpreted as permanently modifying the database schema with the addition of a new
relation name.
In this book, we will favour this notational style because we think it leads to a simpler
and more comprehensible notation. The reader should note, however, that while other
descriptions in the literature may favour and use only the expression forms, the
differences are superficial.
Formal Definition
If $\sigma$ denotes a relation, then let
$S(\sigma)$ denote the finite set of attribute names of $\sigma$ (ie. its intension)
$T(\sigma)$ denote the finite set of tuples of $\sigma$ (ie. its extension)
$\mathrm{dom}(\alpha)$, where $\alpha \in S(\sigma)$, denote the set of allowable values for $\alpha$
$\tau \cdot \alpha$, where $\tau \in T(\sigma)$ and $\alpha \in S(\sigma)$, denote the value of attribute $\alpha$ in tuple $\tau$
The selection operation takes the form
select $\sigma$ where $\pi$ giving $\rho$
where $\pi$ is a predicate expression. The syntax of a predicate expression is given by the
following BNF grammar (this should be viewed as an abstract syntax not necessarily
intended for an end-user language):
pred_exp ::= comp_exp | bool_exp | ( pred_exp )
bool_exp ::= negated_exp | binary_exp
negated_exp ::= NOT pred_exp
binary_exp ::= pred_exp bool_op pred_exp
bool_op ::= AND | OR
comp_exp ::= argument comparator argument
comparator ::= > | < | ≥ | ≤ | =
argument [3] ::= attribute_name | literal
$\pi$ is well-formed iff it is syntactically correct and
a) for every attribute_name $\alpha$ in $\pi$, $\alpha \in S(\sigma)$
b) for every comp_exp $\alpha_1 \,\theta\, \alpha_2$ (where '$\theta$' denotes a comparator) such that
$\alpha_1, \alpha_2 \in S(\sigma)$, either $\mathrm{dom}(\alpha_1) \subseteq \mathrm{dom}(\alpha_2)$ or $\mathrm{dom}(\alpha_2) \subseteq \mathrm{dom}(\alpha_1)$
c) for every comp_exp $\alpha \,\theta\, \kappa$ or $\kappa \,\theta\, \alpha$ (where '$\theta$' denotes a comparator) such
that $\alpha \in S(\sigma)$ and $\kappa$ is a literal, $\kappa \in \mathrm{dom}(\alpha)$
Further, let $\pi(\tau)$ denote the application of a well-formed predicate expression $\pi$ to a tuple
$\tau \in T(\sigma)$. $\pi(\tau)$ reduces $\pi$ in the context of $\tau$, ie. the occurrence of any $\alpha \in S(\sigma)$ in $\pi$ is
first replaced by $\tau \cdot \alpha$. The resulting expression is then reduced to a truth-value according
to the accepted semantics of comparators and boolean operators.
Then $\rho$, the resultant relation of the selection operation, is characterised by the following:
$$S(\rho) = S(\sigma)$$
$$T(\rho) = \{\, \tau \mid \tau \in T(\sigma) \wedge \pi(\tau) \,\}$$
[3] The syntax of 'attribute_name' and 'literal' is unimportant in what follows and we leave it
unspecified.
5.4 Projection
Whereas a selection operation extracts rows of a relation meeting specified conditions, a
projection operation extracts specified columns of a relation. The desired columns are
simply specified by name. The general effect is illustrated in Figure 5.4.
Figure 5.4 The projection operation
We could think of selection as eliminating rows (tuples) not meeting the specified
conditions. In like manner, we can think of a projection as eliminating columns not
named in the operation. However, an additional step is required for projection because
removing columns may result in duplicate rows, which are not allowed in relations. Quite
simply, any duplicate occurrence of a row must be removed so that the result is a relation
(a desired property of relational algebra operators).
For example, using again the Customer relation:
Customer
C#   Cname    Ccity    Cphone
1    Codd     London   2263035
2    Martin   Paris    5555910
3    Deen     London   2234391
its projection over the attribute Ccity would yield (after eliminating all columns other
than Ccity):
Result
Ccity
London
Paris
London   (duplicate)
Note the duplication of row 1 in row 3. Projection can result in duplication because the
resultant tuples have a smaller degree whereas the uniqueness of tuples in the source
relation is only guaranteed for the original degree of the relation. For the final result to be
a relation, duplicated occurrences must be removed, ie.
Result
Ccity
London
Paris
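The two steps of projection (drop the unnamed columns, then remove duplicate rows) can be sketched as follows. As before, the list-of-dictionaries representation and the function name `project` are assumptions made for illustration only.

```python
# Hypothetical in-memory copy of the Customer relation.
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]


def project(relation, attributes):
    """Keep only `attributes` of each tuple, then drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}   # eliminate other columns
        if reduced not in result:                   # eliminate duplicate rows
            result.append(reduced)
    return result


# project Customer over Ccity giving Result
cities = project(customer, ["Ccity"])
print(cities)
```

Without the duplicate check the result would contain London twice and so would not be a relation.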
The form of a projection operation is:
project <source-relation-name>
over <list-of-attribute-names>
giving <result-relation-name>
Thus the above operation would be written as:
project Customer
over Ccity
giving Result
As with selection, <source-relation-name> must be a valid relation: a relation name
defined in the database schema or the name of the result of a previous operation.
<list-of-attribute-names> is a comma-separated list of at least one identifier. Each
identifier appearing in the list must be a valid attribute name of <source-relation-name>.
And finally, <result-relation-name> must be a unique identifier used to name the resultant
relation.
Why would we want to project a relation over some attributes and not others? Quite
simply, in a particular situation we are sometimes interested in only a subset of an
entity's attributes. Thus, if we needed to telephone all customers to inform them of
some new product line, data about a customer's number and city of residence are
superfluous. The relevant data, and only the relevant data, can be presented using:
project Customer
over Cname, Cphone
giving Result
Result
Cname    Cphone
Codd     2263035
Martin   5555910
Deen     2234391
Extending this example, suppose further that we have multiple offices sited in major
cities and the task of calling customers is distributed amongst such offices, ie. the office
in London will call up customers resident in London, etc. Now the simple projection
above will not do, because it presents customer names and phone numbers without regard
to their place of residence. If it were used by each office, customers would receive
multiple calls and you would probably have many annoyed customers on your hands, not to
mention the huge phone bills you would unnecessarily incur!
The desired relation in this case must be restricted to only customers from a given city.
How can we specify this? The simple answer is that we cannot, at least not with just the
projection operation. However, the alert reader would have realised that the requirement
to restrict resultant rows to only those from a given city is exactly the sort of requirement
that the selection operation is designed for! In other words, here we have an example of a
situation that needs a composition of operations to compute the desired relation. Thus, for
the office in London, the list of customers and phone numbers relevant to it is computed
by first selecting customers from London, then projecting the result over customer names
and phone numbers. This is illustrated in Figure 5.5. For offices in other cities, only the
predicate of the selection needs to be appropriately modified.
Note that the order of the operations is significant, ie. a selection followed by a
projection. It would not work the other way around (you can verify this by trying it out
yourself).
Figure 5.5 Combining operators to compute a desired relation
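The composition, and the reason its order matters, can be tried out with a small Python sketch. The helper functions `select` and `project` and the dictionary representation of tuples are assumptions made for illustration; in this sketch a missing attribute surfaces as a Python `KeyError`, which stands in for the predicate being ill-formed.

```python
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]


def select(relation, predicate):
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


# Selection first, then projection: the call list for the London office.
in_london = select(customer, lambda t: t["Ccity"] == "London")
call_list = project(in_london, ["Cname", "Cphone"])
print(call_list)

# Projection first removes Ccity, so the later selection cannot refer to it.
projected_first = project(customer, ["Cname", "Cphone"])
try:
    select(projected_first, lambda t: t["Ccity"] == "London")
    reverse_order_works = True
except KeyError:
    reverse_order_works = False
print("reverse order works:", reverse_order_works)
```

For another office, only the city in the selection predicate changes.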
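Formal Definition
As before, if $\sigma$ denotes a relation, then let
$S(\sigma)$ denote the finite set of attribute names of $\sigma$ (ie. its intension)
$T(\sigma)$ denote the finite set of tuples of $\sigma$ (ie. its extension)
$\tau \cdot \alpha$, where $\tau \in T(\sigma)$ and $\alpha \in S(\sigma)$, denote the value of attribute $\alpha$ in tuple $\tau$
The projection operation takes the form
project $\sigma$ over $\lambda$ giving $\rho$
where $\lambda$ is a comma-separated list of attribute names. Formally, $\lambda$ (as a discrete
structure) may be considered a tuple, but having a concrete enumeration syntax
(comma-separated list).
Let $S_{\mathrm{tuple}}(x)$ denote the set of elements in the tuple $x$. Then, $\lambda$ must observe the
following constraint:
$$S_{\mathrm{tuple}}(\lambda) \subseteq S(\sigma)$$
ie. every name occurring in $\lambda$ must be a valid attribute name in the relation $\sigma$.
Furthermore, if $\tau \in T(\sigma)$ and $\tau'$ denotes a tuple, we define:
$$R(\tau, \lambda, \tau') \equiv \tau \cdot \alpha \in S_{\mathrm{tuple}}(\tau') \Leftrightarrow \alpha \in S_{\mathrm{tuple}}(\lambda)$$
ie. a tuple element $\tau \cdot \alpha$ is in the tuple $\tau'$ if and only if the attribute name $\alpha$ occurs in $\lambda$.
Then $\rho$, the resultant relation of the projection, is characterised by the following:
$$S(\rho) = S_{\mathrm{tuple}}(\lambda)$$
$$T(\rho) = \{\, \tau' \mid \tau \in T(\sigma) \wedge R(\tau, \lambda, \tau') \,\}$$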
5.5 Natural Join
The next operation we will look at is the Natural Join (hereafter referred to simply as
Join). This operation takes two source relations as inputs and produces a relation whose
tuples are formed by concatenating tuples from each input source. It is basically a
cartesian product of the extensions of the input sources. However, not all possible
combinations of tuples necessarily end up in the result. This is because it implicitly
selects from among all possible tuple combinations only those that have identical values
in the attributes shared by both relations.
Figure 5.6 The Join combines two relations over one or more common domains
Thus, in a typical application of a Join, the intensions of the input sources share at least
one attribute name or domain (we assume here that attribute names are global to a
schema, ie. the same name occurring in different relations denotes the same attribute and
value domain). The Join is said to occur over such domain(s). Figure 5.6 illustrates the
general effect. The shaded left-most two columns of the inputs are notionally the shared
attributes. The result comprises these and the concatenation of the other columns from
each input. More precisely, if the degrees of the input sources are m and n respectively,
and the number of shared attributes is s, then the degree of the resultant relation is
(m+n−s).
As an example, consider the two relations below:
Customer
C#   Cname    Ccity    Cphone
1    Codd     London   2263035
2    Martin   Paris    5555910
3    Deen     London   2234391
Transaction
C#   P#   Date    Qnt
1    1    21.01   20
1    2    23.01   30
2    1    26.01   25
2    2    29.01   20
These relations share the attribute 'C#' (s = 1). To compute the join of these
relations, consider in turn every possible pair of tuples formed by taking one tuple from
each relation, and examine the values of their shared attribute. So if the pair under
consideration was
<1, Codd, London, 2263035> and <1, 1, 21.01, 20>
we would find that the values match exactly. In such a case, we concatenate them and add
the concatenation to the resultant relation. It doesn't matter if the second tuple is
concatenated to the end of the first, or the first to the second, as long as we are consistent
about it. By convention, we use the former. Additionally, we omit the second occurrence
of the shared attribute in the result (a repeated occurrence is superfluous). This gives us
the tuple
<1, Codd, London, 2263035, 1, 21.01, 20>
If, on the other hand, the pair under consideration was
<3, Deen, London, 2234391> and <1, 1, 21.01, 20>
we would ignore it because the values of their shared attribute do not match exactly.
Thus, the resultant relation after considering all pairs would be:
Result
C#   Cname    Ccity    Cphone    P#   Date    Qnt
1    Codd     London   2263035   1    21.01   20
1    Codd     London   2263035   2    23.01   30
2    Martin   Paris    5555910   1    26.01   25
2    Martin   Paris    5555910   2    29.01   20
The foregoing description is in fact general enough to admit operations on relations that
do not share any attributes at all (s = 0). The join, in such a case, is simply the cartesian
product of the input sources' extensions (the condition that tuple combinations have
identical values over shared attributes is vacuously true since there are no shared
attributes!). However, such uses of the operation are atypical.
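The pair-by-pair procedure just described can be sketched as a nested loop. The function name `natural_join`, its `over` parameter and the dictionary representation of tuples are assumptions made for illustration; the sketch is a direct transcription of the description, not an efficient join algorithm.

```python
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]
transaction = [
    {"C#": 1, "P#": 1, "Date": "21.01", "Qnt": 20},
    {"C#": 1, "P#": 2, "Date": "23.01", "Qnt": 30},
    {"C#": 2, "P#": 1, "Date": "26.01", "Qnt": 25},
    {"C#": 2, "P#": 2, "Date": "29.01", "Qnt": 20},
]


def natural_join(left, right, over):
    """Join two relations over the shared attribute names listed in `over`."""
    result = []
    for l in left:                                  # every possible pair ...
        for r in right:
            if all(l[a] == r[a] for a in over):     # ... whose shared values match
                combined = dict(l)                  # first tuple, then the second
                for name, value in r.items():
                    if name not in over:            # omit the repeated attribute
                        combined[name] = value
                result.append(combined)
    return result


joined = natural_join(customer, transaction, ["C#"])
for row in joined:
    print(row)
```

Each joined tuple has 4 + 4 − 1 = 7 attributes, in line with the (m+n−s) rule, and Deen does not appear because no transaction carries customer number 3.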
Syntactically, we will write the Join operation as follows:
join <source-relation-name1> AND <source-relation-name2>
over <attribute-name-list>
giving <result-relation-name>
where again
<source-relation-name i> is a valid relation name (in the schema or the result
of a previous operation)
<attribute-name-list> is a comma-separated non-empty list of attribute names,
each of which must occur in both input sources, and
<result-relation-name> is a unique identifier denoting the resultant relation
With this syntax, particularly with the over-clause, we have in fact taken the liberty
(1) to insist that the join must be over at least one shared attribute, ie. we disallow
expressions of pure cartesian products of two relations that do not share any
attribute. This restriction is of no practical consequence, however, as in practice a
Join is used to bring together information from different relations related through
some common value.
(2) to allow a join over a subset of shared attributes, ie. we relax (generalise) the
restriction that a Join is over all shared attributes.
If a Join is over a proper subset of the shared attributes, then the shared attributes not
specified in the over-clause will each have their own column in the result relation. In such
cases, the respective column labels will be qualified names. We will adopt the convention
of writing a qualified name as 'r.a', where a is the column label and r the name of the
relation in which a appears. As an illustration, consider the relations below:
R1
A1   A2   X
1    2    abc
1    3    def
2    4    ijk
R2
A1   A2   Y
2    3    pqr
2    2    xyz
The operation
join R1 AND R2 over A1 giving Result
will yield
Result
A1   R1.A2   X     R2.A2   Y
2    4       ijk   3       pqr
2    4       ijk   2       xyz
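A join over only some of the shared attributes needs one extra step: shared attributes left out of the over-clause are relabelled with qualified names. The sketch below illustrates this on R1 and R2; the function name `join_over`, the way relation names are passed in, and the assumption that each relation's attribute names can be read off its first tuple are all choices made for this illustration.

```python
r1 = [
    {"A1": 1, "A2": 2, "X": "abc"},
    {"A1": 1, "A2": 3, "X": "def"},
    {"A1": 2, "A2": 4, "X": "ijk"},
]
r2 = [
    {"A1": 2, "A2": 3, "Y": "pqr"},
    {"A1": 2, "A2": 2, "Y": "xyz"},
]


def join_over(left_name, left, right_name, right, over):
    """Join over `over`; other shared attributes get qualified labels."""
    left_attrs = set(left[0]) if left else set()
    right_attrs = set(right[0]) if right else set()
    # Shared attributes that are NOT joined on must be told apart by name.
    to_qualify = (left_attrs & right_attrs) - set(over)

    def label(relation_name, attribute):
        return f"{relation_name}.{attribute}" if attribute in to_qualify else attribute

    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in over):
                row = {label(left_name, a): v for a, v in l.items()}
                row.update(
                    {label(right_name, a): v for a, v in r.items() if a not in over}
                )
                result.append(row)
    return result


# join R1 AND R2 over A1 giving Result
result = join_over("R1", r1, "R2", r2, ["A1"])
for row in result:
    print(row)
```

Only the R1 tuple with A1 = 2 finds partners in R2, and both A2 columns survive under the labels R1.A2 and R2.A2.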
To see why Join is a necessary operation in the algebra, consider the following situation
(assume as context the Customer and Transaction relations above): the company decided
that customers who purchased product number 1 (P# = 1) should be informed that a fault
has been discovered in the product and that, as a sign of good faith and of how it values
its customers, it will replace the product with a brand new fault-free one. To do this, we
need to list, therefore, the names and phone numbers of all such customers.
First, we need to identify all customers who purchased product number 1. This
information is in the Transaction relation and, using the following selection operation, it
is easy to limit its extension to only such customers:
select Transaction where P# = 1 giving A
Next, we note that the resultant relation only identifies such customers by their customer
numbers. What we need, though, are their names and phone numbers. In other words, we
would like to extend each tuple in A with the customer name and phone number
corresponding to the customer number. As such items are found in the relation Customer,
which shares the attribute C# with A, the join is a natural operation to perform:
join A AND Customer over C# giving B
With B, we have practically derived the information we need; in fact, more than we
need, since we are interested only in the customer name (the Cname column) and phone
number (the Cphone column). But as we've learned, the irrelevant columns may be
easily removed using projection, as shown below.
project B over Cname, Cphone giving Result
As a final example, let us also assume we have the Product relation, in addition to the
Customer and Transaction relations:
Product
P#   Pname   Pprice
1    CPU     1000
2    VDU     1200
The task is to get the names of products sold to customers in London. Once again, this
task will require a combination of operations which must involve a Join at some point
because not all the information required are contained in one relation. The sequence of
operations required is shown below.
select Customer where Ccity = 'London' giving A
A
C#   Cname   Ccity    Cphone
1    Codd    London   2263035
3    Deen    London   2234391
join A AND Transaction over C# giving B
B
C#   Cname   Ccity    Cphone    P#   Date    Qnt
1    Codd    London   2263035   1    21.01   20
1    Codd    London   2263035   2    23.01   30
join B AND Product over P# giving C
C
C#   Cname   Ccity    Cphone    P#   Date    Qnt   Pname   Pprice
1    Codd    London   2263035   1    21.01   20    CPU     1000
1    Codd    London   2263035   2    23.01   30    VDU     1200
project C over Pname giving Result
Result
Pname
CPU
VDU
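The whole sequence can be replayed in Python, with each intermediate relation named as in the giving-clauses. The three helper functions and the dictionary representation of tuples are assumptions made for illustration; they mirror the informal descriptions of Select, Project and Join rather than any particular DBMS.

```python
customer = [
    {"C#": 1, "Cname": "Codd", "Ccity": "London", "Cphone": "2263035"},
    {"C#": 2, "Cname": "Martin", "Ccity": "Paris", "Cphone": "5555910"},
    {"C#": 3, "Cname": "Deen", "Ccity": "London", "Cphone": "2234391"},
]
transaction = [
    {"C#": 1, "P#": 1, "Date": "21.01", "Qnt": 20},
    {"C#": 1, "P#": 2, "Date": "23.01", "Qnt": 30},
    {"C#": 2, "P#": 1, "Date": "26.01", "Qnt": 25},
    {"C#": 2, "P#": 2, "Date": "29.01", "Qnt": 20},
]
product = [
    {"P#": 1, "Pname": "CPU", "Pprice": 1000},
    {"P#": 2, "Pname": "VDU", "Pprice": 1200},
]


def select(relation, predicate):
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, over):
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in over):
                result.append({**l, **{k: v for k, v in r.items() if k not in over}})
    return result


a = select(customer, lambda t: t["Ccity"] == "London")   # giving A
b = join(a, transaction, ["C#"])                         # giving B
c = join(b, product, ["P#"])                             # giving C
result = project(c, ["Pname"])                           # giving Result
print(result)
```

Deen is selected into A but drops out at the first join, since no transaction refers to customer number 3.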
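Formal Definition
As before, if $\sigma$ denotes a relation, then let
$S(\sigma)$ denote the finite set of attribute names of $\sigma$ (ie. its intension)
$T(\sigma)$ denote the finite set of tuples of $\sigma$ (ie. its extension)
$\tau \cdot \alpha$, where $\tau \in T(\sigma)$ and $\alpha \in S(\sigma)$, denote the value of attribute $\alpha$ in tuple $\tau$
Further, if $\tau_1$ and $\tau_2$ are tuples, let $\tau_1 \frown \tau_2$ denote the tuple resulting from appending
$\tau_2$ to the end of $\tau_1$.
We will also have need to use the terminology introduced in defining projection above, in
particular, $S_{\mathrm{tuple}}$ and the definition:
$$R(\tau, \lambda, \tau') \equiv \tau \cdot \alpha \in S_{\mathrm{tuple}}(\tau') \Leftrightarrow \alpha \in S_{\mathrm{tuple}}(\lambda)$$
The (natural) join operation takes the form
join $\sigma$ AND $\sigma'$ over $\lambda$ giving $\rho$
As with other operations, the input sources $\sigma$ and $\sigma'$ must denote valid relations that are
either defined in the schema or are results of previous operations, and $\rho$ must be a unique
identifier to denote the result of the join. $\lambda$ is a tuple of attribute names such that:
$$S_{\mathrm{tuple}}(\lambda) \subseteq (S(\sigma) \cap S(\sigma'))$$
Let $\mu = (S(\sigma) \cap S(\sigma')) - S_{\mathrm{tuple}}(\lambda)$, ie. the set of shared attribute names not specified in the
over-clause. We next define, for any relation r:
$$\mathrm{Rename}(r, \mu) \equiv \{\, \alpha \mid \alpha \in S(r) - \mu \;\vee\; (\alpha = \text{'r.p'} \wedge p \in S(r) \cap \mu) \,\}$$
In the case that $\mu = \{\}$ or $S(r) \cap \mu = \{\}$, $\mathrm{Rename}(r, \mu) = S(r)$.
The Join operation can then be characterised by the following:
$$S(\rho) = \mathrm{Rename}(\sigma, \mu) \cup \mathrm{Rename}(\sigma', \mu)$$
$$T(\rho) = \{\, \tau_1 \frown \tau_2 \mid \tau_1 \in T(\sigma) \wedge \tau \in T(\sigma') \wedge R(\tau, \lambda', \tau_2) \wedge \forall \alpha \in S_{\mathrm{tuple}}(\lambda) : \tau_1 \cdot \alpha = \tau \cdot \alpha \,\}$$
where
$$S_{\mathrm{tuple}}(\lambda') = S(\sigma') - S_{\mathrm{tuple}}(\lambda)$$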
6 Relational Algebra (Part II)
6.1 Introduction
In the previous chapter, we introduced relational algebra as a fundamental model of
relational database manipulation. In particular, we defined and discussed three important
operations it provides: Select, Project and Natural Join. These constitute what is called
the basic set of operators and all relational DBMS, without exception, support them.
We have presented examples of the power of these operations to construct solutions
(derived relations) to various queries. However, there are classes of practical queries for
which the basic set is insufficient. This is best illustrated with an example. Using again
the same example domain of customers and products they purchase, let us consider the
following requirement:
"Get the names of customers who had purchased both product number 1 and product
number 2"
Customer
C#   Cname    Ccity    Cphone
1    Codd     London   2263035
2    Martin   Paris    5555910
3    Deen     London   2234391
Transaction
C#   P#   Date    Qnt
1    1    21.01   20
1    2    23.01   30
2    1    26.01   25
2    2    29.01   20
All the required pieces of data are in the relations shown above. It is quite easy to see
what the answer is: from the Transaction relation, customers number 1 and number 2 are
the ones we are interested in and, cross-referencing the Customer relation (to retrieve
their names), the customers are Codd and Martin respectively. Now, how can we
construct this solution using the basic operation set?
Working backwards, the final relation we wish to construct is a single-column relation
with the attribute 'Cname'. Thus, the last operation needed will be a projection of some
relation over that attribute. Such a relation must first be the result of joining Customer
and Transaction (over 'C#'), since Customer alone does not have data on products
purchased. Second, it must contain only tuples of customers who had purchased products
1 and 2, ie. some form of selection must be applied. This analysis suggests that the
required sequence of operations is a Join, followed by a Select, and finally a Project.
The following then may be a possible solution:
join Customer AND Transaction over C# giving A
select A where P# = 1 AND P# = 2 giving B
project B over Cname giving Result
The join results in:
A
C#   Cname    Ccity    Cphone    P#   Date    Qnt
1    Codd     London   2263035   1    21.01   20
1    Codd     London   2263035   2    23.01   30
2    Martin   Paris    5555910   1    26.01   25
2    Martin   Paris    5555910   2    29.01   20
At this point, however, we discover a problem: the selection on A results in an empty
relation!
The problem is the selection condition: no tuple can possibly satisfy a condition that
requires a single attribute to have two different values ("P# = 1 AND P# = 2"). This is
obvious once it is pointed out, although it might not have been so at first glance. Thus
while the selection statement is syntactically correct, its logic is erroneous. What is
needed, effectively, is to select tuples of a particular customer only if there exists one
with P# = 1 and another with P# = 2, ie. the form of selection needed is dependent across
tuples. But the basic Select operator cannot express this because it operates on each tuple
in turn and independently of one another. [4]
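The difference between a per-tuple condition and a condition that spans several tuples can be made concrete with a short Python sketch. It works on a cut-down copy of relation A (only the C#, Cname and P# columns); the helper `bought_all` is an assumption introduced purely to show what a cross-tuple test looks like in ordinary code, and is not one of the algebra's operations.

```python
# Cut-down copy of the joined relation A.
joined = [
    {"C#": 1, "Cname": "Codd", "P#": 1},
    {"C#": 1, "Cname": "Codd", "P#": 2},
    {"C#": 2, "Cname": "Martin", "P#": 1},
    {"C#": 2, "Cname": "Martin", "P#": 2},
]

# Select tests each tuple on its own, so this condition can never hold.
per_tuple = [t for t in joined if t["P#"] == 1 and t["P#"] == 2]


def bought_all(relation, wanted):
    """Names of customers whose tuples, taken together, cover `wanted`."""
    products_by_customer = {}
    for t in relation:
        products_by_customer.setdefault(t["Cname"], set()).add(t["P#"])
    return [name for name, bought in products_by_customer.items() if wanted <= bought]


both = bought_all(joined, {1, 2})

# Replacing AND by OR is not a fix: drop Martin's purchase of product 2
# and the OR-based selection still reports him.
shorter = joined[:-1]
with_or = sorted({t["Cname"] for t in shorter if t["P#"] == 1 or t["P#"] == 2})
both_shorter = bought_all(shorter, {1, 2})

print(per_tuple, both, with_or, both_shorter)
```

The cross-tuple test has to gather all of a customer's tuples before it can decide, which is exactly what a tuple-at-a-time Select cannot do.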
Thus the proposed solution above is not a solution at all. In fact, no combination of the
basic operations can handle this query or other queries of this sort, for example:
Get the names of customers who bought the product CPU but not the
product VDU, or
Get the names of customers who bought every product type that the
company sells, etc
These examples suggest that additional operations are needed. In the following, we shall
present them and show how they are used.
We will round up this chapter and our discussion of relational algebra with a discussion
of two other important topics: how operations handle "null" values, and how sequences
of operations can be optimised for performance. A null value is inserted into a tuple field
to denote an (as yet) unknown value. Clearly, this affects the evaluation of conditions
involving attribute values. Exactly how will be explained in Section 6.4. Finally, we will
see that there may be several different sequences of operations that derive the same
result. In such cases, we may well ask which sequence is more efficient, ie. least costly or
better in performance, in some sense. A more precise notion of 'efficiency' of operators
and how a given operator sequence can be made more efficient will be discussed in
Section 6.5.
[4] Some readers may have noted that if OR was used instead of AND in the selection operation,
the desired result would be constructed. However, this is coincidental. The use of OR is logically
erroneous: it means one or the other, but not necessarily both. To see this, change the example
slightly by deleting the last tuple in Transaction and recompute the result (using OR). Your
answer would still be Codd and Martin, but the correct answer should be Codd alone!
6.2 Division
As the name of this operation implies, it involves dividing one relation by another.
Division is in principle a partitioning operation. Thus, 6 ÷ 2 can be paraphrased as
partitioning a single group of 6 into a number of groups of 2: in this case, 3 groups of 2.
The basic terminology used in arithmetic will be used here as well. Thus in an expression
like x ÷ y, x is the dividend and y the divisor. Division does not always yield whole
groups of the divisor, eg. 7 ÷ 2 gives 3 groups of 2 and a remainder group of 1. Relational
division too can leave remainders but, much like integer division, we ignore remainders
and focus only on constructing whole groups of the divisor.
The manner in which a relational dividend is partitioned is a little more complex. First,
though, we should ask what aspect of a relation is being partitioned. The answer simply
is the set of tuples in the relation. Next, we ask how we decide to group some tuples
together and not others. Not surprisingly, the basis for such decisions has to do with the
attribute values in the tuples. Let's take a look at an example first before we describe the
process more precisely.
R
A1   A2
1    a
1    b
2    c
2    b
2    a
3    c
R' (tuples of R grouped by the divisor {a,b})
A1   A2
1    a
1    b

2    a
2    b
Result (R ÷ {a,b})
A1
1
2
The illustration above shows how we may divide a relation R, which is a simple binary
relation in this case with two attributes A1 and A2. For clarity, the values of attribute A1
have been sorted so that a given value appears in contiguous rows (where there's more
than one). The question we're interested in is which of these values have in common an
arbitrary subset of values of attribute A2.
For example,
"which values of A1 share the subset {a,b} of A2?"
By inspecting R, the reader can verify that the answer is the values 1 and 2, because
only tuples with these A1 values have corresponding A2 entries of both 'a' and 'b'. Put
another way, the tuples of R are grouped by the common denominator or divisor {a,b}.
This is shown in the relation R', which holds the groups formed (one group for A1 value 1
and one for A1 value 2). Other tuples (the remainder of the division) are ignored. Note that R'
is not the final result of division; it is only an intermediate working result. The desired result
is the values of attribute A1 in it, or put another way, the projection of R' over A1.
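The grouping just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation is modelled as a set of (A1, A2) pairs, and the function name `divide` and its return shape are invented here, not part of the relational algebra notation.

```python
# Relation R from the illustration above, as a set of (A1, A2) tuples.
R = {(1, "a"), (1, "b"), (2, "c"), (2, "b"), (2, "a"), (3, "c")}

def divide(relation, divisor):
    """Divide a binary relation by a set of A2 values (the divisor)."""
    # Collect, for each A1 value, the set of A2 values it appears with.
    groups = {}
    for a1, a2 in relation:
        groups.setdefault(a1, set()).add(a2)
    # Intermediate working result R': whole groups of the divisor only.
    r_prime = {
        (a1, a2)
        for a1, values in groups.items()
        if divisor <= values          # a1 is paired with every divisor value
        for a2 in divisor
    }
    # Final result: the projection of R' over A1.
    result = {a1 for a1, _ in r_prime}
    return r_prime, result

r_prime, result = divide(R, {"a", "b"})
print(sorted(r_prime))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(sorted(result))   # [1, 2]
```

The tuples <2,c> and <3,c> never reach R', which is the sense in which the remainder of the division is ignored.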
From this example, we can see that a division of a relation R is performed over some
attribute of R. The divisor is a subset of values from that attribute domain and the result is
a relation comprising the remaining attributes of R. In relational algebra expressions, the
divisor is in fact specified by another relation D. For this to be meaningful at all, D must
have at least one attribute in common with R. The division is over the common
attribute(s) and the set of values used as the actual divisor are the values found in D. The
general operation is depicted in the figure below.
Figure 6-1 The Division Operation
Figure 6-2 shows a simple example of dividing a binary relation R1 by a unary relation
R2. The division is over the shared attribute I2. The divisor is the set {1,2,3}, these being
the values found in the shared attribute in R2. Inspecting the tuples of R1, the value 'a'
occurs in tuples such that their I2 values match the divisor. So 'a' is included in the result.
'b' is not, however, as there is no tuple <b,2>.
Figure 6-2 Division of a binary relation by a unary relation
We can now specify the form of the operation:
divide <dividend-relation-name> by <divisor-relation-name> giving <result-relation-name>
<dividend-relation-name> and <divisor-relation-name> must be names of defined
relations or results of previous operations. <result-relation-name> must be a unique name
used to denote the result relation. As mentioned above, the divisor must share attributes
with the dividend. In fact, we shall insist (on a stronger condition) that the intension of
the divisor must be a subset of the dividend's. This is not really a restriction as any
relation that shares attributes with the dividend can be turned into the required form
simply by projecting over them.
We can now show how division can be used for the type of queries mentioned in the
introduction. Take the query:
"Get the names of customers who bought every product type that the company sells"
The Transaction relation records customers who have ever bought anything. For this
query, however, we are not interested in the dates or purchase quantities but only in the
product types a customer purchased. So we project Transaction over C# and P# to give us
a working relation A. Next, we need all the product types the company sells, and these may
be obtained by projecting the relation Product over P# to give us a working relation B. Both
projections are shown in the following illustration.
Transaction                    Product
C#  P#  Date   Qnt             P#  Pname  Pprice
1   1   21.01  20              1   CPU    1000
1   2   23.01  30              2   VDU    1200
2   1   26.01  25
3   2   29.01  20
project Transaction over C#, P# giving A
project Product over P# giving B
A              B
C#  P#         P#
1   1          1
1   2          2
2   1
3   2
Now as we are interested in only those customers that purchased all products (ie. all the
values in B), B is thus used to divide A to result in the working relation C. In this case, there
is only one such customer. Finally, the details of the customer are obtained by joining C with
the Customer relation over C#.
divide A by B giving C
join Customer, C over C# giving Result
C          Result
C#         C#  Cname  Ccity   Cphone
1          1   Codd   London  2263035
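The project, divide and join steps of this query can be traced with a short Python sketch. The data is copied from the illustration; the Customer tuple for Martin is taken from later in this chapter, and the variable names and the use of plain tuples are assumptions made for the sketch only.

```python
# Source relations as lists of tuples.
transaction = [(1, 1, "21.01", 20), (1, 2, "23.01", 30),
               (2, 1, "26.01", 25), (3, 2, "29.01", 20)]
product = [(1, "CPU", 1000), (2, "VDU", 1200)]
customer = [(1, "Codd", "London", "2263035"),
            (2, "Martin", "Paris", "5555910")]

# project Transaction over C#, P# giving A
A = {(c, p) for c, p, _date, _qnt in transaction}
# project Product over P# giving B
B = {p for p, _name, _price in product}
# divide A by B giving C: keep C# values paired with every P# in B
C = {c for c, _ in A if B <= {p for c2, p in A if c2 == c}}
# join Customer, C over C# giving Result
result = [row for row in customer if row[0] in C]

print(sorted(C))  # [1]
print(result)     # [(1, 'Codd', 'London', '2263035')]
```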
Formal Definition
To formally define the Divide operation, we will use the notation introduced and used in
Chapter 5. However, for convenience, we repeat here the principal definitions to be used.
S_tuple(x) denotes the set of elements in tuple x
Furthermore, if τ ∈ T(σ), τ' denotes a tuple, and S_tuple(τ') ⊆ S(σ), we define:
R(τ, σ, τ') ≡ S_tuple(τ) ⊇ S_tuple(τ')
The Divide operation takes the form
divide σ by δ giving ρ
As with other operations, the input sources σ and δ must denote valid relations that are
either defined in the schema or are results of previous operations, and ρ must be a unique
identifier to denote the result of the division. The intensions of σ and δ must be such that
S(δ) ⊂ S(σ)
The Divide operation can then be characterised by the following:
S(ρ) ≡ S(σ) - S(δ)
T(ρ) ≡ { τ | τ1 ∈ T(σ) ∧ R(τ1, σ, τ) ∧ T(δ) ⊆ IM(τ) }
where
S_tuple(τ) = S(ρ),
S_tuple(t') = S(δ), and
IM(τ) = { t' | t ∈ T(σ) ∧ R(t, σ, t') ∧ R(t, σ, τ) }
6.3 Set Operations
Relations are basically sets. We should, therefore, be able to apply standard set operations
on them. To do this, however, we must observe a basic rule: a set operation on two or
more sets is meaningful if the sets comprise values of the same type. This is so that
comparison of values from different sets is meaningful. It is quite pointless, for example,
to attempt an intersection of a set of integers and a set of names. We can still perform the
operation, of course, but we can already tell at the outset that the result will be a null set
because any value from one will never be equal to any value from the other.
To ensure this rule is observed for relations, we need to state what it means for two
relations to comprise values of the same type. As a relation is a set of tuples, the values
we are interested in are the tuples themselves. So when is it meaningful to compare two
tuples for equality? Clearly, the structure of the tuples must be identical, ie. the tuples
must be of equal length and their corresponding elements must be of the same type. Only
then can two tuples be equal, ie. when their corresponding element values are equal. The
structure of a tuple, put another way, is in fact the intension or schema of the relation it
occurs in. Thus, meaningful set operations on relations require that the source relations
have identical intensions/schemas. Such relations are said to be union-compatible.
The set operations included in relational algebra are Union, Intersection, and Difference.
Keeping in mind that they are applied to whole tuples, these operations behave in exactly
the standard way. It goes without saying that their results are also relations with
intensions identical to the source relations.
The Union operation takes the form
<source-relation-1> union <source-relation-2> giving <result-relation>
where <source-relation-i> are valid relations or results of previous operations and are
union-compatible, and <result-relation> is a unique identifier denoting the resulting
relation.
Figure 6-3 illustrates this operation.
Figure 6-3 Relational Union Operation
The Intersection operation takes the form
<source-relation-1> intersect <source-relation-2> giving <result-relation>
Figure 6-4 illustrates this operation.
Figure 6-4 Relational Intersection Operation
The Difference operation takes the form
<source-relation-1> minus <source-relation-2> giving <result-relation>
Figure 6-5 illustrates this operation.
Figure 6-5 Relational Difference Operation
As an example of the need for set operations, consider the query: "which customers
purchased the product CPU but not the product VDU?"
The sequence of operations to answer this question is quite lengthy, but not difficult.
Probably the best way to construct a solution is to work backwards and observe that if we
had a set of customers who purchased CPU (say W1) and another set of customers who
purchased VDU (say W2), then the solution is obvious: we only want customers that
appear in W1 but not in W2, or in other words, the operation "W1 minus W2".
The problem now has been reduced to constructing the sets W1 and W2. Their
constructions are similar, the difference being that one focuses on the product CPU while
the other the product VDU. We show the construction for W1 below.
Transaction                    Product
C#  P#  Date   Qnt             P#  Pname  Pprice
1   1   21.01  20              1   CPU    1000
1   2   23.01  30              2   VDU    1200
2   1   26.01  25
3   2   29.01  20
join Transaction AND Product over P# giving X
X
C#  P#  Date   Qnt  Pname  Pprice
1   1   21.01  20   CPU    1000
1   2   23.01  30   VDU    1200
2   1   26.01  25   CPU    1000
3   2   29.01  20   VDU    1200
The above join operation is needed to bring the product name into the resulting relation.
This is then used as the basis of a selection:
select X where Pname = CPU giving Y1
Y1
C#  P#  Date   Qnt  Pname  Pprice
1   1   21.01  20   CPU    1000
2   1   26.01  25   CPU    1000
Y1 now has only tuples for customers that purchased the product CPU. As we are interested
only in the customers and not other details, we perform a projection:
project Y1 over C# giving Z1
Finally, details of such customers are obtained by joining Z1 and Customer, giving the
desired relation W1:
join Customer AND Z1 over C# giving W1
Z1         W1
C#         C#  Cname   Ccity   Cphone
1          1   Codd    London  2263035
2          2   Martin  Paris   5555910
The construction for W2 is practically identical to that above except that the selection
operation specifies the condition Pname = VDU. The reader may like to perform these
steps as an exercise and verify that W2 holds the details of the customers numbered 1 and 3.
Now we need only perform the difference operation "W1 minus W2 giving Result" to
construct a solution to the query:
Result
C#  Cname   Ccity   Cphone
2   Martin  Paris   5555910
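The same reasoning can be checked with Python's built-in set type, which offers union, intersection and difference directly. This is a sketch under assumptions: W1 and W2 are reduced to sets of C# values (customer details are left out), and the helper name `buyers` is hypothetical.

```python
# Transaction and Product as given in the example above.
transaction = [(1, 1, "21.01", 20), (1, 2, "23.01", 30),
               (2, 1, "26.01", 25), (3, 2, "29.01", 20)]
product = [(1, "CPU", 1000), (2, "VDU", 1200)]

def buyers(pname):
    """C# values of customers who bought the named product.

    Mirrors the join (over P#), select (on Pname) and project (over C#) steps.
    """
    return {c
            for c, p, _date, _qnt in transaction
            for p2, name, _price in product
            if p == p2 and name == pname}

W1 = buyers("CPU")
W2 = buyers("VDU")

print(sorted(W1 - W2))  # [2]        W1 minus W2
print(sorted(W1 | W2))  # [1, 2, 3]  W1 union W2
print(sorted(W1 & W2))  # [1]        W1 intersect W2
```

Customer 2 is Martin, in agreement with the result of "W1 minus W2".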
Formal Definition
The form of set operations is
σ <set operator> δ giving ρ
where <set operator> is one of 'union', 'intersect' or 'minus'; σ, δ are source relations
and ρ the result relation. The source relations must be union-compatible, ie. S(σ) = S(δ).
The set operations are characterised by the following:
S(ρ) = S(σ) = S(δ) for all <set operator>s
for 'union'
T(ρ) ≡ { t | t ∈ T(σ) ∨ t ∈ T(δ) }
for 'intersect'
T(ρ) ≡ { t | t ∈ T(σ) ∧ t ∈ T(δ) }
for 'minus'
T(ρ) ≡ { t | t ∈ T(σ) ∧ t ∉ T(δ) }
6.4 Null Values
In populating a database with data objects, it is not uncommon that some of these objects
may not be completely known. For example, in capturing new customer information
through forms that customers are requested to fill, some fields may have been left blank
(some customers may take exception to revealing their age or phone numbers!). In these
cases, rather than not have any information at all, we can still record those that we know
about. But what value do we insert into the unknown fields of data objects? Leaving a
field blank is not good enough as it can be interpreted as an empty string which may be a
valid value for some domains. We need a value that denotes 'unknown' and that cannot
be confused with valid domain values.
It is here that the Null value is used. We can think of it as a special value different from
any other value from any attribute domain. At the same time, we may think of it as
belonging to every attribute domain in the database, ie. it may appear as a value for any
attribute and not violate any type constraints. Syntactically, different DBMSs may use
different symbols to denote null values. For our purposes, we will use the symbol '?'.
How do null values affect relational operations? All relational operations involve
comparing values in tuples, including Projection (which involves comparison of result
tuples for duplicates). The key to answering this question is in how we evaluate boolean
operations involving null values. Thus, for example, what does "? > 5" evaluate to? The
unknown value could be greater than 5. But then again, it may not be. That is, the value
of the boolean expression cannot be determined on the basis of available information. So
perhaps we should consider the result of the comparison as unknown as well?
Unfortunately, if we did this, the relational operations we've discussed cease to be
well-defined! They all rely on comparisons evaluating categorically to one of two values:
TRUE or FALSE. For example, if the above comparison ("? > 5") was generated in the
process of selection, we would not know whether to include or exclude the associated
tuple in the result if we were to admit a third value (UNKNOWN). If we wanted to do
that, we must go back and redefine all these operations based on some form of
three-valued logic.
To avoid this problem, most systems that allow null values simply interpret any
comparison involving them as FALSE. The rationale is that even though they could be
true, they are not demonstrably true on the basis of what is known. That is, the result of
any relational operation conservatively includes only tuples that demonstrably satisfy
conditions of the operation. Adopting this convention, all the operations defined
previously still hold without any amendment. Some implications on the outcome of each
operation are considered below.
For the Select operation, an unknown value cannot identify a tuple. This is illustrated in
Figure 6-6 which shows two Select operations applied to the relation R1. Note that
between the two operations, the selection criteria range over the entire domain of the
attribute I2. One would expect therefore, that any tuple in R1 would either be in the result
of the first or the second. This is not the case, however, as the second tuple in R1 (<b,?>)
is not selected in either operation: the unknown value in it falsifies the selection criteria
of both operations!
Figure 6-6 Selecting over null values
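A small Python sketch of this convention follows. The data is hypothetical (the contents of Figure 6-6 are not reproduced here, apart from the tuple <b,?>), Python's `None` stands in for '?', and the helper names `gt` and `le` are invented for the illustration.

```python
NULL = None  # stands in for the '?' symbol

def gt(x, y):
    """x > y, evaluating to False whenever a null is involved."""
    return x is not NULL and y is not NULL and x > y

def le(x, y):
    """x <= y, evaluating to False whenever a null is involved."""
    return x is not NULL and y is not NULL and x <= y

# Hypothetical relation R1 with attributes (I1, I2).
R1 = [("a", 1), ("b", NULL), ("c", 7)]

# Two selections whose criteria together cover the whole domain of I2.
over_5 = [t for t in R1 if gt(t[1], 5)]
up_to_5 = [t for t in R1 if le(t[1], 5)]

print(over_5)   # [('c', 7)]
print(up_to_5)  # [('a', 1)]
```

The tuple ('b', None) appears in neither result, which is the behaviour described above.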
For Projection, tuples containing null values that are otherwise identical are not
considered to be duplicates. This is because the comparison "? = ?", by the above
convention, evaluates to FALSE. This leads to the situation as illustrated in Figure 6-7
below. The reader should note from this example that the symbol '?', while it denotes
some value much like a mathematical variable, is quite unlike the latter in that its
occurrences do not always denote the same value. Thus "? = ?" is not demonstrably true
and therefore considered FALSE.
Figure 6-7 Projecting over null values
In a Join operation, tuples having null values under the common attributes are not
concatenated. This is illustrated in Figure 6-8 ("?=1", "1=?" and "?=?" are all FALSE).
Figure 6-8 Joining over null values
In Division, the occurrence of even one null value in the divisor means that the result will
be an empty relation, as any value in the dividend's common attribute(s) will fail when
matched with it. This is illustrated in Figure 6-9 below. Note, however, that this is not
necessarily the case if only the dividend contains null values under the common
attribute(s): division may still be successful on tuples not containing null values.
Figure 6-9 Division with null divisors
In set operations, because tuples are treated as a single unit in comparisons, a single rule
applies: tuples otherwise identical but containing null values are considered to be
different (as was the case for Projection above). Figure 6-10 illustrates this for each set
operation. Note that because of the occurrence of null values, the tuples in R2 are not
considered duplicates of R1's tuples. Thus their union simply collects tuples from both
relations; subtracting R2 from R1 simply results in R1; and their intersection is empty.
Figure 6-10 Set operations involving null values
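The rule for set operations can be sketched in Python as follows. The relations R1 and R2 are hypothetical stand-ins for Figure 6-10, `None` plays the role of '?', and the function names are assumptions of this sketch. Relations are kept as lists because, under this convention, two null-bearing tuples that look alike must both be retained.

```python
NULL = None  # stands in for the '?' symbol

def same(t1, t2):
    """Tuple equality in which any comparison involving a null is False."""
    return len(t1) == len(t2) and all(
        a is not NULL and b is not NULL and a == b for a, b in zip(t1, t2))

def dedupe(tuples):
    """Drop only tuples that are demonstrably equal to an earlier one."""
    kept = []
    for t in tuples:
        if not any(same(t, u) for u in kept):
            kept.append(t)
    return kept

def union(r1, r2):
    return dedupe(r1 + r2)

def intersect(r1, r2):
    return dedupe([t for t in r1 if any(same(t, u) for u in r2)])

def minus(r1, r2):
    return dedupe([t for t in r1 if not any(same(t, u) for u in r2)])

# Hypothetical relations; every tuple of R2 contains a null.
R1 = [("a", NULL), ("b", 2)]
R2 = [("a", NULL), ("b", NULL)]

print(union(R1, R2))      # [('a', None), ('b', 2), ('a', None), ('b', None)]
print(minus(R1, R2))      # [('a', None), ('b', 2)]
print(intersect(R1, R2))  # []
```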
6.5 Optimisation
Each relational operation entails a certain amount of work: retrieving a tuple, examining a
tuple's attribute values, comparing attribute values, creating new tuples, repeating a
process on each tuple in a relation, etc. For a given operation, the amount of work clearly
varies with the cardinality of source relation(s). For example, a selection performed on a
relation twice the cardinality of another (of the same degree) would involve twice as
much work.
We can also compare the relative amount of work needed between different operations
based on the number of tuples processed. An operation with two source inputs, for
example, needs to repeat its logic on every possible tuple-pair formed by taking a tuple
from each input relation. Thus if we had two relations of cardinalities M and N
respectively, a total of M×N tuple-pairs must be processed, ie. M (or N) times more than,
say, a selection operation on each individual relation. Of course, this is not an exact
relative measure of work, as there are also differences in the amount of work expended
by different operations at the tuple level. By and large, however, we are interested in the
order of magnitude of work (rather than the exact amount of work) and this is fairly well
approximated by the number of tuples processed.
We will call such a measure the efficiency of an operation. Thus, the efficiency of
selection and projection is the cardinality of its single input relation, while the efficiency
of join, divide and set operations is the product of the respective cardinalities of their two
input relations.
Why should the efficiency of operations interest us? Consider the following sequence of
operations:
join Customer AND Transaction over C# giving X;
select X where Ccity = "London" giving Result
Suppose the cardinality of Customer was 100 and that of Transaction was 1000. Then the
efficiency of the join operation is 100×1000 = 100000. The cardinality of X is 1000 (as it
is certainly intended that the C# in every Transaction tuple matches a C# in one of the
Customer tuples). Therefore, the efficiency of the selection is 1000. As these two
operations are performed one after another, the efficiency of the entire sequence of
operations is naturally the sum of their individual efficiencies, ie. 100000+1000 =
101000.
Now consider the following sequence:
select Customer where Ccity = "London" giving X;
join X AND Transaction over C# giving Result
The reader can verify that this sequence is relationally equivalent to the first, ie. they
produce identical results. But how does its efficiency compare with that of the first? Let
us calculate using the same assumptions about the cardinalities. The efficiency of the
selection is 100. To estimate the efficiency of the join, we need to make an assumption on
the cardinality of X. Let's say that 10 customers live in London. Then the efficiency of
the join is 10×1000 = 10000, and the efficiency of the sequence as a whole is 100+10000
= 10100: ten times more efficient than the first!
Of course, the reader may think that the assumption about X's cardinality was contrived
to give this dramatic performance improvement. The point, however, is that the second
sequence can do no worse than the first, ie. if all customers in the Customer relation live
in London, then it performs as poorly as the first. More likely, however, we expect a
performance improvement.
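The arithmetic above is easy to reproduce. The following Python sketch simply encodes the tuple-count measure used in this section; the function names and parameters are invented for the illustration, and the figures are the cardinalities assumed in the text.

```python
def join_then_select(customers, transactions):
    """Efficiency of: join Customer AND Transaction, then select over X."""
    join_cost = customers * transactions   # every tuple-pair is examined
    select_cost = transactions             # X has one tuple per transaction
    return join_cost + select_cost

def select_then_join(customers, transactions, selected):
    """Efficiency of: select over Customer, then join X AND Transaction."""
    select_cost = customers
    join_cost = selected * transactions    # only the selected customers join
    return select_cost + join_cost

print(join_then_select(100, 1000))       # 101000
print(select_then_join(100, 1000, 10))   # 10100
# Worst case for the second sequence: every customer lives in London.
print(select_then_join(100, 1000, 100))  # 100100
```

Even in the worst case the second sequence processes slightly fewer tuples (100100 against 101000), which matches the claim that it can do no worse than the first.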
The above example illustrates a very important point about relational algebra: there can
be more than one (sequence of) expression that describes a desired result. The main aim of
optimisation, therefore, is to translate a given (sequence of) expression into its most
efficient equivalent form. Such optimisation may be done manually by a human user or
automatically by the database management system. Automatic optimisation may in fact
do better because the automatic optimiser has access to information that is not readily
available to a human optimiser, eg. current cardinalities of source relations, current data
values, etc. But the overwhelming majority of relational DBMSs available today merely
execute operations requested by users as is. Thus, it is important that users know how to
perform optimisations manually.
For manual optimisation, it is perhaps less important to derive the most efficient form of
a query than to follow certain guidelines, heuristics or rules-of-thumb that lead to more
efficient expressions. Frequently the latter will lead to acceptable performance and
expending more effort to find the optimal expression may not significantly improve that
performance if good heuristics are used. There is, in fact, a simple and effective rule to
remember when writing queries: delay as long as possible the use of expensive
operations! In particular, we should wherever possible put selection ahead of other
operations because it reduces the cardinality of relations. Figure 6-11 illustrates the
application of this principle. The reader should be able to verify that the two sequences of
operations are logically equivalent and that intuitively the selection operations before the
joins can significantly improve the efficiency of the query.
Figure 6-11 Delay expensive operations
7 Relational Calculus (Part I)
7.1 Introduction
We established earlier the fundamental role of relational algebra and calculus in relational
databases (see 5.1). More specifically, relational calculus is the basis for the notion of
relational completeness of a database language, ie. any language that can define any
relation expressible in relational calculus is relationally complete.
Relational Algebra (see chapters 5 and 6) is one such language. Its approach is
procedural, ie. it provides a number of basic operations on relations and successive
applications of these operations must be properly sequenced to derive answers to
database queries. The basic operators are in themselves quite simple and easy to
understand. However, except for fairly simple queries, the construction of operation
sequences can be quite complex. Furthermore, such constructions must also consider
efficiency issues and strive to find optimal ones (see 6.5). A considerable amount of
programming skill is therefore required to effectively use relational algebra.
Relational Calculus takes a different approach to the human-database interface. Rather
than requiring users to specify how relations are to be manipulated, it only requires them
to define what the desired result is. How the result is actually computed, ie. the operations
used, their sequencing and optimisation, is left to the database management system to
work out. As it doesn't deal with procedures (ie. sequencing of operations), this approach
is frequently termed non-procedural or declarative.
Relational Calculus is mainly based on the well-known Propositional Calculus, which is a
method of calculating with sentences or declarations. Such sentences or declarations, also
termed propositions, are ones for which a truth value (ie. "true" or "false") may be
assigned. These can be simple sentences, such as "the ball is red", or they may be more
complex involving one or more simple sentences, such as "the ball is red AND the
playing field is green". The truth value of complex sentences will of course depend on the
truth values of their components. This is in fact what the calculus 'calculates', using rules
for combining truth values of component sentences.
In Relational Calculus, the sentences we deal with are simpler and refer specifically to
the relations and values in the database of interest. Simple sentences typically take the
form of comparisons of values denoted by variables or constants, eg. X ≥ 3, X < Y, etc.
More complex sentences are built using logical connectives And ('&') and Or ('|'), eg. X
> 7 & X < Y | X ≤ 5. Simple and complex sentences like these are examples of
Well-Formed Formulae, which we will define fully later.
Regardless of their exact syntax, a formula is in principle a logical function with one or
more free variables. For purposes of illustration, we will write such functions as in the
following example:
F(X) =: X > 12 & X < 18
In the above example, there is one free variable, X. The value of the function can be
computed for specific instances of X. Thus,
F(15) ≡ (15 > 12 & 15 < 18) ≡ (true & true) ≡ true
F(10) ≡ (10 > 12 & 10 < 18) ≡ (false & true) ≡ false
Additionally, free variables are deemed to range over a set of permitted values, ie. only
such values can instantiate them. We shall see the significance of this later, as applied to
relations. But just to illustrate the concept for now, consider the following function over
two free variables:
F(X,Y) =: X > Y & Y < 12
Suppose X ranges over {8, 15} and Y ranges over {7,14}. Then F(8, 14) and F(15, 7) are
allowable instantiations of the function, with truth values false and true respectively,
whereas F(1000,200) is not a valid instantiation. Such restrictions of values over which
free variables range become significant when we interpret a formula as the simple query:
"get the set of values of free variables for which the formula evaluates to true". Thus, for
the above formula, we need only construct the following table involving only the
permitted values:
X    Y    F(X,Y)
8    7    true
8    14   false
15   7    true
15   14   false
The desired set of values can then be read from the rows where F(X,Y) evaluated to true,
ie. the set {(8,7), (15,7)}.
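Reading a formula as a query amounts to evaluating it over every permitted combination of its free variables. A minimal Python sketch of that table construction follows; the names `F`, `X_range` and `Y_range` are chosen here for the illustration.

```python
from itertools import product

def F(x, y):
    """The logical function F(X,Y) =: X > Y & Y < 12."""
    return x > y and y < 12

X_range = [8, 15]   # permitted values of X
Y_range = [7, 14]   # permitted values of Y

# One row per allowable instantiation, as in the table above.
table = [(x, y, F(x, y)) for x, y in product(X_range, Y_range)]
for row in table:
    print(row)

# The query result: instantiations for which the formula is true.
result = {(x, y) for x, y, value in table if value}
print(sorted(result))  # [(8, 7), (15, 7)]
```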
Relational Calculus is an application of the above ideas to relations. We will develop
these ideas in greater detail in the following sections.
7.2 Tuple Variables
Free variables in logical functions can in principle range over any type of value. A feature
that distinguishes Relational Calculus from other similar calculi is that the free variables
range over relations. More specifically, any free variable ranges over the extension of a
designated relation, ie. the current set of tuples in the relation. Thus, a free variable may
be instantiated with a tuple selected from the designated relation.
Suppose, for example, we introduced a variable C to range over the relation Customer, as
in Figure 7-1. Then C may be instantiated with any one of the three tuples at any one
time. The example shows C instantiated with the second tuple. Equivalently, we may
sometimes say that C 'holds' a value instead of being instantiated with that value [5]. In
any case, because variables like C range over tuples (or are only permitted to hold a tuple),
they are termed tuple variables.
Figure 7-1 A Tuple Variable C ranging over the Customer Relation
A tuple has component parts, and unless we have a means of referring to such parts, the
logical functions we formulate over relations will have limited expressive power. Given,
for example, two variables X and Y that range over two different relations with a
common domain, we may want to specify a condition where their current instantiations
are such that the values under the common domain are identical. Thus while X (and Y)
denote a tuple as a whole, we really wish to compare tuple component values. The
syntactic mechanism provided for this purpose takes the form:
<tuple-variable-name>.<attribute-name>
and is interpreted to mean the value associated with <attribute-name> in the current
instantiation of <tuple-variable-name>. Thus, assuming the instantiation of C as in Figure
7-1:
C.C# = 2
C.Cname = 'Martin'
…etc
This denotation of a particular data item within a tuple variable is often referred to as a
projection of the tuple variable over a domain (eg. "C.Cname" is a projection of tuple
variable C over the domain Cname).
Relational Calculus is a collection of rules of inference of the form:
<target list> : <logical expression>
where <target list> is a list of free variables and/or their projections that are referenced in
<logical expression>. This list is thought of as the "target list" because the set of
instantiations of the list items that makes <logical expression> true is the desired result.
In other words, an inference rule may be thought of as a query, and may be informally
understood as a request to find all variable instantiations that satisfy <logical expression>
and, for each such instantiation, to extract the data items mentioned in <target list>.
[5] This terminology may perhaps be favoured by programmers who are used to programming
language variables and to thinking about them as memory locations that can 'hold' one value at a
time.
For example, consider the inference rule in Figure 7-2. It references one free variable, C,
which ranges over Customer. The <target list> specifies items we are interested in - only
the phone number in this case - but only of those tuples that satisfy the <logical
expression>. In other words, the rule may be paraphrased as the query to "get the set of
phone numbers of customers who live in London". Note that the use of the variable C
both in <target list> and in <logical expression> denotes the same instantiation, thereby
ensuring that "C.Cphone" is extracted from the same tuple that satisfies the comparison
"C.Ccity = London". The computed set in this case would be {2263035, 2234391},
corresponding to the phone numbers in the first and last tuples - these being the only
tuples satisfying "C.Ccity = London".
Figure 7-2 An inference rule over the Customer relation
The reader should note the simplicity and declarative character of the inference rule,
which merely states the desired result (the <target list>) and the conditions that must be
satisfied (the <logical expression>) for a value to be included in the result. Contrast this
with relational algebra which would require the following construction:
select Customer where Ccity = "London" giving X
project X over Cphone giving Result
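A Python set comprehension has much the same declarative shape as an inference rule: a target expression, a variable ranging over a collection, and a condition. The sketch below is an analogy only. The Customer tuples follow the values quoted in the text, except that the C# and name of the third customer are not given in this section, so the values 3 and "(name not shown)" are placeholders; attribute names are adapted (`Cno` for C#) because '#' is not allowed in Python identifiers.

```python
from collections import namedtuple

Customer = namedtuple("Customer", "Cno Cname Ccity Cphone")

customers = [
    Customer(1, "Codd", "London", "2263035"),
    Customer(2, "Martin", "Paris", "5555910"),
    # Placeholder C# and name: only the city and phone appear in the text.
    Customer(3, "(name not shown)", "London", "2234391"),
]

# <target list> : <logical expression>
#  C.Cphone     :  C.Ccity = London
result = {C.Cphone for C in customers if C.Ccity == "London"}
print(sorted(result))  # ['2234391', '2263035']
```

The same variable C appears in both the target and the condition, so each phone number is taken from the very tuple that satisfied the comparison.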
The above example only used a single variable. However, a single variable can only
range over a single relation, while often the data items of interest are spread over more
than one relation. In such cases, we will need more than one tuple variable.
Figure 7-3 illustrates such a case involving two variables, P and T, ranging over
relations Product and Transaction respectively. Note that the inference rule at the top of
the figure
• lists items from both variables in the target list (ie. P.Pname, T.C#)
• compares in the logical expression projections of the two different variables over the
  same domain (T.P# = P.P#)
It further illustrates specific instantiations of each variable and evaluation of the logical
expression in the context of these instantiations. In this case, the logical expression is true
and therefore the items in the target list are extracted from the variables (shown in the
"result" table). It is important to note that a given inference, as in this illustration, is
entirely in the context of a specific instantiation of each tuple variable. It is meaningless,
for example, to evaluate "T.P# = P.P#" using one instance of P and "P.Pprice > 1000"
using another. The total number of inferences that can be attempted for any given rule is
therefore the product of the cardinality of each variable's range.
The inference rule in this example may be paraphrased as the query "find the customer
numbers and product names, priced at more than 1000, that they purchased". As an
exercise, the reader should attempt to construct this query in relational algebra (hint: it
will involve the basic operators Select, Project and Join).
Figure 7-3 Multiple variable inference
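The evaluation of a two-variable rule can be sketched in Python by attempting one inference per pairing of tuples. The relation contents below are borrowed from the Chapter 6 examples on the assumption that Figure 7-3 uses similar data; the loop counter `attempts` is added only to show the product-of-cardinalities count.

```python
from itertools import product as pairings

# (P#, Pname, Pprice) and (C#, P#, Date, Qnt), as in the Chapter 6 examples.
Product = [(1, "CPU", 1000), (2, "VDU", 1200)]
Transaction = [(1, 1, "21.01", 20), (1, 2, "23.01", 30),
               (2, 1, "26.01", 25), (3, 2, "29.01", 20)]

attempts = 0
result = set()
for P, T in pairings(Product, Transaction):
    attempts += 1                      # one inference per instantiation pair
    p_no, pname, pprice = P
    c_no, t_pno, _date, _qnt = T
    # Both comparisons use the same instantiation of P and of T.
    if t_pno == p_no and pprice > 1000:
        result.add((c_no, pname))      # target list: T.C#, P.Pname

print(attempts)        # 8
print(sorted(result))  # [(1, 'VDU'), (3, 'VDU')]
```

With 2 Product tuples and 4 Transaction tuples, 2×4 = 8 inferences are attempted.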
7.3 Quantifiers
Logical expressions may also include variable quantifiers, specifically:
1. the existential quantifier, denoted by the symbol '∃', and
2. the universal quantifier, denoted by the symbol '∀'
These quantifiers quantify variables. An existentially quantified variable, say x, is written
"∃x" and is read as "there exists an x such that…". A universally quantified variable is
written as "∀x" and is read as "for all x …".
Quantification is applied to a formula and is written preceding it. For example,
∃x (x < y & y < 12)
would be read as "there exists an x such that x is less than y and y is less than 12". The
formula to which the quantification is applied is called the scope of quantification.
Occurrences of quantified variables in the scope of quantification are said to be bound
(existentially or universally). The scope is normally obvious from the written
expressions, but if ambiguities might otherwise arise, we will use parentheses to delimit
scope.
Informally, the formula "∃x ( <expr> )" asserts that there exists at least one value of x
(from among its range of values) such that <expr> is true. This assertion is false only
when no value of x can be found to satisfy <expr>. On the other hand, if the assertion is
true, there may be more than one such value of x, but we don't care which. In other
words, the truth of an existentially quantified expression is not a function of the
quantified variable(s). (The truth of a quantified expression does depend, of course, on the
range of permitted values of the quantified variables.)
As an example, consider the unquantified expression
x < y & y < 12
and suppose x ranges over {4,15} and y over {7,14}. The truth table for the expression is:
x     y     x<y & y<12
4     7     true
4     14    false
15    7     false
15    14    false
Now consider the same expression but with x existentially quantified:
∃x (x < y & y < 12)
Since we don't care which value of x makes the expression true as long as there is at least
one, its truth depends only on the unbound variable y:
y     ∃x (x<y & y<12)
7     true
14    false
An existentially quantified expression therefore has a distinctly different meaning from
the same expression unquantified. In particular, when <logical expression> of an
inference rule is existentially quantified, it becomes a query on the free variables only,
since it is a function of only those variables.
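As a minimal sketch, the two truth tables for "x < y & y < 12" can be reproduced in Python,
using the ranges given in the text. The built-in any() plays the role of the existential
quantifier; the names X_RANGE, Y_RANGE and expr are introduced here purely for illustration.

```python
X_RANGE = [4, 15]   # range of x, as given in the text
Y_RANGE = [7, 14]   # range of y, as given in the text

def expr(x, y):
    """The unquantified expression x < y & y < 12."""
    return x < y and y < 12

# Unquantified: truth depends on both x and y.
unquantified = {(x, y): expr(x, y) for x in X_RANGE for y in Y_RANGE}

# Existentially quantified over x: truth depends on y only.
exists_x = {y: any(expr(x, y) for x in X_RANGE) for y in Y_RANGE}

for (x, y), value in unquantified.items():
    print(x, y, value)
for y, value in exists_x.items():
    print(y, value)
```

Note that the quantified table is keyed by y alone: the bound variable x no longer appears
as an input, which is exactly why the quantified expression acts as a query on the free
variables only.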
Figure 7.4 Product and Transaction relations with associated tuple variables
Consider, for example, the Product and Transaction relations in Figure 7-4, with tuple
variables P ranging over the former and T over the latter. The rule
P.Pname: ∃T (T.P# = P.P# And T.C# = 1)
is interpreted as the query to find values of the free variable P such that there exists at
least one value of the bound variable T satisfying the formula "T.P# = P.P# And T.C# =
1". As before, evaluation of the expression is in the context of some instantiations of the
variables. All possible values of P must be considered, but for each we need only find one
value of T to satisfy the expression. Once we have done that, other possible values of T, if
any, may be ignored. For example, with P instantiated to the first tuple of Product, we
consider in turn tuples of Transaction as values of T. We will find in fact that the first
already satisfies the expression and we may therefore ignore the others. With P set to the
second tuple, however, we will find no value for T to satisfy the expression. The result
for this example therefore is only one value for P, with P.Pname = CPU.
As another example, consider the relations in Figure 7-5 with associated tuple variables
as shown. Suppose we are interested in finding the names of customers who bought the
product CPU. That is, our target is the value X.Cname, but only if X is a customer who
has bought the product CPU. In other words, there must exist a Y such that "X.C# =
Y.C#". This would establish that X bought a product, denoted by Y.P# (the product
number). Furthermore, this product must be a CPU, ie. there must exist a Z such that
"Y.P# = Z.P#" and "Z.Pname = CPU". Thus, the rule corresponding to our query is
X.Cname: ∃Y ∃Z ( X.C# = Y.C# & Y.P# = Z.P# & Z.Pname = CPU )
The reader can verify that the answer satisfying this query is {Codd, Martin}.
Figure 7.5 Product, Customer and Transaction relations with associated tuple variables
Using again the relations in Figure 7-5, let's look at a more complex query: get the
names of customers who bought both the product CPU and the product VDU. At first glance,
this seems a simple extension of the above query:
X.Cname: ∃Y ∃Z ( X.C# = Y.C# & Y.P# = Z.P# &
Z.Pname = CPU & Z.Pname = VDU )
But the reader who remembers a similar example in section 6.1 would have noted a
problem. Specifically, the subexpression "Z.Pname = CPU & Z.Pname = VDU" can
never be true for a given value of Z: a field of a given tuple can only hold one value, so
only one or the other can be true but not both! Of course what we mean to specify is that
the customer purchased at least one product which is a CPU, and another which is a
VDU. Since a tuple variable can hold only one value at a time, this clearly cannot be
done using only one tuple variable. The solution, therefore, is to introduce additional
distinct variables to range over the same relation when more than one tuple is to be
considered at a time (note that relational calculus places no restriction on the number of
distinct variables that can range over a relation). For this particular example, we need
only introduce one additional variable each for the relations Transaction and Product
respectively, as shown in Figure 7-6. This will allow us to consider two separate
purchases at one time. The correct formulation, therefore, is:
X.Cname: ∃T1 ∃T2 ∃P1 ∃P2 ( X.C# = T1.C# & X.C# = T2.C# &
T1.P# = P1.P# & P1.Pname = CPU &
T2.P# = P2.P# & P2.Pname = VDU )
Figure 7-6 additionally shows particular values of these variables that satisfy our query.
Figure 7.6 Multiple variables ranging over a relation
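The difference between the two formulations can be sketched in Python, with any() over
nested loops standing in for the existential quantifiers. The relation contents are
hypothetical sample data chosen for illustration, not the tuples of the figures.

```python
# Hypothetical sample data; attribute names follow the text.
Customer = [
    {"C#": 1, "Cname": "Codd"},
    {"C#": 2, "Cname": "Martin"},
    {"C#": 3, "Cname": "Deen"},
]
Product = [{"P#": 1, "Pname": "CPU"}, {"P#": 2, "Pname": "VDU"}]
Transaction = [{"C#": 1, "P#": 1}, {"C#": 1, "P#": 2}, {"C#": 2, "P#": 1}]

# Single pair of bound variables (Y, Z): Z.Pname cannot equal both names at once,
# so the condition is unsatisfiable whatever the data.
naive = {
    X["Cname"]
    for X in Customer
    if any(
        X["C#"] == Y["C#"] and Y["P#"] == Z["P#"]
        and Z["Pname"] == "CPU" and Z["Pname"] == "VDU"
        for Y in Transaction for Z in Product
    )
}

# Two variables per relation (T1, T2 over Transaction; P1, P2 over Product),
# so two separate purchases can be considered at the same time.
correct = {
    X["Cname"]
    for X in Customer
    if any(
        X["C#"] == T1["C#"] and X["C#"] == T2["C#"]
        and T1["P#"] == P1["P#"] and P1["Pname"] == "CPU"
        and T2["P#"] == P2["P#"] and P2["Pname"] == "VDU"
        for T1 in Transaction for T2 in Transaction
        for P1 in Product for P2 in Product
    )
}

print(naive, correct)
```

With this sample data the single-variable version yields the empty set, while the
two-variable version returns the one customer who has a transaction for each product.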
Let's turn now to the universal quantifier. Informally, the formula "∀x ( <expr> )" asserts
that for every value of x (from among its range of values) <expr> is true. As with the
existential quantifier, the truth of a universally quantified expression is not a function
of the quantified variable(s). Consider, for example, the unquantified expression
x < y | y < 12
and suppose x ranges over {4,15} and y over {7,14}. The truth table for the expression is:
x     y     x<y | y<12
4     7     true
4     14    true
15    7     true
15    14    false
Now consider the same expression but with x universally quantified:
∀x (x < y | y < 12)
In a sense, like the existentially quantified variable, we don't care what the values of x
are, as long as every one of them makes the expression true for any given y. Thus its truth
table is:
y     ∀x (x<y | y<12)
7     true
14    false
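The same check can be sketched for the universal quantifier, with Python's built-in all()
standing in for '∀'. The ranges are those given in the text; the helper names are
illustrative only.

```python
X_RANGE = [4, 15]   # range of x, as given in the text
Y_RANGE = [7, 14]   # range of y, as given in the text

def expr(x, y):
    """The unquantified expression x < y | y < 12."""
    return x < y or y < 12

# Unquantified: one truth value per (x, y) pair.
unquantified = {(x, y): expr(x, y) for x in X_RANGE for y in Y_RANGE}

# Universally quantified over x: true for a given y only if every x satisfies it.
forall_x = {y: all(expr(x, y) for x in X_RANGE) for y in Y_RANGE}

print(unquantified)
print(forall_x)
```

For y = 14 the instantiation x = 15 fails the expression, and that single failure is enough
to make the universally quantified expression false.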
The universal quantifier will be needed for queries like the following:
"get the names of customers who bought every type of product"
Assume the relations as in Figure 7-5. The phrase "every type of product" clearly means
every tuple of the Product relation. However, the Product relation does not record
purchases, which are found only in the Transaction relation, ie. a product is purchased
(by someone) if there is a transaction recording its purchase. In other words, a customer
(ie. X) satisfies this query if for every product (ie. Z) there is a transaction (ie. Y)
recording its purchase by the customer. This can now be quite simply rewritten in the
calculus:
X.Cname: ∀Z ∃Y (X.C# = Y.C# & Y.P# = Z.P#)
Note that the different types of quantifiers can be mixed. But note also that their order is
significant, ie. ∀x ∃y ( <expr> ) is not the same as ∃y ∀x ( <expr> ). For example,
∀x ∃y ( y is the mother of x )
asserts that everyone has a mother. Whereas,
∃y ∀x ( y is the mother of x )
asserts that there is a single individual (y) who is the mother of everyone!
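The effect of quantifier order can be seen in a small sketch where nesting all() and any()
in different orders gives different answers. The names and the mother_of mapping are
invented for illustration; x ranges over the listed children and y over the listed mothers.

```python
# Hypothetical data: child -> mother
mother_of = {"Ann": "Eve", "Bob": "Eve", "Cat": "Fay"}
children = list(mother_of)          # range of x
mothers = ["Eve", "Fay"]            # range of y

# ∀x ∃y (y is the mother of x): each child has some mother.
forall_exists = all(
    any(mother_of[x] == y for y in mothers) for x in children
)

# ∃y ∀x (y is the mother of x): one individual is the mother of every child.
exists_forall = any(
    all(mother_of[x] == y for x in children) for y in mothers
)

print(forall_exists, exists_forall)
```

The outer quantifier corresponds to the outer loop: swapping the nesting changes which
variable must be fixed first, and hence the meaning of the formula.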
7.4 Well-Formed Formulae
Let us now be more precise about the valid forms of logical expressions involving tuple
variables. Such valid forms are called well-formed formulae (wff) and are defined as
follows:
1. A θ B is a wff
if A is a projection of a tuple variable,
B is a constant or a projection of a tuple variable, and
θ is one of the comparison operators: =, ≠, <, >, ≤, ≥
2. F1 & F2 and F1 | F2 are wffs if F1 and F2 are wffs
3. (F) is a wff if F is a wff
4. ∃x (F(x)) and ∀x (F(x)) are wffs if F(x) is a wff with a free occurrence of the
variable x.
The operator precedence for the '&' and '|' operators follows the standard precedence
rules, ie. '&' binds stronger than '|'. Thus,
'F1 & F2 | F3' is equivalent to '( F1 & F2 ) | F3'
Explicit use of parentheses, as in rule (3) above, is required to override this default
precedence. Thus if the intention is for the '|' operator to bind stronger in the above
expression, it has to be written as
F1 & (F2 | F3)
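Python's "and" and "or" happen to follow the same convention ("and" binds stronger than
"or"), so the precedence rule can be checked by brute force over all truth assignments.
This is only an illustrative sketch; F1, F2 and F3 here are plain Boolean values rather
than calculus formulae.

```python
from itertools import product

assignments = list(product([False, True], repeat=3))

# 'F1 & F2 | F3' parses the same as '(F1 & F2) | F3' for every assignment.
default_is_and_first = all(
    (F1 and F2 or F3) == ((F1 and F2) or F3) for F1, F2, F3 in assignments
)

# Assignments where forcing '|' to bind first changes the truth value.
differs = [
    (F1, F2, F3)
    for F1, F2, F3 in assignments
    if ((F1 and F2) or F3) != (F1 and (F2 or F3))
]

print(default_is_and_first, differs)
```

The two bracketings disagree exactly when F1 is false and F3 is true, which is why the
parentheses cannot be dropped when the second reading is intended.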
We can now be more specific about the form of a query in relational calculus:
(<target list>) : (<wff>)
As final examples for this chapter, consider the following queries:
Query 1: "Get the names, cities and phone numbers of customers who bought the
product CPU before the 25th of January"
Assume the tuple variables C, T and P ranging over relations Customer, Transaction and
Product respectively. The appropriate query is as follows:
(C.Cname, C.Ccity, C.Cphone) :
∃T ∃P (C.C# = T.C# & T.Date < 25.01 & T.P# = P.P# & P.Pname = CPU)
Query 2: "Get the names of products bought by customers living in London or by
the customer named Smith"
Assume tuple variables as in Query 1 above. The appropriate query is as follows:
(P.Pname) :
∃T ∃C ( T.P# = P.P# & T.C# = C.C# & ( C.Ccity = London | C.Cname = Smith ))
Note that the use of parentheses around the 'or' expression above is necessary.
8 Relational Calculus (Part II)
Relational Calculus, as defined in the previous chapter, provides the theoretical
foundations for the design of practical data sub-languages (DSL). In this chapter, we will
look at an example of one, in fact the first practical DSL based on relational calculus:
DSL Alpha.
Further to this, we will also look at an alternative calculus, still a relational calculus (ie.
relations are still the objects of the calculus) but based on Domain Variables rather than
Tuple Variables. Because of this, the relational calculus covered earlier is more
accurately termed Relational Calculus with Tuple Variables. The reader will recall that
Tuple Variables range over tuples of relations and were central in the formulation of
inference rules and in the definition of well-formed formulae. Domain Variables, on the
other hand, range over domain values rather than tuples and consequently require a
different construction of well-formed formulae. We will discuss this alternative in the
second part of this chapter.
8.1 The Data Sub-Language Alpha
DSL Alpha is directly based on relational calculus with tuple variables. It provides,
however, additional constructions that increase the query formulation power of the
language. Such constructions are in fact found in most practical DSLs in use today.
8.1.1 Alpha Command
DSL Alpha is a set of Alpha commands, each taking the form:
Get <workspace> (<target list>) : <WFF>
<workspace> is an identifier or label that names a temporary working relation to hold the
result of the command (similar to the named working relation in the giving clause of
relational algebra; see section 5.3). The attributes of this relation are specified by <target
list> which is a list of tuple variable projections as in the previous chapter. <WFF> is of
course a well-formed formula of relational calculus that must be satisfied before the
values in <target list> are extracted as a result tuple.
As an example, suppose the variable P ranges over the Product relation as shown in
Figure 8-1. Then the following construction is a valid Alpha command:
Get W(P.Pname) : P.Price ≤ 1000
Product
P#    Pname    Price
1     CPU      1000
2     VDU      1200
W
P.Pname
CPU
Figure 8.1 Example relations
The reader can see that except for the keyword Get and the naming of the result relation
(W in this example), the basic form is identical to the one used in the previous chapter,
which would simply be written
(P.Pname) : P.Price ≤ 1000
The semantics of the Alpha command is also exactly the same, except that the result is a
named relation, as shown in the illustration.
8.1.2 Range Statement
In our exposition of relational calculus, tuple variables used in queries were introduced
informally. We did this in the above example too (viz. "suppose the variable P ..."). This
will not do, of course, if we wish the language to be interpreted by a computer. Thus,
tuple variables must be introduced and associated with the relations over which they
range using formal constructions. In DSL Alpha, this is achieved by the range
declaration statement, which takes the basic form:
Range <relation name> <variable name>
where <relation name> must name an existing relation and <variable name> introduces a
unique variable identifier. The variable <variable name> is taken to range over <relation
name> upon encountering such a declaration. The above example can now be written
more completely and formally as:
Range Product P;
Get W(P.Pname) : P.Price ≤ 1000
DSL Alpha statements and commands, as the above construction shows, are separated by
semi-colons (';').
DSL Alpha also differs from relational calculus in the way it quantifies variables. First,
for a practical language, mathematical symbols like '∀' and '∃' need to be replaced by
symbols easier to key in. DSL Alpha uses the symbols 'ALL' and 'SOME' to stand for
'∀' and '∃' respectively. Second, rather than using the quantifiers in the <WFF>
expression, they are introduced in the range declarations. Thus, the full syntax of range
declarations is:
Range <relation name> <variable name> [ SOME | ALL ]
Note that the use of quantifiers in the declaration is optional. If omitted, the variable is
taken to be a free variable whenever it occurs in an Alpha command.
Let us look at a number of examples.
Query 1: "Get the names and phone numbers of customers who live in London"
Assume the Customer relation as in Figure 8-2. This query will only need a single free
variable to range over Customer. The Alpha construction required is:
Range Customer X;
Get WA( X.Cname, X.Cphone ): X.Ccity = London
Figure 8-2 also highlights the tuples in Customer satisfying the WFF of the command and the
associated result relation WA.
Figure 8.2 Query 1
Query 2: "Get the names of products bought by Customer #2"
For this query, we will need to access the Transaction relation, with records of which
customer bought which product, and the Product relation, which holds the names of
products. Assume these relations are as given in Figure 8-3.
The object of our query is the Pname attribute of Product, thus the tuple variable for
Product must necessarily be a free variable:
Range Product A;
The condition of the query requires us to look in the Transaction relation for a record of
purchase by Customer #2 - as long as we can find one such record, the associated product
is one that we are interested in. This is a clear case of existential quantification, and the
variable introduced to range over Transaction is therefore given by:
Range Transaction B SOME;
The Alpha command for the query can now be written:
Get W ( A.Pname ): A.P# = B.P# And B.C# = 2
The associated tuples satisfying the WFF above are highlighted in Figure 8-3 (the
result relation is not shown).
Figure 8.3 Query 2
Query 3: "Get the names and phone numbers of customers in London who bought the
product VDU"
This is a more complex example that will involve three relations, as shown in Figure
8-4. The target data items are in the Customer relation (names and phone numbers). So the
tuple variable assigned to it must be free:
Range Customer X;
Part of the condition specified is that the customer must live in London (ie. X.Ccity =
London), but the rest of the condition ("... who bought the product VDU") can only be
ascertained from the Transaction relation (record of purchase by some customer) and
Product relation (name of product). In both these cases, we are just interested in finding
one tuple from each, ie. that there exists a tuple from each relation that satisfies the query
condition. Thus, the variables introduced for them are given by:
Range Transaction Y SOME;
Range Product Z SOME;
The Alpha command can now be written as:
Get W( X.Cname, X.Cphone ):
X.Ccity = London And X.C# = Y.C# And Y.P# = Z.P# And Z.Pname = VDU
Figure 8-4 highlights one instantiation of each variable that satisfies the above WFF.
Figure 8.4 Query 3
Query 4: "Get the names of customers who bought all types of the company's
products"
As with the previous example, this one also requires access to three relations as shown in
Figure 8-5. A customer will satisfy this query if for every product there is a transaction
recording that he/she purchased it. This time, therefore, we have a case for universal
quantification ("... all types of the company's products"), which will require that the
variable ranging over Product be universally quantified. The variable for Transaction,
on the other hand, is existentially quantified ("... there is a transaction ..."). The full Alpha
construction therefore is:
Range Customer C;
Range Product P ALL;
Range Transaction T SOME;
Get W (C.Cname): P.P# = T.P# And T.C# = C.C#
Figure 8-5 highlights tuples from the various relations that satisfy this construction.
Note that the order of quantified variable declarations is important. The order above is
equivalent to "∀P ∃T". If variable T was declared before P, it would be equivalent to "∃T
∀P", which would mean something quite different! (see section 7.3)
Figure 8.5 Query 4
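To see why the declaration order matters, the Query 4 construction can be sketched in
Python, mapping "ALL" to all() and "SOME" to any() and nesting them in declaration order.
This mapping is only an illustration of the semantics described above, not an Alpha
interpreter, and the relation contents are hypothetical sample data.

```python
# Hypothetical sample data; attribute names follow the text.
Customer = [{"C#": 1, "Cname": "Codd"}, {"C#": 2, "Cname": "Martin"}]
Product = [{"P#": 1}, {"P#": 2}]
Transaction = [{"C#": 1, "P#": 1}, {"C#": 1, "P#": 2}, {"C#": 2, "P#": 1}]

def wff(C, P, T):
    """P.P# = T.P# And T.C# = C.C#"""
    return P["P#"] == T["P#"] and T["C#"] == C["C#"]

# Range Product P ALL; Range Transaction T SOME;   (equivalent to ∀P ∃T)
W = {
    C["Cname"]
    for C in Customer
    if all(any(wff(C, P, T) for T in Transaction) for P in Product)
}

# Declaring T before P instead would be equivalent to ∃T ∀P:
# a single transaction would have to cover every product.
swapped = {
    C["Cname"]
    for C in Customer
    if any(all(wff(C, P, T) for P in Product) for T in Transaction)
}

print(W, swapped)
```

With more than one product, no single transaction tuple can match every product number,
so the swapped order returns nothing for this data even though one customer did buy
every product.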
Query 5: "Get the name of the most expensive product"
This query involves only one relation: the Product relation (assume the Product relation
as in the above examples). Now, the most expensive product is that for which every
product has a price less than or equal to it. Or, in relational calculus, X is such a product
provided that ∀Y X.Price ≥ Y.Price. Thus two variables are required, both ranging
over Product but one of them is universally quantified:
Range Product X;
Range Product Y ALL;
Get W(X.Pname): X.Price ≥ Y.Price
It is perhaps interesting to note in passing that the choice by DSL Alpha designers to
quantify variables at the point of declaration rather than at the point of use makes Alpha
commands a little harder to read: it is not clear which variables are quantified just by
looking at the Alpha command. One must search for the variable declaration to see how,
if at all, it is quantified.
8.1.3 Additional Facilities
DSL Alpha provides additional facilities that operate on the results of its commands.
While these are outside the realm of relational calculus, they are useful and practical
functions that enhance the utility of the language. These facilities fall loosely under two
headings: qualifiers, and library functions.
The qualifiers affect the order of presentation of tuples in the result relation, based on the
ordering of values of a specified attribute in either an ascending or descending order, ie.
they may be thought of as sort functions over a designated attribute. Note that in
relational theory the order of tuples in a relation is irrelevant since a relation is a set of
tuples. So the qualifiers affect only the presentation of a relation.
Syntactically, the qualifier is appended to the WFF and takes the following form:
{ UP | DOWN } <attribute name>
As an example, consider the requirement for the names of products bought by Customer
#1 in descending order of their prices. The Alpha construction for this would be:
Range Product X;
Range Transaction Y SOME;
Get UWA( X.Pname, X.Price ): (X.P# = Y.P# And Y.C# = 1) DOWN X.Price
Figure 8-6 shows the relations highlighting tuples satisfying the WFF. It also shows the
result relation UWA which can be seen to be ordered in descending order of price.
Figure 8.6 Result of qualified command
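A qualifier can be thought of as a sort applied after the set of result tuples has been
computed. The sketch below illustrates this for products bought by Customer #1 in
descending order of price, using the Product tuples of Figure 8.1 and a small Transaction
relation assumed for illustration. The use of sorted() is our analogy for "DOWN", not a
description of how Alpha is implemented.

```python
Product = [
    {"P#": 1, "Pname": "CPU", "Price": 1000},
    {"P#": 2, "Pname": "VDU", "Price": 1200},
]
# Assumed transaction data (customer number, product number).
Transaction = [
    {"C#": 1, "P#": 1},
    {"C#": 1, "P#": 2},
    {"C#": 2, "P#": 1},
]

# The WFF alone yields an unordered set of (Pname, Price) tuples.
unordered = {
    (X["Pname"], X["Price"])
    for X in Product
    if any(X["P#"] == Y["P#"] and Y["C#"] == 1 for Y in Transaction)
}

# DOWN X.Price: present the tuples in descending order of price.
UWA = sorted(unordered, key=lambda row: row[1], reverse=True)

print(UWA)
```

The set comprehension carries no order at all; only the final presentation step imposes
one, which mirrors the remark that qualifiers affect presentation and not the relation
itself.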
The library functions, on the other hand, derive (compute) new values from the data
items extracted from the database. Another way to put this is that the result relation of the
basic Alpha command is further transformed by library functions to yield the final result.
Why would we want to do this? Consider for example that we have a simple set of
integers, say {1,2,3}. There are a variety of values we may wish to derive from it, such as
- the number of items, or cardinality, of the set (library function COUNT, ie.
  COUNT {1,2,3} = 3)
- the sum of the values in the set (library function TOTAL, ie. TOTAL {1,2,3} = 6)
- the minimum, or maximum, value in the set (library functions MIN and MAX, ie.
  MIN {1,2,3} = 1, or MAX {1,2,3} = 3)
- the average of values in the set (library function AVERAGE, ie.
  AVERAGE {1,2,3} = 2)
Extending this idea to relations, and in particular the Alpha command, library functions
are applied to attributes in the target list, taking the form:
<library function>(<attribute name>)
As an example, consider the need to find the number of customers who bought the
product VDU. This is quite a practical requirement to help management track how well
some products are doing on the market. Pure relational calculus, however, has no facility
to do this. But using the library function COUNT in DSL Alpha, we can write the
following:
Range Transaction T;
Range Product P SOME;
Get AAA( COUNT(T.C#) ): T.P# = P.P# And P.Pname = VDU
Figure 8-7 highlights the tuples satisfying the WFF and shows the result relation.
Figure 8.7 Using library function (COUNT)
As another example, suppose we wanted to know how many products were bought by the
customer Codd. The data items to answer this question are in the quantity field (Qnt) of
the Transaction relation, but pure relational calculus can only retrieve the set of quantity
values associated with purchases by Codd. What we need is the sum of these values. The
library function TOTAL of DSL Alpha allows us to do this:
Range Transaction T; Range Customer C SOME;
Get BBB( TOTAL( T.Qnt ) ): T.C# = C.C# And C.Cname = Codd
Figure 8-8 summarises the execution of this Alpha command.
Figure 8.8 Using library function (TOTAL)
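The library functions correspond closely to ordinary aggregate functions. As a rough
sketch, the set {1,2,3} and the TOTAL query can be mimicked in Python as follows. The
Customer and Transaction tuples are assumed sample data (the figure itself is not
reproduced here), and the mapping of COUNT, TOTAL, MIN, MAX and AVERAGE onto Python
built-ins is our analogy rather than part of DSL Alpha.

```python
from statistics import mean

values = {1, 2, 3}
summary = {
    "COUNT": len(values),
    "TOTAL": sum(values),
    "MIN": min(values),
    "MAX": max(values),
    "AVERAGE": mean(values),
}

# Assumed sample data; attribute names follow the text.
Customer = [{"C#": 1, "Cname": "Codd"}, {"C#": 2, "Cname": "Martin"}]
Transaction = [
    {"C#": 1, "P#": 1, "Qnt": 20},
    {"C#": 1, "P#": 2, "Qnt": 30},
    {"C#": 2, "P#": 1, "Qnt": 25},
    {"C#": 2, "P#": 2, "Qnt": 20},
]

# Get BBB( TOTAL(T.Qnt) ): T.C# = C.C# And C.Cname = Codd   (C declared SOME)
# The sum is taken over qualifying Transaction tuples, one Qnt per tuple,
# so equal quantities in different tuples are each counted.
BBB = sum(
    T["Qnt"]
    for T in Transaction
    if any(T["C#"] == C["C#"] and C["Cname"] == "Codd" for C in Customer)
)

print(summary, BBB)
```

One design point worth noting: summing over qualifying tuples rather than over a set of
bare Qnt values avoids collapsing two purchases that happen to have the same quantity.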
As a final remark, we note that we have only sampled a few library functions. It is not
our aim to cover DSL Alpha comprehensively, but only to illustrate real DSLs based on
the relational calculus, and to look at added features or facilities needed to turn them into
practical languages.
8.2 Relational Calculus with Domain Variables
8.2.1 Domain Variables
As noted in the introduction, there is an alternative to using tuple variables as the basis
for a relational calculus, and that is to use domain variables. Recall that a domain (see
section 2.2) in the relational model refers to the current set of values of a given kind
under an attribute name and is defined over all relations in the database, ie. an attribute
name denotes the same domain in whatever relation it occurs. A domain variable ranges
over a designated domain, ie. it can be instantiated to, or hold, any value from that
domain.
For example, consider the domain Cname found in the Customer relation. This domain
has three distinct values as shown in Figure 8-9. If we now introduced a variable, 'Cn',
and designate it to range over Cname, then Cn can be instantiated to any of these values
(the illustration shows it holding the value 'Martin').
Figure 8.9 A Domain Variable
As with tuple variables:
- a domain variable can hold only one value at any time
- domain variables can be introduced for any domain in the database
- more than one domain variable may be used to range over the same domain
Note also that the value of a domain variable is an atomic value, ie. it does not comprise
component values as was the case with tuple variables. Thus there is no need for any
syntactic mechanism like the 'dot notation' to denote component atomic values of tuple
variables. It also means that in constructing simple comparison expressions, domain
variables appear directly without any embellishments, eg. A > 1000, B = London, C ≤
2000, D ≠ Paris, etc. (assuming of course that the variables A, B, C and D have been
designated to range over appropriate domains).
In a relational calculus with domain variables we can write predicates of the form:
<relation name>( x1, ... , xn )
where
- <relation name> is the name of a relation currently defined in the database schema,
  and
- each xi is a domain variable ranging over a domain from the intension of <relation
  name>
Thus, suppose we have the situation as in Figure 8-10. It is then syntactically valid to
write:
Customer( A, B )
as 'Customer' is a valid relation name, and the variables 'A' and 'B' range over domains
that are in the intension of the Customer relation.
Figure 8.10 Variables ranging over domains of a relation
The meaning of such a predicate can be stated as follows:
a predicate "<relation name>( x1, ... , xn )" is true for some given
instantiation of each variable xi if and only if there exists a tuple in <relation
name> that contains corresponding values of the variables x1, ... , xn
Thus, for example, Customer(A,B) is true when A = Codd and B = London, since the first
tuple of Customer has the corresponding values. In contrast, Customer(A,B) is false when
A = Codd and B = Paris, as no tuple in Customer has these values. In fact, the values that
make Customer(A,B) true are:
Cname     Ccity
Codd      London
Martin    Paris
Deen      London
that is, in relational algebra terms, a projection of <relation name> over the domains that
variables x1, ... , xn range over.
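The meaning of a predicate such as Customer(A,B) can be sketched as a membership test
against a projection. The (Cname, Ccity) pairs below are those listed in the text; the
customer numbers are assumed for illustration, and the function name customer is ours.

```python
# (C#, Cname, Ccity); the C# values are assumptions, the name/city pairs are from the text.
Customer = [
    (1, "Codd", "London"),
    (2, "Martin", "Paris"),
    (3, "Deen", "London"),
]

# Projection of Customer over the domains that A and B range over.
name_city = {(cname, ccity) for _, cname, ccity in Customer}

def customer(a, b):
    """Customer(A,B): true iff some tuple holds the value a for Cname and b for Ccity."""
    return (a, b) in name_city

print(customer("Codd", "London"), customer("Codd", "Paris"))
```

The set name_city is exactly the table of instantiations that make the predicate true, so
evaluating the predicate reduces to a lookup.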
A query in relational calculus with domain variables takes the form:
(<target list>) : (<logical expression>)
where
- <target list> is a comma-separated list of domain variable names, and
- <logical expression> is a truth-valued expression involving predicates and
  comparisons over domain variables and constants (the rules for constructing
  well-formed <logical expression>s will be detailed later)
The result of such a query is a set of instantiations of variables in <target list> that make
<logical expression> true.
For example, consider the database state in Figure 8-11 and the query
(x,y) : (Product(x,y) & y > 1000)
which can be paraphrased as "get product names and their prices for those products
costing more than 1000".
Figure 8.11 Database state for the query "(x,y) : (Product(x,y) & y > 1000)"
The only (x,y) instantiation satisfying the logical expression in this case is
(VDU,1200), ie. the result of the query is
x      y
VDU    1200
Domain variables, like tuple variables, may also be quantified with either the universal or
existential quantifier. Expressions involving quantified domain variables are interpreted
in the same way as for quantified tuple variables (see section 7.3).
Consider the query: "get the names of products bought by customer #1". The required
data items are in two relations: Product and Transaction, as follows.
Product
P#    Pname    Price
1     CPU      1000
2     VDU      1200
Transaction
C#    P#    Date     Qnt
1     1     21.01    20
1     2     23.01    30
2     1     26.01    25
2     2     29.01    20
We can paraphrase the query to introduce variables and make it easier to formulate the
correct formal query:
x is such a product name if there is a product number y for x and there is a customer
number z that purchases y and z is equal to 1
The phrase "x is such a product name" makes it clear that it is a variable for the 'Pname'
domain, and as this is our target data value, x must be a free variable. The phrase "there is
a product number y for x" clarifies two points: (1) that y is a variable for the P# domain,
and (2) that its role is existential. Similarly, the phrase "there is a customer number z that
purchases y" states that (1) z is a variable for the domain C#, and (2) its role is
existential. This can now be quite easily rewritten as the formal query (assuming the
variables x, y and z range over Pname, P# and C# respectively):
(x) : ∃y ∃z (Product(x,y) & Transaction(y,z) & z = 1)
where the subexpressions
- Product(x,y) captures the condition "there is a product number y for x"
- Transaction(y,z) captures the condition "there is a customer number z that purchases
  y", and
- z = 1 clearly requires that the customer number is 1
The reader should be able to work out the solution to the query as an exercise.
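Readers who wish to check their answer mechanically may find the following sketch
useful. It evaluates the query over the Product and Transaction tuples listed above,
treating each predicate as a membership test and each existential quantifier as any()
over the corresponding domain. The helper names are ours and purely illustrative.

```python
# Tuples as listed in the text.
Product = [(1, "CPU", 1000), (2, "VDU", 1200)]            # (P#, Pname, Price)
Transaction = [                                            # (C#, P#, Date, Qnt)
    (1, 1, "21.01", 20),
    (1, 2, "23.01", 30),
    (2, 1, "26.01", 25),
    (2, 2, "29.01", 20),
]

# Domains the variables range over: x over Pname, y over P#, z over C#.
pname_dom = {p[1] for p in Product}
pno_dom = {p[0] for p in Product}
cno_dom = {t[0] for t in Transaction}

def product_pred(x, y):
    """Product(x,y): some Product tuple has Pname x and P# y."""
    return any(p[1] == x and p[0] == y for p in Product)

def transaction_pred(y, z):
    """Transaction(y,z): some Transaction tuple has P# y and C# z."""
    return any(t[1] == y and t[0] == z for t in Transaction)

# (x) : ∃y ∃z (Product(x,y) & Transaction(y,z) & z = 1)
result = {
    x
    for x in pname_dom
    if any(
        product_pred(x, y) and transaction_pred(y, z) and z == 1
        for y in pno_dom for z in cno_dom
    )
}

print(sorted(result))
```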
As a final example, consider the query: "get the names of customers who bought all types
of the company's products". The reader can perform an analysis of this query as was
done above to confirm that the relevant database state is as shown in Figure 8-12 and
that the correct formal query is:
(x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))
Figure 8.12 Database state for "(x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))"
This example illustrates a universally quantified domain variable y ranging over P#. For
this query, this means that the "Transaction(y,z)" part of the logical expression must
evaluate to true for every possible instantiation of y given a particular instantiation of z.
Thus, when x = Codd and z = 1, both Transaction(1,1) and Transaction(2,1) must
evaluate to true. They do in this case and Codd will therefore be part of the result set.
But, when x = Martin and z = 2, Transaction(1,2) is true but Transaction(2,2) is not! So
Martin is not part of the result set. Continuing in this fashion for every possible
instantiation of x will eventually yield the full result.
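The evaluation just described can be sketched with all() for the universally quantified y
and any() for the existentially quantified z. Only the facts stated in the walkthrough are
encoded below (Codd is customer 1, Martin is customer 2, and which Transaction(y,z)
instances hold); any further tuples in the figure are not assumed.

```python
Customer = {("Codd", 1), ("Martin", 2)}       # (Cname, C#) pairs named in the text
Transaction = {(1, 1), (2, 1), (1, 2)}        # (P#, C#) pairs stated to be true

cname_dom = {c[0] for c in Customer}          # x ranges over Cname
pno_dom = {1, 2}                              # y ranges over P#
cno_dom = {c[1] for c in Customer}            # z ranges over C#

# (x) : ∀y ∃z (Customer(x,z) & Transaction(y,z))
result = {
    x
    for x in cname_dom
    if all(
        any((x, z) in Customer and (y, z) in Transaction for z in cno_dom)
        for y in pno_dom
    )
}

print(result)
```

For Martin the check fails at y = 2, since no customer number paired with Martin has a
transaction for product 2, which matches the reasoning in the text.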
8.2.2 Well-Formed Formula
We have not formally defined above what constitutes valid <logical expression>s. We do
so here, but for the sake of a uniform terminology, we will use the phrase well-formed
formula (WFF) instead of <logical expression>, just as we did for relational calculus with
tuple variables. Thus a formal query in relational calculus with domain variables takes the
form:
(<target list>) : (WFF)
where <target list> is a comma-separated list of free variable names, and a WFF is
defined by the following rules:
1. P(A, ...) is a WFF if P is a relation name and A, ... is a list of free variables
2. A θ B is a WFF if A is a variable, B is a constant or a variable, and
   θ ∈ {=, ≠, <, >, ≤, ≥}
3. F1 & F2 and F1 | F2 are WFFs if F1 and F2 are WFFs
4. (F) is a WFF if F is a WFF
5. ∃x (F(x)) and ∀x (F(x)) are WFFs if F(x) is a WFF with the variable x occurring free in it
As usual, the operator precedence for the '&' and '|' operators follows the standard
precedence rules, ie. '&' binds stronger than '|'. Thus,
'F1 & F2 | F3' is equivalent to '( F1 & F2 ) | F3'
Explicit use of parentheses, as in rule (4) above, is required to override this default
precedence. Thus if the intention is for the '|' operator to bind stronger in the above
expression, it has to be written as
F1 & (F2 | F3)
9 Data Sub-Language SQL
9.1 Introduction
In this chapter, we shall learn more about the essentials of the relational model's standard
language that will allow us to manipulate the data stored in the databases. This language
is powerful yet flexible, thus making it popular. It is in fact one of the factors that has led
to the dominance of the relational model in the database market today.
Following Codd's papers on the relational model and relational algebra and calculus
languages, research communities were prompted to work on the realisation of these
concepts. Several implemented versions of the relational languages were developed;
amongst the most noted were SQL (Structured Query Language), QBE (Query-By-
Example) and QUEL (Query Language). Here, we shall look into SQL in greater detail
as it is the most widely used relational language today. One often hears remarks such as
"It's not relational if it doesn't use SQL". It is currently being standardised as the
standard language for the Relational Data Model.
SQL had its origins back in 1974 from IBM's System R research project as Structured
English Query Language (or SEQUEL) for use on the IBM VS/2 mainframes. It was
developed by Chamberlin et al. The name was subsequently changed to Structured
Query Language or SQL. It is pronounced "sequel" by some and S-Q-L by others. IBM's
products such as SQL/DS and the popular DB2 emerged from this. SQL is based on the
Relational Calculus with tuple variables. In 1986, the American National Standards
Institute (ANSI) adopted SQL standards, contributing to its widespread adoption. Whilst
many commercial SQL products exist with various "dialects", the basic command set and
structure remain fairly standard.
Although SQL is called a query language, it is capable of more than just getting data off
relations in the databases. It can also handle data updates and even data definitions: add
new data, change existing data, delete or create new structures. Thus SQL is capable of:
1. Data Query
The contents of the database are accessed via a set of commands whereby useful
information is returned to the end user
2. Data Maintenance
The data within the relations can be created, corrected, deleted and modified
3. Data Definition
The structure of the database and its relations can be defined and created
The end user is given an interface, as we have seen in Chapter 3, to interact with the
database via menus, query operations, report generators, etc. Behind this lies the SQL
engine that performs the more difficult tasks of creating relation structures, maintaining
the systems catalogues and data dictionary, etc.
SQL belongs to the category of the so-called Fourth-Generation Languages (4GL) because
of its power, conciseness and low level of procedurality. As a non-procedural language it
allows the user to specify what must be done without detailing how it must be done. The
user's SQL request specification is then translated by the RDBMS into the technical
details needed to get the required data. As a result, the relational database is said to
require less programming than any other database or file system environment. This
makes SQL relatively easy to learn.
9.2 Operations
9.2.1 Mapping: The SQL Select Statement
!he basic operation in %@> is called mapping, which transforms values from a database
to user reuirements. !his operation is syntactically represented by the following bloc'6
Figure +.1. %@> %elect
!his uncomplicated structure can used to construct ueries ranging from very simple
inuiries to more comple( ones by essentially defining the conditions of the predicate. It
thus provides immense fle(ibility.
!he %@> %elect command combines the Relational 5lgebra operators %elect, :ro.ect,
9oin and the -artesian :roduct. Because a single declarative$style command can be used
to retrieve virtually any stored data, it is also regarded by many to be an implementation
of the Relational -alculus. If we need to e(tract information from only one relation of the
database, we may encounter similarities and a few differences between the Relational
-alculus$based D%> 5lpha and %@>. In this case we may substitute 'ey words of D%>
5lpha for matching 'ey words of %@> as follows6
Figure +.2. %imilarities of D%> 5lpha and %@> %elect
Let us refer back to the earlier example with the Customer relation.
Suppose we wish to "Get the names and phone numbers of customers living in London".
With DSL Alpha, we would specify this query as:
Range Customer X;
Get (X.Cname, X.Cphone): X.Ccity=London;
whereas in SQL its equivalent would be:
Select Cname, Cphone
From Customer
Where Ccity = 'London'
In either case, the result would be the retrieval of the following two tuples:
This simple query highlights the three most used SQL clauses:
1. The SELECT clause
This effectively gets the columns that we are interested in getting from the relation. We
may be interested in a single column, thus we may for example write "Select
Cphone" if we only wish to list just the telephone numbers. We may also however be
interested in listing the customer's name, city and telephone number; in which case,
we write "Select Cname, Ccity, Cphone".
2. The FROM clause
We need to identify the relations that our query refers to and this is done via the From
clause. The columns that we have chosen from the Select clause must be found in the
relation names of the From clause as in "From Customer".
3. The WHERE clause
This holds the conditions that allow us to restrict the tuples of the relation(s). In the
example, "Where Ccity=London" asserts that we wish to select only the tuples which
contain the city name that is equal to the value 'London'.
The system first processes the From clause (and all tuples of the chosen relation(s) are
placed in the processing work area), followed by the Where clause (which chooses, one
by one, the tuples that satisfy the clause conditions and eliminates those which do not),
and finally the Select clause (which takes the resultant tuples and displays only the values
under the Select clause column names).
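The processing order just described can be tried out with a small sketch. It uses Python's built-in sqlite3 module, which is an assumption of this example and not the system the text describes. The column name Cno stands in for C#, and the phone numbers are illustrative values only.

```python
import sqlite3

# Assumed sample rows for the Customer relation (Cno stands in for C#).
customers = [
    (1, "Codd", "London", "2263035"),
    (2, "Martin", "Paris", "5555910"),
    (3, "Deen", "London", "2234391"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer (Cno INTEGER, Cname TEXT, Ccity TEXT, Cphone TEXT)")
con.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", customers)

# Declarative form: state what is wanted, not how to get it.
sql_result = con.execute(
    "SELECT Cname, Cphone FROM Customer WHERE Ccity = 'London' ORDER BY Cno"
).fetchall()

# The same query evaluated by hand, in the order From, Where, Select.
work_area = list(customers)                               # From: load all tuples
restricted = [t for t in work_area if t[2] == "London"]   # Where: keep matching tuples
manual_result = [(t[1], t[3]) for t in restricted]        # Select: keep chosen columns

print(sql_result)
```

Both routes return the two London customers, which is the point of the mapping operation: the user writes only the declarative form and the system works out the steps.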
9.2.2 Output Restriction
Most queries do not need every tuple in the relation but rather only a subset of the tuples.
As described previously in section 5.3, the following mathematical operators can be used
in the predicate to restrict the output:
Symbol Meaning
=       Equal to
<       Less than
>       Greater than
<=      Less than or equal to
>=      Greater than or equal to
<>      Not equal to
Additionally, the logical operators AND, OR and NOT may be used to place further
restrictions. These logical operators, along with parentheses, may be combined to
produce quite complex conditional expressions.
Suppose we need to retrieve the tuples from the Transaction relation such that the
following conditions apply:
1. The transaction date is before 26 Jan and the quantity is at least 25
2. Or, the customer number is 2
The SQL statement that could get the desired result would be:
Select C#, Date, Qnt From Transaction
Where (Date < '26.01' And Qnt >= 25) Or C# = 2
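As a hedged illustration of combining comparison and logical operators, the sketch below runs the same condition with Python's sqlite3 module. The table name Trans (Transaction is a reserved word in SQLite), the column names Cno and Pno, the ISO-style dates and the four sample rows are all assumptions made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);
INSERT INTO Trans VALUES
  (1, 1, '2024-01-21', 20), (1, 2, '2024-01-23', 30),
  (2, 1, '2024-01-26', 25), (3, 2, '2024-01-29', 20);
""")

# Parentheses make the grouping explicit: (date AND quantity) OR customer.
rows = con.execute("""
    SELECT Cno, Date, Qnt FROM Trans
    WHERE (Date < '2024-01-26' AND Qnt >= 25) OR Cno = 2
    ORDER BY Date
""").fetchall()

print(rows)  # [(1, '2024-01-23', 30), (2, '2024-01-26', 25)]
```

The first row qualifies through the date and quantity test, the second only because its customer number is 2.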
9.2.3 Recursive Mapping: Sub-queries
The main idea of SQL is the recursive usage of the mapping operation instead of using
the existential and universal quantifiers. So far in our examples, we always know the
values that we want to put in our predicate. For example,
Where Ccity = 'London'
Where Date < '26.01' And Qnt > 25
Suppose we now wish to "Get the personal numbers of customers who bought the
product CPU". We could start off by writing the SQL statement:
Select C#
From Transaction
Where P# = ?
We cannot of course write "Where P#=CPU" because CPU is a part name not its number.
However as we may recall, part number P# is stored in the Transaction relation, but the
part name is in fact in another relation, the Product relation. Thus one needs to first of all
get the part number from Product via another SQL statement:
Select P#
From Product
Where Pname = 'CPU'
Having obtained the equivalent P#, the value is then used to complete the earlier query.
The way this is to be expressed is by writing the whole mapping operator in the right
hand side of comparison expressions of another mapping operator. This effectively means
the use of an inner block (sub-query) within the outer block (main query) as depicted in
the figure below.
Figure 9-3. Query nesting
The query in the outer block thus executes by using the value set generated earlier by the
sub-query of the inner block.
It is important to note that because the sub-query replaces the value in the predicate of the
main query, the value retrieved from the sub-query must be of the same domain as the
value in the main predicate.
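A minimal sketch of a sub-query, run with Python's sqlite3 module, may help here. The table name Trans, the column names Cno and Pno, the CPU price and the sample rows are assumptions for illustration; only the query shape comes from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT, Price INTEGER);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU', 1000), (2, 'VDU', 1200);
INSERT INTO Trans VALUES
  (1, 1, '2024-01-21', 20), (1, 2, '2024-01-23', 30),
  (2, 1, '2024-01-26', 25), (3, 2, '2024-01-29', 20);
""")

# Two separate steps: look up the part number, then use it in the main query.
pno = con.execute("SELECT Pno FROM Product WHERE Pname = 'CPU'").fetchone()[0]
two_step = [r[0] for r in con.execute(
    "SELECT Cno FROM Trans WHERE Pno = ? ORDER BY Cno", (pno,))]

# One nested statement: the inner block supplies the value for the outer predicate.
nested = [r[0] for r in con.execute("""
    SELECT Cno FROM Trans
    WHERE Pno = (SELECT Pno FROM Product WHERE Pname = 'CPU')
    ORDER BY Cno
""")]

print(nested)  # [1, 2]
```

The inner block returns a part number, which is in the same domain as the Pno it is compared with, as the text requires.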
9.2.4 Multiple Nesting
It is also possible that there may be two or more inner blocks within an outer SQL block. For
instance, we next wish to: "Get a date when customer Codd bought the product CPU".
The SQL statement we would start out with would probably look like this:
Select Date
From Transaction
Where P# = ?
And C# = ?
As in the earlier query, the part number P# can be obtained via the part name Pname in
the relation Product. The customer name, Codd, however has to have its equivalent
customer number which has to be obtained from C# of the relation Customer. Thus to
complete the above query, one would have to work two sub-queries first as follows:
Figure 9-4. Interpretation of sub-queries
Note that the original SQL notation utilises brackets or parentheses to determine inner
SQL blocks as:
Select Date
From Transaction
Where P# =
   ( Select P#
     From Product
     Where Pname = 'CPU' )
And C# =
   ( Select C#
     From Customer
     Where Cname = 'Codd' )
Similarly, an inner block may contain further inner SQL blocks. For instance, if we
wish to "Get the names of customers who bought more than 20 pieces of the product
CPU" we need to specify:
Select Cname
From Customer
Where C# =
   ( Select C#
     From Transaction
     Where P# =
        ( Select P#
          From Product
          Where Pname = 'CPU' )
     And Qnt > 20 )

Thus we may visualise the nesting of sub-queries as:
Select ...
From ...
Where <attribute1> <operator>
   ( Select <attribute1>
     From ...
     Where <attribute2> <operator>
        ( Select <attribute2>
          From ...
          Where <attribute3> <operator>
             ( Select <attribute3>
               From ...
               Where ... ) ) )
The number of inner blocks or levels of nesting may, however, be limited by the storage
available in the workspace of the DBMS in use.
9.2.5 Multiple Data Items
Standard comparison operators ( =, >, <, >=, <=, <> ) operate on two data items, as in x =
y or p >= 4. They cannot be applied to multiple data items. However, a particular SQL
block normally returns a set of values (i.e. not a single value which can be used in a
comparison).
For instance: "Get the product numbers of items which were bought by customers from
London".
Select P#
From Transaction
Where C# =
   ( Select C#
     From Customer
     Where Ccity = 'London' )
Given the sample database of the earlier examples, the result of the inner SQL block
would yield two values for C#, which are 1 and 3 (or more precisely, the set {1, 3}).
The outer SQL block, in testing C# = {1, 3} would effectively test if {1, 2} = {1, 3} or
not. Thus the above SQL statement is not correct!
To overcome the error caused by the testing of multiple values returned by the sub-query,
SQL allows the use of comparison expressions in the form:
<attribute name> In <set of values>
<attribute name> Not In <set of values>
This logical expression is true if the current value of an attribute is included (or not
included, respectively) in the set of values.
For instance,
Smith In { Codd, Smith, Deen } is True,
and
Smith Not In { Codd, Smith, Deen } is False.
Thus in re-writing the earlier erroneous statement, we now replace the equal operator (=)
with the set membership operator 'In' as follows:
Select P#
From Transaction
Where C# In
   ( Select C#
     From Customer
     Where Ccity = 'London' )
This time the outer SQL block would effectively test C# In {1, 3}. The outer SQL block
would now retrieve only the P#s of transactions whose C# is in the set {1, 3}, i.e.
testing {1, 2} In {1, 3}. This would result in returning P# 1 only, which is the expected
right answer.
Illustrating with another example, consider the query to "Find the names of customers
who bought the product CPU". Its corresponding SQL statement would thus be:
Select Cname From Customer
Where C# In
   ( Select C# From Transaction
     Where P# In
        ( Select P# From Product
          Where Pname = 'CPU' ) )
Executing this step-by-step:
(1) From the inner-most block,
Select P# From Product
Where Pname = 'CPU'
would first yield P# 1 from Product, i.e. {1}
(2) The next block would thus be
Select C# From Transaction
Where P# In {1}
and this would yield C#s 1 and 2 (as they bought P# 1), i.e. {1, 2}
(3) And finally, the outer-most block would execute
Select Cname From Customer
Where C# In {1, 2}
which would result in the names of customers 1 and 2, which are Codd and Martin
respectively.
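The three steps above can be replayed in a short sketch using Python's sqlite3 module. The table name Trans, the column names Cno and Pno, and the sample rows are assumptions chosen to agree with the results quoted in the steps.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (Cno INTEGER, Cname TEXT, Ccity TEXT);
CREATE TABLE Product (Pno INTEGER, Pname TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Customer VALUES (1, 'Codd', 'London'), (2, 'Martin', 'Paris'), (3, 'Deen', 'London');
INSERT INTO Product VALUES (1, 'CPU'), (2, 'VDU');
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

def in_list(values):
    """Build a '?, ?, ...' placeholder list for an In test."""
    return ", ".join("?" for _ in values)

# Step (1): the inner-most block yields the set of part numbers.
pnos = [r[0] for r in con.execute("SELECT Pno FROM Product WHERE Pname = 'CPU'")]
# Step (2): the next block tests Pno In that set.
cnos = [r[0] for r in con.execute(
    f"SELECT DISTINCT Cno FROM Trans WHERE Pno IN ({in_list(pnos)}) ORDER BY Cno", pnos)]
# Step (3): the outer-most block tests Cno In the resulting set.
names = [r[0] for r in con.execute(
    f"SELECT Cname FROM Customer WHERE Cno IN ({in_list(cnos)}) ORDER BY Cno", cnos)]

# The same thing as one nested statement.
nested = [r[0] for r in con.execute("""
    SELECT Cname FROM Customer WHERE Cno IN
      (SELECT Cno FROM Trans WHERE Pno IN
        (SELECT Pno FROM Product WHERE Pname = 'CPU'))
    ORDER BY Cno
""")]

print(pnos, cnos, names)  # [1] [1, 2] ['Codd', 'Martin']
```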
We next go on to a slightly more complex example. Suppose we now wish to "Get a
name of such customers who bought the product CPU but did not buy the product VDU".
In SQL, the statement would be:
Select Cname From Customer
Where C# In
   ( Select C# From Transaction Where P# In
        ( Select P# From Product Where Pname = 'CPU' ) )
And C# Not In
   ( Select C# From Transaction Where P# In
        ( Select P# From Product Where Pname = 'VDU' ) )
Why don't you try to figure out, step-by-step, the sequence of results from the inner-most
blocks up to the final result of execution of the outer-most block?
Note that the comparison operators
<attribute name> In <set of values>
<attribute name> Not In <set of values>
are used instead of existential quantifiers (∃). It is an implementation of multiple logical
OR conditions which is more efficiently handled.
Similarly, comparison expressions
<attribute name> = All <set of values>
are used instead of universal quantifiers (∀).
This logical expression is valid (i.e. produces the logical value "True") if the collection of
attribute name values in the database includes the given set of values.
For instance, "Get personal numbers of those customers who bought all kinds of
company's products", would have the following SQL statement for it:
Select C# From Transaction
Where P# =
   All ( Select P#
         From Product )
The inner block would yield the set {1, 2} of P# values. Executing the outer block would
effectively test if the 3 customers in the Transaction relation, i.e. C# 1, 2 and 3 would
have P# in {1, 2}
This test is as follows:
C#   Transaction (P# = 1)   Transaction (P# = 2)   All P#
1    True                   True                   True
2    True                   False                  False
3    False                  True                   False
The only customer that has P# equal to all P# as found in Product would be C# 1.
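The "bought all products" test can be checked with Python's sqlite3 module, with one substitution that should be stated plainly: SQLite does not appear to accept the "= All" form, so this sketch swaps in a grouping-and-counting formulation (count each customer's distinct part numbers and compare with the number of products). The table name Trans, the column names Cno and Pno, and the sample rows are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU'), (2, 'VDU');
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

# Swapped-in technique: a customer bought every product when the number of
# distinct parts in their transactions equals the number of products.
# This assumes every Pno in Trans also appears in Product.
all_buyers = [r[0] for r in con.execute("""
    SELECT Cno FROM Trans
    GROUP BY Cno
    HAVING COUNT(DISTINCT Pno) = (SELECT COUNT(*) FROM Product)
""")]

# The same test written out in plain Python, mirroring the truth table above.
products = {r[0] for r in con.execute("SELECT Pno FROM Product")}
bought = {}
for cno, pno in con.execute("SELECT Cno, Pno FROM Trans"):
    bought.setdefault(cno, set()).add(pno)
by_hand = sorted(cno for cno, parts in bought.items() if products <= parts)

print(all_buyers)  # [1]
```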
9.3 Further Retrieval Facilities
9.3.1 Joining Relations
In the examples that have been used so far, our retrievals have been of values taken from
one relation, as in "Select C# From Transaction". However, often we have to retrieve
information from two or more relations simultaneously. In other words, a number of
relation names may be used in the From clause of SQL. For example, if we wish to
access the relations Customer and Transaction, we may write the SQL statement as
follows:
Select ...
From Customer, Transaction
Where ...
The target list in the Select clause may contain the attributes from various relations, as in
Select Cname, Date, Qnt
From Customer, Transaction
Where ...
where, if you recall, Cname is an attribute of Customer and Date and Qnt are attributes of
Transaction.
Similarly, comparison expressions in the Where clause may include attribute names from
various relations,
Select Cname, Date, Qnt
From Customer, Transaction
Where (Customer.C# = Transaction.C#) And P# = 1
Note the so-called qualification technique, which is used to refer to attributes of the
same name belonging to different relations. Customer.C# refers to the C# of the
Customer relation whereas Transaction.C# refers to the C# of the Transaction relation.
Figure 9-5. Qualifying attributes
Thus the query "Get customer names, dates and number of pieces for transactions of the
product number 1" will result in:
It must be noted that the two (or more) relations must be combined on at least one
common linking attribute (as in the Relational Algebra's JOIN operator). As in the above
example, the link is established on C# as in the clause
Where Customer.C# = Transaction.C#
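The effect of the linking condition can be seen in a small sketch with Python's sqlite3 module. The table name Trans, the column names Cno and Pno, the ISO-style dates and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (Cno INTEGER, Cname TEXT, Ccity TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);
INSERT INTO Customer VALUES (1, 'Codd', 'London'), (2, 'Martin', 'Paris'), (3, 'Deen', 'London');
INSERT INTO Trans VALUES
  (1, 1, '2024-01-21', 20), (1, 2, '2024-01-23', 30),
  (2, 1, '2024-01-26', 25), (3, 2, '2024-01-29', 20);
""")

# Without a linking condition every customer pairs with every transaction.
product_size = con.execute("SELECT COUNT(*) FROM Customer, Trans").fetchone()[0]

# With the link on the customer number, only matching pairs survive.
joined = con.execute("""
    SELECT Cname, Date, Qnt
    FROM Customer, Trans
    WHERE Customer.Cno = Trans.Cno AND Pno = 1
    ORDER BY Date
""").fetchall()

print(product_size)  # 12
print(joined)        # [('Codd', '2024-01-21', 20), ('Martin', '2024-01-26', 25)]
```

Leaving out the qualified comparison gives the full Cartesian Product (3 x 4 = 12 rows here), which is why the common linking attribute matters.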
9.3.2 Alias
In order to avoid a possible ambiguity in a query definition, SQL also allows the use of an
alias for the relation name in the From clause. The alias is an alternate name that is used
to identify the source relation and the attribute names may include an alias as a prefix:
<alias>.<attribute name>
Suppose we use T and C as the aliases for the Transaction and Customer relations
respectively. We may use these to label the attributes as in:
Select ... From Customer C, Transaction T
Where C.C# = T.C# And ...
An alias is especially useful when we wish to join a relation to itself because of grouping
as in the query to "Find the names and phone numbers of customers living in the same
city as the customer Codd":
Select C2.Cname, C2.Cphone
From Customer C1, Customer C2
Where C2.Ccity = C1.Ccity
And C1.Cname = 'Codd'
The resulting interpretation of the SQL statement is depicted in Figure 9-6 below:
Figure 9-6. Using an alias
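A self-join with aliases can be tried with Python's sqlite3 module as sketched below. The column name Cno and the phone numbers are assumed sample values; the query itself follows the statement in the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (Cno INTEGER, Cname TEXT, Ccity TEXT, Cphone TEXT);
INSERT INTO Customer VALUES
  (1, 'Codd', 'London', '2263035'),
  (2, 'Martin', 'Paris', '5555910'),
  (3, 'Deen', 'London', '2234391');
""")

# C1 locates Codd's tuple; C2 ranges over all customers in the same city.
same_city = con.execute("""
    SELECT C2.Cname, C2.Cphone
    FROM Customer C1, Customer C2
    WHERE C2.Ccity = C1.Ccity
      AND C1.Cname = 'Codd'
    ORDER BY C2.Cno
""").fetchall()

print([name for name, _ in same_city])  # ['Codd', 'Deen']
```

Note that Codd himself appears in the result, since his own tuple also satisfies the city comparison.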
9.4 Library Functions and Arithmetic Expressions
The SQL Select clause (target list) may contain also so-called SQL library functions that
will perform various arithmetic summaries such as to find the smallest value or to sum up
the values in a specified column. The attribute name for such library functions must be
derived from the relations specified in the From clause as follows:
Figure 9-7. Using a library function with SQL Select
The common SQL functions available are:
Function name   Task
COUNT           To count the number of tuples containing a specified attribute value
SUM             To sum up the values of an attribute
AVG             To find the arithmetic mean (average value) of an attribute
MAX             To find the maximum value of an attribute
MIN             To find the minimum value of an attribute
Examples
(1) Get the average quantity of VDUs per transaction
Select AVG (Qnt) From Transaction
Where P# =
   ( Select P# From Product
     Where Pname = 'VDU' )
Working first with the inner Select clause, we get a P# of 2 from the Product relation as
the part number for the product named VDU. Thus the query is now reduced to
Select AVG (Qnt) From Transaction
Where P# = 2
Accessing the Transaction relation now would yield the following two tuples
where the average quantity value is easily computed as (30+20)/2 which is 25.
(2) Get the total quantity of VDUs transacted would similarly be expressed as
Select SUM (Qnt) From Transaction
Where P# =
   ( Select P# From Product
     Where Pname = 'VDU' )
where the total value is easily computed as (30 + 20) giving 50.
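The two VDU examples can be reproduced with Python's sqlite3 module. The table name Trans, the column names Cno and Pno, and the sample rows are assumptions, chosen so that the VDU quantities are the 30 and 20 used in the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU'), (2, 'VDU');
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

vdu = "(SELECT Pno FROM Product WHERE Pname = 'VDU')"

# Library functions summarise the tuples left after the Where clause.
average = con.execute(f"SELECT AVG(Qnt) FROM Trans WHERE Pno = {vdu}").fetchone()[0]
total = con.execute(f"SELECT SUM(Qnt) FROM Trans WHERE Pno = {vdu}").fetchone()[0]
lowest, highest = con.execute(
    f"SELECT MIN(Qnt), MAX(Qnt) FROM Trans WHERE Pno = {vdu}").fetchone()

print(average, total)  # 25.0 50
```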
An asterisk (*) in the Select clause is interpreted as "all attribute names of the relations
specified in the From clause".
Select * From Transaction
is equivalent to
Select C#, P#, Date, Qnt From Transaction
Thus a query to "Get all available information on customers who bought the product
VDU" can be written as:
Select * From Customer
Where C# In
   ( Select C# From Transaction
     Where P# In
        ( Select P# From Product
          Where Pname = 'VDU' ) )
The interpretation of this query would be worked out as shown in the following sequence
of accesses, starting from the access of the product relation to the Transaction and finally
to the Customer relation:
Figure 9-8. Working through 3 nested Selects
The outcome would be the following relation:
(3) "Get a total number of such customers who bought the product VDU", would be written
as:
Select COUNT (*) From Customer
Where C# In
   ( Select C# From Transaction
     Where P# In
        ( Select P# From Product
          Where Pname = 'VDU' ) )
and this would yield a value of 2 for Count (*).
Arithmetic expressions are also permitted in SQL, and the possible operations include:
addition          +
subtraction       -
multiplication    *
division          /
Expressions may be written in the Select clause as:
Select C#, P#, Qnt*Price From Transaction, Product
Where Transaction.P# = Product.P#
which is used to "Get a total price for each transaction" resulting in:
Arithmetic expressions, likewise, can also be used as parameters of SQL library
functions. For example, "Get a total price of all VDUs sold to customers" may be written
as the following SQL statement:
Select SUM (Qnt*Price) From Transaction, Product
Where Transaction.P# = Product.P#
And Product.Pname = 'VDU'
Work this out. You should get an answer of 60000.
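For readers who want to check the arithmetic, here is a sketch using Python's sqlite3 module. The table name Trans and the column names Cno and Pno are assumptions. The VDU price of 1200 is inferred from the stated answer (60000 divided by the 50 VDUs transacted); the CPU price of 1000 is purely illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT, Price INTEGER);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU', 1000), (2, 'VDU', 1200);
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

# An arithmetic expression in the target list: one total per transaction.
per_transaction = con.execute("""
    SELECT Trans.Cno, Trans.Pno, Qnt * Price
    FROM Trans, Product
    WHERE Trans.Pno = Product.Pno
    ORDER BY Trans.Cno, Trans.Pno
""").fetchall()

# The same expression used as the parameter of a library function.
vdu_total = con.execute("""
    SELECT SUM(Qnt * Price)
    FROM Trans, Product
    WHERE Trans.Pno = Product.Pno AND Product.Pname = 'VDU'
""").fetchone()[0]

print(vdu_total)  # 60000
```

Both relations appear in the From clause because Qnt comes from one and Price from the other.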
The attribute names for both library functions and arithmetic expressions must be derived
from the relations specified in the From clause.
Thus, it should be noted that the following query definition is NOT correct, because
Price and Pname belong to Product, which is missing from the From clause.
Select SUM (Qnt*Price) From Transaction
Where Transaction.P# = Product.P#
And Product.Pname = 'VDU'
Additionally, SQL also permits the use of library functions not only in the Select clause
but also in the Where clause as a part of comparison expressions.
The query to "Get all available information on such customers who bought the most
expensive product" would be:
Select * From Customer
Where C# In
   ( Select C# From Transaction
     Where P# In
        ( Select P# From Product
          Where Price = MAX (Price) ) )
9.5 Additional Facilities
9.5.1 Ordering
The result of a mapping operation may be sorted in ascending or descending order of the
selected attribute value.
The form of the Order clause is
Order By <attribute name> Up | Down
Examples
(1) Get a list of all transactions of the product CPU sorted in descending order of the
attribute Qnt
Select * From Transaction
Where P# In
   ( Select P# From Product
     Where Pname = 'CPU' )
Order By Qnt Down
The result would be
If instead, the last clause had been "Order By Qnt Up", the result would be listed in
ascending order:
The Order By clause is only a logical sorting process; the actual contents of the original
relations are not affected.
Multi-level ordered sequence may also be performed as in:
Order By C# Up,
Qnt Down
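Ordering can be tried with Python's sqlite3 module, with one difference in spelling: standard SQL, and SQLite with it, writes the Up and Down options as ASC and DESC. The table name Trans, the column names Cno and Pno, and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU'), (2, 'VDU');
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

# "Order By Qnt Down" corresponds to ORDER BY Qnt DESC.
cpu_desc = con.execute("""
    SELECT * FROM Trans
    WHERE Pno IN (SELECT Pno FROM Product WHERE Pname = 'CPU')
    ORDER BY Qnt DESC
""").fetchall()

# Multi-level ordering: customer number ascending, then quantity descending.
multi = con.execute("SELECT * FROM Trans ORDER BY Cno ASC, Qnt DESC").fetchall()

# Sorting is logical only: the stored relation still holds the same four tuples.
stored = con.execute("SELECT COUNT(*) FROM Trans").fetchone()[0]

print(cpu_desc)  # [(2, 1, 25), (1, 1, 20)]
print(multi)     # [(1, 2, 30), (1, 1, 20), (2, 1, 25), (3, 2, 20)]
```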
9.5.2 Handling Duplicates
The result of an SQL mapping operation is however not perceived as a relation, i.e. it
may include duplicate tuples. Consider for example:
Select C# From Transaction
Where P# In
   ( Select P# From Product
     Where Price >= 1000 )
The result is actually
Imagine if we have thousands of transactions and yet a handful of customers. The result
would yield hundreds (even thousands) of duplicates. Fortunately, duplicate tuples can be
removed by using the Unique option in the Select clause of the operation as follows:
Select C# Unique From Transaction
Where P# In
   ( Select P# From Product
     Where Price >= 1000 )
and this will yield a much reduced result with only the distinct (unique) customer
numbers:
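The effect of removing duplicates can be seen in the sketch below, run with Python's sqlite3 module. Standard SQL spells the Unique option as SELECT DISTINCT, which is what the sketch uses. The table name Trans, the column names Cno and Pno, the CPU price and the sample rows are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Pno INTEGER, Pname TEXT, Price INTEGER);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Product VALUES (1, 'CPU', 1000), (2, 'VDU', 1200);
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

condition = "Pno IN (SELECT Pno FROM Product WHERE Price >= 1000)"

# Without DISTINCT, customer 1 appears once per qualifying transaction.
with_duplicates = [r[0] for r in con.execute(
    f"SELECT Cno FROM Trans WHERE {condition} ORDER BY Cno")]

# With DISTINCT, each customer number is listed once.
distinct = [r[0] for r in con.execute(
    f"SELECT DISTINCT Cno FROM Trans WHERE {condition} ORDER BY Cno")]

print(with_duplicates)  # [1, 1, 2, 3]
print(distinct)         # [1, 2, 3]
```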
9.5.3 Grouping of Data
Usually, the result of a library function is calculated for the whole relation. For example,
consider wanting to find the total number of transactions,
Select Count (*)
From Transaction
Given this relation, the result of Count (*) is 4
However, sometimes we need to calculate a library function, not for the entire relation,
but only for a subset of it. Such subsets of tuples are called groups. For instance, in the
relation Transaction, a collection of tuples with the same value of attribute C# is a
"group". In this case, C# is called the "Group By" attribute.
Figure 9-9. Grouping by customer numbers
The form of the Group By clause is
Group By <attribute name>
Examples
(1) "Get the list of all customer numbers and the quantity of products bought by each of
them". Note that the relation will have many transactions for any one customer. The
transactions for each customer will have to be grouped and the quantities totaled. This is
then to be done for each different customer. Thus the SQL statement would be:
Select C#, Sum (Qnt) From Transaction Group By C#
Thus all transactions with the same C#s are grouped together and the quantities summed
to yield the summarised result:
Why would the following statement be impossible to execute?
Select * From Transaction Group By P#
(2) Normally, the Where clause would contain conditions for the selection of tuples as in:
Select Cname, Sum (Qnt) From Customer, Transaction
Where Customer.C# = Transaction.C#
Group By C#
This statement will "Get a list of all customer names and the quantity of products bought
by each of them" as follows:
Figure 9-10. Restriction followed by Grouping
9.5.4 Further Filtering: Having
We can further filter out unwanted groups generated by the Group By clause by using a
"Having" clause which will include in the final result only those groups that satisfy the
stated condition. Thus the additional "Having" clause provides a possibility to define
conditions for selection of groups.
For example, if we wish to just "Get such customers who bought more than 45 units of
products", the SQL statement would be:
Select * From Customer
Where C# In
   ( Select C# From Transaction
     Group By C#
     Having SUM (Qnt) > 45 )
Figure 9-11. Grouping followed by Restriction
In this case, those grouped customers with 45 units or less will not be in the final result.
The result will thus only be:
It is important to note that in the further filtering of values, the Where clause is used to
exclude values before the Group By clause is applied, whereas the Having clause is used
to exclude values after they have been grouped.
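The difference between filtering before and after grouping can be made concrete with Python's sqlite3 module. The table name Trans, the column names Cno and Pno, and the sample rows are assumptions; the threshold of 45 units comes from the example above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (Cno INTEGER, Cname TEXT);
CREATE TABLE Trans (Cno INTEGER, Pno INTEGER, Qnt INTEGER);
INSERT INTO Customer VALUES (1, 'Codd'), (2, 'Martin'), (3, 'Deen');
INSERT INTO Trans VALUES (1, 1, 20), (1, 2, 30), (2, 1, 25), (3, 2, 20);
""")

# Grouping only: one total per customer number.
totals = con.execute(
    "SELECT Cno, SUM(Qnt) FROM Trans GROUP BY Cno ORDER BY Cno").fetchall()

# Having: whole groups are dropped after the totals are formed.
big_buyers = con.execute("""
    SELECT Cname FROM Customer
    WHERE Cno IN (SELECT Cno FROM Trans GROUP BY Cno HAVING SUM(Qnt) > 45)
""").fetchall()

# Where: individual tuples are dropped before the groups are formed.
filtered_first = con.execute(
    "SELECT Cno, SUM(Qnt) FROM Trans WHERE Qnt > 20 GROUP BY Cno ORDER BY Cno"
).fetchall()

print(totals)          # [(1, 50), (2, 25), (3, 20)]
print(big_buyers)      # [('Codd',)]
print(filtered_first)  # [(1, 30), (2, 25)]
```

Customer 1 reaches 50 units only when both transactions are counted, so applying a restriction in the Where clause first changes the group totals, while Having leaves them intact.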
10 Query-By-Example (QBE)
10.1 Introduction
Data Query Languages were developed in the early seventies when the man-machine
interface was, by today's standards, limited and rudimentary. In particular, interaction
with the computer was through the processing of batched jobs, where jobs (computation
requests such as "run this program on that data", "evaluate this database query", etc) were
prepared off-line on some computer readable media (eg. punch cards), gathered into a
'batch' and then submitted for processing. No interaction took place between the user
and computer while the jobs were processed. End results were instead typically printed
for the user to inspect (again off-line) and to determine the next course of action. The
batch cycle continued until the user had obtained the desired results.
This was pretty much the way database queries were handled (see Figure 10-1). As data
coding devices were exclusively textual in nature and as processing was non-interactive,
queries had to be defined textually and each query had to be self-contained (ie. have all the
components required to complete the evaluation). The design of early languages was
influenced by, and in fact had to comply with, these constraints to be usable. Thus, for
example, the SQL query:
Select P# from Transaction
where C# In ( Select C# from Customer
where Ccity = 'London' )
could be easily encoded as a job for batched submission. Needless to say, the turnaround
time in such circumstances was high, taking hours or even days before a user saw the
results of submitted queries. Many hours were typically spent off-line for a job that would
take seconds to evaluate, and it was even worse if you made an error in your submission!
Figure 10-1 Early batch processing of queries
Over the past 20 years, however, man-machine interfaces or human-computer interaction
(HCI) has progressed in leaps and bounds. Today, graphical user interfaces (GUI) are
taken for granted and the batched mode of processing is largely a past relic replaced by
highly interactive computing. Nevertheless, many database query languages today still
retain the old 'batch' characteristics and do not exploit features of interactive interfaces.
This is perhaps not surprising as, first, a large body of techniques for processing textual
languages had grown over the years (eg. compiling and optimisation) and, second, they
were well suited for embedding in more general purpose programming languages. The
latter especially provides great flexibility and power in database manipulation. Also, as
the paradigm shifted to interactive computing, its application to database queries was not
immediately obvious. But end-user computing is, in any case, increasing and many tasks
that previously required the skills of expert programmers are now being performed by
end-users through visual, interactive interfaces.
Query-By-Example (QBE) is the first interactive database query language to exploit such
modes of HCI. In QBE, a query is a construction on an interactive terminal involving
two-dimensional 'drawings' of one or more relations, visualised in tabular form, which
are filled in selected columns with 'examples' of data items to be retrieved (thus the
phrase query-by-example). The system answers the query by fetching data items based on
the given example and drawing the result on the same screen (see Figure 10-2).
Figure 10-2 A QBE query and its results
Typically, the 'drawing' of relations is aided by interactive commands made available
through pull-down menus (see ). The menu selection is constrained to relations available
in the schema and thus eliminates errors in specifying relation structures or attribute
names as can occur in text-based languages like SQL. The interface provided is in effect
a structured editor for a graphical language.
For the remainder of this chapter, we will focus exclusively on the principal features of
QBE. In contrast to SQL, QBE is based on relational calculus with domain variables
(see 8.2). To close this introduction, we should mention that QBE was developed by
M.M. Zloof at the IBM Yorktown Heights Laboratory.
10.2 Variables and Constants
In filling out a selected table with an example, the simplest item that can be entered under
a column is a free variable or a constant. A free variable in QBE must be an underlined
name (identifier) while a constant can be a number, string or other constructions denoting
a single data value. A query containing such combinations of free variables and constants
is a request for a set of values instantiating the specified variables while matching the
constants under the specified columns.
As an example, look at Figure 10-4. Two variables are introduced in the query: a and b.
By placing a variable under a column, we are in effect assigning that variable to range
over the domain of that column. Thus, the variable a ranges over the domain P# while b
ranges over Pname.
The reader would have also noted that the variables are prefixed by "P.". In QBE, this is
required if the instantiation found for the specified variable is to be displayed, ie. the
prefix "P." may be thought of as a command to print. We will say more about prefix
commands like this later. Suffice it for now to say that if neither variable in Figure 10-4
was preceded by "P." then the result table would display nothing!
The query in Figure 10-4 is in fact equivalent to the following construction of relational
calculus with domain variables:
a P#; b Pname;
(a, b): ( Product (a, b) )
Assuming the usual Product relation extension as in previous chapters, the result of the
query is shown in Figure 10-5.
Let us consider another simple example and walk through the basic interactions necessary
to formulate the query and get the desired results. Suppose we wanted the names and
cities of all customers. The basic interactions are summarised in Figure 10-6.
Figure 10-6 Basic sequence of interactions
1. The user first uses a pull-down menu as in to select the appropriate relation(s)
containing the desired items. For this query, the Customer relation would seem the
most appropriate and selecting it would result in an empty template being displayed.
2. Inspecting the template, the user can ascertain that the desired data items are indeed in
the selected template (viz. the Cname and Ccity columns). Next, the user invents
variable identifiers (a and b) and types each under the appropriate column. This is all
that is required for this query.
3. Finally, the example is evaluated by the system and the results displayed on the screen.
This is the basic interaction even for more complex queries: select relation templates, fill
in example items, then let the system evaluate and display the results. Of course, with
more complex queries, more than one relation may be used and constructing the example
will usually involve more than just free variables, as we shall see in due course.
Free variables unconditionally match data values in their respective domains and thus, by
themselves, cannot express conditional queries, such as "get the names and phone
numbers of customers who live in London" (the phrase "who live in London" is the
condition). The simplest specification of a condition in QBE is a constant, which is a
single data value entered under a column and interpreted as the condition:
<attribute name> = <constant>
Figure 10-7 Use of a constant to specify a condition in a query
Thus, the condition 'live in London' is quite simply captured by typing 'London' under
the 'Ccity' attribute in the Customer template, as shown in Figure 10-7.
More generally, the QBE syntax for conditions is:
[<comparator>] <constant>
where comparator is any one of '=', '≠', '<', '≤', '>', and '≥', and is interpreted as the
condition
<attribute name> <comparator> <constant>
If <comparator> is omitted, it defaults to '=' (as in the above example). As an example of
the use of other comparators, the query "get the names of products costing more than
1000" would be as shown in Figure 10-8.
Figure 10-8 Comparators in conditions
5 uery can also spread over several rows. !his is the @B3 euivalent form for
e(pressing comple( con.unctions and dis.unctions of conditions. !o correctly interpret
multiple row ueries, bear in mind the following6
the ordering of rows is immaterial
a variable identifier denotes the same instantiation wherever it occurs
!he second point above is particularly important when a variable occurs in more than one
row. But let#s consider first the simpler case where distinct rows do not share any
variable. In this case, the rows are unrelated and can be evaluated independently of one
another and the final result is simply the union of the results of each row. !he collective
condition of such a uery is thus a disjunction of the conditions specified in each row.
&or e(ample, consider the uery6 FEet the names of customers who either live in >ondon
or :aris and whose personal number is greater than 4J. !he @B3 uery for this is shown
in&igure 4K$I. >oo'ing at row 4, note that two conditions are specified. !hese must be
satisfied by values from a single tuple, ie. the condition may be restated as
-O S 4 52D -cityH>ondon
%imilarly, the condition specified in row 0 is
-O S 4 52D -cityH:aris
5s the two rows do not share variables, the collective condition is a dis.unction
*-O S 4 52D -cityH>ondon+ /R *-O S 4 52D -cityH:aris+
which may be simplified to
-O S 4 52D *-cityH>ondon /R -cityH:aris+
Figure 1.+ Multiple dis.unctive rows
In contrast, if a variable occurs in more than one row, then the conditions specified for
each row must be true for the same value of that variable. -onsider, for e(ample, the
uery in &igure 4K$4K where the variable 3 occurs in both rows.
!his means that a value of ( must be found such that both row 4 and row 0 are
simultaneously satisfied. In other words, the condition for this uery is euivalent to
-city H >ondon 52D -O S 4 52D -O R ?
*Eiven the above -ustomer relation, only the value FDeenJ satisfies both rows in this
case.+
!here is another possibly simpler way of describing the meaning and evaluation of
multiple row ueries. %pecifically, we treat each row as a sub-0uery, evaluate each
separately, then merge the results *a set of tuples for each sub$uery+ into a single table.
!he merging of two sets of tuples is simply a union, if their corresponding sub$ueries do
not share variables. /therwise, their intersection over attributes that share variables is
computed instead.
Thus, for the query in Figure 10-9, the first sub-query (row 1) results in the set {Deen},
while that of the second sub-query (row 2) is {Martin}. As the sub-queries do not share
variables, the final result is simply the union of these results: {Deen, Martin}.
In contrast, for the query in Figure 10-10, the first sub-query (row 1) results in {Deen},
while the second (row 2) results in {Codd, Deen}. But as the sub-queries share the
variable X under attribute Cname, the merged result is the intersection of the two, ie.
{Deen}.
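The merging rule just described can be sketched in a few lines of Python. This is only an illustration and not part of QBE itself: the function names sub_query and merge are our own, and the two rows assumed for Figure 10-10 (both restricted to London, one with C# > 1 and the other with C# < 4) are inferred from the results quoted in the text rather than taken from the figure.

```python
# Customer relation as used in the text: (C#, Cname, Ccity)
CUSTOMER = [
    (1, "Codd", "London"),
    (2, "Martin", "Paris"),
    (3, "Deen", "London"),
]

def sub_query(predicate):
    """Evaluate one query row: the set of Cname values of matching tuples."""
    return {name for (cno, name, city) in CUSTOMER if predicate(cno, city)}

def merge(results, shared_variable):
    """Union if the rows share no variable, intersection if they do."""
    combine = set.intersection if shared_variable else set.union
    return combine(*results)

# Figure 10-9: two rows without a shared variable (a disjunction).
fig_10_9 = merge(
    [
        sub_query(lambda cno, city: cno > 1 and city == "London"),
        sub_query(lambda cno, city: cno > 1 and city == "Paris"),
    ],
    shared_variable=False,
)

# Figure 10-10 (rows assumed): both rows share the variable X under Cname.
fig_10_10 = merge(
    [
        sub_query(lambda cno, city: cno > 1 and city == "London"),
        sub_query(lambda cno, city: cno < 4 and city == "London"),
    ],
    shared_variable=True,
)

print(sorted(fig_10_9))   # names satisfying either row
print(sorted(fig_10_10))  # names satisfying both rows
```

Under these assumptions the first query yields Deen and Martin, and the second yields only Deen, in agreement with the results stated above.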
Before proceeding with the next section, we should just mention here some syntactical
constraints and options of QBE. First, the prefix P. can be used on any example item,
not just free variables. This underlines its earlier interpretation, ie. it is a command to
print or display the value of the item it prefixes (variable or comparison). Thus, if the
query in Figure 10-10 had been:
then the displayed result would be:
Note that, in general, prefixing a comparison prints the value that satisfies it. Of course,
in the case of a constant (implicitly a "=" comparison), the constant itself will be printed.
QBE also allows the user to simplify a query to only essential components. This is largely
optional and the user may choose (perhaps for greater clarity) to include redundant
constructs. Basically, there are two rules that can be applied:
1. If a particular variable is used only once, then it may be omitted. This saves the user
the trouble of otherwise having to invent names. Application of this rule is illustrated
in Figure 10-11, where it is applied to the first table (variables X1 and X2) to result in
the second. Note that unless this rule is kept in mind when reading simplified queries,
the appearance of the prefix "P." by itself may look not only odd but confusing too.
The prefixes in the second table must be correctly read as prefixing implicit but
distinct variables.
2. Duplicate prefixes and constants occurring over multiple rows may be "factorised"
into just one row. This is illustrated also in Figure 10-11 where it is applied to the
second table to result in the third. Again, unless this rule is kept in mind, queries such
as that in the third table may seem meaningless.
Figure 10-11 Simplifying queries
While the above rules are optional, the following is a syntactic constraint that must be
observed: if a free variable occurs in more than one row, then the prefix "P." may be used
on at most one of its occurrences.
The query below illustrates a valid construction - note that X occurs in two rows but only
one of them has the P prefix.
10.3 Example Elements
Each row of a query table may be seen as an example of tuples from the associated
relation, specifically, tuples that match the row. A tuple matches a row if each attribute
value in the tuple matches the corresponding query item in the row. We have seen above
exactly when a value matches a query item. In summary:
1. Any value matches a blank query item or a variable
2. A value matches a comparison item if it satisfies the specified comparison
Using these rules, it is relatively easy to ascertain tuples exemplified by a query row. This
is illustrated in Figure 10-12. This is why variables in QBE are called example elements.
Figure 10-12 A query row is an example of matching tuples
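The two matching rules can be expressed as a small Python sketch. The encoding of query items is our own assumption, chosen only for illustration: None stands for a blank item, a string for a variable name, and a pair such as (">", 1) for a comparison item (a constant is written as an "=" comparison). The query row used is likewise a made-up example over the Customer relation.

```python
import operator

# Comparison operators a query item may use (an illustrative subset).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def value_matches(value, item):
    """Rule 1: blanks and variables match anything. Rule 2: comparisons must hold."""
    if item is None or isinstance(item, str):  # blank item or variable name
        return True
    op, constant = item                        # comparison item, e.g. (">", 1)
    return OPS[op](value, constant)

def tuple_matches(tup, row):
    """A tuple matches a row if every attribute value matches its query item."""
    return all(value_matches(v, item) for v, item in zip(tup, row))

# Customer( C#, Cname, Ccity, Cphone )
CUSTOMER = [
    (1, "Codd", "London", "2263035"),
    (2, "Martin", "Paris", "5555910"),
    (3, "Deen", "London", "2234391"),
]

# Hypothetical query row: C# > 1, variable X, Ccity = London, blank phone.
row = ((">", 1), "X", ("=", "London"), None)
exemplified = [t for t in CUSTOMER if tuple_matches(t, row)]
print(exemplified)
```

The tuples collected in exemplified are exactly those that the query row is an "example" of.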
In extracting facts from several relations that share attribute domains, example elements
are the key to finding related target tuples from the different relations. Consider the
query:
"Get the names and phone numbers of customers that have purchased both product
number 1 and product number 2".
Figure 10-13 Example elements over several relations
The Transaction relation has part of the information we are after. Specifically, we look
for records of purchase of each item by the same customer, ie. a tuple where the product
number is 1, another where the product number is 2, but both with the same customer
number. The entries in the Transaction template in Figure 10-13 capture this requirement.
However, this tells us only the customer number (the instantiation of X). Information
about the customer's name and phone number must be obtained from the Customer
relation. We need to ensure, though, that these values are obtained from a customer tuple
that represents the same customer found in the Transaction relation. In QBE, this is
simply achieved by specifying the same example element X in the customer number
column of the Customer relation (as shown in the Customer template of Figure 10-13).
The query in Figure 10-13 may be evaluated, assuming the following extensions of
Transaction and Customer, as follows.
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
3   2   29.01  20

Customer
C#  Cname   Ccity   Cphone
1   Codd    London  2263035
2   Martin  Paris   5555910
3   Deen    London  2234391
1. The subquery in the first row of the Transaction template is matched by the first and
third tuples of the Transaction relation, ie. X = {1,2}
2. The subquery in the second row of the Transaction template is matched by the second
and fourth tuples of the Transaction relation, ie. X = {1,3}
3. The result of evaluating the Transaction template is therefore the intersection of
{1,2} and {1,3}, ie. {1}.
4. The subquery in the Customer template matches all the tuples in the Customer
relation, ie. the entire relation is the result.
5. The final result is the intersection, over C#, of the results in (3) and (4), ie. {<Codd,
2263035>}
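The five evaluation steps can be traced in Python using the two extensions given above. This is a sketch of the evaluation as described in the text, not of how any QBE system is actually implemented; the variable names are our own.

```python
# Transaction( C#, P#, Date, Qnt ) and Customer( C#, Cname, Ccity, Cphone )
TRANSACTION = [
    (1, 1, "21.01", 20),
    (1, 2, "23.01", 30),
    (2, 1, "26.01", 25),
    (3, 2, "29.01", 20),
]
CUSTOMER = [
    (1, "Codd", "London", "2263035"),
    (2, "Martin", "Paris", "5555910"),
    (3, "Deen", "London", "2234391"),
]

# Steps 1 and 2: values of X matched by each row of the Transaction template.
x_row1 = {c for (c, p, _date, _qnt) in TRANSACTION if p == 1}
x_row2 = {c for (c, p, _date, _qnt) in TRANSACTION if p == 2}

# Step 3: both rows share X, so the results are intersected.
x_values = x_row1 & x_row2

# Steps 4 and 5: the Customer template matches every tuple; keep those whose
# C# agrees with a value of X and print the name and phone number.
result = {(name, phone) for (c, name, _city, phone) in CUSTOMER if c in x_values}
print(result)
```

Only customer number 1 appears in both row results, so the single pair printed is that of Codd.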
Figure 10-14 shows another example of a multi-table query and illustrates also the
relative ease in "reading" or paraphrasing QBE constructs. First, the Customer subquery
makes it clear, from the use of the "P." prefix, that the desired result is a set of customer
names and their phone numbers (the elements a and b respectively). The element X links
Customer to Transaction, ie. a customer included in the result must have purchased
something, denoted by yet another element Y. Furthermore, Y must be such that it is the
product CPU.
Figure 10-14 Another example of a multi-table query with example elements
In other words, the query can be paraphrased as:
"Get the names and phone numbers of those customers who bought the product CPU".
The preceding two examples should be enough for the reader to realise that (unadorned)
example elements spread across tables are in fact existentially quantified. For example,
there may be more than one Transaction tuple that can match the customer number found
in Customer, but we don't care which! The examples also show that, more generally, a
QBE query can spread over a number of rows of a single relation and across other
relations. A few further examples will serve to highlight QBE's power and features.
In Figure 10-15, we see a complex-looking QBE query. A closer examination will reveal,
however, that within each relation template the rows do not share elements, although the
elements are shared across relations. In fact, there are two disjoint sets of rows - one
taken from the first row of each relation and the other from the second row of each
relation.
The first set is actually equivalent to the QBE query in Figure 10-14.

Figure 10-15 Disjunctive multi-table query
The second differs only in the specified product (replace 'CPU' by 'VDU' in the above
paraphrased query). By analogy with earlier constructions involving unrelated multiple
rows, this type of construction therefore denotes a disjunctive query. In other words,
combining the two sets of rows yields the query:
Get the names and phone numbers of those customers who bought the product CPU or
the product VDU
Earlier, we've seen examples of elements used in multiple rows of the same relation.
However, given now an understanding of multi-table queries, such constructions can
equivalently be seen as a multi-table query involving the same table! This is shown in
Figure 10-16 below.
Figure 10-16 Multi-row (with shared elements) and equivalent multi-table form
Example elements may also be negated. Negated elements are written with the prefix '¬',
eg. ¬X (read "not X"). The negated form can only be used if there is at least one
occurrence of the unnegated element elsewhere in the query. It is then interpreted as
matching any corresponding domain value that the unnegated form did not match.
Consider, for example, the illustration in Figure 10-17. There are two parts to the
illustration, labelled (a) and (b), each with a query table and an extension of the
corresponding relation. For purposes of this example, the two query tables constitute a
multi-table query, ie. the example element X is the same one in both. Note, however, that
X is negated in (b).
Given the extension of Transaction as shown, the domain values matching the example
element X in (a) are {1,2}. Turning now to the subquery in (b), the specification of '¬X' in
it means that the only tuples that can match it are tuples such that the C# value is not in
{1,2}. Given the extension of Customer as shown, this means that only the third tuple
matches the example, ie. the answers returned for elements A and B are 'Deen' and
'2234391' respectively.
Figure 10-17 Negated Element
10.4 The Prefix ALL
The prefix ALL can be applied to example elements. The occurrence of such an element
in an arbitrary query row of an arbitrary relation denotes a set of values such that each,
together with a particular instantiation of other items in the row, matches a tuple of the
relation. As an example, consider the following relation and query:
Figure 10-18 Example relation and query with ALL
In this case, there is only one other item in the query row: another element X. The set of
values denoted by 'All.W' therefore needs to be determined for each value that X takes.
Thus,
when X = 1, there are two possible values for W, ie. 1 and 2. Thus, 'All.W' is the set
{1,2}
when X = 2, there is only one value for W, ie. the set {1}
when X = 3, there is also only one value for W, ie. the set {2}
If the query items had been prefixed with 'P.', the result displayed would be:
R1
I1  I2
1   {1,2}
2   {1}
3   {2}
In the simplest case, a query row contains only one element prefixed with ALL. In this
case, the element simply denotes the set of values in the corresponding domain. This is
illustrated in Figure 10-19 below.
Figure 10-19 Simple use of ALL
The use of ALL is more interesting when it involves multi-table queries. For example,
combining the query in Figure 10-18 and Figure 10-19 into a single query, we
effectively restrict X to just the value 1. This is because ALL.W occurs in both tables and
must denote the same set, and the only set satisfying this is {1,2}.
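The behaviour of ALL in these two situations can be imitated in Python. Since the figures are not reproduced here, both extensions below are assumptions: R1 is reconstructed from the three cases listed for X, and R2 is taken to be a one-column relation holding the values 1 and 2. The function name is our own.

```python
from collections import defaultdict

# Assumed extension of the relation in Figure 10-18, columns (I1, I2).
R1 = [(1, 1), (1, 2), (2, 1), (3, 2)]
# Assumed extension of the one-column relation in Figure 10-19.
R2 = [(1,), (2,)]

def all_values_per_x(relation):
    """For each value of X (first column), collect what ALL.W denotes."""
    groups = defaultdict(list)  # a list, because ALL keeps duplicates
    for x, w in relation:
        groups[x].append(w)
    return dict(groups)

per_x = all_values_per_x(R1)

# Combined query: ALL.W must denote the same collection in both tables,
# so only those X whose values of W coincide with the column of R2 survive.
all_w_in_r2 = sorted(w for (w,) in R2)
restricted_x = [x for x, ws in per_x.items() if sorted(ws) == all_w_in_r2]

print(per_x)
print(restricted_x)
```

With these assumed extensions, only X = 1 is associated with both values of W, which is the restriction described above.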
It should be clear now that ALL is used in QBE in the same way that a universal
quantifier is used in relational calculus with domain variables. To highlight this, consider
the query:
"Get the names of customers who bought all types of the company's product"
Three relations are required to resolve this query: Customer, Transaction and Product.
The QBE query is shown in Figure 10-20 which is also annotated with explanations.
Figure 10-20 The query "Get the names of customers who bought all types of the
company's product"
One final word about ALL: it does not remove duplicate values, in contrast to an
unprefixed element which will return only unique matching values. This is illustrated in
Figure 10-21 below. We shall see in the next section how this property is used (in fact, is
necessary) in order to answer certain classes of practical queries.
Figure 10-21 ALL does not remove duplicates!
10.5 Library Functions
As with SQL, QBE also provides arithmetic operations and a number of built-in functions
which are necessary to manipulate the values in ways not otherwise within the scope of
relational calculus, eg. to count the number of occurrences of returned values or to sum
them up. As you may expect by now, these operations are provided in the form of
prefixes. For example, suppose we wish to know how many transactions were related to
the purchase of a particular product, say product number 1. We can extract, for example,
all customer numbers in transactions involving product number 1:
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query)
C#       P#  Date  Qnt
P.All.X  1

Transaction (Result)
C#  P#  Date  Qnt
1
2
1
But what we are really interested in is counting the number of such values. QBE allows
us to do this with the prefix CNT (equivalent to the function COUNT in SQL), which
counts the number of values matching the element it prefixes.
Thus the same query above, different only in the addition of the CNT prefix, achieves the
desired result:
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query)
C#           P#  Date  Qnt
P.CNT.All.X  1

Transaction (Result)
C#  P#  Date  Qnt
3
Note that the use of ALL is necessary. If the example element was simply "P.CNT.X",
the result would be 2! This is because without the ALL prefix, the values matching the
element X are returned with duplicate values removed (as illustrated earlier in
Figure 10-21).
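The difference between counting with and without ALL corresponds to the difference between counting a list and counting a set. The following Python lines illustrate this with the Transaction extension used in this section; the variable names are our own.

```python
# Transaction( C#, P#, Date, Qnt ) as given in this section
TRANSACTION = [
    (1, 1, "21.01", 20),
    (1, 2, "23.01", 30),
    (2, 1, "26.01", 25),
    (1, 1, "29.01", 20),
]

# All.X: every customer number in a transaction for product 1, duplicates kept.
all_x = [c for (c, p, _date, _qnt) in TRANSACTION if p == 1]

cnt_all_x = len(all_x)   # corresponds to P.CNT.All.X
cnt_x = len(set(all_x))  # corresponds to P.CNT.X (duplicates removed first)

print(all_x, cnt_all_x, cnt_x)
```

Customer 1 bought product 1 twice, so the list has three entries while the set has only two.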
Another frequently used function is SUM, which sums up the values matching the
example element it prefixes. Suppose we wish to know the total number of product
number 1 that has been sold. Instead of counting the number of customers that purchased
it, we sum instead the quantities recorded in the relevant transactions. Thus:
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query)
C#  P#  Date  Qnt
    1         P.SUM.All.X

Transaction (Result)
C#  P#  Date  Qnt
              65
QBE also allows us to group tuples in a relation based on a specified example element.
That is, tuples with the same value of the example element are collected into one group
(there will be as many groups as there are distinct values matching the example element).
Grouping is specified using the G prefix (this is similar to the 'Group By' clause in SQL).
Thus:
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query)
C#     P#  Date  Qnt
P.G.X

Transaction (Result, one group per value of X)
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
1   1   29.01  20

2   1   26.01  25
Arithmetic functions may be applied to groups. Thus, if we wanted to know the total
number of items purchased by each customer, we can modify the above query as follows:
Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query)
C#     P#  Date  Qnt
P.G.X            P.SUM.All.B

Transaction (Result)
C#  P#  Date  Qnt
1             70
2             25
Groups may additionally be selected based on conditions that are specified in an
additional column (this corresponds to the 'Having' clause of SQL). This additional
conditions column may be created by means of a special menu item in the QBE interface.
Thus, if we are only interested in finding customers who have purchased more than 45
items, our query would be as follows:
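Transaction
C#  P#  Date   Qnt
1   1   21.01  20
1   2   23.01  30
2   1   26.01  25
1   1   29.01  20

Transaction (Query; the Conditions column holds the group selection condition)
C#     P#  Date  Qnt    Conditions
P.G.X            All.B  SUM.All.B>45

Transaction (Result)
C#  P#  Date  Qnt
1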
In summary, grouping and arithmetic functions can be used in combination to obtain
useful derived values from the database.
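For readers who prefer to see the arithmetic spelled out, the SUM, G and Conditions examples of this section can be reproduced in Python as follows. This is a sketch over the Transaction extension shown above; the variable names are our own.

```python
from collections import defaultdict

# Transaction( C#, P#, Date, Qnt ) as given in this section
TRANSACTION = [
    (1, 1, "21.01", 20),
    (1, 2, "23.01", 30),
    (2, 1, "26.01", 25),
    (1, 1, "29.01", 20),
]

# P.SUM.All.X under Qnt with P# = 1: total quantity of product 1 sold.
total_product_1 = sum(q for (_c, p, _d, q) in TRANSACTION if p == 1)

# P.G.X under C#: one group of tuples per distinct customer number.
groups = defaultdict(list)
for tup in TRANSACTION:
    groups[tup[0]].append(tup)

# P.SUM.All.B applied to each group: total items bought by each customer.
totals = {c: sum(q for (_c, _p, _d, q) in rows) for c, rows in groups.items()}

# Conditions column SUM.All.B > 45: keep only the groups that satisfy it.
selected = [c for c, total in totals.items() if total > 45]

print(total_product_1, totals, selected)
```

The group for customer 1 totals 70 items and that for customer 2 totals 25, so only customer 1 passes the condition.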
11 Architecture of Database Systems
11.1 Introduction
Software systems generally have an architecture, ie. they possess a structure (form) and
an organisation (function). The former describes identifiable components and how they
relate to one another structurally; the latter describes how the functions of the various
structural components interact to provide the overall functionality of the system as a
whole. Since a database system is basically a software system (albeit complex), it too
possesses an architecture. A typical architecture must define a particular configuration of
and interaction between data, software modules, meta-data, interfaces and languages (see
Figure 11-1).
The architecture of a database system determines its capability, reliability, effectiveness
and efficiency in meeting user requirements. But besides the visible functions seen
through some data manipulation language, a good database architecture should provide:
a) Independence of data and programs
b) Ease of system design
c) Ease of programming
d) Powerful query facilities
e) Protection of data
Figure 11-1 General Database System Architecture
The features listed above become especially important in large organisations where
corporate data are held centrally. In such situations, no single user department has
responsibility over, nor can it be expected to know about, all of the organisation's
data. This becomes the job of a Database Administrator (DBA) who has a daunting range
of responsibilities that include creating, expanding, protecting and maintaining the
integrity of all data while addressing the interests of different present and future user
communities. To create a database, a DBA has to analyse and assess the data
requirements of all users and from these determine its logical structure (database
schema). This, on the one hand, will need to be efficiently mapped onto a physical
structure that optimises retrieval performance and the use of storage. On the other, it
would also have to be mapped to multiple user views suited to the respective user
applications. For large databases, DBA functions will in fact require the full time services
of a team of many people. A good database architecture should have features that can
significantly facilitate these activities.
11.2 Data Abstraction
To meet the requirements above, a more sophisticated architecture is in fact used,
providing a number of levels of data abstraction or data definition. The database schema,
also known as Conceptual Schema, mentioned above represents an information model at
the logical level of data definition. At this level, we abstract out details like computer
storage structures, their restrictions, or their operational efficiencies. The view of a
database as a collection of relations or tables, each with fixed attributes and primary keys
ranging over given domains, is an example of a logical level of data definition.
The details of efficiently organising and storing objects of the conceptual schema in
computers with particular hardware configurations are dealt with at the internal (storage)
level of data definition. This level is also referred to as the Internal Schema. It maps the
contents of the conceptual schema onto structures representing tuples, associated key
organisations and indexes, etc, taking into account application characteristics and
restrictions of a given computer system. That is, the DBA describes at this level how
objects of the conceptual schema are actually organised in a computer. Figure 11-2
illustrates these two levels of data definition.
Figure 11-2 The Logical and Internal Levels of Data Abstraction
At a higher level of abstraction, objects from the conceptual schema are mapped onto
views seen by end-users of the database. Such views are also referred to as External
Schemas. An external schema presents only those aspects of the conceptual schema that
are relevant to the particular application at hand, abstracting out all other details. Thus,
depending on the requirements of the application, the view may be organised differently
from that in the conceptual schema, eg. some tables may be merged, attributes may be
suppressed, etc. There may thus be many views created, one for each type of
application. In contrast, there is only one conceptual and one internal schema. All views
are derived from the same conceptual schema. This is illustrated in Figure 11-3 which
shows two different user views derived from the same conceptual schema.
Thus, modern database systems support three levels of data abstraction: External
Schemas (User Views), Conceptual Schema, and Internal (Storage) Schema.
The DDL we discussed in earlier chapters is basically a tool only for conceptual schema
definition. The DBA will therefore usually need special languages to handle the external
and internal schema definitions. The internal schema definition, however, varies widely
over different implementation platforms, ie. there are few common principles for such
definition. We will therefore say little more about them in this book.
Figure 11-3 User Views (External Schema)
As to external schema definitions, note that in the relational model, the Data Sub-
Languages can be used to both describe and manipulate data. This is because the
expressions of a Data Sub-Language themselves denote relations. Thus, a collection of
new (derived) relations can be defined as an external schema.
For example, suppose the following relations are defined:
Customer( C#, Cname, Ccity, Cphone )
Product( P#, Pname, Price )
Transaction( C#, P#, Date, Qnt )
We can then define an external view with a construct like the following:
Define View My_Transaction_1 As
Select Cname, Ccity, Date, Total_Sum=Price*Qnt
From Customer, Transaction, Product
Where Customer.C# = Transaction.C#
& Transaction.P# = Product.P#
which defines the relation (view):
My_Transaction_1( Cname, Ccity, Date, Total_Sum )
This definition effectively maps the conceptual database structure into a form more
convenient for a particular user or application. The extension of this derived table is itself
derived from the extensions of the source relations. This is illustrated in Figure 11-4
below.
Figure 11-4 External View Definition
This is a very important property of the relational data model: a unified approach to data
definition and data manipulation.
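As a concrete, though hedged, illustration, the view definition can be tried out with SQLite through Python's standard sqlite3 module. Several details below are assumptions made only for this sketch: the columns C# and P# are renamed Cno and Pno, the Transaction relation is called Trans (TRANSACTION is a keyword in SQLite), the SQL syntax is SQLite's rather than the "Define View" notation used in this book, and the price of 1000 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (column and table names adapted for SQLite).
    CREATE TABLE Customer (Cno INTEGER PRIMARY KEY, Cname TEXT, Ccity TEXT, Cphone TEXT);
    CREATE TABLE Product  (Pno INTEGER PRIMARY KEY, Pname TEXT, Price INTEGER);
    CREATE TABLE Trans    (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);

    INSERT INTO Customer VALUES (1, 'Codd', 'London', '2263035');
    INSERT INTO Product  VALUES (1, 'CPU', 1000);   -- hypothetical price
    INSERT INTO Trans    VALUES (1, 1, '21.01', 20);

    -- External schema: a derived relation defined by a query expression.
    CREATE VIEW My_Transaction_1 AS
        SELECT Cname, Ccity, Date, Price * Qnt AS Total_Sum
        FROM Customer, Trans, Product
        WHERE Customer.Cno = Trans.Cno AND Trans.Pno = Product.Pno;
""")

# The view is queried exactly like a stored relation.
rows = conn.execute("SELECT * FROM My_Transaction_1").fetchall()
print(rows)
```

The same language serves both to define the view and to retrieve from it, which is the unified approach referred to above.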
11.3 Data Administration
Functions of a DBA include:
Creation of the database
To create a database, a DBA has to analyse and assess the requirements of the users and
from these determine its logical structure. In other words, the DBA has to design a
conceptual schema and a first variant of an internal schema. When the internal schema is
ready, the DBA must load the database with actual data.
Acting as intermediary between users and the database
A DBA is responsible for all user facilities determined by external schemas, ie. the DBA
is responsible for defining all external schemas or user views.
Ensuring data privacy, integrity and security
In analysing user requirements, a DBA must determine who should have access to which
data and subsequently arrange for appropriate privacy locks (passwords) for identified
individuals and/or groups. The DBA must also determine integrity constraints and
arrange for appropriate data validation to ensure that such constraints are never violated.
Last, but not least, the DBA must make arrangements for data to be regularly backed up
and stored in a safe place as a measure against unrecoverable data losses for one reason
or another.
At first glance, it may seem that a database can be developed using the conventional
"waterfall" technique. That is, the development process is a sequence of stages, with
work progressing from one stage to the next only when the preceding stage has been
completed. For relational database development, this sequence will include stages like
eliciting user requirements, analysing data relationships, designing the conceptual
schema, designing the internal schema, loading the database, defining user views and
interfaces, etc, through to the deployment of user facilities and database operations.
In practice, however, when users start to work with the database, the initial requirements
inevitably change for a number of reasons including experience gained, a growing
amount of data to be processed, and, in this fast changing world, changes in the nature of
the business it supports. Thus, a database needs to evolve, learning from experience and
allowing for changes in requirements. In particular, we may expect periodic changes to:
improve database performance as data usage patterns change or become clearer
add new applications to meet new processing requirements
modify the conceptual schema as understanding of the enterprise's perception of data
improves
Changing a database, once the conceptual and internal schemas have been defined and
data actually loaded, can be a major undertaking even for seemingly small conceptual
changes. This is because the data structures at the storage layer will need to be
reorganised, perhaps involving complete regeneration of the database. A good DBMS
should therefore provide facilities to modify a database with a minimum of
inconvenience. The desired facilities can perhaps be broadly described to cover:
performance monitoring
database reorganisation
database restructuring
By performance monitoring we mean the collection of usage statistics and their analysis.
Statistics necessary for performance optimisation generally fall under two headings: static
and dynamic statistics. The static statistics refer to the general state of the database and
can be collected by special monitoring programs when the database is inactive. Examples
of such data include the number of tuples per relation, the population of domains, the
distribution of relations over available storage space, etc. The dynamic statistics refer to
run-time characteristics and can be collected only when the database is running.
Examples include frequency of access to and updating of each relation and domain, use
of each type of data manipulation operator and associated response times, frequency of
disk access for different usage types, etc.
It is the DBA's responsibility to analyse such data, interpret them and where necessary
take steps to reorganise the storage schema to optimise performance. Reorganising the
storage schema also entails the subsequent physical reorganisation of the data themselves.
This is what we mean by database reorganisation.
The restructuring of the conceptual schema implies changing its contents, such as:
adding/removing data items (ie. columns of a relation)
adding/removing entire relations
splitting/recombining relations
changing a relation's primary keys
etc
For example, assuming the relations defined in section 11.2, suppose we now wish to
record also for each purchase transaction the sales representative responsible for the sale.
We will need therefore to add a column into the Transaction relation, say with column
name R#:
Transaction( C#, P#, R#, Date, Qnt )
The intention, of course, is to record a unique value under this column to denote a
particular sales representative. Details of such sales representatives will then be given in a
new relation:
Representative( R#, Rname, Rcity, Rphone )
A restructured conceptual schema will normally be followed by a database reorganisation
in the sense explained above.
11.4 Data Independence
Data independence refers to the independence of one user view (external schema) with
respect to others. A high degree of independence is desirable as it will allow a DBA to
change one view, to meet new requirements and/or to optimise performance, without
affecting other views. Relational databases with appropriate relational sub-languages
have a high degree of data independence.
For example, suppose that the view
My_Transaction_1( Cname, Ccity, Date, Total_Sum )
as defined in section 11.2 no longer meets the user's needs. Let's say that Ccity and Date
are no longer important, and that it is more important to know the product name and
quantity purchased. This change is easily accommodated by changing the select-clause in
the definition thus:
Define View My_Transaction_1 As
Select Cname, Pname, Qnt, Total_Sum=Price*Qnt
From Customer, Transaction, Product
Where Customer.C# = Transaction.C#
& Transaction.P# = Product.P#
If each view is defined separately over the conceptual schema, then as long as the
conceptual schema does not change, a view may be redefined without affecting other
views. Thus the above change will have no effect on other views, unless they were built
upon My_Transaction_1.
Data independence is also used to refer to the independence of user views relative to the
conceptual schema. For example, the reader can verify that the change in the conceptual
schema in the last section (adding the attribute R# to Transaction and adding the new
relation Representative) does not affect My_Transaction_1 - neither the original nor the
changed view! In general, if the relations and attributes referred to in a view definition
are not removed in a restructuring, the view will not be affected. Thus we can
accommodate new (additive) requirements without affecting existing applications.
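The claim that an additive restructuring leaves an existing view untouched can be checked with a small experiment, again using SQLite through Python's sqlite3 module. As before, this is a sketch under assumptions: C#, P# and R# are renamed Cno, Pno and Rno, the Transaction relation is called Trans, and the sample price is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Cno INTEGER, Cname TEXT, Ccity TEXT, Cphone TEXT);
    CREATE TABLE Product  (Pno INTEGER, Pname TEXT, Price INTEGER);
    CREATE TABLE Trans    (Cno INTEGER, Pno INTEGER, Date TEXT, Qnt INTEGER);
    INSERT INTO Customer VALUES (1, 'Codd', 'London', '2263035');
    INSERT INTO Product  VALUES (1, 'CPU', 1000);   -- hypothetical price
    INSERT INTO Trans    VALUES (1, 1, '21.01', 20);

    -- The changed view: product name and quantity instead of city and date.
    CREATE VIEW My_Transaction_1 AS
        SELECT Cname, Pname, Qnt, Price * Qnt AS Total_Sum
        FROM Customer, Trans, Product
        WHERE Customer.Cno = Trans.Cno AND Trans.Pno = Product.Pno;
""")

query = "SELECT * FROM My_Transaction_1"
before = conn.execute(query).fetchall()

# Additive restructuring of the conceptual schema: a new column, a new relation.
conn.executescript("""
    ALTER TABLE Trans ADD COLUMN Rno INTEGER;
    CREATE TABLE Representative (Rno INTEGER, Rname TEXT, Rcity TEXT, Rphone TEXT);
""")
after = conn.execute(query).fetchall()

print(before == after)
```

Because the view names only attributes that still exist after the restructuring, the same rows are returned before and after the change.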
Lastly, data independence may also refer to the extent to which we may change the
storage schema without affecting the conceptual or external schemas. We will not
elaborate on this as we have pointed out earlier that the storage level is too diverse for
meaningful treatment here.
11.5 Data Protection
There are generally three types of data protection that any serious DBMS must provide.
These were briefly described in Chapter 1 and we summarise them here:
1. Authorisational Security
This refers to protection against unauthorised access and includes measures such as user
identification and password control, privacy keys, etc.
2. Operational Security
This refers to maintaining the integrity of data, ie. protecting the database from the
introduction of data that would violate identified integrity constraints.
3. Physical Security
This refers to procedures to protect the physical data against accidental loss or damage of
storage equipment, theft, natural disaster, etc. It will typically involve making periodic
backup copies of the database, transaction journalling, error recovery techniques, etc.
In the context of the relational data model, we can use relational calculus as a notation to
define integrity constraints, ie. we define them as formulae of relational calculus. In this
case, however, all variables must be bound variables as we are specifying properties over
their ranges rather than looking for particular instantiations satisfying some predicate. For
example, suppose that for the Product relation, the Price attribute should only have a
value greater than 100 and less than 99999. This can be expressed (in DSL Alpha style)
as:
Range Product X ALL;
(X.Price > 100 & X.Price < 99999 )
This is interpreted as an assertion that must always be true. Any data manipulation that
would make it false would be disallowed (typically generating messages informing the
user of the violation). Thus, the relational data model unifies not only data definition and
manipulation, but data control as well.
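The idea of a constraint as an assertion over all tuples, checked before an update is accepted, can be sketched in Python. The function names and the sample Product tuples (including their prices) are hypothetical and serve only to illustrate the principle; a real DBMS would enforce the constraint internally.

```python
# Product( P#, Pname, Price ): sample tuples with hypothetical prices
PRODUCT = [(1, "CPU", 1000), (2, "VDU", 1200)]

def price_constraint(relation):
    """The assertion: for ALL tuples X, X.Price > 100 and X.Price < 99999."""
    return all(100 < price < 99999 for (_pno, _pname, price) in relation)

def checked_insert(relation, new_tuple):
    """Return the updated relation, or refuse an update that breaks the assertion."""
    candidate = relation + [new_tuple]
    if not price_constraint(candidate):
        raise ValueError(f"integrity violation: {new_tuple!r}")
    return candidate

PRODUCT = checked_insert(PRODUCT, (3, "Keyboard", 150))  # accepted

try:
    checked_insert(PRODUCT, (4, "Cable", 50))            # price too low
except ValueError as err:
    print(err)
```

Note that the variable X never appears free: the check quantifies over every tuple of the relation, as the bound variable does in the formula above.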
In the area of physical security, database backups should of course be done periodically.
For this purpose, it is perhaps best to view a database as a large set of physical pages,
where each page is a block of fixed size serving as the basic unit of interaction between
the DBMS and storage devices. A database backup is thus essentially a copy of the entire
set of pages onto another storage medium that is kept in a secure and safe place. Aside
from the obvious need for backups against damage of storage devices, theft, natural
disasters and the like, backups are necessary to recover a consistent database in the event
of a database 'crash'. Such crashes can occur in the course of a sequence of database
transactions, particularly transactions that modify the database content.
Suppose, for example, that the last backup was done at time $t_0$, and subsequent to that, a
number of update transactions were applied one after another. Suppose further that the
first $n$ transactions were successfully completed, but during the $(n+1)$th transaction a
system failure occurred (eg. disk malfunction, operating system crash, power failure, etc)
leaving some pages in a corrupted state. In general, it is not possible to just reapply the
failed transaction: the failure could have corrupted the updates performed by previous
transactions as well, or worse, it could have damaged the integrity of the storage model so
as to make some pages of the database unreadable! We have no recourse at this point but to
go back to the last known consistent state of the database at time $t_0$, ie. the entire contents
of the last backup are reinstated as the current database. Of course, in doing so, all the
transactions applied after $t_0$ are lost.
At this point it may seem reasonable that, to guard against losing too much work,
backups should perhaps be done after each transaction; then at most only the work of
one transaction is lost in case of failure. However, many database applications today are
transaction intensive, typically involving many online users generating many transactions
frequently (eg. an online airline reservation system). Many databases, on the other hand,
are very large and an entire backup could take hours to complete. While a backup is being
performed the database must be inactive. Thus, it should be clear that this proposition is
impractical.
As it is clearly desirable that transactions since the last backup are also somehow saved in
the event of crashes, an additional mechanism is needed. Essentially, such mechanisms
are based on journalling successful transactions applied to a database. This simply means
that a copy of each transaction (or of the pages it affects) is recorded in a sequential file as it
is applied to the database.
The simplest type of journalling is the Forward System Journal. In this, whenever a page
is modified, a copy of the modified page is also simultaneously recorded into the forward
journal.
To illustrate this mechanism, let the set of pages in a database be P = {p_1, p_2, …, p_n}. If the
application of an update transaction T on the database changes P_T, where P_T ⊆ P, then
T(P_T) will be recorded in the forward journal. We use the notation T(P_T) to denote the set
of pages P_T after the transaction T has changed each page in P_T. Likewise, we write T(p_i)
to denote a page p_i after it has been changed by transaction T. Furthermore, if T was
applied successfully (i.e. no crash during its processing), a separator mark, say "|", would
be written to the journal. Thus, after a number of successful transactions, the journal
would look as follows:
< T1(P_T1) | T2(P_T2) | … | Tk(P_Tk) | >
As a more concrete example, suppose transaction T1 changed {p_1, p_2, p_3}, T2 changed
{p_2, p_3, p_4}, and T3 changed {p_3, p_4, p_5}, in that order and all successfully carried out.
Then the journal would contain:
< T1({p_1, p_2, p_3}) | T2({T1(p_2), T1(p_3), p_4}) | T3({T2(T1(p_3)), T2(p_4), p_5}) | >
Now suppose a crash occurred just after T3 had been applied. The recovery procedure
consists of two steps:
a) replace the database with the latest backup
b) read the system journal in the forward direction (hence the term "forward" journal)
and, for each set of journal pages that precedes a separator "|", use it to replace the
corresponding pages in the database. Effectively, this duplicates the effect of applying
transactions in the order they were applied prior to the crash.
The technique is applicable even if the crash occurred during the last transaction. In this
case, the journal for the last transaction would be incomplete and, in particular, the
separator "|" would not be written out. Say that transaction T3 was interrupted after
modifying pages p_3 and p_4 but before it could complete modifying p_5. Then the journal
would look as follows:
< T1({p_1, p_2, p_3}) | T2({T1(p_2), T1(p_3), p_4}) | T3({T2(T1(p_3)), T2(p_4), …}) >
In this case, recovery is exactly as described above except that the last incomplete block
of changes will be ignored (no separator "|"). Of course, the work of the last transaction is
lost, but this is unavoidable. It is possible, however, to augment the scheme further by
saving the transaction itself until its effects are completely written to the journal. Then T3
above can be reapplied, as a third step in the recovery procedure.
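The forward journal and its two-step recovery can be sketched in a few lines of Python. This is
only an illustration under simplifying assumptions, not the mechanism of any particular DBMS:
pages are held in a dictionary, the journal is an in-memory list rather than a sequential file,
and the names run_transaction, recover_forward and SEPARATOR are hypothetical. The crash during
T3 is simulated with an exception.

```python
SEPARATOR = "|"  # marks the end of a successfully completed transaction


def run_transaction(db, journal, name, page_ids, crash_after=None):
    """Change each page in page_ids, journalling every changed page."""
    for done, page_id in enumerate(page_ids):
        if crash_after is not None and done == crash_after:
            raise RuntimeError(f"simulated crash during {name}")
        db[page_id] = f"{name}({db[page_id]})"   # the changed page T(p)
        journal.append((page_id, db[page_id]))   # copy it to the forward journal
    journal.append(SEPARATOR)                    # only reached on success


def recover_forward(backup, journal):
    """Rebuild the database from the last backup plus the forward journal."""
    db = dict(backup)            # step (a): reinstate the latest backup
    block = {}                   # pages of the block currently being read
    for entry in journal:        # step (b): read the journal forwards
        if entry == SEPARATOR:
            db.update(block)     # complete block: replace those pages
            block = {}
        else:
            page_id, content = entry
            block[page_id] = content
    return db                    # a final block with no separator is dropped


backup = {f"p{i}": f"p{i}" for i in range(1, 6)}   # state at time t_0
db, journal = dict(backup), []
run_transaction(db, journal, "T1", ["p1", "p2", "p3"])
run_transaction(db, journal, "T2", ["p2", "p3", "p4"])
try:
    run_transaction(db, journal, "T3", ["p3", "p4", "p5"], crash_after=2)
except RuntimeError:
    pass                                           # T3 never wrote its separator

recovered = recover_forward(backup, journal)
```

After recovery, the pages reflect T1 and T2 only. The partial T3 block at the end of the journal
has no separator and is therefore skipped, so p5 keeps its backup contents.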
While the forward journal can recover (almost) fully from a crash, its disadvantage is that
recovery is still a relatively slow process: hundreds or even thousands of transactions may have
been applied since the last full backup, and the corresponding journals of each of these
transactions must be copied back in sequence to restore the state of the database. In some
applications, very fast recovery is needed.
In these cases, the Backward System Journal will be the more appropriate journalling and
recovery technique. With this technique, whenever a transaction changes a page, the page
contents before the update are saved. As before, if the transaction successfully completes, a
separator is written. Thus the backward journal for the same example as above would be:
< {p_1, p_2, p_3} | {T1(p_2), T1(p_3), p_4} | {T2(T1(p_3)), T2(p_4), …} >
Since each block of journal pages represents the state immediately before a transaction is
applied, recovery consists of only one step: read the journal in the backward direction
until the first separator and replace the pages in the database with the corresponding
pages read from the journal. Thus, the backward journal is like an "undo" file: the last
block cancels the last transaction, the second last cancels the second last transaction, etc.
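The one-step backward recovery can be sketched in the same spirit. Again this is a minimal
illustration under stated assumptions rather than a definitive implementation: the journal is an
in-memory list, the pages that survive the crash are assumed to be readable, and the names
run_transaction, undo_incomplete_block and SEPARATOR are hypothetical.

```python
SEPARATOR = "|"  # marks the end of a successfully completed transaction


def run_transaction(db, journal, name, page_ids, crash_after=None):
    """Change each page in page_ids, saving its old contents first."""
    for done, page_id in enumerate(page_ids):
        if crash_after is not None and done == crash_after:
            raise RuntimeError(f"simulated crash during {name}")
        journal.append((page_id, db[page_id]))   # page contents before the update
        db[page_id] = f"{name}({db[page_id]})"   # now change the page
    journal.append(SEPARATOR)                    # only reached on success


def undo_incomplete_block(db, journal):
    """Read the journal backwards to the first separator, restoring pages."""
    while journal and journal[-1] != SEPARATOR:
        page_id, before = journal.pop()          # most recent entry first
        db[page_id] = before                     # put the old page back


db = {f"p{i}": f"p{i}" for i in range(1, 6)}
journal = []
run_transaction(db, journal, "T1", ["p1", "p2", "p3"])
run_transaction(db, journal, "T2", ["p2", "p3", "p4"])
try:
    run_transaction(db, journal, "T3", ["p3", "p4", "p5"], crash_after=2)
except RuntimeError:
    pass                                         # T3 left p3 and p4 half-updated

undo_incomplete_block(db, journal)
```

Only the two pages touched by the interrupted T3 are rewritten, which is why this recovery is so
much faster than replaying the whole forward journal. If the journal already ends with a
separator, no transaction was in progress and the function leaves the database untouched.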
Features such as those discussed above can significantly facilitate the management of
corporate data resources. Such features, together with the overall architecture and the
Data Model examined in previous chapters, determine the quality of a DBMS and are
thus often among the principal criteria used in the critical evaluation of competing
DBMSs.