[MUSIC].

So, last time we talked about data models
and used them to motivate databases and
define the term database in a
broad sense.
And now I want to build on that to talk
specifically about relational databases,
okay?
So, last time we talked about these
questions you could use to reason about
different ways of organizing data
and evaluate them with respect to your
requirements.
And I want to talk about these questions
and apply them to examples of different
kinds of databases you saw in the past
that end up motivating the relational
model.
And in particular, the reason I want to
go through this sort of historical
view of things
is that you see some of these same
designs being proposed in terms of
NoSQL systems.
And some of the same issues come up:
both the pros and the cons are
still there.
So, it's good to have a historical
perspective on them when you're
evaluating these modern systems that are
becoming popular.
Okay, so the questions we talked about
were, how is data physically organized on
disks?
You can ask this about a system.
What kind of queries are efficiently
supported?
How do you update things?
And so on.
All right, so one example is, well, I'll
call it a network database, although
arguably this is just sort of
pre-databases, where you just have files.
And if you go back to our questions,
you might ask, well, how is it physically
organized on disk?
Well, if you're using sort of a
parts and orders model here, you
would have an order record.
And it would have an address associated
with this order record that would
physically point to the first part
associated with that order.
And that part would point to the next one
and so on.
Another field in the record would point
to the customer that made that order.
Okay?
So going back, you know, what kind of
queries are efficiently supported?
Well, if I want to find all of the parts
associated with an order I can do that
pretty efficiently.
I have access to the order: I'm just given an
order, and I'll just walk down this chain
to gather all the parts.
What kind of queries are not supported
efficiently? Well, you know, I want to find all
the orders that involve a particular
part.
All the orders that involve this washer:
well, now I have to scan every order to
look for them.
Okay?
There are some ways around that by
putting back pointers and so on.
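
A minimal sketch of the pointer-chain layout just described, in Python.
The class names, field names, and sample orders are hypothetical, and
object references stand in for physical disk addresses. It shows why
"parts of an order" is cheap while "orders with a part" needs a full scan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Part:
    name: str
    # Stands in for the on-disk address of the next part in the same order.
    next_part: Optional["Part"] = None


@dataclass
class Order:
    order_id: int
    customer: str  # simplified: the lecture describes a pointer to a customer record
    # Stands in for the on-disk address of the first part of this order.
    first_part: Optional[Part] = None


def parts_of(order: Order) -> List[str]:
    """Walk the chain from one order; work is proportional to that order's parts."""
    names = []
    part = order.first_part
    while part is not None:
        names.append(part.name)
        part = part.next_part
    return names


def orders_with(orders: List[Order], part_name: str) -> List[int]:
    """No pointer leads from a part back to its orders, so every order is scanned."""
    return [o.order_id for o in orders if part_name in parts_of(o)]


# Hypothetical sample data.
orders = [
    Order(1, "alice", Part("washer", Part("bolt"))),
    Order(2, "bob", Part("nut")),
    Order(3, "carol", Part("gear", Part("washer"))),
]

print(parts_of(orders[0]))
print(orders_with(orders, "washer"))
```

Adding back pointers would amount to keeping a second chain from each
part to its orders, which then has to be maintained on every update.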
Another problem with this file-oriented,
you know, sort of proto-database model is
that
whenever I want to make a change to the
data, all right, say I want to have an
extra field added to support the billing
customer as opposed to the shipping
customer,
well, I've just added a new field, I've
extended the length of this record, and that
means that everything else below that
record needs to be moved.
More importantly, all the programs that
navigate the structure now need to
be aware of this other field.
They all need to be rewritten to
accommodate this extra piece of data,
okay.
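
To make that concrete, here is a small sketch using Python's standard
struct module. The record layouts, field names, and values are
assumptions for illustration only, not the format of any real system.
A reader written against the old record length returns wrong order IDs
once a billing-customer field is added, until it is rewritten.

```python
import struct

# Hypothetical fixed-length layouts: little-endian 4-byte integers.
OLD_LAYOUT = "<ii"   # order_id, shipping_customer_id
NEW_LAYOUT = "<iii"  # order_id, shipping_customer_id, billing_customer_id


def read_order_ids(data: bytes, record_size: int) -> list:
    """Read the first field of every record, assuming a fixed record size."""
    return [
        struct.unpack_from("<i", data, offset)[0]
        for offset in range(0, len(data), record_size)
    ]


old_file = b"".join(
    struct.pack(OLD_LAYOUT, oid, ship) for oid, ship in [(1, 10), (2, 20), (3, 30)]
)
new_file = b"".join(
    struct.pack(NEW_LAYOUT, oid, ship, bill)
    for oid, ship, bill in [(1, 10, 11), (2, 20, 21), (3, 30, 31)]
)

old_size = struct.calcsize(OLD_LAYOUT)
new_size = struct.calcsize(NEW_LAYOUT)

ok = read_order_ids(old_file, old_size)     # old reader, old file
stale = read_order_ids(new_file, old_size)  # old reader, new file: misaligned
fixed = read_order_ids(new_file, new_size)  # reader rewritten for the new length

print(ok, stale, fixed)
```

The stale reader does not fail loudly; it silently lands on the wrong
bytes, which is why every program that navigates the structure has to
be updated together with the layout.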
Moreover, if you want to support
different access methods as we talked
about, if you want to look up, let's say, by
part and find all the orders,
I end up having to make a complete second
copy of the database.
And now when I make a
change to one copy, I need to make that
change to all copies.
And you can imagine how the space of
possible copies might grow and get
pretty big.
Okay?
So, a partial solution to this problem
was this notion of hierarchical databases,
exemplified perhaps by IBM's IMS
system,
which actually still exists and still has
customers.
And so here, they organized
data in terms of segments,
but still the logical model has the
hierarchical flavor that we saw in the network
model as well.
So, here I switched: I made the top-level
access the customer instead of the order.
And so logically what you have is that
an order is only located underneath
the customer, and the part is only
located underneath an order.
However, given that they are in separate
segments, I can make a change to one
segment
without having to break all the code that
accesses other segments.
The downside that still exists here, though,
is that the programmer, the application
developer, still needs to understand this
hierarchy in order to find anything.
Okay?
They have to actually know exactly how
things are organized, for example,
that orders appear under customers.
So you still have to anticipate what kind
of access methods your customers are
going to want and design for those.
All right?
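
As a rough illustration of that point, the sketch below models the
customer, order, part hierarchy as nested Python dictionaries. The data
and function names are hypothetical, and this is not how IMS stores
segments. The access path that follows the hierarchy is a direct lookup;
any other question means writing code that knows the nesting and walks
all of it.

```python
# Hypothetical hierarchy: customer -> order id -> list of parts.
hierarchy = {
    "alice": {101: ["washer", "bolt"]},
    "bob": {102: ["nut"], 103: ["washer"]},
}


def orders_for_customer(customer):
    """The anticipated access path: start at the top level and go down."""
    return sorted(hierarchy[customer])


def orders_with_part(part):
    """An unanticipated access path: the caller must know that parts sit
    under orders, which sit under customers, and traverse every level."""
    found = []
    for customer, orders in hierarchy.items():
        for order_id, parts in orders.items():
            if part in parts:
                found.append((customer, order_id))
    return found


print(orders_for_customer("bob"))
print(orders_with_part("washer"))
```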
Updates here are a little bit easier,
given that I can add an order to one
segment stored elsewhere without
affecting all the other structures.
And if I need to add a field, I can
make changes only to the orders, as
opposed to changing everything.
Moreover, the software layer on top of this
is able to sort of insulate you from those
kinds of changes with some
reliability.
Okay, so this new field would only be
passed back to the client when they
actually needed that new field.
Okay?
So, there's some measure of, what I'll
call data independence, and we'll talk
about that a little bit more in a few
minutes.
Okay, so moving towards relational
databases, here is one view of what a
relational database really is.
Here I'm quoting Curt Monash, who's
an analyst for the database industry.
And he says, you know, Relational
Database Management Systems were invented
to let you use one set of data in
multiple ways.
Including ways that were unforeseen at
the time the database was built and at
the time that the first applications were
written.
And so I want to emphasize here that this
is the key idea of relational databases,
not, you know, SQL,
and not some of the other things you may
associate with particular
implementations.
It's really just about organizing the
data in such a way as to support
unforeseen access methods, querying in
ways that you didn't anticipate when you
organized it,
and insulating applications from changes.
Okay, so what is a relational database?
Well, at the simplest level everything is
a relation which is synonymous with a
table, right?
Everything's rows and columns, and
this probably doesn't need to be made
explicit, but let me do so.
Every row in the table has exactly the
same columns.
Not only the same number of columns, but they
also have the same types.
Okay?
So, if a column has an integer in one
row, then it needs to be an integer in
all the rows.
All right.
And then, a consequence of this model of
everything being a table is that you
don't have pointers anymore.
Right, you don't have physical addresses.
All you have is tables.
And so, relationships between different
data items are implicit.
So, here we
switch to the domain of courses
and students.
So, this table is, let's
say, "student takes course," and
this is a student record.
Well, instead of having a physical
pointer from the course record back to
the student, we just have a shared ID.
The only relationship between
these two data items is that they both have
the same value in a particular column.
Okay, and so this is, intuitively this
sounds really bad for performance right
off the bat, right?
If I want to go look up
all the students' names
associated with a particular course,
once I have my course, I need to go look
up in this table all the values that
match,
as opposed to just navigating directly to
them, which you can do with the
hierarchical method.
But if I want to go the other direction,
it's the exact same process.
If I want to find, you
know, all the courses that a student has
taken, right,
I can do so the same way.
I do have to do the lookups, which maybe
is a cost in performance.
But the mechanism by which I look things
up is the same in both cases.
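
Here is a small sketch of that symmetry in plain Python. The table
contents, column positions, and helper names are all made up for
illustration. Both tables are just lists of rows, the only link between
them is a matching student ID value, and the same value-matching lookup
answers the question in either direction.

```python
# Hypothetical tables: every row in a table has the same columns.
students = [  # (student_id, name)
    (1, "Ann"),
    (2, "Raj"),
    (3, "Mei"),
]
takes = [  # (student_id, course)
    (1, "Databases"),
    (2, "Databases"),
    (1, "Statistics"),
    (3, "Statistics"),
]


def select(rows, column, value):
    """The one lookup mechanism: keep rows whose column equals the value."""
    return [row for row in rows if row[column] == value]


def names_in_course(course):
    """Course -> student names, by matching the shared student_id value."""
    ids = {sid for sid, _ in select(takes, 1, course)}
    return sorted(name for sid, name in students if sid in ids)


def courses_of_student(name):
    """Student name -> courses, using the same matching in the other direction."""
    ids = {sid for sid, _ in select(students, 1, name)}
    return sorted(course for sid, course in takes if sid in ids)


print(names_in_course("Databases"))
print(courses_of_student("Ann"))
```

Neither direction is privileged by the storage layout, so a query that
nobody anticipated costs the same kind of lookup as one that was planned.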
Moreover, everything is stored only once,
which is a feature that the
hierarchical databases were able to
achieve in most cases,
okay,
but the network databases were not.
We don't have multiple copies of
things lying around.
All right.
So, the philosophy here is, you know,
being cute about this, the quote from the
19th century is that, you know, God made
the integers, all else is the work of
man.
Well, you know, Codd made the relations,
which is a reference to Edgar Codd, who
wrote the first relational database
paper
and went on to win the Turing Award for
his work, which is sort of the Nobel
Prize of computer science. Codd made
relations, and all else is the work of
man.
So, everything is a table, is the number
one thing to remember about the
relational data model.
Everything is a relation.
All right.
So, let's actually break here and I'll
pick up with this slide next time.
