and used them to motivate databases and define the term database in a broad sense. Now I want to build on that to talk specifically about relational databases.

Last time we talked about questions you can use to reason about different ways of organizing data and to evaluate them against your requirements. I want to apply those questions to examples of different kinds of databases from the past, the ones that ended up motivating the relational model. The reason I want to go through this historical view is that you see some of these same designs being proposed in NoSQL systems, and the same issues come up: the pros and cons are still there. So it's good to have a historical perspective when you're evaluating these modern systems that are becoming popular.

The questions we talked about were: How is the data physically organized on disk? What kinds of queries are efficiently supported? How do you update things? And so on.

One example is what I'll call a network database, although arguably this is just pre-database, where you just have files. Going back to our questions, you might ask how it is physically organized on disk. If you're using a parts-and-orders model, you would have an order record, and it would hold an address that physically points to the first part associated with that order. That part would point to the next one, and so on. Another field in the record would point to the customer who made that order. Okay? So, going back: what kinds of queries are efficiently supported?
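To make that record layout concrete, here is a minimal sketch in Python. It is only an illustration, not how any real system stored its files: object references stand in for physical disk addresses, and all the names (`Part`, `Order`, `Customer`, `parts_of`, `orders_containing`) are made up for this example. It shows the pointer chain from an order to its parts, plus one lookup that follows the chain and one that has no pointer to follow.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Part:
    name: str
    # Stands in for the on-disk address of the next part in the same order.
    next_part: Optional["Part"] = None


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    order_id: int
    customer: Customer          # pointer to the customer who placed the order
    first_part: Optional[Part]  # pointer to the head of this order's part chain


def parts_of(order: Order) -> Iterator[str]:
    """Walk the chain from the order's first part, following pointers only."""
    part = order.first_part
    while part is not None:
        yield part.name
        part = part.next_part


def orders_containing(orders: List[Order], part_name: str) -> List[int]:
    """No pointer leads from a part back to its orders, so scan every order."""
    return [o.order_id for o in orders if part_name in parts_of(o)]


# Hypothetical sample data.
alice = Customer("Alice")
order1 = Order(1, alice, Part("bolt", Part("washer")))
order2 = Order(2, alice, Part("nut"))
orders = [order1, order2]
```

Calling `parts_of(order1)` touches only the records on that order's chain, while `orders_containing(orders, "washer")` has to visit every order in the file.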
Well, if I want to find all of the parts associated with an order, I can do that pretty efficiently. If I'm given an order, I just walk down this chain to gather all the parts. What kinds of queries are not supported efficiently? Say I want to find all the orders that involve a particular part, all the orders that involve this washer. Now I have to scan every order to look for them. Okay? There are some ways around that, by adding back pointers and so on.

Another problem with this file-oriented, sort of proto-database model shows up whenever I want to make a change to the data. Say I want to add an extra field to support the billing customer as opposed to the shipping customer. I've just added a new field and extended the length of this record, which means that everything below that record needs to be moved. More importantly, all the programs that navigate the structure now need to be aware of this other field. They all need to be rewritten to accommodate this extra piece of data. Moreover, if you want to support different access methods, as we talked about, say looking up by part and finding all the orders, I end up having to make a complete second copy of the database. Now when I make a change to one copy, I need to make the change to all copies, and you can imagine how the number of possible copies might get pretty big. Okay?

A partial solution to this problem was the notion of hierarchical databases, characterized perhaps by IBM's IMS system, which actually still exists and still has customers. Here, data is organized in terms of segments, but the logical model still has the hierarchical flavor that we saw in the network model as well. So here I've switched: I made the top-level access path the customer's set of orders.
And so logically what you have is that an order is located only underneath a customer, and a part is located only underneath an order. However, given that they are in separate segments, I can make a change to one segment without having to break all the code that accesses other segments.

The downside that still exists here is that the programmer, the application developer, still needs to understand this hierarchy in order to find anything. Okay? They have to know exactly how things are organized, for example that orders appear under customers. So you still have to anticipate what kinds of access methods your customers are going to want and design for those. All right?

Updates here are a little bit easier, given that I can add an order to one segment, stored elsewhere, without affecting all the other structures. And if I have to add a field, I can make changes only to the orders, as opposed to changing everything. Moreover, the software layer on top of this is able to insulate you from those kinds of changes with some reliability. So this new field would only be passed back to the client when they actually needed that new field. Okay? So there's some measure of what I'll call data independence, and we'll talk about that a little bit more in a few minutes.

Okay, so moving towards relational databases. One view of what a relational database really is comes from Curt Monash, who's an analyst for the database industry. He says that relational database management systems were invented to let you use one set of data in multiple ways, including ways that were unforeseen at the time the database was built and at the time the first applications were written. I want to emphasize that this is the key idea of relational databases: not SQL, and not some of the other things you may associate with particular implementations. It's really just about organizing the data in such a way as to support unforeseen access methods, querying in ways that you didn't anticipate when you organized it, and insulating applications from changes.

Okay, so what is a relational database? At the simplest level, everything is a relation, which is synonymous with a table, right? Everything is rows and columns. This probably doesn't need to be made explicit, but let me do so: every row in the table has exactly the same columns. Not just the same number of columns, but also the same types. Okay? So if a column holds an integer in one row, then it needs to be an integer in all the rows.

All right. A consequence of this model of everything being a table is that you don't have pointers anymore. Right, you don't have physical addresses. All you have is tables. And so relationships between different data items are implicit. Here we switch to the domain of courses and students. So this table is, let's say, "student takes course," and this is a student record. Instead of having a physical pointer from the course record back to the student, we just have a shared ID. The only relationship between these two data items is that they both have the same value in a particular column.

Intuitively, this sounds really bad for performance right off the bat, right? If I want to look up all the students' names associated with a particular course, once I have my course I need to go look up in this table all the values that match, as opposed to just navigating directly to them, which you can do with the hierarchical method. But if I want to go in the other direction, it's the exact same process. If I want to find all the courses that a student has taken, I can do so the same way. I do have to do the lookups, which may be a cost in performance.
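The shared-ID idea can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the two tables, their column names (`student_id`, `name`, `course`), the sample rows, and the helper `select` are all hypothetical, and a real relational system would use indexes rather than scanning lists. The point is that both directions of the query go through the same value-matching step, with no pointers anywhere.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Two tables; every row in a table has the same columns with the same types.
students: List[Row] = [
    {"student_id": 1, "name": "Ana"},
    {"student_id": 2, "name": "Ben"},
]
takes: List[Row] = [
    {"student_id": 1, "course": "Databases"},
    {"student_id": 2, "course": "Databases"},
    {"student_id": 1, "course": "Statistics"},
]


def select(table: List[Row], column: str, value: Any) -> List[Row]:
    """The single lookup mechanism: keep the rows whose column matches value."""
    return [row for row in table if row[column] == value]


def names_in_course(course: str) -> List[str]:
    """Course -> student names, by matching the shared student_id values."""
    ids = {row["student_id"] for row in select(takes, "course", course)}
    return sorted(row["name"] for row in students if row["student_id"] in ids)


def courses_of(name: str) -> List[str]:
    """Student name -> courses: the other direction, same matching step."""
    ids = {row["student_id"] for row in select(students, "name", name)}
    return sorted(row["course"] for row in takes if row["student_id"] in ids)
```

Neither table is "on top": `names_in_course("Databases")` and `courses_of("Ana")` both start from one table and match IDs against the other.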
But the mechanism by which I look things up is the same in both cases. Moreover, everything is stored only once, which is a feature that the hierarchical databases were able to achieve in most cases. Okay. But the network databases were not. We don't have multiple copies of things lying around.

All right. So the philosophy here, being cute about this: the quote from the 19th century is, "God made the integers; all else is the work of man." Well, Codd made the relations, which is a reference to Edgar Codd, who wrote the first relational database paper and went on to win the Turing Award for his work, which is sort of the Nobel Prize of computer science. Codd made relations, and all else is the work of man. So "everything is a table" is the number one thing to remember about the relational data model. Everything is a relation.

All right. So let's actually break here, and I'll pick up with this slide next time.