Database Modeling and Design: Logical Design
About this ebook
Database Modeling and Design, Fifth Edition, focuses on techniques for database design in relational database systems.
This extensively revised fifth edition features clear explanations, lots of terrific examples and an illustrative case, and practical advice, with design rules that are applicable to any SQL-based system. The common examples are based on real-life experiences and have been thoroughly class-tested.
This book is immediately useful to anyone tasked with the creation of data models for the integration of large-scale enterprise data. It is ideal for a stand-alone data management course focused on logical database design, or as a supplement to an introductory database management text.
- In-depth detail and plenty of real-world, practical examples throughout
- Loaded with design rules and illustrative case studies that are applicable to any SQL, UML, or XML-based system
Toby J. Teorey
Toby J. Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. He received his B.S. and M.S. degrees in electrical engineering from the University of Arizona, Tucson, and a Ph.D. in computer sciences from the University of Wisconsin, Madison. He was general chair of the 1981 ACM SIGMOD Conference and program chair for the 1991 Entity-Relationship Conference. Professor Teorey’s current research focuses on database design and data warehousing, OLAP, advanced database systems, and performance of computer networks. He is a member of the ACM and the IEEE Computer Society.
Database Modeling and Design - Toby J. Teorey
The Morgan Kaufmann Series in Data Management Systems (Selected Titles)
Joe Celko’s Data, Measurements and Standards in SQL
Joe Celko
Information Modeling and Relational Databases, 2nd Edition
Terry Halpin, Tony Morgan
Joe Celko’s Thinking in Sets
Joe Celko
Business Metadata
Bill Inmon, Bonnie O’Neil, Lowell Fryman
Unleashing Web 2.0
Gottfried Vossen, Stephan Hagemann
Enterprise Knowledge Management
David Loshin
Business Process Change, 2nd Edition
Paul Harmon
IT Manager’s Handbook, 2nd Edition
Bill Holtsnider and Brian Jaffe
Joe Celko’s Puzzles and Answers, 2nd Edition
Joe Celko
Architecture and Patterns for IT Service Management, Resource Planning, and Governance
Charles Betz
Joe Celko’s Analytics and OLAP in SQL
Joe Celko
Data Preparation for Data Mining Using SAS
Mamdouh Refaat
Querying XML: XQuery, XPath, and SQL/XML in Context
Jim Melton and Stephen Buxton
Data Mining: Concepts and Techniques, 2nd Edition
Jiawei Han and Micheline Kamber
Database Modeling and Design: Logical Design, 5th Edition
Toby J. Teorey, Sam S. Lightstone, Thomas P. Nadeau, and H. V. Jagadish
Foundations of Multidimensional and Metric Data Structures
Hanan Samet
Joe Celko’s SQL for Smarties: Advanced SQL Programming, 4th Edition
Joe Celko
Moving Objects Databases
Ralf Hartmut Güting and Markus Schneider
Joe Celko’s SQL Programming Style
Joe Celko
Data Mining, Second Edition: Concepts and Techniques
Jiawei Han, Micheline Kamber, Jian Pei
Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration
Earl Cox
Data Modeling Essentials, 3rd Edition
Graeme C. Simsion and Graham C. Witt
Developing High Quality Data Models
Matthew West
Location-Based Services
Jochen Schiller and Agnès Voisard
Managing Time in Relational Databases: How to Design, Update and Query Temporal Data
Tom Johnston and Randall Weis
Database Modeling with Microsoft® Visio for Enterprise Architects
Terry Halpin, Ken Evans, Patrick Hallock, Bill Maclean
Designing Data-Intensive Web Applications
Stefano Ceri, Piero Fraternali, Aldo Bongio, Marco Brambilla, Sara Comai, Maristella Matera
Mining the Web: Discovering Knowledge from Hypertext Data
Soumen Chakrabarti
Advanced SQL:1999: Understanding Object-Relational and Other Advanced Features
Jim Melton
Database Tuning: Principles, Experiments, and Troubleshooting Techniques
Dennis Shasha, Philippe Bonnet
SQL:1999: Understanding Relational Language Components
Jim Melton, Alan R. Simon
Information Visualization in Data Mining and Knowledge Discovery
Edited by Usama Fayyad, Georges G. Grinstein, Andreas Wierse
Transactional Information Systems
Gerhard Weikum and Gottfried Vossen
Spatial Databases
Philippe Rigaux, Michel Scholl, and Agnès Voisard
Managing Reference Data in Enterprise Databases
Malcolm Chisholm
Understanding SQL and Java Together
Jim Melton and Andrew Eisenberg
Database: Principles, Programming, and Performance, 2nd Edition
Patrick and Elizabeth O’Neil
The Object Data Standard
Edited by R. G. G. Cattell, Douglas Barry
Data on the Web: From Relations to Semistructured Data and XML
Serge Abiteboul, Peter Buneman, Dan Suciu
Data Mining, Third Edition: Practical Machine Learning Tools and Techniques with Java Implementations
Ian Witten, Eibe Frank, and Mark A. Hall
Joe Celko’s Data and Databases: Concepts in Practice
Joe Celko
Developing Time-Oriented Database Applications in SQL
Richard T. Snodgrass
Web Farming for the Data Warehouse
Richard D. Hackathorn
Management of Heterogeneous and Autonomous Database Systems
Edited by Ahmed Elmagarmid, Marek Rusinkiewicz, Amit Sheth
Object-Relational DBMSs: 2nd Edition
Michael Stonebraker and Paul Brown, with Dorothy Moore
Universal Database Management: A Guide to Object/Relational Technology
Cynthia Maro Saracco
Readings in Database Systems, 3rd Edition
Edited by Michael Stonebraker, Joseph M. Hellerstein
Understanding SQL’s Stored Procedures: A Complete Guide to SQL/PSM
Jim Melton
Principles of Multimedia Database Systems
V. S. Subrahmanian
Principles of Database Query Processing for Advanced Applications
Clement T. Yu, Weiyi Meng
Advanced Database Systems
Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, Roberto Zicari
Principles of Transaction Processing, 2nd Edition
Philip A. Bernstein, Eric Newcomer
Using the New DB2: IBM’s Object-Relational Database System
Don Chamberlin
Distributed Algorithms
Nancy A. Lynch
Active Database Systems: Triggers and Rules For Advanced Database Processing
Edited by Jennifer Widom, Stefano Ceri
Migrating Legacy Systems: Gateways, Interfaces, & the Incremental Approach
Michael L. Brodie, Michael Stonebraker
Atomic Transactions
Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete
Query Processing for Advanced Database Systems
Edited by Johann Christoph Freytag, David Maier, Gottfried Vossen
Transaction Processing
Jim Gray, Andreas Reuter
Database Transaction Models for Advanced Applications
Edited by Ahmed K. Elmagarmid
A Guide to Developing Client/Server SQL Applications
Setrag Khoshafian, Arvola Chan, Anna Wong, Harry K. T. Wong
Acquiring Editor: Rick Adams
Development Editor: David Bevans
Project Manager: Sarah Binns
Designer: Joanne Blank
Morgan Kaufmann Publishers is an imprint of Elsevier.
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
This book is printed on acid-free paper.
© 2011 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Database modeling and design : logical design / Toby Teorey … [et al.]. -- 5th ed.
p. cm.
Rev. ed. of: Database modeling & design / Toby Teorey, Sam Lightstone, Tom Nadeau. 4th ed. 2005.
ISBN 978-0-12-382020-4
1. Relational databases. 2. Database design. I. Teorey, Toby J. Database modeling & design.
QA76.9.D26T45 2011
005.75′6--dc22
2010049921
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.elsevierdirect.com
Printed in the United States of America
11 12 13 14 15 5 4 3 2 1
To Julie, for her wonderful support
—Toby Teorey
To my wife and children, Elisheva, Hodaya, and Avishai
—Sam Lightstone
To Carol, Paula, Mike, and Lagi
—Tom Nadeau
To Aradhna, Siddhant, and Kamya
—H. V. Jagadish
Preface
Database design technology has undergone significant evolution in recent years, although business applications continue to be dominated by the relational data model and relational database systems. The relational model has allowed the database designer to focus separately on logical design (defining the data relationships and tables) and physical design (efficiently storing data onto and retrieving data from physical storage). Other new technologies such as data warehousing, OLAP, and data mining, as well as object-oriented, spatial, and Web-based data access, have also had an important impact on database design.
In this fifth edition, we continue to concentrate on techniques for database design in relational database systems. However, because of the vast and explosive changes in new physical database design techniques in recent years, we have reorganized the topics into two separate books: Database Modeling and Design: Logical Design (5th Edition) and Physical Database Design: The Database Professional’s Guide (1st Edition).
Logical database design is largely the domain of application designers, who design the logical structure of the database to suit application requirements for data manipulation and structured queries. The definition of database tables for a particular vendor is considered to be within the domain of logical design in this book, although many database practitioners refer to this step as physical design.
Physical database design, in the context of these two books, is performed by the implementers of the database servers, usually database administrators (DBAs), who must decide how to structure the database for a particular machine (server) and optimize that structure for system performance and system administration. In smaller companies these communities may in fact be the same people, but in large enterprises they are very distinct.
We start the discussion of logical database design with the entity-relationship (ER) approach for data requirements specification and conceptual modeling. We then take a detailed look at another dominant data modeling approach, the Unified Modeling Language (UML). Both approaches are used throughout the text for all the data modeling examples, so the reader can select either one (or both) to help follow the logical design methodology. The discussion of basic principles is supplemented with common examples that are based on real-life experiences.
Organization
The database life cycle is described in Chapter 1. In Chapter 2, we present the most fundamental concepts of data modeling and provide a simple set of notational constructs (the Chen notation for the ER model) to represent them. The ER model has traditionally been a very popular method of conceptualizing users’ data requirements. Chapter 3 introduces the UML notation for data modeling. UML (specifically UML 2) has become a standard method of modeling large-scale systems for object-oriented languages such as C++ and Java, and the data modeling component of UML is rapidly becoming as popular as the ER model. We feel it is important for the reader to understand both notations and how much they have in common.
Chapters 4 and 5 show how to use data modeling concepts in the database design process. Chapter 4 is devoted to direct application of conceptual data modeling in logical database design. Chapter 5 explains the transformation of the conceptual model to the relational model, and to Structured Query Language (SQL) syntax specifically.
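To make the idea of transforming a conceptual model into SQL concrete, here is a minimal sketch using Python's standard sqlite3 module as a stand-in for any SQL system. The entities Customer, Order, and Product and all table and column names are hypothetical. It shows two common patterns: a one-to-many relationship becomes a foreign key on the "many" side, and a many-to-many relationship becomes a table of its own keyed by both sides. These are illustrative patterns, not necessarily the exact rules given in Chapter 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    -- Entity Customer -> table customer
    CREATE TABLE customer (
        cust_no   INTEGER PRIMARY KEY,
        cust_name TEXT NOT NULL
    );
    -- Entity Product -> table product
    CREATE TABLE product (
        prod_no   INTEGER PRIMARY KEY,
        prod_name TEXT NOT NULL
    );
    -- One-to-many "Customer places Order": the many side holds a foreign key.
    -- (ORDER is a reserved word, so the table is named orders.)
    CREATE TABLE orders (
        order_no INTEGER PRIMARY KEY,
        cust_no  INTEGER NOT NULL REFERENCES customer(cust_no)
    );
    -- Many-to-many "Order contains Product": a separate table keyed by both sides.
    CREATE TABLE order_line (
        order_no INTEGER REFERENCES orders(order_no),
        prod_no  INTEGER REFERENCES product(prod_no),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_no, prod_no)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 'Widget')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO order_line VALUES (100, 10, 5)")

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```

Placing the foreign key on the "many" side avoids repeating groups in the customer table; the junction table is the usual way a relational schema represents a many-to-many relationship.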
Chapter 6 is devoted to the fundamentals of database normalization through third normal form and its variation, Boyce-Codd normal form, showing the functional equivalence between the conceptual model (both ER and UML) and the relational model for third normal form.
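Normalization tests rest on functional dependencies (FDs). As an illustrative sketch of a standard textbook technique (not code from this book), the Python functions below compute the closure of a set of attributes and flag every nontrivial FD whose left side is not a superkey, which is the Boyce-Codd normal form condition. The emp_dept table and its FDs are hypothetical; its dependency dept_no -> dept_name is also a transitive dependency, so the table violates third normal form as well.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency X -> Y, stored as the pair (X, Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return X+, every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Nontrivial FDs whose left-hand side is not a superkey of schema."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, never a violation
        if not schema <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


# Hypothetical table emp_dept(emp_id, emp_name, dept_no, dept_name).
emp_dept = frozenset({"emp_id", "emp_name", "dept_no", "dept_name"})
emp_dept_fds = [fd("emp_id", "emp_name dept_no"), fd("dept_no", "dept_name")]

# emp_id determines every attribute, so it is a key:
print(sorted(closure({"emp_id"}, emp_dept_fds)))  # ['dept_name', 'dept_no', 'emp_id', 'emp_name']

for lhs, rhs in bcnf_violations(emp_dept, emp_dept_fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['dept_no'] -> ['dept_name']

# Decomposing on the violating FD yields two tables with no violations.
emp = frozenset({"emp_id", "emp_name", "dept_no"})
dept = frozenset({"dept_no", "dept_name"})
print(bcnf_violations(emp, [fd("emp_id", "emp_name dept_no")]))  # []
print(bcnf_violations(dept, [fd("dept_no", "dept_name")]))  # []
```

Checking only the given FDs is sufficient to test whether the original table is in BCNF; testing decomposed tables rigorously requires the FDs projected onto them, which this sketch supplies by hand.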
The case study in Chapter 7 summarizes the techniques presented in Chapters 1 through 6 with a new problem environment.
Chapter 8 illustrates the basic features of object-oriented database systems and how they differ from relational database systems. An impedance mismatch problem often arises due to data being moved between tables in a relational database and objects in an application program. Extensions made to relational systems to handle this problem are described.
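A minimal sketch of the impedance mismatch, using Python's sqlite3 module and a dataclass (the employee table and Employee class are hypothetical): the query returns flat tuples that the program must convert into objects by hand, and a change to an object reaches the table only when the program explicitly writes it back.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    emp_id: int
    name: str
    dept_no: int


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_no INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [(1, "Lee", 10), (2, "Kim", 20)])


def load_employees(conn: sqlite3.Connection) -> List[Employee]:
    # The query yields flat tuples (rows of a table)...
    rows = conn.execute("SELECT emp_id, name, dept_no FROM employee ORDER BY emp_id")
    # ...which the program must convert, one by one, into objects.
    return [Employee(*row) for row in rows]


def save_employee(conn: sqlite3.Connection, e: Employee) -> None:
    # Going the other way, object fields are flattened back into a row.
    conn.execute(
        "UPDATE employee SET name = ?, dept_no = ? WHERE emp_id = ?",
        (e.name, e.dept_no, e.emp_id),
    )


staff = load_employees(conn)
staff[0].dept_no = 20        # changing the object does not change the table...
save_employee(conn, staff[0])  # ...until it is explicitly written back
```

Every class needs mapping code like load_employees and save_employee; reducing that hand-written translation is the motivation for object-relational extensions and for object-relational mapping libraries.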
Chapter 9 looks at Web technologies and how they impact databases and database design. XML is perhaps the best known Web technology. An overview of XML is given, and we explore database design issues that are specific to XML.
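One XML design issue is how nested data maps to flat tables. As a hedged illustration of one common approach (often called shredding; the document structure and table names are hypothetical), the sketch below uses Python's standard xml.etree.ElementTree and sqlite3 modules to store each order element as a row and each nested line element as a row in a child table that carries its parent's order number.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """
<orders>
  <order no="100" customer="Acme">
    <line product="Widget" qty="5"/>
    <line product="Gadget" qty="2"/>
  </order>
  <order no="101" customer="Bolt Co">
    <line product="Widget" qty="1"/>
  </order>
</orders>
"""

root = ET.fromstring(DOC.strip())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_line (order_no INTEGER, product TEXT, qty INTEGER, "
    "PRIMARY KEY (order_no, product))"
)

for order in root.findall("order"):
    no = int(order.get("no"))
    conn.execute("INSERT INTO orders VALUES (?, ?)", (no, order.get("customer")))
    for line in order.findall("line"):
        # The XML nesting is replaced by the parent's key copied into each child row.
        conn.execute(
            "INSERT INTO order_line VALUES (?, ?, ?)",
            (no, line.get("product"), int(line.get("qty"))),
        )

# Once shredded, the data can be queried with ordinary SQL.
total = conn.execute("SELECT SUM(qty) FROM order_line WHERE product = 'Widget'").fetchone()[0]
print(total)  # 6
```

The trade-off is that the hierarchy must be reassembled with joins when the XML view is needed again, which is one reason native XML stores also exist.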
Chapter 10 describes the major logical database design issues in business intelligence: data warehousing, online analytical processing (OLAP) for decision support systems, and data mining.
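OLAP queries typically aggregate a central fact table along its dimension tables (a star schema). The sketch below, in Python's sqlite3 module with hypothetical sales, store, and product tables and made-up figures, rolls sales up to (region, category) totals and then to region totals. Some SQL systems offer GROUP BY ROLLUP or CUBE to compute several levels in one statement; SQLite does not, so each level here is a separate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables
    CREATE TABLE store   (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE product (prod_id  INTEGER PRIMARY KEY, category TEXT);
    -- Fact table: one row per sale, keyed to the dimensions
    CREATE TABLE sales (store_id INTEGER REFERENCES store,
                        prod_id  INTEGER REFERENCES product,
                        amount   REAL);
    INSERT INTO store   VALUES (1, 'East'), (2, 'East'), (3, 'West');
    INSERT INTO product VALUES (1, 'Toys'), (2, 'Books');
    INSERT INTO sales   VALUES (1, 1, 10.0), (2, 1, 5.0), (2, 2, 7.0),
                               (3, 2, 20.0), (3, 1, 3.0);
""")

# Roll up individual sales to (region, category) totals.
by_region_category = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM sales f
    JOIN store s   ON f.store_id = s.store_id
    JOIN product p ON f.prod_id  = p.prod_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

# Roll up further by dropping the category dimension.
by_region = conn.execute("""
    SELECT s.region, SUM(f.amount)
    FROM sales f JOIN store s ON f.store_id = s.store_id
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()

print(by_region_category)
# [('East', 'Books', 7.0), ('East', 'Toys', 15.0), ('West', 'Books', 20.0), ('West', 'Toys', 3.0)]
print(by_region)  # [('East', 22.0), ('West', 23.0)]
```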
Chapter 11 discusses three of the most popular current software tools for logical design: IBM’s Rational Data Architect, Computer Associates’ AllFusion ERwin Data Modeler, and Sybase’s PowerDesigner. Examples are given to demonstrate how each of these tools can be used to handle complex data modeling problems.
The Appendix contains a review of the basic data definition and data manipulation components of the relational database query language SQL (SQL-99) for those readers who lack familiarity with database query languages. A simple example database is used to illustrate the SQL query capability.
The database practitioner can use this book as a guide to database modeling and its application to database design for business and office environments and for well-structured scientific and engineering databases. Whether you are a novice database user or an experienced professional, this book offers new insights into database modeling and the ease of transition from the ER model or UML model to the relational model, including the building of standard SQL data definitions. Thus, no matter whether you are using IBM’s DB2, Oracle, Microsoft’s SQL Server, Access, or MySQL for example, the design rules set forth here will be applicable. The case studies used for the examples throughout the book are from real-life databases that were designed using the principles formulated here. This book can also be used by the advanced undergraduate or beginning graduate student to supplement a course textbook in introductory database management, or for a stand-alone course in data modeling or database design.
Typographical Conventions
For easy reference, entity and class names (Employee, Department, and so on) are capitalized from Chapter 2 forward. Throughout the book, relational table names (product, product_count) are set in boldface for readability.
Acknowledgments
We wish to acknowledge colleagues who contributed to the technical continuity of this book: James Bean, Mike Blaha, Deb Bolton, Joe Celko, Jarir Chaar, Nauman Chaudhry, David Chesney, David Childs, Pat Corey, John DeSue, Yang Dongqing, Ron Fagin, Carol Fan, Jim Fry, Jim Gray, Bill Grosky, Wei Guangping, Wendy Hall, Paul Helman, Nayantara Kalro, John Koenig, Ji-Bih Lee, Marilyn Mantei Tremaine, Bongki Moon, Robert Muller, Wee-Teck Ng, Dan O’Leary, Kunle Olukotun, Dorian Pyle, Dave Roberts, Behrooz Seyed-Abbassi, Dan Skrbina, Rick Snodgrass, Il-Yeol Song, Dick Spencer, Amjad Umar, and Susanne Yul. We also wish to thank the Department of Electrical Engineering and Computer Science (EECS), especially Jeanne Patterson, at the University of Michigan for providing resources for writing and revising. Finally, we thank our wives and children for the generosity that has given us the time to work on this text.
Solutions Manual
A solutions manual to all exercises is available. Contact the publisher for further information.
About the Authors
Toby Teorey is Professor Emeritus in the Computer Science and Engineering Division (EECS Department) at the University of Michigan, Ann Arbor. He received his B.S. and M.S. degrees in electrical engineering from the University of Arizona, Tucson, and a Ph.D. in computer science from the University of Wisconsin, Madison. He was chair of the 1981 ACM SIGMOD Conference and program chair of the 1991 Entity–Relationship Conference. Professor Teorey’s current research focuses on database design and performance of computing systems. He is a member of the ACM.
Sam Lightstone is a Senior Technical Staff Member and Development Manager with IBM’s DB2 Universal Database development team. He is the cofounder and leader of DB2’s autonomic computing R&D effort. He is also a member of IBM’s Autonomic Computing Architecture Board, and in 2003 he was elected to the Canadian Technical Excellence Council, the Canadian affiliate of the IBM Academy of Technology. His current research includes numerous topics in autonomic computing and relational DBMSs, including automatic physical database design, adaptive self-tuning resources, automatic administration, benchmarking methodologies, and system control. He is an IBM Master Inventor with over 25 patents and patents pending, and he has published widely on autonomic computing for relational database systems. He has been with IBM since 1991.
Tom Nadeau is a Senior Database Software Engineer at the American Chemical Society. He received his B.S. degree in computer science and M.S. and Ph.D. degrees in electrical engineering and computer science from the University of Michigan, Ann Arbor. His technical interests include data warehousing, OLAP, data mining, text mining, and machine learning. He won the best paper award at the 2001 IBM CASCON Conference.
H. V. Jagadish is the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan. He received a Ph.D. from Stanford in 1985 and worked many years for AT&T, where he eventually headed the database department. He also taught at the University of Illinois. He currently leads research in databases in the context of the Internet and in biomedicine. His research team built a native XML store, called TIMBER, a hierarchical database for storing and querying XML data. He is Editor-in-Chief of the Proceedings of the Very Large Data Base Endowment (PVLDB), a member of the Board of the Computing Research Association (CRA), and a Fellow of the ACM.
Table of Contents
Cover Image
Title
Series
Copyright
Dedication
Preface
About the Authors
1. Introduction
Data and Database Management
Database Life Cycle
Conceptual Data Modeling
Summary
Tips and Insights for Database Professionals
Literature Summary
2. The Entity–Relationship Model
Fundamental ER Constructs
Advanced ER Constructs
Summary
Tips and Insights for Database Professionals
Literature Summary
3. The Unified Modeling Language
Class Diagrams
Activity Diagrams
Summary
Tips and Insights for Database Professionals
Literature Summary
4. Requirements Analysis and Conceptual Data Modeling
Introduction
Requirements Analysis
Conceptual Data Modeling
View Integration
Entity Clustering for ER Models
Summary
Tips and Insights for Database Professionals
Literature Summary
5. Transforming the Conceptual Data Model to SQL
Transformation Rules and SQL Constructs
Transformation Steps
Summary
Tips and Insights for Database Professionals
Literature Summary
6. Normalization
Fundamentals of Normalization
The Design of Normalized Tables: A Simple Example
Normalization of Candidate Tables Derived from ER Diagrams
Determining the Minimum Set of 3NF Tables
Summary
Tips and Insights for Database Professionals
Literature Summary
7. An Example of Logical Database Design
Requirements Specification
Logical Design
Summary
Tips and Insights for Database Professionals
8. Object-Relational Design
Object Orientation
Object-Oriented Databases
Object-Relational Databases
Summary
Tips and Insights for Database Professionals
Literature Summary
9. XML and Web Databases
XML
XML Design
Web-Based Applications
Summary
Tips and Insights for Database Professionals
Literature Summary
10. Business Intelligence
Data Warehousing
Online Analytical Processing
Data Mining
Summary
Tips and Insights for Database Professionals
Literature Summary
11. CASE Tools for Logical Database Design
Introduction to the CASE Tools
Key Capabilities to Watch for
The Basics
Generating a Database from a Design
Database Support
Collaborative Support
Distributed Development
Application Life Cycle Tooling Integration
Design Compliance Checking
Reporting
Modeling a Data Warehouse
Semistructured Data—XML
Summary
Tips and Insights for Database Professionals
Literature Summary
APPENDIX. The Basics of SQL
SQL Names and Operators
Data Definition Language
Data Manipulation Language
References
Exercises
ER and UML Conceptual Data Modeling
Conceptual Data Modeling and Integration
Transformation of the Conceptual Model to SQL
Normalization and Minimum Set of Tables
Logical Database Design (Generic Problem)
OLAP
Solutions to Selected Exercises
Glossary
Index
Bonus Chapter Opener
3. Query Optimization and Plan Selection
3.1 Query Processing and Optimization
3.2 Useful Optimization Features in Database Systems
3.3 Query Cost Evaluation—An Example
3.4 Query Execution Plan Development
3.5 Selectivity Factors, Table Size, and Query Cost Estimation
3.6 Summary
Tips and Insights for Database Professionals
A. A Simple Performance Model for Databases
A.1 I/O Time Cost—Individual Block Access
A.2 I/O Time Cost—Table Scans and Sorts
A.3 Network Time Delays
A.4 CPU Time Delays
1
Introduction
Database technology has evolved rapidly in the past three decades since the rise and eventual dominance of relational database systems. While many specialized database systems (object-oriented, spatial, multimedia, etc.) have found substantial user communities in the sciences and engineering, relational systems remain the dominant database technology for business enterprises.
Relational database design has evolved from an art to a science that can be partially implemented as a set of software design aids. Many of these design aids have appeared as the database component of computer-aided software engineering (CASE) tools, and many of them offer interactive modeling capability using a simplified data modeling approach. Logical design (that is, the structure of basic data relationships and their definition in a particular database system) is largely the domain of application designers. The work of these designers can be done effectively with tools such as the ERwin Data Modeler or Rational Rose with the Unified Modeling Language (UML), as well as with a purely manual approach. Physical design (the creation of efficient data storage and retrieval mechanisms on the computing platform you are using) is typically the domain of the database administrator (DBA). Today’s DBAs have a variety of vendor-supplied tools available to help design the most efficient databases. This book is devoted to the logical design methodologies and tools most popular for relational databases today. Physical design methodologies and tools are covered in a separate book.
In this chapter, we review the basic concepts of database management and introduce the role of data modeling and database design in the database life cycle.
Data and Database Management
The basic component of a file in a file system is a data item, which is the smallest named unit of data that has meaning in the real world—for example, last name, first name, street address, ID number, and political party. A group of related data items treated as a unit by an application is called a record. Examples of types of records are order, salesperson, customer, product, and department. A file is a collection of records of a single type. Database systems have built upon and expanded these definitions: In a relational database, a data item is called a column or attribute, a record is called a row or tuple, and a file is called a table.
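As a minimal sketch of this mapping, the snippet below uses Python's standard sqlite3 module as a stand-in for any relational system; the customer table and its columns are hypothetical. A file of customer records becomes a table, each data item becomes a column (attribute), and each record becomes a row (tuple).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A "file" of customer "records" becomes a table; each "data item" becomes a column.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- ID number
        last_name   TEXT NOT NULL,        -- last name
        first_name  TEXT,                 -- first name
        street      TEXT                  -- street address
    )
""")

# Each record (row, or tuple) describes one customer.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Smith", "Ann", "12 Elm St"),
     (2, "Jones", "Raj", "9 Oak Ave")],
)

# PRAGMA table_info returns one row per column; field 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(customer)")]
rows = conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()

print(columns)    # ['customer_id', 'last_name', 'first_name', 'street']
print(len(rows))  # 2
```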
A database is a more complex object; it is a collection of interrelated stored data that serves the needs of multiple users within one or more organizations—that is, an interrelated collection of many different types of tables. The motivation for using databases rather than files has been greater availability to a diverse set of users, integration of data for easier access and update for complex transactions, and less redundancy of data.
A database management system (DBMS) is a generalized software system for manipulating databases. A DBMS supports a logical view (schema, subschema); a physical view (access methods, data clustering); a data definition language; a data manipulation language; and important utilities such as transaction management and concurrency control, data integrity, crash recovery, and security. Relational database systems, the dominant type of system for well-formatted business databases, provide a much higher degree of data independence than the earlier hierarchical and network (CODASYL) database management systems. Data independence is the ability to make changes in either the logical or the physical structure of the database without requiring reprogramming of application programs; it also makes database conversion and reorganization much easier. For these reasons, relational DBMSs are the focus of our discussion on data modeling.
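Physical data independence can be sketched with Python's sqlite3 module; the orders table, its data, and the index name are hypothetical. The application query stays fixed. Adding an index changes how SQLite reaches the rows, as reported by EXPLAIN QUERY PLAN, but neither the query text nor its result changes. The exact wording of the plan output varies between SQLite versions, so only its gist is noted in the comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)"
)
# 1,000 hypothetical orders spread over 50 customers.
conn.executemany(
    "INSERT INTO orders (cust_id, amount) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

# The application's query: it says nothing about indexes or storage layout.
QUERY = "SELECT COUNT(*), SUM(amount) FROM orders WHERE cust_id = ?"


def plan(sql, params):
    """Join the detail field (last column) of the EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(r[-1]) for r in rows)


before_result = conn.execute(QUERY, (7,)).fetchone()
before_plan = plan(QUERY, (7,))  # reports a full SCAN of orders

# A physical design change only; no application code is touched.
conn.execute("CREATE INDEX idx_orders_cust ON orders (cust_id)")

after_result = conn.execute(QUERY, (7,)).fetchone()
after_plan = plan(QUERY, (7,))  # reports a SEARCH using idx_orders_cust

print(before_plan)
print(after_plan)
print(before_result == after_result)  # True
```

Logical data independence works in the same spirit: a view, for example, can keep presenting an old table shape to existing programs after the underlying tables are restructured.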
Database Life Cycle
The database life cycle incorporates the basic steps involved in designing a global schema of the logical database, allocating data across a computer network, and defining local DBMS-specific schemas. Once the design is completed, the life cycle continues with database implementation and maintenance. This chapter contains an overview of the database life cycle, as shown in Figure 1.1. In succeeding chapters we will focus on the database design process from the modeling of requirements through logical design (Steps I and II below). We illustrate the result of each step of the life cycle with a series of diagrams in Figure 1.2. Each diagram shows a possible form of the output of each step so the reader can see the progression of the design process from an idea to an actual database implementation. These forms are discussed in much more detail in Chapters 2–6.
I. Requirements analysis. The database requirements are determined by interviewing both the producers and users of data and using the information to produce a formal requirements specification. That specification includes the data required for processing, the natural data relationships, and the software platform for the database implementation. As an example, Figure 1.2 (Step I) shows the concepts of products, customers, salespersons, and orders being formulated in the mind of the end user during the interview process.
II. Logical design. The global schema, a conceptual data model diagram that shows all the data and their relationships, is developed using techniques such as entity-relationship (ER) modeling or UML. The data model constructs must ultimately be transformed into tables.
a. Conceptual data modeling. The data requirements are analyzed and modeled by using an ER or UML diagram that includes many features we will study in Chapters 2 and 3, for example, semantics for optional relationships, ternary relationships, supertypes, and subtypes (categories). Processing requirements are typically specified using natural language expressions or SQL commands along with the frequency of occurrence. Figure 1.2 (Step II.a) shows a possible ER model representation of the product/customer database in the mind of the end user.
b. View integration. Usually, when the design is large and more than one person is involved in requirements analysis, multiple views of data and relationships occur, resulting in inconsistencies due to variance in taxonomy, context, or perception. To eliminate redundancy and inconsistency from the model, these views must be rationalized and consolidated