[Figure: example ER diagram. Entity types and attributes: Team (Team number, Specialty); Customer (Customer number, Customer name, Customer address, Customer activity, Customer telephone, Customer fax); Employee (Employee number, First name, Last name, Employee function, Employee salary); Project (Project number, Project name, Project label, Start date, End date); Task (Task name, Task cost). Relationship labels: member, is a member of, subcontract, staffed by, is assigned to, contains.]
The ER modeling technique is the major data modeling method in Information Engineering and is supported by most CASE tools. Data modeling is the foundation of most database-centered transaction processing systems and data warehouse systems.
Minder Chen, 1993~2006 Data Modeling - 3 -
[Figure: application logic (process flow; programs, procedures) alongside the data model (tables, indexes).]
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.
Processes Behavior
Architecture
Multiple Perspectives
[Figure: ONE BUSINESS viewed from two perspectives. DATA: "We use this data." ACTIVITY: "We do these things," e.g., HIRE EMPLOYEE, PAY EMPLOYEE, FIRE EMPLOYEE.]
[Figure: ER diagram with entity types Member, Agreement, Product, Promotion, and Club. Relationship labels: "sells; is sold on", "generates; generated by", "sponsors; is sponsored by".]
Entity Types
Definition:
An entity is an object or event, real or abstract, about which we would like to store data. "Entity" is used here as shorthand for "entity type". An entity type represents a set of entity instances that can be described by the same set of attribute types. The value of the same attribute may differ from one entity instance to another.
Examples
Customer, Customer Order, Product, Hourly Employee, Project, Department, Unfilled Customer Order
Be clear and concise
Avoid abbreviations
Be consistent with users' terminology
Identify synonyms:
Customer / Client
Product / Merchandise
Supplier / Vendor
Teacher / Faculty
Use one name as the official name and document the others as aliases
Properties of Entity Types
Name
Description
Identifier
Properties:
  Estimated number (Max., Min., Average) of entity instances
  Expected growth rate of entity instances
Subject Area in which the Entity Type resides
Attributes that describe the Entity Type
Examples of entity type instances
Definition of an Entity Type
A poor definition of Customer: Anyone that buys something from the company.
Can an employee be a customer? Can a lessee be a customer? If the company sells a subsidiary to another company, is the new owner considered a customer?
Good Definition
Compatible
Customer: An ORGANIZATION that purchases PRODUCTs for personal use.
Distributor: An ORGANIZATION that purchases PRODUCTs for resale.
Precision:
With appropriate qualifiers. Example: An ORGANIZATION is considered to have purchased a PRODUCT when we receive a valid PURCHASE ORDER from it.
Complete
ORGANIZATION, PRODUCT, PURCHASE ORDER need to be defined.
Entity Type: Description
Customer: Information about all persons or organizations who purchases
Product: All goods manufactured and sold
Raw-material: Components used to manufacture Products.
Supplier: Vendors of Raw Materials.
Buyer: Company personnel responsible for purchasing Raw-Materials from Suppliers
Entity Type and Entity Instance (Occurrence)
Entity Type: Entity Instance
Vendor: ABC Co.
Employee: John Smith
Course: Intro. to IE
Department: Marketing Department
Maryland Organization Unit Customer President Bill Clinton Department of Commerce Address
Finding Entity Types
Interviews with users
JAD workshops
Business forms
Reports
Computer files using reverse engineering
Operation manuals
Resources
Any resource that an organization needs to manage should be represented as an Entity Type. Information assists the efficient and effective use of other resources through improved decisions. Examples: Inventory, Machine, Bank Account, and Customer.
Roles Played
Roles can be played by persons or organizational units. Examples: Customers, Managers, and Account representatives.
Events
Events are incidents that occur at points in time. An event often involves an interaction between two Entity Types or an action that changes the status of an Entity Type. Examples: Sale, Delivery, and Registration of a motor vehicle.
BIAIT:
Analysis of Orders
Ordered entities can be a thing, a space, or a skill.
View the order from the supplier side.
If an organization receives no orders, it has no reason for existing.
An organization unit can receive multiple types of orders.
4 questions about the Supplier:
Billing (Cash)?
Deliver Late (Immediate)?
Profile customer?
Negotiate price (Fixed)?
Relationships
Entity Relationship Diagramming Notation
Attributes
Identifiers
Partitioning and Entity Subtypes
Relationship (Type)
Definition
A Relationship Type is an association among Entity Types. It indicates that there is a business relationship between these Entity Types. Relationship Membership is the participation of an Entity Type in a Relationship. In IE, a Relationship Type can involve only two Entity Types (binary relationship). Some other modeling techniques allow n-ary relationships.
Examples
CUSTOMER places ORDER
ORDER is placed by CUSTOMER
EMPLOYEE works on PROJECT
PROJECT has project member EMPLOYEE
Entity Instance
Student: Student#1, Student#2
Course: Course#A, Course#B, Course#C, Course#D
Relationship
Student takes Course
Relationship Pairing
Student#1 takes Course#A
Student#1 takes Course#B
Student#1 takes Course#D
Student#2 takes Course#A
Student#2 takes Course#C
Student#2 takes Course#D
Definition: A collection of pairings of a Relationship Membership in which an Entity Instance is involved. Examples:
Student#1 takes Course#A, #B, and #D
Student#2 takes Course#A, #C, and #D
Course#A is taken by Student#1 and Student#2
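The step from individual pairings to groupings can be sketched in a few lines of Python. This is only an illustration: the helper name `group_by` is hypothetical, and the pairings are the Student/Course examples used on these slides.

```python
from collections import defaultdict

# Relationship pairings for "Student takes Course".
pairings = [
    ("Student#1", "Course#A"), ("Student#1", "Course#B"), ("Student#1", "Course#D"),
    ("Student#2", "Course#A"), ("Student#2", "Course#C"), ("Student#2", "Course#D"),
]

def group_by(pairs, position):
    """For each entity instance at `position`, collect the instances it is paired with."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[position]].append(pair[1 - position])
    return dict(groups)

takes = group_by(pairings, 0)        # groupings seen from the Student side
is_taken_by = group_by(pairings, 1)  # groupings seen from the Course side

print(takes["Student#1"])        # ['Course#A', 'Course#B', 'Course#D']
print(is_taken_by["Course#A"])   # ['Student#1', 'Student#2']
```

Each relationship membership (Student in "takes", Course in "is taken by") yields its own set of groupings from the same list of pairings.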
Relationship Cardinality
[Figure: instances of E1 and E2 linked in three ways: One-to-One (1:1), One-to-Many (1:M), and Many-to-Many (M:N).]
Relationship Cardinality
The number of Entity Instances involved in the Relationship Instances Grouping in a Relationship Type. Three Forms of Cardinality
1. One-to-one (1:1)
DEPARTMENT has MANAGER
Each DEPARTMENT has one and only one MANAGER
Each MANAGER manages one and only one DEPARTMENT
2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) places one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER
3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTOR teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs
[Figure: notation template. Entity-X and Entity-Y are connected by a line labeled relationship-description and reversed-relation-description, with a cardinality indicator at the end of the line.]
Optionality of Relationship Memberships
Whether all entity instances of both entity types need to participate in relationship pairing.
Optionality:
Mandatory
Optional
Example:
CUSTOMER membership is optional
ORDER membership is mandatory
[Figure: CUSTOMER places ORDER; ORDER is placed by CUSTOMER.]
Relationship Statements
[Figure: CUSTOMER places ORDER; ORDER is placed by CUSTOMER. Graphical notations: cardinality indicator (one; one or more) and optionality indicator (zero = sometimes; one = always).]
Each <Entity X> <optionality> <relationship> <cardinality> <Entity Y>
Each CUSTOMER sometimes places one or more ORDERs.
Each ORDER always is placed by one CUSTOMER.
[Figure: (b) Parallel Relationship between Product and Part; relationship label: contained-in.]
Identifying Relationships
Association between entity types
Entity types that are used on the same forms or documents.
A description in a business document that has a verb that relates two entity types, e.g.:
has
consists of
uses
Attributes
Definition
Characteristics that could be used to describe Entity Types and Relationship Types. However, in IE, relationship types are not allowed to have attributes.
Naming Conventions:
Names that have business meaning
Don't use abbreviations or the possessive case, e.g., PN and Customer's name
Don't include the entity type name, because IEF will prefix the attribute name with the entity type name automatically
Use standard format: Entity Type Name (Qualifiers) Domain Name
  Customer Name
  Employee Starting Date
Examples
Customer has customer name, address, and telephone number.
Product has quantity-on-hand, weight, volume, color, and name.
Employee has SSN, salary, and birthday.
Employee-works-for-project has percentage-of-time, starting-date.
Attributes: Notations
[Figure: three notations for listing attributes. Student (Student ID, Student Name); Employee (Employee number, First name, Last name, Employee function, Employee salary); Student (studentID, name, phone).]
Attribute Value
Definition
Attribute Values are instances of Attributes used to describe specific Entity Instances
Examples
Customer Number: 011334
Customer Name: Minder Chen
State: VA
Order Total: $23,000
Sale tax: $250
An attribute of an entity type should have only one value at any given time (no repeating groups).
Avoid using a complex coding scheme for an attribute. For example, PART Number: X-XXX-XXX, whose segments encode Part Type, Material, and Sequence Number.
Derived
Definition: The Attribute Value can be calculated or deduced from relationship Groupings or from the values of other Attributes. The value of a Derived Attribute changes constantly. Examples: Student Age, Account Balance, Number of courses taken.
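As an illustration of a derived attribute, the sketch below computes "Number of courses taken" from relationship pairings instead of storing it. It assumes Python's built-in sqlite3 module; the table names, the view name `student_course_count`, and the sample rows are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id TEXT, course_id TEXT,
                             PRIMARY KEY (student_id, course_id));
    INSERT INTO student VALUES ('S1', 'Student#1'), ('S2', 'Student#2');
    INSERT INTO enrollment VALUES ('S1', 'A'), ('S1', 'B'), ('S1', 'D'), ('S2', 'A');
    -- The derived attribute is computed on demand rather than stored.
    CREATE VIEW student_course_count AS
        SELECT s.student_id, COUNT(e.course_id) AS courses_taken
        FROM student s LEFT JOIN enrollment e ON e.student_id = s.student_id
        GROUP BY s.student_id;
""")

query = "SELECT student_id, courses_taken FROM student_course_count"
before = dict(conn.execute(query))

# A new pairing changes the derived value without any update to student.
conn.execute("INSERT INTO enrollment VALUES ('S2', 'C')")
after = dict(conn.execute(query))

print(before)  # {'S1': 3, 'S2': 1}
print(after)   # {'S1': 3, 'S2': 2}
```

Because the value is recomputed from the pairings, it cannot drift out of step with them, which is the usual argument for not storing derived attributes.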
Designed
Definition: The Attribute is created to overcome the system constraints. The value of a Designed Attribute does not change. Examples: Student ID, Course number.
Properties of Attributes
Name
Description
Attribute Source Category: Basic, Derived, Designed
Domain or data type: Text, Number, Date, Time, Timestamp
Optionality: Mandatory or optional
Length and/or precision
Permitted Values (Legal Values):
  Ranges
  A set of values (Code Table)
Tools such as PowerBuilder have additional properties for table columns, called extended attributes:
  Validation Rule
  Editing Format
  Reporting Format
"Jack Smith catches a cold and what he suspects is a flu virus. He makes an appointment with his family doctor, who confirms his diagnosis. The doctor prescribes an antibiotic and nasal decongestant tablets. Jack leaves the doctor's office and drives to his local drug store. The pharmacist packages the medication and types the labels for the pill bottles. The label includes information about the customer, the doctor who prescribed the drug, the drug (e.g., Penicillin), when to take it and how often, the content of the pill (250 mg), the number of refills, the expiration date, and the date of purchase."
Please develop a data model for the entities and relationships within the context of pharmacy. Also develop a definition for "prescription". List all your underlying assumptions used in your data models.
Find/Create identifiers for each entity type
Add attributes to the entity type in the data model
Analyze and revise the data model
Normalization
A data base is a model or an image of reality. Logical Data Base Design is a process of modeling and capturing the end-user views of an application domain and synthesizing them into a data base structure. Normalization is a logical data base design method. The basis for normalization is the functional dependencies among attributes in a table.
SQL Terminology
Product Table (each vertical field is a Column; each horizontal record is a Row)
p_no  product_name  quantity  price
101   Color TV      24        500
201   B&W TV        10        250
202   PC            5         2000
CREATE TABLE Product (p_no CHAR(5) NOT NULL, product_name CHAR(20), quantity SMALLINT, price DECIMAL(10, 2));
SQL Terminology
Set Theory   Relational DB          File        Example
Relation     Table                  File        Product_table
Attribute    Column                 Data item   Product_name
Tuple        Row                    Record      Product_101's info.
Domain       Pool of legal values   Data type   DATE
SQL Principles
The result of a SQL query is always a table (View or Dynamic Table)
Rows in a table are considered to be unordered
Has dominated the market since the late 1980s
Can be used in interactive programming environments
Provides both a data definition language (DDL) and a data manipulation language (DML)
A non-procedural language
Can be embedded in a 3GL:
  Embedded SQL
  Dynamic SQL
SQL: Introduction
A relational data base is perceived by its users as a collection of tables.
E. F. Codd, 1969
Most CASE products support the development of relational data base centered applications
Database Table
The following code retrieves only the Last Name and the Employee ID where the Employee ID is greater than 5. The records are retrieved in descending order.
SELECT LastName, EmployeeID FROM Employees WHERE EmployeeID > 5 ORDER BY EmployeeID DESC
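A minimal sketch of running this query, assuming Python's built-in sqlite3 module and a made-up Employees table (the rows are not part of the original slide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(3, "Adams"), (6, "Baker"), (9, "Clark"), (7, "Davis")],
)

# Only two columns are returned, only for IDs above 5, highest ID first.
rows = conn.execute(
    "SELECT LastName, EmployeeID FROM Employees "
    "WHERE EmployeeID > 5 ORDER BY EmployeeID DESC"
).fetchall()

print(rows)  # [('Clark', 9), ('Davis', 7), ('Baker', 6)]
```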
WHERE Clause
WHERE: Use the WHERE clause to limit the selection. In Microsoft Access SQL, the # symbol delimits literal date values.
SELECT * FROM Employees WHERE LastName = "Smith"
SELECT Employees.LastName FROM Employees WHERE Employees.State IN ('NY','WA')
SELECT OrderID FROM Orders WHERE OrderDate BETWEEN #01/01/93# AND #01/31/93#
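The three kinds of condition (equality, IN, BETWEEN) can be tried with Python's built-in sqlite3 module, as sketched below. Two things differ from the slide and are assumptions of this sketch: SQLite has no # date literal, so dates are stored and compared as ISO-8601 text, and string literals use single quotes. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (LastName TEXT, State TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);
    INSERT INTO Employees VALUES ('Smith', 'NY'), ('Jones', 'WA'), ('Brown', 'VA');
    INSERT INTO Orders VALUES (1, '1993-01-15'), (2, '1993-02-03'), (3, '1993-01-31');
""")

# Equality test on a text column.
smiths = conn.execute(
    "SELECT * FROM Employees WHERE LastName = 'Smith'").fetchall()

# Membership test against a list of values.
ny_or_wa = [r[0] for r in conn.execute(
    "SELECT LastName FROM Employees WHERE State IN ('NY', 'WA') ORDER BY LastName")]

# Range test; BETWEEN includes both end points.
january = [r[0] for r in conn.execute(
    "SELECT OrderID FROM Orders "
    "WHERE OrderDate BETWEEN '1993-01-01' AND '1993-01-31' ORDER BY OrderID")]

print(smiths)    # [('Smith', 'NY')]
print(ny_or_wa)  # ['Jones', 'Smith']
print(january)   # [1, 3]
```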
Keys
A key, also called an identifier, is an Attribute or a Composite Attribute that can be used to uniquely identify an instance of an entity type.
Examples:
Entity Type: Key
Warehouse: Warehouse Number
Product: Product Number
Student: Student ID or SSN
Ship: Name and Port of Registration
Stock of Product: Product Number and Warehouse No.
Types of Key
Primary Key: An attribute or a set of attributes that the DBMS uses as the unique identifier of a table.
Candidate (Alternative) Key: An attribute or a set of attributes that could have been used as the primary key of a table.
Secondary (Index) Key: An attribute or a set of attributes that has been used to construct a data retrieval index.
Concatenated (Combined or Composite) Key: A set of attributes that has been used as the key.
Foreign Key: An attribute or a set of attributes in one table that is used as the primary key in another table.
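The key types can be seen side by side in a small schema. The sketch below assumes Python's built-in sqlite3 module; the table and column names follow the Warehouse, Product, Student, and Stock of Product examples, while the helper `rejected` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE warehouse (warehouse_no TEXT PRIMARY KEY);
    CREATE TABLE product (product_no TEXT PRIMARY KEY);
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,      -- primary key
        ssn        TEXT NOT NULL UNIQUE,  -- candidate (alternative) key
        last_name  TEXT
    );
    CREATE INDEX idx_student_last_name ON student (last_name);  -- secondary (index) key
    CREATE TABLE stock (
        product_no   TEXT REFERENCES product (product_no),      -- foreign key
        warehouse_no TEXT REFERENCES warehouse (warehouse_no),  -- foreign key
        quantity     INTEGER,
        PRIMARY KEY (product_no, warehouse_no)                  -- concatenated key
    );
    INSERT INTO warehouse VALUES ('W1');
    INSERT INTO product VALUES ('P1');
    INSERT INTO stock VALUES ('P1', 'W1', 24);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because a key rule is violated."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO stock VALUES ('P1', 'W1', 5)")    # same concatenated key
unknown_product = rejected("INSERT INTO stock VALUES ('P9', 'W1', 5)")  # no such product

print(duplicate_key, unknown_product)  # True True
```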
Purposes of Normalization
Avoid maintenance problems (anomalies):
Insert: There may be no place to insert new information.
Delete: Some important information will be lost by deletion.
Update: Inconsistency may occur because of the existence of data redundancy.
Provide maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.
Don't rush to put all the information in one table.
Create a table to correspond to a class of a simple object type that should exist by itself, i.e., "one fact in one place."
Include common fields (links) as ways of joining information from several related tables.
Avoid redundancy by using links to retrieve data from related tables.
Normalization Theory
Normalization is a process of systematically breaking a complex table into simpler ones.
It is built around the concept of normal forms.
A relation is in a particular normal form if it satisfies a specific set of constraints, such as dependencies among attributes in the relation.
For any integer x > 1, if a relation is in x-NF then it is also in (x-1)-NF.
Higher order normal forms are usually more desirable than lower order normal forms.
The normalization process usually starts from complex relations, which are usually drawn from existing documents such as business forms.
A Business Form
Solution
Unnormalized table
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total, 1{ProductNo, Description, Quantity, UnitPrice,Subtotal}n)
1st NF
Remove partial FD
2nd NF
Remove transitive FD
3rd NF
Unnormalized Form
A relation that has multi-valued attributes (repeating groups).
Normalization Process: Remove Multi-value Attributes
If an unnormalized relation R has a primary key K and a multi-value attribute M, the normalization process is:
The multi-value attribute M should be removed from R.
A new relation will be created with (K, M) as the primary key of the relation. There may be some other attributes associated with this new relation.
R will then be at least in 1NF.
Example: An Employee relation has an attribute language-spoken. For some employees there may be more than one language that they can speak.
Before: EMP (employeeID, empName, empAddress, (language1, language2, ...))
After: EMP (employeeID, empName, empAddress)
       EMP-LANGUAGE (employeeID, language, skillLevel)
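The removal of the multi-value attribute can be sketched as a plain Python transformation. The employee rows and skill levels below are invented for illustration; only the relation and attribute names come from the example.

```python
# Unnormalized EMP rows: the last field is a repeating group of (language, skillLevel).
emp_unnormalized = [
    ("E1", "Ann Lee", "12 Oak St", [("English", "native"), ("Chinese", "fluent")]),
    ("E2", "Bob Ray", "9 Elm Ave", [("English", "native")]),
]

# EMP keeps the key K = employeeID and the single-valued attributes.
emp = [(emp_id, name, address) for emp_id, name, address, _ in emp_unnormalized]

# EMP-LANGUAGE gets one row per (K, M) pair, plus the attribute that describes the pair.
emp_language = [
    (emp_id, language, skill)
    for emp_id, _, _, languages in emp_unnormalized
    for language, skill in languages
]

# (employeeID, language) must be unique to serve as the primary key of EMP-LANGUAGE.
keys = [(emp_id, language) for emp_id, language, _ in emp_language]
key_is_unique = len(keys) == len(set(keys))

print(emp_language)
print(key_is_unique)  # True
```

After the split, every attribute in both relations holds a single value per row, so both are at least in 1NF.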
Functional Dependency
Notation: R.X => R.Y
Definition: Attribute Y of Relation R is functionally dependent on Attribute X of Relation R when each value of R.X is associated with no more than one value of R.Y. R.X and R.Y may be composite attributes.
Description:
R.Y is functionally dependent on R.X
R.X functionally determines R.Y
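Whether a functional dependency holds in a given set of rows can be checked mechanically: no X value may appear with two different Y values. The function `holds` and the sample rows below are hypothetical. Note that passing this check on sample data only shows the dependency is not contradicted; whether it is a real business rule still has to be confirmed with users.

```python
def holds(rows, x, y):
    """Return True if the functional dependency x => y holds in `rows`.

    `rows` is a list of dicts; `x` and `y` are tuples of attribute names,
    so composite attributes are supported.
    """
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in x)
        y_value = tuple(row[a] for a in y)
        if seen.setdefault(x_value, y_value) != y_value:
            return False  # one X value is associated with two different Y values
    return True

# Made-up rows for illustration.
rows = [
    {"supplier": "S1", "part": "P1", "qty": 300, "city": "London"},
    {"supplier": "S1", "part": "P2", "qty": 200, "city": "London"},
    {"supplier": "S2", "part": "P1", "qty": 300, "city": "Paris"},
]

print(holds(rows, ("supplier",), ("city",)))        # True
print(holds(rows, ("supplier",), ("qty",)))         # False: S1 has qty 300 and 200
print(holds(rows, ("supplier", "part"), ("qty",)))  # True: composite determinant
```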
R.A => R.B
If B is not functionally dependent on any proper subset of A, B is fully dependent on A in R.
If B is functionally dependent on a proper subset of A, B is partially dependent on A in R.
Example
The Supplier-Part relation has attributes supplier#, part#, qty, city, distance, where (supplier#, part#) is the key. City is functionally dependent on supplier# alone, so it is only partially dependent on the key (supplier#, part#).
Before: SUPPLIER-PART (supplier#, part#, qty, city, distance)
After: SUPPLIER-PART (supplier#, part#, qty)
       SUPPLIER (supplier#, city, distance)
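The decomposition can be sketched as two projections in Python. The sample values are made up, and the sketch assumes that distance, like city, depends on supplier# alone, which is how the SUPPLIER relation above groups them.

```python
# (supplier#, part#, qty, city, distance); sample values are invented.
supplier_part = [
    ("S1", "P1", 300, "London", 120),
    ("S1", "P2", 200, "London", 120),
    ("S2", "P1", 100, "Paris", 450),
]

# Projection onto the attributes that depend on the whole key (supplier#, part#).
supplier_part_2nf = sorted({(s, p, qty) for s, p, qty, _, _ in supplier_part})

# Projection onto the attributes that depend on supplier# alone.
supplier = sorted({(s, city, dist) for s, _, _, city, dist in supplier_part})

print(supplier_part_2nf)  # [('S1', 'P1', 300), ('S1', 'P2', 200), ('S2', 'P1', 100)]
print(supplier)           # [('S1', 'London', 120), ('S2', 'Paris', 450)]
```

In the original relation, S1's city and distance were repeated once per part supplied; after the split they are stored once.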
Non-loss Decomposition
Normalization is a reduction (decomposition) process that replaces a relation by suitable projections. Each projection is a new relation that is in a further normalized form than the original relation. The collection of projections is equivalent to the original relation. The original relation can always be recovered by taking the natural join of these projections. Any information that can be derived from the original relation can also be derived from the further normalized relations. The converse is not true. The process is reversible because no information is lost in the reduction process.
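The recovery claim can be checked on a small example: project a relation, take the natural join of the projections, and compare with the original. The data below is made up, and the second decomposition is included only as a contrast to show why the projections have to be "suitable".

```python
# (supplier#, part#, qty, city, distance); sample values are invented.
original = {
    ("S1", "P1", 300, "London", 120),
    ("S1", "P2", 200, "London", 120),
    ("S2", "P1", 100, "Paris", 450),
}

# Suitable projections: they share supplier#, which determines city and distance.
supplier_part = {(s, p, qty) for s, p, qty, _, _ in original}
supplier = {(s, city, dist) for s, _, _, city, dist in original}
rejoined = {
    (s, p, qty, city, dist)
    for s, p, qty in supplier_part
    for s2, city, dist in supplier
    if s == s2
}
lossless = rejoined == original

# Unsuitable projections: they share part#, which determines nothing else here.
pairs = {(s, p) for s, p, _, _, _ in original}
part_info = {(p, qty, city, dist) for _, p, qty, city, dist in original}
bad_rejoin = {
    (s, p, qty, city, dist)
    for s, p in pairs
    for p2, qty, city, dist in part_info
    if p == p2
}

print(lossless)                 # True
print(len(bad_rejoin))          # 5: two spurious rows appear
print(bad_rejoin == original)   # False
```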
Transitive Dependency
In a relation R, if R.A =>R.B and R.B => R.C then attribute C is said to be transitively dependent on attribute A.
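A small sketch of removing a transitive dependency. It assumes, purely for illustration, a relation SUPPLIER (supplier#, city, distance) in which supplier# => city and city => distance; these dependencies and the sample rows are assumptions of the sketch, not statements from the slides.

```python
# (supplier#, city, distance): distance depends on supplier# only through city.
supplier = [
    ("S1", "London", 120),
    ("S2", "Paris", 450),
    ("S3", "London", 120),
]

# Split so that each non-key attribute depends directly on the key of its relation.
supplier_3nf = sorted({(s, city) for s, city, _ in supplier})  # key: supplier#
city = sorted({(c, dist) for _, c, dist in supplier})          # key: city

# The original rows can be rebuilt by looking distance up through city.
distance_of = dict(city)
recovered = sorted((s, c, distance_of[c]) for s, c in supplier_3nf)

print(city)                           # [('London', 120), ('Paris', 450)]
print(recovered == sorted(supplier))  # True
```

The distance for London is now stored once, so it cannot become inconsistent between S1 and S3.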
All non-key attributes have atomic values and are dependent on the key (1NF - no multi-value attributes), the whole key (2NF - no partial functional dependency), and nothing but the key (3NF - no transitive functional dependency).
Normalization Process
[Figure: progression from Unnormalized Form through 1NF and 2NF to 3NF.]
Cons
Many complex queries will be slower because joins have to be performed to retrieve relevant data from several normalized tables.
Programmers/users have to understand the underlying data model of a database application in order to perform proper joins among several tables.
The formulation of multiple-level queries is a nontrivial task.
Tables in Relational DB
Identify Primary Keys and Foreign Keys in the following Tables!!!
Join Tables
SELECT Orders.OrderID, Orders.CustID, LastName, Firstname, Orders.ItemID, Description
FROM Customer, Orders, Inventory
WHERE Customer.CustID = Orders.CustID AND Orders.ItemID = Inventory.ItemID
ORDER BY Orders.CustID, Orders.ItemID
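A runnable sketch of this three-table join, assuming Python's built-in sqlite3 module. The table and column names are those in the query; the rows are made up, and the ORDER BY column is qualified as Orders.CustID in this sketch because CustID exists in two of the joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, LastName TEXT, Firstname TEXT);
    CREATE TABLE Inventory (ItemID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustID INTEGER, ItemID INTEGER);
    INSERT INTO Customer VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
    INSERT INTO Inventory VALUES (10, 'Keyboard'), (20, 'Mouse');
    INSERT INTO Orders VALUES (100, 2, 10), (101, 1, 20), (102, 1, 10);
""")

# The WHERE clause matches each foreign key in Orders to the primary key it refers to.
rows = conn.execute("""
    SELECT Orders.OrderID, Orders.CustID, LastName, Firstname, Orders.ItemID, Description
    FROM Customer, Orders, Inventory
    WHERE Customer.CustID = Orders.CustID AND Orders.ItemID = Inventory.ItemID
    ORDER BY Orders.CustID, Orders.ItemID
""").fetchall()

for row in rows:
    print(row)
# (102, 1, 'Smith', 'Ann', 10, 'Keyboard')
# (101, 1, 'Smith', 'Ann', 20, 'Mouse')
# (100, 2, 'Jones', 'Bob', 10, 'Keyboard')
```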
Denormalization
The process of intentionally backing away from normalization to improve performance. Denormalization should not be the first choice for improving performance and should only be used for fine tuning a database for a particular application.
Requirements
Prior normalization
Knowledge of data usage
Benefits
Minimize the need for joins
Reduce number of tables
Reduce number of foreign keys
Reduce number of indices
Questions to consider
How often are two data items needed together?
How many rows are involved?
How volatile is denormalized data?
How important is visibility of data to users?
What is the minimum response time and frequency of a query?
De-normalization: An Example
[Figure: R1 and R2 are combined by a JOIN into the denormalized relation R1*R2.]
Where:
R1 (ProductNo, SupplierNo, Price)
R2 (SupplierNo, Name, Address, Phone)
R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
R2 should be kept to prevent data loss. Data redundancy in R1*R2 and R2 could cause potential data inconsistency problems if the redundant data in these two tables are not maintained properly.
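Both points can be demonstrated with a small sketch, assuming Python's built-in sqlite3 module. The relation names R1 and R2 and their attributes are from the example; the table name `R1_R2` (standing in for R1*R2) and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (ProductNo TEXT, SupplierNo TEXT, Price REAL,
                     PRIMARY KEY (ProductNo, SupplierNo));
    CREATE TABLE R2 (SupplierNo TEXT PRIMARY KEY, Name TEXT, Address TEXT, Phone TEXT);
    INSERT INTO R1 VALUES ('P1', 'S1', 500), ('P2', 'S1', 250);
    INSERT INTO R2 VALUES ('S1', 'Acme', '1 Main St', '555-0100'),
                          ('S2', 'Apex', '2 High St', '555-0199');
    -- Denormalized copy: supplier details are repeated on every product row.
    CREATE TABLE R1_R2 AS
        SELECT R1.ProductNo, R1.SupplierNo, R2.Name, R2.Address, R2.Phone, R1.Price
        FROM R1 JOIN R2 ON R1.SupplierNo = R2.SupplierNo;
""")

# S2 supplies no product yet, so it would be lost if R2 were dropped.
suppliers_in_join = {row[0] for row in conn.execute("SELECT SupplierNo FROM R1_R2")}

# Updating R2 alone leaves stale copies of the phone number in R1_R2.
conn.execute("UPDATE R2 SET Phone = '555-0123' WHERE SupplierNo = 'S1'")
stale_rows = conn.execute(
    "SELECT COUNT(*) FROM R1_R2 JOIN R2 USING (SupplierNo) "
    "WHERE R1_R2.Phone <> R2.Phone"
).fetchone()[0]

print(suppliers_in_join)  # {'S1'}
print(stale_rows)         # 2
```

Keeping the redundant columns consistent then becomes the application's job, for example by updating both tables in one transaction.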
Isolated Entity Type
Solitary Entity Type
One-to-One Relationship
Redundant Relationship
Multi-Valued Attributes
Attribute with Attributes
Many-to-Many Relationship
Redundant Relationship
Is this relationship redundant?
[Figure: ER diagram with entity types customer, order, order item, and product. Relationship labels: has ordered / is ordered by, places / is placed by, contains / is part of, has.]
Redundant Relationship
[Figure: redundant versus non-redundant relationships among Warehouse, Product, Order Line, Order, and Order History. Relationship labels: holds, is held as Stock, is held in stocks, contains, is contained in.]
Multi-Valued Attribute
Definition
An Attribute that may have more than one value at a time is called a multi-valued attribute.
Solution:
Example:
Employee(ID, Name, Phone)
Employee(111, John Smith, 210-999-8888)
Employee_language(ID, Language)
Employee_language(111, English)
Employee_language(111, Chinese)
Solution:
Create an Entity Type to avoid an Attribute with Attributes. Add new attributes to the existing Entity Type.
[Figure: many-to-many relationship between Order and Product; relationship label: belongs-to.]
Why?
There is no place to attach Attributes that are required to describe a many-to-many Relationship. It is difficult to translate many-to-many Relationships into relational tables automatically.
How?
A many-to-many relationship can be decomposed into two one-to-many Relationships by creating an Associative Entity Type between the existing two Entity Types.
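A minimal sketch of this decomposition, assuming Python's built-in sqlite3 module. The associative entity type is Order Line; the table is named `orders` here because ORDER is an SQL keyword, and the column names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE orders  (order_no TEXT PRIMARY KEY, order_date TEXT);
    CREATE TABLE product (product_no TEXT PRIMARY KEY, description TEXT);
    -- Associative entity type: one row per (order, product) pairing.
    CREATE TABLE order_line (
        order_no   TEXT NOT NULL REFERENCES orders (order_no),
        product_no TEXT NOT NULL REFERENCES product (product_no),
        quantity   INTEGER NOT NULL,  -- attribute that describes the pairing itself
        PRIMARY KEY (order_no, product_no)
    );
    INSERT INTO orders VALUES ('O1', '1993-01-15'), ('O2', '1993-01-20');
    INSERT INTO product VALUES ('P1', 'Color TV'), ('P2', 'PC');
    INSERT INTO order_line VALUES ('O1', 'P1', 2), ('O1', 'P2', 1), ('O2', 'P1', 5);
""")

# One order has many order lines; one product appears on many order lines.
products_on_o1 = [r[0] for r in conn.execute(
    "SELECT product_no FROM order_line WHERE order_no = 'O1' ORDER BY product_no")]
orders_for_p1 = [r[0] for r in conn.execute(
    "SELECT order_no FROM order_line WHERE product_no = 'P1' ORDER BY order_no")]

print(products_on_o1)  # ['P1', 'P2']
print(orders_for_p1)   # ['O1', 'O2']
```

The quantity column shows the first reason given under "Why?": it belongs to neither Order nor Product alone, only to the pairing.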
[Figure: Order and Product each have a one-to-many relationship to the associative entity type Order Line. Relationship labels: contains, belongs to, has, is contained in.]
[Figure: (b) Student takes Course; Course is-taken-by Student. (c) Part consists-of Part; Part is-contained-in Part.]
Bills of Material
[Figure: Part consists-of Part; Part is-a-component-in Part. Product Structure of part A with component parts B, C, D, E, and F and their quantities.]
[Figure: relationship labels "uses" (Project) and "supplies" (Supplier).]
[Figure: Order, Order Line, and Product. Relationship labels: belongs to, is contained in.]
CREATE TABLE "ORDER" (OrderNo CHAR(10) NOT NULL, OrderDate DATE, CustomerID CHAR(10), SalePersonID CHAR(10));
All or part of a data (entity-relationship) model can be translated into a normalized database design.
Objects Created
At most one relational database
One or more relations (tables)
Data structures (DDL) representing the elements (attributes) and the primary key of each relation
Data type of each data element
Heuristics of Transformation
A table is created for each Entity Type in the ER diagram.
A table is created for each multi-valued attribute.
Relationship Types are implemented as tables or as foreign keys in other tables.
Many-to-many relationship types are translated into tables.
Foreign keys are used for implementing one-to-one and one-to-many Relationship Types.
For one-to-many Relationship Types, the foreign key is placed in the table that represents the Entity Type on the "many" end of the Relationship Type.
For identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is also part of the PK of the "many" table.
For non-identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is a non-key attribute of the "many" table.
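The difference between identifying and non-identifying one-to-many relationships shows up directly in the DDL. The sketch below assumes Python's built-in sqlite3 module; the tables, columns, and the helpers `pk_columns` and `fk_columns` are hypothetical and only illustrate the last two heuristics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY);
    -- Non-identifying 1:M: the FK customer_id is a non-key attribute of the "many" table.
    CREATE TABLE orders (
        order_no    TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customer (customer_id)
    );
    -- Identifying 1:M: the FK order_no is also part of the PK of the "many" table.
    CREATE TABLE order_line (
        order_no TEXT NOT NULL REFERENCES orders (order_no),
        line_no  INTEGER NOT NULL,
        PRIMARY KEY (order_no, line_no)
    );
""")

def pk_columns(table):
    """Primary-key columns of `table`, in key order, read from PRAGMA table_info."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    keyed = sorted((row[5], row[1]) for row in info if row[5] > 0)
    return [name for _, name in keyed]

def fk_columns(table):
    """Foreign-key columns of `table`, read from PRAGMA foreign_key_list."""
    return sorted(row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})"))

for table in ("orders", "order_line"):
    print(table, "PK:", pk_columns(table), "FK:", fk_columns(table))
# orders PK: ['order_no'] FK: ['customer_id']
# order_line PK: ['order_no', 'line_no'] FK: ['order_no']
```

In `orders` the foreign key sits outside the primary key; in `order_line` it is one of the primary key columns, so an order line cannot be identified without its order.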
http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html
[Figure: physical database diagram. Tables and columns:]
dept (dept_id int, parent_id int, name varchar(255), description text, date_changed datetime)
product_family (pfid varchar(30), dept_id int, manufacturer_id int, name varchar(255), short_description varchar(255), long_description text, image_filename varchar(255), intro_date datetime, date_changed datetime, list_price int, monogramable tinyint)
product_variant (sku int, pfid varchar(30), attribute0 tinyint, attribute1 tinyint, attribute2 tinyint, attribute3 tinyint, attribute4 tinyint)
shopper (shopper_id char(32), created datetime, name varchar(235), password varchar(20), street varchar(50), city varchar(50), state varchar(30), zip varchar(15), country varchar(20), phone varchar(16), email varchar(50))
receipt (order_id char(26), shopper_id char(32), total int, status tinyint, date_entered datetime, date_changed datetime, marshalled_receipt image)
receipt_item (pfid varchar(30), sku int, order_id char(26), row_id int, quantity int, adjusted_price int)
promo_price (promo_name varchar(255), promo_type int, promo_description text, promo_rank int, active int, date_start datetime, date_end datetime, shopper_all int, shopper_column varchar(64), shopper_op varchar(2), shopper_value varchar(64), cond_all int, cond_column varchar(64), cond_op varchar(2), cond_value varchar(64), cond_basis char(1), cond_min int, award_all int, award_column varchar(64), award_op varchar(2), award_value varchar(64), award_max int, disjoint_cond_award int, disc_type char(1), disc_value real)
Join conditions shown in the diagram: dept_id = parent_id; pfid = pfid; sku = sku; order_id = order_id; shopper_id = shopper_id
Attribute 0 of pfid 14 is size and the attribute value 1 is Grande and 2 is Tall and 3 is Short