Data - Data is meaningful, known raw facts that can be processed and stored as information.
Database - A database is a collection of interrelated and organized data.
DBMS - A Database Management System (DBMS) is a collection of interrelated data [usually called the database] and a set of programs to access, update and manage those data [which form the management system].
OR
It is a software package that facilitates the creation and maintenance of a computerized database. It is general-purpose software that facilitates the following:
1. Defining: Specifying data types, structures, and constraints for the data to be stored.
2. Constructing: Storing data on a storage medium.
3. Manipulating: Querying, updating and generating reports.
4. Sharing: Allowing multiple users and programs to access data simultaneously.
E.g. of DBMS - Access, dBase, FileMaker Pro, FoxBASE, ORACLE, Ingres, Informix, Sybase, etc.
Primary goals of a DBMS are:
1. To provide a way to store and retrieve database information that is both convenient and efficient.
2. To manage large and small bodies of information. This involves defining structures for storage of information and providing mechanisms for manipulation of information.
3. To ensure the safety of the information stored, despite system crashes or attempts at unauthorized access.
4. To avoid possible anomalous results when data are shared among several users.
E.g. of DBMS applications
1. Banking - For customer information, accounts, loans, and banking transactions. [all transactions]
2. Airlines - For reservation and schedule information. [reservations, schedules]
3. Universities - For student information, course registrations, and grades. [registration, grades]
4. Credit Card Transactions - For purchases on credit cards and generation of monthly statements.
5. Telecommunication - For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about communication networks.
6. Finance - For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds.
7. Sales - For customer, product, and purchase information. [customers, products, purchases]
8. Manufacturing - For management of the supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items. [production, inventory, orders, supply chain]
9. Human Resources - For information about employees, salaries, payroll taxes and benefits, and generation of paychecks. [employee records, salaries, tax deductions]
File systems/File processing systems
A file system stores information in operating system data structures called files, and this information is manipulated by application programs that work directly on those files.
DBMS
BIT
Disadvantages of a DBMS
The following are disadvantages of a DBMS:
1. Setting up the database system requires more knowledge, money, skills, and time.
2. The complexity of the database may result in poor performance.
Functions of a DBMS
The functions performed by a typical DBMS are the following:
Data Definition - The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.
Data Manipulation - Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.
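The two functions above can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as the DBMS; the account table, its columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical in-memory database; names and values are illustrative only.
conn = sqlite3.connect(":memory:")

# Data definition: record structure, field types and constraints.
conn.execute(
    """
    CREATE TABLE account (
        number  INTEGER PRIMARY KEY,
        owner   TEXT    NOT NULL,
        balance REAL    NOT NULL CHECK (balance >= 0)
    )
    """
)

# Data manipulation: insert, modify and delete records.
conn.execute("INSERT INTO account VALUES (101, 'Asha', 500.0)")
conn.execute("INSERT INTO account VALUES (102, 'Ravi', 250.0)")
conn.execute("UPDATE account SET balance = balance + 100 WHERE number = 101")
conn.execute("DELETE FROM account WHERE number = 102")

# An unplanned (ad-hoc) query, issued on a need basis.
rows = conn.execute("SELECT number, owner, balance FROM account").fetchall()

# The CHECK constraint rejects data that violates the declared condition.
try:
    conn.execute("INSERT INTO account VALUES (103, 'Mina', -5.0)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```

The constraint is declared once, in the data definition, and the DBMS then checks it on every later insert or update.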
Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.
Data abstraction
It can be summed up as follows.
1. When the DBMS hides certain details of how data is stored and maintained, it provides what is called an abstract view of data.
2. This is to simplify user interaction with the system.
3. Complexity (of data and data structure) is hidden from users through several levels of abstraction.
Data abstraction is used for the following purposes:
1. To provide an abstract view of data.
2. To hide complexity from the user.
3. To simplify user interaction with the DBMS.
Levels of data abstraction
There are three levels of data abstraction.
1. Physical level: It describes how a record (e.g., customer) is stored.
Features:
a) Lowest level of abstraction.
b) It describes how data are actually stored.
c) It describes low-level complex data structures in detail.
d) At this level, efficient algorithms to access data are defined.
2. Logical level: It describes what data are stored in the database, and the relationships among the data.
Features:
a) It is the next-higher level of abstraction. Here the whole database is divided into small, simple structures.
b) Users at this level need not be aware of the physical-level complexity used to implement the simple structures.
c) Here the aim is ease of use.
d) Generally, database administrators (DBAs) work at the logical level of abstraction.
3. View level: It is the highest level of abstraction and describes only part of the entire database.
[Figure: Levels of Data Abstraction]
Instances and Schemas
Instances and schemas are similar to types and variables in programming languages.
1. Schema: The overall design of a database is called the database schema. E.g., the database consists of information about a set of customers and accounts and the relationship between them. It is analogous to a variable declaration, along with its type information, in a program.
Types of Schemas (partitioned according to levels of abstraction):
a. Physical schema: It is the database design at the physical level. It is hidden below the logical schema, and can be changed easily without affecting application programs.
b. Logical schema: It is the database design at the logical level. Programmers construct applications using the logical schema. It is by far the most important schema, in terms of its effect on application programs.
c. Subschema: It is a schema at the view level.
2. Instance: It is the actual content of the database at a particular point in time. It is analogous to the value of a variable.
The ANSI/SPARC Architecture
A DBMS can be considered as a buffer between application programs, end users and a database, designed to fulfill features of data independence. In 1975 the American National Standards Institute Standards Planning and Requirements Committee (ANSI-SPARC) proposed a three-level architecture that identifies three levels of abstraction. These levels are sometimes referred to as schemas or views.
1. The External or User Level: This level describes the users' or application programs' view of the database. Several programs or users may share the same view.
[Figure: Application Programs, Query Processor, Database]
For example, consider a bank system that uses a Customer_Details table and a Customer_Transactions table. At the internal level, a Customer_Details or Customer_Transactions record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides the lowest-level storage details (how data is stored and accessed) from database programmers. At the conceptual level, the table definitions (the attributes, their data types and widths) and the interrelationships among the data are described. Finally, at the external level, several views of the database are defined, and database end users are able to see those views. In addition to hiding details of the conceptual level of the database, the views also provide a security mechanism to prevent users from accessing parts of the database. E.g., tellers in the bank will be able to see only that part of the database that has information on customer accounts; they cannot access information concerning salaries of bank employees.
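The external level in the bank example can be sketched as a view. This is only an illustration using Python's sqlite3 module; the table names, columns and rows are hypothetical. SQLite has no user accounts or GRANT statement, so only the view definition is shown; in a multi-user DBMS, access would typically be granted on the view rather than on the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: table definitions (hypothetical columns).
    CREATE TABLE customer_details (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        balance     REAL
    );
    CREATE TABLE employee_salary (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        salary      REAL
    );
    INSERT INTO customer_details VALUES (1, 'Asha', 600.0);
    INSERT INTO employee_salary  VALUES (9, 'Teller T', 3000.0);

    -- External level: the view a teller works with exposes customer data only.
    CREATE VIEW teller_view AS
        SELECT customer_id, name, balance FROM customer_details;
    """
)

cursor = conn.execute("SELECT * FROM teller_view")
teller_columns = [column[0] for column in cursor.description]
teller_rows = cursor.fetchall()
```

The view contains no salary column at all, so a program written against teller_view has no way to name that data.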
Data Independence
Data independence is the ability to modify a schema definition at one level without affecting a schema definition at the next higher level. There are two types of data independence:
1. Physical data independence
a. It is the ability to modify the physical schema without causing application programs to be rewritten.
b. Modifications at this level are usually made to improve performance.
2. Logical data independence
a. It is the ability to modify the conceptual schema without causing application programs to be rewritten.
b. It is usually needed when the logical structure of the database is altered.
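As a rough sketch of physical data independence, the example below changes the physical organization (by adding an index) while the application's query text stays untouched. It assumes Python's sqlite3 module; the account table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number INTEGER, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, "North", 600.0), (102, "South", 250.0), (103, "North", 75.0)],
)

# The application's query, written against the logical schema only.
QUERY = "SELECT number FROM account WHERE branch = ? ORDER BY number"

before = conn.execute(QUERY, ("North",)).fetchall()

# Physical-level change, made to improve performance: add an index.
conn.execute("CREATE INDEX idx_account_branch ON account (branch)")

# The same query text still runs and returns the same answer.
after = conn.execute(QUERY, ("North",)).fetchall()
```

Only the access path available to the DBMS changes; the program that issues QUERY does not need to be rewritten.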
Logical data independence is harder to achieve, as application programs are usually heavily dependent on the logical structure of the data. An analogy can be made to abstract data types in programming languages.
Database Users
Users are differentiated by the way they expect to interact with the system. They fall into the following categories:
1. Application programmers: They are computer professionals interacting with the system through DML calls embedded in a program written in a host language (e.g. C, PL/1, Pascal).
a. These programs are called application programs.
b. The DML precompiler converts DML calls (prefaced by a special character like $, #, etc.) to normal procedure calls in the host language.
c. The host language compiler then generates the object code.
d. Some special types of programming languages combine Pascal-like control structures with control structures for the manipulation of a database.
e. These are sometimes called fourth-generation languages.
f. They often include features to help generate forms and display data.
2. Sophisticated users: They interact with the system without writing programs.
a. They form requests by writing queries in a database query language.
b. These are submitted to a query processor that breaks a DML statement down into instructions for the database manager module.
3. Specialized users: They are sophisticated users writing special database application programs. These may be CAD systems, knowledge-based and expert systems, complex data systems (audio/video), etc.
Database Administrator
The database administrator (DBA) is a person having central control over the data and the programs accessing that data. The DBA coordinates all the activities of the database system and has a good understanding of the enterprise's information resources and needs.
Functions of a DBA
The database administrator's duties include:
1. Schema definition: the creation of the original database schema. This involves writing a set of definitions in a DDL (data definition language), compiled by the DDL compiler into a set of tables stored in the data dictionary.
2. Storage structure and access method definition: writing a set of definitions translated by the data storage and definition language compiler.
3. Schema and physical organization modification: writing a set of definitions used by the DDL compiler to generate modifications to the appropriate internal system tables (e.g. the data dictionary). This is done rarely, but sometimes the database schema or physical organization must be modified.
4. Granting user authority to access the database: granting different types of authorization for data access to various users.
5. Specifying integrity constraints: generating integrity constraints. These are consulted by the database manager module whenever updates occur.
6. Routine maintenance: It includes the following:
a. Acting as liaison with users.
b. Monitoring performance and responding to changes in requirements.
c. Periodically backing up the database.
Database languages
We have a Data Definition Language (DDL) to specify database schemas and a Data Manipulation Language (DML) to express database updates and queries. In practice, these are not two separate languages but parts of a single database language, like SQL.
1. Data Definition Language (DDL)
It is the language used to specify a database schema by a set of definitions.
2. Data Manipulation Language (DML)
It is a language for accessing and manipulating the data organized by the appropriate data model. DML is also known as a query language. There are two types of DML:
a) Procedural DMLs - This language requires the user to specify what data is required and how to get those data.
b) Declarative DMLs (non-procedural DMLs) - This language requires the user to specify what data is required without specifying how to get those data.
Features of DML
1. A DML is a language which enables users to access and manipulate data.
2. The goal is to provide efficient human interaction with the system.
3. There are two types of DMLs:
a. Procedural: Here the user specifies what data is needed and how to get it.
b. Non-procedural: Here the user only specifies what data is needed.
- Easier for the user
- May not generate code as efficient as that produced by procedural languages
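The difference between the two styles can be sketched side by side. This is only an illustration: the procedural half uses a plain Python loop as a stand-in for a procedural DML, the declarative half uses SQL through Python's sqlite3 module, and the account records are hypothetical.

```python
import sqlite3

# Hypothetical records, for illustration only.
accounts = [
    {"number": 101, "balance": 600.0},
    {"number": 102, "balance": 250.0},
    {"number": 103, "balance": 75.0},
]

# Procedural style: the program states what is wanted AND how to get it
# (scan every record, test it, collect the matches).
procedural_result = []
for record in accounts:
    if record["balance"] > 100:
        procedural_result.append(record["number"])

# Declarative style: the query states only what is wanted;
# the DBMS decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number INTEGER, balance REAL)")
conn.executemany("INSERT INTO account VALUES (:number, :balance)", accounts)
declarative_result = [
    row[0]
    for row in conn.execute(
        "SELECT number FROM account WHERE balance > 100 ORDER BY number"
    )
]
```

Both approaches return the same account numbers; only the first one fixes the access strategy in the program itself.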
DBMS System Structure and its Components
The overall structure of a DBMS and its components can be explained with the help of a diagram.
1. Database systems are partitioned into modules for different functions. Some functions (e.g. file systems) may be provided by the operating system.
2. Broadly, the functional components of a database system are:
a. Query Processor: It is one of the functional components of a DBMS. It translates statements in a query language into low-level instructions the database manager understands. It may also attempt to find an equivalent but more efficient form. It contains the following components:
a. DML compiler - It translates DML statements in a query language into low-level instructions that the query evaluation engine understands. It also performs query optimization.
b. DDL interpreter - It interprets DDL statements and records the definitions, as a set of tables containing metadata, in the data dictionary.
c. Query evaluation engine - It executes the low-level instructions generated by the DML compiler.
These components mainly deal with queries and query processing. They help the database system simplify and facilitate access to data.
b. Storage Manager (Database Manager)
1. The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
2. The storage manager is responsible for the following tasks:
1. interaction with the file manager
2. efficient storing, retrieving and updating of data
3. The important components include:
a. File manager: It manages the allocation of disk space and the data structures used to represent information on disk.
b. Database manager: It is the interface between low-level data and application programs and queries.
c. Transaction manager: It ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
d. DML precompiler: It converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
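The role of the transaction manager described above can be sketched with a transfer that either completes fully or is undone. This is a minimal illustration using Python's sqlite3 module, not a description of how any particular transaction manager is built; the account table, the transfer function and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (number INTEGER PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(101, 100.0), (102, 50.0)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; undo every step if any step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 101, 102, 30.0)       # succeeds
failed = transfer(conn, 101, 102, 500.0)  # debit would violate the CHECK
balances = dict(conn.execute("SELECT number, balance FROM account"))
```

In the failed transfer the credit to account 102 has already been applied when the debit is rejected, and the rollback removes it, so the database never shows money that was created from nothing.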
In addition, several data structures are required for physical system implementation: a. Data files: They store the database itself. b. Data dictionary: It stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary. In short, it stores metadata. c. Indices: They provide fast access to data items holding particular values.
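As a small illustration of a data dictionary, SQLite keeps its metadata in a built-in catalogue table named sqlite_master, which can be queried like any other table. The sketch below uses Python's sqlite3 module; the customer table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (number INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")

# Metadata about the database structure: which tables and indices exist.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Column-level metadata for one table: (column name, declared type).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")
]
```

No customer rows were inserted here; the catalogue describes the structure of the database, not its contents.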
Data Model
A data model is a collection of tools for describing
1. data
2. data relationships
3. data semantics
4. data constraints
Types of Data Models
There are basically two types of data models:
1. Record based Data Models.
2. Object based Data Models.
1. Record based Data Models
In record-based models, the database is organized in fixed-format records of several types. A fixed number of fields, or attributes, are defined in each record type, and each field is usually of a fixed length.
The three most popular record-based data models are
1. Relational Data Model
2. Network Data Model
3. Hierarchical Data Model
1. Relational Data Model
1. The relational model uses tables to represent the data and the relationships among those data.
2. Each table has multiple columns, and each column is identified by a unique name.
3. It is at a lower level of abstraction than the E-R model.
In this database, each row in the table represents a different customer. Relationships link rows from two tables on the basis of the key field, in this case number.
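The way a key field links rows from two tables can be sketched as a join. The example assumes Python's sqlite3 module; the customer and account tables, their columns and their rows are hypothetical stand-ins, not the contents of the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; "number" is the key field shared by both.
    CREATE TABLE account  (number INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE customer (name TEXT, city TEXT, number INTEGER);
    INSERT INTO account  VALUES (900, 55.0), (556, 1000.0);
    INSERT INTO customer VALUES ('Lowery', 'Queens', 900),
                                ('Hodges', 'Bronx', 556);
    """
)

# Rows are related purely by matching key values, not by stored pointers.
joined = conn.execute(
    """
    SELECT c.name, a.balance
    FROM customer AS c
    JOIN account AS a ON a.number = c.number
    ORDER BY c.name
    """
).fetchall()
```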
Advantages of Relational Data Model
a. Structural Independence - The relational database model has structural independence, i.e. changes made in the database structure do not affect the DBMS's capability to access data.
b. Simplicity - The relational model is the simplest model at the conceptual level. It allows the designer to concentrate on the logical view of the database, leaving aside the physical data storage details.
c. Ease of designing, implementation, maintenance, and usage - Due to its inherent features of data independence and structural independence, the relational model makes it easy to design, implement, maintain and use databases.
d. Ad hoc query capability - One of the main reasons for the huge popularity of the relational database model is the presence of a powerful, flexible and easy-to-use query capability. The query language of the relational database model, Structured Query Language or SQL, is a fourth-generation language.
2. Network Data Model
1. In the network model, data are represented by collections of records.
2. Relationships among data are represented by links.
3. In this model, a graph data structure is used.
4. A network model permits a record to have more than one parent.
Advantages of Network Data Model
a. Simplicity - The network data model is also conceptually simple and easy to design.
b. Ability to handle more relationship types - The network model can handle one-to-many and many-to-many relationships.
c. Ease of data access - In network database terminology, a relationship is a set. Each set comprises two types of records: an owner record and a member record. In a network model an application can access an owner record and all the member records within a set.
d. Data Integrity - In a network model, no member can exist without an owner. A user must therefore first define the owner record and then the member record. This ensures data integrity.
e. Data Independence - The network model draws a clear line of demarcation between the programs and the complex physical storage details. The application programs work independently of the data. Any changes made in the data characteristics do not affect the application program.
f. Database standards - The standards devised by the DBTG (Database Task Group of the CODASYL Committee) form the basis of the network model. These standards were further enhanced by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) in the 1970s. All the network database management systems adhere to these standards. These standards comprise a DDL and a DML that aid database administration and portability.
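The owner/member sets of the network model can be pictured with a toy in-memory structure. This is only a sketch in plain Python, not a CODASYL implementation; the NetworkDB class, its method names and the record labels are all hypothetical.

```python
class NetworkDB:
    """Toy owner-member store, for illustration only."""

    def __init__(self):
        self.records = set()
        self.sets = {}  # (set_name, owner) -> list of member records

    def add_record(self, record):
        self.records.add(record)

    def connect(self, set_name, owner, member):
        # No member can exist in a set without an already defined owner.
        if owner not in self.records:
            raise ValueError(f"owner {owner!r} is not defined")
        self.add_record(member)
        self.sets.setdefault((set_name, owner), []).append(member)

    def members(self, set_name, owner):
        # Access an owner record and all member records within its set.
        return list(self.sets.get((set_name, owner), []))

    def owners_of(self, member):
        return sorted(
            owner for (_, owner), members in self.sets.items() if member in members
        )


db = NetworkDB()
db.add_record("Customer:Asha")
db.add_record("Branch:North")
# The same account record is a member of two sets, so it has two parents.
db.connect("customer_account", "Customer:Asha", "Account:101")
db.connect("branch_account", "Branch:North", "Account:101")
```

Because a record may appear as a member under several owners, the structure forms a graph rather than a tree.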
3. Hierarchical Model
1. In the hierarchical model, data are represented by collections of records.
2. Relationships among data are represented by links.
3. In this model, a tree data structure is used.
4. There are two concepts associated with the hierarchical model: segment types and parent-child relationships. A segment type is similar to a record type in the network model. Information is retrieved only by navigating from the root segment type to the node segment types. Thus you can access a segment type only via its parent segment type in the parent-child relationship. The operators provided for manipulating such structures include operators for traversing hierarchic paths up and down the trees.
Advantages of hierarchical Model
1. Simplicity - Since the database is based on the hierarchical structure, the relationship between the various layers is logically simple. Thus, the design of a hierarchical database is simple.
2. Data Security - The hierarchical model was the first database model that offered data security that is provided and enforced by the DBMS.
3. Data Integrity - Since the hierarchical model is based on the parent/child relationship, there is always a link between the parent segment and the child segments under it. Because child segments are always automatically referenced to their parent, this model promotes data integrity.
4. Efficiency - The hierarchical database model is a very efficient one when the database contains a large number of one-to-many relationships and when the users require a large number of transactions using data whose relationships are fixed.
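Navigation in the hierarchical model can be pictured with a small tree in plain Python. This is a sketch only; the Segment class, the find helper and the segment names are hypothetical and not taken from any hierarchical DBMS.

```python
class Segment:
    """One node of the tree; each segment has exactly one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def find(root, path):
    """Navigate from the root along a path of child names; None if absent."""
    node = root
    for name in path:
        node = next((c for c in node.children if c.name == name), None)
        if node is None:
            return None
    return node


root = Segment("Bank")
branch = root.add_child(Segment("Branch:North"))
customer = branch.add_child(Segment("Customer:Asha"))
account = customer.add_child(Segment("Account:101"))

# A segment is reachable only through its chain of parents.
found = find(root, ["Branch:North", "Customer:Asha", "Account:101"])
not_found = find(root, ["Account:101"])  # skipping the parents fails
```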
Disadvantages of hierarchical Model
1. Implementation Complexity - Although the hierarchical database model is conceptually simple and easy to design, it is quite complex to implement. The database designers should have very good knowledge of the physical data storage characteristics.
2. Database management problems - If you make any changes in the database structure of a hierarchical database, then it is required to make the necessary changes in all the application programs
Comparison of the Hierarchical, Network and Relational Data Models
1. Representation of relationships
Hierarchical: Relationship between records is of the parent-child type.
Network: Relationship between records is expressed in the form of pointers or links.
Relational: Relationship between records is represented by a relation that contains a key for each record involved in the relationship.
2. Many-to-many relationships
Hierarchical: Many-to-many relationships cannot be expressed in this model.
Network: Many-to-many relationships can also be implemented.
Relational: Many-to-many relationships can be easily implemented.
3. Implementation of record relationships
Hierarchical: It is a simple, straightforward and natural method of implementing record relationships.
Network: Record relationship implementation is quite complex due to the use of pointers.
Relational: Relationship implementation is very easy through the use of a key or composite key field(s).
4. Usefulness
Hierarchical: This type of model is useful only when there is some hierarchical character in the database.
Network: The network model is useful for representing records which have many-to-many relationships.
Relational: The relational model is useful for representing most real-world objects and the relationships among them.
5. Physical connection among records
Hierarchical: Pointers are used to represent links among records. Thus relationships among records are physical.
Network: In the network model also, the relationships among records are physical.
Relational: The relational model does not maintain physical connections among records. Data is organized logically in the form of rows and columns and stored in tables.
6. Searching
Hierarchical: Searching for a record is very difficult, since one can retrieve a child only after going through its parent record.
Network: Searching for a record is easy, since there are multiple access paths to a data element.
Relational: A unique, indexed key field is used to search for a data element.
7. Consistency
Hierarchical: During an update or deletion, there is a chance of data inconsistency.
Network: No problem of inconsistency exists in the network model, because a data element is physically located at just one place.
Relational: Data integrity methods such as normalization are adopted for consistency.
The two most popular object-based data models are
a. Object oriented model
b. E-R Model
1. Object Oriented Model
1. The object-oriented model is based on a collection of objects, like the E-R model.
2. An object contains values stored in instance variables within the object.
3. Unlike the record-oriented models, these values are themselves objects.
4. Thus objects contain objects to an arbitrarily deep level of nesting.
5. An object also contains bodies of code that operate on the object. These bodies of code are called methods.
6. Objects that contain the same types of values and the same methods are grouped into classes.
7. A class may be viewed as a type definition for objects.
8. Analogy: the programming language concept of an abstract data type.
9. The only way in which one object can access the data of another object is by invoking a method of that other object. This is called sending a message to the object.
10. Internal parts of the object, the instance variables and method code, are not visible externally.
11. The result is two levels of data abstraction. For example, consider an object representing a bank account.
a. The object contains instance variables number and balance.
b. The object contains a method pay-interest which adds interest to the balance.
c. Under most data models, changing the interest rate entails changing code in application programs.
d. In the object-oriented model, this only entails a change within the pay-interest method.
12. Unlike entities in the E-R model, each object has its own unique identity, independent of the values it contains:
a. Two objects containing the same values are distinct.
b. The distinction is maintained at the physical level by assigning distinct object identifiers.
Advantages of Object Oriented Data Model
a. Capability to handle large number of different data types - Traditional database models like hierarchical, network and relational databases are limited in their capability to store different types of data.
For example, one cannot store pictures, voice and video in these databases. But the object-oriented database can store any type of data, including text, numbers, pictures, voice and video.
b. Combination of object-oriented programming and database technology - Perhaps the most significant characteristic of object-oriented database technology is that it combines object-oriented programming with database technology to provide an integrated application development system.
c. Object-oriented features improve productivity - Inheritance allows one to develop solutions to complex problems incrementally by defining new objects in terms of previously defined objects. Polymorphism and dynamic binding allow one to define operations for one object and then to share the specification of the operation with other objects. These objects can further extend this operation to provide behaviors that are unique to those objects. Dynamic binding determines at runtime which of these operations is actually executed, depending on the class of the object requested to perform the operation. Polymorphism and dynamic binding are powerful object-oriented features that allow one to compose objects to provide solutions without having to write code that is specific to each object. All of these capabilities come together to provide significant productivity advantages to database application developers.
d. Data access - Object-oriented databases represent relationships explicitly, supporting both navigational and associative access to information. The more complex the interrelationships between information within the database, the greater the advantage of representing relationships explicitly.
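The bank account object described in this section can be sketched as a Python class. This is an illustration only: the class layout, the method name pay_interest (for the pay-interest method) and the 5% rate are assumptions, and Python's object identity stands in here for the object identifiers a database would assign.

```python
class Account:
    INTEREST_RATE = 0.05  # assumed rate, for illustration only

    def __init__(self, number, balance):
        # Instance variables; by convention not accessed from outside.
        self._number = number
        self._balance = balance

    def pay_interest(self):
        # A change of interest policy is confined to this method.
        self._balance += self._balance * self.INTEREST_RATE

    def balance(self):
        return self._balance

    def values(self):
        return (self._number, self._balance)


# Two objects containing the same values are still distinct objects.
first = Account(101, 200.0)
second = Account(101, 200.0)
same_values = first.values() == second.values()
same_object = first is second

# Sending the pay_interest message changes only the receiving object.
first.pay_interest()
```

Application code calls pay_interest without knowing how interest is computed, which is the point made in item 11 above.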
2. E-R Model (Entity-Relationship Model)
1. The entity-relationship model is based on a perception of the world as consisting of a collection of basic objects (entities) and relationships among these objects.
2. It is an object-based logical model.
3. It is a high-level data model.
4. An entity is a distinguishable object that exists.
5. Each entity has associated with it a set of attributes describing it. E.g. number and balance for an account entity.
6. A relationship is an association among several entities. E.g. a cust_acct relationship associates a customer with each account he or she has.
7. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set.
8. Another essential element of the E-R diagram is the mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set.
9. The overall logical structure of a database can be expressed graphically by an E-R diagram:
a. Rectangles: represent entity sets.
b. Ellipses: represent attributes.
c. Diamonds: represent relationships among entity sets.
d. Lines: link attributes to entity sets and entity sets to relationships.
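The E-R concepts listed above can be mirrored in a short sketch. It uses Python dataclasses purely as an illustration; the Customer and Account classes, the sample entities and the helper functions are hypothetical, while cust_acct follows the relationship named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:  # entity type with one attribute
    name: str


@dataclass(frozen=True)
class Account:  # entity type with attributes number and balance
    number: int
    balance: float


asha, ravi = Customer("Asha"), Customer("Ravi")
acct_101, acct_102 = Account(101, 600.0), Account(102, 250.0)

# Entity sets: all entities of the same type.
customers = {asha, ravi}
accounts = {acct_101, acct_102}

# Relationship set: each pair associates a customer with an account.
cust_acct = {(asha, acct_101), (asha, acct_102), (ravi, acct_102)}


def accounts_of(customer):
    return sorted(a.number for c, a in cust_acct if c == customer)


def customers_of(account):
    return sorted(c.name for c, a in cust_acct if a == account)
```

In this sample data a customer has several accounts and an account has several customers, so the mapping cardinality of cust_acct is many-to-many.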
An example of ER model