
Traditional Filing System

Definition
A traditional file-based system is one in which data is handled manually or by computer through operations such as updating, inserting, deleting, and adding new files. File-based systems were developed as better alternatives to paper-based filing systems. With files stored on computers, data could be accessed more efficiently. It was common practice for larger companies to have each department look after its own data.

File Organization Terms and Concepts


A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases. A bit represents the smallest unit of data a computer can handle. A group of bits, called a byte, represents a single character, which can be a letter, a number, or another symbol. A grouping of characters into a word, a group of words, or a complete number (such as a person's name or age) is called a field.

Cont
A group of related fields, such as the student's name, the course taken, the date, and the grade, comprises a record. A group of records of the same type is called a file. A group of related files makes up a database.

Record
A record describes an entity. An entity is a person, place, thing, or event on which we maintain information. An order is a typical entity in a sales order file, which maintains information on a firm's sales orders. Each characteristic or quality describing a particular entity is called an attribute. For example, order number, order date, order amount, item number, and item quantity would each be an attribute of the entity order. Every record in a file should contain at least one field that uniquely identifies instances of that record so that the record can be retrieved, updated, or sorted. This identifier field is called a key field.
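As a rough illustration of these terms, the sketch below models a sales order record in Python. The class name, field types, and sample values are invented for the example; only the attribute list and the idea of a key field come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """One record describing the entity 'order'; each attribute is a field."""
    order_number: int   # key field: uniquely identifies the record
    order_date: str
    order_amount: float
    item_number: int
    item_quantity: int


# A file is a group of records of the same type (sample values are made up).
order_file = [
    Order(1001, "2024-01-05", 250.0, 17, 5),
    Order(1002, "2024-01-06", 99.5, 42, 1),
]

# Index the file by its key field so a record can be retrieved directly.
orders_by_key = {order.order_number: order for order in order_file}


def find_order(order_number):
    """Return the record with the given key, or None if no such record exists."""
    return orders_by_key.get(order_number)
```

Because the key is unique, a lookup returns at most one record, which is what makes retrieval, updating, and sorting by key reliable.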

Entities and Attributes

Data Hierarchy
Data hierarchy refers to the systematic organization of data, often in a hierarchical form. Data organization involves fields, records, files, and so on.
Database: Student database (course file, financial file)
File: Course file (names of all students, courses, dates, grades)
Record: Name, course, date, grade
Field: Name or age of a person
Byte: 10101010
Bit: 0

The Data Hierarchy

Problems with the Traditional File Environment


The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to:
Data redundancy and confusion
Program-data dependence
Lack of flexibility
Poor security
Lack of data-sharing and availability

Data Redundancy and Confusion


Data redundancy is the presence of duplicate data in multiple data files. Data redundancy occurs when different divisions, functional areas, and groups in an organization independently collect the same piece of information. Because it is collected and maintained in so many different places, the same data item, such as employee, fiscal year, or product identification code may have different meanings in different parts of the organization. Different systems might use different names for the same item.

The resulting confusion would make it difficult for companies to create customer relationship management, supply chain management, or enterprise systems that integrate data from different sources.

Program-data Dependence & Poor Security


Program-data dependence
Program-data dependence is the tight relationship between data stored in files and the specific programs required to update and maintain those files. In a traditional file environment, any change in data requires a change in all programs that access the data.

Such programming changes may cost millions of dollars to implement in programs that require the revised data.
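A small sketch may make the dependence concrete. It assumes a hypothetical fixed-width student file in which each program hard-codes the column positions; the layouts, field names, and sample record are illustrative and not taken from any real system.

```python
# Hypothetical fixed-width layout: name in columns 0-9, grade in columns 10-11.
OLD_LAYOUT = {"name": (0, 10), "grade": (10, 12)}
# The same data after the name field is widened to 15 characters.
NEW_LAYOUT = {"name": (0, 15), "grade": (15, 17)}


def write_record(name, grade, layout):
    """Format one record as a fixed-width line for the given layout."""
    name_width = layout["name"][1] - layout["name"][0]
    grade_width = layout["grade"][1] - layout["grade"][0]
    return name.ljust(name_width) + grade.ljust(grade_width)


def read_record(line, layout):
    """Slice a fixed-width line into fields using the given layout."""
    return {field: line[start:end].strip() for field, (start, end) in layout.items()}


new_line = write_record("Alice", "A", NEW_LAYOUT)

# A program still written against the old layout misreads the new file.
stale_view = read_record(new_line, OLD_LAYOUT)
# A program updated for the new layout reads it correctly.
fresh_view = read_record(new_line, NEW_LAYOUT)
```

Here `stale_view` loses the grade entirely, so every program that reads the file has to be changed when the layout changes.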

Poor security
Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization's data.

Lack of Flexibility & Lack of Data-sharing and Availability


Lack of flexibility
A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver special-purpose reports or respond to unanticipated information requirements in a timely fashion. Several programmers would have to work for weeks to put together the required data items in a new file.

Lack of data-sharing and availability


Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner. Information cannot flow freely across different functional areas or different parts of the organization.

Database Approach to Data Management


Database technology overcomes many of the problems a traditional file organization creates. A database is a collection of data organised to serve many applications effectively and efficiently by centralizing the data. Rather than storing data in separate files for each application, data is stored in one location. A single database serves multiple applications.

Examples of Databases
The following are examples of database applications:
Computerized library systems
Automated teller machines
Flight reservation systems
Computerized parts inventory systems

Database Management System


A database management system (DBMS) is software designed to manage and maintain the database of an organization. It is a software package whose programs control the creation, maintenance, and use of a database.

DBMS Features
It allows organizations to conveniently develop databases for various applications by database administrators (DBAs) and other specialists. A DBMS allows different user application programs to concurrently access the same database. It typically supports query languages, which are in fact high-level programming languages, dedicated database languages that considerably simplify writing database application programs. Database languages also simplify the database organization as well as retrieving and presenting information from it.

Cont
A DBMS provides facilities for controlling data access, enforcing data integrity, managing concurrency control, recovering the database after failures and restoring it from backup files, as well as maintaining database security. It acts as an interface between the application programs and the data. It is a collection of interrelated files and a set of programs through which the users can access and modify these files.

For instance: When the application program calls for a data item, such as gross pay, the DBMS finds this item in the database and presents it to the application program.
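The gross pay example can be sketched with Python's built-in `sqlite3` module standing in for the DBMS. The table name, column names, and sample row are assumptions made for illustration.

```python
import sqlite3

# An in-memory SQLite database plays the role of the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll (employee_id INTEGER PRIMARY KEY, gross_pay REAL, net_pay REAL)"
)
conn.execute("INSERT INTO payroll VALUES (1, 5000.0, 4100.0)")


def get_gross_pay(employee_id):
    """Ask the DBMS for one data item; the program never touches the file layout."""
    row = conn.execute(
        "SELECT gross_pay FROM payroll WHERE employee_id = ?", (employee_id,)
    ).fetchone()
    return None if row is None else row[0]
```

The application names the item it wants (`gross_pay`) and leaves locating it on disk to the DBMS.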

Cont
DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from an application program and instructs the operating system to transfer the appropriate data. When a DBMS is used, information systems can be changed more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Cont
Database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. (RAID, a redundant array of independent disks, is a storage technology that combines multiple disk drive components into a logical unit.) DBMSs are found at the heart of most database applications. Modern DBMSs typically rely on a standard operating system for services such as multitasking and networking.

DBMS Components
A database management system has the following components:
DBMS engine: It accepts logical requests from various other DBMS subsystems, converts them into physical equivalents, and actually accesses the database and data dictionary as they exist on a storage device.
Data definition language: It is the formal language programmers use to specify the content and structure of the database. It defines each data element as it appears in the database before that data element is translated into the forms required by application programs.
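As one possible illustration of a data definition language, the sketch below uses SQL's `CREATE TABLE` statement through Python's `sqlite3` module. The `employee` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the content and structure before any data exists.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT
    )
    """
)

# SQLite records each definition in its sqlite_master catalog table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

The definition is stated once, in the database, rather than being repeated inside every application program.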

Data Manipulation Language


A DBMS has a specialized language that is used in conjunction with conventional application programming languages to manipulate the data in the database. It contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.

Contd....
The most prominent data manipulation language is SQL (Structured Query Language). SQL is an interactive query language used to access data from databases. Sophisticated languages for managing database systems are called fourth-generation languages, or 4GLs for short. The information from a database can be presented in a variety of formats. Most DBMSs include a report writer program that enables you to output data in the form of a report. Many DBMSs also include a graphics component that enables you to output information in the form of graphs and charts.
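A minimal sketch of data manipulation with SQL, again using Python's `sqlite3` module. The `student` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# INSERT adds records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "B"), (2, "Ravi", "A"), (3, "Meena", "A")],
)
# UPDATE changes an existing record; DELETE removes one.
conn.execute("UPDATE student SET grade = 'A' WHERE student_id = 1")
conn.execute("DELETE FROM student WHERE student_id = 3")

# SELECT extracts the data that satisfies an information request.
top_students = [
    row[0]
    for row in conn.execute("SELECT name FROM student WHERE grade = 'A' ORDER BY name")
]
```

Each statement says what data is wanted, not how to find it in storage.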

Data Dictionary
The data dictionary stores definitions of data elements and data characteristics, such as usage, physical representation, ownership (who in the organization is responsible for maintaining the data), authorization, and security. Through these components, the DBMS manipulates the data and provides an environment suitable for retrieving and storing database information.
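SQLite keeps a small catalog that can serve as a partial stand-in for a data dictionary. The sketch below reads column definitions with `PRAGMA table_info`; note that this catalog holds only names, types, and constraints, not ownership or authorization, and the `payroll` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll ("
    "employee_id INTEGER PRIMARY KEY, hours_worked REAL NOT NULL, gross_pay REAL)"
)


def describe(table):
    """List column definitions for a trusted table name.

    PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    The table name is interpolated directly, so it must not come from user input.
    """
    return [
        {"name": row[1], "type": row[2], "required": bool(row[3])}
        for row in conn.execute(f"PRAGMA table_info({table})")
    ]


payroll_columns = describe("payroll")
```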

Example
A single HR database serves multiple applications and helps the organization draw together all information for various applications:
Details of employees (name, address, etc.)
Payroll (net pay, hours worked, gross pay)
Benefits (LIC, pension plan)

Cont
Data administration subsystem: It helps users manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.

Views of data
There are three different views of data:
External view (user view): The user's view of a database represents data in a format that is meaningful to a user and to the software programs that process those data. There can be an endless number of different external views. This feature allows users to see database information in a business-related way rather than from a technical, processing viewpoint.
Physical view (internal view): The physical view refers to the way the data are physically stored and processed.
Logical view (conceptual view): This refers to how the database administrator views the database.
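An external view can be sketched with an SQL view, which exposes only the columns a given user needs. The example uses `sqlite3`; the table, the view name `employee_directory`, and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical view: the full table as the database administrator defines it.
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, address TEXT, gross_pay REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', '12 Park Road', 5000.0)")

# External view: a directory user sees names and addresses but not pay.
conn.execute("CREATE VIEW employee_directory AS SELECT name, address FROM employee")

directory_rows = conn.execute("SELECT * FROM employee_directory").fetchall()
```

How SQLite lays the rows out on disk (the physical view) is hidden from both the table and the view.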

Database Models
A database model or database schema is the structure or format of a database, described in a formal language supported by the database. In other words, it is the application of a data model when used in conjunction with a DBMS. It is the theoretical foundation of a database and fundamentally determines how data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system.

Database Model
There are three types of database models common to the industry:
Hierarchical database model
Network database model
Relational database model

Hierarchical database model


The hierarchical database model is one of the earliest logical database models. It organizes data in a tree-like structure. A record is subdivided into segments that are connected to each other in one-to-many parent-child relationships. The most common hierarchical DBMS is IBM's Information Management System (IMS).

Contd.....
Children: segments (pieces of records)
Top level: root segment
Below: child segments
Access starts from the top and moves downwards

Cont
[Figure: hierarchical database for an HR system]
Employee description (root)
    Benefits (first-level child)
        Pension (second-level child)
        LIC (second-level child)
    Compensation (first-level child)
        Salary history and performance ratings (second-level child)

Cont
The structure allows representing information using parent/child relationships: each parent can have many children, but each child has only one parent (also known as a 1-to-many relationship). All attributes of a specific record are listed under an entity type. In a database an entity type is the equivalent of a table. Each individual record is represented as a row, and each attribute as a column.

Cont.
A user accesses data within this model by starting at the root table and working down through the tree to the target data. This access method requires the user to be very familiar with the structure of the database. Users can retrieve data very quickly because there are explicit links between the table structures. A problem occurs when a user needs to store a record in a child table that is currently unrelated to any record in a parent table. The model cannot support complex relationships, and redundant data is often a problem.
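Top-down access can be sketched as a tree search that always starts at the root. The `Segment` class is hypothetical, and the placement of the second-level segments under Compensation and Benefits is an assumption based on the HR example.

```python
class Segment:
    """One node of the hierarchy; each child has exactly one parent."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


# Assumed layout of the HR example: root, two first-level children, and their children.
employee = Segment("Employee", [
    Segment("Compensation", [Segment("Salary history"), Segment("Performance ratings")]),
    Segment("Benefits", [Segment("Pension"), Segment("LIC")]),
])


def find_path(segment, target):
    """Search depth-first from this segment; return the path of names, or None."""
    if segment.name == target:
        return [segment.name]
    for child in segment.children:
        sub_path = find_path(child, target)
        if sub_path is not None:
            return [segment.name] + sub_path
    return None
```

Every lookup has to pass through the parent chain, which is why a child record with no parent has no place in the structure.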

Network database model


It interconnects the entities of an organisation into a network.
This method is popularly known as the Bachman diagram and was suggested by C W Bachman.
It shows the owner-member relation.
It supports many-to-many relationships.
A search can start from anywhere and proceed in any direction.

Contd.....
One student may be enrolled in many courses. Conversely, a course may have many students.

Cont.

Cont.
The structure of a network database is represented in terms of nodes and set structures. A node represents a collection of records, and a set structure establishes and represents a relationship in a network database. It is a transparent construction that relates a pair of nodes together by using one node as an owner and the other node as a member. One or more sets (connections) can be defined between a specific pair of nodes, and a single node can also be involved in other sets with other nodes in the database.

Cont
Users can access data from within the network database, starting from any node and working backward or forward through related sets. It supports faster and more complex data access than the hierarchical database provides. A user has to be very familiar with the structure of the database in order to work through the set structures. It is not easy to change the database structure without affecting the application programs that interact with it.
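The student and course example can be sketched with two linked lookup tables, one per direction of the relationship. This is a loose in-memory analogy rather than a real network DBMS; the enrollment data and the names `courses_of` and `students_in` are invented.

```python
from collections import defaultdict

# Hypothetical enrollments: a many-to-many relationship.
enrollments = [("Asha", "Databases"), ("Asha", "Networks"), ("Ravi", "Databases")]

courses_of = defaultdict(set)   # student -> courses (navigate forward)
students_in = defaultdict(set)  # course -> students (navigate backward)
for student, course in enrollments:
    courses_of[student].add(course)
    students_in[course].add(student)


def classmates(student):
    """Start at a student node, move to its courses, then back to other students."""
    found = set()
    for course in courses_of[student]:
        found |= students_in[course]
    found.discard(student)
    return found
```

Navigation can begin at either kind of node, but the program has to know which links exist in order to follow them.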

Relational database Model


It is the most recent database model and it overcomes the limitations of the other two models. In the relational database model (RDBM), the concept of a two-dimensional table is used to show relations. The RDBM uses the theory of relational algebra to represent data in various tables. Tables may be referred to as files.

Contd...
It combines relational tables to provide the user with more information than is available in individual tables. It is flexible and can answer ad hoc queries. In a relational database, three basic operations are used to develop useful sets of data: select, project, and join.

Contd...
The select operation creates a subset consisting of all records in the file that meet stated criteria. The join operation combines relational tables to provide the user with more information than is available in individual tables. The project operation creates a subset consisting of columns in a table, permitting the user to create new tables (also called views) that contain only the information required.

The select, project, and join operations enable data from two different tables to be combined and only selected attributes to be displayed.
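The three operations can be sketched over plain Python lists of dictionaries, with each dictionary standing for one row. The supplier and part tables, their columns, and the helper names are made up for the illustration.

```python
suppliers = [
    {"supplier_id": 1, "supplier_name": "Acme"},
    {"supplier_id": 2, "supplier_name": "Bolt Co"},
]
parts = [
    {"part_id": 137, "part_name": "Door latch", "supplier_id": 1},
    {"part_id": 150, "part_name": "Door seal", "supplier_id": 2},
    {"part_id": 152, "part_name": "Compressor", "supplier_id": 1},
]


def select(rows, predicate):
    """Keep only the rows that meet the stated criteria."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Keep only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]


def join(left, right, key):
    """Combine rows from two tables that share the same value in the key column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


wanted_parts = select(parts, lambda part: part["part_id"] in (137, 152))
result = project(join(wanted_parts, suppliers, "supplier_id"), ["part_name", "supplier_name"])
```

The join works only because both tables share the `supplier_id` column.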

Contd...
Leading mainframe relational database management systems include IBM's DB2, Microsoft's SQL Server, and Oracle from the Oracle Corporation. DB2, Oracle, and Microsoft SQL Server are used as DBMSs for midrange computers.

Each table is a relation, each row is a tuple representing a record, and each column is an attribute representing a field. These relations can easily be combined and extracted to access data and produce reports, provided that any two share a common element.

Database       Processing efficiency   Flexibility   End-user friendliness   Programming complexity
Hierarchical   High                    Low           Low                     High
Network        Medium-High             Low-Medium    Low-moderate            High
Relational     Lower but improving     High          High                    Low

Differences between three models


Degree of data independence
    NDBM: Low
    HDBM: Low
    RDBM: High

Simplicity from the user's point of view
    NDBM: Not so simple
    HDBM: Have to know the tree structure of the database
    RDBM: Very simple

Requests for information
    NDBM: Complex, procedural
    HDBM: Have to be procedural, in line with the tree structure
    RDBM: No dependency between relations, so can be nonprocedural

Advantages of DBMS
The complexity of an organization's information system is reduced by central management of data, access, utilization, and security.
Data redundancy and inconsistency can be reduced by eliminating all of the isolated files in which the same data elements are repeated.
Data confusion can be eliminated by providing central control of data creation and definition.

Contd...
Program-data dependence can be reduced by separating the logical view of the data from its physical arrangement.
Program development and maintenance costs can be reduced substantially.
Access and availability of information can be increased.

Data Warehouse
In computing, a data warehouse (DW) is a storage facility used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. A data warehouse maintains its functions in three layers:
1. Staging is used to store raw data for use by developers.
2. The integration layer is used to integrate data and to have a level of abstraction from users.
3. The access layer is for getting data out for users.

Figure illustrates the components of a data warehouse

Data Warehouses
Decision makers need concise, reliable information about current operations, trends, and changes. Data often is fragmented in separate operational systems, such as sales or payroll, so that different managers make decisions from incomplete knowledge bases. Users and information systems specialists may have to spend inordinate amounts of time locating and gathering data. Data warehousing addresses this problem by integrating key operational data from various sources around the company and presenting it in a form that is consistent, reliable, and easily available.

Contd...
A data warehouse is a facility that stores current and historical data of potential interest to managers throughout the company. The data originate in many core operational systems and external sources, including Web site transactions, each with different data models. They may include legacy systems, relational or object-oriented DBMS applications, and systems based on HTML or XML documents.

Cont
The data from these diverse applications are copied into the data warehouse database as often as needed: hourly, daily, weekly, or monthly. The data are standardized into a common data model and consolidated so that they can be used across the enterprise for management analysis and decision making. The data are available for anyone to access as needed but cannot be altered.
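Standardizing data into a common model can be sketched as one transform function per source system. The two source formats, their field names, and the target fields below are all hypothetical.

```python
# Two operational systems that describe the same kind of fact differently.
sales_rows = [{"cust": "c-1", "amt": "120.50"}]            # amount as text
web_rows = [{"customer_id": "C-2", "total_cents": 9900}]   # amount in cents


def from_sales(row):
    """Map a sales-system row onto the common warehouse model."""
    return {"customer_id": row["cust"].upper(), "amount": float(row["amt"]), "source": "sales"}


def from_web(row):
    """Map a web-site row onto the common warehouse model."""
    return {
        "customer_id": row["customer_id"].upper(),
        "amount": row["total_cents"] / 100,
        "source": "web",
    }


def load(warehouse, rows, transform):
    """Copy rows from one source into the warehouse after standardizing them."""
    warehouse.extend(transform(row) for row in rows)


warehouse = []
load(warehouse, sales_rows, from_sales)
load(warehouse, web_rows, from_web)
```

After loading, every row has the same fields and units, whichever system it came from.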

Cont..
The data warehouse must be carefully designed by both business and technical specialists to make sure it can provide the right information for critical business decisions. The firm may need to change its business processes to benefit from the information in the warehouse.

Data Warehousing architecture


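[Figure: data warehousing architecture. Data sources (operational databases, external sources) are extracted, transformed, loaded, and refreshed into the data warehouse (reconciled data), which is supported by monitoring and administration and a metadata repository. The warehouse and data marts serve OLAP servers and tools for analysis, query/reporting, and data mining.]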

Data Warehousing Tools


Data warehouse: SQL Server 2000 DTS, Oracle 8i Warehouse Builder
Online analytical processing (OLAP) tools: SQL Server Analysis Services, Oracle Express Server
Reporting tools: MS Excel Pivot Chart, VB applications

Benefits of Data Warehouses


Data warehouses provide improved information and make it easier for decision makers to obtain this information. Database technology plays an important role in making an organization's information resources available on the World Wide Web. The benefits include the ability to model and remodel the data.

Data Marts
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse which is usually oriented to a specific business line or team. A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized.

Datamarts
Companies can build enterprise-wide data warehouses where a central data warehouse serves the entire organization, or they can create smaller, decentralized warehouses called data marts. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users.

Example
A company might develop marketing and sales data marts to deal with customer information. A data mart typically focuses on a single subject area or line of business, so it usually can be constructed more rapidly and at lower cost than an enterprise-wide data warehouse. However, complexity, costs, and management problems will rise if an organization creates too many data marts.

Reasons for building Data Marts


Easy access to frequently needed data. Creates collective view by a group of users. Improves end-user response time. Ease of creation. Lower cost than implementing a full data warehouse. Potential users are more clearly defined than in a full data warehouse.

DATA MINING
A data warehouse system provides a range of ad hoc and standardized query tools, analytical tools, and graphical reporting facilities, including tools for data mining.
Data mining means inferring new information from already collected data.
This was traditionally the job of data analysts; computers have changed this.
It is far more efficient to comb through data using a machine than to eyeball statistical data.

Contd...
Wikipedia definition: Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.
Knowledge discovery: Concrete information gleaned from known data. This is data you may not have known, but which is supported by recorded facts.

Contd.....
Knowledge prediction: Uses known data to forecast future trends, events, etc. (e.g., stock market predictions).
Data mining uses a variety of techniques to find hidden patterns and relationships in large pools of data and infer rules from them that can be used to predict future behavior and guide decision making.

Contd.....
Data mining provides information for targeted marketing in which personalized or individualized messages can be created based on individual preferences. There are many data-mining applications in both business and scientific work. Data-mining applications can perform high-level analyses of patterns or trends, but they can also drill into more detail where needed.

Cont
Data mining is both a powerful and profitable tool, but it poses challenges to the protection of individual privacy. Data-mining technology can combine information from many diverse sources to create a detailed data image about each of us: our income, our driving habits, our hobbies, our families, and our political interests.

Datamining Functions
Extract, transform, and load transaction data onto the data warehouse system. Store and manage the data in a multidimensional database system. Provide data access to business analysts and information technology professionals. Analyze the data by application software. Present the data in a useful format, such as a graph or table.

Data mining tasks


Data mining involves six common classes of tasks:
Anomaly detection (outlier/change/deviation detection): The identification of unusual data records that might be interesting, or data errors that require further investigation.
Association rule learning (dependency modeling): Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Cont
Clustering: The task of discovering groups and structures in the data that are in some way or another "similar".
Classification: The task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam.
Regression: Attempts to find a function which models the data with the least error.
Summarization: Providing a more compact representation of the data set, including visualization and report generation.
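The supermarket example under association rule learning can be sketched by counting how often pairs of products appear in the same basket. The baskets are invented, and `support` and `confidence` are the usual textbook measures rather than anything defined in these slides; real tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of products per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count every unordered pair of products bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = sum(1 for basket in baskets if antecedent in basket)
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / with_antecedent if with_antecedent else 0.0
```

In this toy data, bread and butter appear together in three of four baskets, the kind of pattern a supermarket might use for shelf placement or promotions.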
