
A parallel database system seeks to improve performance by parallelizing operations such as loading data, building indexes, and evaluating queries. Although data may be stored in a distributed fashion, the distribution is governed solely by performance considerations. Parallel databases improve processing and input/output speeds by using multiple CPUs and disks in parallel; centralized and client-server database systems are not powerful enough to handle such workloads. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially. Parallel database architectures can be roughly divided into two groups. The first group is the multiprocessor architectures, whose alternatives are the following:

o Shared-memory architecture, where multiple processors share the main memory space as well as mass storage (e.g., hard disk drives).
o Shared-disk architecture, where each node has its own main memory, but all nodes share mass storage, usually a storage area network. In practice, each node usually also has multiple processors.
o Shared-nothing architecture, where each node has its own mass storage as well as its own main memory.

The other architecture group is called hybrid architecture, which includes:


o Non-Uniform Memory Architecture (NUMA), in which memory access time depends on the location of the memory relative to the processor.
o Cluster (shared nothing + shared disk: SAN/NAS), formed by a group of connected computers.

o Data can be partitioned across multiple disks for parallel I/O.
o Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel: data can be partitioned, and each processor can work independently on its own partition.
o Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes parallelization easier.
o Different queries can be run in parallel with each other; concurrency control takes care of conflicts.
o Thus, databases naturally lend themselves to parallelism.

I/O Parallelism
o Reduce the time required to retrieve relations from disk by partitioning the relations across multiple disks.
o Horizontal partitioning: the tuples of a relation are divided among many disks so that each tuple resides on one disk.
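The intra-operation parallelism described above can be sketched in a few lines (a minimal illustration with hypothetical helper names; a real DBMS does this inside its execution engine): each worker aggregates only its own partition, and the partial results are then merged.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Each worker aggregates only its own partition of (key, amount) tuples."""
    return sum(amount for _, amount in partition)

def parallel_sum(partitions):
    """Run one aggregation per partition in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return sum(pool.map(partial_sum, partitions))

# Three partitions of a relation, aggregated independently and merged:
parts = [[(1, 10), (2, 20)], [(3, 30)], [(4, 40)]]
parallel_sum(parts)  # 100
```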


A parallel database system (PDBS) is a DBMS implemented on a parallel computer, which is made of a number of nodes (processors and memories) connected by a fast network within a cabinet. It strives to exploit modern multiprocessor architectures using software-oriented solutions for data management.

OBJECTIVE
Problems of conventional DBMS:
o High disk access time.
o Very large databases cannot be supported within a single system.
A PDBS is the only viable solution for increasing I/O bandwidth through parallelism and for storing huge databases in a single system.

ADVANTAGES OF PDBS
o High Performance: increased throughput (inter-query parallelism) and decreased response time (intra-query parallelism).
o High Availability: using data replication.
o Extensibility: linear scaleup and linear speedup.

PARALLEL DBMS ARCHITECTURE
o Shared Memory. Advantages: simplicity, load balancing. Problems: cost, limited extensibility, low availability.
o Shared Disk. Advantages: cost, extensibility, load balancing, availability. Problems: higher complexity, potential coherence problems.
o Shared Nothing. Advantages: cost, extensibility, availability. Problems: complexity; adding new nodes requires reorganizing the database.

PARALLEL DBMS TECHNIQUES
DATA ALLOCATION
Methods that spread the database across the system's disks to ensure efficient parallel I/O.
Partitioning (fragmentation), 3 strategies:
# Round Robin: the i-th tuple goes to partition (i mod n), for n partitions.
# Hashing: apply a hash function to some attribute to give the partition number.
# Range Partitioning: distribute tuples based on value ranges of some attribute.

USES OF DATA FRAGMENTATION
o Maximize system performance.
o Minimize response time (through intra-query parallelism).
o Maximize throughput (through inter-query parallelism).
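The three partitioning strategies can be sketched as follows (a minimal sketch with hypothetical function names; real systems apply these rules inside the storage engine):

```python
def round_robin(tuples, n):
    """Round Robin: the i-th tuple goes to partition (i mod n)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    """Hashing: apply a hash function to a partitioning attribute."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, bounds, key):
    """Range Partitioning: distribute tuples by value ranges of an attribute.
    bounds = [10, 20] gives partitions (-inf,10), [10,20), [20,inf)."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for t in tuples:
        k = key(t)
        for i, b in enumerate(bounds):
            if k < b:
                parts[i].append(t)
                break
        else:
            parts[-1].append(t)
    return parts
```

Round robin balances load perfectly but supports no directed lookups; hashing supports exact-match queries on the partitioning attribute; range partitioning additionally supports range queries, at the risk of skewed partitions.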

Reference: http://www.seminarprojects.com/Thread-parallel-databasesystems

A variety of hardware architectures allow multiple computers to share access to data, software, or peripheral devices. A parallel database is designed to take advantage of such architectures by running multiple instances that "share" a single physical database. In appropriate applications, a parallel server can allow users on multiple machines to access a single database with increased performance. A parallel server processes transactions in parallel by servicing a stream of transactions using multiple CPUs on different nodes, where each CPU processes an entire transaction. Using a parallel data manipulation language, one transaction can be performed by multiple nodes. This is an efficient approach, because many applications consist of online insert and update transactions, which tend to have short data access requirements. In addition to balancing the workload among CPUs, the parallel database provides for concurrent access to data and protects data integrity.

Speedup is the extent to which more hardware can perform the same task in less time than the original system. With added hardware, speedup holds the task constant and measures the time savings.

Figure 1-5 Speedup

With good speedup, additional processors reduce system response time. You can measure speedup using this formula:

Speedup = Time_Original / Time_Parallel

where Time_Original is the elapsed time spent by the original, smaller system on the given task and Time_Parallel is the elapsed time spent by the larger, parallel system on the same task.

For example, if the original system took 60 seconds to perform a task, and the parallel system took 30 seconds, then the value of speedup would equal 2. A value of n, where n times more hardware is used, indicates the ideal of linear speedup: twice as much hardware performs the same task in half the time (three times as much hardware in a third of the time, and so on).

Scaleup

Scaleup is the factor that expresses how much more work can be done in the same time period by a system n times larger. With added hardware, the scaleup formula holds the time constant and measures the increased size of the job that can be done.

Figure 1-6 Scaleup

With good scaleup, if transaction volumes grow, you can keep response time constant by adding hardware resources such as CPUs. You can measure scaleup using this formula:

Scaleup = Volume_Parallel / Volume_Original

where Volume_Original is the transaction volume processed in a given amount of time on the original system and Volume_Parallel is the transaction volume processed in the same amount of time on the parallel system.

For example, if the original system can process 100 transactions in a given amount of time, and the parallel system can process 200 transactions in this amount of time, then the value of scaleup would be equal to 2. That is, 200/100 = 2. A value of 2 indicates the ideal of linear scaleup: when twice as much hardware can process twice the data volume in the same amount of time.
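The two metrics differ only in what they hold constant, which a few lines make explicit (using the examples from the text; the function names are illustrative, not part of any API):

```python
def speedup(time_original, time_parallel):
    """Speedup holds the task constant: Time_Original / Time_Parallel."""
    return time_original / time_parallel

def scaleup(volume_parallel, volume_original):
    """Scaleup holds the time constant: Volume_Parallel / Volume_Original."""
    return volume_parallel / volume_original

# The examples from the text:
speedup(60, 30)    # 60 s task done in 30 s on the parallel system -> 2.0
scaleup(200, 100)  # 200 vs. 100 transactions in the same time     -> 2.0
```

Linear speedup or scaleup means these ratios equal n when the parallel system has n times the hardware.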

Messaging

Parallel processing requires fast and efficient communication between nodes: a system with high bandwidth and low latency that communicates efficiently with the IDLM. Bandwidth is the total size of messages that can be sent per second. Latency is the time (in seconds) it takes to place a message on the interconnect; the lower the latency, the more messages can be put on the interconnect per second. An interconnect with high bandwidth is like a wide highway with many lanes to accommodate heavy traffic: the number of lanes affects the speed at which traffic can move. An interconnect with low latency is like a highway with an entrance ramp that lets vehicles enter without delay: the cost of getting on the highway is low.

Advantages:
Enhanced Throughput: Scaleup

If tasks can run independently of one another, they can be distributed to different CPUs or nodes, and there will be a scaleup: more processes will be able to run through the database in the same amount of time. If processes can run ten times faster, then the system can accomplish ten times more in the original amount of time. The parallel query feature, for example, permits scaleup: a system might maintain the same response time if the data queried increases tenfold, or if more users can be served. Oracle Parallel Server without the parallel query feature also permits scaleup, but by running the same query sequentially on different nodes.
Greater Flexibility

An Oracle Parallel Server environment is extremely flexible. Instances can be allocated or deallocated as necessary. When there is high demand for the database, more instances can be temporarily allocated. The instances can be deallocated and used for other purposes once they are no longer necessary.
More Users

Parallel database technology can make it possible to overcome memory limits, enabling a single system to serve thousands of users.
