
CLUSTER COMPUTING

A. Vijaya Vysali

Abstract


Cluster computing is not a new area of computing. It is, however, evident that there is growing interest in its use in all areas where applications have traditionally relied on parallel or distributed computing platforms. This mounting interest has been fuelled in part by the availability of powerful microprocessors and high-speed networks as off-the-shelf commodity components, and in part by the rapidly maturing software components available to support high-performance and high-availability applications. Cluster computing can be defined as the use of a set of interconnected processors or computers that cooperate to perform a specific task. This paper gives a short introduction to cluster computing; its scope also covers cluster architecture, various applications, and the advantages and disadvantages of the approach. With cluster computing, the sub-parts of a larger problem usually run on a single processor for a long period of time without reference to the other sub-parts, which means that slow communication among nodes is not a major problem. Experts in the field often refer to these types of problems as CPU-bound. Cluster computing has become a major part of many research programs because the price-to-performance ratio of commodity clusters is very good. Also, because the nodes in a cluster are clones, there is no single point of failure, which enhances the reliability of the cluster. Of course, these benefits can only be realized if the problems you are attempting to solve can be easily parallelized. Increasingly, computer clusters are being combined with large shared-memory systems, such as those found in supercomputing architectures. By doing so, scientists who work on problems that have both capability and capacity elements can take advantage of the inherent strengths of both designs.

Introduction
Not all difficult problems require access to a single shared memory resource. Some problems can easily be broken into many smaller independent parts. Computer scientists often refer to this class of problems as "embarrassingly parallel", or as capacity problems. Many of the computers that we typically employ on a day-to-day basis for word processing or game playing are well equipped to solve the smaller components of capacity problems. In practice, clusters are usually composed of many commodity computers, linked together by a high-speed dedicated network.
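The split-into-independent-parts idea can be sketched in a few lines of Python. The prime-counting task and the chunk size below are illustrative stand-ins for a real capacity workload; each worker process plays the role of a cluster node, and no worker ever needs data from another:

```python
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) -- one independent sub-task."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split the range 0..100000 into independent chunks; because each
    # chunk needs no data from the others, the problem is
    # "embarrassingly parallel" and scales by simply adding workers.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_primes, chunks)
    print(sum(partial_counts))  # total number of primes below 100000
```

The same structure carries over to a cluster: replace the process pool with distinct machines and the `map` call with whatever job-distribution mechanism the cluster provides.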

Need for cluster computing


The needs and expectations of modern-day applications are changing, in the sense that they not only need computing resources (be they processing power, memory, or disk space), but also the ability to remain available to service user requests almost constantly, 24 hours a day and 365 days a year. These needs and expectations of today's applications result in challenging research and development efforts in both computer hardware and software. It seems that as applications evolve they inevitably consume more and more computing resources. To some extent we can overcome these limitations: for example, we can create faster processors and install larger memories. But future improvements are constrained by a number of factors, including physical ones, such as the speed of light and the constraints imposed by various thermodynamic laws, as well as financial ones, such as the huge investment needed to fabricate new processors and integrated circuits. The obvious solution to overcoming these problems is to connect multiple processors and systems together and coordinate their efforts. The resulting systems are popularly known as parallel computers, and they allow the sharing of a computational task among multiple processors. Parallel supercomputers have been in the mainstream of high-performance computing for the last ten years. However, their popularity is waning. The reasons for this decline are many, but include being expensive to purchase and run, potentially difficult to program, slow to evolve in the face of emerging hardware technologies, and difficult to upgrade without, generally, replacing the whole system. The decline of the dedicated parallel supercomputer has been compounded by the emergence of commodity off-the-shelf clusters of PCs and workstations. The idea of the cluster is not new, but certain recent technical capabilities, particularly in the area of networking, have brought this class of machine to the vanguard as a platform to run all types of parallel and distributed applications. The emergence of cluster platforms was driven by a number of academic projects, such as Beowulf, Berkeley and HPVM. These projects helped to prove the advantage of clusters over other traditional platforms. These advantages included low entry costs to access supercomputing-level performance, the ability to track technologies, an incrementally upgradeable system, an open-source development platform, and not being locked into particular vendor products. Today, the overwhelming price/performance advantage of this type of platform over proprietary ones, as well as the other key benefits mentioned earlier, means that clusters have infiltrated not only the traditional science and engineering marketplaces for research and development, but also the huge commercial marketplaces of commerce and industry.

What distinguishes this configuration from the heavy-hitting, top-dollar supercomputers is that each node within a cluster is an independent system, with its own operating system, private memory and, in some cases, its own file system. Because the processors on one node cannot directly access the memory on the other nodes, programs run on clusters usually employ a procedure called "message passing" to get data and execution code from one node to another. Compared to the shared-memory systems of supercomputers, passing messages is very slow.
It should be noted that this class of machine is not only being used for high-performance computation, but increasingly as a platform to provide highly available services, for applications such as Web and database servers. A cluster is a type of parallel or distributed computer system, consisting of a collection of interconnected stand-alone computers working together as a single integrated computing resource. The key components of a cluster include multiple stand-alone computers (PCs, workstations, or SMPs), operating systems, a high-performance interconnect, communication software, middleware, and applications.

Cluster benefits
The main benefits of clusters are scalability, availability, and performance. For scalability, a cluster uses the combined processing power of its compute nodes to run cluster-enabled applications, such as a parallel database server, at a higher performance than a single machine can provide. Scaling the cluster's processing power is achieved by simply adding nodes to the cluster. Availability within the cluster is assured because nodes provide backup to each other in the event of a failure. In high-availability clusters, if a node is taken out of service or fails, its load is transferred to another node (or nodes) within the cluster. To the user this operation is transparent, as the applications and data are also available on the failover nodes. An additional benefit comes with the existence of a single system image and the resulting ease of manageability of the cluster. From the user's perspective, the user sees an application resource as the provider of services and applications; the user does not know or care whether this resource is a single server or a cluster, or which node within the cluster is providing the services. These benefits map to the infrastructure needs of today's enterprise business, education, military, and scientific communities. In summary, clusters provide:

- scalable capacity for compute-, data-, and transaction-intensive applications, including support for mixed workloads
- horizontal and vertical scalability without downtime
- the ability to handle unexpected peaks in workload
- central system management of a single system image
- 24 x 7 availability

Types of clusters

There are several types of clusters, each with specific design goals and functionality. These range from distributed or parallel clusters for computation-intensive or data-intensive applications, such as protein, seismic, or nuclear modeling, to simple load-balanced clusters.

High-Availability or Failover Clusters

These clusters are designed to provide uninterrupted availability of data or services (typically web services) to the end-user community. The purpose of these clusters is to ensure that a single instance of an application is only ever running on one cluster member at a time, but if and when that cluster member is no longer available, the application fails over to another cluster member. With a high-availability cluster, nodes can be taken out of service for maintenance or repairs. Additionally, if a node fails, the service can be restored without affecting the availability of the services provided by the cluster. While the application will still be available, there will be a performance drop due to the missing node. High-availability cluster implementations are best for mission-critical applications or databases, mail, file and print, web, or application servers.

fig: High availability clusters

Unlike distributed or parallel processing clusters, high-availability clusters seamlessly and transparently integrate existing stand-alone, non-cluster-aware applications together into a single virtual machine, which is necessary to allow the network to grow effortlessly to meet increased business demands. High-availability clusters (also known as failover clusters, or HA clusters) improve the availability of the cluster approach. They operate by having redundant nodes, which are then used to provide service when system components fail. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure. There are commercial implementations of high-availability clusters for many operating systems. The Linux-HA project is one commonly used free-software HA package for the Linux operating system.

Cluster-Aware and Cluster-Unaware Applications

Cluster-aware applications are designed specifically for use in a clustered environment. They know about the existence of other nodes and are able to communicate with them. A clustered database is one example of such an application: instances of the clustered database run on different nodes and have to notify the other instances if they need to lock or modify some data. Cluster-unaware applications do not know whether they are running in a cluster or on a single node. The existence of a cluster is completely transparent to such applications, and some additional software is usually needed to set up the cluster. A web server is a typical cluster-unaware application: all servers in the cluster have the same content, and the client does not care which server provides the requested content.

Load Balancing Cluster

This type of cluster distributes incoming requests for resources or content among multiple nodes running the same programs or holding the same content (see Figure 2). Every node in the cluster is able to handle requests for the same content or application. If a node fails, requests are redistributed among the remaining available nodes. This type of distribution is typically seen in a web-hosting environment. These clusters are configurations in which the cluster nodes share the computational workload to provide better overall performance. For example, a web server cluster may assign different queries to different nodes, so that the overall response time is optimized. However, approaches to load balancing may differ significantly among applications; e.g. a high-performance cluster used for scientific computations would balance load with different algorithms from a web-server cluster, which may just use a simple round-robin method that assigns each new request to a different node.
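The round-robin-with-failover behaviour described above can be sketched in a few lines of Python. The node names and the in-process simulation are illustrative only; a real load balancer would sit on the network in front of the cluster and use health checks rather than explicit `mark_down` calls:

```python
import itertools

class RoundRobinBalancer:
    """Distributes requests across identical nodes; skips nodes marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def route(self, request):
        # Try each node at most once; fail only if the whole cluster is down.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.down:
                return f"{node} serves {request}"
        raise RuntimeError("no available nodes")

balancer = RoundRobinBalancer(["web1", "web2", "web3"])
print(balancer.route("GET /index.html"))   # web1 serves GET /index.html
balancer.mark_down("web2")                 # simulate a node failure
print(balancer.route("GET /style.css"))    # web2 is skipped transparently
```

Because every node holds the same content, the client never notices which node answered, which is exactly what makes this scheme transparent.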

A parallel cluster is a system that uses a number of nodes to simultaneously solve a specific computational or data-mining task. Unlike load-balancing or high-availability clusters, which distribute requests or tasks to nodes where each node processes the entire request, a parallel environment divides the request into multiple sub-tasks that are distributed to multiple nodes within the cluster for processing. Parallel clusters are typically used for CPU-intensive analytical applications, such as mathematical computation, scientific analysis (weather forecasting, seismic analysis, etc.), and financial data analysis. One of the more common classes of cluster is the Beowulf cluster. A Beowulf cluster can be defined as a number of systems whose collective processing capabilities are simultaneously applied to a specific technical, scientific, or business application. Each individual computer is referred to as a "node", and each node communicates with the other nodes within the cluster across standard Ethernet technologies (10/100 Mbps, GigE, or 10GbE). Other high-speed interconnects, such as Myrinet, Infiniband, or Quadrics, may also be used.

Cluster components
Both the high-availability and load-balancing cluster technologies can be combined to increase the reliability, availability, and scalability of application and data resources that are widely deployed for web, mail, news, or FTP services. The basic building blocks of clusters fall into multiple categories: the cluster nodes, the cluster operating system, the network switching hardware, and the node/switch interconnect. Significant advances have been made over the past five years to improve the performance of both the compute nodes and the underlying switching infrastructure.

Parallel/Distributed Processing Clusters


Traditionally, parallel processing was performed by multiple processors in a specially designed parallel computer. These are systems in which multiple processors share a single memory and bus interface within a single computer. With the advent of high-speed, low-latency switching technology, computers can be interconnected to form a parallel-processing cluster. These types of clusters increase availability, performance, and scalability for applications, particularly computationally or data-intensive tasks.

Cluster Nodes
Node technology has migrated from conventional tower cases to single-rack-unit multiprocessor systems and blade servers that provide a much higher processor density within a smaller footprint. Processor speeds and server architectures have increased in performance, and solutions now provide options for either 32-bit or 64-bit processor systems. Additionally, memory performance, as well as hard-disk access speeds and storage capacities, have also increased. It is interesting to note that even though performance is growing exponentially in some cases, the cost of these technologies has dropped considerably. As shown in Figure 3 below, node participation in the cluster falls into one of two responsibilities: the master (or head) node and the compute (or slave) nodes. The master node is the unique server in a cluster system. It is responsible for running the file system and also serves as the key system for the clustering middleware to route processes and duties, and to monitor the health and status of each slave node. A compute (or slave) node within a cluster provides the cluster with computing and data-storage capability. These nodes are derived from fully operational, stand-alone computers that are typically marketed as desktop or server systems and, as such, are off-the-shelf commodity systems.

fig: Cluster nodes

Cluster Network

Commodity cluster solutions are viable today due to a number of factors, such as high-performance commodity servers and the availability of high-speed, low-latency network switch technologies that provide the inter-nodal communications. Commodity clusters typically incorporate one or more dedicated switches to support communication between the cluster nodes. The speed and type of node interconnect vary based on the requirements of the application and organization.

Ethernet, Fast Ethernet, Gigabit Ethernet and 10-Gigabit Ethernet

Ethernet is the most widely used interconnect technology for local area networking (LAN). Ethernet as a technology supports speeds varying from 10 Mbps to 10 Gbps, and it is successfully deployed and operational within many high-performance cluster computing environments. With today's low cost per port for Gigabit Ethernet switches, the adoption of 10-Gigabit Ethernet, and the standardization of 10/100/1000 network interfaces on node hardware, Ethernet continues to be a leading interconnect technology for many clusters. In addition to Ethernet, alternative network or interconnect technologies include Myrinet, Quadrics, and Infiniband, which support bandwidths above 1 Gbps and end-to-end message latencies below 10 microseconds (uSec).

Network Characterization

There are two primary characteristics establishing the operational properties of a network: bandwidth and delay. Bandwidth is measured in millions of bits per second (Mbps) or billions of bits per second (Gbps). Peak bandwidth is the maximum amount of data that can be transferred in a single unit of time through a single connection. Bi-section bandwidth is the total peak bandwidth that can be passed across a single switch. Latency is measured in microseconds (uSec) or milliseconds (mSec) and is the time it takes to move a single packet of information in one port and out of another. For parallel clusters, latency is measured as the time it takes for a message to be passed from one processor to another, including the latency of the interconnecting switch or switches. The actual latencies observed will vary widely, even on a single switch, depending on characteristics such as packet size, switch architecture (centralized versus distributed), queuing, buffer depths and allocations, and protocol processing at the nodes.
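The interplay of bandwidth and latency can be captured by a simple first-order cost model, time = latency + size / bandwidth. The figures used below are rough illustrative values, not measurements of any particular switch or adapter:

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbps):
    """First-order model of message transfer time in microseconds."""
    bits = message_bytes * 8
    # 1 Gbps = 10**9 bits/s = 1000 bits per microsecond
    return latency_us + bits / (bandwidth_gbps * 1_000)

# An 8 KB message over two hypothetical interconnects: a Gigabit
# Ethernet path (~50 us latency) and a faster, lower-latency fabric
# (~10 us latency, ~2 Gbps). The numbers are illustrative only.
slow = transfer_time_us(8192, latency_us=50.0, bandwidth_gbps=1.0)
fast = transfer_time_us(8192, latency_us=10.0, bandwidth_gbps=2.0)
print(f"slow path: {slow:.1f} us")
print(f"fast path: {fast:.1f} us")
```

The model makes the trade-off visible: for tiny messages the latency term dominates, while for bulk transfers the bandwidth term dominates, which is why latency-sensitive parallel codes benefit from fabrics like Myrinet even when their peak bandwidth is comparable to Ethernet's.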

Cluster Network Hardware

Commodity clusters are made possible only by the availability of adequate inter-node communication network technology. Interconnect networks enable data packets to be transferred between logical elements distributed among a set of separate processor nodes within a cluster through a combination of hardware and software support. Commodity clusters incorporate one or more dedicated networks to support message-packet communication within the distributed system. This distinguishes them from ensembles of stand-alone systems loosely connected by shared local area networks (LANs) that are employed primarily as desktop and server systems. Such computing environments have been successfully employed to perform combined computations using available unused resources; these practices are referred to as cycle harvesting or workstation farms, and they share the intercommunication network with external systems and services not directly related to the coordinated multi-node computation. In comparison, the commodity cluster's system area network (SAN) is committed to the support of such distributed computation on the cluster, employing separate external networks for interaction with environment services.

Parallel applications exhibit a wide range of communication behaviors and impose diverse requirements on the underlying communication framework. Some problems require the high bandwidth and low latency found only in tightly coupled massively parallel processing (MPP) systems and may not be well suited to commodity clusters. Other application classes, referred to as embarrassingly parallel, not only perform effectively on commodity clusters but may not even fully consume the networking resources provided. Many algorithms fall between these two extremes. As network performance attributes improve, the range of problems that can be effectively handled also expands.
The two primary characteristics establishing the operational properties of a network are the bandwidth, measured in millions of bits per second (Mbps), and the latency, measured in microseconds. Peak bandwidth is the maximum amount of information that can be transferred in unit time through a single channel. Bi-section bandwidth is the total peak bandwidth that can be passed across a system. Latency is the amount of time taken to pass a packet from one physical or logical element to another. The actual measured values of these parameters will vary widely, even on a single system, depending on such secondary effects as packet size, traffic contention, and software overhead.

Early PC clusters such as Beowulf-class systems employed existing LAN Ethernet technology, the cost of which had improved to the point that low-cost commodity clusters were feasible. However, their 10 Mbps peak bandwidth was barely adequate for all but the most loosely coupled applications, and some systems ganged multiple Ethernet networks together in parallel through a software technique called channel bonding, which made the multiplicity of available channels at a node transparent to the application code while delivering significantly higher bandwidth than a single channel could provide. With the emergence of Fast Ethernet, exhibiting 100 Mbps peak bandwidth, and the availability of low-cost hubs and moderate-cost switches, commodity clusters became practical and were found useful for an increasing range of applications for which latency-tolerant algorithms could be devised. Hubs provide a shared communications backplane that provides connectivity between any two nodes at a time. While relatively simple and inexpensive, hubs limit the amount of communication within a cluster to one transaction at a time. The more expensive switches permit simultaneous transactions between disjoint pairs of nodes, thus greatly increasing the potential system throughput and reducing network contention.

While local area network technology provided an incremental path to the realization of low-cost commodity clusters, the opportunity for the development of networks optimized for this domain was recognized. Among the most widely used is Myrinet, with its custom network control processor that provides peak bandwidth in excess of 1 Gbps at latencies on the order of 20 microseconds.
While more expensive per port than Fast Ethernet, its cost is comparable to that of the more recent Gigabit Ethernet (1 Gbps peak bandwidth) even as it provides superior latency characteristics. Another early system area network is SCI (Scalable Coherent Interface), which was originally designed to support distributed shared memory. Delivering several Gbps of bandwidth, SCI has found service primarily in the European community. More recently, an industrial consortium has developed a new class of network capable of moving data between application processes without requiring the usual intervening copying of the data to the node operating systems. This zero-copy scheme is employed by the VIA (Virtual Interface Architecture) network family, yielding dramatic reductions in latency. One commercial example is cLAN, which provides bandwidth on the order of a Gbps with latencies well below 10 microseconds. Finally, a new industry standard, Infiniband, is emerging that within two years promises to reduce latency even further, approaching one microsecond, while delivering peak bandwidth on the order of 10 Gbps. Infiniband goes further than VIA by removing the I/O interface as a contributing factor in communication latency, connecting directly to the processor's memory channel interface. With these advances, the network infrastructure of the commodity cluster is becoming less dominant as the operational bottleneck. Increasingly, it is the software tools that limit the applicability of commodity clusters, a result of hardware advances that have delivered three orders of magnitude of improvement in interconnect bandwidth within a decade.

Software Components

If low-cost, consumer-grade hardware has catalyzed the proliferation of commodity clusters for both technical and commercial high-performance computing, it is the software that has both enabled their utility and restrained their usability. While the rapid advances in hardware capability have propelled commodity clusters to the forefront of next-generation systems, equally important has been the evolving capability and maturity of the supporting software systems and tools. The result is a total system environment that is converging on that of previous-generation supercomputers and MPPs. And like those predecessors, commodity clusters present opportunities for future research and advanced development in programming tools and resource management software to enhance their applicability, availability, scalability, and usability. However, unlike these earlier system classes, the wide and rapidly growing distribution of clusters is fostering unprecedented work in the field, even as it provides a convergent architecture family in which the applications community can retain confidence. As a result, the software components, so necessary for the success of any type of parallel system, are improving at an accelerated rate and quickly approaching the stable and sophisticated levels of functionality that will establish clusters as the long-term solution to high-end computation. The software components that comprise the environment of a commodity cluster may be described in two major categories: programming tools and resource management system software. Programming tools provide languages, libraries, and performance and correctness debuggers for constructing parallel application programs. Resource management software covers initial installation, administration, and the scheduling and allocation of both hardware and software components as applied to user workloads. A brief summary of these critical components follows.

Application Programming Environments

Harnessing the parallelism within application programs to achieve orders-of-magnitude gains in delivered performance has been a challenge confronting high-end computing since the 1970s, if not before. Vector supercomputers, SIMD array processors, and MPP multiprocessors have exploited varying forms of algorithmic parallelism through a combination of hardware and software mechanisms. The results have been mixed. The highest degrees of performance yet achieved have been through parallel computation, but in many instances the efficiencies observed have been low and the difficulties in achieving them high. Commodity clusters, because of their superior price/performance advantage for a wide range of problems, are now under severe pressure to address the same problems confronted by their predecessors. While multiple programming models have been pursued, one paradigm has emerged as the predominant form, at least for the short term. The communicating sequential processes model, more frequently referred to

as the message-passing model, has evolved through many different implementations, resulting in the MPI (Message Passing Interface) community standard. MPI is now found on virtually every vendor multiprocessor, including SMPs, DSMs, MPPs, and clusters. MPI is not a full language but an augmenting library that allows users of C and Fortran to access routines for passing messages between concurrent processes on separate but interconnected processor nodes. A number of implementations of MPI are available from system vendors, ISVs, and research groups providing open-source versions through free distributions.

To be effective, parallel programming requires more than a set of constructs: there need to be tools and environments for understanding the operational behavior of a program, correcting errors in the computation, and enhancing performance. Such tools for clusters are in their infancy, although significant effort has been made by many teams. One popular debugger, Totalview, has met part of the need and is the product of an ISV, benefiting from years of evolutionary development. A number of performance profilers have been developed, taking many forms; one good example is the set of tools incorporated with the PVM distribution. Work continues, but more research is required and no true community-wide standard has emerged.

While the message-passing model in general, and MPI in particular, dominate parallel programming of commodity clusters, other models and strategies are possible and may prove superior in the long run. Data-parallel programming has been supported through HPF and BSP. Message-driven computation has been supported by Split-C and Fast Messages. Distributed shared-memory programming was pursued by the Shrimp project and through the development of UPC. And for the business community relying on coarse-grained transaction processing, a number of alternative tools have been provided to support a conventional master-slave methodology.

It is clear that today there is an effective and highly portable standard for building and executing parallel programs on clusters, and that work on possible alternative techniques is under way by research groups and may yield improved and more effective models in the future.
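The essence of the message-passing model, each process owning private memory and exchanging data only through explicit sends and receives, can be illustrated without MPI itself. The sketch below uses Python's standard multiprocessing pipes as a stand-in; real cluster codes would use MPI (where `MPI_Send`/`MPI_Recv` play the roles of `send`/`recv` here), and the master/worker split mirrors the master and compute nodes described earlier:

```python
from multiprocessing import Process, Pipe

def worker(conn, rank):
    """Each process has private memory; data moves only via explicit messages."""
    data = conn.recv()              # receive a sub-array from the master
    conn.send((rank, sum(data)))    # send the partial result back
    conn.close()

if __name__ == "__main__":
    numbers = list(range(100))
    conns, procs = [], []
    for rank, chunk in enumerate([numbers[:50], numbers[50:]]):
        parent, child = Pipe()
        p = Process(target=worker, args=(child, rank))
        p.start()
        parent.send(chunk)          # explicit send of the sub-task data
        conns.append(parent)
        procs.append(p)
    # gather the partial sums, analogous to a receive loop on the master
    total = sum(conn.recv()[1] for conn in conns)
    for p in procs:
        p.join()
    print(total)  # 4950, the sum of 0..99
```

Note what is absent: the workers never read each other's memory, which is exactly the constraint imposed by cluster hardware where each node has its own address space.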

Resource Management Software


While a couple of strong standards have been accepted by the community for programming commodity clusters, the same cannot be said for the software environments and tools required to manage the resources, both hardware and software, that execute the programs. However, in the last two to three years substantial progress has been made, with a number of software products available from research groups and ISVs that are beginning to satisfy some of the more important requirements. The areas of support required are diverse and reflect the many challenges of managing cluster resources.

Installation and Configuration: The challenge of build-it-yourself supercomputers lies first with the means of assembling, installing, and configuring the necessary hardware and software components comprising the complete system. The challenge of implementing and maintaining a common software image across nodes, especially for systems comprising hundreds of nodes, requires sophisticated, effective, and easy-to-use tools. Many partial solutions have been developed by various groups and vendors (e.g. Scyld) and are beginning to make possible the quick and easy creation of very large systems with a minimum of effort.

Scheduling and Allocation: The placement of applications onto the distributed resources of commodity clusters requires tools to allocate the software components to the nodes and to schedule the timing of their operation. In its most simple form, the programmer performs this task manually. However, for large systems, especially those shared among multiple users and possibly running multiple programs at one time, more sophisticated means are required for robust and disciplined management of computing resources. Allocation can be performed at different levels of task granularity, including jobs, transactions, processes, or threads. Tasks may be scheduled statically, such that once an assignment is made it is retained until the completion of the task, or dynamically, permitting the system to automatically migrate, suspend, and reinitiate tasks for best system throughput through load balancing. Examples of available schedulers that incorporate some, but not all, of these capabilities include Condor, the Maui Scheduler, and the Cluster Controller.

System Administration: The supervision of industrial-grade systems requires many mundane but essential chores to be performed, including the management of user accounts, job queues, security, backups, mass storage, log journaling, the operator interface, user shells, and other housekeeping activities. Traditionally, high-end computing systems have lacked some or most of these essential support utilities typical of commercial server-class mainstream systems. However, incremental progress has been made in this area, with PBS as a leading example.
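The static allocation policy described above can be sketched as a toy FIFO scheduler. The class and node names are hypothetical; a production scheduler such as Condor or the Maui Scheduler adds priorities, backfill, dynamic migration, and fault handling on top of this basic queue-and-dispatch loop:

```python
from collections import deque

class FifoScheduler:
    """Static allocation: a job, once assigned to a node, stays there until done."""

    def __init__(self, nodes):
        self.free_nodes = deque(nodes)
        self.queue = deque()          # jobs waiting for a free node
        self.running = {}             # job -> node

    def submit(self, job):
        self.queue.append(job)
        self._dispatch()

    def complete(self, job):
        node = self.running.pop(job)
        self.free_nodes.append(node)  # the node becomes free again
        self._dispatch()

    def _dispatch(self):
        # Place queued jobs on free nodes in arrival order (FIFO).
        while self.queue and self.free_nodes:
            job = self.queue.popleft()
            node = self.free_nodes.popleft()
            self.running[job] = node

sched = FifoScheduler(["node1", "node2"])
for j in ["job-a", "job-b", "job-c"]:
    sched.submit(j)
print(sched.running)      # job-a and job-b are placed; job-c waits
sched.complete("job-a")
print(sched.running)      # job-c now runs on the freed node
```

A dynamic scheduler would differ precisely where this one is rigid: it could suspend or migrate a running job to rebalance load, rather than waiting for `complete`.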

Monitoring and Diagnosis

Continued operation of a system comprising a large number of nodes requires low-level tools by which the operator can monitor the state, operation, and health of all system elements. Such tools range from distributed versions of command-line Unix utilities such as ps and top to GUI visual depictions of the total parallel system updated in real time. While a number of such tools have been devised by different groups, no single tool has achieved community-wide dominance.

Distributed Secondary Storage

Almost all computations require access to secondary storage, including both local and remote disk drives, in support of file systems. Many commercial applications and even some technical computations are disk-access intensive. While NFS has continued to serve in many commodity cluster configurations, its limitations in both functionality and performance have led to a new generation of parallel file systems. No one of these fully satisfies the diversity of requirements seen across the wide range of applications. As a result, commercial turnkey applications often incorporate their own proprietary distributed file management software tailored to the specific access patterns of the given application. But for the broader cluster user base, a general-purpose parallel file system is required that is flexible, scalable, and efficient as well as standardized. Examples of parallel file systems that have been applied to the cluster environment include PPFS, PVFS, and GPFS.

Availability

As the scale of commodity clusters increases, the MTBF (mean time between failures) decreases, and active measures must be taken to ensure continued operation and minimal downtime. Surprisingly, many clusters operate for months at a time without any outage and are often taken offline only for software upgrades. But especially during early operation, failures of both hardware (through infant mortality) and software (incorrect installation and configuration) can cause disruptive outages. For some cluster systems dedicated to a single long-running application, continuous execution times of weeks or even months may be required. In all these cases, software tools that enhance system robustness and maximize uptime are becoming increasingly important to the practical exploitation of clusters. Checkpoint and restart tools are crucial to the realization of long-running applications on imperfect systems. Rapid diagnostics, regression testing, fault detection, isolation, hot spares, and maintenance-record analysis require software tools and a common framework to keep systems operational longer, resume operation rapidly in response to failures, and recover partial computations for continued execution in the presence of faults.
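A minimal sketch of the checkpoint-and-restart idea, in Python for illustration: the job periodically saves its state so that, after a node failure, it resumes from the last checkpoint instead of step zero. The file name `state.ckpt` and the trivial accumulator workload are invented; production tools checkpoint whole processes or parallel jobs.

```python
import os
import pickle

CKPT = "state.ckpt"  # hypothetical checkpoint file name

def long_running_job(total_steps, ckpt_every=100):
    """Iterative computation that checkpoints its state to disk so it
    can resume after a failure instead of restarting from step 0."""
    step, acc = 0, 0
    if os.path.exists(CKPT):                   # restart path: reload state
        with open(CKPT, "rb") as f:
            step, acc = pickle.load(f)
    while step < total_steps:
        acc += step                            # stand-in for real work
        step += 1
        if step % ckpt_every == 0:
            # Write to a temp file, then rename: an interrupted write
            # never corrupts the previous good checkpoint.
            with open(CKPT + ".tmp", "wb") as f:
                pickle.dump((step, acc), f)
            os.replace(CKPT + ".tmp", CKPT)
    return acc

result = long_running_job(1000)
```

The write-then-rename step matters: a crash mid-checkpoint must leave the prior checkpoint intact, or the restart tool has nothing valid to resume from.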

CLUSTER APPLICATIONS
Parallel applications exhibit a wide range of communication behaviors and impose various requirements on the underlying network. These may be unique to a specific application or shared by an application category, depending on the requirements of the computational processes. Some problems require the high-bandwidth and low-latency capabilities of today's low-latency, high-throughput switches using 10GbE, InfiniBand, or Myrinet. Other application classes perform effectively on commodity clusters and will not push the bounds of the bandwidth and resources of these same switches. Many applications, and the messaging algorithms they use, fall between these two ends of the spectrum. Currently, there are three primary categories of applications that use parallel clusters: compute intensive, data or input/output (I/O) intensive, and transaction intensive. Each of these has its own set of characteristics and associated network requirements; each has a different impact on the network and is impacted differently by the architectural characteristics of the underlying network. The following subsections describe each application type.

Data or I/O Intensive Applications

Data intensive is a term that applies to any application that places high demands on attached storage facilities. Performance of many of these applications is impacted by the quality of the I/O mechanisms supported by current cluster architectures, the bandwidth available for network-attached storage, and, in some cases, the performance of the underlying network components at both Layer 2 and Layer 3. Data-intensive applications can be found in the areas of data mining, image processing, and genome and protein science. The movement to parallel I/O systems continues in order to improve I/O performance for many of these applications.

Transaction Intensive Applications

Transaction intensive is a term that applies to any application that involves a high level of interactive transactions between an application resource and the cluster resources. Many financial, banking, human-resource, and web-based applications fall into this category.

Compute Intensive Applications


Compute intensive is a term that applies to any computer application that demands many computation cycles (for example, scientific applications such as meteorological prediction). These types of applications are very sensitive to end-to-end message latency, because processors sit idle while waiting for instruction messages or while results data is transmitted between nodes. In general, the more time spent idle waiting for an instruction or for results data, the longer it takes to complete the application. Some compute-intensive applications may also be graphic intensive. Graphic intensive is a term that applies to any application that demands many computational cycles where the end result is the delivery of significant information for the development of graphical output, such as ray-tracing applications. These types of applications are also sensitive to end-to-end message latency: the longer the processors have to wait for instruction messages, or the longer it takes to send resulting data, the longer it takes to present the graphical representation of the results.

PERFORMANCE IMPACT AND CONSIDERATIONS


There are three main considerations for cluster applications: message latency, CPU utilization, and throughput. Each plays an important part in improving or impeding application performance. This section describes each of these issues and its associated impact on application performance.

MESSAGE LATENCY
Message latency is defined as the time it takes to send a zero-length message from one processor to another, measured in microseconds. For many application types, the lower the latency, the better. Message latency is the aggregate of the latency incurred at each element within the cluster network, including within the cluster nodes themselves. Although network latency often receives the attention, the protocol-processing latency of the message passing interface (MPI) and TCP processes within the host itself is typically larger. Throughput of today's cluster nodes is impacted by protocol processing, both for TCP/IP and for MPI. To maintain cluster stability, node synchronization, and data sharing, the cluster uses message-passing technologies such as Parallel Virtual Machine (PVM) or MPI. TCP/IP stack processing is a CPU-intensive task that limits performance within high-speed networks. As CPU performance has increased and new techniques such as TCP offload engines (TOE) have been introduced, PCs are now able to drive bandwidth levels higher, to the point where traffic levels reach near the theoretical maximum for TCP/IP on Gigabit Ethernet and near bus speeds for PCI-X based systems when using 10 Gigabit Ethernet.
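A crude way to get a feel for message latency is a ping-pong test: bounce a tiny message back and forth many times and halve the average round trip. The Python sketch below does this over a loopback TCP socket purely for illustration; a real measurement would run between two cluster nodes, typically with an MPI ping-pong benchmark, and would show the host protocol-processing cost the text describes.

```python
import socket
import threading
import time

def echo_server(sock):
    """Accept one connection and bounce every byte straight back."""
    conn, _ = sock.accept()
    with conn:
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(data)

# Loopback stand-in for a cluster interconnect link.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
# Disable Nagle's algorithm so tiny probes are not batched.
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

rounds = 1000
t0 = time.perf_counter()
for _ in range(rounds):
    cli.sendall(b"x")   # 1-byte probe, approximating a zero-length message
    cli.recv(1)
elapsed = time.perf_counter() - t0
latency_us = elapsed / rounds / 2 * 1e6  # one-way estimate, microseconds
cli.close()
```

Even on loopback, where no physical network is involved, the measured latency is dominated by exactly the TCP/IP stack and kernel-crossing costs discussed above.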


SLOW START
In the original implementation of TCP, as soon as a connection was established between two devices, each could send segments as fast as it liked, as long as there was room in the other device's receive window. In a busy network, the sudden appearance of a large amount of new traffic could exacerbate any existing congestion. To alleviate this problem, modern TCP devices restrain the rate at which they initially send segments. Each sender is at first restricted to sending only an amount of data equal to one full-sized segment, that is, equal to the MSS value for the connection. Each time an acknowledgment is received, the amount of data the device can send is increased by the size of another full-sized segment. Thus, the device "starts slow" in terms of how much data it can send, with the amount increasing until either the full window size is reached or congestion is detected on the link. In the latter case, the congestion avoidance feature, described below, is used.
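This growth rule can be modeled in a few lines. The Python sketch below assumes one acknowledgment per full-sized segment per round trip, so the congestion window doubles each RTT until it reaches a slow-start threshold; the MSS and threshold values are chosen for illustration only.

```python
def slow_start(mss, ssthresh, rtts):
    """Model cwnd growth: +1 MSS per ACK means the window roughly
    doubles each round trip, capped at the slow-start threshold."""
    cwnd = mss                      # start with one full-sized segment
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss          # one ACK per segment sent this RTT
        cwnd = min(cwnd + acks * mss, ssthresh)
        history.append(cwnd)
    return history

# Window in bytes after each round trip: 1460, 2920, 5840, ...
growth = slow_start(mss=1460, ssthresh=65536, rtts=5)
```

The exponential shape explains why short connections often never reach link capacity: they finish while still in slow start.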

CPU UTILIZATION
One important consideration for many enterprises is to use compute resources as efficiently as possible. As an increasing number of enterprises move toward real-time and business-intelligence analysis, using compute resources efficiently becomes an important metric; in many cases, however, compute resources are underutilized. The more CPU cycles committed to application processing, the less time it takes to run the application. Unfortunately, although this is a design goal, it is not fully attainable, as both the application and the protocols compete for CPU cycles. While the cluster node processes the application, the CPU is dedicated to the application and protocol processing does not occur. For this to change, the protocol process must interrupt a uniprocessor machine or request a spin lock on a multiprocessor machine. As the request is granted, CPU cycles are applied to the protocol process; as more cycles go to protocol processing, application processing is suspended. In many environments, the value of the cluster is based on the run time of the application: the shorter the time to run, the more floating-point operations and/or millions of instructions per second occur, and, therefore, the lower the cost of running a specific application or job.
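One way to see the split between application processing and time lost to everything else is to compare CPU time against wall-clock time for a job. In the Python sketch below, the sleep stands in for time blocked on protocol or network activity; the workload size is arbitrary.

```python
import time

def busy_work(n):
    """CPU-bound loop standing in for application processing."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall0, cpu0 = time.perf_counter(), time.process_time()
busy_work(2_000_000)   # application work: burns CPU cycles
time.sleep(0.2)        # stand-in for blocking on protocol/network I/O
wall1, cpu1 = time.perf_counter(), time.process_time()

# Fraction of elapsed wall time actually spent on the CPU.
utilization = (cpu1 - cpu0) / (wall1 - wall0)
```

A utilization well below 1.0 signals that the node spends much of the run blocked rather than computing, which is the underutilization the text describes.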

CONGESTION AVOIDANCE
When potential congestion is detected on a TCP link, a device responds by throttling back the rate at which it sends segments. A special algorithm allows the device to drop its sending rate quickly when congestion occurs. The device then uses the slow start algorithm, described above, to gradually increase the transmission rate again, trying to maximize throughput without congestion recurring. In the event of packet drops, TCP retransmission algorithms engage. Retransmission timeouts can reach delays of up to 200 milliseconds, significantly impacting throughput.
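This throttle-back-then-regrow behavior is commonly summarized as additive-increase/multiplicative-decrease (AIMD): grow the window by roughly one MSS per congestion-free round trip, and cut it sharply when congestion is signaled. The Python toy model below illustrates the shape; the window sizes and the congestion-event trace are invented, and real TCP implementations add many refinements on top of this.

```python
def aimd(cwnd, mss, events):
    """Additive-increase/multiplicative-decrease window dynamics:
    `events` is one boolean per round trip (True = congestion seen)."""
    history = [cwnd]
    for congested in events:
        if congested:
            cwnd = max(cwnd // 2, mss)   # multiplicative decrease
        else:
            cwnd += mss                  # additive increase per RTT
        history.append(cwnd)
    return history

# Steady growth, one congestion signal, then recovery.
trace = aimd(cwnd=8 * 1460, mss=1460, events=[False, False, True, False])
```

The resulting sawtooth is why sustained throughput on a lossy link sits well below the link's raw capacity, which matters directly for latency-sensitive cluster traffic.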

Conclusion
High-performance cluster computing is enabling a new class of computationally intensive applications that solve problems previously cost-prohibitive for many enterprises. The use of commodity computers collaborating to resolve highly complex, computationally intensive tasks has broad application across several industry verticals, such as chemistry and biology, quantum physics, petroleum exploration, crash-test simulation, CG rendering, and financial risk analysis. However, cluster computing pushes the limits of server architectures, computing, and network performance. Due to the economics of cluster computing and the flexibility and high performance it offers, cluster computing has made its way into mainstream enterprise data centers, using clusters of various sizes. As clusters become more popular and more pervasive, careful consideration of the application requirements, and of what they translate to in terms of network characteristics, becomes critical to the design and delivery of an optimal and reliable solution.

The technologies associated with cluster computing, including host protocol stack processing and interconnect technologies, are rapidly evolving to meet the demands of current, new, and emerging applications. Much progress has been made in the development of low-latency switches, protocols, and standards that efficiently and effectively use network hardware components.

Knowledge of how the application uses the cluster nodes, and of how the characteristics of the application impact and are impacted by the underlying network, is critically important. As critical as the selection of the cluster nodes and operating system is the selection of the node interconnects and the underlying cluster network switching technologies. A scalable and modular networking solution is critical, not only to provide incremental connectivity but also to provide incremental bandwidth options as the cluster grows. The ability to use advanced technologies within the same networking platform, such as 10 Gigabit Ethernet, provides new connectivity options and increased bandwidth while providing investment protection.
