
Kapita Selekta - Session 1

Rajkumar Buyya, Monash University, Melbourne.


Email: rajkumar@csse.monash.edu.au / rajkumar@buyya.com
Web: http://www.ccse.monash.edu.au/~rajkumar / www.buyya.com
Course Outline
Introduction to Hadoop Technology
Introduction to Cluster Technology
Introduction to Image Processing
Assignment Description and Examples
Assignment 1 Presentations
Assignment 2 Presentations
Wrap-up (Summary of the Material)

Assignments = 25%
Evaluation = 20%
Discussion = 20%
Exam = 35%
 
Introduction to Cloud Computing
1. Cluster computing
2. Grid computing
3. Web services
4. Cloud computing
How to Run Applications Faster?

 There are 3 ways to improve performance:
– 1. Work Harder
– 2. Work Smarter
– 3. Get Help

 Computer Analogy
– 1. Use faster hardware: e.g. reduce the time per instruction (clock cycle).
– 2. Use optimized algorithms and techniques.
– 3. Use multiple computers to solve the problem: that is, increase the number of instructions executed per clock cycle.
Computing Platforms Evolution
[Figure: "Breaking Administrative Barriers" - performance versus administrative scope. Moving from a desktop (individual), through SMPs or supercomputers (group/department), local clusters (campus), enterprise clusters/grids (state/national), and global clusters/grids (globe), toward inter-planet clusters/grids (inter-planet/universe?), each step breaks another administrative barrier.]
Computational Power Improvement
[Figure: computational power improvement (C.P.I.) versus number of processors - a multiprocessor keeps improving as processors are added, while a uniprocessor levels off.]
Cluster Computing
A computer cluster is a group of linked computers,
working together closely thus in many respects
forming a single computer. The components of a
cluster are commonly, but not always, connected to
each other through fast local area networks. Clusters
are usually deployed to improve performance and/or
availability over that of a single computer, while
typically being much more cost-effective than single
computers of comparable speed or availability.[1]

http://en.wikipedia.org/wiki/Computer_cluster
Example Cluster Computers
Advantages of Clusters
According to [BREW, 1997]:
Absolute Scalability: makes it possible to build large clusters whose power exceeds that of even the largest standalone machines.
Incremental Scalability: makes it possible to add new systems to an existing cluster.
High Availability: because each node in a cluster is an independent computer, the failure of one node does not mean loss of service; its work can be taken over elsewhere in the cluster.
Price: lower than that of a single machine with the same capability.
There are many cluster configurations, but a simple architecture such as the one shown in Figure 1 is used to visualize the basic concept (Baker et al., 2002). In a typical cluster, the application is run on a master node, while the computational work is split up and parceled out to the multiple nodes in the cluster. This way, the cluster is better equipped to handle larger amounts of data and more complex problems than would otherwise be possible on a stand-alone machine.

Cluster computers are a realistic alternative, for a variety of applications, to the more expensive traditional supercomputers (Baker, Fox & Yau, 1996).
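The master/worker pattern described above can be sketched in miniature on a single machine; in this illustrative example, worker threads stand in for cluster nodes (the function and data are made up for the demonstration):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Each "node" computes a partial result on its slice of the data.
    return sum(x * x for x in chunk)

def master(data, n_nodes=4):
    # The master splits the work up and parcels it out to the nodes,
    # then combines the partial results into the final answer.
    chunks = [data[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)

result = master(list(range(1000)))   # same answer as a single machine, split 4 ways
```

In a real cluster the chunks would travel over the network to separate machines, but the split/compute/combine shape is the same.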
Computer Food Chain (Now and Future)

Some systems that support clustering include:
Windows NT Server, Enterprise Edition, with a service called Microsoft Cluster Service (MSCS)
Windows 2000 Advanced Server, with a service called Microsoft Clustering Service
Windows 2000 Datacenter Server
Windows Server 2003 Enterprise Edition (x86/IA-64/x64), with a service called Microsoft Clustering Service
Windows Server 2003 Datacenter Edition (x86/IA-64/x64)
Solaris UNIX
GNU/Linux
Cluster Computing - Research Projects

 Beowulf (CalTech and NASA) - USA
 CCS (Computing Centre Software) - Paderborn, Germany
 Condor - University of Wisconsin-Madison, USA
 DQS (Distributed Queuing System) - Florida State University, USA
 EASY - Argonne National Lab, USA
 HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
 far - University of Liverpool, UK
 Gardens - Queensland University of Technology, Australia
 MOSIX - Hebrew University of Jerusalem, Israel
 MPI (MPI Forum; MPICH is one of the popular implementations)
 NOW (Network of Workstations) - Berkeley, USA
 NIMROD - Monash University, Australia
 NetSolve - University of Tennessee, USA
 PBS (Portable Batch System) - NASA Ames and LLNL, USA
 PVM - Oak Ridge National Lab / UTK / Emory, USA
Cluster Computing - Commercial Software

 Codine (Computing in Distributed Network Environment) - GENIAS GmbH, Germany
 LoadLeveler - IBM Corp., USA
 LSF (Load Sharing Facility) - Platform Computing, Canada
 NQE (Network Queuing Environment) - Craysoft Corp., USA
 OpenFrame - Centre for Development of Advanced Computing, India
 RWPC (Real World Computing Partnership), Japan
 UnixWare (SCO - Santa Cruz Operation), USA
 Solaris MC (Sun Microsystems), USA
 ClusterTools (a number of free HPC cluster tools from Sun)
 A number of commercial vendors worldwide offer clustering solutions, including IBM, Compaq, Microsoft, and startups like TurboLinux, HPTI, Scali, BlackStone, ...
Cluster Components...1a Nodes

Multiple High Performance Components:
PCs
Workstations
SMPs (CLUMPS)
Distributed HPC Systems leading to Metacomputing
They can be based on different architectures and run different OSes.
Cluster Components...1b Processors

 There are many (CISC/RISC/VLIW/Vector...)
 Intel: Pentiums, Xeon, Merced...
 Sun: SPARC, UltraSPARC
 HP PA
 IBM RS6000/PowerPC
 SGI MIPS
 Digital Alphas
 Integrating memory, processing, and networking into a single chip:
 IRAM (CPU & Mem): (http://iram.cs.berkeley.edu)
 Alpha 21364 (CPU, Memory Controller, NI)
Cluster Components…2 OS

State of the art OS:
 Linux (Beowulf)
 Microsoft NT (Illinois HPVM)
 Sun Solaris (Berkeley NOW)
 IBM AIX (IBM SP2)
 HP UX (Illinois - PANDA)
 Mach (microkernel-based OS) (CMU)
 Cluster operating systems (Solaris MC, SCO UnixWare, MOSIX (academic project))
 OS gluing layers (Berkeley GLUnix)
Cluster Components…3
Programming environments
 Threads (PCs, SMPs, NOW..)
 POSIX Threads
 Java Threads
 MPI
 Linux, NT, on many Supercomputers
 PVM
 Software DSMs (Shmem)
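As a minimal illustration of the thread-based programming environments listed above (here Python's standard threading module stands in for POSIX Threads; the shared counter is just a made-up workload):

```python
import threading

counter = 0
lock = threading.Lock()

def work(n):
    # Each thread increments a shared counter; the lock prevents
    # lost updates when threads interleave on the shared variable.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is now exactly 4 * 10_000
```

The same start/compute/join structure appears in POSIX Threads (`pthread_create`/`pthread_join`) and Java Threads; MPI and PVM replace the shared variable with explicit messages between processes.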
Cluster Components…4
Applications
Sequential
Parallel / Distributed (cluster-aware apps)
Grand Challenge applications
Weather Forecasting
Quantum Chemistry
Molecular Biology Modeling
Engineering Analysis (CAD/CAM)
……………….
PDBs, web servers, data mining
Clusters Classification..1
Based on Focus (in Market)
High Performance (HP) Clusters
Grand Challenge applications
Clusters Classification..2 (cont'd)
High Availability (HA) Clusters
Mission-critical applications
Clusters Classification..3
Based on Node Architecture..
Clusters of PCs (CoPs)
Clusters of Workstations (COWs)
Clusters Classification..4
Based on Node OS Type..
Linux Clusters (Beowulf)
Solaris Clusters (Berkeley NOW)
NT Clusters (HPVM)
AIX Clusters (IBM SP2)
SCO/Compaq Clusters (Unixware)
…….Digital VMS Clusters, HP clusters,
………………..
Clusters Classification..5
Based on node components architecture & configuration (processor arch, node type: PC/workstation, & OS: Linux/NT):
Homogeneous Clusters
 All nodes have a similar configuration
Heterogeneous Clusters
 Nodes based on different processors and running different OSes
Clusters Classification..6
Levels of Clustering
 Group Clusters (#nodes: 2-99)
 (a set of dedicated/non-dedicated computers - mainly connected by a SAN like Myrinet)
 Departmental Clusters (#nodes: 99-999)
 Organizational Clusters (#nodes: many 100s)
 Internet-wide Clusters = Global Clusters (#nodes: 1000s to many millions)
 Metacomputing
 Web-based Computing
 Agent-Based Computing
 Java plays a major role in web- and agent-based computing
END
Cloud computing
http://www.navinot.com/2008/11/12/apakah-cloud-computing-itu/
What Is Cloud Computing?
Before getting to cloud computing, we first have to start with distributed computing. As the name suggests, distributed computing means computation that is distributed: the computation does not take place on one computer only, but is spread across several computers. The analogy is group work on a clippings project: every member of the group gathers material according to a division of labor, and the material is finally assembled into a single set of clippings as the group's product. Other analogies are other kinds of work groups, such as offices, factories, and so on. The point is that the work is spread across the group, yet produces a single output. Distributed computing is, in short, one example of parallel processing.

Grid computing is one form of distributed computing. Where distributed computing views a computation in terms of how the computation is carried out, grid computing views the infrastructure side of carrying it out. Grid computing is a form of cluster (federation) of computers that tends not to be bound by geographic boundaries. A cluster, by contrast, is always deployed in one place, joining many computers over a network. An example of grid computing is SETI@Home. The SETI@Home project aims to search for extraterrestrial intelligence (ET) by harnessing the computing resources of its members scattered all over the world. You simply run a small program. The program then downloads data from the SETI@Home project and processes it on your computer. The result is sent back to SETI@Home as part of one large computation.

The line between cloud computing and grid computing is thin. Cloud computing views a computation from the consumption side. In cloud computing, the user does not own any of the resources used to process their requests. The data supplied by the user of the service is processed in a large, self-regulating network. The user only sees the final result, without knowing the details of who processed the request, where it was processed, or where the data is stored. All those details are hidden behind the cloud. Examples of cloud computing include Amazon EC2, SalesForce.com, Google App Engine, Yahoo! BOSS, and others.

As the internet becomes ever more influential in our daily lives, cloud computing is only going to get more attractive, especially now that devices that can access the internet are available everywhere in increasingly portable forms. What is still science fiction today may be everyday reality in a few years. Ugh! Aren't you excited?
PS:
Distributed computing is not limited to computation alone. Storage is also part of distributed computing.
Has this article managed to explain cloud computing to you? Should Navinot write more about cloud computing? What would you like to read after this article?

http://www.gogrid.com/ -- a link to a company selling grid computing

Managing Your Cloud Infrastructure
Easily Build and Manage Complex Cloud Infrastructure with GoGrid
GoGrid makes it easy to manage your cloud infrastructure with a variety of robust and easy-to-use tools, giving you the ability to monitor, administer, and scale your infrastructure components in real time.
GoGrid Portal
The GoGrid portal is an easy-to-use graphical user interface that allows you to provision or order cloud (virtual) and dedicated (physical) servers, F5 hardware load balancing, Cloud Storage, CDN, and custom server images in real time. With the portal you can understand your costs, submit and review support cases, check invoices, review usage, order additional IP addresses, and manage system users with role-based access controls. Most of the portal capabilities are programmatically exposed via the GoGrid API, which allows you to tightly integrate with the GoGrid Cloud.
GoGrid API
The GoGrid API is a REST-like query interface. GoGrid method calls are made over the internet by sending HTTPS GET or POST requests to the GoGrid API REST server. Nearly any computer language can be used to communicate over HTTPS with the REST server. Browse our Getting Started guide for more documentation regarding making API calls from different languages such as Java, PHP, Python, Perl, Ruby, C#, or even shell scripting languages such as bash.
Visit the GoGrid Wiki to learn more about the GoGrid API.
Distributed Computing
http://id.wikipedia.org/wiki/Komputasi_Terdistribusi
In computer science, distributed computing studies the coordinated use of computers that are physically separate, or distributed. Distributed systems require software that differs from that of centralized systems.
Goals
The goal of distributed computing is to unite the capabilities of physically separate resources (computing resources or information resources) into a single coordinated system whose capacity far exceeds that of its individual components.
Another goal pursued in distributed computing is transparency. The fact that the resources used by a user of a distributed system are in physically separate locations need not be known to that user. This transparency allows the user of a distributed system to see the separate resources as if they were a single computer system, like the one they are used to.
One of the problems faced in trying to unite these separate resources is scalability: whether or not the system can be extended further to encompass more computing resources.
Architecture
Many varied software and hardware architectures are used for distributed computing. At a lower level, interconnecting multiple CPUs over a network is required. At a higher level, connecting the processes running on those CPUs through a communication system is also required.
Common architectures that enable distributed systems include:
Client-server: clients contact the server to retrieve data, then format it and display it to the user.
3-tier architecture: most web applications are 3-tier.
N-tier architecture: N-tier usually refers to web applications that forward requests on to enterprise services. This kind of application is largely responsible for the success of application servers.
Tightly coupled: usually refers to a tightly integrated set of machines that run the same process in parallel, dividing the task into parts, then gathering the parts back and combining them into the final result.
Peer-to-peer: an architecture with no special machine that provides a particular service or manages the resources in the network; all responsibilities are shared evenly among all the machines, known as peers.
Service-oriented: the system is organized as a set of services that can be delivered through standard interfaces.
Mobile code: based on the architectural principle of moving the processing closer to the data source.
Replicated repository: the repository is replicated and spread across the system to support online/offline processing, provided that delays in propagating updates are acceptable.
Distributed computing infrastructure
Moab Grid Suite — Cluster workload management, reporting tools, and end user
submission portal
Remote procedure call — This high-level communication mechanism allows
processes on different machines to communicate using procedure calls even though
they don't share the same address space.
Distributed objects — Systems like CORBA, Microsoft D/COM, Java RMI,
ReplicaNet [1]
SOAP
XML-RPC
GLOBE
Acute [2] — Distributed functional programming with migration based on OCaml.
PYRO — Python Remote Objects
BOINC — Berkeley Open Infrastructure for Network Computing
GLOBUS — Home of the Globus Toolkit
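Two of the mechanisms in the list above, remote procedure call and XML-RPC, can be tried directly from Python's standard library. A minimal sketch (the `add` procedure and the use of a loopback server in a background thread are illustrative choices, not part of any of the projects listed):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: expose an ordinary function as a remote procedure.
# Port 0 lets the OS pick a free port for this demonstration.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy hides the HTTP/XML plumbing, so calling the
# remote procedure looks just like a local call, even though the two
# sides do not share an address space.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)    # executed on the "remote" server
server.shutdown()
```

The same call-looks-local idea underlies CORBA, D/COM, Java RMI, and SOAP; they differ mainly in the wire format and in how services are named and discovered.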
Introduction to Hadoop Technology
Hadoop?
Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
• Developed in Java
• Free software
• Thousands of nodes
• Petabyte-scale data volumes
• Founded by Doug Cutting
• A top-level project of The Apache Software Foundation
Distributed File System
File System Overview
 Distributed File Systems: Issues
 Distributed File Systems: Case Studies
 Distributed File Systems for Clouds
Outline
 Files and Directories
 Implementation Issues
 Example File Systems
 HDFS (Hadoop Distributed File System)
Why is a file system needed?
Data must persist even after processing has finished,
and it must be possible to store data in large quantities.
Data must be accessible and processable by many parties concurrently.

The solution is to store the data in what is commonly called a "disk file", or on other media.
File: a summary
A potentially large body of data that keeps growing over a long time can:
 Sometimes be larger than the computer's memory
 Sometimes take a long time to process
 Sometimes outlive the machine itself

It is usually organized as a linear byte array or as blocks
 The blocks can become very long

It is often accessed by multiple processes
 Even by several different machines
File System
Files are managed by the operating system.
The part of the operating system that handles files is called the file system.
A file is a collection of disk blocks.
The file system maps file names to disk blocks.

How files are organized, used, and protected is the main concern of the file system.
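The name-to-blocks mapping just described can be sketched as a toy in-memory "file system"; the 4-byte block size and the file name are arbitrary choices for the demonstration:

```python
BLOCK_SIZE = 4                 # tiny block size, for illustration only

disk = []                      # the "disk": a growing list of blocks
directory = {}                 # maps file name -> list of block numbers

def write_file(name, data):
    # Split the data into fixed-size blocks and record which block
    # numbers belong to this file - the name-to-blocks mapping.
    blocks = []
    for i in range(0, len(data), BLOCK_SIZE):
        disk.append(data[i:i + BLOCK_SIZE])
        blocks.append(len(disk) - 1)
    directory[name] = blocks

def read_file(name):
    # Follow the directory entry from name to blocks, then reassemble.
    return b"".join(disk[b] for b in directory[name])

write_file("notes.txt", b"hello cluster world")
content = read_file("notes.txt")
```

A real file system adds allocation, protection, and crash recovery on top of this mapping; HDFS does the same thing at cluster scale, with blocks spread across many machines.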
Although parallel processing has been used for many years in many systems, it is still
somewhat unfamiliar to most computer users. Thus, before discussing the various
alternatives, it is important to become familiar with a few commonly used terms.
SIMD:SIMD (Single Instruction stream, Multiple Data stream) refers to a parallel
execution model in which all processors execute the same operation at the same time, but
each processor is allowed to operate upon its own data. This model naturally fits the
concept of performing the same operation on every element of an array, and is thus often
associated with vector or array manipulation. Because all operations are inherently
synchronized, interactions among SIMD processors tend to be easily and efficiently
implemented.
MIMD:MIMD (Multiple Instruction stream, Multiple Data stream) refers to a parallel
execution model in which each processor is essentially acting independently. This model
most naturally fits the concept of decomposing a program for parallel execution on a
functional basis; for example, one processor might update a database file while another
processor generates a graphic display of the new entry. This is a more flexible model than
SIMD execution, but it is achieved at the risk of debugging nightmares called race
conditions, in which a program may intermittently fail due to timing variations reordering
the operations of one processor relative to those of another.
SPMD:SPMD (Single Program, Multiple Data) is a restricted
version of MIMD in which all processors are running the same
program. Unlike SIMD, each processor executing SPMD code may
take a different control flow path through the program.
Communication Bandwidth:The bandwidth of a communication
system is the maximum amount of data that can be transmitted in a
unit of time... once data transmission has begun. Bandwidth for serial
connections is often measured in baud or bits/second (b/s), which
generally correspond to 1/10 to 1/8 that many Bytes/second (B/s).
For example, a 1,200 baud modem transfers about 120 B/s, whereas
a 155 Mb/s ATM network connection is nearly 130,000 times faster,
transferring about 17 MB/s. High bandwidth allows large
blocks of data to be transferred efficiently between processors.
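The arithmetic in the bandwidth example above is easy to check, assuming the rough baud-to-bytes rule of thumb quoted in the text:

```python
# A serial link at B baud moves roughly B/10 bytes/second
# (start/stop bits account for the gap between 1/8 and 1/10).
modem_baud = 1_200
modem_Bps = modem_baud / 10          # about 120 bytes/second

atm_bps = 155_000_000                # 155 Mb/s ATM connection
speedup = atm_bps / modem_baud       # roughly 129,000x faster
```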
Communication Latency:The latency of a communication system is the minimum time
taken to transmit one object, including any send and receive software overhead. Latency is
very important in parallel processing because it determines the minimum useful grain size,
the minimum run time for a segment of code to yield speed-up through parallel execution.
Basically, if a segment of code runs for less time than it takes to transmit its result value
(i.e., latency), executing that code segment serially on the processor that needed the result
value would be faster than parallel execution; serial execution would avoid the
communication overhead.
Message Passing:Message passing is a model for interactions between processors within a
parallel system. In general, a message is constructed by software on one processor and is
sent through an interconnection network to another processor, which then must accept and
act upon the message contents. Although the overhead in handling each message (latency)
may be high, there are typically few restrictions on how much information each message
may contain. Thus, message passing can yield high bandwidth making it a very effective
way to transmit a large block of data from one processor to another. However, to minimize
the need for expensive message passing operations, data structures within a parallel
program must be spread across the processors so that most data referenced by each
processor is in its local memory... this task is known as data layout.
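A toy version of the message-passing model can be run on one machine, with threads standing in for processors and a queue standing in for the interconnection network (the message format here is invented for the example):

```python
import threading
import queue

network = queue.Queue()   # stands in for the interconnection network

def sender():
    # A message is constructed by software on one processor
    # and sent through the network to another.
    network.put({"op": "sum", "payload": [1, 2, 3, 4]})

def receiver(results):
    # The other processor accepts the message and acts on its contents.
    msg = network.get()
    if msg["op"] == "sum":
        results.append(sum(msg["payload"]))

results = []
t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
```

Note that the payload can be arbitrarily large - exactly the "few restrictions on message contents" property the text describes - while every message still pays the fixed handling cost (latency).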
Shared Memory:Shared memory is a model for interactions between processors within a parallel
system. Systems like the multi-processor Pentium machines running Linux physically share a single
memory among their processors, so that a value written to shared memory by one processor can be
directly accessed by any processor. Alternatively, logically shared memory can be implemented for
systems in which each processor has its own memory by converting each non-local memory reference into
an appropriate inter-processor communication. Either implementation of shared memory is generally
considered easier to use than message passing. Physically shared memory can have both high bandwidth
and low latency, but only when multiple processors do not try to access the bus simultaneously; thus, data
layout still can seriously impact performance, and cache effects, etc., can make it difficult to determine
what the best layout is.
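By contrast with message passing, shared memory lets one "processor" read what another wrote with no explicit message at all; a sketch with two threads sharing one buffer (the event is only there to order the demonstration, not part of the memory model):

```python
import threading

shared = [0] * 8               # one memory region visible to all "processors"
written = threading.Event()

def writer():
    shared[3] = 42             # a value written to shared memory...
    written.set()

def reader(out):
    written.wait()             # ...can be read directly by any other processor,
    out.append(shared[3])      # with no send/receive pair involved

out = []
t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader, args=(out,))
t1.start(); t2.start()
t1.join(); t2.join()
```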
Aggregate Functions:In both the message passing and shared memory models, a communication is
initiated by a single processor; in contrast, aggregate function communication is an inherently parallel
communication model in which an entire group of processors act together. The simplest such action is a
barrier synchronization, in which each individual processor waits until every processor in the group has
arrived at the barrier. By having each processor output a datum as a side-effect of reaching a barrier, it is
possible to have the communication hardware return a value to each processor which is an arbitrary
function of the values collected from all processors. For example, the return value might be the answer to
the question "did any processor find a solution?" or it might be the sum of one value from each processor.
Latency can be very low, but bandwidth per processor also tends to be low. Traditionally, this model is
used primarily to control parallel execution rather than to distribute data values.
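The barrier-with-reduction idea can be mimicked in software; in this sketch each "processor" deposits a datum as a side effect of reaching the barrier, and afterwards every one of them reads back the same aggregate (the values deposited are arbitrary):

```python
import threading

n = 4
data = [0] * n                    # one slot per processor
barrier = threading.Barrier(n)
answers = [None] * n

def processor(i):
    data[i] = (i + 1) * 10        # datum output as a side effect of the barrier
    barrier.wait()                # wait until every processor has arrived
    answers[i] = sum(data)        # each processor receives the aggregate

threads = [threading.Thread(target=processor, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# every processor sees the same aggregate: 10 + 20 + 30 + 40
```

Hardware aggregate-function networks return this value in a single low-latency operation; the software version here needs a barrier plus a shared read, which is why collective communication built from message passing is slower.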
Collective Communication:This is another name for
aggregate functions, most often used when referring to
aggregate functions that are constructed using multiple
message-passing operations.
SMP:SMP (Symmetric Multi-Processor) refers to the
operating system concept of a group of processors working
together as peers, so that any piece of work could be done
equally well by any processor. Typically, SMP implies the
combination of MIMD and shared memory. In the IA32
world, SMP generally means compliant with MPS (the Intel
MultiProcessor Specification); in the future, it may mean
"Slot 2"....
SWAR:SWAR (SIMD Within A Register) is a generic term for the concept of partitioning a register
into multiple integer fields and using register-width operations to perform SIMD-parallel
computations across those fields. Given a machine with k-bit registers, data paths, and function
units, it has long been known that ordinary register operations can function as SIMD parallel
operations on as many as n, k/n-bit, field values. Although this type of parallelism can be
implemented using ordinary integer registers and instructions, many high-end microprocessors have
recently added specialized instructions to enhance the performance of this technique for
multimedia-oriented tasks. In addition to the Intel/AMD/Cyrix MMX (MultiMedia eXtensions),
there are: Digital Alpha MAX (MultimediA eXtensions), Hewlett-Packard PA-RISC MAX
(Multimedia Acceleration eXtensions), MIPS MDMX (Digital Media eXtension, pronounced "Mad
Max"), and Sun SPARC V9 VIS (Visual Instruction Set). Aside from the three vendors who have
agreed on MMX, all of these instruction set extensions are roughly comparable, but mutually
incompatible.
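The field-partitioning trick is easy to demonstrate with ordinary integer arithmetic. The sketch below packs four 8-bit lanes into one 32-bit value and adds lane-wise, using masking so that carries cannot cross lane boundaries - the same idea MMX/VIS implement in hardware (the helper names and lane values are invented for the example):

```python
H = 0x80808080          # the top bit of every 8-bit lane
M = 0xFFFFFFFF          # keep results to 32 bits

def swar_add8(a, b):
    # Add the low 7 bits of each lane: no carry can leave a lane,
    # because the top bit of every lane was cleared first.
    low = ((a & ~H & M) + (b & ~H & M)) & M
    # Fold the lanes' top bits back in with XOR (addition without carry).
    return low ^ ((a ^ b) & H)

def pack(lanes):
    return sum(v << (8 * i) for i, v in enumerate(lanes))

def unpack(x):
    return [(x >> (8 * i)) & 0xFF for i in range(4)]

a = pack([250, 7, 128, 1])
b = pack([10, 8, 128, 2])
lanes = unpack(swar_add8(a, b))   # per-lane sums, modulo 256
```

One register-width add has performed four independent 8-bit additions - SIMD parallelism with no special hardware at all.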
Attached Processors:Attached processors are essentially special-purpose computers that are
connected to a host system to accelerate specific types of computation. For example, many video
and audio cards for PCs contain attached processors designed, respectively, to accelerate common
graphics operations and audio DSP (Digital Signal Processing). There is also a wide range of
attached array processors, so called because they are designed to accelerate arithmetic operations
on arrays. In fact, many commercial supercomputers are really attached processors with workstation
hosts.
RAID:RAID (Redundant Array of Inexpensive Disks) is a simple technology for increasing both the
bandwidth and reliability of disk I/O. Although there are many different variations, all have two key
concepts in common. First, each data block is striped across a group of n+k disk drives such that each
drive only has to read or write 1/n of the data... yielding n times the bandwidth of one drive. Second,
redundant data is written so that data can be recovered if a disk drive fails; this is important because
otherwise if any one of the n+k drives were to fail, the entire file system could be lost. A good overview
of RAID in general is given at http://www.dpt.com/uraiddoc.html, and information about RAID options
for Linux systems is at http://linas.org/linux/raid.html. Aside from specialized RAID hardware support,
Linux also supports software RAID 0, 1, 4, and 5 across multiple disks hosted by a single Linux system;
see the Software RAID mini-HOWTO and the Multi-Disk System Tuning mini-HOWTO for details.
RAID across disk drives on multiple machines in a cluster is not directly supported.
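The striping-plus-redundancy idea can be sketched with XOR parity (RAID 4/5 style: n data drives plus one parity drive); the "drives" here are just byte strings invented for the example:

```python
def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

# Stripe the data across n = 3 "drives" plus one parity drive,
# so each drive holds only 1/n of the data (the bandwidth win).
d0, d1, d2 = b"clu", b"ste", b"rs!"
parity = xor_blocks([d0, d1, d2])

# If any single drive fails, XOR of the survivors recovers its
# contents (the reliability win); here drive 1 has "failed".
recovered_d1 = xor_blocks([d0, d2, parity])
```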
IA32:IA32 (Intel Architecture, 32-bit) really has nothing to do with parallel processing, but rather refers
to the class of processors whose instruction sets are generally compatible with that of the Intel 386.
Basically, any Intel x86 processor after the 286 is compatible with the 32-bit flat memory model that
characterizes IA32. AMD and Cyrix also make a multitude of IA32-compatible processors. Because
Linux evolved primarily on IA32 processors and that is where the commodity market is centered, it is
convenient to use IA32 to distinguish any of these processors from the PowerPC, Alpha, PA-RISC,
MIPS, SPARC, etc. The upcoming IA64 (64-bit with EPIC, Explicitly Parallel Instruction Computing)
will certainly complicate matters, but Merced, the first IA64 processor, is not scheduled for production
until 1999.
COTS:Since the demise of many parallel supercomputer companies,
COTS (Commercial Off-The-Shelf) is commonly discussed as a
requirement for parallel computing systems. Being fanatically pure, the
only COTS parallel processing techniques using PCs are things like
SMP Windows NT servers and various MMX Windows applications; it
really doesn't pay to be that fanatical. The underlying concept of COTS
is really minimization of development time and cost. Thus, a more
useful, more common, meaning of COTS is that at least most
subsystems benefit from commodity marketing, but other technologies
are used where they are effective. Most often, COTS parallel
processing refers to a cluster in which the nodes are commodity PCs,
but the network interface and software are somewhat customized...
typically running Linux and applications codes that are freely available
(e.g., copyleft or public domain), but not literally COTS.
