Supercomputing ~ HPC
[Figure: Blue Gene baseline system hierarchy: chip (4 processors) → compute card (1 chip) → node card (435 GF/s, 64 GB) → rack of 32 node cards (14 TF/s, 2 TB) → baseline system of 32 racks (500 TF/s, 64 TB)]
Sun Constellation: Ranger (#9 on the Top500), Red Sky (#10)
Grids ~ Federation
[Figure: map of TeraGrid computational resources (sizes approximate, not to scale) across US sites: NCAR, PSC, PU, IU, ORNL, Tennessee, LONI/LSU, SDSC, TACC; 504 TF in 2007, ~1 PF in 2008]
TeraGrid (TG)
200K cores across 11 institutions and 22 systems over the US
[Figure: Google Trends search volume index comparing "cloud computing", "computer science", "hpc", "university of chicago", and "northwestern university"]
Clouds ~ hosting
Economies of scale
Virtualization
Dynamically-scalable resources delivered on demand over the Internet
[Figure: 100 gigabit/sec network linking NERSC, ALCF, and OLCF through hubs in Sunnyvale, Chicago, Nashville, and NYC]
Industry
Academia/Government
Magellan
FutureGrid
Open-source middleware
Nimbus
Eucalyptus
OpenNebula
Business model
Architecture
Resource management
Programming model
Application model
Security model
Grids:
Largest Grids funded by government
Largest user base in academia and government labs, driving scientific computing
Project-oriented: usage charged in service units (CPU hours)
Clouds:
Industry (e.g., Amazon) funded the initial Clouds
Large user base among individuals, small businesses, large businesses, and a bit of open-science research
Utility computing: pay real money for usage (see the cost sketch below)
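As a rough illustration of the utility-computing model, the sketch below estimates a monthly bill from metered usage; the hourly rates and usage numbers are illustrative assumptions, not actual provider pricing.

```python
# Back-of-the-envelope utility-computing bill: pay only for what is metered.
# All rates and usage figures below are illustrative assumptions.

HOURLY_RATE_PER_INSTANCE = 0.10   # assumed $/instance-hour
STORAGE_RATE_PER_GB_MONTH = 0.10  # assumed $/GB-month
TRANSFER_RATE_PER_GB = 0.09       # assumed $/GB transferred out

def monthly_bill(instances, hours, storage_gb, transfer_gb):
    compute = instances * hours * HOURLY_RATE_PER_INSTANCE
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH
    transfer = transfer_gb * TRANSFER_RATE_PER_GB
    return compute + storage + transfer

# Example: 20 instances for 300 hours, 500 GB stored, 200 GB transferred out.
print(f"${monthly_bill(20, 300, 500, 200):,.2f}")  # -> $668.00
```

Contrast this with the Grid model, where a project is granted a fixed allocation of service units up front and jobs draw the allocation down.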
Animoto
Makes it really easy for people to create videos with their own photos and music
Grids:
Layered protocol architecture (Fabric, Connectivity, Resource, Collective, Application)
Clouds:
Application Layer: Software as a Service (SaaS)
Platform Layer: Platform as a Service (PaaS)
Unified Resource Layer: Infrastructure as a Service (IaaS)
Fabric Layer: Infrastructure as a Service (IaaS) (see the provisioning sketch below)
[Figure: Cloud Architecture stack: Application, Platform, Unified Resource, Fabric]
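In the stack above, the Fabric and Unified Resource layers are what IaaS exposes. To make that concrete, here is a minimal provisioning sketch using the boto3 library against an EC2-style endpoint; the AMI ID, instance type, and region are placeholders, and error handling is omitted.

```python
# Minimal IaaS provisioning sketch (AMI ID, instance type, and region are
# placeholders, not real values).
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# Ask the Unified Resource layer for one small virtual machine.
instances = ec2.create_instances(
    ImageId="ami-12345678",   # placeholder AMI
    InstanceType="t2.micro",  # placeholder instance type
    MinCount=1,
    MaxCount=1,
)

instance = instances[0]
instance.wait_until_running()  # block until the VM is up
instance.reload()              # refresh cached attributes
print(instance.id)

# When done, release the resource (pay-per-use ends here).
instance.terminate()
```

PaaS and SaaS sit above this: the platform layer hides the virtual machines entirely, and the application layer delivers the finished software over the network.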
Compute Model: batch-scheduled (Grids) vs. time-shared (Clouds)
Data Model
Data locality
Combining compute and data management
Virtualization: slow adoption (Grids) vs. a central component (Clouds)
Monitoring
Provenance
Grids:
Tightly coupled
High Performance Computing (MPI-based)
Loosely coupled
High Throughput Computing
Workflows
Data intensive
Map/Reduce
Clouds:
Loosely coupled, transaction-oriented (see the sketch below contrasting the two styles)
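To contrast the two programming styles, here is a small, self-contained map/reduce-style word count in Python (the loosely coupled, data-intensive style); a tightly coupled MPI code would instead exchange messages explicitly between ranks (e.g., via mpi4py). The input lines are made up for illustration.

```python
# Loosely coupled map/reduce-style word count: independent map tasks,
# a shuffle by key, and a reduce per key. No inter-task communication.
from collections import defaultdict
from multiprocessing import Pool

def map_task(line):
    """Emit (word, 1) pairs for one line; each call is independent."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_task(item):
    """Sum the counts for a single word."""
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    lines = [
        "clouds are loosely coupled",
        "grids run tightly coupled MPI jobs",
        "clouds and grids compared",
    ]

    with Pool() as pool:
        mapped = pool.map(map_task, lines)          # map phase

    shuffled = defaultdict(list)                    # shuffle/group by key
    for pairs in mapped:
        for word, count in pairs:
            shuffled[word].append(count)

    counts = dict(map(reduce_task, shuffled.items()))  # reduce phase
    print(counts["clouds"], counts["coupled"])         # -> 2 2
```

Because the map and reduce tasks never communicate with each other, they tolerate stragglers and failures far more easily than a tightly coupled MPI job, which is why this style fits clouds well.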
Multicore processors
Massive task parallelism
Massive data parallelism
Integrating black-box applications
Complex task dependencies (task graphs; see the sketch below)
Failure and other execution management issues
Dynamic task graphs
Documenting provenance of data products
Data management: input, intermediate, output
Dynamic data access involving large amounts of data
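As a minimal illustration of executing a task graph with dependencies, the sketch below runs each task only after its prerequisites finish; the task names and dependency structure are made up for illustration.

```python
# Minimal task-graph execution: run each task only after all of its
# dependencies have completed (topological order via Kahn's algorithm).
from collections import deque

def run_task_graph(tasks, deps):
    """tasks: {name: callable}; deps: {name: set of prerequisite names}."""
    indegree = {name: len(deps.get(name, set())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(name)

    ready = deque(n for n, d in indegree.items() if d == 0)
    while ready:
        name = ready.popleft()
        tasks[name]()                       # execute the task
        for child in dependents[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

# Hypothetical four-task workflow: stage_in -> (analyze_a, analyze_b) -> merge
tasks = {
    "stage_in":  lambda: print("stage input data"),
    "analyze_a": lambda: print("analyze partition A"),
    "analyze_b": lambda: print("analyze partition B"),
    "merge":     lambda: print("merge results"),
}
deps = {
    "analyze_a": {"stage_in"},
    "analyze_b": {"stage_in"},
    "merge": {"analyze_a", "analyze_b"},
}
run_task_graph(tasks, deps)
```

A production workflow system adds the pieces listed above on top of this skeleton: retry on failure, provenance recording, and staging of input, intermediate, and output data.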
Clouds:
Standard interface to Clouds (see the sketch below)
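One way the community has pursued a uniform programmatic interface across providers is through abstraction libraries such as Apache Libcloud; the sketch below is a minimal example, with placeholder credentials, and constructor arguments that can vary by provider and library version.

```python
# Sketch of a provider-neutral cloud interface via Apache Libcloud.
# Credentials and regions are placeholders; exact constructor arguments
# differ slightly between Libcloud versions and providers.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def list_node_names(provider, key, secret, **kwargs):
    """Same code path regardless of which cloud backs the driver."""
    driver_cls = get_driver(provider)
    driver = driver_cls(key, secret, **kwargs)
    return [node.name for node in driver.list_nodes()]

# Hypothetical usage against two different providers:
# print(list_node_names(Provider.EC2, "KEY", "SECRET", region="us-east-1"))
# print(list_node_names(Provider.OPENSTACK, "user", "password",
#                       ex_force_auth_url="https://keystone.example.org"))
```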
Grids:
Grid Security Infrastructure (GSI)
Stronger, but with a steeper learning curve and longer wait time
Personal verification: phone, manager, etc.
Clouds:
Weaker: can use a credit card to gain access, can reset a password over plain-text email, etc.
Distributed Systems
Cluster Computing
Grid Computing
Supercomputing
Cloud Computing
Manycore Computing
Petascale and Exascale Computing
Data-Intensive Computing
Today (2010): Multicore Computing
1~12 cores in commodity architectures
80 cores in proprietary architectures
480 GPU cores
Near future (~2018): Manycore Computing
~1000 cores in commodity architectures
[Figure: number of cores and manufacturing process vs. year, 2004-2018]
Today: 5K~50K nodes, 50K~200K processor-cores
Near future (~2018): Exascale Computing
~1M nodes (20X~200X)
~1B processor-cores/threads (5000X~20000X)
[Figure: Top500 Core-Max and Core-Sum over time; source: http://www.top500.org/]
Amazon in 2018
Will likely look similar to exascale computing
100K~1M nodes, ~1B cores, ~1M disks
$100M~$200M in costs + $10M~$20M/year in energy (see the energy sketch below)
Revenues 100X~1000X of what they are today
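A back-of-the-envelope sketch of how an annual energy figure in that range can be estimated; the per-node power draw, PUE, and electricity price below are assumptions for illustration, not figures from the slides.

```python
# Rough annual energy cost estimate for a large datacenter.
# Node power, PUE, and electricity price are illustrative assumptions.

NODES = 100_000            # lower end of the 100K~1M node range
WATTS_PER_NODE = 150       # assumed average draw per node
PUE = 1.2                  # assumed power usage effectiveness (cooling overhead)
PRICE_PER_KWH = 0.07       # assumed $/kWh
HOURS_PER_YEAR = 24 * 365

kwh_per_year = NODES * WATTS_PER_NODE / 1000 * PUE * HOURS_PER_YEAR
annual_cost = kwh_per_year * PRICE_PER_KWH
print(f"~${annual_cost / 1e6:.1f}M per year")   # -> ~$11.0M per year
```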
Power efficiency
Will limit the number of cores on a chip (Manycore)
Will limit the number of nodes in a cluster (Exascale and Cloud)
Will dictate a significant part of the cost of ownership
Programming models/languages
Automatic parallelization
Threads, MPI, workflow systems, etc.
Functional, imperative
Languages vs. middleware
Reliability
How to keep systems operational in the face of failures
Checkpointing (Exascale; see the sketch below)
Node-level replication enabled by virtualization (Exascale and Clouds)
Hardware redundancy and hardware error correction (Manycore)
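A minimal checkpoint/restart sketch to make the idea concrete: application state is periodically saved to disk so a long computation can resume after a failure. The file name, state layout, and checkpoint interval are made-up illustrations.

```python
# Minimal checkpoint/restart: periodically persist state so a long-running
# loop can resume after a crash instead of starting over.
import os
import pickle

CHECKPOINT_FILE = "state.ckpt"   # hypothetical checkpoint path
CHECKPOINT_EVERY = 1000          # iterations between checkpoints

def load_checkpoint():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)
    return {"iteration": 0, "total": 0.0}   # fresh start

def save_checkpoint(state):
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)        # atomic rename avoids torn files

state = load_checkpoint()
for i in range(state["iteration"], 10_000):
    state["total"] += i * 0.5               # stand-in for real computation
    state["iteration"] = i + 1
    if state["iteration"] % CHECKPOINT_EVERY == 0:
        save_checkpoint(state)

save_checkpoint(state)
print(state["total"])
```

At exascale the cost of writing such checkpoints to shared storage becomes a central concern, which motivates the distributed file system discussion that follows.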
Decentralization is critical
Computational resource management (e.g., local resource managers, LRMs)
Storage systems (e.g., parallel file systems)
Critical issues:
Must mimic parallel file system interfaces and features in order to gain wide adoption
Must handle some workloads currently run on parallel file systems significantly better
More information:
http://www.cs.iit.edu/~iraicu/
iraicu@cs.iit.edu
Questions?