CS-524(NED) Lec 01
Today's Agenda
Getting to know each other
Describing our roles to make this course a
real success
Overview of the Course
My Role
Continuously strive to expose you to the
subject knowledge in a manner that helps
save your time in getting hold of details
Your Role
Continuously strive to be regular in every aspect
Schedule some time to review lectures before
coming to class
Take sessional work seriously
Ask questions. There are NO stupid questions
Learning-centered approach
You learn as well as earn a good grade
Grading-centered approach
You may get a good grade but you never learn
Academic Calendar
9 weeks Teaching
22nd June, 2009 to 22nd August, 2009
7 weeks Teaching
26th September, 2009 to 14th November, 2009
Final Examinations
1st December, 2009 to 15th December, 2009
Results Declaration
Last week of December, 2009
Books
Topics
Introduction
Communication
Processes
Naming
Synchronization
Consistency and Replication
Fault Tolerance
Security
* We shall add topics to this list if time permits
Course Objectives
Grading
Quizzes: 05%
3 announced quizzes (weeks 3, 6 and 12)
2 surprise quizzes
2 announced and 1 surprise quiz will be graded
Homework: 05%
Class Participation: 05%
Term Paper: 05%
Mid-Term (09th Week): 10%
Final: 70%
No early or makeup exams please!
Definition # 1
A collection of independent computers that
act as an integrated system and hence
appear to the end user as a single
computer (i.e. a virtual uniprocessor)
Two aspects
Hardware: autonomous machines
Software: users think they're dealing with a
single system
Definition # 1
User's view of a Distributed System:
Multiple computers that work together in a more or
less seamless fashion (single system image)
RPC
Remote Procedure Call
RMI
Remote Method Invocation
ONC RPC
Open Network Computing Remote Procedure Call is a widely deployed
remote procedure call system.
ONC was originally developed by Sun
Microsystems as part of their Network File
System project, and is sometimes referred
to as Sun ONC or Sun RPC
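The idea behind a remote procedure call can be sketched in a few lines. The following is a minimal in-process illustration, not ONC RPC itself: the JSON message format, the RpcServer and RpcClient classes, and the add procedure are all assumptions made for this example (ONC RPC has its own wire encoding and uses a real network transport).

```python
import json


class RpcServer:
    """Holds the real procedures and executes requests that arrive as bytes."""

    def __init__(self):
        self._procedures = {}

    def register(self, name, func):
        self._procedures[name] = func

    def handle(self, request: bytes) -> bytes:
        # Unmarshal the request, run the named procedure, marshal the reply.
        call = json.loads(request.decode("utf-8"))
        try:
            result = self._procedures[call["proc"]](*call["args"])
            reply = {"ok": True, "result": result}
        except Exception as exc:  # report the failure back to the caller
            reply = {"ok": False, "error": repr(exc)}
        return json.dumps(reply).encode("utf-8")


class RpcClient:
    """Client stub: makes a remote procedure look like a local method."""

    def __init__(self, transport):
        # transport: any callable taking request bytes and returning reply bytes.
        self._transport = transport

    def __getattr__(self, proc):
        def stub(*args):
            request = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
            reply = json.loads(self._transport(request).decode("utf-8"))
            if not reply["ok"]:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return stub


server = RpcServer()
server.register("add", lambda a, b: a + b)

# Here the "network" is a direct function call; a real system would send
# the bytes over a socket to another machine.
client = RpcClient(server.handle)
```

Calling client.add(2, 3) looks like a local call, but the arguments travel as a byte string and the result comes back the same way. Swapping the transport for a socket would not change the calling code.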
Definition # 2
Enslow:
A distributed system is one in which
hardware, control, and data achieve some
degree of decentralization, and the distribution
of resources is transparent to the user
Definition # 3
An Intimidating Definition
A distributed system is one in which the failure of
a computer you didn't even know existed can
render your own computer unusable
(Leslie Lamport)
Internet
Mobile and Ubiquitous Computing
P2P Systems
Sensor Networks
Distributed Mobile Robots
Air Traffic Control (ATC) System
Banking, Stock Markets, Stock Brokerages
Health Care, Hospital Automation
Control of Power Plants, Electric Grid
Telecommunications Infrastructure
Motivation (1)
Motivation (2)
Improved PCR
The parallelism of distributed systems reduces
processing bottlenecks and provides improved all-around performance at much lower cost.
Resource Sharing
Distributed systems can efficiently support information
and resource (hardware and software) sharing for
users at different locations.
Motivation (3)
Fault Tolerance
With the multiplicity of storage units and processing
elements, distributed systems have the potential ability to
continue operation in the presence of failures in the
system.
Scalability
Distributed systems are capable of incremental growth and
have the added advantage of facilitating modification or
extension of a system to adapt to a changing environment
without disrupting its operations.
Think of upgrading a mainframe or supercomputer!
June 24, 2009
Motivation (4)
Distribution as an Artifact
Distribution may be an artifact of an engineering solution to
satisfy some specific requirements such as
Fault-tolerance
Load-balancing
Minimum level of Quality of Service (QoS)
Functional Distribution
Computers have different functional capabilities
Client / server
Host / terminal
Data gathering / data processing
Driving Forces
There are two main stimuli for the current
interest in distributed systems:
Technological Enhancement
microelectronics
fast and inexpensive processors
communication
highly efficient computer networks
User Needs
many enterprises are cooperative in nature
Cluster Computing
A collection of similar processors (PCs, workstations)
running the same (commodity) operating system,
connected by a high-speed network.
Runs parallel programs
Popular because they offer parallel computing
capabilities using inexpensive PC hardware; an
organization may be able to capitalize on machines it
already has.
Microsoft, Sun, and others sell clustering software and
you can also buy turnkey systems
Fabric Layer
interfaces to local resources
Connectivity Layer
protocols to support usage of
multiple resources for a single
application; e.g., access a
remote resource or transfer
data between sites
Resource Layer
manages a single resource
Collective Layer
services for resource discovery,
resource allocation, resource
scheduling, etc.
Interacts with the connectivity
and resource layers
Application Layer
applications within a virtual
organization (V.O.) which share
the grid computing resources.
Monitoring a person in a pervasive electronic health care system, using (a) a local
hub or (b) a continuous wireless connection.
Sensor Networks
Organizing a sensor network database, while storing and processing data only at the
sensors.
Resource Accessibility
Security
Concurrency
Heterogeneity
Transparency
Openness
Scalability
Reliability
Lack of Global Clock and Global State
Resource Accessibility
Support user access to remote resources (printers, data
files, web pages, CPU cycles) and the fair sharing of the
resources
making it convenient to share resources
Security
Sharing, as always, introduces security issues
Confidentiality
avoiding the disclosure of the content of a message to a party
distinct from the intended receiver
Integrity
avoiding the corruption of the transmitted contents by a third
party
Availability
the capability of providing a service in all circumstances
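Of the three properties above, integrity is the easiest to show in code. The following is a minimal sketch using a keyed hash (HMAC) from Python's standard library; the shared key, the message text, and the function names make_tag and is_intact are assumptions for illustration. It covers integrity only; it does not provide confidentiality or availability.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that the sender attaches to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag at the receiver and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"demo-key-known-to-both-parties"  # hypothetical key for the example
message = b"transfer 100 to account 42"
tag = make_tag(shared_key, message)
```

If a third party changes the message in transit, the receiver's recomputed tag no longer matches and is_intact returns False.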
Concurrency
Resources can be shared by clients in a
distributed system, therefore several clients may
access a shared resource at the same time
It is not acceptable for each request to be processed
in turn; the system must be able to process requests
concurrently
For each object that represents a shared
resource, its operations must be synchronized in
such a way that its data remains consistent
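The synchronization requirement above can be sketched with threads standing in for clients. This is a minimal illustration, assuming a single process; the SharedCounter class and the run_clients helper are hypothetical names, and a real distributed system would need synchronization across machines rather than a local lock.

```python
import threading


class SharedCounter:
    """A shared resource whose operations are synchronized with a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one client at a time may run the read-modify-write step.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_clients(counter, clients=8, requests_per_client=10_000):
    """Start several concurrent clients and return the final counter value."""

    def client():
        for _ in range(requests_per_client):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Because every increment happens under the lock, the final value always equals clients times requests_per_client, regardless of how the threads interleave.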
Heterogeneity - I
Networks: differences are masked by the fact that all of the computers use the Internet
protocols to communicate.
Operating Systems: do not provide the same application API to the Internet protocols.
Middleware
Software layer that abstracts from the above providing a uniform computational model
All middleware deals with the differences in operating systems and hardware.
Heterogeneity - II
Mobile Code
Code that can be sent from one computer to another and run
at the destination (e.g. Java applets).
Machine code suitable for running on one type of computer
hardware is not suitable for running on another.
Transparency
A distributed system that appears to its users &
applications to be a single computer system is
said to be transparent.
Users & applications should be able to access
remote resources in the same way they
access local resources.
Aims to conceal the component-based structure
of the system, and facilitate a perception of the
system as a whole
Access Transparency
Hides differences in data representation, different architectures and filename conventions of machines
Enables interoperability
Location Transparency
Hides location of resource i.e. the user can use the resource without
being aware of its location
The key is naming
E.g. URLs, email, etc.
(Access + Location) Transparency = Network Transparency
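One concrete case of access transparency is hiding differences in data representation, such as byte order. The following is a minimal sketch, assuming a made-up message layout (a 32-bit sensor id followed by a 64-bit reading); the names WIRE_FORMAT, encode_reading, and decode_reading are hypothetical. Both sides agree on network byte order, so neither needs to know the other's native representation.

```python
import struct

# "!" selects network (big-endian) byte order with no padding:
# I = unsigned 32-bit integer, d = 64-bit floating point number.
WIRE_FORMAT = "!Id"


def encode_reading(sensor_id: int, value: float) -> bytes:
    """Convert native values into the agreed wire representation."""
    return struct.pack(WIRE_FORMAT, sensor_id, value)


def decode_reading(data: bytes):
    """Convert the wire representation back into native values."""
    return struct.unpack(WIRE_FORMAT, data)


# The same integer has different in-memory layouts on different machines.
big_endian = struct.pack(">I", 1)     # bytes 00 00 00 01
little_endian = struct.pack("<I", 1)  # bytes 01 00 00 00
```

Middleware typically performs this kind of conversion on behalf of the application, which is what lets machines with different architectures interoperate.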
Migration Transparency
Hides from the user that the resource being used has moved to another
location
Relocation Transparency
Hides from the user that the resource being used is being moved
Enables mobile computing
Persistence Transparency
Hides whether a resource is in memory or on disk
Replication Transparency
Hides that multiple copies of the resource exist (for reliability and/or availability)
Concurrency Transparency
Failure Transparency
Scaling Transparency
Allows system and applications to expand without need to change structure or application
algorithms
Performance Transparency
Adaptation of the system to varying load situations without the user noticing it
Degrees of Transparency
Performance
e.g. multiple attempts to contact a remote server can slow down the
system; should you report failure and let the user cancel the request?
Convenience
e.g. direct the print request to my local printer, not one on the next floor
Openness - I
Services should follow agreed-upon rules on component
syntax & semantics for interoperability and portability
Using interfaces, any process that needs a service
should be able to communicate with a process that
provides the service.
Multiple implementations of the same service may be
provided, as long as the interface is maintained
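The role of interfaces in openness can be sketched as follows. This is a minimal illustration, assuming a hypothetical key-value service; the names KeyValueService, InMemoryStore, AppendOnlyStore, and client are invented for the example. The client depends only on the interface, so either implementation can be substituted.

```python
from typing import Dict, List, Protocol, Tuple


class KeyValueService(Protocol):
    """The agreed-upon interface: operation names, arguments, and results."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


class InMemoryStore:
    """One implementation: a dictionary that overwrites old values."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> str:
        return self._items[key]


class AppendOnlyStore:
    """Another implementation: a log that is scanned from the newest entry."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def get(self, key: str) -> str:
        for stored_key, stored_value in reversed(self._log):
            if stored_key == key:
                return stored_value
        raise KeyError(key)


def client(service: KeyValueService) -> str:
    """Uses the service only through its interface."""
    service.put("course", "CS-524")
    service.put("course", "CS-524 (NED)")
    return service.get("course")
```

Both client(InMemoryStore()) and client(AppendOnlyStore()) return the same value, even though the two stores are built differently inside.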
Openness - II
Interoperability
The ability of two different systems or applications to work together by relying on
each other's services as specified by a common standard
Portability
The ability of an application designed to run on distributed system A to run on
distributed system B which implements the same interface, without modification
Extensibility
If a distributed system is open (implements standard interfaces) it should be
possible to add and delete components without affecting the system as a whole.
e.g., replace the file system
Scalability - I
Scalability - II
With respect to size
With respect to geographical distribution
With respect to the number of administrative
organizations it spans
Most systems account only, to a certain extent, for
size scalability.
Today, the challenge lies in geographical and
administrative scalability.
Size Scalability
The more users and resources a system has, the harder
it is to support a centralized model.
Scalability is affected when the system is based on
Centralized server
one for all users
Centralized data
a single database for all users
Centralized algorithms
e.g. for routing: one site collects all information,
processes it, distributes the results to all sites
Size Scalability
A single centralized server, running on a single machine,
can saturate if the workload becomes too heavy.
Communication links around the server can limit
performance, as well
Centralized data storage is impractical for large
databases
Size Scalability
Centralized algorithms rely on a central coordinator that
collects data from all sites in the network and then
makes decisions.
Complete knowledge: good
Size Scalability
Decentralized or Distributed Algorithms
No machine has complete information about the
system state
Machines make decisions based only on local
information
Failure of a single machine doesn't ruin the algorithm
There is no assumption that a global clock exists.
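The properties listed above can be illustrated with a small averaging example. This is a minimal sketch, assuming nodes arranged in a ring of at least three machines; the function names are hypothetical. Each node repeatedly replaces its value with the average of its own value and those of its two neighbours, so no node ever sees the whole system, yet all nodes converge to the global average. The rounds are run in lockstep here only to keep the simulation short; that is a simplification, not part of the technique.

```python
def local_averaging_round(values):
    """One round on a ring: each node averages itself with its two neighbours."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def run_rounds(values, rounds):
    """Apply the local rule repeatedly and return the final node values."""
    for _ in range(rounds):
        values = local_averaging_round(values)
    return values
```

Starting from [10.0, 0.0, 0.0, 0.0, 0.0], the values approach 2.0 at every node, which is the average a central coordinator would have computed from complete information.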
Geographic Scalability
Early distributed systems ran on LANs and relied on
synchronous communication
The requesting client blocks until it gets a response,
which makes it hard to scale to wide-area networks
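The cost of blocking on every request can be sketched with simulated latency. This is a minimal illustration, assuming a stand-in remote_call function that simply sleeps; the function names and site names are hypothetical. The blocking version waits for each reply before sending the next request, while the overlapped version issues all requests first and collects the replies afterwards.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_call(site: str, latency: float = 0.05) -> str:
    """Stand-in for a wide-area request; the sleep models network latency."""
    time.sleep(latency)
    return f"reply from {site}"


def fetch_blocking(sites):
    """Synchronous style: wait for each reply before sending the next request."""
    return [remote_call(site) for site in sites]


def fetch_overlapped(sites):
    """Asynchronous style: issue all requests, then collect the replies."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [pool.submit(remote_call, site) for site in sites]
        return [future.result() for future in futures]
```

Both functions return the same replies in the same order. The blocking version pays roughly one latency per site, while the overlapped version pays roughly one latency in total, which is why hiding latency matters more as distances grow.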
Administrative Scalability
Different domains may have different
policies about resource usage,
management, security, etc.
Trust often stops at administrative
boundaries
Scaling Techniques
Scalability affects performance more than
anything else.
Three techniques to improve scalability:
Hiding Communication Latencies
Distribution
Replication
Scalability - Amazon.com
Today Amazon has about 150 web services on its homepage alone.
Distribution
Instead of one centralized service, divide into
parts and distribute them geographically
Example: DNS namespace is organized as a
tree of domains; each domain is divided into
zones; names in each zone are handled by a
different name server
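The DNS example above can be sketched as a toy resolver. This is a simplified model, not the real DNS protocol: the NameServer class, the resolve function, and the example names and address are assumptions made for illustration. Each server knows only its own zone and which child zones it has delegated, and a lookup follows referrals from the root downwards.

```python
class NameServer:
    """Handles the names of one zone and knows which child zones it delegates."""

    def __init__(self, zone):
        self.zone = zone
        self.records = {}      # full name -> address
        self.delegations = {}  # child zone suffix -> NameServer

    def lookup(self, name):
        if name in self.records:
            return ("answer", self.records[name])
        # Pick the most specific delegated zone that contains the name.
        best = None
        for suffix, server in self.delegations.items():
            if name == suffix or name.endswith("." + suffix):
                if best is None or len(suffix) > len(best[0]):
                    best = (suffix, server)
        if best is not None:
            return ("referral", best[1])
        return ("unknown", None)


def resolve(root, name):
    """Follow referrals from the root; return (address, zones visited)."""
    server, visited = root, []
    while True:
        visited.append(server.zone)
        kind, value = server.lookup(name)
        if kind == "answer":
            return value, visited
        if kind == "unknown":
            return None, visited
        server = value  # referral: continue at the delegated name server


# Hypothetical zone tree for the example.
example_zone = NameServer("example.com")
example_zone.records["www.example.com"] = "192.0.2.10"
com_zone = NameServer("com")
com_zone.delegations["example.com"] = example_zone
root_zone = NameServer(".")
root_zone.delegations["com"] = com_zone
```

Resolving www.example.com visits the root, then com, then example.com. No single server holds the whole namespace, which is the point of distributing the service.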
Replication
Replication: multiple identical copies of
something
Replication
Increases availability
Improves performance through load balancing
May avoid latency by improving proximity of
resource
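The availability and load-balancing benefits above can be sketched as follows. This is a minimal illustration with read-only replicas, so consistency is not an issue here; the Replica and ReplicatedService classes are hypothetical. Reads are spread over the replicas in round-robin order, and a replica that is down is skipped.

```python
import itertools


class Replica:
    """One copy of the data; it can be marked as failed."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedService:
    """Spreads reads over replicas and skips the ones that have failed."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._next = itertools.cycle(range(len(self._replicas)))

    def read(self, key):
        # Try each replica at most once, starting at the round-robin position.
        for _ in range(len(self._replicas)):
            replica = self._replicas[next(self._next)]
            try:
                return replica.name, replica.read(key)
            except ConnectionError:
                continue
        raise ConnectionError("no replica available")
```

With three replicas, successive reads are served by each replica in turn. If one replica fails, reads continue on the remaining two, which is the availability gain that replication buys.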
Replication - Caching
Caching is a form of replication
Normally creates a (temporary) replica of
something closer to the user
User decides to cache, system decides to
replicate
Replication is more permanent
Both lead to consistency problems
Replication - Caching
Having multiple copies (cached or replicated) leads to
inconsistencies:
modifying one copy makes that copy different from the rest.
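The inconsistency described above is easy to reproduce. The following is a minimal sketch, assuming a hypothetical Origin store and a CachingClient that keeps local copies; explicit invalidation is used here as one simple way to restore consistency, not the only one.

```python
class Origin:
    """The authoritative copy of the data."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data[key]


class CachingClient:
    """Keeps a local copy of each value it has read from the origin."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def read(self, key):
        # Serve from the local copy if present; otherwise fetch and remember.
        if key not in self._cache:
            self._cache[key] = self._origin.read(key)
        return self._cache[key]

    def invalidate(self, key):
        # Drop the local copy so the next read goes back to the origin.
        self._cache.pop(key, None)
```

If the origin is updated after the client has cached a value, the client keeps returning the old value until its copy is invalidated. That stale read is the consistency problem caching and replication share.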
Techniques
Failure Detection
message checksum
Failure Masking
making a detected failure hidden or less severe
email retransmission
Tolerating Failures
Web pages (informing users about failure)
Failure Recovery
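Two of the techniques above, detection by checksum and masking by retransmission, can be sketched together. This is a minimal illustration, assuming a CRC-32 checksum appended to each message and a hypothetical FlakyChannel that corrupts its first few transmissions; the function names are invented for the example.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so the receiver can detect corruption."""
    return payload + struct.pack("!I", zlib.crc32(payload))


def check(framed: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    payload = framed[:-4]
    (expected,) = struct.unpack("!I", framed[-4:])
    return payload if zlib.crc32(payload) == expected else None


class FlakyChannel:
    """Hypothetical channel that corrupts the first few transmissions."""

    def __init__(self, corrupt_first: int):
        self._remaining = corrupt_first

    def transmit(self, framed: bytes) -> bytes:
        if self._remaining > 0:
            self._remaining -= 1
            return bytes([framed[0] ^ 0xFF]) + framed[1:]  # flip one byte
        return framed


def send_reliably(channel, payload: bytes, max_attempts: int = 3):
    """Mask transient failures by retransmitting; give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        received = check(channel.transmit(frame(payload)))
        if received is not None:
            return received, attempt
    raise ConnectionError("message could not be delivered intact")
```

When the channel corrupts only the first two transmissions, the caller still receives the correct payload on the third attempt and never sees the failures. When every attempt fails, the error is reported instead of hidden, which is the point where masking gives way to tolerating the failure.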
Summary
It's difficult to design a good distributed system: there are a lot of problems
in getting good characteristics, not the least of which is people.