Performance Analysis of Scheduling Techniques
in Distributed Parallel Environments

Giulianna Marega Marques, Ricardo José Sabatine, Kalinka R. L. J. Castelo Branco
LAS – Laboratório de Arquitetura e Sistemas Computacionais
UNIVEM – Centro Universitário “Eurípides Soares da Rocha” de Marília
Marília – São Paulo – Brasil – Caixa Postal 2041
{giuliannamm, sabatine, kalinka}@univem.edu.br
978-1-4244-1706-3/08/$25.00 ©2008 IEEE.

Abstract - This paper presents alternatives for process scheduling and compares their performance. The resulting environments were evaluated with a parallel medical image processing application, using execution time as the basis of comparison.

I. INTRODUCTION

Obtaining load indexes is not trivial, given the dynamic and non-deterministic nature of the applications executed; good load indexes, however, can improve the global performance observed in the system. Collecting them can generate more traffic than expected and thereby hinder the performance of the system as a whole, depending on how frequently load information is collected to calculate the index.

A distributed parallel computational system is usually composed of processing elements that are heterogeneous both in configuration and in architecture. This characteristic affects the load indexes and consequently the load scheduling, which may or may not improve the final performance of the platform as a whole.

As there was no existing literature on the subject, Branco [3] proposed a performance index that provides information on the workload and the operating state of each system element involved in the process, considering different sorts of heterogeneity.

This shows the importance of process scheduling in distributed parallel environments, as well as the obstacles to achieving good performance.

Thus, the objective of this article is to offer alternatives for process scheduling based on load index collection and to compare their performance. To that end, the message-passing library was instrumented so that its process scheduler could base load scheduling on load and performance indexes. The resulting environments were evaluated with a parallel medical image processing application, using execution time as the basis of comparison.

II. PERFORMANCE AND LOAD INDEXES

A load index is a metric that quantifies the workload submitted to a system resource [9][15]; its objective is to indicate whether the resource analyzed is idle, moderately loaded, or overloaded [3].

A resource is idle when its workload is nonexistent or very small, and it is then able to receive load. A resource is moderately loaded when its workload is regular, and it can still receive more work. A resource is overloaded when its workload exceeds a given limit; it must then transfer load and cannot receive any.

A load index can be defined as a non-negative integer variable that is zero when the resource is idle and increases as the load on the resource increases [9][15][3].

The quality of the load index is tied to the performance of process scheduling, since the index is part of the scheduling algorithm. Therefore, the way system load information is collected and used, and the periodicity of the collection, influence the efficiency of load balancing.

In distributed parallel computational systems, communication costs must be taken into account. If the frequency with which loads are observed in the processing elements does not match the real need, undesired results are likely.

If the load information is updated too often, the interconnection network is overloaded and the general performance of the system is reduced; if it is not updated often enough, load balancing works with stale information and is carried out incorrectly, hindering the performance of the system. The purpose of the load index is therefore to predict future load behavior from current and past behavior and to provide this information to the scheduler so that it can carry out load balancing [9][25][3].

Many kinds of load index can be mentioned, among them instantaneous CPU queue length, average CPU queue length over a given period, CPU utilization, response time, normalized response time, amount of available memory, context switch rate, and others [9][25][3]. On the whole, they can be grouped into indexes based on the length of the resource's access queue, on the percentage of resource utilization, and on execution/response time [9][25].

Most of the load indexes in the literature are intended for environments with homogeneous configuration and architecture; indexes for environments that are heterogeneous in architecture and configuration cannot be found easily.

As no load index for heterogeneous environments was found in the literature, Branco [3] proposed a performance index that provides information on the workload and the operating state of each element of the system involved in the process, considering configurational, architectural, and temporal heterogeneity [4].

A good performance index, like a good load index, must be able to estimate the future from current and past values, and so it must be built on a good load index. Another important characteristic of a performance index is its age: because workloads are volatile, load and performance indexes are volatile as well [3].

The types of applications usually considered in the scheduling of distributed parallel applications are [9][15][25]:
• CPU-Bound: applications with high demand for processing and little input/output (I/O) activity;
• Memory-Bound: applications with high demand for memory;
• I/O-Bound: applications with high demand for input/output. They are also called Disk-Bound applications, defined as applications with high demand for disk, i.e., they need many disk accesses, both for reading and for writing;
• Network-Bound: applications with high demand for communication between processes, which results in heavy network traffic.

The performance index relies on the Euclidean distance between the origin (the point at which the machine is idle) and the resulting point, obtained from the load values of the machine before it receives a given application and the load vector imposed by that application. The machine best suited to receive the application is the one with the smallest Euclidean distance [3].

Taking into account the relationship among the different resources that make up a machine, and considering that the allocation of processes is carried out in a balanced way, the performance index can be obtained using equation (1):

(1)

Therefore, good computational performance of the distributed platform depends not only on the load index but also on how it is obtained: if the index is collected and used poorly, it can degrade the performance of the platform as a whole when it should improve it.

Since load collection and its periodicity affect the usefulness of load indexes, agents can significantly improve the collection stage as well as the calculation of these indexes.

III. MOBILE AGENTS

A mobile agent can be characterized as a program that acts autonomously in pursuit of a previously programmed objective. To that end, it can interact with other agents or mobile environments, if necessary [13]. In other words, it is free to move among the hosts of an interconnection network, carrying its code and state to another environment, where execution can resume [18].

Investigations that use mobile agents to implement various mechanisms in large-scale environments can be found in the literature, such as resource management, discovery of resources and services [11], and task scheduling [5].

The motivation for and interest in mobile agents stem from the benefits they bring to distributed system development. These benefits can be summarized as follows [2][17]: reduction in network traffic; overcoming latency; autonomous and asynchronous execution; support for heterogeneity; robustness and fault tolerance; interaction with real-time systems; instantaneous extension of services; and a paradigm for software development and updates.

Most of the disadvantages of mobile agents listed in the literature concern security and complexity. Security problems can be of three types: information disclosure, denial of service, and corruption of information [14]. Security issues, however, are outside the scope of this article.

IV. RELATED WORK

In [6], a load balancing framework is proposed that uses a mobile agent to distribute load among the nodes of a homogeneous cluster. Instead of balancing load by process migration, that is, by moving an entire process to a less loaded computer, the authors split processes into separate jobs and balance these among the nodes with their own load balancing method. The load balancing protocol was implemented using Aglet™, a Java-compliant mobile agent platform from IBM. Although it cannot be claimed that a mobile agent approach will always outperform load balancing based on message passing, such an alternative approach to designing load balancing protocols has several advantages. Preliminary experimental results showed the proposed framework to be effective in some respects.

The aim of [7] was to apply mobile agent technology to provide better scheduling for MPI applications running on a cluster. According to the authors, this approach could enhance the load balancing of the parallel processes in a distributed cluster environment. MPI in clusters of heterogeneous machines can give parallel programmers frustrating results, mainly because the workload is not evenly distributed in the cluster. For this reason, before an MPI application was submitted to a cluster, a JOTA mobile agent [7] was used to acquire more precise information about machine workload. With more precise knowledge of the load and characteristics of each machine, the lightly loaded workstations were used to form a better cluster. The empirical results indicate that a parallel application takes less elapsed time with the agent approach than in an ordinary MPI environment.
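The selection rule described in Section II (choose the machine whose resulting load point lies closest to the idle origin) can be illustrated with a minimal Python sketch. It is not the index of [3]: equation (1) is not reproduced here, so the sketch assumes that loads are already normalized per resource, that an application's load simply adds to the machine's current load, and that all resources weigh equally. The machine names, load vectors, and the thresholds in `classify` are hypothetical.

```python
import math


def classify(load, low=0.2, high=0.8):
    """Map a normalized load to the three states named in Section II.

    The thresholds are hypothetical; the paper gives no numeric limits.
    """
    if load <= low:
        return "idle"
    if load <= high:
        return "moderate"
    return "overloaded"


def performance_index(current_load, app_load):
    """Euclidean distance from the idle origin to the load point the machine
    would reach after receiving the application (assumed to be additive)."""
    resulting = [c + a for c, a in zip(current_load, app_load)]
    return math.sqrt(sum(x * x for x in resulting))


def pick_machine(machines, app_load):
    """Return the name of the machine with the smallest index."""
    return min(machines, key=lambda name: performance_index(machines[name], app_load))


# Hypothetical normalized loads per machine: (cpu, memory, disk, network).
machines = {
    "node-a": (0.70, 0.20, 0.10, 0.05),
    "node-b": (0.10, 0.30, 0.05, 0.05),
    "node-c": (0.40, 0.40, 0.40, 0.40),
}
# Hypothetical load vector of a CPU-bound application.
cpu_bound_app = (0.50, 0.10, 0.00, 0.00)

print(pick_machine(machines, cpu_bound_app))
```

With these made-up numbers the lightly loaded `node-b` is selected. A real index for a heterogeneous platform would also have to scale each component by the capacity of the machine, which the text attributes to standardized benchmarks.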
V. IMPLEMENTATION

Distributed systems must provide an improvement in performance, mainly concerning execution time, resource utilization, communication over the interconnection network, and hardware and software heterogeneity.

Given the advantages of mobile agents in distributed environments, it is believed that using them to collect load and performance indexes in these environments can reduce network traffic and, consequently, improve the general performance of the system in question.

To evaluate the performance of process scheduling that uses load indexes collected by mobile agents in distributed computational systems, the following was necessary: to change the scheduling carried out by the message-passing library JPVM (Java Parallel Virtual Machine) [8], whose default is round-robin; to develop a mobile agent (PAgent, Performance Agent) in the Java programming language on the μCode (muCode) environment [22]; to develop the resource monitor (PRM, Performance Resource Monitor), which was incorporated into the library; and to compare the resulting scheduling with round-robin scheduling [19].

When the modified JPVM daemon is started on each machine that composes the system, the PRM is started as well. The PRMs are responsible for collecting the raw load of the processor, memory, disk, and network by means of the dstat command (a Linux package installed to evaluate the performance of the machine's resources). They are also responsible for calculating the performance and load indexes of these resources, with the help of standardized benchmarks to deal with environment heterogeneity, if it exists.

When a machine is added to the distributed parallel virtual environment (as shown in Figure 1), a message is sent requesting that the μCode server software be started on a given port. After all the machines have been added, the PAgent is started by the master machine and travels through the environment. The PAgent is responsible for collecting the performance and load indexes calculated by the PRMs and for judging which machines are able to take part as load receivers in the load distribution process.

Figure 1: Distributed Parallel Environments using JPVM PAgent

Parallel medical image processing requires quality and quantity of hardware resources, i.e., strong computational power. Data loss and loss of precision are not acceptable, and processing must complete within a short time [1]. This has made medical image processing an interesting application for evaluating the real gain achieved by parallelization [18].

The basic requirement of a parallel image processing system is an infrastructure composed essentially of adequate data distribution and communication functions that can efficiently execute any image algorithm [1].

Smoothing and edge detection techniques were chosen in order to observe the cost/benefit ratio of applying distributed parallel computing, comparing scheduling with and without the help of mobile agents.

Smoothing techniques are used to reduce noise and to remove small details of an image before segmentation [12][20]. A common smoothing technique is the median filter, which replaces the value of a given pixel with the median value of its neighborhood. The median is the central value obtained when the pixels of the neighborhood are sorted.

Edge detection is another example of an algorithm that uses neighborhood-based operations. It is mainly used when one wants to know the size and shape of the objects represented in the image [21].

The edge detection process uses the concept of the gradient to enhance the points that differ greatly from their neighborhood. For an image f(x,y), where x and y are the spatial coordinates (row and column), the gradient of f at coordinates x and y is defined by equation 2 and its magnitude by equation 3.

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{bmatrix} \qquad (2)$$

$$\nabla f = \operatorname{mag}(\nabla f) = \left[ \left( \frac{\partial f}{\partial x} \right)^{2} + \left( \frac{\partial f}{\partial y} \right)^{2} \right]^{1/2} \qquad (3)$$

The gradient was implemented using Sobel operators [21], which calculate the approximate absolute value of the gradient at each point of the image analyzed; areas of high spatial frequency have high values and correspond to the image edges [12].

These algorithms were selected according to the resources they use. The median filter makes intensive use of memory, because it uses an auxiliary sub-vector, while the Sobel operator makes intensive use of the processor, as successive multiplications are carried out.

The algorithms are parallelized by dividing the image into parts distributed among the processors, so that many parts of the same image are processed at the same time. The master host divides the data and distributes it to the other hosts; each one processes the part it received and sends it back to the master, which recomposes the image.

VI. RESULTS

A homogeneous and a heterogeneous distributed parallel environment were set up. The homogeneous environment consisted of seven identical IBM Pentium 4 2.7 GHz machines with 512 MB of memory. The heterogeneous environment consisted of seven machines: a Pentium 4 2.66 GHz with 1024 MB of memory, three IBM Pentium 4 2.7 GHz machines with 512 MB of memory, two Pentium 4 1600 MHz machines with 256 MB of memory each, and a Pentium 3 733 MHz with 256 MB of memory. All the machines were connected by a 100 Mbps Ethernet network and ran Linux kernel 2.6.

To analyze the performance of the environments, the execution times of the parallel medical image processing algorithms were compared. The standard JPVM message-passing library was used first, followed by the JPVM instrumented to provide process scheduling. The algorithms used were:
• (M4P) median with 4 processes;
• (M7P) median with 7 processes;
• (M11P) median with 11 processes;
• (M14P) median with 14 processes;
• (S4P) Sobel with 4 processes;
• (S7P) Sobel with 7 processes;
• (S11P) Sobel with 11 processes;
• (S14P) Sobel with 14 processes.

A large mammography image in TIFF (Tagged Image File Format) with 16-bit resolution, almost 21 megabytes in size, was used. The median filter was evaluated with a 7x7 template, using quicksort as the sorting algorithm. The edge detection filter was executed with an 11x11 template. The template size defines the size of the neighborhood considered; depending on the type of image, a larger neighborhood yields a better processing result.

Each algorithm was executed 15 times, and average execution times were obtained for the following scenarios:
(I) Homogeneous environment using JPVM;
(II) Homogeneous environment using instrumented JPVM (with process scheduling using mobile agents);
(III) Heterogeneous environment using JPVM; and
(IV) Heterogeneous environment using instrumented JPVM (with process scheduling using mobile agents).

Figure 2 and Figure 3 show that the average execution times of the median filter and Sobel operator algorithms in the homogeneous environment were close for the standard JPVM and the instrumented JPVM.

Figure 2: Comparison of average execution time of the median filters in scenarios I and II. [Bar chart, homogeneous environment: execution time (ms) for M4P, M7P, M11P, and M14P; series JPVM Standard and JPVM PAgent.]

Figure 3: Comparison of average execution time of the edge detection filters in scenarios I and II. [Bar chart, homogeneous environment: execution time (ms) for S4P, S7P, S11P, and S14P; series JPVM Standard and JPVM PAgent.]

Figure 4 and Figure 5 show a marked difference in the average execution time of the algorithms in the heterogeneous environment between the scheduling of the standard JPVM and that of the instrumented JPVM. The gain achieved by the instrumented JPVM in the heterogeneous environment was:
• 37.95% when the median filter algorithm was performed with 4 processes;
• 8.09% when the median filter algorithm was performed with 7 processes;
• 29.72% when the median filter algorithm was performed with 11 processes;
• 3.44% when the median filter algorithm was performed with 14 processes;
• 37.49% when the edge detection filter algorithm was performed with 4 processes;
• 37.29% when the edge detection filter algorithm was performed with 7 processes;
• 6.59% when the edge detection filter algorithm was performed with 11 processes;
• 6.44% when the edge detection filter algorithm was performed with 14 processes.

Figure 4: Comparison of average execution time of the median filters in scenarios III and IV. [Bar chart, heterogeneous environment: execution time (ms) for M4P, M7P, M11P, and M14P; series JPVM Standard and JPVM PAgent.]

Figure 5: Comparison of average execution time of the edge detection filters in scenarios III and IV. [Bar chart, heterogeneous environment: execution time (ms) for S4P, S7P, S11P, and S14P; series JPVM Standard and JPVM PAgent.]

Figure 6 and Figure 7 show how environment heterogeneity influences system performance as a whole: each graph compares the same algorithms with the same message-passing library, but on different machines.

Figure 6: Comparison of average execution times of the algorithms in scenarios I and III. [Bar chart, standard JPVM: execution time (ms) for M4P to M14P and S4P to S14P; series Homogeneous and Heterogeneous.]

Figure 7: Comparison of average execution time of the algorithms in scenarios II and IV. [Bar chart, JPVM PAgent: execution time (ms) for M4P to M14P and S4P to S14P; series Homogeneous and Heterogeneous.]

A satisfactory gain in the execution time of the algorithms was achieved in the heterogeneous distributed parallel environment with the instrumented JPVM, due to the handling of heterogeneity built into its code. This handling consists of collecting raw resource loads, calculating performance and load indexes, and scheduling by means of mobile agents. The information collected by the agent for scheduling does not generate significant network traffic. Because load scheduling is based on this information, each processing element is assigned a load consistent with its computational power, which yields superior performance.

VII. CONCLUSIONS

This article showed the benefit of using mobile agents in parallel computing applied to medical image processing while keeping workload information (reflected in the performance and load indexes) up to date. It was thus possible to improve the execution time of an application that uses these indexes for load scheduling. As a secondary aim, the agent was meant to avoid generating unnecessary traffic in the interconnection network, which would affect system performance.

The results show a clear benefit from process scheduling that relies on mobile agents to collect performance and load indexes for parallel medical image processing: in the heterogeneous environment, the proposed environment reduced execution time by between 3.44% and 37.95% compared with the environment that uses the standard round-robin scheduling.

Future work could include performance evaluations of the proposed environment with other types of parallel applications, as well as a comparison of mobile agents developed in the μCode environment with agents developed in the Aglets environment.
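The two filters described in Section V can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the code used in the experiments: images are plain lists of rows, the median filter clamps coordinates at the image border, and the edge detector uses the classical 3x3 Sobel kernels rather than the 7x7 and 11x11 templates reported in Section VI, whose exact form is not given. The function names are hypothetical.

```python
import math


def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighborhood.

    Border handling by coordinate clamping is an assumption of this sketch.
    """
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            # Central value of the sorted neighborhood (odd window size).
            out[y][x] = window[len(window) // 2]
    return out


# Classical 3x3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Gradient magnitude as in equation (3); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


# A flat 5x5 image with one noisy pixel, and a 4x4 vertical step edge.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
step = [[0, 0, 100, 100] for _ in range(4)]

smoothed = median_filter(noisy)
edges = sobel_magnitude(step)
```

The median filter sorts every neighborhood, which matches the text's remark that it leans on memory for an auxiliary vector, while the Sobel operator performs a fixed series of multiplications per pixel. Both operate on local neighborhoods, so an image can be cut into bands that are processed independently, provided each band carries a margin of neighboring rows.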
REFERENCES

[1] BARBOSA, J. M. G. (2000) Paralelismo em Processamento e Análise de Imagem Médica. Thesis, Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Engenharia da Universidade do Porto.
[2] BIESZCZAD, A.; WHITE, T.; PAGUREK, B. (1998) Mobile Agents for Network Management. IEEE Communications Surveys, September.
[3] BRANCO, K. R. L. J. C. (2004) Índice de Carga e Desempenho em Ambientes Paralelos/Distribuídos – Modelagem e Métricas. Master's thesis, ICMC-USP.
[4] BRANCO, K. R. L. J. C.; MORENO, E. D. (2006) Load Indices - Past, Present and Future. In 2006 International Conference on Hybrid Information Technology (ICHIT'06), vol. 2, p. 206-214.
[5] CHAKRAVARTI, A. J.; BAUMGARTNER, G.; LAURIA, M. (2004) Application-Specific Scheduling for the Organic Grid. In Proceedings of 5th IEEE/ACM International Workshop on Grid Computing, p. 146-155.
[6] CHO CHO MYINT; KHIN MAR LAR TUN. (2005) A Framework of Using Mobile Agent to Achieve Efficient Load Balancing in Cluster. In Proceedings of the 6th Asia-Pacific Symposium on Information and Telecommunication Technologies (APSITT 2005), 09-10 Nov. 2005, p. 66-70.
[7] DANTAS, M. A. R.; LOPES, J. G. R. C.; RAMOS, T. G. (2002) An Enhanced Scheduling Approach in a Distributed Parallel Environment Using Mobile Agents. Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications, p. 177-181.
[8] FERRARI, A. J. (2007) JPVM: The Java Parallel Virtual Machine. Available at: <http://www.cs.virginia.edu/~ajf2j/jpvm.html>. Accessed May 2007.
[9] FERRARI, D.; ZHOU, S. (1987) An Empirical Investigation of Load Indices for Load Balancing Applications. In Proceedings of Performance'87, the 12th Int'l Symposium on Computer Performance Modeling, Measurement, and Evaluation.
[10] FUGGETTA, A.; PICCO, G.; VIGNA, G. (1998) Understanding Code Mobility. IEEE Transactions on Software Engineering, vol. 24, p. 343-353.
[11] FUKUDA, M.; TANAKA, Y.; BIC, L. F.; KOBAYASHI, S. (2003) A Mobile-Agent-Based PC Grid. IEEE Computer.
[12] GONZALEZ, R. C.; WOODS, R. E. (2002) Processamento de Imagens Digitais. Ed. Edgard Blücher, São Paulo.
[13] JANSEN, W. (2000) Countermeasures for Mobile Agent Security. Computer Communications, vol. 23, p. 1667-1676.
[14] JANSEN, W.; KARYGIANNIS, T. (1999) Mobile Agent Security. National Institute of Standards and Technology - Computer Security Division, NIST Special Publication 800-19.
[15] KUNZ, T. (1991) The Influence of Different Workload Descriptions on a Heuristic Load Balancing Scheme. IEEE Transactions on Software Engineering.
[16] LANGE, D. B.; OSHIMA, M. (1998) Mobile Agents with Java: The Aglet API. World Wide Web, 1(3).
[17] LANGE, D. B.; OSHIMA, M. (1999) Seven Good Reasons for Mobile Agents. Communications of the ACM, 42(3):88-89.
[18] MARQUES, G. M.; BRANCO, K. R. L. J. C. (2007) Coleta de Índice de Desempenho em Sistemas Paralelos Distribuídos Heterogêneos com Agentes Móveis – Uma Proposta. In: VII SIPM 2007, Anais do VII Simpósio de Informática do Planalto Médio, Passo Fundo/RS, Brasil.
[19] MARQUES, G. M.; SABATINE, R. J.; NUNES, F. L. S. M.; BRANCO, K. R. L. J. C. (2007) Processamento de Imagens Médicas em Ambiente Paralelo Distribuído Utilizando Agentes Móveis. In: II C3N 2007, Anais do II Congresso da Academia Trinacional de Ciências, Foz do Iguaçu/PR, Brasil.
[20] NUNES, F. L. S. M. (2001) Investigações em Processamento de Imagens Mamográficas para Auxílio ao Diagnóstico de Mamas Densas. Thesis, Instituto de Física de São Carlos, Universidade de São Paulo.
[21] NUNES, F. L. S. (2006) Introdução ao Processamento de Imagens Médicas para Auxílio ao Diagnóstico. In: Karin Breitman; Ricardo Anido (Org.). Atualizações em Informática. 1 ed. Rio de Janeiro: PUC-Rio, v. 1, p. 73-126.
[22] PICCO, G. (2007) A Mobile Code Toolkit. Available at: <http://mucode.sourceforge.net>. Accessed February 2007.
[23] SHIRAZI, B. A.; HURSON, A. R.; KAVIN, K. M. (1995) Scheduling and Load Balancing in Parallel and Distributed Systems. IEEE Computer Society Press.
[24] ZALUSKA, E. J. (1991) Research Lines in Distributed Computing Systems and Concurrent Computation. Workshop em Programação Concorrente, Sistemas Distribuídos e Engenharia de Software, ICMC/USP, São Carlos/SP.
[25] ZHOU, S.; ZHENG, X.; WANG, J.; DELISLE, P. (1993) Utopia: a Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems. Software: Practice and Experience, v. 23.
