ScienceCloud 2012 Presentation

Efficient Provisioning of Bursty Scientific Workloads on the Cloud Using Adaptive Elasticity Control
Ahmed Ali-Eldin, Johan Tordsson, and Erik Elmroth
Department of Computing Science
Umeå University, Sweden
www.cloudresearch.se
Maria Kihl
Lund Center for Control of Complex
Engineering Systems
Lund University, Sweden
http://www.lccc.lth.se/
Context
ve research programme in e-science between Uppsala University, Lund University

rch environment that enables a strong interplay between e-science research, e-infr
Motivation & Problem definition
The cloud elasticity problem
How much capacity to (de)allocate to a cloud service (and when)?
Bursty and unknown workload
Increase ability to meet SLAs
Reduce resource usage
One of the limitations identified by Truong
et al. [1] to the wide adoption of the
Problem Description
Prediction of load/signal/future is not a new problem
Studied extensively within many disciplines
Time series analysis
Econometrics
Control theory
Stock markets
Biology, etc.
Multiple solutions proposed to prediction problem
Neural networks
Fuzzy logic
Adaptive control
Regression
Kriging models
<your favorite machine learning technique>
However, solution must be suitable for our problem
Requirements
Vary capacity allocated to a service
According to current and future load

Fulfill QoS requirements to meet SLAs
Without costly over-provisioning
Robustness
Avoid oscillations or behavioral changes
Scalability
Tens of thousands of servers + even more VMs
Adaptive to changing workloads
PID controllers are reliable for certain load patterns, but unstable once the load or system dynamics change
Fast
Limited look-ahead control accurate but too slow
Can take 30 min to control 15 servers and 60 VMs
Simplicity
Key to adoption
Our approach:
Adaptive Hybrid control
Closed loop control
Adaptive control:
P-controller
Adjust error signal by gain parameter
Error signal is the difference between current and desired output
Change signal adjustments with load dynamics
Hybrid control, a controller that combines
Reactive control (step controller)

Proactive control (proportional, P-controller)
Initial model and assumptions
Service with homogeneous requests
Short requests that take one time unit (or less) to serve
VM startup time is negligible
Delayed requests are dropped
VM capacity constant
Infrastructure modeled as G/G/N queue
N (#VMs) varies over time
Perfect load balancing assumed
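Under the assumptions above, one simulated time unit reduces to simple arithmetic. A minimal sketch, assuming a hypothetical `step` function and a VM capacity expressed in requests per time unit (the names are illustrative and not taken from the original work):

```python
def step(load, n_vms, vm_capacity):
    """Serve one time unit of homogeneous, short requests.

    Returns (served, dropped). Total capacity is n_vms * vm_capacity
    because load balancing is assumed perfect and VM capacity constant.
    Requests that cannot be served in this time unit are dropped,
    not queued for later.
    """
    capacity = n_vms * vm_capacity
    served = min(load, capacity)
    dropped = load - served
    return served, dropped


# Example: 2 VMs at 500 requests per time unit facing 1200 requests.
served, dropped = step(1200, 2, 500)
```

Because delayed requests are dropped, no backlog carries over between time units, so each step depends only on the current load and the current number of VMs.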
A. Ali-Eldin, J. Tordsson, and E. Elmroth. An adaptive hybrid elasticity controller for cloud infrastructures. In NOMS 2012, IEEE/IFIP Network Operations and Management Symposium. IEEE, 2012.
Model and assumptions

Assumptions:
Homogeneous requests
Short requests that take one time unit (or less)
Machine startup time is negligible
Delayed requests are dropped
Constant machine capacity
Infrastructure modeled as G/G/N queue
N (#VMs) varies over time

Perfect load balancing assumed
Our approach (cont.)
Adaptive control (cont.)

How to estimate change in workload?
F = C * P
F: estimated load change
C: average capacity in last time window
P: gain parameter

Window size changes dynamically
Smaller upon prediction errors
A tolerance level decides how often the window is resized
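The estimate F = C * P and the shrinking window can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the prediction error is taken as a relative error, and halving the window is an assumed shrink rule, since the slides say only that the window gets smaller upon prediction errors.

```python
def estimate_load_change(capacity_history, window, gain):
    """F = C * P, with C the average capacity over the last `window` samples."""
    recent = capacity_history[-window:]
    avg_capacity = sum(recent) / len(recent)
    return avg_capacity * gain


def resize_window(window, predicted, actual, tolerance, min_window=1):
    """Shrink the window when the relative prediction error exceeds `tolerance`.

    Assumption: the window is halved on an error; the source does not
    specify by how much it shrinks.
    """
    error = abs(predicted - actual) / max(actual, 1)
    if error > tolerance:
        return max(min_window, window // 2)
    return window
```

A smaller window makes C track recent capacity more closely, which is the intended reaction when predictions start to miss.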
Two gain parameter alternatives studied
1. Periodical rate of change:
P = Load change / avg. rate in last time window
Denoted P_1 henceforth
2. Ratio of load change over average system rate:
P = Load change / avg. rate over all time
Denoted P_2 henceforth
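The two gain parameters differ only in how much history the average covers. A minimal sketch, assuming `rates` is a list of per-time-unit rate samples and `load_change` is supplied by the caller (both names, and the way the rate is sampled, are assumptions made for illustration):

```python
def gain_p1(load_change, rates, window):
    """P_1: load change over the average rate in the last time window."""
    recent = rates[-window:]
    return load_change / (sum(recent) / len(recent))


def gain_p2(load_change, rates):
    """P_2: load change over the average rate across all history."""
    return load_change / (sum(rates) / len(rates))


# Example: the rate has recently grown, so the windowed average is higher
# than the all-time average and P_1 comes out smaller than P_2.
rates = [100, 100, 200, 400]
p1 = gain_p1(60, rates, window=2)   # 60 / 300
p2 = gain_p2(60, rates)             # 60 / 200
```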
Hybrid control (cont.)

All in all, 9 approaches for scale up (U) and scale down (D), reactively (R) and/or proactively (P)
UR combined with:
DR, DP, DRP
UP combined with:
DR, DP, DRP
URP combined with:

DR, DP, DRP
Notation in the following:
URP-DP
Scale up: reactive + proactive
Scale down: proactive
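The nine combinations and their names follow mechanically from the notation. A short sketch that enumerates them and expands a name into words (the helper names `controller_names` and `describe` are illustrative):

```python
from itertools import product

SCALE_UP = ["UR", "UP", "URP"]
SCALE_DOWN = ["DR", "DP", "DRP"]
MODES = {"R": "reactive", "P": "proactive", "RP": "reactive + proactive"}


def controller_names():
    """All scale-up/scale-down pairings, e.g. 'URP-DP'."""
    return [f"{up}-{down}" for up, down in product(SCALE_UP, SCALE_DOWN)]


def describe(name):
    """Expand a name such as 'URP-DP' into its scale-up and scale-down modes."""
    up, down = name.split("-")
    # Drop the leading U or D; the remainder encodes the mode.
    return f"Scale up: {MODES[up[1:]]}; scale down: {MODES[down[1:]]}"
```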
Performance Evaluation
Simulation-based evaluations
3 aspects studied
1. Best combination of reactive and proactive controllers
2. Controller stability w.r.t. workload size
3. Comparison with state-of-the-art controller
Regression control [Iqbal et al, FGCS 2011]
Performance metrics
Over-provisioning:
VMs allocated but not needed
Under-provisioning:
VMs needed, but failed to allocate (SLA violation)
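Both metrics can be computed from two per-time-unit series: VMs needed and VMs allocated. A minimal sketch, assuming the percentages are normalized by the total number of VMs needed over the trace; that normalization is an assumption here, since the slides do not state the denominator.

```python
def provisioning_percentages(needed, allocated):
    """Return (over %, under %) for paired per-time-unit VM counts.

    Over-provisioning counts VMs allocated but not needed; under-provisioning
    counts VMs needed but not allocated. Both are divided by the total number
    of VMs needed (assumed normalization).
    """
    over = sum(max(a - n, 0) for n, a in zip(needed, allocated))
    under = sum(max(n - a, 0) for n, a in zip(needed, allocated))
    total_needed = sum(needed)
    return 100.0 * over / total_needed, 100.0 * under / total_needed


# Example: two surplus VMs in the first time unit, one missing VM in the third.
over_pct, under_pct = provisioning_percentages([10, 10, 10, 10], [12, 10, 9, 10])
```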
Studied workload
FIFA98 traces
~3 month Web server traces (bursty)

Grouped requests per second of arrival
Best controller combination

Scaled FIFA traces x 50
Reasonable Internet growth 1998 > today
Assume that 1 VM handles 500 requests

Reasonable for DB-backend Web servers
Studied (for the sake of completeness) all 9 combinations of reactive + proactive controllers
Some make no sense & indeed performed poorly:

Reactive scale down causes oscillations and a lot of under-provisioning (SLA violations)
Pure proactive scale up tends to skew and cause under-provisioning
Other approaches more promising:
Reactive scale up
Fast reaction to load increases, no skew
Proactive scale-down
Keep VMs for a while (just in case) once they are allocated
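One decision step of such a controller, with reactive scale up and proactive scale down, might look like the sketch below. It is not the authors' implementation: `predicted_vm_change` is a hypothetical input standing for the predictor's estimate expressed in VMs (negative when fewer VMs are expected to be needed), and the release rule is an assumption chosen to show the "keep VMs for a while" behavior.

```python
import math


def ur_dp_step(n_vms, load, predicted_vm_change, vm_capacity):
    """Return the number of VMs to run in the next time unit."""
    needed = math.ceil(load / vm_capacity)
    if needed > n_vms:
        # Reactive scale up: jump straight to the observed demand.
        return needed
    # Proactive scale down: release surplus VMs only as far as the
    # predictor expects demand to fall; otherwise keep them allocated.
    surplus = n_vms - needed
    release = min(surplus, max(0, math.floor(-predicted_vm_change)))
    return n_vms - release
```

With a prediction of no change, surplus VMs stay allocated even though the current load does not need them, which is where the extra over-provisioning of the proactive scale-down variants comes from.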
Best combination (cont.)
Baseline: UR-DR
1.63% under-provisioning
1.4% over-provisioning
UR-DP_1
0.41% under-provisioning (1.63% for UR-DR)

9.44% over-provisioning (1.4% for UR-DR)
UR-DP_2
0.18% under-provisioning (1.63% for UR-DR)

14.33% over-provisioning (1.4% for UR-DR)
Stability w.r.t workload size
Multiplied FIFA traces by X = 10, 20, ..., 60

Assume that 1 VM handles 10*X requests/s
Studied UR-DR, UR-DP_1, UR-DP_2
[Figures: under-provisioning and over-provisioning as a function of workload size]
Conclusions:
Reactive stable (no surprise)
Proactive controller prediction quality varies with workload
Error in over-provisioning grows slower than workload size
Comparison with regression

Regression-based control:
Scale up: reactively, Scale down: regression

2nd order regression based on full workload history
Evaluation on selected (nasty) part of FIFA trace
UR-DR: 2.99% under-provisioning, 19.57% over-prov.
UR-D_Regression: 47% over-prov.
UR-DP_1: 32.24% over-prov.
UR-DP_2: 39.75% over-prov.
Controller performance (execution time)
Regression: 0.98s on average, up to 6.5s observed

Our approach: 0.6 ms on average
Conclusions
P-control promising approach to cloud elasticity
Accurate predictions
Rapid
Controller execution time in ms
Robust
Copes with changes in workload dynamics
No one-size-fits-all controller
Tradeoff between over- and under-provisioning

Costs for SLA violation (under-provisioning) and resource wastage (over-provisioning) decide which strategy to use
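The tradeoff can be made concrete with the percentages reported earlier for UR-DR, UR-DP_1 and UR-DP_2. The sketch below assumes a simple linear cost model with hypothetical per-percentage-point prices `cost_under` and `cost_over`; the cost model is an illustration, not part of the original evaluation.

```python
# Under- and over-provisioning percentages from the best-combination results.
RESULTS = {
    "UR-DR": {"under": 1.63, "over": 1.4},
    "UR-DP_1": {"under": 0.41, "over": 9.44},
    "UR-DP_2": {"under": 0.18, "over": 14.33},
}


def cheapest(cost_under, cost_over):
    """Pick the controller with the lowest total cost under a linear model."""
    def total(name):
        r = RESULTS[name]
        return cost_under * r["under"] + cost_over * r["over"]
    return min(RESULTS, key=total)


# Equal prices favor the reactive baseline; expensive SLA violations
# shift the choice toward proactive scale down.
equal = cheapest(1, 1)
moderate = cheapest(10, 1)
strict = cheapest(100, 1)
```

Under these assumed prices, the preferred controller moves from UR-DR to UR-DP_1 to UR-DP_2 as SLA violations become more expensive relative to wasted resources.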
