
CS-6235

(Project Proposal)
Global Extrema Locator Using Interval Arithmetic

Advisor: Professor Ganesh Gopalakrishnan


Students: Sandesh Borgaonkar, Arnab Das

Introduction
We propose to implement an interval branch and bound (BB) algorithm that finds the global maximum of a function over a given interval. The goal is to progressively improve the best solution for the given cost function using the branch and bound technique until a termination condition is reached. Branch and bound techniques deal with optimization problems over a search space that can be represented as the leaves of a search tree. BB is guaranteed to find an optimal solution, but its worst-case complexity is as high as that of exhaustive search. However, by combining interval analysis with the GPU's compute capability to restructure the interval priorities, we aim to reduce the search space of intervals containing the extrema and thereby converge faster. In the following sections we discuss the core idea behind combining interval analysis with branch and bound, the possibilities for parallelization, and how we leverage the ILCS (Iterated Local Champion Search) framework [4].

Interval Branch and Bound


Branch and Bound (BB) is a principal problem-solving paradigm for finding solutions to combinatorial optimization problems in many areas, including computer science and operations research, mainly for computationally intensive, NP-hard problems. The underlying idea of BB is to take a given problem that is difficult to solve directly and decompose it into smaller partial problems in such a way that a solution to a partial problem is also a solution to the original problem. This is applied iteratively to each partial problem until it can be solved directly or is proven not to lead to an optimal solution. The interval branch and bound algorithm (IBBA) is essentially BB operating in a search space of intervals, wherein the cost function is re-coded using interval arithmetic.
Let x = (x1, x2, ..., xn) denote a real vector and X = (X1, X2, ..., Xn) an interval vector. An interval function F: Iⁿ → I is said to be an interval extension of the real-valued function f: Rⁿ → R if f(x) ∈ F(X) whenever x ∈ X. An interval function is said to be inclusion monotonic if X ⊆ Y implies F(X) ⊆ F(Y). A result of Moore [3] states that if F is an inclusion monotonic interval extension of f, then F(X) contains the range of f(x) for all x ∈ X.
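As a small worked illustration of these definitions (our own example, not taken from [3]): for f(x) = x(1 − x) on X = [0, 1], the natural interval extension evaluates to F(X) = X(1 − X) = [0, 1]·[0, 1] = [0, 1], which encloses the true range [0, 0.25] of f, though with overestimation. A minimal C++ sketch of this computation, using a hand-rolled interval type rather than any particular interval library, is shown below.

#include <algorithm>
#include <cstdio>

// Minimal interval type with just the two operations needed for f(x) = x*(1-x).
struct I { double lo, hi; };
I operator-(double c, I x) { return {c - x.hi, c - x.lo}; }
I operator*(I a, I b) {
    double p[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

// Natural interval extension F(X) = X * (1 - X) of f(x) = x * (1 - x).
I F(I x) { return x * (1.0 - x); }

int main() {
    I X{0.0, 1.0};
    I Y = F(X);
    std::printf("F([0,1]) = [%g, %g]\n", Y.lo, Y.hi);   // encloses the true range [0, 0.25]
}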
Interval analysis for solving optimization problems relies on Moore's result and uses BB to find the optima of the function. Given an initial interval domain, it is divided into sub-intervals (called boxes); this forms the branching part of BB. Next, each of these boxes is evaluated with the inclusion function F. Boxes that cannot contain a value better than the current best solution are discarded, while the surviving boxes are further divided into sub-intervals for evaluation. This continues recursively until a termination condition is reached, that is, the queue containing the valid intervals becomes empty or the interval width has shrunk below a certain tolerance.

Below is a sample pseudo-code for the basic IBBA algorithm from [1].

Fig[1]. Basic Sequential Interval Branch and Bound
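For reference, the following is a minimal C++ sketch of the basic sequential IBBA loop as described above; the interval type, the inclusion function F, and the plain FIFO queue are simplified placeholders of our own and not the exact pseudo-code of [1].

#include <queue>

struct Interval { double lo, hi; };            // placeholder interval type
Interval F(const Interval& X);                 // inclusion function of the cost f (assumed supplied)
double   f(double x);                          // real-valued cost function (assumed supplied)

// Basic sequential IBBA sketch: maximize f over the starting box X0.
double ibba(Interval X0, double tol) {
    double fbest = f((X0.lo + X0.hi) / 2.0);   // initial guess at the mid-point
    std::queue<Interval> work;
    work.push(X0);
    while (!work.empty()) {
        Interval X = work.front(); work.pop();
        if (F(X).hi < fbest) continue;         // box cannot contain a better value: discard
        if (X.hi - X.lo < tol) continue;       // box narrower than the tolerance: stop splitting
        double mid = (X.lo + X.hi) / 2.0;      // bisect the box
        for (Interval Xi : {Interval{X.lo, mid}, Interval{mid, X.hi}}) {
            double sample = f((Xi.lo + Xi.hi) / 2.0);   // mid-point heuristic
            if (sample > fbest) fbest = sample;         // improve the current best
            if (F(Xi).hi >= fbest) work.push(Xi);       // keep boxes that may still improve fbest
        }
    }
    return fbest;
}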

Interval branch and bound with work-list distribution


In the sequential IBBA algorithm, every iteration either adds two new sub-intervals to the queue or discards one interval (whose upper bound is less than the current best), as described in the algorithm of Fig[1]. In the initial stages of the algorithm the number of intervals being queued is quite large, since the current best value is still far from the optimum. IBBA guarantees finding the optimum, but its complexity may degrade to something very close to exhaustive search. However, if the work-list in the queue could be rearranged so that the CPU thread gets access to better target intervals, ones more likely to contain the optima, that could lead to faster convergence. We leverage the GPU's compute capability to perform a large number of interval sample computations and derive a better work-list distribution by reprioritizing the queue. Below is the pseudocode of the cpuThread showing how it manages the IBBA loop and synchronizes with the GPU handler thread.

Fig[2]. IBBA with gpu synchronization

Description of the pseudocode:


The algorithm starts with fbest (the current best) set to the minimum representable value and the starting interval X0 pushed onto tempQue. The while loop interleaves gpuThread synchronization with the IBBA operation. In the absence of a currently executing gpuThread, it checks the current size of MainQue. The algorithm always tries to keep the CPU busy and avoid the overhead of frequent synchronization. It exposes the knob cpu_threshold, the number of elements in the queue that the CPU must execute before syncing with the GPU. Thus, for N intervals in the queue, the first cpu_threshold intervals are reserved for the CPU, while the remaining (N - cpu_threshold) intervals are sent to the gpuThread for redistribution, but only if (N - cpu_threshold) is greater than new_int_threshold. The knob new_int_threshold controls the minimum number of intervals that must be sent to the GPU; it helps control the trade-off between frequent GPU calls on one hand and the volume of GPU work plus the overhead of thread synchronization on the other. When the GPU kernel returns, the gpuThread sets SyncGpuThread, signaling the main thread to synchronize the GPU evaluation results.
The IBBA thread continues evaluating the intervals with the best priorities from its queue. The while loop continues until MainQue becomes empty. In every iteration it dequeues an interval from MainQue and checks its upper bound against the current best solution and its width against the allowed tolerance. If either check fails, that interval is discarded and a new interval is dequeued. A valid interval is split further into two sub-intervals X1 and X2. For each of these sub-intervals, the function is evaluated at the mid-point of the interval (just a heuristic) and compared with the current best solution. If the evaluation gives a better solution than the current best, the current best is updated and the corresponding sub-interval is inserted into the queue; otherwise the interval is discarded. As is evident, we always have a current best solution. Hence, when the termination condition is reached (queue empty), the current best value gives the global optimum.
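To make the description concrete, here is a minimal C++11-style sketch of the cpuThread loop as we have described it. The interval type, priority queue, the names MainQue, tempQue, cpu_threshold and new_int_threshold, and the gpuReprioritize hand-off are simplified placeholders that follow the prose above rather than the exact pseudocode of Fig[2].

#include <atomic>
#include <limits>
#include <queue>
#include <thread>
#include <vector>

struct Interval { double lo, hi, priority; };
struct ByPriority {
    bool operator()(const Interval& a, const Interval& b) const { return a.priority < b.priority; }
};
using IntervalQueue = std::priority_queue<Interval, std::vector<Interval>, ByPriority>;

double f(double x);                                       // cost function (assumed supplied)
Interval F(const Interval& X);                            // inclusion function (assumed supplied)
std::vector<Interval> gpuReprioritize(std::vector<Interval> work);   // GPU hand-off (assumed)

void cpuThread(Interval X0, double tol, std::size_t cpu_threshold, std::size_t new_int_threshold) {
    double fbest = std::numeric_limits<double>::lowest(); // current best, initially the minimum value
    IntervalQueue MainQue;
    std::queue<Interval> tempQue;
    tempQue.push(X0);                                     // starting interval goes to tempQue

    std::atomic<bool> gpuBusy{false}, SyncGpuThread{false};
    std::vector<Interval> gpuResult;
    std::thread gpuHandler;

    while (!MainQue.empty() || !tempQue.empty() || gpuBusy || SyncGpuThread) {
        if (SyncGpuThread) {                              // merge the reprioritized intervals
            if (gpuHandler.joinable()) gpuHandler.join();
            for (const Interval& X : gpuResult) MainQue.push(X);
            SyncGpuThread = false;
        }
        while (!tempQue.empty()) { MainQue.push(tempQue.front()); tempQue.pop(); }

        // Ship everything beyond the first cpu_threshold intervals to the GPU,
        // but only if the surplus exceeds new_int_threshold.
        if (!gpuBusy && !SyncGpuThread && MainQue.size() > cpu_threshold &&
            MainQue.size() - cpu_threshold > new_int_threshold) {
            IntervalQueue keep;
            std::vector<Interval> toGpu;
            for (std::size_t i = 0; !MainQue.empty(); ++i) {
                Interval X = MainQue.top(); MainQue.pop();
                if (i < cpu_threshold) keep.push(X); else toGpu.push_back(X);
            }
            MainQue = std::move(keep);
            gpuBusy = true;
            gpuHandler = std::thread([&, toGpu]() {
                gpuResult = gpuReprioritize(toGpu);       // GPU kernel call
                SyncGpuThread = true;                     // ask the main loop to sync results
                gpuBusy = false;
            });
        }

        if (MainQue.empty()) continue;                    // wait for the GPU batch to come back
        Interval X = MainQue.top(); MainQue.pop();
        if (F(X).hi < fbest || X.hi - X.lo < tol) continue;   // discard: cannot improve, or too narrow
        double mid = (X.lo + X.hi) / 2.0;
        for (Interval Xi : {Interval{X.lo, mid, 0.0}, Interval{mid, X.hi, 0.0}}) {
            double sample = f((Xi.lo + Xi.hi) / 2.0);     // mid-point heuristic
            if (sample > fbest) {                         // better than the current best:
                fbest = sample;                           // update fbest and keep the sub-interval
                Xi.priority = sample;
                tempQue.push(Xi);
            }
        }
    }
    if (gpuHandler.joinable()) gpuHandler.join();
    // fbest now holds the global maximum found.
}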

Suitability for GPU acceleration


The proposed technique to accelerate the work-list distribution is inherently suitable for GPU computation, since it is embarrassingly parallel, requiring a large number of independent computations and a significant amount of work redistribution based on the computed values. We describe the work-distribution technique below.
Whenever a GPU kernel call is made, the GPU is required to evaluate N intervals of M dimensions each. As M becomes large, it becomes progressively harder for a sequential program to find the optima along every possible dimension (especially for extremely complex functions). Hence, we partition the problem on the GPU across all dimensions, trying to find the best evaluation along every possible dimension. Consider the Ith interval of M dimensions; it has M component sub-intervals. Each of these M sub-intervals is further divided into K sub-sub-intervals, and computations are performed by taking a sample in each of these K sub-sub-intervals together with random values within the intervals along the other dimensions. Thus, for every interval in the queue, MxK threads compute MxK values, and the interval receives a new priority value, the maximum of these MxK values. This yields a reprioritized list of intervals, which is synchronized back to the CPU thread and helps it access better intervals, with a higher probability of faster convergence.
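As an illustration, here is a minimal CUDA C++ sketch of this reprioritization, assuming a flattened row-major layout of N intervals with M dimensions, one block per interval with M*K threads (assumed to fit in a single block), and a placeholder device cost function; the real kernel and interval representation may differ.

#include <curand_kernel.h>

constexpr int MAX_DIM = 32;                      // assumed compile-time bound on M

// Placeholder device cost function over an M-dimensional point (assumption).
__device__ double f_dev(const double* x, int M);

// Thread (d, s) evaluates f at the midpoint of the s-th slice of dimension d,
// with random coordinates along the other dimensions; the block maximum
// becomes the interval's new priority.
__global__ void reprioritizeKernel(const double* lo, const double* hi,   // N x M bounds, row-major
                                   double* priority, int M, int K,
                                   unsigned long long seed) {
    int i = blockIdx.x;                           // interval index
    int tid = threadIdx.x;                        // tid = d * K + s
    int d = tid / K, s = tid % K;

    curandState rng;
    curand_init(seed, (unsigned long long)i * blockDim.x + tid, 0, &rng);

    double x[MAX_DIM];
    for (int j = 0; j < M; ++j) {                 // random point inside the box
        double w = hi[i * M + j] - lo[i * M + j];
        x[j] = lo[i * M + j] + curand_uniform_double(&rng) * w;
    }
    double slice = (hi[i * M + d] - lo[i * M + d]) / K;
    x[d] = lo[i * M + d] + (s + 0.5) * slice;     // midpoint of the s-th sub-sub-interval

    extern __shared__ double vals[];              // one sample value per thread
    vals[tid] = f_dev(x, M);
    __syncthreads();

    if (tid == 0) {                               // new priority = max of the M*K samples
        double best = vals[0];
        for (int t = 1; t < blockDim.x; ++t)
            if (vals[t] > best) best = vals[t];
        priority[i] = best;
    }
}

// Launch sketch: reprioritizeKernel<<<N, M * K, M * K * sizeof(double)>>>(d_lo, d_hi, d_prio, M, K, seed);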

Challenges
Synchronizing the GPU and CPU threads while maintaining a proper work balance between the CPU and GPU is the primary challenge. We are using the C++11 thread utility. We will also use OpenMP constructs to parallelize the parts of the CPU thread where a lot of data exchange occurs (such as copying queues between the interval representations of the CPU and GPU libraries), as sketched below.
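A minimal sketch of such a copy, assuming a simplified one-dimensional CPU interval type and the flat lo/hi arrays used by the GPU side above; the types and layout are illustrative only.

#include <omp.h>
#include <cstddef>
#include <vector>

struct CpuInterval { double lo, hi; };       // stand-in for the CPU interval library's type

// Flatten a batch of CPU-side intervals into the flat arrays the GPU side expects.
// The element-wise copies are independent, so the loop parallelizes with OpenMP.
void packForGpu(const std::vector<CpuInterval>& batch,
                std::vector<double>& lo, std::vector<double>& hi) {
    lo.resize(batch.size());
    hi.resize(batch.size());
    #pragma omp parallel for
    for (std::ptrdiff_t k = 0; k < (std::ptrdiff_t)batch.size(); ++k) {
        lo[k] = batch[k].lo;
        hi[k] = batch[k].hi;
    }
}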
Additionally, we need to look into the possibility of the GPU pruning certain intervals, so that the returned list is not just reprioritized but also pruned. One challenge in this respect is maintaining precision: the GPU interval libraries do not meet the precision requirements of the CPU library. Also, the current fbest value computed on the CPU side needs to be communicated regularly to the GPU kernel through some shared resource so that the kernel can make pruning decisions. However, if we consider that these intervals would be pruned by the CPU anyway while checking the upper bound, the overhead of the CPU pruning the space is expected to be quite low (since only a comparison is done on intervals that get pruned, with no interval subdivision). Some experiments in this direction will help determine the proper place for pruning.
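One possible shape for GPU-side pruning, sketched below under the assumption that a device-side inclusion function is available from a GPU interval library and that fbest is copied to the device before each launch; this is an illustration of the idea, not the planned design.

// Device-side inclusion function for one M-dimensional box (assumed to come
// from a GPU interval library; its precision may differ from the CPU library).
__device__ void F_dev(const double* lo, const double* hi, int M,
                      double* out_lo, double* out_hi);

// One thread per interval: mark boxes whose upper bound cannot beat fbest.
__global__ void pruneKernel(const double* lo, const double* hi,
                            int* prunable, double fbest, int M, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    double flo, fhi;
    F_dev(&lo[i * M], &hi[i * M], M, &flo, &fhi);
    prunable[i] = (fhi < fbest) ? 1 : 0;     // cannot contain anything better than fbest
}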
References
[1] Jean-Marc Alliot, Nicolas Durand, David Gianazza, Jean-Baptiste Gotteland. Finding and Proving the Optimum: Cooperative Stochastic and Deterministic Search.
[2] Ole Caprani, Kaj Madsen, Hans Bruun Nielsen. Introduction to Interval Analysis.
[3] R. E. Moore. Interval Analysis.
[4] Martin Burtscher, Hassan Rabeti. A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches.
[5] Wei-Fan Chiang, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Alexey Solovyev. Efficient Search for Inputs Causing High Floating-Point Errors.
