
Lab Report - Assignment 1

-Sarthak Vishnoi, 2016CS10336

The assignment required us to implement the K-means algorithm in sequential form as well as in parallel form using the PThreads and OpenMP libraries.
Variables
1. N -> Total number of points in the dataset. Each point is 3-dimensional with integer coordinates.
2. K -> Total number of clusters required (given as an argument to the main function).
3. T -> Total number of threads.

Sequential Code
1. Randomly choose K out of the N points and mark them as the initial centers.
2. Assign each point a center according to the distance formula: each point is allocated the center nearest to it.
3. Recompute the new centers as the mean of the points in each cluster.
4. Reassign all the points and repeat this process until no point changes its cluster.
5. The stopping criterion, as explained above, is either that no point changes its cluster, or that the number of iterations of recomputing the centers and reassigning all the points reaches a maximum value called max_iter.
6. The result of the K-means algorithm depends on the initial cluster centers, and the algorithm might get stuck at a local optimum.
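The steps above can be sketched in C as follows. This is a minimal illustration, not the submitted code: the distance helper, the row-major array layout, and the choice of the first K points as initial centers (in place of random selection) are all simplifying assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define DIM 3        /* each point has 3 integer coordinates */
#define MAX_ITER 100 /* illustrative max_iter value */

/* Squared Euclidean distance between an int point and a double center. */
static long dist2(const int *a, const double *c) {
    long s = 0;
    for (int d = 0; d < DIM; d++) {
        double diff = a[d] - c[d];
        s += (long)(diff * diff);
    }
    return s;
}

/* points: N x DIM ints (row-major); label: N ints (output). */
void kmeans(const int *points, int N, int K, int *label) {
    double *centers = malloc(sizeof(double) * K * DIM);
    int *count = malloc(sizeof(int) * K);

    /* Step 1: mark K of the N points as the initial centers. */
    for (int k = 0; k < K; k++)
        for (int d = 0; d < DIM; d++)
            centers[k * DIM + d] = points[k * DIM + d];
    for (int i = 0; i < N; i++)
        label[i] = -1; /* no cluster assigned yet */

    for (int iter = 0; iter < MAX_ITER; iter++) {
        int changed = 0;
        /* Step 2: assign each point to its nearest center. */
        for (int i = 0; i < N; i++) {
            int best = 0;
            long bestd = dist2(&points[i * DIM], &centers[0]);
            for (int k = 1; k < K; k++) {
                long dk = dist2(&points[i * DIM], &centers[k * DIM]);
                if (dk < bestd) { bestd = dk; best = k; }
            }
            if (label[i] != best) { label[i] = best; changed = 1; }
        }
        /* Step 5: stop when no point changed its cluster. */
        if (!changed) break;
        /* Step 3: recompute each center as the mean of its cluster. */
        memset(centers, 0, sizeof(double) * K * DIM);
        memset(count, 0, sizeof(int) * K);
        for (int i = 0; i < N; i++) {
            count[label[i]]++;
            for (int d = 0; d < DIM; d++)
                centers[label[i] * DIM + d] += points[i * DIM + d];
        }
        for (int k = 0; k < K; k++)
            if (count[k] > 0)
                for (int d = 0; d < DIM; d++)
                    centers[k * DIM + d] /= count[k];
    }
    free(centers);
    free(count);
}
```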

[Figure: convergence to a global optimum vs. a local optimum, depending on the initial centers]


Parallel Code
1. For both PThreads and OpenMP we parallelised only part of the code. We did not parallelise the whole algorithm, since the remaining parts access a critical section that would have required locks, and that approach would not have yielded a good speedup.
2. Instead, the step that reassigns points to clusters after the new center of each cluster has been computed is parallelised: each thread does the calculation for N/T points, finds the nearest center for each of them, and allocates the point to that cluster.
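The N/T split described above can be illustrated with a small helper (variable names are our own; here the last thread also picks up the remainder when T does not divide N, which is one common convention):

```c
/* Compute the [start, end) index range of points handled by thread t,
 * splitting N points across T threads in contiguous chunks of N/T. */
void thread_range(int N, int T, int t, int *start, int *end) {
    int chunk = N / T;
    *start = t * chunk;
    /* last thread also takes the N % T leftover points */
    *end = (t == T - 1) ? N : *start + chunk;
}
```

For example, with N = 100,000 and T = 5, thread 0 handles indices [0, 20000) and thread 4 handles [80000, 100000).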

PThreads and OpenMP


1. PThreads gives the programmer a low-level view of the code: it requires the programmer to identify the function to be run in parallel. We wrote a function that assigns points to clusters by checking which center is nearest to each point. This function runs on several threads, with each thread executing it for the points between some fixed indices, depending on the problem size and the number of threads.
2. OpenMP lets the programmer mark the code block that needs to be parallelised, so it only requires a high-level understanding of the code and of the parallelisation. We used it on the same loop for which a separate function had been written in the PThreads version.
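As a sketch of the two approaches described above (function and variable names are illustrative, not necessarily those of the submitted code), the PThreads worker and the equivalent OpenMP loop might look like this. Each thread writes only to its own slice of the label array, which is why no locks are needed:

```c
#include <pthread.h>
#include <stddef.h>

#define DIM 3

/* Shared state, read-only during the assignment phase (except label). */
static const int *points;     /* N x DIM point coordinates       */
static const double *centers; /* K x DIM current cluster means   */
static int *label;            /* output: cluster index per point */
static int N, K, T;

static long dist2(int i, int k) {
    long s = 0;
    for (int d = 0; d < DIM; d++) {
        double diff = points[i * DIM + d] - centers[k * DIM + d];
        s += (long)(diff * diff);
    }
    return s;
}

/* PThreads version: each thread labels its own slice of the points. */
static void *assign_slice(void *arg) {
    int t = (int)(size_t)arg;
    int start = t * (N / T);
    int end = (t == T - 1) ? N : start + N / T;
    for (int i = start; i < end; i++) {
        int best = 0;
        for (int k = 1; k < K; k++)
            if (dist2(i, k) < dist2(i, best)) best = k;
        label[i] = best; /* each index i is written by exactly one thread */
    }
    return NULL;
}

void assign_pthreads(void) {
    pthread_t tid[64]; /* assumes T <= 64 for this sketch */
    for (int t = 0; t < T; t++)
        pthread_create(&tid[t], NULL, assign_slice, (void *)(size_t)t);
    for (int t = 0; t < T; t++)
        pthread_join(tid[t], NULL);
}

/* OpenMP version: the same loop, parallelised with a single pragma. */
void assign_openmp(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        int best = 0;
        for (int k = 1; k < K; k++)
            if (dist2(i, k) < dist2(i, best)) best = k;
        label[i] = best;
    }
}
```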

Observations
1. We measured the time taken by the different implementations for different problem sizes (N) and different numbers of processors.

Number of Processors   Sequential   PThreads   OpenMP
 5                     0.85         0.44       0.41
10                     0.85         0.42       0.41
15                     0.85         0.40       0.40
20                     0.85         0.42       0.39
25                     0.85         0.42       0.39

Time taken (in seconds) with different numbers of processors, for a problem size (N) of 100,000 and K = 4.
● As the problem size is kept constant, the time taken by the sequential code is the same in all cases. We also see a slight decrease in the time taken by the OpenMP and PThreads implementations as the number of threads increases. The change is small because increasing the number of threads also increases the total overhead.
Problem Size   Sequential   PThreads   OpenMP
200K           2.10         1.23       1.14
400K           4.95         2.59       2.42
600K           3.79         1.83       1.85
800K           15.46        5.97       5.98
1Mil           6.34         2.52       2.53

Time taken (in seconds) for different problem sizes, with the number of processors fixed at 5 and K = 4.
● In this case, the time taken increases with the problem size for all the implementations. The time is anomalously high for 800K data points; this might have happened because the initial centers were poor, so the time taken for convergence was high.
● In both experiments OpenMP takes slightly less time than the PThreads implementation of the same logic, possibly because the OpenMP runtime is better wrapped, with better optimisation flags.

● The above two graphs show the dependence of the speedup and efficiency of the parallel programs on the number of processors. We know from theory that increasing the number of processors should increase the speedup, but the overhead of maintaining the threads also increases the time. Hence the speedup increases, but only slightly, and the graph turns out to be roughly linear.
● For efficiency vs. number of processors, we know that increasing the number of processors decreases the efficiency of the system, which is also the trend shown by the graph.
● The first observation for the graphs where the number of processors is kept constant is that the graph of speedup has exactly the same shape as the graph of efficiency. This is because efficiency equals the ratio of speedup to the number of processors, and since the number of processors is constant, the two graphs differ only by a constant factor.
● We know from theory that the efficiency of a system should increase with the problem size when the number of processors is kept constant. This is also the trend shown by the graph.
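The speedup and efficiency relations used above can be checked directly against the first table. The formulas are the standard ones, S = T_seq / T_par and E = S / p:

```c
/* Speedup of a parallel run relative to the sequential baseline. */
double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

/* Efficiency: speedup divided by the number of processors used. */
double efficiency(double t_seq, double t_par, int p) {
    return speedup(t_seq, t_par) / p;
}
```

For example, for N = 100,000 with 5 threads, the PThreads speedup is 0.85 / 0.44 ≈ 1.93 and the efficiency is therefore 1.93 / 5 ≈ 0.39.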
● The total complexity of the algorithm comes out to be O(iterations * N * K) for both the parallel and the sequential implementations. The only difference is in the execution of one O(N * K) step (the reassignment of points), which the parallel implementations perform in O(N * K / T) time.

Conclusion
● We studied and measured the speedups and efficiencies of sequential and parallel implementations of the K-means algorithm for various problem sizes and numbers of processors.
● The parallel implementations were built using the PThreads and OpenMP libraries.
● All the experimental results were consistent with the theoretical results studied in class.
