
Principles of Parallel Algorithm Design

Steps in Designing Parallel Algorithms


- Identify portions of the work that can be performed concurrently.
- Map the concurrent pieces of work onto multiple processes running in parallel.
- Distribute the input, output, and intermediate data associated with the program.
- Manage accesses to data shared by multiple processors.
- Synchronize the processors as needed.

Task Decomposition
Tasks: programmer-defined units of computation; tasks can be of arbitrary size. Decomposition: dividing the main computation into multiple tasks. Execution time is reduced by executing multiple tasks in parallel. Ideal decomposition:
- All tasks can be executed in parallel, with comparable execution times.
- The computation requires little or no sharing of data among tasks.

Example: Matrix-Vector Multiplication

Task i = the computation of y[i]. All tasks are independent.
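As an illustration (not part of the original slides), here is a minimal C sketch of this fine-grained decomposition, assuming OpenMP (compile with e.g. `gcc -fopenmp`); each loop iteration plays the role of the task that computes y[i]:

```c
#include <stddef.h>

/* Fine-grained decomposition of dense matrix-vector multiply:
 * one task per output entry y[i]; all n tasks are independent. */
void matvec(size_t n, const double A[], const double b[], double y[])
{
    /* Task i: y[i] = sum_j A[i][j] * b[j]. No task reads another
     * task's output, so the iterations can run concurrently. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * b[j];
        y[i] = sum;
    }
}
```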

Task-Dependency Graph
Some tasks may use data produced by other tasks. Problem: how do we express such dependencies?
Task-dependency graph: a directed acyclic graph (DAG) in which nodes represent tasks and directed edges represent dependencies among them.
Rule: a task can be executed only when all the tasks connected to it by incoming edges have completed.
Ideal task-dependency graph for parallel computing: very few or no directed edges.
A task-dependency graph is influenced both by the decomposition of the computation into tasks and by the organization of those tasks.
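To make the rule concrete, here is a small illustrative C sketch (the four-task DAG is made up, not from the slides): a task becomes ready once its count of unmet incoming edges drops to zero, in Kahn-style topological order:

```c
#include <stdio.h>

#define NTASKS 4
/* Illustrative DAG: tasks 0 and 1 are independent; task 2 needs
 * 0 and 1; task 3 needs 1 and 2. */
int indegree[NTASKS] = {0, 0, 2, 2};           /* unmet incoming edges */
int succ[NTASKS][NTASKS] = {{2}, {2, 3}, {3}, {0}};
int nsucc[NTASKS] = {1, 2, 1, 0};

int main(void)
{
    int ready[NTASKS], nready = 0;
    for (int t = 0; t < NTASKS; t++)           /* start with the roots */
        if (indegree[t] == 0) ready[nready++] = t;
    while (nready > 0) {
        int t = ready[--nready];               /* any ready task may run */
        printf("executing task %d\n", t);
        for (int k = 0; k < nsucc[t]; k++)     /* completing t may      */
            if (--indegree[succ[t][k]] == 0)   /* release its successors */
                ready[nready++] = succ[t][k];
    }
    return 0;
}
```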

Task-Dependency Graph Example (I): Database Query Processing

MODEL=Civic AND YEAR=2001 AND (COLOR=Green OR COLOR=White)

Task-Dependency Graph Example (II): Database Query Processing

MODEL=Civic AND YEAR=2001 AND (COLOR=Green OR COLOR=White)

The same query, with the tasks organized differently, yields a different task-dependency graph.

Granularity
The number and size of the tasks determine the granularity of the decomposition.
- Fine-grained decomposition: a large number of small tasks. Suitable when the computation has a lot of parallelism and the underlying architecture provides low-latency, high-bandwidth communication.
- Coarse-grained decomposition: a small number of large tasks.

Granularity: Example

Coarse-grained decomposition => 4 tasks, each computing n/4 entries of the output vector.
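A matching sketch of the coarse-grained version (again illustrative, assuming OpenMP): exactly four tasks, each owning a contiguous block of rows of the output vector:

```c
#include <stddef.h>

/* Coarse-grained decomposition of the same matrix-vector product:
 * 4 tasks, task t computing roughly n/4 consecutive entries of y. */
void matvec_coarse(size_t n, const double A[], const double b[], double y[])
{
    #pragma omp parallel for num_threads(4)
    for (int t = 0; t < 4; t++) {                 /* task t */
        size_t lo = t * n / 4, hi = (t + 1) * n / 4;
        for (size_t i = lo; i < hi; i++) {        /* rows owned by task t */
            double sum = 0.0;
            for (size_t j = 0; j < n; j++)
                sum += A[i * n + j] * b[j];
            y[i] = sum;
        }
    }
}
```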

Concurrency
Degree of concurrency: the number of tasks that can be performed concurrently.
- Maximum degree of concurrency: the maximum number of tasks that can be executed simultaneously at any given time.
- Average degree of concurrency: the average number of tasks that can run concurrently over the entire duration of the program.
Both are influenced by the decomposition and the corresponding task-dependency graph.
Path length: the sum of the weights of all nodes in the path, where the weight of a node is the amount of work it represents.
Critical path: the longest directed path between any pair of start and finish nodes.
Average degree of concurrency = (total work done) / (critical path length)
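These quantities are easy to compute mechanically from a weighted task DAG. A small illustrative C sketch (the graph and node weights are made up, not the slides' examples):

```c
#include <stdio.h>

#define NTASKS 4
/* Illustrative weighted DAG, tasks listed in topological order:
 * 0 and 1 are roots, 2 needs 0 and 1, 3 needs 2. */
int work[NTASKS]  = {10, 6, 11, 7};   /* node weights (amount of work) */
int npred[NTASKS] = {0, 0, 2, 1};
int pred[NTASKS][NTASKS] = {{0}, {0}, {0, 1}, {2}};

int main(void)
{
    int total = 0, longest[NTASKS], critical = 0;
    for (int t = 0; t < NTASKS; t++) {
        int start = 0;                        /* longest path into t */
        for (int k = 0; k < npred[t]; k++)
            if (longest[pred[t][k]] > start)
                start = longest[pred[t][k]];
        longest[t] = start + work[t];         /* longest path ending at t */
        total += work[t];
        if (longest[t] > critical) critical = longest[t];
    }
    printf("total work %d, critical path %d, avg concurrency %.2f\n",
           total, critical, (double)total / critical);
    return 0;
}
```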

Concurrency: Example

Decomposition (I): total work = 63, critical path length = 27, maximum degree of concurrency = 4, average degree of concurrency = 63/27 ≈ 2.33.

Decomposition (II): total work = 64, critical path length = 34, maximum degree of concurrency = 4, average degree of concurrency = 64/34 ≈ 1.88.

Task Interactions
Task interaction = sharing/exchanging data among the tasks. Task-dependency graphs capture a specific form of interaction, in which the output of one task is the input of another. However, tasks that execute in parallel may also interact by sharing or exchanging data without a dependency between them. Task-interaction graph: represents the pattern of interactions among tasks. The task-dependency graph is a subgraph of the task-interaction graph.

Task Interactions: Example

Sparse matrix-vector multiplication y = A·b for a 12×12 matrix. Task i computes

    y[i] = sum of A[i, j]·b[j] over all j with 0 ≤ j ≤ 11 and A[i, j] ≠ 0

Task-interaction graph: tasks i and j interact when task i needs an element b[j] owned by task j, i.e., when A[i, j] ≠ 0.
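A minimal sketch of task i's computation using compressed sparse row (CSR) storage; the storage format, field names, and OpenMP use are my assumptions, not part of the slides:

```c
#include <stddef.h>

typedef struct {
    size_t n;             /* number of rows (12 in the slide example) */
    const size_t *rowptr; /* nonzeros of row i: [rowptr[i], rowptr[i+1]) */
    const size_t *col;    /* column index of each nonzero */
    const double *val;    /* value of each nonzero */
} csr_t;

void spmv(const csr_t *A, const double b[], double y[])
{
    #pragma omp parallel for
    for (size_t i = 0; i < A->n; i++) {   /* task i computes y[i] */
        double sum = 0.0;
        /* only the nonzero A[i][j] contribute; task i reads just the
         * b[j] values recorded by the task-interaction graph */
        for (size_t k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            sum += A->val[k] * b[A->col[k]];
        y[i] = sum;
    }
}
```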

Processes and Mapping



Process: an abstract entity that uses the code and data corresponding to a task to produce the task's output in a finite amount of time. A process may communicate or synchronize with other processes as needed.
Mapping = assigning tasks to processes for execution. A good mapping:
- maximizes the degree of concurrency
- minimizes the total completion time
- minimizes interaction among processes

Mapping determines how much concurrency is actually utilized and how efficiently. Processes are mapped to physical processors.

Mapping Tasks: Example

Mapping the tasks in the database example onto four processes.

Decomposition Techniques
Decomposition: splitting the computation into a set of tasks for concurrent execution; this is the fundamental step in designing parallel algorithms. Decomposition techniques:
1. Recursive decomposition
2. Data decomposition
3. Exploratory decomposition
4. Speculative decomposition

Recursive Decomposition
Suitable for problems that can be solved using the divide-and-conquer strategy:
- Decompose a problem into multiple smaller subproblems of the same type.
- Each subproblem is further decomposed recursively until it is simple enough to solve directly.
- The results of the subproblems are combined to form the complete solution.
Example: quicksort

Example 1: Quicksort

Task-dependency graph based on recursive decomposition for sorting a sequence of 12 numbers. Task = partitioning a given subsequence.
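A sketch of this decomposition using OpenMP tasks (an implementation assumption; the slides do not prescribe one). Each call partitions its subsequence and spawns two new, independent tasks for the two halves:

```c
#include <stddef.h>

/* Lomuto partition of a[lo..hi) around the pivot a[hi-1];
 * returns the pivot's final position. */
static size_t partition(double a[], size_t lo, size_t hi)
{
    double pivot = a[hi - 1];
    size_t s = lo;
    for (size_t i = lo; i < hi - 1; i++)
        if (a[i] < pivot) {
            double t = a[i]; a[i] = a[s]; a[s] = t;
            s++;
        }
    double t = a[hi - 1]; a[hi - 1] = a[s]; a[s] = t;
    return s;
}

static void qsort_task(double a[], size_t lo, size_t hi)
{
    if (hi - lo < 2) return;
    size_t p = partition(a, lo, hi);  /* this task's work */
    #pragma omp task                  /* left half: new, independent task */
    qsort_task(a, lo, p);
    #pragma omp task                  /* right half: new, independent task */
    qsort_task(a, p + 1, hi);
}

void quicksort(double a[], size_t n)
{
    #pragma omp parallel
    #pragma omp single                /* root task; children finish by the */
    qsort_task(a, 0, n);              /* barrier at the end of the region  */
}
```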

Example 2: Minimum Element

Serial algorithm: no concurrency. Divide-and-conquer algorithm: recursive decomposition is used to extract concurrency.

Example 2: Minimum Element

Task-dependency graph for finding the minimum number in the sequence {4, 9, 1, 7, 8, 11, 2, 12}
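An illustrative sketch of the divide-and-conquer formulation, again assuming OpenMP tasks; on {4, 9, 1, 7, 8, 11, 2, 12} the recursion produces the task tree of the graph above and returns 1:

```c
#include <stddef.h>

static double min2(double x, double y) { return x < y ? x : y; }

/* Minimum of a[lo..hi): solve the two halves as independent tasks;
 * the combine step depends on both, mirroring the dependency graph. */
static double par_min(const double a[], size_t lo, size_t hi)
{
    if (hi - lo == 1) return a[lo];
    size_t mid = lo + (hi - lo) / 2;
    double left, right;
    #pragma omp task shared(left)      /* left half: child task */
    left = par_min(a, lo, mid);
    right = par_min(a, mid, hi);       /* this task keeps the right half */
    #pragma omp taskwait               /* combine depends on both halves */
    return min2(left, right);
}

double array_min(const double a[], size_t n)
{
    double m = 0.0;
    #pragma omp parallel
    #pragma omp single
    m = par_min(a, 0, n);
    return m;
}
```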

Data Decomposition
Often used for extracting concurrency in algorithms that operate on large data structures. Decomposition is done in two steps:
1. The input or output data are partitioned.
2. The computations associated with the data are partitioned into tasks.

Partitioning can be based on:
- Input data
- Output data
- Intermediate data
- A combination of input and output data


Partitioning Output Data: Example 1

n×n matrix multiplication C = A·B decomposed into four tasks, each computing one (n/2)×(n/2) block of the output matrix.
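A sketch of this four-task output partitioning in C (row-major layout and OpenMP are my assumptions): each task owns one output block and reads the corresponding block-row of A and block-column of B:

```c
#include <stddef.h>

/* C = A * B with the output split into 2 x 2 blocks of size h = n/2;
 * task (bi, bj) computes block C[bi*h.., bj*h..]. Assumes n is even. */
void matmul_blocks(size_t n, const double A[], const double B[], double C[])
{
    size_t h = n / 2;
    #pragma omp parallel for collapse(2)
    for (size_t bi = 0; bi < 2; bi++)
        for (size_t bj = 0; bj < 2; bj++)
            for (size_t i = bi * h; i < (bi + 1) * h; i++)
                for (size_t j = bj * h; j < (bj + 1) * h; j++) {
                    double sum = 0.0;
                    for (size_t k = 0; k < n; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
}
```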

Partitioning Output Data: Example 2


Computing itemset frequencies in a transaction database; the itemset counts (the output) are partitioned among the tasks.

Partitioning Input Data: Example

Final step: combine the intermediate results from the two tasks
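An illustrative sketch of the input-partitioned version (the bitmask encoding and the two-way split are my assumptions): each task counts itemset occurrences in its half of the database, and the final step adds the two partial count vectors:

```c
#include <stddef.h>

/* Transactions and itemsets are encoded as item bitmasks, so itemset s
 * occurs in transaction t exactly when (t & s) == s. */
static void count_freq(const unsigned db[], size_t lo, size_t hi,
                       const unsigned itemsets[], size_t m, int count[])
{
    for (size_t s = 0; s < m; s++) {
        count[s] = 0;
        for (size_t t = lo; t < hi; t++)
            if ((db[t] & itemsets[s]) == itemsets[s])
                count[s]++;
    }
}

void freq_by_input_partition(const unsigned db[], size_t n,
                             const unsigned itemsets[], size_t m, int count[])
{
    int part0[m], part1[m];            /* per-task intermediate counts */
    #pragma omp parallel sections
    {
        #pragma omp section            /* task 1: first half of the DB  */
        count_freq(db, 0, n / 2, itemsets, m, part0);
        #pragma omp section            /* task 2: second half of the DB */
        count_freq(db, n / 2, n, itemsets, m, part1);
    }
    for (size_t s = 0; s < m; s++)     /* combine intermediate results  */
        count[s] = part0[s] + part1[s];
}
```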

Partitioning both Input and Output Data: Example

Final step: combine the intermediate results from the four tasks

Partitioning Intermediate Data: Example


Task-dependency graph of the resulting two-stage computation.
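A sketch of this staging for C = A·B with 2×2 blocks (the scratch layout and OpenMP use are assumptions): stage 1 has eight independent block-product tasks writing the intermediate data D, and stage 2 holds the additions that depend on them:

```c
#include <stddef.h>

/* n x n row-major matrices, split into 2 x 2 blocks of size h = n/2.
 * D0 and D1 are n*n scratch arrays holding the intermediate products
 * D_k = (block-column k of A) * (block-row k of B). */
void matmul_intermediate(size_t n, const double A[], const double B[],
                         double C[], double D0[], double D1[])
{
    size_t h = n / 2;
    /* Stage 1: 8 independent tasks, one per block product D_k[bi][bj]. */
    #pragma omp parallel for collapse(3)
    for (size_t k = 0; k < 2; k++)
        for (size_t bi = 0; bi < 2; bi++)
            for (size_t bj = 0; bj < 2; bj++) {
                double *D = k ? D1 : D0;
                for (size_t i = bi * h; i < (bi + 1) * h; i++)
                    for (size_t j = bj * h; j < (bj + 1) * h; j++) {
                        double s = 0.0;
                        for (size_t kk = k * h; kk < (k + 1) * h; kk++)
                            s += A[i * n + kk] * B[kk * n + j];
                        D[i * n + j] = s;
                    }
            }
    /* Stage 2: tasks that sum the two partial results of each output
     * block; these depend on the stage-1 tasks above. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = D0[i * n + j] + D1[i * n + j];
}
```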

Exploratory Decomposition
Used for problems in which the underlying computation is a search of a solution space. Idea: partition the solution space into smaller parts and search each part concurrently until the desired solution is found. Example: game solving.
- Only one of the tasks needs to find the solution for the whole computation to stop.
- Can lead to anomalous speedups (see below).
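An illustrative sketch (the goal test and flag handling are mine, not the slides'): the space is split into one contiguous part per thread, and a shared flag lets the other tasks stop early once any task finds the solution:

```c
#include <omp.h>
#include <stdbool.h>

static bool is_solution(long c) { return c == 123456; }  /* stand-in goal test */

long explore(long space_size)
{
    int found = 0;                   /* shared early-termination flag */
    long answer = -1;
    #pragma omp parallel
    {
        int nt = omp_get_num_threads(), id = omp_get_thread_num();
        long lo = (long)id * space_size / nt;        /* this task's part */
        long hi = (long)(id + 1) * space_size / nt;  /* of the space     */
        for (long c = lo; c < hi; c++) {
            int stop;
            #pragma omp atomic read
            stop = found;            /* exit early once someone has won */
            if (stop) break;
            if (is_solution(c)) {
                #pragma omp critical
                answer = c;
                #pragma omp atomic write
                found = 1;
            }
        }
    }
    return answer;
}
```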

Exploratory Decomposition: Example


The 15-puzzle problem

Anomalous Speedups in Exploratory Decomposition

Because the computation stops as soon as any task finds the solution, the parallel formulation may perform less total work than the serial search (super-linear speedup) or more (sub-linear speedup), depending on where the solution lies in the search space.

Speculative Decomposition
Used for programs with many possible computational branches. Idea: while one task performs the computation whose output decides which branch is taken, other tasks speculatively start the computations of the next stage. Examples:
- Concurrent execution of some of the cases of a switch statement in a C program.
- Discrete-event simulation.
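An illustrative sketch of a speculative two-way branch (the stage functions are stand-ins I made up): one task evaluates the condition while two others compute both possible next stages, and the result on the untaken path is simply discarded:

```c
#include <stdbool.h>

static bool slow_condition(void)   { return true; }  /* decides the branch */
static double stage_if_true(void)  { return 1.0; }   /* stand-in stages */
static double stage_if_false(void) { return 2.0; }

double speculate(void)
{
    bool cond = false;
    double rt = 0.0, rf = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        cond = slow_condition();   /* output used to pick the branch   */
        #pragma omp section
        rt = stage_if_true();      /* speculative: kept only if cond   */
        #pragma omp section
        rf = stage_if_false();     /* speculative: kept only if !cond  */
    }
    return cond ? rt : rf;         /* discard the mispredicted work */
}
```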

Speculative Decomposition: Example

Parallel discrete event simulation

Hybrid Decomposition

Finding the minimum element in an array of 16 elements using four tasks. A pure recursive decomposition would instead create 8 tasks.
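A sketch of the hybrid scheme (OpenMP assumed): a data-decomposed first stage of four serial scans, followed by a small tree combine, instead of the eight leaf tasks of a purely recursive decomposition:

```c
#include <stddef.h>

/* Minimum of 16 elements with four tasks: data decomposition for the
 * first stage, recursive (tree) combination for the second. */
double hybrid_min(const double a[16])
{
    double part[4];
    #pragma omp parallel for
    for (int t = 0; t < 4; t++) {            /* task t scans a[4t..4t+3] */
        double m = a[4 * t];
        for (int i = 4 * t + 1; i < 4 * t + 4; i++)
            if (a[i] < m) m = a[i];
        part[t] = m;
    }
    /* recursive combine of the four partial minima */
    double m01 = part[0] < part[1] ? part[0] : part[1];
    double m23 = part[2] < part[3] ? part[2] : part[3];
    return m01 < m23 ? m01 : m23;
}
```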
