
Principles of Parallel Algorithm Design

Steps in Designing Parallel Algorithms


- Identify portions of the work that can be performed concurrently.
- Map the concurrent pieces of work onto multiple processes running in parallel.
- Distribute the input, output, and intermediate data associated with the program.
- Manage accesses to data shared by multiple processors.
- Synchronize the processors as needed.

Task Decomposition
Tasks: programmer-defined units of computation; tasks can be of arbitrary size. Decomposition: dividing the main computation into multiple tasks. Execution time is reduced by executing multiple tasks in parallel. Ideal decomposition:
- All tasks can be executed in parallel, with comparable execution times.
- The computation requires little or no sharing of data among tasks.

Example: Matrix-Vector Multiplication

Task i = the computation of y[i]. All tasks are independent.
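As an illustration (not part of the original slides), here is a minimal C sketch of this fine-grained decomposition, assuming OpenMP (compile with e.g. `gcc -fopenmp`); each loop iteration plays the role of the task that computes y[i]:

```c
#include <stddef.h>

/* Fine-grained decomposition of dense matrix-vector multiply:
 * one task per output entry y[i]; all n tasks are independent. */
void matvec(size_t n, const double A[], const double b[], double y[])
{
    /* Task i: y[i] = sum_j A[i][j] * b[j]. No task reads another
     * task's output, so the iterations can run concurrently. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * b[j];
        y[i] = sum;
    }
}
```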

Task-Dependency Graph
Some tasks may use data produced by other tasks. Problem: how do we express such dependencies?
Task-dependency graph: a directed acyclic graph (DAG) in which nodes represent tasks and directed edges represent dependencies among them.
Rule: a task can be executed only when all the tasks connected to it by incoming edges have completed.
Ideal task-dependency graph for parallel computing: very few or no directed edges.
A task-dependency graph is influenced both by the decomposition of the computation into tasks and by the organization of those tasks.
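To make the rule concrete, here is a small illustrative C sketch (the four-task DAG is made up, not from the slides): a task becomes ready once its count of unmet incoming edges drops to zero, in Kahn-style topological order:

```c
#include <stdio.h>

#define NTASKS 4
/* Illustrative DAG: tasks 0 and 1 are independent; task 2 needs
 * 0 and 1; task 3 needs 1 and 2. */
int indegree[NTASKS] = {0, 0, 2, 2};           /* unmet incoming edges */
int succ[NTASKS][NTASKS] = {{2}, {2, 3}, {3}, {0}};
int nsucc[NTASKS] = {1, 2, 1, 0};

int main(void)
{
    int ready[NTASKS], nready = 0;
    for (int t = 0; t < NTASKS; t++)           /* start with the roots */
        if (indegree[t] == 0) ready[nready++] = t;
    while (nready > 0) {
        int t = ready[--nready];               /* any ready task may run */
        printf("executing task %d\n", t);
        for (int k = 0; k < nsucc[t]; k++)     /* completing t may      */
            if (--indegree[succ[t][k]] == 0)   /* release its successors */
                ready[nready++] = succ[t][k];
    }
    return 0;
}
```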

Task-Dependency Graph Example (I): Database Query Processing

MODEL=Civic AND YEAR=2001 AND (COLOR=Green OR COLOR=White)

Task-Dependency Graph Example (II): Database Query Processing

MODEL=Civic AND YEAR=2001 AND (COLOR=Green OR COLOR=White)

The same query, with the tasks organized differently, yields a different task-dependency graph.

Granularity
The number and size of the tasks determine the granularity of the decomposition.
- Fine-grained decomposition: a large number of small tasks. Suitable when the computation has a lot of parallelism and the underlying architecture provides low-latency, high-bandwidth communication.
- Coarse-grained decomposition: a small number of large tasks.

Granularity: Example

Coarse-grained decomposition => 4 tasks, each computing n/4 entries of the output vector.
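A matching sketch of the coarse-grained version (again illustrative, assuming OpenMP): exactly four tasks, each owning a contiguous block of rows of the output vector:

```c
#include <stddef.h>

/* Coarse-grained decomposition of the same matrix-vector product:
 * 4 tasks, task t computing roughly n/4 consecutive entries of y. */
void matvec_coarse(size_t n, const double A[], const double b[], double y[])
{
    #pragma omp parallel for num_threads(4)
    for (int t = 0; t < 4; t++) {                 /* task t */
        size_t lo = t * n / 4, hi = (t + 1) * n / 4;
        for (size_t i = lo; i < hi; i++) {        /* rows owned by task t */
            double sum = 0.0;
            for (size_t j = 0; j < n; j++)
                sum += A[i * n + j] * b[j];
            y[i] = sum;
        }
    }
}
```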

Concurrency
Degree of concurrency: the number of tasks that can be performed concurrently.
- Maximum degree of concurrency: the maximum number of tasks that can be executed simultaneously at any given time.
- Average degree of concurrency: the average number of tasks that can run concurrently over the entire duration of the program.
Both are influenced by the decomposition and the corresponding task-dependency graph.
Path length: the sum of the weights of all nodes in the path, where the weight of a node is the amount of work it represents.
Critical path: the longest directed path between any pair of start and finish nodes.
Average degree of concurrency = (total work done) / (critical path length)
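These quantities are easy to compute mechanically from a weighted task DAG. A small illustrative C sketch (the graph and node weights are made up, not the slides' examples):

```c
#include <stdio.h>

#define NTASKS 4
/* Illustrative weighted DAG, tasks listed in topological order:
 * 0 and 1 are roots, 2 needs 0 and 1, 3 needs 2. */
int work[NTASKS]  = {10, 6, 11, 7};   /* node weights (amount of work) */
int npred[NTASKS] = {0, 0, 2, 1};
int pred[NTASKS][NTASKS] = {{0}, {0}, {0, 1}, {2}};

int main(void)
{
    int total = 0, longest[NTASKS], critical = 0;
    for (int t = 0; t < NTASKS; t++) {
        int start = 0;                        /* longest path into t */
        for (int k = 0; k < npred[t]; k++)
            if (longest[pred[t][k]] > start)
                start = longest[pred[t][k]];
        longest[t] = start + work[t];         /* longest path ending at t */
        total += work[t];
        if (longest[t] > critical) critical = longest[t];
    }
    printf("total work %d, critical path %d, avg concurrency %.2f\n",
           total, critical, (double)total / critical);
    return 0;
}
```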

Concurrency: Example

Decomposition (I): total work = 63, critical path length = 27, maximum degree of concurrency = 4, average degree of concurrency = 63/27 ≈ 2.33.

Decomposition (II): total work = 64, critical path length = 34, maximum degree of concurrency = 4, average degree of concurrency = 64/34 ≈ 1.88.

Task Interactions
Task interaction = sharing/exchanging data among the tasks. Task-dependency graphs capture a specific form of interaction, in which the output of one task is the input of another. However, tasks that execute in parallel may also interact by sharing or exchanging data without a dependency between them. Task-interaction graph: represents the pattern of interactions among tasks. The task-dependency graph is a subgraph of the task-interaction graph.

Task Interactions: Example

Sparse matrix-vector multiplication y = A·b for a 12×12 matrix. Task i computes

    y[i] = sum of A[i, j]·b[j] over all j with 0 ≤ j ≤ 11 and A[i, j] ≠ 0

Task-interaction graph: tasks i and j interact when task i needs an element b[j] owned by task j, i.e., when A[i, j] ≠ 0.
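A minimal sketch of task i's computation using compressed sparse row (CSR) storage; the storage format, field names, and OpenMP use are my assumptions, not part of the slides:

```c
#include <stddef.h>

typedef struct {
    size_t n;             /* number of rows (12 in the slide example) */
    const size_t *rowptr; /* nonzeros of row i: [rowptr[i], rowptr[i+1]) */
    const size_t *col;    /* column index of each nonzero */
    const double *val;    /* value of each nonzero */
} csr_t;

void spmv(const csr_t *A, const double b[], double y[])
{
    #pragma omp parallel for
    for (size_t i = 0; i < A->n; i++) {   /* task i computes y[i] */
        double sum = 0.0;
        /* only the nonzero A[i][j] contribute; task i reads just the
         * b[j] values recorded by the task-interaction graph */
        for (size_t k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            sum += A->val[k] * b[A->col[k]];
        y[i] = sum;
    }
}
```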

Processes and Mapping



Process: an abstract entity that uses the code and data corresponding to a task to produce the task's output in a finite amount of time. A process may communicate or synchronize with other processes as needed.
Mapping = assigning tasks to processes for execution. A good mapping:
- maximizes the degree of concurrency
- minimizes the total completion time
- minimizes interaction among processes

Mapping determines how much concurrency is actually utilized and how efficiently. Processes are mapped to physical processors.

Mapping Tasks: Example

Mapping the tasks in the database example onto four processes.

Decomposition Techniques
Decomposition: splitting the computation into a set of tasks for concurrent execution; this is the fundamental step in designing parallel algorithms. Decomposition techniques:
1. Recursive decomposition
2. Data decomposition
3. Exploratory decomposition
4. Speculative decomposition

Recursive Decomposition
Suitable for problems that can be solved using the divide-and-conquer strategy:
- Decompose a problem into multiple smaller subproblems of the same type.
- Each subproblem is further decomposed recursively until it is simple enough to solve directly.
- The results of the subproblems are combined to form the complete solution.
Example: quicksort

Example 1: Quicksort

Task-dependency graph based on recursive decomposition for sorting a sequence of 12 numbers. Task = partitioning a given subsequence.
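A sketch of this decomposition using OpenMP tasks (an implementation assumption; the slides do not prescribe one). Each call partitions its subsequence and spawns two new, independent tasks for the two halves:

```c
#include <stddef.h>

/* Lomuto partition of a[lo..hi) around the pivot a[hi-1];
 * returns the pivot's final position. */
static size_t partition(double a[], size_t lo, size_t hi)
{
    double pivot = a[hi - 1];
    size_t s = lo;
    for (size_t i = lo; i < hi - 1; i++)
        if (a[i] < pivot) {
            double t = a[i]; a[i] = a[s]; a[s] = t;
            s++;
        }
    double t = a[hi - 1]; a[hi - 1] = a[s]; a[s] = t;
    return s;
}

static void qsort_task(double a[], size_t lo, size_t hi)
{
    if (hi - lo < 2) return;
    size_t p = partition(a, lo, hi);  /* this task's work */
    #pragma omp task                  /* left half: new, independent task */
    qsort_task(a, lo, p);
    #pragma omp task                  /* right half: new, independent task */
    qsort_task(a, p + 1, hi);
}

void quicksort(double a[], size_t n)
{
    #pragma omp parallel
    #pragma omp single                /* root task; children finish by the */
    qsort_task(a, 0, n);              /* barrier at the end of the region  */
}
```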

Example 2: Minimum Element

Serial algorithm: no concurrency. Divide-and-conquer algorithm: recursive decomposition is used to extract concurrency.

Example 2: Minimum Element

Task-dependency graph for finding the minimum number in the sequence {4, 9, 1, 7, 8, 11, 2, 12}
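An illustrative sketch of the divide-and-conquer formulation, again assuming OpenMP tasks; on {4, 9, 1, 7, 8, 11, 2, 12} the recursion produces the task tree of the graph above and returns 1:

```c
#include <stddef.h>

static double min2(double x, double y) { return x < y ? x : y; }

/* Minimum of a[lo..hi): solve the two halves as independent tasks;
 * the combine step depends on both, mirroring the dependency graph. */
static double par_min(const double a[], size_t lo, size_t hi)
{
    if (hi - lo == 1) return a[lo];
    size_t mid = lo + (hi - lo) / 2;
    double left, right;
    #pragma omp task shared(left)      /* left half: child task */
    left = par_min(a, lo, mid);
    right = par_min(a, mid, hi);       /* this task keeps the right half */
    #pragma omp taskwait               /* combine depends on both halves */
    return min2(left, right);
}

double array_min(const double a[], size_t n)
{
    double m = 0.0;
    #pragma omp parallel
    #pragma omp single
    m = par_min(a, 0, n);
    return m;
}
```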

Data Decomposition
Often used for extracting concurrency in algorithms that operate on large data structures. Decomposition is done in two steps:
1. The input or output data are partitioned.
2. The computations associated with the data are partitioned into tasks.

Partitioning can be based on:
- Input data
- Output data
- Intermediate data
- A combination of input and output data


Partitioning Output Data: Example 1

n×n matrix multiplication C = A·B decomposed into four tasks, each computing one (n/2)×(n/2) block of the output matrix.
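A sketch of this four-task output partitioning in C (row-major layout and OpenMP are my assumptions): each task owns one output block and reads the corresponding block-row of A and block-column of B:

```c
#include <stddef.h>

/* C = A * B with the output split into 2 x 2 blocks of size h = n/2;
 * task (bi, bj) computes block C[bi*h.., bj*h..]. Assumes n is even. */
void matmul_blocks(size_t n, const double A[], const double B[], double C[])
{
    size_t h = n / 2;
    #pragma omp parallel for collapse(2)
    for (size_t bi = 0; bi < 2; bi++)
        for (size_t bj = 0; bj < 2; bj++)
            for (size_t i = bi * h; i < (bi + 1) * h; i++)
                for (size_t j = bj * h; j < (bj + 1) * h; j++) {
                    double sum = 0.0;
                    for (size_t k = 0; k < n; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
}
```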

Partitioning Output Data: Example 2


Computing itemset frequencies in a transaction database; the itemset counts (the output) are partitioned among the tasks.

Partitioning Input Data: Example

Final step: combine the intermediate results from the two tasks
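An illustrative sketch of the input-partitioned version (the bitmask encoding and the two-way split are my assumptions): each task counts itemset occurrences in its half of the database, and the final step adds the two partial count vectors:

```c
#include <stddef.h>

/* Transactions and itemsets are encoded as item bitmasks, so itemset s
 * occurs in transaction t exactly when (t & s) == s. */
static void count_freq(const unsigned db[], size_t lo, size_t hi,
                       const unsigned itemsets[], size_t m, int count[])
{
    for (size_t s = 0; s < m; s++) {
        count[s] = 0;
        for (size_t t = lo; t < hi; t++)
            if ((db[t] & itemsets[s]) == itemsets[s])
                count[s]++;
    }
}

void freq_by_input_partition(const unsigned db[], size_t n,
                             const unsigned itemsets[], size_t m, int count[])
{
    int part0[m], part1[m];            /* per-task intermediate counts */
    #pragma omp parallel sections
    {
        #pragma omp section            /* task 1: first half of the DB  */
        count_freq(db, 0, n / 2, itemsets, m, part0);
        #pragma omp section            /* task 2: second half of the DB */
        count_freq(db, n / 2, n, itemsets, m, part1);
    }
    for (size_t s = 0; s < m; s++)     /* combine intermediate results  */
        count[s] = part0[s] + part1[s];
}
```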

Partitioning both Input and Output Data: Example

Final step: combine the intermediate results from the four tasks

Partitioning Intermediate Data: Example


Task-dependency graph of the resulting two-stage computation.
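A sketch of this staging for C = A·B with 2×2 blocks (the scratch layout and OpenMP use are assumptions): stage 1 has eight independent block-product tasks writing the intermediate data D, and stage 2 holds the additions that depend on them:

```c
#include <stddef.h>

/* n x n row-major matrices, split into 2 x 2 blocks of size h = n/2.
 * D0 and D1 are n*n scratch arrays holding the intermediate products
 * D_k = (block-column k of A) * (block-row k of B). */
void matmul_intermediate(size_t n, const double A[], const double B[],
                         double C[], double D0[], double D1[])
{
    size_t h = n / 2;
    /* Stage 1: 8 independent tasks, one per block product D_k[bi][bj]. */
    #pragma omp parallel for collapse(3)
    for (size_t k = 0; k < 2; k++)
        for (size_t bi = 0; bi < 2; bi++)
            for (size_t bj = 0; bj < 2; bj++) {
                double *D = k ? D1 : D0;
                for (size_t i = bi * h; i < (bi + 1) * h; i++)
                    for (size_t j = bj * h; j < (bj + 1) * h; j++) {
                        double s = 0.0;
                        for (size_t kk = k * h; kk < (k + 1) * h; kk++)
                            s += A[i * n + kk] * B[kk * n + j];
                        D[i * n + j] = s;
                    }
            }
    /* Stage 2: tasks that sum the two partial results of each output
     * block; these depend on the stage-1 tasks above. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = D0[i * n + j] + D1[i * n + j];
}
```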

Exploratory Decomposition
Used for problems in which the underlying computation is a search of a solution space. Idea: partition the solution space into smaller parts and search each part concurrently until the desired solution is found. Example: game solving.
- Only one of the tasks needs to find the solution for the whole computation to stop.
- Can lead to anomalous speedups (see below).
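An illustrative sketch (the goal test and flag handling are mine, not the slides'): the space is split into one contiguous part per thread, and a shared flag lets the other tasks stop early once any task finds the solution:

```c
#include <omp.h>
#include <stdbool.h>

static bool is_solution(long c) { return c == 123456; }  /* stand-in goal test */

long explore(long space_size)
{
    int found = 0;                   /* shared early-termination flag */
    long answer = -1;
    #pragma omp parallel
    {
        int nt = omp_get_num_threads(), id = omp_get_thread_num();
        long lo = (long)id * space_size / nt;        /* this task's part */
        long hi = (long)(id + 1) * space_size / nt;  /* of the space     */
        for (long c = lo; c < hi; c++) {
            int stop;
            #pragma omp atomic read
            stop = found;            /* exit early once someone has won */
            if (stop) break;
            if (is_solution(c)) {
                #pragma omp critical
                answer = c;
                #pragma omp atomic write
                found = 1;
            }
        }
    }
    return answer;
}
```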

Exploratory Decomposition: Example


The 15-puzzle problem

Anomalous Speedups in Exploratory Decomposition

Because the computation stops as soon as any task finds the solution, the parallel formulation may perform less total work than the serial search (super-linear speedup) or more (sub-linear speedup), depending on where the solution lies in the search space.

Speculative Decomposition
Used for programs with many possible computational branches. Idea: while one task performs the computation whose output decides which branch is taken, other tasks speculatively start the computations of the next stage. Examples:
- Concurrent execution of some of the cases of a switch statement in a C program.
- Discrete-event simulation.
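An illustrative sketch of a speculative two-way branch (the stage functions are stand-ins I made up): one task evaluates the condition while two others compute both possible next stages, and the result on the untaken path is simply discarded:

```c
#include <stdbool.h>

static bool slow_condition(void)   { return true; }  /* decides the branch */
static double stage_if_true(void)  { return 1.0; }   /* stand-in stages */
static double stage_if_false(void) { return 2.0; }

double speculate(void)
{
    bool cond = false;
    double rt = 0.0, rf = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        cond = slow_condition();   /* output used to pick the branch   */
        #pragma omp section
        rt = stage_if_true();      /* speculative: kept only if cond   */
        #pragma omp section
        rf = stage_if_false();     /* speculative: kept only if !cond  */
    }
    return cond ? rt : rf;         /* discard the mispredicted work */
}
```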

Speculative Decomposition: Example

Parallel discrete event simulation

Hybrid Decomposition

Finding the minimum element in an array of 16 elements using four tasks. A pure recursive decomposition would instead create 8 tasks.
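A sketch of the hybrid scheme (OpenMP assumed): a data-decomposed first stage of four serial scans, followed by a small tree combine, instead of the eight leaf tasks of a purely recursive decomposition:

```c
#include <stddef.h>

/* Minimum of 16 elements with four tasks: data decomposition for the
 * first stage, recursive (tree) combination for the second. */
double hybrid_min(const double a[16])
{
    double part[4];
    #pragma omp parallel for
    for (int t = 0; t < 4; t++) {            /* task t scans a[4t..4t+3] */
        double m = a[4 * t];
        for (int i = 4 * t + 1; i < 4 * t + 4; i++)
            if (a[i] < m) m = a[i];
        part[t] = m;
    }
    /* recursive combine of the four partial minima */
    double m01 = part[0] < part[1] ? part[0] : part[1];
    double m23 = part[2] < part[3] ? part[2] : part[3];
    return m01 < m23 ? m01 : m23;
}
```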
