
International Journal of Computer Trends and Technology (IJCTT) - Volume 4, Issue 5, May 2013

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip
Mythili.R 1, Mugilan.D 2

1 PG Student, Department of Electronics and Communication, K S Rangasamy College of Technology, TN, India
2 Assistant Professor, Department of Electronics and Communication, K S Rangasamy College of Technology, TN, India

Abstract - Multiprocessor system-on-chip (MPSoC) is an attractive solution to the increasing complexity and size of embedded applications. An MPSoC is an integrated circuit containing multiple instruction-set processors on a single chip that implements most of the functionality of a complex electronic system. While embedded systems become increasingly complex, memory access speed has failed to keep up with processor speed, making memory access latency a major issue when scheduling embedded applications on such systems. Scheduling the tasks of an embedded application on the processors and partitioning the available scratchpad memory (SPM) budget among those processors are two critical issues in complex embedded systems. This work focuses on task scheduling and SPM partitioning to reduce the execution time of embedded applications. Equally partitioned SPM reduces the computation time; to reduce it further, the available SPM can be divided between the processors in any ratio. Pipelined scheduling additionally allows tasks of different instances of the embedded application to be scheduled at each stage of the pipeline.

Keywords - Memory partitioning, multiprocessor system-on-chip, scratchpad memory, task scheduling.

I. INTRODUCTION

An MPSoC consists of multiple heterogeneous processing elements, an SPM memory hierarchy, and input/output components, linked together by an on-chip interconnect structure. MPSoC models use a memory hierarchy with slow off-chip memory and fast on-chip scratchpad memories. A larger SPM results in less computation time, since off-chip access is more expensive in terms of clock cycles than access to the fast on-chip SPM. Here, the cache memory in the processor is replaced by SPM. SPM has been employed as a partial or entire replacement for cache memory due to its better energy efficiency: an SPM consists only of decoding circuits, data arrays, and output units, and, unlike a cache, it requires no tag comparison. Owing to this simplified architecture, SPM is more energy- and area-efficient than cache. The computation time of a program on a processor depends on how much SPM is allocated to that processor.

Execution time predictability is a critical issue for real-time embedded applications; this makes data caches unsuitable, since it is hard to model their exact behaviour and to predict the execution time of programs. To alleviate such problems, many modern MPSoC systems use scratchpad memories, which contribute to better timing predictability. Cellular phones, portable media players, and gaming consoles are complex embedded applications consisting of multiple concurrent real-time tasks. Usually, the tasks are scheduled first and the SPM budget is then partitioned among the processors. Such a decoupled technique may prevent better schedules in terms of minimizing the computation time of the whole application; integrating the two steps improves performance.

II. METHODOLOGY

The embedded application is given to the MPSoC, which consists of multiple processors. The application is divided into a number of tasks, the tasks are scheduled, and the available memory is partitioned among the processors, as shown in Fig. 1. Finally, the execution time is determined.

Fig. 1. Block diagram of the project: the embedded application is split into tasks and modelled as a TDG for the MPSoC; task scheduling and memory partitioning are performed; the scheduled tasks, with their allocated memory, are assigned to the processors; and the execution time is predicted.

III. TASKS & TDG

Embedded applications usually consist of computation blocks, which are treated as tasks. An application program is divided into tasks, which are the various processes in the application. Whenever a program is executed, the operating system creates a new task for it; the task acts as an envelope for the program. The state information of a task is represented by task states such as idle, running, ready, and blocked. There are usually dependences between tasks that must be respected in the schedule. The problem formulation is based on a task dependence graph (TDG). A TDG is a directed acyclic graph with weighted edges in which each vertex represents a task in the embedded application. An edge from task Ti to task Tj represents a scheduling order that must be enforced because Tj needs data transferred from Ti after Ti has executed; the weight of the edge is the communication cost. A processor cannot start executing a task until all the necessary data communication is performed. Each task can be mapped to any of the available processors. Since the processors in this architectural model can be heterogeneous, the execution time of each task depends on the processor to which the task is mapped as well as on the SPM allocated to that processor. Accessing a data variable from SPM is usually on the order of 100 times faster than accessing it from off-chip memory.

Consider the example task graph shown in Fig. 2 with six tasks T1, T2, T3, T4, T5, and T6. Task T4 depends on tasks T1, T2, and T3, and task T6 depends on tasks T4 and T5. Whenever there is an edge between two tasks Ti and Tj, a communication cost must be accounted for, provided that the two tasks are allocated to two different processors.

Fig. 2. An example TDG.
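To make the TDG formulation concrete, the following sketch encodes the example graph of Fig. 2 in Python. The edge weights and the helper functions are illustrative assumptions; the paper does not give concrete communication costs.

```python
# Hypothetical encoding of the example TDG of Fig. 2.  The communication
# costs on the edges are invented; the paper gives no concrete numbers.

# edges[(Ti, Tj)] = communication cost, paid only when Ti and Tj are
# mapped to different processors.
edges = {
    ("T1", "T4"): 2, ("T2", "T4"): 3, ("T3", "T4"): 1,
    ("T4", "T6"): 2, ("T5", "T6"): 4,
}
tasks = {"T1", "T2", "T3", "T4", "T5", "T6"}

def predecessors(task):
    """Tasks that must finish (and transfer their data) before `task` starts."""
    return {src for (src, dst) in edges if dst == task}

def ready_tasks(finished):
    """Tasks whose predecessors have all completed."""
    return {t for t in tasks - finished if predecessors(t) <= finished}

# Initially T1, T2, T3 and T5 have no predecessors, matching the example:
print(ready_tasks(set()))   # {'T1', 'T2', 'T3', 'T5'} (set order may vary)
```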

IV. TASK SCHEDULING & MEMORY PARTITIONING

Four approaches can be implemented to solve the task scheduling and memory allocation problem on MPSoC systems:
1) decoupled task scheduling and memory partitioning with the SPM partitioned equally among all available processors (TSMPEQUAL);
2) decoupled task scheduling and memory partitioning with the SPM partitioned among the processors in any ratio (TSMPANY);
3) an integrated task scheduling and memory partitioning heuristic (TSMPINTEG); and
4) the integrated heuristic with pipelining (TSMPPIPE).

Unlike current approaches that treat task scheduling and memory partitioning as two separate problems, these two problems can be solved in an integrated fashion. An effective heuristic was developed for the task scheduling/memory partitioning problem for a multiprocessor system-on-chip where a single application uses the MPSoC at a time.

These two steps are performed in an integrated fashion, where the private on-chip memory budget allocated to a processor is decided as tasks are mapped to that processor. The computation time of a task depends on the processor to which it is mapped, as well as on the SPM memory available for that task. Therefore, task scheduling should take into consideration the varying computation time of a task based on the processor and the SPM budget.

An embedded application is usually executed many times for a stream of input data on an MPSoC. Such repeated execution makes embedded applications amenable to pipelined implementation, which benefits from allowing tasks of different application instances to be scheduled at each stage of the pipeline. The objective is to decrease the pipeline stage interval, since once the pipeline is full, one instance of the application completes in each pipeline stage. The maximum number of stages is equal to the number of processors in the MPSoC system.

A. Decoupled TSMP using Cache Memory

First, the schedule is constructed assuming no scratchpad memory is available. Tasks T1, T2, T3, and T5 are ready to be scheduled in the example. Task T5 is not scheduled at this point based on its ALAP value, so tasks T1 and T2 are first mapped to the two available processors P1 and P2. The scheduling algorithm then maps T3 to P2, which becomes free before P1 because the computation time of T2 is less than that of T1. In a similar fashion, the algorithm assigns tasks T4 and T6 to processor P1, whereas task T5 is mapped to processor P2. The resulting schedule is shown in Fig. 3.

Fig. 3. Schedule based on no SPM.
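The walk-through above can be mirrored by a minimal list scheduler that repeatedly picks the most urgent ready task (smallest ALAP value) and places it on the earliest-free processor. All computation times, ALAP values, and the two-processor setup below are invented for illustration, and communication costs are ignored for brevity.

```python
# Illustrative list scheduler for the no-SPM (cache-only) case.
# Computation times, ALAP values and processor count are invented;
# communication costs are ignored to keep the sketch short.
comp_time = {"T1": 4, "T2": 3, "T3": 2, "T4": 3, "T5": 5, "T6": 2}
alap      = {"T1": 0, "T2": 0, "T3": 1, "T4": 4, "T5": 5, "T6": 7}
preds     = {"T1": [], "T2": [], "T3": [], "T4": ["T1", "T2", "T3"],
             "T5": [], "T6": ["T4", "T5"]}

free_at = {"P1": 0, "P2": 0}   # time at which each processor becomes free
finish  = {}                   # task -> finish time
placed  = {}                   # task -> processor

while len(finish) < len(comp_time):
    ready = [t for t in comp_time
             if t not in finish and all(p in finish for p in preds[t])]
    task = min(ready, key=lambda t: alap[t])   # most urgent ready task first
    proc = min(free_at, key=free_at.get)       # earliest-free processor
    start = max([free_at[proc]] + [finish[p] for p in preds[task]])
    finish[task] = start + comp_time[task]
    free_at[proc] = finish[task]
    placed[task] = proc

# With these invented numbers the result matches the mapping in the text:
# P1 runs T1, T4, T6 and P2 runs T2, T3, T5.
print(placed)
```

This is only a sketch of the idea, not the paper's algorithm.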

B. Decoupled TSMP on Equally Partitioned SPM

Fig. 4 shows the schedule that results from partitioning the available SPM equally between the two processors. With such a criterion, the available SPM budget is divided equally between processors P1 and P2 regardless of which tasks are mapped to which processor. The idle time can be reduced, and the equally partitioned SPM reduces the computation time of the whole application.

Fig. 4. Schedule on equally partitioned SPM.

C. Decoupled TSMP on Non-equally Partitioned SPM

To further reduce the application's computation time, the available SPM can be divided between the two processors in any ratio. From the task schedule, task T4 can only start after P2 has finished executing task T3. The issue now is to reduce the dead time between tasks T1 and T4 imposed by the computation time of tasks T2 and T3. To minimize this dead time, more of the SPM budget is allocated to processor P2 so as to reduce the computation time of tasks T2 and T3. The resulting schedule is shown in Fig. 5.

Fig. 5. Schedule based on non-equally partitioned SPM.
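One way to realise the any-ratio partitioning described above is a brute-force search over splits of the SPM budget, with the task-to-processor mapping kept fixed. The time-versus-SPM model, the budget, and the base times below are placeholders, not the paper's data.

```python
# Sketch of decoupled, non-equally partitioned SPM (any ratio): the task
# mapping is fixed (P1: T1, T4, T6; P2: T2, T3, T5) and we search over
# splits of the SPM budget.  The time-vs-SPM model and all constants are
# invented placeholders for the per-task curves a real heuristic would use.

SPM_BUDGET = 32  # total on-chip SPM in KB, illustrative

def task_time(base, spm_kb):
    """Toy model: more SPM shortens a task, with diminishing returns."""
    return base / (1.0 + 0.05 * spm_kb)

def makespan(spm_p1, spm_p2):
    """End time of the fixed schedule for a given SPM split."""
    t1 = task_time(4.0, spm_p1)                             # T1 on P1
    t23 = task_time(3.0, spm_p2) + task_time(2.0, spm_p2)   # T2, T3 on P2
    t4_end = max(t1, t23) + task_time(3.0, spm_p1)          # T4 waits for T1..T3
    t5_end = t23 + task_time(5.0, spm_p2)                   # T5 on P2
    return max(t4_end, t5_end) + task_time(2.0, spm_p1)     # T6 on P1

best_p1 = min(range(SPM_BUDGET + 1),
              key=lambda k: makespan(k, SPM_BUDGET - k))
print(best_p1, SPM_BUDGET - best_p1, makespan(best_p1, SPM_BUDGET - best_p1))
```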

D. Integrated TSMP

The problem with the previous schedule is that it allocates T3 to the same processor, P2, that is scheduled to execute T2. This choice is the reason for the dead time in the schedule, since T2 cannot benefit much from additional SPM, as is clear from its Min, Avg, and Max values. A good heuristic should take these values into consideration: a better choice is to schedule T3 on P1, with all of the available SPM allocated to that processor, and the result is a schedule with the minimal end time, as shown in Fig. 6.

Fig. 6. Schedule based on the integrated approach.
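The integrated decision can be sketched as evaluating, for every ready task, each (processor, SPM share) pair at the moment the task is scheduled, instead of fixing the SPM partition beforehand. The sensitivity model and all numbers below are invented; this is a sketch of the idea, not the paper's heuristic.

```python
# Sketch of one integrated scheduling/partitioning decision: for a ready
# task, every (processor, SPM share) pair is evaluated together instead of
# fixing the SPM partition beforehand.  All numbers are invented.

def exec_time(base, sensitivity, spm_kb):
    """Toy time-vs-SPM curve (the role played by the Min/Avg/Max values)."""
    return base / (1.0 + sensitivity * spm_kb)

def best_choice(base, sensitivity, proc_free, spm_left, step=8):
    """Return (finish_time, processor, spm_share) minimising the finish time."""
    options = [
        (free + exec_time(base, sensitivity, share), proc, share)
        for proc, free in proc_free.items()
        for share in range(0, spm_left + 1, step)
    ]
    return min(options)

# A task whose time drops sharply with SPM (high sensitivity)...
print(best_choice(base=2.0, sensitivity=0.20,
                  proc_free={"P1": 1.0, "P2": 0.5}, spm_left=32))
# ...versus one that is almost insensitive to SPM: its finish time barely
# changes with the share, which is the cue for a budget-aware heuristic to
# spend the SPM elsewhere.
print(best_choice(base=3.0, sensitivity=0.01,
                  proc_free={"P1": 1.0, "P2": 0.5}, spm_left=32))
```

A full heuristic would also look ahead at the tasks that are still unscheduled, which is how the integrated approach arrives at placing T3 on P1 with the entire SPM in the example above; the per-decision structure, however, is the same.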


E. Integrated TSMP with Pipelining

Pipeline scheduling allows tasks of different instances of the embedded application to be scheduled at each stage of the pipeline. Such a schedule does not necessarily decrease the computation time of a single instance of the application; rather, it decreases the time between the start times of two consecutive iterations of the task graph. Here, the pipelined concept is implemented by storing the result of the previous task in memory while the current task is executing, which further reduces the computation time.
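With invented stage times, the following snippet shows the effect described above: pipelining leaves the latency of a single instance unchanged but separates consecutive instances by only the longest stage.

```python
# Toy illustration of pipelined execution of repeated application instances.
# The stage times are invented; with two processors the schedule is split
# into at most two pipeline stages.

stage_time = [300, 200]        # ns per pipeline stage (illustrative)
latency = sum(stage_time)      # one instance still takes 500 ns end to end
interval = max(stage_time)     # consecutive instances start 300 ns apart

def finish_time(n_instances):
    """Completion time of the last of n instances, once the pipeline is full."""
    return latency + (n_instances - 1) * interval

print(finish_time(1))   # 500 ns, unchanged for a single instance
print(finish_time(10))  # 3200 ns, versus 10 * 500 = 5000 ns without pipelining
```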

V. RESULTS AND DISCUSSION

The task dependence graph shown in Fig. 2 is considered, and the implementation was done using the ModelSim software. The tasks are taken to be interpolation, sum of absolute differences (SAD), multiply and accumulate (MAC), addition, subtraction, and multiplication from the MPEG-4 encoder block.

A. Simulation Result for Decoupled TSMP using Cache Memory

The execution time obtained for the decoupled TSMP approach using cache memory is 800 ns.

B. Simulation Result for Decoupled TSMP on Equally Partitioned SPM

The decoupled TSMP approach using equally partitioned SPM needs 700 ns for execution.

C. Simulation Result for Decoupled TSMP on Non-equally Partitioned SPM

The execution time obtained for the decoupled TSMP approach using non-equally partitioned SPM is 600 ns.

D. Simulation Result for Integrated TSMP

The execution time obtained for the integrated TSMP approach using SPM is 500 ns.

E. Simulation Result for Integrated TSMP with Pipelining

The integrated TSMP with pipelining approach needs 500 ns to execute the given tasks.

F. Comparison Result

The results obtained for the various approaches are compared in Fig. 7. The comparison is made with a 37k memory allocation for the five approaches.

Fig. 7. Comparison of the frequencies obtained for the various approaches.
T0 - Decoupled TSMP using Cache Memory
T1 - Decoupled TSMP on Equally Partitioned SPM
T2 - Decoupled TSMP on Non-equally Partitioned SPM
T3 - Integrated TSMP
T4 - Integrated TSMP with Pipelining

The frequency values increase from one approach to the next, and hence the execution time is reduced in the implemented concept.
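For reference, the relative improvements implied by the reported execution times can be computed directly; the snippet below only restates the numbers from Section V.

```python
# Execution times reported in Section V (in ns) and the improvement of each
# approach relative to the cache-only baseline.
results = {
    "Decoupled TSMP using cache memory":           800,
    "Decoupled TSMP, equally partitioned SPM":     700,
    "Decoupled TSMP, non-equally partitioned SPM": 600,
    "Integrated TSMP":                             500,
    "Integrated TSMP with pipelining":             500,
}

baseline = results["Decoupled TSMP using cache memory"]
for name, t in results.items():
    cut = 100.0 * (baseline - t) / baseline
    print(f"{name}: {t} ns ({cut:.1f}% reduction vs. cache baseline)")
```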

VI. CONCLUSION

An effective heuristic was presented that integrates task scheduling and memory partitioning of embedded applications on multiprocessor systems-on-chip with scratchpad memory. Compared with the widely used decoupled approach, the integrated approach significantly improves the results, since the appropriate partitioning of the SPM space among the processors depends on the tasks scheduled on each of those processors, and vice versa. The reduction in the execution time of the scheduled tasks was thus obtained using various approaches: equally partitioned SPM, non-equally partitioned SPM, the integrated approach, and the integrated approach with pipelining. Simulation results were obtained using the ModelSim software, and the frequency values were obtained using the Xilinx software.

AUTHORS PROFILE

Mythili.R received her B.E degree from Anna University, Coimbatore, India, in 2011. She is currently pursuing her M.E degree at Anna University, Chennai, India. Her research area includes optimization of MPSoC and low-power VLSI circuits.


Mugilan.D received his B.E degree from Erode Sengunthar Engineering College, Erode, India, in 2007, and his M.E degree from Kongu Engineering College, Erode, India, in 2009. He worked as an Assistant Professor at Maharaja Engineering College, Avinashi, India. Since 2010 he has been working as an Assistant Professor at K.S.Rangasamy College of Technology, Tamil Nadu, India. His research is in the area of embedded systems and digital image processing. He is a life member of ISTE.

