
Linux Scheduling Algorithm

Shinto Philip, Vipin M.K.

History and Background

In 1991, Linus Torvalds took a college computer science course that used the Minix operating system. Minix is a toy UNIX-like OS written by Andrew Tanenbaum as a learning workbench.

Linus went in his own direction and began working on Linux.

In October 1991 he announced Linux v0.02. In March 1994 he released Linux v1.0. The latest stable version is Linux v3.7.

Scheduling

Job of allocating CPU time to different tasks within an operating system.

Scheduling includes:

Running and interrupting of user processes.
Running of various kernel tasks.

Linux Approach - Earlier Versions

Earlier versions (up to version 2.5) ran the traditional UNIX scheduling algorithm.


Does not support SMP systems.
Does not scale well as the number of tasks on the system grows.
O(n) operations are needed to select a process.
Uses multi-level feedback queue scheduling.

Linux Scheduling - Features

Runs in constant time, O(1), regardless of the number of processes.
Increased support for SMP.
Processor affinity.
Load balancing.
Fairness and support for interactive tasks.

Completely Fair Scheduler

The CPU scheduler from version 2.6.23 onward is known as the Completely Fair Scheduler (CFS).

Total complexity: O(log n) if there are n processes.


Choosing a task can be done in constant time, O(1).
Reinserting a task after it has run requires O(log n) operations.

Processor Affinity

Processors contain cache memory, which speeds up repeated accesses to the same memory locations. If a process were to switch from one processor to another, the data in the cache (for that process) would have to be invalidated and re-loaded from main memory. Therefore SMP systems attempt to keep each process on the same processor, via processor affinity.

Soft affinity occurs when the system attempts to keep processes on the same processor but makes no guarantees. Hard affinity allows a process to specify that it is not to be moved between processors. Linux supports hard affinity.
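As a hedged illustration, hard affinity can be requested from user space with the sched_setaffinity() system call. The sketch below pins the calling process to CPU 0; the choice of CPU 0 is arbitrary, and real code would pick CPUs based on the machine's topology.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);                 /* start with an empty CPU set       */
        CPU_SET(0, &set);               /* allow only CPU 0 (arbitrary pick) */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(set), &set) == -1) {
            perror("sched_setaffinity");
            return 1;
        }
        return 0;
    }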

Load Balancing

An important goal in a multiprocessor system is to balance the load between processors. Most systems maintain a separate ready queue for each processor. Balancing can be achieved through either push migration or pull migration:

Push migration involves a separate process that runs periodically and moves processes from heavily loaded processors onto less loaded ones.
Pull migration involves idle processors taking processes from the ready queues of other processors.


Linux Approach

Two separate scheduling algorithms:

Time-sharing algorithm: for fair preemptive scheduling among multiple processes.
Real-time task scheduling: absolute priorities have more importance than fairness.

Linux Approach

Preemptive, priority-based algorithm.
Two separate priority ranges:

Real-time: ranges from 0 to 99.
Other processes: ranging from 100 to 140.

A lower value indicates a higher priority.

In Linux, process priority is dynamic. The scheduler increases or decreases the priority.

Linux Approach

The scheduler assigns:

Higher-priority processes longer time quanta.
Lower-priority processes shorter time quanta.

Example: tasks running at priority level 60 will receive a time quantum of 800 ms, whereas tasks at priority level 95 will receive 5 ms.

runqueue

The runqueue is the list of runnable processes on a given processor. There is one runqueue per processor.

Linux Approach

When a task has exhausted its time slice, it is considered expired. It is not eligible for execution until all other tasks have exhausted their time quanta. In SMP, each processor maintains its own runqueue and schedules itself independently. Each runqueue contains two priority arrays indexed according to priority:

Active: contains all tasks with time remaining in their time slices.
Expired: contains all expired tasks.

The scheduler selects the task with the highest priority for execution on the CPU.
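A minimal C sketch of this per-processor layout, loosely modeled on the 2.6 O(1) scheduler; the type names, fields, and the simple run lists are illustrations, not the actual kernel definitions (the real kernel pairs each array with a priority bitmap so the lookup is constant time).

    #define MAX_PRIO 140                      /* priority levels 0..139                */

    struct task {                             /* stand-in for the kernel's task_struct */
        int prio;                             /* current (dynamic) priority            */
        struct task *next;                    /* simple run list for illustration      */
    };

    struct prio_array {
        int nr_active;                        /* number of runnable tasks              */
        struct task *queue[MAX_PRIO];         /* one FIFO list per priority level      */
    };

    struct runqueue {                         /* one per processor                     */
        struct prio_array *active;            /* tasks with time slice remaining       */
        struct prio_array *expired;           /* tasks whose time slice ran out        */
        struct prio_array arrays[2];          /* storage for the two arrays            */
    };

    /* Pick the runnable task with the highest (numerically lowest) priority. */
    struct task *pick_next(struct runqueue *rq)
    {
        for (int p = 0; p < MAX_PRIO; p++)
            if (rq->active->queue[p])
                return rq->active->queue[p];
        return 0;                             /* nothing runnable                      */
    }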

Linux Approach

When the active array becomes empty, the two priority arrays are exchanged: the expired array becomes the active array and vice versa.
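On the sketch above, this exchange is nothing more than a pointer swap, which is why it costs O(1):

    /* Swap the active and expired arrays once the active array is empty. */
    void swap_arrays(struct runqueue *rq)
    {
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }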

Tasks are assigned dynamic priorities.


Based on the nice value, plus or minus up to 5. Whether a value is added to or subtracted from a task's nice value depends on the interactivity of the task. For processes that are more interactive, the adjustment is closer to -5 (the scheduler favors such interactive tasks).

A task's dynamic priority is recalculated when the task has exhausted its time quantum.

Increase the priority of waiting processes.
Decrease the priority of processes that have been running for a long time.

Dynamic Priority

The process's dynamic priority is continuously recalculated, so as to:

Reward interactive threads.
Punish CPU-hogging threads.

The maximum priority bonus is -5. The maximum priority penalty is +5.


The scheduler maintains a sleep_avg variable associated with each task.

Whenever a task is awakened, this variable is incremented; whenever a task is preempted or its quantum expires, the variable is decremented by a corresponding amount.
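A simplified sketch of how such a bonus could be turned into a dynamic priority, in the spirit of the 2.6 O(1) scheduler; the constants and scaling below are illustrative choices, not the exact kernel formula.

    #define MAX_RT_PRIO   100                 /* first non-real-time priority level    */
    #define MAX_SLEEP_AVG 1000                /* illustrative cap on sleep_avg         */
    #define MAX_BONUS     10                  /* maps sleep_avg to a -5..+5 adjustment */

    /* Dynamic priority = static priority - bonus; the bonus grows with the task's
     * average sleep time, since interactive tasks spend most of their time sleeping. */
    int effective_prio(int static_prio, int sleep_avg)
    {
        int bonus = sleep_avg * MAX_BONUS / MAX_SLEEP_AVG - MAX_BONUS / 2;
        int prio  = static_prio - bonus;

        if (prio < MAX_RT_PRIO) prio = MAX_RT_PRIO;   /* clamp to the 100..139 range */
        if (prio > 139)         prio = 139;
        return prio;
    }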

Linux Approach

If a task blocks on an I/O event before its time slice expires, then after the I/O completes it is placed back on the original active array.

Its time slice is decremented to reflect the CPU time it has already consumed.

Reasons:

If a process was blocked waiting for keyboard input, it is clearly an interactive process.

waitqueue

Tasks which are not runnable and are waiting on various I/O operations or other kernel events are placed on another data structure - waitqueue.
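As a hedged sketch of the mechanism, kernel code typically sleeps on and wakes such a queue as shown below; the queue name, the data_ready flag, and the two functions are made up for illustration.

    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(demo_wq);   /* illustrative wait queue        */
    static int data_ready;                     /* condition the sleeper waits on */

    /* Consumer: block on the wait queue until data_ready becomes non-zero. */
    static int wait_for_data(void)
    {
        return wait_event_interruptible(demo_wq, data_ready != 0);
    }

    /* Producer: mark the data as ready and wake any tasks sleeping on the queue. */
    static void publish_data(void)
    {
        data_ready = 1;
        wake_up_interruptible(&demo_wq);
    }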

active and expired queues

Kernel Synchronization

Preemptive kernel: a process running in kernel mode can be replaced by another process while in the middle of a kernel function.

A request for kernel-mode execution can occur in two ways:

Explicitly, via a system call.
Implicitly, via a hardware interrupt.

Example of an implicit request: when a page fault occurs, or when a device controller delivers a hardware interrupt, the CPU starts executing a kernel-defined handler for that event.

Critical Section Problem

If one kernel task is in the middle of accessing some data structure when an interrupt service routine executes, then that service routine cannot access or modify the same data without risking data corruption. A framework is required that allows kernel tasks to run without violating the integrity of shared data. Prior to Version 2.6, Linux was a non-preemptive kernel.

Solution: Critical Section Problem

The Linux kernel provides:

Spin locks
Semaphores

On SMP machines, spin locks are used. On single-processor systems, kernel preemption is disabled and enabled instead.

Single processor: disable kernel preemption to lock; enable kernel preemption to unlock.
Multiple processors: acquire a spin lock to lock; release the spin lock to unlock.

Disabling and Enabling Kernel Preemption

Two kernel routines for disabling and enabling kernel preemption:

preempt_disable( )
preempt_enable( )

In addition, the kernel is not preemptible if a kernel-mode task is holding a lock. preempt_count: a counter indicating the number of locks held by the task.

The counter is incremented when a lock is acquired and decremented when a lock is released.

If preempt_count > 0, it is not safe to preempt the kernel; if preempt_count == 0, the kernel can safely be preempted.
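A hedged sketch of how kernel code brackets a short preemption-sensitive region with these routines; the per-CPU counter touched inside is purely illustrative.

    #include <linux/preempt.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(int, demo_counter);   /* illustrative per-CPU variable           */

    static void bump_local_counter(void)
    {
        preempt_disable();                      /* preempt_count++: no preemption now      */
        __this_cpu_inc(demo_counter);           /* safe: the task cannot migrate CPUs here */
        preempt_enable();                       /* preempt_count--: preemptible again at 0 */
    }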

Spin Locks

Spin locks are used in the kernel only when a lock is held for short durations. When locks are held for longer durations, semaphores are used instead. Spin locks are declared in <linux/spinlock.h>.

Spin Lock Functions


Function: Description

spin_lock_init(): set the spin lock to 1 (unlocked)
spin_lock(): cycle until the spin lock becomes 1, then set it to 0 (locked)
spin_unlock(): set the spin lock to 1 (unlocked)
spin_unlock_wait(): wait until the spin lock becomes 1
spin_is_locked(): return 0 if the spin lock is set to 1
spin_trylock(): set the spin lock to 0 (locked), and return 1 if the lock is obtained
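A minimal sketch of typical usage inside kernel code; the lock name and the counter it protects are made up for illustration.

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(demo_lock);   /* illustrative lock protecting demo_count      */
    static int demo_count;

    static void demo_increment(void)
    {
        spin_lock(&demo_lock);           /* spin until the lock is acquired              */
        demo_count++;                    /* short critical section, no sleeping allowed  */
        spin_unlock(&demo_lock);         /* release the lock                             */
    }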

System calls related to scheduling


System call: Description

nice( ): change the priority of a conventional process.
getpriority( ): get the maximum priority of a group of conventional processes.
setpriority( ): set the priority of a group of conventional processes.
sched_yield( ): relinquish the processor voluntarily without blocking.
sched_get_priority_min( ): get the minimum priority value for a policy.
sched_get_priority_max( ): get the maximum priority value for a policy.
sched_rr_get_interval( ): get the time quantum value for the Round Robin policy.
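A hedged user-space illustration of a few of these calls; error handling is kept minimal and the nice increment of 5 is an arbitrary example value.

    #include <errno.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Raise this process's nice value by 5 (i.e., lower its priority). */
        errno = 0;
        int new_nice = nice(5);
        if (new_nice == -1 && errno != 0)
            perror("nice");

        /* Query the priority range for the SCHED_RR real-time policy. */
        printf("SCHED_RR priorities: %d..%d\n",
               sched_get_priority_min(SCHED_RR),
               sched_get_priority_max(SCHED_RR));

        /* Voluntarily give up the CPU without blocking. */
        sched_yield();
        return 0;
    }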

