LECTURE 0: INTRODUCTION
Shankar Balachandran, IIT Bombay (shankarb@ee.iitb.ac.in)

EE 717/453
- Prerequisite: knowledge of programming (any language: C/C++/Java, ...)
- Attendance: highly recommended
- Submission of assignments: firm deadlines
- Timing: Mon and Thu (Slot 13), 6:30 pm to 8:00 pm
- Office hours: TBA
- Email: shankarb@ee.iitb.ac.in

Course Outline
- Data Structures and Algorithms (about 10-12 classes)
  - Basic data structures (stack, queue, linked list, tree, graph, hashing)
  - Basic algorithm design
- Computer Architecture and Compiler Design (about 2-4 lectures)
  - Basic architectures and compiler design phases
- Operating Systems (about 4-6 lectures)
  - Basics of operating systems
  - Support for multiple programs and multithreaded programs
- Parallel Programming (about 6-8 lectures)
  - Basics of parallel programming
  - Cilk/OpenMP/MPI/CUDA

Course Outline (continued)
- Miscellaneous (about 1-2 lectures)
- Summary

Books
- Data Structures, Algorithms, and Applications in C++, Sartaj Sahni
- Introduction to Algorithms, Cormen et al.
- Modern Operating Systems, Tanenbaum
- Introduction to Parallel Programming, Thomas Rauber

Evaluation
- Mid-semester examination (15%): date TBD
- Final examination (35%): date as per the institute norms
- Assignments (40%): periodic programming assignments with strict submission deadlines
- Continuous evaluation (10%): once every two weeks, at the beginning of Monday's class; best M of N

Grades (absolute grading)
- AA: > 94
- AB: 85-94
- BB: 75-84
- BC: 65-74
- CC: 55-64
- CD: 45-54
- DD: 40-44
- FR: < 40
The instructor reserves the right to change the grading policy.

Problem Solving: Main Steps
1. Problem definition
2. Algorithm design / algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]

Algorithms
- An algorithm is a well-defined computational procedure that takes a set of values as input and produces a set of values as output.
- An algorithm can also be thought of as a tool for solving a computational problem that specifies, in general terms, the desired input/output relationship.

Example: Sorting Problem
- Input: a sequence of n numbers <a_1, ..., a_n>.
- Output: a reordering <a_1', ..., a_n'> of the input sequence such that a_1' <= a_2' <= ... <= a_n'.
- Is the specification complete? What else would you have to add to make it complete?

Algorithms and Data Structures: Definitions
- An algorithm is correct if, for every input instance, it finishes with the correct output.
- We say that a correct algorithm solves the given computational problem.
- A data structure is a way to store and organize data in order to facilitate access and modifications.
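
The sorting specification above asks for two things: the output is in nondecreasing order, and it is a reordering of the input. A minimal sketch of a checker for those two conditions follows. The name `satisfies_sort_spec` and the choice of `int` elements are assumptions made for illustration, not part of the lecture.

```cpp
#include <algorithm>
#include <cassert>  // used only by the usage checks mentioned below
#include <vector>

// Hypothetical helper: does `output` meet the sorting specification for `input`?
bool satisfies_sort_spec(const std::vector<int>& input,
                         const std::vector<int>& output) {
    // Condition 1: a_1' <= a_2' <= ... <= a_n'
    if (!std::is_sorted(output.begin(), output.end())) {
        return false;
    }
    // Condition 2: output is a reordering of input.
    // A sorted output is a reordering exactly when it equals the sorted input.
    std::vector<int> expected(input);
    std::sort(expected.begin(), expected.end());
    return expected == output;
}
```

For instance, `satisfies_sort_spec({3, 1, 2}, {1, 2, 3})` holds, while `{1, 2, 4}` fails the reordering condition even though it is sorted.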

Running a Program on a Processor

Processor performance = Time / Program

\[ \frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}} \]

- Instructions/Program (code size): programmer + compiler designer
- Cycles/Instruction (CPI): processor designer
- Time/Cycle (cycle time): chip designer

Problem Definition
- What is the task to be accomplished?
  - Calculate the average of the grades for a given student
  - Understand the speeches given by politicians and translate them into Chinese
- What are the time / space / speed performance requirements?

Algorithm Design/Specification
- Algorithm: a finite set of instructions that, if followed, accomplishes a particular task
- Describe it in natural language, pseudocode, diagrams, etc.
- Criteria to follow:
  - Input: zero or more quantities (externally produced)
  - Output: one or more quantities
  - Definiteness: clarity and precision of each instruction
  - Finiteness: the algorithm has to stop after a finite (possibly very large) number of steps
  - Effectiveness: each instruction has to be basic enough and feasible
- Example: understand speech -> translate to Chinese

Implementation, Testing, and Maintenance
- Implementation
  - Decide on the programming language to use: C, C++, Lisp, Java, Perl, Prolog, Assembly, etc.
  - Write clean, well-documented code
- Test, test, test
- Maintenance
  - Integrate feedback from users, fix bugs, ensure compatibility across different versions

Algorithm Analysis
- Space complexity: how much space is required
- Time complexity: how much time it takes to run the algorithm
- Often, we deal with estimates!

Space Complexity (1/3)
- Space complexity = the amount of memory required by an algorithm to run to completion
- Some algorithms may be more efficient if the data is completely loaded into memory
  - Need to look also at system limitations
  - E.g. classifying 10 GB of text into various categories [politics, tourism, sport, natural disasters, etc.]: can I afford to load the entire collection?

Space Complexity (2/3)
- Fixed part: the space required to store certain data/variables, independent of the size of the problem
  - e.g. the name of the data collection
  - same size for classifying 2 GB or 1 MB of text
- Variable part: space needed by variables whose size depends on the size of the problem
  - e.g. the actual text
  - loading 2 GB of text vs. loading 1 MB of text

Space Complexity (3/3)
S(P) = c + S(instance characteristics)
Example:

    float sum(float* a, int n) {
        float s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }

- Space? One word for n, one for a [passed by reference], one for i, and one for s
- Constant space
- Input: requires linear space

Time Complexity
- Often more important than space complexity
  - The space available (for computer programs!) tends to get larger and larger
  - Time is still a problem for all of us
- 3-4 GHz multi-core processors are on the market, yet researchers estimate that computing the various transformations for a single DNA chain for a single protein on a 10 GHz computer would take about 1 year to run to completion
- The running time of algorithms is an important issue

Running Time
Problem: prefix averages
- Given an array X, compute the array A such that A[i] is the average of the elements X[0] ... X[i], for i = 0..n-1
- Solution 1: at each step i, compute the element A[i] by traversing the array X from X[0] to X[i], determining the sum of these elements and then their average
- Solution 2: at each step i, update a variable sum so that it holds the sum of the first i+1 elements, and compute the element A[i] as sum/(i+1)
- Big question: which solution to choose? Easy for this problem. Not always so.

Running Time
- Suppose the program includes an if-then statement that may or may not execute: the running time is then variable
- Typically, algorithms are measured by their worst case
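
The two prefix-average solutions described earlier can be sketched as follows. This is only an illustration under assumptions: the function names and the use of `std::vector<double>` are hypothetical choices, not the lecture's own code. The first version re-sums X[0..i] for every i, so it performs roughly n^2/2 additions. The second keeps a running sum and performs n additions.

```cpp
#include <cassert>  // used only by the usage checks mentioned below
#include <cstddef>
#include <vector>

// Solution 1 (hypothetical name): recompute the sum of X[0..i] for every i.
std::vector<double> prefix_averages_quadratic(const std::vector<double>& x) {
    std::vector<double> a(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j <= i; ++j) {
            sum += x[j];  // inner loop runs i+1 times
        }
        a[i] = sum / static_cast<double>(i + 1);
    }
    return a;
}

// Solution 2 (hypothetical name): keep a running sum across iterations.
std::vector<double> prefix_averages_linear(const std::vector<double>& x) {
    std::vector<double> a(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];  // sum now holds X[0] + ... + X[i]
        a[i] = sum / static_cast<double>(i + 1);
    }
    return a;
}
```

Both functions return the same array for the same input; for X = {1, 2, 3, 4} each yields {1, 1.5, 2, 2.5}. Only the amount of work differs.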

[Figure: running time for inputs A through G, with the worst case, best case, and average case marked]

Experimental Approach
- Write a program that implements the algorithm
- Run the program with data sets of varying size
- Determine the actual running time using a system call to measure time (e.g. system(date))
- Problems?

Experimental Approach (continued)
- It is necessary to implement and test the algorithm in order to determine its running time
- Experiments can be done only on a limited set of inputs, and may not be indicative of the running time on other inputs
- The same hardware and software should be used in order to compare two algorithms, a condition that is very hard to achieve!

Theoretical Approach
- Based on a high-level description of the algorithms, rather than a language-dependent implementation
- Makes possible an evaluation of the algorithms that is independent of the hardware and software environment
- Generality

Algorithm Description
- Usually done with pseudocode
- Example: find the maximum element of an array

Algorithm ArrayMax(A, n):
    Input: An array A storing n integers
    Output: The maximum element in A
    currentMax ← A[0]
    for i ← 1 to n-1 do
        if currentMax < A[i] then
            currentMax ← A[i]
    return currentMax
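
For comparison, here is one possible C++ rendering of the ArrayMax pseudocode. The name `array_max` and the use of `std::vector<int>` in place of the pair (A, n) are assumptions for illustration. The sketch also assumes a non-empty array, which the pseudocode requires implicitly by reading A[0].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++ translation of ArrayMax; requires at least one element.
int array_max(const std::vector<int>& a) {
    assert(!a.empty());                           // A[0] must exist
    int current_max = a[0];                       // currentMax <- A[0]
    for (std::size_t i = 1; i < a.size(); ++i) {  // for i <- 1 to n-1
        if (current_max < a[i]) {
            current_max = a[i];                   // currentMax <- A[i]
        }
    }
    return current_max;
}
```

The loop body runs n-1 times regardless of the input. Only the number of assignments to `current_max` depends on the data.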

Pseudocode
- Expressions:
  - use standard mathematical symbols
  - use ← for assignment
  - use = for the equality relationship
- Programming constructs:
  - Decision structures: if ... then ... [else ...]
  - While-loops: while ... do
  - Repeat-loops: repeat ... until ...
  - For-loops: for ... do
  - Array indexing: A[i]
  - Method declarations: Algorithm name(parameters)
  - Method calls: object.method(arguments)
  - Method returns: return value
- Use comments
- Instructions have to be basic enough and feasible

Low-Level Algorithm Analysis
- Based on primitive operations (low-level computations independent of the programming language)
- For example:
  - Making an addition = 1 operation
  - Calling a method or returning from a method = 1 operation
  - Indexing into an array = 1 operation
  - Comparison = 1 operation
  - etc.
- Method: inspect the pseudocode and count the number of primitive operations executed by the algorithm

How Many Operations?

Algorithm ArrayMax(A, n):
    Input: An array A storing n integers
    Output: The maximum element in A
    currentMax ← A[0]
    for i ← 1 to n-1 do
        if currentMax < A[i] then
            currentMax ← A[i]
    return currentMax

Asymptotic Notation
- Need to abstract further
- Give an idea of how the algorithm performs
  - n steps vs. n+500 steps
  - n steps vs. n^2 steps

Compare the two growth rates

    N      N+500    N^2
    1      501      1
    10     510      100
    20     520      400
    25     525      625
    30     530      900
    50     550      2500
    100    600      10000

Homework
Compare the growth rates of the following functions: N+250 and N^1.35
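
In the growth-rate comparison above, N^2 is smaller than N+500 at N = 20 and larger at N = 25. A small sketch can locate the exact crossover point. The function name `first_n_where_square_wins` is a hypothetical helper introduced here for illustration, and the linear search is chosen for clarity, not efficiency.

```cpp
#include <cassert>

// Hypothetical helper: smallest n >= 1 for which n^2 exceeds n + offset.
long first_n_where_square_wins(long offset) {
    assert(offset >= 0);
    long n = 1;
    while (n * n <= n + offset) {
        ++n;  // n^2 has not yet overtaken n + offset
    }
    return n;
}
```

With an offset of 500 the function returns 23, since 22^2 = 484 is below 522 while 23^2 = 529 is above 523. Beyond that point the quadratic term dominates however large the constant offset is, which is the idea asymptotic notation captures.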