
csc2404 Operating Systems

Faculty of Sciences

Laboratory Book

Written by Dr. Peiyi Tang, Dr. Jiuyong Li, Dr. Richard Watson & Dr. Leigh Brookshaw The University of Southern Queensland

© The University of Southern Queensland, June 6, 2011. Distributed by The University of Southern Queensland, Toowoomba, Queensland 4350, Australia. http://www.usq.edu.au

Copyrighted materials reproduced herein are used under the provisions of the Copyright Act 1968 as amended, or as a result of application to the copyright owner. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without prior permission.
Produced with LaTeX by the author(s) using the Department of Mathematics & Computing (http://www.sci.usq.edu.au) StudyBook class. Adapted from the refrep class (part of the refman v2.0e package by Kielhorn & Partl) to implement Wendy Priestly's Instructional typographies using desktop publishing techniques to produce effective learning and training materials, http://www.ascilite.org.au/ajet/ajet7/priestly.html.

Table of Contents

Preface
1 Installation of Nachos System
    1.1 Installation of Nachos
    1.2 Testing Nachos
    1.3 The C++ Programming Language and the gdb Debugger
    1.4 Navigating the Nachos system using editor tags
        1.4.1 Vi and ctags
        1.4.2 Emacs and etags
    1.5 Clean Up
    1.6 Things to do
2 Makefiles of Nachos
    2.1 Makefiles Structure of Nachos
        2.1.1 Makefile
        2.1.2 Makefile.local
        2.1.3 Makefile.dep
        2.1.4 Makefile.common
    2.2 Building a Modified Nachos in Another Directory
    2.3 Things to Do
3 Synchronization Using Semaphores
    3.1 Background
        3.1.1 Semaphores
        3.1.2 The Producer/Consumer Problem
        3.1.3 Nachos main Program
    3.2 Things to Do
        3.2.1 Tasks
4 Nachos File System
    4.1 Nachos File System summary
        4.1.1 File Header
    4.2 Compiling the Nachos file system
    4.3 Usage of Nachos File System Commands
    4.4 Test Files
        4.4.1 UNIX command od
    4.5 Things to Do
        4.5.1 Compiling Nachos File System
        4.5.2 Testing Nachos File System
    4.6 Questions
5 Extendable Files
    5.1 Introduction
    5.2 Nachos file system runtime organisation
    5.3 Implementation
        5.3.1 Getting started in the lab5 directory
        5.3.2 Modifications to Nachos
    5.4 Testing the New File System
A Unix essentials
    A.1 Command line interface
    A.2 Files and directories
    A.3 Processes
    A.4 Commands
        A.4.1 man pages
        A.4.2 Directories
        A.4.3 Files
        A.4.4 Miscellaneous
        A.4.5 Using the shell
B GDB Essential Commands
    B.1 Before you start
    B.2 Typing gdb commands
    B.3 Starting and stopping gdb
    B.4 Breakpoints
    B.5 Continuing and Stepping
    B.6 Displaying source and expressions
C A Quick Introduction to C++
    C.1 Introduction
    C.2 C in C++
    C.3 Basic Concepts
        C.3.1 Classes
        C.3.2 Other Basic C++ Features
    C.4 Advanced Concepts in C++: Dangerous but Occasionally Useful
        C.4.1 Inheritance
        C.4.2 Templates
    C.5 Features To Avoid Like the Plague
    C.6 Style Guidelines
    C.7 Compiling and Debugging
    C.8 Example: A Stack of Integers
    C.9 Epilogue
    C.10 Further Reading

List of Figures

4.1 Nachos File System
4.2 Nachos File Header (128 bytes)
5.1 Extension of a file
5.2 File system data structures
5.3 Structure of Nachos File System

Preface

This is the Laboratory Book for the course CSC2404 Operating Systems. It contains a number of laboratory exercises that should be completed sequentially through the semester. The exercises are all based on use of the Nachos operating system. You must complete these exercises in order to gain sufficient understanding of the system to undertake the programming assignments. In the appendix there is also reference material about programming in C++, the Unix operating system and debugging Unix programs. More material about programming in C++ is also available on the course web site.


Laboratory 1

Installation of Nachos System

The purpose of this laboratory session is to enable you to:

- install and compile the Nachos system,
- understand the structure of Nachos, and
- familiarise yourself with the C++ programming language.

We assume that you have already installed Linux. If you have not done so, please install the Department's version of Debian/GNU Linux as soon as possible (see the Introductory Book for this course).

Laboratory contents

1.1 Installation of Nachos
1.2 Testing Nachos
1.3 The C++ Programming Language and the gdb Debugger
1.4 Navigating the Nachos system using editor tags
    1.4.1 Vi and ctags
    1.4.2 Emacs and etags
1.5 Clean Up
1.6 Things to do

1.1 Installation of Nachos

The simplest way to install Nachos is to use the command tar directly from a terminal window. First you must access the Nachos distribution: download the archive file from the web site.

Follow these steps to install Nachos:

1. cd
2. mkdir CSC2404
3. cd CSC2404
4. download the distribution file here.
5. tar -xzvf nachos-3.4.tgz
6. rm nachos-3.4.tgz
   You may wish to keep the file after you have unpacked it; if so then don't do this step.

After these steps, you should have a directory named CSC2404 in your home directory. Directory CSC2404 should contain a directory named nachos-3.4, which is the Nachos package.

Installation of Nachos on a 64-bit Operating System

If your Linux distribution is the 64-bit version you will need to install a number of extra packages. Nachos is a 32-bit operating system and will have to be run in 32-bit emulation mode on a 64-bit operating system. The 32-bit versions of some of the run-time libraries need to be installed, plus a version of the compiler that is aware of them. Under Debian/GNU or Ubuntu the packages to install are:

g++-4.3-multilib
ia32-libs
libc6-dev-i386

The Nachos Makefiles should automatically set the appropriate compiler flags for a 64-bit operating system, so only the installation of the above packages should be necessary.

MacOS X

Currently Nachos cannot be compiled on MacOS X. Part of the Nachos compilation procedure requires the compilation of assembly language code. Nachos is a 32-bit operating system and the assembler used must be able to produce 32-bit code. The default MacOS assembler does not appear to be able to create 32-bit code! (If anyone knows differently please contact the examiner.) Mac users will have to either install Linux (as a virtual machine or dual boot) or remotely use the Linux machine decius.sci.usq.edu.au. If you have any questions please contact the examiner.
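For the 64-bit case described above, the following is a minimal sketch of how you might check which situation applies before running make. It assumes that `uname -m` prints `x86_64` on a 64-bit Linux system, the same test that Makefile.dep uses; the function name `needs_multilib` is hypothetical and not part of Nachos.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nachos): decide whether the extra
# 32-bit packages listed above are likely to be needed.
needs_multilib() {
    # $1 is a machine name as printed by `uname -m`
    case "$1" in
        x86_64) return 0 ;;   # 64-bit Linux: 32-bit libraries required
        *)      return 1 ;;   # anything else: assume a native 32-bit build
    esac
}

machine=$(uname -m)
if needs_multilib "$machine"; then
    echo "$machine: install the 32-bit packages before running make"
else
    echo "$machine: no extra packages should be needed"
fi
```

The script only reports what it finds; installing the packages is still done by hand with your package manager.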

1.2 Testing Nachos

Once you have installed Nachos in your home directory, follow the steps below to test it:

1. Move to directory ~/CSC2404/nachos-3.4/ and check its subdirectories:

   c++example: This subdirectory contains examples of simple C++ programs and a short paper on the programming language C++ written by Prof. Tom Anderson. This article is also available on the course web site and in Appendix C. The purpose of these examples and the article is to provide a quick introduction to C++ for programming with Nachos.

   code: This subdirectory contains the Nachos source code.

   doc: This directory is empty at the moment.

2. Move into subdirectory code and you will see the following files and directories in it:
Makefile.common Makefile.dep ass2/ ass4/ bin/ filesys/ lab2/ lab3/ lab5/ machine/ network/ test/ threads/ userprog/ vm/

Here lab2/, lab3/, lab5/, and ass2/ and ass4/ are your working directories for laboratory sessions and assignments, respectively. The remaining directories are the original Nachos directories. An executable version of Nachos can be created in any of these directories. Depending upon the definitions in the Makefiles that reside in those directories, the Nachos executable will behave differently. For instance, making Nachos in threads/ builds a minimal Nachos, while making Nachos in filesys/ builds a Nachos that has support for a file system. The build process is described in detail in the following chapter.

3. Move into directory threads/ and execute command make. You should see that the Nachos system is compiling. The last couple of lines of the output on the screen should be

....
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o .............
........................................................
........................................................
ln -sf arch/unknown-i386-linux/bin/nachos nachos

If you get this far, you have successfully compiled the smallest core of the Nachos system. You should also see the symbolic link nachos in the current directory, linked to arch/unknown-i386-linux/bin/nachos.

4. Now you can test your Nachos system by executing the command nachos in the current directory. Note: it will be necessary to issue this command in the form ./nachos if your $PATH environment variable does not include the current directory (.). The output on the screen should be as follows:

*** thread 0 looped 0 times
*** thread 1 looped 0 times
*** thread 0 looped 1 times
*** thread 1 looped 1 times
*** thread 0 looped 2 times
*** thread 1 looped 2 times
*** thread 0 looped 3 times
*** thread 1 looped 3 times
*** thread 0 looped 4 times
*** thread 1 looped 4 times
No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!

Ticks: total 130, idle 0, system 130, user 0
Disk I/O: reads 0, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0

Cleaning up...

1.3 The C++ Programming Language and the gdb Debugger

The Nachos source code is written in C++. This course does not require prior knowledge of C++. C++ is a complicated object-oriented language. We only use the part of C++ related to abstract data types and encapsulation. We do not use C++ inheritance in Nachos. Dr. Tom Anderson wrote an article giving a quick introduction to C++ for the purpose of using Nachos. We have included this article as an appendix to this laboratory book. If you are not familiar with C++, read this article first.

There are three C++ example programs in the c++example directory discussed in this article. You only need to study program stack.cc. Read the code of stack.cc and make sure that you understand it.

If you are not familiar with the GNU debugger gdb, you may want to learn how to use it. You can compile stack.cc with the command make stack and run gdb on the executable stack. You may need to use gdb when you debug your programs in this course.

Note

The debugger is a powerful tool that will aid you in finding errors in your code. Please do not ignore it. Nachos is not a simple code; it is a complex multi-threaded operating system, and simple print statements will not be adequate when attempting to debug your modifications and additions to Nachos.

Let us use the C++ program in directory c++example/ to show the steps to trace programs using gdb. In the c++example directory, compile the stack program by typing make stack. Run gdb to debug this program by typing the command

gdb stack

The debugger will respond with an introductory message followed by the standard debugger prompt:

(gdb)

At the prompt above, type the gdb command list (or l). The first 10 lines of the main program will be shown in the buffer as follows:
(gdb) list
125 }
126
127 //------------------------------------------------
128 // main
129 // Run the test code for the stack implementation.
130 //------------------------------------------------
131
132 int
133 main() {
134 Stack *stack = new Stack(10); // Constructor...
(gdb)

As you can see, the first statement of the main program is at line 134. Set a break point at line 134 by typing the gdb command break 134 and you will see:

(gdb) break 134
Breakpoint 1 at 0x8048a73: file stack.cc, line 134.
(gdb)

Then type the gdb command run and control will stop at the break point you just set. You will see:

(gdb) run
Starting program: /CSC2404/nachos-3.4/c++example/stack

Breakpoint 1, main () at stack.cc:134
134 Stack *stack = new Stack(10); // Constructor...
(gdb)

Type the gdb command next (or just n or RETURN) and you will see the next statement displayed. The next command steps to the statement following the current one. If the current statement contains a function or method call, the debugger will not step through the statements of the function. In order to step into the method call stack->SelfTest(), type the gdb command step (this can be abbreviated to just s).

You can display the value of any variable by using the gdb command print. For instance, within stack->SelfTest() the command could be used like this:

(gdb) print count
$1 = 17

Try this command to watch the values of any variables that you think can demonstrate that the program is running correctly.

This has been a very brief introduction to using gdb with Nachos. For more on gdb see Appendix B, which contains a brief description of the most useful commands and how to use them.
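If you want to repeat the first part of this session without retyping it, the commands can be placed in a gdb command file. The following is only a sketch: the file name `stack.gdb` is our own choice, and the script assumes it is run in the c++example directory after make stack. It uses gdb's `-batch` and `-x` options to run the commands non-interactively.

```shell
#!/bin/sh
# Write the first commands from the walkthrough to a gdb command file.
# The file name stack.gdb is an assumption made for this sketch.
cat > stack.gdb <<'EOF'
break 134
run
next
EOF

# Run the commands non-interactively, but only if gdb and the compiled
# stack program are both available in the current directory.
if command -v gdb >/dev/null 2>&1 && [ -x ./stack ]; then
    gdb -batch -x stack.gdb ./stack
else
    echo "gdb or ./stack not found; build with 'make stack' first"
fi
```

Interactive use remains the better way to learn the debugger; a command file is mainly useful for repeating a setup of breakpoints.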

1.4 Navigating the Nachos system using editor tags

Nachos is a large piece of software, spread over many files. Reading the code online is much easier if one can quickly switch, for instance, from looking at a line containing a function or method call to viewing the definition of that method. Integrated development environments (IDEs) often provide this feature. In Unix systems, the use of tag-enabled editors and tag-producing software provides precisely the same functionality.

Editors such as vi (and its derivatives, e.g. Elvis, Vim, Vile, Lemmy) and emacs support navigation using tags. The editor reads a tag file, which is an index file that associates names (tags) with locations in source files. The editor provides commands that, given a tag, will move the cursor position to the target of that tag. This works within a single file, or a set of files; the files can be spread over a number of directories. This obviously is very useful when you have to deal with a large software artifact such as Nachos.

The following sections provide just enough information to get you started using tags. More help is available via man pages (ctags) and the application help within vi and emacs. We will describe two flavours of tag usage, with either the vi or emacs editors. Note that the vi variant described is actually the vim editor.

A number of tag generation programs exist. They are normally called ctags (to produce tables for vi-like editors) and etags (to produce tables for emacs editors). (Often this is the same executable program, but it produces tables of different format depending on its name or command line arguments.) The Exuberant Ctags system is a widely used system, and is present in the Department of Mathematics and Computing's Debian distribution. Source distributions are easily found on the web. It is not normally installed by default, so you may need to install it using the aptitude application. To check if you have Exuberant Ctags, type the command

ctags --help

You should see a response which starts like this:
Exuberant Ctags 5.7, Copyright (C) 1996-2007 Darren Hiebert

It is important to check this because, if you have installed emacs, a version of GNU Ctags will have been installed. This GNU version is not recommended; it simply does not have all the features that we require.

1.4.1 Vi and ctags

The following is a brief demonstration of using tags in vi. It only demonstrates the basic (and most important) commands. However, these will be sufficient to make life a lot easier for you in navigating the Nachos system. The vim help system provides full documentation on all tags related commands. In the following, actual commands will appear in typewriter format.

- cd to the code directory of Nachos
- Create the tags file: ctags -R
  The -R option specifies a recursive search of files. Note that fancier invocations are possible (see man ctags), but this is sufficient. This will create the file code/tags, in a format that vi can use.
- vi threads/main.cc
- You can view help on tags by typing :help tags. This opens a new window; type CTRL-w <uparrow> and CTRL-w <downarrow> to switch between windows, and CTRL-w q to close the current window.
- Move to line 104 (:104) and position the cursor in the function name StartProcess. Press CTRL-]. This should cause the window to now display the file userprog/progtest.cc, positioned at line 24, the definition of StartProcess.
- Type the CTRL-T key; this will return the editor to its previous position.
- You can chain tag movement commands. The editor maintains a tag stack to record where you have come from so that after a sequence of CTRL-] commands a sequence of CTRL-T commands will return to the original position. To see the tag stack type the command :tags. See also the vim help on commands to jump to locations on the tag stack. For instance count CTRL-T will step back count (a number) of steps.
- Move the cursor into Print on line 135 of the file main.cc. Jump using CTRL-]. The new position should now be line 185 of filesys/directory.cc.
- Move the cursor into Print on line 193 (the call is hdr->Print()) and try CTRL-]. You should see the message tag 1 of 9 or more (or something like it) and the display will position to the current definition of Print. Type :ts Print to see all the possible destinations. Note that the current definition is the first entry. The :ts command allows you to move to any one of the listed tag entries. Try going to one (e.g. type 5).
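The version check described before Section 1.4.1 can also be scripted. The sketch below assumes only what is stated above, namely that the output of `ctags --help` starts with the words "Exuberant Ctags" when the right program is installed; the function name `is_exuberant` is hypothetical.

```shell
#!/bin/sh
# Succeed if the given `ctags --help` output starts with the Exuberant banner.
is_exuberant() {
    case "$1" in
        "Exuberant Ctags"*) return 0 ;;
        *)                  return 1 ;;
    esac
}

# Capture the help text; leave it empty if ctags is missing or rejects --help.
banner=""
if command -v ctags >/dev/null 2>&1; then
    banner=$(ctags --help 2>/dev/null) || banner=""
fi

if is_exuberant "$banner"; then
    echo "Exuberant Ctags found: run 'ctags -R' in the code directory"
else
    echo "Exuberant Ctags not found: install it before building the tags file"
fi
```

Other ctags implementations print a different banner, so the script treats anything else as "not found" rather than guessing.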

1.4.2 Emacs and etags

The following is a brief demonstration of using tags in emacs. It follows the example of the previous section on using vi tags. It only demonstrates the basic (and most important) commands. However, these will be sufficient to make life a lot easier for you in navigating the Nachos system. The emacs help system provides full documentation on all tags related commands. In the following, actual commands will appear in typewriter format.

- cd to the code directory of Nachos
- Create the tags file: etags -R
  The -R option specifies a recursive search of files. Fancier invocations are possible (see man ctags), but this is sufficient. This will create the file code/TAGS, in a format that emacs can use.
- emacs threads/main.cc
- You can view help on tags by choosing the Help menu and then selecting Read the Emacs Manual. This opens a new window; use the Buffers menu to switch windows. Search (CTRL-S) for the keyword tags, select this item and then on the Tags Tables menu select the Find Tag item.
- Move to line 104 and select (highlight) the StartProcess function name. Press ALT-. to go to the definition of StartProcess. (You will have to confirm this selection.) The first time you use ALT-. you will see a question in the bottom window like:
Visit tags table: (default TAGS) /nachos-3.4/code/threads/

  Emacs is asking for the location of the TAGS file. It is not in code/threads, but in code, so remove the trailing threads/ and press enter to continue. A window will be opened and the file userprog/progtest.cc will be displayed, positioned at line 24, the definition of StartProcess.
- Type the ALT-* key; this will return the editor to its previous position.
- You can chain tag movement commands. The editor maintains a tag stack to record where you have come from so that after a sequence of ALT-. commands a sequence of ALT-* commands will return to the original position.

1.5 Clean Up

You need to clean up after you finish with Nachos in each directory. Simply execute make clean and all the object files, dependency files and the binary file nachos, as well as the symbolic link nachos in the current directory, will be deleted. You can tell whether the directory has been cleaned by checking whether the symbolic link nachos still exists. You can also check the directory size (the total size of all files in the directory) using the Unix command du -sh before and after performing the make clean command. This will show how much file space was recovered by the clean command.
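The before-and-after comparison can be automated. The following is a minimal sketch under a few assumptions: the function names `dir_kb` and `report_clean` are ours, and `du -sk` (sizes in kilobytes) is used in place of `du -sh` so that the two figures can be subtracted.

```shell
#!/bin/sh
# Print the total size of a directory in kilobytes (first field of `du -sk`).
dir_kb() {
    du -sk "$1" | cut -f1
}

# Run `make clean` in a Nachos build directory and report the space recovered.
report_clean() {
    # $1: a Nachos build directory such as threads/
    before=$(dir_kb "$1")
    (cd "$1" && make clean)
    after=$(dir_kb "$1")
    echo "recovered $((before - after)) KB in $1"
    # The symbolic link should be gone once the directory is clean.
    if [ -L "$1/nachos" ]; then
        echo "warning: symbolic link nachos is still present"
    fi
}
```

After sourcing these definitions you could call, for example, `report_clean ~/CSC2404/nachos-3.4/code/threads`.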

1.6 Things to do

We summarise the things to do in this lab session as follows:

- install the Linux operating system
- install the Nachos system (see Section 1.1)
- compile and test the Nachos system installed (see Section 1.2)
- exercise with gdb (see Section 1.3)
- navigate with tags (see Section 1.4)

Laboratory 2

Makefiles of Nachos

The purpose of this laboratory is to understand the makefile structure of the Nachos system, and to know how to set up a separate directory to develop a new version of the Nachos system.

Laboratory contents

2.1 Makefiles Structure of Nachos
    2.1.1 Makefile
    2.1.2 Makefile.local
    2.1.3 Makefile.dep
    2.1.4 Makefile.common
2.2 Building a Modified Nachos in Another Directory
2.3 Things to Do

2.1 Makefiles Structure of Nachos

As we mentioned in Lab 1, the Nachos system can be compiled in a number of Nachos directories, ../threads/, ../filesys/, etc. In each of these directories, there are two makefiles, Makefile and Makefile.local. In the parent directory ../code/, there are additional makefiles, Makefile.common and Makefile.dep, which are shared and invoked by the makefiles in all the Nachos directories. Thus, the structure of the makefiles is as follows:

../code/Makefile.common
       /Makefile.dep
       |
       |
       /threads/Makefile
               /Makefile.local
       |
       |
       /filesys/Makefile
               /Makefile.local
       |
       |
       ..

2.1.1 Makefile

This is the makefile used by the make program when you build a Nachos in any Nachos directory by typing the command make or make all. Examining this file reveals that it mainly includes two other makefiles:

include Makefile.local
include ../Makefile.common

2.1.2 Makefile.local

This makefile in each Nachos directory defines a few important macros:

CCFILES: the string specifying all the C++ files used to build the Nachos in this directory.

INCPATH: the string defining the include path that g++ searches for header files (.h files) named in the C++ programs.

DEFINES: the string for labels to be passed to g++.

Note that the assignment operator for INCPATH and DEFINES is +=, which means that the right-hand side string is appended to the original contents of INCPATH and DEFINES.

2.1.3 Makefile.dep

This is the makefile included by Makefile.common. It defines a lot of system-dependent macros used by g++. The current Nachos distribution can be compiled on four different UNIX systems, and all the object codes and binary executables, as well as the dependence files, are placed in a system-specific directory under the directory arch in the Nachos directory, as shown below:
threads>> pwd
/home/leighb/CSC2404/nachos-3.4/code/threads
threads>> ls arch
dec-alpha-osf/ sun-sparc-sunos/ dec-mips-ultrix/ unknown-i386-linux/

The system-dependent macros defined by Makefile.dep include: HOST, arch, CPP, CPPFLAGS, GCCDIR, LDFLAGS and ASFLAGS. The definitions for Linux systems are:

# 386, 386BSD Unix, or NetBSD Unix (available via anon ftp
# from agate.berkeley.edu)
ifeq ($(uname),Linux)
HOST_LINUX=-linux
HOST = -DHOST_i386 -DHOST_LINUX
CPP=/lib/cpp
CPPFLAGS = $(INCDIR) -D HOST_i386 -D HOST_LINUX
arch = unknown-i386-linux
ifeq ($(shell uname -m),x86_64)
MACHINE=-m32
LDFLAGS=$(MACHINE)
ASFLAGS=--32
endif
ifdef MAKEFILE_TEST
#GCCDIR = /usr/local/nachos/bin/decstation-ultrix-
GCCDIR = /usr/local/mips/bin/decstation-ultrix-
LDFLAGS = -T script -N
ASFLAGS = -mips2
endif
endif

This makefile also defines other macros that depend on the system-dependent macros above. They are:

arch_dir = arch/$(arch)
obj_dir = $(arch_dir)/objects
bin_dir = $(arch_dir)/bin
depends_dir = $(arch_dir)/depends

These macros show that in each of the system-dependent directories in the arch directory, there are three directories to accommodate object codes, binary executables and dependence files, respectively. For example, in arch/unknown-i386-linux/ for your Linux system, there are directories as follows:

unknown-i386-linux>> ls
bin/ depends/ objects/

2.1.4 Makefile.common

The file Makefile.common is the most complicated one and it defines all the rules for compiling a complete Nachos system. It first includes Makefile.dep. Then it defines the vpaths for various kinds of files as follows:

vpath %.cc ../network:../filesys:../vm:../userprog:../threads:../machine
vpath %.h ../network:../filesys:../vm:../userprog:../threads:../machine
vpath %.s ../network:../filesys:../vm:../userprog:../threads:../machine

This tells make where to find files if it cannot find them in the current directory. This is why you can build a Nachos in a new directory (other than ../threads/, ../filesys/) without copying the files which you do not need to modify.
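To make the search order concrete, the sketch below emulates it in plain shell: look in the current directory first, then in each vpath directory in the order listed. This is only an illustration of the idea, not how make is implemented, and the function name `find_source` is hypothetical.

```shell
#!/bin/sh
# Emulate the lookup that a vpath line asks make to perform:
# try the current directory, then each listed directory in order.
find_source() {
    file=$1
    shift
    for dir in . "$@"; do
        if [ -f "$dir/$file" ]; then
            echo "$dir/$file"
            return 0
        fi
    done
    return 1
}

# Directories in the order given on the vpath lines above.
for f in scheduler.cc main.cc; do
    find_source "$f" ../network ../filesys ../vm ../userprog ../threads ../machine \
        || echo "$f: not found"
done
```

Run from a directory such as ../lab2/, this prints ./scheduler.cc when a local copy exists there and falls back to the first listed directory that holds the file otherwise, which is the behaviour Section 2.2 relies on.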

This file then defines macros for object files (ofiles = $(cc_ofiles) $(c_ofiles) $(s_ofiles)), CFLAGS, and the ultimate target (program = $(bin_dir)/nachos). These definitions show that we are going to build the binary executable named nachos in the directory $(bin_dir), which is arch/unknown-i386-linux/bin/ on your Linux system. The rule to build that target is defined in the following lines:

$(bin_dir)/% :
	@echo ">>> Linking" $@ "<<<"
	$(LD) $^ $(LDFLAGS) -o $@
	ln -sf $@ $(notdir $@)

This rule is a pattern rule. The % in the target can match any non-empty string in the target of other rules, and these multiple rules will be combined to define the dependence. In our case, this rule is combined with the rule:

$(program): $(ofiles)

In the commands above, $@ represents the target, which is

arch/unknown-i386-linux/bin/nachos

on your Linux system, and $^ represents all the dependence files, which are all the object files defined by macro ofiles. The $(LD) command of this rule simply links these object files to form a binary executable. Note that LD is actually g++. The next command makes a symbolic link to the binary executable. ln -sf $@ $(notdir $@) actually expands to
ln -sf arch/unknown-i386-linux/bin/nachos nachos
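As a rough illustration of that expansion, the sketch below mimics it in the shell. It assumes that `basename` is an acceptable stand-in for make's `$(notdir ...)` function when applied to a single path; the variable names `target` and `link` are ours.

```shell
#!/bin/sh
# $@ in the rule is the full target path.
target=arch/unknown-i386-linux/bin/nachos

# $(notdir $@) strips the directory part; basename does the same for one path.
link=$(basename "$target")

# Print the command that make would run for this target.
echo "ln -sf $target $link"
```

The script only prints the command, so it is safe to run anywhere.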

The rule to make object codes from the C++ source codes is:

$(obj_dir)/%.o: %.cc
	@echo ">>> Compiling" $< "<<<"
	$(CC) $(CFLAGS) -c -o $@ $<

It is a pattern rule again. The % matches any non-empty string. For example, this rule tells make how to build

arch/unknown-i386-linux/objects/main.o

from main.cc. However, the object code should also depend on the many header files (.h files) included by main.cc. The dependence relation for these header files is actually specified by another rule, which is generated automatically.

First of all, we need to know which header files are included (directly and indirectly) by the C++ source file during the compilation. g++ can do the search automatically for you. All you have to do is to use option -MM. Let us do some experiments. In the ../threads/ directory, do the following (type the command on one line; it is split here for clarity of presentation):

threads>> g++ -MM -g -Wall -Wshadow -I../threads -I../machine
          -DTHREADS -DHOST_i386 -DHOST_LINUX -DCHANGED main.cc

You should see the results as follows:


main.o: main.cc copyright.h utility.h ../machine/sysdep.h \
 ../threads/copyright.h system.h thread.h scheduler.h list.h \
 ../machine/interrupt.h ../threads/list.h ../machine/stats.h \
 ../machine/timer.h ../threads/utility.h

This is the list of all the header files on which main.o depends. If any of these header files is updated, main.cc should be recompiled to make a new main.o. You can also see that the output of the above command is actually a rule which can be included in Makefile.common. This is exactly what is done by the remainder of Makefile.common. First of all, the makefile builds a dependence file in directory

arch/unknown-i386-linux/depends/

for each source code file. The rule to do that is:

$(depends_dir)/%.d: %.cc
	@echo ">>> Building dependency file for " $< "<<<"
	@$(SHELL) -ec '$(CC) -MM $(CFLAGS) $< \
	| sed '\''s@$*.o[ ]*:@$(depends_dir)/$(notdir $@) $(obj_dir)/&@g'\'' > $@'

Here the variable CC is equal to g++ and CFLAGS is the same set of flags that would be used for the real compilation. Note the -MM option of g++. The rest of the command creates a dependence file in directory arch/unknown-i386-linux/depends/ after prepending the prefix arch/unknown-i386-linux/objects/ to the object file name. For example, for main.cc, this rule will create a new file named main.d in directory arch/unknown-i386-linux/depends/ whose contents are:

arch/unknown-i386-linux/depends/main.d \
arch/unknown-i386-linux/objects/main.o: \
 main.cc copyright.h utility.h ../machine/sysdep.h \
 ../threads/copyright.h system.h thread.h scheduler.h list.h \
 ../machine/interrupt.h ../threads/list.h ../machine/stats.h \
 ../machine/timer.h ../threads/utility.h
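The effect of the sed step can be tried on its own, without running g++. The sketch below feeds a shortened, made-up line of -MM output through the same substitution; the function name `rewrite_rule` is hypothetical, the two directory variables are set by hand to the Linux values, and the result is kept on one line rather than wrapped with backslashes.

```shell
#!/bin/sh
# Values that Makefile.dep would give these macros on Linux.
depends_dir=arch/unknown-i386-linux/depends
obj_dir=arch/unknown-i386-linux/objects

# Rewrite "stem.o:" so that both the .d file and the object file become targets.
rewrite_rule() {
    # $1: stem of the source file, e.g. "main" for main.cc
    sed "s@$1\.o[ ]*:@$depends_dir/$1.d $obj_dir/&@g"
}

# Stand-in for the first line of `g++ -MM ... main.cc` output.
printf '%s\n' 'main.o: main.cc copyright.h utility.h' | rewrite_rule main
# prints: arch/unknown-i386-linux/depends/main.d arch/unknown-i386-linux/objects/main.o: main.cc copyright.h utility.h
```

The & in the replacement stands for the matched text main.o:, which is why the object file name survives with its new directory prefix.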

You can check that these files exist after you make nachos. Then, there is an include statement in Makefile.common:

include $(dfiles)

This means that Makefile.common includes all the dependence files it created. The contents of these files, which are all makefile rules, become part of this makefile. It is these rules that are combined with the compilation rule to make the object code. As a result, we have a complete list of the files on which each object file depends.

Another important use of this technique is that we can see which header files are used in compiling a source file by examining the corresponding dependence file. This is very helpful when you are building a new version of Nachos in a separate directory which contains some modified source and header files and you want to be sure that these modified files are actually used in compilation.
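To examine a dependence file programmatically rather than by eye, a small parser is enough. The following is only a sketch: the names DepRule and parseDepRule are hypothetical and are not part of Nachos, and the code assumes a file holding a single rule whose file names contain no spaces or colons.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One makefile rule of the form "target1 target2: prereq1 prereq2 ...".
struct DepRule {
    std::vector<std::string> targets;
    std::vector<std::string> prerequisites;
};

// Parse the text of a single dependency rule, e.g. the contents of main.d.
// A backslash immediately followed by a newline is treated as white space.
DepRule parseDepRule(const std::string& text) {
    std::string joined;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == '\n') {
            joined += ' ';
            ++i;  // skip the newline that follows the backslash
        } else {
            joined += text[i];
        }
    }

    const std::size_t colon = joined.find(':');
    assert(colon != std::string::npos && "a rule must contain a colon");

    DepRule rule;
    std::istringstream lhs(joined.substr(0, colon));
    std::istringstream rhs(joined.substr(colon + 1));
    std::string word;
    while (lhs >> word) rule.targets.push_back(word);        // left of ':'
    while (rhs >> word) rule.prerequisites.push_back(word);  // right of ':'
    return rule;
}
```

Searching the prerequisites of main.d for scheduler.h then tells you which copy of the header was actually used when main.o was built.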

2.2 Building a Modified Nachos in Another Directory

The current Nachos allows you to build different versions of Nachos in directories ../threads/, ../filesys/ and ../userprog/. You will be required to extend or modify Nachos in the programming assignments and lab sessions. It is always a good idea to change only the relevant files in a separate directory and build the new Nachos there, using the unmodified files from their original directories.

Let us assume that you are required to build a new Nachos in a separate directory called ../lab2. Suppose that you need to change class Scheduler. You do not want to change the original scheduler.h and scheduler.cc in directory ../threads/. What you can do is copy these two files from ../threads/ to ../lab2/ and make changes to them in ../lab2/. Suppose that you want to build the new Nachos in ../lab2/ using the new scheduler.h and scheduler.cc there. All the other files of the new Nachos should be the original ones from directories ../threads/, ../machine/, etc.

In order to do that, you need to copy the empty ../arch/ directory tree recursively, and the files Makefile and Makefile.local, from ../threads/ to ../lab2. The last task is to modify the makefiles Makefile and Makefile.local so that you can build the new Nachos properly. Makefile in ../lab2 does not need changes, but you do need to change Makefile.local in ../lab2/. Makefile.local basically defines macro CCFILES and re-defines the include path macro INCPATH. The definition of CCFILES does not need changes, because make will follow the vpaths to find the required source files if they are not in the current directory. The re-definition of INCPATH needs changes.

In the following, I provide two solutions to this problem. You should read through the first solution, as it lays out the problem of synchronising header files with source files. If your compiler allows it, I recommend you use the second solution.

First Solution: You can change the re-definition of INCPATH as follows:
INCPATH += -I../lab2 -I../threads -I../machine

That is, add -I../lab2 before -I../threads so that the C preprocessor (cpp) of g++ will search ../lab2/ first when it processes #include directives in the source files. However, this simple change does not solve all the problems. The current contents of ../lab2/ are as follows:
lab2>> ls
Makefile  Makefile.local  arch/  scheduler.cc  scheduler.h

We then type make to build the new Nachos as follows:


lab2>> make
...
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o ........
.......................
ln -sf arch/unknown-i386-linux/bin/nachos nachos
lab2>> ls
Makefile        arch    scheduler.cc
Makefile.local  nachos  scheduler.h

But, in this new Nachos, only the new scheduler.cc uses the new scheduler.h. This can be shown by the following output:
lab2>> touch scheduler.h
lab2>> make
>>> Building dependency file for scheduler.cc <<<
>>> Compiling scheduler.cc <<<
g++ -g -Wall -Wshadow -I../lab2 -I../threads -I../machine -DTHREADS -DHOST_i386 -DHOST_LINUX -DCHANGED -c -o arch/unknown-i386-linux/objects/scheduler.o scheduler.cc
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o ............
....................
ln -sf arch/unknown-i386-linux/bin/nachos nachos

Other classes which depend upon the header scheduler.h use the old version of scheduler.h in the ../threads/ directory. This can be shown by the following output:


lab2>> touch ../threads/scheduler.h
lab2>> make
>>> Building dependency file for ../machine/timer.cc <<<
...
>>> Compiling ../threads/main.cc <<<
..
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o .........
..........
ln -sf arch/unknown-i386-linux/bin/nachos nachos
lab2>>

This is because, when g++ -MM generates dependences, it looks for the header (.h) files first in the same directory as the file being processed. For example, ../threads/main.cc indirectly includes scheduler.h (via the header file system.h). Therefore g++ -MM looks for the header file scheduler.h in the directory ../threads first, finds it, and generates the dependency string ../threads/scheduler.h (you can prove this by checking the contents of the file main.d in the directory ../lab2/arch/unknown-i386-linux/depends/). In order to avoid this, you need to copy to the ../lab2/ directory all the files in the directory ../threads/ which directly or indirectly include scheduler.h. To find the minimum set of these files, you can use the grep command to search for the files which contain the string scheduler.h, as follows:
threads>> grep scheduler.h *
grep: arch: Is a directory
scheduler.cc:#include "scheduler.h"
scheduler.h:// scheduler.h
system.h:#include "scheduler.h"
threads>>

We then search for the string system.h, because the header file system.h includes scheduler.h.
threads>> grep system.h *
grep: arch: Is a directory
main.cc:#include "system.h"
scheduler.cc:#include "system.h"
synch.cc:#include "system.h"
synchtest.cc:#include "system.h"
system.cc:#include "system.h"
system.h:// system.h
thread.cc:#include "system.h"
threadtest.cc:#include "system.h"
threads>>

This means that the minimum set of files we need to copy from ../threads/ to ../lab2/ is:

system.h main.cc synch.cc synchtest.cc system.cc thread.cc threadtest.cc
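The two grep searches above amount to walking the "is included by" relation backwards from scheduler.h. As an illustration only, the same search can be sketched as a breadth-first traversal. The type IncludeGraph and the function filesDependingOn are hypothetical names, and the include graph is assumed to have been collected beforehand (for example from grep output).

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Maps each file to the files it includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Return every file that includes `header`, directly or indirectly.
std::set<std::string> filesDependingOn(const IncludeGraph& includes,
                                       const std::string& header) {
    assert(!header.empty());

    // Invert the graph: included file -> files that include it.
    std::map<std::string, std::vector<std::string>> includedBy;
    for (const auto& entry : includes)
        for (const auto& included : entry.second)
            includedBy[included].push_back(entry.first);

    // Breadth-first search starting from the header of interest.
    std::set<std::string> result;
    std::queue<std::string> pending;
    pending.push(header);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = includedBy.find(current);
        if (it == includedBy.end()) continue;  // nothing includes this file
        for (const auto& includer : it->second)
            if (result.insert(includer).second)  // newly discovered file
                pending.push(includer);
    }
    return result;
}
```

With the include relations found by grep, the result for scheduler.h contains system.h and every source file that includes system.h, which is the set of files listed above together with scheduler.cc.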

Then we make the new Nachos and the contents of ../lab2/ should be as follows:
lab2>> ls
Makefile  Makefile.local  arch/  main.cc  nachos  scheduler.cc  scheduler.h
synch.cc  synchtest.cc  system.cc  system.h  thread.cc  threadtest.cc

Now we can test that it works OK as follows:

1. We first change the time-stamp of scheduler.h in ../lab2/ and then make Nachos again. The make command should cause re-compiling of a lot of modules:
lab2>> touch scheduler.h
lab2>> make
>>> Building dependency file for ../machine/timer.cc <<<
...
>>> Compiling main.cc <<<
g++ -g -Wall -Wshadow -I../lab2 -I../threads -I../machine -DTHREADS -DHOST_i386 -DHOST_LINUX -DCHANGED -c -o arch/unknown-i386-linux/objects/main.o main.cc
...
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o .......
............
ln -sf arch/unknown-i386-linux/bin/nachos nachos
lab2>>

2. We then change the time-stamp of ../threads/scheduler.h and make Nachos again. This time, none of the modules should be re-compiled, and make should report that the existing Nachos is up to date.
lab2>> touch ../threads/scheduler.h
lab2>> make
make: arch/unknown-i386-linux/bin/nachos is up to date.
lab2>>

Second Solution: The second solution is much simpler than the first one. It takes advantage of a feature of the g++ preprocessor by using the command-line option -I-. Here is the description of this option to g++ (obtained through man gcc).
-I-
    ...
    In addition, the -I- option inhibits the use of the current directory (where the current input file came from) as the first search directory for #include "file". There is no way to override this effect of -I-. With -I. you can specify searching the directory which was current when the compiler was invoked. That is not exactly the same as what the preprocessor does by default, but it is often satisfactory.
    ...

This means that -I- prohibits including the .h files from the same directory as the .cc file being processed. It therefore forces the preprocessor to look for .h files according to the path defined by the -I options after the -I-. Therefore, we can use the re-definition of INCPATH in ../lab2/Makefile.local as follows:
INCPATH += -I- -I../lab2 -I../threads -I../machine

without copying any files from ../threads/ other than the header file scheduler.h and source file scheduler.cc.

Note: Although the compiler issues a warning that this option is obsolete and that the option -iquote should be used instead, do not use the replacement option. It is not a complete replacement for -I- and will not work as expected. If you are using version 4.5 or newer of the GNU Compiler, the option -I- will not exist and you will have to use the first solution discussed in this Laboratory session.

The contents of ../lab2/ after Nachos is made are now as follows:
lab2>> ls
Makefile        arch    scheduler.cc
Makefile.local  nachos  scheduler.h
lab2>>

We can test that it works OK as follows:


lab2>> touch ../threads/scheduler.h
lab2>> make
make: arch/unknown-i386-linux/bin/nachos is up to date.

If we touch the scheduler.h in the current directory ../lab2, make will recompile Nachos as follows:
lab2>> touch scheduler.h
lab2>> make
>>> Building dependency file for ../machine/timer.cc <<<
...
>>> Compiling ../threads/main.cc <<<
g++ -g -Wall -Wshadow -I- -I../lab2 -I../threads -I../machine -DTHREADS -DHOST_i386 \
  -DHOST_LINUX -DCHANGED \
  -c -o arch/unknown-i386-linux/objects/main.o \
  ../threads/main.cc
...
>>> Linking arch/unknown-i386-linux/bin/nachos <<<
g++ arch/unknown-i386-linux/objects/main.o ...........
....................
ln -sf arch/unknown-i386-linux/bin/nachos nachos
lab2>>
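The difference between the two solutions comes down to the order in which directories are searched for #include "file". The sketch below is a simplified model of that search, not the real preprocessor: resolveInclude and its parameters are hypothetical names, system directories and #include <file> are ignored, and the set of existing files is supplied by the caller.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified model of resolving #include "name".
//   includerDir     directory of the file that contains the #include
//   searchPath      directories given with -I, in command-line order
//   useIncluderDir  true for the default behaviour, false to model -I-
//   existing        the files that exist, written as "dir/name"
// Returns the path that would be used, or "" if the file is not found.
std::string resolveInclude(const std::string& name,
                           const std::string& includerDir,
                           const std::vector<std::string>& searchPath,
                           bool useIncluderDir,
                           const std::set<std::string>& existing) {
    assert(!name.empty());

    std::vector<std::string> order;
    if (useIncluderDir) order.push_back(includerDir);  // searched first by default
    order.insert(order.end(), searchPath.begin(), searchPath.end());

    for (const auto& dir : order) {
        const std::string candidate = dir + "/" + name;
        if (existing.count(candidate) != 0) return candidate;
    }
    return "";
}
```

In this model, a file in ../threads that includes scheduler.h resolves to ../threads/scheduler.h by default, even with -I../lab2 first on the path, and to ../lab2/scheduler.h once the includer's own directory is no longer searched.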

2.3 Things to Do

Your tasks in this lab session are as follows:

1. Read Section 2.1 and make sure you understand the makefile structure of Nachos.
2. Experiment with the two solutions to build a new Nachos in a separate directory described in Section 2.2. Make sure you understand why both solutions are correct.


Laboratory 3

Synchronization Using Semaphores

In this laboratory session, you are required to write a test program for the producer/consumer problem using semaphores for synchronization. After completing the session, you will

• understand how semaphores are implemented in Nachos, and how the producer/consumer problem is implemented using semaphores,
• know how to create concurrent threads in Nachos, and
• know how to test and debug programs in Nachos.

The program has been partially implemented (in the code/lab3 directory of the Nachos system) and you are required to complete the implementation. The work of this laboratory session is very important, as it will prepare you for the programming tasks required for Assignment 2.

Laboratory contents
3.1 Background
    3.1.1 Semaphores
    3.1.2 The Producer/Consumer Problem
    3.1.3 Nachos main Program
3.2 Things to Do
    3.2.1 Tasks

3.1 Background

3.1.1 Semaphores

Semaphores are one of the most commonly used synchronization schemes for concurrent processes or threads. Section 6.5 of the textbook gives a full description of the concept and implementation of semaphores. In Nachos, semaphores are implemented as class Semaphore. The implementation of Semaphore in Nachos is different from that in the textbook. It can be found in ../threads/synch.cc.

3.1.2 The Producer/Consumer Problem

The producer/consumer problem is one of the problems encountered frequently in operating systems design. Both producer and consumer threads access the same ring buffer in the shared memory. The producers produce items and put them in the ring buffer, while the consumers take and consume items from the buffer. A producer has to be blocked when the buffer is full and resumed when it becomes non-full. Similarly, a consumer has to be blocked when the buffer is empty and resumed when it becomes non-empty. Consequently, producers and consumers need a mechanism for synchronization.

3.1.3 Nachos main Program

When you start nachos, the first program module executed is the main program. Every subdirectory of Nachos can have a main.cc. Take a look at the main.cc in ../threads. You need to study how the command line of nachos is interpreted, how the Nachos kernel is initialized, and how the thread for the main program creates another thread executing function SimpleThread(int which). The source code of SimpleThread(int which) can be found in ../threads/threadtest.cc.
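The blocking conditions of Section 3.1.2 (buffer full for producers, buffer empty for consumers) can be made concrete with a small single-threaded sketch of a ring buffer. The class BoundedRing below is a hypothetical illustration and is not the Ring class supplied in the lab3 directory. It contains no synchronization at all: the assertions only mark the points at which a real producer or consumer thread would have to wait.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-size ring buffer of ints, for illustration only.
class BoundedRing {
public:
    explicit BoundedRing(std::size_t capacity)
        : slots_(capacity), head_(0), tail_(0), count_(0) {
        assert(capacity > 0);
    }

    bool full() const { return count_ == slots_.size(); }
    bool empty() const { return count_ == 0; }

    // A producer may only deposit an item while the buffer is not full.
    void put(int item) {
        assert(!full());
        slots_[tail_] = item;
        tail_ = (tail_ + 1) % slots_.size();  // wrap around at the end
        ++count_;
    }

    // A consumer may only remove an item while the buffer is not empty.
    int get() {
        assert(!empty());
        const int item = slots_[head_];
        head_ = (head_ + 1) % slots_.size();  // wrap around at the end
        --count_;
        return item;
    }

private:
    std::vector<int> slots_;
    std::size_t head_;   // index of the oldest item
    std::size_t tail_;   // index of the next free slot
    std::size_t count_;  // number of items currently stored
};
```

Items leave the buffer in the order in which they were put in, and the indices wrap around, so the same slots are reused indefinitely.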

3.2 Things to Do

In this laboratory session, you are required to implement the producer/consumer algorithm in Nachos using semaphores for synchronization. In your ../lab3/ directory, you will find the files: main.cc, prodcons++.cc, ring.cc, ring.h and synch.cc.

Files ring.cc and ring.h define and implement a class Ring for the ring buffer used by producers and consumers. These two files are complete and you do not need to change any part of them.

main.cc in this directory is modified from the version in ../threads/. It is complete and you do not need to change it.

synch.cc in this directory is modified from the version in ../threads/ by the addition of tracing statements. It is complete and you do not need to change it. See the Tasks section later in this laboratory session for instructions on how to activate the tracing code.

In the new main.cc, function ProdCons() is called instead of the function ThreadTest(). Function ProdCons() is defined in the file prodcons++.cc. This file is supposed to include the code to create producer and consumer threads as well as implement the producer/consumer algorithm described in the textbook. However, this file is not complete. Your task in this laboratory session is to complete the file prodcons++.cc and make the producer/consumer algorithm work.

File prodcons++.cc contains all the data structures and the interfaces of the functions. There are detailed comments in the file about what needs to be done in each part of the code. Because all the interfaces of the functions are present, the file is compilable. You can execute make in ../lab3/ now to make a new Nachos for the producer/consumer problem. (But it won't work yet because prodcons++.cc is incomplete.)

3.2.1 Tasks

1. Read ring.h and ring.cc and make sure that you understand everything in them.

2. Read main.cc.

3. Read prodcons++.cc and make sure that you understand
   (a) the structure of the program
   (b) the task to complete the program

4. Complete the program in file prodcons++.cc.

5. Compile a new nachos with the command make and test your program. The output files of an example run of the problem with two consumers and two producers, each of which produces four messages, should be similar (see item 6 below) to the following:

   the contents of tmp_0:
   producer id --> 0; Message number --> 0;
   producer id --> 0; Message number --> 1;
   producer id --> 1; Message number --> 3;

   the contents of tmp_1:
   producer id --> 0; Message number --> 2;
   producer id --> 0; Message number --> 3;
   producer id --> 1; Message number --> 0;
   producer id --> 1; Message number --> 1;
   producer id --> 1; Message number --> 2;

   What are the criteria for testing this program? According to the concepts of producer/consumer, a correct implementation should guarantee the following:
   (a) all the messages produced by the producer threads are received and recorded in the output files,
   (b) no messages are received and recorded more than once, and
   (c) messages that are from the same producer and received by the same consumer are received in increasing order.

   It is easy to check the first two (especially if you have many consumers and messages) by using some Unix commands.
   cat tmp*|wc -l will count the number of lines in the output files. It should be equal to N_PROD * N_MESSG.
   cat tmp*|sort|uniq -d will report any duplicate messages (there should be none).

6. Normally, Nachos allows a thread to continue until it calls the yield() method (possibly due to a call to P()). This will result in a unique ordering of thread execution. However, it is possible to ask Nachos to force the currently running thread to yield at random times. To do this, use the -rs (random seed) command line argument when starting Nachos. The command nachos -rs number will start Nachos and use number as the seed value that initialises the pseudo-random number generator used to determine the length of time that a thread may run. Try running Nachos with a selection of random seeds and check that the different results all satisfy the correctness criteria.

7. Produce a detailed trace of the thread actions by running Nachos with the trace debug option: nachos -d t. The lab3 directory also contains an instrumented version of synch.cc that contains some tracing statements to show the behavior of the semaphores. This will have been compiled into the Nachos executable by default. Look at these messages with the command nachos -d s. To see both threading and semaphore traces, use nachos -d ts.

8. Modify prodcons++.cc to change the numbers of producers, consumers, buffers, and messages. Rebuild Nachos and check the result of running the system.
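The three correctness criteria in task 5 can also be checked mechanically, including criterion (c), which the Unix commands do not cover. The following is a sketch under stated assumptions: the names ConsumerLog and outputIsCorrect are hypothetical, each log is assumed to hold only the message lines of one consumer's output file in the format shown above, and message numbers are assumed to run from 0 upwards for each producer, as in the example run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// The message lines of one consumer's output file.
using ConsumerLog = std::vector<std::string>;

// Check criteria (a), (b) and (c) over all consumer logs.
bool outputIsCorrect(const std::vector<ConsumerLog>& logs,
                     int numProducers, int messagesPerProducer) {
    assert(numProducers > 0 && messagesPerProducer > 0);

    std::set<std::pair<int, int>> seen;  // (producer, message) pairs received
    for (const ConsumerLog& log : logs) {
        std::map<int, int> lastFromProducer;  // per consumer, for criterion (c)
        for (const std::string& line : log) {
            int producer = 0;
            int message = 0;
            if (std::sscanf(line.c_str(),
                            "producer id --> %d; Message number --> %d;",
                            &producer, &message) != 2)
                return false;  // malformed line
            if (producer < 0 || producer >= numProducers ||
                message < 0 || message >= messagesPerProducer)
                return false;  // message that no producer could have sent
            if (!seen.insert(std::make_pair(producer, message)).second)
                return false;  // (b) violated: message received twice
            const auto it = lastFromProducer.find(producer);
            if (it != lastFromProducer.end() && message <= it->second)
                return false;  // (c) violated: out of order at this consumer
            lastFromProducer[producer] = message;
        }
    }
    // (a): every message from every producer was received exactly once.
    return seen.size() ==
           static_cast<std::size_t>(numProducers) *
               static_cast<std::size_t>(messagesPerProducer);
}
```

Applied to the example output above with two producers and four messages each, this check succeeds. Such a check is most useful when you try many random seeds, as in task 6.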


Laboratory 4

Nachos File System

The purpose of this laboratory session is to study the functionality of the file system in Nachos. The file system in Nachos is designed to be small and simple so that you can read all its source code in a short period of time. Before starting to read the code, it is very useful to get an idea of what functionality the Nachos file system offers. In this laboratory session, you will run the commands of the Nachos file system and watch the effects on the simulated hard disk in Nachos. On completion of this laboratory session, you should know

• what the functionality of the Nachos file system is, and
• how to examine the contents of the simulated hard disk in Nachos.

Laboratory contents
4.1 Nachos File System summary
    4.1.1 File Header
4.2 Compiling the Nachos file system
4.3 Usage of Nachos File System Commands
4.4 Test Files
    4.4.1 UNIX command od
4.5 Things to Do
    4.5.1 Compiling Nachos File System
    4.5.2 Testing Nachos File System
4.6 Questions

4.1 Nachos File System summary

Physically, a Nachos file system is a sequence of 128-byte sectors. There are 32 tracks each containing 32 sectors, giving a total of 1024 sectors, or 131072 bytes. The first 4 bytes of the file system contain a magic number, so a standard Nachos file system is 131076 bytes long. (Check this with the command ls -l DISK.) All these constants are defined in machine/disk.h (SectorSize, SectorsPerTrack, NumTracks) and machine/disk.cc (MagicNumber).

The file system is logically created by FileSystem::FileSystem(). The physical (on disk) layout of an initialised but empty file system is shown in figure 4.1. The numbers on the left are sector numbers. Note that when making changes to the file system, we can ignore the presence of the magic number, which is only visible to the lower level functions.

Nachos expects that sector 0 will always contain the free map file header, and that sector 1 will always contain the directory file header. (See filesys/filesys.cc, lines 57 and 58.)

Nachos instantiates a single FileSystem object (pointed to by fileSystem) at startup. This object contains two data items, freeMapFile and directoryFile, that are pointers to OpenFile objects.

Sector   Contents       Description
         0x456789ab     Magic Number
0        File Header    Free map (Bit map)
1        File Header    Directory
2        Data           bitmap file
3-4      Data           directory file
...      (unused)
1023

Figure 4.1: Nachos File System
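The size arithmetic above can be written down and checked at compile time. This is a sketch only: the constant names SectorSize, SectorsPerTrack and NumTracks follow those quoted from machine/disk.h, while NumSectors, MagicSize, DiskSize and the helper sectorOffsetInDiskFile are names assumed here for illustration.

```cpp
#include <cassert>

// Geometry of the simulated disk, using the values quoted in the text.
constexpr int SectorSize = 128;       // bytes per sector
constexpr int SectorsPerTrack = 32;
constexpr int NumTracks = 32;
constexpr int NumSectors = SectorsPerTrack * NumTracks;

// Hypothetical names: the magic number occupies the first four bytes of DISK.
constexpr int MagicSize = 4;
constexpr int DiskSize = MagicSize + NumSectors * SectorSize;

static_assert(NumSectors == 1024, "32 tracks of 32 sectors");
static_assert(NumSectors * SectorSize == 131072, "bytes in all sectors");
static_assert(DiskSize == 131076, "size reported by ls -l DISK");

// Byte offset within the DISK file at which the given sector starts.
inline int sectorOffsetInDiskFile(int sector) {
    assert(sector >= 0 && sector < NumSectors);
    return MagicSize + sector * SectorSize;
}
```

Because of the magic number, sector n starts at byte 4 + 128n of the DISK file rather than at byte 128n, which matters when you read dumps of the file.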

4.1.1 File Header

When a file is successfully opened, an OpenFile object is created which contains a pointer OpenFile::hdr to a FileHeader object. The data members of the FileHeader object are precisely the contents of the file header sector stored on disk. The physical (on disk) file header is shown in figure 4.2, together with the corresponding FileHeader members. A Nachos file header occupies one sector. So, by the standard definitions in machine/disk.h, the header is 128 bytes, and is composed of 32 four-byte integers. After the first two size counters, there is a 30-element table of sector numbers.
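The header layout described above can be mirrored by a plain structure. The sketch below is an illustration and not the Nachos FileHeader class: the names FileHeaderOnDisk, kSectorSize, kMaxFileSize and sectorHoldingByte are hypothetical, and 32-bit integers are assumed, as in the text.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSectorSize = 128;  // sector size quoted in Section 4.1
constexpr int kNumDirect = 30;    // entries in the table of sector numbers

// On-disk layout of a file header: two counters, then the sector table.
struct FileHeaderOnDisk {
    std::int32_t numBytes;                 // byte offset 0
    std::int32_t numSectors;               // byte offset 4
    std::int32_t dataSectors[kNumDirect];  // byte offsets 8 to 124
};

static_assert(sizeof(FileHeaderOnDisk) == kSectorSize,
              "the header fills exactly one sector");

// With 30 data sectors of 128 bytes, a file can hold at most 3840 bytes.
constexpr int kMaxFileSize = kNumDirect * kSectorSize;

// Disk sector that stores the byte at the given offset within the file.
inline int sectorHoldingByte(const FileHeaderOnDisk& hdr, int offset) {
    assert(offset >= 0 && offset < hdr.numBytes);
    return hdr.dataSectors[offset / kSectorSize];
}
```

A consequence of this layout is that the header can describe at most 30 data sectors, which bounds the size of a Nachos file.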

Byte offset   Contents                          FileHeader member
0             Number of bytes in file           FileHeader::numBytes
4             Number of sectors in file (n)     FileHeader::numSectors
8             first data sector number          FileHeader::dataSectors[0]
...
4n + 4        last valid data sector number     FileHeader::dataSectors[n - 1]
...
124                                             FileHeader::dataSectors[29]

Figure 4.2: Nachos File Header (128 bytes)

4.2 Compiling the Nachos file system

It is very simple to compile Nachos with its file system. You simply move to the directory filesys and execute the command make. A new version of Nachos with its file system included will be made in the directory. The Makefile in filesys includes both Makefile.local files from threads and filesys. The Makefile.local in filesys is as follows:
ifndef MAKEFILE_FILESYS_LOCAL
define MAKEFILE_FILESYS_LOCAL
yes
endef

# Add new sourcefiles here.

CCFILES +=bitmap.cc\
	directory.cc\
	filehdr.cc\
	filesys.cc\
	fstest.cc\
	openfile.cc\
	synchdisk.cc\
	disk.cc

ifdef MAKEFILE_USERPROG_LOCAL
DEFINES := $(DEFINES:FILESYS_STUB=FILESYS)
else
INCPATH += -I../userprog -I../filesys
DEFINES += -DFILESYS_NEEDED -DFILESYS
endif

endif # MAKEFILE_FILESYS_LOCAL

This means that this version of Nachos uses the C++ files listed above in addition to the files used to compile the Nachos in threads. Most of these additional files exist in the current directory. Some of them are in other directories such as userprog. The make program will find them automatically due to the variable VPATH defined in the makefile code/Makefile.common.

4.3 Usage of Nachos File System Commands

The usage of Nachos commands is defined in threads/main.cc and threads/system.cc. In particular, the commands related to the file system are listed below. The optional flag -d f is used to print all the debug information related to the file system.

nachos [-d f] -f
    This is used to format the simulated hard disk named DISK before any other file system commands can be used.

nachos [-d f] -cp unix_filename nachos_filename
    This command copies a UNIX file named unix_filename in your UNIX system to a Nachos file named nachos_filename in the Nachos file system. This is currently the only way to create a file in the Nachos file system.

nachos [-d f] -p nachos_filename
    This command displays the contents of the Nachos file named nachos_filename (similar to the UNIX command cat).

nachos [-d f] -r nachos_filename
    This command removes the Nachos file named nachos_filename (similar to the UNIX command rm).

nachos [-d f] -l
    This command lists the names of all the Nachos files on the screen (similar to the UNIX command ls).

nachos [-d f] -D
    This command prints all the contents of the entire file system, including the bitmap, the file headers, the directory and the files.

nachos [-d f] -t
    This command tests the performance of the file system. It is not working yet.

To understand how these commands work, you need to study the functions in threads/main.cc and filesys/fstest.cc.

4.4 Test Files

In the subdirectory test in filesys, there are three files to be used when testing the Nachos file system: small, medium and big. Take a look at their contents.

4.4.1 UNIX command od

You need to use the UNIX command od (Octal Dump) to examine the simulated hard disk when you debug the Nachos file system.

1. Read the manual page of od (by typing man od).

2. Execute command od -c test/small. You should see

   0000000   T   h   i   s       i   s       t   h   e       s   p   r   i
   0000020   n   g       o   f       o   u   r       d   i   s   c   o   n
   0000040   t   e   n   t   .  \n
   0000046

   on your screen. Each line displays 16 characters. The column on the left shows the offset in octal of the first character of each line. For example, the offset of the first character of the second line (n) is 0000020 in octal, which is 16 in decimal.

4.5 Things to Do

4.5.1 Compiling Nachos File System

Follow the description in Section 4.2 to compile Nachos with its file system (in filesys).

4.5.2 Testing Nachos File System

Execute the following commands and check the results as described:

1. Execute nachos -f. Nachos should have created the simulated hard disk called DISK in your current directory. Check that it has been created, and also check its size with the Unix command ls -l.

2. Execute nachos -D to dump the whole file system on the simulated hard disk DISK and you should have the following dump:



....
Bit map file header:
FileHeader contents. File size: 128. File blocks:
2
File contents:
\1f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
Directory file header:
FileHeader contents. File size: 200. File blocks:
3 4
File contents:
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
Bitmap set:
0, 1, 2, 3, 4,
Directory contents:
No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!

Ticks: total 5500, idle 5030, system 470, user 0
Disk I/O: reads 10, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0

Cleaning up...

We have deleted the output from function ThreadTest() in the above dump. You can get rid of it by making a copy of threads/main.cc in your current filesys directory and commenting out the line that invokes function ThreadTest(). The dump is produced by FileSystem::Print() and shows the following:

(a) Bit map file header: the values in the header in sector 0 are shown, then the contents of the actual bitmap file in sector 2 (by FileHeader::Print()).
(b) Directory file header: the values in the header in sector 1 are shown, then the contents of the actual directory file in sectors 3 and 4 (by FileHeader::Print()).
(c) Bitmap set: a list of the actual sectors allocated and recorded in the bitmap.
(d) Directory contents: empty, as there are no files yet in the system.

This dump shows that the Nachos file system has been created on DISK. There are no files at the moment in the only directory of the Nachos file system.

3. Now we will look at the contents of the file system using od. We will use a few different display options for od. In all cases note that od displays 16 bytes per line, and that the starting address is in octal. If multiple lines of 16 bytes are the same (usually all zeros), od displays only the first line, then on the following line shows an asterisk to represent all the duplicate lines. Decimal 128 is octal 200, and there is a four byte magic number at the beginning of the file system, so the disk sectors appear at (octal) byte address 04, 0204, 0404, 0604, 01004, 01204, etc.

(a) Execute od -i DISK. This interprets and prints each four byte block as an integer. You should have the following dump on the screen:
0000000  1164413355         128           1           2
0000020           0           0           0           0
*
0000200           0         200           2           3
0000220           4           0           0           0
0000240           0           0           0           0
*
0000400           0          31           0           0
0000420           0           0           0           0
*
0400000           0
0400004

Note the following:

• Sector 0, offset 04, is the file header for the bitmap. It has size 128 bytes, requires 1 sector, and this sector is sector 2.
• Sector 1, offset 0204, is the file header for the directory. It has size 200 bytes, requires 2 sectors, and these sectors begin with sector 3.
• Sector 2, offset 0404, as mentioned above, is the bitmap file. There are just five sectors allocated, so the number 31 is the integer interpretation of five bits. (Binary 00011111 is decimal 31.)

(b) Now look at the disk using a four byte hexadecimal display, using the command od -t x4 DISK:

0000000 456789ab 00000080 00000001 00000002
0000020 00000000 00000000 00000000 00000000
*
0000200 00000000 000000c8 00000002 00000003
0000220 00000004 00000000 00000000 00000000
0000240 00000000 00000000 00000000 00000000
*
0000400 00000000 0000001f 00000000 00000000
0000420 00000000 00000000 00000000 00000000
*
0400000 00000000
0400004

Note the following:

• The magic number in the first four bytes.
• The bitmap at 0404; the hexadecimal value is 0x1f. We can see five bits set, as 0x1 is binary 0001 and 0xf is binary 1111.

4. Execute nachos -cp test/small small to copy file small into the Nachos file system. Use nachos -l, nachos -p and nachos -D to make sure you have created a new file named small in the Nachos file system. Also use the dump of DISK to see what has been changed.

5. Continue the experiment with other nachos commands such as nachos -r on more files.
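When reading od output it helps to compute where a sector begins and which sectors a bitmap word marks as allocated. The helpers below are a sketch: octalSectorAddress and allocatedSectors are hypothetical names, the constants 4 and 128 are the magic number size and sector size quoted in Section 4.1, and bit i of the first bitmap word is assumed to stand for sector i, which is consistent with the value 31 for sectors 0 to 4 in the dump above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Address of the first byte of a sector in DISK, formatted as od prints it
// (seven octal digits). Assumes a 4-byte magic number and 128-byte sectors.
std::string octalSectorAddress(int sector) {
    assert(sector >= 0);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%07o",
                  static_cast<unsigned>(4 + 128 * sector));
    return buf;
}

// Sector numbers whose bits are set in one 32-bit word of the bitmap,
// assuming bit i of the word corresponds to sector i.
std::vector<int> allocatedSectors(std::uint32_t word) {
    std::vector<int> sectors;
    for (int bit = 0; bit < 32; ++bit)
        if ((word & (std::uint32_t(1) << bit)) != 0)
            sectors.push_back(bit);
    return sectors;
}
```

For example, sector 2 starts at octal address 0000404, and the bitmap word 0x1f decodes to sectors 0, 1, 2, 3 and 4, matching the "Bitmap set" line printed by nachos -D.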

4.6 Questions

1. According to the result of the last command nachos -D and the result of od -c DISK, how many files are there on the hard disk DISK?
2. What are the sector numbers of the data blocks for file big?
3. What is the sector number of the disk sector that stores the file header for file big?
4. The sector size of the Nachos hard disk is 128 bytes. Could you check the result of od -c DISK to make sure that the data blocks and the file header of big are in the right places on the disk?


Laboratory 5

Extendable Files

The purpose of this lab session is to help you start to work on extending the Nachos file system. It is the initial part of the programming task of assignment 4.

Laboratory contents
5.1 Introduction
5.2 Nachos file system runtime organisation
5.3 Implementation
    5.3.1 Getting started in the lab5 directory
    5.3.2 Modifications to Nachos
5.4 Testing the New File System

5.1 Introduction

The Nachos file system is a simple file system with many restrictions. One of them is that the size of a file is not extendable: once you specify the size of a file upon its creation, the size of the file is fixed throughout its lifetime. Nachos allows any subsequent write operation to an existing file, using OpenFile::WriteAt(), but this operation will not write beyond the current end of file. That is, it may not completely satisfy the write request, only writing some of the bytes requested.

In this laboratory session, you are going to modify the Nachos file system to allow the size of files to be extended. To be precise, the task is to modify the existing function
OpenFile::WriteAt(char *from, int numBytes, int position)

so that, if position + numBytes is greater than the current le length, the le will be extended. By extending a le, we mean that more sectors will be allocated to the le. Of course, you will need to modify other functions as well as OpenFile::WriteAt(). Here is an example. If the initial size of a le is 100 bytes and a write operation for 100 bytes data from the position 50 (the rst byte is at position 0) will extend the size of the le to 150 bytes. The situation 37

38

Laboratory 5 Extendable Files is illustrated in Figure 5.1, in which (a) represents the initial size (100 bytes) of the le. The light shadow represents the current contents of the le. (b) represents the new 100 bytes of data to be written from position 50. (c) shows the extended size of the le with dark shadow representing the new data.

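[Figure: (a) the original file, bytes 0 to 99; (b) the 100 bytes of new data, starting at position 50; (c) the extended file, bytes 0 to 149.]

Figure 5.1: Extension of a file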
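The arithmetic behind this example can be sketched in a few lines of C++. This is only an illustration, not Nachos code: the function names SectorsFor, LengthAfterWrite and ExtraSectors are hypothetical, and the 128-byte sector size is the one quoted elsewhere in this book.

```cpp
#include <algorithm>
#include <cassert>

// Sector size of the Nachos disk, as stated in this book (128 bytes).
const int SectorSize = 128;

// Round-up integer division: sectors needed to hold numBytes bytes.
int SectorsFor(int numBytes) {
    assert(numBytes >= 0);
    return (numBytes + SectorSize - 1) / SectorSize;
}

// Length of the file after writing numBytes bytes at position.
// A write that stays inside the file leaves the length unchanged.
int LengthAfterWrite(int fileLength, int position, int numBytes) {
    assert(fileLength >= 0 && position >= 0 && numBytes >= 0);
    return std::max(fileLength, position + numBytes);
}

// Extra sectors that would have to be allocated for that write.
int ExtraSectors(int fileLength, int position, int numBytes) {
    int newLength = LengthAfterWrite(fileLength, position, numBytes);
    return SectorsFor(newLength) - SectorsFor(fileLength);
}
```

For the example in the text, LengthAfterWrite(100, 50, 100) gives 150, and ExtraSectors(100, 50, 100) gives 1, because a 100-byte file fits in one sector while a 150-byte file needs two.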

5.2

Nachos file system runtime organisation


Section 4.1 and Figure 4.1 describe the physical (on disk) layout of the file system. Nachos uses a standard way of modifying or updating structured data held in the file system. This applies to file headers, the directory, and the free map. Unstructured file data is simply written directly to the file system, using the OpenFile::WriteAt() method.

The method for updating structured data is as follows.

1. Instantiate an object of the class that handles the data structure (FileHeader, Directory, BitMap). This typically reads the disk sectors into memory, using the FetchFrom method. The memory-resident data structures can then be accessed via class methods. The object is a cache of the disk-based structure.
2. Modify the cached data via the class methods.
3. Write the memory-resident data back to disk. Each of the three classes listed above has a method for this, named WriteBack().
4. Deallocate the class storage when it is no longer needed. Presumably this occurs only after the WriteBack() call.

Figure 5.2 shows the live Nachos objects, set up when a FileSystem object is created, that describe the file system. The actual file system is the same as pictured in Figure 4.1. These objects persist while Nachos executes. Other objects (e.g. FileHeader and Directory) that describe user files are transitory: they are created and destroyed as needed, as described above. Note that actual file system blocks are written directly using the OpenFile methods ReadAt and WriteAt, which in turn call the SynchDisk methods ReadSector and WriteSector. Note also that fileSystem is defined in system.cc.

[Figure: in memory, the global fileSystem points to the FileSystem object, whose members freeMapFile and directoryFile each point to an OpenFile object (members hdr and seekPosition). Each hdr points to a FileHeader (numBytes, numSectors, dataSectors[0] ... dataSectors[29]). The free map header has numSectors 1 and dataSectors[0] = 2; the directory header has numSectors 2, dataSectors[0] = 3 and dataSectors[1] = 4. On disk, the two headers are followed by the corresponding data sectors.]

Figure 5.2: File system data structures
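The four-step update cycle described in this section (fetch, modify, write back, deallocate) can be sketched outside Nachos as follows. Everything here is a stand-in invented for illustration: MockDisk, CachedHeader and UpdateLengthOnDisk are hypothetical names, and the real Nachos classes have different members and signatures. The sketch also assumes a 4-byte int so that the record fills exactly one 128-byte sector.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

const int SectorSize = 128;   // Nachos sector size, as stated in this book

// Hypothetical stand-in for the disk: an array of sectors held in memory.
class MockDisk {
  public:
    explicit MockDisk(int numSectors) : data(numSectors * SectorSize, 0) {}
    void ReadSector(int sector, char *to) const {
        std::memcpy(to, &data[sector * SectorSize], SectorSize);
    }
    void WriteSector(int sector, const char *from) {
        std::memcpy(&data[sector * SectorSize], from, SectorSize);
    }
  private:
    std::vector<char> data;
};

// Hypothetical structured record that occupies exactly one sector.
class CachedHeader {
  public:
    CachedHeader() { std::memset(&raw, 0, sizeof(raw)); }
    // Step 1: copy the on-disk structure into the in-memory cache.
    void FetchFrom(const MockDisk &disk, int sector) {
        disk.ReadSector(sector, reinterpret_cast<char *>(&raw));
    }
    // Step 2: modify the cached copy only.
    void SetLength(int numBytes) { raw.numBytes = numBytes; }
    int Length() const { return raw.numBytes; }
    // Step 3: copy the cache back to the disk.
    void WriteBack(MockDisk &disk, int sector) const {
        disk.WriteSector(sector, reinterpret_cast<const char *>(&raw));
    }
  private:
    struct Raw {
        int numBytes;
        char padding[SectorSize - sizeof(int)];
    } raw;
    static_assert(sizeof(Raw) == SectorSize, "record must fill one sector");
};

// Run the whole cycle, then reload the sector to confirm the change stuck.
int UpdateLengthOnDisk(MockDisk &disk, int sector, int newLength) {
    CachedHeader *hdr = new CachedHeader;
    hdr->FetchFrom(disk, sector);
    hdr->SetLength(newLength);
    hdr->WriteBack(disk, sector);
    delete hdr;                       // step 4: only after WriteBack
    CachedHeader check;
    check.FetchFrom(disk, sector);
    return check.Length();
}
```

If the WriteBack call is left out, the reloaded length is still the old one: the change existed only in the cache. That is the situation the WriteBack() step is there to prevent.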

5.3

Implementation
The Nachos file system consists of the following modules:

class Disk
class SynchDisk
class BitMap
class FileHeader
class OpenFile
class Directory
class FileSystem

The structure of the file system is shown in Figure 5.3. The arrows indicate module dependencies. If a module calls methods from another, an arrow is drawn from the caller to the target.



[Figure: module dependency diagram containing FileSystem, OpenFile, Directory, FileHeader, SynchDisk, Disk and BitMap.]

Figure 5.3: Structure of Nachos File System

You do not need to modify the SynchDisk and Disk modules, as they support the physical disk. BitMap is also not modified. You will need to modify FileSystem, FileHeader and OpenFile. You don't need to modify Directory.

5.3.1 Getting started in the lab5 directory

You will use the code/lab5 directory for this exercise. Some files are already provided in the lab5 directory (fstest.cc, main.cc, test/*) but you need to create the Makefile structure to build Nachos. Proceed as follows:

1. Copy Makefile and Makefile.local from ../filesys.
2. In Makefile: replace
   include ../filesys/Makefile.local
   with
   include Makefile.local
3. In Makefile.local: replace
   INCPATH += -I../userprog -I../filesys
   with
   INCPATH += -I../lab5 -I../userprog -I../filesys
4. Copy the arch directory tree; in the lab5 directory type:
   cp -r ../threads/arch .
5. Copy any files that you need to change from ../filesys. See Section 5.3.2 for hints on what to modify.

The two files main.cc and fstest.cc in ../lab5/ are new and include many new file system commands to test the new features required. We will discuss these new commands in Section 5.4. main.cc is complete and you should not change it. fstest.cc is almost complete except that you need to uncomment four lines in it. In both functions Append(...) and NAppend(...), you can see the following three lines:
// Write the inode back to the disk, because we have changed it
// openFile->WriteBack();
// printf("inodes have been written back\n");

You need to uncomment the last two lines after you add the WriteBack() function to class OpenFile. Why do you need this function for OpenFile? Think about it.

5.3.2 Modifications to Nachos

Standard Nachos has fixed size files. A file is created by the FileSystem::Create() method. Creating a file requires the following actions:

1. create a Directory object and initialise it from the disk directory file
2. create a BitMap object and initialise it from the disk bitmap file
3. get a free file system sector to hold the file header (updates BitMap object)
4. create a directory entry (this will point to the header sector) (updates Directory object)
5. create a new FileHeader object
6. get enough free file system sectors to hold the file data (updates BitMap and FileHeader objects)
7. write Directory object back to disk directory file
8. write BitMap object back to disk bitmap file
9. write FileHeader object back to file header disk sector for this file

The OpenFile::WriteAt() method writes to a file. Currently an attempt to write past the end of a file is not allowed. You must modify this method to allow a file to be extended. Extending a file will in general require extra disk sectors to be allocated to it. Basically, the only modifications required to OpenFile::WriteAt() are to write code to allocate enough extra blocks to hold the extra size of the file, and to update the file header accordingly. The remainder of OpenFile::WriteAt(), which handles the actual writing of data bytes to data sectors, will not require alteration.

For guidance on how to perform allocation of data blocks, look at the implementation of the FileSystem::Create() method (described above). You need to perform steps similar to steps 2, 5, 6, and 8 described there. Because the file header object is created when a file is opened, it need only be written back to disk when all file operations are finished. The OpenFile::WriteBack() method (you have to write this one) does this, and is typically called by whatever function opened the file. For example, see the Append function in fstest.cc.

Initial allocation of data sectors, and updating the associated file header, is performed by FileHeader::Allocate(). It is reasonable to create a new method, say ReAllocate(), to perform the extra allocation duties. The new method would require similar parameters to the original. The original Allocate would still be used for initial allocation for a new file. The method FileHeader::ReAllocate() requires access to the data FileSystem::freeMapFile, which is currently a private data element. It should become public (filesys.h).

In summary I suggest that you perform these changes:

1. Add new method: FileHeader::ReAllocate(BitMap *freeMap, int newFileSize). Changes to: filehdr.h, filehdr.cc.
2. Modify OpenFile::WriteAt() to extend blocks allocated to file, using the ReAllocate function. Changes to: openfile.h, openfile.cc.
3. Make FileSystem::freeMapFile publicly accessible. Changes to: filesys.h
4. Create OpenFile::WriteBack(). The body of this function is just a single function call, but it does require the knowledge of which sector the file header resides on. This is not currently available after a file is created and opened, because standard Nachos never modifies the header after creation. It is a simple matter to add a new (private) integer data member, OpenFile::headerSector, that will hold this sector value for later use.
It could be initialised by the OpenFile constructor, OpenFile::OpenFile(int sector), which is passed that value. Changes to: openfile.h, openfile.cc
5. You will also need to add an #include "filesys.h" to openfile.cc
6. Uncomment the calls to openFile->WriteBack() in fstest.cc. There are two occurrences, in Append() and NAppend(). This is described above, in Section 5.3.1.
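The allocation idea behind the suggested ReAllocate() method can be sketched with stand-in classes. This is a minimal illustration under stated assumptions, not the Nachos implementation: MockBitMap and MockHeader are hypothetical, their members only loosely mirror the names drawn in Figure 5.2, and the limit of 30 direct sectors is taken from that figure. Your own version has to work with the real FileHeader and BitMap classes.

```cpp
#include <cassert>
#include <vector>

const int SectorSize = 128;   // Nachos sector size, as stated in this book
const int NumDirect = 30;     // dataSectors[0..29], as drawn in Figure 5.2

// Hypothetical stand-in for the free-sector bitmap.
class MockBitMap {
  public:
    explicit MockBitMap(int numBits) : used(numBits, false) {}
    void Mark(int bit) { used[bit] = true; }
    bool Test(int bit) const { return used[bit]; }
    int NumClear() const {
        int n = 0;
        for (bool b : used)
            if (!b) n++;
        return n;
    }
    // Return the first clear bit and mark it, or -1 if none is free.
    int Find() {
        for (int i = 0; i < (int)used.size(); i++)
            if (!used[i]) { used[i] = true; return i; }
        return -1;
    }
  private:
    std::vector<bool> used;
};

// Hypothetical in-memory file header.
struct MockHeader {
    int numBytes = 0;
    int numSectors = 0;
    int dataSectors[NumDirect] = {0};

    // Grow the file to newFileSize bytes. Returns false, changing nothing,
    // if the header or the disk cannot hold the extra sectors.
    bool ReAllocate(MockBitMap *freeMap, int newFileSize) {
        assert(newFileSize >= numBytes);
        int newSectors = (newFileSize + SectorSize - 1) / SectorSize;
        int extra = newSectors - numSectors;
        if (newSectors > NumDirect || freeMap->NumClear() < extra)
            return false;
        for (int i = numSectors; i < newSectors; i++)
            dataSectors[i] = freeMap->Find();   // claim one free sector
        numBytes = newFileSize;
        numSectors = newSectors;
        return true;
    }
};
```

Checking for space before claiming any sector keeps the header and the bitmap consistent when the request cannot be met. In the real file system both objects are only caches, so the changes become permanent only once each has been written back to disk.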

5.4

Testing the New File System


We need commands to test the new features of the Nachos file system. These new commands have been implemented in main.cc and fstest.cc in ../lab5/ for you. They are:

nachos [-d f] -ap unix_filename nachos_filename
This command appends a UNIX file named unix_filename in your UNIX system to the end of a Nachos file named nachos_filename in the Nachos file system. It is used to test whether we can extend the file size as we append a file to the end of an existing Nachos file.

nachos [-d f] -hap unix_filename nachos_filename
This command overwrites the Nachos file (named nachos_filename) from its middle with a UNIX file (named unix_filename). If the length of the UNIX file exceeds half of that of the Nachos file, the Nachos file size should be extended.

Read the files main.cc and fstest.cc in ../lab5/ and make sure that you understand how these new commands are implemented. When testing the new Nachos file system, start with a fresh DISK by removing the old DISK and executing nachos -f. Then execute the following commands in order:

nachos -cp test/small small
nachos -ap test/small small
nachos -cp test/empty empty
nachos -ap test/medium empty

Your file system dump with nachos -D at this point should be like this:

Bit map file header:
FileHeader contents. File size: 128. File blocks:
2
File contents:
\ff\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
Directory file header:
FileHeader contents. File size: 200. File blocks:
3 4
File contents:
\1\0\0\0\5\0\0\0 small\0\0\0\0\0\0\0\1\0\0\0\8\0\0\0 empty\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
Bitmap set:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
Directory contents:
Name: small, Sector: 5
FileHeader contents. File size: 168. File blocks:
6 7
File contents:
small file small file small file\asmall file small file small file\a***end of file***\asmall file small file small file\asmall file small file small file\a***end of file***\a
Name: empty, Sector: 8
FileHeader contents. File size: 162. File blocks:
9 10
File contents:
medium file medium file medium file\amedium file medium file medium file\amedium file medium file medium file\amedium file medium file medium file\a***end of file***\a

No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!

Ticks: total 8490, idle 8000, system 490, user 0
Disk I/O: reads 16, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0

Cleaning up...

This dump shows that the bitmap file is of size 128 bytes (one sector) and is located in sector 2. It also shows the contents of the bitmap file. We know that the i-node (file header) of the bitmap file is located in sector 0.

The dump also shows that the directory file has 200 bytes (two sectors) and the data blocks are located in sectors 3 and 4. The i-node of the directory file is in sector 1 (not shown in the dump) as we know. The file named small is shown to have 168 bytes (two sectors) and its i-node is located in sector 5. The data blocks of the file are in sectors 6 and 7.

You also need to test the cases where part of the file is overwritten and the file is being extended. Design your own test programs to make sure that your new file system works in all different kinds of situations. In particular, you should check the i-nodes before and after performing a file copy or append operation, and confirm that the correct number of extra sectors were allocated. Try to predict the outcome before checking with nachos -D.
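When predicting a dump, it can help to cross-check the raw bitmap bytes against the list of allocated sectors. The sketch below decodes bitmap bytes into sector numbers. The helper name SetBits is hypothetical, and the bit order (least significant bit of the first byte is sector 0) is an assumption that happens to agree with a bitmap starting \ff\7 and a "Bitmap set" list of 0 to 10.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the numbers of the bits that are set, counting from the least
// significant bit of the first byte (an assumed bit order, see above).
std::vector<int> SetBits(const std::vector<unsigned char> &bytes) {
    std::vector<int> result;
    for (std::size_t i = 0; i < bytes.size(); i++)
        for (int bit = 0; bit < 8; bit++)
            if (bytes[i] & (1u << bit))
                result.push_back(static_cast<int>(i) * 8 + bit);
    return result;
}
```

With the bytes 0xff and 0x07 the function returns the eleven sector numbers 0 through 10: two system headers, the bitmap data sector, two directory sectors, and a header plus two data sectors for each of the two user files.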


Appendix A

Unix essentials

This appendix gives a brief introduction to Unix. It is meant as a starting point for your exploration of Unix, not as a complete reference. The description is common to all variants of Unix, including Linux, unless otherwise noted.

A.1

Command line interface


Although all modern Unix distributions include a graphical user interface, implemented by a window manager, we are interested here only in the overall structures of Unix, and in the common utilities accessible via a command line interface. To use a command line interface, open a terminal window. The program running in the command line is called a shell program, and its job is to read commands and execute them. The most common shell program is bash, a direct descendant of the original Unix shell sh, which implements a superset of the sh commands. Unless noted, anything presented here is independent of the actual shell program used.

A.2

Files and directories


A Unix file system contains files and directories. Most files are just a sequence of bytes; these are called regular files [1]. A directory can contain files or directories (also called subdirectories). There is a special root directory; its full name is just /. Directories form a tree structure: all directories (except /) have one [2] parent directory, and can have zero or more (sub)directories.

[1] There are also directories and special files, which we will not discuss here.
[2] This is not quite true; the use of symbolic links can result in a directory appearing to be present in more than one parent directory. But this is an advanced topic.

A file or directory is uniquely defined by a pathname. An absolute pathname begins with the root directory, and lists all directories that you must traverse to get to the file/directory, separated by a / character. For instance: /usr/bin/g++.

The shell program uses the concept of a current working directory. This is set at shell startup time and can be changed with the cd command. The shorthand for this directory is . (a single dot). The parent of the current directory is .. (two dots). The pwd command displays the current working directory.

A pathname without a leading / is relative. The equivalent absolute pathname is obtained by appending the relative pathname to the pathname of the current working directory, separated by a / character. A command that reads a file will typically expect to find it in the current directory; files will by default be written to the current directory.

File and directory names can have any length, and contain any characters! There is no concept of a file type, though file names usually include a suffix to indicate the kind of file. So, myfile.c is a complete file name that includes a hint that it is a C source file.
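The rule for turning a relative pathname into an absolute one can be written out as a small C++ function. This is a sketch of the rule as stated above, with a hypothetical function name ToAbsolute; it does not simplify . or .. components, which a real system would also have to handle.

```cpp
#include <cassert>
#include <string>

// Combine the current working directory with a pathname.
// An absolute pathname (leading '/') is returned unchanged.
std::string ToAbsolute(const std::string &cwd, const std::string &path) {
    assert(!cwd.empty() && cwd[0] == '/');   // cwd itself must be absolute
    if (!path.empty() && path[0] == '/')
        return path;
    if (cwd == "/")
        return cwd + path;                   // avoid producing "//name"
    return cwd + "/" + path;
}
```

For example, with a working directory of /home/user, the relative name myfile.c becomes /home/user/myfile.c, while /usr/bin/g++ is left as it is.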

A.3

Processes
A process is created whenever an executable program is run. We say that a program is run within a process. The process provides the memory and other resources needed to run a program. At any time there can be many processes running. A normal command, or program, runs in the foreground; no other shell command can be executed until the foreground process terminates. Its input normally comes from the keyboard, and output is to the terminal window. A background process runs without waiting for user input, while allowing the user to keep issuing commands to the shell.

A.4

Commands
In this section, we introduce a handful of essential and useful commands. Commands typically take arguments, which can be options (also called flags or switches) that set optional behavior, and file names. To find out the format and meaning of a command, use the man command (see A.4.1).

Single character command options normally begin with a -; for example ls -l. Multiple single character options can be combined; for example ls -FAl. A multi-character option begins with --; for example gcc --help.

There are two kinds of commands: shell commands, which are executed directly by the shell, and regular executable commands. This second kind is just the name of an executable file. For instance, cd is a shell command, but ls is a command that results in the execution of the file /bin/ls. In general, if the command which command-name returns the path of a file, then it is a regular command. If not, it is probably a shell command; use man bash to find out about it.

The following sections present a very few commands. This is a minimal set required to be able to complete your assignments under Unix. For further information on the huge number of Unix commands, look at the resources available at the CSC2408 web site: http://www.sci.usq.edu.au/courses/CSC2408/

A.4.1 man pages

Unix systems are usually installed with an extensive set of online documentation, organised as chapters of a manual. Chapter 1 of the manual contains information on Unix commands. The man command displays the contents of the manual pages. Unless requested otherwise, man will look in chapter 1 for the page. For instance the command man ls will describe the ls command. If you are unsure of a command name, the command man -k keyword will report manual entries that include that keyword.

The man entry can have many sections. Most importantly, look at the syntax in the SYNOPSIS section, the description of the behavior in DESCRIPTION, and pointers to other related commands in SEE ALSO. This last section may be the most important; if the page you are looking at does not seem to describe a command that you want, then maybe the SEE ALSO section will list the command you do want. Many pages also include an EXAMPLES section.

A consistent notation is used to describe command syntax. Consider the description for the rmdir command, from its man page.

rmdir [OPTION]... DIRECTORY...

Elements in (square) brackets are optional. The ellipsis (...) means that the preceding element may be repeated. If a name is underlined, then you are not meant to type this text; rather you type an instance of that name. In this case, the syntax description for the command indicates that you should type a command that contains, in sequence, (1) rmdir, (2) zero or more options, and (3) one or more directory names.

A.4.2 Directories

pwd      print working directory
cd       change working directory
mkdir    make a new directory
rmdir    remove an empty directory
ls       list directory contents (useful switches F, A, l)

A.4.3 Files

rm       remove (delete) file
mv       move (rename) file
cp       copy a file
cat      list contents of a file
less     list contents of a file, page at a time
more     list contents of a file, page at a time
ln -s    create a symbolic link to a file
ln       create a hard link to a file

There are many editors available to create files. These editors work inside your command window: vi, emacs, pico, nano. Vi and Emacs are complex and sophisticated editors that are well worth the effort of learning. Pico and Nano are almost identical and are very simple to use. If you wish to use an editor in a separate window, try gedit or kedit.

A.4.4 Miscellaneous

Here is a very short selection of other useful commands.

grep     find text in a file
lpr      print a file to a printer
g++      C++ compiler
gcc      C compiler
ps       list processes
kill -9  terminate a process
diff     find differences between files
date     current date and time
passwd   change password
make     run make according to the rules in a makefile

A.4.5 Using the shell

The output of a command can be redirected to a file with >:
   ls > dirlist
A command can get input from a file with <:
   more < dirlist
A command can pipe its output to another command with |:
   ls | more

Ctrl-C will stop and delete a process.
Ctrl-Z will pause a process.
bg will resume a paused process, as a background process.
fg will resume a paused process, as a foreground process.
command & will run command in the background.

In a command, * means all files in the current directory. In general, when a filename is expected, * matches any sequence of characters that would result in a filename match. So *.c matches all files whose names end in .c. The ? character matches any one filename character. The pattern a?.c matches all files whose names are four characters long, end in .c, and begin with a.
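The meaning of * and ? can be made concrete with a small matcher. This is a simplified sketch with a hypothetical function name, WildMatch; it is not how any particular shell is implemented, and real shells apply further rules (for example for character classes and for names beginning with a dot).

```cpp
#include <cassert>
#include <string>

// Simplified wildcard match: '*' matches any sequence of characters
// (including none) and '?' matches exactly one character.
// p and n are the current positions in the pattern and in the name.
bool WildMatch(const std::string &pattern, const std::string &name,
               std::size_t p = 0, std::size_t n = 0) {
    if (p == pattern.size())
        return n == name.size();          // both must be used up together
    if (pattern[p] == '*') {
        // Let '*' absorb 0, 1, 2, ... characters of the name in turn.
        for (std::size_t k = n; k <= name.size(); k++)
            if (WildMatch(pattern, name, p + 1, k))
                return true;
        return false;
    }
    if (n == name.size())
        return false;                     // pattern wants more characters
    if (pattern[p] != '?' && pattern[p] != name[n])
        return false;                     // ordinary character mismatch
    return WildMatch(pattern, name, p + 1, n + 1);
}
```

With this definition, *.c matches main.c but not main.cc, and a?.c matches ab.c but not abc.c, in line with the description above.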


Appendix B

GDB Essential Commands


This appendix summarises the most useful gdb commands. For more information you can consult the GNU gdb manual; the latest version is available from http://www.gnu.org/manual/manual.html. Alternatively you can use the gdb help command to get online help about gdb commands.

B.1

Before you start


Programs must be compiled with the -g compiler option. This ensures that identifier names are added to the object file, for use by the debugger. If the executable is built from a number of object files, all objects should be compiled with -g. If using make, it is easiest to add -g to the CFLAGS variable.

Warning: There seem to be some inconsistencies between the various GNU compilers gcc and g++, and possibly between different versions. On at least one version of g++, you must use the -g3 compile option instead of -g, otherwise line numbers are not generated, and gdb will be unable to set breakpoints at specified lines.

B.2

Typing gdb commands


You interact with gdb by typing commands when it issues the (gdb) prompt. gdb commands can always be truncated to their minimum unambiguous length. Some commands have special abbreviations that violate this general rule: for instance using s as an abbreviation for step is permitted.

A blank command line (press Enter only) means repeat the previous command.

If possible, gdb will complete a word you have started to type; just press the Tab key. If your initial letters are unique, gdb will complete them and then you can either press Enter to execute the command, or backspace to edit the command line. If the characters you have typed are not unique, gdb will sound the bell. You can type more letters and try again, or press Tab a second time to see the possibilities.

The help command can be used to get online help about gdb commands. The up and down cursor keys can be used to recall previous commands, which can be edited using the left and right cursor keys.

B.3

Starting and stopping gdb


Start gdb with the gdb command, e.g.:

gdb myfile

This does not execute the program; it just starts gdb and awaits further commands from you. To exit from gdb type quit or <Ctrl-D> at the gdb prompt.

To start the program running (after setting at least one breakpoint):

run [arg ...] [> file] [< file]

You can use set args to set the command line arguments prior to a run, and use show args to show what arguments were used at the previous run command. You can run the program many times in a gdb session.

B.4

Breakpoints
The debugger will halt a running program just before a line marked with a breakpoint. This returns control to the user, who can then type gdb commands. There are many commands related to breakpoints, and also to watchpoints (not covered here). Only the more common are mentioned here. For full details use gdb's help system or the gdb Manual. Here is a selection of options for setting, displaying, deleting, enabling, and disabling breakpoints.

break [filename:]function    Set breakpoint at entry to a function
break [filename:]linenum     Set breakpoint at linenum; stops before the line is executed
break                        Set breakpoint at next line
break arg if cond            Program stops at breakpoint only if cond is true. arg is any valid argument to break
info break                   Display all breakpoints
info break n                 Display breakpoint n
clear [filename:]function    Delete breakpoints set at function entry
clear [filename:]linenum     Delete breakpoints set at or in line
delete                       Delete all breakpoints; can be abbreviated to d
delete bnums...              Delete numbered breakpoints
disable [bnums...]           Disable all or selected breakpoints
enable [bnums...]            Enable all or selected breakpoints
ignore bnum count            Ignore this breakpoint for the next count times it is reached

B.5

Continuing and Stepping


Continuing means resuming normal program execution after a halt, commonly after a breakpoint. Stepping means to execute a specified number of lines (or instructions) before halting. In either case the program may stop prematurely at a breakpoint. A handful of commands is sufficient for most needs.

continue [ignore-count]    Resume execution; ignore the next ignore-count breaks at this line. c can be used as an abbreviation for continue
step                       Stop at the beginning of the next line; step into functions. s can be used as an abbreviation for step
step [count]               Step count lines
next                       Continue to next line; do not step into functions
next [count]               Continue to current line + count
finish                     Continue until function returns; print return value
until                      Like next, but will not step backward in a loop

B.6

Displaying source and expressions


The list command will display the source code of the current file. There are many options. Here are some common ones.

list linenum          Print lines around line number linenum
list function         Print lines around function function
list                  Print more lines following the most recently displayed lines
list -                Print lines before those last printed
set listsize count    Set number of lines to display (default 10)

The print command is used to display expressions. Because assignment is an expression in C/C++, you can also use it to modify a variable, as in

print x=14

Here are the common forms.

print expr       Print the value of expr
print /f expr    Print the value of expr using format f
print            Print the same expression as last time print was used
print /f         Print same expression with a different format

Useful format specifiers are

x    Print integer as hexadecimal
d    Print integer as signed decimal
u    Print integer as unsigned decimal
o    Print integer as octal
t    Print integer as binary (mnemonic: think of t as in two)
c    Print integer as character constant
f    Print floating point number in usual syntax
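To see what the x, o and t formats mean for a given value, the conversions can be reproduced in plain C++. This is only an illustration of the number bases involved: Render is a hypothetical helper, and gdb's own output may differ in detail, for example by adding a prefix such as 0x.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Render value in the base selected by a gdb-style format letter:
// 'x' hexadecimal, 'o' octal, 't' binary, anything else decimal.
std::string Render(unsigned value, char format) {
    if (format == 't') {
        // Build the binary digits from the least significant end.
        std::string bits;
        do {
            bits.insert(bits.begin(), (value & 1u) ? '1' : '0');
            value >>= 1;
        } while (value != 0);
        return bits;
    }
    std::ostringstream out;
    if (format == 'x')
        out << std::hex;
    else if (format == 'o')
        out << std::oct;
    out << value;
    return out.str();
}
```

For the value 14 used in the print x=14 example, the digits are e in hexadecimal, 16 in octal and 1110 in binary.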


Appendix C

A Quick Introduction to C++

by Tom Anderson [1]

"If programming in Pascal is like being put in a straightjacket, then programming in C is like playing with knives, and programming in C++ is like juggling chainsaws."
Anonymous

Contents
C.1 Introduction
C.2 C in C++
C.3 Basic Concepts
C.3.1 Classes
C.3.2 Other Basic C++ Features
C.4 Advanced Concepts in C++: Dangerous but Occasionally Useful
C.4.1 Inheritance
C.4.2 Templates
C.5 Features To Avoid Like the Plague
C.6 Style Guidelines
C.7 Compiling and Debugging
C.8 Example: A Stack of Integers
C.9 Epilogue
C.10 Further Reading

C.1

Introduction

This note introduces some simple C++ concepts and outlines a subset of C++ that is easier to learn and use than the full language. Although we originally wrote this note for explaining the C++ used in the Nachos project, I believe it is useful to anyone learning C++. I assume that you are already somewhat familiar with C concepts like procedures, for loops, and pointers; these are pretty easy to pick up from reading Kernighan and Ritchie's The C Programming Language.

I should admit up front that I am quite opinionated about C++, if that isn't obvious already. I know several C++ purists (an oxymoron perhaps?) who violently disagree with some of the prescriptions contained here; most of the objections are of the form, "How could you have possibly left out feature X?" However, I've found from teaching C++ to nearly 1000 undergrads over the past several years that the subset of C++ described here is pretty easy to learn, taking only a day or so for most students to get started.
[1] This article is based on an earlier version written by Wayne Christopher.

The basic premise of this note is that while object-oriented programming is a useful way to simplify programs, C++ is a wildly over-complicated language, with a host of features that only very, very rarely find a legitimate use. It's not too far off the mark to say that C++ includes every programming language feature ever imagined, and more. The natural tendency when faced with a new language feature is to try to use it, but in C++ this approach leads to disaster.

Thus, we need to carefully distinguish between (i) those concepts that are fundamental (e.g., classes, member functions, constructors), ones that everyone should know and use, (ii) those that are sometimes but rarely useful (e.g., single inheritance, templates), ones that beginner programmers should be able to recognize (in case they run across them) but avoid using in their own programs, at least for a while, and (iii) those that are just a bad idea and should be avoided like the plague (e.g., multiple inheritance, exceptions, overloading, references, etc).

Of course, all the items in this last category have their proponents, and I will admit that, like the hated goto, it is possible to construct cases when the program would be simpler using a goto or multiple inheritance. However, it is my belief that most programmers will never encounter such cases, and even if you do, you will be much more likely to misuse the feature than properly apply it. For example, I seriously doubt an undergraduate would need any of the features listed under (iii) for any course project (at least at Berkeley this is true). And if you find yourself wanting to use a feature like multiple inheritance, then my advice is to fully implement your program both with and without the feature, and choose whichever is simpler. Sure, this takes more effort, but pretty soon you'll know from experience when a feature is useful and when it isn't, and you'll be able to skip the dual implementation.

A really good way to learn a language is to read clear programs in that language. I have tried to make the Nachos code as readable as possible; it is written in the subset of C++ described in this note. It is a good idea to look over the first assignment as you read this introduction. Of course, your TAs will answer any questions you may have.

You should not need a book on C++ to do the Nachos assignments, but if you are curious, there is a large selection of C++ books at Cody's and other technical bookstores. (My wife quips that C++ was invented to make researchers at Bell Labs rich from writing "How to Program in C++" books.) Most new software development these days is being done in C++, so it is a pretty good bet you'll run across it in the future. I use Stroustrup's The C++ Programming Language as a reference manual, although other books may be more readable. I would also recommend Scott Meyers' Effective C++ for people just beginning to learn the language, and Coplien's Advanced C++ once you've been programming in C++ for a couple of years and are familiar with the language basics. Also, C++ is continually evolving, so be careful to buy books that describe the latest version (currently 3.0, I think!).

C.2

C in C++

To a large extent, C++ is a superset of C, and most carefully written ANSI C will compile as C++. There are a few major caveats though:

1. All functions must be declared before they are used, rather than defaulting to type int.

2. All function declarations and definition headers must use new-style declarations, e.g.,

    extern int foo(int a, char* b);

   The form extern int foo(); means that foo takes no arguments, rather than arguments of an unspecified type and number. In fact, some advise using a C++ compiler even on normal C code, because it will catch errors like misused functions that a normal C compiler will let slide.

3. If you need to link C object files together with C++, when you declare the C functions for the C++ files, they must be done like this:

    extern "C" int foo(int a, char* b);

   Otherwise the C++ compiler will alter the name in a strange manner.

4. There are a number of new keywords, which you may not use as identifiers; some common ones are new, delete, const, and class.
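The caveats above can be seen together in a few lines. The following is only a minimal sketch, not code from Nachos; the names Square, c_style_add, and SumOfSquares are hypothetical, and the extern "C" function is defined in the same file purely to keep the sketch self-contained (in a real project it would live in a separately compiled C file).

```cpp
#include <cassert>

// Caveats 1 and 2: a new-style declaration must appear before the first use.
int Square(int x);

// Caveat 3: extern "C" asks the C++ compiler to give the function C linkage,
// so its name is not altered in the object file.
extern "C" int c_style_add(int a, int b);

extern "C" int c_style_add(int a, int b) {
    return a + b;
}

int Square(int x) {
    return x * x;
}

// Caveat 4: "class" and "new" are keywords in C++, so a C variable
// with one of those names has to be renamed.
int SumOfSquares(int first, int second) {
    int klass = Square(first);   // "int class = ..." would not compile
    return c_style_add(klass, Square(second));
}
```

Calling SumOfSquares(3, 4) exercises all three functions; removing the declaration of Square above its first use makes a C++ compiler reject the file, where an older C compiler would have assumed a function returning int.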

C.3

Basic Concepts

Before giving examples of C++ features, I will first go over some of the basic concepts of object-oriented languages. If this discussion at first seems a bit obscure, it will become clearer when we get to some examples.

1. Classes and objects. A class is similar to a C structure, except that the definition of the data structure, and all of the functions that operate on the data structure, are grouped together in one place. An object is an instance of a class (an instance of the data structure); objects share the same functions with other objects of the same class, but each object (each instance) has its own copy of the data structure. A class thus defines two aspects of the objects: the data they contain, and the behavior they have.

2. Member functions. These are functions which are considered part of the object and are declared in the class definition. They are often referred to as methods of the class. In addition to member functions, a class's behavior is also defined by:

   (a) What to do when you create a new object (the constructor for that object); in other words, initialize the object's data.

   (b) What to do when you delete an object (the destructor for that object).

3. Private vs. public members. A public member of a class is one that can be read or written by anybody, in the case of a data member, or called by anybody, in the case of a member function. A private member can only be read, written, or called by a member function of that class.

Classes are used for two main reasons: (1) it makes it much easier to organize your programs if you can group together data with the functions that manipulate that data, and (2) the use of private members makes it possible to do information hiding, so that you can be more confident about the way information flows in your programs.
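As a quick preview of this vocabulary, here is a minimal sketch using a hypothetical Counter class (it is not part of Nachos, and the names are made up for illustration). It shows two objects sharing the same member functions while each keeps its own private data.

```cpp
#include <cassert>

// A hypothetical class, used only to illustrate the terms above.
class Counter {
  public:
    Counter();          // constructor: initialize the object's data
    void Increment();   // member function (a "method" of the class)
    int Value();        // the only way outsiders can read the count
  private:
    int count;          // each Counter object has its own copy
};

Counter::Counter() { count = 0; }
void Counter::Increment() { count++; }
int Counter::Value() { return count; }

// Two objects of the same class share the member functions,
// but each one keeps its own data.
int DifferenceAfterUse() {
    Counter a;
    Counter b;
    a.Increment();
    a.Increment();
    b.Increment();
    // a.count = 10;    // would not compile: count is private
    return a.Value() - b.Value();
}
```

Because count is private, the only code that can change it is Counter's own member functions, which is the information hiding described in reason (2).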

C.3.1 Classes

C++ classes are similar to C structures in many ways. In fact, a C++ struct is really a class that has only public data members. In the following explanation of how classes work, we will use a stack class as an example.

1. Member functions. Here is a (partial) example of a class with a member function and some data members:

    class Stack {
      public:
        void Push(int value); // Push an integer, checking for overflow.
        int top;              // Index of the top of the stack.
        int stack[10];        // The elements of the stack.
    };

    void Stack::Push(int value) {
        ASSERT(top < 10);     // stack should never overflow
        stack[top++] = value;
    }

This class has two data members, top and stack, and one member function, Push. The notation class::function denotes the function member of the class class. (In the style we use, most function names are capitalized.) The function is defined beneath it.

As an aside, note that we use a call to ASSERT to check that the stack hasn't overflowed; ASSERT drops into the debugger if the condition is false. It is an extremely good idea for you to use ASSERT statements liberally throughout your code to document assumptions made by your implementation. Better to catch errors automatically via ASSERTs than to let them go by and have your program overwrite random locations.

In actual usage, the definition of class Stack would typically go in the file stack.h and the definitions of the member functions, like Stack::Push, would go in the file stack.cc.

If we have a pointer to a Stack object called s, we can access the top element as s->top, just as in C. However, in C++ we can also call the member function using the following syntax:

    s->Push(17);

Of course, as in C, s must point to a valid Stack object.

Inside a member function, one may refer to the members of the class by their names alone. In other words, the class definition creates a scope that includes the member (function and data) definitions.

Note that if you are inside a member function, you can get a pointer to the object you were called on by using the variable this. If you want to call another member function on the same object, you do not need to use the this pointer, however. Let's extend the Stack example to illustrate this by adding a Full() function.


    class Stack {
      public:
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
        int top;              // Index of the lowest unused position.
        int stack[10];        // The elements of the stack.
    };

    bool Stack::Full() {
        return (top == 10);
    }

Now we can rewrite Push this way:

    void Stack::Push(int value) {
        ASSERT(!Full());
        stack[top++] = value;
    }

We could have also written the ASSERT:

    ASSERT(!(this->Full()));

but in a member function, the this-> is implicit.

The purpose of member functions is to encapsulate the functionality of a type of object along with the data that the object contains. A member function does not take up space in an object of the class.

2. Private members. One can declare some members of a class to be private, which are hidden to all but the member functions of that class, and some to be public, which are visible and accessible to everybody. Both data and function members can be either public or private.

In our stack example, note that once we have the Full() function, we really don't need to look at the top or stack members outside of the class. In fact, we'd rather that users of the Stack abstraction not know about its internal implementation, in case we change it. Thus we can rewrite the class as follows:

    class Stack {
      public:
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
      private:
        int top;              // Index of the top of the stack.
        int stack[10];        // The elements of the stack.
    };


Before, given a pointer to a Stack object, say s, any part of the program could access s->top, in potentially bad ways. Now, since the top member is private, only a member function, such as Full(), can access it. If any other part of the program attempts to use s->top the compiler will report an error.

You can have alternating public: and private: sections in a class. Before you specify either of these, class members are private, thus the above example could have been written:

    class Stack {
        int top;              // Index of the top of the stack.
        int stack[10];        // The elements of the stack.
      public:
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
    };

Which form you prefer is a matter of style, but it's usually best to be explicit, so that it is obvious what is intended. In Nachos, we make everything explicit.

What is not a matter of style: all data members of a class should be private. All operations on data should be via that class's member functions. Keeping data private adds to the modularity of the system, since you can redefine how the data members are stored without changing how you access them.

3. Constructors and the operator new. In C, in order to create a new object of type Stack, one might write:

    struct Stack *s = (struct Stack *) malloc(sizeof (struct Stack));
    InitStack(s, 17);

The InitStack() function might take the second argument as the size of the stack to create, and use malloc() again to get an array of 17 integers.

The way this is done in C++ is as follows:

    Stack *s = new Stack(17);

The new operator takes the place of malloc(). To specify how the object should be initialized, one declares a constructor function as a member of the class, with the name of the function being the same as the class name:

    class Stack {
      public:
        Stack(int sz);        // Constructor: initialize variables, allocate space.
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
      private:
        int size;             // The maximum capacity of the stack.
        int top;              // Index of the lowest unused position.
        int* stack;           // A pointer to an array that holds the contents.
    };

    Stack::Stack(int sz) {
        size = sz;
        top = 0;
        stack = new int[size]; // Let's get an array of integers.
    }

There are a few things going on here, so we will describe them one at a time.

The new operator automatically creates (i.e. allocates) the object and then calls the constructor function for the new object. This same sequence happens even if, for instance, you declare an object as an automatic variable inside a function or block: the compiler allocates space for the object on the stack, and calls the constructor function on it.

In this example, we create two stacks of different sizes, one by declaring it as an automatic variable, and one by using new.

    void test() {
        Stack s1(17);
        Stack* s2 = new Stack(23);
    }

Note there are two ways of providing arguments to constructors: with new, you put the argument list after the class name, and with automatic or global variables, you put them after the variable name.

It is crucial that you always define a constructor for every class you define, and that the constructor initialize every data member of the class. If you don't define your own constructor, the compiler will automatically define one for you, and believe me, it won't do what you want (the unhelpful compiler). Data members of built-in types such as int and pointers will be left uninitialized, holding arbitrary, unrepeatable values, and while your program may work anyway, it might not the next time you recompile (or vice versa!).

As with normal C variables, variables declared inside a function are deallocated automatically when the function returns; for example, the s1 object is deallocated when test returns. Data allocated with new (such as s2) is stored on the heap, however, and remains after the function returns; heap data must be explicitly disposed of using delete, described below.

The new operator can also be used to allocate arrays, illustrated above in allocating an array of ints, of dimension size:

    stack = new int[size];

Note that you can use new and delete (described below) with built-in types like int and char as well as with class objects like Stack.

4. Destructors and the operator delete.
Just as new is the replacement for malloc(), the replacement for free() is delete. To get rid of the Stack object we allocated above with new, one can do:

    delete s2;

This will deallocate the object, but first it will call the destructor for the Stack class, if there is one. This destructor is a member function of Stack called ~Stack():

    class Stack {
      public:
        Stack(int sz);        // Constructor: initialize variables, allocate space.
        ~Stack();             // Destructor: deallocate space allocated above.
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
      private:
        int size;             // The maximum capacity of the stack.
        int top;              // Index of the lowest unused position.
        int* stack;           // A pointer to an array that holds the contents.
    };

    Stack::~Stack() {
        delete [] stack;      // delete an array of integers
    }

The destructor has the job of deallocating the data the constructor allocated. Many classes won't need destructors, and some will use them to close files and otherwise clean up after themselves.

The destructor for an object is called when the object is deallocated. If the object was created with new, then you must call delete on the object, or else the object will continue to occupy space until the program is over; this is called a memory leak. Memory leaks are bad things (although virtual memory is supposed to be unlimited, you can in fact run out of it), and so you should be careful to always delete what you allocate. Of course, it is even worse to call delete too early: delete calls the destructor and puts the space back on the heap for later re-use. If you are still using the object, you will get random and non-repeatable results that will be very difficult to debug. In my experience, using data that has already been deleted is a major source of hard-to-locate bugs in student (and professional) programs, so hey, be careful out there!

If the object is an automatic, allocated on the execution stack of a function, the destructor will be called and the space deallocated when the function returns; in the test() example above, s1 will be deallocated when test() returns, without you having to do anything.

In Nachos, we always explicitly allocate and deallocate objects with new and delete, to make it clear when the constructor and destructor are being called. For example, if an object contains another object as a member variable, we use new to explicitly allocate and initialize the member variable, instead of implicitly allocating it as part of the containing object. C++ has strange, non-intuitive rules for the order in which the constructors and destructors are called when you implicitly allocate and deallocate objects. In practice, although simpler, explicit allocation is slightly slower and it makes it more likely that you will forget to deallocate an object (a bad thing!), and so some would disagree with this approach.

When you deallocate an array, you have to tell the compiler that you are deallocating an array, as opposed to a single element in the array. Hence to delete the array of integers in Stack::~Stack:

    delete [] stack;

C.3.2 Other Basic C++ Features

Here are a few other C++ features that are useful to know.

1. When you define a class Stack, the name Stack becomes usable as a type name as if created with typedef. The same is true for enums.

2. You can define functions inside of a class definition, whereupon they become inline functions, which are expanded in the body of the function where they are used. The rule of thumb to follow is to only consider inlining one-line functions, and even then do so rarely. As an example, we could make the Full routine an inline.

    class Stack {
        ...
        bool Full() { return (top == size); }
        ...
    };

   There are two motivations for inlines: convenience and performance. If overused, inlines can make your code more confusing, because the implementation for an object is no longer in one place, but spread between the .h and .cc files. Inlines can sometimes speed up your code (by avoiding the overhead of a procedure call), but that shouldn't be your principal concern as a student (rather, at least to begin with, you should be most concerned with writing code that is simple and bug free). Not to mention that inlining sometimes slows down a program, since the object code for the function is duplicated wherever the function is called, potentially hurting cache performance.

3. Inside a function body, you can declare some variables, execute some statements, and then declare more variables. This can make code a lot more readable. In fact, you can even write things like:

    for (int i = 0; i < 10; i++) ;

   Depending on your compiler, however, the variable i may still be visible after the end of the for loop, which is not what one might expect or desire.

4. Comments can begin with the characters // and extend to the end of the line. These are usually more handy than the /* */ style of comments.

5. C++ provides some new opportunities to use the const keyword from ANSI C. The basic idea of const is to provide extra information to the compiler about how a variable or function is used, to allow it to flag an error if it is being used improperly. You should always look for ways to get the compiler to catch bugs for you. After all, which takes less time? Fixing a compiler-flagged error, or chasing down the same bug using gdb?

   For example, you can declare that a member function only reads the member data, and never modifies the object:

    class Stack {
        ...
        bool Full() const;   // Full() never modifies member data
        ...
    };

   As in C, you can use const to declare that a variable is never modified:

    const int InitialHashTableSize = 8;

   This is much better than using #define for constants, since the above is type-checked.

6. Input/output in C++ can be done with the >> and << operators and the objects cin and cout. For example, to write to stdout:

    cout << "Hello world! This is section " << 3 << "!\n";

   This is equivalent to the normal C code

    fprintf(stdout, "Hello world! This is section %d!\n", 3);

   except that the C++ version is type-safe; with printf, the compiler won't complain if you try to print a floating point number as an integer. In fact, you can use traditional printf in a C++ program, but you will get bizarre behavior if you try to use both printf and << on the same stream. Reading from stdin works the same way as writing to stdout, except using the shift right operator instead of shift left. In order to read two integers from stdin:

    int field1, field2;
    cin >> field1 >> field2;
    // equivalent to fscanf(stdin, "%d %d", &field1, &field2);
    // note that field1 and field2 are implicitly modified

   In fact, cin and cout are implemented as normal C++ objects, using operator overloading and reference parameters, but (fortunately!) you don't need to understand either of those to be able to do I/O in C++.
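The pieces of Sections C.3.1 and C.3.2 can be assembled into one small, compilable sketch. This is not the Nachos source: the ASSERT macro below is a stand-in built on the standard assert, Pop() and Empty() are hypothetical additions made only so the result can be observed, and cout is written as std::cout because current compilers place it in namespace std.

```cpp
#include <cassert>
#include <iostream>

#define ASSERT(condition) assert(condition)  // stand-in for the Nachos macro

class Stack {
  public:
    Stack(int sz);          // Constructor: initialize variables, allocate space.
    ~Stack();               // Destructor: deallocate space allocated above.
    void Push(int value);   // Push an integer, checking for overflow.
    int Pop();              // Hypothetical addition: remove and return the top.
    bool Full() const { return top == size; }  // inline; never modifies data
    bool Empty() const { return top == 0; }    // hypothetical addition
  private:
    int size;               // The maximum capacity of the stack.
    int top;                // Index of the lowest unused position.
    int* stack;             // A pointer to an array that holds the contents.
};

Stack::Stack(int sz) {
    size = sz;
    top = 0;
    stack = new int[size];
}

Stack::~Stack() {
    delete [] stack;        // array form of delete matches new int[size]
}

void Stack::Push(int value) {
    ASSERT(!Full());
    stack[top++] = value;
}

int Stack::Pop() {
    ASSERT(!Empty());
    return stack[--top];
}

int StackDemo() {
    Stack s1(2);                    // automatic: destroyed when StackDemo returns
    Stack* s2 = new Stack(3);       // heap: must be deleted explicitly
    s1.Push(10);
    s1.Push(20);
    s2->Push(1);
    int result = s1.Pop() + s2->Pop();
    std::cout << "Popped total: " << result << "\n";
    delete s2;                      // calls ~Stack(), then frees the object
    return result;
}
```

Calling StackDemo() from main() pushes onto both stacks and pops the most recent value from each. Forgetting the delete s2 line would leak both the Stack object and the array its constructor allocated.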

C.4

Advanced Concepts in C++: Dangerous but Occasionally Useful

There are a few C++ features, namely (single) inheritance and templates, which are easily abused, but can dramatically simplify an implementation if used properly. I describe the basic idea behind these "dangerous but useful" features here, in case you run across them. Feel free to skip this section: it's long, complex, and you can understand 99% of the code in Nachos without reading it.

Up to this point, there really hasn't been any fundamental difference between programming in C and in C++. In fact, most experienced C programmers organize their functions into modules that relate to a single data structure (a "class"), and often even use a naming convention which mimics C++, for example, naming routines StackFull() and StackPush(). However, the features I'm about to describe do require a paradigm shift: there is no simple translation from them into a normal C program. The benefit will be that, in some circumstances, you will be able to write generic code that works with multiple kinds of objects.

Nevertheless, I would advise a beginning C++ programmer against trying to use these features, because you will almost certainly misuse them. It's possible (even easy!) to write completely inscrutable code using inheritance and/or templates. Although you might find it amusing to write code that is impossible for your graders to understand, I assure you they won't find it amusing at all, and will return the favor when they assign grades. In industry, a high premium is placed on keeping code simple and readable. It's easy to write new code, but the real cost comes when you try to keep it working, even as you add new features to it.

Nachos contains a few examples of the correct use of inheritance and templates, but realize that Nachos does not use them everywhere. In fact, if you get confused by this section, don't worry, you don't need to use any of these features in order to do the Nachos assignments. I omit a whole bunch of details; if you find yourself making widespread use of inheritance or templates, you should consult a C++ reference manual for the real scoop. This is meant to be just enough to get you started, and to help you identify when it would be appropriate to use these features and thus learn more about them!

C.4.1 Inheritance

Inheritance captures the idea that certain classes of objects are related to each other in useful ways. For example, lists and sorted lists have quite similar behavior: they both allow the user to insert, delete, and find elements that are on the list. There are two benefits to using inheritance:

1. You can write generic code that doesn't care exactly which kind of object it is manipulating. For example, inheritance is widely used in windowing systems. Everything on the screen (windows, scroll bars, titles, icons) is its own object, but they all share a set of member functions in common, such as a routine Repaint to redraw the object onto the screen. This way, the code to repaint the entire screen can simply call the Repaint function on every object on the screen. The code that calls Repaint doesn't need to know which kinds of objects are on the screen, as long as each implements Repaint.

2. You can share pieces of an implementation between two objects. For example, if you were to implement both lists and sorted lists in C, you'd probably find yourself repeating code in both places; in fact, you might be really tempted to only implement sorted lists, so that you only had to debug one version. Inheritance provides a way to re-use code between nearly similar classes. For example, given an implementation of a list class, in C++ you can implement sorted lists by replacing the insert member function; the other functions, delete, isFull, print, all remain the same.
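The windowing example in the first benefit can be sketched as follows. This is purely illustrative: ScreenObject, Window, ScrollBar, RepaintAll, and RepaintDemo are hypothetical names rather than part of Nachos or any real windowing library, and Repaint here returns a description of what it drew instead of drawing anything, so that the effect is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: everything on the screen can be repainted.
class ScreenObject {
  public:
    virtual ~ScreenObject() {}
    virtual std::string Repaint() = 0;  // each kind of object supplies its own
};

class Window : public ScreenObject {
  public:
    std::string Repaint() { return "window"; }
};

class ScrollBar : public ScreenObject {
  public:
    std::string Repaint() { return "scroll bar"; }
};

// Generic code: it only relies on the shared Repaint behavior and never
// needs to know which kinds of objects it was handed.
std::string RepaintAll(ScreenObject* objects[], int count) {
    std::string log;
    for (int i = 0; i < count; i++) {
        log += objects[i]->Repaint();
        log += ";";
    }
    return log;
}

std::string RepaintDemo() {
    Window window;
    ScrollBar scrollBar;
    ScreenObject* screen[2] = { &window, &scrollBar };
    return RepaintAll(screen, 2);
}
```

Adding a new kind of screen object (say, an icon) would mean writing one more derived class; RepaintAll itself would not change.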

Shared Behavior

Let me use our Stack example to illustrate the first of these. Our Stack implementation above could have been implemented with linked lists, instead of an array. Any code using a Stack shouldn't care which implementation is being used, except that the linked list implementation can't overflow. (In fact, we could also change the array implementation to handle overflow by automatically resizing the array as items are pushed on the stack.)

To allow the two implementations to coexist, we first define an abstract Stack, containing just the public member functions, but no data.

    class Stack {
      public:
        Stack();
        virtual ~Stack();                 // deallocate the stack
        virtual void Push(int value) = 0; // Push an integer, checking for overflow.
        virtual bool Full() = 0;          // Is the stack full?
    };

    // For g++, need these even though no data to initialize.
    Stack::Stack() {}
    Stack::~Stack() {}

The Stack definition is called a base class or sometimes a superclass. We can then define two different derived classes, sometimes called subclasses, which inherit behavior from the base class. (Of course, inheritance is recursive: a derived class can in turn be a base class for yet another derived class, and so on.) Note that I have prepended the functions in the base class with the keyword virtual, to signify that they can be redefined by each of the two derived classes. The virtual functions are initialized to zero, to tell the compiler that those functions must be defined by the derived classes.

Here's how we could declare the array-based and list-based implementations of Stack. The syntax : public Stack signifies that both ArrayStack and ListStack are kinds of Stacks, and share the same behavior as the base class.

    class ArrayStack : public Stack {  // the same as in Section C.3.1
      public:
        ArrayStack(int sz);   // Constructor: initialize variables, allocate space.
        ~ArrayStack();        // Destructor: deallocate space allocated above.
        void Push(int value); // Push an integer, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
      private:
        int size;             // The maximum capacity of the stack.
        int top;              // Index of the lowest unused position.
        int *stack;           // A pointer to an array that holds the contents.
    };

    class ListStack : public Stack {
      public:
        ListStack();
        ~ListStack();
        void Push(int value);
        bool Full();
      private:
        List *list;           // list of items pushed on the stack
    };

    ListStack::ListStack() {
        list = new List;
    }

    ListStack::~ListStack() {
        delete list;
    }

    void ListStack::Push(int value) {
        list->Prepend(value);
    }

    bool ListStack::Full() {
        return FALSE;         // this stack never overflows!
    }

The neat concept here is that I can assign pointers to instances of ListStack or ArrayStack to a variable of type Stack, and then use them as if they were of the base type.

    Stack *s1 = new ListStack;
    Stack *s2 = new ArrayStack(17);

    if (!s1->Full())
        s1->Push(5);
    if (!s2->Full())
        s2->Push(6);

    delete s1;
    delete s2;

The compiler automatically invokes ListStack operations for s1, and ArrayStack operations for s2; this is done by creating a procedure table for each object, where derived objects override the default entries in the table defined by the base class. The code above invokes the operations Full, Push, and delete by indirection through the procedure table, so that the code doesn't need to know which kind of object it is.

In this example, since I never create an instance of the abstract class Stack, I do not need to implement its functions. This might seem a bit strange, but remember that the derived classes are the various implementations of Stack, and Stack serves only to reflect the shared behavior between the different implementations.

Also note that the destructor for Stack is a virtual function but the constructor is not. Clearly, when I create an object, I have to know which kind of object it is, whether ArrayStack or ListStack. The compiler makes sure that no one creates an instance of the abstract Stack by mistake: you cannot instantiate any class whose virtual functions are not completely defined (in other words, if any of its functions are set to zero in the class definition).

But when I deallocate an object, I may no longer know its exact type. In the above code, I want to call the destructor for the derived object, even though the code only knows that I am deleting an object of class Stack. If the destructor were not virtual, then the compiler would invoke Stack's destructor, which is not at all what I want. This is an easy mistake to make (I made it in the first draft of this article!): if you don't define a destructor for the abstract class, the compiler will define one for you implicitly (and by the way, it won't be virtual, since you have a really unhelpful compiler). The result for the above code would be a memory leak, and who knows how you would figure that out!

Shared Implementation

What about sharing code, the other reason for inheritance? In C++, it is possible to use member functions of a base class in its derived class. (You can also share data between a base class and derived classes, but this is a bad idea for reasons I'll discuss later.)

Suppose that I wanted to add a new member function, NumPushed(), to both implementations of Stack. The ArrayStack class already keeps count of the number of items on the stack, so I could duplicate that code in ListStack. Ideally, I'd like to be able to use the same code in both places.
With inheritance, we can move the counter into the Stack class, and then invoke the base class operations from the derived class to update the counter.

    class Stack {
      public:
        virtual ~Stack();             // deallocate data
        virtual void Push(int value); // Push an integer, checking for overflow.
        virtual bool Full() = 0;      // return TRUE if full
        int NumPushed();              // how many are currently on the stack?
      protected:
        Stack();                      // initialize data
      private:
        int numPushed;
    };

    Stack::Stack() {
        numPushed = 0;
    }

    void Stack::Push(int value) {
        numPushed++;
    }

    int Stack::NumPushed() {
        return numPushed;
    }

We can then modify both ArrayStack and ListStack to make use of the new behavior of Stack. I'll only list one of them here:

    class ArrayStack : public Stack {
      public:
        ArrayStack(int sz);
        ~ArrayStack();
        void Push(int value);
        bool Full();
      private:
        int size;             // The maximum capacity of the stack.
        int *stack;           // A pointer to an array that holds the contents.
    };

    ArrayStack::ArrayStack(int sz) : Stack() {
        size = sz;
        stack = new int[size]; // Let's get an array of integers.
    }

    void ArrayStack::Push(int value) {
        ASSERT(!Full());
        stack[NumPushed()] = value;
        Stack::Push(value);    // invoke base class to increment numPushed
    }

There are a few things to note:

1. The constructor for ArrayStack needs to invoke the constructor for Stack, in order to initialize numPushed. It does that by adding : Stack() to the first line in the constructor:

    ArrayStack::ArrayStack(int sz) : Stack()

   The same thing applies to destructors. There are special rules for which get called first: the constructor/destructor for the base class or the constructor/destructor for the derived class. All I should say is, it's a bad idea to rely on whatever the rule is; more generally, it is a bad idea to write code which requires the reader to consult a manual to tell whether or not the code works!

2. I introduced a new keyword, protected, in the new definition of Stack. For a base class, protected signifies that those member data and functions are accessible to classes derived (recursively) from this class, but inaccessible to other classes. In other words, protected data is public to derived classes, and private to everyone else. For example, we need Stack's constructor to be callable by ArrayStack and ListStack, but we don't want anyone else to create instances of Stack. Hence, we make Stack's constructor a protected function. In this case, this is not strictly necessary since the compiler will complain if anyone tries to create an instance of Stack, because Stack still has an undefined virtual function, Full. By defining Stack::Stack as protected, you are safe even if someone comes along later and defines Stack::Full.

   Note however that I made Stack's data member private, not protected. Although there is some debate on this point, as a rule of thumb you should never allow one class to directly access the data in another, even among classes related by inheritance. Otherwise, if you ever change the implementation of the base class, you will have to examine and change all the implementations of the derived classes, violating modularity.

3. The interface for a derived class automatically includes all functions defined for its base class, without having to explicitly list them in the derived class. Although we didn't define NumPushed() in ArrayStack, we can still call it for those objects:

    ArrayStack *s = new ArrayStack(17);
    ASSERT(s->NumPushed() == 0); // should be initialized to 0

4. Conversely, even though we have defined a routine Stack::Push(), because it is declared as virtual, if we invoke Push() on an ArrayStack object, we will get ArrayStack's version of Push:

    Stack *s = new ArrayStack(17);
    if (!s->Full())  // ArrayStack::Full
        s->Push(5);  // ArrayStack::Push

5. Stack::NumPushed() is not virtual. That means that it cannot be re-defined by Stack's derived classes. Some people believe that you should mark all functions in a base class as virtual; that way, if you later want to implement a derived class that redefines a function, you don't have to modify the base class to do so.

6. Member functions in a derived class can explicitly invoke public or protected functions in the base class, by the full name of the function, Base::Function(), as in:

    void ArrayStack::Push(int value) {
        ...
        Stack::Push(value); // invoke base class to increment numPushed
    }

   Of course, if we just called Push() here (without prepending Stack::), the compiler would think we were referring to ArrayStack's Push(), and so that would recurse, which is not exactly what we had in mind here.
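The notes above can be checked against a compilable version of the shared-implementation design. What follows is a sketch under stated assumptions rather than the Nachos code: ASSERT is a stand-in built on the standard assert, ArrayStack::Full() and the destructors are filled in the obvious way since the text does not list them, and SharedCounterDemo is a hypothetical driver.

```cpp
#include <cassert>

#define ASSERT(condition) assert(condition)  // stand-in for the Nachos macro

class Stack {
  public:
    virtual ~Stack();              // virtual, so delete through Stack* works
    virtual void Push(int value);  // base version only counts the push
    virtual bool Full() = 0;       // still undefined: Stack stays abstract
    int NumPushed();               // shared, non-virtual
  protected:
    Stack();                       // callable only by derived classes
  private:
    int numPushed;                 // hidden even from derived classes
};

Stack::Stack() { numPushed = 0; }
Stack::~Stack() {}
void Stack::Push(int /* value */) { numPushed++; }
int Stack::NumPushed() { return numPushed; }

class ArrayStack : public Stack {
  public:
    ArrayStack(int sz);
    ~ArrayStack();
    void Push(int value);
    bool Full();
  private:
    int size;     // The maximum capacity of the stack.
    int* stack;   // A pointer to an array that holds the contents.
};

ArrayStack::ArrayStack(int sz) : Stack() {
    size = sz;
    stack = new int[size];
}

ArrayStack::~ArrayStack() { delete [] stack; }

bool ArrayStack::Full() { return NumPushed() == size; }  // assumed definition

void ArrayStack::Push(int value) {
    ASSERT(!Full());
    stack[NumPushed()] = value;
    Stack::Push(value);   // explicit base-class call; plain Push() would recurse
}

// Hypothetical driver: uses the object only through the base-class pointer.
int SharedCounterDemo() {
    Stack* s = new ArrayStack(2);
    s->Push(5);                    // dispatches to ArrayStack::Push
    s->Push(6);
    int pushed = s->NumPushed();   // inherited from Stack
    bool full = s->Full();         // dispatches to ArrayStack::Full
    delete s;                      // runs ~ArrayStack(), then ~Stack()
    return full ? pushed : -1;
}
```

Because ~Stack() is virtual, the delete through a Stack pointer releases the array allocated by ArrayStack; making the destructor non-virtual would reintroduce the leak described under Shared Behavior.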

© USQ, June 6, 2011


Whew! Inheritance in C++ involves lots and lots of details. But its real downside is that it tends to spread implementation details across multiple files; if you have a deep inheritance tree, it can take some serious digging to figure out what code actually executes when a member function is invoked.

So the question to ask yourself before using inheritance is: what's your goal? Is it to write your programs with the fewest number of characters possible? If so, inheritance is really useful, but so is changing all of your function and variable names to be one letter long ("a", "b", "c") and, once you run out of lower-case ones, start using upper case, then two-character variable names: "XX XY XZ Ya ..." (I'm joking here.) Needless to say, it is really easy to write unreadable code using inheritance.

So when is it a good idea to use inheritance and when should it be avoided? My rule of thumb is to only use it for representing shared behavior between objects, and to never use it for representing shared implementation. With C++, you can use inheritance for both concepts, but only the first will lead to truly simpler implementations.

To illustrate the difference between shared behavior and shared implementation, suppose you had a whole bunch of different kinds of objects that you needed to put on lists. For example, almost everything in an operating system goes on a list of some sort: buffers, threads, users, terminals, etc. A very common approach to this problem (particularly among people new to object-oriented programming) is to make every object inherit from a single base class Object, which contains the forward and backward pointers for the list. But what if some object needs to go on multiple lists? The whole scheme breaks down, and it's because we tried to use inheritance to share implementation (the code for the forward and backward pointers) instead of to share behavior. A much cleaner (although slightly slower) approach would be to define a list implementation that allocated forward/backward pointers for each object that gets put on a list.

In sum, if two classes share at least some of the same member function signatures (that is, the same behavior), and if there's code that only relies on the shared behavior, then there may be a benefit to using inheritance. In Nachos, locks don't inherit from semaphores, even though locks are implemented using semaphores. The operations on semaphores and locks are different. Instead, inheritance is only used for various kinds of lists (sorted, keyed, etc.), and for different implementations of the physical disk abstraction, to reflect whether the disk has a track buffer, etc. A disk is used the same way whether or not it has a track buffer; the only difference is in its performance characteristics.

C.4.2 Templates

Templates are another useful but dangerous concept in C++. With templates, you can parameterize a class definition with a type, to allow you to write generic type-independent code. For example, our Stack implementation above only worked for pushing and popping integers; what if we wanted a stack of characters, or floats, or pointers, or some arbitrary data structure? In C++, this is pretty easy to do using templates:

    template <class T>
    class Stack {
      public:
        Stack(int sz);        // Constructor: initialize variables, allocate space.
        ~Stack();             // Destructor: deallocate space allocated above.
        void Push(T value);   // Push a value of type T, checking for overflow.
        bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
      private:
        int size;             // The maximum capacity of the stack.
        int top;              // Index of the lowest unused position.
        T *stack;             // A pointer to an array that holds the contents.
    };

To define a template, we prepend the keyword template to the class definition, and we put the parameterized type for the template in angle brackets. If we need to parameterize the implementation with two or more types, it works just like an argument list: template <class T, class S>. We can use the type parameters elsewhere in the definition, just like they were normal types.

When we provide the implementation for each of the member functions in the class, we also have to declare them as templates, and again, once we do that, we can use the type parameters just like normal types:

    // template version of Stack::Stack
    template <class T>
    Stack<T>::Stack(int sz) {
        size = sz;
        top = 0;
        stack = new T[size];   // Let's get an array of type T
    }

    // template version of Stack::Push
    template <class T>
    void Stack<T>::Push(T value) {
        ASSERT(!Full());
        stack[top++] = value;
    }

Creating an object of a template class is similar to creating a normal object:

    void test() {
        Stack<int> s1(17);
        Stack<char> *s2 = new Stack<char>(23);

        s1.Push(5);
        s2->Push('z');
        delete s2;
    }


Everything operates as if we defined two classes, one called Stack<int> (a stack of integers), and one called Stack<char> (a stack of characters). s1 behaves just like an instance of the first; s2 behaves just like an instance of the second. In fact, that is exactly how templates are typically implemented: you get a complete copy of the code for the template for each different instantiated type. In the above example, we'd get one copy of the code for ints and one copy for chars.

So what's wrong with templates? You've all been taught to make your code modular so that it can be re-usable, so everything should be a template, right? Wrong. The principal problem with templates is that they can be very difficult to debug: templates are easy to use if they work, but finding a bug in them can be difficult. In part this is because current generation C++ debuggers don't really understand templates very well. Nevertheless, it is easier to debug a template than two nearly identical implementations that differ only in their types.

So the best advice is: don't make a class into a template unless there really is a near-term use for the template. And if you do need to implement a template, implement and debug a non-template version first. Once that is working, it won't be hard to convert it to a template. Then all you have to worry about is code explosion, e.g., your program's object code is now megabytes because of the 15 copies of the hash table/list/... routines, one for each kind of thing you want to put in a hash table/list/... (Remember, you have an unhelpful compiler!)
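For reference, here is one way the template fragments above might be filled out so that they compile on their own. This is a sketch under stated assumptions: Pop(), Empty(), the destructor body, and the DemoTemplates function are my additions for illustration, and the standard assert stands in for the ASSERT macro.

```cpp
#include <cassert>

template <class T>
class Stack {
  public:
    Stack(int sz);        // Constructor: initialize variables, allocate space.
    ~Stack();             // Destructor: deallocate space allocated above.
    void Push(T value);   // Push a value of type T, checking for overflow.
    T Pop();              // (assumed addition) Remove and return the top value.
    bool Full();          // True if the stack is full.
    bool Empty();         // (assumed addition) True if nothing has been pushed.
  private:
    int size;             // The maximum capacity of the stack.
    int top;              // Index of the lowest unused position.
    T *stack;             // A pointer to an array that holds the contents.
};

template <class T>
Stack<T>::Stack(int sz) {
    assert(sz > 0);
    size = sz;
    top = 0;
    stack = new T[size];
}

template <class T>
Stack<T>::~Stack() {
    delete [] stack;
}

template <class T>
void Stack<T>::Push(T value) {
    assert(!Full());
    stack[top++] = value;
}

template <class T>
T Stack<T>::Pop() {
    assert(!Empty());
    return stack[--top];
}

template <class T>
bool Stack<T>::Full() {
    return top == size;
}

template <class T>
bool Stack<T>::Empty() {
    return top == 0;
}

// Hypothetical demo: two instantiations, so two copies of the code are generated.
bool DemoTemplates() {
    Stack<int> s1(17);
    Stack<char> *s2 = new Stack<char>(23);

    s1.Push(5);
    s2->Push('z');
    int i = s1.Pop();
    char c = s2->Pop();
    delete s2;
    return i == 5 && c == 'z';
}
```

Note that the member function definitions live in the same file as the class; with most compilers the template bodies must be visible wherever the template is instantiated.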

C.5 Features To Avoid Like the Plague

Despite the length of this note, there are numerous features in C++ that I haven't explained. I'm sure each feature has its advocates, but despite programming in C and C++ for over 15 years, I haven't found a compelling reason to use them in any code that I've written (outside of a programming language class!).

Indeed, there is a compelling reason to avoid using these features: they are easy to misuse, resulting in programs that are harder to read and understand instead of easier to understand. In most cases, the features are also redundant; there are other ways of accomplishing the same end. Why have two ways of doing the same thing? Why not stick with the simpler one?

I do not use any of the following features in Nachos. If you use them, caveat hacker.

1. Multiple inheritance. It is possible in C++ to define a class as inheriting behavior from multiple classes (for instance, a dog is both an animal and a furry thing). But if programs using single inheritance can be difficult to untangle, programs with multiple inheritance can get really confusing.

2. References. Reference variables are rather hard to understand in general; they play the same role as pointers, with slightly different syntax (unfortunately, I'm not joking!). Their most common use is to declare some parameters to a function as reference parameters, as in Pascal. A call-by-reference parameter can be modified by the called function, without the caller having to pass a pointer. The effect is that parameters look (to the caller) like they are called by value (and therefore can't change), but in fact can be transparently modified by the called function. Obviously, this can be a source of obscure bugs, not to mention that the semantics of references in C++ are in general not obvious.

3. Operator overloading. C++ lets you redefine the meanings of the operators (such as + and >>) for class objects. This is dangerous at best ("exactly which implementation of + does this refer to?"), and when used in non-intuitive ways, a source of great confusion, made worse by the fact that C++ does implicit type conversion, which can affect which operator is invoked. Unfortunately, C++'s I/O facilities make heavy use of operator overloading and references, so you can't completely escape them, but think twice before you redefine + to mean "concatenate these two strings".

4. Function overloading. You can also define different functions in a class with the same name but different argument types. This is also dangerous (since it's easy to slip up and get the unintended version), and we never use it. We will also avoid using default arguments (for the same reason). Note that it can be a good idea to use the same name for functions in different classes, provided they use the same arguments and behave the same way; a good example of this is that most Nachos objects have a Print() method.

5. Standard template library. An ANSI standard has emerged for a library of routines implementing such things as lists, hash tables, etc., called the standard template library. Using such a library should make programming much simpler if the data structure you need is already provided in the library. Alas, the standard template library pushes the envelope of legal C++, and so virtually no compilers (including g++) can support it today. Not to mention that it uses (big surprise!) references, operator overloading, and function overloading.

6. Exceptions. There are two ways to return an error from a procedure. One is simple: just define the procedure to return an error code if it isn't able to do its job. For example, the standard library routine malloc returns NULL if there is no available memory. However, lots of programmers are lazy and don't check error codes. So what's the solution? You might think it would be to get programmers who aren't lazy, but no, the C++ solution is to add a programming language construct! A procedure can return an error by "raising an exception", which effectively causes a goto back up the execution stack to the last place the programmer put an exception handler. You would think this is too bizarre to be true, but unfortunately, I'm not making this up.
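The complaint about reference parameters in item 2 is easiest to see at the call site. The following is a minimal sketch; the function names are hypothetical and exist only for this illustration.

```cpp
// Call-by-reference: the function can modify the caller's variable,
// yet the call looks exactly like call-by-value.
void IncrementByReference(int &n) {
    n++;
}

// Pointer version: the caller has to write &x, so the possibility
// that x changes is visible where the call is made.
void IncrementByPointer(int *n) {
    (*n)++;
}

// Hypothetical demo comparing the two call sites.
int DemoReferences() {
    int x = 1;
    IncrementByReference(x);   // nothing here hints that x is modified
    IncrementByPointer(&x);    // the & warns the reader that x may change
    return x;                  // both calls incremented x
}
```

Both functions do the same work; the only difference is how much the reader of the calling code can tell without looking up the declaration.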

While I'm at it, there are a number of features of C that you also should avoid, because they lead to bugs and make your code less easy to understand. See Maguire's "Writing Solid Code" for a more complete discussion of this issue. All of these features are legal C; what's legal isn't necessarily good.

1. Pointer arithmetic. Runaway pointers are a principal source of hard-to-find bugs in C programs, because the symptom of this happening can be mangled data structures in a completely different part of the program. Depending on exactly which objects are allocated on the heap in which order, pointer bugs can appear and disappear, seemingly at random. For example, printf sometimes allocates memory on the heap, which can change the addresses returned by all future calls to new. Thus, adding a printf can change things so that a pointer which used to (by happenstance) mangle a critical data structure (such as the middle of a thread's execution stack), now overwrites memory that may not even be used.


The best way to avoid runaway pointers is (no surprise) to be very careful when using pointers. Instead of iterating through an array with pointer arithmetic, use a separate index variable, and assert that the index is never larger than the size of the array. Optimizing compilers have gotten very good, so that the generated machine code is likely to be the same in either case.

Even if you don't use pointer arithmetic, it's still easy (easy is bad in this context!) to have an off-by-one error that causes your program to step beyond the end of an array. How do you fix this? Define a class to contain the array and its length; before allowing any access to the array, you can then check whether the access is legal or in error.

2. Casts from integers to pointers and back. Another source of runaway pointers is that C and C++ allow you to convert integers to pointers, and back again. Needless to say, using a random integer value as a pointer is likely to result in unpredictable symptoms that will be very hard to track down. In addition, on some 64-bit machines, such as the Alpha, it is no longer the case that the size of an integer is the same as the size of a pointer. If you cast between pointers and integers, you are also writing highly non-portable code.

3. Using bit shift in place of a multiply or divide. This is a clarity issue. If you are doing arithmetic, use arithmetic operators; if you are doing bit manipulation, use bitwise operators. If I am trying to multiply by 8, which is easier to understand, x << 3 or x * 8? In the 70's, when C was being developed, the former would yield more efficient machine code, but today's compilers generate the same code in both cases, so readability should be your primary concern.

4. Assignment inside conditional. Many programmers have the attitude that simplicity equals saving as many keystrokes as possible. The result can be to hide bugs that would otherwise be obvious. For example:

    if (x = y) {
        ...

Was the intent really x == y? After all, it's pretty easy to mistakenly leave off the extra equals sign. By never using assignment within a conditional, you can tell by code inspection whether you've made a mistake.

5. Using #define when you could use enum. When a variable can hold one of a small number of values, the original C practice was to use #define to set up symbolic names for each of the values. enum does this in a type-safe way: it allows the compiler to verify that the variable is only assigned one of the enumerated values, and none other. Again, the advantage is to eliminate a class of errors from your program, making it quicker to debug.
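The suggestion in item 1 (wrap the array and its length in a class, and check every access) might look like the sketch below. IntArray, Sum, and DemoArray are hypothetical names invented for this illustration, and the standard assert stands in for ASSERT.

```cpp
#include <cassert>

// Hypothetical wrapper: an int array that remembers its own length
// and checks every access against it.
class IntArray {
  public:
    explicit IntArray(int n) : length(n), data(new int[n]) {
        for (int i = 0; i < length; i++)
            data[i] = 0;
    }
    ~IntArray() { delete [] data; }

    int Length() { return length; }

    int Get(int index) {
        assert(InBounds(index));   // catch the off-by-one here, not later
        return data[index];
    }

    void Set(int index, int value) {
        assert(InBounds(index));
        data[index] = value;
    }

  private:
    bool InBounds(int index) { return index >= 0 && index < length; }

    int length;   // number of elements
    int *data;    // the elements themselves
};

// Iterates with an index variable instead of pointer arithmetic.
int Sum(IntArray *a) {
    int total = 0;
    for (int i = 0; i < a->Length(); i++)
        total += a->Get(i);
    return total;
}

// Hypothetical demo: fill a three-element array and add it up.
int DemoArray() {
    IntArray a(3);
    a.Set(0, 1);
    a.Set(1, 2);
    a.Set(2, 3);
    return Sum(&a);
}
```

With this in place, writing "i <= a->Length()" in the loop aborts at the first bad access instead of silently reading past the end of the array. The sketch does not guard against copying an IntArray, which a fuller version would need to handle.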

C.6 Style Guidelines

Even if you follow the approach I've outlined above, it is still as easy to write unreadable and undebuggable code in C++ as it is in C, and perhaps easier, given the more powerful features the language provides. For the Nachos project, and in general, we suggest you adhere to the following guidelines (and tell us if you catch us breaking them):

1. Words in a name are separated SmallTalk-style (i.e., capital letters at the start of each new word). All class names and member function names begin with a capital letter, except for member functions of the form getSomething() and setSomething(), where Something is a data element of the class (i.e., accessor functions). Note that you would want to provide such functions only when the data should be visible to the outside world, but you want to force all accesses to go through one function. This is often a good idea, since you might at some later time decide to compute the data instead of storing it, for example.

2. All global functions should be capitalized, except for main and library functions, which are kept lower-case for historical reasons.

3. Minimize the use of global variables. If you find yourself using a lot of them, try to group some together in a class in a natural way, or pass them as arguments to the functions that need them if you can.

4. Minimize the use of global functions (as opposed to member functions). If you write a function that operates on some object, consider making it a member function of that object.

5. For every class or set of related classes, create a separate .h file and .cc file. The .h file acts as the interface to the class, and the .cc file acts as the implementation (a given .cc file should include its respective .h file). If using a particular .h file requires another .h file to be included (e.g., synch.h needs class definitions from thread.h) you should include the dependency in the .h file, so that the user of your class doesn't have to track down all the dependencies himself. To protect against multiple inclusion, bracket each .h file with something like:

    #ifndef STACK_H
    #define STACK_H

    class Stack {
        ...
    };

    #endif

Sometimes this will not be enough, and you will have a circular dependency. For example, you might have a .h file that uses a definition from one .h file, but also defines something needed by that .h file. In this case, you will have to do something ad-hoc. One thing to realize is that you don't always have to completely define a class before it is used. If you only use a pointer to class Stack and do not access any member functions or data from the class, you can write, in lieu of including stack.h:

    class Stack;

This will tell the compiler all it needs to know to deal with the pointer. In a few cases this won't work, and you will have to move stuff around or alter your definitions.

6. Use ASSERT statements liberally to check that your program is behaving properly. An assertion is a condition that, if FALSE, signifies that there is a bug in the program; ASSERT tests an expression and aborts if the condition is false. We used ASSERT above in Stack::Push() to check that the stack wasn't full. The idea is to catch errors as early as possible, when they are easier to locate, instead of waiting until there is a user-visible symptom of the error (such as a segmentation fault, after memory has been trashed by a rogue pointer).

Assertions are particularly useful at the beginnings and ends of procedures, to check that the procedure was called with the right arguments, and that the procedure did what it is supposed to. For example, at the beginning of List::Insert, you could assert that the item being inserted isn't already on the list, and at the end of the procedure, you could assert that the item is now on the list.

If speed is a concern, ASSERTs can be defined to make the check in the debug version of your program, and to be a no-op in the production version. But many people run with ASSERTs enabled even in production.

7. Write a module test for every module in your program. Many programmers have the notion that testing code means running the entire program on some sample input; if it doesn't crash, that means it's working, right? Wrong. You have no way of knowing how much code was exercised for the test. Let me urge you to be methodical about testing. Before you put a new module into a bigger system, make sure the module works as advertised by testing it standalone. If you do this for every module, then when you put the modules together, instead of hoping that everything will work, you will know it will work.

Perhaps more importantly, module tests provide an opportunity to find as many bugs as possible in a localized context. Which is easier: finding a bug in a 100 line program, or in a 10000 line program?
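Guidelines 6 and 7 can be sketched together in a few lines. Everything below is an illustration under assumptions: the macro is not the actual Nachos definition of ASSERT, PRODUCTION is a hypothetical compile-time flag, and IntList and TestIntList are made-up names standing in for a real module and its test.

```cpp
#include <cstdio>
#include <cstdlib>

// A sketch of an ASSERT-style macro; NOT the actual Nachos definition.
// Compiling with -DPRODUCTION (a hypothetical flag) turns every check into a no-op.
#ifdef PRODUCTION
#define ASSERT(condition) ((void)0)
#else
#define ASSERT(condition)                                                  \
    do {                                                                   \
        if (!(condition)) {                                                \
            std::fprintf(stderr, "Assertion failed: %s, line %d of %s\n",  \
                         #condition, __LINE__, __FILE__);                  \
            std::abort();                                                  \
        }                                                                  \
    } while (0)
#endif

// Hypothetical module: a small fixed-capacity list of distinct integers.
class IntList {
  public:
    IntList() : count(0) {}

    bool Contains(int item) {
        for (int i = 0; i < count; i++)
            if (items[i] == item)
                return true;
        return false;
    }

    void Insert(int item) {
        ASSERT(count < MaxItems);   // precondition: there is room
        ASSERT(!Contains(item));    // precondition: not already on the list
        items[count++] = item;
        ASSERT(Contains(item));     // postcondition: now on the list
    }

    int Count() { return count; }

  private:
    enum { MaxItems = 8 };
    int items[MaxItems];
    int count;
};

// Module test: exercises IntList standalone, before it joins a bigger system.
bool TestIntList() {
    IntList list;
    if (list.Count() != 0 || list.Contains(3))
        return false;
    list.Insert(3);
    list.Insert(9);
    return list.Count() == 2 && list.Contains(3) &&
           list.Contains(9) && !list.Contains(4);
}
```

The do/while(0) wrapper is a common idiom that lets the macro be used like an ordinary statement, including inside an if without braces.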

C.7 Compiling and Debugging

The Makefiles we will give you work only with the GNU version of make, called "gmake". You may want to put "alias make gmake" in your .cshrc file.

You should use gdb to debug your program rather than dbx. Dbx doesn't know how to decipher C++ names, so you will see function names like Run__9SchedulerP6Thread.

On the other hand, in GDB (but not DBX), when you do a stack backtrace in a forked thread (in homework 1), after printing out the correct frames at the top of the stack, the debugger will sometimes go into a loop printing the lower-most frame (ThreadRoot), and you have to type control-C when it says "more?". If you understand assembly language and can fix this, please let me know.

C.8 Example: A Stack of Integers

We've provided the complete, working code for the stack example. You should read through it and play around with it to make sure you understand the features of C++ described in this paper.

To compile the simple stack test, type "make all"; this will compile the simple stack test (stack.cc), the inherited stack test (inheritstack.cc), and the template version of stacks (templatestack.cc).


C.9 Epilogue

I've argued in this note that you should avoid using certain C++ and C features. But you're probably thinking I must be leaving something out: if someone put the feature in the language, there must be a good reason, right? I believe that every programmer should strive to write code whose behavior would be immediately obvious to a reader; if you find yourself writing code that would require someone reading the code to thumb through a manual in order to understand it, you are almost certainly being way too subtle. There's probably a much simpler and more obvious way to accomplish the same end. Maybe the code will be a little longer that way, but in the real world, it's whether the code works and how simple it is for someone else to modify that matters a whole lot more than how many characters you had to type.

A final thought to remember:

    "There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies."

    C. A. R. Hoare, "The Emperor's Old Clothes", CACM Feb. 1981

C.10 Further Reading

James Coplien, "Advanced C++", Addison-Wesley. This book is only for experts, but it has some good ideas in it, so keep it in mind once you've been programming in C++ for a few years.

James Gosling, "The Java Language". Online at http://java.sun.com/. Java is a safe subset of C++. Its main application is the safe extension of Web browsers by allowing you to download Java code as part of clicking on a link to interpret and display the document. Safety is key here, since after all, you don't want to click on a Web link and have it download code that will crash your browser. Java was defined independently of this document, but interestingly, it enforces a very similar style (for example, no multiple inheritance and no operator overloading).

C. A. R. Hoare, "The Emperor's Old Clothes", Communications of the ACM, Vol. 24, No. 2, February 1981, pp. 75-83. Tony Hoare's Turing Award lecture. How do you build software that really works? Attitude is everything: you need a healthy respect for how hard it is to build working software. It might seem that adding this whiz-bang feature is only a "small matter of code", but that's the path to late, buggy products that don't work.

Brian Kernighan and Dennis Ritchie, "The C Programming Language", Prentice-Hall. The original C book, and a very easy read. The language has evolved since it was first designed, and this book doesn't describe all of C's newest features, but it is still the best place for a beginner to start, even when learning C++.

Steve Maguire, "Writing Solid Code", Microsoft Press. How to write bug-free software; I think this should be required reading for all software engineers. This really will change your life: if you don't follow the recommendations in this book, you'll probably never write code that completely works, and you'll spend your entire life struggling with hard-to-find bugs. There is a better way! Contrary to the programming language types, this doesn't involve proving the correctness of your programs, whatever that means. Instead, Maguire has a set of practical engineering solutions to writing solid code.

Steve Maguire, "Debugging the Development Process", Microsoft Press. Maguire's follow-up book on how to lead an effective team, and by the way, how to be an effective engineer. Maguire's background is that he is a turnaround artist for Microsoft: he gets assigned to floundering teams, and figures out how to make them effective. After you've pulled a few all-nighters to get that last bug out of your course project, you're probably wondering why in heck you're studying computer science anyway. This book will explain how to write programs that work, and still have a life!

Scott Meyers, "Effective C++". This book describes 50 easy ways to make mistakes in C++; if you avoid these, you will be a lot more likely to write C++ code that works.

Bjarne Stroustrup, "The C++ Programming Language", Addison-Wesley. This should be the definitive reference manual, but it isn't. You probably thought I was joking when I said the C++ language was continually evolving. I bought the second edition of this book three years ago, and it is already out of date. Fortunately, it's still OK for the subset of C++ that I use.
