
RISC: The Processor Architecture of the Future

Introduction
In this essay I shall argue the benefits of the RISC school of processor design over more traditional instruction set architectures, while at the same time telling the story of the development of RISC in the wider context of the history of computers. RISC stands for 'Reduced Instruction Set Computer', the basic premise of which is as follows: conventional processors spend most of their time executing only a small subset of simple instructions. By providing only these commonly used instructions and abstaining from complex patterns of memory access, RISC processors can be simpler, and hence run faster and more efficiently.

The story of RISC really begins with the early computers of the 1940s, such as Colossus and ENIAC, where there were no stored programs at all - only hardware patch-panels. Programming these machines was a matter of altering the physical configuration of the circuits. The subsequent development of stored-program machines such as the Manchester Mark I allowed programs to be loaded into memory along with data. Now programs could be written in machine code, or its symbolic equivalent - assembly language.

However, writing software in assembly language with a simple instruction set is a laborious and time-consuming process: "It is like directing someone by telling him which muscles to move" (Ceramalus, 1998). To make life easier for programmers, hardware designers began providing more and more powerful instructions - a trend that would continue until the end of the 1970s. The instructions being built into machines became so complex that they had to be realised using microcode rather than random logic: high-level instructions are interpreted into microinstructions, each designed to execute in a single cycle. Another strategy developed to ease the programmers' lot was the creation of high-level languages, in which programs are written and then interpreted or compiled into low-level machine instructions. The full significance of this development was not initially appreciated by the hardware engineers, however, who continued designing computers to be programmed in assembly language.

The 801 project

The birthplace of the modern-day RISC architecture was undoubtedly IBM's Thomas J. Watson Research Center, where in 1974 a team of engineers began work on a project to design a large telephone switching network capable of handling 300 calls per second. Given the stringent real-time response requirements, an ambitious performance target of 12 MIPS was set. "This specialised application required a very fast processor, but did not have to perform complicated instructions and had little demand for floating-point calculations." (Cocke and Markstein, 1990) The telephone project was terminated in 1975, with no machine having been built. However, the design had a number of promising features that seemed an ideal basis for a high-performance minicomputer with a good cost/performance ratio:

- Separate instruction and data caches, allowing high-bandwidth memory access.
- No arithmetic operations to storage, allowing a greatly simplified pipeline (see the sketch after this list).
- Uniform instruction length and simplicity of design (only ten levels of logic), making possible a very short cycle time.
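To make the load/store restriction concrete, the following C sketch contrasts a storage-to-storage style operation with its decomposition into the register-to-register primitives a machine like the 801 would provide. The function names and register assignments are invented for illustration - this is a sketch of the idea, not IBM's code.

```c
#include <stdint.h>

/* A storage-to-storage architecture might offer a single instruction
 * that adds one memory operand to another (e.g. ADD [b], [a]). */
void add_storage_to_storage(int32_t *a, int32_t *b)
{
    *b += *a;                 /* one complex instruction */
}

/* A load/store machine forbids arithmetic on memory operands, so the
 * same effect becomes explicit loads, a register-to-register add,
 * and a store - each a simple, uniformly shaped step. */
void add_load_store(int32_t *a, int32_t *b)
{
    int32_t r1 = *a;          /* LOAD  r1, [a]    */
    int32_t r2 = *b;          /* LOAD  r2, [b]    */
    int32_t r3 = r1 + r2;     /* ADD   r3, r1, r2 */
    *b = r3;                  /* STORE r3, [b]    */
}
```

Because each step in the second form touches memory at most once and has a uniform shape, each can flow through a single, simple pipeline stage.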

These - the split caches, the load/store discipline and the uniform instruction length - are still the basic principles underlying RISC machines today, although of course at the time that acronym had not yet been invented. Instead, the new minicomputer project was named '801' after the number of the building where the research was taking place. Contemporary machines such as the System/370 had a large number of complex instructions, the prevailing wisdom in the 1970s being that "...the more instructions you could pack into a machine the better..." (Ceramalus, 1998). However, programmers and compilers were making little or no use of a large number of these instructions. As Radin (1983) explains, "...in fact these instructions are often hard to use, since the compiler must find those cases which exactly fit the architected construct." It was not until researchers analysed the vast amount of data that IBM had on instruction frequencies that this waste was noticed: "...it was clear that LOAD, STORE, BRANCH, FIXED-POINT ADD, and FIXED-POINT COMPARE accounted for well over half of the total execution time in most application areas." (Cocke and Markstein, 1990)

The key realisation of the 801 team was that not only were the complex high-level instructions rarely used, but they were having a "...pernicious effect on the primitive instructions..." (Radin, 1983). If the presence of complex instructions adds extra logic levels to the basic machine cycle, or if instructions have to be interpreted into microcode, then the whole CPU is slowed down: "Imposing microcode between a computer and its users imposes an expensive overhead in performing the most frequently executed instructions." (Cocke and Markstein, 1990) Since the goal of the 801 was to execute the most frequently used instructions as fast as possible, many of the more complex System/370 instructions were deliberately excluded from the instruction set. This left a machine that was "...very similar to a vertical microcode engine..." (Cocke and Markstein, 1990), "...but instead of 'hiding' this attribute behind a complex instruction set in microcode, we exposed it directly to the end user." Apart from efficiency gains in the execution of the simplest instructions, programming complex functions as macros or procedures rather than in hardware was found to have interesting advantages:

- The CPU is very responsive to interrupts, being interruptible at "microcode" boundaries: CISC architectures must either restrict interrupts to coarse-grained instruction boundaries, or define interruptible points and deal with the complexities of guaranteeing atomicity and re-starting instructions.
- An optimising compiler can often separate and rearrange the components of a function, e.g. moving some parts out of a loop or rescheduling memory accesses.
- It is often possible for parts of a complex instruction to be computed at compile time: for multiplication by a constant, a compiler can often substitute more efficient shift/add sequences, as the sketch after this list illustrates.
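As a concrete illustration of that last point, here is a small C sketch of the classic strength-reduction transformation. The function names are invented for this example; the arithmetic identity itself (10x = 8x + 2x) is what the compiler exploits.

```c
#include <stdint.h>

/* What the programmer writes: a multiply by the constant 10,
 * which on many machines would become a (slow) MUL instruction. */
uint32_t times_ten(uint32_t x)
{
    return x * 10;
}

/* What an optimising compiler can substitute at compile time:
 * 10*x = 8*x + 2*x = (x << 3) + (x << 1), using only a shift-add
 * sequence - primitive operations a RISC executes in one cycle each. */
uint32_t times_ten_reduced(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

No hardware multiply is needed at all for this common case - exactly the kind of saving the 801 team relied on when moving complexity from the hardware into the compiler.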

Of course there were also some disadvantages to dispensing with microcode:

- One of the benefits of vertical microcode is the residence of micro-instructions in a high-speed control store: "This amounts to a hardware architect attempting to guess which subroutines, or macros, are most frequently used and assigning high-speed memory to them." (Radin, 1983)
- Inevitably, simple instructions reduce the code density of RISC machines, and hence programs take up more memory.

In order to match the performance characteristics of vertical microcode, the 801 fetched instructions via a high-speed "least-recently-used" cache, in which all frequently used functions were likely to be found. Moreover, the simplified pipeline allowed instructions to be fetched more easily. In the event, serious problems with the code density of RISC machines did not materialise. "The code sequences were not unduly long or unnatural. In later years, path-length comparisons between RISC and CISC architectures have been shown to be very nearly equal." (Cocke and Markstein, 1990) This result stems partly from the comparative rarity of complex instructions in conventional code, but also from the fact that the 801's regular instruction format and large number of registers allowed the compiler to perform greater optimisation. In any case, at the time "...memory became cheaper... and the motivation for making small but really powerful instructions faded." (Ho, 1998)

When the 801 minicomputer was eventually built in 1978, it was IBM's fastest experimental processor. However, the project was terminated by IBM in 1980 without the 801 ever reaching the market.

Berkeley RISC
In 1980, at the University of California, Berkeley, Dr David Patterson and his team began a Reduced Instruction Set Computer project, in order to investigate "...an alternative to the general trend toward computers with increasingly complex instruction sets..." (Patterson and Sequin, 1981). Their objective was the design of a VLSI computer that would reduce "...the delay-power penalty of data transfers across chip boundaries...", and make better use of "...the still-limited amount of resources (devices) available on a single chip." This revolutionary machine was named 'RISC-I' - the first time that the acronym had been used.

We cannot know for certain whether or not Patterson truly had heard "...rumours of the 801...", as Ceramalus (1998) suggests, but he was certainly averse to the complexity of architectures such as the Intel iAPX-432, citing "...increased design time, increased design errors and inconsistent implementations." (Patterson and Sequin, 1981) Ceramalus (1998) is also voluble in his condemnation of the iAPX-432 as a system that "...represents the height of CISC lunacy." The ill-fated iAPX-432 reportedly ran "...slower than older 8-bit systems...", being spread across 12 chips, with 222 instructions that varied in length from 6 to 321 bits!

At Berkeley, the team independently came to conclusions about desirable architectural constraints similar to those of the 801 project (although at the time none of the IBM research had been published):

- Simple instructions allow one instruction to be executed per cycle.
- All instructions are the same size, to simplify implementation (see the decoder sketch after this list).
- Only load and store instructions access memory; the rest operate between registers.
- Multiple banks of registers, in order to reduce off-chip memory accesses.
- Architecture designed with the needs of high-level language programming in mind.
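The second constraint is easy to demonstrate. Below is a minimal C sketch of a fixed-width instruction decoder; the 32-bit field layout is invented for illustration and is not RISC-I's actual encoding. The point is that every instruction can be decoded by the same unconditional bit-slicing - the hardware never has to examine the opcode first just to discover where the instruction ends.

```c
#include <stdint.h>

/* Hypothetical fixed 32-bit instruction format (not RISC-I's real
 * encoding): opcode | dest | src1 | src2/immediate. */
typedef struct {
    uint32_t opcode;    /* bits 31..25 */
    uint32_t dest;      /* bits 24..20 */
    uint32_t src1;      /* bits 19..15 */
    uint32_t src2_imm;  /* bits 14..0  */
} Insn;

/* Every instruction decodes through the same fixed bit-slicing:
 * no variable-length parsing, no extra fetch for trailing bytes. */
Insn decode(uint32_t word)
{
    Insn i;
    i.opcode   = (word >> 25) & 0x7f;   /* 7-bit opcode   */
    i.dest     = (word >> 20) & 0x1f;   /* 5-bit register */
    i.src1     = (word >> 15) & 0x1f;   /* 5-bit register */
    i.src2_imm =  word        & 0x7fff; /* 15-bit field   */
    return i;
}
```

In hardware the same property reduces decoding to a handful of wire selections, which is exactly what keeps the logic depth - and hence the cycle time - low.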

Like Radin (1983) before them, Patterson and Sequin (1981) observed that "...this simplicity makes microcode control unnecessary. Skipping this extra level of interpretation appears to enhance performance while reducing chip size." Furthermore, "...the RISC programs were only about 50% larger than the programs for the other machines, even though size optimisation was virtually ignored."

Apart from these general RISC findings, the main contribution of the Berkeley project to the field was the invention of overlapping sets of register banks (or 'register windows') to enable parameters to be passed directly to subroutines. This system was developed with the goal that "...procedure CALL must be as fast as possible...", because "...the complex instructions found in CISCs are subroutines in RISC..." (Patterson and Sequin, 1981) Under the register-window scheme, the register set was broken into four groups: GLOBAL registers are preserved across procedure calls; HIGH registers contain parameters passed from 'above' the current procedure; LOCAL registers are used for local variables; and LOW registers are used to pass parameters to procedures 'below'. On procedure call, the hardware overlaps the register windows, such that the caller's LOW registers appear as the called procedure's HIGH registers. This innovative approach avoids the time-consuming operations of saving registers to memory on procedure call and restoring them on return (thus making the desired saving in "...data transfers across chip boundaries..."). The system was later adopted for Sun's SPARC architecture; a toy model of the mechanism is sketched below.

Patterson and Sequin (1981) conclude that having "...taken out most of the complexity of modern computers... we can build a single-chip computer much sooner than the traditional architectures..." One such modern-day VLSI computer is the ARM7500FE 'system chip', which integrates a RISC processor, a floating-point co-processor, a video/sound controller and a memory/IO controller. The fact that this is possible supports Dr Patterson's scepticism about whether "...the extra hardware needed to implement CISC is the best way to use [the limited number of transistors available on a single chip]..." - it isn't!
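Here is that toy model of the register-window mechanism, in C. The window and file sizes are invented for illustration (RISC-I's real register file was larger, and the GLOBAL and LOCAL groups are omitted for brevity); the point is purely the aliasing: sliding the window pointer on CALL makes the caller's LOW registers reappear as the callee's HIGH registers, with no memory traffic at all.

```c
#include <stdio.h>

#define WINDOW_STEP 8   /* registers shared between adjacent windows */
#define FILE_SIZE   64  /* total windowed registers (illustrative)   */

static int reg_file[FILE_SIZE];
static int cwp = 0;     /* current window pointer (base index)       */

/* A procedure's HIGH registers start at its window base; its LOW
 * registers start one step further on - which is exactly where the
 * next window's HIGH registers begin. */
int *high(int n) { return &reg_file[cwp + n]; }
int *low(int n)  { return &reg_file[cwp + WINDOW_STEP + n]; }

void call(void) { cwp += WINDOW_STEP; }  /* slide window on CALL     */
void ret(void)  { cwp -= WINDOW_STEP; }  /* slide back on RETURN     */
/* (Real hardware also spills the oldest window to memory when the
 * register file overflows; that complication is omitted here.) */

int main(void)
{
    *low(0) = 42;        /* caller writes a parameter into LOW r0    */
    call();              /* no registers are copied or saved...      */
    printf("callee sees %d in HIGH r0\n", *high(0));  /* prints 42   */
    ret();
    return 0;
}
```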

Of course, the actual manner in which the spare 'room' on RISC chips is used is a matter for the designers. In the case of the high-performance Digital Alpha 21164 (of which more later), the extra space allows a large 96KB on-chip level-2 cache, which inflates the transistor count from a modest 1.8 million to 9.3 million.

The road to acceptance
