
Unit II

Introduction to Compilers
Partha Sarathi Chakraborty
Assistant Professor
Department of Computer Science and Engineering
SRM University, Delhi NCR Campus

(C) 2014, Prepared by Partha Sarathi Chakraborty

Compiler Course Outline


(Unit 2 to Unit 5)

Ch. 1: Introduction
Ch. 3: Lexical Analysis and Lex/Flex
Ch. 4: Syntax Analysis and Yacc/Bison
Ch. 5: Syntax-Directed Translation
Ch. 6: Type Checking
Ch. 7: Run-Time Environments
Ch. 8: Intermediate Code Generation
Ch. 9: Code Generation
Ch. 10: Code Optimization
Reference:
Aho, Ullman, and Sethi, Compilers: Principles, Techniques, and Tools

Textbook


Compilers: Principles, Techniques, and Tools, 2/E.

Alfred V. Aho, Columbia University


Monica S. Lam, Stanford University
Ravi Sethi, Avaya Labs
Jeffrey D. Ullman, Stanford University

Commonly known as the "Dragon Book"


Contents

Overview and History


What Do Compilers Do
Compiler and Interpreters
Phases of a compiler
Analysis of source program / Translation
Analysis in Text Formatters (TeX)
Features of Good Programming Language
The Syntax and Semantics of Programming Language
Computer Architecture and Compiler Design
Compiler Construction Tools

Overview and History


Cause


Software for early computers was written in assembly language


The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler

The first real compiler


FORTRAN compilers of the late 1950s
18 person-years to build

Overview and History (Contd)


Compiler technology
is more broadly applicable and has been employed in
rather unexpected areas.
Text-formatting languages, like nroff and troff;
preprocessor packages like eqn, tbl, pic
Silicon compiler for the creation of VLSI circuits
Command languages of OS
Query languages of Database systems

What Do Compilers Do


Compilers may generate three types of code:


Pure Machine Code
Machine instruction set without assuming the existence
of any operating system or library.
Mostly used for operating systems or embedded applications.

Augmented Machine Code


Code with OS routines and runtime support routines.
This is the most common case.

Virtual Machine Code


Virtual instructions that can be run on any architecture with
a virtual machine interpreter or a just-in-time compiler
Ex. Java bytecode


What Do Compilers Do (Contd)


Another way that compilers differ from one
another is in the format of the target machine
code they generate:
Assembly or other source format
Relocatable binary
Relative address
A linkage step is required

Absolute binary
Absolute address
Can be executed directly

Compilers and Interpreters



Compilation
Translation of a program written in a source
language into a semantically equivalent
program written in a target language
[Diagram: Source Program -> Compiler -> Target Program, with the Compiler also emitting Error messages; the Target Program then takes Input and produces Output]


Compilers
Source languages: Fortran, Pascal, C, etc.
Target languages: another programming language, machine language
Compilers:

Single-pass
Multi-pass
Load-and-Go
Debugging
Optimizing


Compilers and Interpreters (Contd)

Interpretation
Performing the operations implied by the
source program

[Diagram: Source Program and Input -> Interpreter -> Output, with the Interpreter also emitting Error messages]
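As a rough illustration of the two models, the sketch below handles the same toy expression both ways. The tuple-shaped tree, the stack-machine instruction names (PUSH, LOAD, ADD, MUL) and all function names are assumptions made for this example and do not come from the slides.

```python
# Toy expression tree: ("num", 3), ("id", "x"), or (op, left, right) with op in "+", "*".

def interpret(tree, env):
    """Interpreter: perform the operations implied by the source directly."""
    kind = tree[0]
    if kind == "num":
        return tree[1]
    if kind == "id":
        return env[tree[1]]
    left, right = interpret(tree[1], env), interpret(tree[2], env)
    return left + right if kind == "+" else left * right


def compile_expr(tree):
    """Compiler: translate the source into an equivalent target program,
    here a list of instructions for a hypothetical stack machine."""
    kind = tree[0]
    if kind == "num":
        return [("PUSH", tree[1])]
    if kind == "id":
        return [("LOAD", tree[1])]
    opcode = "ADD" if kind == "+" else "MUL"
    return compile_expr(tree[1]) + compile_expr(tree[2]) + [(opcode, None)]


def run(code, env):
    """Execute the target program; this step needs no access to the source."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
    return stack.pop()


source = ("+", ("id", "x"), ("*", ("num", 2), ("id", "y")))  # x + 2 * y
env = {"x": 1, "y": 3}
print(interpret(source, env), run(compile_expr(source), env))  # both evaluate to 7
```

The interpreter needs the source and the input together, while the compiled instruction list can be run later with different inputs.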


Other Tools that Use the Analysis-Synthesis Model

Editors (syntax highlighting)


Pretty printers (e.g. doxygen)
Static checkers (e.g. lint and splint)
Interpreters
Text formatters (e.g. TeX and LaTeX)
Silicon compilers (e.g. VHDL)
Query interpreters/compilers (Databases)


Preprocessors, Compilers, Assemblers, and Linkers

Skeletal Source Program
  -> Preprocessor
Source Program
  -> Compiler
Target Assembly Program
  -> Assembler
Relocatable Object Code
  -> Linker (also takes Libraries and Relocatable Object Files)
Absolute Machine Code

Try for example:

gcc -v myprog.c

The Analysis-Synthesis Model of Compilation

There are two parts to compilation:


Analysis determines the operations implied by the
source program which are recorded in a tree
structure
Synthesis takes the tree structure and translates the
operations therein into the target program


Phases of Compiler


The Structure of a Compiler

Source Program (Character Stream)
  -> Scanner
Tokens
  -> Parser
Syntactic Structure
  -> Semantic Routines
Intermediate Representation
  -> Optimizer
  -> Code Generator
Target machine code

Symbol and Attribute Tables (Used by all Phases of the Compiler)

The Structure of a Compiler (Contd)

Scanner
The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
characters into individual words and symbols (tokens)
RE (Regular Expression)
NFA (Non-deterministic Finite Automata)
DFA (Deterministic Finite Automata)
LEX
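The scanner's job can be sketched with regular expressions, in the spirit of a Lex specification. The token classes (NUMBER, ID, OP), the sample statement and the function name scan are illustrative assumptions; Python's re module stands in for the DFA that a scanner generator would build.

```python
import re

# Each token class is described by a regular expression, as in a Lex file.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),          # whitespace is recognized but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Group the character stream into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # Lexical error: no token can be formed at this position.
            raise SyntaxError(f"cannot form a token at position {pos}: {text[pos]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("position = initial + rate * 60"))
```

A character that matches no pattern, such as `$` in this toy specification, is reported as a lexical error.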

The Structure of a Compiler (Contd)

Parser
Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups
them into units as specified by the productions of the CFG
being used.
As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
tree.
CFG (Context-Free Grammar)
BNF (Backus-Naur Form)
GAA (Grammar Analysis Algorithms)
LL, LR, SLR, LALR Parsers
YACC
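One way to picture the parser is a small hand-written recursive-descent (LL-style) parser, used here in place of a Yacc-generated LR parser. The grammar (expr -> term { + term }, term -> factor { * factor }, factor -> NUMBER | ID | ( expr )), the token format and the tuple-shaped syntax tree are all assumptions for illustration.

```python
def parse(tokens):
    """Build a syntax tree from a list of (token class, lexeme) pairs."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():                      # expr -> term { '+' term }
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():                      # term -> factor { '*' factor }
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():                    # factor -> NUMBER | ID | '(' expr ')'
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", int(text))
        if kind == "ID":
            return ("id", text)
        if (kind, text) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


tokens = [("ID", "a"), ("OP", "+"), ("ID", "b"), ("OP", "*"), ("NUMBER", "2")]
print(parse(tokens))  # ('+', ('id', 'a'), ('*', ('id', 'b'), ('num', 2)))
```

Each function corresponds to one nonterminal, so the call structure mirrors the productions, and the nesting of the result reflects that `*` binds tighter than `+`.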

The Structure of a Compiler (Contd)

Semantic Routines
Perform two functions
Check the static semantics of each construct
Do the actual translation
The heart of a compiler
Syntax Directed Translation
Semantic Processing Techniques
IR (Intermediate Representation)
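The first of the two functions, checking static semantics, might look like the following sketch. It walks a tuple-shaped syntax tree and computes a type for every node. The type names ("int", "float", "array", "proc"), the plain dict used as a symbol table and the function name check are assumptions for this example.

```python
def check(tree, symtab):
    """Return the static type of an expression tree, or raise on a semantic error."""
    kind = tree[0]
    if kind == "num":
        return "float" if isinstance(tree[1], float) else "int"
    if kind == "id":
        if tree[1] not in symtab:
            raise NameError(f"undeclared identifier {tree[1]!r}")
        return symtab[tree[1]]
    # Binary operator: check both operands first, then the operator itself.
    left, right = check(tree[1], symtab), check(tree[2], symtab)
    for operand_type in (left, right):
        if operand_type not in ("int", "float"):
            raise TypeError(f"operator {kind!r} cannot be applied to {operand_type}")
    # Mixed int/float arithmetic is widened to float in this toy language.
    return "float" if "float" in (left, right) else "int"


symtab = {"rate": "float", "count": "int", "table": "array", "init": "proc"}
print(check(("+", ("id", "rate"), ("num", 60)), symtab))  # float
```

Adding `table` and `init` in this setup raises a TypeError before any code is generated, which is the sense in which the check is static.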

The Structure of a Compiler (Contd)

Optimizer
The IR code generated by the semantic routines is
analyzed and transformed into functionally equivalent but
improved IR code
This phase can be very complex and slow
Peephole optimization, loop optimization, register allocation, code scheduling
Register and Temporary Management
Peephole Optimization
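Peephole optimization can be illustrated with a very small pass that slides a three-instruction window over the code and folds constant arithmetic. The stack-machine instructions (PUSH, LOAD, ADD, MUL) are a hypothetical IR chosen for this sketch. Real optimizers apply many more patterns.

```python
def peephole(code):
    """Replace PUSH a, PUSH b, ADD|MUL with a single PUSH of the computed constant."""
    out = []
    for instr in code:
        out.append(instr)
        window = out[-3:]
        if (len(window) == 3
                and window[0][0] == "PUSH" and window[1][0] == "PUSH"
                and window[2][0] in ("ADD", "MUL")):
            a, b, op = window[0][1], window[1][1], window[2][0]
            del out[-3:]
            out.append(("PUSH", a + b if op == "ADD" else a * b))
    return out


# x + 2 * 30: the constant product is folded, the addition with x is not.
before = [("LOAD", "x"), ("PUSH", 2), ("PUSH", 30), ("MUL", None), ("ADD", None)]
print(peephole(before))  # [('LOAD', 'x'), ('PUSH', 60), ('ADD', None)]
```

The output computes the same value as the input with fewer instructions, which is the "functionally equivalent but improved" property named on the slide.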

The Structure of a Compiler (Contd)

Code Generator
Interpretive Code Generation
Generating Code from Tree/DAG
Grammar-Based Code Generator

Symbol-table Management

To record the identifiers in the source program
An identifier is detected by lexical analysis and then
stored in the symbol table

To collect the attributes of identifiers
(not by lexical analysis)
Storage allocation: memory address
Types
Scope (where it is valid, local or global)
Arguments (in case of procedure names)
Number and types of arguments
Method of passing each argument (e.g. by value or by reference)
Return types

Symbol-table Management (Contd)

Semantic analysis uses type information to check
the type consistency of identifiers
Code generation uses storage-allocation
information to generate properly relocatable
address code
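A symbol table with nested scopes can be sketched as a stack of dictionaries. The class name SymbolTable, its method names and the attribute keys (type, address) are hypothetical choices for this example. Production compilers typically use hashed or tree-structured tables with richer attribute records.

```python
class SymbolTable:
    """Maps identifiers to attribute records, with nested scopes."""

    def __init__(self):
        self.scopes = [{}]                 # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier {name!r}")


table = SymbolTable()
table.declare("rate", type="float", address=8)   # global
table.enter_scope()
table.declare("rate", type="int", address=0)     # local, hides the global
print(table.lookup("rate")["type"])              # int
table.exit_scope()
print(table.lookup("rate")["type"])              # float
```

Semantic analysis would read the type attribute from such a record, and code generation would read the address attribute.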


Error Detection and Reporting

Syntax and semantic analysis handle a large fraction
of errors
Errors may be detected in every compilation phase
Lexical phase: could not form any token
e.g. misspelling or juxtaposing of characters
Syntax phase: tokens violate structure rules
e.g. unbalanced parentheses, missing punctuation or operators
Semantic phase: no meaning of operations
e.g. adding an array name and a procedure name, undeclared variables,
truncation of results or unreachable code

Translation of a Statement

[Figure: translation of a statement through the compiler phases (three slides)]

Properties and Advantages of Intermediate Code

Properties:
It should be easy to produce.
It should be easy to translate into the target machine.

Advantages (in the current scenario, the internet era):

Consistent and Continuous Programming Model


Develop once and run anywhere
Simplified deployment
Wide platform reach
Programming language integration
Simplified code reuse
Interoperability


Intermediate Code Representation


Two kinds of representations
Trees
Parse Trees, Syntax Trees

Linear representations
Three-address code
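The relation between the two kinds of representation can be sketched by flattening a syntax tree into three-address code, where each instruction has at most one operator and results go into fresh temporaries. The tuple-shaped tree, the temporary names t1, t2, ... and the function name three_address are assumptions for illustration.

```python
import itertools


def three_address(tree):
    """Flatten an expression tree into a list of three-address instructions.

    Returns (instructions, name holding the final result)."""
    code = []
    temps = itertools.count(1)

    def walk(node):
        if node[0] in ("num", "id"):
            return str(node[1])            # leaves are used directly as operands
        left, right = walk(node[1]), walk(node[2])
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left} {node[0]} {right}")
        return temp

    result = walk(tree)
    return code, result


tree = ("+", ("id", "a"), ("*", ("id", "b"), ("num", 2)))   # a + b * 2
code, result = three_address(tree)
print("\n".join(code))
# t1 = b * 2
# t2 = a + t1
```

The tree keeps the nesting explicit, while the linear form makes evaluation order and intermediate values explicit.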


Code Optimization
It attempts to improve the intermediate code so that the
resulting machine code runs faster.
Compiler Optimizations must meet the following
objectives:
The optimization must be correct, that is, preserve the
meaning of the compiled program,
The optimization must improve the performance of many
programs,
The compilation time must be kept reasonable, and
The engineering effort required must be manageable.


Factors Influencing Code Generation

Register Allocation
Register Scheduling
Code Selection
Addressing Modes
Instruction format
Power of instructions
Optimization at the machine code level
Backpatching


Analysis in Text Formatters:


TeX is a typesetting system designed and mostly written by Donald
Knuth, and released in 1978.
TeX was designed with two main goals:
To allow anybody to produce high-quality books using a reasonably
minimal amount of effort .
To provide a system that would give exactly the same results on all
computers, now and in the future.
TeX is a popular means by which to typeset complex mathematical
formulae.
It is noted as one of the most sophisticated digital typographical systems in
the world.
TeX is popular in academia, especially in mathematics, computer
science, economics, engineering, physics, statistics, and quantitative
psychology.
TeX is also used for many other typesetting tasks, especially in the form of LaTeX, ConTeXt and other macro packages.


Analysis in Text Formatters (Contd)

\hbox {<list of boxes>}

\hbox {\vbox{! 1} \vbox{@ 2}}


The Grouping of Phases



Compiler front and back ends:


Analysis (machine independent front end)
Synthesis (machine dependent back end)

Passes
A collection of phases may be performed only once
(single pass) or multiple times (multi pass)
Single pass: usually requires everything to be defined
before being used in source program
Multi pass: compiler may have to keep entire program
representation in memory


Features of a Good Programming Language

Easy to understand, and Expressive Power


Interoperability and Portability
Good turnaround time
Automatic error recovery and Good Error reporting
Efficient memory usage and Garbage Collection
Provision of good run-time environment
Support for virtual machine
Support for concurrent operation
Support for non-blocking operations
Ability to interface with foreign functions
Ability to model real-world problems
Ability to expose the functions for usage in other languages


The Syntax and Semantics of Programming Language


A programming language must include the
specification of syntax (structure) and semantics
(meaning).
Syntax typically means the context-free syntax
because of the almost universal use of context-free grammars (CFGs)
Ex.
a = b + c is syntactically legal
b + c = a is illegal
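The two statements above can be tried against a real front end. The sketch below uses Python's built-in compile() only as a stand-in for the unspecified language of the slide; the helper name is_syntactically_legal is an assumption. compile() translates the text without running it, so the undefined names a, b and c cause no problem.

```python
def is_syntactically_legal(statement):
    """Return True if Python's front end accepts the statement."""
    try:
        compile(statement, "<slide-example>", "exec")   # translate only, do not execute
        return True
    except SyntaxError:
        return False


print(is_syntactically_legal("a = b + c"))   # True
print(is_syntactically_legal("b + c = a"))   # False
```

Whether a language rejects the second form in the grammar itself or in a later check on assignment targets is an implementation choice. In both cases the error is reported before the program runs.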


The Syntax and Semantics of Programming Language (Contd)

The semantics of a programming language are
commonly divided into two classes:
Static semantics
Semantic rules that can be checked at compile time.
Ex. The type and number of a function's arguments

Runtime semantics
Semantic rules that can be checked only at run time


Computer Architecture and Compiler Design

Compilers should exploit hardware-specific features and
computing capabilities to optimize code.
The problems encountered in modern computing platforms:
Instruction sets for some popular architectures are highly nonuniform.
High-level programming language operations are not always easy to
support.
Ex. exceptions, threads, dynamic heap access

Exploiting architectural features such as cache, distributed processors
and memory
Effective use of a large number of processors


Compiler Design Considerations

Debugging Compilers
Designed to aid in the development and debugging of
programs.

Optimizing Compilers
Designed to produce efficient target code

Retargetable Compilers
A compiler whose target architecture can be changed
without its machine-independent components having to be
rewritten.


Compiler-Construction Tools
Software development tools are available to
implement one or more compiler phases

Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines
Compiler-construction toolkits
