Programming Languages: Syntactic Specifications and Analysis
Formal Grammars
Backus-Naur Form
Classification of Formal Languages
Syntactic Analysis of Programs
Derivations
Syntax Trees
Ambiguity
Avoiding Ambiguity
Scanning
Summary
V = {c}, S = {a, b}, R = {(c, ε), (c, aca), (c, bcb)}, vs = c.
Is the string abacaba valid in L? Is ababbbaba valid in L? What is the language L generated by the grammar?
3. Return the resulting string of terminals.

What if the string contains a non-terminal v for which there is no rule in R that has v on its left-hand side?
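The recipe, including the stuck case just mentioned, can be sketched in Python (a sketch only; the dictionary encoding and the step cap are our own choices, not part of the formal definition):

```python
import random

# Grammar G1 from the example: the only non-terminal is "c";
# its right-hand sides are "" (the empty string), "aca", and "bcb".
RULES = {"c": ["", "aca", "bcb"]}

def derive(start, rules, rng=None, max_steps=100):
    """Repeatedly replace the leftmost non-terminal until only terminals remain."""
    rng = rng or random.Random(0)
    form = start
    for _ in range(max_steps):
        idx = next((i for i, ch in enumerate(form) if ch in rules), None)
        if idx is None:          # no non-terminals left: a valid sentence
            return form
        rhs = rules[form[idx]]   # a non-terminal with no rules would fail here
        form = form[:idx] + rng.choice(rhs) + form[idx + 1:]
    raise RuntimeError("derivation did not terminate")

print(derive("c", RULES))  # some even-length palindrome over {a, b}
```

Every string this procedure returns is a sentence of L; which one you get depends on the rule chosen at each step.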
Backus-Naur Form
BNF Notation
Non-terminals are written in italics, using angle brackets, etc.; terminals are written in a monotype font, enclosed in quotation marks, etc. Each rule is written as a non-terminal, a special production symbol (typically ::=), and then a sequence of terminals and non-terminals, or the symbol ε.
Grammars are usually written using a special notation: the Backus-Naur Form (BNF). BNF is often extended with convenience symbols to shorten the notation: the Extended BNF (EBNF). BNF (and EBNF) is a metalanguage, a language for talking about languages. We will use EBNF extensively during the course.
By convention,
the terminals and non-terminals of the grammar are those, and only those, included in at least one of the rules; the left-hand side (the first element) of the topmost rule is the start variable vs.
c ::= ε
c ::= a c a
c ::= b c b

V = {c}, S = {a, b}, R = {(c, ε), (c, a c a), (c, b c b)}, vs = c.
L(G1) = {ε, aa, bb, aaaa, baab, abba, bbbb, aaaaaa, baaaab, ...}

or, more compactly, as

c ::= ε | a c a | b c b
The special symbol | has the meaning of or, and is an element of the metalanguage, not of the language specified by the grammar.
Metasyntactic extensions
Convenient extensions to the metalanguage include:
the special symbols [ and ], used to enclose a subsequence that appears in the string at most once;
the special symbols { and }, used to enclose a subsequence that appears in the string any number of times.
Alternatively, we can use only the symbols { and } together with a superscript to specify the number of occurrences:

{ sequence }2 means two subsequent occurrences of sequence;
{ sequence }+ means at least one occurrence of sequence;
{ sequence }* means any number of occurrences of sequence.
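These metasyntactic operators line up with regex quantifiers, which gives a quick way to experiment; a sketch using Python's re module (the signed-integer rule is an invented illustration, not from the lecture):

```python
import re

# EBNF [x] corresponds to the regex x?, {x} to x*, {x}+ to x+, {x}2 to x{2}.
# Illustration: signed-integer ::= [ "-" ] digit { digit }
signed_int = re.compile(r"-?[0-9][0-9]*")

print(bool(signed_int.fullmatch("-42")))  # True
print(bool(signed_int.fullmatch("-")))    # False: at least one digit required
```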
Regular Grammars
What is a regular language?
A regular language is a language generated by a regular grammar.
In a regular grammar, all rules are of one of the forms:2

v ::= s v
v ::= s
v ::= ε
Note:
All regular languages are context-free, but not all context-free languages are regular. All context-free languages are context-sensitive [sic], but not all context-sensitive languages are context-free. etc.
substring ::= a rest | b rest
rest ::= c rest | ε
Regular grammars are conveniently expressed with regular expressions. The above could be written as (a|b)c*, (?:a|b)c*, or [ab]c*, etc.
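The three regex notations are interchangeable; a quick check with Python's re module (a sketch):

```python
import re

# All three regexes from the text denote the same regular language.
patterns = [r"(a|b)c*", r"(?:a|b)c*", r"[ab]c*"]

for s in ["a", "bccc", "ac", "", "cab"]:
    verdicts = {bool(re.fullmatch(p, s)) for p in patterns}
    assert len(verdicts) == 1        # the three notations always agree
    print(repr(s), verdicts.pop())
```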
2 These are right-regular grammars. In left-regular grammars, the first rule form above is replaced by v ::= v s.
Lecture 3: Syntax: Grammars, Derivations, Parse Trees. Scanning. (11/54)
Context-Free Grammars
In a context-free grammar, all rules are of the form:

v ::= α

where v ∈ V and α ∈ (V ∪ S)* (the set of all sequences of variables from V and symbols from S).3

In a context-sensitive grammar, all rules are of the form:

α v β ::= α γ β

where v ∈ V, and α, β, γ ∈ (V ∪ S)*.

In an unrestricted grammar, all rules are of the form:

α ::= β

where α, β ∈ (V ∪ S)* and α is non-empty.
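The rule-form restrictions above can be checked mechanically; a minimal sketch, assuming a toy encoding where a grammar is a list of (lhs, rhs) strings and uppercase letters stand for non-terminals:

```python
def is_context_free(rules):
    """Every left-hand side is a single non-terminal."""
    return all(len(lhs) == 1 and lhs.isupper() for lhs, _ in rules)

def is_right_regular(rules):
    """Each rule is of the form V ::= s V, V ::= s, or V ::= (empty)."""
    def ok(rhs):
        return (rhs == "" or
                (len(rhs) == 1 and rhs.islower()) or
                (len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper()))
    return is_context_free(rules) and all(ok(rhs) for _, rhs in rules)

g1 = [("C", ""), ("C", "aCa"), ("C", "bCb")]              # the palindrome grammar
g2 = [("V", "aW"), ("V", "bW"), ("W", "cW"), ("W", "")]   # generates (a|b)c*
print(is_context_free(g1), is_right_regular(g1))  # True False
print(is_context_free(g2), is_right_regular(g2))  # True True
```

As the output shows, the palindrome grammar is context-free but not regular, which matches the language hierarchy stated above.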
Regular grammars are commonly used to define the microsyntax of programming languages: the syntax of lexemes as sequences of symbols from the alphabet of characters.4 Context-free grammars are used to define the (macro)syntax of programming languages: the syntax of programs as sequences of symbols from the alphabet of tokens (classified lexemes).5 Additional constraints may be needed to further constrain the syntax, e.g., by specifying that variable identifiers can be used only after they have been declared, etc.6
4 CTMCP uses the term lexical syntax rather than microsyntax; others use the term lexical structure.
5 Macrosyntax is usually referred to as syntactic structure.
6 The less restrictive the metalanguage used to define the grammar, the more restrictive the grammar can be with respect to the specified language.
The initial input is linear: it is a sequence of symbols from the alphabet of characters. A lexical analyzer (scanner, lexer, tokenizer) reads the sequence of characters and outputs a sequence of tokens. A parser reads a sequence of tokens and outputs a structured (typically non-linear) internal representation of the program: a syntax tree (parse tree). The syntax tree is further processed, e.g., by an interpreter or by a compiler.
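The character-to-token step of this pipeline can be sketched as follows (a toy tokenizer, not the course's mdc implementation; the token classes are illustrative):

```python
import re

def scan(text):
    """Characters -> tokens: a list of (class, lexeme) pairs."""
    token_spec = [("int", r"[0-9]+"), ("op", r"[+*]"), ("ws", r"\s+")]
    pos, tokens = 0, []
    while pos < len(text):
        for cls, pat in token_spec:
            m = re.match(pat, text[pos:])
            if m:
                if cls != "ws":          # whitespace separates, but is not a token
                    tokens.append((cls, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError(f"bad character at position {pos}")
    return tokens

print(scan("1 + 23"))  # [('int', '1'), ('op', '+'), ('int', '23')]
```

A parser would then consume this token sequence and build the tree.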
7 There, both the microsyntax and the syntax were trivial, no parsing was really needed as the intermediate representation was linear and colinear with the list of tokens, and no compiler was developed.
A variable (a variable name) consists of an uppercase letter followed by any number of word characters.
where skip, if, then, else, and end are symbols from the alphabet of lexemes.
if X then skip else if Y then skip else skip end end is a valid statement in Oz; if X then skip end and if x then skip else skip end are not.8
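The microsyntax of variable given above corresponds directly to a regex; a sketch:

```python
import re

# An uppercase letter followed by any number of word characters.
VARIABLE = re.compile(r"[A-Z]\w*")

for name in ["X", "Y2", "x", "myVar"]:
    print(name, bool(VARIABLE.fullmatch(name)))
```

This is why if x then skip else skip end is rejected: x is not a valid variable lexeme.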
Note: In some programming languages the programmer has control of whether indentation is essential for the syntactic and semantic validity of programs or not.
Derivations
Following the recipe for using a grammar explained earlier, we can derive sentences in the language L(G) specified by a grammar G in a sequence of steps.
In each step we transform one sentential form (a sequence of terminals and/or non-terminals) into another sentential form by replacing one non-terminal with the right-hand side of a matching rule. The first sentential form is the start variable vs alone. The last sentential form is a valid sentence, composed only of terminals.
(* invalid, 4-space indentation required *)
#light
let hello = fun name ->
  printf "hello, %a" name
Sequences of sentential forms starting with vs and ending with a sentence in L(G), obtained as specified above, are called derivations.
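For grammar G1, a derivation can be recorded as the list of its sentential forms; a sketch, where the rule applied at each step is given by its index (0 for ε, 1 for a c a, 2 for b c b):

```python
# Grammar G1: "c" is the only non-terminal.
RULES = {"c": ["", "aca", "bcb"]}

def leftmost_derivation(choices):
    """Apply the chosen rules to the leftmost non-terminal; return all sentential forms."""
    forms = ["c"]
    for k in choices:
        form = forms[-1]
        i = form.index("c")  # position of the leftmost non-terminal
        forms.append(form[:i] + RULES["c"][k] + form[i + 1:])
    return forms

print(leftmost_derivation([1, 2, 0]))  # ['c', 'aca', 'abcba', 'abba']
```

The first form is vs, the last contains only terminals: exactly the definition of a derivation.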
Derivations contd
The following are two of infinitely many derivations possible to obtain with the previously defined grammar G1.9
Derivations contd
A derivation such that in each step it is the leftmost non-terminal that is replaced is called a leftmost derivation. A derivation such that in each step it is the rightmost non-terminal that is replaced is called a rightmost derivation. There can be derivations that are neither leftmost nor rightmost.
Given a non-terminal v and a string s, there can be:
no derivation of s from v (if s is not valid in the defined language);
exactly one derivation of s from v;
more than one derivation.
c ::= ε | a c a | b c b.
Derivations contd
Example (A leftmost derivation)
1. statement
2. if variable then statement else statement end
3. if A then statement else statement end
4. if A then skip else statement end
5. if A then skip else if variable then statement else statement end end
...
11. if A then skip else if B then if C then skip else skip end else skip end end
Syntax Trees
Example (Syntax tree)
A parse tree (a syntax tree) is a structured representation of a program.
Parse trees are generated in the process of parsing programs. A parser is a function (a program) that takes as input a sequence of tokens (the output of a lexer) and returns a nested data structure corresponding to a parse tree.
Does the sequence ba belong to L(G)? Yes, it has the following parse tree:
[parse tree figure with root v deriving the sentence b a]
The data structure returned by the parser is an internal (intermediate) representation of the program. A parse tree can be used to:
interpret the program (in interpreted languages); generate target code (in compiled languages); optimize the intermediate code (in both interpreted and compiled languages).
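One common shape for the parser's output is a nested tuple per node; a sketch (this particular representation is our own choice for illustration, not the course's):

```python
# A parse tree as nested tuples: (node label, children...).
# Leaves are plain strings (the terminals).
tree = ("statement",
        "if", ("variable", "A"),
        "then", ("statement", "skip"),
        "else", ("statement", "skip"),
        "end")

def leaves(t):
    """Read the sentence back from the tree (the frontier of the parse tree)."""
    if isinstance(t, str):
        return [t]
    out = []
    for child in t[1:]:   # t[0] is the node label
        out.extend(leaves(child))
    return out

print(" ".join(leaves(tree)))  # if A then skip else skip end
```

Reading the leaves left to right recovers exactly the parsed sentence, which is what makes the tree a faithful internal representation.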
Suppose we rewrite the grammar above as

statement ::= skip
  | if variable then statement else statement
  | if variable then statement
with the microsyntactic definition of variable given earlier. What is the parse tree for if A then skip else if B then skip else skip end end?
[parse tree figure: statement expanding to if variable then statement else statement end, with the inner statements expanded further]
How many syntax trees does if A then if B then skip else skip have, given this grammar? There are two parse trees for this sequence; see the next slide.
[figure: the two parse trees of if A then if B then skip else skip, one attaching the else to the inner if, the other to the outer if]
Does it matter that a sentence has more than one parse tree?
For a sentence like if A then if B then skip else skip where all the conditional actions are skip (do nothing, noop), it does not matter much.
In general, it does matter, since what actions will be taken and in which order depends on how the program is understood by the interpreter (or compiler), which in turn depends on how the program is parsed. It is therefore important that the specification of the syntax is unambiguous, and that the programmer does not make false assumptions about how the code will be parsed.
a = True, b = True: both print 1
a = True, b = False: the first prints 2, the second nothing
a = False, b = True: the second prints 2, the first nothing
a = False, b = False: the second prints 2, the first nothing
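The table can be reproduced in Python, where indentation selects the parse; the two functions below return (rather than print) 1, 2, or nothing, mirroring the two readings of the dangling else:

```python
def first(a, b):
    # else attached to the inner if
    if a:
        if b:
            return 1
        else:
            return 2
    return None

def second(a, b):
    # else attached to the outer if
    if a:
        if b:
            return 1
    else:
        return 2
    return None

for a in (True, False):
    for b in (True, False):
        print(a, b, first(a, b), second(a, b))
```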
The lack of end would add ambiguity to the grammar, which is resolved by involving whitespace in the specification.
Ambiguity
A grammar is ambiguous if a sentence can be parsed in more than one way:
the program has more than one parse tree, that is, the program has more than one leftmost derivation.
Note: The fact that a program has more than one derivation is not sufficient to consider the grammar ambiguous. In practice, most programs have more than one derivation, but all these derivations correspond to the same parse tree: the grammar is unambiguous. Two distinct leftmost derivations for the same program must correspond to two distinct parse trees: the grammar must be ambiguous in this case.
Example (An ambiguous grammar)

expression ::= integer | expression operator expression
operator ::= + | - | * | /

where integer may generate any integer numeral (a sequence of digits). Why is exp ambiguous?
Sentences like 1 + 2 + 3 have more than one parse tree. Worse, sentences like 1 + 2 * 3 have more than one parse tree. In Smalltalk, the result would be 9. In general, we would like it to be 7.
Should 1 + 2 * 3 evaluate to 9 or to 7?
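The two answers correspond to the two parse trees; in Python notation:

```python
# Smalltalk-style, strictly left-to-right grouping:
print((1 + 2) * 3)  # 9
# Conventional precedence, where * binds tighter than +:
print(1 + (2 * 3))  # 7
```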
Ambiguity contd
Example (An ambiguous grammar contd)
The expression 1 + 2 * 3 has two parse trees:
[figure: the two parse trees of 1 + 2 * 3, one grouping it as (1 + 2) * 3, the other as 1 + (2 * 3)]
Avoiding Ambiguity
There are a number of ways to avoid ambiguity in grammars. Here, we consider four alternative solutions.
Benefit: Ambiguity has been resolved.
Drawback: Expressions such as 1 + 2 * 3, or even 1 + 2, are no longer legal. (We must type (1 + (2 * 3)) and (1 + 2) instead.)
Avoiding Ambiguity
Solution 2: Precedence of operators
We can modify exp by distinguishing operators of high and low priority:

expression ::= term | expression lp-operator expression
term ::= integer | ( expression ) | term hp-operator term
hp-operator ::= * | /
lp-operator ::= + | -
where hp-operator and lp-operator are high-priority and low-priority operators, respectively.
Benefit: Expressions such as 1 + 2 * 3 can be (partially) parsed as 1 + expression but not as expression * 3.
Drawback: An expression like 1 - 2 - 3 is still ambiguous: it can be (partially) parsed both as expression - 3 and as 1 - expression.
Benefit: The operators in this grammar are left-associative; the expression 1 - 2 - 3 can only be (partially) parsed as expression - 3, and not as 1 - expression.
Drawback: All operators have equal precedence; an expression like 1 - 2 * 3 can only be (partially) parsed as expression * 3, and not as 1 - expression.
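Combining the two solutions (precedence and left-associativity) is what practical expression parsers do; a minimal recursive-descent sketch in Python (the names and the integer-division choice for / are our own, not the course's):

```python
import re

def tokenize(s):
    return re.findall(r"\d+|[-+*/()]", s)

def parse_expression(tokens):
    """expression ::= term { lp-operator term }, evaluated left-associatively."""
    value = parse_term(tokens)
    while tokens and tokens[0] in "+-":
        op = tokens.pop(0)
        rhs = parse_term(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value

def parse_term(tokens):
    """term ::= factor { hp-operator factor }."""
    value = parse_factor(tokens)
    while tokens and tokens[0] in "*/":
        op = tokens.pop(0)
        rhs = parse_factor(tokens)
        value = value * rhs if op == "*" else value // rhs  # integer division
    return value

def parse_factor(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        value = parse_expression(tokens)
        tokens.pop(0)  # consume ")"
        return value
    return int(tok)

print(parse_expression(tokenize("1 + 2 * 3")))  # 7
print(parse_expression(tokenize("1 - 2 - 3")))  # -4
```

The two while loops give left-associativity; the two levels of functions give precedence.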
Scanning
What is scanning?
Scanning is the process of translating programs from the string-of-characters input format into the sequence-of-tokens intermediate format. We have seen scanning in action in the mdc example:
the lexemizer took as input a string of characters and returned a sequence of lexemes; the tokenizer took as input a sequence of lexemes and returned a sequence of tokens.
These two steps are usually merged into one pass, called scanning (but sometimes lexing or tokenization is used for both operations, and scanning may be used for creating the lexemes only).
Scanning contd
How do we design and implement a scanner?
Building a scanner requires a number of steps:
1. Specification of the microsyntax (the lexical structure) of the language, typically using regular expressions (regexes).
Scanning contd
Before we implement an mdc scanner, we first have a look at a recognizer for mdc lexemes.
A scanner processes an input string and returns a list of lexemes (or tokens). A recognizer checks whether the whole input string is a single lexeme.
2. Based on the regexes, a nondeterministic finite automaton (NFA) is built that recognizes lexemes of the language.
3. A deterministic finite automaton (DFA) equivalent to the NFA is built.
4. The DFA is implemented using a nested control structure that processes the input one character at a time.
All steps can be realized manually, but there exist tools which
allow one to specify the lexical structure using regular expressions, and build an implementation of the DFA automatically.
We shall revisit the mdc example and build a scanner both manually and using a scanner-building tool.
Scanning contd
Example (A recognizer for mdc lexemes contd)
Step 3: The regex specification is realized by the following DFA:
Scanning contd
Example (A recognizer for mdc lexemes contd)
Step 4: An algorithm for the mdc recognizer DFA:14
input: string of characters; output: boolean
state ← start; char ← next()
while char ≠ EOF:
    if state = start:
        if char ∈ {p, f}: state ← cmd
        else if char ∈ {+, -, *, /}: state ← op
        else if char ∈ {0, ..., 9}: state ← int
        else: return false
    else if state ∈ {cmd, op}: return false
    else if state = int:
        if char ∉ {0, ..., 9}: return false
    char ← next()
if state ∈ {cmd, op, int}: return true
else: return false
[DFA diagram: from start, p or f leads to cmd; +, -, *, / leads to op; a digit 0, ..., 9 leads to int; int loops on digits 0, ..., 9]
13 We skip Step 2; see the further reading section for references if you need more details.
14 Notation varies. EOF means end of file (input). Each call to next() returns the next character from the input.
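A direct Python transcription of the recognizer DFA above might look like this (a sketch; the course's own implementation is the Oz file code/mdc-recognizer.oz):

```python
def mdc_recognize(s):
    """Return True iff the whole string is a single mdc lexeme."""
    state = "start"
    for ch in s:
        if state == "start":
            if ch in "pf":
                state = "cmd"
            elif ch in "+-*/":
                state = "op"
            elif ch.isdigit():
                state = "int"
            else:
                return False
        elif state in ("cmd", "op"):
            return False          # cmd and op lexemes are single characters
        elif state == "int":
            if not ch.isdigit():
                return False
    return state in ("cmd", "op", "int")

print(mdc_recognize("p"))      # True: the input is a command
print(mdc_recognize("123"))    # True: the input is an integer
print(mdc_recognize("1 2 +"))  # False: not a single lexeme
```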
Scanning contd
The recognizer checks whether the whole string is a single lexeme, but we want more:
process strings that include more than one lexeme; return a sequence of classied lexemes rather than a yes/no answer.
In the previous implementation of mdc, all lexemes in a program had to be separated by whitespace. This leads to a tradeoff:
it is more convenient to implement the lexemizer: just split the input by whitespace; it is less convenient to use the language: the programmer must separate all lexemes with whitespace.
We shall now develop a scanner that makes whitespace between lexemes optional (unless we want to separate two numerals).
Try it! The file code/mdc-recognizer.oz contains an implementation of the mdc recognizer and a few simple test cases. Open the file in the OPI (oz &, then C-x C-f). Execute the code (C-. C-b). What happens? {MDCRecognizer "p"} evaluates to true, because the input is a command. {MDCRecognizer "123"} evaluates to true, because the input is an integer. {MDCRecognizer "1 2 +"} evaluates to false, because the input is not a valid lexeme, even though it is a valid sentence (a legal sequence of valid lexemes) in mdc.
Scanning contd
Example (A scanner for mdc)
Step 4: An algorithm for the mdc scanner DFA:15
input: string of characters; output: sequence of tokens
tokens ← (); state ← start; char ← next(); seen ← ()
while char ≠ EOF:
    if state = start:
        if char ∈ {p, f}: append ⟨cmd, char⟩ to tokens
        else if char ∈ {+, -, *, /}: append ⟨op, char⟩ to tokens
        else if char ∈ {0, ..., 9}: state ← int; seen ← char
        else if char ∉ S: error(char)
        char ← next()
    else if state = int:
        if char ∈ {0, ..., 9}: concatenate char to seen; char ← next()
        else: append ⟨int, seen⟩ to tokens; seen ← (); state ← start
if state = int: append ⟨int, seen⟩ to tokens
return tokens
15 tokens maintains a list of tokens recognized so far. seen maintains a string of characters seen since the most recently recognized token. Angle brackets ⟨ and ⟩ denote tokens (class-lexeme pairs).
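The scanner DFA admits the same direct transcription; a Python sketch (tokens as (class, lexeme) pairs; the whitespace handling is our own simplification):

```python
def mdc_scan(s):
    """Scan an mdc program into (class, lexeme) tokens; whitespace is optional
    between lexemes, except between two numerals."""
    tokens, state, seen = [], "start", ""
    for ch in s:
        if state == "int":
            if ch.isdigit():
                seen += ch
                continue
            tokens.append(("int", seen))     # the numeral just ended
            seen, state = "", "start"
        if ch in "pf":
            tokens.append(("cmd", ch))
        elif ch in "+-*/":
            tokens.append(("op", ch))
        elif ch.isdigit():
            state, seen = "int", ch
        elif not ch.isspace():
            raise ValueError(f"unexpected character: {ch!r}")
    if state == "int":                       # flush a trailing numeral
        tokens.append(("int", seen))
    return tokens

print(mdc_scan("1 2+p"))  # [('int', '1'), ('int', '2'), ('op', '+'), ('cmd', 'p')]
```

Note that 1 2+p scans correctly with no space before + or p; only the two numerals need a separator.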
Summary

syntax, grammars, derivations, parse trees, ambiguity
recognizing, scanning
design and implementation of an mdc scanner

Summary contd

Pensum: most of today's slides, except for the implementational details of the mdc scanners and of the recognizer and scanner DFAs. Examine and try out today's code; read the Mozart/Oz documentation if necessary.

Further reading: see, e.g., Ch. 3 in Sebesta, Concepts of Programming Languages; Ch. 2 in Scott, Programming Language Pragmatics; Ch. 2 in Cooper and Torczon, Engineering a Compiler (a detailed, in-depth but readable presentation).

Note! The code examples are used as an illustration; we will return to (some parts of) them when you learn more about the syntax and semantics of Oz.

Next time: ...