COMS W4115
Programming Languages and Translators
Lecture 8: Syntax Analysis
October 5, 2009
Lecture Outline
- Review
- Role of the parser
- Context-free grammars
- Derivations and parse trees
- Ambiguity
- Yacc: a language for specifying syntax-directed translators
- Reading
1. Review
- Converting a regular expression to an NFA:
the McNaughton-Yamada-Thompson algorithm
- Converting an NFA to a DFA: subset construction
- Simulating an NFA: two-stack algorithm
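The two-stack algorithm can be sketched in C as follows. This is a minimal illustration, not the course's reference code. The NFA is a hypothetical hard-coded example: a Thompson-style NFA for (a|b)*abb with states 0 to 10. The names edges, addState, transfer, and nfaAccepts are our own. oldStates holds the current set of states. newStates collects the ε-closure of the moves on the next input character. The alreadyOn array keeps a state from being pushed onto newStates twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example NFA (McNaughton-Yamada-Thompson style) for the
   regular expression (a|b)*abb: states 0..10, start 0, accepting 10.
   An edge labeled EPS is an epsilon-transition. */
#define NSTATES 11
#define EPS '\0'

struct edge { int from; char label; int to; };

static const struct edge edges[] = {
    {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
    {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
    {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}, {8, 'b', 9},
    {9, 'b', 10},
};
static const size_t nedges = sizeof edges / sizeof edges[0];
enum { START = 0, ACCEPT = 10 };

/* Push s onto the stack and, recursively, every state reachable from s
   by epsilon-transitions; onStack[] keeps each state from being pushed twice. */
static void addState(int s, int stack[], int *top, bool onStack[]) {
    stack[(*top)++] = s;
    onStack[s] = true;
    for (size_t i = 0; i < nedges; i++)
        if (edges[i].from == s && edges[i].label == EPS && !onStack[edges[i].to])
            addState(edges[i].to, stack, top, onStack);
}

/* Move every state from newStates to oldStates, clearing alreadyOn. */
static void transfer(int oldStates[], int *nold, int newStates[], int *nnew,
                     bool alreadyOn[]) {
    while (*nnew > 0) {
        int s = newStates[--*nnew];
        oldStates[(*nold)++] = s;
        alreadyOn[s] = false;
    }
}

/* Two-stack simulation: oldStates is the current set of NFA states;
   newStates collects epsilon-closure(move(oldStates, c)) for the next char c. */
bool nfaAccepts(const char *input) {
    int oldStates[NSTATES], newStates[NSTATES];
    int nold = 0, nnew = 0;
    bool alreadyOn[NSTATES] = {false};

    addState(START, newStates, &nnew, alreadyOn);   /* epsilon-closure(start) */
    transfer(oldStates, &nold, newStates, &nnew, alreadyOn);

    for (const char *p = input; *p != '\0'; p++) {
        while (nold > 0) {
            int s = oldStates[--nold];
            for (size_t i = 0; i < nedges; i++)
                if (edges[i].from == s && edges[i].label == *p &&
                    !alreadyOn[edges[i].to])
                    addState(edges[i].to, newStates, &nnew, alreadyOn);
        }
        transfer(oldStates, &nold, newStates, &nnew, alreadyOn);
    }

    for (int i = 0; i < nold; i++)
        if (oldStates[i] == ACCEPT)
            return true;
    return false;
}
```

For example, nfaAccepts("aabb") returns true and nfaAccepts("abba") returns false. Each state is pushed at most once per input character, so one step costs time proportional to the size of the NFA.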
2. Role of the Parser
- Reads the sequence of tokens generated by the lexical analyzer.
- Verifies that the sequence of tokens obeys the
grammatical rules of the programming language
by constructing a parse tree, implicitly or explicitly,
for the sequence of tokens.
- Enters information about tokens into the symbol table.
- Reports syntax errors.
3. Context-Free Grammars
- A context-free grammar consists of
- A finite set of terminal symbols
- A finite nonempty set of nonterminal symbols
- One distinguished nonterminal called the start symbol
- A finite set of rewrite rules, called productions, of the form
A → α
where A is a nonterminal and α is a string (possibly empty)
of terminals and nonterminals
- Consider the grammar G with the productions
lines → lines expr NEWLINE
lines → lines NEWLINE
lines → ε
expr → expr + expr
expr → expr * expr
expr → ( expr )
expr → NUMBER
- The terminal symbols are the alphabet from which strings are formed.
In this grammar the set of terminal symbols is
{ NUMBER, NEWLINE, +, *, (, ) }. The terminal symbols are the token names.
- The nonterminal symbols are syntactic variables that denote sets
of strings of terminal symbols. In this grammar the set of nonterminal
symbols is { lines, expr }.
- The start symbol is lines.
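One way to make these definitions concrete is to store the productions of G as data. The sketch below is purely illustrative; the struct layout and the names G, startSymbol, isNonterminal, and terminals are our own. For this grammar a symbol is a nonterminal exactly when it heads some production. Every other symbol in a production body is a terminal. Run on G, it finds the six terminals NEWLINE, +, *, (, ), and NUMBER.

```c
#include <stdbool.h>
#include <string.h>

/* Grammar G encoded as data: each production has a head (a nonterminal)
   and a body of up to MAXBODY symbols; an epsilon body has length 0. */
#define MAXBODY 3

struct production {
    const char *head;
    const char *body[MAXBODY];
    int len;
};

static const struct production G[] = {
    {"lines", {"lines", "expr", "NEWLINE"}, 3},
    {"lines", {"lines", "NEWLINE"}, 2},
    {"lines", {0}, 0},                      /* lines -> epsilon */
    {"expr",  {"expr", "+", "expr"}, 3},
    {"expr",  {"expr", "*", "expr"}, 3},
    {"expr",  {"(", "expr", ")"}, 3},
    {"expr",  {"NUMBER"}, 1},
};
static const int nprods = sizeof G / sizeof G[0];
const char *const startSymbol = "lines";   /* the start symbol of G */

/* In G every nonterminal heads at least one production. */
bool isNonterminal(const char *sym) {
    for (int i = 0; i < nprods; i++)
        if (strcmp(G[i].head, sym) == 0)
            return true;
    return false;
}

/* Store the distinct terminals that appear in production bodies in out[]
   (capacity max), in order of first appearance; return how many there are. */
int terminals(const char *out[], int max) {
    int n = 0;
    for (int i = 0; i < nprods; i++)
        for (int j = 0; j < G[i].len; j++) {
            const char *s = G[i].body[j];
            if (isNonterminal(s))
                continue;
            bool seen = false;
            for (int k = 0; k < n; k++)
                if (strcmp(out[k], s) == 0)
                    seen = true;
            if (!seen && n < max)
                out[n++] = s;
        }
    return n;
}
```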
4. Derivations and Parse Trees
- L(G), the language generated by a grammar G, consists of all strings of
terminal symbols that can be derived from the start symbol of G.
- A leftmost derivation expands the leftmost nonterminal in
each sentential form:
lines ⇒ lines expr NEWLINE
⇒ expr NEWLINE
⇒ expr + expr NEWLINE
⇒ NUMBER + expr NEWLINE
⇒ NUMBER + expr * expr NEWLINE
⇒ NUMBER + NUMBER * expr NEWLINE
⇒ NUMBER + NUMBER * NUMBER NEWLINE
- A rightmost derivation expands the rightmost nonterminal in each sentential form:
lines ⇒ lines expr NEWLINE
⇒ lines expr + expr NEWLINE
⇒ lines expr + expr * expr NEWLINE
⇒ lines expr + expr * NUMBER NEWLINE
⇒ lines expr + NUMBER * NUMBER NEWLINE
⇒ lines NUMBER + NUMBER * NUMBER NEWLINE
⇒ NUMBER + NUMBER * NUMBER NEWLINE
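Note that these two derivations have the same parse tree: they apply the same
productions to the same nonterminals, only in a different order. In that tree
the subexpression NUMBER * NUMBER is the right operand of +.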
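Both derivations can be replayed mechanically. In the C sketch below, all names are hypothetical. The productions of G are numbered 0 to 6 in the order listed in Section 3. derive starts from lines and applies a sequence of production numbers, always rewriting either the leftmost or the rightmost nonterminal. The leftmost derivation above uses the sequence 0, 2, 3, 6, 4, 6, 6. The rightmost one uses 0, 3, 4, 6, 6, 6, 2. Both sequences contain the same productions in a different order and end in the same sentence.

```c
#include <stdbool.h>
#include <string.h>

#define MAXFORM 32   /* maximum length of a sentential form */

struct production { const char *head; const char *body[3]; int len; };

/* Productions of grammar G, numbered in the order they are listed. */
static const struct production G[] = {
    {"lines", {"lines", "expr", "NEWLINE"}, 3}, /* 0 */
    {"lines", {"lines", "NEWLINE"}, 2},         /* 1 */
    {"lines", {0}, 0},                          /* 2: lines -> epsilon */
    {"expr",  {"expr", "+", "expr"}, 3},        /* 3 */
    {"expr",  {"expr", "*", "expr"}, 3},        /* 4 */
    {"expr",  {"(", "expr", ")"}, 3},           /* 5 */
    {"expr",  {"NUMBER"}, 1},                   /* 6 */
};

static bool isNonterminal(const char *s) {
    return strcmp(s, "lines") == 0 || strcmp(s, "expr") == 0;
}

/* Rewrite the leftmost (or rightmost) nonterminal of form[0..*n-1] using
   production p. Fails if there is no nonterminal, p's head does not match
   it, or the form would exceed MAXFORM symbols. */
static bool step(const char *form[], int *n, int p, bool leftmost) {
    int pos = -1;
    for (int i = 0; i < *n; i++)
        if (isNonterminal(form[i])) {
            pos = i;
            if (leftmost)
                break;
        }
    if (pos < 0 || strcmp(form[pos], G[p].head) != 0)
        return false;
    int len = G[p].len;
    if (*n - 1 + len > MAXFORM)
        return false;
    /* Shift the symbols after pos to make room for the body (len may be 0). */
    memmove(&form[pos + len], &form[pos + 1],
            (size_t)(*n - pos - 1) * sizeof form[0]);
    for (int j = 0; j < len; j++)
        form[pos + j] = G[p].body[j];
    *n += len - 1;
    return true;
}

/* Apply the k productions in seq starting from "lines" and write the final
   sentential form into buf as space-separated symbols. */
bool derive(const int seq[], int k, bool leftmost, char *buf, size_t size) {
    const char *form[MAXFORM] = {"lines"};
    int n = 1;
    for (int i = 0; i < k; i++)
        if (!step(form, &n, seq[i], leftmost))
            return false;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (i > 0)
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, form[i], size - strlen(buf) - 1);
    }
    return true;
}
```

Replaying the rightmost sequence with leftmost set to true fails at its second step. Production 3 rewrites expr, but at that point the leftmost nonterminal is lines.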
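5. Ambiguity
- A grammar is ambiguous if some string in its language has more than
one parse tree (equivalently, more than one leftmost derivation).
- Grammar G is ambiguous. Besides the leftmost derivation in Section 4,
NUMBER + NUMBER * NUMBER NEWLINE has this second leftmost derivation:
lines ⇒ lines expr NEWLINE
⇒ expr NEWLINE
⇒ expr * expr NEWLINE
⇒ expr + expr * expr NEWLINE
⇒ NUMBER + expr * expr NEWLINE
⇒ NUMBER + NUMBER * expr NEWLINE
⇒ NUMBER + NUMBER * NUMBER NEWLINE
Its parse tree groups the input as (NUMBER + NUMBER) * NUMBER, whereas
the tree in Section 4 groups it as NUMBER + (NUMBER * NUMBER).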
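The two parse trees of NUMBER + NUMBER * NUMBER (one with + at the root, one with * at the root) produce different results once the tree is evaluated. The C sketch below builds both trees with the leaves 2, 3, and 4. The names struct node, eval, plusOnTop, and timesOnTop are hypothetical. eval uses the same arithmetic as the desk calculator's semantic actions in Section 6. The tree with + at the root yields 14, and the tree with * at the root yields 20. The %left declarations in that Yacc program give * higher precedence than +, so the calculator chooses the first tree.

```c
#include <stddef.h>

/* A parse-tree node for expr in grammar G: a NUMBER leaf (op == 'n')
   or an interior node for expr + expr or expr * expr. */
struct node {
    char op;
    double value;                      /* meaningful only for leaves */
    const struct node *left, *right;
};

/* Evaluate bottom-up, mirroring the actions $$ = $1 + $3 and $$ = $1 * $3. */
double eval(const struct node *t) {
    switch (t->op) {
    case '+': return eval(t->left) + eval(t->right);
    case '*': return eval(t->left) * eval(t->right);
    default:  return t->value;         /* NUMBER leaf */
    }
}

/* Parse tree with + at the root: a + (b * c). */
double plusOnTop(double a, double b, double c) {
    const struct node na = {'n', a, NULL, NULL}, nb = {'n', b, NULL, NULL},
                      nc = {'n', c, NULL, NULL};
    const struct node times = {'*', 0.0, &nb, &nc};
    const struct node plus = {'+', 0.0, &na, &times};
    return eval(&plus);
}

/* The other parse tree, with * at the root: (a + b) * c. */
double timesOnTop(double a, double b, double c) {
    const struct node na = {'n', a, NULL, NULL}, nb = {'n', b, NULL, NULL},
                      nc = {'n', c, NULL, NULL};
    const struct node plus = {'+', 0.0, &na, &nb};
    const struct node times = {'*', 0.0, &plus, &nc};
    return eval(&times);
}
```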
6. Yacc: a Language for Specifying Syntax-Directed Translators
- A syntax-directed translation scheme is a context-free grammar
with program fragments embedded within the right sides of productions.
- Yacc is a popular language, first implemented by
Steve Johnson of Bell Labs, for implementing syntax-directed
translation schemes.
- Yacc specifications
- A Yacc program has three parts:
declarations
%%
translation rules
%%
supporting C-routines
The declarations part may be empty and the last part (%%
followed by the supporting C-routines) may be omitted.
Example Yacc program for a desk calculator: (see ALSU, p. 292, Fig. 4.59)
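%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double          /* semantic values ($$, $1, yylval) are doubles */
int yylex(void);                /* defined in the supporting C-routines below */
void yyerror(const char *s);    /* supplied by the -ly library */
%}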
%token NUMBER
%left '+'
%left '*'   /* declared later, so '*' has higher precedence than '+' */
%%
lines : lines expr '\n' { printf("%g\n", $2); }
| lines '\n'
| /* empty */
;
expr : expr '+' expr { $$ = $1 + $3; }
| expr '*' expr { $$ = $1 * $3; }
| '(' expr ')' { $$ = $2; }
| NUMBER
;
%%
/* the lexical analyzer; returns <token-name, yylval> */
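int yylex(void) {
    int c;
    while ((c = getchar()) == ' ' || c == '\t')
        ;                          /* skip blanks and tabs */
    if (c == EOF)
        return 0;                  /* 0 tells the parser the input has ended */
    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);     /* yylval has type YYSTYPE (double) */
        return NUMBER;
    }
    return c;                      /* any other character is its own token */
}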
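The following C sketch applies the same scanning logic to a string, which lets you observe the token stream that yylex produces without typing at stdin. scanToken is a hypothetical name. NUMBER is defined as 258 only as a stand-in: in the real program yacc assigns NUMBER's code in y.tab.c, by default a value above 255 so it cannot clash with single-character tokens. The sketch uses strtod in place of scanf. A lone '.' that does not start a number is returned as an ordinary character.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-in token code (assumption); yacc defines the real one in y.tab.c. */
#define NUMBER 258

/* Scan one token from *p, advancing *p past it. Returns NUMBER with its
   value in *lval, 0 at end of input, or the character itself otherwise. */
int scanToken(const char **p, double *lval) {
    while (**p == ' ' || **p == '\t')
        (*p)++;                          /* skip blanks and tabs */
    if (**p == '\0')
        return 0;                        /* end of input */
    if (**p == '.' || isdigit((unsigned char)**p)) {
        char *end;
        double v = strtod(*p, &end);     /* plays the role of scanf("%lf") */
        if (end != *p) {                 /* a number was converted */
            *lval = v;
            *p = end;
            return NUMBER;
        }
    }
    return (unsigned char)*(*p)++;       /* single-character token */
}
```

For the input "3.5 + 2*(1)\n" it returns NUMBER (value 3.5), '+', NUMBER (2), '*', '(', NUMBER (1), ')', '\n', and finally 0.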
On Linux, we can make a desk calculator from this Yacc program
as follows:
- Put the Yacc program in a file, say desk.y.
- Invoke yacc desk.y to create the Yacc output file y.tab.c.
- Compile this output file with a C compiler by typing
gcc y.tab.c -ly
to get a.out.
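(The library -ly supplies default versions of main, which calls the parser
yyparse that yacc generated in y.tab.c, and yyerror, which prints error messages.)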
a.out is the desk calculator. Try it!
7. Reading
- ALSU, Sections 4.1, 4.2, 4.9
aho@cs.columbia.edu