COMS W4115
Programming Languages and Translators
Lecture 7: Context-free Grammars and YACC
February 8, 2012
Outline
- Review
- Examples of context-free grammars
- Yacc: a language for specifying syntax-directed translators
- Noncontext-free constructs in languages
- Top-down parsing
- Transformations on grammars
1. Review
- Role of the parser
- Context-free grammars
- Derivations and parse trees
- Ambiguity
2. Examples of Context-Free Grammars
- All strings of balanced parentheses:
- CFG:
S → ( S ) S | ε
- Note that this grammar is unambiguous.
- Nonempty palindromes of a's and b's.
(A palindrome is a string that reads the same forwards as backwards; e.g., abba.)
- CFG:
S → a S a | b S b | a a | b b | a | b
- Note that the language generated by this grammar is not regular.
Can you prove this using the pumping lemma for regular languages?
- Strings with an equal number of a's and b's:
- CFG:
S → a S b | b S a | S S | ε
- Note that this grammar is ambiguous.
Can you find an equivalent unambiguous grammar?
- If- and if-else statements:
stmt → if ( expr ) stmt else stmt
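| if ( expr ) stmt
| other
Note that this grammar is ambiguous: in if ( E1 ) if ( E2 ) S1 else S2, the else can be matched with either if (the "dangling else" problem).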
Some typical programming language constructs:
stmt → expr ;
| if ( expr ) stmt
| for ( optexpr ; optexpr ; optexpr ) stmt
| other
optexpr → ε
| expr
3. Yacc: a Language for Specifying Syntax-Directed Translators
- Yacc is a popular language, first implemented by
Steve Johnson of Bell Labs, for implementing syntax-directed
translators.
- Bison is the GNU version of Yacc, upward compatible with the original Yacc;
it was originally written by Robert Corbett and made Yacc-compatible by Richard Stallman.
Many other versions of Yacc are also available.
- The original Yacc used C for semantic actions. Yacc has been rewritten for
many other languages including Java, ML, OCaml, and Python.
- Yacc specifications
- A Yacc program has three parts:
declarations
%%
translation rules
%%
supporting C-routines
The declarations part may be empty and the last part (%%
followed by the supporting C-routines) may be omitted.
Here is a Yacc program for a desk calculator
that adds and multiplies numbers.
(See ALSU, p. 292, Fig. 4.59 for a more advanced desk calculator.)
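%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* semantic values are doubles */
int yylex(void);
%}
%token NUMBER
%left '+'                /* '+' has lower precedence ...          */
%left '*'                /* ... than '*', which is declared later */
%%
lines : lines expr '\n'  { printf("%g\n", $2); }
      | lines '\n'
      | /* empty */
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | '(' expr ')'     { $$ = $2; }
      | NUMBER           /* default action: $$ = $1 */
      ;
%%
/* the lexical analyzer; returns <token-name, yylval> */
int yylex(void) {
    int c;
    while ((c = getchar()) == ' ')   /* skip blanks */
        ;
    if ((c == '.') || (isdigit(c))) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);       /* yylval carries the number's value */
        return NUMBER;
    }
    if (c == EOF)
        return 0;                    /* 0 signals end of input to the parser */
    return c;                        /* any other character is its own token */
}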
On Linux, we can make a desk calculator from this Yacc program
as follows:
- Put the Yacc program in a file, say desk.y.
- Invoke yacc desk.y to create the Yacc output file y.tab.c.
- Compile this output file with a C compiler by typing
gcc y.tab.c -ly
to get a.out.
(The library -ly supplies default versions of main() and yyerror(); the parsing routine yyparse() itself is generated into y.tab.c.)
a.out is the desk calculator. Try it!
4. Noncontext-free Constructs in Languages
- The pumping lemma for context-free languages can be used to show that certain
languages are not context free.
- The pumping lemma: If L is a context-free language, then there exists a
constant n such that if z is any string in L of length n or more, then z can be
written as z = uvwxy subject to the following conditions:
- The length of vwx is less than or equal to n.
- The length of vx is one or more. (That is, v and x cannot both be empty.)
- For all i ≥ 0, $uv^iwx^iy$ is in L.
- A typical proof using the pumping lemma to show a language L is not context free
proceeds by assuming L is context free, and then finding a long string in L
which, when pumped, yields a string not in L, thereby deriving a contradiction.
- The language $\{a^n b^n c^n \mid n \ge 0\}$
is not context free. (Models "respectively" in English.)
- The language $\{ww \mid w \in (a|b)^*\}$ is not context free.
(Models the requirement that variables be declared before they are used.)
- The language $\{a^m b^n a^m b^n \mid m, n \ge 0\}$
is not context free.
(Models the requirement that the number of formal parameters agree with the number of actual parameters.)
5. Top-down Parsing
- Top-down parsing consists of constructing a parse tree
for an input string starting from the root and creating
the nodes of the parse tree in preorder.
- Equivalently, top-down parsing consists of finding a
leftmost derivation for the input string.
- Consider grammar G:
S → + S S | * S S | a
Leftmost derivation for + a * a a:
S ⇒ + S S
⇒ + a S
⇒ + a * S S
⇒ + a * a S
⇒ + a * a a
Recursive-descent parsing
- Recursive-descent parsing is a top-down method of syntax
analysis in which a set of recursive procedures is used
to process the input string.
- One procedure is associated with each nonterminal of
the grammar. See Fig. 4.13, p. 219.
- The sequence of successful procedure calls defines the parse tree.
Nonrecursive predictive parsing
- A nonrecursive predictive parser uses an explicit stack.
- See Fig. 4.19, p. 227, for a model of a table-driven predictive
parser.
- Parsing table for G:
                    Input Symbol
Nonterminal    a          +           *           $
S              S → a      S → +SS     S → *SS
Moves made by this predictive parser on input +a*aa.
(The top of the stack is to the left.)
Stack     Input      Output
S$        +a*aa$
+SS$      +a*aa$     S → +SS
SS$       a*aa$
aS$       a*aa$      S → a
S$        *aa$
*SS$      *aa$       S → *SS
SS$       aa$
aS$       aa$        S → a
S$        a$
a$        a$         S → a
$         $
Note that these moves trace out a leftmost derivation for the input.
6. Transformations on Grammars
- Two common language-preserving transformations are often applied to
grammars to try to make them parsable by top-down methods.
These are eliminating left recursion and left factoring.
- Eliminating left recursion: replace the productions
expr → expr + term
| term
by
expr → term expr'
expr' → + term expr'
| ε
Left factoring: replace the productions
stmt → if ( expr ) stmt else stmt
| if ( expr ) stmt
| other
by
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
7. Practice Problems
- Write down a CFG for regular expressions over the alphabet {a, b}.
Show a parse tree for the regular expression a | b*a.
- Using the nonterminals stmt and expr,
design context-free grammar productions to model
- C while-statements
- C for-statements
- C do-while statements
- Consider grammar G:
S → S S + | S S * | a
- What language does this grammar generate?
- Eliminate the left recursion from this grammar.
- Use the pumping lemma to show that $\{a^n b^n c^n \mid n \ge 0\}$
is not context free.
8. Reading
- ALSU, Sections 4.3, 4.4, 4.9.
- See The Lex & Yacc Page for yacc and bison tutorials and manuals.
aho@cs.columbia.edu