Programming Languages and Translators

Lecture 7: Parsing Context-Free Grammars

February 13, 2013

- Yacc: a language for specifying syntax-directed translators
- The pumping lemma for context-free languages
- The parsing problem for context-free grammars
- Top-down parsing
- Transformations on grammars

- Yacc is popular language, first implemented by Steve Johnson of Bell Labs, for implementing syntax-directed translators.
- Bison is a gnu version of Yacc, upward compatible with the original Yacc, written by Charles Donnelly and Richard Stallman. Many other versions of Yacc are also available.
- The original Yacc used C for semantic actions. Yacc has been rewritten for many other languages including Java, ML, OCaml, and Python.
- Yacc specifications
- A Yacc program has three parts:

declarations`%%`

translation rules`%%`

supporting C-routines

`%%`

followed by the supporting C-routines) may be omitted.```
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double
%}
%token NUMBER
%left '+'
%left '*'
%%
lines : lines expr '\n' { printf("%g\n", $2); }
| lines '\n'
| /* empty */
;
expr : expr '+' expr { $$ = $1 + $3; }
| expr '*' expr { $$ = $1 * $3; }
| '(' expr ')' { $$ = $2; }
| NUMBER
;
%%
/* the lexical analyzer; returns <token-name, yylval> */
int yylex() {
int c;
while ((c = getchar()) == ' ');
if ((c == '.') || (isdigit(c))) {
ungetc(c, stdin);
scanf("%lf", &yylval);
return NUMBER;
}
return c;
}
```

- Put the yacc program in a file, say
`desk.y`

. - Invoke
`yacc desk.y`

to create the yacc output file`y.tab.c`

. - Compile this output file with a C compiler by typing
`gcc y.tab.c -ly`

to get`a.out`

. (The library -ly contains the Yacc parsing program.) `a.out`

is the desk calculator. Try it!

- The pumping lemma for context-free languages can be used to show certain languages are not context free.
- The pumping lemma: If L is a context-free language, then there exists a
constant
*n*such that if z is any string in L of length*n*or more, then z can be written as uvwxy subject to the following conditions:- The length of vwx is less than or equal to
*n*. - The length of vx is one or more. (That is, not both of v and x can be empty.)
- For all
*i*≥ 0, uv^{i}wx^{i}y is in L.

- The length of vwx is less than or equal to
- A typical proof using the pumping lemma to show a language L is not context free proceeds by assuming L is context free, and then finding a long string in L which, when pumped, yields a string not in L, thereby deriving a contradiction.
- Examples of non-context-free languages:
- {
`a`

|^{n}b^{n}c^{n}`n`

≥ 0 } - {
`ww`

|`w`

is in (a|b)* } - {
`a`

|^{m}b^{n}a^{m}b^{n}`n`

≥ 0 }

- The parsing problem for context-free grammars is given a CFG G and an input string w to construct all parse trees for w according to G, if w is in L(G).
- The Cocke-Younger-Kasami algorithm is a dynamic programming algorithm
that given a Chomsky Normal Form grammar G and an input string w will
create in O(|w|
^{3}) time a table from which all parse trees for w according to G can be constructed. - For compiler applications two styles of parsing algorithms are common: top-down parsing and bottom-up parsing.

- Top-down parsing consists of trying to construct a parse tree for an input string starting from the root and creating the nodes of the parse tree in preorder.
- Equivalently, top-down parsing consists of trying to find a leftmost derivation for the input string.
- Consider grammar
*G*:

```
S → + S S | * S S | a
```

```
S ⇒ + S S
⇒ + a S
⇒ + a * S S
⇒ + a * a S
⇒ + a * a a
```

- Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input string.
- One procedure is associated with each nonterminal of the grammar. See Fig. 4.13, p. 219.
- The sequence of successful procedure calls defines the parse tree.

- A nonrecursive predictive parser uses an explicit stack.
- See Fig. 4.19, p. 227, for a model of table-driven predictive parser.
- Parsing table for
*G*:

```
Input Symbol
Nonterminal a + * $
S S → a S → +SS S → *SS
```

`+a*aa`

.
(The top of the stack is to the left.)```
Stack Input Output
S$ +a*aa$
+SS$ +a*aa$ S → +SS
SS$ a*aa$
aS$ a*aa$ S → a
S$ *aa$
*SS$ *aa$ S → *SS
SS$ aa$
aS$ aa$ S → a
S$ a$
a$ a$ S → a
$ $
```

- Two common language-preserving transformations are often applied to grammars to try to make them parsable by top-down methods. These are eliminating left recursion and left factoring.
- Eliminating left recursion:
- Replace

```
expr → expr + term
| term
```

by
```
expr → term expr'
expr' → + term expr'
| ε
```

- Replace

```
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
```

by
```
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
```

- Write down a CFG for regular expressions over the alphabet
{
`a`

,`b`

}. Show a parse tree for the regular expression`a | b*a`

. - Using the nonterminals
`stmt`

and`expr`

, design context-free grammar productions to model - C while-statements
- C for-statements
- C do-while statements
- Consider grammar
*G*: - What language does this grammar generate?
- Eliminate the left recursion from this grammar.
- Use the pumping lemma to show that
{
`a`

|^{n}b^{n}c^{n}`n`

≥ 0 } is not context free.

```
S → S S + | S S * | a
```

- ALSU, Sections 4.3, 4.4, 4.9.
- See The Lex & Yacc Page for lex and yacc tutorials and manuals.
- Another nice Lex & Yacc tutorial

aho@cs.columbia.edu