Computer Science Theory

Lecture 9: October 3, 2012

CFGs and PDAs

- From a CFG to a PDA
- From a PDA to a CFG
- Eliminating useless symbols
- Eliminating ε-productions
- Eliminating unit productions
- Chomsky normal form

- Given a CFG *G*, we can construct a PDA *P* such that N(*P*) = L(*G*).
- The PDA will simulate leftmost derivations of *G*.
- Algorithm to construct a PDA for a CFG
  - Input: a CFG *G* = (V, T, Q, S).
  - Output: a PDA *P* such that N(*P*) = L(*G*).
  - Method: Let *P* = ({*q*}, T, V ∪ T, δ, *q*, S) where
    - δ(*q*, ε, *A*) = {(*q*, β) | *A* → β is in Q} for each nonterminal *A* in V.
    - δ(*q*, *a*, *a*) = {(*q*, ε)} for each terminal *a* in T.
- For a given input string *w*, the PDA simulates a leftmost derivation for *w* in *G*.
- We can prove that N(*P*) = L(*G*) by showing that *w* is in N(*P*) iff *w* is in L(*G*):
  - If part: If *w* is in L(*G*), then there is a leftmost derivation S = γ_{1} ⇒ γ_{2} ⇒ ... ⇒ γ_{n} = *w*.
    - We show by induction on *i* that *P* simulates this leftmost derivation by the sequence of moves (*q*, *w*, S) |–^{*} (*q*, *y*_{i}, α_{i}) such that if γ_{i} = *x*_{i}α_{i}, then *x*_{i}*y*_{i} = *w*.
  - Only-if part: If (*q*, *x*, A) |–^{*} (*q*, ε, ε), then A ⇒^{*} *x*.
    - We can prove this statement by induction on the number of moves made by *P*.
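
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the lecture: the names `cfg_to_pda` and `accepts_by_empty_stack` are hypothetical, every grammar symbol is assumed to be a single character, and the simulator assumes every stack symbol derives at least one terminal (no ε-productions), which lets it discard configurations whose stack is longer than the remaining input.

```python
from collections import deque

def cfg_to_pda(terminals, productions):
    """Build delta for the one-state PDA; keys are (state, input or '', stack top)."""
    delta = {}
    for head, bodies in productions.items():
        # delta(q, eps, A) = {(q, beta) | A -> beta is a production}
        delta[("q", "", head)] = {("q", tuple(body)) for body in bodies}
    for a in terminals:
        # delta(q, a, a) = {(q, eps)}: match a terminal on the stack against the input
        delta[("q", a, a)] = {("q", ())}
    return delta

def accepts_by_empty_stack(delta, start_symbol, w):
    """Breadth-first search over configurations (state, remaining input, stack)."""
    start = ("q", w, (start_symbol,))
    seen = {start}
    queue = deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if not rest and not stack:
            return True  # input consumed and stack empty
        if not stack or len(stack) > len(rest):
            continue  # pruning: assumes each stack symbol yields >= 1 terminal
        top, below = stack[0], stack[1:]
        moves = [(rest, m) for m in delta.get((state, "", top), ())]
        if rest:
            moves += [(rest[1:], m) for m in delta.get((state, rest[0], top), ())]
        for new_rest, (new_state, pushed) in moves:
            config = (new_state, new_rest, pushed + below)
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False

# Example grammar (assumed for illustration): arithmetic expressions.
grammar = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]}
delta = cfg_to_pda("+*()a", grammar)
print(accepts_by_empty_stack(delta, "E", "a+a*a"))  # True
print(accepts_by_empty_stack(delta, "E", "a+"))     # False
```

The ε-moves on a nonterminal expand it by one production, which is exactly one step of a leftmost derivation; the matching moves consume the terminals that the derivation has already fixed.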

- Given a PDA *P*, we can construct a CFG *G* such that L(*G*) = N(*P*).
- The basic idea of the proof is to generate the strings that cause *P* to go from state *q* to state *p*, popping a symbol X off the stack, by a nonterminal of the form [*q*X*p*].
- Algorithm to construct a CFG for a PDA
  - Input: a PDA *P* = (Q, Σ, Γ, δ, q_{0}, Z_{0}, F).
  - Output: a CFG *G* = (V, Σ, R, S) such that L(*G*) = N(*P*).
  - Method:
    - Let the nonterminal S be the start symbol of *G*. The other nonterminals in V will be symbols of the form [*p*X*q*] where *p* and *q* are states in Q, and X is a stack symbol in Γ.
    - The set of productions R is constructed as follows:
      - For all states *p*, R has the production S → [*q*_{0}Z_{0}*p*].
      - If δ(*q*, *a*, X) contains (*r*, Y_{1}Y_{2} … Y_{k}), then R has the productions [*q*X*r*_{k}] → *a* [*r*Y_{1}*r*_{1}] [*r*_{1}Y_{2}*r*_{2}] … [*r*_{k-1}Y_{k}*r*_{k}] for all lists of states *r*_{1}, *r*_{2}, … , *r*_{k}. Here *a* is a symbol in Σ or ε; when *k* = 0, the production is [*q*X*r*] → *a*.
- We can prove that [*q*X*p*] ⇒^{*} *w* iff (*q*, *w*, X) |–^{*} (*p*, ε, ε).
- From this, we have [*q*_{0}Z_{0}*p*] ⇒^{*} *w* iff (*q*_{0}, *w*, Z_{0}) |–^{*} (*p*, ε, ε), so we can conclude L(*G*) = N(*P*).
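
As a rough illustration of the production rules above, the following Python sketch enumerates the productions of *G*. The function name `pda_to_cfg` and the encoding are assumptions made for this example: a nonterminal [*q*X*p*] is the tuple `(q, X, p)`, ε is the empty string, and the pushed string Y_{1} … Y_{k} is a tuple whose first element is the new top of the stack.

```python
from itertools import product

def pda_to_cfg(states, delta, q0, z0):
    """Return productions as (head, body) pairs for a PDA accepting by empty stack.

    delta maps (q, a, X) to a set of (r, pushed) moves; a == '' stands for epsilon.
    """
    # S -> [q0 Z0 p] for every state p
    rules = [("S", [(q0, z0, p)]) for p in sorted(states)]
    for (q, a, X), moves in sorted(delta.items()):
        for r, pushed in sorted(moves):
            k = len(pushed)
            # one production per list of states r1, ..., rk
            for rs in product(sorted(states), repeat=k):
                chain = (r,) + rs
                body = [a] if a else []
                body += [(chain[i], pushed[i], chain[i + 1]) for i in range(k)]
                # head is [q X rk]; when k == 0 this is [q X r]
                rules.append(((q, X, chain[-1]), body))
    return rules

# Hypothetical one-state PDA accepting { 0^n 1^n : n >= 1 } by empty stack.
delta = {
    ("q", "0", "Z"): {("q", ("Z", "B")), ("q", ("B",))},
    ("q", "1", "B"): {("q", ())},
}
for head, body in pda_to_cfg({"q"}, delta, "q", "Z"):
    print(head, "->", body)
```

With |Q| states, a move that pushes *k* symbols contributes |Q|^k productions, so the grammar can be large; many of its nonterminals are typically useless and can be removed afterwards.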

- A symbol X is *useful* for a CFG if there is a derivation of the form S ⇒^{*} αXβ ⇒^{*} w for some string of terminals w.
- If X is not useful, then we say X is *useless*.
- To be useful, a symbol X needs to be
  - *generating*; that is, X needs to be able to derive some string of terminals.
  - *reachable*; that is, there needs to be a derivation of the form S ⇒^{*} αXβ where α and β are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
  - identify the nongenerating symbols and eliminate all productions containing one or more of these symbols, and then
  - eliminate all productions containing symbols that are not reachable from the start symbol.

- In the grammar

```
S → AB | a
A → b
```

`S`, `A`, `a`, and `b` are generating. `B` is not generating.

- Eliminating the productions that contain `B` leaves

```
S → a
A → b
```

- `A` is not reachable from `S`, so we can eliminate the second production to get `S → a`.

- If a language L has a CFG, then L - { ε } has a CFG without any ε-productions.
- A nonterminal A in a grammar is *nullable* if A ⇒^{*} ε.
- The nullable nonterminals can be determined iteratively.
- We can eliminate all ε-productions in a grammar as follows:
  - Eliminate all productions with ε bodies.
  - Suppose A → X_{1}X_{2} ... X_{k} is a production and *m* of the *k* X_{i}'s are nullable. Then add the 2^{m} versions of this production where the nullable X_{i}'s are present or absent. (But if all symbols are nullable, do not add an ε-production.)
- Let us eliminate the ε-productions from the grammar G

```
S → AB
A → aAA | ε
B → bBB | ε
```

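- Here `A` and `B` are nullable, and therefore so is `S`.
- For the production `S → AB` we add the productions `S → A | B`.
- For the production `A → aAA` we add the productions `A → aA | a`.
- For the production `B → bBB` we add the productions `B → bB | b`.
- After removing the ε-productions, the resulting grammar is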

```
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
```
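
The procedure can be sketched in Python as below. The function names are hypothetical and the encoding is an assumption for illustration: each body is a string of single-character symbols, and the empty string stands for an ε body.

```python
from itertools import product

def nullable_nonterminals(productions):
    """Iteratively collect every A with A =>* epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # an empty body counts as "all symbols nullable"
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable

def eliminate_epsilon(productions):
    nullable = nullable_nonterminals(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            # each nullable symbol may be present or absent: 2^m versions
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:  # never add an epsilon body
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result

G = {"S": ["AB"], "A": ["aAA", ""], "B": ["bBB", ""]}
for head, bodies in eliminate_epsilon(G).items():
    print(head, "->", " | ".join(sorted(bodies)))
```

Collecting bodies in a set removes duplicates such as the two ways of obtaining `aA` from `aAA`.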

- A *unit* production is one of the form `A → B` where both `A` and `B` are nonterminals.
- Let us assume we are given a grammar G with no ε-productions.
- From G we can create an equivalent grammar H with no unit productions as follows.
  - Define (A, B) to be a unit pair if A ⇒^{*} B in G.
  - We can inductively construct all unit pairs for G.
  - For each unit pair (A, B) in G, we add to H the productions A → α where B → α is a nonunit production of G.
- Consider the standard grammar G for arithmetic expressions:

```
E → E + T | T
T → T * F | F
F → ( E ) | a
```

- The unit pairs are `(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)`.
- The equivalent grammar H with no unit productions is:

```
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
```
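
A minimal Python sketch of the unit-pair construction follows. The names `unit_pairs` and `eliminate_unit_productions` are hypothetical, and bodies are encoded as tuples of symbols (an assumption for this example) so that multi-character tokens would also work.

```python
def unit_pairs(productions):
    """Inductively build all (A, B) with A =>* B via unit productions."""
    nonterminals = set(productions)
    pairs = {(a, a) for a in nonterminals}  # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in productions[b]:
                # induction: (A, B) is a pair and B -> C is a unit production
                if len(body) == 1 and body[0] in nonterminals:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs

def eliminate_unit_productions(productions):
    nonterminals = set(productions)
    result = {a: [] for a in productions}
    for a, b in sorted(unit_pairs(productions)):
        for body in productions[b]:
            is_unit = len(body) == 1 and body[0] in nonterminals
            if not is_unit and body not in result[a]:
                result[a].append(body)  # A -> alpha for nonunit B -> alpha
    return result

G = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
for head, bodies in eliminate_unit_productions(G).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

The bodies may be printed in a different order than in H above, but the set of productions for each nonterminal is the same.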

- A grammar G is in Chomsky Normal Form if each production in G has one of two forms:
  - A → BC where A, B, and C are nonterminals, or
  - A → a where a is a terminal.
- We will further assume G has no useless symbols.
- Every context-free language without ε can be generated by a Chomsky Normal Form grammar.
- Let us assume we have a CFG G with no useless symbols, ε-productions, or unit productions. We can transform G into an equivalent Chomsky Normal Form grammar as follows:
  - Arrange that all bodies of length two or more consist only of nonterminals.
  - Replace bodies of length three or more with a cascade of productions, each with a body of two nonterminals.
- Applying these two transformations to the grammar H above, we get:

```
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
```
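
The two transformations can be sketched in Python as below. This is an illustrative sketch only: the function `to_cnf` and the generated names (`T_+`, `C1`, and so on) are hypothetical and assumed not to clash with existing nonterminals, and the input is assumed to have no ε-productions or unit productions, so every body of length one is a terminal.

```python
def to_cnf(productions, terminals):
    result = {head: [] for head in productions}
    proxy = {}    # terminal -> nonterminal that derives exactly that terminal
    cascade = {}  # tail of a long body -> nonterminal that derives that tail

    def proxy_for(t):
        if t not in proxy:
            proxy[t] = f"T_{t}"
            result[proxy[t]] = [(t,)]
        return proxy[t]

    def cascade_for(tail):
        if tail not in cascade:
            name = f"C{len(cascade) + 1}"
            cascade[tail] = name
            result[name] = [binarize(tail)]
        return cascade[tail]

    def binarize(body):
        # step 2: A -> X1 X2 ... Xn becomes A -> X1 C, C -> X2 ... Xn
        if len(body) == 2:
            return body
        return (body[0], cascade_for(body[1:]))

    for head, bodies in productions.items():
        for body in bodies:
            if len(body) >= 2:
                # step 1: bodies of length >= 2 use nonterminals only
                body = tuple(proxy_for(s) if s in terminals else s for s in body)
                body = binarize(body)
            result[head].append(body)
    return result

H = {
    "E": [("E", "+", "T"), ("T", "*", "F"), ("(", "E", ")"), ("a",)],
    "T": [("T", "*", "F"), ("(", "E", ")"), ("a",)],
    "F": [("(", "E", ")"), ("a",)],
}
for head, bodies in to_cnf(H, set("+*()a")).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

Up to renaming (`C1`, `C2`, `C3` play the roles of A, B, C, and `T_+`, `T_*`, `T_(`, `T_)` the roles of P, M, L, R), the output matches the grammar shown above. Reusing one cascade nonterminal per distinct tail is what lets E and T share the body for `T * F`.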

- Eliminate useless symbols from the following grammar:

```
S → AB | CA
A → a
B → BC | AB
C → aB | b
```

- Put the following grammar into Chomsky Normal Form:

```
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
```

- HMU: Section 7.1

aho@cs.columbia.edu