Computer Science Theory

Lecture 10: October 8, 2012

Properties of Context-Free Languages

- Eliminating useless symbols
- Eliminating ε-productions
- Eliminating unit productions
- Chomsky normal form
- Pumping lemma for CFL's
- Cocke-Younger-Kasami algorithm

- A symbol X is *useful* for a CFG if there is a derivation of the form S ⇒* αXβ ⇒* w for some string of terminals w.
- If X is not useful, then we say X is *useless*.
- To be useful, a symbol X needs to be
  - *generating*; that is, X needs to be able to derive some string of terminals, and
  - *reachable*; that is, there needs to be a derivation of the form S ⇒* αXβ where α and β are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
  - identify the nongenerating symbols and eliminate all productions containing one or more of these symbols, and then
  - eliminate all productions containing symbols that are not reachable from the start symbol.
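- The two steps above can be sketched in Python. This is only an illustrative sketch: the representation (a list of `(head, body)` pairs plus an explicit set of nonterminals) and the function name `remove_useless` are assumptions made for this example, not part of the lecture.

```python
def remove_useless(productions, nonterminals, start):
    """productions: list of (head, body), body a tuple of symbols (assumed layout)."""

    def body_ok(body, generating):
        # Terminals always generate; nonterminals must be known generating.
        return all(s in generating or s not in nonterminals for s in body)

    # Step 1: find the generating nonterminals by iterating to a fixed point.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and body_ok(body, generating):
                generating.add(head)
                changed = True

    # Drop every production that mentions a nongenerating symbol.
    kept = [(h, b) for h, b in productions
            if h in generating and body_ok(b, generating)]

    # Step 2: find the symbols reachable from the start symbol.
    reachable = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for head, body in kept:
            if head == current:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        frontier.append(s)

    return [(h, b) for h, b in kept if h in reachable]


# The grammar S → AB | a, A → b, in which B has no productions.
G = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(remove_useless(G, {"S", "A", "B"}, "S"))  # [('S', ('a',))]
```

- The order of the two steps matters: removing the nongenerating symbols first can make further symbols unreachable, as happens with `A` in this grammar.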

- In the grammar

```
S → AB | a
A → b
```

`S`, `A`, `a`, and `b` are generating. `B` is not generating. Eliminating the productions that contain `B` leaves

```
S → a
A → b
```

`A` is not reachable from `S`, so we can eliminate the second production to get `S → a`.

- If a language L has a CFG, then L - { ε } has a CFG without any ε-productions.
- A nonterminal A in a grammar is *nullable* if A ⇒* ε.
- The nullable nonterminals can be determined iteratively: start with the nonterminals that have an ε-production, then repeatedly add any nonterminal that has a production whose body consists entirely of nullable nonterminals.
- We can eliminate all ε-productions in a grammar as follows:
  - Eliminate all productions with ε bodies.
  - Suppose A → X_{1}X_{2}...X_{k} is a production and *m* of the *k* X_{i}'s are nullable. Then add the 2^{m} versions of this production where the nullable X_{i}'s are present or absent. (But if all symbols are nullable, do not add an ε-production.)
- Let us eliminate the ε-productions from the grammar G

```
S → AB
A → aAA | ε
B → bBB | ε
```

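- `A` and `B` are nullable because each has an ε-production, and `S` is nullable because of `S → AB`.
- For `S → AB` we add the productions `S → A | B`.
- For `A → aAA` we add the productions `A → aA | a`.
- For `B → bBB` we add the productions `B → bB | b`.
- The resulting grammar, which generates L(G) - { ε }, is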

```
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
```
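- The construction can be sketched in Python as follows. The `(head, body)` representation, with the empty tuple standing for an ε body, and the names `nullable_set` and `eliminate_epsilon` are assumptions made for illustration.

```python
from itertools import product


def nullable_set(productions):
    """Compute the nullable nonterminals by iterating to a fixed point."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # all() over an empty body is True, which covers A → ε.
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    nullable = nullable_set(productions)
    result = set()
    for head, body in productions:
        # Each nullable symbol may be kept or dropped; others are always kept.
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(sym for part in choice for sym in part)
            if new_body:  # never add an ε-production
                result.add((head, new_body))
    return result


# The grammar G above: S → AB, A → aAA | ε, B → bBB | ε.
G = [("S", ("A", "B")),
     ("A", ("a", "A", "A")), ("A", ()),
     ("B", ("b", "B", "B")), ("B", ())]

for head, body in sorted(eliminate_epsilon(G)):
    print(head, "→", "".join(body))
```

- Collecting the results in a set merges duplicate versions, such as the two ways of obtaining `A → aA` from `A → aAA`.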

- A *unit* production is one of the form `A → B` where both `A` and `B` are nonterminals.
- Let us assume we are given a grammar G with no ε-productions.
- From G we can create an equivalent grammar H with no unit productions as follows.
  - Define (A, B) to be a unit pair if A ⇒* B in G.
  - We can inductively construct all unit pairs for G.
  - For each unit pair (A, B) in G, we add to H the productions A → α where B → α is a nonunit production of G.
- Consider the standard grammar G for arithmetic expressions:

```
E → E + T | T
T → T * F | F
F → ( E ) | a
```

- The unit pairs are `(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)`.
- The equivalent grammar H with no unit productions is:

```
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
```
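- A minimal Python sketch of this construction, assuming the same hypothetical `(head, body)` representation with an explicit set of nonterminals. The function name `eliminate_unit` is made up for this example.

```python
def eliminate_unit(productions, nonterminals):
    def is_unit(body):
        return len(body) == 1 and body[0] in nonterminals

    # Basis: (A, A) is a unit pair for every nonterminal A.
    pairs = {(a, a) for a in nonterminals}

    # Induction: if (A, B) is a unit pair and B → C is a unit production,
    # then (A, C) is a unit pair.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                if head == b and is_unit(body) and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True

    # For each unit pair (A, B), copy the nonunit productions of B to A.
    result = set()
    for a, b in pairs:
        for head, body in productions:
            if head == b and not is_unit(body):
                result.add((a, body))
    return pairs, result


# The expression grammar G above.
G = [("E", ("E", "+", "T")), ("E", ("T",)),
     ("T", ("T", "*", "F")), ("T", ("F",)),
     ("F", ("(", "E", ")")), ("F", ("a",))]

pairs, H = eliminate_unit(G, {"E", "T", "F"})
print(sorted(pairs))
for head, body in sorted(H):
    print(head, "→", " ".join(body))
```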

- A grammar G is in Chomsky Normal Form if each production in G has one of two forms:
  - A → BC where A, B, and C are nonterminals, or
  - A → a where a is a terminal.
- Every context-free language without ε can be generated by a Chomsky Normal Form grammar.
- Let us assume we have a CFG G with no useless symbols, ε-productions, or unit productions. We can transform G into an equivalent Chomsky Normal Form grammar as follows:
  - Arrange that all bodies of length two or more consist only of nonterminals.
  - Replace bodies of length three or more with a cascade of productions, each with a body of two nonterminals.
- Applying these two transformations to the grammar H above, we get:

```
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
```
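- The two transformations can be sketched in Python. The representation and the fresh nonterminal names (`T_+`, `X1`, and so on) are assumptions made for this example, and the sketch assumes they do not clash with existing symbols. Unlike the hand-built grammar above, it introduces a fresh nonterminal for every cascade step instead of sharing them, so its output is larger but generates the same language.

```python
def to_cnf(productions, nonterminals):
    """Assumes no useless symbols, ε-productions, or unit productions."""
    result = []
    term_var = {}  # terminal -> nonterminal that stands for it
    counter = 0    # used to make fresh cascade nonterminals

    def var_for(terminal):
        if terminal not in term_var:
            term_var[terminal] = "T_" + terminal  # hypothetical naming scheme
            result.append((term_var[terminal], (terminal,)))
        return term_var[terminal]

    for head, body in productions:
        if len(body) == 1:
            result.append((head, body))  # A → a is already in CNF
            continue
        # Transformation 1: bodies of length >= 2 use only nonterminals.
        body = [s if s in nonterminals else var_for(s) for s in body]
        # Transformation 2: break long bodies into a cascade of pairs.
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, tuple(body)))
    return result


# The grammar H with no unit productions.
H = [("E", ("E", "+", "T")), ("E", ("T", "*", "F")), ("E", ("(", "E", ")")), ("E", ("a",)),
     ("T", ("T", "*", "F")), ("T", ("(", "E", ")")), ("T", ("a",)),
     ("F", ("(", "E", ")")), ("F", ("a",))]

cnf = to_cnf(H, {"E", "T", "F"})
for head, body in cnf:
    print(head, "→", " ".join(body))
```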

- Pumping lemma for CFL's: for every nonfinite context-free language L, there exists a constant *n* that depends on L such that for all *z* in L with |*z*| ≥ *n*, we can write *z* as *uvwxy* where
  1. *vx* ≠ ε,
  2. |*vwx*| ≤ *n*, and
  3. for all *i* ≥ 0, the string *uv*^{i}*wx*^{i}*y* is in L.
- Proof: See HMU, pp. 281-282.
- One important use of the pumping lemma is to prove certain languages are not context free.
- Example: The language L = { *a*^{n}*b*^{n}*c*^{n} | *n* ≥ 0 } is not context free.
- The proof will be by contradiction. Assume L is context free. Then by the pumping lemma there is a constant *n* associated with L such that for all *z* in L with |*z*| ≥ *n*, *z* can be written as *uvwxy* such that
  1. *vx* ≠ ε,
  2. |*vwx*| ≤ *n*, and
  3. for all *i* ≥ 0, the string *uv*^{i}*wx*^{i}*y* is in L.
- Consider the string *z* = *a*^{n}*b*^{n}*c*^{n}.
- From condition (2), *vwx* cannot contain both *a*'s and *c*'s, because *n* *b*'s separate the last *a* from the first *c*.
- Two cases arise:
  1. *vwx* has no *c*'s. Then *uwy* still has *n* *c*'s but fewer than *n* *a*'s or fewer than *n* *b*'s, since at least one of *v* or *x* is nonempty. So *uwy* cannot be in L.
  2. *vwx* has no *a*'s. Then *uwy* still has *n* *a*'s but fewer than *n* *b*'s or fewer than *n* *c*'s. Again, *uwy* cannot be in L.
- In both cases we have a contradiction with condition (3) for *i* = 0, so we must conclude L cannot be context free. The details of the proof can be found in HMU, p. 284.
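- As a finite sanity check of this argument (not a proof), the following Python sketch enumerates every way of writing *z* = *a*^{n}*b*^{n}*c*^{n} as *uvwxy* under conditions (1) and (2) for a few small *n* and confirms that none of them survives pumping with *i* = 0 and *i* = 2. The function names `in_L` and `pumpable` are made up for this example.

```python
def in_L(s):
    """Membership test for L = { a^n b^n c^n | n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def pumpable(z, n):
    """Return some (u, v, w, x, y) with vx != ε and |vwx| <= n whose pumped
    strings for i = 0 and i = 2 are both in L, or None if there is none."""
    m = len(z)
    for start in range(m + 1):                        # vwx = z[start:end]
        for end in range(start, min(start + n, m) + 1):
            for p in range(start, end + 1):           # v = z[start:p]
                for q in range(p, end + 1):           # w = z[p:q], x = z[q:end]
                    u, v, w, x, y = z[:start], z[start:p], z[p:q], z[q:end], z[end:]
                    if v + x == "":
                        continue                      # condition (1) fails
                    if all(in_L(u + v * i + w + x * i + y) for i in (0, 2)):
                        return (u, v, w, x, y)
    return None


for n in range(1, 5):
    z = "a" * n + "b" * n + "c" * n
    print(n, pumpable(z, n))  # prints None for each n
```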

- Input: a Chomsky normal form CFG G = (V, T, P, S) and a string *w* = *a*_{1}*a*_{2}...*a*_{n} in T*.
- Output: "yes" if *w* is in L(G), "no" otherwise.
- Method: The CYK algorithm is a dynamic programming algorithm that fills in a triangular table X_{ij} with nonterminals A such that A ⇒* *a*_{i}*a*_{i+1}...*a*_{j}.

```
for i = 1 to n do
    if A → a_{i} is in P then
        add A to X_{ii}
fill in the table, row-by-row, from row 2 to row n
fill in the cells in each row from left-to-right
    if (A → BC is in P) and for some i ≤ k < j
       (B is in X_{ik}) and (C is in X_{k+1,j}) then
        add A to X_{ij}
if S is in X_{1n} then
    output "yes"
else
    output "no"
```
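- The pseudocode above can be sketched in Python as follows. The table is indexed from 0 rather than 1, row *r* of the table corresponds to substrings of length *r*, and the `(head, body)` representation and the function name `cyk` are assumptions made for this example. The test grammar is the Chomsky Normal Form expression grammar derived earlier in these notes.

```python
def cyk(productions, start, w):
    """productions: list of (head, body); body is (terminal,) or (B, C)."""
    n = len(w)
    if n == 0:
        return False  # a CNF grammar cannot derive ε

    # X[i][j] holds the nonterminals that derive w[i..j] (inclusive, 0-based).
    X = [[set() for _ in range(n)] for _ in range(n)]

    # Row 1: substrings of length 1.
    for i in range(n):
        for head, body in productions:
            if body == (w[i],):
                X[i][i].add(head)

    # Rows 2..n: substrings of increasing length, left to right.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point, i <= k < j
                for head, body in productions:
                    if (len(body) == 2
                            and body[0] in X[i][k]
                            and body[1] in X[k + 1][j]):
                        X[i][j].add(head)

    return start in X[0][n - 1]


CNF = [("E", ("E", "A")), ("E", ("T", "B")), ("E", ("L", "C")), ("E", ("a",)),
       ("A", ("P", "T")), ("P", ("+",)),
       ("B", ("M", "F")), ("M", ("*",)),
       ("L", ("(",)), ("C", ("E", "R")), ("R", (")",)),
       ("T", ("T", "B")), ("T", ("L", "C")), ("T", ("a",)),
       ("F", ("L", "C")), ("F", ("a",))]

print(cyk(CNF, "E", "a+a*a"))  # True
print(cyk(CNF, "E", "a+"))     # False
```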

- The algorithm adds nonterminal A to X_{ij} iff there is a production A → BC in P where B ⇒* *a*_{i}...*a*_{k} and C ⇒* *a*_{k+1}...*a*_{j} for some i ≤ k < j.
- To compute X_{ij}, we examine at most n pairs of entries: (X_{ii}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), and so on until (X_{i,j-1}, X_{j,j}).

- Eliminate useless symbols from the following grammar:

```
S → AB | CA
A → a
B → BC | AB
C → aB | b
```

- Put the following grammar into Chomsky Normal Form:

```
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
```

- Show that { *a*^{n}*b*^{n}*c*^{n} | *n* ≥ 0 } is not context free.
- Show that { *a*^{n}*b*^{n}*c*^{i} | *i* ≤ *n* } is not context free.
- Show that { *ss*^{R}*s* | *s* is a string of *a*'s and *b*'s } is not context free.
- (Hard) Show that the complement of { *ss* | *s* is a string of *a*'s and *b*'s } is context free.

- HMU: Ch. 7

aho@cs.columbia.edu