Computer Science Theory

Lecture 11: October 10, 2012

Decision and Closure Properties of CFL's

- Cocke-Younger-Kasami algorithm
- Testing emptiness of a CFG
- Closure properties of CFL's
- Nonclosure properties of CFL's
- Undecidable CFL problems

- Input: a Chomsky normal form CFG G = (V, T, P, S) and a string *w* = *a*_{1}*a*_{2}...*a*_{n} in T*.
- Output: "yes" if *w* is in L(G), "no" otherwise.
- Method: The CYK algorithm is a dynamic programming algorithm that fills in a triangular table X_{ij} with nonterminals A such that A ⇒* *a*_{i}*a*_{i+1}...*a*_{j}.

```
for i = 1 to n do
    if A → a_{i} is in P then
        add A to X_{ii}
fill in the table, row-by-row, from row 2 to row n
    fill in the cells in each row from left-to-right
        if (A → BC is in P) and for some i ≤ k < j
           (B is in X_{ik}) and (C is in X_{k+1,j}) then
            add A to X_{ij}
if S is in X_{1n} then
    output "yes"
else
    output "no"
```
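The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `cyk`, the encoding of productions as tuples, and the example grammar for { a^n b^n | n ≥ 1 } are all assumptions made for this sketch.

```python
from collections import defaultdict

def cyk(unit_rules, binary_rules, start, w):
    """Return True iff the CNF grammar derives the string w.

    unit_rules:   set of (A, a) pairs, one per production A -> a
    binary_rules: set of (A, B, C) triples, one per production A -> BC
    (This encoding is an assumption of the sketch.)
    """
    n = len(w)
    if n == 0:
        return False  # the productions encoded here cannot derive the empty string
    # X[(i, j)] holds the nonterminals that derive a_i ... a_j (1-indexed).
    X = defaultdict(set)
    for i in range(1, n + 1):
        for A, a in unit_rules:
            if a == w[i - 1]:
                X[(i, i)].add(A)
    for length in range(2, n + 1):            # row = substring length
        for i in range(1, n - length + 2):    # cells left to right
            j = i + length - 1
            for k in range(i, j):             # split point, i <= k < j
                for A, B, C in binary_rules:
                    if B in X[(i, k)] and C in X[(k + 1, j)]:
                        X[(i, j)].add(A)
    return start in X[(1, n)]

# Hypothetical example grammar: S -> AB | AT, T -> SB, A -> a, B -> b
UNIT = {("A", "a"), ("B", "b")}
BINARY = {("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")}

print(cyk(UNIT, BINARY, "S", "aabb"))
print(cyk(UNIT, BINARY, "S", "aab"))
```

The three nested loops over `length`, `i`, and `k` are where the cubic dependence on the input length comes from; the innermost loop over productions contributes a factor that depends only on the grammar.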

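The algorithm adds nonterminal A to X_{ij} iff there is a production A → BC in P where B ⇒* *a*_{i}...*a*_{k} and C ⇒* *a*_{k+1}...*a*_{j} for some *k* with *i* ≤ *k* < *j*. To compute X_{ij}, we examine at most *n* pairs of entries: (X_{ii}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), and so on until (X_{i,j-1}, X_{j,j}). Since the table has O(*n*^{2}) entries, the running time is O(*n*^{3}) for a fixed grammar.

- Problem: Given a CFG G, is L(G) empty?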
- The emptiness problem is decidable: L(G) is nonempty iff the start symbol of G is generating.
- The naive algorithm has O(*n*^{2}) time complexity, where *n* is the size of G (sum of the lengths of the productions).
- With a more sophisticated list-processing algorithm, the emptiness problem can be solved in linear time. See HMU, p. 302.
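A minimal sketch of the naive generating-symbols test, assuming productions are given as (head, body) pairs with the body a tuple of symbols. The function names and the two example grammars are hypothetical and chosen only for illustration. Each pass over the productions costs time proportional to the size of G, and there can be as many passes as there are nonterminals, which is where the quadratic bound comes from.

```python
def generating_symbols(terminals, productions):
    """Return the set of symbols that derive some terminal string.

    productions: list of (head, body) pairs; body is a tuple of symbols.
    """
    generating = set(terminals)   # every terminal generates itself
    changed = True
    while changed:                # repeat until no new symbol is added
        changed = False
        for head, body in productions:
            # A head is generating once every symbol in some body is.
            # An empty body (an epsilon-production) qualifies trivially.
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating

def is_empty(terminals, productions, start):
    """L(G) is empty iff the start symbol is not generating."""
    return start not in generating_symbols(terminals, productions)

# Hypothetical grammar 1: S -> AB, A -> a, B -> Bb  (B never terminates)
G1 = [("S", ("A", "B")), ("A", ("a",)), ("B", ("B", "b"))]
# Hypothetical grammar 2: S -> aS | b
G2 = [("S", ("a", "S")), ("S", ("b",))]

print(is_empty({"a", "b"}, G1, "S"))
print(is_empty({"a", "b"}, G2, "S"))
```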

- The context-free languages are closed under
  - substitution
    - Let Σ be an alphabet and let L_{a} be a language for each symbol *a* in Σ. These languages define a substitution *s* on Σ, with *s*(*a*) = L_{a}.
    - If *w* = *a*_{1}*a*_{2}...*a*_{n} is a string in Σ*, then *s*(*w*) = { *x*_{1}*x*_{2}...*x*_{n} | *x*_{i} is a string in *s*(*a*_{i}) for 1 ≤ *i* ≤ *n* }.
    - If L is a language, *s*(L) is the union of *s*(*w*) over all *w* in L.
    - If L is a CFL over Σ and *s*(*a*) is a CFL for each *a* in Σ, then *s*(L) is a CFL.
  - union
  - concatenation
  - Kleene star
  - homomorphism
  - reversal
  - intersection with a regular set
  - inverse homomorphism
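Closure under union, concatenation, and Kleene star can also be seen directly from the standard grammar constructions, sketched below. This is an illustration rather than the proof used in the lecture, which goes through substitution. The grammar encoding (a dict from nonterminal to a list of bodies, paired with a start symbol), the function names, and the example grammars are assumptions; the sketch also assumes the two grammars use disjoint nonterminal names and that `new_start` is fresh.

```python
def union(g1, g2, new_start="S"):
    """Grammar for L(g1) ∪ L(g2): add S -> S1 | S2."""
    (p1, s1), (p2, s2) = g1, g2
    return ({**p1, **p2, new_start: [(s1,), (s2,)]}, new_start)

def concat(g1, g2, new_start="S"):
    """Grammar for L(g1)L(g2): add S -> S1 S2."""
    (p1, s1), (p2, s2) = g1, g2
    return ({**p1, **p2, new_start: [(s1, s2)]}, new_start)

def star(g1, new_start="S"):
    """Grammar for L(g1)*: add S -> S1 S | epsilon (the empty tuple)."""
    p1, s1 = g1
    return ({**p1, new_start: [(s1, new_start), ()]}, new_start)

# Hypothetical example grammars:
#   GA: S1 -> a S1 b | epsilon     generates { a^n b^n | n >= 0 }
#   GB: S2 -> c S2 | c             generates { c^n | n >= 1 }
GA = ({"S1": [("a", "S1", "b"), ()]}, "S1")
GB = ({"S2": [("c", "S2"), ("c",)]}, "S2")

productions, start = union(GA, GB)
print(start, productions[start])
```

Each construction only adds productions for one new start symbol, so the result is again a context-free grammar.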

- The context-free languages are not closed under
  - intersection
    - L_{1} = { *a*^{n}*b*^{n}*c*^{i} | *n*, *i* ≥ 0 } and L_{2} = { *a*^{i}*b*^{n}*c*^{n} | *n*, *i* ≥ 0 } are CFL's. But L = L_{1} ∩ L_{2} = { *a*^{n}*b*^{n}*c*^{n} | *n* ≥ 0 } is not a CFL.
  - complement
    - Suppose comp(L) is context free whenever L is context free. Since L_{1} ∩ L_{2} = comp(comp(L_{1}) ∪ comp(L_{2})) and the CFL's are closed under union, this would imply the CFL's are closed under intersection, a contradiction.
  - difference
    - Suppose L_{1} - L_{2} is context free whenever L_{1} and L_{2} are context free. If L is a CFL over Σ, then comp(L) = Σ* - L would be context free, since Σ* is a CFL. This contradicts nonclosure under complement.
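As a sanity check on the intersection example, the brute-force sketch below enumerates all strings over {a, b, c} up to a small length and lists those in both L1 and L2. The helper names and the length bound are assumptions of the sketch. It only confirms the set identity L1 ∩ L2 = { a^n b^n c^n } on short strings; that this language is not context free has to be shown separately, for example with the pumping lemma.

```python
import re
from itertools import product

def blocks(w):
    """Return (#a, #b, #c) if w has the form a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return None if m is None else tuple(len(g) for g in m.groups())

def in_L1(w):
    """Membership in L1 = { a^n b^n c^i | n, i >= 0 }."""
    b = blocks(w)
    return b is not None and b[0] == b[1]

def in_L2(w):
    """Membership in L2 = { a^i b^n c^n | n, i >= 0 }."""
    b = blocks(w)
    return b is not None and b[1] == b[2]

def intersection_up_to(max_len):
    """All strings of length <= max_len that lie in both L1 and L2."""
    found = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if in_L1(w) and in_L2(w):
                found.append(w)
    return found

print(intersection_up_to(6))   # ['', 'abc', 'aabbcc']
```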

- We say a problem that cannot be solved by any Turing machine is *undecidable*. There is no algorithm that can solve an undecidable problem.
- We shall see that several fundamental questions about context-free grammars and languages are undecidable, such as:
  - Is a given CFG ambiguous?
  - Given a CFG, is there another equivalent CFG that is unambiguous?
  - Do two given CFG's generate the same language?
  - Is the intersection of the languages generated by two CFG's empty?
  - Given a CFG G = (V, T, P, S), is L(G) = T*?

- Let G be the following grammar:

  ```
  S → AB | BC
  A → BA | a
  B → CC | b
  C → AB | a
  ```

  Use the CYK algorithm to determine whether `aabab` is in L(G).
- Modify the CYK algorithm to report the number of distinct parse trees for a given string w in a CNF grammar G.
- Let min(L) = { w | w is in L but no proper prefix of w is in L }. Are the CFL's closed under the min operation?
- Let max(L) = { w | w is in L but for no string x other than ε is wx in L }. Are the CFL's closed under the max operation?
- Let init(L) = { w | wx is in L for some string x (possibly the empty string) }. Are the CFL's closed under the init operation?
- Let cycle(L) = { w | we can write w as xy where yx is in L }. Are the CFL's closed under the cycle operation?
- Let half(L) = { w | there exists a string x such that |w| = |x| and wx is in L }. Are the CFL's closed under the half operation?

- HMU: Ch. 7

aho@cs.columbia.edu