Programming Languages and Translators

Lecture 5: Implementing a Lexical Analyzer

February 6, 2013

- Finite automata
- Converting an NFA to a DFA
- Equivalence of regular expressions and finite automata
- Simulating an NFA
- The pumping lemma for regular languages
- Closure and decision properties of regular languages

- Variants of finite automata are commonly used to match regular expression patterns.
- A nondeterministic finite automaton (NFA) consists of
  - A finite set of states *S*.
  - An input alphabet consisting of a finite set of symbols Σ.
  - A transition function δ that maps *S* × (Σ ∪ {ε}) to subsets of *S*. This transition function can be represented by a transition graph in which the nodes are labeled by states and there is a directed edge labeled *a* from node *w* to node *v* if δ(*w*, *a*) contains *v*.
  - An initial state *s*_{0} in *S*.
  - *F*, a subset of *S*, called the final (or accepting) states.
- An NFA accepts an input string *x* iff there is a path in the transition graph from the initial state to a final state that spells out *x*.
- The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
  - There are no ε moves, and
  - For each state *s* and input symbol *a* there is exactly one transition out of *s* labeled *a*.
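
The two DFA conditions can be checked mechanically. The following is a minimal Python sketch, not part of the lecture: it assumes the transition function is stored as a dictionary from (state, symbol) pairs to sets of states, with the empty string standing in for ε. The names `EPSILON`, `is_dfa`, and `run_dfa` are illustrative choices.

```python
EPSILON = ""  # assumed encoding of the label on an ε-move


def is_dfa(states, alphabet, delta):
    """Return True if the automaton satisfies both DFA conditions.

    delta maps (state, symbol) -> set of next states.
    """
    # Condition 1: no ε-moves.
    for (_state, symbol), targets in delta.items():
        if symbol == EPSILON and targets:
            return False
    # Condition 2: exactly one transition per state and input symbol.
    return all(len(delta.get((s, a), set())) == 1
               for s in states for a in alphabet)


def run_dfa(delta, start, finals, text):
    """Follow the unique path spelled out by text; accept if it ends in a final state."""
    state = start
    for ch in text:
        (state,) = delta[(state, ch)]  # unpack the single successor
    return state in finals


# Two automata for "strings over {a, b} that end in ab".
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa_delta = {(0, "a"): {1}, (0, "b"): {0},
             (1, "a"): {1}, (1, "b"): {2},
             (2, "a"): {1}, (2, "b"): {0}}

print(is_dfa({0, 1, 2}, {"a", "b"}, nfa_delta))  # False
print(is_dfa({0, 1, 2}, {"a", "b"}, dfa_delta))  # True
print(run_dfa(dfa_delta, 0, {2}, "aab"))         # True
```

In this sketch `run_dfa` raises `KeyError` on a symbol outside the alphabet; a production scanner would report a lexical error instead.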

- Every NFA can be converted to an equivalent DFA using the subset construction (Algorithm 3.20, ALSU, pp. 153-154).
- Every DFA can be converted into an equivalent minimum-state DFA using Algorithm 3.39 (ALSU, pp. 181-183). All minimum-state DFAs for the same language are identical up to state renaming.
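
The subset construction can be sketched in a few lines of Python. This is an illustration of the general idea (each DFA state is the set of NFA states reachable on some input), not a transcription of the textbook's pseudocode. The dictionary encoding of δ, the use of the empty string for ε, and all function names are assumptions made for this example.

```python
from collections import deque

EPSILON = ""  # assumed encoding of the label on an ε-move


def epsilon_closure(states, delta):
    """All NFA states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol, delta):
    """All NFA states reachable from `states` by one transition on `symbol`."""
    return {t for s in states for t in delta.get((s, symbol), ())}


def subset_construction(alphabet, delta, start, finals):
    """Return (dfa_states, dfa_delta, dfa_start, dfa_finals)."""
    d_start = epsilon_closure({start}, delta)
    d_delta, seen, work = {}, {d_start}, deque([d_start])
    while work:
        current = work.popleft()
        for a in alphabet:
            target = epsilon_closure(move(current, a, delta), delta)
            d_delta[(current, a)] = target
            if target not in seen:  # a DFA state not met before
                seen.add(target)
                work.append(target)
    # A DFA state is final if it contains at least one final NFA state.
    d_finals = {state for state in seen if state & finals}
    return seen, d_delta, d_start, d_finals


# NFA for (a|b)*abb with states 0..3 and no ε-moves.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
states, trans, d_start, d_finals = subset_construction({"a", "b"}, nfa, 0, {3})
print(len(states))  # 4
```

Only reachable subsets are generated, so the result is often far smaller than the worst-case 2^|S| states. If some `move` is empty, the empty set appears as an explicit dead state.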

- Regular expressions and finite automata define the same class of languages, namely the regular sets.
- Every regular expression can be converted into an equivalent NFA using the McNaughton-Yamada-Thompson algorithm (Algorithm 3.23, ALSU, pp. 159-161).
- Every finite automaton can be converted into a regular expression using Kleene's algorithm.
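
As an illustration of the regular-expression-to-NFA direction, here is a minimal Python sketch in the style of the McNaughton-Yamada-Thompson construction. It assumes the regular expression has already been parsed into nested tuples, a hypothetical encoding chosen for this example: `("sym", c)`, `("cat", r, s)`, `("alt", r, s)`, and `("star", r)`. The empty string stands in for ε, and the small `accepts` helper is included only so the result can be exercised.

```python
import itertools

EPSILON = ""  # assumed encoding of the label on an ε-move


def thompson(ast):
    """Return (delta, start, accept, num_states) for the given AST."""
    counter = itertools.count()
    delta = {}

    def edge(src, label, dst):
        delta.setdefault((src, label), set()).add(dst)

    def build(node, start):
        """Add an NFA for `node` beginning at `start`; return its accepting state."""
        kind = node[0]
        if kind == "sym":
            accept = next(counter)
            edge(start, node[1], accept)
            return accept
        if kind == "cat":
            # The accepting state of the first part is the start of the second.
            return build(node[2], build(node[1], start))
        if kind == "alt":
            ends = []
            for branch in node[1:]:
                branch_start = next(counter)
                edge(start, EPSILON, branch_start)
                ends.append(build(branch, branch_start))
            accept = next(counter)
            for end in ends:
                edge(end, EPSILON, accept)
            return accept
        if kind == "star":
            inner_start = next(counter)
            edge(start, EPSILON, inner_start)
            inner_end = build(node[1], inner_start)
            accept = next(counter)
            edge(inner_end, EPSILON, inner_start)  # repeat the body
            edge(inner_end, EPSILON, accept)       # leave the loop
            edge(start, EPSILON, accept)           # zero repetitions
            return accept
        raise ValueError(f"unknown node kind: {kind!r}")

    start = next(counter)
    accept = build(ast, start)
    return delta, start, accept, next(counter)  # next value == states used


def accepts(delta, start, accept, text):
    """Set-based NFA simulation, used here only to check the construction."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in delta.get((stack.pop(), EPSILON), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({t for s in current for t in delta.get((s, ch), ())})
    return accept in current


# (a|b)*abb
a, b = ("sym", "a"), ("sym", "b")
ast = ("cat", ("cat", ("cat", ("star", ("alt", a, b)), a), b), b)
delta, start, accept, num_states = thompson(ast)
print(num_states)                            # 11
print(accepts(delta, start, accept, "abb"))  # True
```

The construction adds at most two states per operator or operand, so the NFA size is linear in the size of the regular expression.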

- Two-stack simulation of an NFA: Algorithm 3.22, ALSU, pp. 156-159.
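
A minimal Python sketch of a two-stack simulation follows. It is written from the general idea (one stack holds the current set of states, a second collects the next set, and a flag array prevents a state from being pushed twice) rather than copied from ALSU. The names `old_states`, `new_states`, `already_on`, and `add_state`, the dictionary encoding of δ, and the use of the empty string for ε are assumptions of this example.

```python
EPSILON = ""  # assumed encoding of the label on an ε-move


def simulate_nfa(num_states, delta, start, finals, text):
    """Return True if the NFA with states 0..num_states-1 accepts text."""
    already_on = [False] * num_states  # already_on[s]: s is on new_states
    old_states, new_states = [], []

    def add_state(s):
        """Push s and everything reachable from it by ε-moves onto new_states."""
        pending = [s]
        already_on[s] = True
        new_states.append(s)
        while pending:
            u = pending.pop()
            for t in delta.get((u, EPSILON), ()):
                if not already_on[t]:
                    already_on[t] = True
                    new_states.append(t)
                    pending.append(t)

    def swap():
        """Move new_states onto old_states and clear the flags."""
        while new_states:
            s = new_states.pop()
            old_states.append(s)
            already_on[s] = False

    add_state(start)
    swap()
    for ch in text:
        while old_states:
            s = old_states.pop()
            for t in delta.get((s, ch), ()):
                if not already_on[t]:
                    add_state(t)
        swap()
    return any(s in finals for s in old_states)


# NFA for (a|b)*abb with one ε-move out of the initial state.
nfa = {(0, EPSILON): {1},
       (1, "a"): {1, 2}, (1, "b"): {1},
       (2, "b"): {3}, (3, "b"): {4}}
print(simulate_nfa(5, nfa, 0, {4}, "aabb"))  # True
print(simulate_nfa(5, nfa, 0, {4}, "ab"))    # False
```

Each input character costs time proportional to the number of states plus transitions, so the whole run is O(|x| · (n + m)) without ever building the DFA.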

- The pumping lemma allows us to prove certain languages, like { `a`^{n}`b`^{n} | *n* ≥ 0 }, are not regular.
- **The pumping lemma.** If L is a regular language, then there exists a constant *n* associated with L such that for every string *w* in L where |*w*| ≥ *n*, we can partition *w* into three strings *xyz* (i.e., *w* = *xyz*) such that
  - *y* is not the empty string,
  - the length of *xy* is less than or equal to *n*, and
  - for all *k* ≥ 0, the string *xy*^{k}*z* is in L.
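
As a worked example, the standard argument that { `a`^{n}`b`^{n} | *n* ≥ 0 } is not regular can be written out as follows. The language is called $L$ and the exponent is renamed $m$ here only to keep it apart from the pumping constant $n$.

```latex
\begin{align*}
&\text{Let } L = \{\, a^m b^m \mid m \ge 0 \,\} \text{ and suppose } L \text{ is regular with pumping constant } n.\\
&\text{Choose } w = a^n b^n \in L, \text{ so } |w| = 2n \ge n.\\
&\text{For any partition } w = xyz \text{ with } |xy| \le n \text{ and } y \ne \varepsilon:\quad
  y = a^j \text{ for some } 1 \le j \le n.\\
&\text{Taking } k = 0:\quad xy^0z = xz = a^{\,n-j}\, b^{\,n}.\\
&\text{Since } n - j < n,\ \ xz \notin L, \text{ contradicting the lemma. Hence } L \text{ is not regular.}
\end{align*}
```

The key step is that the bound |*xy*| ≤ *n* forces *y* to lie entirely within the leading block of `a`'s, so pumping changes the number of `a`'s without changing the number of `b`'s.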

- The regular languages are closed under the following operations:
  - union
  - intersection
  - complement
  - reversal
  - Kleene star
  - homomorphism
  - inverse homomorphism
- Decision properties
  - Given a regular expression *r* and a string *w*, it is decidable whether *r* matches *w*.
  - Given a finite automaton A, it is decidable whether L(A) is empty.
  - Given two finite automata A and B, it is decidable whether L(A) = L(B).
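
The closure and decision properties fit together: L(A) = L(B) exactly when the symmetric difference of the two languages is empty, and that difference is regular by a product construction. Below is a minimal Python sketch of this idea. It assumes both automata have already been converted to complete DFAs over the same alphabet, each given as a hypothetical triple `(delta, start, finals)` with `delta` mapping (state, symbol) to a single state. All names are illustrative.

```python
from collections import deque


def product(alphabet, dfa_a, dfa_b, accept):
    """Reachable part of the product DFA; accept(in_a, in_b) selects final states."""
    (da, sa, fa), (db, sb, fb) = dfa_a, dfa_b
    start = (sa, sb)
    delta, finals = {}, set()
    seen, work = {start}, deque([start])
    while work:
        p, q = work.popleft()
        if accept(p in fa, q in fb):
            finals.add((p, q))
        for sym in alphabet:
            nxt = (da[(p, sym)], db[(q, sym)])
            delta[((p, q), sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return delta, start, finals


def is_empty(alphabet, dfa):
    """L(dfa) is empty iff no final state is reachable from the start state."""
    delta, start, finals = dfa
    seen, work = {start}, deque([start])
    while work:
        s = work.popleft()
        if s in finals:
            return False
        for sym in alphabet:
            t = delta[(s, sym)]
            if t not in seen:
                seen.add(t)
                work.append(t)
    return True


def equivalent(alphabet, dfa_a, dfa_b):
    """L(A) = L(B) iff no string is accepted by exactly one of the two."""
    difference = product(alphabet, dfa_a, dfa_b, lambda x, y: x != y)
    return is_empty(alphabet, difference)


sigma = {"a", "b"}
# Even number of a's, with 2 states and with 4 states.
even2 = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
mod4 = {(i, "a"): (i + 1) % 4 for i in range(4)}
mod4.update({(i, "b"): i for i in range(4)})
even4 = (mod4, 0, {0, 2})
print(equivalent(sigma, even2, even4))  # True
```

Passing `lambda x, y: x and y` to `product` gives intersection and `lambda x, y: x or y` gives union. Membership of a string *w* can be decided by simply running the automaton on *w*.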

1. Write down deterministic finite automata for the following regular expressions:
   - `(a*b*)*`
   - `(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*`
   - `a(ba|a)*`
   - `ab(a|b*c)*bb*a`
2. Construct a deterministic finite automaton that will recognize all strings of 0's and 1's representing integers that are divisible by 3. Assume the empty string represents 0.
3. Use the McNaughton-Yamada-Thompson algorithm to convert the regular expression `a(a|b)*a` into a nondeterministic finite automaton.
4. Convert the NFA of (3) into a DFA.
5. Minimize the number of states in the DFA of (4).

- ALSU Chapter 3, all sections except 3.9.
- Russ Cox's article "Regular Expression Matching Can Be Simple and Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)" gives a good historical account of the evolution of regular expression matching programs.

aho@cs.columbia.edu