COMS W4117
Compilers and Translators:
Software Verification Tools
Lecture 5: Data-Flow Analysis
September 18, 2007
Lecture Outline
- Review
- Introduction to data-flow analysis
- Data-flow schemas
- Reaching definitions
- Live-variable analysis
- Reading
1. Review
- Flow graphs and their representation
- Loops
- Introduction to code optimization
2. Introduction to Data-flow Analysis
- Data-flow analysis
- Data-flow analysis refers to a collection of techniques used to
collect information about the flow of data along all possible execution paths of a program
- The execution of a program is viewed as a series of transformations on the state
of the program. Each execution of an IR statement transforms an input state to
an output state, where the input state is associated with the program point before
the statement and the output state with the program point after the statement.
- In each application of data-flow analysis we associate with every program point
a data-flow value that represents an abstraction of the set of all program states
that can be observed for that point.
- We denote the data-flow values before and after each statement s by IN[s] and OUT[s],
respectively.
- The data-flow problem is to find a solution for the IN and OUT sets for all
statements subject to the constraints imposed by the semantics of the statements
("transfer functions") and those based on the flow of control.
3. Data-Flow Schemas
- For forward data-flow problems
- For a basic block B, OUT[B] = fB(IN[B]), where fB
is the composition of the transfer functions for the individual statements in B.
- The constraint for IN[B] is that IN[B] is the union of OUT[P] over all predecessors P
of B.
- For backward data-flow problems
- IN[B] = fB(OUT[B]).
- The constraint for OUT[B] is that OUT[B] is the union of IN[S]
over all successors P of B.
4. Reaching Definitions
- Reaching definitions is the prototypical forward data-flow analysis problem.
- A definition of a variable x is a statement that assigns, or may assign, a value
to x. Because of procedure parameters, arrays, and indirect accesses it is not always easy
to tell if a statement is referring to a particular variable x.
- A definition d reaches a point p if there is a path from the point immediately
following d to p such that d is not killed along that path.
- We kill a definition of a variable x if there is another definition of x
anywhere along the path (which may have loops in it).
- Example: the definition d: u = v+w generates a definition d of variable u and kills
all the other definitions in the program that define variable u.
- Transfer equations for reaching definitions, pp. 603-605.
- Control-flow equations, p. 605.
- Iterative algorithm for reaching definitions: See Alg. 9.11, p. 606.
5. Live-Variable Analysis
- Live-variable analysis is the prototypical backward data-flow analysis problem.
- In live-variable analysis we want to know for variable x and point p
whether the value of x at p could be used along some path in the flow graph
starting at p. If so, we say x is live at p; otherwise, x is dead at p.
- Live-variable analysis is useful for register allocation.
- We define defB as the set of variables defined in block B prior
to any use of that variable in B.
- We define useB as the set of variables whose values may be used in B prior
to any definition of that variable.
- Transfer equations and control-flow equations for live-variable analysis, pp. 608-610.
- Control-flow equations, p. 605.
- Iterative algorithm for live-variable analysis: See Alg. 9.14, p. 610.
6. Reading
aho@cs.columbia.edu