COMS W4117
Compilers and Translators:
Software Verification Tools
Lecture 12: Interprocedural Analysis
October 16, 2007
Lecture Outline
- Review
- Intraprocedural vs. interprocedural analysis
- Call graphs
- Context-sensitive vs. context-insensitive analysis
- Call strings
- Cloning-based context-sensitive analysis
- Summary-based context-sensitive analysis
1. Review
- Program dependency graphs
- Context-free language reachability
- Pushdown systems
- Reachability in a pushdown system
2. Intraprocedural vs. interprocedural analysis
- Intraprocedural analysis: analysis is done one procedure at a time.
- It conservatively assumes that any invoked procedure may
- alter the state of all variables visible to the procedures,
- create all possible side effects.
- Intraprocedural analysis is relatively simple but often imprecise.
- Interprocedural analysis: analysis is done across an entire program.
- One common technique is to inline procedures where possible.
- Pointer alias analysis is key to precise interprocedural analysis.
- Many commercial and academic static analysis tools use interprocedural analysis.
See
- for extensive lists of such tools.
3. Call Graphs
- A call graph for a program is a set of nodes and edges such that
- There is one node for each procedure in the program.
- There is one node for each call site in the program
(place where a procedure is invoked).
- If call site c may call procedure p, then there is an edge from c to p.
- Consider the following C program:
int (*pf)(int);
int fun1(int x) {
if (x < 10)
c1: return (*pf)(x+1);
else
return x;
}
int fun2(int y) {
pf = &fun1;
c2: return (*pf)(y);
}
void main() {
pf = &fun2;
c3: (*pf)(5);
}
Is this an ISO C99-compliant program? Answer: No; main should return an int.
What can pf point to? fun1, fun2?
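The question above can be explored with a small sketch of a conservative call graph. The encoding below (the enum names, the address_taken table, and the edge matrix) is hypothetical and chosen only for illustration. It assumes a flow-insensitive view in which pf may hold the address of any procedure ever assigned to it, so every indirect call site gets an edge to both fun1 and fun2.

```c
#include <assert.h>
#include <stdbool.h>

/* Procedures and call sites of the example program (hypothetical encoding). */
enum proc { FUN1, FUN2, MAIN, NPROC };
enum site { C1, C2, C3, NSITE };

/* Procedures whose address is assigned to pf somewhere in the program. */
static const bool address_taken[NPROC] = {
    [FUN1] = true, [FUN2] = true, [MAIN] = false
};

/* edge[c][p] is true if call site c may call procedure p. */
static bool edge[NSITE][NPROC];

/* Flow-insensitive approximation: an indirect call through pf may reach
   any procedure whose address was ever stored in pf. */
static void build_conservative_call_graph(void) {
    for (int c = 0; c < NSITE; c++)
        for (int p = 0; p < NPROC; p++)
            edge[c][p] = address_taken[p];
}

/* Number of possible targets of call site c. */
static int target_count(int c) {
    assert(c >= 0 && c < NSITE);
    int n = 0;
    for (int p = 0; p < NPROC; p++)
        if (edge[c][p])
            n++;
    return n;
}
```

Under this approximation each of c1, c2, and c3 has two possible targets. A flow-sensitive analysis could do better: c3 immediately follows pf = &fun2 and c2 immediately follows pf = &fun1, and fun1 never changes pf, so c3 can only call fun2 while c2 and c1 can only call fun1.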
4. Context-sensitive vs. Context-insensitive Analysis
- Context-insensitive interprocedural analysis treats each call and each return
statement as a goto operation.
- We can model this kind of analysis with a super control-flow graph (like the PDG)
which is a control-flow graph with additional edges connecting each call
site to the beginning of the procedure it calls and each return statement to
the instruction following the call site.
- A context-sensitive analysis keeps track of the context in which each
procedure was called.
for (i = 0; i < n; i++) {
c1: t1 = f(0);
c2: t2 = f(243);
c3: t3 = f(243);
X[i] = t1 + t2 + t3;
}
int f(int v) {
return (v+1);
}
A context-insensitive analysis yields that X[i] is either 3, 246, 489, or 732.
A context-sensitive analysis yields that X[i] is 489.
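The two results above can be reproduced with a minimal sketch. It is not a general analysis: the int_set helper type and the call_arg table are hypothetical, and the body of f is hard-coded as v + 1. The context-insensitive version merges the arguments of all three call sites before computing the return values of f; the context-sensitive version keeps each call site separate.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 8

/* A small set of integer constants (hypothetical helper type). */
struct int_set { int v[MAXV]; int n; };

static void set_add(struct int_set *s, int x) {
    for (int i = 0; i < s->n; i++)
        if (s->v[i] == x)
            return;                        /* already present */
    assert(s->n < MAXV);
    s->v[s->n++] = x;
}

static bool set_has(const struct int_set *s, int x) {
    for (int i = 0; i < s->n; i++)
        if (s->v[i] == x)
            return true;
    return false;
}

/* Arguments passed to f at call sites c1, c2, c3. */
static const int call_arg[3] = { 0, 243, 243 };

/* Context-insensitive: all arguments are merged, so every call site
   is assumed to receive every possible return value of f. */
static struct int_set insensitive_sums(void) {
    struct int_set ret = { {0}, 0 }, sums = { {0}, 0 };
    for (int c = 0; c < 3; c++)
        set_add(&ret, call_arg[c] + 1);    /* f(v) = v + 1 */
    for (int i = 0; i < ret.n; i++)
        for (int j = 0; j < ret.n; j++)
            for (int k = 0; k < ret.n; k++)
                set_add(&sums, ret.v[i] + ret.v[j] + ret.v[k]);
    return sums;
}

/* Context-sensitive: each call site keeps its own argument and result. */
static int sensitive_sum(void) {
    int sum = 0;
    for (int c = 0; c < 3; c++)
        sum += call_arg[c] + 1;
    return sum;
}
```

Here insensitive_sums() collects the four candidates 3, 246, 489, and 732, while sensitive_sum() returns the single value 489.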
5. Call Strings
- A calling context is defined by the contents of the entire call stack.
We refer to the string of call sites on the stack as the call string.
- Consider a modification of the above program where the calls to f are now
calls to g, which then calls f:
for (i = 0; i < n; i++) {
c1: t1 = g(0);
c2: t2 = g(243);
c3: t3 = g(243);
X[i] = t1 + t2 + t3;
}
int g(int v) {
c4: return f(v);
}
int f(int v) {
return (v+1);
}
There are three call strings to f: (c1, c4), (c2, c4), (c3, c4).
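The call strings listed above can be enumerated by walking the call graph from the outermost procedure. The sketch below is one way to do that, under the assumption that the program has no recursion (otherwise the set of call strings is unbounded and the walk would not terminate). The sites table, the MAIN node standing for the code that contains the loop, and the fixed-size buffers are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum proc { MAIN, G, F, NPROC };

/* One call site: its label, the procedure containing it, and its callee. */
struct call_site { const char *label; enum proc caller, callee; };

static const struct call_site sites[] = {
    { "c1", MAIN, G }, { "c2", MAIN, G }, { "c3", MAIN, G }, { "c4", G, F },
};
enum { NSITE = sizeof sites / sizeof sites[0], MAXOUT = 16, MAXLEN = 64 };

static char found[MAXOUT][MAXLEN];   /* call strings collected so far */
static int nfound;

/* Depth-first walk from procedure p; prefix is the call string built
   so far. Assumes an acyclic call graph. */
static void walk(enum proc p, enum proc target, const char *prefix) {
    if (p == target) {
        assert(nfound < MAXOUT);
        snprintf(found[nfound++], MAXLEN, "(%s)", prefix);
        return;
    }
    for (int i = 0; i < NSITE; i++) {
        if (sites[i].caller != p)
            continue;
        char next[MAXLEN];
        snprintf(next, sizeof next, "%s%s%s",
                 prefix, strlen(prefix) > 0 ? ", " : "", sites[i].label);
        walk(sites[i].callee, target, next);
    }
}

/* Collect every call string that reaches target; returns how many. */
static int call_strings_to(enum proc target) {
    nfound = 0;
    walk(MAIN, target, "");
    return nfound;
}
```

Calling call_strings_to(F) fills found with the three strings (c1, c4), (c2, c4), and (c3, c4), in that order.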
6. Cloning-based Context-sensitive Analysis
- In cloning-based context-sensitive analysis we create a conceptual clone of
the procedure for each context of interest and then apply a context-insensitive
analysis to the cloned call graph.
- In practice we do not clone the procedure; we use an internal data structure
to keep track of the analysis results on each clone.
- A cloned version of the above program is shown here:
for (i = 0; i < n; i++) {
c1: t1 = g1(0);
c2: t2 = g2(243);
c3: t3 = g3(243);
X[i] = t1 + t2 + t3;
}
int g1(int v) {
c4.1: return f1(v);
}
int g2(int v) {
c4.2: return f2(v);
}
int g3(int v) {
c4.3: return f3(v);
}
int f1(int v) {
return (v+1);
}
int f2(int v) {
return (v+1);
}
int f3(int v) {
return (v+1);
}
7. Summary-based Context-sensitive Analysis
- In summary-based context-sensitive analysis we create a concise description ("summary") of
the observable behavior of each procedure.
- The purpose of the summary is to avoid reanalyzing the procedure body at each
invoking call site.
- Each procedure is represented by a region with a single entry point.
- A procedure region can be nested within several different outer regions.
- The analysis has two parts:
- A bottom-up phase that computes a transfer function to summarize the
effect of a procedure.
- A top-down phase that propagates caller information to compute callee results.
- The following program shows the result of propagating all possible
constant arguments to the function f in the program in section (4) above:
for (i = 0; i < n; i++) {
c1: t1 = f0(0);
c2: t2 = f243(243);
c3: t3 = f243(243);
X[i] = t1 + t2 + t3;
}
int f0(int v) {
return (1);
}
int f243(int v) {
return (244);
}
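The bottom-up and top-down phases described in this section can be sketched in a few lines. The summary form used here, a pair (a, b) meaning "returns a*v + b", is an assumption made for this example only; a real analysis would use whatever transfer-function representation its abstract domain requires. The point of the sketch is that f is summarized once and the summary is then applied at each call site without revisiting the body of f.

```c
#include <assert.h>

/* Hypothetical summary form: the procedure returns a*v + b for argument v. */
struct summary { int a, b; };

/* Bottom-up phase: summarize f(v) = v + 1 once, independent of any caller. */
static struct summary summarize_f(void) {
    struct summary s = { 1, 1 };
    return s;
}

/* Top-down phase: apply the summary to the constant known at a call site. */
static int apply(struct summary s, int arg) {
    return s.a * arg + s.b;
}

/* Value of X[i] in the section 4 program, using only the summary of f. */
static int x_value(void) {
    const int call_arg[3] = { 0, 243, 243 };   /* arguments at c1, c2, c3 */
    struct summary f = summarize_f();
    int sum = 0;
    for (int c = 0; c < 3; c++)
        sum += apply(f, call_arg[c]);
    return sum;
}
```

Applying the summary to 0 and to 243 gives 1 and 244, matching the bodies of f0 and f243 above, and x_value() returns 489.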
8. Reading
aho@cs.columbia.edu