COMS W4117
Compilers and Translators:
Software Verification Tools
Lecture 3: Foundations of Software Verification
September 11, 2007
Lecture Outline
- Review
- Mathematical logic and proofs
- Compiler phases
- Reading
1. Review
- Software reliability
- Liveness vs. safety properties
- False positives vs. false negatives
- Examples of software bugs
- Approaches to software verification
- Representative static verification tools
- Reading
2. Mathematical Logic and Proofs
- Mathematical logic
- Mathematical logic provides the foundation for software verification.
- Logic formalizes the notion of a proof.
- A logic has a syntax which describes how to write its legal formulas.
- A logic has a semantics which gives a precise meaning to each formula.
- Automated theorem provers can help guide a software verifier.
- Proof Systems
- A proof system is a set of axioms and proof rules.
- Each axiom is a template for a formula.
- A proof rule consists of a finite set of template formulas (called premises)
and an additional template formula (called a consequent).
- Propositional logic
- Propositional logic is an algebra for reasoning about the truth or falsehood
of logical expressions.
- A proposition is any statement that can have one of the truth values,
true or false.
- Propositional logic expressions can be defined with the following grammar:
E → true | false | prop | E ∧ E | E ∨ E | ¬ E | ( E )
prop is a set of propositional variables.
A truth assignment maps each propositional variable to either true or false.
The meaning of a logical expression is a function that takes truth assignments
to the variables in the expressions as arguments and returns as its value
either true or false. It is possible to represent the meaning of a logical
expression as a truth table, in which the rows correspond to all possible
combinations of truth values for the variables in the expression.
Note that there are 2^(2^n) different Boolean functions
with n arguments.
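As a minimal sketch of the truth-table view, the Python below enumerates every truth assignment for a small expression and counts the Boolean functions of n arguments (one free output bit per row, with 2^n rows). The helper names truth_table and count_boolean_functions, and the example expression, are illustrative assumptions rather than anything defined in the lecture.

```python
from itertools import product

def truth_table(expr, variables):
    """Return one (assignment, value) row per truth assignment.

    expr is a Python function taking one bool per variable;
    variables is the list of names used to label each row.
    """
    rows = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        rows.append((assignment, expr(*values)))
    return rows

def count_boolean_functions(n):
    """A table with 2**n rows has 2**(2**n) ways to fill its output column."""
    return 2 ** (2 ** n)

# Hypothetical example expression: (p AND q) OR (NOT r)
example = lambda p, q, r: (p and q) or (not r)
table = truth_table(example, ["p", "q", "r"])

for assignment, value in table:
    print(assignment, value)
print(count_boolean_functions(2))
```

The table for three variables has 8 rows, and the count grows doubly exponentially: 4 functions for n = 1, 16 for n = 2, 256 for n = 3.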
Boolean algebra is equivalent to propositional logic.
SAT is the problem of determining whether there exists a truth assignment that
causes a logical expression to have the value true. The tautology problem is
to determine whether a logical expression is equivalent to true. SAT is a
classical example of an NP-complete problem; the tautology problem is
co-NP-complete, since E is a tautology exactly when ¬E is unsatisfiable.
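Both problems can be decided by brute force, at the cost of trying all 2^n truth assignments. The sketch below is only an illustration of the definitions, not a practical SAT solver; the function names and the three example expressions are assumptions made for this example.

```python
from itertools import product

def assignments(variables):
    """Yield every truth assignment as a dict from variable name to bool."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def is_satisfiable(expr, variables):
    """True if some truth assignment makes expr true."""
    return any(expr(a) for a in assignments(variables))

def is_tautology(expr, variables):
    """True if every truth assignment makes expr true."""
    return all(expr(a) for a in assignments(variables))

# Hypothetical example expressions over p and q.
contradiction = lambda a: a["p"] and not a["p"]
excluded_middle = lambda a: a["p"] or not a["p"]
conjunction = lambda a: a["p"] and a["q"]

names = ["p", "q"]
print(is_satisfiable(contradiction, names), is_tautology(excluded_middle, names))
print(is_satisfiable(conjunction, names), is_tautology(conjunction, names))
```

The two checks are duals: an expression is a tautology exactly when its negation is not satisfiable.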
First-order logic (sometimes referred to as predicate logic)
- First-order logic generalizes propositional logic using predicates instead of
propositional variables and adding two quantifiers, the universal quantifier
∀ (for all) and the existential quantifier ∃ (there exists).
- A predicate is a function of zero or more variables that returns a Boolean value.
For example,
csg(C,S,G) can be a predicate with arguments C
for Course, S for Student, and G for Grade.
It returns true whenever the values of C, S,
and G are such that student S got grade G in course C.
E.g., csg("COMS4117", "Aristotle", "A") = true.
- An atomic formula is a predicate with zero or more arguments, where an argument
is either a variable or a constant. A variable is a symbol capable of taking on
any constant as its value. We should not confuse first-order logic variables with
propositional variables. A propositional variable can be represented as a
predicate with no arguments in first-order logic.
- A ground atomic formula is an atomic formula all of whose arguments are constants.
E.g., csg(X, "Aristotle", Z) is an atomic formula;
csg("COMS4117", "Aristotle", "A") is a ground atomic formula.
- Expressions can be built from atomic formulas using the operators of propositional
logic plus the two quantifiers.
E → true | false | AF | E ∧ E | E ∨ E | ¬ E | ∀v(E) | ∃v(E) | ( E )
Here AF is an atomic formula and v is a variable. Other binary connectives such as →
(for "implies") and ≡ (for "equivalence") are often added.
Example formula: ∀v (∃w (ge(w, v))). Assuming ge means ≥, this means
for every v there exists a w such that w ≥ v.
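Over a finite domain, predicates and quantifiers can be evaluated directly. The following sketch assumes a small made-up fact table for csg (only the Aristotle row comes from the notes; the second row is hypothetical) and a five-element integer domain for ge. The helpers forall and exists are illustrative names, not part of any library.

```python
# Hypothetical fact table: the tuples for which csg is true.
# Only the Aristotle entry appears in the notes; the other row is made up.
CSG_FACTS = {
    ("COMS4117", "Aristotle", "A"),
    ("COMS4117", "Plato", "B"),
}

def csg(course, student, grade):
    """Predicate: true when the student got the grade in the course."""
    return (course, student, grade) in CSG_FACTS

def ge(w, v):
    """Predicate: true when w >= v."""
    return w >= v

def forall(domain, body):
    """Universal quantifier over a finite domain."""
    return all(body(value) for value in domain)

def exists(domain, body):
    """Existential quantifier over a finite domain."""
    return any(body(value) for value in domain)

DOMAIN = range(5)  # assumed finite domain {0, 1, 2, 3, 4}

# forall v (exists w (ge(w, v)))
every_v_has_w = forall(DOMAIN, lambda v: exists(DOMAIN, lambda w: ge(w, v)))

# forall v (exists w (w > v)) fails here because 4 has no larger element.
every_v_has_larger = forall(DOMAIN, lambda v: exists(DOMAIN, lambda w: w > v))

# exists S (exists G (csg("COMS4117", S, G)))
students = {s for _, s, _ in CSG_FACTS}
grades = {g for _, _, g in CSG_FACTS}
someone_took_4117 = exists(
    students, lambda s: exists(grades, lambda g: csg("COMS4117", s, g))
)

print(every_v_has_w, every_v_has_larger, someone_took_4117)
```

The strict variant shows that the truth of a quantified formula depends on the domain: it is false on {0, ..., 4} but would be true over all the integers.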
Interpretations
- An interpretation for a predicate p is a function that takes as input an assignment of
domain elements to each of the arguments of p and returns true or false.
- An interpretation for an expression E consists of a domain, an interpretation for
each predicate in E, and a domain value for each of the free variables in E. That is,
an interpretation for E provides a possible meaning for the predicates and variables
in E.
Tautologies
- An expression E is a tautology if for every interpretation of E the value of E is true.
- E.g., p(x) ∨ ¬ p(x) is a tautology; it is true no matter what interpretation
we use for predicate p or what value we assign to the free variable x.
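To make the role of interpretations concrete, the sketch below enumerates every interpretation of a unary predicate p and a free variable x over one small domain and evaluates an expression under each. This checks only a single finite domain, so it illustrates the definition rather than proving that an expression is a tautology in general. The domain and helper names are assumptions for the example.

```python
from itertools import product

def interpretations(domain):
    """Yield (p_table, x): every unary predicate on domain and every value for x."""
    for truth_values in product([False, True], repeat=len(domain)):
        p_table = dict(zip(domain, truth_values))
        for x in domain:
            yield p_table, x

def always_true(expr, domain):
    """True if expr holds under every interpretation over this domain."""
    return all(expr(p_table.get, x) for p_table, x in interpretations(domain))

# p(x) OR NOT p(x)
excluded_middle = lambda p, x: p(x) or not p(x)
# p(x) alone is true under some interpretations and false under others.
just_p = lambda p, x: p(x)

domain = ["a", "b", "c"]  # hypothetical three-element domain
print(always_true(excluded_middle, domain))
print(always_true(just_p, domain))
```

With three domain elements there are 2^3 choices for p and 3 choices for x, so 24 interpretations are checked.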
A proof system for first-order logic
A1: X → (Y → X)
A2: (X → (Y → Z)) → ((X → Y) → (X → Z))
A3: ((¬X → Y) → ((¬X → ¬Y) → X))
A4: (∀v(X → Y)) → (X → ∀v Y), v not free in X
A5: (∀v X(v)) → X(e), where e is a term to be substituted for
every free occurrence of v in X
The following three axioms deal with equivalence:
A6: e ≡ e
A7: ei ≡ ei' →
f(e1,...,ei,...,en) ≡ f(e1,...,ei',...,en)
where f is a function symbol
A8: ei ≡ ei' →
(r(e1,...,ei,...,en) → r(e1,...,ei',...,en))
where r is a relation symbol or equivalence
Proof rules
- Modus ponens: from the premises X and X → Y, infer the consequent Y.
- Generalization: from the premise X, infer the consequent (∀v) X.
Proofs by forward reasoning
- Given a set of expressions (called hypotheses), {E1,..., En},
a proof by forward reasoning of an expression E is a sequence of expressions ending
with E such that each expression in the sequence either
- is a hypothesis, or
- is an axiom of the proof system, or
- follows from previous expressions in the sequence using a proof rule.
- We write E1,..., En ⊦ E.
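The definition of a forward proof can be read as a checking procedure. The sketch below is a simplified illustration: formulas are atoms (strings) or implications built with a hypothetical implies helper, axiom instances must be listed explicitly by the caller, and modus ponens is the only proof rule implemented (generalization is left out).

```python
def implies(x, y):
    """Build the formula x -> y as a tagged tuple."""
    return ("->", x, y)

def check_forward_proof(hypotheses, axioms, steps):
    """Return True if every step is a hypothesis, a listed axiom instance,
    or follows from two earlier steps by modus ponens."""
    proved = []
    for step in steps:
        # Modus ponens: some earlier x and earlier x -> step.
        by_modus_ponens = any(implies(x, step) in proved for x in proved)
        if step in hypotheses or step in axioms or by_modus_ponens:
            proved.append(step)
        else:
            return False
    return True

# Hypotheses p, p -> q, q -> r; the goal is r.
hyps = ["p", implies("p", "q"), implies("q", "r")]
good_proof = ["p", implies("p", "q"), "q", implies("q", "r"), "r"]
bad_proof = ["p", "r"]  # r does not follow from p alone

# An instance of axiom A1, X -> (Y -> X), with X = p and Y = q.
a1_instance = implies("p", implies("q", "p"))
a1_proof = ["p", a1_instance, implies("q", "p")]

print(check_forward_proof(hyps, [], good_proof))
print(check_forward_proof(hyps, [], bad_proof))
print(check_forward_proof(["p"], [a1_instance], a1_proof))
```

Each accepted step only looks backward in the sequence, which is what makes the reasoning "forward": conclusions are accumulated from what is already established.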
Proofs by backward reasoning
- In backward reasoning we read each proof rule backward, reducing
the task of proving the consequent to the tasks of proving the premises.
- For example, consider the proof rule with premises X and Y and
consequent X ∧ Y.
- In forward reasoning we would first prove X and Y, and then apply
the proof rule to conclude X ∧ Y.
- In backward reasoning we would try to prove the goal X ∧ Y by
creating two subgoals X and Y, and then trying to prove these two
subgoals separately.
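Goal decomposition for the conjunction rule (from X and Y, conclude X ∧ Y) can be sketched as a recursive procedure. The formula encoding and the function name prove_backward are assumptions for this example, and only this one rule is handled.

```python
def prove_backward(goal, known):
    """Try to prove goal from the set of known expressions.

    A conjunction ("and", X, Y) is reduced to the two subgoals X and Y;
    any other goal succeeds only if it is already known.
    """
    if goal in known:
        return True
    if isinstance(goal, tuple) and goal[0] == "and":
        _, left, right = goal
        # Reading the rule backward: prove each premise separately.
        return prove_backward(left, known) and prove_backward(right, known)
    return False

known = {"X", "Y"}
print(prove_backward(("and", "X", "Y"), known))
print(prove_backward(("and", "X", ("and", "Y", "X")), known))
print(prove_backward(("and", "X", "Z"), known))
```

The search is driven by the shape of the goal rather than by what has already been derived, which is the main practical difference from forward reasoning.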
Soundness and completeness
- A proof system is sound if it can be used to prove only true statements.
- A proof system is complete if it can be used to prove all true statements.
Models
- A model for a set of expressions is a set of interpretations that makes all
the expressions in the set true.
- E.g., let E1 = p ∧ q and E2 = ¬p ∨ r.
The only model for {E1, E2} is the set consisting of
the single interpretation p = true, q = true, r = true.
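For propositional expressions, models can be found by filtering the truth assignments. The sketch below assumes that E2 reads ¬p ∨ r, a reading consistent with the single model stated above; the function name models is illustrative.

```python
from itertools import product

def models(expressions, variables):
    """Return every truth assignment that makes all the expressions true."""
    found = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(expr(assignment) for expr in expressions):
            found.append(assignment)
    return found

E1 = lambda a: a["p"] and a["q"]
E2 = lambda a: (not a["p"]) or a["r"]  # assumed reading of E2: NOT p OR r

result = models([E1, E2], ["p", "q", "r"])
print(result)
```

E1 forces p and q to be true, and with p true E2 forces r to be true, so exactly one of the eight assignments survives.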
Entailment
- We say that {E1,..., En} entails E if
every model for {E1,..., En} is also a model for E.
If so, we write E1,..., En ⊧ E. The intuition is
that each interpretation is a possible world. When we say
E1,..., En ⊧ E, we are saying that E is true in every
possible world where the expressions E1,..., En are true.
- Note that if E1,..., En ⊧ E, then
(E1 ∧ ... ∧ En) → E is a tautology.
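In the propositional case both sides of this observation can be checked by enumeration. The sketch below compares a direct entailment test with a tautology test on the corresponding implication; the function names and example expressions are assumptions made for illustration.

```python
from itertools import product

def assignments(variables):
    """Yield every truth assignment as a dict from variable name to bool."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def entails(hypotheses, conclusion, variables):
    """True if conclusion holds wherever all hypotheses hold."""
    return all(
        conclusion(a)
        for a in assignments(variables)
        if all(h(a) for h in hypotheses)
    )

def implication_is_tautology(hypotheses, conclusion, variables):
    """True if (E1 AND ... AND En) -> E holds under every assignment."""
    return all(
        (not all(h(a) for h in hypotheses)) or conclusion(a)
        for a in assignments(variables)
    )

p = lambda a: a["p"]
q = lambda a: a["q"]
p_implies_q = lambda a: (not a["p"]) or a["q"]

names = ["p", "q"]
print(entails([p, p_implies_q], q, names))
print(entails([p_implies_q], q, names))
```

The first query succeeds (it is the semantic counterpart of modus ponens); the second fails because p = false, q = false satisfies p → q but not q. In both cases implication_is_tautology returns the same answer as entails.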
3. Compiler Phases
- Front end
- Lexical analysis
- Syntax analysis
- Semantic analysis
- Intermediate code generation
- Machine-independent code optimizer
- Program analysis
- Optimizing transformations
- Code generator
4. Reading
- The material on logic was taken from chapters 12 and 14 of Aho and Ullman,
Foundations of Computer Science, C Edition, Freeman, 1995.
aho@cs.columbia.edu