Guest lecture for Compiler Construction, Spring 2015

Verified compilers
Magnus Myreen, Chalmers University of Technology
Mentions joint work with Ramana Kumar, Michael Norrish, Scott Owens and many more


CompCert, 2005–
Program verification
For safety-critical software, formal verification of program correctness may be worth the cost.

Verified compilers

Such verification is typically done on the source program. So what if the compiler is buggy?

Use a certified compiler!

CompCert is a compiler for a large subset of C, with PowerPC assembler as target language.

Written in Coq, a proof assistant for formal proofs.

Comes with a machine-checked proof that, for any program that does not generate a compilation error, the source and target programs behave identically. (The precise statement needs more details.)

(Sometimes called certified compilers, but that’s misleading…)

Trusting the compiler

Testing compilers

Bugs
When finding a bug, we go to great lengths to find it in our own code. Most programmers trust the compiler to generate correct code. The most important task of the compiler is to generate correct code.

Establishing compiler correctness
Alternatives:
• Proving the correctness of a compiler is prohibitively expensive (however, see the CompCert project). Maybe it is worth the cost? Cost reduction?
• Testing is the only viable option … but with testing you never know whether you caught all the bugs!

All (unverified) compilers have bugs
“Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input.” (PLDI’11)

Finding and Understanding Bugs in C Compilers
Xuejun Yang, Yang Chen, Eric Eide, John Regehr
University of Utah, School of Computing
{jxyang, chenyang, eeide, regehr}@cs.utah.edu

“[The verified part of] CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task.”

From the abstract: “Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. …”

```c
int foo (void) {
  signed char x = 1;
  unsigned char y = 255;
  return x > y;
}
```

Figure 1. We found a bug in the version of GCC that shipped with Ubuntu Linux 8.04.1 for x86. At all optimization levels it compiles this function to return 1; the correct result is 0. The Ubuntu compiler was heavily patched; the base version of GCC did not have this bug.

This lecture: Verified compilers
What? Proof that the compiler produces correct code.
Why? To avoid bugs, to avoid testing.
How? By mathematical proof… (the rest of this lecture)

Proving a compiler correct
Ingredients:
• a formal logic for the proofs, e.g. first-order logic or higher-order logic
• accurate models of
  • the source language
  • the target language
  • the compiler algorithm
Tools:
• a proof assistant (software)

Proofs are only about things that live within the logic, i.e. we need to represent the relevant artefacts in the logic. That involves a lot of details (to get wrong), so it is necessary to use a mechanised proof assistant (think ‘Eclipse for logic’) to avoid mistakes and missing details.

Accurate model of a programming language
Model of programs:
• syntax: what it looks like
• semantics: how it behaves, e.g. an interpreter for the syntax

Major styles of (operational, relational) semantics:
• big-step: this style for structured source semantics
• small-step: this style for unstructured target semantics
… the next slides provide examples.

Syntax
Source:
exp = Num num | Var name | Plus exp exp

Target ‘machine code’:
inst = Const name num | Move name name | Add name name name

Target program consists of list of inst
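To make the two syntaxes concrete, here is one possible encoding as Python dataclasses. This is a sketch under assumptions, not the lecture's HOL definitions: names are represented as ints (so that a compiler can use n + 1 as a fresh name), and the field names such as `dest`, `src`, `left` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Source syntax: exp = Num num | Var name | Plus exp exp
# Assumption: names are ints, so a compiler can use n + 1 as a fresh name.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[Num, Var, Plus]

# Target 'machine code': inst = Const name num | Move name name | Add name name name
@dataclass(frozen=True)
class Const:
    dest: int   # Const s n: store the number n in s
    value: int

@dataclass(frozen=True)
class Move:
    dest: int   # Move s1 s2: copy the value of s2 into s1
    src: int

@dataclass(frozen=True)
class Add:
    dest: int   # Add s1 s2 s3: store s2 + s3 in s1
    src1: int
    src2: int

Inst = Union[Const, Move, Add]

# A source expression (1 + x0) and a target program (a list of inst).
example_source: Exp = Plus(Num(1), Var(0))
example_target: list[Inst] = [Const(5, 1), Move(6, 0), Add(5, 5, 6)]
```

`example_target` is what the slides' compile function would emit for `example_source` with register name 5; frozen dataclasses give structural equality, which makes such programs easy to compare.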

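Source semantics (big-step)
Big-step semantics as a relation ↓ defined by rules, e.g.

$$
\frac{}{(\mathsf{Num}\ n,\ \mathit{env}) \downarrow n}
\qquad
\frac{\text{lookup } s \text{ in } \mathit{env} \text{ finds } v}{(\mathsf{Var}\ s,\ \mathit{env}) \downarrow v}
\qquad
\frac{(x_1,\ \mathit{env}) \downarrow v_1 \qquad (x_2,\ \mathit{env}) \downarrow v_2}{(\mathsf{Plus}\ x_1\ x_2,\ \mathit{env}) \downarrow v_1 + v_2}
$$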

It is called “big-step” because each step ↓ describes a complete evaluation.
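Because this source language is deterministic, the relation ↓ can be sketched as a recursive Python function with one case per rule. The classes repeat a dataclass encoding of the syntax so the block runs on its own; `evaluate` is a hypothetical name, and representing env as a dict from int names to values is an assumption.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[Num, Var, Plus]

def evaluate(x: Exp, env: dict[int, int]) -> int:
    """Return res such that (x, env) ↓ res, one case per big-step rule.

    Each call yields a complete result, just as a single ↓ judgement
    describes a complete evaluation. An unbound variable has no
    derivation, which shows up here as a KeyError.
    """
    if isinstance(x, Num):
        return x.value                  # (Num n, env) ↓ n
    if isinstance(x, Var):
        return env[x.name]              # lookup s in env finds v
    if isinstance(x, Plus):
        v1 = evaluate(x.left, env)      # premise (x1, env) ↓ v1
        v2 = evaluate(x.right, env)     # premise (x2, env) ↓ v2
        return v1 + v2                  # conclusion: ↓ v1 + v2
    raise TypeError(f"not an expression: {x!r}")

print(evaluate(Plus(Num(1), Plus(Var(0), Var(1))), {0: 10, 1: 20}))  # prints 31
```

For a nondeterministic language the relation could not be turned into a function this directly; here every expression has at most one result.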

Target semantics (small-step)
“Small-step”: transitions describe parts of executions. We model the state as a mapping from names to values here.

step (Const s n) state = state[s ↦ n]
step (Move s1 s2) state = state[s1 ↦ state s2]
step (Add s1 s2 s3) state = state[s1 ↦ state s2 + state s3]

steps [] state = state
steps (x::xs) state = steps xs (step x state)
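A Python sketch of step and steps, assuming the state is a dict from int register names to values. The slides treat the state as a total function; in this sketch, reading an unassigned register raises KeyError instead. Each update returns a new dict, mirroring state[s ↦ v], and the class names are a hypothetical encoding of the target syntax.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    dest: int
    value: int

@dataclass(frozen=True)
class Move:
    dest: int
    src: int

@dataclass(frozen=True)
class Add:
    dest: int
    src1: int
    src2: int

Inst = Union[Const, Move, Add]
State = dict[int, int]   # assumption: a finite map from names to values

def step(inst: Inst, state: State) -> State:
    """One transition; returns a new dict, i.e. state[s ↦ v]."""
    if isinstance(inst, Const):
        return {**state, inst.dest: inst.value}
    if isinstance(inst, Move):
        return {**state, inst.dest: state[inst.src]}
    if isinstance(inst, Add):
        return {**state, inst.dest: state[inst.src1] + state[inst.src2]}
    raise TypeError(f"not an instruction: {inst!r}")

def steps(prog: list[Inst], state: State) -> State:
    """steps [] s = s; steps (x::xs) s = steps xs (step x s), as a loop."""
    for inst in prog:
        state = step(inst, state)
    return state

print(steps([Const(5, 1), Move(6, 0), Add(5, 5, 6)], {0: 10}))
# prints {0: 10, 5: 11, 6: 10}
```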

Compiler function
The generated code stores its result in the register name (n) given to the compiler.

compile (Num k) n = [Const n k]
compile (Var v) n = [Move n v]
compile (Plus x1 x2) n = compile x1 n ++ compile x2 (n+1) ++ [Add n n (n+1)]

The Var case relies on variable names in the source matching variable names in the target. The Plus case uses names above n as temporaries.
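The compile function transcribes almost directly into Python. The name `compile_exp` (chosen to avoid shadowing Python's built-in `compile`) and the dataclass encoding of both syntaxes are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

@dataclass(frozen=True)
class Const:
    dest: int
    value: int

@dataclass(frozen=True)
class Move:
    dest: int
    src: int

@dataclass(frozen=True)
class Add:
    dest: int
    src1: int
    src2: int

Exp = Union[Num, Var, Plus]
Inst = Union[Const, Move, Add]

def compile_exp(x: Exp, n: int) -> list[Inst]:
    """Emit code that leaves the value of x in register n.

    Registers above n are used as temporaries; registers below n are
    never written.
    """
    if isinstance(x, Num):
        return [Const(n, x.value)]
    if isinstance(x, Var):
        # relies on source variable names matching target register names
        return [Move(n, x.name)]
    if isinstance(x, Plus):
        return (compile_exp(x.left, n)
                + compile_exp(x.right, n + 1)
                + [Add(n, n, n + 1)])
    raise TypeError(f"not an expression: {x!r}")

print(compile_exp(Plus(Num(1), Var(0)), 5))
# prints [Const(dest=5, value=1), Move(dest=6, src=0), Add(dest=5, src1=5, src2=6)]
```

The code for the right operand only writes registers n + 1 and above, so the left operand's result in register n survives until the final Add.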

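Correctness statement
Proved using a proof assistant (demo!).

For every evaluation in the source, and for every target state and k such that k is greater than all variable names and the state is in sync with the source env, the result res will be stored at location k in the target state after execution, and the lower part of the state (names below k) is left untouched:

$$
\begin{aligned}
&\forall x\ \mathit{env}\ \mathit{res}.\ (x,\ \mathit{env}) \downarrow \mathit{res} \implies \\
&\quad \forall \mathit{state}\ k.\ \big(\forall i\ v.\ \mathsf{lookup}\ \mathit{env}\ i = \mathsf{SOME}\ v \implies \mathit{state}\ i = v \wedge i < k\big) \implies \\
&\quad\quad \big(\mathsf{let}\ \mathit{state}' = \mathsf{steps}\ (\mathsf{compile}\ x\ k)\ \mathit{state}\ \mathsf{in}\ \mathit{state}'\ k = \mathit{res} \wedge \forall i.\ i < k \implies \mathit{state}'\ i = \mathit{state}\ i\big)
\end{aligned}
$$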
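The theorem is proved once, for all inputs, in the proof assistant. As a contrast, and in the spirit of the testing discussion earlier, the sketch below merely tests the same statement on random inputs: it checks that the result lands in register k and that every register below k is unchanged. The helper names (`random_exp`, `check_correctness`) and the random-generation scheme are hypothetical; passing this test is evidence, not proof.

```python
import random
from dataclasses import dataclass
from typing import Union

# Source and target syntax (ints as names), as in the slides.
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

@dataclass(frozen=True)
class Const:
    dest: int
    value: int

@dataclass(frozen=True)
class Move:
    dest: int
    src: int

@dataclass(frozen=True)
class Add:
    dest: int
    src1: int
    src2: int

Exp = Union[Num, Var, Plus]
Inst = Union[Const, Move, Add]

def evaluate(x: Exp, env: dict[int, int]) -> int:          # (x, env) ↓ res
    if isinstance(x, Num):
        return x.value
    if isinstance(x, Var):
        return env[x.name]
    return evaluate(x.left, env) + evaluate(x.right, env)

def steps(prog: list[Inst], state: dict[int, int]) -> dict[int, int]:
    for i in prog:
        if isinstance(i, Const):
            state = {**state, i.dest: i.value}
        elif isinstance(i, Move):
            state = {**state, i.dest: state[i.src]}
        else:
            state = {**state, i.dest: state[i.src1] + state[i.src2]}
    return state

def compile_exp(x: Exp, n: int) -> list[Inst]:
    if isinstance(x, Num):
        return [Const(n, x.value)]
    if isinstance(x, Var):
        return [Move(n, x.name)]
    return compile_exp(x.left, n) + compile_exp(x.right, n + 1) + [Add(n, n, n + 1)]

# Randomised check of the correctness statement (testing, not proof).
def random_exp(rng: random.Random, names: list[int], depth: int) -> Exp:
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return Num(rng.randint(-100, 100))
        return Var(rng.choice(names))
    return Plus(random_exp(rng, names, depth - 1), random_exp(rng, names, depth - 1))

def check_correctness(trials: int = 1000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        names = list(range(rng.randint(1, 5)))             # source variables 0..m-1
        env = {i: rng.randint(-100, 100) for i in names}
        x = random_exp(rng, names, depth=4)
        res = evaluate(x, env)                             # (x, env) ↓ res
        k = len(names) + rng.randint(0, 3)                 # every variable name is < k
        # state agrees with env; other registers hold arbitrary junk
        state = {i: rng.randint(-100, 100) for i in range(k + 5)}
        state.update(env)
        final = steps(compile_exp(x, k), state)
        assert final[k] == res                             # result lands in k
        assert all(final[i] == state[i] for i in range(k)) # below k untouched
    return True

print(check_correctness())   # prints True
```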

A real language

Well, that example was simple enough…

But some people say: a programming language isn’t real until it has a self-hosting compiler.

Bootstrapping for verified compilers? Yes!

Scaling up… POPL 2014
CakeML: A Verified Implementation of ML
Ramana Kumar, Magnus O. Myreen, Michael Norrish, Scott Owens
(Computer Laboratory, University of Cambridge, UK; Canberra Research Lab, NICTA, Australia; School of Computing, University of Kent, UK)

First bootstrapping of a formally verified compiler.

Dimensions of compiler verification
How far the compiler goes: source code → abstract syntax → intermediate language → bytecode → machine code
The thing that is verified: compiler algorithm → implementation in ML → implementation in machine code → machine code as part of a larger system

Our verification covers the full spectrum of both dimensions.

Idea behind in-logic bootstrapping
Input: verified compiler function
Trustworthy code generation:
functions in HOL (shallow embedding)
↓ proof-producing translation [ICFP’12, JFP’14]
CakeML program (deep embedding)
↓ verified compilation of CakeML [POPL’14]
x86-64 machine code (deep embedding)
Output: verified implementation of compiler function
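To illustrate the two kinds of embedding: a shallow embedding represents a program directly as a function of the host logic, while a deep embedding represents it as syntax (a data value) that is given meaning by a separate semantics. The proof-producing translation yields a deeply embedded program together with a theorem that it agrees with the shallow function. The Python sketch below, with Python standing in for the logic, only checks that agreement on sample inputs; it is not how the actual translator works, and `double_shallow` and `double_deep` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union

# Shallow embedding: the program is a host-language function.
def double_shallow(x: int) -> int:
    return x + x

# Deep embedding: the program is syntax (a data value) ...
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: int

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[Num, Var, Plus]

# ... given meaning by a separate semantics (here a big-step evaluator).
def evaluate(x: Exp, env: dict[int, int]) -> int:
    if isinstance(x, Num):
        return x.value
    if isinstance(x, Var):
        return env[x.name]
    return evaluate(x.left, env) + evaluate(x.right, env)

# Deep counterpart of double_shallow, with its argument in variable 0.
double_deep: Exp = Plus(Var(0), Var(0))

# A proof-producing translation would deliver a theorem of the form
#   forall x. (double_deep, {0: x}) ↓ double_shallow(x);
# here we can only sample it.
agree = all(evaluate(double_deep, {0: x}) == double_shallow(x) for x in range(-50, 51))
print(agree)  # prints True
```

The point of the deep form is that it is data: a compiler can traverse `double_deep`, whereas `double_shallow` can only be called.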

CakeML at a glance
The CakeML language: a strict, impure functional language
= Standard ML without I/O or functors,
i.e. with almost everything else:
✓ higher-order functions
✓ mutual recursion and polymorphism
✓ datatypes and (nested) pattern matching
✓ references and (user-defined) exceptions
✓ modules, signatures, abstract types

The verified machine-code implementation:
parsing, type inference, compilation, garbage collection, bignums etc.
implements a read-eval-print loop (see demo).

The CakeML compiler verification
How? Mostly standard verification techniques as presented in this lecture, but scaled up to large examples (four people, two years).

Compiler: string → tokens → AST → IL → bytecode → x86

New optimising compiler: IL-1 → IL-2 → … → IL-N → ASM, targeting x86-64, ARM and MIPS-64
… work in progress (want to join? Get in touch.)

Compiler verification summary
Ingredients:
• a formal logic for the proofs
• accurate models of
  • the source language
  • the target language
  • the compiler algorithm
Tools:
• a proof assistant (software)
Method:
• (interactively) prove a simulation relation

Questions? Interested?
