An Approach to Verifiable Compiling Specification and Prototyping


Jonathan Bowen*, He Jifeng†, Paritosh Pandya‡
Oxford University Computing Laboratory, Programming Research Group
11 Keble Road, Oxford OX1 3QD, England

Abstract

A compiler may be specified as a set of theorems, each describing how a construct in the programming language is translated into a sequence of machine instructions. The machine may be specified as an interpreter written in the programming language itself. Using refinement algebra, it can then be verified that interpreting a compiled program is the same as or better than executing the original source program. The compiling specification is very similar to a logic program and thus a prototype compiler (and interpreter) may easily be produced in a language such as Prolog. A subset of the occam programming language and the transputer instruction set are used to illustrate the approach. An advantage of the method is that new programming constructs can be added without necessarily affecting existing development work.

1 Introduction

There is considerable interest in developing compilers which are proved to generate correct machine code. It is especially important to use such compilers in safety-critical applications. The development of a verified compiler can be divided into four steps. Let PL designate the set of high level programs, and ML designate the set of machine programs.

1. The compiling specification is a predicate relating PL programs to their acceptable ML program implementations.
2. Suitable mathematical semantics must be given to the PL programs as well as the ML programs to capture their behaviour. A proof of correctness of the compiling specification establishes that the compiling specification only relates semantics-preserving programs.
3. A compiler is an algorithm for translating PL programs into ML programs.
4. A proof of correctness of the compiler algorithm ensures that the compiler only generates machine programs satisfying the compiling specification.

* Funded by the UK IED safemos project: no. IED3/1/1036, "Demonstration of the Possibility of Totally Verified Systems."
† Funded by the ESPRIT BRA ProCoS project: no. 3104, "Provably Correct Systems."
‡ On leave from the Tata Institute of Fundamental Research, Bombay, India. Funded by the safemos project.

correctness proof [H90]. Central to this approach is a predicate $\mathcal{C}\,p\,m\,\Psi$ stating that the machine program $m$ is a correct translation of the PL program $p$. Here, $\Psi$ is the symbol table mapping each variable of $p$ to a location in the machine memory. (Clearly, the behaviour of $m$ cannot be compared to the behaviour of $p$ without knowing $\Psi$.) The compiling specification is given as a set of theorems about the predicate $\mathcal{C}\,p\,m\,\Psi$ stating how each construct can be compiled. For example,

$$\mathcal{C}\,(p\,;\,q)\,(m_1 \frown m_2)\,\Psi \quad\text{if}\quad \mathcal{C}\,p\,m_1\,\Psi \;\wedge\; \mathcal{C}\,q\,m_2\,\Psi$$

(Here, we use $(m_1 \frown m_2)$ to represent a machine program which executes $m_1$ first and on its completion starts executing $m_2$.)

To formalise the notion of correctness of the compiling specification, a mathematical theory of program refinement is developed. Under this theory, a set of laws (refinement algebra) is given to establish $p \sqsubseteq q$, which states that program $q$ is 'better than' program $p$ in all circumstances (for example, it terminates more often and is more deterministic). The behaviour of machine programs is formalised by giving an interpreter $I\,m$ for the machine program $m$. The interpreter itself is a program in PL. This greatly facilitates the task of comparing the behaviour of a program to the behaviour of its compiled machine program. The compiling specification $\mathcal{C}\,p\,m\,\Psi$ is defined to be correct if $p \sqsubseteq I\,m$; i.e., the behaviour of the interpreter $I\,m$ is a refinement of the behaviour of the source program $p$ after a suitable translation from machine state to program state, as specified by $\Psi$. Each theorem of the compiling specification must be proved correct with respect to the above definition. Such a proof can be given entirely through the algebraic laws of process refinement. The proof of each theorem is independent of the other theorems. We refer the reader to [H90] for a fuller treatment of this approach.

In this paper, we give a compiling specification for the ProCoS project language PL0 [LJ89], following the algebraic approach of Hoare. We also present a prototype Prolog compiler based on this specification. PL0 is a subset of the programming language occam [I88a], and the machine language ML0 [BP90] is a subset of the transputer instruction set [I88b].

The paper is organised as follows. Section 2 briefly describes the programming language PL0 and gives some algebraic laws for process refinement. Section 3 outlines the interpreter for ML0 programs. Section 4 deals with the specification of a compiler and the proof of its correctness. Thus, Section 4.1 gives the formal meaning of correctness of the compiling specification. The compiling specification itself is given in Section 4.2 and it is proved correct in Section 4.3. Section 5 discusses a strategy for organising a compiler based on the specification given in this paper. A Prolog implementation of a prototype compiler, which is very close to the compiling specification, is presented in Section 6. Finally we discuss some advantages of our approach.

2 The Programming Language and its Process Algebra

The programming language PL0 is a sequential subset of occam. It consists of the occam constructs SKIP, STOP, assignment, SEQ, IF and WHILE. The constructs input?x and output!e permit a program to interact with the external environment. Rather than giving formal syntax [LJ89], we present an example program. Note that the concrete syntax is selected such that the program may easily be

priority, associativity, etc. This obviates the need to write a parser in Prolog.

    int cs: int ps: int x: int y:
    seq[ cs := 0,
         ps := 0,
         while(true,
           seq[ input?x,
                ps := cs,
                cs := x,
                if[ cs = 0  -> y := 0,
                    ps = 0  -> y := 2-cs,
                    ps <> 0 -> y := cs ],
                output!y ] ) ].

Running a program

The formal semantics of PL0 is well studied [HH89, He90]. An interpreter for PL0 programs can be coded in Prolog as follows. For simplicity, we assume that no variable name may be declared more than once. Also, 'undefined' indicates an arbitrary integer value. Expression evaluation is straightforward and omitted here.

    run(int X:P)    :- run(X:=undefined), run(P).
    run(skip).
    run(stop)       :- stop.
    run(X:=E)       :- getvar(X,Y), eval(E,V), setvalue(Y,V).
    run(input?X)    :- getvar(X,Y), readvalue(Y,V), setvalue(Y,V).
    run(output!E)   :- eval(E,V), writevalue(E,V).
    run(if[])       :- run(stop).
    run(if[B->P|Q]) :- eval(B,V), ((V=true, run(P)) ; (V=false, run(if Q))).
    run(while(B,P)) :- run(if[B->seq[P,while(B,P)],true->skip]).
    run(seq[])      :- run(skip).
    run(seq[P|Q])   :- run(P), run(seq Q).

The PL0 language is enhanced with the following features to facilitate coding of a machine interpreter for ML0. The assignment construct is generalised to multiple assignment, and the use of array variables is permitted (for modelling machine memory). A special process Abort is included which models completely arbitrary behaviour. The construct assert b ensures that either b holds or the behaviour is completely arbitrary. The scope of variables may be terminated dynamically using the end construct. We refer to this language as PL0+.

    run(abort)      :- abort.
    run([]:=[])     :- run(skip).
    run(assert B)   :- run(if[B->skip,true->abort]).
    run(end[])      :- run(skip).
    run(end[X|R])   :- forgetvar(X), run(end R).

Refinement Algebra

A number of algebraic laws for proving the refinement relation $p \sqsubseteq q$ between PL0+ processes have been developed. These are similar to the laws of occam [RH88]. Informally, $p \sqsubseteq q$ means $q$ is better than $p$: everything that $q$ can do, $p$ may also do; and everything $q$ can fail to do, $p$ may also fail to do. So $q$ is in all respects a more predictable and more controllable program than $p$. In any circumstance where $p$ reliably serves some useful purpose, $p$ may be replaced by $q$. I.e.,

$$p \sqsubseteq q \;\Rightarrow\; P(p) \sqsubseteq P(q)$$

where $P(p)$ is any program containing $p$. The relation $\sqsubseteq$ is a preorder; i.e., it is reflexive and transitive. Also, $p = q$ iff $p \sqsubseteq q \wedge q \sqsubseteq p$. In the following we present some simple laws as illustration. A more comprehensive set of laws may be found in [HPB90]. A mathematical definition of the relation $\sqsubseteq$, and the consistency of the laws with respect to a specification-oriented semantics of the language, are explored in [He90]. The program Abort represents the completely arbitrary behaviour of a broken machine, and is the least controllable; in short, for all purposes, it is the worst:

Law 1   $\mathrm{Abort} \sqsubseteq p$

The SEQ constructor runs a number of processes in sequence. If it has no argument it simply terminates.

Law 2   $\mathrm{SEQ}[\,] = \mathrm{SKIP}$

Otherwise it runs its first argument until that terminates and then runs the rest in sequence.

Law 3   $\mathrm{SEQ}[p] = p$

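The following law permits us to unnest SEQs. The notation $\underline{p}$ denotes a list of processes.

Law 4   $\mathrm{SEQ}[\underline{p}, \mathrm{SEQ}[\underline{q}], \underline{r}] = \mathrm{SEQ}[\underline{p}, \underline{q}, \underline{r}]$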

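Corollary 1   $\mathrm{SEQ}[\mathrm{SKIP}, \underline{p}] = \mathrm{SEQ}[\underline{p}] = \mathrm{SEQ}[\underline{p}, \mathrm{SKIP}]$

Proof:

      $\mathrm{SEQ}[\mathrm{SKIP}, \underline{p}]$
    $=$ $\mathrm{SEQ}[\mathrm{SEQ}[\,], \underline{p}]$    {by law 2}
    $=$ $\mathrm{SEQ}[\underline{p}]$    {by law 4}
    $=$ $\mathrm{SEQ}[\underline{p}, \mathrm{SEQ}[\,]]$    {by law 4}
    $=$ $\mathrm{SEQ}[\underline{p}, \mathrm{SKIP}]$    {by law 2}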
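Laws 2 to 4 and Corollary 1 can be read as left-to-right simplification rules. The following is a minimal sketch of such a simplifier, not part of the original development. It assumes ground process terms written in plain Prolog syntax as seq(List) and skip (rather than the operator notation seq[...] used in this paper); the predicate names norm/2, flat/2 and wrap/2 are hypothetical.

```prolog
:- use_module(library(lists)).          % append/3

% norm(+Proc, -Norm): simplify a ground process term using laws 2-4.
% Anything other than seq(List) is treated as an opaque process.
norm(seq(Ps), Norm) :- !,
    flat(Ps, Flat),
    wrap(Flat, Norm).
norm(P, P).

% flat(+Procs, -Flat): normalise each element, splice nested SEQs into
% the enclosing list (law 4) and drop SKIP elements (corollary 1).
flat([], []).
flat([P|Ps], Flat) :-
    norm(P, N),
    flat(Ps, Rest),
    (   N = seq(Qs) -> append(Qs, Rest, Flat)
    ;   N == skip   -> Flat = Rest
    ;   Flat = [N|Rest]
    ).

% wrap(+Procs, -Proc): rebuild the process from the flattened list.
wrap([], skip) :- !.                    % law 2: SEQ[] = SKIP
wrap([P], P)   :- !.                    % law 3: SEQ[p] = p
wrap(Ps, seq(Ps)).
```

For instance, the query norm(seq([a, seq([]), seq([b, seq([c])]), skip]), N) binds N to seq([a,b,c]). Every rewriting step is an equality from the laws above, so the result is equal to the original process.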

The machine language ML0 [BP90] is a subset of the transputer instruction set [I88b]. The machine code is interpreted by a PL0+ program $I\,s\,f\,m\,T$, where $s$ and $f$ stand for the start and finish address of the ML0 program in memory $m$. Thus, $m[s] \ldots m[f-1]$ is the ML0 code to be executed. The set $T$ represents the data space available to the ML0 program. Any access to a location outside that set is regarded as illegal, and allows completely arbitrary behaviour of the interpreter (which is modelled by the construct Abort). The machine state consists of registers $A$, $B$ and $C$, an instruction pointer $P$ and a boolean $\mathit{ErrorFlag}$.

$$I\,s\,f\,m\,T \;\stackrel{\mathrm{def}}{=}\; \mathrm{SEQ}[\,\langle P, \mathit{ErrorFlag}\rangle := \langle s, 0\rangle,\ \mathrm{WHILE}(P < f,\ \mathit{mstep}_T),\ \mathrm{assert}(P = f \wedge \mathit{ErrorFlag} = 0)\,]$$

Here, $\mathit{mstep}_T$ is the interpreter for executing a single ML0 instruction starting at location $m[P]$ under the given data space $T$. The program $\mathrm{assert}(P = f \wedge \mathit{ErrorFlag} = 0)$ ensures that if the execution of the interpreter terminates, it ends at the finish address $f$ with $\mathit{ErrorFlag}$ cleared. We omit the PL0+ code for $\mathit{mstep}_T$, which can be found elsewhere [HPB90]. The instructions used include:

    ldc(con)   - load constant.
    ldl(addr)  - load local from memory address.
    stl(addr)  - store local to memory address.
    adc(con)   - add constant.
    eqc(con)   - equals constant.
    j(addr)    - unconditional jump to a memory address.
    cj(addr)   - conditional jump to a memory address.

These and other ML0 instructions are more fully documented in [BP90, I88b]. We only mention that properties of machine behaviour such as the following lemmata can be proved using the laws of process refinement. Here, the function $\mathit{mtrans}(\mathit{minstr})$ gives the sequence of bytes representing the instruction $\mathit{minstr}$ (see Section 4.2).

Lemma 1   If $m[s : f-1] = \mathit{mtrans}(\mathtt{ldl}(\Psi x))$ and $\Psi x \in T$ then
$$I\,s\,f\,m\,T = \langle A, B, C, P, \mathit{ErrorFlag}\rangle := \langle M[\Psi x], A, B, f, 0\rangle$$

Lemma 2 (Composition Rule)   If $s \le l \le f$ then
$$I\,s\,f\,m\,T \;\sqsupseteq\; \mathrm{SEQ}[\,I\,s\,l\,m\,T,\ I\,l\,f\,m\,T\,]$$
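The shape of the interpreter loop can be sketched directly in Prolog. The fragment below is only an illustration under stated assumptions, not the PL0+ definition of mstep from [HPB90]: every instruction is assumed to occupy one byte (the hypothetical size/2 below), the code is a list of Address:Instruction pairs, memory is a list of Address-Value pairs, ErrorFlag and the data space T are ignored, and the register effects follow the usual transputer conventions (a load pushes the three-register stack, cj jumps when A is zero and otherwise pops A). A failed final check makes the query fail, whereas the specification treats it as Abort.

```prolog
:- use_module(library(lists)).          % memberchk/2, select/3

% ASSUMPTION: every instruction is one byte long (the real Size differs).
size(_, 1).

% run(+F, +Code, +State0, -State): step while P < F, then check P = F.
% A state is s(A,B,C,P,Mem).
run(F, _, s(A,B,C,P,M), s(A,B,C,P,M)) :-
    P >= F, !,
    P =:= F.                            % final check of the interpreter
run(F, Code, s(A,B,C,P,M), Final) :-
    memberchk(P:Instr, Code),           % fetch the instruction at P
    size(Instr, Size),
    Next is P + Size,
    step(Instr, Next, s(A,B,C,P,M), State1),
    run(F, Code, State1, Final).

% step(+Instr, +Next, +State0, -State): effect of one instruction.
step(ldc(N),  Next, s(A,B,_,_,M), s(N,A,B,Next,M)).
step(ldl(Ad), Next, s(A,B,_,_,M), s(V,A,B,Next,M)) :-
    memberchk(Ad-V, M).
step(stl(Ad), Next, s(A,B,C,_,M0), s(B,C,C,Next,[Ad-A|M1])) :-
    (   select(Ad-_, M0, M1) -> true ; M1 = M0 ).
step(adc(N),  Next, s(A,B,C,_,M), s(A1,B,C,Next,M)) :-
    A1 is A + N.
step(eqc(N),  Next, s(A,B,C,_,M), s(A1,B,C,Next,M)) :-
    (   A =:= N -> A1 = 1 ; A1 = 0 ).
step(j(Off),  Next, s(A,B,C,_,M), s(A,B,C,P,M)) :-
    P is Next + Off.                    % offsets count from the next instruction
step(cj(Off), Next, s(A,B,C,_,M), State) :-
    (   A =:= 0
    ->  P is Next + Off, State = s(A,B,C,P,M)
    ;   State = s(B,C,C,Next,M)
    ).
```

For example, run(2, [0:ldl(10), 1:stl(11)], s(0,0,0,0,[10-7]), S) binds S to s(0,0,0,2,[11-7,10-7]): the value stored at location 10 is loaded into A and then stored at location 11, in line with the register update stated in Lemma 1 for ldl.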

4 Compiling Specification and its Verification

The compiling specification of PL0 is defined as a predicate relating a PL0 process and the corresponding ML0 code. Section 4.1 gives a formal meaning to the compiling specification predicate. Section 4.2 states many theorems of the predicate $\mathcal{C}\,p\,s\,f\,m\,\Psi\,\Omega$. The aim is to include enough theorems to enable the implementor to select correct ML0 code for each construct of PL0. Each theorem can be proved using the algebraic laws of process refinement. A sample proof is given in Section 4.3.

The compiler of the programming language PL0 is specified by the predicate $\mathcal{C}\,p\,s\,f\,m\,\Psi\,\Omega$ where

$p$ is a PL0 process.

$s$ and $f$ stand for the start and finish address of a section of ML0 code to be executed. $m[s] \ldots m[f-1]$ is the ML0 code for $p$.

The symbol table $\Psi$ maps each identifier (global variables and channels) of $p$ to its address in the memory $M$, where we assume that $m$ (used to store the code) and $M$ (used to store data) are disjoint.

$\Omega$ is a set of locations of the memory $M$, which can be used to store the values of local variables or the temporary results during the evaluation of expressions. Here we assume that

$$\mathrm{ran}\,\Psi \cap \Omega = \emptyset$$

i.e., $\Omega$ only contains those addresses which have not been allocated yet.

The machine code is interpreted by the program $I\,s\,f\,m\,(\mathrm{ran}\,\Psi \uplus \Omega)$. The compiling specification predicate $\mathcal{C}$ is correct if the interpretation of the ML0 code has the same (or better) effect as the PL0 source code, with appropriate translation from the data space of the target code to that of the source code. I.e.,

$$\mathcal{C}\,p\,s\,f\,m\,\Psi\,\Omega \;\stackrel{\mathrm{def}}{=}\; \mathrm{SEQ}[\hat{\Psi}_{\Omega},\ p] \;\sqsubseteq\; \mathrm{SEQ}[I\,s\,f\,m\,(\mathrm{ran}\,\Psi \uplus \Omega),\ \hat{\Psi}_{\Omega}]$$

The relation $\hat{\Psi}_{\Omega}$ translates from the machine state to the program state and then forgets the machine state consisting of the memory locations and machine registers:

$$\hat{\Psi}_{\Omega} \;\stackrel{\mathrm{def}}{=}\; \mathrm{SEQ}[\,\langle x, y, z, \ldots\rangle := \langle M[\Psi x], M[\Psi y], M[\Psi z], \ldots\rangle,\ \mathrm{end}\ (\mathrm{ran}\,\Psi \uplus \Omega \uplus \{A, B, C, P, \mathit{ErrorFlag}\})\,]$$

where $\langle x, y, z, \ldots\rangle$ contains all the program variables in the domain of $\Psi$. Thus, the compilation predicate states that for any initial machine state, executing the machine program and then translating the machine state into the source-program state is the same as or better than first translating the machine state and then executing the source program.

Similarly, the compilation predicate $\mathcal{CE}\,e\,s\,f\,m\,\Psi\,\Omega$ relates a PL0 expression $e$ to its ML0 code, whose execution must leave the value of $e$ in the register $A$.

4.2 Theorems of compiling specification

We present some of the theorems of the compiling specification for PL0 to ML0 translation; the full specification can be found in [HPB90]. Note that ML0 instructions are of variable length; each such instruction is implemented as a sequence of simpler single-byte transputer instructions. In an ML0 program the argument of a jump instruction is the byte offset from the end of the jump instruction to the start of the target instruction. A function $\mathit{mtrans}(\mathit{minstr})$, translating an ML0 instruction into a sequence of transputer instructions, and a function $\mathit{Size}(\mathit{minstr})$, giving the length in bytes of $\mathit{minstr}$, are specified elsewhere [PH90]. The notation $m[s : f]$ denotes the sequence $m[s], \ldots, m[f]$.

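(1) $\mathcal{C}\,(\mathrm{SKIP})\,s\,f\,m\,\Psi\,\Omega$
    if $f = s$

(2) $\mathcal{C}\,(\mathrm{STOP})\,s\,f\,m\,\Psi\,\Omega$
    if $m[s : f-1] = \mathit{mtrans}(\mathtt{stopp})$

(3) $\mathcal{C}\,(x := e)\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1.\ l_1 \le f \;\wedge\; \mathcal{CE}\,(e)\,s\,l_1\,m\,\Psi\,\Omega \;\wedge\; m[l_1 : f-1] = \mathit{mtrans}(\mathtt{stl}(\Psi x))$

(4) $\mathcal{C}\,(\mathrm{SEQ}[\,])\,s\,f\,m\,\Psi\,\Omega$
    if $\mathcal{C}\,(\mathrm{SKIP})\,s\,f\,m\,\Psi\,\Omega$

(5) $\mathcal{C}\,(\mathrm{SEQ}[P_1, \ldots, P_n])\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1.\ l_1 \le f \;\wedge\; \mathcal{C}\,(P_1)\,s\,l_1\,m\,\Psi\,\Omega \;\wedge\; \mathcal{C}\,(\mathrm{SEQ}[P_2, \ldots, P_n])\,l_1\,f\,m\,\Psi\,\Omega$

(6) $\mathcal{C}\,(\mathrm{WHILE}(b, P))\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1, l_2, l_3.\ l_1 \le l_2 \le l_3 \le f$
       $\wedge\; m[s : l_1-1] = \mathit{mtrans}(\mathtt{j}(l_2 - l_1))$
       $\wedge\; \mathcal{C}\,(P)\,l_1\,l_2\,m\,\Psi\,\Omega$
       $\wedge\; \mathcal{CE}\,(\mathrm{NOT}\ b)\,l_2\,l_3\,m\,\Psi\,\Omega$
       $\wedge\; m[l_3 : f-1] = \mathit{mtrans}(\mathtt{cj}(l_1 - f))$

(7) $\mathcal{C}\,(\mathrm{IF}[\,])\,s\,f\,m\,\Psi\,\Omega$
    if $\mathcal{C}\,(\mathrm{STOP})\,s\,f\,m\,\Psi\,\Omega$

(8) $\mathcal{C}\,(\mathrm{IF}[b_1 \to P_1, \ldots, b_n \to P_n])\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1, l_2, l_3, l_4.\ l_1 \le l_2 \le l_3 \le l_4 \le f$
       $\wedge\; \mathcal{CE}\,(b_1)\,s\,l_1\,m\,\Psi\,\Omega$
       $\wedge\; m[l_1 : l_2-1] = \mathit{mtrans}(\mathtt{cj}(l_4 - l_2))$
       $\wedge\; \mathcal{C}\,(P_1)\,l_2\,l_3\,m\,\Psi\,\Omega$
       $\wedge\; m[l_3 : l_4-1] = \mathit{mtrans}(\mathtt{j}(f - l_4))$
       $\wedge\; \mathcal{C}\,(\mathrm{IF}[b_2 \to P_2, \ldots, b_n \to P_n])\,l_4\,f\,m\,\Psi\,\Omega$

Theorems of expression compilation:

(9) $\mathcal{CE}\,(x)\,s\,f\,m\,\Psi\,\Omega$
    if $m[s : f-1] = \mathit{mtrans}(\mathtt{ldl}(\Psi x))$

    $\exists l_1, l_2, l_3, l_4, l_5.\ l_1 \le l_2 \le l_3 \le l_4 \le l_5 \le f$
       $\wedge\; \mathcal{CE}\,(e_1)\,s\,l_1\,m\,\Psi\,(\Omega \uplus \{\mathit{loc}\})$
       $\wedge\; m[l_1 : l_2-1] = \mathit{mtrans}(\mathtt{stl}(\mathit{loc}))$
       $\wedge\; \mathcal{CE}\,(e_2)\,l_2\,l_3\,m\,\Psi\,\Omega$
       $\wedge\; m[l_3 : l_4-1] = \mathit{mtrans}(\mathtt{ldl}(\mathit{loc}))$
       $\wedge\; m[l_4 : l_5-1] = \mathit{mtrans}(\mathtt{add})$
       $\wedge\; m[l_5 : f-1] = \mathit{mtrans}(\mathtt{stoperr})$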
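As a small worked instance (an illustration, not taken from the full specification), consider the assignment $x := y$ where $y$ is a variable. Substituting theorem (9), with $y$ for the variable and $l_1$ for the finish address, into the expression part of theorem (3) gives a condition that mentions only the code memory:

```latex
\begin{align*}
\mathcal{C}\,(x := y)\;s\;f\;m\;\Psi\;\Omega
  \quad\text{if}\quad \exists l_1.\;
  & l_1 \le f \\
  {}\wedge{} & m[s : l_1 - 1] = \mathit{mtrans}(\mathtt{ldl}(\Psi y)) \\
  {}\wedge{} & m[l_1 : f - 1] = \mathit{mtrans}(\mathtt{stl}(\Psi x))
\end{align*}
```

So one acceptable code sequence is ldl(Ψy) followed by stl(Ψx). Since each segment must have exactly the length of the instruction it holds, l1 = s + Size(ldl(Ψy)) and f = l1 + Size(stl(Ψx)).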

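4.3 Correctness of the Compiling Specification

We give a sample proof of correctness for the theorems of the SEQ construct.

Proof of Theorem 4   Direct from law 2.

Proof of Theorem 5

      $\mathrm{SEQ}[I\,s\,f\,m\,(\mathrm{ran}\,\Psi \uplus \Omega),\ \hat{\Psi}_{\Omega}]$
    $\sqsupseteq$ $\mathrm{SEQ}[I\,s\,l\,m\,(\mathrm{ran}\,\Psi \uplus \Omega),\ I\,l\,f\,m\,(\mathrm{ran}\,\Psi \uplus \Omega),\ \hat{\Psi}_{\Omega}]$    {by law 4 and lemma 2 (Composition Rule)}
    $\sqsupseteq$ $\mathrm{SEQ}[I\,s\,l\,m\,(\mathrm{ran}\,\Psi \uplus \Omega),\ \hat{\Psi}_{\Omega},\ \mathrm{SEQ}[p_2, \ldots, p_n]]$    {by the antecedent}
    $\sqsupseteq$ $\mathrm{SEQ}[\hat{\Psi}_{\Omega},\ p_1,\ \mathrm{SEQ}[p_2, \ldots, p_n]]$    {by the antecedent}
    $=$ $\mathrm{SEQ}[\hat{\Psi}_{\Omega},\ \mathrm{SEQ}[p_1, \ldots, p_n]]$    {by law 4}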

Proofs of other Theorems may be found in [HPB90]. These proofs have been carried out manually, although the use of machine assistance in checking these proofs is being investigated.

5 Compilation Strategy

Section 4.2 presented a number of theorems about the compiling specification predicate $\mathcal{C}$. In this section we discuss how these theorems may be used as a specification for a compiler for PL0 programs, and how they may be used in actually generating code for PL0 programs. Hoare has suggested that these theorems may directly function as clauses of a logic program implementing the compiler. Indeed, a logic program implementation of this compiler is feasible provided an efficient strategy is designed for 'executing' these clauses. In this section, we give one such strategy.

Problem definition: Given the theorems of the previous section, a high level program $P$, a start address $s$, and $\Psi$ and $\Omega$, we wish to find $f$ and $m$ such that $\mathcal{C}\,(P)\,s\,f\,m\,\Psi\,\Omega$ is true.

Relocatability of machine code: The following two theorems, stating that code generated by the compiler is relocatable, are useful in implementing this strategy. They may be used to find the size of code even when its position in ROM memory is unknown.

(11) $\mathcal{C}\,(P)\,s\,f\,m\,\Psi\,\Omega$
    if $\exists s', f', m'.\ \mathcal{C}\,(P)\,s'\,f'\,m'\,\Psi\,\Omega \;\wedge\; m[s : f-1] = m'[s' : f'-1]$

(12) $\mathcal{CE}\,(e)\,s\,f\,m\,\Psi\,\Omega$
    if $\exists s', f', m'.\ \mathcal{CE}\,(e)\,s'\,f'\,m'\,\Psi\,\Omega \;\wedge\; m[s : f-1] = m'[s' : f'-1]$

The search for such an $f$ and $m$ is carried out using the following strategy. Each clause consists of searching for some indices $l_1, l_2, \ldots, f$ in the memory and satisfying some property of the memory locations between successive indices. The strategy can be conveniently presented as the order in which these indices are determined.

1. For all constructs other than IF and WHILE, given $s$, find $l_1, l_2, \ldots$ in that order until $f$ can be found. We illustrate this through the assignment construct:

$\mathcal{C}\,(x := e)\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1.\ l_1 \le f \;\wedge\; \mathcal{CE}\,(e)\,s\,l_1\,m\,\Psi\,\Omega \;\wedge\; m[l_1 : f-1] = \mathit{mtrans}(\mathtt{stl}(\Psi x))$

According to our strategy, in compiling $x := e$ starting at location $s$,
(a) Code for the expression $e$ must first be generated starting at location $s$ by recursively satisfying $\mathcal{CE}\,(e)\,s\,l_1\,m\,\Psi\,\Omega$. This determines index $l_1$.
(b) The machine instruction $\mathtt{stl}(\Psi x)$ must be placed in the memory starting at location $l_1$. This determines index $f$.

2. The compiling specification for the WHILE construct is as follows:

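$\mathcal{C}\,(\mathrm{WHILE}(b, P))\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1, l_2, l_3.\ l_1 \le l_2 \le l_3 \le f$
       $\wedge\; m[s : l_1-1] = \mathit{mtrans}(\mathtt{j}(l_2 - l_1))$
       $\wedge\; \mathcal{C}\,(P)\,l_1\,l_2\,m\,\Psi\,\Omega$
       $\wedge\; \mathcal{CE}\,(\mathrm{NOT}\ b)\,l_2\,l_3\,m\,\Psi\,\Omega$
       $\wedge\; m[l_3 : f-1] = \mathit{mtrans}(\mathtt{cj}(l_1 - f))$

For the WHILE construct, assume a function $\mathit{Opt}(\mathit{int})$ with the following property:

$$\mathit{Opt}(l_3 - l_1) = (f - l_3)$$

We will define such a function later in this section. Given $s$, the indices can be determined in the following order:

$$(l_2 - l_1) \to l_1 \to l_2 \to l_3 \to (f - l_3) \to f$$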

This can be explained in more detail as follows:
(a) First determine $(l_2 - l_1)$ by compiling $P$ into auxiliary memory $m'$, i.e. find $k$ such that $\mathcal{C}\,(P)\,0\,k\,m'\,\Psi\,\Omega$ holds. From the relocation theorems we have $(l_2 - l_1) = k$.
(b) Generate code for $\mathtt{j}(l_2 - l_1)$ starting at $s$. This determines $l_1$.
(c) Generate code for $P$ starting at $l_1$. This determines $l_2$. (This code can be obtained by relocating $m'[0 : k-1]$.)
(d) Generate code for $\mathrm{NOT}\ b$ starting at $l_2$. This determines $l_3$.
(e) Find $(f - l_3)$ by calculating $\mathit{Opt}(l_3 - l_1)$. Calculate $f - l_1$ as $(f - l_3) + (l_3 - l_1)$. Generate code for $\mathtt{cj}(l_1 - f)$, the backward jump required by the theorem, starting at $l_3$.
For an IF[] process, given $s$, find $f$ by compiling it as the process STOP. For $\mathrm{IF}[b_1 \to P_1, \ldots, b_n \to P_n]$ with $n > 0$, given $s$, the indices can be calculated in the following order:

$$l_1 \to (f - l_4) \to (l_4 - l_3) \to (l_3 - l_2) \to (l_4 - l_2) \to l_2 \to l_3 \to l_4 \to f$$

Design of function Opt: The defining equations for $\mathit{Opt}$ are as follows:

$$\mathit{Size}(\mathtt{cj}(l_1 - f)) = (f - l_3)$$
$$f = l_1 + (l_3 - l_1) + (f - l_3)$$
$$\mathit{Opt}(l_3 - l_1) = (f - l_3)$$

These can be solved to give the following specification of $\mathit{Opt}$:

$$l_3 - l_1 = 0 \;\Rightarrow\; \mathit{Opt}(l_3 - l_1) = 2$$
$$0 \le i \;\wedge\; (16^{i} - i) \le l_3 - l_1 < 16^{i+1} - (i + 1) \;\Rightarrow\; \mathit{Opt}(l_3 - l_1) = i + 2$$

The strategy outlined in this section has been used to implement a compiler in the logic programming language Prolog. Extracts of this compiler are included in the next section.
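To see how the specification of Opt is used, here is a worked instance with illustrative numbers (the value 20 is an assumption chosen for the example). Suppose the loop body and the test together occupy 20 bytes, so that l3 - l1 = 20:

```latex
\begin{align*}
l_3 - l_1 &= 20 \\
16^{1} - 1 = 15 &\le 20 < 254 = 16^{2} - 2 \quad\Rightarrow\quad i = 1 \\
\mathit{Opt}(20) &= i + 2 = 3 \\
f &= l_3 + 3 \\
l_1 - f &= -\bigl((l_3 - l_1) + (f - l_3)\bigr) = -23
\end{align*}
```

The conditional jump is therefore cj(-23) and, by the first defining equation, it occupies 3 bytes. A loop of at most 14 bytes (the case i = 0) needs only a 2-byte jump.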

6 Prolog Implementation

The idea of using Prolog [CM81] for the construction of compilers has been accepted for some time [W80]. Advantages include the fact that the code for the compiler can be very close to the compiling specification, since Prolog is based on logic [NM90], and thus the confidence in its correctness is increased. It can be used as a prototype compiler and even as a 'real' compiler, since the Prolog code itself may be compiled for increased efficiency [QP88].

The following sections include parts of a Prolog compiler from the PL0 language to the ML0 instruction set which follows the compiling specification outlined in Section 4.2 as closely as possible. The strategy presented in Section 5 is followed to produce a working compiler. Currently the compiler produces code in the normal way by reading in a PL0 source program and writing out ML0 object (and assembly) code. However, it should be easy to convert the program into a compiler 'checker' so that program source and object code produced by another compiler could be checked to be correct.

A PL0 program is compiled with an input and output location and also a special buffer used to store an evaluated expression for output. The set of free memory locations, $\Omega$, is modelled as a list of (non-repeating) locations. The correspondence between variables and memory locations, $\Psi$, is modelled as a list of tuples x->Ψx.

    cp(Prog,S,F,M,[],[Loc,INPCHANADDR,OUTCHANADDR|Omega]) :-
        c(Prog,S,F,M,[outputbuf->Loc,input->INPCHANADDR,output->OUTCHANADDR],Omega).

Process compilation

Each program construct is compiled using a separate Prolog clause. Individual instructions are assembled using mtrans. This records the start address of each instruction as well as the instruction itself. Each clause produces a number of segments of memory containing instructions, which are typically concatenated together using append or flatten. A Prolog cut ('!') could be included at the end of each clause if we are only interested in the first solution which the Prolog program finds. This would make the program more efficient by avoiding subsequent searching once a solution has been found. However, without cuts a non-deterministic compilation is possible (perhaps allowing the output of one or more other compilers to be checked, for example) and the program is more 'logical' in behaviour.

Declarations cause a free location in $\Omega$ to be allocated in $\Psi$. The location of a variable in memory can be found using the psi clause (see later for more details).

    c(int X : Blk,S,F,M,Psi,[Loc|Omega]) :-
        c(Blk,S,F,M,[X->Loc|Psi],Omega).

Most constructs are straightforward and follow the original specification almost exactly.

    c(skip,S,S,[],_,_).
    c(stop,S,F,M,_,_) :-
        mtrans(stopp,S,F,M).
    c(X:=E,S,F,M,Psi,Omega) :-
        psi(Psi,X,PsiX),
        ce(E,S,L1,M1,Psi,Omega),
        mtrans(stl(PsiX),L1,F,M2),
        append(M1,M2,M).
    c(seq[],S,F,M,Psi,Omega) :-
        c(skip,S,F,M,Psi,Omega).
    c(seq[P|R],S,F,M,Psi,Omega) :-
        c(P,S,L1,M1,Psi,Omega),
        c(seq R,L1,F,M2,Psi,Omega),
        append(M1,M2,M).

The if and while constructs involve variable-length jump instructions which must be handled slightly differently from the specification in order to produce an executable program. See later for more details on the reloc and opt clauses which are used below.

    c(if[],S,F,M,Psi,Omega) :-
        c(stop,S,F,M,Psi,Omega).
    c(if[B->P|R],S,F,M,Psi,Omega) :-
        ce(B,S,L1,M1,Psi,Omega),
        c(P,0,L3_L2,MP,Psi,Omega),
        c(if R,0,F_L4,MR,Psi,Omega),
        mtrans(j(F_L4),0,L4_L3,Mj),
        L4_L2 is L4_L3+L3_L2,
        mtrans(cj(L4_L2),L1,L2,M2),
        reloc(L2,MP,M3),
        L3 is L3_L2+L2,
        reloc(L3,Mj,M4),
        L4 is L4_L3+L3,
        reloc(L4,MR,M5),
        F is F_L4+L4,
        flatten([M1,M2,M3,M4,M5],M).
    c(while(B,P),S,F,M,Psi,Omega) :-
        c(P,0,L2_L1,M0,Psi,Omega),
        mtrans(j(L2_L1),S,L1,M1),
        reloc(L1,M0,M2),
        L2 is L2_L1+L1,
        ce(^B,L2,L3,M3,Psi,Omega),
        L3_L1 is L3-L1,
        opt(L3_L1,F_L3),
        L1_F is -(F_L3+L3_L1),
        mtrans(cj(L1_F),L3,F,M4),
        flatten([M1,M2,M3,M4],M).

Expressions are handled separately and straight-forwardly, and are omitted here.

Transputer instructions

Each instruction Instr is located at a particular byte address S in memory. The size of the instruction, including all necessary nfix and pfix low-level prefix instructions [BP90, I88b], must be calculated so that the position F of any following instructions is known. Each instruction is assembled as a tuple consisting of its address in memory S and its code Instr. The Prolog code for size is omitted here but follows the definition of Size in [PH90] closely.

    mtrans(Instr,S,F,[S:Instr]) :-
        size(Instr,Size),
        F is S+Size.

Correspondence between variables and memory locations

The memory location PsiX for a particular variable X may be retrieved from $\Psi$ using the following Prolog code:

    psi([X->PsiX|_],X,PsiX).
    psi([X->_|Psi],Y,PsiY) :-
        X\==Y,
        psi(Psi,Y,PsiY).
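The clauses shown so far can be exercised on a small example. The following self-contained sketch restates them under several assumptions that are not part of the compiler described here: programs are written as plain terms assign(X,E) and seq(List) instead of the operator syntax, the symbol table is a list of Var-Loc pairs instead of Var->Loc, every instruction is taken to be one byte long (the hypothetical size/2), and ce/6 only covers integer constants (compiled with ldc, an assumed clause) and variables (theorem 9).

```prolog
:- use_module(library(lists)).          % append/3

% ASSUMPTION: every instruction is one byte long (the real Size differs).
size(_, 1).

mtrans(Instr, S, F, [S:Instr]) :-
    size(Instr, Size),
    F is S + Size.

% Symbol table lookup over Var-Loc pairs.
psi([X-PsiX|_], X, PsiX).
psi([X-_|Psi], Y, PsiY) :-
    X \== Y,
    psi(Psi, Y, PsiY).

% Expressions: integer constants (assumed clause) and variables only.
ce(N, S, F, M, _, _) :-
    integer(N),
    mtrans(ldc(N), S, F, M).
ce(X, S, F, M, Psi, _) :-
    atom(X),
    psi(Psi, X, PsiX),
    mtrans(ldl(PsiX), S, F, M).

% Processes: skip, assignment and sequence.
c(skip, S, S, [], _, _).
c(assign(X,E), S, F, M, Psi, Omega) :-
    psi(Psi, X, PsiX),
    ce(E, S, L1, M1, Psi, Omega),
    mtrans(stl(PsiX), L1, F, M2),
    append(M1, M2, M).
c(seq([]), S, F, M, Psi, Omega) :-
    c(skip, S, F, M, Psi, Omega).
c(seq([P|R]), S, F, M, Psi, Omega) :-
    c(P, S, L1, M1, Psi, Omega),
    c(seq(R), L1, F, M2, Psi, Omega),
    append(M1, M2, M).
```

With the symbol table [x-10, y-11], the query c(seq([assign(x,5), assign(y,x)]), 0, F, M, [x-10,y-11], []) yields F = 4 and M = [0:ldc(5), 1:stl(10), 2:ldl(10), 3:stl(11)]. The finish address of each construct becomes the start address of the next, exactly as the indices l1, ..., f are threaded through the theorems.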

Relocation of machine code

Compilation may proceed in a straightforward sequential manner, following the original specification directly, except where forward jump instructions (j and cj) are involved. This occurs in the case of if and while constructs. In these cases the size of the relative jump, and hence the size of the jump instruction itself, are not known in advance.

into a separate piece of memory (starting at location 0 for convenience). This may subsequently be relocated into the actual position in memory once the jump instruction involved has been calculated and the real location (following the jump instruction) is known. This is possible because all the instructions in the ML0 language are relocatable; that is to say, they have the same effect wherever they are in memory. The following Prolog code relocates a list of instructions in memory by a specified offset.

    reloc(_,[],[]).
    reloc(Offset,[L1:T|R1],[L2:T|R2]) :-
        L2 is L1+Offset,
        reloc(Offset,R1,R2).

This code is redundant if the location is not included with each instruction, but this would result in a program which is further away from the logical theorems.
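A short usage example may help here. The block below repeats the relocation clauses so that it can be loaded on its own; the instruction list and its addresses are made up for illustration and assume one byte per instruction.

```prolog
% reloc(+Offset, +Code0, -Code): shift every recorded address by Offset.
reloc(_, [], []).
reloc(Offset, [L1:T|R1], [L2:T|R2]) :-
    L2 is L1 + Offset,
    reloc(Offset, R1, R2).

% Code compiled at location 0 and then moved to location 100:
% ?- reloc(100, [0:ldl(3), 1:adc(1), 2:stl(3)], M).
% M = [100:ldl(3), 101:adc(1), 102:stl(3)].
```

Only the recorded addresses change; the instructions themselves are untouched, which is what the relocation theorems (11) and (12) permit.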

Optimisation of backward jumps

Backward jumps occur only in while loops in the PL0 language. Here the distance of the jump from the start position of the jump instruction is known. However, the jump is actually effective from the end position of the jump instruction, which depends on the size of the backward offset to be jumped. The solution used below makes use of the Prolog 'or' construct (';'). The first solution that is large enough to make the required backward jump is found by successively trying each size in turn. Subsequent possibilities are of course valid but are not optimal; these could be allowed by relaxing the constraints, if the program were to be used to check the output of a non-optimal compiler, for example. Note that the minimum size of instruction for a backward jump is 2 bytes, even for a zero offset (from the start of the instruction) [NT89].

    opt(0,2). opt(Offset,Opt) :00, Opt is TryOpt+1)). tryopt(Offset,I,L,TryOpt) :J is I+1, H is 16*(L+I)-J, ((L=
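For reference, the specification of Opt from Section 5 can be turned into a small deterministic predicate. The sketch below is written from that specification and is not the listing used in the compiler: it uses if-then-else rather than the 'or' construct, so it yields only the optimal size, and its clause bodies are assumptions. The bounds follow the specification: the lower bound for index i is 16^i - i and the upper bound is 16^(i+1) - (i+1).

```prolog
% opt(+D, -Size): D is the non-negative distance l3 - l1 and Size is the
% length f - l3 of the backward conditional jump.
opt(0, 2) :- !.                         % a zero offset still needs 2 bytes
opt(D, Size) :-
    D > 0,
    tryopt(D, 0, 1, I),                 % start at i = 0, lower bound 16^0 - 0 = 1
    Size is I + 2.

% tryopt(+D, +I, +Low, -Found): Low = 16^I - I, and Low =< D holds on entry.
tryopt(D, I, Low, Found) :-
    High is 16*(Low + I) - (I + 1),     % 16^(I+1) - (I+1)
    (   D < High
    ->  Found = I
    ;   I1 is I + 1,
        tryopt(D, I1, High, Found)      % High is the next lower bound
    ).
```

For example, opt(14, S) gives S = 2, opt(15, S) and opt(253, S) give S = 3, and opt(254, S) gives S = 4.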
Performance profile

The example program given earlier compiles to assembler code in about 1 second using compiled Quintus Prolog [QP88] on a Sun 3 workstation. This is acceptable for use on small programs in practice. However, the running time of the prototype compiler is not linear in the size of the input; this could be alleviated by using difference lists [NM90] for the manipulation of lists. Note that the Prolog compiler also produces numerical object code for later loading and interpretation. Subsequently the compiled program can be interpreted at a rate of about 10 instructions per second. Alternatively the original program can be interpreted directly (and considerably more quickly). The results from the machine code program should of course be the same (or 'better' if the program is non-deterministic).

In this paper, we have outlined a compiling specification for a simple subset of the occam programming language and its correctness proof using the algebraic technique of Hoare [H90]. The compiling specification is given as a set of theorems. The theorems are proved using the algebraic laws of process refinement for PL0. The complete specification as well as the full correctness proof may be found in [HPB90]. Further work on more complicated constructs such as recursive procedures is currently in progress [HB90]. There are several advantages in following this approach:

Each theorem and its proof is independent of the other theorems. This modularity is important if the verification method is to be practicable. The specification and its proof can be developed one theorem at a time. New theorems can be added to capture different ways of compiling the same construct. For example, the specification may be extended with the following theorem.

(13) $\mathcal{C}\,(\mathrm{SKIP})\,s\,f\,m\,\Psi\,\Omega$
    if $\exists l_1.\ s \le l_1 \le f \;\wedge\; m[s : l_1-1] = \mathit{mtrans}(\mathtt{j}(f - l_1))$

The compiler algorithm can then generate code using any of the alternative theorems; or possibly using several of them, choosing the 'best' (for example, the smallest) code.

The refinement algebra enables us to transform a program while preserving its semantics. It is a property of the compiling specification predicate that refining a process before compiling it will only generate a machine program with refined behaviour. Formally,

$$p \sqsubseteq q \;\wedge\; \mathcal{C}\,q\,s\,f\,m\,\Psi\,\Omega \;\Rightarrow\; \mathcal{C}\,p\,s\,f\,m\,\Psi\,\Omega$$

Thus, provably correct transformations could be performed at the source level, using the refinement algebra. Such transformations can be exploited to optimise the performance of a process.

The compiling specification for PL0 and its correctness proof are envisaged to be valid even for a larger language such as occam. The proofs given here will remain valid provided that the algebraic laws of PL0 continue to hold for the full language. Also, the interpreter for the machine programs should only be extended with more instructions such that the behaviour of the existing ML0 instructions remains unchanged (or is refined). Since illegal instructions are modelled as Abort, new instructions can only improve the machine. Such extensibility of the specification and its proof is one of the main advantages of Hoare's method.

The form of the compiling specification is very similar to a logic program, with each theorem corresponding to a clause. However, such a literal translation of the specification into a logic program may be inefficient. Hence, a strategy for executing the specification has been devised and a prototype Prolog compiler has been developed following this strategy. We have not given here a formal proof that the compiler satisfies the compiling specification. This should, however, be simple as the

generate 'verified code' only if the compiler itself is executed on a trusted implementation of Prolog . . . running on trusted hardware . . .

Acknowledgements The work was inspired by the ideas of C.A.R. Hoare. It was supported by the ESPRIT BRA (Basic Research Action) ProCoS and the UK IED (Information Engineering Directorate) safemos collaborative projects and we acknowledge the help of partners and funding on both these projects. Copies of ProCoS project documents are available from: Annie Rasmussen, Department of Computer Science, Technical University of Denmark, Building 344, DK-2800 Lyngby, Denmark.

References

[BP90] Bowen, J.P. and P.K. Pandya, Specification of the ProCoS level 0 instruction set, ProCoS Project Document OU JB 2, 1990.
[CM81] Clocksin, W.F. and C.S. Mellish, Programming in Prolog, Springer-Verlag, 1981.
[He90] He, Jifeng, Specification oriented semantics for the ProCoS level 0 language, ProCoS Project Document OU HJF 5, 1990.
[HH89] He, Jifeng and C.A.R. Hoare, Operational Semantics for ProCoS Programming Language Level 0, ProCoS Project Document OU HJF 1, 1989.
[H90] Hoare, C.A.R., Refinement algebra proves correctness of compiling specifications, Technical Report PRG-TR-6-90 (also ProCoS Project Document OU CARH 1), Programming Research Group, Oxford University, UK, 1990.
[HB90] He, Jifeng and J.P. Bowen, Compiling Specification for ProCoS Language PLR0, ProCoS Project Document OU HJF 6, 1990.
[HPB90] He, Jifeng, P.K. Pandya and J.P. Bowen, Compiling Specification for ProCoS level 0 language, ProCoS Project Document OU HJF 4, 1990.
[I88a] INMOS Limited, Occam 2 Reference Manual, Prentice Hall International Series in Computer Science, 1988.
[I88b] INMOS Limited, Transputer Instruction Set: A compiler writer's guide, Prentice-Hall International, 1988.
[LJ89] Løvengreen, H.H. and K.M. Jensen, Definition of the ProCoS Programming Language Level 0, ProCoS Project Document ID/DTH HHL 2, 1989.
[NM90] Nilsson, U. and J. Maluszynski, Logic, Programming and Prolog, John Wiley & Sons, 1990.
[NT89] Nicoud, J-D. and A.M. Tyrrell, The transputer T414 instruction set, IEEE Micro, pp 60-75, June 1989.
[PH90] Pandya, P.K. and Jifeng He, A simulation approach to verification of assembling specification of ProCoS level 0 language, ProCoS Project Document OU PKP 3, 1990.
[QP88] Quintus Prolog: Sun 3 User Manual, Release 2.4 (unix), Quintus Computer Systems, Inc., Mountain View, California, USA, 1988.
[RH88] Roscoe, A.W. and C.A.R. Hoare, The Laws of Occam Programming, Theoretical Computer Science, 60, pp 177-229, 1988.
[W80] Warren, D.H.D., Logic programming and compiler writing, Software: Practice and Experience, 10, pp 97-125, 1980.
