ESPRIT BRA 3104 ProCoS project: Provably Correct ...

Viewer
Transcript

ESPRIT BRA 3104 ProCoS project: Provably Correct Systems An Algebraic Approach to Veri able Compiling Speci cation and Prototyping of the ProCoS Level 0 Programming Language C.A.R. Hoare

He Jifeng Jonathan Bowen Paritosh Pandya Oxford University Computing Laboratory Programming Research Group 8{11 Keble Road Oxford OX1 3QD England Tel: +44-865-273838 FAX: +44-865-273839 8 May, 1990

Keywords: Veri able compilation Compiling speci cation Process algebra Re nement Simulation Logic programming

Summary

occam

transputer

A compiler is speci ed by a description of how each construct of the source language is translated into a sequence of object code instructions. The meaning of the object code can be de ned by an interpreter written in the source language itself. A proof that the compiler is correct must show that interpretation of the object code is at least good (for any relevant purpose) as the corresponding source program. The proof is conducted using standard techniques of data re nement. All the calculations are based on algebraic laws governing the source language. The theorems are expressed in a form close to a logic program, which may used as a compiler prototype, or a check on the results of a particular compilation. A subset of the occam programming language and the transputer instruction set are used to illustrate the approach. An advantage of the method is that it is possible to add new programming constructs without aecting existing development work. Contact person in case further information is wanted. Direct dial tel: +44-865-272574, direct FAX: +44-865-272582, e-mail [email protected].

1 Introduction Compilation is speci ed as a relation between a source program p and the corresponding object code c. Further details of compilation are given by a symbol table , mapping the global identi ers of p to storage locations of the target machine. This compilation relation will be abbreviated as a predicate C pc The internal structure of p, c and will be elaborated as the need arises. Improvement is a relation between a product q and a product p that holds whenever for any purpose the observable behaviour of q is as good as or better than that of p; more precisely, if q satis es every speci cation satis ed by p, and maybe more. For example, in a procedural language, a program is better if it terminates more often and/or gives a more determinate result. This relation is written pvq where v is necessarily transitive and re exive (a preorder). If p and q are program operating on dierent data spaces, they cannot be directly compared. But if r is a translation from the data space of q to that of p, we can then compare them before and after the translation r; p v q; r where the semicolon denotes sequential composition. The relation r is known as a simulation or re nement; such simulations are the basis of several modern development techniques (e.g., VDM and Z). To de ne compiler correctness precisely, we need to ascribe meanings to p, c and . Let c^ be a formal description of the behaviour of the target machine executing the machine code c. Let p^ similarly be an abstract behavioural de nition of the meaning of the program p. Finally, let ^ be a transformation which assigns to each global identi er x of the source program the value of the corresponding location x in the store of the target computer. In order to prove that the object code c is a correct translation of p, we need to show that ^ ; p^ v c^ ; ^ or in words, that the source program is improved by execution of its object code. A compiler is correct if it ensures the above for all p, c and . We therefore de ne this to be the speci cation of the compiler C p c def = ^ ; p^ v c^ ; ^ The task of proof-oriented compiler design is to develop a mathematical theory of the predicate C. This should include enough theorems to enable the implementor to select correct object code for each construct of the programming language. It may be that the theory allows choice between several dierent codes for the same source code; this gives some scope for optimisation by selecting the most ecient alternative. As example of the kind of theorem by which the design speci es a compiler, consider the following possible theorems (using h and i as sequence brackets and _ for catenation): 1

C (x := y) hload y; store xi C SKIP hi C (p1; p2) (c1_ c2) whenever C p1 c1 and C p2 c2 When these have been proved, the implementor of the compiler knows that a simple assignment can be translated into the pair of commands shown above; that a SKIP command translates to an empty code sequence; and that sequential composition can be translated by concatenating the translations of its two components. The form of these theorems is extraordinarily similar to that of a recursively de ned Boolean function, which could be used to check that a particular compilation of a safety critical program has been successful. It is also quite similar to a logic program, which could be useful if a prototype of the compiler is needed, perhaps for bootstrapping purposes. It is also similar to that of a conventional procedural compiler, structured according to the principle of recursive descent. Finally, it is quite similar to an attribute grammar for the language; and this permits the use of standard techniques for splitting a compiler into an appropriate number of passes. Thus the collection of theorems serve as an appropriate interface between the designers of a compiler and its implementors. In this paper, we give a compiling speci cation for the ProCoS project language PL0 [LJ89], following the algebraic approach described above. In this case the simulation ^ has a parameter which denotes the free locations available to the machine program. We also present a prototype Prolog compiler based on this speci cation. PL0 is a subset of the programming language occam[I88a], and the machine language ML0 is a subset of the transputer instruction set [I88b]. The paper is organised as follows. Section 2 brie y describes the programming language PL0 , and gives some algebraic laws for process re nement. Section 3 outlines the interpreter for ML0 programs. Section 4 deals with the speci cation of a compiler and the proof of its correctness. Thus, Section 4.1 gives the formal meaning of correctness of compiling speci cation. The compiling speci cation itself is given in Section 4.3 and it is proved correct in Section 4.4. Section 5 discusses a strategy for organising a compiler based on the speci cation given in this paper. A Prolog implementation of a prototype compiler which is very close to the compiling speci cation, is presented in Section 6. Finally we discuss some advantages of our approach.

2 Programming Language & its Process Algebra

The programming language PL0 is a sequential subset of occam. It consists of the occam constructs, SKIP, STOP, assignment, SEQ, IF and WHILE. Constructs input?x and output!e permit a program to interact with external environment. Rather than giving formal syntax [LJ89], we present an example program. Note that the concrete syntax is selected such that the program may easily be read directly by Prolog by de ning each keyword and symbol as a Prolog operator with some suitable priority, associativity, etc. This obviates the need to write a parser in Prolog. int sprev: int scurr:

2

seq[ input?scurr, while(true, seq[ sprev := scurr, input?scurr, if[ (scurr>0) /\ (sprev<>0) -> output!-1, (scurr=0) /\ (sprev=0) -> output!0, (scurr<0) /\ (sprev<>0) -> output!1, true -> skip ] ] ) ].

The semantics of PL0 is well studied [HH89, He90]. The speci cation-oriented semantics of PL0 given in [He90] formally de nes the relation between p and p^ mentioned in the introduction: p^ is just the strongest behavioural speci cation satis ed by p. However, in mathematics it is permissible (and indeed universal practice) to use notations directly to stand for what they mean. We can pro t enormously by this convention if we just allow the program text itself to mean its strongest speci cation:

p^ def = p The PL0 language is enhanced with the following features to facilitate coding of a machine interpreter for ML0. The assignment construct is generalised to multiple assignment and the use of array variables is permitted (for modelling machine memory). A special process Abort is included which models completely arbitrary behaviour. assert b ensures that either b holds or the behaviour is completely arbitrary. The scope of variables may be terminated dynamically using the end construct. We refer to this language as PL0+ .

Re nement Algebra

A number of algebraic laws for proving re nement relation p v q between PL0+ processes have been developed. These are similar to the laws of occam[RH88]. In the following we present some simple laws as illustration. A more comprehensive set of laws may be found in [HPB90]. A mathematical de nition of the relation v, and the consistency of the laws with respect a speci cation-oriented semantics of the language are explored in [He90]. The program Abort represents the completely arbitrary behaviour of a broken machine, and is the least controllable; in short for all purposes, it is the worst: Law 1 Abort v p The SEQ constructor runs a number of processes in sequence. If it has no argument it simply terminates. 3

Law 2

[] = SKIP Otherwise it runs its rst argument until that terminates and then runs the rest in sequence. Law 3 SEQ[p] = p SEQs may be unnested as follows. (The notation p denotes a list of processes.) Law 4 SEQ[p; SEQ[q]; r] = SEQ[p; q; r] The following law is used to describe the compositionality of WHILE-loops: Law 5 SEQ[WHILE(b; p); WHILE(b _ c; p)] = WHILE(b _ c; p) SEQ

3 Interpreting Machine Programs The machine language ML0 is a subset of the transputer instruction set [I88b]. The machine code is interpreted by a PL0+ program I sfm T , where s and f stand for the start and nish address of the ML0 program in memory m. Thus, m[s] . . . m[f ;1] is the ML0 code to be executed. The set of memory locations T represents the data space available to the ML0 program. For PL0 , T is de ned as ran ] where ran is used to hold global variables and includes the temporary stack used for evaluation of expressions. Any access to a location beyond that set is regarded illegal, and will allow completely arbitrary behaviour of the interpreter (which is modelled by the construct Abort). The machine state consists of registers A, B and C , an instruction pointer P and a boolean ErrorFlag. I sfmT def = SEQ[ hP; ErrorFlagi := hs; 0i; WHILE(P < f; mstepT ); assert (P = f ^ ErrorFlag = 0) ] Here, mstepT is the interpreter for executing a single ML0 instruction starting at location m[P ] using the given data space T . The program assert(P = f ^ ErrorFlag = 0) assures that if the execution of the interpreter terminates, it will end at the nish address f with ErrorFlag cleared. We will omit giving the PL0+ code for mstepT which can be found elsewhere [HPB90]. We only mention that properties of machine behaviour such as the following lemma can be proved using the laws of process re nement. Lemma 1 (Composition Rule) If s l f then I sfm T w SEQ[Islm T; Ilfm T ] Proof: LHS = SEQ[hP; ErrorFlagi := hs; 0i; WHILE(P < j; mstepT ); WHILE(P < f; mstepT ); assert(P = f ^ ErrorFlag = 0)] fby law 5g w SEQ[IslmT; hP; ErrorFlagi := hj; 0i; WHILE(P < f; mstepT ); assert(P = f ^ ErrorFlag = 0)] fby law 2g = RHS 4

4 Compiling Speci cation and its Veri cation The compiling speci cation of PL0 is de ned as a predicate relating a PL0 process and the corresponding ML0 code. Section 4.1 gives a formal meaning to the compiling speci cation predicate. Section 4.2 presents the algebra of the weakest speci cation for the object code with respect the given symbol table, and Section 4.3 states many theorems of the compilation predicate C. The aim is to is include enough theorems to enable the implementor to select correct ML0 code for each construct of PL0 . Each theorem can be proved using the algebraic laws of process re nement. A sample proof is given in Section 4.4.

4.1 Compiling Speci cation Predicate

The compiler of the programming language PL0 is speci ed by predicate C p s f m

where p is a PL0 process. s and f stand for the start and nish address of a section of ML0 code to be executed. m[s] . . . m[f ; 1] is the ML0 code for p. The symbol table maps each identi er (global variables and channels) of p to its address in the memory M , where we assume that m (used to store the code) and M (used to store data) are disjoint. is a set of locations of the memory M , which can be used to store the values of local variables or the temporary results during the evaluation of expressions. Here we assume that ran \ = ; i.e., only contains those addresses which have not been allocated yet. It is the responsibility of the compiler designer to ensure that this is so. The machine code is interpreted by the program Isfm(ran ] ). The compiling speci cation predicate C is correct if the interpretation of ML0 code has the same (or better) eect as PL0 source code with appropriate translation from the data space of target code to that of source code, i.e. C p s f m def = SEQ[ ^ ; p] v SEQ[I sfm(ran ] ); ^ ] The relation ^ translates from the machine state to program state and then forgets the machine state consisting of the memory locations and machine registers: ^ def = SEQ[ hx; y; z; . . .i := hM [ x]; M [ y]; M [ z]; . . .i; endran ] + fA;B;C; P; ErrorFlagg ] where hx; y; z; . . .i contains all the program variables in the domain of . Similarly, the compilation predicate E e s f m relates a PL0 expression e to its ML0 code, whose execution must leave the value of e in the register A. 5

4.2 Weakest Speci cation of the Object Code

This section shows how to derive the weakest speci cation of the target code when given the source program p and the symbol table . Because the simulation ^ given in the previous section is a surjective mapping from the data space of the machine language ML0 to that of the programming language PL0, we can thus de ne its inverse as follows: def ^ ;1

= SEQ[ hM [ x]; M [ y ]; M [ z ]; . . .i := hx; y; z; . . .i; endx; y; z; ... ] Algebraically, the inverse of the simulation ^ is uniquely de ned by the following laws: ^ ;1 ^ ] = SKIPPS ^ ; ^ ;1 SEQ[ SEQ[

;

] v SKIPMS where PS and MS stand for the data spaces of PL0 and ML0 respectively. From these laws it is not dicult to show that ^ ; p] v SEQ[q; ^ ] i SEQ[ ^ ; p; ^ ;1 SEQ[

] v q Consequently, the compilation predicate C can be rede ned by C p s f m def = SEQ[ ^ ; p; ^ ;1

] v I sfm(ran ] ) which claims that the weakest speci cation of a correct ML0 implementation of the source program p is WS (p) def = SEQ[ ^ ; p; ^ ;1

] The weakest speci cation operator WS enjoys a number of algebraic properties: Lemma 2 WS (Abort) = Abort Lemma 3 WS (STOP) = STOP Lemma 4 WS (SKIP) = MS n M [ran ] := anyvalue where MS n M [ran ] denotes the remaining data space of ML0 after removal of the subset of the memory M being allocated for global identi ers of the source program. Lemma 5 WS (SEQ[p1; . . . ; pn ]) = SEQ[WS (p1); WS (SEQ[p2; . . . ; pn )])] Now we want to extend the de nition of WS to treat expressions: WS (e) def = e[M [ x]=x; M [ y]=y; M [ z]=z; . . .] where e[t=x] is the expression obtained by replacing all free occurrences of x in e by t. Lemma 6 WS (x := e) = SEQ[M [ x] := WS (e); WS (SKIP)] Lemma 7 WS (input?x) = SEQ[input?M [ x]; WS (SKIP)] Lemma 8 WS (output!e) = SEQ[output!WS (e); WS (SKIP)] Lemma 9 WS (IF[b1 ! p1 ; . . . ; bn ! pn ]) = IF[WS (b1) ! WS (p1 ); WS (:b1 ) ! WS (IF[b2 ! p2 ; . . . ; bn ! pn ])] Lemma 10 WS (WHILE(b; p)) = SEQ[WHILE(WS (b); WS (p)); WS (SKIP)] 6

4.3 Theorems of compiling speci cation

We present some of the theorems of compiling speci cation for PL0 to ML0 translation; the full speci cation can be found in [HPB90]. Note that ML0 instructions are of variable length; each such instruction is implemented as a sequence of simpler single-byte transputer instructions. In an ML0 program the argument of a jump instruction is the byte oset from the end of the jump instruction to the start of the target instruction. A function mtrans(minstr), translating an ML0 instruction into a sequence of transputer instructions, and function Size(minstr), giving the length in bytes of minstr are speci ed elsewhere [PH90]. The notation m[s : t] denotes the sequence m[s]; . . . ; m[t].

Theorems of process compilation: (1) C(SKIP)sfm

if (2) (3) (4) (5) (6) (7)

(8)

f =s C(STOP)sfm

if m[s : f ; 1] = mtrans(stopp) C(x := e)sfm

if 9l1: l1 f E (e)sl1m ^ m[l1 : f ; 1] = mtrans(stl( x)) C(SEQ[])sfm

if C(SKIP)sfm

C(SEQ[p1; . . . ; pn ])sfm

if 9l1: l1 f C (p1)sl1m ^ C (SEQ[p2; . . . ; pn])l1fm

C(IF[])sfm

if C(STOP)sfm

C(IF[b1 ! p1; . . . ; bn ! pn ])sfm

9l1; l2; l3; l4: l1 l2 l3 l4 f E (b1)sl1m ^ m[l1 : l2 ; 1] = mtrans(cj(l4 ; l2)) C (p1)l2l3m ^ m[l3 : l4 ; 1] = mtrans(j(f ; l4)) C (IF[b2 ! p2 ; . . . ; bn ! pn]l4fm

C(WHILE(b; p))sfm

if 9l1; l2; l3: l1 l2 l3 f m[s : l1 ; 1] = mtrans(j(l2 ; l1)) C (p)l1l2m ^ E (NOT b)l2l3m ^ m[l3 : f ; 1] = mtrans(cj(l1 ; f )) 7

if ^ ^ ^

Theorems of Expression compilation: (9) E (x)sfm

if m[s : f ; 1] = mtrans(ldl( x)) (10) E (e1 + e2)sfm ( ] flocg) if 9l1; l2; l3; l4; l5: l1 l2 l3 l4 l5 f E (e1)sl1m ( ] flocg) ^ m[l1 : l2 ; 1] = mtrans(stl(loc)) ^ E (e2)l2l3m ^ m[l3 : l4 ; 1] = mtrans(ldl(loc)) ^ m[l4 : l5 ; 1] = mtrans(add) ^ m[l5 : f ; 1] = mtrans(stoperr)

4.4 Correctness of the Compiling Speci cation

We give a sample proof of correctness for the theorems of the SEQ construct.

Proof of Theorem 4 Direct from law 2. Proof of Theorem 5 I sfm(ran ] ) w SEQ[Islm(ran ] ); Ilfm(ran ] )] fby law 5 and lemma 1g w SEQ[WS (p1); WS (SEQ[p2; . . . ; pn ])] fby the antecedentg = WS (SEQ[p1; . . . ; pn ]) fby lemma 5g

5 Compilation Strategy Section 4.3 presented a number of theorems about the compiling speci cation predicate C. In this section we discuss how these theorems may be used in actually generating code for the PL0 programs. These theorems may directly function as clauses of a logic program implementing the compiler. To make such an approach practicable, we transform the compiling speci cation (using logic) to derive theorems which may be eciently `executed.'

Relocatability of machine code: A predicate movem(s0; f 0; m0; s; f; m) is used in specifying relocation of machine code. (11) movem(s0; f 0; m0; s; f; m) if m0[s0 : f 0 ; 1] = m[s : f ; 1] (12) (f 0 ; s0) = (f ; s) if movem(s0; f 0; m0; s; f; m) Following two theorems, stating that code generated by the compiler is relocatable, are useful in implementing this strategy. These may be used to nding out the size of code even when its position in memory is unknown. 8

(13) C(P )sfm

if 0 0 0 9s ; f ; m : C (P )s0f 0m0 ^ movem(s0; f 0; m0 ; s; f; m) (14) E (e)sfm

if 0 0 0 9s ; f ; m : E (e)s0f 0m0 ^ movem(s0; f 0; m0; s; f; m) The theorem below can be derived from the above and Theorem 7. (15) C(IF[b1 ! P1; . . . ; bn ! Pn ])sfm

if 0 00 000 9l1; l2; l3; l4; l23; l34; l4f ; m ; m ; m : l1 l2 l3 l4 f E (b1)sl1m ^ C (P1)0l23m0 ^ C (IF[b2 ! P2 ; . . . ; bn ! Pn ]0l4f m00

m000[0 : l34 ; 1] = mtrans(j(l4f )) ^ m[l1 : l2 ; 1] = mtrans(cj(l23 + l34)) ^ movem(0; l23; m0; l2; l3; m) ^ movem(0; l34; m000; l3; l4; m) ^ movem(0; l4f ; m00; l4; f; m)

Design of function BackJump: The implementation of the WHILE construct requires backward jump. We rst formulate a scheme for optimising the backward jumps. Let s be the target address of backward conditional jump, and let l be the start address for the cj instruction. We design a function BackJump which has the following properties. s l f Size(cj(s ; f )) = (f ; l) BackJump(s; l) = (s ; f ) These can be solved to give the following speci cation of BackJump: l ; s = 0 ) BackJump(s; l) = ;(2 + (l ; s)) 0 i ^ (16i ; i) (l ; s) < 16i+1 ; (i + 1) ) BackJump(s; l) = ;(i + 2 + (l ; s)) We may now formulate a compiling speci cation for the WHILE construct as follows. (16) C(WHILE(b; P ))sfm

if 9l1; l2; l3; l12; m0: l1 l2 l3 f C (P )0l12m0 ^ m[s : l1 ; 1] = mtrans(j(l12)) ^ movem(0; l12; m0; l1; l2; m) ^ E (NOT b)l2l3m ^ m[l3 : f ; 1] = mtrans(cj(BackJump(l1; l3))) The compiling speci cation theorems for WHILE and the IF constructs outlined in this section have been used to implement a compiler in the logic programming language Prolog. Extracts of this compiler are included in the next section. 9

6 Prolog Implementation The idea of using Prolog [CM81] for the construction of compilers has been accepted for some time [W80]. Advantages include the fact that the code for the compiler can be very close to the compiling speci cation and thus the con dence in its correctness is increased. It can be used both for a prototype compiler and even for a `real' compiler since the Prolog code itself may be compiled for increased eciency [Q88]. The following sections include parts of a Prolog compiler from the PL0 language to the ML0 instruction set which follows the compiling speci cation outlined in Section 4.3 as closely as possible. The strategy presented in Section 5 is followed to produce a working compiler. An interesting feature of the compiler is that compiled code is generated by using Prolog assertions to add clauses which specify the program memory contents dynamically. The compiler produces code by reading in a PL0 source program and asserting ML0 object code values in memory locations. A more standard approach is to include the actual object code generated as a parameter and to concatenate generated code sequences together. This approach is adopted in a companion paper [BHP90] which may be compared with the code presented here. Alternatively it would be possible to convert the program into a compiler `checker' so that program source and object code produced by another compiler could be checked to be correct by ensuring that previously set memory locations are consistent with the results from the compiler. This could increase the con dence that two compilers produce the same results (e.g., whilst boot-strapping a compiler written in its own target language).

Process compilation

Each program construct is compiled using a separate Prolog clause. Individual instructions are assembled using m. This asserts that one or more consecutive byte locations in memory contain the object code for the instruction. Alternatively, memory locations could be returned as lists and concatenated together at the end of each compilation clause as in [BHP90]. A Prolog cut (`!') is included at the end of each clause since we are only interested in the rst solution which the Prolog program nds. This makes the program more ecient by avoiding subsequent searching once a solution has been found. For a non-deterministic compilation (perhaps allowing multiple strategies and then chosing the `best') these cuts could be removed. Declarations cause a free location in to be allocated in . The location of a variable in memory can be found using the psi clause (see later for more details). c(int X : P,S,F,M,Psi,[Loc|Omega]) :c(P,S,F,M,[X->Loc|Psi],Omega), !.

Most constructs are straight-forward and follow the original speci cation almost exactly: c(skip,S,S,_,_,_).

10

c(stop,S,F,M,_,_) :m(stopp,S,F,M), !. c(X:=E,S,F,M,Psi,Omega) :psi(Psi,X,PsiX), ce(E,S,L1,M,Psi,Omega), m(stl(PsiX),L1,F,M), !. c(seq[],S,F,M,Psi,Omega) :c(skip,S,F,M,Psi,Omega), !. c(seq[P|R],S,F,M,Psi,Omega) :c(P,S,L1,M,Psi,Omega), c(seq R,L1,F,M,Psi,Omega), !.

The if and while constructs involve variable-length jump instructions which must be handled slightly dierently from the speci cation in order to produce an executable program. See later for more details on the exists, mm and bj clauses which are used below. c(if[],S,F,M,Psi,Omega) :c(stop,S,F,M,Psi,Omega), !. c(if[B->P|R],S,F,M,Psi,Omega) :ce(B,S,L1,M,Psi,Omega), exists(MP), c(P,0,L3_L2,MP,Psi,Omega), exists(MR), c(if R,0,F_L4,MR,Psi,Omega), m(j(F_L4),L3_L2,L4_L2,MP), m(cj(L4_L2),L1,L2,M), mm(0,L4_L2,MP,L2,L4,M), mm(0,F_L4,MR,L4,F,M), !. c(while(B,P),S,F,M,Psi,Omega) :exists(MP), c(P,0,L2_L1,MP,Psi,Omega), m(j(L2_L1),S,L1,M), mm(0,L2_L1,MP,L1,L2,M), ce(~B,L2,L3,M,Psi,Omega), bj(L1,L3,L1_F), m(cj(L1_F),L3,F,M), !.

Expressions are handled separately and straight-forwardly, and are omitted here.

Transputer instructions

Each instruction Instr is located at a particular byte address S in a memory M. The sequence of basic byte instructions, including all necessary pre xed nfix and pfix instructions must be calculated; the position of any following instructions F is then known. 11

Each instruction is assembled into a list of byte values Code which is subsequently set in memory. m(Instr,S,F,M) :mtrans(Instr,CodeSeq), setm(CodeSeq,S,F,M).

takes a list of byte values, a start address and a memory, sets the values in the memory at that address, and then returns the nish address following the set locations. setmemory takes a single byte value instead of a list and actually adds a Prolog clause at the end of the database of clauses using the built-in Prolog assertz clause: setm

setm([],S,S,_). setm([Code|R],S,F,M) :setmemory(Code,S,SuccS,M), setm(R,SuccS,F,M). setmemory(Code,S,F,M) :Loc =.. [M,S], assertz(value(Loc,Code)), F is S+1.

Thus setm([33,245],0,F,bytemem) would assert value(bytemem(0),33) and value(bytemem(1),245) as two byte locations in memory and return F=2, for example. (Note that [bytemem,0] simply sets Loc to the Prolog `functor' bytemem(0), etc.)

Loc =..

Correspondence between variables and memory locations

The memory location PsiX for a particular variable X may be retrieved from using the following Prolog code: psi([X->PsiX|_],X,PsiX). psi([X->_|Psi],Y,PsiY) :X\==Y, psi(Psi,Y,PsiY).

Relocation of machine code

Compilation may proceed in a straight-forward sequential manner following the original speci cation directly except where forward jump instructions (j and cj) are involved. This occurs in the case of if and while constructs. In these cases the size of the relative jump and hence the size of the jump instruction itself are not known in advance. The solution adopted in the Prolog program is to rst compile the code which is to be jumped over into a separate piece of memory (uniquely named by the identi er returned by the exists clause and starting at location 0 for convenience). This temporary memory may subsequently be relocated into the actual position in memory once the jump instruction involved has been calculated and the real location (following the jump instruction) is known. This is possible because all the instructions in the ML0 language are relocatable; that is to say, they have the same eect wherever they are in memory. The following Prolog code relocates a list of instructions from S1 up to (but not including) F1 in memory M1 into M2, starting at position S2: 12

mm(S1,S1,_,S2,S2,_). mm(S1,F1,M1,S2,F2,M2) :Loc =.. [M1,S1], value(Loc,Code), SuccS1 is S1+1, setmemory(Code,S2,SuccS2,M2), retract(value(Loc,Code)), mm(SuccS1,F1,M1,SuccS2,F2,M2).

Here the built-in Prolog clause retract also removes the temporarily asserted memory values. This is not strictly necessary, but keeps the number of asserted memory values to a minimum.

Optimisation of backward jumps

Backward jumps occur only in while loops in the PL0 language. Here the distance of the jump from the start position of the jump instruction is known. However the jump is actually eective from the end position of the jump instruction which depends on the size of the backward oset to be jumped. The solution used below makes use of the Prolog `or' construct (`;'). The rst solution that is large enough to make the required backward jump is found by successively trying each size in turn. A Prolog cut (`!') is used to prevent searching for subsequent possibilities which are of course valid but are not optimal. Note that the minimum size of instruction for a backward jump is 2 bytes, even for a zero oset (from the start of the instruction) [NT89]. opt(0,2). opt(Offset,Opt) :00, Opt is TryOpt+1)). tryopt(Offset,I,L,TryOpt) :J is I+1, H is 16*(L+I)-J, ((L=
Performance pro le

The example program given earlier compiles to object code in memory locations in about 1 second using compiled Quintus Prolog [Q88] on a Sun 3/60 workstation. This is acceptable for use as a real compiler. An interpreter based on the mathematical interpreter de nition, and written in PL0+ also also been implemented using Prolog. A compiled program can be interpreted at a rate of about 10 instructions per second. Alternatively the original program can be interpreted directly (and considerably more quickly). The results from the machine code program should of course be the same (or `better' if the program is non-deterministic). 13

7 Discussion In this paper, we have outlined a compiling speci cation for the a simple subset of the occam programming language and its correctness proof using the algebraic technique presented in [H90]. The compiling speci cation is given as a set of theorems. The theorems are proved using the algebraic laws of process re nement for PL0 . The complete speci cation as well as the full correctness proof may be found in [HPB90]. There are several advantages in following this approach:

Each theorem and its proof is independent of the other theorems. This modularity is

important if the veri cation method is to be practicable. Speci cation and its proof can be developed one theorem at a time. New theorems can be added to capture dierent ways of compiling the same construct. For example, the speci cation may be extended with the following theorem. (17) C(SKIP)sfm

if 9l1: s l1 f ^ m[s : l1 ; 1] = mtrans(j(f ; l1))

The compiler algorithm can then generate code using any of the alternative theorems; or possibly using several of them, choosing the `best' (for example, the smallest) code. E.g.: (18) C(IF[TRUE ! p1; . . . ; bn ! pn ])sfm

C(p1)sfm

if

The compiling speci cation for PL0 and its correctness proof are envisaged to be valid

even for a larger language such as occam. The proofs given here will remain valid provided that the algebraic laws of PL0 continue to hold for the full language. Also, the interpreter for the machine programs should only be extended with more instructions such that the behaviour of the existing ML0 instructions remains unchanged (or is re ned). Since illegal instructions are modelled as Abort, new instructions can only improve the machine.

The form of compiling speci cation is very similar to a logic program, with each theorem corresponding to a clause. However, such literal translation of the speci cation into a logic program may be inecient. Hence, a strategy for executing the speci cation has been devised and a prototype Prolog compiler has been developed following this strategy. We have not given here a formal proof that the compiler satis es the compiling speci cation. This should, however, be simple as the Prolog compiler is very close to the compiling speci cation. Of course, the resulting compiler will generate `veri ed code' only if the compiler itself is executed on a trusted implementation of Prolog . . . running on trusted hardware . . . 14

Acknowledgements

The work was supported by the ESPRIT BRA ProCoS and the IED safemos collaborative projects and we acknowledge the help of partners on both these projects. Copies of ProCoS project documents are available from: Annie Rasmussen, Department of Computer Science, Technical University of Denmark, Building 344, DK-2800 Lyngby, Denmark.

References

[BP89] Bowen, J.P. and P.K. Pandya, Speci cation of the ProCoS level 0 instruction set, ProCoS Project Document OU JB 2/2, 1990. [BHP90] Bowen, J.P., Jifeng He and P.K. Pandya, An Approach to Veri able Compiling Speci cation and Prototyping, submitted to Programming Language Implementation and Logic Programming workshop, Linkoping, Sweden, August 1990. [CM81] Clocksin, W.F. and C.S. Mellish, Programming in Prolog, Springer-Verlag, 1981. [He90] He, Jifeng, Speci cation oriented semantics for the ProCoS level 0 language, ProCoS Project Document OU HJF 5/1, March 1990. [HH89] He, Jifeng and C.A.R. Hoare, Operational Semantics for ProCoS Programming Language Level 0, ProCoS Project Document OU HJF 1/3, December 1989. [HPB90] He, Jifeng, P.K. Pandya and J.P. Bowen, Compiling Speci cation for ProCoS level 0 language, ProCoS Project Document OU HJF 4 1/2, April 1990. [H90] Hoare, C.A.R., Re nement algebra proves correctness of compiling speci cations, Technical Report PRG-TR-6-90, Programming Research Group, Oxford University, 1990. [I88a] INMOS Limited, Occam 2 Reference Manual, Prentice Hall International Series in Computer Science, UK, 1988. [I88b] INMOS Limited, Transputer Instruction Set: A compiler writer's guide, Prentice-Hall International, UK, 1988. [LJ89] Lvengreen, H.H. and K.M. Jensen: De nition of the ProCoS Programming Language Level 0, ProCoS Project Document ID/DTH HHL 2/1.2, November 1989. [NT89] Nicoud, J-D. and A.M. Tyrrell, The transputer T414 instruction set, IEEE Micro, pp 60{75, June 1989. [PH90] Pandya, P.K. and Jifeng He, A simulation approach to veri cation of assembling speci cation of ProCoS level 0 language, ProCoS Project Document OU PKP 3/1.0, 1990. [Q88] Quintus Prolog { Sun 3 User Manual, Release 2.4 (unix), Quintus Computer Systems, Inc., Mountain View, California, USA, 1988. [RH88] Roscoe, A.W. and C.A.R. Hoare, Laws of occam programming, Theoretical Computer Science, 1988. [W80] Warren, D.H.D., Logic programming and compiler writing, Software|Practice and Experience, 10, pp 97{125, 1980.

15

ESPRIT BRA 3104 ProCoS project: Provably Correct ...

of laws may be found in HPB90]. A mathematical definition of the relation v, and the consistency of the laws with respect a specification-oriented semantics of the language are explored in He90]. The program Abort represents the completely arbitrary behaviour of a broken machine, and is the least controllable in short for all ...

Download PDF

273KB Sizes 2 Downloads 206 Views

Report

ESPRIT BRA 3104 ProCoS project: Provably Correct ...

Recommend Documents